# Capstone Project
## The Battle of the Neighborhoods
---

### Table of Contents
* [Introduction/Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Results](#results)
* [Discussion](#discussion)
* [Conclusion](#conclusion)

## Introduction/Business Problem <a name="introduction"></a>

In this capstone project, I will try to predict the best neighborhood to open a fitness studio (specifically, a pilates studio) in Toronto, Canada. I will use the Toronto neighborhoods dataset from the previous lessons. I will decide which neighborhood to place the fitness studio in based on the proximity of other health-centric venues.

### Problem

We need to identify neighborhoods of Toronto that already offer other fitness/health focused venues (other types of gyms, health restaurants such as salad, smoothies, juice, vegetarian) **but** have minimal pilates studios. This will maximize value of the pilates studio while already being in an area that caters to health-focused clientele.

### Background

I'll use the skills learned throughout this data science course and during the capstone project to appropriately recommend a suitable location. Neighborhoods will be as previously defined in the last exercises

## Data <a name="data"></a>

Determining where to place the pilates studio depends on the following factors:
* the number of other gyms in the neighborhood
* the number of pilates studio in the neighborhood (aim to minimize)
* the number of health-conscious restaurants in the neighborhood (aim to maximize)

First, I will find the number of total gyms in the neighborhood and the total number of pilates gyms in the neighborhood. Then I will calculate what percentage of gyms are NOT pilates gyms, aiming to maximize the %.
Next, I will find the number of health-conscious restaurants and venues in the neighborhood, based on specific venue categories within Foursquare (smoothie shop, juice bar, salad place, sandwich place, soup place, vegetarian/vegan restaurant, gluten-free restaurant, health food store, farmers market, organic grocery)
Finally, I will assign a weight to total number of gyms, total percent not pilates gyms, and total number of health-related venues to determine the optimal neighborhood for the new pilates studio to be located.

### Importing the data

In [20]:
# These are the same steps as previously completed in the prior weeks' capstone assignments
# I'm importing the toronto neighborhood data, turning it into a dataframe, and then transforming the data to group by postal code-based neighborhoods

import pandas as pd
import numpy as np

url="https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
t = pd.read_html(url)
table = t[0]

In [21]:
# rename column
table = table.rename(columns={'Postcode':'PostalCode'})

In [22]:
# remove entries where borough is not assigned
n = table[table.Borough != 'Not assigned']

In [23]:
# Assigns Borough as Neighborhood name if name not assigned
n.loc[n['Neighbourhood'] == ('Not assigned'), 'Neighbourhood'] = n['Borough']

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  from ipykernel import kernelapp as app


In [24]:
# Groups by postcode and saves as new dataframe
df = n.groupby(['PostalCode','Borough'])['Neighbourhood'].apply(', '.join).reset_index()

In [25]:
# display dataframe to show transformation
df.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


In [26]:
# add in latitude and longitude data
latlong = pd.read_csv('http://cocl.us/Geospatial_data')

In [27]:
#rename column to match table that's being merged
latlong = latlong.rename({'Postal Code' : 'PostalCode'},axis=1)

In [28]:
#merge latlong table into df
df = df.merge(latlong)

In [29]:
#display df to show lat and long have been added
df.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


We'll plot the neighborhoods on the map as a visual aid

In [30]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

Solving environment: done

# All requested packages already installed.

Solving environment: done

# All requested packages already installed.



In [31]:
# Uses Nominatim to gather the coords of Toronto

address = 'Toronto, ON'

geolocator = Nominatim(user_agent="toronto_neighborhoods")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


In [32]:
# Creates map of Toronto with dataframe points

# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, borough, neighbourhood in zip(df['Latitude'], df['Longitude'], df['Borough'], df['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

Next, let's narrow down which types of venues are part of our examination.
(taken from https://developer.foursquare.com/docs/resources/categories)

There are a few different venue categories and venues we'll be looking at
* Shop & Service
 * Food & Drink Shop
  * Farmers Market
  * Health Food Store
  * Organic Grocery
 * Fruit & Vegetable Store
 * Massage Studio
 * Sauna/Steam Room
 * Smoothie Shop
 * Supplement Shop
* Professional & Other Places
 * Medical Center
  * Acupuncturist
  * Alternative Healer
  * Chiropractor
  * Nutritionist
* Outdoors & Recreation
 * Gym / Fitness Center
  * Boxing Gym
  * Climbing Gym
  * Cycle Studio
  * Gym Pool
  * Gymnastics Gym
  * Gym
  * Martial Arts Dojo
  * Outdoor Gym
  * **Pilates Studio**
  * Track
  * Weight Loss Center
  * Yoga Studio
* Food
 * Vegetarian/Vegan Restaurant
 * Soup Place
 * Salad Place
 * Sandwich Place
 * Juice Bar
 * Gluten-free Restaurant

## Methodology <a name="methodology"></a>

We're going to obtain two lists: 1 list of all health-conscious venues in a 1.2 mile radius, and another list of all currently existing pilates studios in the same radius. We will then exclude any neighborhoods that currently have a pilates studio and choose a location from the remaining neighborhoods.

In [38]:
# let's obtain a list of these venue categories so we're only displaying these specific venues for each neighborhood
import requests
# step 1: foursquare credentials
CLIENT_ID = 'FYM0N3BASRK35GKK3QBIENVZPL2K5MAKI5MI5OPKIQILFSBE' # your Foursquare ID
CLIENT_SECRET = 'GVND25IIPY1HWC4DASL40ZEUP0ZW2CN3ALABFTMSTYD0E55R' # your Foursquare Secret
VERSION = '20180605'

venue_cats = ['5744ccdfe4b0c0459246b4b2', '4bf58dd8d48988d1fa941735', '50aa9e744b90af0d42d5de0e', '52f2ab2ebcbc57f1066b8b45','52f2ab2ebcbc57f1066b8b1c',
              '52f2ab2ebcbc57f1066b8b3c','58daa1558bbb0b01f18ec1ae','52f2ab2ebcbc57f1066b8b41','4bf58dd8d48988d1ed941735','5744ccdfe4b0c0459246b4cd','52e81612bcbc57f1066b7a3b',
             '52e81612bcbc57f1066b7a3c','52e81612bcbc57f1066b7a3a','58daa1558bbb0b01f18ec1d0','4bf58dd8d48988d175941735','52f2ab2ebcbc57f1066b8b47',
             '503289d391d4c4b30a586d6a', '52f2ab2ebcbc57f1066b8b49','4bf58dd8d48988d105941735','52f2ab2ebcbc57f1066b8b48','4bf58dd8d48988d176941735','4bf58dd8d48988d101941735',
              '58daa1558bbb0b01f18ec203','4bf58dd8d48988d106941735','590a0744340a5803fd8508c3','4bf58dd8d48988d102941735']
pilates_cat = ['5744ccdfe4b0c0459246b4b2']

In [60]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        # url = 'https://api.foursquare.com/v2/venues/explore?&client_id=FYM0N3BASRK35GKK3QBIENVZPL2K5MAKI5MI5OPKIQILFSBE&client_secret=GVND25IIPY1HWC4DASL40ZEUP0ZW2CN3ALABFTMSTYD0E55R&v=20180605&ll=43.653963,-79.387207&intent=browse&categoryId=5744ccdfe4b0c0459246b4b2,4bf58dd8d48988d1fa941735,50aa9e744b90af0d42d5de0e,52f2ab2ebcbc57f1066b8b45,52f2ab2ebcbc57f1066b8b1c,52f2ab2ebcbc57f1066b8b3c,58daa1558bbb0b01f18ec1ae,52f2ab2ebcbc57f1066b8b41,4bf58dd8d48988d1ed941735,5744ccdfe4b0c0459246b4cd,52e81612bcbc57f1066b7a3b,52e81612bcbc57f1066b7a3c,52e81612bcbc57f1066b7a3a,58daa1558bbb0b01f18ec1d0,4bf58dd8d48988d175941735,52f2ab2ebcbc57f1066b8b47,503289d391d4c4b30a586d6a,52f2ab2ebcbc57f1066b8b49,4bf58dd8d48988d105941735,52f2ab2ebcbc57f1066b8b48,4bf58dd8d48988d176941735,4bf58dd8d48988d101941735,4bf58dd8d48988d106941735,590a0744340a5803fd8508c3,4bf58dd8d48988d102941735&radius=2000&limit=100'
        # url is currently hardcoded because of a rogue keyerror that ocassionally happens for some reason.
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&intent=browse&categoryId=5744ccdfe4b0c0459246b4b2,4bf58dd8d48988d1fa941735,50aa9e744b90af0d42d5de0e,52f2ab2ebcbc57f1066b8b45,52f2ab2ebcbc57f1066b8b1c,52f2ab2ebcbc57f1066b8b3c,58daa1558bbb0b01f18ec1ae,52f2ab2ebcbc57f1066b8b41,4bf58dd8d48988d1ed941735,5744ccdfe4b0c0459246b4cd,52e81612bcbc57f1066b7a3b,52e81612bcbc57f1066b7a3c,52e81612bcbc57f1066b7a3a,58daa1558bbb0b01f18ec1d0,4bf58dd8d48988d175941735,52f2ab2ebcbc57f1066b8b47,503289d391d4c4b30a586d6a,52f2ab2ebcbc57f1066b8b49,4bf58dd8d48988d105941735,52f2ab2ebcbc57f1066b8b48,4bf58dd8d48988d176941735,4bf58dd8d48988d101941735,4bf58dd8d48988d106941735,590a0744340a5803fd8508c3,4bf58dd8d48988d102941735&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    

    return(nearby_venues)

In [61]:

LIMIT = 100
#2000m is approx a 1.2 mile radius, which anecdotally seems like a reasonable distance to walk
radius = 2000

tor_venues = getNearbyVenues(names=df['Neighbourhood'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )

Rouge, Malvern
Highland Creek, Rouge Hill, Port Union
Guildwood, Morningside, West Hill
Woburn
Cedarbrae
Scarborough Village
East Birchmount Park, Ionview, Kennedy Park
Clairlea, Golden Mile, Oakridge
Cliffcrest, Cliffside, Scarborough Village West
Birch Cliff, Cliffside West
Dorset Park, Scarborough Town Centre, Wexford Heights
Maryvale, Wexford
Agincourt
Clarks Corners, Sullivan, Tam O'Shanter
Agincourt North, L'Amoreaux East, Milliken, Steeles East
L'Amoreaux West
Upper Rouge
Hillcrest Village
Fairview, Henry Farm, Oriole
Bayview Village
Silver Hills, York Mills
Newtonbrook, Willowdale
Willowdale South
York Mills West
Willowdale West
Parkwoods
Don Mills North
Flemingdon Park, Don Mills South
Bathurst Manor, Downsview North, Wilson Heights
Northwood Park, York University
CFB Toronto, Downsview East
Downsview West
Downsview Central
Downsview Northwest
Victoria Village
Woodbine Gardens, Parkview Hill
Woodbine Heights
The Beaches
Leaside
Thorncliffe Park
East Toronto
The Danforth West, 

In [62]:
#return head of new venues table to confirm appropriate venues are being returned
tor_venues.head(20)

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Guildwood, Morningside, West Hill",43.763573,-79.188711,Marina Spa,43.766,-79.191,Spa
1,"Guildwood, Morningside, West Hill",43.763573,-79.188711,Simply Align Rehabilitation,43.766634,-79.192275,Chiropractor
2,Cedarbrae,43.773136,-79.239476,Olympian Martial Arts Studio,43.774686,-79.240908,Martial Arts Dojo
3,Cedarbrae,43.773136,-79.239476,Xplosion Fitness Resolutions,43.77506,-79.239952,Gym
4,Cedarbrae,43.773136,-79.239476,Y.U. Oriental,43.775122,-79.239793,Spa
5,Cedarbrae,43.773136,-79.239476,Oriental Acupuncture & Massage Clinic,43.774937,-79.240825,Acupuncturist
6,Cedarbrae,43.773136,-79.239476,Oclinic,43.774937,-79.240825,Acupuncturist
7,Cedarbrae,43.773136,-79.239476,ACTIVE CARE CHIROPRACTIC CLINIC,43.775038,-79.240834,Chiropractor
8,Cedarbrae,43.773136,-79.239476,Supreme Fitness,43.77659,-79.237579,Gym / Fitness Center
9,Scarborough Village,43.744734,-79.239476,in balance chiropractic + acupuncture clinic,43.745801,-79.24019,Chiropractor


In [129]:
tor_venues.groupby('Neighbourhood').count().sort_values(by='Venue Category', ascending=False).head(10)

Unnamed: 0_level_0,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighbourhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"First Canadian Place, Underground city",47,47,47,47,47,47
"Adelaide, King, Richmond",45,45,45,45,45,45
"Design Exchange, Toronto Dominion Centre",45,45,45,45,45,45
"Commerce Court, Victoria Hotel",45,45,45,45,45,45
Church and Wellesley,43,43,43,43,43,43
St. James Town,41,41,41,41,41,41
"Ryerson, Garden District",34,34,34,34,34,34
Central Bay Street,33,33,33,33,33,33
Stn A PO Boxes 25 The Esplanade,32,32,32,32,32,32
"Chinatown, Grange Park, Kensington Market",27,27,27,27,27,27


### Analysis

Here we can see the 10 neighborhoods with the most health/lifestyle related venues.

In [65]:
# one hot encoding
tor_onehot = pd.get_dummies(tor_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
tor_onehot['Neighbourhood'] = tor_venues['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = [tor_onehot.columns[-1]] + list(tor_onehot.columns[:-1])
tor_onehot = tor_onehot[fixed_columns]

tor_onehot.head(20)

Unnamed: 0,Neighbourhood,Acupuncturist,Alternative Healer,Bath House,Boxing Gym,Breakfast Spot,Bubble Tea Shop,Chiropractor,Climbing Gym,College Gym,...,Pilates Studio,Pool,Residential Building (Apartment / Condo),Restaurant,Salon / Barbershop,Smoothie Shop,Spa,Supplement Shop,Tanning Salon,Yoga Studio
0,"Guildwood, Morningside, West Hill",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0
1,"Guildwood, Morningside, West Hill",0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Cedarbrae,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Cedarbrae,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Cedarbrae,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0
5,Cedarbrae,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,Cedarbrae,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7,Cedarbrae,0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
8,Cedarbrae,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,Scarborough Village,0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0


In [67]:
# groups entries so neighborhoods aren't redudant
tor_grouped = tor_onehot.groupby('Neighbourhood').mean().reset_index()
tor_grouped

Unnamed: 0,Neighbourhood,Acupuncturist,Alternative Healer,Bath House,Boxing Gym,Breakfast Spot,Bubble Tea Shop,Chiropractor,Climbing Gym,College Gym,...,Pilates Studio,Pool,Residential Building (Apartment / Condo),Restaurant,Salon / Barbershop,Smoothie Shop,Spa,Supplement Shop,Tanning Salon,Yoga Studio
0,"Adelaide, King, Richmond",0.000000,0.0000,0.000000,0.0,0.022222,0.000000,0.022222,0.000,0.0,...,0.000000,0.0,0.022222,0.000000,0.022222,0.022222,0.155556,0.044444,0.000000,0.022222
1,Agincourt,0.000000,0.0000,0.000000,0.0,0.000000,0.000000,0.000000,0.000,0.0,...,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
2,"Agincourt North, L'Amoreaux East, Milliken, St...",0.000000,0.0000,0.000000,0.0,0.000000,0.000000,0.000000,0.000,0.0,...,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
3,"Albion Gardens, Beaumond Heights, Humbergate, ...",0.000000,0.0000,0.000000,0.0,0.000000,0.000000,0.000000,0.000,0.0,...,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
4,"Alderwood, Long Branch",0.166667,0.0000,0.000000,0.0,0.000000,0.000000,0.166667,0.000,0.0,...,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.166667,0.000000,0.000000,0.000000
5,"Bathurst Manor, Downsview North, Wilson Heights",0.000000,0.0000,0.000000,0.0,0.000000,0.000000,0.000000,0.000,0.0,...,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.833333,0.000000,0.000000,0.166667
6,Bayview Village,0.500000,0.0000,0.000000,0.0,0.000000,0.000000,0.000000,0.000,0.0,...,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.500000,0.000000,0.000000,0.000000
7,"Bedford Park, Lawrence Manor East",0.000000,0.0000,0.000000,0.0,0.000000,0.000000,0.000000,0.000,0.0,...,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.571429,0.000000,0.000000,0.000000
8,Berczy Park,0.000000,0.0000,0.000000,0.0,0.000000,0.000000,0.000000,0.000,0.0,...,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.071429,0.000000,0.000000,0.071429
9,"Bloordale Gardens, Eringate, Markland Wood, Ol...",0.000000,0.0000,0.000000,0.0,0.000000,0.000000,0.000000,0.000,0.0,...,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000


In [69]:
# shows top 5 venues in each neighborhood
num_top_venues = 5

for hood in tor_grouped['Neighbourhood']:
    print("----"+hood+"----")
    temp = tor_grouped[tor_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide, King, Richmond----
             venue  freq
0              Gym  0.40
1              Spa  0.16
2   Massage Studio  0.04
3  Doctor's Office  0.04
4  Supplement Shop  0.04


----Agincourt----
               venue  freq
0  Martial Arts Dojo  0.67
1                Gym  0.33
2      Acupuncturist  0.00
3       Home Service  0.00
4         Hotel Pool  0.00


----Agincourt North, L'Amoreaux East, Milliken, Steeles East----
           venue  freq
0            Gym   1.0
1  Acupuncturist   0.0
2   Home Service   0.0
3     Hotel Pool   0.0
4      Juice Bar   0.0


----Albion Gardens, Beaumond Heights, Humbergate, Jamestown, Mount Olive, Silverstone, South Steeles, Thistletown----
           venue  freq
0  Grocery Store   1.0
1  Acupuncturist   0.0
2   Home Service   0.0
3     Hotel Pool   0.0
4      Juice Bar   0.0


----Alderwood, Long Branch----
                     venue  freq
0            Acupuncturist  0.17
1                      Spa  0.17
2  Health & Beauty Service  0.17
3      

In [70]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [72]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighbourhood'] = tor_grouped['Neighbourhood']

for ind in np.arange(tor_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(tor_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Gym,Spa,Supplement Shop,Doctor's Office,Gym / Fitness Center,Health & Beauty Service,Massage Studio,Yoga Studio,Medical Center,Breakfast Spot
1,Agincourt,Martial Arts Dojo,Gym,Yoga Studio,Health & Beauty Service,Gym / Fitness Center,Grocery Store,Fruit & Vegetable Store,Food Court,Farmers Market,Doctor's Office
2,"Agincourt North, L'Amoreaux East, Milliken, St...",Gym,Yoga Studio,Tanning Salon,Gym Pool,Gym / Fitness Center,Grocery Store,Fruit & Vegetable Store,Food Court,Farmers Market,Doctor's Office
3,"Albion Gardens, Beaumond Heights, Humbergate, ...",Grocery Store,Yoga Studio,Tanning Salon,Gym Pool,Gym / Fitness Center,Gym,Fruit & Vegetable Store,Food Court,Farmers Market,Doctor's Office
4,"Alderwood, Long Branch",Acupuncturist,Gym / Fitness Center,Spa,Gym,Chiropractor,Health & Beauty Service,Cycle Studio,Grocery Store,Fruit & Vegetable Store,Food Court
5,"Bathurst Manor, Downsview North, Wilson Heights",Spa,Yoga Studio,Cosmetics Shop,Gym / Fitness Center,Gym,Grocery Store,Fruit & Vegetable Store,Food Court,Farmers Market,Doctor's Office
6,Bayview Village,Acupuncturist,Spa,Cosmetics Shop,Gym / Fitness Center,Gym,Grocery Store,Fruit & Vegetable Store,Food Court,Farmers Market,Doctor's Office
7,"Bedford Park, Lawrence Manor East",Spa,Massage Studio,Grocery Store,Yoga Studio,Cosmetics Shop,Gym / Fitness Center,Gym,Fruit & Vegetable Store,Food Court,Farmers Market
8,Berczy Park,Farmers Market,Gym / Fitness Center,Gym,Yoga Studio,Organic Grocery,Gym Pool,Spa,Breakfast Spot,Doctor's Office,Alternative Healer
9,"Bloordale Gardens, Eringate, Markland Wood, Ol...",Massage Studio,Yoga Studio,Health & Beauty Service,Gym / Fitness Center,Gym,Grocery Store,Fruit & Vegetable Store,Food Court,Farmers Market,Doctor's Office


Now, repeat the process to find which neighborhoods already have a pilates studio

In [73]:
def getPilatesVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        # url = 'https://api.foursquare.com/v2/venues/explore?&client_id=FYM0N3BASRK35GKK3QBIENVZPL2K5MAKI5MI5OPKIQILFSBE&client_secret=GVND25IIPY1HWC4DASL40ZEUP0ZW2CN3ALABFTMSTYD0E55R&v=20180605&ll=43.653963,-79.387207&intent=browse&categoryId=5744ccdfe4b0c0459246b4b2,4bf58dd8d48988d1fa941735,50aa9e744b90af0d42d5de0e,52f2ab2ebcbc57f1066b8b45,52f2ab2ebcbc57f1066b8b1c,52f2ab2ebcbc57f1066b8b3c,58daa1558bbb0b01f18ec1ae,52f2ab2ebcbc57f1066b8b41,4bf58dd8d48988d1ed941735,5744ccdfe4b0c0459246b4cd,52e81612bcbc57f1066b7a3b,52e81612bcbc57f1066b7a3c,52e81612bcbc57f1066b7a3a,58daa1558bbb0b01f18ec1d0,4bf58dd8d48988d175941735,52f2ab2ebcbc57f1066b8b47,503289d391d4c4b30a586d6a,52f2ab2ebcbc57f1066b8b49,4bf58dd8d48988d105941735,52f2ab2ebcbc57f1066b8b48,4bf58dd8d48988d176941735,4bf58dd8d48988d101941735,4bf58dd8d48988d106941735,590a0744340a5803fd8508c3,4bf58dd8d48988d102941735&radius=2000&limit=100'
        # url is currently hardcoded because of a rogue keyerror that ocassionally happens for some reason.
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&intent=browse&categoryId=5744ccdfe4b0c0459246b4b2&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    

    return(nearby_venues)

In [76]:
LIMIT = 100
#2000m is approx a 1.2 mile radius, which anecdotally seems like a reasonable distance to walk
radius = 2000

tor_pilates = getPilatesVenues(names=df['Neighbourhood'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )

Rouge, Malvern
Highland Creek, Rouge Hill, Port Union
Guildwood, Morningside, West Hill
Woburn
Cedarbrae
Scarborough Village
East Birchmount Park, Ionview, Kennedy Park
Clairlea, Golden Mile, Oakridge
Cliffcrest, Cliffside, Scarborough Village West
Birch Cliff, Cliffside West
Dorset Park, Scarborough Town Centre, Wexford Heights
Maryvale, Wexford
Agincourt
Clarks Corners, Sullivan, Tam O'Shanter
Agincourt North, L'Amoreaux East, Milliken, Steeles East
L'Amoreaux West
Upper Rouge
Hillcrest Village
Fairview, Henry Farm, Oriole
Bayview Village
Silver Hills, York Mills
Newtonbrook, Willowdale
Willowdale South
York Mills West
Willowdale West
Parkwoods
Don Mills North
Flemingdon Park, Don Mills South
Bathurst Manor, Downsview North, Wilson Heights
Northwood Park, York University
CFB Toronto, Downsview East
Downsview West
Downsview Central
Downsview Northwest
Victoria Village
Woodbine Gardens, Parkview Hill
Woodbine Heights
The Beaches
Leaside
Thorncliffe Park
East Toronto
The Danforth West, 

In [77]:
#return head of new venues table to confirm appropriate venues are being returned
tor_pilates.head(20)

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"The Danforth West, Riverdale",43.679557,-79.352188,Riverdale Pilates,43.677929,-79.350365,Pilates Studio
1,"The Danforth West, Riverdale",43.679557,-79.352188,Pilates Process,43.677828,-79.350255,Pilates Studio
2,North Toronto West,43.715383,-79.405678,Essence Pilates,43.714151,-79.399942,Pilates Studio
3,Church and Wellesley,43.66586,-79.38316,Studio Pilates,43.665388,-79.380991,Pilates Studio
4,St. James Town,43.651494,-79.375418,Go Pilates,43.649002,-79.370656,Pilates Studio
5,"The Annex, North Midtown, Yorkville",43.67271,-79.405678,Shas Yoga & Pilates,43.67552,-79.403171,Pilates Studio
6,Stn A PO Boxes 25 The Esplanade,43.646435,-79.374846,Go Pilates,43.649002,-79.370656,Pilates Studio


It's clear from these results that, at 6 total studios, there may be a need for pilates studios in many of the neighborhoods.

In [78]:
tor_pilates.groupby('Neighbourhood').count()

Unnamed: 0_level_0,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighbourhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Church and Wellesley,1,1,1,1,1,1
North Toronto West,1,1,1,1,1,1
St. James Town,1,1,1,1,1,1
Stn A PO Boxes 25 The Esplanade,1,1,1,1,1,1
"The Annex, North Midtown, Yorkville",1,1,1,1,1,1
"The Danforth West, Riverdale",2,2,2,2,2,2


In [139]:
pilates = tor_pilates.groupby('Neighbourhood').count()

In [179]:
pilates = pilates.reset_index()

In [180]:
pilates.rename({'Venue Category':'Num Pilates Studios'}, axis=1, inplace=True)

In [181]:
pilates

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Num Pilates Studios
0,Church and Wellesley,1,1,1,1,1,1
1,North Toronto West,1,1,1,1,1,1
2,St. James Town,1,1,1,1,1,1
3,Stn A PO Boxes 25 The Esplanade,1,1,1,1,1,1
4,"The Annex, North Midtown, Yorkville",1,1,1,1,1,1
5,"The Danforth West, Riverdale",2,2,2,2,2,2


By performing a count on the pilates studios in the general Toronto area, it's also clear that it may be beneficial to avoid The Danford West/Riverdale neighborhood. It may be useful but not as necessary to avoid the other 5 neighborhoods that already have a studio

In [183]:
#as a reminder, here's the list of the top 10 neighborhoods
top10 = tor_venues.groupby('Neighbourhood').count().sort_values(by='Venue Category', ascending=False).head(10)
top10 = top10.reset_index()
top10

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"First Canadian Place, Underground city",47,47,47,47,47,47
1,"Adelaide, King, Richmond",45,45,45,45,45,45
2,"Design Exchange, Toronto Dominion Centre",45,45,45,45,45,45
3,"Commerce Court, Victoria Hotel",45,45,45,45,45,45
4,Church and Wellesley,43,43,43,43,43,43
5,St. James Town,41,41,41,41,41,41
6,"Ryerson, Garden District",34,34,34,34,34,34
7,Central Bay Street,33,33,33,33,33,33
8,Stn A PO Boxes 25 The Esplanade,32,32,32,32,32,32
9,"Chinatown, Grange Park, Kensington Market",27,27,27,27,27,27


In [186]:
prospects = pd.merge(top10,pilates,on='Neighbourhood',how='left')
prospects

Unnamed: 0,Neighbourhood,Neighbourhood Latitude_x,Neighbourhood Longitude_x,Venue_x,Venue Latitude_x,Venue Longitude_x,Venue Category,Neighbourhood Latitude_y,Neighbourhood Longitude_y,Venue_y,Venue Latitude_y,Venue Longitude_y,Num Pilates Studios
0,"First Canadian Place, Underground city",47,47,47,47,47,47,,,,,,
1,"Adelaide, King, Richmond",45,45,45,45,45,45,,,,,,
2,"Design Exchange, Toronto Dominion Centre",45,45,45,45,45,45,,,,,,
3,"Commerce Court, Victoria Hotel",45,45,45,45,45,45,,,,,,
4,Church and Wellesley,43,43,43,43,43,43,1.0,1.0,1.0,1.0,1.0,1.0
5,St. James Town,41,41,41,41,41,41,1.0,1.0,1.0,1.0,1.0,1.0
6,"Ryerson, Garden District",34,34,34,34,34,34,,,,,,
7,Central Bay Street,33,33,33,33,33,33,,,,,,
8,Stn A PO Boxes 25 The Esplanade,32,32,32,32,32,32,1.0,1.0,1.0,1.0,1.0,1.0
9,"Chinatown, Grange Park, Kensington Market",27,27,27,27,27,27,,,,,,


In [187]:
prospectsfinal = prospects.drop(['Neighbourhood Latitude_x', 'Neighbourhood Longitude_x','Venue Latitude_x','Venue Longitude_x','Neighbourhood Latitude_y','Venue_y','Venue Latitude_y','Venue Longitude_y','Neighbourhood Longitude_y','Venue_x'],axis=1)

## Results <a name="Results"></a>

In [189]:
prospectsfinal

Unnamed: 0,Neighbourhood,Venue Category,Num Pilates Studios
0,"First Canadian Place, Underground city",47,
1,"Adelaide, King, Richmond",45,
2,"Design Exchange, Toronto Dominion Centre",45,
3,"Commerce Court, Victoria Hotel",45,
4,Church and Wellesley,43,1.0
5,St. James Town,41,1.0
6,"Ryerson, Garden District",34,
7,Central Bay Street,33,
8,Stn A PO Boxes 25 The Esplanade,32,1.0
9,"Chinatown, Grange Park, Kensington Market",27,


It's clear now from the merged table that 7 of the top 10 prospective locations do not currently have a pilates studio (rows where num pilates studio equals NaN) and may be in need of one.

By this analysis, choosing to open a pilates studio in any of the following neighborhoods would likely be a profitable experience, as they currently do not have one and they also currently have 40 or more health-conscious establishments within approximately 1.2 miles.
* First Canadian Place, Underground City
* Adelaide, King, Richmond
* Design Exchange, Toronto Dominion Centre
* Commerce Court, Victoria Hotel

In [152]:
import matplotlib.cm as cm
import matplotlib.colors as colors

In [220]:
tor_venues.head()

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Guildwood, Morningside, West Hill",43.763573,-79.188711,Marina Spa,43.766,-79.191,Spa
1,"Guildwood, Morningside, West Hill",43.763573,-79.188711,Simply Align Rehabilitation,43.766634,-79.192275,Chiropractor
2,Cedarbrae,43.773136,-79.239476,Olympian Martial Arts Studio,43.774686,-79.240908,Martial Arts Dojo
3,Cedarbrae,43.773136,-79.239476,Xplosion Fitness Resolutions,43.77506,-79.239952,Gym
4,Cedarbrae,43.773136,-79.239476,Y.U. Oriental,43.775122,-79.239793,Spa


In [232]:
#from the venues table, take a subset of venues based on the top prospects list
ven = tor_venues.loc[tor_venues['Neighbourhood'].isin(['First Canadian Place, Underground city',
                                                'Adelaide, King, Richmond',
                                                'Design Exchange, Toronto Dominion Centre',
                                                'Commerce Court, Victoria Hotel',
                                                'Ryerson, Garden District',
                                                'Central Bay Street',
                                                'Chinatown, Grange Park, Kensington Market'])]

In [233]:
ven.head()

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
195,"Ryerson, Garden District",43.657162,-79.378937,Ryerson Athletics Centre,43.658434,-79.379296,College Rec Center
196,"Ryerson, Garden District",43.657162,-79.378937,Elmwood Spa,43.657759,-79.382586,Spa
197,"Ryerson, Garden District",43.657162,-79.378937,Solei Tanning Salon,43.654734,-79.380248,Tanning Salon
198,"Ryerson, Garden District",43.657162,-79.378937,Booster Juice,43.656318,-79.382765,Juice Bar
199,"Ryerson, Garden District",43.657162,-79.378937,Hard Candy Fitness,43.659556,-79.38244,Gym / Fitness Center


In [235]:
# repeat one-hot encoding step
ven.groupby('Neighbourhood').count()

Unnamed: 0_level_0,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighbourhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Adelaide, King, Richmond",45,45,45,45,45,45
Central Bay Street,33,33,33,33,33,33
"Chinatown, Grange Park, Kensington Market",27,27,27,27,27,27
"Commerce Court, Victoria Hotel",45,45,45,45,45,45
"Design Exchange, Toronto Dominion Centre",45,45,45,45,45,45
"First Canadian Place, Underground city",47,47,47,47,47,47
"Ryerson, Garden District",34,34,34,34,34,34


In [237]:
# one hot encoding
ven_onehot = pd.get_dummies(ven[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
ven_onehot['Neighbourhood'] = ven['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = [ven_onehot.columns[-1]] + list(ven_onehot.columns[:-1])
ven_onehot = ven_onehot[fixed_columns]

ven_onehot.head()

Unnamed: 0,Neighbourhood,Breakfast Spot,Bubble Tea Shop,Chiropractor,College Rec Center,Doctor's Office,Farmers Market,Food Court,Gym,Gym / Fitness Center,...,Massage Studio,Medical Center,Organic Grocery,Residential Building (Apartment / Condo),Salon / Barbershop,Smoothie Shop,Spa,Supplement Shop,Tanning Salon,Yoga Studio
195,"Ryerson, Garden District",0,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
196,"Ryerson, Garden District",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0
197,"Ryerson, Garden District",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,1,0
198,"Ryerson, Garden District",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
199,"Ryerson, Garden District",0,0,0,0,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0


In [238]:
ven_grouped = ven_onehot.groupby('Neighbourhood').mean().reset_index()
ven_grouped

Unnamed: 0,Neighbourhood,Breakfast Spot,Bubble Tea Shop,Chiropractor,College Rec Center,Doctor's Office,Farmers Market,Food Court,Gym,Gym / Fitness Center,...,Massage Studio,Medical Center,Organic Grocery,Residential Building (Apartment / Condo),Salon / Barbershop,Smoothie Shop,Spa,Supplement Shop,Tanning Salon,Yoga Studio
0,"Adelaide, King, Richmond",0.022222,0.0,0.022222,0.0,0.044444,0.022222,0.0,0.4,0.044444,...,0.044444,0.022222,0.0,0.022222,0.022222,0.022222,0.155556,0.044444,0.0,0.022222
1,Central Bay Street,0.0,0.030303,0.0,0.0,0.0,0.030303,0.0,0.151515,0.30303,...,0.0,0.030303,0.0,0.0,0.0,0.090909,0.151515,0.0,0.0,0.090909
2,"Chinatown, Grange Park, Kensington Market",0.0,0.0,0.0,0.0,0.0,0.148148,0.037037,0.0,0.0,...,0.074074,0.0,0.037037,0.0,0.0,0.074074,0.444444,0.0,0.0,0.074074
3,"Commerce Court, Victoria Hotel",0.022222,0.0,0.022222,0.0,0.044444,0.0,0.0,0.333333,0.111111,...,0.022222,0.022222,0.0,0.0,0.022222,0.066667,0.177778,0.022222,0.0,0.044444
4,"Design Exchange, Toronto Dominion Centre",0.022222,0.0,0.022222,0.0,0.044444,0.022222,0.0,0.288889,0.088889,...,0.044444,0.022222,0.0,0.0,0.0,0.111111,0.222222,0.022222,0.0,0.0
5,"First Canadian Place, Underground city",0.021277,0.0,0.021277,0.0,0.042553,0.0,0.0,0.361702,0.042553,...,0.06383,0.021277,0.0,0.021277,0.0,0.085106,0.212766,0.021277,0.0,0.0
6,"Ryerson, Garden District",0.0,0.0,0.0,0.029412,0.029412,0.029412,0.0,0.176471,0.147059,...,0.029412,0.029412,0.0,0.0,0.0,0.0,0.205882,0.0,0.029412,0.088235


In [239]:
num_top_venues = 5

for hood in ven_grouped['Neighbourhood']:
    print("----"+hood+"----")
    temp = ven_grouped[ven_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide, King, Richmond----
             venue  freq
0              Gym  0.40
1              Spa  0.16
2   Massage Studio  0.04
3  Supplement Shop  0.04
4  Doctor's Office  0.04


----Central Bay Street----
                  venue  freq
0  Gym / Fitness Center  0.30
1                   Spa  0.15
2                   Gym  0.15
3         Smoothie Shop  0.09
4           Yoga Studio  0.09


----Chinatown, Grange Park, Kensington Market----
            venue  freq
0             Spa  0.44
1  Farmers Market  0.15
2     Yoga Studio  0.07
3   Smoothie Shop  0.07
4  Massage Studio  0.07


----Commerce Court, Victoria Hotel----
                  venue  freq
0                   Gym  0.33
1                   Spa  0.18
2  Gym / Fitness Center  0.11
3     Health Food Store  0.07
4         Smoothie Shop  0.07


----Design Exchange, Toronto Dominion Centre----
                  venue  freq
0                   Gym  0.29
1                   Spa  0.22
2         Smoothie Shop  0.11
3  Gym / Fitness Cen

In [241]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
ven_sorted = pd.DataFrame(columns=columns)
ven_sorted['Neighbourhood'] = ven_grouped['Neighbourhood']

for ind in np.arange(ven_grouped.shape[0]):
    ven_sorted.iloc[ind, 1:] = return_most_common_venues(ven_grouped.iloc[ind, :], num_top_venues)

ven_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Gym,Spa,Supplement Shop,Doctor's Office,Gym / Fitness Center,Massage Studio,Health & Beauty Service,Yoga Studio,Health Food Store,Chiropractor
1,Central Bay Street,Gym / Fitness Center,Spa,Gym,Yoga Studio,Smoothie Shop,Gym Pool,Bubble Tea Shop,Medical Center,Farmers Market,Martial Arts Dojo
2,"Chinatown, Grange Park, Kensington Market",Spa,Farmers Market,Yoga Studio,Smoothie Shop,Massage Studio,Gym Pool,Organic Grocery,Martial Arts Dojo,Food Court,Herbs & Spices Store
3,"Commerce Court, Victoria Hotel",Gym,Spa,Gym / Fitness Center,Health Food Store,Smoothie Shop,Yoga Studio,Doctor's Office,Chiropractor,Health & Beauty Service,Breakfast Spot
4,"Design Exchange, Toronto Dominion Centre",Gym,Spa,Smoothie Shop,Gym / Fitness Center,Health Food Store,Massage Studio,Doctor's Office,Breakfast Spot,Farmers Market,Medical Center


In [246]:
# set number of clusters
kclusters = 5

ven_c = ven_grouped.drop('Neighbourhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(ven_c)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([3, 4, 1, 0, 0, 0, 2], dtype=int32)

In [250]:
# add clustering labels
ven_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

ven_merged = df

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
ven_merged = ven_merged.join(ven_sorted.set_index('Neighbourhood'), on='Neighbourhood', how = 'right')

ven_merged.head() # check the last columns!

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
54,M5B,Downtown Toronto,"Ryerson, Garden District",43.657162,-79.378937,2,Spa,Gym,Gym / Fitness Center,Yoga Studio,Juice Bar,Hotel Pool,College Rec Center,Doctor's Office,Farmers Market,Gym Pool
57,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383,4,Gym / Fitness Center,Spa,Gym,Yoga Studio,Smoothie Shop,Gym Pool,Bubble Tea Shop,Medical Center,Farmers Market,Martial Arts Dojo
58,M5H,Downtown Toronto,"Adelaide, King, Richmond",43.650571,-79.384568,3,Gym,Spa,Supplement Shop,Doctor's Office,Gym / Fitness Center,Massage Studio,Health & Beauty Service,Yoga Studio,Health Food Store,Chiropractor
60,M5K,Downtown Toronto,"Design Exchange, Toronto Dominion Centre",43.647177,-79.381576,0,Gym,Spa,Smoothie Shop,Gym / Fitness Center,Health Food Store,Massage Studio,Doctor's Office,Breakfast Spot,Farmers Market,Medical Center
61,M5L,Downtown Toronto,"Commerce Court, Victoria Hotel",43.648198,-79.379817,0,Gym,Spa,Gym / Fitness Center,Health Food Store,Smoothie Shop,Yoga Studio,Doctor's Office,Chiropractor,Health & Beauty Service,Breakfast Spot


In [252]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=15)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(ven_merged['Latitude'], ven_merged['Longitude'], ven_merged['Neighbourhood'], ven_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

This map shows that 3 of the top 4 prospective areas are grouped very close to each other, which further proves that the top picks could be a very promising area to bring a new pilates studio.

## Discussion <a name="discussion"></a>

This was a good first pass at determining a location to place an up-and-coming pilates studio based on existing venues that may draw the correct crowd. I think good next steps include selecting the top 5 prospects and taking a deeper dive into the ratios of the venues already exist there to see if any particular neighborhood may be more adept to welcoming a pilates studio - for example, the top neighborhood may simply have the highest number of produce stores, but not many gyms, which may indicate a pilates studio may not do as well in that ares. It may also be of interest to adjust the radiuses - without being deeply familiar with the Toronto metropolitan area, it's difficult to estimate what a realistic radius is, but I assume a smaller radius would be worthwhile to investigate.

## Conclusion <a name="conclusion"></a>

The purpose of this notebook was to help a potential business owner decide the most profitable and successful location for their up and coming pilates studio, based on prior existence of health-conscious type venues and competing pilates studios. I used the foursquare API to filter results by venue types that were health-conscious, and then sorted the data by the number of venues in a 1.2 miles radius of the center of a given neighborhood.