# Segmenting and Clustering Neighborhoods in Toronto

First, import the required Python packages.

In [1]:
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
from sklearn.cluster import KMeans
import folium 
import requests
from geopy.geocoders import Nominatim
import matplotlib.cm as cm
import matplotlib.colors as colors


Scrape the Toronto neighborhoods from Wikipedia. First, download the page with the `requests` package, then use the `BeautifulSoup` web-scraping package to extract the list of Toronto neighborhoods.

In [2]:
wikipedia_url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
response = requests.get(wikipedia_url, timeout=30)
response.raise_for_status()
soup = BeautifulSoup(response.text, 'html.parser')

# take the first table in the HTML
postal_codes = soup.find_all('table')[0]

# iterate over rows in the table body
table_body = postal_codes.find('tbody')

data = []
rows = table_body.find_all('tr')
for row in rows:
    cols = row.find_all('td')
    df_row = tuple([elem.text.strip() for elem in cols])
    
    if cols:
        data.append(df_row)


Use the list of `(postal_code, borough, neighborhood)` tuples to create a pandas dataframe. Clean it up by removing all rows whose borough is "Not assigned", then replace any "Not assigned" neighborhood with its borough name.

In [3]:
# create Pandas dataframe
df = pd.DataFrame(data, columns=["PostalCode", "Borough", "Neighborhood"])

# drop rows whose borough is not assigned (.copy() avoids a SettingWithCopyWarning on the next assignment)
df = df[df["Borough"] != "Not assigned"].copy()

# if the neighborhood is not assigned, use the borough name
df["Neighborhood"] = np.where(df["Neighborhood"] == "Not assigned", df["Borough"], df["Neighborhood"])

Finally, combine the neighborhoods that share a postal code into one comma-separated entry.

In [4]:
# combine postal codes
combined = df.groupby(["PostalCode"])["Neighborhood"].agg(lambda n: ','.join(n))
df = df.drop(["Neighborhood"], axis=1)
df.drop_duplicates(subset="PostalCode", inplace=True)

toronto_df = df.join(combined, on="PostalCode")

# print out shape
toronto_df.shape

(103, 3)
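The drop, `drop_duplicates` and `join` sequence above can probably be collapsed into a single `groupby`. This sketch assumes every postal code belongs to exactly one borough, so grouping on both columns keeps the borough without a separate join; the rows are made up for illustration.

```python
import pandas as pd

# Hypothetical cleaned rows: M5A appears twice and should collapse into one row
df = pd.DataFrame(
    [("M5A", "Downtown Toronto", "Harbourfront"),
     ("M5A", "Downtown Toronto", "Regent Park"),
     ("M3A", "North York", "Parkwoods")],
    columns=["PostalCode", "Borough", "Neighborhood"],
)

# Group on PostalCode and Borough together (assumes one borough per postal code);
# sort=False keeps the order in which postal codes first appear
combined = (
    df.groupby(["PostalCode", "Borough"], sort=False)["Neighborhood"]
      .agg(",".join)
      .reset_index()
)
print(combined)
```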

In [5]:
toronto_df.head()

,PostalCode,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Harbourfront,Regent Park"
6,M6A,North York,"Lawrence Heights,Lawrence Manor"
8,M7A,Queen's Park,Queen's Park


Next, use the provided CSV file to add the coordinates of each postal code to the dataframe.

In [6]:
# append coordinates
df_coord = pd.read_csv("https://cocl.us/Geospatial_data")
df_coord = df_coord.rename(columns={"Postal Code": "PostalCode"})
toronto_df = toronto_df.merge(df_coord, on="PostalCode", how="left")

toronto_df.head()

,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Harbourfront,Regent Park",43.65426,-79.360636
3,M6A,North York,"Lawrence Heights,Lawrence Manor",43.718518,-79.464763
4,M7A,Queen's Park,Queen's Park,43.662301,-79.389494
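A left merge silently leaves `NaN` coordinates for any postal code missing from the CSV, which could cause problems later when the map markers are drawn. A minimal sketch, with hypothetical data (the `M9Z` code is invented), of using `indicator=True` to list such rows:

```python
import pandas as pd

# Hypothetical neighborhood rows; "M9Z" deliberately has no entry in the coordinate table
hoods = pd.DataFrame({"PostalCode": ["M3A", "M4A", "M9Z"],
                      "Borough": ["North York", "North York", "Nowhere"]})
coords = pd.DataFrame({"Postal Code": ["M3A", "M4A"],
                       "Latitude": [43.753259, 43.725882],
                       "Longitude": [-79.329656, -79.315572]})

coords = coords.rename(columns={"Postal Code": "PostalCode"})
# indicator=True adds a "_merge" column saying whether each row found a match
merged = hoods.merge(coords, on="PostalCode", how="left", indicator=True)

# Postal codes that did not receive coordinates
missing = merged.loc[merged["_merge"] == "left_only", "PostalCode"].tolist()
print(missing)
```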


## Exploring and clustering the Toronto neighborhoods

First, get the coordinates of the city of Toronto.

In [7]:
address = 'Toronto'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


Then let's create a map of Toronto with the neighborhoods marked.

In [8]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(toronto_df['Latitude'], toronto_df['Longitude'], toronto_df['Borough'], toronto_df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)  
    
map_toronto

## Define Foursquare credentials

In [1]:
CLIENT_ID = '###########' # your Foursquare ID
CLIENT_SECRET = '###########' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: ###########
CLIENT_SECRET: ###########
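As written, the credentials cell would print the real secret into the notebook output once it is filled in. A minimal sketch of an alternative, assuming the credentials are stored in environment variables (the names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are my own choice, not something Foursquare requires), with a hypothetical `mask` helper that shows only the last few characters:

```python
import os

# Hypothetical variable names; set them in the shell before starting the notebook
CLIENT_ID = os.environ.get("FOURSQUARE_CLIENT_ID", "")
CLIENT_SECRET = os.environ.get("FOURSQUARE_CLIENT_SECRET", "")


def mask(secret: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters so the value is not leaked in output."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


print("CLIENT_ID:", mask(CLIENT_ID))
print("CLIENT_SECRET:", mask(CLIENT_SECRET))
```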


## Select only part of the neighborhood data
Let's narrow down the data: keep only the rows whose borough name contains the word "Toronto" (the match is case-sensitive).

In [34]:
toronto_df = toronto_df.loc[toronto_df["Borough"].str.contains('Toronto')]
toronto_df.shape

(38, 5)
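`str.contains` is case-sensitive by default, so only boroughs spelled with a capital "T" are kept. A small sketch with made-up values (the lowercase entry is hypothetical) shows the difference `case=False` makes:

```python
import pandas as pd

boroughs = pd.Series(["Downtown Toronto", "North York", "East Toronto", "toronto islands"])

# Default: case-sensitive regex match, so the lowercase entry is not selected
sensitive = boroughs[boroughs.str.contains("Toronto")].tolist()

# case=False ignores case; regex=False treats the pattern as a plain substring
insensitive = boroughs[boroughs.str.contains("toronto", case=False, regex=False)].tolist()

print(sensitive)
print(insensitive)
```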

Let's look at the map with the reduced data set.

In [11]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, borough, neighborhood in zip(toronto_df['Latitude'], toronto_df['Longitude'], toronto_df['Borough'], toronto_df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)  
    
map_toronto

## Analyse the neighborhoods
First, let's download the venues within each neighborhood using the `getNearbyVenues` function defined in the course lab exercise.

In [41]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, limit=100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']

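        # handle empty response: keep the neighborhood as a row without venue data
        # (its venue columns stay empty and later become an all-zero frequency row)
        if not results:
            venues_list.append([(name, lat, lng)])
            continue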
            
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
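Two details of `getNearbyVenues` can be tested without calling the API: building the request URL and flattening the `items` list. The sketch below uses `urllib.parse.urlencode` to escape the query parameters and a hand-written `mock_response` that only mimics the keys the function reads. `build_explore_url` and `flatten_items` are hypothetical helpers, and the fallback for a venue without categories is a defensive assumption rather than documented API behavior.

```python
from urllib.parse import urlencode


def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    # urlencode escapes each value (the comma in "ll" becomes %2C) and joins them with "&"
    params = {"client_id": client_id, "client_secret": client_secret, "v": version,
              "ll": f"{lat},{lng}", "radius": radius, "limit": limit}
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)


def flatten_items(name, lat, lng, items):
    """Turn the 'items' list of one explore response into flat row tuples."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # assumption: use None when a venue carries no category instead of raising IndexError
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows


# Hypothetical response fragment shaped like the keys getNearbyVenues reads
mock_response = {"response": {"groups": [{"items": [
    {"venue": {"name": "Tandem Coffee",
               "location": {"lat": 43.653559, "lng": -79.361809},
               "categories": [{"name": "Coffee Shop"}]}},
]}]}}

items = mock_response["response"]["groups"][0]["items"]
rows = flatten_items("Harbourfront,Regent Park", 43.65426, -79.360636, items)
print(build_explore_url("ID", "SECRET", "20180605", 43.65426, -79.360636, radius=150))
print(rows)
```

With `requests`, passing the same dictionary as `requests.get(base_url, params=params)` performs the same escaping.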

In [42]:
toronto_venues = getNearbyVenues(names=toronto_df['Neighborhood'], latitudes=toronto_df['Latitude'], longitudes=toronto_df['Longitude'], radius=150)
toronto_venues.head()

Harbourfront,Regent Park
Ryerson,Garden District
St. James Town
The Beaches
Berczy Park
Central Bay Street
Christie
Adelaide,King,Richmond
Dovercourt Village,Dufferin
Harbourfront East,Toronto Islands,Union Station
Little Portugal,Trinity
The Danforth West,Riverdale
Design Exchange,Toronto Dominion Centre
Brockton,Exhibition Place,Parkdale Village
The Beaches West,India Bazaar
Commerce Court,Victoria Hotel
Studio District
Lawrence Park
Roselawn
Davisville North
Forest Hill North,Forest Hill West
High Park,The Junction South
North Toronto West
The Annex,North Midtown,Yorkville
Parkdale,Roncesvalles
Davisville
Harbord,University of Toronto
Runnymede,Swansea
Moore Park,Summerhill East
Chinatown,Grange Park,Kensington Market
Deer Park,Forest Hill SE,Rathnelly,South Hill,Summerhill West
CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara
Rosedale
Stn A PO Boxes 25 The Esplanade
Cabbagetown,St. James Town
First Canadian Place,Underground city


,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Harbourfront,Regent Park",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,"Harbourfront,Regent Park",43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,"Harbourfront,Regent Park",43.65426,-79.360636,Morning Glory Cafe,43.653947,-79.361149,Breakfast Spot
3,"Harbourfront,Regent Park",43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa
4,"Harbourfront,Regent Park",43.65426,-79.360636,The Extension Room,43.653313,-79.359725,Gym / Fitness Center


In [43]:
toronto_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
"Adelaide,King,Richmond",17,17,17,17,17,17
Berczy Park,1,1,0,0,0,0
"Brockton,Exhibition Place,Parkdale Village",1,1,0,0,0,0
Business Reply Mail Processing Centre 969 Eastern,1,1,1,1,1,1
"CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara",1,1,1,1,1,1
"Cabbagetown,St. James Town",7,7,7,7,7,7
Central Bay Street,6,6,6,6,6,6
"Chinatown,Grange Park,Kensington Market",19,19,19,19,19,19
Christie,1,1,1,1,1,1
Church and Wellesley,6,6,6,6,6,6


Count the unique venue categories:

In [44]:
# note: unique() also counts a missing value as a category; nunique() would skip it
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 97 unique categories.


One-hot encode the venue categories.

In [45]:
# one hot encoding (dtype=int keeps 0/1 values; pandas 2.0+ would otherwise return booleans)
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="", dtype=int)

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

,Neighborhood,Adult Boutique,American Restaurant,Arepa Restaurant,Art Gallery,Asian Restaurant,Auto Workshop,Bakery,Bank,Bar,...,Sushi Restaurant,Tea Room,Thai Restaurant,Thrift / Vintage Store,Toy / Game Store,Trail,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Yoga Studio
0,"Harbourfront,Regent Park",0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Harbourfront,Regent Park",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Harbourfront,Regent Park",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Harbourfront,Regent Park",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Harbourfront,Regent Park",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Next, group the rows by neighborhood and take the mean of each category column, which gives the frequency of occurrence of each category in that neighborhood.

In [46]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

,Neighborhood,Adult Boutique,American Restaurant,Arepa Restaurant,Art Gallery,Asian Restaurant,Auto Workshop,Bakery,Bank,Bar,...,Sushi Restaurant,Tea Room,Thai Restaurant,Thrift / Vintage Store,Toy / Game Store,Trail,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Yoga Studio
0,"Adelaide,King,Richmond",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,...,0.0,0.058824,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0
1,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Brockton,Exhibition Place,Parkdale Village",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Business Reply Mail Processing Centre 969 Eastern,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"CN Tower,Bathurst Quay,Island airport,Harbourf...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,"Cabbagetown,St. James Town",0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,"Chinatown,Grange Park,Kensington Market",0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.105263,...,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.105263,0.0,0.0
8,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Church and Wellesley,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
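Because every one-hot column holds 0 or 1, its mean within a neighborhood is simply (venues of that category) / (venues returned for the neighborhood). The numbers above agree with the count table: Adelaide, King, Richmond has 17 venues, so one bar gives 1/17 ≈ 0.058824, and Church and Wellesley has 6, so one adult boutique gives 1/6 ≈ 0.166667. Neighborhoods with no venues, such as Berczy Park, end up as rows of zeros. A small sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical one-hot rows: neighborhood A has 3 venues, B has 1
onehot = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B"],
    "Bakery":       [1, 0, 0, 1],
    "Coffee Shop":  [0, 1, 1, 0],
})

# Mean of a 0/1 column = share of the neighborhood's venues in that category
grouped = onehot.groupby("Neighborhood").mean().reset_index()
# Number of venues per neighborhood, i.e. the denominator of each share
counts = onehot.groupby("Neighborhood").size()
print(grouped)
print(counts)
```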


Let's print the top 5 venues for each neighborhood.

In [47]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:].copy()  # drop the Neighborhood row; copy so the assignment below is safe
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide,King,Richmond----
                 venue  freq
0           Steakhouse  0.12
1  Japanese Restaurant  0.06
2          Opera House  0.06
3           Food Court  0.06
4       General Travel  0.06


----Berczy Park----
                   venue  freq
0         Adult Boutique   0.0
1    American Restaurant   0.0
2            Pizza Place   0.0
3               Pharmacy   0.0
4  Performing Arts Venue   0.0


----Brockton,Exhibition Place,Parkdale Village----
                   venue  freq
0         Adult Boutique   0.0
1    American Restaurant   0.0
2            Pizza Place   0.0
3               Pharmacy   0.0
4  Performing Arts Venue   0.0


----Business Reply Mail Processing Centre 969 Eastern----
            venue  freq
0   Auto Workshop   1.0
1  Adult Boutique   0.0
2     Music Venue   0.0
3     Pizza Place   0.0
4        Pharmacy   0.0


----CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara----
                   venue  freq
0 

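Then store the top venues of each neighborhood in a dataframe. The helper below sorts a row's category frequencies in descending order and returns the first `num_top_venues` category names.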

In [48]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
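When a neighborhood has fewer venues than `num_top_venues`, the remaining slots are filled with zero-frequency categories, so entries such as "Yoga Studio" or "Wine Bar" for Berczy Park come from how the zero ties happen to be ordered rather than from actual venues. A sketch of a variant (`top_venues` is a hypothetical name) that drops zero frequencies first; the sample row mirrors Christie, which has a single nightclub:

```python
import pandas as pd


def top_venues(row: pd.Series, n: int) -> list:
    """Return up to n category names with frequency > 0, most frequent first."""
    freqs = row.iloc[1:].astype(float)   # skip the Neighborhood entry
    freqs = freqs[freqs > 0]             # ignore categories that never occur
    return freqs.sort_values(ascending=False).index[:n].tolist()


row = pd.Series({"Neighborhood": "Christie", "Bar": 0.0, "Nightclub": 1.0, "Yoga Studio": 0.0})
print(top_venues(row, 10))
```

The returned list can be shorter than `n`, so it would need padding (for example with empty strings) before being written into a fixed-width row of `neighborhoods_venues_sorted`.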

In [49]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide,King,Richmond",Steakhouse,Greek Restaurant,Food Court,Coffee Shop,Opera House,Concert Hall,Dessert Shop,Breakfast Spot,Japanese Restaurant,Bar
1,Berczy Park,Yoga Studio,Wine Bar,College Rec Center,Comfort Food Restaurant,Concert Hall,Convenience Store,Deli / Bodega,Dessert Shop,Diner,Dumpling Restaurant
2,"Brockton,Exhibition Place,Parkdale Village",Yoga Studio,Wine Bar,College Rec Center,Comfort Food Restaurant,Concert Hall,Convenience Store,Deli / Bodega,Dessert Shop,Diner,Dumpling Restaurant
3,Business Reply Mail Processing Centre 969 Eastern,Auto Workshop,Yoga Studio,Fish & Chips Shop,Comfort Food Restaurant,Concert Hall,Convenience Store,Deli / Bodega,Dessert Shop,Diner,Dumpling Restaurant
4,"CN Tower,Bathurst Quay,Island airport,Harbourf...",Performing Arts Venue,Yoga Studio,Gym,College Rec Center,Comfort Food Restaurant,Concert Hall,Convenience Store,Deli / Bodega,Dessert Shop,Diner


## Cluster neighborhoods

Run k-means to cluster the neighborhoods.

In [62]:
# set number of clusters
kclusters = 6

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering (n_init=10 was the scikit-learn default before version 1.4)
kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_ 

array([1, 1, 1, 3, 5, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1,
       1, 1, 4, 1, 1, 4, 2, 1, 1, 1, 1, 0, 1, 1, 4, 1], dtype=int32)
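To make the clustering step less of a black box, here is a minimal NumPy sketch of Lloyd's algorithm, the iteration behind k-means: assign each row to its nearest centroid, move each centroid to the mean of its rows, and repeat until nothing changes. It is not scikit-learn's implementation, which by default adds k-means++ initialization and can run several restarts (`n_init`). One consequence is visible in the labels above: neighborhoods with no venues are identical all-zero rows, so they always land in the same cluster (Berczy Park and Brockton both get label 1).

```python
import numpy as np


def kmeans_lloyd(X, k, n_iter=100, seed=0):
    """Minimal Lloyd's algorithm; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # initialize centroids with k distinct rows chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # assignment step: squared Euclidean distance from every row to every centroid
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # update step: mean of assigned rows (an empty cluster keeps its old centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical category-frequency rows: two "bakery-heavy" and two "coffee-heavy" neighborhoods
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels, centroids = kmeans_lloyd(X, 2)
print(labels)
```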

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [64]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = toronto_df

# join the cluster labels and top venues onto toronto_df, which holds the latitude/longitude of each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head()

,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,M5A,Downtown Toronto,"Harbourfront,Regent Park",43.65426,-79.360636,1,Gym / Fitness Center,Italian Restaurant,Breakfast Spot,Bakery,Spa,Coffee Shop,Fast Food Restaurant,Comfort Food Restaurant,Concert Hall,Convenience Store
9,M5B,Downtown Toronto,"Ryerson,Garden District",43.657162,-79.378937,1,Coffee Shop,Café,Sandwich Place,College Rec Center,Clothing Store,Pub,Ramen Restaurant,Burrito Place,Movie Theater,Pizza Place
15,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,1,Coffee Shop,Gastropub,Japanese Restaurant,Poke Place,Italian Restaurant,Performing Arts Venue,Hostel,Gym,General Entertainment,College Rec Center
19,M4E,East Toronto,The Beaches,43.676357,-79.293031,1,Trail,Other Great Outdoors,Yoga Studio,Fast Food Restaurant,College Rec Center,Comfort Food Restaurant,Concert Hall,Convenience Store,Deli / Bodega,Dessert Shop
20,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306,1,Yoga Studio,Wine Bar,College Rec Center,Comfort Food Restaurant,Concert Hall,Convenience Store,Deli / Bodega,Dessert Shop,Diner,Dumpling Restaurant
24,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383,0,Coffee Shop,Sandwich Place,Smoothie Shop,Pharmacy,Dumpling Restaurant,College Gym,College Rec Center,Comfort Food Restaurant,Concert Hall,Convenience Store
25,M6G,Downtown Toronto,Christie,43.669542,-79.422564,1,Nightclub,Yoga Studio,Gym,College Rec Center,Comfort Food Restaurant,Concert Hall,Convenience Store,Deli / Bodega,Dessert Shop,Diner
30,M5H,Downtown Toronto,"Adelaide,King,Richmond",43.650571,-79.384568,1,Steakhouse,Greek Restaurant,Food Court,Coffee Shop,Opera House,Concert Hall,Dessert Shop,Breakfast Spot,Japanese Restaurant,Bar
31,M6H,West Toronto,"Dovercourt Village,Dufferin",43.669005,-79.442259,1,Bank,Music Venue,Yoga Studio,Fast Food Restaurant,Comfort Food Restaurant,Concert Hall,Convenience Store,Deli / Bodega,Dessert Shop,Diner
36,M5J,Downtown Toronto,"Harbourfront East,Toronto Islands,Union Station",43.640816,-79.381752,0,Coffee Shop,Pizza Place,Tea Room,Sports Bar,Gym,Asian Restaurant,College Gym,Comfort Food Restaurant,Concert Hall,Convenience Store


Visualize the clusters on a map.

In [65]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one color per cluster label (labels run from 0 to kclusters - 1)
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
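Since the k-means labels run from 0 to `kclusters - 1`, the simplest color mapping indexes a palette directly by label. As a dependency-free alternative to matplotlib's `rainbow` colormap, this sketch (the `cluster_palette` helper is hypothetical) spaces the hues evenly with the standard-library `colorsys` module:

```python
import colorsys


def cluster_palette(k):
    """Return k evenly spaced hex colors around the hue circle, indexed by cluster label."""
    palette = []
    for label in range(k):
        # hue varies with the label; saturation and value are fixed for readable markers
        r, g, b = colorsys.hsv_to_rgb(label / k, 0.8, 0.9)
        palette.append("#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255)))
    return palette


palette = cluster_palette(6)
color_for_label = {label: palette[label] for label in range(6)}
print(color_for_label)
```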

## Investigate the clusters

Print out the neighborhoods in each cluster.

In [66]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
24,Downtown Toronto,0,Coffee Shop,Sandwich Place,Smoothie Shop,Pharmacy,Dumpling Restaurant,College Gym,College Rec Center,Comfort Food Restaurant,Concert Hall,Convenience Store
36,Downtown Toronto,0,Coffee Shop,Pizza Place,Tea Room,Sports Bar,Gym,Asian Restaurant,College Gym,Comfort Food Restaurant,Concert Hall,Convenience Store
42,Downtown Toronto,0,Coffee Shop,Restaurant,Gastropub,Deli / Bodega,Café,Salad Place,Bank,Gym,Soup Place,Art Gallery
54,East Toronto,0,Coffee Shop,Café,New American Restaurant,College Gym,Comfort Food Restaurant,Concert Hall,Convenience Store,Deli / Bodega,Dessert Shop,Diner
80,Downtown Toronto,0,Coffee Shop,College Gym,Grocery Store,Greek Restaurant,College Rec Center,Comfort Food Restaurant,Concert Hall,Convenience Store,Deli / Bodega,Dessert Shop
86,Central Toronto,0,Convenience Store,Supermarket,Coffee Shop,Yoga Studio,Fast Food Restaurant,College Rec Center,Comfort Food Restaurant,Concert Hall,Deli / Bodega,Dessert Shop


In [67]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Downtown Toronto,1,Gym / Fitness Center,Italian Restaurant,Breakfast Spot,Bakery,Spa,Coffee Shop,Fast Food Restaurant,Comfort Food Restaurant,Concert Hall,Convenience Store
9,Downtown Toronto,1,Coffee Shop,Café,Sandwich Place,College Rec Center,Clothing Store,Pub,Ramen Restaurant,Burrito Place,Movie Theater,Pizza Place
15,Downtown Toronto,1,Coffee Shop,Gastropub,Japanese Restaurant,Poke Place,Italian Restaurant,Performing Arts Venue,Hostel,Gym,General Entertainment,College Rec Center
19,East Toronto,1,Trail,Other Great Outdoors,Yoga Studio,Fast Food Restaurant,College Rec Center,Comfort Food Restaurant,Concert Hall,Convenience Store,Deli / Bodega,Dessert Shop
20,Downtown Toronto,1,Yoga Studio,Wine Bar,College Rec Center,Comfort Food Restaurant,Concert Hall,Convenience Store,Deli / Bodega,Dessert Shop,Diner,Dumpling Restaurant
25,Downtown Toronto,1,Nightclub,Yoga Studio,Gym,College Rec Center,Comfort Food Restaurant,Concert Hall,Convenience Store,Deli / Bodega,Dessert Shop,Diner
30,Downtown Toronto,1,Steakhouse,Greek Restaurant,Food Court,Coffee Shop,Opera House,Concert Hall,Dessert Shop,Breakfast Spot,Japanese Restaurant,Bar
31,West Toronto,1,Bank,Music Venue,Yoga Studio,Fast Food Restaurant,Comfort Food Restaurant,Concert Hall,Convenience Store,Deli / Bodega,Dessert Shop,Diner
37,West Toronto,1,Pizza Place,Bar,French Restaurant,Mac & Cheese Joint,Coffee Shop,Record Shop,Salon / Barbershop,Brewery,Yoga Studio,Bakery
41,East Toronto,1,Yoga Studio,Wine Bar,College Rec Center,Comfort Food Restaurant,Concert Hall,Convenience Store,Deli / Bodega,Dessert Shop,Diner,Dumpling Restaurant


In [68]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
62,Central Toronto,2,Health & Beauty Service,Gym,College Rec Center,Comfort Food Restaurant,Concert Hall,Convenience Store,Deli / Bodega,Dessert Shop,Diner,Dumpling Restaurant


In [69]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
100,East Toronto,3,Auto Workshop,Yoga Studio,Fish & Chips Shop,Comfort Food Restaurant,Concert Hall,Convenience Store,Deli / Bodega,Dessert Shop,Diner,Dumpling Restaurant


In [70]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
47,East Toronto,4,Park,Fish & Chips Shop,Yoga Studio,Farmers Market,College Rec Center,Comfort Food Restaurant,Concert Hall,Convenience Store,Deli / Bodega,Dessert Shop
83,Central Toronto,4,Park,Yoga Studio,Gym,College Rec Center,Comfort Food Restaurant,Concert Hall,Convenience Store,Deli / Bodega,Dessert Shop,Diner
91,Downtown Toronto,4,Park,Yoga Studio,Gym,College Rec Center,Comfort Food Restaurant,Concert Hall,Convenience Store,Deli / Bodega,Dessert Shop,Diner


In [71]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 5, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
87,Downtown Toronto,5,Performing Arts Venue,Yoga Studio,Gym,College Rec Center,Comfort Food Restaurant,Concert Hall,Convenience Store,Deli / Bodega,Dessert Shop,Diner
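Reading six separate tables makes it hard to compare clusters. A compact summary could count the neighborhoods per cluster and report the most frequent "1st Most Common Venue" in each. The frame below is a hypothetical stand-in for `toronto_merged` with only the two columns used, and its values are modelled on the cluster tables above:

```python
import pandas as pd

# Hypothetical subset of toronto_merged with only the columns used below
merged = pd.DataFrame({
    "Cluster Labels": [0, 0, 1, 4, 4],
    "1st Most Common Venue": ["Coffee Shop", "Coffee Shop", "Nightclub", "Park", "Park"],
})

# Named aggregation: neighborhoods per cluster and the most frequent top venue
summary = merged.groupby("Cluster Labels")["1st Most Common Venue"].agg(
    n_neighborhoods="size",
    top_venue=lambda s: s.value_counts().idxmax(),
)
print(summary)
```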


## Some comments regarding the clusters
The neighborhoods are close to each other, so the same venue (for example, the same yoga studio) might be counted in several neighborhoods, even though I reduced the venue search radius to 150 meters.

* Cluster 1 appears to be driven by coffee shops and restaurants.
* Cluster 2 could be categorized as "fitness and wine".
* Cluster 3: car service.
* Cluster 4: parks.
* Finally, cluster 5: performing arts.
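The overlap concern can be quantified. Each query covers a circle of radius `r` around the neighborhood center, and two circles of the same radius overlap only when their centers are closer than `2r`. A minimal sketch using the haversine great-circle distance (`haversine_m` and `search_areas_overlap` are hypothetical helpers; the coordinates come from the tables above):

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in meters."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi, dlmb = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def search_areas_overlap(p1, p2, radius_m):
    """Two search circles of the same radius overlap when their centers are closer than 2 * radius."""
    return haversine_m(*p1, *p2) < 2 * radius_m


harbourfront = (43.65426, -79.360636)
st_james_town = (43.651494, -79.375418)
d = haversine_m(*harbourfront, *st_james_town)
print(round(d))
print(search_areas_overlap(harbourfront, st_james_town, 150))
```

For this particular pair the centers are roughly 1.2 km apart, so the 150 m search circles do not overlap; the same check could be run over all pairs in `toronto_df` to see which neighborhoods might share venues.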

