# Cluster Analysis in Toronto

Webscraping a Wikipedia page and creating a datafarme using pandas. Adding Foursquare location data using latitude and longitude coordinates of each neighborhood. Cluster the neighbourhoods via k-means.

## Code

In [98]:
import pandas as pd
import pgeocode
import numpy as np
import folium
import requests
import json
from sklearn.cluster import KMeans


In [4]:
df = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')[0]

#### 1 Scrape Wikipedia
The first table on the Wikipedia page about Toronto Postal Codes is imported.

In [7]:
df_filtered = df[df['Borough'] != 'Not assigned'] 
df_filtered.reset_index(inplace = True)
df_filtered.drop(['index'], axis = 1, inplace = True)
df_filtered.rename(columns = {"Postal Code": "PostalCode"}, inplace = True)
print(df_filtered.head())

  PostalCode           Borough                                Neighbourhood
0        M3A        North York                                    Parkwoods
1        M4A        North York                             Victoria Village
2        M5A  Downtown Toronto                    Regent Park, Harbourfront
3        M6A        North York             Lawrence Manor, Lawrence Heights
4        M7A  Downtown Toronto  Queen's Park, Ontario Provincial Government


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  errors=errors,
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  errors=errors,


Not assigned postal codes are dropped from the table. The index of the dataframe has been reset; the old index was dropped.

In [6]:
df_filtered.shape

(103, 3)

The shape of the table is 103 rows and 3 columns.

#### 2 Coordinates
Add geographical coordinates to each postal code.

In [9]:
ca = pgeocode.Nominatim('ca')
postal_codes = df_filtered['PostalCode'].tolist()
df_pgeocode = ca.query_postal_code(postal_codes)
#print(df_pgeocode)

Now clean the table and merge with 'df_filtered'.

In [10]:
df_coord = df_pgeocode[['postal_code', 'latitude', 'longitude']]
df_coord.rename(columns = {"postal_code": "PostalCode", "latitude": "Latitude", "longitude": "Longitude"}, inplace = True)
#print(df_coord)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  errors=errors,


In [19]:
df_merged = pd.merge(df_filtered, df_coord, on = 'PostalCode')
pd.set_option('display.max_rows', None)
print(df_merged.head())
print(df_merged[75:78])

  PostalCode           Borough                                Neighbourhood  \
0        M3A        North York                                    Parkwoods   
1        M4A        North York                             Victoria Village   
2        M5A  Downtown Toronto                    Regent Park, Harbourfront   
3        M6A        North York             Lawrence Manor, Lawrence Heights   
4        M7A  Downtown Toronto  Queen's Park, Ontario Provincial Government   

   Latitude  Longitude  
0   43.7545   -79.3300  
1   43.7276   -79.3148  
2   43.6555   -79.3626  
3   43.7223   -79.4504  
4   43.6641   -79.3889  
   PostalCode       Borough  \
75        M6R  West Toronto   
76        M7R   Mississauga   
77        M9R     Etobicoke   

                                        Neighbourhood  Latitude  Longitude  
75                             Parkdale, Roncesvalles   43.6469   -79.4521  
76              Canada Post Gateway Processing Centre       NaN        NaN  
77  Kingsview Villa

There is a problem with the row with index 76. No coordinates are extracted. It turns out that this postal code is exclusively assigned to a branch of Canadian Post. The missing coordinates were therefore entered manually.   

In [20]:
df_merged["Latitude"].fillna(43.635060, inplace = True)
df_merged["Longitude"].fillna(-79.618030, inplace = True) 
print(df_merged)

    PostalCode           Borough  \
0          M3A        North York   
1          M4A        North York   
2          M5A  Downtown Toronto   
3          M6A        North York   
4          M7A  Downtown Toronto   
5          M9A         Etobicoke   
6          M1B       Scarborough   
7          M3B        North York   
8          M4B         East York   
9          M5B  Downtown Toronto   
10         M6B        North York   
11         M9B         Etobicoke   
12         M1C       Scarborough   
13         M3C        North York   
14         M4C         East York   
15         M5C  Downtown Toronto   
16         M6C              York   
17         M9C         Etobicoke   
18         M1E       Scarborough   
19         M4E      East Toronto   
20         M5E  Downtown Toronto   
21         M6E              York   
22         M1G       Scarborough   
23         M4G         East York   
24         M5G  Downtown Toronto   
25         M6G  Downtown Toronto   
26         M1H       Scarbor

#### 3 Map Neighbourhoods
Create a map of Tornoto Cluster neighbourhoods of Toronto.

In [27]:
map_toronto = folium.Map(location=[43.653963, -79.387207], zoom_start=11)

for lat, long, post, borough, neighbourhood in zip(df_merged['Latitude'], df_merged['Longitude'], df_merged['PostalCode'], df_merged['Borough'], df_merged['Neighbourhood']):
    label = "{}, {}".format(neighbourhood, borough)
    popup = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, long],
        radius=5,
        popup=popup,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)
    
map_toronto

In [29]:
unique_boroughs = df_merged['Borough'].unique()
print(unique_boroughs)

['North York' 'Downtown Toronto' 'Etobicoke' 'Scarborough' 'East York'
 'York' 'East Toronto' 'West Toronto' 'Central Toronto' 'Mississauga']


#### 4 Narrow down Neighbourhoods
There are 10 different boroughs in Toronto. I will focus on those which have 'Toronto' in their name.

In [33]:
borough_names = list(df_merged.Borough.unique())

borough_central_df = []

for x in borough_names:
    if "toronto" in x.lower():
        borough_central_df.append(x)
        
print(borough_central_df)

toronto_central_df = df_merged[df_merged['Borough'].isin(borough_central_df)].reset_index(drop=True)
print(toronto_central_df.shape)
toronto_central_df.head()
toronto_central_df.shape

['Downtown Toronto', 'East Toronto', 'West Toronto', 'Central Toronto']
(39, 5)


(39, 5)

In [34]:
map_toronto = folium.Map(location=[43.653963, -79.387207], zoom_start=11)

for lat, long, post, borough, neighbourhood in zip(toronto_central_df['Latitude'], toronto_central_df['Longitude'], toronto_central_df['PostalCode'], toronto_central_df['Borough'], toronto_central_df['Neighbourhood']):
    label = "{}, {}".format(neighbourhood, borough)
    popup = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, long],
        radius=5,
        popup=popup,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)
    
map_toronto

Use Foursquare API to explore the boroughs and segment them.

In [35]:
CLIENT_ID = 'BD2UK4A1FFBLLHLCWHTBQKJLCBVGWN5DCCKGW0X15EQM4ILT'
CLIENT_SECRET = 'GEZSONRDNNHU3QSLGFDOLYSV3YSIB5KKM1C2IDNUKIHEYJIL'
VERSION = '20200731'

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: BD2UK4A1FFBLLHLCWHTBQKJLCBVGWN5DCCKGW0X15EQM4ILT
CLIENT_SECRET:GEZSONRDNNHU3QSLGFDOLYSV3YSIB5KKM1C2IDNUKIHEYJIL


In [37]:
radius = 500 # Within a radius of 500 meters ...
limit = 100 # ... the top 100 venues

venues = []

for lat, long, post, borough, neighborhood in zip(toronto_central_df['Latitude'], 
                                                  toronto_central_df['Longitude'], 
                                                  toronto_central_df['PostalCode'], 
                                                  toronto_central_df['Borough'], 
                                                  toronto_central_df['Neighbourhood']):
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        limit)
    
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    for venue in results:
        venues.append((
            post, 
            borough,
            neighbourhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))

In [40]:
venues_df = pd.DataFrame(venues)
venues_df.columns = ['PostalCode', 'Borough', 'Neighbourhood', 'BoroughLatitude', 'BoroughLongitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']
print(venues_df.head())
venues_df.shape

  PostalCode           Borough  \
0        M5A  Downtown Toronto   
1        M5A  Downtown Toronto   
2        M5A  Downtown Toronto   
3        M5A  Downtown Toronto   
4        M5A  Downtown Toronto   

                                       Neighbourhood  BoroughLatitude  \
0  Business reply mail Processing Centre, South C...          43.6555   
1  Business reply mail Processing Centre, South C...          43.6555   
2  Business reply mail Processing Centre, South C...          43.6555   
3  Business reply mail Processing Centre, South C...          43.6555   
4  Business reply mail Processing Centre, South C...          43.6555   

   BoroughLongitude               VenueName  VenueLatitude  VenueLongitude  \
0          -79.3626           Tandem Coffee      43.653559      -79.361809   
1          -79.3626        Roselle Desserts      43.653447      -79.362017   
2          -79.3626  Figs Breakfast & Lunch      43.655675      -79.364503   
3          -79.3626         The Yoga Lounge 

(1545, 9)

Analyze the venues dataframe. Size, venues per neighbourhood, and unique categories.

In [42]:
print(venues_df.shape)

(1545, 9)


In [43]:
print(venues_df.head())

  PostalCode           Borough  \
0        M5A  Downtown Toronto   
1        M5A  Downtown Toronto   
2        M5A  Downtown Toronto   
3        M5A  Downtown Toronto   
4        M5A  Downtown Toronto   

                                       Neighbourhood  BoroughLatitude  \
0  Business reply mail Processing Centre, South C...          43.6555   
1  Business reply mail Processing Centre, South C...          43.6555   
2  Business reply mail Processing Centre, South C...          43.6555   
3  Business reply mail Processing Centre, South C...          43.6555   
4  Business reply mail Processing Centre, South C...          43.6555   

   BoroughLongitude               VenueName  VenueLatitude  VenueLongitude  \
0          -79.3626           Tandem Coffee      43.653559      -79.361809   
1          -79.3626        Roselle Desserts      43.653447      -79.362017   
2          -79.3626  Figs Breakfast & Lunch      43.655675      -79.364503   
3          -79.3626         The Yoga Lounge 

In [47]:
print(venues_df.groupby(["PostalCode", "Borough", "Neighbourhood"]).count())

                                                                                BoroughLatitude  \
PostalCode Borough          Neighbourhood                                                         
M4E        East Toronto     Business reply mail Processing Centre, South Ce...                8   
M4K        East Toronto     Business reply mail Processing Centre, South Ce...               34   
M4L        East Toronto     Business reply mail Processing Centre, South Ce...               19   
M4M        East Toronto     Business reply mail Processing Centre, South Ce...                8   
M4N        Central Toronto  Business reply mail Processing Centre, South Ce...                2   
M4P        Central Toronto  Business reply mail Processing Centre, South Ce...                6   
M4R        Central Toronto  Business reply mail Processing Centre, South Ce...                4   
M4S        Central Toronto  Business reply mail Processing Centre, South Ce...               22   
M4T       

In [48]:
print('There are {} uniques categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 215 uniques categories.


#### 5 Analyze Central Toronto

In [51]:
# one hot encoding
toronto_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add postal, borough and neighborhood column back to dataframe
toronto_onehot['PostalCode'] = venues_df['PostalCode'] 
toronto_onehot['Borough'] = venues_df['Borough'] 
toronto_onehot['Neighbourhoods'] = venues_df['Neighbourhood'] 

# move postal, borough and neighborhood column to the first column
fixed_columns = list(toronto_onehot.columns[-3:]) + list(toronto_onehot.columns[:-3])
toronto_onehot = toronto_onehot[fixed_columns]

print(toronto_onehot.shape)
toronto_onehot.head()

(1545, 218)


Unnamed: 0,PostalCode,Borough,Neighbourhoods,Accessories Store,Afghan Restaurant,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,M5A,Downtown Toronto,"Business reply mail Processing Centre, South C...",0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,M5A,Downtown Toronto,"Business reply mail Processing Centre, South C...",0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,M5A,Downtown Toronto,"Business reply mail Processing Centre, South C...",0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,M5A,Downtown Toronto,"Business reply mail Processing Centre, South C...",0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
4,M5A,Downtown Toronto,"Business reply mail Processing Centre, South C...",0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Group rows by neighborhood and by taking the mean of the frequency of occurrence of each category.

In [53]:
toronto_grouped = toronto_onehot.groupby(["PostalCode", "Borough", "Neighbourhoods"]).mean().reset_index()

print(toronto_grouped.shape)
toronto_grouped.head()

(38, 218)


Unnamed: 0,PostalCode,Borough,Neighbourhoods,Accessories Store,Afghan Restaurant,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,M4E,East Toronto,"Business reply mail Processing Centre, South C...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,M4K,East Toronto,"Business reply mail Processing Centre, South C...",0.0,0.0,0.029412,0.0,0.0,0.0,0.0,...,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412
2,M4L,East Toronto,"Business reply mail Processing Centre, South C...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,M4M,East Toronto,"Business reply mail Processing Centre, South C...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,M4N,Central Toronto,"Business reply mail Processing Centre, South C...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Print each neighborhood along with the top 20 most common venues.

In [55]:
num_top_venues = 20

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
areaColumns = ['PostalCode', 'Borough', 'Neighbourhoods']
freqColumns = []
for ind in np.arange(num_top_venues):
    try:
        freqColumns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        freqColumns.append('{}th Most Common Venue'.format(ind+1))
columns = areaColumns+freqColumns

# create a new dataframe
neighbourhoods_venues_sorted = pd.DataFrame(columns=columns)
neighbourhoods_venues_sorted['PostalCode'] = toronto_grouped['PostalCode']
neighbourhoods_venues_sorted['Borough'] = toronto_grouped['Borough']
neighbourhoods_venues_sorted['Neighbourhoods'] = toronto_grouped['Neighbourhoods']

for ind in np.arange(toronto_grouped.shape[0]):
    row_categories = toronto_grouped.iloc[ind, :].iloc[3:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    neighbourhoods_venues_sorted.iloc[ind, 3:] = row_categories_sorted.index.values[0:num_top_venues]

print(neighbourhoods_venues_sorted.shape)
neighbourhoods_venues_sorted.head()

(38, 23)


Unnamed: 0,PostalCode,Borough,Neighbourhoods,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,...,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue
0,M4E,East Toronto,"Business reply mail Processing Centre, South C...",Pub,Health Food Store,Trail,Gastropub,Neighborhood,Bakery,Cheese Shop,...,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market,Doner Restaurant,Discount Store,Distribution Center,Fish & Chips Shop
1,M4K,East Toronto,"Business reply mail Processing Centre, South C...",Greek Restaurant,Ice Cream Shop,Italian Restaurant,Café,Restaurant,Yoga Studio,Cosmetics Shop,...,Pub,Lounge,Dessert Shop,Spa,Liquor Store,Juice Bar,Fruit & Vegetable Store,Indian Restaurant,Brewery,Grocery Store
2,M4L,East Toronto,"Business reply mail Processing Centre, South C...",Pet Store,Sushi Restaurant,Steakhouse,Brewery,Burrito Place,Sandwich Place,Restaurant,...,Movie Theater,Liquor Store,Light Rail Station,Fast Food Restaurant,Fish & Chips Shop,Italian Restaurant,Ice Cream Shop,Board Shop,Gym,Beer Store
3,M4M,East Toronto,"Business reply mail Processing Centre, South C...",Baseball Field,Park,Coffee Shop,Diner,Garden Center,Gym,Performing Arts Venue,...,Dog Run,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant,Ethiopian Restaurant,Yoga Studio,Distribution Center,Discount Store
4,M4N,Central Toronto,"Business reply mail Processing Centre, South C...",Photography Studio,Park,Distribution Center,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,...,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run,Discount Store,Hawaiian Restaurant,Diner,Dessert Shop,Department Store,Deli / Bodega


#### 6 Cluster Neighborhoods

In [204]:
# set number of clusters
kclusters = 4

toronto_grouped_clustering = toronto_grouped.drop(["PostalCode", "Borough", "Neighbourhoods"], 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 0, 0, 0, 2, 0, 1, 0, 1, 0])

In [207]:
neighbourhoods_venues_sorted.insert(0, 'Cluster_Labels', kmeans.labels_)

# create a new dataframe that includes the cluster as well as the top 20 venues for each neighborhood.
toronto_merged = toronto_central_df.copy()

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighbourhoods_venues_sorted.drop(["Borough", "Neighbourhoods"], 1).set_index("PostalCode"), on="PostalCode")

print(toronto_merged.shape)
toronto_merged

(39, 27)


Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude,Cluster_Labels,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,...,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.6555,-79.3626,0.0,4.0,Coffee Shop,Breakfast Spot,Yoga Studio,...,Pub,Restaurant,Spa,Beer Store,Gym / Fitness Center,Theater,Bakery,Bar,Eastern European Restaurant,Flower Shop
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.6641,-79.3889,0.0,4.0,Coffee Shop,Dance Studio,College Theater,...,Japanese Restaurant,Sushi Restaurant,Portuguese Restaurant,Beer Bar,Indian Restaurant,Café,Distribution Center,Creperie,Theater,Diner
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.6572,-79.3783,0.0,4.0,Coffee Shop,Clothing Store,Café,...,Plaza,Pizza Place,Fast Food Restaurant,Diner,Furniture / Home Store,Middle Eastern Restaurant,Theater,Lingerie Store,Movie Theater,Music Venue
3,M5C,Downtown Toronto,St. James Town,43.6513,-79.3756,0.0,4.0,Café,Coffee Shop,Seafood Restaurant,...,Hotel,Park,Beer Bar,Lingerie Store,Creperie,Moroccan Restaurant,Diner,Breakfast Spot,Department Store,Gastropub
4,M4E,East Toronto,The Beaches,43.6784,-79.2941,0.0,4.0,Pub,Health Food Store,Trail,...,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market,Doner Restaurant,Discount Store,Distribution Center,Fish & Chips Shop
5,M5E,Downtown Toronto,Berczy Park,43.6456,-79.3754,0.0,4.0,Coffee Shop,Hotel,Café,...,Pub,Italian Restaurant,Cheese Shop,Cocktail Bar,Deli / Bodega,Breakfast Spot,Art Gallery,Shopping Mall,Department Store,Molecular Gastronomy Restaurant
6,M5G,Downtown Toronto,Central Bay Street,43.6564,-79.386,0.0,4.0,Coffee Shop,Italian Restaurant,Sandwich Place,...,Gastropub,Park,Shopping Mall,Shoe Store,Seafood Restaurant,Modern European Restaurant,Miscellaneous Shop,Electronics Store,Spa,Japanese Restaurant
7,M6G,Downtown Toronto,Christie,43.6683,-79.4205,0.0,0.0,Café,Grocery Store,Candy Store,...,Yoga Studio,Ethiopian Restaurant,Doner Restaurant,Event Space,Falafel Restaurant,Farmers Market,Donut Shop,Distribution Center,Dog Run,Fish & Chips Shop
8,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.6496,-79.3833,0.0,4.0,Café,Coffee Shop,Hotel,...,Concert Hall,Bookstore,Sushi Restaurant,Bar,Seafood Restaurant,Thai Restaurant,Breakfast Spot,Pizza Place,Vegetarian / Vegan Restaurant,Gastropub
9,M6H,West Toronto,"Dufferin, Dovercourt Village",43.6655,-79.4378,0.0,4.0,Bakery,Grocery Store,Park,...,Bus Line,Pharmacy,Pet Store,Art Gallery,Gym,Brazilian Restaurant,Donut Shop,Falafel Restaurant,Event Space,Ethiopian Restaurant


In [209]:
toronto_merged['marker color'] = pd.cut(toronto_merged['Cluster_Labels'], bins=4, 
                              labels=['red', 'blue', 'green', 'purple'])
toronto_merged                                      

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude,Cluster_Labels,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,...,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue,marker color
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.6555,-79.3626,0.0,4.0,Coffee Shop,Breakfast Spot,Yoga Studio,...,Restaurant,Spa,Beer Store,Gym / Fitness Center,Theater,Bakery,Bar,Eastern European Restaurant,Flower Shop,red
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.6641,-79.3889,0.0,4.0,Coffee Shop,Dance Studio,College Theater,...,Sushi Restaurant,Portuguese Restaurant,Beer Bar,Indian Restaurant,Café,Distribution Center,Creperie,Theater,Diner,red
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.6572,-79.3783,0.0,4.0,Coffee Shop,Clothing Store,Café,...,Pizza Place,Fast Food Restaurant,Diner,Furniture / Home Store,Middle Eastern Restaurant,Theater,Lingerie Store,Movie Theater,Music Venue,red
3,M5C,Downtown Toronto,St. James Town,43.6513,-79.3756,0.0,4.0,Café,Coffee Shop,Seafood Restaurant,...,Park,Beer Bar,Lingerie Store,Creperie,Moroccan Restaurant,Diner,Breakfast Spot,Department Store,Gastropub,red
4,M4E,East Toronto,The Beaches,43.6784,-79.2941,0.0,4.0,Pub,Health Food Store,Trail,...,Electronics Store,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market,Doner Restaurant,Discount Store,Distribution Center,Fish & Chips Shop,red
5,M5E,Downtown Toronto,Berczy Park,43.6456,-79.3754,0.0,4.0,Coffee Shop,Hotel,Café,...,Italian Restaurant,Cheese Shop,Cocktail Bar,Deli / Bodega,Breakfast Spot,Art Gallery,Shopping Mall,Department Store,Molecular Gastronomy Restaurant,red
6,M5G,Downtown Toronto,Central Bay Street,43.6564,-79.386,0.0,4.0,Coffee Shop,Italian Restaurant,Sandwich Place,...,Park,Shopping Mall,Shoe Store,Seafood Restaurant,Modern European Restaurant,Miscellaneous Shop,Electronics Store,Spa,Japanese Restaurant,red
7,M6G,Downtown Toronto,Christie,43.6683,-79.4205,0.0,0.0,Café,Grocery Store,Candy Store,...,Ethiopian Restaurant,Doner Restaurant,Event Space,Falafel Restaurant,Farmers Market,Donut Shop,Distribution Center,Dog Run,Fish & Chips Shop,red
8,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.6496,-79.3833,0.0,4.0,Café,Coffee Shop,Hotel,...,Bookstore,Sushi Restaurant,Bar,Seafood Restaurant,Thai Restaurant,Breakfast Spot,Pizza Place,Vegetarian / Vegan Restaurant,Gastropub,red
9,M6H,West Toronto,"Dufferin, Dovercourt Village",43.6655,-79.4378,0.0,4.0,Bakery,Grocery Store,Park,...,Pharmacy,Pet Store,Art Gallery,Gym,Brazilian Restaurant,Donut Shop,Falafel Restaurant,Event Space,Ethiopian Restaurant,red


In [212]:
toronto_cluster1 = toronto_merged[toronto_merged['Cluster_Labels'] == 0]
toronto_cluster2 = toronto_merged[toronto_merged['Cluster_Labels'] == 1]
toronto_cluster3 = toronto_merged[toronto_merged['Cluster_Labels'] == 2]
toronto_cluster4 = toronto_merged[toronto_merged['Cluster_Labels'] == 3]

In [215]:
# create map
m = folium.Map(location=[43.653963, -79.387207], zoom_start=11)


# add markers cluster 1 to the map
markers_colors = []
for lat, lon, post, bor, poi, cluster, color in zip(toronto_cluster1['Latitude'], toronto_cluster1['Longitude'], toronto_cluster1['PostalCode'], toronto_cluster1['Borough'], toronto_cluster1['Neighbourhood'], toronto_cluster1['Cluster Labels'], toronto_cluster1['marker color']):
    label = folium.Popup('{} ({}): {} - Cluster {}'.format(bor, post, poi, cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color='green',
        fill=True,
        fill_color='green',
        fill_opacity=0.7).add_to(m)

# add markers cluster 2 to the map
markers_colors = []
for lat, lon, post, bor, poi, cluster, color in zip(toronto_cluster2['Latitude'], toronto_cluster2['Longitude'], toronto_cluster2['PostalCode'], toronto_cluster2['Borough'], toronto_cluster2['Neighbourhood'], toronto_cluster2['Cluster Labels'], toronto_cluster2['marker color']):
    label = folium.Popup('{} ({}): {} - Cluster {}'.format(bor, post, poi, cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color='pruple',
        fill=True,
        fill_color='purple',
        fill_opacity=0.7).add_to(m)
    
# add markers cluster 3 to the map
markers_colors = []
for lat, lon, post, bor, poi, cluster, color in zip(toronto_cluster3['Latitude'], toronto_cluster3['Longitude'], toronto_cluster3['PostalCode'], toronto_cluster3['Borough'], toronto_cluster3['Neighbourhood'], toronto_cluster3['Cluster Labels'], toronto_cluster3['marker color']):
    label = folium.Popup('{} ({}): {} - Cluster {}'.format(bor, post, poi, cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='red',
        fill_opacity=0.7).add_to(m)

# add markers cluster 4 to the map
markers_colors = []
for lat, lon, post, bor, poi, cluster, color in zip(toronto_cluster4['Latitude'], toronto_cluster4['Longitude'], toronto_cluster4['PostalCode'], toronto_cluster4['Borough'], toronto_cluster4['Neighbourhood'], toronto_cluster4['Cluster Labels'], toronto_cluster4['marker color']):
    label = folium.Popup('{} ({}): {} - Cluster {}'.format(bor, post, poi, cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='blue',
        fill_opacity=0.7).add_to(m)
    
       
m

Toronto city center is relatively homogenous with respect to the venues. 