# Segmenting and Clustering Neighborhoods of Toronto, Canada


## Table of contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

1. <a href="#item1">Part 1 - Download and wrangle the postal code data for Toronto</a>

2. <a href="#item2">Part 2 - Add geographic coordinates of the postal codes</a>

3. <a href="#item3">Part 3 - Explore and cluster postal code areas in Toronto</a>
  
</font>
</div>




## Part 1 - Download and wrangle the postal code data for Toronto

Steps:
* Use pandas to read the postal code table from the Wikipedia page:
https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

* Remove rows that do not have an assigned Borough.

* Rename columns to PostalCode, Borough, and Neighborhood.

* Replace "Not assigned" neighborhoods with the borough name.

* Combine neighborhoods that share the same postal code.

In [94]:
# Required imports
import pandas as pd

In [95]:
# Get postal codes from the wiki link using pandas
# pd.read_html returns a list of DataFrames; the first one is the postal code table
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
df_postal = pd.read_html(url)[0]


In [96]:
# Keep only rows with an assigned borough; drop rows whose borough is "Not assigned".
# Reset the index on the filtered result (chaining avoids an in-place change on a slice).
df_1 = df_postal[df_postal['Borough'] != 'Not assigned'].reset_index(drop=True)
df_1

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M6A,North York,Lawrence Heights
4,M6A,North York,Lawrence Manor
...,...,...,...
205,M8Z,Etobicoke,Kingsway Park South West
206,M8Z,Etobicoke,Mimico NW
207,M8Z,Etobicoke,The Queensway West
208,M8Z,Etobicoke,Royal York South West


In [97]:
# Rename columns to PostalCode, Borough, and Neighborhood
df_2 = df_1.rename(columns = {'Postcode': 'PostalCode','Neighbourhood': 'Neighborhood'})
print(df_2.columns)

Index(['PostalCode', 'Borough', 'Neighborhood'], dtype='object')


In [98]:
# If a cell has a borough but a "Not assigned" neighborhood, then the neighborhood name 
# will be the same as the borough.

def fix_neighborhood(row):
    curr_neighborhood = row['Neighborhood']
    if (curr_neighborhood == 'Not assigned'):
        return row['Borough']
    else:
        return curr_neighborhood


# before change - "M9A	Queen's Park	Not assigned"
# print(df_2[df_2['PostalCode'] == "M9A"])

# apply the change
df_2['Neighborhood'] = df_2.apply(fix_neighborhood, axis=1)

# after change
# print(df_2[df_2['PostalCode'] == "M9A"])
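
The row-wise `apply` can also be written as a single vectorized operation. A minimal sketch on a toy frame (the rows are made up for illustration and are not taken from the Wikipedia table), using `Series.mask` to substitute the borough wherever the neighborhood is "Not assigned":

```python
import pandas as pd

# Toy data for illustration only; the real table comes from pd.read_html.
toy = pd.DataFrame({
    'PostalCode': ['X1A', 'X2A', 'X3A'],
    'Borough': ['Borough A', 'Borough B', 'Borough C'],
    'Neighborhood': ['Hood 1', 'Not assigned', 'Hood 3'],
})

# Wherever the condition is True, take the value from 'Borough' instead.
not_assigned = toy['Neighborhood'] == 'Not assigned'
toy['Neighborhood'] = toy['Neighborhood'].mask(not_assigned, toy['Borough'])

print(toy)
```

On a table of a few hundred rows the speed difference is negligible; the main benefit is that the intent fits on one line.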

In [99]:
df_2

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M6A,North York,Lawrence Heights
4,M6A,North York,Lawrence Manor
...,...,...,...
205,M8Z,Etobicoke,Kingsway Park South West
206,M8Z,Etobicoke,Mimico NW
207,M8Z,Etobicoke,The Queensway West
208,M8Z,Etobicoke,Royal York South West


In [100]:
# Combine neighborhoods with same postal code

df_3 = df_2.groupby(['PostalCode', 'Borough']).agg({'Neighborhood': ','.join})

df_3.reset_index(inplace=True)

df_3.head(20)


Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,"Rouge,Malvern"
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union"
2,M1E,Scarborough,"Guildwood,Morningside,West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"East Birchmount Park,Ionview,Kennedy Park"
7,M1L,Scarborough,"Clairlea,Golden Mile,Oakridge"
8,M1M,Scarborough,"Cliffcrest,Cliffside,Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff,Cliffside West"


In [101]:
print(df_3.shape)
# Each row of df_3 is one postal code (possibly covering several neighborhoods)
print('The dataframe has {} boroughs and {} postal codes.'.format(
        df_3['Borough'].nunique(),
        df_3.shape[0]
    )
)

(103, 3)
The dataframe has 11 boroughs and 103 postal codes.
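
Each row of `df_3` is one postal code, and a postal code can carry several comma-joined neighborhoods, so the row count is not the same as the number of neighborhood names. A minimal sketch of counting both, on a toy frame shaped like `df_3` (the three rows mirror the first rows of the output above):

```python
import pandas as pd

# Toy frame shaped like df_3; values copied from the head(20) output above.
toy = pd.DataFrame({
    'PostalCode': ['M1B', 'M1C', 'M1G'],
    'Borough': ['Scarborough'] * 3,
    'Neighborhood': ['Rouge,Malvern',
                     'Highland Creek,Rouge Hill,Port Union',
                     'Woburn'],
})

n_postal_codes = len(toy)
# Split each comma-joined string into a list and count the pieces.
n_neighborhoods = toy['Neighborhood'].str.split(',').str.len().sum()

print('{} postal codes, {} neighborhood names'.format(n_postal_codes, n_neighborhoods))
```

This counts name entries; a name that appears under more than one postal code would be counted once per postal code.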


## Part 2 - Add geographic coordinates of the postal codes

* Use the Geospatial_Coordinates.csv data set, downloaded from http://cocl.us/Geospatial_data , to look up the latitude and longitude of each postal code.

In [102]:
# Read the CSV file and rename 'Postal Code' to 'PostalCode' so it matches df_3
df_coordinates = pd.read_csv('Geospatial_Coordinates.csv')
df_coordinates.rename(columns={'Postal Code': 'PostalCode'}, inplace=True)

In [103]:
# inner join on the "PostalCode" column
df_4 = pd.merge(df_3, df_coordinates, on='PostalCode', how='inner')
df_4

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge,Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
...,...,...,...,...,...
98,M9N,York,Weston,43.706876,-79.518188
99,M9P,Etobicoke,Westmount,43.696319,-79.532242
100,M9R,Etobicoke,"Kingsview Village,Martin Grove Gardens,Richvie...",43.688905,-79.554724
101,M9V,Etobicoke,"Albion Gardens,Beaumond Heights,Humbergate,Jam...",43.739416,-79.588437
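
An inner join silently drops any postal code that has no match in the coordinates file. A hedged sketch of how that could be checked, using a left join with `indicator=True` on toy data (the `M0X` row is a made-up code added only to force a miss):

```python
import pandas as pd

left = pd.DataFrame({'PostalCode': ['M1B', 'M1C', 'M0X'],
                     'Borough': ['Scarborough', 'Scarborough', 'Hypothetical']})
coords = pd.DataFrame({'PostalCode': ['M1B', 'M1C'],
                       'Latitude': [43.806686, 43.784535],
                       'Longitude': [-79.194353, -79.160497]})

# indicator=True adds a '_merge' column: 'both' or 'left_only' for a left join.
checked = pd.merge(left, coords, on='PostalCode', how='left', indicator=True)
missing = checked.loc[checked['_merge'] == 'left_only', 'PostalCode'].tolist()

print('Postal codes without coordinates:', missing)
```

If `missing` comes back empty on the real data, the inner join used in the notebook loses nothing.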


In [104]:
df_4.head(100)

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge,Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
...,...,...,...,...,...
95,M9C,Etobicoke,"Bloordale Gardens,Eringate,Markland Wood,Old B...",43.643515,-79.577201
96,M9L,North York,Humber Summit,43.756303,-79.565963
97,M9M,North York,"Emery,Humberlea",43.724766,-79.532242
98,M9N,York,Weston,43.706876,-79.518188


## Part 3 - Explore and cluster postal code areas in Toronto

This part replicates the analysis done earlier for New York City data.

* Only boroughs whose name contains the word Toronto/toronto are extracted.

* The geo coordinates from Part 2 are associated with a postal code rather than an individual neighborhood, and one postal code can represent several neighborhoods. The analysis therefore uses "Postal Code" in place of "Neighborhood" to emphasize this.

* For the venue exploration I selected a radius of 750 m and a limit of 100 venues per request.



In [105]:
# imports

import numpy as np

!conda install -c conda-forge geopy --yes # comment this line out if geopy is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # comment this line out if folium is already installed
import folium # map rendering library

print('Libraries imported.')

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Libraries imported.


In [128]:
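LIMIT = 100   # maximum number of venues returned per Foursquare request
radius = 750  # search radius around each postal code coordinate, in meters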

In [106]:
# Get the geo coordinates of Toronto

address = 'Toronto'

geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


### Extract boroughs that contain "Toronto"

In [107]:
# Select boroughs whose name contains "Toronto" (case-insensitive)
df_toronto = df_4[df_4['Borough'].str.contains('toronto', case=False)].reset_index(drop=True)

In [108]:
df_toronto.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M4E,East Toronto,The Beaches,43.676357,-79.293031
1,M4K,East Toronto,"The Danforth West,Riverdale",43.679557,-79.352188
2,M4L,East Toronto,"The Beaches West,India Bazaar",43.668999,-79.315572
3,M4M,East Toronto,Studio District,43.659526,-79.340923
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879



### Plot the postal code geo coordinates on the Toronto map

Each marker's popup shows "Borough: postal code - neighborhoods". The larger circles show the venue search radius around each marker.

In [129]:
# create map of Toronto
map_toronto= folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map of "borough: postal code - neighborhoods"
for lat, lng, postalcode, borough, neighborhood in zip(df_toronto['Latitude'], df_toronto['Longitude'], df_toronto['PostalCode'], df_toronto['Borough'], df_toronto['Neighborhood']):
    label = '{}: {} - {}'.format(borough, postalcode, neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)

    

# add radius circles to see the overlap
for lat, lng in zip(df_toronto['Latitude'], df_toronto['Longitude']):
    folium.Circle([lat, lng],radius=radius).add_to(map_toronto)

map_toronto
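
Whether two search circles overlap can also be checked numerically rather than by eye. A minimal sketch using the haversine formula with a mean Earth radius of 6,371 km (`haversine_m` is a hypothetical helper, not part of the notebook); two circles of radius `radius` overlap when their centers are closer than `2 * radius`:

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    r = 6371000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

radius = 750  # same search radius as in the notebook, in meters

# Coordinates of M4E and M4L as listed in the df_toronto output above.
d = haversine_m(43.676357, -79.293031, 43.668999, -79.315572)
overlaps = d < 2 * radius

print('distance: {:.0f} m, circles overlap: {}'.format(d, overlaps))
```

Overlapping circles mean the same venue can be returned for more than one postal code, which is worth keeping in mind when reading the per-postal-code frequencies later on.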

### Explore venues around each postal code's coordinates with the Foursquare API

In [130]:
# Foursquare credentials
# Read the credentials from environment variables instead of hard-coding them in the notebook.
# The variable names FOURSQUARE_CLIENT_ID / FOURSQUARE_CLIENT_SECRET are an arbitrary choice.
import os

CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

#print('Your credentials:')
#print('CLIENT_ID: ' + CLIENT_ID)
#print('CLIENT_SECRET:' + CLIENT_SECRET)


In [131]:
# Borrowed helper that extracts the category name of a venue.
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # flattened responses expose the field as 'venue.categories' instead
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
    

In [132]:
# Borrowed the function that gets nearby venues of all provided postal codes. Note that we have geo coordinates 
# associated with postal codes and not an individual neighborhood. 

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['PostalCode', 
                  'PostalCode Latitude', 
                  'PostalCode Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
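
`getNearbyVenues` indexes `categories[0]` directly, which raises an `IndexError` for any venue returned without a category. A sketch of a more defensive row extractor, kept separate from the network call so it can be tried offline. `extract_venue_rows` and the sample payload are hypothetical; they only mirror the response shape that the function above already relies on:

```python
def extract_venue_rows(name, lat, lng, payload):
    """Flatten one explore-style response into tuples, tolerating missing fields."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Fall back to None when the venue carries no category at all.
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows

# Hand-written sample payload for illustration, not real API output.
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Venue A', 'location': {'lat': 43.67, 'lng': -79.29},
               'categories': [{'name': 'Bakery'}]}},
    {'venue': {'name': 'Venue B', 'location': {'lat': 43.68, 'lng': -79.30},
               'categories': []}},
]}]}}

rows = extract_venue_rows('M4E', 43.676357, -79.293031, sample)
print(rows)
```

Rows with a `None` category would need to be dropped or labelled before one-hot encoding.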

In [134]:
toronto_venues = getNearbyVenues(names=df_toronto['PostalCode'],
                                   latitudes=df_toronto['Latitude'],
                                   longitudes=df_toronto['Longitude'],
                                  radius=radius)

M4E
M4K
M4L
M4M
M4N
M4P
M4R
M4S
M4T
M4V
M4W
M4X
M4Y
M5A
M5B
M5C
M5E
M5G
M5H
M5J
M5K
M5L
M5N
M5P
M5R
M5S
M5T
M5V
M5W
M5X
M6G
M6H
M6J
M6K
M6P
M6R
M6S
M7A
M7Y


### Summary statistics for the venue results

In [135]:
print(toronto_venues.shape)
toronto_venues.head()

(2676, 7)


Unnamed: 0,PostalCode,PostalCode Latitude,PostalCode Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,M4E,43.676357,-79.293031,Glen Manor Ravine,43.676821,-79.293942,Trail
1,M4E,43.676357,-79.293031,Tori's Bakeshop,43.672114,-79.290331,Vegetarian / Vegan Restaurant
2,M4E,43.676357,-79.293031,Beaches Bake Shop,43.680363,-79.289692,Bakery
3,M4E,43.676357,-79.293031,The Fox Theatre,43.672801,-79.287272,Indie Movie Theater
4,M4E,43.676357,-79.293031,The Beech Tree,43.680493,-79.288846,Gastropub


In [136]:
toronto_venues.groupby('PostalCode').count()

PostalCode,PostalCode Latitude,PostalCode Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
M4E,46,46,46,46,46,46
M4K,98,98,98,98,98,98
M4L,56,56,56,56,56,56
M4M,94,94,94,94,94,94
M4N,5,5,5,5,5,5
M4P,33,33,33,33,33,33
M4R,44,44,44,44,44,44
M4S,70,70,70,70,70,70
M4T,14,14,14,14,14,14
M4V,57,57,57,57,57,57


In [137]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 288 unique categories.


### Analyze venues for each postal code

In [138]:
# one-hot encode the venue categories
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add the postal code column back to the dataframe
toronto_onehot['PostalCode'] = toronto_venues['PostalCode']

# move the postal code column to the first position
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,PostalCode,ATM,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,University,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,M4E,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,M4E,0,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0
2,M4E,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,M4E,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,M4E,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


### Group rows by postal code and take the mean frequency of occurrence of each category

In [139]:
toronto_grouped = toronto_onehot.groupby('PostalCode').mean().reset_index()
toronto_grouped

Unnamed: 0,PostalCode,ATM,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,University,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,M4E,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,M4K,0.010204,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408
2,M4L,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.0
3,M4M,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.021277,0.010638,0.010638,0.0,0.0,0.010638
4,M4N,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,M4P,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,M4R,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.022727
7,M4S,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,M4T,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,M4V,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.017544
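
Why the mean of the one-hot columns gives a frequency: each venue contributes a 1 in exactly one category column, so averaging a column within a postal code yields the share of that postal code's venues in that category. A small sketch on toy data (the postal codes `A` and `B` and their venues are made up):

```python
import pandas as pd

# Toy venue list: four venues in one postal code, two in another.
venues = pd.DataFrame({
    'PostalCode': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Pub', 'Pub', 'Bar', 'Park', 'Park', 'Park'],
})

onehot = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='', dtype=int)
onehot.insert(0, 'PostalCode', venues['PostalCode'])

# The mean of a 0/1 column is the share of that postal code's venues in the category.
freq = onehot.groupby('PostalCode').mean()
print(freq)
```

Every row of `freq` sums to 1, which also means a postal code with only a handful of venues (such as M4N with 5) gets large, coarse frequencies.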


In [140]:
toronto_grouped.shape

(39, 289)

Print each postal code along with its top 5 most common venues

In [141]:
num_top_venues = 5
for postalcode in toronto_grouped['PostalCode']:
    print("----"+postalcode+"----")
    temp = toronto_grouped[toronto_grouped['PostalCode'] == postalcode].T.reset_index()
    temp.columns = ['venue', 'freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq':2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----M4E----
                 venue  freq
0                  Pub  0.11
1       Breakfast Spot  0.04
2      Thai Restaurant  0.04
3  Japanese Restaurant  0.04
4       Sandwich Place  0.04


----M4K----
                  venue  freq
0      Greek Restaurant  0.13
1           Coffee Shop  0.07
2                   Pub  0.06
3                  Café  0.04
4  Fast Food Restaurant  0.03


----M4L----
                  venue  freq
0     Indian Restaurant  0.11
1        Sandwich Place  0.05
2  Fast Food Restaurant  0.04
3               Brewery  0.04
4         Grocery Store  0.04


----M4M----
         venue  freq
0         Café  0.06
1  Coffee Shop  0.06
2          Bar  0.05
3        Diner  0.04
4       Bakery  0.04


----M4N----
              venue  freq
0       Swim School   0.2
1              Park   0.2
2       Coffee Shop   0.2
3  Business Service   0.2
4          Bus Line   0.2


----M4P----
         venue  freq
0  Pizza Place  0.09
1  Coffee Shop  0.09
2         Park  0.06
3         Café  0.

### Extract top 5 venues for each postal code area

In [150]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

In [151]:
num_top_venues = 5
indicators = ['st', 'nd', 'rd']

columns = ['PostalCode']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))
        
postalcodes_venues_sorted = pd.DataFrame(columns=columns)
postalcodes_venues_sorted['PostalCode'] = toronto_grouped['PostalCode']

for ind in np.arange(toronto_grouped.shape[0]):
    postalcodes_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

postalcodes_venues_sorted.head()

Unnamed: 0,PostalCode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,M4E,Pub,Coffee Shop,Bar,Breakfast Spot,Sandwich Place
1,M4K,Greek Restaurant,Coffee Shop,Pub,Café,Italian Restaurant
2,M4L,Indian Restaurant,Sandwich Place,Coffee Shop,Grocery Store,Gym
3,M4M,Café,Coffee Shop,Bar,Diner,Bakery
4,M4N,Park,Business Service,Coffee Shop,Swim School,Bus Line
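
The list-lookup approach to the column suffixes works for five columns, but it would label a 21st column as "21th". A small sketch of a general ordinal helper (`ordinal` is a hypothetical function, not part of the notebook) that could build the same column names:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th and 13th are irregular
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

num_top_venues = 5
columns = ['PostalCode'] + ['{} Most Common Venue'.format(ordinal(i))
                            for i in range(1, num_top_venues + 1)]
print(columns)
```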


### Clustering postal code areas

In [152]:
# number of clusters
kclusters = 5
# drop the label column so that only the category frequencies are clustered
toronto_grouped_clustering = toronto_grouped.drop(columns='PostalCode')

kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

kmeans.labels_[0:10]

array([0, 0, 0, 0, 2, 0, 0, 0, 0, 0], dtype=int32)
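
Before mapping the clusters it can help to tally how many postal codes land in each one, since very uneven sizes may suggest that `kclusters` deserves a second look. A minimal sketch using only the first ten labels printed above; in the notebook the input would be `kmeans.labels_.tolist()`:

```python
from collections import Counter

# First ten labels as printed above, used here purely as sample input.
labels = [0, 0, 0, 0, 2, 0, 0, 0, 0, 0]

sizes = Counter(labels)
for cluster, count in sorted(sizes.items()):
    print('cluster {}: {} postal codes'.format(cluster, count))
```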

In [153]:
postalcodes_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# merge the Toronto dataframe from Part 2 with the cluster labels and most common venues
toronto_merged = df_toronto

toronto_merged = toronto_merged.join(postalcodes_venues_sorted.set_index('PostalCode'), on='PostalCode')
toronto_merged.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,M4E,East Toronto,The Beaches,43.676357,-79.293031,0,Pub,Coffee Shop,Bar,Breakfast Spot,Sandwich Place
1,M4K,East Toronto,"The Danforth West,Riverdale",43.679557,-79.352188,0,Greek Restaurant,Coffee Shop,Pub,Café,Italian Restaurant
2,M4L,East Toronto,"The Beaches West,India Bazaar",43.668999,-79.315572,0,Indian Restaurant,Sandwich Place,Coffee Shop,Grocery Store,Gym
3,M4M,East Toronto,Studio District,43.659526,-79.340923,0,Café,Coffee Shop,Bar,Diner,Bakery
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879,2,Park,Business Service,Coffee Shop,Swim School,Bus Line


### Visualize the resulting clusters

In [155]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map, indexing the color list directly by cluster label
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['PostalCode'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
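
The marker colors come from a list indexed by cluster label. A dependency-free sketch of building one color per label with the standard library's `colorsys`, as a possible alternative to the matplotlib colormap (`cluster_palette` is a hypothetical helper, not part of the notebook):

```python
import colorsys

def cluster_palette(k):
    """Return k evenly spaced hues as '#rrggbb' strings, one per cluster label 0..k-1."""
    palette = []
    for i in range(k):
        r, g, b = colorsys.hsv_to_rgb(i / k, 0.9, 0.9)
        palette.append('#{:02x}{:02x}{:02x}'.format(int(r * 255), int(g * 255), int(b * 255)))
    return palette

kclusters = 5
palette = cluster_palette(kclusters)

# Index directly by label so cluster 0 gets palette[0], cluster 1 gets palette[1], ...
for label in range(kclusters):
    print(label, palette[label])
```

Indexing by the label itself keeps the legend easy to reason about: the n-th color always belongs to cluster n.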

### Examine each cluster



### Cluster 0
Most of the postal code areas of Toronto belong to this cluster. The distinguishing feature seems to be food and drink venues, along with places for leisure such as parks and gyms.

In [156]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,East Toronto,0,Pub,Coffee Shop,Bar,Breakfast Spot,Sandwich Place
1,East Toronto,0,Greek Restaurant,Coffee Shop,Pub,Café,Italian Restaurant
2,East Toronto,0,Indian Restaurant,Sandwich Place,Coffee Shop,Grocery Store,Gym
3,East Toronto,0,Café,Coffee Shop,Bar,Diner,Bakery
5,Central Toronto,0,Pizza Place,Coffee Shop,Gym,Park,Café
6,Central Toronto,0,Coffee Shop,Café,Clothing Store,Sporting Goods Shop,Restaurant
7,Central Toronto,0,Coffee Shop,Italian Restaurant,Pizza Place,Sandwich Place,Café
8,Central Toronto,0,Park,Grocery Store,Playground,Sushi Restaurant,Café
9,Central Toronto,0,Coffee Shop,Sushi Restaurant,Italian Restaurant,Restaurant,Gym
11,Downtown Toronto,0,Coffee Shop,Grocery Store,Café,Bakery,Restaurant


### Cluster 1

In [157]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
10,Downtown Toronto,1,Park,Trail,Playground,Candy Store,Yoga Studio


### Cluster 2

In [158]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
4,Central Toronto,2,Park,Business Service,Coffee Shop,Swim School,Bus Line


### Cluster 3

In [159]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
22,Central Toronto,3,Playground,Home Service,Business Service,IT Services,Garden


### Cluster 4

In [160]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
23,Central Toronto,4,Park,Asian Restaurant,Jewelry Store,Gym / Fitness Center,Sushi Restaurant
