# Segmenting and Clustering Neighborhoods in the city of Toronto, Canada

For this lab we will use the urllib and BeautifulSoup libraries. Let's import them:

In [2]:
import urllib.request
from bs4 import BeautifulSoup

Now let's define the URL of the Wikipedia page that lists the Toronto postal codes and open it:

In [3]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
page = urllib.request.urlopen(url)

Parse the HTML from our URL into the BeautifulSoup parse tree format:

In [4]:
soup = BeautifulSoup(page, 'lxml')

Let's use the 'find_all' method to return every instance of the 'table' tag:

In [5]:
all_tables = soup.find_all('table')
all_tables

[<table class="wikitable sortable">
 <tbody><tr>
 <th>Postal Code
 </th>
 <th>Borough
 </th>
 <th>Neighborhood
 </th></tr>
 <tr>
 <td>M1A
 </td>
 <td>Not assigned
 </td>
 <td>Not assigned
 </td></tr>
 <tr>
 <td>M2A
 </td>
 <td>Not assigned
 </td>
 <td>Not assigned
 </td></tr>
 <tr>
 <td>M3A
 </td>
 <td>North York
 </td>
 <td>Parkwoods
 </td></tr>
 <tr>
 <td>M4A
 </td>
 <td>North York
 </td>
 <td>Victoria Village
 </td></tr>
 <tr>
 <td>M5A
 </td>
 <td>Downtown Toronto
 </td>
 <td>Regent Park, Harbourfront
 </td></tr>
 <tr>
 <td>M6A
 </td>
 <td>North York
 </td>
 <td>Lawrence Manor, Lawrence Heights
 </td></tr>
 <tr>
 <td>M7A
 </td>
 <td>Downtown Toronto
 </td>
 <td>Queen's Park, Ontario Provincial Government
 </td></tr>
 <tr>
 <td>M8A
 </td>
 <td>Not assigned
 </td>
 <td>Not assigned
 </td></tr>
 <tr>
 <td>M9A
 </td>
 <td>Etobicoke
 </td>
 <td>Islington Avenue, Humber Valley Village
 </td></tr>
 <tr>
 <td>M1B
 </td>
 <td>Scarborough
 </td>
 <td>Malvern, Rouge
 </td></tr>
 <tr>
 <td>M2B


We can see that the page has more than one table, so we use the table's class to save the correct one in a variable:

In [6]:
corr_table = soup.find('table', class_='wikitable sortable')
corr_table

<table class="wikitable sortable">
<tbody><tr>
<th>Postal Code
</th>
<th>Borough
</th>
<th>Neighborhood
</th></tr>
<tr>
<td>M1A
</td>
<td>Not assigned
</td>
<td>Not assigned
</td></tr>
<tr>
<td>M2A
</td>
<td>Not assigned
</td>
<td>Not assigned
</td></tr>
<tr>
<td>M3A
</td>
<td>North York
</td>
<td>Parkwoods
</td></tr>
<tr>
<td>M4A
</td>
<td>North York
</td>
<td>Victoria Village
</td></tr>
<tr>
<td>M5A
</td>
<td>Downtown Toronto
</td>
<td>Regent Park, Harbourfront
</td></tr>
<tr>
<td>M6A
</td>
<td>North York
</td>
<td>Lawrence Manor, Lawrence Heights
</td></tr>
<tr>
<td>M7A
</td>
<td>Downtown Toronto
</td>
<td>Queen's Park, Ontario Provincial Government
</td></tr>
<tr>
<td>M8A
</td>
<td>Not assigned
</td>
<td>Not assigned
</td></tr>
<tr>
<td>M9A
</td>
<td>Etobicoke
</td>
<td>Islington Avenue, Humber Valley Village
</td></tr>
<tr>
<td>M1B
</td>
<td>Scarborough
</td>
<td>Malvern, Rouge
</td></tr>
<tr>
<td>M2B
</td>
<td>Not assigned
</td>
<td>Not assigned
</td></tr>
<tr>
<td>M3B
</td>
<td>

Now we can read each row of the table and store its values in lists, one for each column:

In [7]:
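# defining the empty lists to store the table information
A=[]  # postal codes
B=[]  # boroughs
C=[]  # neighborhoods

# loop to iterate over the table's rows
for row in corr_table.find_all('tr'):
    cells = row.find_all('td')
    # the header row holds <th> cells, so it has no <td> cells and is skipped
    if len(cells) == 3:
        # strip() removes the trailing newline that each cell's text ends with
        A.append(cells[0].get_text().strip())
        B.append(cells[1].get_text().strip())
        C.append(cells[2].get_text().strip())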
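
To see what this loop does without fetching the page, here is a minimal sketch that runs the same row-by-row extraction on a small inline table. The `SAMPLE_HTML` excerpt and the `table_to_rows` helper are assumptions made for illustration, and the sketch uses the built-in 'html.parser' instead of 'lxml' so that it has no extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical two-row excerpt in the same shape as the Wikipedia table.
SAMPLE_HTML = """
<table class="wikitable sortable">
<tbody><tr>
<th>Postal Code
</th>
<th>Borough
</th>
<th>Neighborhood
</th></tr>
<tr>
<td>M1A
</td>
<td>Not assigned
</td>
<td>Not assigned
</td></tr>
<tr>
<td>M3A
</td>
<td>North York
</td>
<td>Parkwoods
</td></tr>
</tbody></table>
"""

def table_to_rows(html):
    """Return one (postal_code, borough, neighborhood) tuple per data row."""
    soup = BeautifulSoup(html, 'html.parser')
    table = soup.find('table', class_='wikitable sortable')
    rows = []
    for tr in table.find_all('tr'):
        cells = tr.find_all('td')
        # the header row contains only <th> cells, so it is skipped here
        if len(cells) == 3:
            rows.append(tuple(cell.get_text(strip=True) for cell in cells))
    return rows

sample_rows = table_to_rows(SAMPLE_HTML)
print(sample_rows)
```

Collecting tuples instead of three parallel lists is only a design choice for the sketch; either form can be passed to a pandas DataFrame.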

With this we can now generate our pandas DataFrame with the lists that we just scraped from the page

In [8]:
# importing the pandas library as pd
import pandas as pd

# creating our dataframe
df_tor = pd.DataFrame(A, dtype='str', columns=['Postal Code'])
df_tor['Borough'] = B
df_tor['Neighborhood'] = C

# checking what the df looks like
df_tor.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


Let's clean our dataframe by removing every row that has no Borough assigned:

In [9]:
# df shape before cleaning
print('The shape of df before cleaning is:',df_tor.shape)

# collecting the index of the rows to drop
row_erase = df_tor[df_tor['Borough'].str.contains('Not assigned')].index

# dropping the rows from df and resetting the index
df_tor.drop(row_erase, inplace=True)
df_tor.reset_index(drop=True, inplace=True)

# df shape after cleaning
print('The shape of df after cleaning is:', df_tor.shape)

The shape of df before cleaning is: (180, 3)
The shape of df after cleaning is: (103, 3)
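
The same cleaning step can also be written as a boolean mask. The sketch below shows the idea on a three-row frame; the `sample` and `cleaned` names are made up for this illustration. Note that it uses an exact comparison, whereas the cell above uses a substring match with `str.contains`.

```python
import pandas as pd

# Small stand-in for df_tor, built from rows shown earlier in the notebook.
sample = pd.DataFrame({
    'Postal Code': ['M1A', 'M3A', 'M4A'],
    'Borough': ['Not assigned', 'North York', 'North York'],
    'Neighborhood': ['Not assigned', 'Parkwoods', 'Victoria Village'],
})

# keep only the rows whose Borough is assigned, then renumber the index from 0
cleaned = sample[sample['Borough'] != 'Not assigned'].reset_index(drop=True)
print(cleaned.shape)
```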


In [10]:
df_tor.head(10)

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"


Let's get the coordinates of each postal code using the geocoder package:

In [18]:
# installing the package with pip
!pip install geocoder



In [30]:
# importing the package
import geocoder

# creating the variables to store the data
latitude=[]
longitude=[]

# loop over every postal code in the data frame
for postalcode in df_tor['Postal Code']:
    
    print('Colecting coordinates for Postal Code {}'.format(postalcode))
    
    # variable to control the while loop
    lat_long_coord = None
    
    # retry until the result is not None, needed because this package is unreliable
    while (lat_long_coord is None):
        #print('send request')
        g = geocoder.arcgis('{}, Toronto ON'.format(postalcode))
        lat_long_coord = g.latlng
        print(lat_long_coord)
    
    # storing the latitudes and longitudes 
    latitude.append(lat_long_coord[0])
    longitude.append(lat_long_coord[1])

Colecting coordinates for Postal Code M3A
[43.75293455500008, -79.33564142299997]
Colecting coordinates for Postal Code M4A
[43.72810248500008, -79.31188987099995]
Colecting coordinates for Postal Code M5A
[43.65096410900003, -79.35304116399999]
Colecting coordinates for Postal Code M6A
[43.723265465000054, -79.45121077799996]
Colecting coordinates for Postal Code M7A
[43.66179000000005, -79.38938999999993]
Colecting coordinates for Postal Code M9A
[43.66748067300006, -79.52895286499995]
Colecting coordinates for Postal Code M1B
[43.80862623100006, -79.18991284599997]
Colecting coordinates for Postal Code M3B
[43.74890000000005, -79.35721999999998]
Colecting coordinates for Postal Code M4B
[43.70719267700008, -79.31152927299996]
Colecting coordinates for Postal Code M5B
[43.65749059800004, -79.37752923699998]
Colecting coordinates for Postal Code M6B
[43.70727872700007, -79.44750009299997]
Colecting coordinates for Postal Code M9B
[43.65002250300006, -79.55408903099999]
Colecting coord
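
The `while` loop in the cell above retries forever if the service never answers. A bounded retry is one possible alternative; the sketch below is an assumption, not part of the original notebook. `geocode_with_retry` and `flaky_lookup` are hypothetical names, and the stand-in lookup simply fails twice before returning the first coordinate pair.

```python
import time

def geocode_with_retry(lookup, query, max_attempts=5, delay=0.0):
    """Call lookup(query) until it returns a non-None result or attempts run out."""
    for _ in range(max_attempts):
        result = lookup(query)
        if result is not None:
            return result
        time.sleep(delay)  # optional pause between attempts
    raise RuntimeError(
        'no coordinates for {!r} after {} attempts'.format(query, max_attempts))

# Stand-in for the geocoding call: returns None twice, then a coordinate pair.
calls = {'n': 0}

def flaky_lookup(query):
    calls['n'] += 1
    return None if calls['n'] < 3 else [43.75, -79.33]

coords = geocode_with_retry(flaky_lookup, 'M3A, Toronto ON')
print(coords, calls['n'])
```

In the notebook, the `lookup` argument could be something like `lambda q: geocoder.arcgis(q).latlng`, which is the same call the loop above makes.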

In [31]:
# adding the latitude and longitude columns to df
df_tor['Latitude'] = latitude
df_tor['Longitude'] = longitude

# check how the df looks now
df_tor.head(10)

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.752935,-79.335641
1,M4A,North York,Victoria Village,43.728102,-79.31189
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.650964,-79.353041
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.723265,-79.451211
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.66179,-79.38939
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village",43.667481,-79.528953
6,M1B,Scarborough,"Malvern, Rouge",43.808626,-79.189913
7,M3B,North York,Don Mills,43.7489,-79.35722
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.707193,-79.311529
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657491,-79.377529


## Clustering the Neighborhoods

Now that we have the Neighborhood data in a dataframe, we can cluster the neighborhoods by similarity:

In [32]:
import numpy as np # library to handle data in a vectorized manner
import json # library to handle JSON files
import requests as rq # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library

Defining the Foursquare Credentials

In [34]:
# The code was removed by Watson Studio for sharing.
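
Since the credentials cell is hidden, here is one possible way to define `CLIENT_ID` and `CLIENT_SECRET` without writing them into the notebook. This is a sketch under assumptions: the environment variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are hypothetical, not something the original cell is known to use.

```python
import os

# Hypothetical environment variable names chosen for this sketch.
ID_VAR = 'FOURSQUARE_CLIENT_ID'
SECRET_VAR = 'FOURSQUARE_CLIENT_SECRET'

def load_foursquare_credentials(env=None):
    """Return (client_id, client_secret) read from a mapping such as os.environ."""
    env = os.environ if env is None else env
    missing = [name for name in (ID_VAR, SECRET_VAR) if not env.get(name)]
    if missing:
        raise KeyError('missing environment variables: ' + ', '.join(missing))
    return env[ID_VAR], env[SECRET_VAR]

# Demonstration with a plain dict standing in for os.environ (placeholder values).
demo_id, demo_secret = load_foursquare_credentials(
    {ID_VAR: 'my-id', SECRET_VAR: 'my-secret'})
print(demo_id, demo_secret)
```

In the notebook this would become `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()`, with both variables set in the environment beforehand.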

Defining the limit of venues returned and the API version

In [35]:
VERSION = '20200628'
LIMIT = 100

Creating a function to return the venues within a 500 m radius of each Neighborhood

In [38]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = rq.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
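
The list comprehension in `getNearbyVenues` assumes every venue has at least one category, so `categories[0]` raises an `IndexError` otherwise. The sketch below shows a more defensive version of that extraction step on hand-written data. The `venue_rows` helper and the second sample item are hypothetical; the dictionary layout mirrors only the keys that the function above reads.

```python
def venue_rows(name, lat, lng, items):
    """Flatten response items into tuples; category is None when none is listed."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((
            name, lat, lng,
            venue['name'],
            venue['location']['lat'],
            venue['location']['lng'],
            category,
        ))
    return rows

# Hand-written items: the first uses values from the notebook output,
# the second is a made-up venue without any category.
sample_items = [
    {'venue': {'name': 'Brookbanks Park',
               'location': {'lat': 43.751976, 'lng': -79.33214},
               'categories': [{'name': 'Park'}]}},
    {'venue': {'name': 'Uncategorized Spot',
               'location': {'lat': 43.75, 'lng': -79.33},
               'categories': []}},
]

rows = venue_rows('Parkwoods', 43.752935, -79.335641, sample_items)
for row in rows:
    print(row)
```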

Let's use this function to get the venues for each Neighborhood

In [39]:
toronto_venues = getNearbyVenues(names=df_tor['Neighborhood'],
                                   latitudes=df_tor['Latitude'],
                                   longitudes=df_tor['Longitude']
                                  )

Parkwoods
Victoria Village
Regent Park, Harbourfront
Lawrence Manor, Lawrence Heights
Queen's Park, Ontario Provincial Government
Islington Avenue, Humber Valley Village
Malvern, Rouge
Don Mills
Parkview Hill, Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
Rouge Hill, Port Union, Highland Creek
Don Mills
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Thorncliffe Park
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
East Toronto, Broadview North (Old East York)
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
Kennedy Park, Ionview, East Birchmo

Let's check the shape of the result and what it looks like

In [42]:
print(toronto_venues.shape)
toronto_venues.head(10)

(2281, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.752935,-79.335641,Brookbanks Park,43.751976,-79.33214,Park
1,Parkwoods,43.752935,-79.335641,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,Victoria Village,43.728102,-79.31189,Portugril,43.725819,-79.312785,Portuguese Restaurant
3,Victoria Village,43.728102,-79.31189,Tim Hortons,43.725517,-79.313103,Coffee Shop
4,Victoria Village,43.728102,-79.31189,Eglinton Ave E & Sloane Ave/Bermondsey Rd,43.726086,-79.31362,Intersection
5,Victoria Village,43.728102,-79.31189,Pizza Nova,43.725824,-79.31286,Pizza Place
6,Victoria Village,43.728102,-79.31189,Wigmore Park,43.731023,-79.310771,Park
7,Victoria Village,43.728102,-79.31189,The Frig,43.727051,-79.317418,French Restaurant
8,"Regent Park, Harbourfront",43.650964,-79.353041,Souk Tabule,43.653756,-79.35439,Mediterranean Restaurant
9,"Regent Park, Harbourfront",43.650964,-79.353041,Young Centre for the Performing Arts,43.650825,-79.357593,Performing Arts Venue


Let's check how many venues were returned for each neighborhood and how many unique categories we have

In [44]:
toronto_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Agincourt,5,5,5,5,5,5
"Alderwood, Long Branch",7,7,7,7,7,7
"Bathurst Manor, Wilson Heights, Downsview North",21,21,21,21,21,21
Bayview Village,2,2,2,2,2,2
"Bedford Park, Lawrence Manor East",22,22,22,22,22,22
Berczy Park,67,67,67,67,67,67
"Birch Cliff, Cliffside West",4,4,4,4,4,4
"Brockton, Parkdale Village, Exhibition Place",43,43,43,43,43,43
"Business reply mail Processing Centre, South Central Letter Processing Plant Toronto",100,100,100,100,100,100
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",65,65,65,65,65,65


In [45]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 266 unique categories.


Let's one-hot encode the categories in a new data frame

In [65]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# define a list of column names
cols = toronto_onehot.columns.tolist()

# move the 'Neighborhood' column name to the beginning
cols.insert(0, cols.pop(cols.index('Neighborhood')))

# then use the .reindex() function to reorder the columns
toronto_onehot = toronto_onehot.reindex(columns=cols)

# check result
toronto_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,Afghan Restaurant,Airport,American Restaurant,Antique Shop,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Now let's group the rows by Neighborhood and take the mean frequency for each category

In [66]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Accessories Store,Afghan Restaurant,Airport,American Restaurant,Antique Shop,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,Agincourt,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.00,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
1,"Alderwood, Long Branch",0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.00,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
2,"Bathurst Manor, Wilson Heights, Downsview North",0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.00,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
3,Bayview Village,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,...,0.500000,0.00,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
4,"Bedford Park, Lawrence Manor East",0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.00,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
5,Berczy Park,0.000000,0.000000,0.000000,0.000000,0.0,0.014925,0.000000,0.000000,0.000000,...,0.000000,0.00,0.014925,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.014925
6,"Birch Cliff, Cliffside West",0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.00,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
7,"Brockton, Parkdale Village, Exhibition Place",0.023256,0.000000,0.000000,0.000000,0.0,0.023256,0.000000,0.023256,0.000000,...,0.000000,0.00,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
8,"Business reply mail Processing Centre, South C...",0.000000,0.000000,0.000000,0.020000,0.0,0.010000,0.000000,0.010000,0.020000,...,0.000000,0.00,0.020000,0.000000,0.000000,0.000000,0.010000,0.000000,0.000000,0.000000
9,"CN Tower, King and Spadina, Railway Lands, Har...",0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.015385,...,0.000000,0.00,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000


Let's create a dataframe with the top 10 venues for each neighborhood

In [67]:
# function to sort the venue categories in descending order of frequency
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [111]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions beyond the third use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Breakfast Spot,Supermarket,Badminton Court,Skating Rink,Sushi Restaurant,Eastern European Restaurant,Electronics Store,Elementary School,Ethiopian Restaurant,Falafel Restaurant
1,"Alderwood, Long Branch",Pizza Place,Coffee Shop,Gas Station,Sandwich Place,Pub,Gym,Convenience Store,Falafel Restaurant,Ethiopian Restaurant,Distribution Center
2,"Bathurst Manor, Wilson Heights, Downsview North",Bank,Coffee Shop,Pizza Place,Park,Deli / Bodega,Diner,Restaurant,Mobile Phone Shop,Middle Eastern Restaurant,Sandwich Place
3,Bayview Village,Trail,Construction & Landscaping,Falafel Restaurant,Dog Run,Donut Shop,Eastern European Restaurant,Electronics Store,Elementary School,Ethiopian Restaurant,Yoga Studio
4,"Bedford Park, Lawrence Manor East",Sandwich Place,Coffee Shop,Italian Restaurant,Pizza Place,Thai Restaurant,Butcher,Comfort Food Restaurant,Breakfast Spot,Liquor Store,Café


Now we can cluster the neighborhoods

In [112]:
# set number of clusters
kclusters = 5

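# keep only the numeric frequency columns (the positional axis argument is not accepted by recent pandas versions)
toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')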

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 1, 2, 1, 1, 1, 1, 1, 1], dtype=int32)
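
As a rough illustration of what k-means does with these frequency rows, the sketch below clusters four invented profiles over three categories. The `profiles` array and its values are assumptions made for the example. The cluster numbers themselves are arbitrary, so only the grouping is meaningful.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical frequency profiles: rows are neighborhoods, columns are categories.
profiles = np.array([
    [0.9, 0.1, 0.0],   # mostly category 0
    [0.8, 0.2, 0.0],   # mostly category 0
    [0.0, 0.1, 0.9],   # mostly category 2
    [0.1, 0.0, 0.9],   # mostly category 2
])

# n_init is set explicitly because its default differs between scikit-learn versions
model = KMeans(n_clusters=2, random_state=0, n_init=10).fit(profiles)
labels = model.labels_
print(labels)
```

The first two rows end up in one cluster and the last two in the other, because each pair has nearly the same venue mix.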

Creating a new dataframe with the cluster labels and top 10 venues

In [113]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = df_tor

# join df_tor with neighborhoods_venues_sorted so that each neighborhood keeps its latitude/longitude
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood', how='right')
# astype returns a new Series, so the result has to be assigned back
toronto_merged['Cluster Labels'] = toronto_merged['Cluster Labels'].astype('int32')

toronto_merged.head() # check the last columns!

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.752935,-79.335641,4,Food & Drink Shop,Park,Dog Run,Donut Shop,Eastern European Restaurant,Electronics Store,Elementary School,Ethiopian Restaurant,Falafel Restaurant,Distribution Center
1,M4A,North York,Victoria Village,43.728102,-79.31189,1,Pizza Place,Portuguese Restaurant,Intersection,Park,French Restaurant,Coffee Shop,Falafel Restaurant,Ethiopian Restaurant,Farm,Discount Store
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.650964,-79.353041,1,Pub,Café,Athletics & Sports,Performing Arts Venue,Theater,Mediterranean Restaurant,Food Truck,French Restaurant,Mexican Restaurant,Distribution Center
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.723265,-79.451211,1,Clothing Store,Women's Store,Restaurant,Sushi Restaurant,Food Court,Bookstore,Toy / Game Store,Furniture / Home Store,American Restaurant,Men's Store
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.66179,-79.38939,1,Coffee Shop,Café,Sushi Restaurant,Yoga Studio,Diner,Smoothie Shop,Sandwich Place,Park,Bookstore,Fried Chicken Joint


Let's plot the clusters on a map

In [114]:
# getting the coordinates of Toronto
# variable to control the while loop
lat_long_coord = None
    
# retry until the result is not None, needed because this package is unreliable
while (lat_long_coord is None):
    # query the city itself; reusing `postalcode` would centre the map on the last postal code of the earlier loop
    g = geocoder.arcgis('Toronto, ON')
    lat_long_coord = g.latlng
    
# storing the latitudes and longitudes 
tor_lat = lat_long_coord[0]
tor_long = lat_long_coord[1]
    
# create map
map_clusters = folium.Map(location=[tor_lat, tor_long], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

We can see that most neighborhoods in Toronto are similar and fall into the same cluster (label 1). Their top 10 venues are listed below:

In [115]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,North York,1,Pizza Place,Portuguese Restaurant,Intersection,Park,French Restaurant,Coffee Shop,Falafel Restaurant,Ethiopian Restaurant,Farm,Discount Store
2,Downtown Toronto,1,Pub,Café,Athletics & Sports,Performing Arts Venue,Theater,Mediterranean Restaurant,Food Truck,French Restaurant,Mexican Restaurant,Distribution Center
3,North York,1,Clothing Store,Women's Store,Restaurant,Sushi Restaurant,Food Court,Bookstore,Toy / Game Store,Furniture / Home Store,American Restaurant,Men's Store
4,Downtown Toronto,1,Coffee Shop,Café,Sushi Restaurant,Yoga Studio,Diner,Smoothie Shop,Sandwich Place,Park,Bookstore,Fried Chicken Joint
7,North York,1,Athletics & Sports,Restaurant,Park,Bank,Trail,Gym,Other Great Outdoors,Burger Joint,Falafel Restaurant,Farm
13,North York,1,Athletics & Sports,Restaurant,Park,Bank,Trail,Gym,Other Great Outdoors,Burger Joint,Falafel Restaurant,Farm
8,East York,1,Pizza Place,Fast Food Restaurant,Breakfast Spot,Bank,Intersection,Café,Rock Climbing Spot,Athletics & Sports,Gastropub,Gym / Fitness Center
9,Downtown Toronto,1,Coffee Shop,Clothing Store,Middle Eastern Restaurant,Cosmetics Shop,Café,Bar,Tea Room,Tanning Salon,Bakery,Ramen Restaurant
10,North York,1,Grocery Store,Pub,Fast Food Restaurant,Pizza Place,Gas Station,Mediterranean Restaurant,Japanese Restaurant,Sushi Restaurant,Asian Restaurant,Dog Run
11,Etobicoke,1,Pizza Place,Chinese Restaurant,Sandwich Place,Tea Room,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant,Ethiopian Restaurant,Distribution Center
