# Segmenting and Clustering Neighborhoods in Toronto

## Table of Contents

1. <a href="#item1">Download and Prepare Dataset</a>
2. <a href="#item2">Explore Neighborhoods</a>
3. <a href="#item4">Cluster Neighborhoods</a>


First, import the required libraries.

In [1]:
from bs4 import BeautifulSoup  # web scraping
import requests  # handle HTTP requests

import pandas as pd  # handle dataframes
from pandas import json_normalize  # transform a JSON file into a pandas dataframe

import re  # text cleaning (special characters)
import numpy as np

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import geocoder #extract geographical coordinates
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


## 1. Download and Prepare Dataset

I used BeautifulSoup to scrape the postal code table from the Wikipedia page.

In [2]:
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
r  = requests.get(url, timeout=5).text
soup = BeautifulSoup(r, 'lxml')
print(soup.prettify())

<!DOCTYPE html>
<html class="client-nojs" dir="ltr" lang="en">
 <head>
  <meta charset="utf-8"/>
  <title>
   List of postal codes of Canada: M - Wikipedia
  </title>
  <script>
   document.documentElement.className=document.documentElement.className.replace(/(^|\s)client-nojs(\s|$)/,"$1client-js$2");RLCONF={"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":!1,"wgNamespaceNumber":0,"wgPageName":"List_of_postal_codes_of_Canada:_M","wgTitle":"List of postal codes of Canada: M","wgCurRevisionId":900271985,"wgRevisionId":900271985,"wgArticleId":539066,"wgIsArticle":!0,"wgIsRedirect":!1,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Communications in Ontario","Postal codes in Canada","Toronto","Ontario-related lists"],"wgBreakFrames":!1,"wgPageContentLanguage":"en","wgPageContentModel":"wikitext","wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June",

In [3]:
table = soup.find('table', {'class' : 'wikitable sortable'})
table

<table class="wikitable sortable">
<tbody><tr>
<th>Postcode</th>
<th>Borough</th>
<th>Neighbourhood
</th></tr>
<tr>
<td>M1A</td>
<td>Not assigned</td>
<td>Not assigned
</td></tr>
<tr>
<td>M2A</td>
<td>Not assigned</td>
<td>Not assigned
</td></tr>
<tr>
<td>M3A</td>
<td><a href="/wiki/North_York" title="North York">North York</a></td>
<td><a href="/wiki/Parkwoods" title="Parkwoods">Parkwoods</a>
</td></tr>
<tr>
<td>M4A</td>
<td><a href="/wiki/North_York" title="North York">North York</a></td>
<td><a href="/wiki/Victoria_Village" title="Victoria Village">Victoria Village</a>
</td></tr>
<tr>
<td>M5A</td>
<td><a href="/wiki/Downtown_Toronto" title="Downtown Toronto">Downtown Toronto</a></td>
<td><a href="/wiki/Harbourfront_(Toronto)" title="Harbourfront (Toronto)">Harbourfront</a>
</td></tr>
<tr>
<td>M5A</td>
<td><a href="/wiki/Downtown_Toronto" title="Downtown Toronto">Downtown Toronto</a></td>
<td><a href="/wiki/Regent_Park" title="Regent Park">Regent Park</a>
</td></tr>
<tr>
<td>M6A</td>

In [4]:
def clean_text(text):
    bchar =  ['(', ')', ',', '\n', '\r', 'Toronto', '}', '{', "'", "'"]
    big_regex = re.compile('|'.join(map(re.escape, bchar)))
    text = big_regex.sub("", text)
    return text.strip()

In [5]:
tr = table.find_all('tr')
tr = tr[1:]  # skip the header row

postal_codes = []
boroughs = []
neighbourhoods = []

for r in tr:
    postal_code = r.contents[1]
    postal_codes.append(postal_code.contents[0])

    # cells without a link (such as "Not assigned") are stored as None
    borough = r.contents[3]
    if borough.a is None:
        boroughs.append(None)
    else:
        boroughs.append(clean_text(borough.a.get("title")))

    neighbourhood = r.contents[5]
    if neighbourhood.a is None:
        neighbourhoods.append(None)
    else:
        neighbourhoods.append(clean_text(neighbourhood.a.get("title")))

In [6]:
toronto_data = pd.DataFrame()
toronto_data['Postcode'] = postal_codes
toronto_data['Borough'] = boroughs
toronto_data['Neighborhood'] = neighbourhoods

The assignment instructions specify that only cells with an assigned borough should be processed, so I drop the rows whose borough is Not assigned.

In [7]:
toronto_data = toronto_data.dropna(subset=['Borough'])

From now on, I work only with rows that have an assigned borough. A row can still have a Not assigned neighborhood; in that case the neighborhood is set to the borough, as suggested in the assignment instructions. So for the 9th cell in the table on the Wikipedia page, both the Borough and the Neighborhood columns will be Queen's Park.

In [9]:
# a missing neighborhood takes the name of its borough
missing = toronto_data['Neighborhood'].isna()
toronto_data.loc[missing, 'Neighborhood'] = toronto_data.loc[missing, 'Borough']

More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows should be combined into one row, with the neighborhoods separated by a comma. To do this, I group the data by postal code, aggregate the values into a set to avoid duplicates, and then remove the leftover characters.

In [10]:
toronto_data = toronto_data.groupby('Postcode').agg(set).reset_index()

In [11]:
toronto_data['Borough'] = toronto_data['Borough'].apply(lambda x: re.sub("[{}']", "", str(x)))
toronto_data['Neighborhood'] = toronto_data['Neighborhood'].apply(lambda x: re.sub("[{}']", "", str(x)))
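
A possible alternative to the set-and-regex cleanup, shown as a minimal sketch on made-up rows (the sample values are illustrative only, not the scraped data): fill the missing neighborhoods from the borough, then join the names of each postal code directly. Joining the strings avoids the regular expression and keeps apostrophes, such as the one in Queen's Park, intact.

```python
import pandas as pd

# Toy rows standing in for the scraped table (illustrative values only).
sample = pd.DataFrame({
    'Postcode': ['M5A', 'M5A', 'M7A'],
    'Borough': ['Downtown Toronto', 'Downtown Toronto', "Queen's Park"],
    'Neighborhood': ['Harbourfront', 'Regent Park', None],
})

# A missing neighborhood takes the name of its borough.
sample['Neighborhood'] = sample['Neighborhood'].fillna(sample['Borough'])

# One row per postal code; duplicates are dropped and names are comma-separated.
combined = (
    sample.groupby('Postcode')
    .agg({
        'Borough': 'first',
        'Neighborhood': lambda names: ', '.join(sorted(set(names))),
    })
    .reset_index()
)
print(combined)
```

Sorting the names makes the order deterministic, which a plain set does not guarantee.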

In [12]:
toronto_data.head()

Unnamed: 0,Postcode,Borough,Neighborhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Rouge Hill, Highland Creek, Port Union"
2,M1E,Scarborough,"Morningside, Scarborough, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Scarborough


In [13]:
toronto_data.shape

(100, 3)

To get the location data (the latitude and longitude of each postal code), I first tried the Google Maps Geocoding API through the geocoder package. As Alex A. mentioned before, it proved unreliable: every request came back as REQUEST_DENIED and I could not obtain any coordinates.
So I ended up using the provided .csv file.

In [14]:
# initialize your variable to None
lat_lng_coords = None

# loop until you get the coordinates
while(lat_lng_coords is None):
  g = geocoder.google('Mountain View, CA')
  lat_lng_coords = g.latlng
  print(g)  
    
latitude = lat_lng_coords[0]
longitude = lat_lng_coords[1]

<[REQUEST_DENIED] Google - Geocode [empty]>
<[REQUEST_DENIED] Google - Geocode [empty]>
<[REQUEST_DENIED] Google - Geocode [empty]>
<[REQUEST_DENIED] Google - Geoco

KeyboardInterrupt: 
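
The loop above never ends when every request is denied, which is why the cell had to be interrupted. A minimal sketch of a bounded retry follows; `get_coordinates`, `lookup`, `max_attempts` and `delay` are hypothetical names introduced for illustration, and `lookup` stands for any callable that returns `[lat, lng]` or `None`.

```python
import time

def get_coordinates(lookup, query, max_attempts=3, delay=1.0):
    """Call lookup(query) until it returns coordinates or attempts run out.

    `lookup` is a hypothetical callable that returns [lat, lng] or None.
    """
    for attempt in range(1, max_attempts + 1):
        coords = lookup(query)
        if coords is not None:
            return coords
        if attempt < max_attempts:
            time.sleep(delay)  # short pause before retrying
    return None

# Stand-in for a geocoder that always fails, like the denied requests above.
def always_denied(query):
    return None

result = get_coordinates(always_denied, 'Toronto, Ontario', max_attempts=3, delay=0)
print(result)
```

With the geocoder package, the callable could presumably be something like `lambda q: geocoder.google(q).latlng`, mirroring the call used in the cell above; a `None` result then signals that a fallback such as the .csv file is needed.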

In [15]:
dataset_path = ''
toronto_lat_lng_coords = pd.read_csv(dataset_path, delimiter=',', encoding='latin1')

In [16]:
toronto_lat_lng_coords.rename(columns={'Postal Code':'Postcode'}, inplace=True)
toronto_lat_lng_coords.head()

Unnamed: 0,Postcode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [17]:
toronto_data = pd.merge(toronto_data, toronto_lat_lng_coords, on='Postcode')
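
By default `pd.merge` performs an inner join, so a postal code missing from the coordinates file would silently disappear. The sketch below, on toy frames with a made-up postal code `M9Z`, shows one way to check for that with a left join and the `indicator` option; the frame names `areas` and `coords` are illustrative.

```python
import pandas as pd

# Toy frames standing in for toronto_data and the coordinates file.
areas = pd.DataFrame({'Postcode': ['M1B', 'M1C', 'M9Z'],
                      'Borough': ['Scarborough', 'Scarborough', 'Example']})
coords = pd.DataFrame({'Postcode': ['M1B', 'M1C'],
                       'Latitude': [43.806686, 43.784535],
                       'Longitude': [-79.194353, -79.160497]})

# A left join keeps every postal code; `_merge` records where each row matched.
checked = pd.merge(areas, coords, on='Postcode', how='left', indicator=True)
missing = checked.loc[checked['_merge'] == 'left_only', 'Postcode'].tolist()
print(missing)
```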

In [18]:
toronto_data.head()

Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Highland Creek, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Morningside, Scarborough, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Scarborough,43.773136,-79.239476


## 2. Explore Neighborhoods

I create a map of the neighborhoods in Toronto from their latitude and longitude values, using the Folium library.

In [19]:
# create map of Toronto using latitude and longitude values
latitude = 43.653226
longitude = -79.3831843
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(toronto_data['Latitude'], toronto_data['Longitude'], toronto_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

I define the Foursquare credentials and API version used to fetch data about the venues in Toronto.

In [20]:
CLIENT_ID = '' # your Foursquare ID
CLIENT_SECRET = '' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
#print('Your credentials:')
#print('CLIENT_ID: ' + CLIENT_ID)
#print('CLIENT_SECRET:' + CLIENT_SECRET)

In [21]:
neighborhood_latitude = toronto_data.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = toronto_data.loc[0, 'Longitude'] # neighborhood longitude value
neighborhood_name = toronto_data.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Malvern, Rouge are 43.806686299999996, -79.19435340000001.


In [23]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius
# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
#url # display URL

In [24]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
   
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    return nearby_venues
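
To make the flattening step in `getNearbyVenues` easier to follow, here is a minimal sketch that runs without any API call. The sample items are hand-written and mirror only the fields the function reads; `flatten_venues` is a hypothetical helper, and the fallback for a venue without categories is an assumption about possible responses rather than documented behavior.

```python
def flatten_venues(name, lat, lng, items):
    """Turn the 'items' list of an explore-style response into flat tuples."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Fall back to None when a venue carries no category.
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows

# Hand-written sample in the shape the function above indexes into.
sample_items = [
    {'venue': {'name': "Wendy's",
               'location': {'lat': 43.807448, 'lng': -79.199056},
               'categories': [{'name': 'Fast Food Restaurant'}]}},
    {'venue': {'name': 'Unnamed Spot',
               'location': {'lat': 43.8, 'lng': -79.2},
               'categories': []}},
]

rows = flatten_venues('Malvern, Rouge', 43.806686, -79.194353, sample_items)
for row in rows:
    print(row)
```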

In [25]:
toronto_venues = getNearbyVenues(names=toronto_data['Neighborhood'],
                                   latitudes=toronto_data['Latitude'],
                                   longitudes=toronto_data['Longitude']
        )

In [26]:
print(toronto_venues.shape)
toronto_venues.head()

(2244, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Malvern, Rouge",43.806686,-79.194353,Wendy's,43.807448,-79.199056,Fast Food Restaurant
1,"Rouge Hill, Highland Creek, Port Union",43.784535,-79.160497,RIGHT WAY TO GOLF,43.785177,-79.161108,Golf Course
2,"Rouge Hill, Highland Creek, Port Union",43.784535,-79.160497,Royal Canadian Legion,43.782533,-79.163085,Bar
3,"Morningside, Scarborough, West Hill",43.763573,-79.188711,Swiss Chalet Rotisserie & Grill,43.767697,-79.189914,Pizza Place
4,"Morningside, Scarborough, West Hill",43.763573,-79.188711,G & G Electronics,43.765309,-79.191537,Electronics Store


In [27]:
toronto_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Agincourt,4,4,4,4,4,4
Bayview Village,4,4,4,4,4,4
Berczy Park,58,58,58,58,58,58
"CFB, North York",2,2,2,2,2,2
"CN Tower, King and Spadina, South Niagara, Railway Lands, Downtown",17,17,17,17,17,17
"Cabbagetown, St. James Town",43,43,43,43,43,43
Central,58,58,58,58,58,58
"Central, Moore Park",3,3,3,3,3,3
Church and Wellesley,86,86,86,86,86,86
"Commerce Court, Downtown",100,100,100,100,100,100


In [28]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 275 unique categories.


## 3. Cluster Neighborhoods

I prepare the data for clustering.

In [29]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Yoga Studio,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [30]:
toronto_onehot.shape

(2244, 275)
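
The shape above has 275 columns, the same as the number of unique categories, even though a Neighborhood column was added. One possible explanation (an inference from these numbers, not something this notebook verifies) is that a venue category is itself named Neighborhood, so the assignment overwrites that column instead of appending a new one. The sketch below reproduces the situation on toy data and shows one way to keep both columns; the name `Neighborhood (category)` is a hypothetical choice.

```python
import pandas as pd

# Toy venues; one category is deliberately named 'Neighborhood'.
venues = pd.DataFrame({
    'Neighborhood': ['Agincourt', 'Agincourt', 'Woburn'],
    'Venue Category': ['Lounge', 'Neighborhood', 'Coffee Shop'],
})

onehot = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='')

# Keep the category under a distinct (hypothetical) name, then add the label
# column explicitly at position 0 instead of assuming it lands last.
onehot = onehot.rename(columns={'Neighborhood': 'Neighborhood (category)'})
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

print(onehot.columns.tolist())
```

Inserting at position 0 also removes the need to reorder the columns afterwards.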

In [31]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store
0,Agincourt,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.00,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
1,Bayview Village,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.00,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
2,Berczy Park,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.00,0.017241,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
3,"CFB, North York",0.000000,0.0,0.000000,0.500000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.00,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
4,"CN Tower, King and Spadina, South Niagara, Rai...",0.000000,0.0,0.000000,0.058824,0.058824,0.058824,0.117647,0.176471,0.117647,...,0.000000,0.00,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
5,"Cabbagetown, St. James Town",0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.00,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
6,Central,0.017241,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.00,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
7,"Central, Moore Park",0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.333333,0.00,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
8,Church and Wellesley,0.011628,0.0,0.011628,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.00,0.000000,0.011628,0.000000,0.011628,0.000000,0.000000,0.011628,0.000000
9,"Commerce Court, Downtown",0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.00,0.020000,0.000000,0.000000,0.000000,0.000000,0.010000,0.000000,0.000000


In [32]:
toronto_grouped.shape

(80, 275)

In [33]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Agincourt----
            venue  freq
0          Lounge  0.25
1  Breakfast Spot  0.25
2    Skating Rink  0.25
3  Clothing Store  0.25
4     Yoga Studio  0.00


----Bayview Village----
                 venue  freq
0  Japanese Restaurant  0.25
1                 Café  0.25
2                 Bank  0.25
3   Chinese Restaurant  0.25
4        Moving Target  0.00


----Berczy Park----
          venue  freq
0   Coffee Shop  0.09
1  Cocktail Bar  0.05
2      Beer Bar  0.03
3          Café  0.03
4        Bakery  0.03


----CFB, North York----
                       venue  freq
0                    Airport   0.5
1                       Park   0.5
2                Yoga Studio   0.0
3  Middle Eastern Restaurant   0.0
4                      Motel   0.0


----CN Tower, King and Spadina, South Niagara, Railway Lands, Downtown----
              venue  freq
0   Airport Service  0.18
1    Airport Lounge  0.12
2  Airport Terminal  0.12
3          Boutique  0.06
4   Harbor / Marina  0.06


----Cabbageto

                venue  freq
0         Pizza Place  0.08
1          Restaurant  0.08
2  Mexican Restaurant  0.08
3  Seafood Restaurant  0.08
4        Liquor Store  0.08


----North York----
                 venue  freq
0          Coffee Shop  0.07
1                 Café  0.05
2  Japanese Restaurant  0.05
3     Ramen Restaurant  0.05
4           Restaurant  0.03


----North York, Bedford Park----
                  venue  freq
0    Italian Restaurant  0.14
1           Coffee Shop  0.09
2  Fast Food Restaurant  0.05
3         Grocery Store  0.05
4             Juice Bar  0.05


----North York, Flemingdon Park----
                   venue  freq
0             Beer Store  0.10
1       Asian Restaurant  0.10
2            Coffee Shop  0.10
3                    Gym  0.10
4  General Entertainment  0.05


----Northwood Park, York University----
                    venue  freq
0          Massage Studio  0.14
1             Coffee Shop  0.14
2  Furniture / Home Store  0.14
3    Caribbean Restaurant  0

In [34]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

In [35]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Breakfast Spot,Lounge,Skating Rink,Clothing Store,Drugstore,Discount Store,Dive Bar,Dog Run,Doner Restaurant,Donut Shop
1,Bayview Village,Café,Japanese Restaurant,Bank,Chinese Restaurant,Women's Store,Doner Restaurant,Discount Store,Dive Bar,Dog Run,Donut Shop
2,Berczy Park,Coffee Shop,Cocktail Bar,Cheese Shop,Bakery,Beer Bar,Steakhouse,Farmers Market,Seafood Restaurant,Café,Italian Restaurant
3,"CFB, North York",Park,Airport,Donut Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Dog Run,Doner Restaurant,Drugstore
4,"CN Tower, King and Spadina, South Niagara, Rai...",Airport Service,Airport Terminal,Airport Lounge,Harbor / Marina,Coffee Shop,Plane,Sculpture Garden,Boutique,Bar,Boat or Ferry
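
The column names above rely on a three-item suffix list and an exception for everything past the third position, which works for ten venues. As a small sketch of a more general approach, the hypothetical helper `ordinal` below also handles numbers such as 11 or 21.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    # 11, 12 and 13 take 'th' despite ending in 1, 2 and 3.
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

columns = ['Neighborhood'] + [
    '{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)
]
print(columns[1], columns[4], columns[10])
```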


I run the k-means algorithm to cluster the Toronto neighborhoods.

In [36]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

print(toronto_grouped_clustering.head())

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

   Yoga Studio  Accessories Store  Afghan Restaurant   Airport  \
0          0.0                0.0                0.0  0.000000   
1          0.0                0.0                0.0  0.000000   
2          0.0                0.0                0.0  0.000000   
3          0.0                0.0                0.0  0.500000   
4          0.0                0.0                0.0  0.058824   

   Airport Food Court  Airport Gate  Airport Lounge  Airport Service  \
0            0.000000      0.000000        0.000000         0.000000   
1            0.000000      0.000000        0.000000         0.000000   
2            0.000000      0.000000        0.000000         0.000000   
3            0.000000      0.000000        0.000000         0.000000   
4            0.058824      0.058824        0.117647         0.176471   

   Airport Terminal  American Restaurant      ...        Trail  Train Station  \
0          0.000000                  0.0      ...          0.0            0.0   
1       

array([3, 3, 3, 1, 3, 3, 3, 3, 3, 3])
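
The number of clusters is fixed at 5 here without a stated criterion. One common way to sanity-check such a choice is to compare the within-cluster sum of squares (`inertia_`) for several values of k. The sketch below does this on synthetic points, not on the Toronto data, so the numbers it prints say nothing about the neighborhoods themselves.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the frequency matrix: three well-separated groups.
rng = np.random.RandomState(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
points = np.vstack([c + 0.3 * rng.randn(30, 2) for c in centers])

# Fit k-means for several k and record the within-cluster sum of squares.
inertias = []
for k in range(1, 6):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(points)
    inertias.append(model.inertia_)

for k, inertia in zip(range(1, 6), inertias):
    print(k, round(inertia, 2))
```

A sharp drop followed by a plateau hints at a reasonable k; on the real data, `toronto_grouped_clustering` would take the place of `points`.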

In [37]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = toronto_data

# join the cluster labels and top venues onto toronto_data, which holds latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head() # check the last columns!

Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353,4.0,Fast Food Restaurant,Drugstore,Diner,Discount Store,Dive Bar,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,College Rec Center
1,M1C,Scarborough,"Rouge Hill, Highland Creek, Port Union",43.784535,-79.160497,0.0,Golf Course,Bar,Dessert Shop,Diner,Discount Store,Dive Bar,Dog Run,Doner Restaurant,Donut Shop,Women's Store
2,M1E,Scarborough,"Morningside, Scarborough, West Hill",43.763573,-79.188711,3.0,Electronics Store,Intersection,Rental Car Location,Pizza Place,Medical Center,Breakfast Spot,Mexican Restaurant,Dog Run,Dim Sum Restaurant,Diner
3,M1G,Scarborough,Woburn,43.770992,-79.216917,3.0,Coffee Shop,Korean Restaurant,Insurance Office,Drugstore,Diner,Discount Store,Dive Bar,Dog Run,Doner Restaurant,Donut Shop
4,M1H,Scarborough,Scarborough,43.773136,-79.239476,3.0,Fast Food Restaurant,Chinese Restaurant,Hakka Restaurant,Sandwich Place,Bakery,Bank,Japanese Restaurant,Breakfast Spot,Bubble Tea Shop,Camera Store
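
The cluster labels appear as floats (4.0, 0.0, 3.0) in the table above. A likely cause, stated here as an assumption, is that the join leaves NaN for rows of `toronto_data` (100 rows) that have no match among the 80 grouped neighborhoods, for example areas for which no venues were returned. The sketch below shows the effect on toy frames and one way to restore integer labels; the neighborhood without a match is illustrative.

```python
import pandas as pd

# Toy frames: 'Upper Rouge' has no venues here, so it has no cluster label.
areas = pd.DataFrame({'Neighborhood': ['Malvern, Rouge', 'Woburn', 'Upper Rouge']})
labels = pd.DataFrame({'Neighborhood': ['Malvern, Rouge', 'Woburn'],
                       'Cluster Labels': [4, 3]})

merged = areas.join(labels.set_index('Neighborhood'), on='Neighborhood')
print(merged['Cluster Labels'].tolist())  # the unmatched row holds NaN

# Drop the unlabelled rows, then restore the integer type.
merged = merged.dropna(subset=['Cluster Labels'])
merged['Cluster Labels'] = merged['Cluster Labels'].astype(int)
print(merged['Cluster Labels'].tolist())
```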


I present each of my clusters.

##### Cluster 0

In [38]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,"Rouge Hill, Highland Creek, Port Union",0.0,Golf Course,Bar,Dessert Shop,Diner,Discount Store,Dive Bar,Dog Run,Doner Restaurant,Donut Shop,Women's Store
91,"West Deane Park, Islington, Princess Gardens, ...",0.0,Golf Course,Bank,Dessert Shop,Diner,Discount Store,Dive Bar,Dog Run,Doner Restaurant,Donut Shop,Women's Store
93,Humber Summit,0.0,Pizza Place,Empanada Restaurant,Women's Store,Doner Restaurant,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Dog Run,Drugstore


##### Cluster 1

In [39]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,"Milliken Ontario, Scarborough, Agincourt North",1.0,Park,Playground,Donut Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Dog Run,Doner Restaurant,Drugstore
30,"CFB, North York",1.0,Park,Airport,Donut Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Dog Run,Doner Restaurant,Drugstore
44,Lawrence Park,1.0,Park,Bus Line,Swim School,Falafel Restaurant,Diner,Discount Store,Dive Bar,Dog Run,Doner Restaurant,Donut Shop
50,Rosedale,1.0,Park,Playground,Building,Trail,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Electronics Store,Donut Shop,Deli / Bodega
87,"The Kingsway, Etobicoke",1.0,Pool,Park,River,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop,Doner Restaurant,Deli / Bodega
95,Weston,1.0,Park,Convenience Store,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Dessert Shop,Drugstore


##### Cluster 2

In [40]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
88,"Fairmont Royal York Hotel, Sunnylea, Etobicoke...",2.0,Construction & Landscaping,Pool,Baseball Field,Women's Store,Drugstore,Discount Store,Dive Bar,Dog Run,Doner Restaurant,Donut Shop
94,"Emery, Humberlea",2.0,Baseball Field,Paper / Office Supplies Store,Women's Store,Diner,Discount Store,Dive Bar,Dog Run,Doner Restaurant,Donut Shop,Drugstore


##### Cluster 3

In [41]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,"Morningside, Scarborough, West Hill",3.0,Electronics Store,Intersection,Rental Car Location,Pizza Place,Medical Center,Breakfast Spot,Mexican Restaurant,Dog Run,Dim Sum Restaurant,Diner
3,Woburn,3.0,Coffee Shop,Korean Restaurant,Insurance Office,Drugstore,Diner,Discount Store,Dive Bar,Dog Run,Doner Restaurant,Donut Shop
4,Scarborough,3.0,Fast Food Restaurant,Chinese Restaurant,Hakka Restaurant,Sandwich Place,Bakery,Bank,Japanese Restaurant,Breakfast Spot,Bubble Tea Shop,Camera Store
5,Scarborough Village,3.0,Pizza Place,Playground,Convenience Store,Women's Store,Donut Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Dog Run
6,"Kennedy Park, Scarborough, Ionview",3.0,Discount Store,Department Store,Coffee Shop,Convenience Store,Bus Station,Hobby Shop,Empanada Restaurant,Electronics Store,Ethiopian Restaurant,Dim Sum Restaurant
7,"Golden Mile, Oakridge, Clairlea",3.0,Bakery,Bus Line,Park,Intersection,Fast Food Restaurant,Metro Station,Soccer Field,Convenience Store,Cosmetics Shop,Ethiopian Restaurant
8,"Scarborough, Cliffcrest, Cliffside",3.0,Movie Theater,Motel,American Restaurant,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Dog Run,Doner Restaurant,Donut Shop
9,"Scarborough, Birch Cliff",3.0,College Stadium,Café,Skating Rink,General Entertainment,Doner Restaurant,Diner,Discount Store,Dive Bar,Dog Run,Donut Shop
10,"Dorset Park, Scarborough Town Centre, Wexford ...",3.0,Indian Restaurant,Chinese Restaurant,Pet Store,Vietnamese Restaurant,Light Rail Station,Latin American Restaurant,Women's Store,Diner,Discount Store,Dive Bar
11,"Wexford, Maryvale",3.0,Auto Garage,Sandwich Place,Smoke Shop,Middle Eastern Restaurant,Breakfast Spot,Bakery,Discount Store,Dive Bar,Dog Run,Doner Restaurant


##### Cluster 4

In [42]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Malvern, Rouge",4.0,Fast Food Restaurant,Drugstore,Diner,Discount Store,Dive Bar,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,College Rec Center
25,Parkwoods,4.0,Park,Food & Drink Shop,Fast Food Restaurant,Diner,Discount Store,Dive Bar,Dog Run,Doner Restaurant,Donut Shop,Drugstore
