---
# Segmenting and Clustering Neighborhoods in Toronto
---

Data sources:

In [1]:
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"

Modules:

In [40]:
import pandas
import geocoder
import requests
import json
import folium
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors
import numpy as np

---




### Part 1: Going Postal

Let's get the table of postal codes:

In [3]:
df = pandas.read_html(url)[0]
df.head(12)

Unnamed: 0,Postal code,Borough,Neighborhood
0,M1A,Not assigned,
1,M2A,Not assigned,
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Regent Park / Harbourfront
5,M6A,North York,Lawrence Manor / Lawrence Heights
6,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government
7,M8A,Not assigned,
8,M9A,Etobicoke,Islington Avenue
9,M1B,Scarborough,Malvern / Rouge


All the rows where Borough is "Not assigned" should be removed:

In [4]:
df = df.drop(df[df["Borough"] == "Not assigned"].index).reset_index(drop=True)
df.head(12)

Unnamed: 0,Postal code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Regent Park / Harbourfront
3,M6A,North York,Lawrence Manor / Lawrence Heights
4,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government
5,M9A,Etobicoke,Islington Avenue
6,M1B,Scarborough,Malvern / Rouge
7,M3B,North York,Don Mills
8,M4B,East York,Parkview Hill / Woodbine Gardens
9,M5B,Downtown Toronto,"Garden District, Ryerson"


In [5]:
print("There are {} postal codes in this dataframe".format(df.shape[0])) 

There are 103 postal codes in this dataframe


---

### Part 2: The Coordinates

In [6]:
df['Latitude'] = df.index*0.0
df['Longitude'] = df.index*0.0
df.head()

Unnamed: 0,Postal code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,0.0,0.0
1,M4A,North York,Victoria Village,0.0,0.0
2,M5A,Downtown Toronto,Regent Park / Harbourfront,0.0,0.0
3,M6A,North York,Lawrence Manor / Lawrence Heights,0.0,0.0
4,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government,0.0,0.0


(0.0, 0.0) is not Toronto. It's somewhere in the Atlantic Ocean, off the coast of Ghana. All the coordinates need to be set properly.

In [7]:
g = geocoder.google('{}, Toronto, Ontario'.format("M5A"))
g

<[REQUEST_DENIED] Google - Geocode [empty]>

"Don't be evil"(c), right?
![hahaha classic](https://playmytime.com/u/user/ee521502a2808ddf60f8e220c830ea44.jpg)
Google, you know what? Screw you. And Geocoder too. I'm going to make a deal with the Dark Side: the Bing Maps REST API. Maybe they have some cookies.

In [8]:
geo_url = "http://dev.virtualearth.net/REST/v1/Locations?CountryRegion=CA&postalCode={}&key={}"

with open("../bin_maps_dev.key") as f:
    key = f.readline().strip()

response = requests.get(geo_url.format("M5A",key)).content
response

b'{"authenticationResultCode":"ValidCredentials","brandLogoUri":"http:\\/\\/dev.virtualearth.net\\/Branding\\/logo_powered_by.png","copyright":"Copyright \xc2\xa9 2020 Microsoft and its suppliers. All rights reserved. This API cannot be accessed and the content and any results may not be used, reproduced or transmitted in any manner without express written permission from Microsoft Corporation.","resourceSets":[{"estimatedTotal":1,"resources":[{"__type":"Location:http:\\/\\/schemas.microsoft.com\\/search\\/local\\/ws\\/rest\\/v1","bbox":[43.632949829101563,-79.376541137695312,43.667598724365234,-79.321456909179688],"name":"M5A, ON","point":{"type":"Point","coordinates":[43.655220031738281,-79.361968994140625]},"address":{"adminDistrict":"ON","adminDistrict2":"Toronto","countryRegion":"Canada","formattedAddress":"M5A, ON","locality":"Toronto","postalCode":"M5A"},"confidence":"Medium","entityType":"Postcode2","geocodePoints":[{"type":"Point","coordinates":[43.655220031738281,-79.36196899

Wow, what a wall of text! Let's parse out the coordinates:

In [9]:
json.loads(response)['resourceSets'][0]['resources'][0]['point']['coordinates']

[43.65522003173828, -79.36196899414062]
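
The chained lookup above raises an exception as soon as one level of the response is missing. A minimal sketch of a more forgiving parser follows. The helper name `extract_coordinates` is made up for illustration, and it assumes that an unmatched query comes back with an empty `resources` list rather than an HTTP error.

```python
import json


def extract_coordinates(payload):
    """Return (lat, lng) from a Locations-style JSON response, or None.

    Hypothetical helper: walks resourceSets -> resources -> point -> coordinates
    and gives up quietly when any level is missing or empty.
    """
    data = json.loads(payload)
    for resource_set in data.get("resourceSets", []):
        for resource in resource_set.get("resources", []):
            coords = resource.get("point", {}).get("coordinates")
            if coords and len(coords) == 2:
                return tuple(coords)
    return None


# Trimmed-down sample in the shape of the response shown above
sample = (b'{"resourceSets":[{"estimatedTotal":1,"resources":[{"point":'
          b'{"type":"Point","coordinates":[43.65522003173828,-79.36196899414062]}}]}]}')
# Assumed shape of a response with no match
empty = b'{"resourceSets":[{"estimatedTotal":0,"resources":[]}]}'

print(extract_coordinates(sample))
print(extract_coordinates(empty))
```

Returning `None` instead of raising lets the caller decide what to do with postal codes that could not be located.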

Heh. Here we go:

In [11]:
for i in range(0, df.shape[0]):
    response = requests.get(geo_url.format(df['Postal code'][i], key)).content
    lat, long = json.loads(response)['resourceSets'][0]['resources'][0]['point']['coordinates']
    df.loc[i, 'Latitude'], df.loc[i, 'Longitude'] = lat, long
df.head(12)

Unnamed: 0,Postal code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.751255,-79.329895
1,M4A,North York,Victoria Village,43.729958,-79.314201
2,M5A,Downtown Toronto,Regent Park / Harbourfront,43.65522,-79.361969
3,M6A,North York,Lawrence Manor / Lawrence Heights,43.722801,-79.450691
4,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government,43.664486,-79.393021
5,M9A,Etobicoke,Islington Avenue,43.662743,-79.528427
6,M1B,Scarborough,Malvern / Rouge,43.810154,-79.194603
7,M3B,North York,Don Mills,43.749134,-79.362007
8,M4B,East York,Parkview Hill / Woodbine Gardens,43.707577,-79.310913
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657467,-79.377708
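
The loop above stops at the first postal code whose response cannot be parsed. Here is a sketch of the same loop with that case handled. `geocode_all` and `fake_fetch` are hypothetical names: `fetch` stands in for the real HTTP lookup so the example runs without a key or a network connection.

```python
def geocode_all(postal_codes, fetch):
    """Map each postal code to (lat, lng), or None when the lookup fails.

    `fetch` is a hypothetical callable: postal code -> (lat, lng).
    """
    results = {}
    for code in postal_codes:
        if code in results:       # skip duplicates, saving API calls
            continue
        try:
            results[code] = fetch(code)
        except (KeyError, IndexError, ValueError):
            results[code] = None  # malformed or empty response
    return results


# Stand-in for the real lookup, using values from the table above
FAKE = {"M3A": (43.751255, -79.329895), "M4A": (43.729958, -79.314201)}


def fake_fetch(code):
    return FAKE[code]  # raises KeyError for unknown codes


coords = geocode_all(["M3A", "M4A", "M3A", "M0Z"], fake_fetch)
missing = [code for code, value in coords.items() if value is None]
print(coords)
print("missing:", missing)
```

Collecting the failures in `missing` makes it easy to retry or inspect them afterwards instead of silently keeping (0.0, 0.0).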


Let's accept the Power of the Dark Side!

---

### Part 3: The Map

In [12]:
df.head()

Unnamed: 0,Postal code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.751255,-79.329895
1,M4A,North York,Victoria Village,43.729958,-79.314201
2,M5A,Downtown Toronto,Regent Park / Harbourfront,43.65522,-79.361969
3,M6A,North York,Lawrence Manor / Lawrence Heights,43.722801,-79.450691
4,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government,43.664486,-79.393021


In [14]:
map_toronto = folium.Map(location=[df['Latitude'].mean(), df['Longitude'].mean()], zoom_start=11)

for lat, lng, borough, neighborhood in zip(df['Latitude'], df['Longitude'], df['Borough'], df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  

map_toronto

What was that map for? On its own it tells us nothing.
But the lab has to be done. Toronto is already taken, so let's concentrate on the neighborhoods of Scarborough.

What the hell is Scarborough? Where is it, anyway?

In [15]:
map_toronto = folium.Map(location=[df['Latitude'].mean(), df['Longitude'].mean()], zoom_start=11)

for lat, lng, borough, neighborhood in zip(df['Latitude'], df['Longitude'], df['Borough'], df['Neighborhood']):
    color = 'blue'
    if borough == 'Scarborough':
        color = 'red'
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        popup=label,
        color=color,
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  

map_toronto

It's the eastern part of Toronto. OK.

In [16]:
df_s = df[df['Borough'] == 'Scarborough'].copy().reset_index(drop=True)
df_s

Unnamed: 0,Postal code,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,Malvern / Rouge,43.810154,-79.194603
1,M1C,Scarborough,Rouge Hill / Port Union / Highland Creek,43.784672,-79.158958
2,M1E,Scarborough,Guildwood / Morningside / West Hill,43.766289,-79.17289
3,M1G,Scarborough,Woburn,43.768288,-79.214111
4,M1H,Scarborough,Cedarbrae,43.76918,-79.23877
5,M1J,Scarborough,Scarborough Village,43.743938,-79.231354
6,M1K,Scarborough,Kennedy Park / Ionview / East Birchmount Park,43.725803,-79.262848
7,M1L,Scarborough,Golden Mile / Clairlea / Oakridge,43.716656,-79.286537
8,M1M,Scarborough,Cliffside / Cliffcrest / Scarborough Village West,43.723576,-79.234451
9,M1N,Scarborough,Birch Cliff / Cliffside West,43.697884,-79.258759


Most of the rows contain more than one neighborhood. Let's split them:

In [17]:
df_s = df[df['Borough'] == 'Scarborough'].copy().reset_index(drop=True)

# rows that list several neighborhoods separated by "/"
multi = df_s['Neighborhood'].str.contains("/")

# one row per neighborhood: split on "/", explode, then trim whitespace
df_T = df_s[multi].copy()
df_T['Neighborhood'] = df_T['Neighborhood'].str.split("/")
df_T = df_T.explode('Neighborhood')
df_T['Neighborhood'] = df_T['Neighborhood'].str.strip()

# single-neighborhood rows first, then the split rows
# (DataFrame.append was removed in pandas 2.0, so use pandas.concat)
df_s = pandas.concat([df_s[~multi], df_T], ignore_index=True)
df_s

Unnamed: 0,Postal code,Borough,Neighborhood,Latitude,Longitude
0,M1G,Scarborough,Woburn,43.768288,-79.214111
1,M1H,Scarborough,Cedarbrae,43.76918,-79.23877
2,M1J,Scarborough,Scarborough Village,43.743938,-79.231354
3,M1S,Scarborough,Agincourt,43.797707,-79.267708
4,M1X,Scarborough,Upper Rouge,43.834435,-79.21891
5,M1B,Scarborough,Malvern,43.810154,-79.194603
6,M1B,Scarborough,Rouge,43.810154,-79.194603
7,M1C,Scarborough,Rouge Hill,43.784672,-79.158958
8,M1C,Scarborough,Port Union,43.784672,-79.158958
9,M1C,Scarborough,Highland Creek,43.784672,-79.158958
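
The splitting step is easier to follow on plain Python data. A minimal sketch: `split_neighborhoods` is a made-up helper, and unlike the cell above it keeps the rows in their original order instead of moving single-neighborhood rows to the top.

```python
def split_neighborhoods(rows, sep="/"):
    """Return one copy of each row per neighborhood named in it."""
    expanded = []
    for row in rows:
        for name in row["Neighborhood"].split(sep):
            new_row = dict(row)                  # copy, so the input stays untouched
            new_row["Neighborhood"] = name.strip()
            expanded.append(new_row)
    return expanded


rows = [
    {"Postal code": "M1B", "Neighborhood": "Malvern / Rouge"},
    {"Postal code": "M1G", "Neighborhood": "Woburn"},
]

expanded = split_neighborhoods(rows)
for row in expanded:
    print(row)
```

Every copy keeps the postal code (and, in the notebook, the postal-code coordinates) of the row it came from, which is why the coordinates have to be looked up again per neighborhood.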


We have 38 neighborhoods here. Let's update their coordinates:

In [18]:
for i in range(0, df_s.shape[0]):
    link = '{}, {}, Toronto, Ontario'.format(df_s['Neighborhood'][i], df_s['Borough'][i])
    coord = geocoder.bing(link, key=key)
    df_s.loc[i, 'Latitude'], df_s.loc[i, 'Longitude'] = coord[0].lat, coord[0].lng
df_s.head()

Unnamed: 0,Postal code,Borough,Neighborhood,Latitude,Longitude
0,M1G,Scarborough,Woburn,43.76496,-79.21669
1,M1H,Scarborough,Cedarbrae,43.74807,-79.23531
2,M1J,Scarborough,Scarborough Village,43.74808,-79.21022
3,M1S,Scarborough,Agincourt,43.78824,-79.28419
4,M1X,Scarborough,Upper Rouge,43.80932,-79.18762


In [19]:
map_scarborough = folium.Map(location=[df_s['Latitude'].mean(), df_s['Longitude'].mean()], zoom_start=12)

for lat, lng, borough, neighborhood in zip(df_s['Latitude'], df_s['Longitude'], df_s['Borough'], df_s['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_scarborough)  

map_scarborough

Now we have a map with the coordinates of the Scarborough neighborhoods as given by Bing. Time to explore some cafés.

---

#### Explore neighborhoods in Scarborough, as given in the lab

In [20]:
CLIENT_ID = "" 
CLIENT_SECRET = "" 
with open("../foursquare.key", "r") as f:
    CLIENT_ID = f.readline().strip() 
    CLIENT_SECRET = f.readline().strip() 
VERSION = '20180605' # Foursquare API version

In [23]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, limit = 100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pandas.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [26]:
scarborough_venues = getNearbyVenues(names=df_s['Neighborhood'],
                                   latitudes=df_s['Latitude'],
                                   longitudes=df_s['Longitude']
                                  )

Woburn
Cedarbrae
Scarborough Village
Agincourt
Upper Rouge
Malvern
Rouge
Rouge Hill
Port Union
Highland Creek
Guildwood
Morningside
West Hill
Kennedy Park
Ionview
East Birchmount Park
Golden Mile
Clairlea
Oakridge
Cliffside
Cliffcrest
Scarborough Village West
Birch Cliff
Cliffside West
Dorset Park
Wexford Heights
Scarborough Town Centre
Wexford
Maryvale
Clarks Corners
Tam O'Shanter
Sullivan
Milliken
Agincourt North
Steeles East
L'Amoreaux East
Steeles West
L'Amoreaux West


In [27]:
print(scarborough_venues.shape)
scarborough_venues.head()

(354, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Woburn,43.76496,-79.21669,Lucky Hakka,43.76247,-79.214164,Chinese Restaurant
1,Woburn,43.76496,-79.21669,Densgrove Park,43.765397,-79.22013,Park
2,Woburn,43.76496,-79.21669,Aunty Mary's,43.762566,-79.215571,Fast Food Restaurant
3,Woburn,43.76496,-79.21669,Giant Tiger,43.762002,-79.215523,Department Store
4,Woburn,43.76496,-79.21669,Skyland Food Mart,43.760991,-79.218092,Supermarket
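
`getNearbyVenues` asks for venues within `radius=500`, that is, 500 metres. A quick sanity check is to compute the distance between a neighborhood point and one of its venues. The sketch below uses the haversine formula with a spherical Earth; that the API measures its radius as great-circle distance is my assumption, not something stated in the lab.

```python
import math

EARTH_RADIUS_M = 6371000  # mean Earth radius in metres (spherical approximation)


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# First row of the table above: Woburn and the venue "Lucky Hakka"
woburn = (43.76496, -79.21669)
lucky_hakka = (43.76247, -79.214164)

distance = haversine_m(*woburn, *lucky_hakka)
print(round(distance), distance <= 500)
```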


In [28]:
scarborough_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Agincourt,1,1,1,1,1,1
Agincourt North,1,1,1,1,1,1
Birch Cliff,4,4,4,4,4,4
Cedarbrae,3,3,3,3,3,3
Clairlea,4,4,4,4,4,4
Clarks Corners,4,4,4,4,4,4
Cliffcrest,6,6,6,6,6,6
Cliffside,7,7,7,7,7,7
Cliffside West,7,7,7,7,7,7
Dorset Park,10,10,10,10,10,10


In [29]:
print('There are {} unique categories.'.format(len(scarborough_venues['Venue Category'].unique())))

There are 79 unique categories.


In [31]:
# one hot encoding
scarborough_onehot = pandas.get_dummies(scarborough_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
scarborough_onehot['Neighborhood'] = scarborough_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [scarborough_onehot.columns[-1]] + list(scarborough_onehot.columns[:-1])
scarborough_onehot = scarborough_onehot[fixed_columns]

scarborough_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,American Restaurant,Asian Restaurant,Athletics & Sports,Automotive Shop,BBQ Joint,Bakery,Bank,Bar,...,Sushi Restaurant,Tea Room,Tennis Court,Toy / Game Store,Trail,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wings Joint,Women's Store
0,Woburn,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Woburn,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Woburn,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Woburn,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Woburn,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [32]:
scarborough_onehot.shape

(354, 80)

In [33]:
scarborough_grouped = scarborough_onehot.groupby('Neighborhood').mean().reset_index()
scarborough_grouped

Unnamed: 0,Neighborhood,Accessories Store,American Restaurant,Asian Restaurant,Athletics & Sports,Automotive Shop,BBQ Joint,Bakery,Bank,Bar,...,Sushi Restaurant,Tea Room,Tennis Court,Toy / Game Store,Trail,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wings Joint,Women's Store
0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Agincourt North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Birch Cliff,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Cedarbrae,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Clairlea,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0
5,Clarks Corners,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0
6,Cliffcrest,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Cliffside,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Cliffside West,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Dorset Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0
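
Taking the mean of one-hot columns per neighborhood is the same as computing, for each neighborhood, the share of its venues that fall into each category. A pure-Python sketch of that idea follows; the Woburn rows are toy data, not the real Foursquare results, and `category_frequencies` is a made-up name.

```python
from collections import Counter, defaultdict


def category_frequencies(pairs):
    """Turn (neighborhood, category) pairs into per-neighborhood category shares."""
    counts = defaultdict(Counter)
    for hood, category in pairs:
        counts[hood][category] += 1
    return {
        hood: {cat: n / sum(counter.values()) for cat, n in counter.items()}
        for hood, counter in counts.items()
    }


# Toy rows for illustration only
venues = [
    ("Woburn", "Chinese Restaurant"),
    ("Woburn", "Park"),
    ("Woburn", "Fast Food Restaurant"),
    ("Woburn", "Park"),
    ("Agincourt", "Coffee Shop"),
]

freqs = category_frequencies(venues)
print(freqs["Woburn"])
print(freqs["Agincourt"])
```

A neighborhood with a single venue ends up with a frequency of 1.0 for that one category, which is exactly what happens to Agincourt in the notebook.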


In [34]:
scarborough_grouped.shape

(37, 80)

Only 37 of the 38 neighborhoods made it into the grouped table: one of them returned no venues within the 500 m radius, so it has no rows to average.

In [35]:
num_top_venues = 5

for hood in scarborough_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = scarborough_grouped[scarborough_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Agincourt----
         venue  freq
0  Coffee Shop   1.0
1          Gym   0.0
2     Pharmacy   0.0
3    Pet Store   0.0
4         Park   0.0


----Agincourt North----
         venue  freq
0  Coffee Shop   1.0
1          Gym   0.0
2     Pharmacy   0.0
3    Pet Store   0.0
4         Park   0.0


----Birch Cliff----
         venue  freq
0   Playground  0.25
1  Music Store  0.25
2         Café  0.25
3     Pharmacy  0.25
4        Hotel  0.00


----Cedarbrae----
                   venue  freq
0          Grocery Store  0.33
1                   Park  0.33
2             Playground  0.33
3  Vietnamese Restaurant  0.00
4       Video Game Store  0.00


----Clairlea----
                           venue  freq
0              Convenience Store  0.25
1  Vegetarian / Vegan Restaurant  0.25
2                 Ice Cream Shop  0.25
3                   Soccer Field  0.25
4              Accessories Store  0.00


----Clarks Corners----
                venue  freq
0         Pizza Place  0.25
1         Wings 

In [36]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
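
One caveat with this function: it always returns `num_top_venues` names, even when most of them have a frequency of 0. For a neighborhood with a single venue, nine of the ten "most common" categories are just whatever the sort left on top. A sketch of a stricter variant, working on a plain dict rather than a DataFrame row (`top_categories` is a hypothetical helper):

```python
def top_categories(freqs, n):
    """Return up to n category names with non-zero frequency.

    Sorted by descending frequency; ties are broken alphabetically
    so that the result is deterministic.
    """
    nonzero = [(cat, f) for cat, f in freqs.items() if f > 0]
    nonzero.sort(key=lambda item: (-item[1], item[0]))
    return [cat for cat, _ in nonzero[:n]]


# Frequencies in the shape of the per-neighborhood printout above
agincourt = {"Coffee Shop": 1.0, "Women's Store": 0.0, "Discount Store": 0.0}
birch_cliff = {"Playground": 0.25, "Music Store": 0.25, "Café": 0.25,
               "Pharmacy": 0.25, "Hotel": 0.0}

print(top_categories(agincourt, 10))
print(top_categories(birch_cliff, 3))
```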

In [42]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # no 'st'/'nd'/'rd' entry left, fall back to 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pandas.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = scarborough_grouped['Neighborhood']

for ind in np.arange(scarborough_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(scarborough_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Coffee Shop,Women's Store,Discount Store,Construction & Landscaping,Convenience Store,Cosmetics Shop,Deli / Bodega,Department Store,Dim Sum Restaurant,Electronics Store
1,Agincourt North,Coffee Shop,Women's Store,Discount Store,Construction & Landscaping,Convenience Store,Cosmetics Shop,Deli / Bodega,Department Store,Dim Sum Restaurant,Electronics Store
2,Birch Cliff,Pharmacy,Playground,Café,Music Store,Flower Shop,Cosmetics Shop,Food Court,Food & Drink Shop,Clothing Store,Coffee Shop
3,Cedarbrae,Grocery Store,Playground,Park,Food & Drink Shop,Deli / Bodega,Food Court,Clothing Store,Coffee Shop,Construction & Landscaping,Convenience Store
4,Clairlea,Soccer Field,Vegetarian / Vegan Restaurant,Convenience Store,Ice Cream Shop,Women's Store,Coffee Shop,Construction & Landscaping,Cosmetics Shop,Deli / Bodega,Department Store
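
The `indicators` list with a fallback works for the ten columns used here, but it would label the 21st column "21th". If more columns are ever needed, a small ordinal helper is a possible alternative. This is a sketch, and `ordinal` is a made-up name.

```python
def ordinal(n):
    """Return n with its English ordinal suffix: 1st, 2nd, 3rd, 4th, 11th, 21st..."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and friends
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


num_top_venues = 10
columns = ["Neighborhood"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)
]
print(columns)
```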


In [43]:
# set number of clusters
kclusters = 5

scarborough_grouped_clustering = scarborough_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(scarborough_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([3, 3, 0, 0, 1, 1, 1, 1, 1, 1])

In [44]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

scarborough_merged = df_s

# join the cluster labels and top venues onto df_s, which holds latitude/longitude for each neighborhood
scarborough_merged = scarborough_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
scarborough_merged = scarborough_merged.dropna(subset=["Cluster Labels"])
scarborough_merged['Cluster Labels'] = scarborough_merged['Cluster Labels'].astype(int)
scarborough_merged.head() # check the last columns!

Unnamed: 0,Postal code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1G,Scarborough,Woburn,43.76496,-79.21669,1,Supermarket,Fast Food Restaurant,Sandwich Place,Department Store,Park,Chinese Restaurant,Field,Fish & Chips Shop,Electronics Store,Discount Store
1,M1H,Scarborough,Cedarbrae,43.74807,-79.23531,0,Grocery Store,Playground,Park,Food & Drink Shop,Deli / Bodega,Food Court,Clothing Store,Coffee Shop,Construction & Landscaping,Convenience Store
2,M1J,Scarborough,Scarborough Village,43.74808,-79.21022,1,Pub,Bus Line,Chinese Restaurant,Women's Store,Dim Sum Restaurant,Coffee Shop,Construction & Landscaping,Convenience Store,Cosmetics Shop,Deli / Bodega
3,M1S,Scarborough,Agincourt,43.78824,-79.28419,3,Coffee Shop,Women's Store,Discount Store,Construction & Landscaping,Convenience Store,Cosmetics Shop,Deli / Bodega,Department Store,Dim Sum Restaurant,Electronics Store
4,M1X,Scarborough,Upper Rouge,43.80932,-79.18762,2,Home Service,Trail,Women's Store,Discount Store,Coffee Shop,Construction & Landscaping,Convenience Store,Cosmetics Shop,Deli / Bodega,Department Store


In [45]:
# create map
map_clusters = folium.Map(location=[df_s['Latitude'].mean(), df_s['Longitude'].mean()], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(scarborough_merged['Latitude'], scarborough_merged['Longitude'], scarborough_merged['Neighborhood'], scarborough_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [46]:
scarborough_merged.loc[scarborough_merged['Cluster Labels'] == 0, scarborough_merged.columns[[2] + list(range(5, scarborough_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Cedarbrae,0,Grocery Store,Playground,Park,Food & Drink Shop,Deli / Bodega,Food Court,Clothing Store,Coffee Shop,Construction & Landscaping,Convenience Store
7,Rouge Hill,0,Playground,Construction & Landscaping,Women's Store,Chinese Restaurant,Coffee Shop,Convenience Store,Cosmetics Shop,Deli / Bodega,Department Store,Dim Sum Restaurant
15,East Birchmount Park,0,Café,Hookah Bar,Bus Station,Intersection,Women's Store,Dim Sum Restaurant,Construction & Landscaping,Convenience Store,Cosmetics Shop,Deli / Bodega
22,Birch Cliff,0,Pharmacy,Playground,Café,Music Store,Flower Shop,Cosmetics Shop,Food Court,Food & Drink Shop,Clothing Store,Coffee Shop
25,Wexford Heights,0,Intersection,Coffee Shop,Playground,Gym / Fitness Center,Indian Restaurant,Dim Sum Restaurant,Construction & Landscaping,Convenience Store,Cosmetics Shop,Deli / Bodega
27,Wexford,0,Intersection,Coffee Shop,Playground,Gym / Fitness Center,Indian Restaurant,Dim Sum Restaurant,Construction & Landscaping,Convenience Store,Cosmetics Shop,Deli / Bodega


In [47]:
scarborough_merged.loc[scarborough_merged['Cluster Labels'] == 1, scarborough_merged.columns[[2] + list(range(5, scarborough_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Woburn,1,Supermarket,Fast Food Restaurant,Sandwich Place,Department Store,Park,Chinese Restaurant,Field,Fish & Chips Shop,Electronics Store,Discount Store
2,Scarborough Village,1,Pub,Bus Line,Chinese Restaurant,Women's Store,Dim Sum Restaurant,Coffee Shop,Construction & Landscaping,Convenience Store,Cosmetics Shop,Deli / Bodega
5,Malvern,1,Coffee Shop,Grocery Store,Bistro,Gas Station,Bus Station,Restaurant,Pharmacy,Furniture / Home Store,American Restaurant,Food Court
8,Port Union,1,Pharmacy,Bank,Cosmetics Shop,Park,Supermarket,Food & Drink Shop,Fish & Chips Shop,Field,Fast Food Restaurant,Flower Shop
11,Morningside,1,Bus Stop,Park,Home Service,Dim Sum Restaurant,Coffee Shop,Construction & Landscaping,Convenience Store,Cosmetics Shop,Deli / Bodega,Department Store
12,West Hill,1,Pizza Place,Breakfast Spot,Supermarket,Discount Store,Liquor Store,Indian Restaurant,Food & Drink Shop,Bus Station,Pharmacy,Burger Joint
13,Kennedy Park,1,Indian Restaurant,Electronics Store,Light Rail Station,Restaurant,Pet Store,Chinese Restaurant,Vietnamese Restaurant,Cosmetics Shop,Clothing Store,Coffee Shop
14,Ionview,1,Coffee Shop,Pharmacy,Bank,Pizza Place,Sandwich Place,Intersection,Chinese Restaurant,Fast Food Restaurant,Electronics Store,Field
16,Golden Mile,1,Coffee Shop,Pizza Place,Bakery,Bank,Bar,Park,Discount Store,Women's Store,Department Store,Construction & Landscaping
17,Clairlea,1,Soccer Field,Vegetarian / Vegan Restaurant,Convenience Store,Ice Cream Shop,Women's Store,Coffee Shop,Construction & Landscaping,Cosmetics Shop,Deli / Bodega,Department Store


In [48]:
scarborough_merged.loc[scarborough_merged['Cluster Labels'] == 2, scarborough_merged.columns[[2] + list(range(5, scarborough_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Upper Rouge,2,Home Service,Trail,Women's Store,Discount Store,Coffee Shop,Construction & Landscaping,Convenience Store,Cosmetics Shop,Deli / Bodega,Department Store
6,Rouge,2,Home Service,Trail,Women's Store,Discount Store,Coffee Shop,Construction & Landscaping,Convenience Store,Cosmetics Shop,Deli / Bodega,Department Store


In [49]:
scarborough_merged.loc[scarborough_merged['Cluster Labels'] == 3, scarborough_merged.columns[[2] + list(range(5, scarborough_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Agincourt,3,Coffee Shop,Women's Store,Discount Store,Construction & Landscaping,Convenience Store,Cosmetics Shop,Deli / Bodega,Department Store,Dim Sum Restaurant,Electronics Store
33,Agincourt North,3,Coffee Shop,Women's Store,Discount Store,Construction & Landscaping,Convenience Store,Cosmetics Shop,Deli / Bodega,Department Store,Dim Sum Restaurant,Electronics Store


In [50]:
scarborough_merged.loc[scarborough_merged['Cluster Labels'] == 4, scarborough_merged.columns[[2] + list(range(5, scarborough_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,Highland Creek,4,Park,Women's Store,Greek Restaurant,Coffee Shop,Construction & Landscaping,Convenience Store,Cosmetics Shop,Deli / Bodega,Department Store,Dim Sum Restaurant


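Phew. I understood next to nothing, really.
It looks like clusters 2-4 are not real clusters at all: they hold only two, two, and one neighborhoods. Many neighborhoods returned just a handful of venues (Agincourt has a single coffee shop), so k-means struggled badly to split this data into 5 clusters.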
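
A quick way to spot this kind of problem is to count how many neighborhoods land in each cluster. A minimal sketch follows. The label list is illustrative only (its first ten values match the output shown earlier, the rest are made up), and `min_size=3` is an arbitrary threshold.

```python
from collections import Counter


def cluster_sizes(labels):
    """Return {cluster label: number of members}, ordered by label."""
    return dict(sorted(Counter(labels).items()))


def tiny_clusters(labels, min_size=3):
    """Return the labels of clusters with fewer than min_size members."""
    return [label for label, size in cluster_sizes(labels).items() if size < min_size]


# Illustrative labels, not the notebook's full kmeans.labels_
labels = [3, 3, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 4, 0, 1]

print(cluster_sizes(labels))
print(tiny_clusters(labels))
```

With the real model, `kmeans.labels_.tolist()` could be passed in place of `labels`.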