## Part 1: Creating a data frame

Import libraries and install modules

In [1]:
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt

from geopy.geocoders import Nominatim 

import requests 
import json, lxml
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated since pandas 1.0

import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

from bs4 import BeautifulSoup

import warnings
warnings.filterwarnings('ignore')

# Install folium on the fly if it is missing (IPython shell escape)
try:
    import folium
except ImportError:
    !pip install folium
    import folium
    

Collecting folium
  Downloading folium-0.11.0-py2.py3-none-any.whl (93 kB)
Collecting branca>=0.3.0
  Downloading branca-0.4.1-py3-none-any.whl (24 kB)
Installing collected packages: branca, folium
Successfully installed branca-0.4.1 folium-0.11.0


Scrape data and transform it into a data frame

In [2]:
lt = pd.read_html("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")

In [3]:
df = lt[0]

Remove rows whose borough is not assigned

In [4]:
df = df[df.Borough != 'Not assigned']
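The cell above handles only the borough filter. If a row had a known borough but a neighbourhood reading 'Not assigned', one option is to reuse the borough name. A minimal sketch on a made-up three-row frame (the postal codes P1 to P3 are placeholders, not real data):

```python
import pandas as pd

# Toy frame with the same columns as the scraped table; the values are made up
toy = pd.DataFrame({
    'Postal Code': ['P1', 'P2', 'P3'],
    'Borough': ['Not assigned', 'North York', 'Etobicoke'],
    'Neighbourhood': ['Not assigned', 'Parkwoods', 'Not assigned'],
})

# Step 1: drop rows whose borough is not assigned (same filter as above)
toy = toy[toy['Borough'] != 'Not assigned'].copy()

# Step 2: where only the neighbourhood is missing, reuse the borough name
missing = toy['Neighbourhood'] == 'Not assigned'
toy.loc[missing, 'Neighbourhood'] = toy.loc[missing, 'Borough']

print(toy)
```

The `.copy()` makes the filtered frame independent, so the later `.loc` assignment does not trigger pandas' chained-assignment warning.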

In [5]:
df.head()

|   | Postal Code | Borough | Neighbourhood |
|---|---|---|---|
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 5 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 6 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |


In [6]:
df.shape

(103, 3)

## Part 2: Get position data

In [17]:
!pip install geocoder
import geocoder




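In [19]:
lats, lons = [], []

for postal_code in df['Postal Code'].values:

    lat_lng_coords = None

    # Query Google until it returns coordinates for this postal code.
    # The appends run before lat_lng_coords is checked for None, so a
    # failed lookup raises the TypeError below instead of retrying.
    while lat_lng_coords is None:
        g = geocoder.google('{}, Toronto, Ontario'.format(postal_code))
        lat_lng_coords = g.latlng
        lats.append(lat_lng_coords[0])
        lons.append(lat_lng_coords[1])

TypeError: 'NoneType' object is not subscriptable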
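The loop fails because it subscripts the result before checking it, and without a retry limit it could also spin forever on a postal code that never resolves. A hedged sketch of a bounded-retry version: `geocode_with_retries` is a hypothetical helper, and the lookup function is passed in so the logic can be exercised without any network access. Because a failed code yields `None` instead of being skipped, the two lists always stay the same length as the input.

```python
import time
from typing import Callable, Iterable, List, Optional, Sequence, Tuple

Coords = Optional[Sequence[float]]

def geocode_with_retries(postal_codes: Iterable[str],
                         lookup: Callable[[str], Coords],
                         max_tries: int = 3,
                         pause: float = 0.0) -> Tuple[List[Optional[float]], List[Optional[float]]]:
    """Return parallel latitude/longitude lists, with None where every attempt failed."""
    lats: List[Optional[float]] = []
    lons: List[Optional[float]] = []
    for code in postal_codes:
        coords: Coords = None
        for _ in range(max_tries):
            coords = lookup(code)
            if coords is not None:  # check before subscripting
                break
            time.sleep(pause)       # optional pause before the next attempt
        if coords is None:
            lats.append(None)
            lons.append(None)
        else:
            lats.append(coords[0])
            lons.append(coords[1])
    return lats, lons
```

With the real service, the lookup could be `lambda pc: geocoder.google('{}, Toronto, Ontario'.format(pc)).latlng`. Google's geocoder generally needs an API key, which `geocoder.google` accepts through its `key` argument.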

In [20]:
!wget http://cocl.us/Geospatial_data
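# Use the geocoded coordinates if every postal code was resolved; otherwise
# (the lists do not match the data frame's length) fall back to the downloaded file
try:
    df['Latitude'] = lats
    df['Longitude'] = lons
except ValueError:
    latlon = pd.read_csv('Geospatial_data')
    df = pd.merge(df, latlon, how='inner', on='Postal Code')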
    
print(df.shape)
df.head(10)

--2020-12-13 19:42:33--  http://cocl.us/Geospatial_data
Resolving cocl.us (cocl.us)... 169.63.96.194, 169.63.96.176
Connecting to cocl.us (cocl.us)|169.63.96.194|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://cocl.us/Geospatial_data [following]
--2020-12-13 19:42:34--  https://cocl.us/Geospatial_data
Connecting to cocl.us (cocl.us)|169.63.96.194|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://ibm.box.com/shared/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv [following]
--2020-12-13 19:42:35--  https://ibm.box.com/shared/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv
Resolving ibm.box.com (ibm.box.com)... 185.235.236.197
Connecting to ibm.box.com (ibm.box.com)|185.235.236.197|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: /public/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv [following]
--2020-12-13 19:42:35--  https://ibm.box.com/public/static/9afzr8

|   | Postal Code | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 |
| 5 | M9A | Etobicoke | Islington Avenue, Humber Valley Village | 43.667856 | -79.532242 |
| 6 | M1B | Scarborough | Malvern, Rouge | 43.806686 | -79.194353 |
| 7 | M3B | North York | Don Mills | 43.745906 | -79.352188 |
| 8 | M4B | East York | Parkview Hill, Woodbine Gardens | 43.706397 | -79.309937 |
| 9 | M5B | Downtown Toronto | Garden District, Ryerson | 43.657162 | -79.378937 |
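The fallback uses an inner merge, which silently drops any postal code missing from `Geospatial_data`; the printed shape is the only hint. A sketch of a more defensive variant on two small stand-in frames (the three coordinate rows are copied from the output above): a left merge with `validate=` to reject duplicate keys and `indicator=` to list unmatched codes.

```python
import pandas as pd

# Toy stand-ins: two scraped rows and a coordinate file with one extra code
postal = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A'],
    'Borough': ['North York', 'North York'],
    'Neighbourhood': ['Parkwoods', 'Victoria Village'],
})
coords = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A', 'M5A'],
    'Latitude': [43.753259, 43.725882, 43.65426],
    'Longitude': [-79.329656, -79.315572, -79.360636],
})

# validate= raises MergeError if a postal code appears twice on either side;
# indicator= adds a '_merge' column recording where each row was found
merged = pd.merge(postal, coords, how='left', on='Postal Code',
                  validate='one_to_one', indicator=True)

unmatched = merged.loc[merged['_merge'] == 'left_only', 'Postal Code'].tolist()
print('Postal codes without coordinates:', unmatched)

merged = merged.drop(columns='_merge')
print(merged)
```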


## Part 3: Cluster neighbourhoods

Let's make a map of Toronto

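In [28]:
# Centre the map on Toronto
map_toronto = folium.Map(location=[43.651070, -79.347015], zoom_start=11)

# One marker per postal code; the popup shows the postal code.
# CircleMarker's radius is in pixels (folium.Circle's is in metres, so radius=5 would be invisible),
# and each marker must be added to the map explicitly.
for lat, lng, borough, neighbourhood, postalCode in zip(df['Latitude'], df['Longitude'], df['Borough'], df['Neighbourhood'], df['Postal Code']):
    label = folium.Popup(str(postalCode), parse_html=True)

    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.3).add_to(map_toronto)

map_toronto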

Foursquare API credentials

In [None]:
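CLIENT_ID = 'id'          # replace with your Foursquare client ID
CLIENT_SECRET = 'secret'  # replace with your Foursquare client secret
VERSION = '20201213'      # API version date (YYYYMMDD)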
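Hard-coding credentials in a notebook makes them easy to leak when it is shared. One alternative, sketched here, reads them from environment variables; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are hypothetical, so use whatever names your setup defines.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) from the given mapping of environment variables.

    The variable names are assumptions for this sketch.
    """
    try:
        return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        raise RuntimeError('Set the environment variable {}'.format(missing)) from None
```

In the notebook this would be used as `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()`; passing a plain dictionary instead of `os.environ` makes the helper easy to test.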

Let's get the venues around the first neighbourhood in the data frame

In [29]:
lat = df.loc[0, 'Latitude']
lon = df.loc[0, 'Longitude']

neighborhood_name = df.loc[0, 'Neighbourhood']
print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, lat, lon))

LIMIT = 100    # maximum number of venues to request
radius = 1000  # search radius in metres
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, CLIENT_SECRET, VERSION, lat, lon, radius, LIMIT)

results = requests.get(url).json()
results

Latitude and longitude values of Parkwoods are 43.7532586, -79.3296565.


{'meta': {'code': 200, 'requestId': '5fd672f01945276ef051de4f'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Parkwoods - Donalda',
  'headerFullLocation': 'Parkwoods - Donalda, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 28,
  'suggestedBounds': {'ne': {'lat': 43.762258609000014,
    'lng': -79.31721997969855},
   'sw': {'lat': 43.74425859099999, 'lng': -79.34209302030145}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4b8991cbf964a520814232e3',
       'name': "Allwyn's Bakery",
       'location': {'address': '81 Underhill drive',
        'lat': 43.75984035203157,
        'lng': -79.32471879917513,
        'labeledLatLngs': [{'label': 'display'
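The request URL above is assembled by string formatting. A small alternative sketch builds the same query with the standard library's `urlencode`, which escapes each value (the comma in `ll` becomes `%2C`). The helper name `explore_url` is hypothetical; the parameter names are the ones used in the URL above.

```python
from urllib.parse import urlencode

EXPLORE_URL = 'https://api.foursquare.com/v2/venues/explore'

def explore_url(client_id, client_secret, version, lat, lon, radius=1000, limit=100):
    """Build the explore request URL; urlencode escapes every parameter value."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lon),
        'radius': radius,
        'limit': limit,
    }
    return EXPLORE_URL + '?' + urlencode(params)

url = explore_url('id', 'secret', '20201213', 43.7532586, -79.3296565)
print(url)
```

With `requests`, an equivalent is to pass such a dictionary as the `params` argument of `requests.get`, which performs the encoding itself.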

In [31]:
venues = results['response']['groups'][0]['items']
venues_df = json_normalize(venues)
venues_df.head()

|   | referralId | reasons.count | reasons.items | venue.id | venue.name | venue.location.address | venue.location.lat | venue.location.lng | venue.location.labeledLatLngs | venue.location.distance | ... | venue.location.neighborhood | venue.location.city | venue.location.state | venue.location.country | venue.location.formattedAddress | venue.categories | venue.photos.count | venue.photos.groups | venue.location.crossStreet | venue.venuePage.id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | e-0-4b8991cbf964a520814232e3-0 | 0 | [{'summary': 'This spot is popular', 'type': '... | 4b8991cbf964a520814232e3 | Allwyn's Bakery | 81 Underhill drive | 43.75984 | -79.324719 | [{'label': 'display', 'lat': 43.75984035203157... | 833 | ... | Parkwoods - Donalda | Toronto | ON | Canada | [81 Underhill drive, Toronto ON M3A 1Z5, Canada] | [{'id': '4bf58dd8d48988d144941735', 'name': 'C... | 0 | [] |  |  |
| 1 | e-0-57e286f2498e43d84d92d34a-1 | 0 | [{'summary': 'This spot is popular', 'type': '... | 57e286f2498e43d84d92d34a | Tim Hortons | 215 Brookbanks | 43.760668 | -79.326368 | [{'label': 'display', 'lat': 43.76066827030228... | 866 | ... |  | Toronto | ON | Canada | [215 Brookbanks (York Miils Rd), Toronto ON M3... | [{'id': '4bf58dd8d48988d16d941735', 'name': 'C... | 0 | [] | York Miils Rd |  |
| 2 | e-0-4e8d9dcdd5fbbbb6b3003c7b-2 | 0 | [{'summary': 'This spot is popular', 'type': '... | 4e8d9dcdd5fbbbb6b3003c7b | Brookbanks Park | Toronto | 43.751976 | -79.33214 | [{'label': 'display', 'lat': 43.75197604605557... | 245 | ... |  | Toronto | ON | Canada | [Toronto, Toronto ON, Canada] | [{'id': '4bf58dd8d48988d163941735', 'name': 'P... | 0 | [] |  | 600917367.0 |
| 3 | e-0-4bafa285f964a5203a123ce3-3 | 0 | [{'summary': 'This spot is popular', 'type': '... | 4bafa285f964a5203a123ce3 | Bruno's valu-mart | 83 Underhill | 43.746143 | -79.32463 | [{'label': 'display', 'lat': 43.746143, 'lng':... | 889 | ... |  | Don Mills | ON | Canada | [83 Underhill (at Donwood Plaza), Don Mills ON... | [{'id': '4bf58dd8d48988d118951735', 'name': 'G... | 0 | [] | at Donwood Plaza |  |
| 4 | e-0-4c422e48e26920a1a4ad5fe7-4 | 0 | [{'summary': 'This spot is popular', 'type': '... | 4c422e48e26920a1a4ad5fe7 | Shoppers Drug Mart | 1277 York Mills Rd | 43.760857 | -79.324961 | [{'label': 'display', 'lat': 43.76085733239677... | 926 | ... |  | Toronto | ON | Canada | [1277 York Mills Rd (At Parkwoods Village Driv... | [{'id': '4bf58dd8d48988d10f951735', 'name': 'P... | 0 | [] | At Parkwoods Village Drive |  |


Let's keep only the venue name and coordinates and rename the columns

In [40]:
# Keep only the venue name and coordinates, and give the columns readable names
venues_df = venues_df.loc[:, ['venue.name', 'venue.location.lat', 'venue.location.lng']]
venues_df = venues_df.rename(columns={'venue.name': 'Venue',
                                      'venue.location.lat': 'Venue Latitude',
                                      'venue.location.lng': 'Venue Longitude'})
venues_df.head()

|   | Venue | Venue Latitude | Venue Longitude |
|---|---|---|---|
| 0 | Allwyn's Bakery | 43.75984 | -79.324719 |
| 1 | Tim Hortons | 43.760668 | -79.326368 |
| 2 | Brookbanks Park | 43.751976 | -79.33214 |
| 3 | Bruno's valu-mart | 43.746143 | -79.32463 |
| 4 | Shoppers Drug Mart | 43.760857 | -79.324961 |
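`json_normalize` flattens nested dictionaries but leaves list-valued fields such as `venue.categories` as Python lists, so the venue category is not a column of its own. A sketch of pulling out the first listed category name, run on two made-up items shaped like the `items` list of the explore response (`primary_category` is a hypothetical helper):

```python
import pandas as pd

# Two made-up items shaped like the 'items' list of the explore response
items = [
    {'venue': {'name': 'Venue A', 'location': {'lat': 43.76, 'lng': -79.32},
               'categories': [{'id': 'c1', 'name': 'Bakery'}]}},
    {'venue': {'name': 'Venue B', 'location': {'lat': 43.75, 'lng': -79.33},
               'categories': []}},
]

flat = pd.json_normalize(items)

def primary_category(categories):
    """Return the first category name, or None if the list is empty."""
    return categories[0]['name'] if categories else None

flat['Venue Category'] = flat['venue.categories'].apply(primary_category)
out = flat.rename(columns={'venue.name': 'Venue',
                           'venue.location.lat': 'Venue Latitude',
                           'venue.location.lng': 'Venue Longitude'})
out = out[['Venue', 'Venue Category', 'Venue Latitude', 'Venue Longitude']]
print(out)
```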


Let's get venue data for all the neighbourhoods

In [42]:
def get_near_by_venues(names, latitudes, longitudes, radius=500):
    """Query the Foursquare explore endpoint around each neighbourhood centre
    and return one row per venue found (radius in metres)."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'\
        .format(CLIENT_ID, CLIENT_SECRET, VERSION, lat, lng, radius, LIMIT)
        results = requests.get(url).json()["response"]['groups'][0]['items']
        # For each venue keep its name, coordinates and first listed category
        venues_list.append([(name, lat, lng,
                             v['venue']['name'], v['venue']['location']['lat'], v['venue']['location']['lng'],
                             v['venue']['categories'][0]['name']) for v in results])

    # Flatten the per-neighbourhood lists into one data frame
    nearby_venues = pd.DataFrame([item for venue in venues_list for item in venue])
    nearby_venues.columns = ['Neighbourhood', 'Neighbourhood Latitude', 'Neighbourhood Longitude',
                             'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category']

    return nearby_venues

Toronto_venues = get_near_by_venues(names=df['Neighbourhood'], latitudes=df['Latitude'], longitudes=df['Longitude'])
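`get_near_by_venues` assumes every response succeeds and every venue has at least one category; an error response (for example an exhausted quota) would likely lack the `groups` key, and an uncategorised venue would raise `IndexError`, stopping the whole loop. A hedged sketch of a parsing helper that could replace the list comprehension (`parse_explore_response` is a hypothetical name; it also iterates over all groups rather than only the first). It is tested here on a hand-written payload, not a real API call.

```python
def parse_explore_response(payload, name, lat, lng):
    """Turn one decoded explore response into venue rows.

    Returns [] when the API reports an error (meta.code != 200) and skips
    venues without a category instead of raising IndexError.
    """
    if payload.get('meta', {}).get('code') != 200:
        return []
    rows = []
    for group in payload.get('response', {}).get('groups', []):
        for item in group.get('items', []):
            venue = item['venue']
            categories = venue.get('categories') or []
            if not categories:
                continue
            rows.append((name, lat, lng,
                         venue['name'],
                         venue['location']['lat'], venue['location']['lng'],
                         categories[0]['name']))
    return rows
```

Inside the loop this would read `venues_list.append(parse_explore_response(requests.get(url).json(), name, lat, lng))`.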

In [53]:
Toronto_venues.head()

|   | Neighbourhood | Neighbourhood Latitude | Neighbourhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Parkwoods | 43.753259 | -79.329656 | Brookbanks Park | 43.751976 | -79.33214 | Park |
| 1 | Parkwoods | 43.753259 | -79.329656 | Variety Store | 43.751974 | -79.333114 | Food & Drink Shop |
| 2 | Victoria Village | 43.725882 | -79.315572 | Victoria Village Arena | 43.723481 | -79.315635 | Hockey Arena |
| 3 | Victoria Village | 43.725882 | -79.315572 | Tim Hortons | 43.725517 | -79.313103 | Coffee Shop |
| 4 | Victoria Village | 43.725882 | -79.315572 | Portugril | 43.725819 | -79.312785 | Portuguese Restaurant |


We now have a data frame listing the venues found around each neighbourhood. Next, let's one-hot encode the venue categories and average them per neighbourhood, which gives the frequency of each category in each neighbourhood, to use as input for clustering.

In [68]:
# One-hot encode the venue categories (prefix_sep='' avoids a leading space in the column names)
category_dummies = pd.get_dummies(Toronto_venues[['Venue Category']], prefix='', prefix_sep='', dtype=int)
# Put the neighbourhood name in the first column
category_dummies.insert(0, 'Neighbourhood', Toronto_venues['Neighbourhood'])
category_dummies.head()

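|   | Neighbourhood | Accessories Store | Afghan Restaurant | Airport | Airport Food Court | Airport Gate | Airport Lounge | Airport Service | Airport Terminal | American Restaurant | ... | Train Station | Turkish Restaurant | Vegetarian / Vegan Restaurant | Video Game Store | Vietnamese Restaurant | Warehouse Store | Wine Bar | Wings Joint | Women's Store | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Parkwoods | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Parkwoods | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | Victoria Village | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Victoria Village | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Victoria Village | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |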


In [96]:
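# Mean of each 0/1 column = share of the neighbourhood's venues in that category
Toronto_neigh = category_dummies.groupby('Neighbourhood').mean().reset_index()
# Non-category columns at the start of each Toronto_neigh row (only 'Neighbourhood');
# return_most_common_venues skips this many columns before ranking categories
CONST_dfColumns = ['Neighbourhood']
Toronto_neigh.head()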

|   | Neighbourhood | Accessories Store | Afghan Restaurant | Airport | Airport Food Court | Airport Gate | Airport Lounge | Airport Service | Airport Terminal | American Restaurant | ... | Train Station | Turkish Restaurant | Vegetarian / Vegan Restaurant | Video Game Store | Vietnamese Restaurant | Warehouse Store | Wine Bar | Wings Joint | Women's Store | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Agincourt | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | Alderwood, Long Branch | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | Bathurst Manor, Wilson Heights, Downsview North | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | Bayview Village | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | Bedford Park, Lawrence Manor East | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.045455 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
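One-hot encoding followed by a per-neighbourhood mean turns each neighbourhood into a vector of category frequencies. On a made-up four-venue example the arithmetic is easy to check: neighbourhood N1 has two parks among three venues, so its Park value is 2/3 ≈ 0.67, while N2's single venue is a coffee shop, giving it a Coffee Shop value of 1.0.

```python
import pandas as pd

# Made-up venue list: two neighbourhoods, four venues
venues = pd.DataFrame({
    'Neighbourhood': ['N1', 'N1', 'N1', 'N2'],
    'Venue Category': ['Park', 'Coffee Shop', 'Park', 'Coffee Shop'],
})

# One 0/1 column per category, then the neighbourhood name in front
onehot = pd.get_dummies(venues['Venue Category'], dtype=int)
onehot.insert(0, 'Neighbourhood', venues['Neighbourhood'])

# Mean of a 0/1 column = fraction of the neighbourhood's venues in that category
freq = onehot.groupby('Neighbourhood').mean().reset_index()
print(freq)
```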


Let's find the three most frequent venue categories in each neighbourhood, first printing them and then collecting them in a data frame

In [71]:
top_venues = 3

for hood in Toronto_neigh['Neighbourhood']:
    print("----"+hood+"----")
    temp = Toronto_neigh[Toronto_neigh['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = round(temp['freq'].astype(float),2)
    temp = temp.sort_values('freq', ascending=False).reset_index(drop=True)
    print(temp.head(top_venues))
    print('\n')

----Agincourt----
             venue  freq
0   Clothing Store   0.2
1   Breakfast Spot   0.2
2           Lounge   0.2


----Alderwood, Long Branch----
             venue  freq
0      Pizza Place  0.29
1              Gym  0.14
2   Sandwich Place  0.14


----Bathurst Manor, Wilson Heights, Downsview North----
                venue  freq
0                Bank  0.10
1         Coffee Shop  0.10
2   Mobile Phone Shop  0.05


----Bayview Village----
                  venue  freq
0   Japanese Restaurant  0.25
1    Chinese Restaurant  0.25
2                  Bank  0.25


----Bedford Park, Lawrence Manor East----
                 venue  freq
0          Coffee Shop  0.09
1       Sandwich Place  0.09
2   Italian Restaurant  0.09


----Berczy Park----
             venue  freq
0      Coffee Shop  0.09
1         Beer Bar  0.04
2   Farmers Market  0.04


----Birch Cliff, Cliffside West----
                    venue  freq
0   General Entertainment  0.25
1         College Stadium  0.25
2                

          venue  freq
0   Coffee Shop  0.24
1   Yoga Studio  0.03
2           Bar  0.03


----Regent Park, Harbourfront----
          venue  freq
0   Coffee Shop  0.18
1           Pub  0.07
2        Bakery  0.07


----Richmond, Adelaide, King----
          venue  freq
0   Coffee Shop  0.09
1          Café  0.05
2    Restaurant  0.04


----Rosedale----
         venue  freq
0         Park  0.50
1        Trail  0.25
2   Playground  0.25


----Roselawn----
                venue  freq
0         Music Venue   0.5
1              Garden   0.5
2   Accessories Store   0.0


----Rouge Hill, Port Union, Highland Creek----
                         venue  freq
0   Construction & Landscaping   0.5
1                          Bar   0.5
2            Accessories Store   0.0


----Runnymede, Swansea----
               venue  freq
0               Café  0.09
1        Coffee Shop  0.09
2   Sushi Restaurant  0.06


----Runnymede, The Junction North----
                venue  freq
0            Bus Line  0.25
1

In [97]:
def return_most_common_venues(row, top_venues):
    row_categories = row.iloc[len(CONST_dfColumns):]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:top_venues]
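In neighbourhoods with fewer than three venue categories, the remaining "most common" slots are filled with categories whose frequency is 0 (such as Accessories Store at 0.0 for Roselawn above), so which name appears depends only on how the sort breaks ties. A sketch of an alternative that keeps only categories that actually occur and pads the rest with `None`; `top_categories` is a hypothetical helper, and `Series.nlargest` is used in place of a full sort.

```python
import pandas as pd

def top_categories(freqs: pd.Series, n: int) -> list:
    """Return up to n category names with frequency > 0, most frequent first,
    padded with None so the result always has length n."""
    present = freqs[freqs > 0].astype(float)
    names = list(present.nlargest(n).index)
    return names + [None] * (n - len(names))

# Made-up row: only two categories actually occur in this neighbourhood
row = pd.Series({'Bakery': 0.0, 'Park': 0.5, 'Food & Drink Shop': 0.5, 'Yoga Studio': 0.0})
print(top_categories(row, 3))
```

Applied to the notebook's data, a call would look like `top_categories(Toronto_neigh.iloc[ind, 1:], top_venues)`; the `astype(float)` handles the object dtype that such a mixed row has.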

In [106]:
indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
venues_sorted = pd.DataFrame(columns=columns)

venues_sorted['Neighbourhood'] = Toronto_neigh['Neighbourhood']

for ind in np.arange(Toronto_neigh.shape[0]):
    venues_sorted.iloc[ind, 1:] = return_most_common_venues(Toronto_neigh.iloc[ind, :], top_venues)

venues_sorted.head()

|   | Neighbourhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue |
|---|---|---|---|---|
| 0 | Agincourt | Breakfast Spot | Clothing Store | Lounge |
| 1 | Alderwood, Long Branch | Pizza Place | Sandwich Place | Coffee Shop |
| 2 | Bathurst Manor, Wilson Heights, Downsview North | Bank | Coffee Shop | Frozen Yogurt Shop |
| 3 | Bayview Village | Japanese Restaurant | Chinese Restaurant | Bank |
| 4 | Bedford Park, Lawrence Manor East | Coffee Shop | Sandwich Place | Italian Restaurant |
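The `indicators` list with its `IndexError` fallback labels the first twenty positions correctly, but a 21st venue would become "21th". A general English ordinal helper, sketched here as a hypothetical `ordinal` function, treats 11 to 13 (and 111 to 113, and so on) as "th" and otherwise follows the last digit:

```python
def ordinal(n):
    """Return n with its English ordinal suffix: 1st, 2nd, 3rd, 4th, 11th, 21st, ..."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

# Column names for, say, the five most common venues
columns = ['Neighbourhood'] + ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 6)]
print(columns)
```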


In [113]:
df2 = df.join(venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')
df2.head()

|   | Postal Code | Borough | Neighbourhood | Latitude | Longitude | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue |
|---|---|---|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 | Park | Food & Drink Shop | Yoga Studio |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 | Pizza Place | Hockey Arena | Intersection |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 | Coffee Shop | Pub | Bakery |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 | Clothing Store | Women's Store | Coffee Shop |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 | Coffee Shop | Yoga Studio | Bank |


Now let's cluster the neighbourhoods

In [132]:
k = 5

# Cluster on the category frequencies only (drop the name column)
Toronto_k = Toronto_neigh.drop(columns='Neighbourhood')

kmeans = KMeans(n_clusters=k, random_state=0).fit(Toronto_k)

kmeans.labels_[0:10]

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=int32)
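Here k = 5 is fixed by hand. One common way to compare candidate values is the mean silhouette score (close to 1 for tight, well-separated clusters). The sketch below runs it on synthetic data from `make_blobs` with three clearly separated groups, not on `Toronto_k`; running the same loop on `Toronto_k` would show which k the score favours for the venue frequencies. The loop variable is named `n_clusters` so it does not overwrite the notebook's `k`.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for Toronto_k: three well-separated groups in 2-D
X, _ = make_blobs(n_samples=150, centers=[[0, 0], [10, 10], [20, 0]],
                  cluster_std=0.5, random_state=0)

scores = {}
for n_clusters in range(2, 6):
    labels = KMeans(n_clusters=n_clusters, random_state=0, n_init=10).fit_predict(X)
    scores[n_clusters] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print({n: round(s, 3) for n, s in scores.items()}, 'best k =', best_k)
```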


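In [135]:
# Put the cluster labels in front; drop an earlier copy first so the cell can be re-run safely
if 'Cluster Labels' in venues_sorted.columns:
    venues_sorted = venues_sorted.drop(columns='Cluster Labels')
venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)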


In [141]:
df2 = df.join(venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')
df2.head()

|   | Postal Code | Borough | Neighbourhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 | 0.0 | Park | Food & Drink Shop | Yoga Studio |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 | 1.0 | Pizza Place | Hockey Arena | Intersection |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 | 1.0 | Coffee Shop | Pub | Bakery |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 | 1.0 | Clothing Store | Women's Store | Coffee Shop |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 | 1.0 | Coffee Shop | Yoga Studio | Bank |


In [150]:
# Same map centre as the first map (the undefined names latitude/longitude replaced)
map_clusters = folium.Map(location=[43.651070, -79.347015], zoom_start=11)

# One colour per cluster
colors_array = cm.rainbow(np.linspace(0, 1, k))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# Neighbourhoods for which Foursquare returned no venues may have no cluster label (NaN) after the join
mapped = df2.dropna(subset=['Cluster Labels'])

for lat, lon, postalCode, borough, neighborhood, cluster in zip(mapped['Latitude'], mapped['Longitude'], mapped['Postal Code'], mapped['Borough'], mapped['Neighbourhood'], mapped['Cluster Labels']):
    cluster = int(cluster)
    label = folium.Popup(str(postalCode) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

Let's look at the clusters we created

In [157]:
df2[df2['Cluster Labels'] == 0].head(15)

|   | Postal Code | Borough | Neighbourhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 | 0.0 | Park | Food & Drink Shop | Yoga Studio |
| 21 | M6E | York | Caledonia-Fairbanks | 43.689026 | -79.453512 | 0.0 | Park | Women's Store | Pool |
| 35 | M4J | East York | East Toronto, Broadview North (Old East York) | 43.685347 | -79.338106 | 0.0 | Intersection | Park | Convenience Store |
| 49 | M6L | North York | North Park, Maple Leaf Park, Upwood Park | 43.713756 | -79.490074 | 0.0 | Construction & Landscaping | Park | Bakery |
| 61 | M4N | Central Toronto | Lawrence Park | 43.72802 | -79.38879 | 0.0 | Park | Swim School | Bus Line |
| 64 | M9N | York | Weston | 43.706876 | -79.518188 | 0.0 | Park | Yoga Studio | Electronics Store |
| 66 | M2P | North York | York Mills West | 43.752758 | -79.400049 | 0.0 | Park | Convenience Store | Yoga Studio |
| 85 | M1V | Scarborough | Milliken, Agincourt North, Steeles East, L'Amo... | 43.815252 | -79.284577 | 0.0 | Intersection | Playground | Bakery |
| 91 | M4W | Downtown Toronto | Rosedale | 43.679563 | -79.377529 | 0.0 | Park | Playground | Trail |


In [158]:
df2[df2['Cluster Labels'] == 1].head(15)

|   | Postal Code | Borough | Neighbourhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 | 1.0 | Pizza Place | Hockey Arena | Intersection |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 | 1.0 | Coffee Shop | Pub | Bakery |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 | 1.0 | Clothing Store | Women's Store | Coffee Shop |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 | 1.0 | Coffee Shop | Yoga Studio | Bank |
| 7 | M3B | North York | Don Mills | 43.745906 | -79.352188 | 1.0 | Gym | Beer Store | Japanese Restaurant |
| 8 | M4B | East York | Parkview Hill, Woodbine Gardens | 43.706397 | -79.309937 | 1.0 | Pizza Place | Athletics & Sports | Pharmacy |
| 9 | M5B | Downtown Toronto | Garden District, Ryerson | 43.657162 | -79.378937 | 1.0 | Clothing Store | Coffee Shop | Café |
| 10 | M6B | North York | Glencairn | 43.709577 | -79.445073 | 1.0 | Pizza Place | Pub | Japanese Restaurant |
| 12 | M1C | Scarborough | Rouge Hill, Port Union, Highland Creek | 43.784535 | -79.160497 | 1.0 | Construction & Landscaping | Bar | Yoga Studio |
| 13 | M3C | North York | Don Mills | 43.7259 | -79.340923 | 1.0 | Gym | Beer Store | Japanese Restaurant |


In [154]:
df2[df2['Cluster Labels'] == 2].head(5)

|   | Postal Code | Borough | Neighbourhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|
| 45 | M2L | North York | York Mills, Silver Hills | 43.75749 | -79.374714 | 2.0 | Martial Arts School | Yoga Studio | Electronics Store |


In [155]:
df2[df2['Cluster Labels'] == 3].head(5)

|   | Postal Code | Borough | Neighbourhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|
| 11 | M9B | Etobicoke | West Deane Park, Princess Gardens, Martin Grov... | 43.650943 | -79.554724 | 3.0 | Print Shop | Electronics Store | Dog Run |


In [156]:
df2[df2['Cluster Labels'] == 4].head(5)

|   | Postal Code | Borough | Neighbourhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|
| 6 | M1B | Scarborough | Malvern, Rouge | 43.806686 | -79.194353 | 4.0 | Fast Food Restaurant | Yoga Studio | Discount Store |
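Scanning each cluster's rows by eye works for five clusters; a compact alternative is to summarise every cluster at once by its size and its most frequent 1st Most Common Venue, which also makes one-member clusters stand out. A sketch on a made-up frame shaped like `df2` (only the columns the summary needs; the neighbourhood names A to F are placeholders):

```python
import pandas as pd

# Made-up frame shaped like df2
df_demo = pd.DataFrame({
    'Neighbourhood': ['A', 'B', 'C', 'D', 'E', 'F'],
    'Cluster Labels': [0.0, 0.0, 1.0, 1.0, 1.0, 2.0],
    '1st Most Common Venue': ['Park', 'Park', 'Coffee Shop', 'Coffee Shop', 'Pizza Place', 'Gym'],
})

# Named aggregation: one row per cluster
profile = df_demo.groupby('Cluster Labels').agg(
    n_neighbourhoods=('Neighbourhood', 'size'),
    top_first_venue=('1st Most Common Venue', lambda s: s.value_counts().index[0]),
)
singletons = profile.index[profile['n_neighbourhoods'] == 1].tolist()

print(profile)
print('Clusters with a single neighbourhood:', singletons)
```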


## Final remarks

Looking at the map above, most neighbourhoods share the same few colours, which is likely an indication that too many clusters were used for the analysis. I used k = 5, but only two clusters, 0.0 and 1.0, seem meaningful. Clusters 2.0, 3.0 and 4.0 consist of one neighbourhood each and provide little to no real insight.

Broadly speaking, the model divides Toronto into two zones: a relatively continuous, though not closed, zone of neighbourhoods labeled 0.0 in the interior, and the remaining neighbourhoods, labeled 1.0, which surround it on all sides.


The central zone 0.0 could be described as a leisure zone in the heart of the city. It is dominated by green space: a park is the most common venue in most of its neighbourhoods. There are also stores, but they seem relatively sparse. Most likely, label 0.0 stands for the major parks and the neighbourhoods built around them. Given the scarcity of everyday businesses, one could conjecture that these areas are more exclusive and attract only certain kinds of businesses. Yoga studios also appear among the most common venues, although in neighbourhoods with very few venues (Parkwoods has only two) the third-ranked category can be a tie at zero frequency rather than a venue that is actually present.

The surrounding zone 1.0 represents mostly commercial and active areas, where the most frequent venues are restaurants and shops. These neighbourhoods likely represent the active and bustling city center of Toronto, as opposed to the calmer and more spacious zone 0.0.