## Battle of Neighborhoods - Toronto
Author: Shwetal More

#### Installing BeautifulSoup

In [1]:
! pip install BeautifulSoup4

Collecting BeautifulSoup4
  Downloading beautifulsoup4-4.9.3-py3-none-any.whl (115 kB)
     |████████████████████████████████| 115 kB 1.3 MB/s 
Collecting soupsieve>1.2
  Downloading soupsieve-2.2.1-py3-none-any.whl (33 kB)
Installing collected packages: soupsieve, BeautifulSoup4
Successfully installed BeautifulSoup4-4.9.3 soupsieve-2.2.1


## Importing Data

In [2]:
import pandas as pd
import numpy as np
import requests

from bs4 import BeautifulSoup


source = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text

soup = BeautifulSoup(source, 'html5lib')

postal_codes_dict = {} # initialize an empty dictionary to save the data in
for table_cell in soup.find_all('td'):
    try:
        postal_code = table_cell.p.b.text # get the postal code
        neighborhoods_data = table_cell.span.text # get the rest of the data in the cell
        borough = neighborhoods_data.split('(')[0] # get the borough in the cell
        
        # if the cell is not assigned then ignore it
        if neighborhoods_data == 'Not assigned':
            neighborhoods = []
        # else process the data and add it to the dictionary
        else:
            postal_codes_dict[postal_code] = {}
            try:
                neighborhoods = neighborhoods_data.split('(')[1]
            
                # remove parentheses from neighborhoods string
                neighborhoods = neighborhoods.replace('(', ' ')
                neighborhoods = neighborhoods.replace(')', ' ')

                neighborhoods_names = neighborhoods.split('/')
                neighborhoods_clean = ', '.join([name.strip() for name in neighborhoods_names])
            except IndexError:
                # no parenthesised list in the cell: use the borough as the neighborhood
                borough = borough.strip('\n')
                neighborhoods_clean = borough
 
            # add borough and neighborhood to dictionary
            postal_codes_dict[postal_code]['borough'] = borough
            postal_codes_dict[postal_code]['neighborhoods'] = neighborhoods_clean
    except AttributeError:
        # cells without the expected <p><b>...</b></p> and <span> layout are skipped
        pass
    
# build the dataframe from the dictionary in one step
# (DataFrame.append was removed in pandas 2.0)
columns = ['PostalCode', 'Borough', 'Neighborhood']
toronto_data = pd.DataFrame(
    [{"PostalCode": postal_code,
      "Borough": entry['borough'],
      "Neighborhood": entry['neighborhoods']}
     for postal_code, entry in postal_codes_dict.items()],
    columns=columns)

# print number of rows of dataframe
toronto_data.shape[0]

103
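
The string handling inside the loop can be tried without a network request. This is a minimal sketch, assuming cell text of the form `Borough(Neighborhood / Neighborhood)`; the helper name `split_cell` and the sample strings are illustrative and not part of the notebook.

```python
def split_cell(text):
    """Split 'Borough(Name / Name)' into (borough, 'Name, Name')."""
    borough, paren, rest = text.partition('(')
    borough = borough.strip()
    if not paren:
        # no parenthesised list: reuse the borough as the neighborhood
        return borough, borough
    names = rest.replace(')', ' ').split('/')
    return borough, ', '.join(name.strip() for name in names)


print(split_cell('Downtown Toronto(Regent Park / Harbourfront)'))
print(split_cell('North York\n'))
```

Unlike the loop above, this sketch also strips whitespace from the borough when a parenthesised list is present.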

In [3]:
toronto_data.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Queen's Park,Ontario Provincial Government


#### Installing geocoder

In [4]:
! pip install geocoder

Collecting geocoder
  Downloading geocoder-1.38.1-py2.py3-none-any.whl (98 kB)
     |████████████████████████████████| 98 kB 996 kB/s 
Collecting ratelim
  Downloading ratelim-0.1.6-py2.py3-none-any.whl (4.0 kB)
Installing collected packages: ratelim, geocoder
Successfully installed geocoder-1.38.1 ratelim-0.1.6


In [5]:
import geocoder
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

In [6]:
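# get latitude and longitude using geocoder

# initialize the result to None
lat_lng_coords = None

# loop until the service returns coordinates
# NOTE: 'Postal Code' is passed as a literal string here, so this looks up
# "Postal Code, Toronto, Ontario"; substitute a real code to geocode one area
while lat_lng_coords is None:
  g = geocoder.arcgis('{}, Toronto, Ontario'.format('Postal Code'))
  lat_lng_coords = g.latlng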

latitude = lat_lng_coords[0]
longitude = lat_lng_coords[1]

print(latitude,longitude )

43.648690000000045 -79.38543999999996
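
The `while` loop above retries without limit, so it never returns if the service keeps answering `None`. Below is a minimal sketch of a bounded retry. The helper `first_result`, its parameters and the stand-in lookup are assumptions for illustration; the coordinates are made up and no geocoding service is called.

```python
import time


def first_result(lookup, max_attempts=5, delay_seconds=0.0):
    """Call lookup() until it returns a non-None value or attempts run out."""
    for _ in range(max_attempts):
        result = lookup()
        if result is not None:
            return result
        time.sleep(delay_seconds)
    return None


# stand-in for a geocoding call that fails twice before succeeding
responses = iter([None, None, [43.0, -79.0]])
coords = first_result(lambda: next(responses))
print(coords)
```

In the notebook, `lookup` could be a small function that wraps the `geocoder.arcgis(...)` call and returns `g.latlng`.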


#### Importing CSV File

In [7]:
#read geospatial data file
geotable = pd.read_csv('../input/task1234/Geospatial_Coordinates.csv')
geotable.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [8]:
print("The shape of our wiki data is: ", toronto_data.shape)
print("the shape of our csv data is: ", geotable.shape)

The shape of our wiki data is:  (103, 3)
the shape of our csv data is:  (103, 3)


Since both dataframes have the same number of rows, we can try to join them on the postal codes to get the required data.

First we check the column types of both dataframes, especially the postal code columns, since those are the join keys.

In [9]:
toronto_data.dtypes

PostalCode      object
Borough         object
Neighborhood    object
dtype: object

In [10]:
geotable.dtypes

Postal Code     object
Latitude       float64
Longitude      float64
dtype: object

In [11]:
combined_data = toronto_data.join(geotable, how='inner')
combined_data

Unnamed: 0,PostalCode,Borough,Neighborhood,Postal Code,Latitude,Longitude
0,M3A,North York,Parkwoods,M1B,43.806686,-79.194353
1,M4A,North York,Victoria Village,M1C,43.784535,-79.160497
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",M1E,43.763573,-79.188711
3,M6A,North York,"Lawrence Manor, Lawrence Heights",M1G,43.770992,-79.216917
4,M7A,Queen's Park,Ontario Provincial Government,M1H,43.773136,-79.239476
...,...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",M9N,43.706876,-79.518188
99,M4Y,Downtown Toronto,Church and Wellesley,M9P,43.696319,-79.532242
100,M7Y,East TorontoBusiness reply mail Processing Cen...,Enclave of M4L,M9R,43.688905,-79.554724
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",M9V,43.739416,-79.588437


In [12]:
combined_data.shape

(103, 6)

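**Solution**: We get 103 rows, as expected from an inner join. Note that `join` aligns the two dataframes on their index, not on the postal code columns, so the rows are paired by position: in the output above, M3A sits next to the coordinates listed for M1B. Pairing each code with its own coordinates requires matching on `PostalCode` and `Postal Code`.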
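
A minimal sketch of matching on the key columns rather than on the index. The codes and coordinates below are made up for illustration; only `PostalCode` and `Postal Code` are column names taken from the notebook.

```python
import pandas as pd

# toy frames with made-up codes and coordinates, deliberately in different row order
areas = pd.DataFrame({'PostalCode': ['X1A', 'X2B', 'X3C'],
                      'Borough': ['North', 'South', 'East']})
coords = pd.DataFrame({'Postal Code': ['X3C', 'X1A', 'X2B'],
                       'Latitude': [3.0, 1.0, 2.0],
                       'Longitude': [-3.0, -1.0, -2.0]})

# join() aligns on the index, so rows are paired by position
by_position = areas.join(coords, how='inner')

# merge() matches on the key columns, regardless of row order
by_code = areas.merge(coords, left_on='PostalCode', right_on='Postal Code',
                      how='inner', validate='one_to_one')

print(by_position[['PostalCode', 'Postal Code']])
print(by_code[['PostalCode', 'Postal Code', 'Latitude']])
```

`validate='one_to_one'` makes pandas raise an error if a code appears more than once on either side.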

Drawing inspiration from the previous lab, where we clustered the neighbourhoods of NYC, we cluster Toronto's neighbourhoods by the similarity of their venue categories, using k-means clustering and the Foursquare API.

In [13]:
from geopy.geocoders import Nominatim

In [14]:
address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The coordinates of Toronto are 43.6534817, -79.3839347.


Let's visualize the map of Toronto

In [15]:
import folium

In [16]:
# Creating the map of Toronto
map_Toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# adding markers to map
# (loop variables are named lat/lng so they do not overwrite the Toronto
# coordinates in latitude/longitude, which a later cell reuses)
for lat, lng, borough, neighborhood in zip(combined_data['Latitude'], combined_data['Longitude'], combined_data['Borough'], combined_data['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True
        ).add_to(map_Toronto)  
    
map_Toronto

Initializing Foursquare API credentials

In [17]:
CLIENT_ID = 'N2YKCTIS1FDUQF3KBZRP3BHTAVIGIFGR3YJHQBNS134RMLIE' 
CLIENT_SECRET = 'HWOYKAHITQEYLDAIWXHGAUXIYHLXU15ZDMXHOCY3JYCPJBGZ'
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: N2YKCTIS1FDUQF3KBZRP3BHTAVIGIFGR3YJHQBNS134RMLIE
CLIENT_SECRET:HWOYKAHITQEYLDAIWXHGAUXIYHLXU15ZDMXHOCY3JYCPJBGZ
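
Hard-coded credentials end up in every shared copy of the notebook. A common alternative is to read them from environment variables. This is a minimal sketch; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_credentials` are assumptions, not something Foursquare or the notebook defines.

```python
import os


def load_credentials(env=os.environ):
    """Return (client_id, client_secret) from the environment, or raise."""
    try:
        return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        raise RuntimeError('missing environment variable: {}'.format(missing)) from None


# demonstration with a plain dict standing in for os.environ
demo_env = {'FOURSQUARE_CLIENT_ID': 'demo-id',
            'FOURSQUARE_CLIENT_SECRET': 'demo-secret'}
client_id, client_secret = load_credentials(demo_env)
print('CLIENT_ID: ' + client_id)
```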


Next, we create a function to get all the venue categories in Toronto

In [18]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius
            )
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Category']
    
    return nearby_venues
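
The function indexes straight into the response, so a missing `groups` list or a venue with no categories would raise an error. Below is a sketch of a more defensive extraction. The helper `extract_venues` and the sample payload are hypothetical; the payload is hand-written and only mirrors the keys that `getNearbyVenues` accesses.

```python
def extract_venues(payload):
    """Return (venue name, first category name or None) pairs."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((venue['name'], category))
    return rows


# hand-written payload mirroring the keys accessed in getNearbyVenues
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Venue A', 'categories': [{'name': 'Bank'}]}},
    {'venue': {'name': 'Venue B', 'categories': []}},
]}]}}
print(extract_venues(sample))
print(extract_venues({'response': {}}))
```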

Collecting the venues in Toronto for each Neighbourhood

In [19]:
venues_in_toronto = getNearbyVenues(combined_data['Neighborhood'], combined_data['Latitude'], combined_data['Longitude'])

Parkwoods
Victoria Village
Regent Park, Harbourfront
Lawrence Manor, Lawrence Heights
Ontario Provincial Government
Islington Avenue
Malvern, Rouge
Don Mills North
Parkview Hill, Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
Rouge Hill, Port Union, Highland Creek
Don Mills South
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Thorncliffe Park
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
The Danforth  East
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
Kennedy Park, Ionview, East Birchmount Park
Bayview Village
Downsview East
The Danforth

In [20]:
venues_in_toronto.shape

(1320, 5)

So we have 1320 records and 5 columns. Checking sample data

In [21]:
venues_in_toronto.head()

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Category
0,Parkwoods,43.806686,-79.194353,Wendy’s,Fast Food Restaurant
1,Victoria Village,43.784535,-79.160497,Royal Canadian Legion,Bar
2,"Regent Park, Harbourfront",43.763573,-79.188711,RBC Royal Bank,Bank
3,"Regent Park, Harbourfront",43.763573,-79.188711,G & G Electronics,Electronics Store
4,"Regent Park, Harbourfront",43.763573,-79.188711,Sail Sushi,Restaurant


Checking the Venues based on Neighbourhood

In [22]:
venues_in_toronto.groupby('Neighbourhood').head()

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Category
0,Parkwoods,43.806686,-79.194353,Wendy’s,Fast Food Restaurant
1,Victoria Village,43.784535,-79.160497,Royal Canadian Legion,Bar
2,"Regent Park, Harbourfront",43.763573,-79.188711,RBC Royal Bank,Bank
3,"Regent Park, Harbourfront",43.763573,-79.188711,G & G Electronics,Electronics Store
4,"Regent Park, Harbourfront",43.763573,-79.188711,Sail Sushi,Restaurant
...,...,...,...,...,...
1315,"Mimico NW, The Queensway West, South of Bloor,...",43.706748,-79.594054,Economy Rent A Car,Rental Car Location
1316,"Mimico NW, The Queensway West, South of Bloor,...",43.706748,-79.594054,Logistics Distribution,Bar
1317,"Mimico NW, The Queensway West, South of Bloor,...",43.706748,-79.594054,Saand Rexdale,Drugstore
1318,"Mimico NW, The Queensway West, South of Bloor,...",43.706748,-79.594054,PC Garden,Garden Center


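`head()` on a grouped dataframe returns at most the first five rows of each group, so the 405 records are the total across all neighbourhoods, not a per-neighbourhood count.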
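
To see how many venues each neighbourhood actually has, `size()` is the more direct call. A small sketch on made-up data (the frame `toy` is illustrative only):

```python
import pandas as pd

# made-up venues: neighbourhood 'A' has seven, 'B' has two
toy = pd.DataFrame({'Neighbourhood': ['A'] * 7 + ['B'] * 2,
                    'Venue': ['v{}'.format(i) for i in range(9)]})

first_five = toy.groupby('Neighbourhood').head()   # at most 5 rows per group
per_group = toy.groupby('Neighbourhood').size()    # venue count per group

print(len(first_five))
print(per_group)
```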

Grouping the venues by category to see how many distinct categories there are

In [23]:
venues_in_toronto.groupby('Venue Category').max()

Unnamed: 0_level_0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue
Venue Category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Accessories Store,"Wexford, Maryvale",43.718518,-79.464763,Ardene Shoes Outlet
Airport,"Richmond, Adelaide, King",43.737473,-79.394420,Toronto Downsview Airport (YZD)
Airport Food Court,Forest Hill North & West,43.628947,-79.394420,Billy Bishop Café
Airport Gate,Forest Hill North & West,43.628947,-79.394420,Gate 8
Airport Lounge,Forest Hill North & West,43.628947,-79.394420,Porter Lounge
...,...,...,...,...
Warehouse Store,Bayview Village,43.705369,-79.349372,Costco
Wine Bar,"Kingsview Village, St. Phillips, Martin Grove ...",43.653206,-79.400049,Paris Paris Bar
Wings Joint,"Willowdale, Newtonbrook",43.665860,-79.383160,Wingporium
Women's Store,"Wexford, Maryvale",43.718518,-79.453512,Maximum Woman


There are around 227 distinct venue categories. Interesting!
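
The number of distinct categories can also be read off directly, without grouping. A minimal sketch on made-up data:

```python
import pandas as pd

toy = pd.DataFrame({'Venue': ['v1', 'v2', 'v3', 'v4'],
                    'Venue Category': ['Bank', 'Bar', 'Bank', 'Café']})

n_categories = toy['Venue Category'].nunique()            # distinct categories
category_counts = toy['Venue Category'].value_counts()    # venues per category

print(n_categories)
print(category_counts)
```

Applied to `venues_in_toronto`, the same two calls would give the exact category count and the most frequent categories.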

## One-Hot Encoding the Venue Categories

In [24]:
toronto_venue_cat = pd.get_dummies(venues_in_toronto[['Venue Category']], prefix="", prefix_sep="")
toronto_venue_cat

Unnamed: 0,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,...,Train Station,Truck Stop,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1315,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1316,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1317,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1318,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Adding the neighbourhood to the encoded dataframe

In [25]:
toronto_venue_cat['Neighbourhood'] = venues_in_toronto['Neighbourhood'] 

# moving neighborhood column to the first column
fixed_columns = [toronto_venue_cat.columns[-1]] + list(toronto_venue_cat.columns[:-1])
toronto_venue_cat = toronto_venue_cat[fixed_columns]

toronto_venue_cat.head()

Unnamed: 0,Neighbourhood,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Train Station,Truck Stop,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


We group the rows by neighbourhood and take the mean of each one-hot column, which gives the relative frequency of each venue category in that neighbourhood.

In [26]:
toronto_grouped = toronto_venue_cat.groupby('Neighbourhood').mean().reset_index()
toronto_grouped.head()

Unnamed: 0,Neighbourhood,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Train Station,Truck Stop,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04
1,"Bathurst Manor, Wilson Heights, Downsview North",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.045455
3,"Bedford Park, Lawrence Manor East",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,...,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
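
Why the mean of one-hot columns is a frequency can be seen on a tiny made-up example (the frame `toy` is illustrative only):

```python
import pandas as pd

toy = pd.DataFrame({'Neighbourhood': ['A', 'A', 'B'],
                    'Venue Category': ['Bank', 'Café', 'Bank']})

# one column per category, 1 where the venue has that category
one_hot = pd.get_dummies(toy['Venue Category'], dtype=int)
one_hot.insert(0, 'Neighbourhood', toy['Neighbourhood'])

# mean of 0/1 values = share of venues in that category
frequencies = one_hot.groupby('Neighbourhood').mean()
print(frequencies)
```

Each row sums to 1, because every venue contributes exactly one category.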


Let's write a function that returns the most common venue categories for a neighbourhood

In [27]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
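
When a neighbourhood has fewer distinct categories than `num_top_venues`, the sorted row is padded with categories whose frequency is 0. Below is a sketch of a variant that leaves those out. The name `top_nonzero_categories` and the sample row are hypothetical.

```python
import pandas as pd


def top_nonzero_categories(row, num_top_venues):
    """Like return_most_common_venues, but skips zero-frequency categories."""
    freqs = row.iloc[1:].astype(float)
    freqs = freqs[freqs > 0].sort_values(ascending=False, kind='stable')
    return list(freqs.index[:num_top_venues])


row = pd.Series({'Neighbourhood': 'Toy', 'Bank': 0.0,
                 'Café': 0.6, 'Park': 0.4, 'Zoo': 0.0})
print(top_nonzero_categories(row, 3))
```

The result can be shorter than `num_top_venues`, so it would need padding before being written into a fixed-width table.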

In [28]:
import numpy as np

There are far too many venue categories to read at once, so we summarise each neighbourhood by its 10 most common categories.

In [29]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # past 'st', 'nd', 'rd': fall back to the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighbourhood'] = toronto_grouped['Neighbourhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Café,Breakfast Spot,Coffee Shop,Gym,Bakery,Burrito Place,Climbing Gym,Convenience Store,Furniture / Home Store,Grocery Store
1,"Bathurst Manor, Wilson Heights, Downsview North",Bank,Coffee Shop,Bridal Shop,Sandwich Place,Frozen Yogurt Shop,Restaurant,Supermarket,Diner,Gas Station,Mobile Phone Shop
2,Bayview Village,Indian Restaurant,Gym,Bank,Bus Line,Coffee Shop,Discount Store,Fast Food Restaurant,Gas Station,Grocery Store,Gym / Fitness Center
3,"Bedford Park, Lawrence Manor East",Café,Gastropub,Farmers Market,Coffee Shop,Gym,Jazz Club,Food Truck,Creperie,Restaurant,Cocktail Bar
4,Berczy Park,Martial Arts School,Accessories Store,Plane,Massage Studio,Medical Center,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop
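
The column labels above rely on a `try`/`except` around a three-item list, which is enough for 10 columns. A sketch of an explicit ordinal helper that also handles larger numbers such as 11 or 21; the function name `ordinal` is an assumption for illustration.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


column_names = ['Neighbourhood'] + [
    '{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)]
print(column_names[1], column_names[4], column_names[10])
```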


Let's make the model to cluster our Neighbourhoods

In [30]:
# import k-means from clustering stage
from sklearn.cluster import KMeans

In [31]:
# set number of clusters
k_num_clusters = 5

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighbourhood')

# run k-means clustering
kmeans = KMeans(n_clusters=k_num_clusters, random_state=0).fit(toronto_grouped_clustering)
kmeans

KMeans(n_clusters=5, random_state=0)
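
What the fitted model contains is easier to see on a handful of points. This is a minimal sketch on made-up two-column frequency vectors. `n_init` is passed explicitly because its default has changed across scikit-learn releases.

```python
import numpy as np
from sklearn.cluster import KMeans

# six made-up frequency vectors forming two obvious groups
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.8, 0.2],
              [0.0, 1.0], [0.1, 0.9], [0.2, 0.8]])

model = KMeans(n_clusters=2, random_state=0, n_init=10).fit(X)
labels = model.labels_          # one cluster index per row of X

print(labels)
print(model.inertia_)           # within-cluster sum of squared distances
```

The label numbers themselves are arbitrary; only the grouping of rows carries meaning.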

Checking the labelling of our model

In [32]:
kmeans.labels_[0:100]

array([3, 0, 0, 3, 0, 3, 3, 0, 3, 0, 4, 0, 0, 3, 0, 0, 3, 3, 0, 0, 3, 3,
       4, 3, 3, 0, 0, 0, 0, 0, 0, 1, 0, 3, 0, 0, 3, 0, 3, 3, 4, 3, 3, 0,
       3, 3, 0, 3, 3, 4, 0, 3, 3, 0, 0, 3, 3, 0, 0, 0, 0, 3, 0, 2, 0, 4,
       1, 3, 0, 0, 0, 0, 0, 0, 0, 0, 3, 3, 4, 3, 0, 0, 4, 3, 0, 0, 3, 0,
       0, 3, 0, 3, 0, 0, 0, 3, 4, 3, 0], dtype=int32)

In [33]:
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

Join `neighborhoods_venues_sorted` with `combined_data` on the neighbourhood name to add latitude and longitude to each neighbourhood in preparation for plotting

In [34]:
toronto_merged = combined_data

toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighbourhood'), on='Neighborhood')

toronto_merged.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Postal Code,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,M1B,43.806686,-79.194353,2.0,Fast Food Restaurant,Lounge,Martial Arts School,Massage Studio,Medical Center,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop
1,M4A,North York,Victoria Village,M1C,43.784535,-79.160497,0.0,Bar,Accessories Store,Motel,Martial Arts School,Massage Studio,Medical Center,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",M1E,43.763573,-79.188711,0.0,Donut Shop,Rental Car Location,Breakfast Spot,Medical Center,Mexican Restaurant,Electronics Store,Intersection,Restaurant,Bank,Modern European Restaurant
3,M6A,North York,"Lawrence Manor, Lawrence Heights",M1G,43.770992,-79.216917,3.0,Coffee Shop,Korean BBQ Restaurant,Accessories Store,Movie Theater,Massage Studio,Medical Center,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant
4,M7A,Queen's Park,Ontario Provincial Government,M1H,43.773136,-79.239476,0.0,Thai Restaurant,Fried Chicken Joint,Bank,Gas Station,Athletics & Sports,Caribbean Restaurant,Bakery,Hakka Restaurant,Middle Eastern Restaurant,Miscellaneous Shop


Drop the rows that have no cluster label (NaN), since they cannot be coloured by cluster on the map

In [35]:
toronto_merged_nonan = toronto_merged.dropna(subset=['Cluster Labels'])

Plotting the clusters on the map

In [36]:
import matplotlib.cm as cm
import matplotlib.colors as colors

In [37]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(k_num_clusters)
ys = [i + x + (i*x)**2 for i in range(k_num_clusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged_nonan['Latitude'], toronto_merged_nonan['Longitude'], toronto_merged_nonan['Neighborhood'], toronto_merged_nonan['Cluster Labels']):
    label = folium.Popup('Cluster ' + str(int(cluster) +1) + '\n' + str(poi) , parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster-1)],
        fill=True,
        fill_color=rainbow[int(cluster-1)]
        ).add_to(map_clusters)
        
map_clusters
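
The cell above indexes the palette with `cluster-1`, so label 0 picks the last colour through negative indexing. A sketch of one colour per label, indexed directly. It uses the standard-library `colorsys` module as a dependency-free stand-in for the matplotlib colormap, and the helper `cluster_palette` is hypothetical.

```python
import colorsys


def cluster_palette(k):
    """Return k evenly spaced hues as '#rrggbb' strings."""
    palette = []
    for i in range(k):
        r, g, b = colorsys.hsv_to_rgb(i / k, 0.9, 0.9)
        palette.append('#{:02x}{:02x}{:02x}'.format(
            round(r * 255), round(g * 255), round(b * 255)))
    return palette


palette = cluster_palette(5)
for cluster in range(5):
    print(cluster, palette[cluster])   # label 0 -> first colour, and so on
```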

Let's verify each of our clusters

Cluster 1

In [38]:
toronto_merged_nonan.loc[toronto_merged_nonan['Cluster Labels'] == 0, toronto_merged_nonan.columns[[1] + list(range(5, toronto_merged_nonan.shape[1]))]]

Unnamed: 0,Borough,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,North York,-79.160497,0.0,Bar,Accessories Store,Motel,Martial Arts School,Massage Studio,Medical Center,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant
2,Downtown Toronto,-79.188711,0.0,Donut Shop,Rental Car Location,Breakfast Spot,Medical Center,Mexican Restaurant,Electronics Store,Intersection,Restaurant,Bank,Modern European Restaurant
4,Queen's Park,-79.239476,0.0,Thai Restaurant,Fried Chicken Joint,Bank,Gas Station,Athletics & Sports,Caribbean Restaurant,Bakery,Hakka Restaurant,Middle Eastern Restaurant,Miscellaneous Shop
5,Etobicoke,-79.239476,0.0,Playground,Jewelry Store,Market,Massage Studio,Medical Center,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop
7,North York,-79.284577,0.0,Bus Line,Bakery,Intersection,Park,Ice Cream Shop,Bus Station,Accessories Store,Monument / Landmark,Modern European Restaurant,Mobile Phone Shop
8,East York,-79.239476,0.0,American Restaurant,Motel,Intersection,Accessories Store,Movie Theater,Massage Studio,Medical Center,Mediterranean Restaurant,Men's Store,Mexican Restaurant
10,North York,-79.273304,0.0,Indian Restaurant,Pet Store,Vietnamese Restaurant,Chinese Restaurant,Accessories Store,Miscellaneous Shop,Monument / Landmark,Modern European Restaurant,Mobile Phone Shop,Mexican Restaurant
11,Etobicoke,-79.295849,0.0,Auto Garage,Middle Eastern Restaurant,Bakery,Sandwich Place,Market,Massage Studio,Medical Center,Mediterranean Restaurant,Men's Store,Mexican Restaurant
12,Scarborough,-79.262029,0.0,Lounge,Breakfast Spot,Latin American Restaurant,Clothing Store,Accessories Store,Movie Theater,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant
13,North York,-79.304302,0.0,Pizza Place,Thai Restaurant,Intersection,Fried Chicken Joint,Bank,Chinese Restaurant,Italian Restaurant,Gas Station,Fast Food Restaurant,Convenience Store


Cluster 2

In [39]:
toronto_merged_nonan.loc[toronto_merged_nonan['Cluster Labels'] == 1, toronto_merged_nonan.columns[[1] + list(range(5, toronto_merged_nonan.shape[1]))]]

Unnamed: 0,Borough,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
91,Downtown Toronto,-79.498509,1.0,Pool,Baseball Field,Accessories Store,Movie Theater,Massage Studio,Medical Center,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant
97,Downtown Toronto,-79.532242,1.0,Baseball Field,Accessories Store,Motel,Martial Arts School,Massage Studio,Medical Center,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant


Cluster 3

In [40]:
toronto_merged_nonan.loc[toronto_merged_nonan['Cluster Labels'] == 2, toronto_merged_nonan.columns[[1] + list(range(5, toronto_merged_nonan.shape[1]))]]

Unnamed: 0,Borough,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,North York,-79.194353,2.0,Fast Food Restaurant,Lounge,Martial Arts School,Massage Studio,Medical Center,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop


Cluster 4

In [41]:
toronto_merged_nonan.loc[toronto_merged_nonan['Cluster Labels'] == 3, toronto_merged_nonan.columns[[1] + list(range(5, toronto_merged_nonan.shape[1]))]]

Unnamed: 0,Borough,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,North York,-79.216917,3.0,Coffee Shop,Korean BBQ Restaurant,Accessories Store,Movie Theater,Massage Studio,Medical Center,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant
6,Scarborough,-79.262029,3.0,Hobby Shop,Coffee Shop,Chinese Restaurant,Discount Store,Department Store,Accessories Store,Museum,Men's Store,Mexican Restaurant,Middle Eastern Restaurant
9,Downtown Toronto,-79.264848,3.0,College Stadium,Skating Rink,General Entertainment,Café,Movie Theater,Medical Center,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant
18,Scarborough,-79.346556,3.0,Clothing Store,Coffee Shop,Juice Bar,Restaurant,Bank,Theater,Shopping Mall,Food Court,Salon / Barbershop,Electronics Store
19,East Toronto,-79.385975,3.0,Japanese Restaurant,Bank,Chinese Restaurant,Café,Accessories Store,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop
22,Scarborough,-79.408493,3.0,Ramen Restaurant,Coffee Shop,Pizza Place,Café,Pet Store,Bubble Tea Shop,Shopping Mall,Sandwich Place,Electronics Store,Lounge
26,Scarborough,-79.352188,3.0,Gym,Athletics & Sports,Café,Japanese Restaurant,Caribbean Restaurant,Performing Arts Venue,Movie Theater,Mediterranean Restaurant,Men's Store,Mexican Restaurant
27,North York,-79.340923,3.0,Gym,Restaurant,Coffee Shop,Discount Store,Sporting Goods Shop,Bike Shop,Supermarket,Beer Store,Sandwich Place,Chinese Restaurant
29,East York,-79.487262,3.0,Furniture / Home Store,Falafel Restaurant,Coffee Shop,Miscellaneous Shop,Bar,Caribbean Restaurant,Massage Studio,Accessories Store,Mediterranean Restaurant,Men's Store
38,Scarborough,-79.363452,3.0,Coffee Shop,Sporting Goods Shop,Furniture / Home Store,Bank,Burger Joint,Breakfast Spot,Sushi Restaurant,Supermarket,Beer Store,Liquor Store


Cluster 5

In [42]:
toronto_merged_nonan.loc[toronto_merged_nonan['Cluster Labels'] == 4, toronto_merged_nonan.columns[[1] + list(range(5, toronto_merged_nonan.shape[1]))]]

Unnamed: 0,Borough,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,East York,-79.284577,4.0,Playground,Intersection,Park,Massage Studio,Medical Center,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop
23,East York,-79.400049,4.0,Park,Convenience Store,Accessories Store,Movie Theater,Massage Studio,Medical Center,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant
25,Downtown Toronto,-79.329656,4.0,Park,Food & Drink Shop,Accessories Store,Motel,Martial Arts School,Massage Studio,Medical Center,Mediterranean Restaurant,Men's Store,Mexican Restaurant
30,Downtown Toronto,-79.464763,4.0,Airport,Park,Motel,Martial Arts School,Massage Studio,Medical Center,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant
40,North York,-79.338106,4.0,Park,Convenience Store,Accessories Store,Movie Theater,Massage Studio,Medical Center,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant
50,North York,-79.377529,4.0,Park,Playground,Trail,Motel,Martial Arts School,Massage Studio,Medical Center,Mediterranean Restaurant,Men's Store,Mexican Restaurant
74,Central Toronto,-79.453512,4.0,Park,Women's Store,Pool,Accessories Store,Motel,Martial Arts School,Massage Studio,Medical Center,Mediterranean Restaurant,Men's Store
98,Etobicoke,-79.518188,4.0,Park,Convenience Store,Jewelry Store,Accessories Store,Motel,Massage Studio,Medical Center,Mediterranean Restaurant,Men's Store,Mexican Restaurant


We have successfully clustered Toronto's neighbourhoods based on venue categories!