# Peer-graded Assignment: Segmenting and Clustering Neighborhoods in Toronto

## Question 1

Let's import all relevant packages...



In [45]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np

Now let's pick a URL to extract the data from. Please NOTE: the newest version of the Wikipedia page is not the easiest one
to extract data from, so I chose an older revision, which provides the same data in a better format.

In [47]:
url = "https://en.wikipedia.org/w/index.php?title=List_of_postal_codes_of_Canada:_M&oldid=1011037969"
toronto = pd.read_html(requests.get(url).text)
toronto

[    Postal Code           Borough  \
 0           M1A      Not assigned   
 1           M2A      Not assigned   
 2           M3A        North York   
 3           M4A        North York   
 4           M5A  Downtown Toronto   
 ..          ...               ...   
 175         M5Z      Not assigned   
 176         M6Z      Not assigned   
 177         M7Z      Not assigned   
 178         M8Z         Etobicoke   
 179         M9Z      Not assigned   
 
                                          Neighbourhood  
 0                                         Not assigned  
 1                                         Not assigned  
 2                                            Parkwoods  
 3                                     Victoria Village  
 4                            Regent Park, Harbourfront  
 ..                                                 ...  
 175                                       Not assigned  
 176                                       Not assigned  
 177                

As we only need the first table, let's make sure we only extract that one...

In [48]:
toronto = toronto[0]
toronto

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
...,...,...,...
175,M5Z,Not assigned,Not assigned
176,M6Z,Not assigned,Not assigned
177,M7Z,Not assigned,Not assigned
178,M8Z,Etobicoke,"Mimico NW, The Queensway West, South of Bloor,..."


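We have to get rid of the rows whose borough is Not assigned, so let's drop them right away! I also call `groupby(['Postal Code']).head()`. This does not aggregate anything: it keeps up to the first five rows per postal code, and since each postal code appears to occupy a single row here, the filtered table stays as it is.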

In [49]:
toronto = toronto[toronto['Borough'] != 'Not assigned']
toronto = toronto.groupby(['Postal Code']).head()
toronto

Unnamed: 0,Postal Code,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
...,...,...,...
160,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
165,M4Y,Downtown Toronto,Church and Wellesley
168,M7Y,East Toronto,"Business reply mail Processing Centre, South C..."
169,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu..."


To make sure we're ready to go, let's check whether there is still a Not assigned value in the Neighbourhood column...

In [50]:
toronto.Neighbourhood.str.count('Not assigned').sum()

0

It seems there is no more work to do for the Not assigned values, so let's finish with the shape of the table!

In [51]:
toronto.shape

(103, 3)

### The answer to question 1 is (103, 3)

## Question 2

Let's first install and import geocoder...

In [52]:
!pip install geocoder



Now let's use geocoder to find latitude and longitude for postal codes in Toronto

In [14]:
import geocoder # import geocoder

# initialize your variable to None
lat_lng_coords = None
postal_code = '###'

# loop until you get the coordinates
while(lat_lng_coords is None):
  g = geocoder.google('{}, Toronto, Ontario'.format(postal_code))
  lat_lng_coords = g.latlng

latitude = lat_lng_coords[0]
longitude = lat_lng_coords[1]

KeyboardInterrupt: 

The code above ran for a long time without any response, so I decided to stop it and continue with the CSV file instead.
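One way to keep such a loop from running indefinitely is to cap the number of attempts. The following is only a sketch under assumptions: `geocode_with_retries`, `flaky_lookup`, and the coordinates in it are hypothetical stand-ins, and no real geocoding service is called.

```python
import time


def geocode_with_retries(lookup, query, max_attempts=5, delay_seconds=0.0):
    """Call lookup(query) until it returns coordinates or the attempts run out."""
    for _ in range(max_attempts):
        coords = lookup(query)
        if coords is not None:
            return coords
        time.sleep(delay_seconds)  # pause before the next attempt
    return None  # give up instead of looping forever


# Hypothetical stand-in for a geocoding call: fails once, then succeeds.
calls = {"count": 0}


def flaky_lookup(query):
    calls["count"] += 1
    return None if calls["count"] < 2 else [43.65, -79.38]  # made-up coordinates


coords = geocode_with_retries(flaky_lookup, "M5A, Toronto, Ontario")
never = geocode_with_retries(lambda query: None, "M5A, Toronto, Ontario", max_attempts=3)
```

In the notebook, the `lookup` argument could wrap the call used in the cell above, for example `lambda q: geocoder.google(q).latlng`; a `None` result then signals that the fallback CSV should be used.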

In [53]:
# Let's start by reading the CSV provided by Coursera and check the data types for merging...

Data_Coursera = pd.read_csv('https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs_v1/Geospatial_Coordinates.csv')
Data_Coursera.dtypes

Postal Code     object
Latitude       float64
Longitude      float64
dtype: object

Great, we're all set, let's merge both tables on the Postal Code, as this value is in both tables!

In [54]:
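# sort must be a boolean; the string 'False' is truthy, so the join was sorted by Postal Code (as the output below shows)
toronto_complete = toronto.join(Data_Coursera.set_index('Postal Code'), on='Postal Code', how='inner', sort=True)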
toronto_complete

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
9,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
18,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
27,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
36,M1G,Scarborough,Woburn,43.770992,-79.216917
45,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
...,...,...,...,...,...
98,M9N,York,Weston,43.706876,-79.518188
107,M9P,Etobicoke,Westmount,43.696319,-79.532242
116,M9R,Etobicoke,"Kingsview Village, St. Phillips, Martin Grove ...",43.688905,-79.554724
143,M9V,Etobicoke,"South Steeles, Silverstone, Humbergate, Jamest...",43.739416,-79.588437


I call `groupby(['Postal Code']).head()` once more. It returns up to five rows per postal code rather than aggregating, so with one row per postal code the table stays unchanged...

In [55]:
toronto_complete = toronto_complete.groupby(['Postal Code']).head()
toronto_complete

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
9,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
18,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
27,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
36,M1G,Scarborough,Woburn,43.770992,-79.216917
45,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
...,...,...,...,...,...
98,M9N,York,Weston,43.706876,-79.518188
107,M9P,Etobicoke,Westmount,43.696319,-79.532242
116,M9R,Etobicoke,"Kingsview Village, St. Phillips, Martin Grove ...",43.688905,-79.554724
143,M9V,Etobicoke,"South Steeles, Silverstone, Humbergate, Jamest...",43.739416,-79.588437


## Question 3

#### With the knowledge from the lab in which the neighbourhoods of NYC were clustered, I will cluster Toronto based on the similarity of venue categories, using K-Means and Foursquare

Let's first install folium...

In [56]:
!pip install folium



Secondly, let's make sure to import Nominatim and folium...

In [57]:
import folium
from geopy.geocoders import Nominatim

Now we need the latitude and longitude of Toronto, which are determined below...

In [58]:
Address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent='toronto_explorer')
location = geolocator.geocode(Address)
latitude = location.latitude
longitude = location.longitude
print('The coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The coordinates of Toronto are 43.6534817, -79.3839347.


Now let's first create a map of Toronto

In [59]:
Toronto_Map = folium.Map(location=[latitude, longitude], zoom_start=10)

# In addition, we will add the markers to the map.
# The loop uses its own variable names so the city's latitude/longitude are not overwritten.
for lat, lng, borough, neighbourhood in zip(toronto_complete['Latitude'], toronto_complete['Longitude'], toronto_complete['Borough'], toronto_complete['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True
        ).add_to(Toronto_Map)  
    
Toronto_Map

To cluster the data, we will need data from Foursquare. First I will store my API credentials and the version in variables...

In [60]:
CLIENT_ID = 'TRI02FZMYRQNNRMRO3IJ11UKUFPEA2LL1FGRCVWFXLOKY4BF'
CLIENT_SECRET = 'UAZGC4MYMQRW1NC43REGPIRPDISDZ20GHOY5GXPOJIZUEB0B'
VERSION = '20210418'
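Credentials are usually better kept out of the notebook itself. As a minimal sketch, assuming the keys were exported as environment variables (the names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are hypothetical), they could be read like this:

```python
import os


def load_foursquare_credentials(env=None):
    """Read the credentials from a mapping, defaulting to the process environment."""
    env = os.environ if env is None else env
    client_id = env.get("FOURSQUARE_CLIENT_ID")          # hypothetical variable name
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")  # hypothetical variable name
    if not client_id or not client_secret:
        raise KeyError("Foursquare credentials are not set")
    return client_id, client_secret


# Demonstration with placeholder values instead of real keys.
demo_env = {"FOURSQUARE_CLIENT_ID": "my-id", "FOURSQUARE_CLIENT_SECRET": "my-secret"}
demo_id, demo_secret = load_foursquare_credentials(demo_env)
```

Calling `load_foursquare_credentials()` without arguments would read from the real environment and raise a `KeyError` when either value is missing.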

To get all the venue data, I will create the same function as in the lab:

In [61]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius
            )
            
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Category']
    
    return(nearby_venues)
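The parsing step of this function can be tried without calling the API. The sketch below assumes the response has the nesting the function indexes into (`response` → `groups` → `items` → `venue`); the helper `extract_venues` and the sample payload are hypothetical, and the helper additionally tolerates a venue with an empty category list, which would raise an `IndexError` in the function above.

```python
def extract_venues(payload):
    """Return (venue name, first category name or None) pairs from a response dict."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories", [])
        category = categories[0]["name"] if categories else None
        rows.append((venue["name"], category))
    return rows


# Hand-written payload that mimics the structure the notebook code relies on.
sample_payload = {
    "response": {
        "groups": [{
            "items": [
                {"venue": {"name": "RBC Royal Bank", "categories": [{"name": "Bank"}]}},
                {"venue": {"name": "Unlabelled Place", "categories": []}},
            ]
        }]
    }
}

rows = extract_venues(sample_payload)
empty = extract_venues({"response": {"groups": []}})
```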

Now let's retrieve the venues for every neighbourhood...

In [62]:
Toronto_Venues = getNearbyVenues(toronto_complete['Neighbourhood'], toronto_complete['Latitude'], toronto_complete['Longitude'])

Malvern, Rouge
Rouge Hill, Port Union, Highland Creek
Guildwood, Morningside, West Hill
Woburn
Cedarbrae
Scarborough Village
Kennedy Park, Ionview, East Birchmount Park
Golden Mile, Clairlea, Oakridge
Cliffside, Cliffcrest, Scarborough Village West
Birch Cliff, Cliffside West
Dorset Park, Wexford Heights, Scarborough Town Centre
Wexford, Maryvale
Agincourt
Clarks Corners, Tam O'Shanter, Sullivan
Milliken, Agincourt North, Steeles East, L'Amoreaux East
Steeles West, L'Amoreaux West
Upper Rouge
Hillcrest Village
Fairview, Henry Farm, Oriole
Bayview Village
York Mills, Silver Hills
Willowdale, Newtonbrook
Willowdale, Willowdale East
York Mills West
Willowdale, Willowdale West
Parkwoods
Don Mills
Don Mills
Bathurst Manor, Wilson Heights, Downsview North
Northwood Park, York University
Downsview
Downsview
Downsview
Downsview
Victoria Village
Parkview Hill, Woodbine Gardens
Woodbine Heights
The Beaches
Leaside
Thorncliffe Park
East Toronto, Broadview North (Old East York)
The Danforth West, 

Let's check if the table loads correctly by checking the first 5 rows...

In [63]:
Toronto_Venues.head()

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Category
0,"Malvern, Rouge",43.806686,-79.194353,Wendy’s,Fast Food Restaurant
1,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497,Chris Effects Painting,Construction & Landscaping
2,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497,Royal Canadian Legion,Bar
3,"Guildwood, Morningside, West Hill",43.763573,-79.188711,RBC Royal Bank,Bank
4,"Guildwood, Morningside, West Hill",43.763573,-79.188711,G & G Electronics,Electronics Store


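To get an overview, let's group by Venue Category. Note that `.max()` takes the maximum of each column separately within a category (for text columns, the alphabetically last value), so a row can mix values from different venues...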

In [64]:
Toronto_Venues.groupby('Venue Category').max()

Venue Category,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue
Accessories Store,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,Ardene Shoes Outlet
Airport,Downsview,43.737473,-79.394420,Toronto Downsview Airport (YZD)
Airport Food Court,"CN Tower, King and Spadina, Railway Lands, Har...",43.628947,-79.394420,Billy Bishop Café
Airport Gate,"CN Tower, King and Spadina, Railway Lands, Har...",43.628947,-79.394420,Gate 8
Airport Lounge,"CN Tower, King and Spadina, Railway Lands, Har...",43.628947,-79.394420,Porter Lounge
...,...,...,...,...
Warehouse Store,Thorncliffe Park,43.705369,-79.349372,Costco
Wine Bar,"Little Portugal, Trinity",43.653206,-79.400049,Paris Paris Bar
Wings Joint,"Queen's Park, Ontario Provincial Government",43.665860,-79.383160,Wingporium
Women's Store,Caledonia-Fairbanks,43.689026,-79.453512,Maximum Woman


Now let's use one hot encoding...

In [65]:
Toronto_Venues_OHE = pd.get_dummies(Toronto_Venues[['Venue Category']], prefix="", prefix_sep="")
Toronto_Venues_OHE

Unnamed: 0,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,...,Truck Stop,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1313,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1314,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1315,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1316,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [68]:
Toronto_Venues_OHE['Neighbourhood'] = Toronto_Venues['Neighbourhood'] 

Toronto_Venues_OHE

Unnamed: 0,Women's Store,Yoga Studio,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Train Station,Truck Stop,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Neighbourhood
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,"Malvern, Rouge"
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,"Rouge Hill, Port Union, Highland Creek"
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,"Rouge Hill, Port Union, Highland Creek"
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,"Guildwood, Morningside, West Hill"
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,"Guildwood, Morningside, West Hill"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1313,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,"South Steeles, Silverstone, Humbergate, Jamest..."
1314,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,"Northwest, West Humber - Clairville"
1315,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,"Northwest, West Humber - Clairville"
1316,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,"Northwest, West Humber - Clairville"


The Neighbourhood column ended up at the end of the table, so let's move it to the beginning...

In [69]:
neighbourhood_column = [Toronto_Venues_OHE.columns[-1]] + list(Toronto_Venues_OHE.columns[:-1])
Toronto_Venues_OHE = Toronto_Venues_OHE[neighbourhood_column]

Toronto_Venues_OHE.head()

Unnamed: 0,Neighbourhood,Women's Store,Yoga Studio,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Trail,Train Station,Truck Stop,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint
0,"Malvern, Rouge",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Rouge Hill, Port Union, Highland Creek",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Rouge Hill, Port Union, Highland Creek",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Guildwood, Morningside, West Hill",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Guildwood, Morningside, West Hill",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Let's group by Neighbourhood and calculate the mean of each one-hot column, i.e. the share of each venue category in each neighbourhood...

In [71]:
Toronto_Venues_Grouped = Toronto_Venues_OHE.groupby('Neighbourhood').mean().reset_index()
Toronto_Venues_Grouped.head()

Unnamed: 0,Neighbourhood,Women's Store,Yoga Studio,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Trail,Train Station,Truck Stop,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint
0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Bathurst Manor, Wilson Heights, Downsview North",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Bedford Park, Lawrence Manor East",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
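The encode-then-average idea can be checked by hand on a small example. This is a sketch on made-up data (neighbourhoods `A` and `B` and their venues are illustrative); `dtype=int` is passed so the dummy columns are 0/1 integers regardless of the pandas version.

```python
import pandas as pd

# Made-up venues: neighbourhood A has two cafes and a park, B has one park.
venues = pd.DataFrame({
    "Neighbourhood": ["A", "A", "A", "B"],
    "Venue Category": ["Cafe", "Cafe", "Park", "Park"],
})

# One column per category, 1 where the venue belongs to that category.
one_hot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="", dtype=int)

# Put the neighbourhood name in front, then average the indicators per neighbourhood.
one_hot.insert(0, "Neighbourhood", venues["Neighbourhood"])
grouped = one_hot.groupby("Neighbourhood").mean()

cafe_share_a = grouped.loc["A", "Cafe"]  # 2 of A's 3 venues are cafes
park_share_b = grouped.loc["B", "Park"]  # B's only venue is a park
```

Because each venue contributes a single 1, the means in a row sum to 1 and can be read as category shares.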


Now I'll create a function that returns the most common venue categories for a neighbourhood

In [72]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
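As a quick illustration, the function can be applied to a single hand-made row. The row below is hypothetical; its first entry plays the role of the Neighbourhood column, which the function skips via `row.iloc[1:]`.

```python
import pandas as pd


def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]  # skip the neighbourhood name
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]


# Hypothetical row: a name followed by category shares.
row = pd.Series({"Neighbourhood": "Example", "Bar": 0.2, "Coffee Shop": 0.5, "Park": 0.3})
top_two = list(return_most_common_venues(row, 2))
```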


Let's go with the top 10 venues, the same as in the lab we did this week

In [73]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighbourhood'] = Toronto_Venues_Grouped['Neighbourhood']

for ind in np.arange(Toronto_Venues_Grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Toronto_Venues_Grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Latin American Restaurant,Clothing Store,Skating Rink,Lounge,Breakfast Spot,Curling Ice,Dog Run,Distribution Center,Discount Store,Diner
1,"Alderwood, Long Branch",Pizza Place,Coffee Shop,Gym,Sandwich Place,Playground,Pub,College Stadium,Cuban Restaurant,Discount Store,Diner
2,"Bathurst Manor, Wilson Heights, Downsview North",Bank,Coffee Shop,Convenience Store,Supermarket,Bridal Shop,Sushi Restaurant,Sandwich Place,Restaurant,Middle Eastern Restaurant,Pizza Place
3,Bayview Village,Chinese Restaurant,Bank,Japanese Restaurant,Café,Wings Joint,Curling Ice,Donut Shop,Dog Run,Distribution Center,Discount Store
4,"Bedford Park, Lawrence Manor East",Coffee Shop,Italian Restaurant,Sandwich Place,Hobby Shop,Restaurant,Cupcake Shop,Fast Food Restaurant,Sushi Restaurant,Butcher,Café


We're now all set for K-Means clustering, so let's import the package

In [74]:
from sklearn.cluster import KMeans

I will use 5 clusters and run the K-Means clustering method

In [75]:
k_num_clusters = 5

Toronto_Clustering = Toronto_Venues_Grouped.drop(columns='Neighbourhood')

kmeans = KMeans(n_clusters=k_num_clusters, random_state=0).fit(Toronto_Clustering)
kmeans

KMeans(n_clusters=5, random_state=0)
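To get a feel for what the fit produces, here is a minimal sketch on four made-up feature rows with two obvious groups. The data and `n_clusters=2` are illustrative; `n_init=10` is set explicitly, which is an assumption about a sensible number of restarts rather than something taken from the cell above.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up category shares: the first two rows resemble each other, as do the last two.
features = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [0.1, 0.9],
])

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)
labels = model.labels_  # one cluster index per row
```

The label numbers themselves are arbitrary; only which rows share a label carries meaning.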

Let's check the labels in Toronto Venues Grouped

In [77]:
# Cluster labels assigned to the rows of Toronto_Venues_Grouped

kmeans.labels_[0:100]

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3,
       1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 3, 0, 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 2,
       1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 1,
       2, 1, 1, 1, 1, 1, 2], dtype=int32)
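The printed labels suggest that one cluster holds most neighbourhoods. Counting the labels makes such an imbalance explicit; the sketch below uses a short made-up label array rather than the real `kmeans.labels_`.

```python
import numpy as np

# Made-up labels for seven neighbourhoods and five possible clusters.
labels = np.array([1, 1, 2, 1, 0, 3, 1])

# Entry i is the number of neighbourhoods assigned to cluster i.
cluster_sizes = np.bincount(labels, minlength=5)
```

In the notebook, `np.bincount(kmeans.labels_, minlength=k_num_clusters)` would give the corresponding counts for the fitted model.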

Now I will insert the Labels into the table..

In [78]:
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

Now let's merge everything into one table with the Postal Code, the Neighbourhood info, the most common venues, and the cluster labels..

In [79]:
toronto_merged = toronto_complete

toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')

toronto_merged.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353,3.0,Fast Food Restaurant,Wings Joint,Cuban Restaurant,Dog Run,Distribution Center,Discount Store,Diner,Dim Sum Restaurant,Dessert Shop,Department Store
18,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497,1.0,Bar,Construction & Landscaping,Wings Joint,Cupcake Shop,Donut Shop,Dog Run,Distribution Center,Discount Store,Diner,Dim Sum Restaurant
27,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711,1.0,Donut Shop,Rental Car Location,Electronics Store,Breakfast Spot,Bank,Intersection,Medical Center,Restaurant,Mexican Restaurant,Discount Store
36,M1G,Scarborough,Woburn,43.770992,-79.216917,1.0,Coffee Shop,Korean BBQ Restaurant,Other Repair Shop,College Auditorium,Curling Ice,Donut Shop,Dog Run,College Arts Building,Distribution Center,Discount Store
45,M1H,Scarborough,Cedarbrae,43.773136,-79.239476,1.0,Caribbean Restaurant,Athletics & Sports,Gas Station,Hakka Restaurant,Bakery,Thai Restaurant,Bank,Fried Chicken Joint,Department Store,Dance Studio


Not to forget, I will remove the rows that have a NaN value in the Cluster Labels column

In [80]:
toronto_merged_nonan = toronto_merged.dropna(subset=['Cluster Labels'])
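Missing labels arise when a neighbourhood has no row in the table being joined, for instance because no venues were returned for it. A minimal sketch on made-up frames (neighbourhoods `A`, `B`, `C` are illustrative) shows the effect, and why the labels appear as floats such as `3.0` until they are cast back to integers:

```python
import pandas as pd

# Made-up data: neighbourhood B has no cluster label.
areas = pd.DataFrame({"Neighbourhood": ["A", "B", "C"]})
clustered = pd.DataFrame({"Neighbourhood": ["A", "C"], "Cluster Labels": [0, 1]})

# The left join leaves NaN for B, which turns the integer column into floats.
merged = areas.join(clustered.set_index("Neighbourhood"), on="Neighbourhood")
has_missing = bool(merged["Cluster Labels"].isna().any())

# Drop the unlabelled rows, then restore integer labels.
cleaned = merged.dropna(subset=["Cluster Labels"]).copy()
cleaned["Cluster Labels"] = cleaned["Cluster Labels"].astype(int)
```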

Now the best part! Let's plot the clusters into the map!

In [81]:
import matplotlib.cm as cm
import matplotlib.colors as colors

map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(k_num_clusters)
ys = [i + x + (i*x)**2 for i in range(k_num_clusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged_nonan['Latitude'], toronto_merged_nonan['Longitude'], toronto_merged_nonan['Neighbourhood'], toronto_merged_nonan['Cluster Labels']):
    label = folium.Popup('Cluster ' + str(int(cluster) +1) + '\n' + str(poi) , parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster-1)],
        fill=True,
        fill_color=rainbow[int(cluster-1)]
        ).add_to(map_clusters)
        
map_clusters

As a final step, let's examine the clusters I've created, starting with cluster 1...

In [83]:
toronto_merged_nonan.loc[toronto_merged_nonan['Cluster Labels'] == 0, toronto_merged_nonan.columns[[1] + list(range(5, toronto_merged_nonan.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
54,Scarborough,0.0,Playground,Wings Joint,Creperie,Dog Run,Distribution Center,Discount Store,Diner,Dim Sum Restaurant,Dessert Shop,Department Store
135,Scarborough,0.0,Park,Playground,Intersection,Cuban Restaurant,Dog Run,Distribution Center,Discount Store,Diner,Dim Sum Restaurant,Dessert Shop


Cluster 2...

In [84]:
toronto_merged_nonan.loc[toronto_merged_nonan['Cluster Labels'] == 1, toronto_merged_nonan.columns[[1] + list(range(5, toronto_merged_nonan.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
18,Scarborough,1.0,Bar,Construction & Landscaping,Wings Joint,Cupcake Shop,Donut Shop,Dog Run,Distribution Center,Discount Store,Diner,Dim Sum Restaurant
27,Scarborough,1.0,Donut Shop,Rental Car Location,Electronics Store,Breakfast Spot,Bank,Intersection,Medical Center,Restaurant,Mexican Restaurant,Discount Store
36,Scarborough,1.0,Coffee Shop,Korean BBQ Restaurant,Other Repair Shop,College Auditorium,Curling Ice,Donut Shop,Dog Run,College Arts Building,Distribution Center,Discount Store
45,Scarborough,1.0,Caribbean Restaurant,Athletics & Sports,Gas Station,Hakka Restaurant,Bakery,Thai Restaurant,Bank,Fried Chicken Joint,Department Store,Dance Studio
63,Scarborough,1.0,Coffee Shop,Chinese Restaurant,Discount Store,Department Store,Hobby Shop,Cuban Restaurant,Donut Shop,Dog Run,Distribution Center,Diner
...,...,...,...,...,...,...,...,...,...,...,...,...
89,North York,1.0,Baseball Field,Furniture / Home Store,Wings Joint,Cupcake Shop,Donut Shop,Dog Run,Distribution Center,Discount Store,Diner,Dim Sum Restaurant
107,Etobicoke,1.0,Pizza Place,Coffee Shop,Intersection,Sandwich Place,Playground,Chinese Restaurant,Discount Store,Dim Sum Restaurant,Dessert Shop,Creperie
116,Etobicoke,1.0,Park,Mobile Phone Shop,Sandwich Place,Bus Line,College Gym,College Stadium,Distribution Center,Discount Store,Diner,Dim Sum Restaurant
143,Etobicoke,1.0,Grocery Store,Pizza Place,Beer Store,Pharmacy,Fast Food Restaurant,Fried Chicken Joint,Sandwich Place,Discount Store,Dim Sum Restaurant,Dessert Shop


Cluster 3...

In [85]:
toronto_merged_nonan.loc[toronto_merged_nonan['Cluster Labels'] == 2, toronto_merged_nonan.columns[[1] + list(range(5, toronto_merged_nonan.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
100,North York,2.0,Construction & Landscaping,Convenience Store,Park,Wings Joint,Cupcake Shop,Donut Shop,Dog Run,Distribution Center,Discount Store,Diner
2,North York,2.0,Park,Construction & Landscaping,Food & Drink Shop,Cupcake Shop,Donut Shop,Dog Run,Distribution Center,Discount Store,Diner,Dim Sum Restaurant
57,East York,2.0,Park,Metro Station,Convenience Store,Wings Joint,Cupcake Shop,Dog Run,Distribution Center,Discount Store,Diner,Dim Sum Restaurant
129,Central Toronto,2.0,Park,Wings Joint,Cuban Restaurant,Donut Shop,Dog Run,Distribution Center,Discount Store,Diner,Dim Sum Restaurant,Dessert Shop
147,Downtown Toronto,2.0,Park,Playground,Trail,Coworking Space,Distribution Center,Discount Store,Diner,Dim Sum Restaurant,Dessert Shop,Department Store
32,York,2.0,Park,Pool,Women's Store,College Arts Building,Cuban Restaurant,Dog Run,Distribution Center,Discount Store,Diner,Dim Sum Restaurant
77,North York,2.0,Park,Basketball Court,Bakery,Construction & Landscaping,Cupcake Shop,Donut Shop,Dog Run,Distribution Center,Discount Store,Diner
98,York,2.0,Convenience Store,Park,Wings Joint,Cupcake Shop,Donut Shop,Dog Run,Distribution Center,Discount Store,Diner,Dim Sum Restaurant


Cluster 4...

In [86]:
toronto_merged_nonan.loc[toronto_merged_nonan['Cluster Labels'] == 3, toronto_merged_nonan.columns[[1] + list(range(5, toronto_merged_nonan.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,Scarborough,3.0,Fast Food Restaurant,Wings Joint,Cuban Restaurant,Dog Run,Distribution Center,Discount Store,Diner,Dim Sum Restaurant,Dessert Shop,Department Store
86,York,3.0,Fast Food Restaurant,Discount Store,Sandwich Place,Wings Joint,Drugstore,Dog Run,Distribution Center,Diner,Dim Sum Restaurant,Dessert Shop


And last but not least... Cluster 5!

In [87]:
toronto_merged_nonan.loc[toronto_merged_nonan['Cluster Labels'] == 4, toronto_merged_nonan.columns[[1] + list(range(5, toronto_merged_nonan.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
17,Etobicoke,4.0,Print Shop,Wings Joint,Creperie,Dog Run,Distribution Center,Discount Store,Diner,Dim Sum Restaurant,Dessert Shop,Department Store
