# Segmenting and Clustering Neighbourhoods in Toronto

In this assignment, you will be required to explore, segment, and cluster the neighbourhoods in the city of Toronto. However, unlike New York, the neighbourhood data is not readily available on the internet. What is interesting about the field of data science is that each project can be challenging in its unique way, so you need to learn to be agile and to learn new libraries and tools quickly depending on the project.

Start by creating a new notebook for this assignment. Use the notebook to build the code to scrape the following Wikipedia page, https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, to obtain the data in the table of postal codes and transform it into a pandas dataframe like the one shown below. Several revisions of the wiki page exist, and you can pick the one you are familiar with.


## Import all needed libraries and variables

In [1]:
import pandas as pd
import numpy as np

# json
import json
from pandas import json_normalize

#scraping
import requests
from urllib.request import urlopen
from bs4 import BeautifulSoup

# geocoders
from geopy.geocoders import Nominatim

# visualization
import matplotlib.cm as cm
import matplotlib.colors as colors
import folium

# k-means clustering
from sklearn.cluster import KMeans

print('Done!')

Done!


## Problem 1 - Scrape the Wikipedia page, create a pandas dataframe, and clean the data as needed

This section uses the BeautifulSoup package; see its documentation at https://www.crummy.com/software/BeautifulSoup/bs4/doc/
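
Before scraping the live page, it can help to try the row extraction on a small inline snippet. The sketch below is only an illustration: `sample_html` is a made-up fragment in the same shape as the Wikipedia table, and the loop is a simplified version of the idea, not the exact code used later in this notebook.

```python
from bs4 import BeautifulSoup

# Hypothetical sample in the same shape as the Wikipedia table.
sample_html = """
<table><tbody>
<tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
<tr><td>M1A</td><td>Not assigned</td><td>Not assigned
</td></tr>
<tr><td>M3A</td><td><a href="/wiki/North_York">North York</a></td><td><a href="/wiki/Parkwoods">Parkwoods</a>
</td></tr>
</tbody></table>
"""

soup = BeautifulSoup(sample_html, 'html.parser')

rows = []
for tr in soup.find_all('tr'):
    # get_text() flattens nested <a> tags; strip() drops the trailing newline.
    cells = [td.get_text().strip() for td in tr.find_all('td')]
    if len(cells) == 3:  # the header row only has <th> cells, so it is skipped
        rows.append(cells)

print(rows)
```

Each inner list holds one postcode, borough, and neighbourhood, which is the shape `pd.DataFrame` expects.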

In [2]:
wiki_link = 'https://en.wikipedia.org/w/index.php?title=List_of_postal_codes_of_Canada:_M&oldid=942655364'
raw_page = urlopen(wiki_link).read().decode('utf-8')
page = BeautifulSoup(raw_page, 'html.parser')
table = page.body.table.tbody
table

<tbody><tr>
<th>Postcode</th>
<th>Borough</th>
<th>Neighbourhood
</th></tr>
<tr>
<td>M1A</td>
<td>Not assigned</td>
<td>Not assigned
</td></tr>
<tr>
<td>M2A</td>
<td>Not assigned</td>
<td>Not assigned
</td></tr>
<tr>
<td>M3A</td>
<td><a href="/wiki/North_York" title="North York">North York</a></td>
<td><a href="/wiki/Parkwoods" title="Parkwoods">Parkwoods</a>
</td></tr>
<tr>
<td>M4A</td>
<td><a href="/wiki/North_York" title="North York">North York</a></td>
<td><a href="/wiki/Victoria_Village" title="Victoria Village">Victoria Village</a>
</td></tr>
<tr>
<td>M5A</td>
<td><a href="/wiki/Downtown_Toronto" title="Downtown Toronto">Downtown Toronto</a></td>
<td><a href="/wiki/Regent_Park" title="Regent Park">Harbourfront</a>
</td></tr>
<tr>
<td>M6A</td>
<td><a href="/wiki/North_York" title="North York">North York</a></td>
<td><a href="/wiki/Lawrence_Heights" title="Lawrence Heights">Lawrence Heights</a>
</td></tr>
<tr>
<td>M6A</td>
<td><a href="/wiki/North_York" title="North York">North Yor

Now, let's convert the table data into a pandas dataframe:

In [3]:
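# Helper functions for getting cell and row data.
# table_cell returns the text of every <td> in a <tr> as a list.

def table_cell(i):
    cells = i.find_all('td')
    row = []

    for cell in cells:
        # Prefer the link text when the cell wraps its value in an <a> tag
        if cell.a and cell.a.text:
            row.append(cell.a.text.strip())
            continue
        # get_text() is used because cell.string is None when a cell has several child nodes
        row.append(cell.get_text().strip())

    return row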

# table_row calls table_cell on every <tr> and keeps only the three-column rows

def table_row():    
    data = []  
    
    for tr in table.find_all('tr'):
        row = table_cell(tr)
        if len(row) != 3:
            continue
        data.append(row)        
    
    return data

In [4]:
# convert to pandas dataframe

data = table_row()
columns = ['Postcode', 'Borough', 'Neighbourhood']
df = pd.DataFrame(data, columns=columns)
df.head()

| | Postcode | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M1A | Not assigned | Not assigned |
| 1 | M2A | Not assigned | Not assigned |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Harbourfront |


### Data Cleaning Steps



In [5]:
# Drop the rows whose borough is "Not assigned"

df1 = df[df.Borough != 'Not assigned']
df1 = df1.sort_values(by=['Postcode', 'Borough']).reset_index(drop=True)
df1.head()

| | Postcode | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M1B | Scarborough | Rouge |
| 1 | M1B | Scarborough | Malvern |
| 2 | M1C | Scarborough | Highland Creek |
| 3 | M1C | Scarborough | Rouge Hill |
| 4 | M1C | Scarborough | Port Union |


In [6]:
# Consolidate the neighbourhoods that share a postcode: more than one neighbourhood can exist in one postal code area.
# Rows with the same postcode are combined into one row, with the neighbourhoods separated by commas.
# If a cell has a borough but a "Not assigned" neighbourhood, then the neighbourhood will be the same as the borough.


df_clean = pd.DataFrame(df1['Postcode'].drop_duplicates())
df_clean['Borough'] = ''
df_clean['Neighbourhood'] = ''

df_clean = df_clean.reset_index(drop=True)
df1 = df1.reset_index(drop=True)

for i in df_clean.index:
    for j in df1.index:
        if df_clean.iloc[i, 0] == df1.iloc[j, 0]:
            df_clean.iloc[i, 1] = df1.iloc[j, 1]
            df_clean.iloc[i, 2] = df_clean.iloc[i, 2] + ',' + df1.iloc[j, 2]
            
for i in df_clean.index:
    s = df_clean.iloc[i, 2]
    if s[0] == ',':
        s = s[1:]
    df_clean.iloc[i, 2] = s
    
df_clean

| | Postcode | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M1B | Scarborough | Rouge,Malvern |
| 1 | M1C | Scarborough | Highland Creek,Rouge Hill,Port Union |
| 2 | M1E | Scarborough | Guildwood,Morningside,West Hill |
| 3 | M1G | Scarborough | Woburn |
| 4 | M1H | Scarborough | Cedarbrae |
| 5 | M1J | Scarborough | Scarborough Village |
| 6 | M1K | Scarborough | East Birchmount Park,Ionview,Kennedy Park |
| 7 | M1L | Scarborough | Clairlea,Golden Mile,Oakridge |
| 8 | M1M | Scarborough | Cliffcrest,Cliffside,Scarborough Village West |
| 9 | M1N | Scarborough | Birch Cliff,Cliffside West |
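
The nested loops compare every postcode against every row, which is fine for about a hundred postcodes but grows quadratically. As an alternative, the same consolidation could be sketched with `groupby`. The sample rows below are hypothetical stand-ins for `df1`, and the sketch also applies the "Not assigned" neighbourhood rule described in the cell's comments.

```python
import pandas as pd

# Hypothetical sample rows in the same shape as df1.
sample = pd.DataFrame({
    'Postcode': ['M1B', 'M1B', 'M1G', 'M7A'],
    'Borough': ['Scarborough', 'Scarborough', 'Scarborough', "Queen's Park"],
    'Neighbourhood': ['Rouge', 'Malvern', 'Woburn', 'Not assigned'],
})

# A borough with no assigned neighbourhood uses the borough name instead.
missing = sample['Neighbourhood'] == 'Not assigned'
sample.loc[missing, 'Neighbourhood'] = sample.loc[missing, 'Borough']

# One row per postcode, with the neighbourhoods joined by commas.
consolidated = (
    sample.groupby(['Postcode', 'Borough'], sort=False)['Neighbourhood']
          .agg(','.join)
          .reset_index()
)

print(consolidated)
```

Passing `sort=False` keeps the postcodes in their order of first appearance, so the result lines up with the input.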


In [7]:
# Print the number of rows of your dataframe

df_clean.shape

(103, 3)

## Problem 2 - Get Coordinates

Now that you have built a dataframe of the postal code of each neighbourhood along with the borough name and neighbourhood name, the next step is to get the latitude and longitude coordinates of each neighbourhood so that you can work with the Foursquare location data.

Use the provided Geospatial_Coordinates.csv file to get the coordinates:

In [8]:
# add placeholder coordinate columns, then read the file into the coord dataframe

df_clean['Latitude'] = 0.0
df_clean['Longitude'] = 0.0

coord = pd.read_csv('https://cocl.us/Geospatial_data')

In [9]:
# Combine the coordinates dataframe with the one that contains the borough names

for i in df_clean.index:
    for j in coord.index:
        if df_clean.iloc[i, 0] == coord.iloc[j, 0]:
            df_clean.iloc[i, 3] = coord.iloc[j, 1]
            df_clean.iloc[i, 4] = coord.iloc[j, 2]

#checking the results            
df_clean.head()


| | Postcode | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M1B | Scarborough | Rouge,Malvern | 43.8067 | -79.1944 |
| 1 | M1C | Scarborough | Highland Creek,Rouge Hill,Port Union | 43.7845 | -79.1605 |
| 2 | M1E | Scarborough | Guildwood,Morningside,West Hill | 43.7636 | -79.1887 |
| 3 | M1G | Scarborough | Woburn | 43.771 | -79.2169 |
| 4 | M1H | Scarborough | Cedarbrae | 43.7731 | -79.2395 |
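
The lookup above is effectively a join on the postcode, so it could also be written with `DataFrame.merge`. This is a minimal sketch on made-up frames; in particular, the column name `Postal Code` for the coordinates file is an assumption, so check `coord.columns` before reusing it.

```python
import pandas as pd

areas = pd.DataFrame({
    'Postcode': ['M1B', 'M1C'],
    'Borough': ['Scarborough', 'Scarborough'],
})

# Assumed layout of the coordinates file (column names are a guess).
coords = pd.DataFrame({
    'Postal Code': ['M1C', 'M1B', 'M9Z'],
    'Latitude': [43.7845, 43.8067, 0.0],
    'Longitude': [-79.1605, -79.1944, 0.0],
})

# A left join keeps every row of `areas` in its original order.
merged = (
    areas.merge(coords, left_on='Postcode', right_on='Postal Code', how='left')
         .drop(columns='Postal Code')
)

print(merged)
```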


## Problem 3 - Explore and cluster the neighbourhoods in Toronto

Explore and cluster the neighbourhoods in Toronto. You can decide to work with only the boroughs that contain the word Toronto and then replicate the analysis we did on the New York City data.



### 3.1 Select only the neighbourhoods of Downtown Toronto

Select the rows whose borough is "Downtown Toronto":

In [10]:
Toronto_df = df_clean[df_clean['Borough'] == 'Downtown Toronto'].reset_index(drop=True)
Toronto_df.head()

| | Postcode | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M4W | Downtown Toronto | Rosedale | 43.6796 | -79.3775 |
| 1 | M4X | Downtown Toronto | Cabbagetown,St. James Town | 43.668 | -79.3677 |
| 2 | M4Y | Downtown Toronto | Church and Wellesley | 43.6659 | -79.3832 |
| 3 | M5A | Downtown Toronto | Harbourfront | 43.6543 | -79.3606 |
| 4 | M5B | Downtown Toronto | Ryerson,Garden District | 43.6572 | -79.3789 |
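
If you would rather keep every borough whose name contains the word Toronto, as the problem statement allows, a substring filter is one option. The rows below are hypothetical and only illustrate the filter.

```python
import pandas as pd

# Hypothetical sample with a mix of borough names.
sample = pd.DataFrame({
    'Postcode': ['M5A', 'M1B', 'M4E', 'M3A'],
    'Borough': ['Downtown Toronto', 'Scarborough', 'East Toronto', 'North York'],
})

# str.contains matches the substring anywhere in the borough name.
toronto_only = sample[sample['Borough'].str.contains('Toronto')].reset_index(drop=True)

print(toronto_only)
```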


In [11]:
# Check the coordinates for Toronto
address = 'Toronto, Canada'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


In [12]:
# Folium map of Downtown Toronto
Map_toronto = folium.Map(location=[latitude, longitude], zoom_start=13)

# Add markers to the map
for lat, lng, label in zip(Toronto_df['Latitude'], Toronto_df['Longitude'], Toronto_df['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(Map_toronto)  
    
Map_toronto

### 3.2 Utilizing the Foursquare API to get top 100 venues in Downtown Toronto

In [13]:
# set Foursquare credentials
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20190323' # Foursquare API version

Leverage the function from the lab to get the top 100 venues for each Downtown Toronto neighbourhood within a radius of 500 m:

In [14]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

Now run the above function on each neighborhood and create a new dataframe called Downtown_venues.


In [15]:
LIMIT = 100 # limit of number of venues returned by Foursquare API

Downtown_venues = getNearbyVenues(names=Toronto_df['Neighbourhood'],
                                   latitudes=Toronto_df['Latitude'],
                                   longitudes=Toronto_df['Longitude']
                                  )

Rosedale
Cabbagetown,St. James Town
Church and Wellesley
Harbourfront
Ryerson,Garden District
St. James Town
Berczy Park
Central Bay Street
Adelaide,King,Richmond
Harbourfront East,Toronto Islands,Union Station
Design Exchange,Toronto Dominion Centre
Commerce Court,Victoria Hotel
Harbord,University of Toronto
Chinatown,Grange Park,Kensington Market
CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara
Stn A PO Boxes 25 The Esplanade
First Canadian Place,Underground city
Christie
Queen's Park
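
The list comprehension in `getNearbyVenues` assumes that every response contains at least one group and that every venue has at least one category. A more defensive version of that parsing step might look like the sketch below. `extract_venues` is a hypothetical helper, and `fake_payload` is a made-up response in the shape the function reads (`response`, `groups`, `items`, `venue`), so no network call is needed to try it.

```python
def extract_venues(payload, name, lat, lng):
    """Flatten one explore response into a list of venue tuples."""
    groups = payload.get('response', {}).get('groups', [])
    if not groups:
        return []  # nothing found near this neighbourhood

    rows = []
    for item in groups[0].get('items', []):
        venue = item['venue']
        categories = venue.get('categories') or []
        # Fall back to None when a venue has no category.
        category = categories[0]['name'] if categories else None
        rows.append((
            name, lat, lng,
            venue['name'],
            venue['location']['lat'],
            venue['location']['lng'],
            category,
        ))
    return rows


# Made-up payload for illustration only.
fake_payload = {
    'response': {'groups': [{'items': [
        {'venue': {'name': 'Sample Cafe',
                   'location': {'lat': 43.68, 'lng': -79.38},
                   'categories': [{'name': 'Coffee Shop'}]}},
        {'venue': {'name': 'Unlabelled Spot',
                   'location': {'lat': 43.67, 'lng': -79.37},
                   'categories': []}},
    ]}]}
}

venues = extract_venues(fake_payload, 'Rosedale', 43.6796, -79.3775)
empty = extract_venues({'response': {}}, 'Rosedale', 43.6796, -79.3775)
print(venues)
```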


In [16]:
#checking the size of venues dataframe
Downtown_venues.shape

(1289, 7)

In [17]:
# Check how many unique categories of venues
print('There are {} unique categories.'.format(len(Downtown_venues['Venue Category'].unique())))

There are 205 unique categories.


### 3.3 Analyze each neighbourhood

In [18]:
# one hot encoding
Toronto_onehot = pd.get_dummies(Downtown_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
Toronto_onehot['Neighbourhood'] = Downtown_venues['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = [Toronto_onehot.columns[-1]] + list(Toronto_onehot.columns[:-1])
Toronto_onehot = Toronto_onehot[fixed_columns]

Toronto_onehot.head()

Unnamed: 0,Neighbourhood,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Rosedale,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Rosedale,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Rosedale,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Rosedale,0,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0
4,"Cabbagetown,St. James Town",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [19]:
# check the dataframe size
Toronto_onehot.shape

(1289, 206)

Now let's group the rows by neighbourhood and take the mean of the frequency of occurrence of each category:

In [20]:
Toronto_grouped = Toronto_onehot.groupby('Neighbourhood').mean().reset_index()
Toronto_grouped

Unnamed: 0,Neighbourhood,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,"Adelaide,King,Richmond",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,...,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0
1,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"CN Tower,Bathurst Quay,Island airport,Harbourf...",0.0,0.058824,0.058824,0.058824,0.117647,0.176471,0.117647,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Cabbagetown,St. James Town",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012821,0.0,...,0.0,0.0,0.012821,0.0,0.0,0.012821,0.0,0.0,0.0,0.012821
5,"Chinatown,Grange Park,Kensington Market",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.049383,0.0,0.061728,0.012346,0.0,0.0,0.0,0.0
6,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Church and Wellesley,0.012048,0.0,0.0,0.0,0.0,0.0,0.0,0.012048,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.012048,0.012048,0.0,0.024096
8,"Commerce Court,Victoria Hotel",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,...,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0
9,"Design Exchange,Toronto Dominion Centre",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,...,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0


In [21]:
# check the grouped dataframe size
Toronto_grouped.shape

(19, 206)
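
To see why the mean of the one-hot columns is a frequency, consider a toy example with made-up neighbourhoods `A` and `B`. Each venue contributes a 1 in exactly one category column, so averaging within a neighbourhood gives the share of its venues that fall in each category.

```python
import pandas as pd

# Hypothetical venues: four in neighbourhood A, two in neighbourhood B.
venues = pd.DataFrame({
    'Neighbourhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Cafe', 'Cafe', 'Park', 'Bar', 'Park', 'Park'],
})

# One column per category; dtype=float gives 1.0/0.0 indicators.
onehot = pd.get_dummies(venues['Venue Category'], dtype=float)
onehot.insert(0, 'Neighbourhood', venues['Neighbourhood'])

# The mean of the indicators is the share of venues in each category.
grouped = onehot.groupby('Neighbourhood').mean()

print(grouped)
```

In this toy data half of the venues in `A` are cafes, so its `Cafe` value is 0.5, and every row sums to 1.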

Print out each neighbourhood along with the top 5 most common venues in it:

In [22]:
num_top_venues = 5

for neighbourhood in Toronto_grouped['Neighbourhood']:
    print("----"+neighbourhood+"----")
    temp = Toronto_grouped[Toronto_grouped['Neighbourhood'] == neighbourhood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide,King,Richmond----
             venue  freq
0      Coffee Shop  0.07
1       Restaurant  0.06
2             Café  0.05
3           Bakery  0.03
4  Thai Restaurant  0.03


----Berczy Park----
          venue  freq
0   Coffee Shop  0.09
1  Cocktail Bar  0.05
2        Bakery  0.04
3          Café  0.04
4   Cheese Shop  0.04


----CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara----
              venue  freq
0   Airport Service  0.18
1    Airport Lounge  0.12
2  Airport Terminal  0.12
3   Harbor / Marina  0.06
4           Airport  0.06


----Cabbagetown,St. James Town----
                venue  freq
0          Restaurant  0.07
1         Coffee Shop  0.07
2                Café  0.05
3  Italian Restaurant  0.05
4              Bakery  0.05


----Central Bay Street----
                 venue  freq
0          Coffee Shop  0.18
1   Italian Restaurant  0.05
2         Burger Joint  0.04
3      Thai Restaurant  0.04
4  Japanese Restaur

Convert the results to Pandas dataframe:

In [23]:
# function to sort the venues in descending order:

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
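
For a purely numeric row of frequencies, `Series.nlargest` gives the same kind of ranking in one call. The values below are hypothetical, and this is only an alternative sketch, not a replacement for the function above.

```python
import pandas as pd

# Hypothetical category frequencies for one neighbourhood.
freqs = pd.Series({
    'Coffee Shop': 0.18,
    'Park': 0.02,
    'Italian Restaurant': 0.05,
    'Bar': 0.0,
})

# nlargest returns the entries sorted from highest to lowest frequency.
top_two = freqs.nlargest(2).index.tolist()

print(top_two)
```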

In [24]:
#create the new dataframe and display the top 10 venues for each neighborhood:

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighbourhood'] = Toronto_grouped['Neighbourhood']

for ind in np.arange(Toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide,King,Richmond",Coffee Shop,Restaurant,Café,Bar,Bakery,Thai Restaurant,Breakfast Spot,Asian Restaurant,Clothing Store,Concert Hall
1,Berczy Park,Coffee Shop,Cocktail Bar,Beer Bar,Seafood Restaurant,Farmers Market,Bakery,Cheese Shop,Café,Restaurant,Bistro
2,"CN Tower,Bathurst Quay,Island airport,Harbourf...",Airport Service,Airport Lounge,Airport Terminal,Harbor / Marina,Boat or Ferry,Plane,Rental Car Location,Sculpture Garden,Boutique,Coffee Shop
3,"Cabbagetown,St. James Town",Coffee Shop,Restaurant,Italian Restaurant,Pub,Café,Bakery,Pizza Place,Chinese Restaurant,Japanese Restaurant,Jewelry Store
4,Central Bay Street,Coffee Shop,Italian Restaurant,Burger Joint,Sandwich Place,Thai Restaurant,Japanese Restaurant,Salad Place,Café,Department Store,Ice Cream Shop
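
The column labels above rely on a three-item suffix list plus a fallback to "th", which is correct for the first ten positions but would produce labels such as "21th" for longer lists. If you ever raise `num_top_venues` past 20, a small helper along these lines could build the labels instead. `ordinal` is a hypothetical name, not part of the lab code.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    # 11, 12 and 13 (and 111, 112, ...) take 'th' despite ending in 1, 2, 3.
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


columns = ['Neighbourhood'] + [
    '{} Most Common Venue'.format(ordinal(i)) for i in range(1, 4)
]

print(columns)
```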


### 3.4 Cluster the neighbourhoods into 5 clusters using k-means

In [25]:
# set number of clusters
kclusters = 5

Toronto_grouped_clustering = Toronto_grouped.drop('Neighbourhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 


array([1, 1, 0, 1, 1, 3, 4, 1, 1, 1], dtype=int32)
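
To get a feel for what the labels mean, here is a toy run on made-up frequency vectors. Rows with similar category mixes end up with the same label, although the label numbers themselves are arbitrary. Setting `n_init` explicitly is an assumption made here to keep the behaviour the same across scikit-learn versions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical frequency rows: two "cafe-heavy" and two "park-heavy" areas.
X = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.0, 0.1, 0.9],
    [0.0, 0.2, 0.8],
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_

print(labels)
```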

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [26]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Labels', kmeans.labels_)

Toronto_merged = Toronto_df

# Join the cluster labels and top venues onto Toronto_df, which holds the latitude/longitude of each neighbourhood

Toronto_merged = Toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')

Toronto_merged.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude,Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4W,Downtown Toronto,Rosedale,43.6796,-79.3775,2,Park,Playground,Trail,Dessert Shop,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant
1,M4X,Downtown Toronto,"Cabbagetown,St. James Town",43.668,-79.3677,1,Coffee Shop,Restaurant,Italian Restaurant,Pub,Café,Bakery,Pizza Place,Chinese Restaurant,Japanese Restaurant,Jewelry Store
2,M4Y,Downtown Toronto,Church and Wellesley,43.6659,-79.3832,1,Japanese Restaurant,Coffee Shop,Gay Bar,Sushi Restaurant,Restaurant,Mediterranean Restaurant,Hotel,Café,Yoga Studio,Men's Store
3,M5A,Downtown Toronto,Harbourfront,43.6543,-79.3606,1,Coffee Shop,Park,Pub,Mexican Restaurant,Breakfast Spot,Restaurant,Bakery,Café,Theater,Distribution Center
4,M5B,Downtown Toronto,"Ryerson,Garden District",43.6572,-79.3789,1,Coffee Shop,Clothing Store,Bubble Tea Shop,Japanese Restaurant,Cosmetics Shop,Café,Middle Eastern Restaurant,Ramen Restaurant,Thai Restaurant,Diner


Visualize the resulting clusters on map:

In [27]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=13)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(Toronto_merged['Latitude'], Toronto_merged['Longitude'], Toronto_merged['Neighbourhood'], Toronto_merged['Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=9,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

(I've attached the screenshot because the map is not rendered on GitHub)

<img src="toronto_map.png" >
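
One detail of the map code: it looks up colours with `rainbow[cluster-1]`, so label 0 wraps around to the last colour in the list. The colours are still distinct, but if you prefer a direct mapping from label to colour, a palette could be built and indexed as in this sketch (the variable name `palette` is an assumption).

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 5

# One evenly spaced colour from the rainbow colormap per cluster label.
palette = [colors.to_hex(c) for c in cm.rainbow(np.linspace(0, 1, kclusters))]

# Label k maps directly to palette[k].
for label in range(kclusters):
    print(label, palette[label])
```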

### 3.5 Examine the clusters

In [28]:
# Cluster 1
Toronto_merged.loc[Toronto_merged['Labels'] == 0, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,Downtown Toronto,0,Airport Service,Airport Lounge,Airport Terminal,Harbor / Marina,Boat or Ferry,Plane,Rental Car Location,Sculpture Garden,Boutique,Coffee Shop


In [29]:
# Cluster 2
Toronto_merged.loc[Toronto_merged['Labels'] == 1, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Downtown Toronto,1,Coffee Shop,Restaurant,Italian Restaurant,Pub,Café,Bakery,Pizza Place,Chinese Restaurant,Japanese Restaurant,Jewelry Store
2,Downtown Toronto,1,Japanese Restaurant,Coffee Shop,Gay Bar,Sushi Restaurant,Restaurant,Mediterranean Restaurant,Hotel,Café,Yoga Studio,Men's Store
3,Downtown Toronto,1,Coffee Shop,Park,Pub,Mexican Restaurant,Breakfast Spot,Restaurant,Bakery,Café,Theater,Distribution Center
4,Downtown Toronto,1,Coffee Shop,Clothing Store,Bubble Tea Shop,Japanese Restaurant,Cosmetics Shop,Café,Middle Eastern Restaurant,Ramen Restaurant,Thai Restaurant,Diner
5,Downtown Toronto,1,Coffee Shop,Restaurant,Café,Hotel,Italian Restaurant,Breakfast Spot,Beer Bar,Bakery,Clothing Store,Cosmetics Shop
6,Downtown Toronto,1,Coffee Shop,Cocktail Bar,Beer Bar,Seafood Restaurant,Farmers Market,Bakery,Cheese Shop,Café,Restaurant,Bistro
7,Downtown Toronto,1,Coffee Shop,Italian Restaurant,Burger Joint,Sandwich Place,Thai Restaurant,Japanese Restaurant,Salad Place,Café,Department Store,Ice Cream Shop
8,Downtown Toronto,1,Coffee Shop,Restaurant,Café,Bar,Bakery,Thai Restaurant,Breakfast Spot,Asian Restaurant,Clothing Store,Concert Hall
9,Downtown Toronto,1,Coffee Shop,Aquarium,Café,Hotel,Italian Restaurant,Brewery,Restaurant,Sporting Goods Shop,Fried Chicken Joint,Scenic Lookout
10,Downtown Toronto,1,Coffee Shop,Café,Hotel,Restaurant,Gastropub,Seafood Restaurant,American Restaurant,Italian Restaurant,Bar,Japanese Restaurant


In [30]:
# Cluster 3
Toronto_merged.loc[Toronto_merged['Labels'] == 2, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,2,Park,Playground,Trail,Dessert Shop,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant


In [31]:
# Cluster 4
Toronto_merged.loc[Toronto_merged['Labels'] == 3, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,Downtown Toronto,3,Café,Japanese Restaurant,Bar,Bakery,Restaurant,Bookstore,Yoga Studio,Beer Store,Italian Restaurant,Comfort Food Restaurant
13,Downtown Toronto,3,Bar,Vietnamese Restaurant,Café,Coffee Shop,Vegetarian / Vegan Restaurant,Mexican Restaurant,Dumpling Restaurant,Park,Burger Joint,Farmers Market


In [32]:
# Cluster 5
Toronto_merged.loc[Toronto_merged['Labels'] == 4, Toronto_merged.columns[[1] + list(range(5, Toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
17,Downtown Toronto,4,Grocery Store,Café,Park,Gas Station,Diner,Restaurant,Italian Restaurant,Baby Store,Candy Store,Coffee Shop


### 3.6 Conclusion:

As seen from the above dataframes corresponding to each cluster label, the following conclusions can be made:

1. Cluster 1: the most common venue type is Airport Service, at this unusual time (COVID-19).
2. Cluster 2: the most common venue type is Coffee Shop.
3. Cluster 3: the most common venue type is Park.
4. Cluster 4: the most common venue type is Café or Bar.
5. Cluster 5: the most common venue type is Grocery Store.
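
A summary like this can also be tabulated rather than read off the tables by eye. The sketch below counts how often each category appears as the first most common venue within each cluster label. The rows are a hypothetical excerpt, not the full `Toronto_merged` dataframe.

```python
import pandas as pd

# Hypothetical excerpt of the merged dataframe.
sample = pd.DataFrame({
    'Labels': [1, 1, 1, 0, 3, 3],
    '1st Most Common Venue': [
        'Coffee Shop', 'Coffee Shop', 'Japanese Restaurant',
        'Airport Service', 'Café', 'Bar',
    ],
})

# Rows are cluster labels, columns are categories, cells are counts.
counts = pd.crosstab(sample['Labels'], sample['1st Most Common Venue'])

print(counts)
```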
