<h1 align=center><font size = 5>Segmenting and Clustering Neighborhoods in Toronto City</font></h1>

## Introduction

In this assignment, we are required to explore, segment, and cluster the neighborhoods in the city of Toronto using their postal codes.

## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

1. <a href="#ref1">Download and Explore Toronto neighborhood Dataset</a>

2. <a href="#ref2">Download and explore geographical coordinates of each postal code</a>

3. <a href="#ref3">Explore and cluster the neighborhoods in Toronto</a>  
</font>
</div>

Before we get the data and start exploring it, let's install and import all the dependencies that we will need.

In [1]:
#Install necessary modules
!pip install geopy
!pip install folium



In [2]:
#General libraries
import pandas as pd
import numpy as np

# Libraries to scrape Wikipedia page
import urllib.request
from bs4 import BeautifulSoup

# Converts an address into latitude and longitude values
from geopy.geocoders import Nominatim

#Map rendering library
import folium

#Library to handle requests and JSON files
import requests
from pandas import json_normalize # pandas.io.json.json_normalize is deprecated since pandas 1.0
import json

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans
print('Libraries imported.')

Libraries imported.


<a id="ref1"></a>
## 1. Download and Explore Toronto neighborhood Dataset

A Wikipedia page has all the information we need to explore and cluster the neighborhoods in Toronto. We will scrape the page, clean the data, and read it into a pandas dataframe so that it is in a structured format.

Let's retrieve the Wikipedia page that lists the Canadian postal codes beginning with M.

In [3]:
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
page = urllib.request.urlopen(url)

Let's instantiate BeautifulSoup to find the required postal code table from the page.

In [4]:
soup = BeautifulSoup(page,'lxml')
canada_table = soup.find('table',class_='wikitable sortable')

#### Read the data into a *dictionary*

In [5]:
postalcode=[]
borough=[]
neighborhood=[]
for row in canada_table.findAll('tr'):
    cells=row.findAll('td')
    if len(cells)==3:
        postalcode.append(cells[0].find(text=True))
        borough.append(cells[1].find(text=True))
        neighborhood.append(cells[2].find(text=True))
canada_dict = {'Postal Code':postalcode,'Borough':borough,'Neighborhood':neighborhood}
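
The loop above keeps only rows with exactly three `<td>` cells, which skips the header row because it uses `<th>` cells. A minimal sketch of the same filter on an inline HTML snippet follows. It deliberately swaps BeautifulSoup for the standard library's `html.parser` so that it runs offline; the `RowCollector` class and the sample markup are hypothetical and only illustrate the idea.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the text of <td> cells, one list per <tr> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []       # completed rows
        self._row = None     # cells of the row currently being parsed
        self._in_td = False  # True while inside a <td> element

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag == 'td' and self._row is not None:
            self._in_td = True
            self._row.append('')

    def handle_endtag(self, tag):
        if tag == 'td':
            self._in_td = False
        elif tag == 'tr' and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_td:
            self._row[-1] += data


# Hand-written sample markup, not the real Wikipedia page.
sample_html = """
<table>
  <tr><th>Postal Code</th><th>Borough</th><th>Neighborhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
</table>
"""

collector = RowCollector()
collector.feed(sample_html)

# Keep only rows with exactly three data cells; the header row has none.
data_rows = [[cell.strip() for cell in row] for row in collector.rows if len(row) == 3]
```

Here `data_rows` holds the two data rows and drops the header, which is the same effect as the `len(cells)==3` check in the cell above.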

#### Transform the data into a *pandas* dataframe

In [6]:
canada_df = pd.DataFrame(canada_dict)
# Ignore cells with a borough that is Not assigned
canada_df['Borough'] = canada_df['Borough'].replace('Not assigned', np.nan)
canada_df = canada_df.dropna().reset_index(drop=True) # Drop those rows and reset the index
canada_df['Neighborhood'] = canada_df['Neighborhood'].str.strip() # Remove surrounding whitespace in Neighborhood cells

#If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough.
not_assigned = canada_df['Neighborhood'] == 'Not assigned'
canada_df.loc[not_assigned, 'Neighborhood'] = canada_df.loc[not_assigned, 'Borough']

#More than one neighborhood can exist in one postal code area; let's combine them into one row with the neighborhoods separated by commas.
canada_grouped = canada_df.groupby(['Postal Code','Borough'])['Neighborhood'].apply(', '.join).reset_index()
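
To see the three cleaning rules in isolation, here is a minimal sketch on a few hand-written rows. The `toy` dataframe is sample data, not the scraped table, and the grouping uses a plain `', '.join`, which is one possible way to combine the names.

```python
import pandas as pd

# Toy rows standing in for the scraped table (sample values, not the full dataset).
toy = pd.DataFrame({
    'Postal Code': ['M1A', 'M5A', 'M5A', 'M7A'],
    'Borough': ['Not assigned', 'Downtown Toronto', 'Downtown Toronto', "Queen's Park"],
    'Neighborhood': ['Not assigned', 'Harbourfront', 'Regent Park', 'Not assigned'],
})

# 1. Drop rows whose borough is not assigned.
toy = toy[toy['Borough'] != 'Not assigned'].reset_index(drop=True)

# 2. Where the neighborhood is not assigned, reuse the borough name.
mask = toy['Neighborhood'] == 'Not assigned'
toy.loc[mask, 'Neighborhood'] = toy.loc[mask, 'Borough']

# 3. Combine neighborhoods that share a postal code into one comma-separated row.
toy_grouped = (
    toy.groupby(['Postal Code', 'Borough'])['Neighborhood']
       .apply(', '.join)
       .reset_index()
)
```

The result has one row per postal code: `M5A` carries both of its neighborhoods, and `M7A` takes its borough name as its neighborhood.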

Let's examine the dataframe

In [7]:
canada_grouped.head()

,Postal Code,Borough,Neighborhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


Let's find out how many boroughs and neighborhoods there are. Each row (one postal code area) is counted as one neighborhood.

In [8]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(canada_grouped['Borough'].unique()),
        canada_grouped.shape[0]
    )
)

The dataframe has 11 boroughs and 103 neighborhoods.


Let's find out how many rows and columns there are.

In [9]:
canada_grouped.shape

(103, 3)

<a id="ref2"></a>
## 2. Download and explore geographical coordinates of each postal code

Now that we have a dataframe with the postal code, borough, and neighborhood names, we need the latitude and longitude of each postal code area in order to use the Foursquare location data.

Let's load the Geospatial_Coordinates.csv file.

In [10]:
latlng_df = pd.read_csv('Geospatial_Coordinates.csv')
print(latlng_df.shape)
latlng_df.head()

(103, 3)


,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Let's add the geographical coordinates to the canada dataframe.

In [11]:
canada_merged = canada_grouped.join(latlng_df.set_index('Postal Code'),on='Postal Code')

Size of the dataframe.

In [12]:
canada_merged.shape

(103, 5)
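
The shape alone does not show whether every postal code found matching coordinates, because a left join keeps unmatched rows and fills them with `NaN`. A minimal sketch of one way to check this uses the `validate` and `indicator` options of `DataFrame.merge`. The toy frames and the postal code `M9Z` are made up for illustration.

```python
import pandas as pd

# Hypothetical toy frames; 'M9Z' deliberately has no coordinates.
areas = pd.DataFrame({'Postal Code': ['M1B', 'M1C', 'M9Z'],
                      'Borough': ['Scarborough', 'Scarborough', 'Made-up Borough']})
coords = pd.DataFrame({'Postal Code': ['M1B', 'M1C'],
                       'Latitude': [43.806686, 43.784535],
                       'Longitude': [-79.194353, -79.160497]})

# validate= raises MergeError if a postal code is duplicated on either side;
# indicator= adds a '_merge' column recording where each row was found.
checked = areas.merge(coords, on='Postal Code', how='left',
                      validate='one_to_one', indicator=True)

# Postal codes with no matching coordinates.
unmatched = checked.loc[checked['_merge'] == 'left_only', 'Postal Code'].tolist()
```

An empty `unmatched` list would confirm that every postal code area received a latitude and longitude.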

Let's examine the dataframe.

In [13]:
canada_merged.head()

,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


<a id="ref3"></a>
## 3. Explore and cluster the neighborhoods in Toronto

Let's work with boroughs that contain the word Toronto.

In [14]:
toronto_df = canada_merged[canada_merged['Borough'].str.contains('Toronto')].reset_index(drop=True)

Let's find out the size of the dataframe.

In [15]:
toronto_df.shape

(39, 5)

Let's examine the dataframe

In [16]:
toronto_df.head()

,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M4E,East Toronto,The Beaches,43.676357,-79.293031
1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188
2,M4L,East Toronto,"The Beaches West, India Bazaar",43.668999,-79.315572
3,M4M,East Toronto,Studio District,43.659526,-79.340923
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879


#### Use the geopy library to get the latitude and longitude values of Toronto City.

In [17]:
address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.
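
geopy's `geocode` returns `None` when an address cannot be resolved, in which case `location.latitude` would raise an `AttributeError`. A minimal sketch of a guard follows. The helper `geocode_or_default` and the offline stub `fake_geocode` are hypothetical names introduced only for this illustration; in the notebook, `geolocator.geocode` would be passed in instead of the stub.

```python
def geocode_or_default(geocode, address, default):
    """Return (lat, lon) from a geocode callable, or `default` when nothing is found.

    `geocode` is any callable that returns an object with .latitude and
    .longitude attributes, or None when the address cannot be resolved.
    """
    location = geocode(address)
    if location is None:
        return default
    return location.latitude, location.longitude


# Stand-in for geolocator.geocode so the sketch runs offline (hypothetical stub).
class _FakeLocation:
    latitude, longitude = 43.653963, -79.387207


def fake_geocode(address):
    return _FakeLocation() if address == 'Toronto, Ontario' else None


found = geocode_or_default(fake_geocode, 'Toronto, Ontario', default=(0.0, 0.0))
missing = geocode_or_default(fake_geocode, 'Nowhere', default=(0.0, 0.0))
```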


#### Create a map of Toronto with neighborhoods superimposed on top.

In [18]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, p_code, hood in zip(toronto_df['Latitude'], toronto_df['Longitude'], toronto_df['Postal Code'], toronto_df['Neighborhood']):
    # popup text: postal code followed by the neighborhood name(s)
    label = folium.Popup(str(p_code) + ': ' + hood.strip(), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)  
    
map_toronto

#### Define Foursquare Credentials and Version

In [19]:
# @hidden_cell
# Credentials redacted; substitute your own Foursquare API keys.
CLIENT_ID='<YOUR_CLIENT_ID>'
CLIENT_SECRET='<YOUR_CLIENT_SECRET>'
VERSION='20180604'
LIMIT=10
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: <YOUR_CLIENT_ID>
CLIENT_SECRET: <YOUR_CLIENT_SECRET>
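
API keys are generally safer outside the notebook. A minimal sketch of reading them from environment variables and masking them before printing follows. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are assumptions made for this example, not something Foursquare prescribes.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read credentials from environment variables (variable names are assumptions)."""
    try:
        return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        raise RuntimeError(f'Set the environment variable {missing} first') from None


def mask(secret, visible=4):
    """Show only the first few characters when printing a secret."""
    return secret[:visible] + '*' * max(len(secret) - visible, 0)


# Usage with a stand-in mapping instead of the real environment.
demo_env = {'FOURSQUARE_CLIENT_ID': 'abcd1234', 'FOURSQUARE_CLIENT_SECRET': 'wxyz9876'}
client_id, client_secret = load_foursquare_credentials(demo_env)
masked = mask(client_secret)
```

Calling `load_foursquare_credentials()` without arguments reads from the real process environment.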


#### Let's create a function to get the top 100 venues that are within a radius of 500 meters for all the neighborhoods in Toronto.

In [20]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, limit=100):
    # limit is a parameter so that it no longer shadows the global LIMIT
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name.strip())
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit
        )
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood Name', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
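
The function above indexes straight into `["response"]['groups'][0]['items']` and `['categories'][0]`, so a response without groups or a venue without categories would raise an error. A minimal sketch of a more tolerant flattening step follows. The helper `extract_venues`, the `'Uncategorized'` fallback label, and the hand-written payload are hypothetical; the payload shape is inferred only from the keys the notebook code accesses.

```python
def extract_venues(payload, name, lat, lng):
    """Flatten one explore-style response into rows, tolerating missing pieces."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Fall back to a label when a venue carries no category.
        category = categories[0]['name'] if categories else 'Uncategorized'
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows


# Hand-written payload mimicking the structure the function above indexes into.
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Glen Manor Ravine',
               'location': {'lat': 43.676821, 'lng': -79.293942},
               'categories': [{'name': 'Trail'}]}},
    {'venue': {'name': 'Unnamed Spot',
               'location': {'lat': 43.0, 'lng': -79.0},
               'categories': []}},
]}]}}

rows = extract_venues(sample, 'The Beaches', 43.676357, -79.293031)
empty = extract_venues({'response': {}}, 'The Beaches', 43.676357, -79.293031)
```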

#### Run the above function on each neighborhood and create a new dataframe called *toronto_venues*.

In [21]:
toronto_venues = getNearbyVenues(names=toronto_df['Neighborhood'],
                                   latitudes=toronto_df['Latitude'],
                                   longitudes=toronto_df['Longitude']
                                  )

The Beaches
The Danforth West,    Riverdale
The Beaches West,  India Bazaar
Studio District
Lawrence Park
Davisville North
North Toronto West
Davisville
Moore Park, Summerhill East
Deer Park,  Forest Hill SE,   Rathnelly,  South Hill, Summerhill West
Rosedale
Cabbagetown, St. James Town
Church and Wellesley
Harbourfront
Ryerson, Garden District
St. James Town
Berczy Park
Central Bay Street
Adelaide,  King, Richmond
Harbourfront East,  Toronto Islands,  Union Station
Design Exchange, Toronto Dominion Centre
Commerce Court, Victoria Hotel
Roselawn
Forest Hill North,  Forest Hill West
The Annex, North Midtown,  Yorkville
Harbord, University of Toronto
Chinatown,   Grange Park, Kensington Market
CN Tower,  Bathurst Quay,   Island airport, Harbourfront West,  King and Spadina,  Railway Lands,  South Niagara
Stn A PO Boxes 25 The Esplanade
First Canadian Place,  Underground city
Christie
Dovercourt Village,    Dufferin
Little Portugal,    Trinity
Brockton, Exhibition Place, Parkdale Village


#### Let's check the size of the resulting dataframe

In [22]:
print(toronto_venues.shape)
toronto_venues.head()

(1720, 7)


,Neighborhood Name,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,The Beaches,43.676357,-79.293031,Glen Manor Ravine,43.676821,-79.293942,Trail
1,The Beaches,43.676357,-79.293031,Glen Stewart Ravine,43.6763,-79.294784,Other Great Outdoors
2,The Beaches,43.676357,-79.293031,The Big Carrot Natural Food Market,43.678879,-79.297734,Health Food Store
3,The Beaches,43.676357,-79.293031,Grover Pub and Grub,43.679181,-79.297215,Pub
4,The Beaches,43.676357,-79.293031,Upper Beaches,43.680563,-79.292869,Neighborhood


Let's check how many venues were returned for each neighborhood

In [23]:
toronto_venues.groupby('Neighborhood Name').count()

Neighborhood Name,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
"Brockton, Exhibition Place, Parkdale Village",22,22,22,22,22,22
"Chinatown, Grange Park, Kensington Market",84,84,84,84,84,84
"Design Exchange, Toronto Dominion Centre",100,100,100,100,100,100
"Harbord, University of Toronto",36,36,36,36,36,36
"Ryerson, Garden District",100,100,100,100,100,100
"CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara",17,17,17,17,17,17
"Cabbagetown, St. James Town",49,49,49,49,49,49
"Deer Park, Forest Hill SE, Rathnelly, South Hill, Summerhill West",14,14,14,14,14,14
"High Park, The Junction South",24,24,24,24,24,24
"Moore Park, Summerhill East",4,4,4,4,4,4


#### Let's find out how many unique categories can be curated from all the returned venues

In [24]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 240 unique categories.


### Analyze Each Neighborhood

In [25]:
#one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']],prefix='',prefix_sep='',dtype=int).reset_index(drop=True)
toronto_onehot.insert(0,'Neighborhood Name',toronto_venues['Neighborhood Name']) #insert the column as a Series, not a one-column dataframe
toronto_onehot.head()

,Neighborhood Name,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,The Beaches,0,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0
1,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


And let's examine the new dataframe size.

In [26]:
toronto_onehot.shape

(1720, 241)

#### Next, let's group rows by neighborhood and take the mean of the frequency of occurrence of each category.

In [27]:
toronto_grouped = toronto_onehot.groupby('Neighborhood Name').mean().reset_index()
toronto_grouped

,Neighborhood Name,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,"Brockton, Exhibition Place, Parkdale Village",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Chinatown, Grange Park, Kensington Market",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.035714,0.0,0.059524,0.011905,0.0,0.0,0.0,0.0
2,"Design Exchange, Toronto Dominion Centre",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,...,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0
3,"Harbord, University of Toronto",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.027778
4,"Ryerson, Garden District",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,...,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0
5,"CN Tower, Bathurst Quay, Island airport,...",0.0,0.058824,0.058824,0.058824,0.117647,0.176471,0.117647,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,"Cabbagetown, St. James Town",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,"Deer Park, Forest Hill SE, Rathnelly, S...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,...,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0
8,"High Park, The Junction South",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,"Moore Park, Summerhill East",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


#### Let's confirm the new size

In [28]:
toronto_grouped.shape

(39, 241)
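
To make the one-hot and group-mean steps concrete, here is a minimal sketch on made-up data. The neighborhoods `A` and `B` and their venues are hypothetical. Because every venue row has a 1 in exactly one category column, the mean of each column within a neighborhood is the share of that neighborhood's venues in that category.

```python
import pandas as pd

# Toy venue list: neighborhood A has three venues, B has one (made-up data).
venues = pd.DataFrame({
    'Neighborhood Name': ['A', 'A', 'A', 'B'],
    'Venue Category': ['Pub', 'Pub', 'Park', 'Park'],
})

# One column per category, 1 where the venue belongs to it.
onehot = pd.get_dummies(venues['Venue Category'], dtype=int)
onehot.insert(0, 'Neighborhood Name', venues['Neighborhood Name'])

# The mean of a 0/1 column is the share of venues in that category.
freq = onehot.groupby('Neighborhood Name').mean().reset_index()
```

In `freq`, neighborhood `A` has `Pub` at 2/3 and `Park` at 1/3, so each row of category frequencies sums to 1.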

#### Let's put that into a *pandas* dataframe

First, let's write a function to sort the venues in descending order.

In [29]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [43]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions beyond the third use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood Name']

for ind in np.arange(toronto_grouped.shape[0]):    
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Brockton, Exhibition Place, Parkdale Village",Café,Coffee Shop,Breakfast Spot,Gym,Intersection,Performing Arts Venue,Pet Store,Grocery Store,Nightclub,Climbing Gym
1,"Chinatown, Grange Park, Kensington Market",Bar,Vietnamese Restaurant,Café,Chinese Restaurant,Coffee Shop,Dumpling Restaurant,Vegetarian / Vegan Restaurant,Mexican Restaurant,Bakery,Dessert Shop
2,"Design Exchange, Toronto Dominion Centre",Coffee Shop,Café,Hotel,Restaurant,Italian Restaurant,Bar,Bakery,Gastropub,Steakhouse,American Restaurant
3,"Harbord, University of Toronto",Café,Bakery,Bookstore,Japanese Restaurant,Restaurant,Bar,Yoga Studio,Chinese Restaurant,Beer Store,Sandwich Place
4,"Ryerson, Garden District",Coffee Shop,Clothing Store,Japanese Restaurant,Café,Bakery,Cosmetics Shop,Italian Restaurant,Middle Eastern Restaurant,Bookstore,Restaurant
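
Note that a neighborhood with fewer than ten distinct categories still gets ten entries, because categories with a frequency of zero are sorted in after the real ones. "Moore Park, Summerhill East", for example, returned only 4 venues. A minimal sketch of a variant that keeps only categories that actually occur follows. The function `top_nonzero_categories` and the sample row are hypothetical and are not part of the notebook.

```python
import pandas as pd


def top_nonzero_categories(row, n):
    """Like return_most_common_venues, but ignore categories with zero frequency."""
    freqs = row.iloc[1:].astype(float)  # skip the neighborhood name in position 0
    freqs = freqs[freqs > 0].sort_values(ascending=False)
    return list(freqs.index[:n])


# One made-up row shaped like a row of toronto_grouped.
row = pd.Series({'Neighborhood Name': 'A', 'Bakery': 0.1, 'Gym': 0.0,
                 'Park': 0.6, 'Pub': 0.3})

top4 = top_nonzero_categories(row, 4)
```

Here `top4` contains only three categories, since `Gym` never occurs in the sample row. The caller would then need to pad shorter lists before writing them into fixed-width columns.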


### Cluster Neighborhoods

Run *k*-means to cluster the neighborhoods into 4 clusters.

In [44]:
# set number of clusters
kclusters = 4

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood Name')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=2).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1])
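
To illustrate what *k*-means does with the category-frequency rows, here is a minimal NumPy sketch of Lloyd's algorithm on four made-up frequency vectors. It is not scikit-learn's implementation: the function `lloyd_kmeans` is hypothetical, and it simply starts the centroids at the first `k` rows, whereas `KMeans` uses a more careful initialization by default.

```python
import numpy as np


def lloyd_kmeans(X, k, n_iter=20):
    """Plain Lloyd's algorithm; centroids start at the first k rows (a simplification)."""
    centroids = X[:k].astype(float).copy()
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    # Final assignment and within-cluster sum of squares (inertia).
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    inertia = dists.min(axis=1).sum()
    return labels, centroids, inertia


# Made-up frequency vectors: two rows dominated by the first category,
# two rows dominated by the third.
X = np.array([[0.9, 0.1, 0.0],
              [0.8, 0.2, 0.0],
              [0.0, 0.1, 0.9],
              [0.1, 0.0, 0.9]])

labels, centroids, inertia = lloyd_kmeans(X, k=2)
```

The two rows dominated by the first category end up in one cluster and the other two in the second. The inertia is the quantity that *k*-means tries to minimize.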

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [45]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = toronto_df

# merge toronto_df with neighborhoods_venues_sorted to add the cluster label and top venues for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head() # check the last columns!

,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4E,East Toronto,The Beaches,43.676357,-79.293031,0,Trail,Health Food Store,Other Great Outdoors,Neighborhood,Pub,Donut Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run
1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188,0,Greek Restaurant,Italian Restaurant,Coffee Shop,Bookstore,Furniture / Home Store,Ice Cream Shop,Brewery,Bubble Tea Shop,Café,Caribbean Restaurant
2,M4L,East Toronto,"The Beaches West, India Bazaar",43.668999,-79.315572,0,Pizza Place,Sandwich Place,Liquor Store,Ice Cream Shop,Pet Store,Coffee Shop,Pub,Movie Theater,Burrito Place,Burger Joint
3,M4M,East Toronto,Studio District,43.659526,-79.340923,0,Café,Coffee Shop,Italian Restaurant,Bakery,Brewery,Gastropub,American Restaurant,Bar,Clothing Store,Stationery Store
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879,0,Photography Studio,Construction & Landscaping,Park,Swim School,Bus Line,Diner,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store


Finally, let's visualize the resulting clusters.

In [46]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster,p_code in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels'],toronto_merged['Postal Code']):
    label = folium.Popup(str(p_code) + ':'+str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Examine Clusters

Now, let's examine each cluster and determine the venue categories that distinguish it.

#### Cluster 1 - Urban Core (Downtown)

In [47]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[2] + list(range(6, toronto_merged.shape[1]))]]

,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,The Beaches,Trail,Health Food Store,Other Great Outdoors,Neighborhood,Pub,Donut Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run
1,"The Danforth West, Riverdale",Greek Restaurant,Italian Restaurant,Coffee Shop,Bookstore,Furniture / Home Store,Ice Cream Shop,Brewery,Bubble Tea Shop,Café,Caribbean Restaurant
2,"The Beaches West, India Bazaar",Pizza Place,Sandwich Place,Liquor Store,Ice Cream Shop,Pet Store,Coffee Shop,Pub,Movie Theater,Burrito Place,Burger Joint
3,Studio District,Café,Coffee Shop,Italian Restaurant,Bakery,Brewery,Gastropub,American Restaurant,Bar,Clothing Store,Stationery Store
4,Lawrence Park,Photography Studio,Construction & Landscaping,Park,Swim School,Bus Line,Diner,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store
5,Davisville North,Gym,Breakfast Spot,Department Store,Park,Hotel,Sandwich Place,Food & Drink Shop,Donut Shop,Discount Store,Dog Run
6,North Toronto West,Clothing Store,Coffee Shop,Yoga Studio,Gym / Fitness Center,Shoe Store,Salon / Barbershop,Restaurant,Pet Store,Park,Mexican Restaurant
7,Davisville,Sandwich Place,Pizza Place,Dessert Shop,Sushi Restaurant,Italian Restaurant,Gym,Coffee Shop,Café,Indoor Play Area,Flower Shop
9,"Deer Park, Forest Hill SE, Rathnelly, S...",Coffee Shop,Pub,Pizza Place,Fried Chicken Joint,Restaurant,Liquor Store,Sports Bar,Supermarket,Sushi Restaurant,Light Rail Station
11,"Cabbagetown, St. James Town",Café,Restaurant,Coffee Shop,Park,Italian Restaurant,Pub,Bakery,Pizza Place,Pet Store,Outdoor Sculpture


#### Cluster 2 - Cul-de-sacs & Kids (Residential)

In [48]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[2] + list(range(6, toronto_merged.shape[1]))]]

,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,"Moore Park, Summerhill East",Tennis Court,Playground,Park,Restaurant,Doner Restaurant,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Dumpling Restaurant
10,Rosedale,Park,Playground,Trail,Dessert Shop,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant


#### Cluster 3 - Ethnic

In [49]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[2] + list(range(6, toronto_merged.shape[1]))]]

,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
23,"Forest Hill North, Forest Hill West",Mexican Restaurant,Trail,Jewelry Store,Sushi Restaurant,Yoga Studio,Diner,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store


#### Cluster 4 - Rural

In [50]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[2] + list(range(6, toronto_merged.shape[1]))]]

,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
22,Roselawn,Pool,Garden,Yoga Studio,Dessert Shop,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant


### Thank you!

This notebook was created by Janaranjani Kannan.