# Segmenting and Clustering Neighborhoods in Toronto

### Scrapping the table data from wikipedia page

In [1]:
from pandas.io.html import read_html
page = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

wikitables = read_html(page,  attrs={"class":"wikitable"})

print ("Extracted {num} wikitables".format(num=len(wikitables)))

Extracted 1 wikitables


### Loading the data in table into a dataframe named as "df"

In [2]:
df = wikitables[0]

### Looking the first 5 rows from the dataframe "df"

In [3]:
df.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1A,Not assigned,
1,M2A,Not assigned,
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


## Cleaning our data

### Dropping the rows where the Borough is "Not assigned"

In [4]:
df = df[~df.Borough.str.contains("Not assigned")]

### Looking the first five rows after dropping

In [5]:
df.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


### Resetting the index after cleaning

In [6]:
df = df.reset_index()
df = df[['Postal Code', 'Borough', 'Neighborhood']]

### Checking the null values 'NaN' in the dataframe

In [7]:
df.isnull().sum(axis = 0)

Postal Code     0
Borough         0
Neighborhood    0
dtype: int64

In [8]:
df.head(15)

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
5,M9A,Etobicoke,Islington Avenue
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"


### Shape of cleaned dataset

In [9]:
df.shape

(103, 3)

## Geospatial Data

### Importing pandas library

In [10]:
import pandas as pd

### Loading geospatial data into a new dataframe "df1"

In [11]:
df1 = pd.read_csv("Geospatial_Coordinates.csv")

### Looking into first five rows of "df1"

In [12]:
df1.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


### Shape of df1

In [13]:
df1.shape

(103, 3)

## Merging the nieghborhood data with geospacial data

In [14]:
df_merged = pd.merge(df, df1, on="Postal Code")

In [15]:
# Looking into first 15 rows of merged data
df_merged.head(15)

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
5,M9A,Etobicoke,Islington Avenue,43.667856,-79.532242
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
7,M3B,North York,Don Mills,43.745906,-79.352188
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937


### Considering only the data including "Toronto" in Borough

In [16]:
df_2 = df_merged[df_merged.Borough.str.contains('Toronto',case=False)]

### Looking into first 5 rows of df_2

In [19]:
df_2.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
15,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
19,M4E,East Toronto,The Beaches,43.676357,-79.293031


### Shape of df_2

In [20]:
df_2.shape

(39, 5)

## Looking into first 5 rows after resetting the Index

In [22]:
df_2 = df_2.reset_index()

In [23]:
df_2 = df_2.drop(columns="index")

In [24]:
df_2.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
4,M4E,East Toronto,The Beaches,43.676357,-79.293031


In [27]:
# Installing geopy library
!conda install -c conda-forge geopy --yes

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... 
  - anaconda/win-64::ca-certificates-2020.1.1-0, anaconda/win-64::certifi-2019.11.28-py37_0, anaconda/win-64::openssl-1.1.1d-he774522_4
  - anaconda/win-64::ca-certificates-2020.1.1-0, anaconda/win-64::openssl-1.1.1d-he774522_4, defaults/win-64::certifi-2019.11.28-py37_0
  - anaconda/win-64::ca-certificates-2020.1.1-0, anaconda/win-64::certifi-2019.11.28-py37_0, defaults/win-64::openssl-1.1.1d-he774522_4
  - anaconda/win-64::ca-certificates-2020.1.1-0, defaults/win-64::certifi-2019.11.28-py37_0, defaults/win-64::openssl-1.1.1d-he774522_4
  - anaconda/win-64::certifi-2019.11.28-py37_0, defaults/win-64::ca-certificates-2020.1.1-0, defaults/win-64::openssl-1.1.1d-he774522_4
  - defaults/win-64::ca-certificates-2020.1.1-0, defaults/win-64::certifi-2019.11.28-py37_0, defaults/win-64::openssl-1.1.1d-he774522_4
  - anaconda/win-64::certifi-2019.11.28-py37_0, anaconda/win-64::openssl-1.1

In [29]:
# installing folium library
!conda install -c conda-forge folium=0.5.0 --yes

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... 
  - anaconda/win-64::ca-certificates-2020.1.1-0, anaconda/win-64::openssl-1.1.1d-he774522_4
  - anaconda/win-64::ca-certificates-2020.1.1-0, defaults/win-64::openssl-1.1.1d-he774522_4
  - anaconda/win-64::openssl-1.1.1d-he774522_4, defaults/win-64::ca-certificates-2020.1.1-0
  - defaults/win-64::ca-certificates-2020.1.1-0, defaults/win-64::openssl-1.1.1d-he774522_4done

## Package Plan ##

  environment location: C:\Users\TADHAASTU\anaconda3

  added / updated specs:
    - folium=0.5.0


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    altair-4.1.0               |             py_1         614 KB  conda-forge
    branca-0.4

## Importing Required Libraries

In [30]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium

In [31]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(df_2['Borough'].unique()),
        df_2.shape[0]
    )
)

The dataframe has 4 boroughs and 39 neighborhoods.


### Use geopy library to get the latitude and longitude values of New York City.

In [32]:
address = 'Toronto City, TO'

geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of New York City are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of New York City are 43.6534817, -79.3839347.


### Create a map of New York with neighborhoods superimposed on top.

In [33]:
# create map of New York using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df_2['Latitude'], df_2['Longitude'], df_2['Borough'], df_2['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

In [34]:
CLIENT_ID = 'YFYR2YCSR4EFORQ2ZOXV2VPCMJR2VDS2ZS1B5UUZRQXY05AX' # your Foursquare ID
CLIENT_SECRET = 'J1ZE1ZL2OHJDMGLCGJIVZYBGAIIWQCC3XMABEHYZHXMLLBFA' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: YFYR2YCSR4EFORQ2ZOXV2VPCMJR2VDS2ZS1B5UUZRQXY05AX
CLIENT_SECRET:J1ZE1ZL2OHJDMGLCGJIVZYBGAIIWQCC3XMABEHYZHXMLLBFA


In [35]:
df_2.loc[0, 'Neighborhood']

'Regent Park, Harbourfront'

In [36]:
neighborhood_latitude = df_2.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = df_2.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = df_2.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Regent Park, Harbourfront are 43.6542599, -79.3606359.


In [37]:
LIMIT = 100
radius = 500
VERSION = '20180605'
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},&radius={}&limit={}'.format(
    CLIENT_ID,
    CLIENT_SECRET,
    VERSION,
    neighborhood_latitude,
    neighborhood_longitude,
    radius,
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=YFYR2YCSR4EFORQ2ZOXV2VPCMJR2VDS2ZS1B5UUZRQXY05AX&client_secret=J1ZE1ZL2OHJDMGLCGJIVZYBGAIIWQCC3XMABEHYZHXMLLBFA&v=20180605&ll=43.6542599,&radius=-79.3606359&limit=500'

In [38]:
results = requests.get(url).json()
results

{'meta': {'code': 400,
  'errorType': 'param_error',
  'errorDetail': 'll must be of the form XX.XX,YY.YY (received 43.6542599,)',
  'requestId': '5ec80e81bae9a2001b292d13'},
 'response': {}}

In [39]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

## Analyze Each Neighborhood

In [43]:
# one hot encoding
toronto_onehot = pd.get_dummies(df_2[['Neighborhood']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = df_2['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Neighborhood,Berczy Park,"Brockton, Parkdale Village, Exhibition Place",Business reply mail Processing Centre,"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",Central Bay Street,Christie,Church and Wellesley,"Commerce Court, Victoria Hotel",Davisville,Davisville North,"Dufferin, Dovercourt Village","First Canadian Place, Underground city",Forest Hill North & West,"Garden District, Ryerson","Harbourfront East, Union Station, Toronto Islands","High Park, The Junction South","India Bazaar, The Beaches West","Kensington Market, Chinatown, Grange Park",Lawrence Park,"Little Portugal, Trinity","Moore Park, Summerhill East",North Toronto West,"Parkdale, Roncesvalles","Queen's Park, Ontario Provincial Government","Regent Park, Harbourfront","Richmond, Adelaide, King",Rosedale,Roselawn,"Runnymede, Swansea",St. James Town,"St. James Town, Cabbagetown",Stn A PO Boxes,Studio District,"Summerhill West, Rathnelly, South Hill, Forest Hill SE, Deer Park","The Annex, North Midtown, Yorkville",The Beaches,"The Danforth West, Riverdale","Toronto Dominion Centre, Design Exchange","University of Toronto, Harbord"
0,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,"Queen's Park, Ontario Provincial Government",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,"Garden District, Ryerson",0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,St. James Town,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
4,The Beaches,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0


In [45]:
# Examining the new dataframe size.
toronto_onehot.shape

(39, 40)

###  Grouping rows by neighborhood and by taking the mean of the frequency of occurrence of each category

In [48]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped.head()

Unnamed: 0,Neighborhood,Berczy Park,"Brockton, Parkdale Village, Exhibition Place",Business reply mail Processing Centre,"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",Central Bay Street,Christie,Church and Wellesley,"Commerce Court, Victoria Hotel",Davisville,Davisville North,"Dufferin, Dovercourt Village","First Canadian Place, Underground city",Forest Hill North & West,"Garden District, Ryerson","Harbourfront East, Union Station, Toronto Islands","High Park, The Junction South","India Bazaar, The Beaches West","Kensington Market, Chinatown, Grange Park",Lawrence Park,"Little Portugal, Trinity","Moore Park, Summerhill East",North Toronto West,"Parkdale, Roncesvalles","Queen's Park, Ontario Provincial Government","Regent Park, Harbourfront","Richmond, Adelaide, King",Rosedale,Roselawn,"Runnymede, Swansea",St. James Town,"St. James Town, Cabbagetown",Stn A PO Boxes,Studio District,"Summerhill West, Rathnelly, South Hill, Forest Hill SE, Deer Park","The Annex, North Midtown, Yorkville",The Beaches,"The Danforth West, Riverdale","Toronto Dominion Centre, Design Exchange","University of Toronto, Harbord"
0,Berczy Park,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,"Brockton, Parkdale Village, Exhibition Place",0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Business reply mail Processing Centre,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,"CN Tower, King and Spadina, Railway Lands, Har...",0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Central Bay Street,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [47]:
# confirming the new size
toronto_grouped.shape

(39, 40)

### Let's put that into a *pandas* dataframe

In [50]:
# A function to sort the venues in descending order.
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

#### creating the new dataframe and display the top 10 venues for each neighborhood.

In [52]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Berczy Park,Davisville North,"India Bazaar, The Beaches West","High Park, The Junction South","Harbourfront East, Union Station, Toronto Islands","Garden District, Ryerson",Forest Hill North & West,"First Canadian Place, Underground city","Dufferin, Dovercourt Village",Davisville
1,"Brockton, Parkdale Village, Exhibition Place","Brockton, Parkdale Village, Exhibition Place","University of Toronto, Harbord",Davisville North,"India Bazaar, The Beaches West","High Park, The Junction South","Harbourfront East, Union Station, Toronto Islands","Garden District, Ryerson",Forest Hill North & West,"First Canadian Place, Underground city","Dufferin, Dovercourt Village"
2,Business reply mail Processing Centre,Business reply mail Processing Centre,"University of Toronto, Harbord",Davisville North,"India Bazaar, The Beaches West","High Park, The Junction South","Harbourfront East, Union Station, Toronto Islands","Garden District, Ryerson",Forest Hill North & West,"First Canadian Place, Underground city","Dufferin, Dovercourt Village"
3,"CN Tower, King and Spadina, Railway Lands, Har...","CN Tower, King and Spadina, Railway Lands, Har...","University of Toronto, Harbord",Davisville North,"India Bazaar, The Beaches West","High Park, The Junction South","Harbourfront East, Union Station, Toronto Islands","Garden District, Ryerson",Forest Hill North & West,"First Canadian Place, Underground city","Dufferin, Dovercourt Village"
4,Central Bay Street,Central Bay Street,"University of Toronto, Harbord",Davisville North,"India Bazaar, The Beaches West","High Park, The Junction South","Harbourfront East, Union Station, Toronto Islands","Garden District, Ryerson",Forest Hill North & West,"First Canadian Place, Underground city","Dufferin, Dovercourt Village"


# Cluster Neighborhoods

In [53]:
# Run k-means to cluster the neighborhood into 5 clusters.
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

#### create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [54]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = df_2

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head() # check the last columns!

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,0,"Regent Park, Harbourfront","University of Toronto, Harbord",Lawrence Park,"India Bazaar, The Beaches West","High Park, The Junction South","Harbourfront East, Union Station, Toronto Islands","Garden District, Ryerson",Forest Hill North & West,"First Canadian Place, Underground city","Dufferin, Dovercourt Village"
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,0,"Queen's Park, Ontario Provincial Government","University of Toronto, Harbord",Lawrence Park,"India Bazaar, The Beaches West","High Park, The Junction South","Harbourfront East, Union Station, Toronto Islands","Garden District, Ryerson",Forest Hill North & West,"First Canadian Place, Underground city","Dufferin, Dovercourt Village"
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,0,"Garden District, Ryerson","University of Toronto, Harbord",Davisville North,"India Bazaar, The Beaches West","High Park, The Junction South","Harbourfront East, Union Station, Toronto Islands",Forest Hill North & West,"First Canadian Place, Underground city","Dufferin, Dovercourt Village",Davisville
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,0,St. James Town,"University of Toronto, Harbord",Davisville North,"India Bazaar, The Beaches West","High Park, The Junction South","Harbourfront East, Union Station, Toronto Islands","Garden District, Ryerson",Forest Hill North & West,"First Canadian Place, Underground city","Dufferin, Dovercourt Village"
4,M4E,East Toronto,The Beaches,43.676357,-79.293031,0,The Beaches,"University of Toronto, Harbord",Davisville North,"India Bazaar, The Beaches West","High Park, The Junction South","Harbourfront East, Union Station, Toronto Islands","Garden District, Ryerson",Forest Hill North & West,"First Canadian Place, Underground city","Dufferin, Dovercourt Village"


### Visualizing the resulting clusters

In [55]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

# Examine Clusters

### Cluster 1

In [56]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,0,"Regent Park, Harbourfront","University of Toronto, Harbord",Lawrence Park,"India Bazaar, The Beaches West","High Park, The Junction South","Harbourfront East, Union Station, Toronto Islands","Garden District, Ryerson",Forest Hill North & West,"First Canadian Place, Underground city","Dufferin, Dovercourt Village"
1,Downtown Toronto,0,"Queen's Park, Ontario Provincial Government","University of Toronto, Harbord",Lawrence Park,"India Bazaar, The Beaches West","High Park, The Junction South","Harbourfront East, Union Station, Toronto Islands","Garden District, Ryerson",Forest Hill North & West,"First Canadian Place, Underground city","Dufferin, Dovercourt Village"
2,Downtown Toronto,0,"Garden District, Ryerson","University of Toronto, Harbord",Davisville North,"India Bazaar, The Beaches West","High Park, The Junction South","Harbourfront East, Union Station, Toronto Islands",Forest Hill North & West,"First Canadian Place, Underground city","Dufferin, Dovercourt Village",Davisville
3,Downtown Toronto,0,St. James Town,"University of Toronto, Harbord",Davisville North,"India Bazaar, The Beaches West","High Park, The Junction South","Harbourfront East, Union Station, Toronto Islands","Garden District, Ryerson",Forest Hill North & West,"First Canadian Place, Underground city","Dufferin, Dovercourt Village"
4,East Toronto,0,The Beaches,"University of Toronto, Harbord",Davisville North,"India Bazaar, The Beaches West","High Park, The Junction South","Harbourfront East, Union Station, Toronto Islands","Garden District, Ryerson",Forest Hill North & West,"First Canadian Place, Underground city","Dufferin, Dovercourt Village"
5,Downtown Toronto,0,Berczy Park,Davisville North,"India Bazaar, The Beaches West","High Park, The Junction South","Harbourfront East, Union Station, Toronto Islands","Garden District, Ryerson",Forest Hill North & West,"First Canadian Place, Underground city","Dufferin, Dovercourt Village",Davisville
6,Downtown Toronto,0,Central Bay Street,"University of Toronto, Harbord",Davisville North,"India Bazaar, The Beaches West","High Park, The Junction South","Harbourfront East, Union Station, Toronto Islands","Garden District, Ryerson",Forest Hill North & West,"First Canadian Place, Underground city","Dufferin, Dovercourt Village"
7,Downtown Toronto,0,Christie,"University of Toronto, Harbord",Davisville North,"India Bazaar, The Beaches West","High Park, The Junction South","Harbourfront East, Union Station, Toronto Islands","Garden District, Ryerson",Forest Hill North & West,"First Canadian Place, Underground city","Dufferin, Dovercourt Village"
8,Downtown Toronto,0,"Richmond, Adelaide, King","University of Toronto, Harbord",Lawrence Park,"India Bazaar, The Beaches West","High Park, The Junction South","Harbourfront East, Union Station, Toronto Islands","Garden District, Ryerson",Forest Hill North & West,"First Canadian Place, Underground city","Dufferin, Dovercourt Village"
9,West Toronto,0,"Dufferin, Dovercourt Village","University of Toronto, Harbord",Davisville North,"India Bazaar, The Beaches West","High Park, The Junction South","Harbourfront East, Union Station, Toronto Islands","Garden District, Ryerson",Forest Hill North & West,"First Canadian Place, Underground city",Davisville


### Cluster 2

In [57]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
33,Downtown Toronto,1,Rosedale,"University of Toronto, Harbord",Lawrence Park,"India Bazaar, The Beaches West","High Park, The Junction South","Harbourfront East, Union Station, Toronto Islands","Garden District, Ryerson",Forest Hill North & West,"First Canadian Place, Underground city","Dufferin, Dovercourt Village"


### Cluster 3

In [58]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
23,Central Toronto,2,North Toronto West,"University of Toronto, Harbord",Lawrence Park,"India Bazaar, The Beaches West","High Park, The Junction South","Harbourfront East, Union Station, Toronto Islands","Garden District, Ryerson",Forest Hill North & West,"First Canadian Place, Underground city","Dufferin, Dovercourt Village"


### Cluster 4

In [59]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
17,East Toronto,3,Studio District,"University of Toronto, Harbord",Davisville North,"India Bazaar, The Beaches West","High Park, The Junction South","Harbourfront East, Union Station, Toronto Islands","Garden District, Ryerson",Forest Hill North & West,"First Canadian Place, Underground city","Dufferin, Dovercourt Village"


### Cluster 5

In [60]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
31,Central Toronto,4,"Summerhill West, Rathnelly, South Hill, Forest...","University of Toronto, Harbord",Davisville North,"India Bazaar, The Beaches West","High Park, The Junction South","Harbourfront East, Union Station, Toronto Islands","Garden District, Ryerson",Forest Hill North & West,"First Canadian Place, Underground city","Dufferin, Dovercourt Village"
