# Segmenting and Clustering Neighborhoods in Toronto

In this assignment, you will explore, segment, and cluster the neighborhoods in the city of Toronto. Unlike for New York, however, the neighborhood data is not readily available on the internet. Each data science project is challenging in its own way, so you need to be agile and build the skill of picking up new libraries and tools quickly as the project demands.

For the Toronto neighborhood data, a Wikipedia page exists that has all the information we need to explore and cluster the neighborhoods in Toronto. You will be required to scrape the Wikipedia page and wrangle the data, clean it, and then read it into a pandas dataframe so that it is in a structured format like the New York dataset.

Once the data is in a structured format, you can replicate the analysis that we did to the New York City dataset to explore and cluster the neighborhoods in the city of Toronto.

### Import Pandas

In [1]:
import pandas as pd # library for data analysis
print('Libraries imported.')

Libraries imported.


## Part 1

Use the Notebook to build the code that scrapes the Wikipedia page https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, extracts the data in the table of postal codes, and transforms it into a pandas dataframe.

### Using BeautifulSoup Package

Use the BeautifulSoup package to transform the data in the table on the Wikipedia page into a pandas dataframe.

In [2]:
import requests
from bs4 import BeautifulSoup
print('Libraries imported.')

Libraries imported.


In [3]:
wikipedia_url = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
page = BeautifulSoup(wikipedia_url, 'html.parser')
page

table = page.body.table.tbody

In [4]:
#getting the cells
def table_cell(i):
    cells = i.find_all('td')
    row = []
    
    for cell in cells:
        if cell.a:            
            if (cell.a.text):
                row.append(cell.a.text)
                continue
        row.append(cell.string.strip())
        
    return row
#getting the rows
def table_row():    
    data = []  
    
    for tr in table.find_all('tr'):
        row = table_cell(tr)
        if len(row) != 3:
            continue
        data.append(row)        
    
    return data
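
The row filter above (`len(row) != 3`) is what drops the header row, because header cells are `<th>` rather than `<td>`. The following is a minimal offline sketch of that idea. It uses the standard library's `html.parser` instead of BeautifulSoup so that it runs without network access, and the HTML snippet and the `TableRowParser` class are hypothetical stand-ins, not the actual Wikipedia markup.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of <td> cells, grouped by <tr> (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag == 'td' and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags such as <a> is still cell text.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == 'td' and self._cell is not None:
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical snippet shaped like the postal code table.
sample_html = """
<table>
  <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
  <tr><td>M3A</td><td><a href="#">North York</a></td><td><a href="#">Parkwoods</a></td></tr>
</table>
"""

parser = TableRowParser()
parser.feed(sample_html)

# Keep only rows with exactly three <td> cells; the header row has none.
rows = [row for row in parser.rows if len(row) == 3]
print(rows)
```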

In [5]:
#creating dataframe
data = table_row()
df = pd.DataFrame(data, columns=['Postcode', 'Borough', 'Neighbourhood'])
df = df.rename(columns = {'Postcode':'PostalCode', 'Neighbourhood':'Neighborhood'})
df.head(7)

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights


Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.

In [6]:
df1 = df[df.Borough != 'Not assigned']
df1.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights


If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough.

In [7]:
#finding the cells that have a Borough but a 'Not assigned' Neighbourhood
df1.loc[df1['Neighborhood'] == 'Not assigned']

Unnamed: 0,PostalCode,Borough,Neighborhood
8,M7A,Queen's Park,Not assigned


In [8]:
#patching the Borough value to the Neighbourhood
df1.loc[df1['Neighborhood'] == "Not assigned", 'Neighborhood'] = df1.loc[df1['Neighborhood'] == "Not assigned", 'Borough']

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self._setitem_with_indexer(indexer, value)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  from ipykernel import kernelapp as app
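
The message above is pandas' `SettingWithCopyWarning`. It appears because `df1` was created by filtering `df`, so pandas cannot tell whether the assignment is meant to change `df` as well. One common way to avoid it, sketched here on a small hypothetical dataframe, is to take an explicit copy when filtering:

```python
import pandas as pd

# Small stand-in for the scraped table (values are illustrative).
df = pd.DataFrame({
    'PostalCode': ['M1A', 'M5A', 'M7A'],
    'Borough': ['Not assigned', 'Downtown Toronto', "Queen's Park"],
    'Neighborhood': ['Not assigned', 'Harbourfront', 'Not assigned'],
})

# .copy() makes df1 an independent dataframe rather than a slice of df.
df1 = df[df['Borough'] != 'Not assigned'].copy()

# Patch the borough name into rows whose neighborhood is 'Not assigned'.
mask = df1['Neighborhood'] == 'Not assigned'
df1.loc[mask, 'Neighborhood'] = df1.loc[mask, 'Borough']
print(df1)
```

The original `df` is left unchanged, which is usually what is wanted when each cleaning step produces a new dataframe.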


In [9]:
df1.head(10) #Where Postcode M7A, the Borough and Neighbourhood have same value

Unnamed: 0,PostalCode,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights
7,M6A,North York,Lawrence Manor
8,M7A,Queen's Park,Queen's Park
10,M9A,Etobicoke,Islington Avenue
11,M1B,Scarborough,Rouge
12,M1B,Scarborough,Malvern


Notice M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated with a comma.

In [10]:
df2 = df1.groupby(['PostalCode','Borough'])['Neighborhood'].apply(', '.join).reset_index()
df2.head(10)

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park"
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge"
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff, Cliffside West"
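
To see the combining step in isolation, here is a small sketch on three rows taken from the table earlier in this notebook. It uses `agg` with `', '.join`, which should behave the same as `apply` for this purpose.

```python
import pandas as pd

rows = pd.DataFrame({
    'PostalCode': ['M5A', 'M5A', 'M4A'],
    'Borough': ['Downtown Toronto', 'Downtown Toronto', 'North York'],
    'Neighborhood': ['Harbourfront', 'Regent Park', 'Victoria Village'],
})

# One row per (PostalCode, Borough); neighborhoods joined in their original order.
combined = (
    rows.groupby(['PostalCode', 'Borough'])['Neighborhood']
        .agg(', '.join)
        .reset_index()
)
print(combined)
```

Because `groupby` sorts by the group keys by default, the result is ordered by postal code rather than by the order of appearance on the Wikipedia page.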


Check the shape of the dataframe

In [11]:
print('The shape of the dataframe is:',df2.shape, ' and the number of rows of it is: ', df2.shape[0])

The shape of the dataframe is: (103, 3)  and the number of rows of it is:  103


## Part 2

Get the latitude and longitude coordinates of each neighborhood. A csv file with the geographical coordinates of each postal code is available at http://cocl.us/Geospatial_data. Let's read it into a dataframe.

In [12]:
geo_coord_url="http://cocl.us/Geospatial_data"
df_geo_coord = pd.read_csv(geo_coord_url)
df_geo_coord.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [13]:
df_geo_coord.shape #Checking the shape of the new dataframe

(103, 3)

In [14]:
#rename the 'Postal Code' column, so it matches the name of the first dataframe
df_geo_coord = df_geo_coord.rename(columns = {'Postal Code':'PostalCode'})
df_geo_coord.head()

Unnamed: 0,PostalCode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Merging the first dataframe and the dataframe containing the geographical coordinates

In [15]:
df_full = pd.merge(df2,df_geo_coord, on='PostalCode')
df_full.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
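
By default `pd.merge` performs an inner join, so a postal code missing from either dataframe would be dropped silently. Here both dataframes have 103 rows and so does the result, so nothing was lost. As a hedged sketch of how one might check this explicitly, the example below uses the `validate` and `indicator` parameters of `pd.merge` on small stand-in dataframes; the postal code `M9Z` and its borough are hypothetical and only serve to show an unmatched row.

```python
import pandas as pd

codes = pd.DataFrame({
    'PostalCode': ['M1B', 'M1C', 'M9Z'],          # 'M9Z' is a made-up code
    'Borough': ['Scarborough', 'Scarborough', 'Hypothetical'],
})
coords = pd.DataFrame({
    'PostalCode': ['M1B', 'M1C'],
    'Latitude': [43.806686, 43.784535],
    'Longitude': [-79.194353, -79.160497],
})

# A left join keeps every postal code; validate raises if keys are duplicated;
# indicator adds a '_merge' column saying where each row came from.
checked = pd.merge(codes, coords, on='PostalCode', how='left',
                   validate='one_to_one', indicator=True)

unmatched = checked.loc[checked['_merge'] == 'left_only', 'PostalCode'].tolist()
print(unmatched)
```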


In [16]:
#checking the shape of the final dataset
print('The shape of the final dataframe is', df_full.shape)

The shape of the final dataframe is (103, 5)


## Part 3

Analyse, explore, and cluster the neighborhoods in Toronto.

### Import other libraries

In [17]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


First, let's find the coordinates of Toronto. To use the geocoder, define user_agent.

In [18]:
address1 = 'Toronto, ON'

geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(address1)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


Now, create a map of Toronto to visualise all the neighborhoods.

In [19]:
# create map using latitude and longitude
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df_full['Latitude'], df_full['Longitude'], df_full['Borough'], df_full['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

### Visualise the neighborhoods in WEST TORONTO

In [20]:
west_toronto_data = df_full[df_full['Borough'] == 'West Toronto'].reset_index(drop=True)
west_toronto_data.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M6H,West Toronto,"Dovercourt Village, Dufferin",43.669005,-79.442259
1,M6J,West Toronto,"Little Portugal, Trinity",43.647927,-79.41975
2,M6K,West Toronto,"Brockton, Exhibition Place, Parkdale Village",43.636847,-79.428191
3,M6P,West Toronto,"High Park, The Junction South",43.661608,-79.464763
4,M6R,West Toronto,"Parkdale, Roncesvalles",43.64896,-79.456325


The map of West Toronto is centered on the coordinates of Toronto, which were obtained above.

In [21]:
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


Creating a map to visualise the neighborhoods in West Toronto

In [22]:
# create map of West Toronto using latitude and longitude values
map_west_toronto = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, label in zip(west_toronto_data['Latitude'], west_toronto_data['Longitude'], west_toronto_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_west_toronto)  
    
map_west_toronto

### Define Foursquare Credentials and Version

In [23]:
# The code was removed by Watson Studio for sharing.

Credentials and Version set!


### Exploring neighborhoods in West Toronto

Get up to 100 venues within a radius of 500 meters of each neighborhood in West Toronto.

In [24]:
#Define function
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
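
The list comprehension in `getNearbyVenues` assumes that every response contains at least one group and that every venue has at least one category. As a hedged sketch, the helper below flattens one response more defensively. The function name `extract_venues` and the sample response are hypothetical; only the nesting (`response` → `groups` → `items` → `venue`) is taken from the code above.

```python
def extract_venues(name, lat, lng, response_json):
    """Flatten one explore-style response into a list of venue tuples."""
    groups = response_json.get('response', {}).get('groups') or [{}]
    items = groups[0].get('items', [])

    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Fall back to None instead of raising IndexError on an empty list.
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows


# Hypothetical response with one categorized and one uncategorized venue.
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'The Greater Good Bar',
               'location': {'lat': 43.669409, 'lng': -79.439267},
               'categories': [{'name': 'Bar'}]}},
    {'venue': {'name': 'Unnamed Example Venue',
               'location': {'lat': 43.6690, 'lng': -79.4420},
               'categories': []}},
]}]}}

venue_rows = extract_venues('Dovercourt Village, Dufferin', 43.669005, -79.442259, sample)
empty_rows = extract_venues('Nowhere', 0.0, 0.0, {})
print(venue_rows)
```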

Neighborhoods in West Toronto

In [25]:
LIMIT = 100
west_toronto_venues = getNearbyVenues(names=west_toronto_data['Neighborhood'],
                                   latitudes=west_toronto_data['Latitude'],
                                   longitudes=west_toronto_data['Longitude']
                                  )

Dovercourt Village, Dufferin
Little Portugal, Trinity
Brockton, Exhibition Place, Parkdale Village
High Park, The Junction South
Parkdale, Roncesvalles
Runnymede, Swansea


In [26]:
print('The shape of the new dataframe is:',west_toronto_venues.shape)
print('The content of the new dataframe is the following:')
west_toronto_venues.head()

The shape of the new dataframe is: (175, 7)
The content of the new dataframe is the following:


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Dovercourt Village, Dufferin",43.669005,-79.442259,The Greater Good Bar,43.669409,-79.439267,Bar
1,"Dovercourt Village, Dufferin",43.669005,-79.442259,Parallel,43.669516,-79.438728,Middle Eastern Restaurant
2,"Dovercourt Village, Dufferin",43.669005,-79.442259,Happy Bakery & Pastries,43.66705,-79.441791,Bakery
3,"Dovercourt Village, Dufferin",43.669005,-79.442259,Planet Fitness Toronto Galleria,43.667588,-79.442574,Gym / Fitness Center
4,"Dovercourt Village, Dufferin",43.669005,-79.442259,FreshCo,43.667918,-79.440754,Supermarket
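
As a quick sanity check on the 500 meter radius, the distance between a neighborhood's coordinates and a returned venue can be estimated with the haversine formula. This is a minimal sketch using the first row of the table above; the helper name `haversine_m` and the mean Earth radius of 6,371 km are assumptions of the sketch, not part of the Foursquare API.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    r = 6371000.0  # mean Earth radius in meters (approximation)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


# Dovercourt Village, Dufferin -> The Greater Good Bar
distance = haversine_m(43.669005, -79.442259, 43.669409, -79.439267)
print(round(distance), 'meters')
```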


Venues per Neighborhood

In [27]:
west_toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Brockton, Exhibition Place, Parkdale Village",20,20,20,20,20,20
"Dovercourt Village, Dufferin",20,20,20,20,20,20
"High Park, The Junction South",23,23,23,23,23,23
"Little Portugal, Trinity",62,62,62,62,62,62
"Parkdale, Roncesvalles",15,15,15,15,15,15
"Runnymede, Swansea",35,35,35,35,35,35


In [28]:
print('There are {} unique categories.'.format(len(west_toronto_venues['Venue Category'].unique())))

There are 88 unique categories.


### Analyze Each Neighborhood

In [29]:
# one hot encoding
west_toronto_onehot = pd.get_dummies(west_toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
west_toronto_onehot['Neighborhood'] = west_toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [west_toronto_onehot.columns[-1]] + list(west_toronto_onehot.columns[:-1])
west_toronto_onehot = west_toronto_onehot[fixed_columns]

west_toronto_onehot.head()

Unnamed: 0,Neighborhood,Antique Shop,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Bakery,Bank,Bar,Bistro,...,Sushi Restaurant,Tapas Restaurant,Tea Room,Thai Restaurant,Theater,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Wine Shop,Yoga Studio
0,"Dovercourt Village, Dufferin",0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
1,"Dovercourt Village, Dufferin",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Dovercourt Village, Dufferin",0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Dovercourt Village, Dufferin",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Dovercourt Village, Dufferin",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [30]:
#checking the shape of the dataframe
print('The shape of the dataframe is:', west_toronto_onehot.shape)

The shape of the dataframe is: (175, 89)


Group the rows by neighborhood and take the mean of each one-hot column, which gives the frequency of occurrence of each category in that neighborhood.

In [31]:
west_toronto_grouped = west_toronto_onehot.groupby('Neighborhood').mean().reset_index()
west_toronto_grouped

Unnamed: 0,Neighborhood,Antique Shop,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Bakery,Bank,Bar,Bistro,...,Sushi Restaurant,Tapas Restaurant,Tea Room,Thai Restaurant,Theater,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Wine Shop,Yoga Studio
0,"Brockton, Exhibition Place, Parkdale Village",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Dovercourt Village, Dufferin",0.0,0.0,0.0,0.0,0.05,0.1,0.05,0.05,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0
2,"High Park, The Junction South",0.043478,0.0,0.043478,0.0,0.0,0.043478,0.0,0.086957,0.0,...,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0
3,"Little Portugal, Trinity",0.0,0.016129,0.0,0.048387,0.0,0.032258,0.0,0.129032,0.016129,...,0.0,0.016129,0.0,0.0,0.016129,0.016129,0.032258,0.016129,0.0,0.016129
4,"Parkdale, Roncesvalles",0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.066667,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,"Runnymede, Swansea",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,...,0.057143,0.0,0.028571,0.0,0.0,0.028571,0.0,0.0,0.0,0.0
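
The values in this table are relative frequencies: the mean of a 0/1 column within a neighborhood is the share of that neighborhood's venues in the category, so each row sums to 1. A small sketch on made-up data (neighborhoods `A` and `B` are hypothetical) makes this concrete:

```python
import pandas as pd

venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Bar', 'Bar', 'Café', 'Bakery', 'Café', 'Café'],
})

# One 0/1 column per category; dtype=int keeps the columns numeric.
onehot = pd.get_dummies(venues['Venue Category'], dtype=int)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# Mean of each 0/1 column per neighborhood = share of venues in that category.
freq = onehot.groupby('Neighborhood').mean().reset_index()
print(freq)
```

Neighborhood `A` has four venues, two of which are bars, so its `Bar` frequency is 0.5.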


In [32]:
#check the shape of the dataframe
print('The shape of the grouped dataframe is:', west_toronto_grouped.shape)

The shape of the grouped dataframe is: (6, 89)


Print each neighborhood along with the top 5 most common venues

In [33]:
num_top_venues = 5

for hood in west_toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = west_toronto_grouped[west_toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Brockton, Exhibition Place, Parkdale Village----
            venue  freq
0  Breakfast Spot  0.10
1     Coffee Shop  0.10
2            Café  0.10
3   Grocery Store  0.05
4         Stadium  0.05


----Dovercourt Village, Dufferin----
         venue  freq
0     Pharmacy  0.10
1       Bakery  0.10
2  Supermarket  0.10
3         Café  0.05
4      Brewery  0.05


----High Park, The Junction South----
                       venue  freq
0         Mexican Restaurant  0.09
1                        Bar  0.09
2                       Café  0.09
3               Antique Shop  0.04
4  Cajun / Creole Restaurant  0.04


----Little Portugal, Trinity----
              venue  freq
0               Bar  0.13
1       Men's Store  0.05
2  Asian Restaurant  0.05
3       Coffee Shop  0.05
4        Restaurant  0.03


----Parkdale, Roncesvalles----
            venue  freq
0       Gift Shop  0.13
1  Breakfast Spot  0.13
2    Dessert Shop  0.07
3       Bookstore  0.07
4         Dog Run  0.07


----Runnymede, Swa

Put that into a pandas dataframe

In [34]:
#function to sort venues in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [35]:
#create a dataframe showing the top 10 venues for each neighborhood
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

#create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

#create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = west_toronto_grouped['Neighborhood']

for ind in np.arange(west_toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(west_toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Brockton, Exhibition Place, Parkdale Village",Breakfast Spot,Coffee Shop,Café,Gym / Fitness Center,Gym,Furniture / Home Store,Italian Restaurant,Performing Arts Venue,Pet Store,Climbing Gym
1,"Dovercourt Village, Dufferin",Pharmacy,Bakery,Supermarket,Gym / Fitness Center,Café,Middle Eastern Restaurant,Park,Discount Store,Pool,Liquor Store
2,"High Park, The Junction South",Café,Mexican Restaurant,Bar,Antique Shop,Gastropub,Furniture / Home Store,Fried Chicken Joint,Italian Restaurant,Flea Market,Fast Food Restaurant
3,"Little Portugal, Trinity",Bar,Men's Store,Asian Restaurant,Coffee Shop,Café,Cocktail Bar,Pizza Place,Restaurant,New American Restaurant,Bakery
4,"Parkdale, Roncesvalles",Gift Shop,Breakfast Spot,Italian Restaurant,Cuban Restaurant,Movie Theater,Bank,Bar,Coffee Shop,Bookstore,Dog Run
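
The column names above are built with a three-element suffix list and a fallback to `th`, which works for the top 10 but would produce labels such as "21th" for larger values of `num_top_venues`. A small, hypothetical `ordinal` helper is one way to build the same headers for any count:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the other teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


num_top_venues = 10
columns = ['Neighborhood'] + [
    '{} Most Common Venue'.format(ordinal(i)) for i in range(1, num_top_venues + 1)
]
print(columns)
```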


### Cluster Neighborhoods
Run k-means to cluster the neighborhoods into 4 clusters.

In [36]:
# set number of clusters
kclusters = 4

wtoronto_grouped_clustering = west_toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(wtoronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 3, 1, 1, 2, 0], dtype=int32)
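
For intuition about what the fit is doing, the sketch below runs the two alternating steps of k-means (assign each point to its nearest center, then move each center to the mean of its points) on four made-up 2-D points. It is an illustrative NumPy version with a simplified initialization (the first `k` rows), not scikit-learn's implementation, and `lloyd_kmeans` is a hypothetical name.

```python
import numpy as np


def lloyd_kmeans(X, k, n_iter=10):
    """Plain k-means iterations; centers start at the first k rows of X."""
    X = np.asarray(X, dtype=float)
    centers = X[:k].copy()
    for _ in range(n_iter):
        # Assignment step: index of the nearest center for every point.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers


# Two obvious groups: points near (0, 0) and points near (1, 1).
X = np.array([[0.0, 0.1], [1.0, 0.9], [0.1, 0.0], [0.9, 1.0]])
labels, centers = lloyd_kmeans(X, k=2)
print(labels)
print(centers)
```

The result of k-means depends on where the centers start, which is why the notebook fixes `random_state` when calling `KMeans`.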

Create new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [37]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

wtoronto_merged = west_toronto_data

# join the sorted venues and cluster labels onto west_toronto_data, which holds latitude/longitude for each neighborhood
wtoronto_merged = wtoronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

wtoronto_merged.head() # check the last columns!

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M6H,West Toronto,"Dovercourt Village, Dufferin",43.669005,-79.442259,3,Pharmacy,Bakery,Supermarket,Gym / Fitness Center,Café,Middle Eastern Restaurant,Park,Discount Store,Pool,Liquor Store
1,M6J,West Toronto,"Little Portugal, Trinity",43.647927,-79.41975,1,Bar,Men's Store,Asian Restaurant,Coffee Shop,Café,Cocktail Bar,Pizza Place,Restaurant,New American Restaurant,Bakery
2,M6K,West Toronto,"Brockton, Exhibition Place, Parkdale Village",43.636847,-79.428191,0,Breakfast Spot,Coffee Shop,Café,Gym / Fitness Center,Gym,Furniture / Home Store,Italian Restaurant,Performing Arts Venue,Pet Store,Climbing Gym
3,M6P,West Toronto,"High Park, The Junction South",43.661608,-79.464763,1,Café,Mexican Restaurant,Bar,Antique Shop,Gastropub,Furniture / Home Store,Fried Chicken Joint,Italian Restaurant,Flea Market,Fast Food Restaurant
4,M6R,West Toronto,"Parkdale, Roncesvalles",43.64896,-79.456325,2,Gift Shop,Breakfast Spot,Italian Restaurant,Cuban Restaurant,Movie Theater,Bank,Bar,Coffee Shop,Bookstore,Dog Run


Visualise the resulting clusters:

In [38]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(wtoronto_merged['Latitude'], wtoronto_merged['Longitude'], wtoronto_merged['Neighborhood'], wtoronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Examine Clusters

Examine each cluster and determine the venue categories that distinguish each cluster.

#### Cluster 1

In [39]:
wtoronto_merged.loc[wtoronto_merged['Cluster Labels'] == 0, wtoronto_merged.columns[[1] + list(range(5, wtoronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,West Toronto,0,Breakfast Spot,Coffee Shop,Café,Gym / Fitness Center,Gym,Furniture / Home Store,Italian Restaurant,Performing Arts Venue,Pet Store,Climbing Gym
5,West Toronto,0,Coffee Shop,Café,Pizza Place,Sushi Restaurant,Italian Restaurant,Gastropub,Smoothie Shop,Diner,Latin American Restaurant,Falafel Restaurant


#### Cluster 2

In [40]:
wtoronto_merged.loc[wtoronto_merged['Cluster Labels'] == 1, wtoronto_merged.columns[[1] + list(range(5, wtoronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,West Toronto,1,Bar,Men's Store,Asian Restaurant,Coffee Shop,Café,Cocktail Bar,Pizza Place,Restaurant,New American Restaurant,Bakery
3,West Toronto,1,Café,Mexican Restaurant,Bar,Antique Shop,Gastropub,Furniture / Home Store,Fried Chicken Joint,Italian Restaurant,Flea Market,Fast Food Restaurant


#### Cluster 3

In [41]:
wtoronto_merged.loc[wtoronto_merged['Cluster Labels'] == 2, wtoronto_merged.columns[[1] + list(range(5, wtoronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,West Toronto,2,Gift Shop,Breakfast Spot,Italian Restaurant,Cuban Restaurant,Movie Theater,Bank,Bar,Coffee Shop,Bookstore,Dog Run


#### Cluster 4

In [42]:
wtoronto_merged.loc[wtoronto_merged['Cluster Labels'] == 3, wtoronto_merged.columns[[1] + list(range(5, wtoronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,West Toronto,3,Pharmacy,Bakery,Supermarket,Gym / Fitness Center,Café,Middle Eastern Restaurant,Park,Discount Store,Pool,Liquor Store
