# Segmenting and Clustering Neighborhoods in Toronto

This notebook is the Week 3 assignment of the Applied Data Science Capstone course.

In [181]:
import pandas as pd
import numpy as np

## 1) Create the dataframe with the neighborhoods in Toronto

Read the tables contained in the Wikipedia page "List of postal codes of Canada: M", in HTML form.

In [2]:
dfs = pd.read_html("https://en.wikipedia.org/w/index.php?title=List_of_postal_codes_of_Canada:_M&oldid=945633050", header=0)

The first table is the one that will be used in this notebook.

In [221]:
pcode = dfs[0]
pcode.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


Count the rows of the table.

In [222]:
pcode.shape

(287, 3)

### Rename the columns: PostalCode, Borough, and Neighborhood

In [223]:
pcode.columns = ["PostalCode", "Borough", "Neighborhood"]
pcode.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


### Ignore cells with a borough that is Not assigned

Count the rows that do not have an assigned borough.

In [224]:
len(pcode[pcode["Borough"]=="Not assigned"])

77

Redefine the dataframe, discarding the rows whose "Borough" is equal to "Not assigned".

In [225]:
pcode = pcode[pcode["Borough"]!="Not assigned"]

Check that the number of rows is equal to 287 - 77 = 210.

In [226]:
pcode.shape

(210, 3)

### Combine rows corresponding to the same postal code area

More than one neighborhood can exist in one postal code area. 
Rows corresponding to the same code will be combined into one row with the neighborhoods separated with a comma.

This is done by grouping the table by "PostalCode", and then aggregating each group in a different way depending on the column. For the column "Borough", we simply select the first element of the group series. For the column "Neighborhood", we join the neighborhood strings with a separating comma.

In [227]:
pcode = pcode.groupby("PostalCode").agg({'Borough': lambda ser: list(ser)[0],
                                         'Neighborhood': lambda ser: ", ".join(list(ser))})
pcode.reset_index(inplace=True)
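
To see what this aggregation does, here is a minimal sketch on a toy table (the three rows are illustrative, not taken from the Wikipedia page). It uses the built-in `'first'` aggregation and `', '.join` in place of the lambdas, which should give the same result under the assumption that every postal code belongs to a single borough.

```python
import pandas as pd

# Toy input: two rows share the postal code M1B (illustrative values only)
toy = pd.DataFrame({
    "PostalCode": ["M1B", "M1B", "M1G"],
    "Borough": ["Scarborough", "Scarborough", "Scarborough"],
    "Neighborhood": ["Rouge", "Malvern", "Woburn"],
})

# One row per postal code: keep the first borough, join the neighborhood names
combined = (
    toy.groupby("PostalCode")
       .agg({"Borough": "first", "Neighborhood": ", ".join})
       .reset_index()
)
print(combined)
```

The two M1B rows collapse into a single row whose neighborhood is "Rouge, Malvern".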


Show the first lines of the table.

In [228]:
pcode.head(10)

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park"
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge"
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff, Cliffside West"


### Not assigned neighborhoods

If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough.

Check whether there are not assigned neighborhoods.

In [229]:
len(pcode[pcode["Neighborhood"]=="Not assigned"])

0

All the neighborhoods have a name... go on!
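
The check returned 0, so nothing had to be changed here. Had it been non-zero, the rule stated at the top of this section could be applied along the following lines. This is only a sketch, and the two rows are hypothetical.

```python
import pandas as pd

# Hypothetical rows: the second one has a borough but no neighborhood name
toy = pd.DataFrame({
    "PostalCode": ["M5A", "M7A"],
    "Borough": ["Downtown Toronto", "Queen's Park"],
    "Neighborhood": ["Harbourfront", "Not assigned"],
})

# Copy the borough name into the neighborhood wherever it is "Not assigned"
missing = toy["Neighborhood"] == "Not assigned"
toy.loc[missing, "Neighborhood"] = toy.loc[missing, "Borough"]
print(toy)
```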

### Print the current size of the dataframe

Use the `.shape` attribute to print the number of rows and columns of the dataframe.

In [230]:
pcode.shape

(103, 3)

## 2) Add the latitude and the longitude coordinates of each neighborhood

**NOTE** I could not get the coordinates with the geocoder library, so I am working with the CSV file linked in the instructions: http://cocl.us/Geospatial_data

In [196]:
coords = pd.read_csv("http://cocl.us/Geospatial_data")

In [210]:
coords.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Now add the columns "Latitude" and "Longitude" to the pcode dataframe.

In [231]:
pcode = pcode.set_index('PostalCode').join(coords.set_index('Postal Code'))
pcode.reset_index(inplace=True)
pcode.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


In [232]:
pcode.shape

(103, 5)
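
An equivalent way to attach the coordinates is `merge`, which also makes it easy to verify that every postal code found a match. The sketch below uses two toy tables with values copied from the outputs above; the `how`, `validate` and `indicator` arguments are my own choice for the illustration, not something the assignment asks for.

```python
import pandas as pd

left = pd.DataFrame({"PostalCode": ["M1B", "M1C"],
                     "Borough": ["Scarborough", "Scarborough"]})
right = pd.DataFrame({"Postal Code": ["M1B", "M1C", "M1E"],
                      "Latitude": [43.806686, 43.784535, 43.763573],
                      "Longitude": [-79.194353, -79.160497, -79.188711]})

# Left join keeps every row of `left`; validate raises if a code is duplicated
merged = left.merge(right, how="left", left_on="PostalCode",
                    right_on="Postal Code", validate="one_to_one",
                    indicator=True)

# Rows flagged "left_only" would be postal codes without coordinates
unmatched = (merged["_merge"] == "left_only").sum()
merged = merged.drop(columns=["Postal Code", "_merge"])
print(merged)
print("postal codes without coordinates:", unmatched)
```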

## 3) Explore and cluster the neighborhoods in Toronto. 

Get the coordinates of Toronto.

In [213]:
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


Create a map of Toronto.

In [214]:
import folium
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(pcode['Latitude'], pcode['Longitude'], 
                                           pcode['Borough'], pcode['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=8,
        popup=label,
        color='red',
        fill=True,
        fill_color='#FF5733',
        fill_opacity=0.5,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

Import libraries and define credentials for the Foursquare API.

In [64]:
import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe


In [428]:
CLIENT_ID = '0WSSLBASAZEOMXIEUC5W4UDTLYADATYKYH1J0GBY4KE0LXVN' # your Foursquare ID
CLIENT_SECRET = 'XXXXX' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 0WSSLBASAZEOMXIEUC5W4UDTLYADATYKYH1J0GBY4KE0LXVN
CLIENT_SECRET: XXXXX


Define a function `getNearbyVenues` that collects the venues within a radius of 700 meters of the coordinates of each neighborhood, and use this function with the data stored in the `pcode` dataframe.

In [148]:
LIMIT = 100

def getNearbyVenues(names, latitudes, longitudes, radius=700):
    
    venues_list=[]
    for n, name, lat, lng in zip(range(len(names)), names, latitudes, longitudes):
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, CLIENT_SECRET, VERSION, lat, lng, radius, LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        if len(results) == 0:
            print("Warning! found no venue for neighborhood nr. {1}: {0}".format(name, n))
        
        # return only relevant information for each nearby venue
        venues_list.append([ (name, lat, lng, v['venue']['name'], v['venue']['location']['lat'], 
                              v['venue']['location']['lng'], v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    print("\n Finished... a total number of {0} venues has been found!".format(len(nearby_venues)))
    return(nearby_venues)
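
As a side note, the request URL can also be assembled from a parameter dictionary, with the credentials read from environment variables rather than written into the notebook. This is only a sketch: the helper name `build_explore_url` and the variable names `FSQ_CLIENT_ID` and `FSQ_CLIENT_SECRET` are made up for illustration, and no request is sent.

```python
import os
from urllib.parse import urlencode

def build_explore_url(lat, lng, radius=700, limit=100, version="20180605"):
    """Return the explore URL for one location (hypothetical helper)."""
    params = {
        # Assumed environment variable names; fall back to placeholders
        "client_id": os.environ.get("FSQ_CLIENT_ID", "YOUR_ID"),
        "client_secret": os.environ.get("FSQ_CLIENT_SECRET", "YOUR_SECRET"),
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    # urlencode takes care of escaping, e.g. the comma in the "ll" value
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)

url = build_explore_url(43.806686, -79.194353)
print(url)
```

The resulting string could be passed to `requests.get` in the same way as the URL built inside `getNearbyVenues`.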

In [149]:
toronto_venues = getNearbyVenues(names=pcode['Neighborhood'],
                                 latitudes=pcode['Latitude'],
                                 longitudes=pcode['Longitude'] )


 Finished... a total number of 3490 venues has been found!


**NOTE!** For neighborhood nr. 16, "Upper Rouge", the Foursquare API could not find any venue within 700 m of the coordinates. This neighborhood is therefore excluded from the subsequent analysis.

In [215]:
print(toronto_venues.shape)
toronto_venues.head()

(3490, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Rouge, Malvern",43.806686,-79.194353,Wendy’s,43.807448,-79.199056,Fast Food Restaurant
1,"Rouge, Malvern",43.806686,-79.194353,Wendy's,43.802008,-79.19808,Fast Food Restaurant
2,"Rouge, Malvern",43.806686,-79.194353,Tim Hortons,43.802,-79.198169,Coffee Shop
3,"Rouge, Malvern",43.806686,-79.194353,Lee Valley,43.803161,-79.199681,Hobby Shop
4,"Rouge, Malvern",43.806686,-79.194353,Bus Stop: 85 & 116,43.802198,-79.199389,Bus Station


Now there are 3490 venues stored in the `toronto_venues` dataframe.

In [431]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 323 unique categories.


And the venues belong to 323 categories.

**There is a problem**: there is a venue category that is named "Neighborhood"

In [154]:
toronto_venues[toronto_venues["Venue Category"]=="Neighborhood"]

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
536,The Beaches,43.676357,-79.293031,Upper Beaches,43.680563,-79.292869,Neighborhood
806,Studio District,43.659526,-79.340923,Leslieville,43.66207,-79.337856,Neighborhood
1402,"Ryerson, Garden District",43.657162,-79.378937,Downtown Toronto,43.653232,-79.385296,Neighborhood
1658,Central Bay Street,43.657952,-79.387383,Downtown Toronto,43.653232,-79.385296,Neighborhood
1739,"Adelaide, King, Richmond",43.650571,-79.384568,Downtown Toronto,43.653232,-79.385296,Neighborhood
1832,"Harbourfront East, Toronto Islands, Union Station",43.640816,-79.381752,Harbourfront,43.639526,-79.380688,Neighborhood
2655,"First Canadian Place, Underground city",43.648429,-79.38228,Downtown Toronto,43.653232,-79.385296,Neighborhood
2911,"Brockton, Exhibition Place, Parkdale Village",43.636847,-79.428191,Parkdale,43.640524,-79.4322,Neighborhood


This name would cause problems later, when the `Venue Category` column is transformed into a set of numerical columns with the `get_dummies` function: the resulting indicator column would be called `Neighborhood`, the same name as the column that identifies the neighborhood. For this reason, we first rename the venue category `Neighborhood` to the unambiguous `NeighborhoodVenue`.

In [298]:
toronto_venues.replace({"Neighborhood":"NeighborhoodVenue"}, inplace=True)
len(toronto_venues[toronto_venues["Venue Category"]=="Neighborhood"])

0
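
The problem with a category named `Neighborhood` can be reproduced on a toy table. In this sketch (made-up rows) `get_dummies` produces an indicator column called `Neighborhood`, and assigning the neighborhood names to a column with the same label overwrites it, so one category is silently lost.

```python
import pandas as pd

toy = pd.DataFrame({
    "Neighborhood": ["The Beaches", "The Beaches", "Woburn"],
    "Venue Category": ["Neighborhood", "Park", "Coffee Shop"],
})

onehot = pd.get_dummies(toy[["Venue Category"]], prefix="", prefix_sep="")
n_categories = onehot.shape[1]          # 3 indicator columns

# Adding the names back reuses the label "Neighborhood" and replaces
# the indicator column instead of creating a new one
onehot["Neighborhood"] = toy["Neighborhood"]
print(n_categories, onehot.shape[1])    # both 3: the indicator is gone
```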

Now we can finally transform the categorical variable `Venue Category` to a set of numerical variables...

In [299]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")
print(toronto_onehot.shape)


(3490, 323)


There are still 3490 venues (rows), and there are now 323 columns, one for each venue category.

We need to add the `Neighborhood` column back to the dataset, so the number of columns will become 324.

In [300]:
# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
collist = list(toronto_onehot.columns)
collist.remove('Neighborhood')
toronto_onehot = toronto_onehot[['Neighborhood'] + collist]

print(toronto_onehot.shape)
toronto_onehot.head()


(3490, 324)


Unnamed: 0,Neighborhood,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,"Rouge, Malvern",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Rouge, Malvern",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Rouge, Malvern",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Rouge, Malvern",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Rouge, Malvern",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Now we group the data by the column `Neighborhood`, collecting all the venues of the same neighborhood. Applying the function `sum` to the grouped data, we obtain the number of venues of each category in the neighborhood (using `mean` instead would give the relative frequency, a number between 0 and 1).

In [434]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').sum().reset_index()
toronto_grouped.shape
toronto_grouped.head()

Unnamed: 0,Neighborhood,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,"Adelaide, King, Richmond",0,0,0,0,0,0,0,0,2,...,2,0,0,0,0,0,0,0,1,1
1,Agincourt,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Agincourt North, L'Amoreaux East, Milliken, St...",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Albion Gardens, Beaumond Heights, Humbergate, ...",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Alderwood, Long Branch",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
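
The difference between the two possible aggregations can be seen on a toy one-hot table (made-up rows): `sum` gives the number of venues per category, while `mean` gives the share of the neighborhood's venues that fall in that category.

```python
import pandas as pd

onehot = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B"],
    "Coffee Shop":  [1, 1, 0, 0],
    "Park":         [0, 0, 1, 1],
})

counts = onehot.groupby("Neighborhood").sum()    # venues per category
shares = onehot.groupby("Neighborhood").mean()   # fraction per category
print(counts)
print(shares)
```

Neighborhood A has 2 coffee shops out of 3 venues, so its count is 2 and its share is 2/3.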


We now process the data extracting the top 10 venues for each neighborhood and collecting this result in a new dataframe. This dataframe will be used at the end to analyse the clustering results and to understand the types of cluster that are computed.

**IMPORTANT** We have modified the data processing so that a venue category is added to the top 10 list only if its count is higher than 0. In this way, categories that are not actually present in the neighborhood are not added to the list.

In [446]:
num_top_venues = 10

# create columns according to number of top venues
indicators = ['st', 'nd', 'rd']
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
venues_sorted = pd.DataFrame(columns=columns)
venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in range(len(toronto_grouped)):
    row_sorted = toronto_grouped.iloc[ind, 1:].sort_values(ascending=False)
    for ntop in range(num_top_venues):
        if row_sorted.iloc[ntop] > 0.0:
            venues_sorted.iloc[ind, 1+ntop] = row_sorted.index.values[ntop] + " ({})".format(row_sorted.iloc[ntop])

venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Coffee Shop (8),Café (5),Restaurant (5),Hotel (4),Gym (3),Pizza Place (3),Gastropub (3),Bar (3),Theater (3),Breakfast Spot (2)
1,Agincourt,Breakfast Spot (1),Clothing Store (1),Shanghai Restaurant (1),Latin American Restaurant (1),Lounge (1),Sandwich Place (1),Skating Rink (1),Pool Hall (1),Badminton Court (1),
2,"Agincourt North, L'Amoreaux East, Milliken, St...",BBQ Joint (2),Pizza Place (2),Chinese Restaurant (2),Noodle House (2),Udon Restaurant (1),Malay Restaurant (1),Food Court (1),Shop & Service (1),Bakery (1),Caribbean Restaurant (1)
3,"Albion Gardens, Beaumond Heights, Humbergate, ...",Grocery Store (2),Caribbean Restaurant (1),Gym Pool (1),Beer Store (1),Liquor Store (1),Fried Chicken Joint (1),Sandwich Place (1),Pharmacy (1),Fast Food Restaurant (1),Hardware Store (1)
4,"Alderwood, Long Branch",Pizza Place (2),Convenience Store (2),Sandwich Place (1),Skating Rink (1),Gas Station (1),Pharmacy (1),Coffee Shop (1),Pub (1),Gym (1),
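
The same selection can be written more compactly for a single row. A sketch, with a made-up row of counts and a hypothetical helper name `top_categories`:

```python
import pandas as pd

def top_categories(row, n=10):
    """Return up to n 'Category (count)' strings, skipping zero counts."""
    present = row[row > 0].sort_values(ascending=False).head(n)
    return ["{} ({})".format(cat, int(cnt)) for cat, cnt in present.items()]

# Made-up counts for one neighborhood
row = pd.Series({"Coffee Shop": 8, "Park": 0, "Hotel": 4, "Café": 5})
top3 = top_categories(row, n=3)
print(top3)
```

Categories with a count of 0, such as "Park" here, never enter the list, so a neighborhood with few venues simply gets a shorter list.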


Now we group the neighborhoods into 5 clusters, which are labelled with an integer from 0 to 4.

In [447]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

# import k-means from clustering stage
from sklearn.cluster import KMeans

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=4).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:30] 

array([4, 2, 2, 2, 2, 2, 2, 2, 4, 2, 2, 0, 2, 2, 2, 0, 2, 2, 2, 3, 0, 2,
       3, 2, 2, 2, 2, 4, 0, 2], dtype=int32)
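
To illustrate what the label array means, here is a small sketch on made-up count vectors. It passes `n_init` explicitly, since its default has changed between scikit-learn releases; the data and the choice of 2 clusters are purely illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

# Four toy neighborhoods described by counts of (coffee shops, parks)
X = np.array([[10, 0], [12, 1], [0, 3], [1, 2]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Rows 0-1 end up in one cluster and rows 2-3 in the other;
# which of the two groups gets label 0 is arbitrary
print(km.labels_)
```

Because the features are raw counts, neighborhoods with many venues lie far from those with few. This is one plausible reason why the clusters analysed below differ mainly in the number of venues.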

We insert the cluster identifier in the `venues_sorted` dataframe, which contains the top 10 venues for each neighborhood.

In [448]:
# add clustering labels
try:
    venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
except:
    venues_sorted['Cluster Labels'] = kmeans.labels_

And now, merging `venues_sorted` and `pcode` we can create the dataframe `pcode_merged`, that contains all the geographical data of the neighborhood (postal code, borough, latitude, longitude) and the top 10 list of venues.

In [449]:
# merge venues_sorted with pcode to add latitude/longitude for each neighborhood
pcode_merged = pcode.join(venues_sorted.set_index('Neighborhood'), on='Neighborhood')
pcode_merged.dropna(inplace = True, subset=['Cluster Labels'])
pcode_merged['Cluster Labels'] = pcode_merged['Cluster Labels'].astype(int)

print(pcode_merged.shape)
pcode_merged.head(10) # check the last columns!


(102, 16)


Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353,2,Fast Food Restaurant (2),Coffee Shop (2),Hobby Shop (1),Construction & Landscaping (1),Bus Station (1),,,,,
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,2,Breakfast Spot (2),Burger Joint (1),Bar (1),,,,,,,
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711,2,Park (2),Fast Food Restaurant (2),Pharmacy (1),Beer Store (1),Rental Car Location (1),Restaurant (1),Fried Chicken Joint (1),Bank (1),Moving Target (1),Bus Line (1)
3,M1G,Scarborough,Woburn,43.770992,-79.216917,2,Coffee Shop (2),Park (2),Business Service (1),,,,,,,
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476,2,Bakery (2),Coffee Shop (2),Indian Restaurant (2),Gym / Fitness Center (1),Gas Station (1),Caribbean Restaurant (1),Thai Restaurant (1),Chinese Restaurant (1),Athletics & Sports (1),Fried Chicken Joint (1)
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476,2,Ice Cream Shop (2),Fast Food Restaurant (1),Pizza Place (1),Convenience Store (1),Coffee Shop (1),,,,,
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park",43.727929,-79.262029,2,Coffee Shop (3),Convenience Store (2),Hobby Shop (1),Grocery Store (1),Light Rail Station (1),Sandwich Place (1),Metro Station (1),Chinese Restaurant (1),Intersection (1),Department Store (1)
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge",43.711112,-79.284577,2,Intersection (3),Diner (2),Bakery (2),Convenience Store (1),Coffee Shop (1),Metro Station (1),Park (1),Ice Cream Shop (1),Bus Line (1),Soccer Field (1)
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West",43.716316,-79.239476,2,Wings Joint (1),Chinese Restaurant (1),Hardware Store (1),Burger Joint (1),Gym / Fitness Center (1),,,,,
9,M1N,Scarborough,"Birch Cliff, Cliffside West",43.692657,-79.264848,2,Thai Restaurant (1),Café (1),College Stadium (1),General Entertainment (1),Diner (1),Park (1),Skating Rink (1),,,


Using `folium`, we plot the map of Toronto with the neighborhood indicated as circles, color coded according to the cluster they belong to.

In [450]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(pcode_merged['Latitude'], pcode_merged['Longitude'], pcode_merged['Neighborhood'], pcode_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
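
A note on the color lookup: the labels run from 0 to `kclusters - 1`, so `rainbow[cluster-1]` maps label 0 to index -1, that is, to the last color of the list. The sketch below builds an equivalent palette in a single step and makes this mapping explicit; `color_for` is a hypothetical helper.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 5
# One color per cluster, evenly spaced along the rainbow colormap
rainbow = [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, kclusters))]

def color_for(cluster):
    """Mimic the notebook's lookup, rainbow[cluster-1]."""
    return rainbow[cluster - 1]

for label in range(kclusters):
    print(label, color_for(label))
```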

Now we can analyse the clusters by extracting the rows of `pcode_merged` that correspond to each of the cluster labels.
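
Since the same selection is repeated for every label, it could be wrapped in a small function. A sketch, where `show_cluster` is a hypothetical helper and the toy frame (illustrative values, made-up labels) stands in for `pcode_merged`:

```python
import pandas as pd

def show_cluster(df, label, first_col=5):
    """Rows of one cluster: Borough plus every column from first_col on."""
    cols = df.columns[[1] + list(range(first_col, df.shape[1]))]
    return df.loc[df["Cluster Labels"] == label, cols]

toy = pd.DataFrame({
    "PostalCode": ["M1B", "M1G"],
    "Borough": ["Scarborough", "Scarborough"],
    "Neighborhood": ["Rouge, Malvern", "Woburn"],
    "Latitude": [43.806686, 43.770992],
    "Longitude": [-79.194353, -79.216917],
    "Cluster Labels": [2, 0],            # made-up labels for illustration
    "1st Most Common Venue": ["Fast Food Restaurant (2)", "Coffee Shop (2)"],
})

selected = show_cluster(toy, 2)
print(selected)
print(toy["Cluster Labels"].value_counts())   # size of each cluster
```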

## CLUSTER 0

Cluster 0 includes neighborhoods in the proximity of the city center, shown as red dots on the map. 
The neighborhoods of this cluster are characterized by a moderate number of venues, mostly bars/coffee shops and some restaurants.

In [456]:
pcode_merged.loc[pcode_merged['Cluster Labels'] == 0, 
                     pcode_merged.columns[[1] + list(range(5, pcode_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
18,North York,0,Clothing Store (10),Coffee Shop (5),Japanese Restaurant (4),Fast Food Restaurant (4),Bank (2),Convenience Store (2),Tea Room (2),Baseball Field (2),Park (2),Juice Bar (2)
22,North York,0,Coffee Shop (5),Pizza Place (4),Ramen Restaurant (4),Korean Restaurant (3),Middle Eastern Restaurant (3),Sandwich Place (3),Restaurant (2),Café (2),Fast Food Restaurant (2),Japanese Restaurant (2)
38,East York,0,Coffee Shop (4),Sporting Goods Shop (3),Sandwich Place (2),Restaurant (2),Sushi Restaurant (2),Furniture / Home Store (2),Department Store (2),Bank (2),Sports Bar (2),Brewery (2)
42,East Toronto,0,Indian Restaurant (4),Sandwich Place (3),Grocery Store (3),Café (2),Gym (2),Coffee Shop (2),Fast Food Restaurant (2),Restaurant (2),Park (1),Steakhouse (1)
43,East Toronto,0,Café (7),Coffee Shop (5),Bar (4),Bakery (4),Sandwich Place (4),American Restaurant (3),Diner (3),Vietnamese Restaurant (2),Italian Restaurant (2),Gastropub (2)
46,Central Toronto,0,Sporting Goods Shop (4),Clothing Store (4),Café (3),Coffee Shop (3),Dessert Shop (2),Diner (2),Restaurant (2),Italian Restaurant (2),Ramen Restaurant (1),Chinese Restaurant (1)
47,Central Toronto,0,Pizza Place (4),Coffee Shop (4),Café (3),Dessert Shop (3),Italian Restaurant (3),Sandwich Place (3),Gym (3),Sushi Restaurant (2),Pub (2),Indian Restaurant (2)
49,Central Toronto,0,Coffee Shop (7),Italian Restaurant (4),Pharmacy (3),Sushi Restaurant (3),Thai Restaurant (2),Skating Rink (2),Sandwich Place (2),Café (2),Restaurant (2),Pub (2)
51,Downtown Toronto,0,Coffee Shop (6),Pizza Place (3),Grocery Store (3),Park (3),Café (3),Restaurant (3),Chinese Restaurant (2),Italian Restaurant (2),Bakery (2),Gastropub (2)
65,Central Toronto,0,Coffee Shop (5),Café (4),Sandwich Place (3),Pizza Place (3),Vegetarian / Vegan Restaurant (3),Pub (3),American Restaurant (2),Diner (2),Burger Joint (2),Pharmacy (2)


## CLUSTER 1

Cluster 1 includes a single neighborhood, with a very high concentration of Greek restaurants.

In [452]:
pcode_merged.loc[pcode_merged['Cluster Labels'] == 1, 
                     pcode_merged.columns[[1] + list(range(5, pcode_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
41,East Toronto,1,Greek Restaurant (13),Coffee Shop (8),Café (4),Pub (4),Italian Restaurant (3),Grocery Store (3),Fast Food Restaurant (3),Bakery (2),Furniture / Home Store (2),Bank (2)


## CLUSTER 2

Cluster 2 includes neighborhoods that have a small number of venues, of very diverse types.

In [453]:
pcode_merged.loc[pcode_merged['Cluster Labels'] == 2, 
                     pcode_merged.columns[[1] + list(range(5, pcode_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Scarborough,2,Fast Food Restaurant (2),Coffee Shop (2),Hobby Shop (1),Construction & Landscaping (1),Bus Station (1),,,,,
1,Scarborough,2,Breakfast Spot (2),Burger Joint (1),Bar (1),,,,,,,
2,Scarborough,2,Park (2),Fast Food Restaurant (2),Pharmacy (1),Beer Store (1),Rental Car Location (1),Restaurant (1),Fried Chicken Joint (1),Bank (1),Moving Target (1),Bus Line (1)
3,Scarborough,2,Coffee Shop (2),Park (2),Business Service (1),,,,,,,
4,Scarborough,2,Bakery (2),Coffee Shop (2),Indian Restaurant (2),Gym / Fitness Center (1),Gas Station (1),Caribbean Restaurant (1),Thai Restaurant (1),Chinese Restaurant (1),Athletics & Sports (1),Fried Chicken Joint (1)
5,Scarborough,2,Ice Cream Shop (2),Fast Food Restaurant (1),Pizza Place (1),Convenience Store (1),Coffee Shop (1),,,,,
6,Scarborough,2,Coffee Shop (3),Convenience Store (2),Hobby Shop (1),Grocery Store (1),Light Rail Station (1),Sandwich Place (1),Metro Station (1),Chinese Restaurant (1),Intersection (1),Department Store (1)
7,Scarborough,2,Intersection (3),Diner (2),Bakery (2),Convenience Store (1),Coffee Shop (1),Metro Station (1),Park (1),Ice Cream Shop (1),Bus Line (1),Soccer Field (1)
8,Scarborough,2,Wings Joint (1),Chinese Restaurant (1),Hardware Store (1),Burger Joint (1),Gym / Fitness Center (1),,,,,
9,Scarborough,2,Thai Restaurant (1),Café (1),College Stadium (1),General Entertainment (1),Diner (1),Park (1),Skating Rink (1),,,


## CLUSTER 3

Cluster 3 includes neighborhoods of Downtown Toronto with a large number of venues. Coffee shops come first, followed by a large number of restaurants.

In [454]:
pcode_merged.loc[pcode_merged['Cluster Labels'] == 3, 
                     pcode_merged.columns[[1] + list(range(5, pcode_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
52,Downtown Toronto,3,Coffee Shop (8),Japanese Restaurant (5),Burger Joint (3),Restaurant (3),Ramen Restaurant (3),Gay Bar (3),Yoga Studio (2),Mediterranean Restaurant (2),Bookstore (2),Sushi Restaurant (2)
53,Downtown Toronto,3,Coffee Shop (11),Theater (4),Restaurant (4),Park (4),Café (3),Pub (3),Performing Arts Venue (2),Thai Restaurant (2),Mexican Restaurant (2),Bakery (2)
54,Downtown Toronto,3,Coffee Shop (11),Clothing Store (7),Ramen Restaurant (3),Restaurant (3),Diner (2),Japanese Restaurant (2),Thai Restaurant (2),Tea Room (2),Hotel (2),Cosmetics Shop (2)
57,Downtown Toronto,3,Coffee Shop (12),Italian Restaurant (4),Japanese Restaurant (3),Thai Restaurant (3),Burger Joint (3),Art Gallery (3),Clothing Store (3),Fast Food Restaurant (2),Middle Eastern Restaurant (2),Arts & Crafts Store (2)
85,Downtown Toronto,3,Coffee Shop (14),Sandwich Place (4),Italian Restaurant (4),Burger Joint (3),Falafel Restaurant (2),Park (2),Café (2),Gym (2),Burrito Place (2),Gastropub (2)


## CLUSTER 4

Cluster 4 also contains neighborhoods of Downtown Toronto with a large number of venues. Coffee shops again come first, but here the second and third places are taken by hotels and cafés.

In [455]:
pcode_merged.loc[pcode_merged['Cluster Labels'] == 4, 
                     pcode_merged.columns[[1] + list(range(5, pcode_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
55,Downtown Toronto,4,Coffee Shop (9),Café (6),Bakery (5),Seafood Restaurant (4),Restaurant (4),Italian Restaurant (4),Hotel (4),Diner (3),Cosmetics Shop (3),Breakfast Spot (3)
56,Downtown Toronto,4,Coffee Shop (9),Hotel (5),Restaurant (5),Café (5),Beer Bar (4),Seafood Restaurant (3),Cocktail Bar (3),Japanese Restaurant (3),Art Gallery (2),Cheese Shop (2)
58,Downtown Toronto,4,Coffee Shop (8),Café (5),Restaurant (5),Hotel (4),Gym (3),Pizza Place (3),Gastropub (3),Bar (3),Theater (3),Breakfast Spot (2)
59,Downtown Toronto,4,Coffee Shop (12),Hotel (7),Restaurant (3),Brewery (3),Italian Restaurant (3),Café (3),Park (3),Bar (3),Deli / Bodega (2),Plaza (2)
60,Downtown Toronto,4,Coffee Shop (10),Café (7),Hotel (7),Restaurant (6),Bakery (3),Seafood Restaurant (3),Gastropub (3),Japanese Restaurant (3),Bar (3),Pizza Place (2)
61,Downtown Toronto,4,Coffee Shop (12),Hotel (5),Café (5),Restaurant (5),Japanese Restaurant (4),American Restaurant (4),Gastropub (3),Italian Restaurant (3),Concert Hall (3),Seafood Restaurant (3)
69,Downtown Toronto,4,Coffee Shop (13),Café (6),Restaurant (6),Japanese Restaurant (5),Beer Bar (4),Hotel (3),Seafood Restaurant (3),Bakery (3),Gym (3),Breakfast Spot (2)
70,Downtown Toronto,4,Coffee Shop (10),Hotel (6),Café (6),Restaurant (5),Gastropub (3),Asian Restaurant (3),Bar (3),American Restaurant (3),Seafood Restaurant (3),Japanese Restaurant (3)
