# Clustering Neighbourhoods in Toronto 

This notebook uses k-means clustering to group the neighbourhoods of Toronto. It is an assignment for the Coursera Data Science Capstone project.

First, import the required libraries. Pandas handles all of the data manipulation, requests calls the Foursquare API, and NumPy provides array helpers.

In [1]:
import pandas as pd
import requests
import numpy as np

Next, use `pd.read_html()` to read the Wikipedia page. It returns a list of dataframes, one for each table on the page that contains the text 'Postcode'.

The postal code table is the first element (index 0) of the **tables** list, so I save it as a dataframe named **df** by accessing that index.


In [2]:
wiki_link='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
tables=pd.read_html(wiki_link,match='Postcode', header=0)
df=tables[0]

Now we can clean up the dataframe:

In [3]:
#rename the column Postcode to Postalcode
df.rename(columns={'Postcode':'Postalcode'},inplace=True)

#remove rows without an assigned borough
not_assigned_borough=df[df['Borough']=='Not assigned']
df.drop(not_assigned_borough.index,inplace=True)
df.reset_index(drop=True,inplace=True)

#If a row has a borough but a Not assigned neighbourhood,
#use the borough name as the neighbourhood.
#A boolean mask changes only those rows, whereas df.replace() rewrites
#every matching cell in the whole dataframe.
not_assigned_neigh=df['Neighbourhood']=='Not assigned'
df.loc[not_assigned_neigh,'Neighbourhood']=df.loc[not_assigned_neigh,'Borough']
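The two cleaning rules can be tried on a few rows before running them on the scraped table. This is a minimal sketch, assuming hypothetical sample rows with the same columns as **df** after the rename; `clean_postal_table` is an illustrative helper, not part of the notebook.

```python
import pandas as pd

# hypothetical sample rows with the same columns as df after the rename
raw = pd.DataFrame({
    'Postalcode': ['M1A', 'M3A', 'M7A', 'M5A', 'M5A'],
    'Borough': ['Not assigned', 'North York', "Queen's Park",
                'Downtown Toronto', 'Downtown Toronto'],
    'Neighbourhood': ['Not assigned', 'Parkwoods', 'Not assigned',
                      'Harbourfront', 'Regent Park'],
})

def clean_postal_table(df):
    """Hypothetical helper: drop unassigned boroughs, then fill unassigned neighbourhoods."""
    # keep only rows that have a borough, working on a copy of the input
    df = df[df['Borough'] != 'Not assigned'].copy().reset_index(drop=True)
    # copy the borough name into rows whose neighbourhood is still 'Not assigned'
    missing = df['Neighbourhood'] == 'Not assigned'
    df.loc[missing, 'Neighbourhood'] = df.loc[missing, 'Borough']
    return df

cleaned = clean_postal_table(raw)
print(cleaned)
```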

Then I group the dataframe by postal code and borough, joining the names of all neighbourhoods that share a postal code into one comma-separated string.

In [4]:
grouped_df=df.groupby(['Postalcode','Borough'],sort=False)['Neighbourhood'].apply(', '.join).reset_index()
grouped_df

|   | Postalcode | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Harbourfront, Regent Park |
| 3 | M6A | North York | Lawrence Heights, Lawrence Manor |
| 4 | M7A | Queen's Park | Queen's Park |
| 5 | M9A | Etobicoke | Islington Avenue |
| 6 | M1B | Scarborough | Rouge, Malvern |
| 7 | M3B | North York | Don Mills North |
| 8 | M4B | East York | Woodbine Gardens, Parkview Hill |
| 9 | M5B | Downtown Toronto | Ryerson, Garden District |
| ... | ... | ... | ... |
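The grouping step can be checked on a few rows. A minimal sketch, assuming hypothetical sample rows chosen to mirror the M5A entry in the output above.

```python
import pandas as pd

# hypothetical rows: two neighbourhoods share postal code M5A
rows = pd.DataFrame({
    'Postalcode': ['M3A', 'M5A', 'M5A'],
    'Borough': ['North York', 'Downtown Toronto', 'Downtown Toronto'],
    'Neighbourhood': ['Parkwoods', 'Harbourfront', 'Regent Park'],
})

# sort=False keeps postal codes in order of first appearance;
# ', '.join concatenates the neighbourhood names inside each group
combined = (rows.groupby(['Postalcode', 'Borough'], sort=False)['Neighbourhood']
                .apply(', '.join)
                .reset_index())
print(combined)
```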


Now I can filter the postal codes to get the same dataframe shown in the assignment.

In [5]:
pc=['M5G','M2H','M4B','M1J','M4G','M4M','M1R','M9V','M9L','M5V','M1B','M5A']
final_df=grouped_df[grouped_df['Postalcode'].isin(pc)].reset_index(drop=True)
final_df

|   | Postalcode | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M5A | Downtown Toronto | Harbourfront, Regent Park |
| 1 | M1B | Scarborough | Rouge, Malvern |
| 2 | M4B | East York | Woodbine Gardens, Parkview Hill |
| 3 | M4G | East York | Leaside |
| 4 | M5G | Downtown Toronto | Central Bay Street |
| 5 | M2H | North York | Hillcrest Village |
| 6 | M1J | Scarborough | Scarborough Village |
| 7 | M9L | North York | Humber Summit |
| 8 | M4M | East Toronto | Studio District |
| 9 | M1R | Scarborough | Maryvale, Wexford |
| ... | ... | ... | ... |


Finally, we get the shape of the dataframes.

In [6]:
print('The shape of the grouped dataframe is:{}.\nThe shape of the final dataframe is:{}.'.format(grouped_df.shape,final_df.shape))

The shape of the grouped dataframe is:(103, 3).
The shape of the final dataframe is:(12, 3).


## Obtaining the Geodata

The geodata comes from the CSV file linked in the Coursera assignment, which lists the latitude and longitude of each postal code.

In [7]:
geodata=pd.read_csv('http://cocl.us/Geospatial_data')

In [8]:
#rename the key column so it matches grouped_df
geodata.rename(columns={'Postal Code':'Postalcode'},inplace=True)

Add the latitude and longitude of each postal code by merging the CSV data into grouped_df on the postal code. A left merge keeps every postal code in grouped_df, and assigning the result avoids chained indexing such as `grouped_df['Latitude'][mask] = ...`, which pandas may not write back to the dataframe.

In [9]:
grouped_df=grouped_df.merge(geodata,on='Postalcode',how='left')
grouped_df

|   | Postalcode | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.7533 | -79.3297 |
| 1 | M4A | North York | Victoria Village | 43.7259 | -79.3156 |
| 2 | M5A | Downtown Toronto | Harbourfront, Regent Park | 43.6543 | -79.3606 |
| 3 | M6A | North York | Lawrence Heights, Lawrence Manor | 43.7185 | -79.4648 |
| 4 | M7A | Queen's Park | Queen's Park | 43.6623 | -79.3895 |
| 5 | M9A | Etobicoke | Islington Avenue | 43.6679 | -79.5322 |
| 6 | M1B | Scarborough | Rouge, Malvern | 43.8067 | -79.1944 |
| 7 | M3B | North York | Don Mills North | 43.7459 | -79.3522 |
| 8 | M4B | East York | Woodbine Gardens, Parkview Hill | 43.7064 | -79.3099 |
| 9 | M5B | Downtown Toronto | Ryerson, Garden District | 43.6572 | -79.3789 |
| ... | ... | ... | ... | ... | ... |
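A join on the postal code can also report codes that have no coordinates. This is a minimal sketch with hypothetical sample data (M9Z is made up to show an unmatched code), using the `validate` and `indicator` options of `DataFrame.merge`.

```python
import pandas as pd

# hypothetical inputs shaped like grouped_df and the Coursera CSV
postal = pd.DataFrame({'Postalcode': ['M3A', 'M4A', 'M9Z'],
                       'Borough': ['North York', 'North York', 'Etobicoke']})
coords = pd.DataFrame({'Postal Code': ['M3A', 'M4A'],
                       'Latitude': [43.7533, 43.7259],
                       'Longitude': [-79.3297, -79.3156]})

merged = postal.merge(
    coords.rename(columns={'Postal Code': 'Postalcode'}),
    on='Postalcode',
    how='left',              # keep every postal code, even without coordinates
    validate='one_to_one',   # raise if either side has duplicate postal codes
    indicator=True,          # adds a _merge column: 'both' or 'left_only'
)
unmatched = merged.loc[merged['_merge'] == 'left_only', 'Postalcode'].tolist()
print('Postal codes without coordinates:', unmatched)
```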


Filter again so that the dataframe contains only the postal codes shown in the assignment.

In [10]:
final_df=grouped_df[grouped_df['Postalcode'].isin(pc)].reset_index(drop=True)
final_df

|   | Postalcode | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M5A | Downtown Toronto | Harbourfront, Regent Park | 43.6543 | -79.3606 |
| 1 | M1B | Scarborough | Rouge, Malvern | 43.8067 | -79.1944 |
| 2 | M4B | East York | Woodbine Gardens, Parkview Hill | 43.7064 | -79.3099 |
| 3 | M4G | East York | Leaside | 43.7091 | -79.3635 |
| 4 | M5G | Downtown Toronto | Central Bay Street | 43.658 | -79.3874 |
| 5 | M2H | North York | Hillcrest Village | 43.8038 | -79.3635 |
| 6 | M1J | Scarborough | Scarborough Village | 43.7447 | -79.2395 |
| 7 | M9L | North York | Humber Summit | 43.7563 | -79.566 |
| 8 | M4M | East Toronto | Studio District | 43.6595 | -79.3409 |
| 9 | M1R | Scarborough | Maryvale, Wexford | 43.7501 | -79.2958 |
| ... | ... | ... | ... | ... | ... |


## K-Means Clustering of the Toronto Neighbourhoods

In [11]:
!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim

print ('Done!')

Collecting package metadata: done
Solving environment: done

# All requested packages already installed.

Done!


In [12]:
import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # only needed if folium is not installed yet (e.g. you skipped the Foursquare API lab)
import folium # map rendering library

print('Done!')

Collecting package metadata: done
Solving environment: done

# All requested packages already installed.

Done!


In [13]:
address = 'Toronto, ON'

geolocator = Nominatim(user_agent="on_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


I'll keep only the boroughs whose names contain the word "Toronto", which simplifies the analysis.

In [14]:
toronto_df=grouped_df[grouped_df['Borough'].str.contains('Toronto')]

In [15]:
print('The shape of the dataframe was reduced from {} rows to {} rows.'.format(grouped_df.shape[0],toronto_df.shape[0]))

The shape of the dataframe was reduced from 103 rows to 38 rows.
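`str.contains` tests each borough name for the pattern. A minimal sketch on a hypothetical Series shows three options worth knowing: `na=False` treats missing values as non-matches, `case=False` ignores capitalisation, and `regex=False` treats the pattern as plain text rather than a regular expression.

```python
import pandas as pd

# hypothetical borough names, including a missing value
boroughs = pd.Series(['Downtown Toronto', 'North York', None, 'East Toronto', 'East York'])

# case-insensitive plain-text match; missing boroughs count as non-matches
mask = boroughs.str.contains('Toronto', case=False, na=False, regex=False)
selected = boroughs[mask].tolist()
print(selected)
```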


This is how the selected postal codes look on the map.

In [16]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(toronto_df['Latitude'], toronto_df['Longitude'], toronto_df['Borough'], toronto_df['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

### Using the Foursquare API to segment the selected neighbourhoods

First, set up the Foursquare API credentials and query parameters.

In [17]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID (do not publish real credentials in a shared notebook)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius

Following the same approach as the k-means clustering lab, first define a function that gets all nearby venues for each location.

In [18]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
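The URL building and response parsing inside `getNearbyVenues` can be split out so they are testable without network access. This is a minimal sketch: `build_explore_url` and `parse_explore_items` are hypothetical helpers, the response layout is assumed to be exactly the one indexed above (`response`, then `groups[0]`, then `items`, then `venue`), and the sample payload is made up apart from the Tandem Coffee row taken from the output below. `urlencode` percent-encodes the query values instead of formatting them into the string by hand.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Hypothetical helper: same query parameters as getNearbyVenues, encoded by urlencode."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)

def parse_explore_items(name, lat, lng, payload):
    """Hypothetical helper: flatten one explore response into venue rows."""
    rows = []
    for item in payload['response']['groups'][0]['items']:
        venue = item['venue']
        # assumption: a venue may have no category, so use None instead of raising IndexError
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows

# made-up payload with the same nesting the notebook expects
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Tandem Coffee', 'location': {'lat': 43.653559, 'lng': -79.361809},
               'categories': [{'name': 'Coffee Shop'}]}},
    {'venue': {'name': 'Unlabelled Spot', 'location': {'lat': 43.65, 'lng': -79.36},
               'categories': []}},
]}]}}

url = build_explore_url('ID', 'SECRET', '20180605', 43.65426, -79.360636)
rows = parse_explore_items('Downtown Toronto', 43.65426, -79.360636, sample)
print(url)
print(rows)
```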

In [19]:
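# note: venues are labelled by borough (not by the Neighbourhood column), so every later groupby is per borough
toronto_venues = getNearbyVenues(names=toronto_df['Borough'],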
                                   latitudes=toronto_df['Latitude'],
                                   longitudes=toronto_df['Longitude']
                                  )



Downtown Toronto
Downtown Toronto
Downtown Toronto
East Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
West Toronto
Downtown Toronto
West Toronto
East Toronto
Downtown Toronto
West Toronto
East Toronto
Downtown Toronto
East Toronto
Central Toronto
Central Toronto
Central Toronto
Central Toronto
West Toronto
Central Toronto
Central Toronto
West Toronto
Central Toronto
Downtown Toronto
West Toronto
Central Toronto
Downtown Toronto
Central Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
East Toronto


Next, check the shape and the first few rows to make sure the dataframe contains the expected data.

In [20]:
print(toronto_venues.shape)
toronto_venues.head()

(1700, 7)


|   | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Downtown Toronto | 43.65426 | -79.360636 | Roselle Desserts | 43.653447 | -79.362017 | Bakery |
| 1 | Downtown Toronto | 43.65426 | -79.360636 | Tandem Coffee | 43.653559 | -79.361809 | Coffee Shop |
| 2 | Downtown Toronto | 43.65426 | -79.360636 | Toronto Cooper Koo Family Cherry St YMCA Centre | 43.653191 | -79.357947 | Gym / Fitness Center |
| 3 | Downtown Toronto | 43.65426 | -79.360636 | Body Blitz Spa East | 43.654735 | -79.359874 | Spa |
| 4 | Downtown Toronto | 43.65426 | -79.360636 | Morning Glory Cafe | 43.653947 | -79.361149 | Breakfast Spot |


In [21]:
toronto_venues.groupby('Neighborhood').count()

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Central Toronto | 109 | 109 | 109 | 109 | 109 | 109 |
| Downtown Toronto | 1287 | 1287 | 1287 | 1287 | 1287 | 1287 |
| East Toronto | 124 | 124 | 124 | 124 | 124 | 124 |
| West Toronto | 180 | 180 | 180 | 180 | 180 | 180 |


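Use one-hot encoding to turn the venue categories into numeric features, with one 0/1 column per category.

In [22]:
# one hot encoding (dtype=int keeps 0/1 values; pandas 2.x returns booleans by default)
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="", dtype=int)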

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

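|   | Yoga Studio | Adult Boutique | Afghan Restaurant | Airport | Airport Food Court | Airport Gate | Airport Lounge | Airport Service | Airport Terminal | American Restaurant | ... | Theme Restaurant | Thrift / Vintage Store | Toy / Game Store | Trail | Train Station | Vegetarian / Vegan Restaurant | Video Game Store | Vietnamese Restaurant | Wine Bar | Wings Joint |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |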
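One-hot encoding can be seen on a tiny example. This is a minimal sketch, assuming hypothetical venue rows. It inserts the neighbourhood as the first column with `DataFrame.insert`, which raises an error if a venue category happens to be named 'Neighborhood'; plain column assignment, as in the cell above, silently overwrites such a column instead.

```python
import pandas as pd

# hypothetical venue rows (neighbourhood, category)
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'B'],
    'Venue Category': ['Coffee Shop', 'Park', 'Coffee Shop'],
})

# one indicator column per category; dtype=int gives 0/1 instead of booleans
onehot = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='', dtype=int)
# put the neighbourhood in front directly instead of appending and reordering
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])
print(onehot)
```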


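Next, group the rows by neighborhood and take the mean of each category column, which gives the share of each venue category in that neighborhood.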

In [23]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

|   | Neighborhood | Yoga Studio | Adult Boutique | Afghan Restaurant | Airport | Airport Food Court | Airport Gate | Airport Lounge | Airport Service | Airport Terminal | ... | Theme Restaurant | Thrift / Vintage Store | Toy / Game Store | Trail | Train Station | Vegetarian / Vegan Restaurant | Video Game Store | Vietnamese Restaurant | Wine Bar | Wings Joint |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Central Toronto | 0.009174 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.009174 | 0.018349 | 0.0 | 0.009174 | 0.0 | 0.009174 | 0.0 | 0.0 |
| 1 | Downtown Toronto | 0.002331 | 0.000777 | 0.000777 | 0.000777 | 0.000777 | 0.000777 | 0.001554 | 0.002331 | 0.001554 | ... | 0.000777 | 0.000777 | 0.001554 | 0.000777 | 0.002331 | 0.012432 | 0.002331 | 0.003885 | 0.006216 | 0.000777 |
| 2 | East Toronto | 0.024194 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.008065 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | West Toronto | 0.005556 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.011111 | 0.0 | 0.011111 | 0.005556 | 0.0 |
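The mean of a 0/1 column is the fraction of a neighbourhood's venues in that category. This explains the small Downtown values: with 1,287 venues, a single venue in a category gives 1/1287 ≈ 0.000777, and for Central Toronto's 109 venues it gives 1/109 ≈ 0.009174. A minimal sketch on a hypothetical one-hot table:

```python
import pandas as pd

# hypothetical one-hot table: neighbourhood A has 4 venues, B has 2
onehot = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Coffee Shop':  [1, 1, 1, 0, 1, 0],
    'Park':         [0, 0, 0, 1, 0, 1],
})

# each mean is the share of that neighbourhood's venues in the category
freq = onehot.groupby('Neighborhood').mean().reset_index()
print(freq)
```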


In [24]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Central Toronto----
            venue  freq
0     Coffee Shop  0.08
1  Sandwich Place  0.06
2     Pizza Place  0.06
3            Park  0.05
4            Café  0.05


----Downtown Toronto----
         venue  freq
0  Coffee Shop  0.09
1         Café  0.06
2       Bakery  0.03
3   Restaurant  0.03
4        Hotel  0.03


----East Toronto----
                venue  freq
0         Coffee Shop  0.06
1    Greek Restaurant  0.06
2  Italian Restaurant  0.04
3      Ice Cream Shop  0.04
4                Café  0.04


----West Toronto----
                venue  freq
0                 Bar  0.08
1                Café  0.06
2         Coffee Shop  0.05
3  Italian Restaurant  0.03
4          Restaurant  0.03




In [25]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [26]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

|   | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Central Toronto | Coffee Shop | Sandwich Place | Pizza Place | Café | Park | Dessert Shop | Clothing Store | Sushi Restaurant | Pub | Italian Restaurant |
| 1 | Downtown Toronto | Coffee Shop | Café | Restaurant | Hotel | Bakery | Italian Restaurant | Japanese Restaurant | Bar | American Restaurant | Seafood Restaurant |
| 2 | East Toronto | Coffee Shop | Greek Restaurant | Ice Cream Shop | Café | Italian Restaurant | Brewery | Pizza Place | Park | Yoga Studio | Pub |
| 3 | West Toronto | Bar | Café | Coffee Shop | Italian Restaurant | Restaurant | Pizza Place | Bakery | Gym | Breakfast Spot | Bookstore |
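The `indicators` list with `try`/`except` produces correct suffixes only up to 10; for 21 or 22 most common venues it would print '21th' and '22th'. A minimal sketch of a hypothetical `ordinal` helper that follows the usual English rules (11th to 13th are the exceptions), used to build the same column names:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th', 22 -> '22nd'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 10th to 20th, including the 11th to 13th exceptions
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

num_top_venues = 10
columns = ['Neighborhood'] + ['{} Most Common Venue'.format(ordinal(i))
                              for i in range(1, num_top_venues + 1)]
print(columns)
```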


I'll use four clusters in the algorithm.

In [27]:
# set number of clusters
kclusters = 4

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 3, 2, 1], dtype=int32)
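With only four rows in `toronto_grouped`, four clusters put every row in its own cluster, which matches the four distinct labels above. The sketch below illustrates this with Lloyd's algorithm in NumPy only. It is not scikit-learn's implementation: `lloyd_kmeans` is a hypothetical helper that simply takes the first k rows as starting centroids, whereas `KMeans` uses k-means++ initialisation by default.

```python
import numpy as np

def lloyd_kmeans(X, k, n_iter=100):
    """Hypothetical minimal Lloyd's k-means; initial centroids are the first k rows."""
    X = np.asarray(X, dtype=float)
    centroids = X[:k].copy()
    for _ in range(n_iter):
        # squared Euclidean distance from every point to every centroid, shape (n, k)
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # move each centroid to the mean of its points (keep it if the cluster is empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # inertia: sum of squared distances from each point to its assigned centroid
    inertia = float(((X - centroids[labels]) ** 2).sum())
    return labels, centroids, inertia

points = [[0, 0], [0, 1], [10, 10], [10, 11]]
labels, centroids, inertia = lloyd_kmeans(points, k=2)   # two well-separated groups
labels4, _, inertia4 = lloyd_kmeans(points, k=4)         # k equals the number of rows
print(labels, inertia)
print(labels4, inertia4)
```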

In [28]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

In [40]:
toronto_merged = toronto_df

# join the cluster labels and top venues onto toronto_df, matching each Borough to the Neighborhood index
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Borough')

toronto_merged.head() 

|   | Postalcode | Borough | Neighbourhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | M5A | Downtown Toronto | Harbourfront, Regent Park | 43.6543 | -79.3606 | 3 | Coffee Shop | Café | Restaurant | Hotel | Bakery | Italian Restaurant | Japanese Restaurant | Bar | American Restaurant | Seafood Restaurant |
| 9 | M5B | Downtown Toronto | Ryerson, Garden District | 43.6572 | -79.3789 | 3 | Coffee Shop | Café | Restaurant | Hotel | Bakery | Italian Restaurant | Japanese Restaurant | Bar | American Restaurant | Seafood Restaurant |
| 15 | M5C | Downtown Toronto | St. James Town | 43.6515 | -79.3754 | 3 | Coffee Shop | Café | Restaurant | Hotel | Bakery | Italian Restaurant | Japanese Restaurant | Bar | American Restaurant | Seafood Restaurant |
| 19 | M4E | East Toronto | The Beaches | 43.6764 | -79.293 | 2 | Coffee Shop | Greek Restaurant | Ice Cream Shop | Café | Italian Restaurant | Brewery | Pizza Place | Park | Yoga Studio | Pub |
| 20 | M5E | Downtown Toronto | Berczy Park | 43.6448 | -79.3733 | 3 | Coffee Shop | Café | Restaurant | Hotel | Bakery | Italian Restaurant | Japanese Restaurant | Bar | American Restaurant | Seafood Restaurant |
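`join(..., on='Borough')` looks up each row's `Borough` value in the index of the right-hand dataframe, which holds borough names here because the venues were labelled by borough. A minimal sketch with hypothetical rows (the labels 3 and 2 mirror the output above) shows that every postal code in a borough inherits the same cluster label.

```python
import pandas as pd

# hypothetical per-postal-code rows and per-borough cluster labels
postal = pd.DataFrame({'Postalcode': ['M5A', 'M5B', 'M4E'],
                       'Borough': ['Downtown Toronto', 'Downtown Toronto', 'East Toronto']})
clusters = pd.DataFrame({'Neighborhood': ['Downtown Toronto', 'East Toronto'],
                         'Cluster Labels': [3, 2]})

# match the left Borough column against the right-hand index
merged = postal.join(clusters.set_index('Neighborhood'), on='Borough')
print(merged)
```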


The clusters can be visualized on a map with folium. They are clearly separated by location, since each cluster contains the postal codes of a single borough.

In [30]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighbourhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],  # cluster labels run from 0 to kclusters-1
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters
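The markers only need one distinct color per cluster label. As a swapped-in alternative to matplotlib's `cm.rainbow` (not what the notebook uses), this minimal sketch spreads hues evenly with the standard library's `colorsys`; `cluster_colors` is a hypothetical helper.

```python
import colorsys

def cluster_colors(k):
    """Hypothetical helper: k hex colors with evenly spaced hues."""
    hexes = []
    for i in range(k):
        r, g, b = colorsys.hsv_to_rgb(i / k, 0.8, 0.9)
        hexes.append('#{:02x}{:02x}{:02x}'.format(round(r * 255), round(g * 255), round(b * 255)))
    return hexes

# cluster labels run from 0 to k-1, so they can index the palette directly
palette = cluster_colors(4)
print(palette)
```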

In [31]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

|   | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 61 | Central Toronto | 0 | Coffee Shop | Sandwich Place | Pizza Place | Café | Park | Dessert Shop | Clothing Store | Sushi Restaurant | Pub | Italian Restaurant |
| 62 | Central Toronto | 0 | Coffee Shop | Sandwich Place | Pizza Place | Café | Park | Dessert Shop | Clothing Store | Sushi Restaurant | Pub | Italian Restaurant |
| 67 | Central Toronto | 0 | Coffee Shop | Sandwich Place | Pizza Place | Café | Park | Dessert Shop | Clothing Store | Sushi Restaurant | Pub | Italian Restaurant |
| 68 | Central Toronto | 0 | Coffee Shop | Sandwich Place | Pizza Place | Café | Park | Dessert Shop | Clothing Store | Sushi Restaurant | Pub | Italian Restaurant |
| 73 | Central Toronto | 0 | Coffee Shop | Sandwich Place | Pizza Place | Café | Park | Dessert Shop | Clothing Store | Sushi Restaurant | Pub | Italian Restaurant |
| 74 | Central Toronto | 0 | Coffee Shop | Sandwich Place | Pizza Place | Café | Park | Dessert Shop | Clothing Store | Sushi Restaurant | Pub | Italian Restaurant |
| 79 | Central Toronto | 0 | Coffee Shop | Sandwich Place | Pizza Place | Café | Park | Dessert Shop | Clothing Store | Sushi Restaurant | Pub | Italian Restaurant |
| 83 | Central Toronto | 0 | Coffee Shop | Sandwich Place | Pizza Place | Café | Park | Dessert Shop | Clothing Store | Sushi Restaurant | Pub | Italian Restaurant |
| 86 | Central Toronto | 0 | Coffee Shop | Sandwich Place | Pizza Place | Café | Park | Dessert Shop | Clothing Store | Sushi Restaurant | Pub | Italian Restaurant |


In [32]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

|   | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 31 | West Toronto | 1 | Bar | Café | Coffee Shop | Italian Restaurant | Restaurant | Pizza Place | Bakery | Gym | Breakfast Spot | Bookstore |
| 37 | West Toronto | 1 | Bar | Café | Coffee Shop | Italian Restaurant | Restaurant | Pizza Place | Bakery | Gym | Breakfast Spot | Bookstore |
| 43 | West Toronto | 1 | Bar | Café | Coffee Shop | Italian Restaurant | Restaurant | Pizza Place | Bakery | Gym | Breakfast Spot | Bookstore |
| 69 | West Toronto | 1 | Bar | Café | Coffee Shop | Italian Restaurant | Restaurant | Pizza Place | Bakery | Gym | Breakfast Spot | Bookstore |
| 75 | West Toronto | 1 | Bar | Café | Coffee Shop | Italian Restaurant | Restaurant | Pizza Place | Bakery | Gym | Breakfast Spot | Bookstore |
| 81 | West Toronto | 1 | Bar | Café | Coffee Shop | Italian Restaurant | Restaurant | Pizza Place | Bakery | Gym | Breakfast Spot | Bookstore |


In [33]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

|   | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 19 | East Toronto | 2 | Coffee Shop | Greek Restaurant | Ice Cream Shop | Café | Italian Restaurant | Brewery | Pizza Place | Park | Yoga Studio | Pub |
| 41 | East Toronto | 2 | Coffee Shop | Greek Restaurant | Ice Cream Shop | Café | Italian Restaurant | Brewery | Pizza Place | Park | Yoga Studio | Pub |
| 47 | East Toronto | 2 | Coffee Shop | Greek Restaurant | Ice Cream Shop | Café | Italian Restaurant | Brewery | Pizza Place | Park | Yoga Studio | Pub |
| 54 | East Toronto | 2 | Coffee Shop | Greek Restaurant | Ice Cream Shop | Café | Italian Restaurant | Brewery | Pizza Place | Park | Yoga Studio | Pub |
| 100 | East Toronto | 2 | Coffee Shop | Greek Restaurant | Ice Cream Shop | Café | Italian Restaurant | Brewery | Pizza Place | Park | Yoga Studio | Pub |


In [34]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

|   | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | Downtown Toronto | 3 | Coffee Shop | Café | Restaurant | Hotel | Bakery | Italian Restaurant | Japanese Restaurant | Bar | American Restaurant | Seafood Restaurant |
| 9 | Downtown Toronto | 3 | Coffee Shop | Café | Restaurant | Hotel | Bakery | Italian Restaurant | Japanese Restaurant | Bar | American Restaurant | Seafood Restaurant |
| 15 | Downtown Toronto | 3 | Coffee Shop | Café | Restaurant | Hotel | Bakery | Italian Restaurant | Japanese Restaurant | Bar | American Restaurant | Seafood Restaurant |
| 20 | Downtown Toronto | 3 | Coffee Shop | Café | Restaurant | Hotel | Bakery | Italian Restaurant | Japanese Restaurant | Bar | American Restaurant | Seafood Restaurant |
| 24 | Downtown Toronto | 3 | Coffee Shop | Café | Restaurant | Hotel | Bakery | Italian Restaurant | Japanese Restaurant | Bar | American Restaurant | Seafood Restaurant |
| 25 | Downtown Toronto | 3 | Coffee Shop | Café | Restaurant | Hotel | Bakery | Italian Restaurant | Japanese Restaurant | Bar | American Restaurant | Seafood Restaurant |
| 30 | Downtown Toronto | 3 | Coffee Shop | Café | Restaurant | Hotel | Bakery | Italian Restaurant | Japanese Restaurant | Bar | American Restaurant | Seafood Restaurant |
| 36 | Downtown Toronto | 3 | Coffee Shop | Café | Restaurant | Hotel | Bakery | Italian Restaurant | Japanese Restaurant | Bar | American Restaurant | Seafood Restaurant |
| 42 | Downtown Toronto | 3 | Coffee Shop | Café | Restaurant | Hotel | Bakery | Italian Restaurant | Japanese Restaurant | Bar | American Restaurant | Seafood Restaurant |
| 48 | Downtown Toronto | 3 | Coffee Shop | Café | Restaurant | Hotel | Bakery | Italian Restaurant | Japanese Restaurant | Bar | American Restaurant | Seafood Restaurant |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |


## Comments about the clusters

The clusters split exactly along borough lines. This follows from the setup: the venues were fetched with `names=toronto_df['Borough']`, so `toronto_grouped` has one row per borough, and k-means with four clusters on four rows gives each borough its own cluster. The venue profiles are still informative. It makes sense that Downtown Toronto has Hotel as its fourth most common venue category. West Toronto, with Bar as its most common category, looks like the place to go for a night out, while Central Toronto, with Park among its top five categories, leans towards daytime activities.