# Segmenting and Clustering Neighborhoods in Toronto

## Table of Contents

1. <a href="#section1">Web Scraping with BeautifulSoup</a>
2. <a href="#section2">Joining location coordinates</a>
3. <a href="#section3">Clustering Neighbourhoods</a>

######################################################################################################

Importing the needed libraries:

In [1]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors
from geopy.geocoders import Nominatim
import folium

<a id='section1'></a>
## 1. Web Scraping with BeautifulSoup:

**Objective:** Scrape the table of postal codes from the Wikipedia page https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M and load it into a pandas dataframe:

Assigning the target web page to a variable:

In [2]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

Using the <code>requests</code> library to get the source code of the target web page:

In [3]:
response = requests.get(url)
src = response.text

Creating a <code>BeautifulSoup</code> object:

In [4]:
soup = BeautifulSoup(src, 'lxml')

Extracting the table from the page:

In [5]:
table = soup.find_all('table')[0]

Reading the HTML table with <code>pd.read_html</code>, which returns a <code>list</code> of <code>DataFrame</code> objects, and keeping the first one:

In [6]:
df = pd.read_html(str(table))[0]

In [7]:
df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


Dropping the rows where the **Borough** is _Not assigned_ :

In [8]:
df = df[~df.Borough.str.startswith('Not')]

In [9]:
df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M6A,North York,Lawrence Heights
6,M6A,North York,Lawrence Manor
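A prefix test such as `startswith('Not')` would also drop any borough whose name happens to begin with "Not". The following is a minimal sketch on made-up rows that compares against the exact placeholder string instead. The toy data, including the fictional borough "Notting Dale", is an assumption used only for illustration:

```python
import pandas as pd

# Toy rows; "Notting Dale" is a fictional borough used only to show the difference.
toy = pd.DataFrame({
    "Postcode": ["M1A", "M3A", "M9Z"],
    "Borough": ["Not assigned", "North York", "Notting Dale"],
    "Neighbourhood": ["Not assigned", "Parkwoods", "Example"],
})

# Prefix test: removes every borough starting with "Not".
by_prefix = toy[~toy["Borough"].str.startswith("Not")]

# Exact test: removes only the placeholder value.
by_exact = toy[toy["Borough"] != "Not assigned"]

print(len(by_prefix), len(by_exact))
```

On the Wikipedia table used here both filters likely give the same result, so this is a robustness note rather than a correction.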


Grouping the rows that share the same **Postcode** and joining their **Neighbourhood** values into one comma-separated string:

In [10]:
df =  df.groupby(['Postcode','Borough'], sort = False).agg(lambda x: ', '.join(x))

In [11]:
df.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Neighbourhood
Postcode,Borough,Unnamed: 2_level_1
M3A,North York,Parkwoods
M4A,North York,Victoria Village
M5A,Downtown Toronto,Harbourfront
M6A,North York,"Lawrence Heights, Lawrence Manor"
M7A,Downtown Toronto,Queen's Park
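The grouping step can be tried on a few rows to see exactly what it does. The sketch below uses a hand-written three-row table (an assumption, not the scraped data) and selects the `Neighbourhood` column explicitly before aggregating, which is one possible way to write it:

```python
import pandas as pd

# Three hand-written rows; two of them share the postcode M6A.
toy = pd.DataFrame({
    "Postcode": ["M5A", "M6A", "M6A"],
    "Borough": ["Downtown Toronto", "North York", "North York"],
    "Neighbourhood": ["Harbourfront", "Lawrence Heights", "Lawrence Manor"],
})

# sort=False keeps the postcodes in order of first appearance.
grouped = (
    toy.groupby(["Postcode", "Borough"], sort=False)["Neighbourhood"]
       .agg(", ".join)
       .reset_index()
)

print(grouped)
```

The two M6A rows collapse into a single row whose `Neighbourhood` value is "Lawrence Heights, Lawrence Manor".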


Resetting the index:

In [12]:
df.reset_index(level=['Postcode','Borough'], inplace=True)

In [13]:
df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M6A,North York,"Lawrence Heights, Lawrence Manor"
4,M7A,Downtown Toronto,Queen's Park


Checking the shape of the dataframe:

In [14]:
df.shape

(103, 3)

#####################################################################################################

<a id='section2'></a>
## 2. Joining Location Coordinates:

**Objective:** Adding the <code>Latitude</code> and <code>Longitude</code> to the previous dataframe:

Loading the provided <code>Geospatial_Coordinates.csv</code> into a pandas <code>DataFrame</code>:

In [15]:
df1=pd.read_csv('Geospatial_Coordinates.csv')

In [16]:
df1.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


>The merge key must have the same name in both dataframes, so the <code>Postal Code</code> column in <code>df1</code> is renamed to <code>Postcode</code>:

In [17]:
df1.rename(columns={'Postal Code':'Postcode'}, inplace=True)

In [18]:
df1.head()

Unnamed: 0,Postcode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Merging the two dataframes:

In [19]:
merged=pd.merge(df, df1, on='Postcode')

In [20]:
merged.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636
3,M6A,North York,"Lawrence Heights, Lawrence Manor",43.718518,-79.464763
4,M7A,Downtown Toronto,Queen's Park,43.662301,-79.389494


Checking the shape of the merged dataframe:

In [21]:
merged.shape

(103, 5)

The `merged` dataframe has the same number of rows (103) as the first dataframe `df`, so every postal code in `df` found a match in `df1`.
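Comparing row counts confirms that nothing was lost, but it would not say which postcode failed to match if one did. A minimal sketch of a stricter check, using the `how`, `validate` and `indicator` options of `pd.merge` on toy frames; the postcode `M9Z` is made up to force a mismatch:

```python
import pandas as pd

# Toy inputs; "M9Z" is a made-up postcode with no coordinates.
left = pd.DataFrame({
    "Postcode": ["M3A", "M4A", "M9Z"],
    "Borough": ["North York", "North York", "Example"],
})
right = pd.DataFrame({
    "Postcode": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# A left join keeps unmatched rows; validate raises if a key is duplicated on either side.
checked = pd.merge(left, right, on="Postcode", how="left",
                   validate="one_to_one", indicator=True)

# Rows flagged "left_only" had no coordinates in the right-hand table.
missing = checked.loc[checked["_merge"] == "left_only", "Postcode"].tolist()
print(missing)
```

An empty `missing` list would give the same assurance as the shape comparison, with the added benefit of naming the offending postcodes otherwise.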

In [22]:
print('The dataframe has {} boroughs and {} postcodes.'.format(
        merged['Borough'].nunique(),
        merged.shape[0]
    )
)

The dataframe has 10 boroughs and 103 postcodes.


####################################################################################################

<a id='section3'></a>
## 3. Clustering Neighbourhoods:

**Objective:** Exploring and clustering the neighborhoods in Toronto:

Using the `geopy` library to get the latitude and longitude of Toronto.

In [23]:
address = 'Toronto, Canada'

geolocator = Nominatim(user_agent="toronto")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


Creating a map of Toronto with neighborhoods superimposed on top.

In [24]:
# create map of the merged dataframe using latitude and longitude values
map_merged= folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, postcode in zip(merged['Latitude'], merged['Longitude'], merged['Borough'], merged['Postcode']):
    label = '{}, {}'.format(postcode, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_merged)  

map_merged

Filtering the merged dataframe to keep only the boroughs whose name contains the word `Toronto`:

In [25]:
toronto_data = merged[merged['Borough'].str.contains('Toronto')].reset_index(drop=True)
toronto_data.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636
1,M7A,Downtown Toronto,Queen's Park,43.662301,-79.389494
2,M5B,Downtown Toronto,"Ryerson, Garden District",43.657162,-79.378937
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
4,M4E,East Toronto,The Beaches,43.676357,-79.293031


Visualizing the sliced dataframe:

In [26]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(toronto_data['Latitude'], toronto_data['Longitude'], toronto_data['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

Defining Foursquare Credentials and Version:

In [27]:
import os

# Read the Foursquare credentials from environment variables instead of hard-coding them.
# The variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are an assumed convention.
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
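Credentials are safer outside the notebook, for example in environment variables. The following is a minimal sketch of a loader that fails with a clear message when a variable is missing. The variable names and the `load_credentials` helper are assumptions for illustration and are not part of any Foursquare tooling:

```python
import os


def load_credentials(env=os.environ):
    """Return (client_id, client_secret) or raise if either is missing."""
    # Hypothetical variable names chosen for this sketch.
    names = ("FOURSQUARE_CLIENT_ID", "FOURSQUARE_CLIENT_SECRET")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env[names[0]], env[names[1]]


# Usage with an explicit mapping, so the sketch runs without real credentials.
demo_env = {
    "FOURSQUARE_CLIENT_ID": "demo-id",
    "FOURSQUARE_CLIENT_SECRET": "demo-secret",
}
client_id, client_secret = load_credentials(demo_env)
print(client_id)
```

Calling `load_credentials()` with no argument reads from the real process environment.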

Creating a function to get up to 10 venues within a 500-meter radius of each neighbourhood in Toronto:

In [28]:
LIMIT = 10 # limit of number of venues returned by Foursquare API
radius = 500 # define radius

In [29]:
def getNearbyVenues(names, latitudes, longitudes, radius=radius):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # build the query parameters for the Foursquare explore endpoint
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT,
        }

        # make the GET request and keep the list of recommended items
        response = requests.get('https://api.foursquare.com/v2/venues/explore', params=params)
        results = response.json()['response']['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
   
    return nearby_venues
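The function above indexes straight into `["response"]['groups'][0]['items']` and `['categories'][0]`, which raises an exception if any level is missing or empty. A minimal sketch of a more defensive parser for the same nesting follows. The `extract_venues` helper and the sample payload are hypothetical and only mimic the structure used above; they do not come from Foursquare's documentation:

```python
def extract_venues(payload):
    """Flatten an explore-style response into (name, lat, lng, category) tuples."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        # A venue may come back without any category; record None in that case.
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows


# Hand-written payload mimicking the nesting used above; the values are illustrative.
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Tandem Coffee",
               "location": {"lat": 43.653559, "lng": -79.361809},
               "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "Uncategorised Place",
               "location": {"lat": 43.65, "lng": -79.36},
               "categories": []}},
]}]}}

print(extract_venues(sample))
print(extract_venues({"response": {}}))
```

An empty or error response yields an empty list instead of a `KeyError` or `IndexError`, so one failing neighbourhood does not abort the whole loop.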

Running the above function on each neighbourhood and storing the result in a new dataframe called `toronto_venues`:

In [30]:
toronto_venues = getNearbyVenues(names=toronto_data['Neighbourhood'],
                                   latitudes=toronto_data['Latitude'],
                                   longitudes=toronto_data['Longitude']
                                  )

Harbourfront
Queen's Park
Ryerson, Garden District
St. James Town
The Beaches
Berczy Park
Central Bay Street
Christie
Adelaide, King, Richmond
Dovercourt Village, Dufferin
Harbourfront East, Toronto Islands, Union Station
Little Portugal, Trinity
The Danforth West, Riverdale
Design Exchange, Toronto Dominion Centre
Brockton, Exhibition Place, Parkdale Village
The Beaches West, India Bazaar
Commerce Court, Victoria Hotel
Studio District
Lawrence Park
Roselawn
Davisville North
Forest Hill North, Forest Hill West
High Park, The Junction South
North Toronto West
The Annex, North Midtown, Yorkville
Parkdale, Roncesvalles
Davisville
Harbord, University of Toronto
Runnymede, Swansea
Moore Park, Summerhill East
Chinatown, Grange Park, Kensington Market
Deer Park, Forest Hill SE, Rathnelly, South Hill, Summerhill West
CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara
Rosedale
Stn A PO Boxes 25 The Esplanade
Cabbagetown, St. James Town
Fir

In [31]:
print(toronto_venues.shape)
toronto_venues.head()

(350, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Harbourfront,43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,Harbourfront,43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,Harbourfront,43.65426,-79.360636,Cooper Koo Family YMCA,43.653249,-79.358008,Distribution Center
3,Harbourfront,43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa
4,Harbourfront,43.65426,-79.360636,Morning Glory Cafe,43.653947,-79.361149,Breakfast Spot


Number of unique categories returned:

In [32]:
print('There are {} unique categories.'.format(toronto_venues['Venue Category'].nunique()))

There are 122 unique categories.


One-hot encoding the venue categories as a step toward building the clustering model:

In [33]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighbourhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.shape
toronto_onehot.head()

Unnamed: 0,Neighbourhood,Airport,Airport Food Court,Airport Gate,Airport Lounge,American Restaurant,Arts & Crafts Store,Asian Restaurant,Auto Workshop,BBQ Joint,...,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Restaurant,Trail,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Yoga Studio
0,Harbourfront,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Harbourfront,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Harbourfront,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Harbourfront,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Harbourfront,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Grouping rows by neighbourhood and taking the mean of each one-hot column, which gives the frequency of occurrence of each category:

In [34]:
toronto_grouped = toronto_onehot.groupby('Neighbourhood').mean().reset_index()

print('Grouped dataframe shape: ',toronto_grouped.shape)
toronto_grouped.head()

Grouped dataframe shape:  (39, 123)


Unnamed: 0,Neighbourhood,Airport,Airport Food Court,Airport Gate,Airport Lounge,American Restaurant,Arts & Crafts Store,Asian Restaurant,Auto Workshop,BBQ Joint,...,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Restaurant,Trail,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Yoga Studio
0,"Adelaide, King, Richmond",0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0
1,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.1,0.0,0.0,0.0,0.1,0.0,0.0,0.0
2,"Brockton, Exhibition Place, Parkdale Village",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Business Reply Mail Processing Centre 969 Eastern,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"CN Tower, Bathurst Quay, Island airport, Harbo...",0.1,0.1,0.1,0.2,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
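Taking the mean of one-hot columns turns counts into relative frequencies, so each neighbourhood's row sums to 1. A minimal sketch on a toy table shows the arithmetic; the neighbourhood names `A` and `B` and the venue rows are made up for illustration:

```python
import pandas as pd

# Toy venue list: neighbourhood A has 4 venues, B has 2.
toy_venues = pd.DataFrame({
    "Neighbourhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Bakery", "Bakery", "Park", "Pub", "Park", "Park"],
})

# One column per category, 1.0 where the venue belongs to it.
onehot = pd.get_dummies(toy_venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighbourhood", toy_venues["Neighbourhood"])

# Mean of the indicator columns = share of venues in each category.
freq = onehot.groupby("Neighbourhood").mean()
print(freq)
```

Neighbourhood `A` ends up with Bakery 0.5, Park 0.25 and Pub 0.25, while `B` is Park 1.0. With at most 10 venues per neighbourhood, the frequencies in this notebook come in coarse steps of roughly 0.1.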


Finding the 5 most common venue categories in each neighbourhood:

In [35]:
num_top_venues = 5

for hood in toronto_grouped['Neighbourhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide, King, Richmond----
              venue  freq
0  Greek Restaurant   0.1
1        Steakhouse   0.1
2             Hotel   0.1
3             Plaza   0.1
4        Restaurant   0.1


----Berczy Park----
             venue  freq
0     Concert Hall   0.1
1  Thai Restaurant   0.1
2             Park   0.1
3     Cocktail Bar   0.1
4       Restaurant   0.1


----Brockton, Exhibition Place, Parkdale Village----
          venue  freq
0   Coffee Shop   0.2
1           Gym   0.1
2  Climbing Gym   0.1
3          Café   0.1
4     Pet Store   0.1


----Business Reply Mail Processing Centre 969 Eastern----
                  venue  freq
0         Garden Center   0.1
1           Pizza Place   0.1
2               Brewery   0.1
3            Comic Shop   0.1
4  Fast Food Restaurant   0.1


----CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara----
                venue  freq
0      Airport Lounge   0.2
1             Airport   0.1
2          

Putting the above in a dataframe:

First, defining a function to sort the venue categories in descending order of frequency:

In [36]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
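Many categories share the same frequency (often 0.1), and the order among tied values is not something the sort above controls. This is likely why the printed top 5 for "Adelaide, King, Richmond" differs from the corresponding dataframe row. A minimal sketch of a deterministic variant that breaks ties alphabetically; the `top_categories` helper and the toy row are assumptions for illustration:

```python
import pandas as pd


def top_categories(row, n):
    """Return the n most frequent categories, breaking ties alphabetically."""
    freqs = row.iloc[1:].astype(float)  # skip the neighbourhood name
    # Sort by descending frequency, then by category name.
    ranked = sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:n]]


# Toy row: Park and Bakery are tied at 0.1.
row = pd.Series({"Neighbourhood": "Example", "Park": 0.1,
                 "Bakery": 0.1, "Pub": 0.2, "Gym": 0.0})

print(top_categories(row, 3))  # ['Pub', 'Bakery', 'Park']
```

Alphabetical order is an arbitrary tie-break; it only makes repeated runs agree with each other.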

Creating the dataframe:

In [37]:
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighbourhoods_venues_sorted = pd.DataFrame(columns=columns)
neighbourhoods_venues_sorted['Neighbourhood'] = toronto_grouped['Neighbourhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighbourhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

print('Dataframe shape: ',neighbourhoods_venues_sorted.shape)
neighbourhoods_venues_sorted.head()

Dataframe shape:  (39, 6)


Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,"Adelaide, King, Richmond",Concert Hall,Café,Vegetarian / Vegan Restaurant,Restaurant,Greek Restaurant
1,Berczy Park,Park,Restaurant,Concert Hall,Vegetarian / Vegan Restaurant,Cocktail Bar
2,"Brockton, Exhibition Place, Parkdale Village",Coffee Shop,Pet Store,Bar,Café,Breakfast Spot
3,Business Reply Mail Processing Centre 969 Eastern,Brewery,Auto Workshop,Burrito Place,Comic Shop,Farmers Market
4,"CN Tower, Bathurst Quay, Island airport, Harbo...",Airport Lounge,Airport,Bar,Harbor / Marina,Coffee Shop
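The column names rely on a three-element suffix list plus an exception for everything else, which is fine for 5 columns but would produce "21th" if the list of top venues were extended that far. A minimal sketch of a general ordinal helper; the `ordinal` function is a hypothetical addition, not part of the original notebook:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the other teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


# Same column names as in the cell above, built without try/except.
columns = ["Neighbourhood"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, 6)
]
print(columns)
```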


#### Clustering Neighbourhoods:

Using the k-means algorithm to cluster the neighbourhoods into 5 clusters:

In [38]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighbourhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 1, 3, 1, 1, 2, 3, 2, 2, 3], dtype=int32)
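Before interpreting the clusters it can help to look at how many rows each label received, since k-means may place an outlier in a cluster of its own. A minimal sketch using `np.bincount` on the fitted labels; the toy feature matrix is made up (two tight groups and one isolated point) and stands in for `toronto_grouped_clustering`:

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy feature matrix: two tight groups plus one isolated point.
X = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [5.0, 5.0], [5.1, 5.0],
    [20.0, 20.0],
])

toy_kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Number of rows assigned to each cluster label.
sizes = np.bincount(toy_kmeans.labels_, minlength=3)
print(sizes)
```

The same call on `kmeans.labels_` would show at a glance whether any of the 5 clusters is very small.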

#### Adding the cluster label for each row in the previous dataframe:

In [39]:
# add clustering labels
neighbourhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = toronto_data

# join the cluster labels and top venues onto toronto_data, which holds latitude/longitude for each neighbourhood
toronto_merged = toronto_merged.join(neighbourhoods_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')

toronto_merged.head() # check the last columns!

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636,3,Breakfast Spot,Spa,Pub,Park,Restaurant
1,M7A,Downtown Toronto,Queen's Park,43.662301,-79.389494,3,Coffee Shop,Sushi Restaurant,Italian Restaurant,Distribution Center,Creperie
2,M5B,Downtown Toronto,"Ryerson, Garden District",43.657162,-79.378937,1,Pizza Place,Plaza,Comic Shop,Burrito Place,Café
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,1,BBQ Joint,Food Truck,Cosmetics Shop,Japanese Restaurant,Italian Restaurant
4,M4E,East Toronto,The Beaches,43.676357,-79.293031,1,Trail,Neighborhood,Pub,Health Food Store,Yoga Studio


#### Visualizing the clusters:

In [40]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighbourhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

#### Examining each cluster to determine the venue categories that distinguish it:

### Cluster 1:

In [41]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
29,Central Toronto,0,Tennis Court,Restaurant,Park,Playground,Diner
33,Downtown Toronto,0,Park,Playground,Trail,Yoga Studio,Dog Run


### Cluster 2:

In [42]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
2,Downtown Toronto,1,Pizza Place,Plaza,Comic Shop,Burrito Place,Café
3,Downtown Toronto,1,BBQ Joint,Food Truck,Cosmetics Shop,Japanese Restaurant,Italian Restaurant
4,East Toronto,1,Trail,Neighborhood,Pub,Health Food Store,Yoga Studio
5,Downtown Toronto,1,Park,Restaurant,Concert Hall,Vegetarian / Vegan Restaurant,Cocktail Bar
10,Downtown Toronto,1,Performing Arts Venue,Deli / Bodega,Park,Dessert Shop,Sporting Goods Shop
11,West Toronto,1,Pizza Place,Ice Cream Shop,Cuban Restaurant,Wine Bar,Greek Restaurant
12,East Toronto,1,Greek Restaurant,Ice Cream Shop,Yoga Studio,Pub,Brewery
15,East Toronto,1,Park,Pub,Brewery,Italian Restaurant,Fast Food Restaurant
18,Central Toronto,1,Park,Swim School,Bus Line,Dim Sum Restaurant,Yoga Studio
21,Central Toronto,1,Trail,Mexican Restaurant,Sushi Restaurant,Jewelry Store,Yoga Studio


### Cluster 3:

In [43]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
7,Downtown Toronto,2,Café,Grocery Store,Restaurant,Diner,Coffee Shop
8,Downtown Toronto,2,Concert Hall,Café,Vegetarian / Vegan Restaurant,Restaurant,Greek Restaurant
9,West Toronto,2,Bakery,Brewery,Bar,Café,Grocery Store
13,Downtown Toronto,2,Coffee Shop,Café,Tea Room,Hotel,Gym
16,Downtown Toronto,2,Café,Coffee Shop,Gym,Restaurant,Pub
17,East Toronto,2,Coffee Shop,Pet Store,Ice Cream Shop,Sandwich Place,Gay Bar
24,Central Toronto,2,Café,Middle Eastern Restaurant,Park,Vegetarian / Vegan Restaurant,Coffee Shop
30,Downtown Toronto,2,Café,Bakery,Vietnamese Restaurant,Organic Grocery,Mexican Restaurant
35,Downtown Toronto,2,Café,Japanese Restaurant,Bakery,Jewelry Store,Italian Restaurant
36,Downtown Toronto,2,Café,Restaurant,Coffee Shop,Pizza Place,American Restaurant


### Cluster 4:

In [44]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Downtown Toronto,3,Breakfast Spot,Spa,Pub,Park,Restaurant
1,Downtown Toronto,3,Coffee Shop,Sushi Restaurant,Italian Restaurant,Distribution Center,Creperie
6,Downtown Toronto,3,Coffee Shop,Sushi Restaurant,Gastropub,Park,Japanese Restaurant
14,West Toronto,3,Coffee Shop,Pet Store,Bar,Café,Breakfast Spot
20,Central Toronto,3,Breakfast Spot,Gym,Sandwich Place,Hotel,Food & Drink Shop
23,Central Toronto,3,Yoga Studio,Diner,Coffee Shop,Clothing Store,Chinese Restaurant
31,Central Toronto,3,Coffee Shop,Pub,Liquor Store,Restaurant,American Restaurant
37,Downtown Toronto,3,Breakfast Spot,Restaurant,Park,Gastropub,Theme Restaurant


### Cluster 5:

In [45]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
19,Central Toronto,4,Pool,Garden,Yoga Studio,Eastern European Restaurant,Concert Hall


*********