## Segmenting and Clustering Neighborhoods in Toronto

## Part 1: Creating a Dataframe containing the Toronto data

#### Importing libraries

First, we need to import all the required libraries.

In [44]:
import requests
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup

#### Parsing data from Wikipedia

We are going to work with the list of postal codes available on Wikipedia: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

We can use **requests** to download the HTML of the webpage and store it in a variable.

In [2]:
wiki_url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
webpage = requests.get(wiki_url).text

webpage[:1000]

'<!DOCTYPE html>\n<html class="client-nojs" lang="en" dir="ltr">\n<head>\n<meta charset="UTF-8"/>\n<title>List of postal codes of Canada: M - Wikipedia</title>\n<script>document.documentElement.className="client-js";RLCONF={"wgBreakFrames":!1,"wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgMonthNamesShort":["","Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"],"wgRequestId":"Xldp0QpAMMIAAEb-vnEAAAAN","wgCSPNonce":!1,"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":!1,"wgNamespaceNumber":0,"wgPageName":"List_of_postal_codes_of_Canada:_M","wgTitle":"List of postal codes of Canada: M","wgCurRevisionId":942851379,"wgRevisionId":942851379,"wgArticleId":539066,"wgIsArticle":!0,"wgIsRedirect":!1,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Communications in O

Using **BeautifulSoup**, we can parse the content of the webpage and identify the tags that contain the table. We create an object containing those tags.

In [3]:
bs_webpage = BeautifulSoup(webpage,'lxml')
toronto_table = bs_webpage.find('table',{'class':'wikitable sortable'})

Now, let's define an iterable object containing each of the rows of the table. Each row is identified by the HTML tag **\<tr>**.

In [4]:
rows = toronto_table.findAll('tr')
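
As a minimal sketch of what each parsed row looks like, the snippet below runs the same kind of `find`/`find_all` calls on a tiny hand-written table. The HTML string is made up for illustration and is not the actual Wikipedia markup; it uses the built-in `html.parser` backend so it runs without `lxml`.

```python
from bs4 import BeautifulSoup

# Hypothetical two-row table standing in for the Wikipedia page.
sample_html = """
<table class="wikitable sortable">
  <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
</table>
"""

soup = BeautifulSoup(sample_html, 'html.parser')
table = soup.find('table', {'class': 'wikitable sortable'})

# Collect the text of every <td> cell; the header row has only <th> cells,
# so it produces an empty list and is skipped.
parsed_rows = []
for tr in table.find_all('tr'):
    cells = [td.get_text(strip=True) for td in tr.find_all('td')]
    if cells:
        parsed_rows.append(cells)

print(parsed_rows)
```

Using `get_text(strip=True)` avoids having to strip `<td>` tags and newlines by hand.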

#### Creating and populating the *pandas* dataframe.

Now, let's create an empty *pandas* dataframe with the columns we are going to use. That is: **PostalCode**, **Borough** and **Neighborhood**.

In [5]:
## Define the dataframe columns

column_names = ['PostalCode', 'Borough', 'Neighborhood']

## Create the empty dataframe
toronto_data = pd.DataFrame(columns = column_names)
toronto_data

Unnamed: 0,PostalCode,Borough,Neighborhood


First, we want to check whether any postal code has a Borough assigned but no Neighborhood. Iterating through the rows, we can search for **'Not assigned'** values in the **\<td>** cells of each row.

In [6]:
for i in rows:

    ## We only want to check rows that have at least one 'Not assigned' cell.
    if 'Not assigned' in str(i):
        items = i.findAll('td')
        if 'Not assigned' not in str(items[1]) or 'Not assigned' not in str(items[2]):
            print(items)        

The loop prints nothing, so every row with a Borough assigned also has a Neighborhood. Let's populate the dataframe.

In [7]:
for i in rows:
    
    ## Ignore headers and empty rows
    if '<th>' not in str(i) and 'Not assigned' not in str(i):
        row = []
        ## Iterate through cells of each row
        items = i.findAll('td')
        for j in items:
            ## Some cells contain links, while other only have the <td> tag.
            if '<a' in str(j):
                link = j.findAll('a')
                row.append(link[0].get('title')\
                           .replace(' (Toronto)','')\
                           .replace(', Toronto',''))  
            else:
                row.append(str(j)\
                           .replace('<td>','')\
                           .replace('</td>','')\
                           .replace('\n',''))
    
        ## Append each row to the dataframe
        ## if the Postal Code hasn't been loaded into it yet.
        if toronto_data[toronto_data['PostalCode'] == row[0]].empty:
            ## DataFrame.append was removed in pandas 2.0; add the row with .loc instead.
            toronto_data.loc[len(toronto_data)] = [row[0], row[1], row[2]]
        ## If it has been loaded before, we only need to append the neighborhood.
        else:
            toronto_data.loc[toronto_data['PostalCode'] == row[0],'Neighborhood'] += ', ' + row[2]
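
Growing a dataframe one row at a time can be slow on larger tables. As a hedged alternative sketch (not the approach used in this notebook), the scraped rows could be collected in a plain list first and the neighborhoods that share a postal code combined with `groupby`. The rows below are hypothetical sample values.

```python
import pandas as pd

# Hypothetical scraped rows: (PostalCode, Borough, Neighborhood).
scraped_rows = [
    ('M3A', 'North York', 'Parkwoods'),
    ('M6A', 'North York', 'Lawrence Heights'),
    ('M6A', 'North York', 'Lawrence Manor'),
]

raw = pd.DataFrame(scraped_rows, columns=['PostalCode', 'Borough', 'Neighborhood'])

# One row per postal code; neighborhoods sharing a code are joined with ', '.
# sort=False keeps the postal codes in the order they were scraped.
combined = (raw.groupby(['PostalCode', 'Borough'], sort=False)['Neighborhood']
               .agg(', '.join)
               .reset_index())

print(combined)
```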

In [8]:
toronto_data.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Regent Park
3,M6A,North York,"Lawrence Heights, Lawrence Manor"
4,M7A,Downtown Toronto,Queen's Park


In [9]:
toronto_data.shape

(103, 3)

## Part 2: Appending the coordinates to the dataframe

To get the coordinates of each postal code, we will use the CSV file available at http://cocl.us/Geospatial_data. The file is called *'Geospatial_Coordinates.csv'*.



In [10]:
filename = 'Geospatial_Coordinates.csv'
geo_data = pd.read_csv(filename)
geo_data

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
...,...,...,...
98,M9N,43.706876,-79.518188
99,M9P,43.696319,-79.532242
100,M9R,43.688905,-79.554724
101,M9V,43.739416,-79.588437


Using **join()**, we can append the **Latitude** and **Longitude** columns from the file, using the PostalCode as a key for the join.

In [12]:
toronto_data = toronto_data.join(geo_data.set_index('Postal Code'), on = 'PostalCode')
toronto_data.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,Regent Park,43.65426,-79.360636
3,M6A,North York,"Lawrence Heights, Lawrence Manor",43.718518,-79.464763
4,M7A,Downtown Toronto,Queen's Park,43.662301,-79.389494
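
The behaviour of `join()` is easy to check on a toy example. The sketch below assumes a made-up postal code (`'M0X'`) that has no entry in the coordinates table, to show that `join()` keeps every row of the left dataframe and fills unmatched keys with `NaN`.

```python
import pandas as pd

left = pd.DataFrame({'PostalCode': ['M3A', 'M4A', 'M0X'],   # 'M0X' is a made-up code
                     'Borough': ['North York', 'North York', 'Nowhere']})
coords = pd.DataFrame({'Postal Code': ['M3A', 'M4A'],
                       'Latitude': [43.753259, 43.725882],
                       'Longitude': [-79.329656, -79.315572]})

# join() is a left join by default: every row of `left` is kept.
joined = left.join(coords.set_index('Postal Code'), on='PostalCode')

# Postal codes that found no coordinates show up as NaN.
missing = joined[joined['Latitude'].isna()]['PostalCode'].tolist()
print(missing)
```

A check like `missing` is a cheap way to confirm that every postal code received coordinates before moving on.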


## Part 3: Explore and cluster the Postal Codes in Toronto

We are going to analyze the venues around each postal code in Toronto, as we did with NYC in the course lab.

In [13]:
import json
## json_normalize is available at the top level of pandas since version 1.0
from pandas import json_normalize

#### Define Foursquare Credentials

In order to protect the privacy of the credentials, I have created a file called **API_Credentials.json**, with the following format:

```json
{
    "CLIENT_ID": "your_client_id",
    "CLIENT_SECRET": "your_client_secret",
    "VERSION": "your_version"
}
```

This allows the notebook to work with any credentials, as long as you create a file with a similar syntax. Now, we only need to load the file and import the values.

In [14]:
## Use a context manager so the file is closed after reading
with open('API_Credentials.json') as f:
    credentials = json.load(f)

CLIENT_ID = credentials['CLIENT_ID']
CLIENT_SECRET = credentials['CLIENT_SECRET']
VERSION = credentials['VERSION']
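
A slightly more defensive loader might check that all three keys are present before any request is sent. The following is only a sketch: `load_credentials` and `REQUIRED_KEYS` are hypothetical names, and the demonstration writes placeholder values to a temporary file rather than reading real credentials.

```python
import json
import os
import tempfile

REQUIRED_KEYS = ('CLIENT_ID', 'CLIENT_SECRET', 'VERSION')

def load_credentials(path):
    """Load the credentials file and fail early if a key is missing."""
    with open(path) as f:
        creds = json.load(f)
    missing = [k for k in REQUIRED_KEYS if k not in creds]
    if missing:
        raise KeyError('Missing credential keys: ' + ', '.join(missing))
    return creds

# Demonstration with a throwaway file holding placeholder values.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, 'API_Credentials.json')
    with open(demo_path, 'w') as f:
        json.dump({'CLIENT_ID': 'id', 'CLIENT_SECRET': 'secret', 'VERSION': '20180605'}, f)
    demo_creds = load_credentials(demo_path)

print(sorted(demo_creds))
```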

#### Explore the first Postal Code in our Dataframe

Let's check the first value of the dataframe:

In [15]:
toronto_data.loc[0]

PostalCode             M3A
Borough         North York
Neighborhood     Parkwoods
Latitude           43.7533
Longitude         -79.3297
Name: 0, dtype: object

Let's define a Foursquare URL to get the top 100 venues within a radius of 500 metres.

In [73]:
limit = 100
radius = 500
latitude = toronto_data.loc[0,'Latitude']
longitude = toronto_data.loc[0,'Longitude']

url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, radius, limit)
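
The long format string can be hard to read and easy to get out of order. One possible alternative, shown here as a sketch, builds the same query string from a dictionary with the standard library. `build_explore_url` is a hypothetical helper, and the credential and version values are placeholders.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Assemble an explore URL like the one above from a dict of parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    # safe=',' keeps the comma in "lat,lng" readable instead of encoding it as %2C.
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params, safe=',')

demo_url = build_explore_url('ID', 'SECRET', '20180605', 43.7533, -79.3297)
print(demo_url)
```

Naming each parameter makes it harder to swap two positional arguments by mistake.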

Let's send the GET request and check the results.

In [17]:
results = requests.get(url).json()

We are going to reuse the function that extracts a venue's category from the JSON content returned for that venue.

In [18]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Now, let's clean the JSON response and build a *pandas* dataframe.

In [19]:
venues = json_normalize(results['response']['groups'][0]['items'])

filtered_columns = ['venue.name','venue.categories','venue.location.lat','venue.location.lng']
venues = venues.loc[:, filtered_columns]

venues['venue.categories'] = venues.apply(get_category_type,axis = 1)

venues.columns = [col.split(".")[-1] for col in venues.columns]
venues.head()

Unnamed: 0,name,categories,lat,lng
0,Brookbanks Park,Park,43.751976,-79.33214
1,Variety Store,Food & Drink Shop,43.751974,-79.333114
2,TTC stop - 44 Valley Woods,Bus Stop,43.755402,-79.333741


Let's define a function that repeats the same process for every postal code in Toronto.

In [31]:
def getNearbyVenues(postalcodes, latitudes, longitudes, radius = 500):
    
    venues_list = []
    
    for code, lat, lng in zip(postalcodes, latitudes, longitudes):
        print(code + ', ', end ="")
        
        ## Create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            limit)
        
        results = requests.get(url).json()['response']['groups'][0]['items']
        
        venues_list.append([(code,
                             lat,
                             lng,
                             v['venue']['name'],
                             v['venue']['location']['lat'],
                             v['venue']['location']['lng'],
                             v['venue']['categories'][0]['name']) for v in results])
        
    venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    venues.columns = ['PostalCode',
                      'PostalCode Latitude',
                      'PostalCode Longitude',
                      'Venue',
                      'Venue Latitude',
                      'Venue Longitude',
                      'Venue Category']
        
    return (venues)
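
Note that the list comprehension indexes `v['venue']['categories'][0]` directly, which would raise an `IndexError` for a venue that comes back with an empty category list. A guarded lookup could look like the sketch below; `first_category_name` is a hypothetical helper and the items are made-up examples shaped like the entries in `response['groups'][0]['items']`.

```python
def first_category_name(item):
    """Return the first category name of an explore item, or None if absent."""
    categories = item.get('venue', {}).get('categories', [])
    return categories[0]['name'] if categories else None

# Hypothetical items; the second one has no categories.
items = [
    {'venue': {'name': 'Brookbanks Park', 'categories': [{'name': 'Park'}]}},
    {'venue': {'name': 'Unlabelled Place', 'categories': []}},
]

names = [first_category_name(item) for item in items]
print(names)
```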

Create a dataframe with the function we just defined:

In [32]:
toronto_venues = getNearbyVenues(postalcodes = toronto_data['PostalCode'],
                                 latitudes = toronto_data['Latitude'],
                                 longitudes = toronto_data['Longitude']
                                )

M3A, M4A, M5A, M6A, M7A, M9A, M1B, M3B, M4B, M5B, M6B, M9B, M1C, M3C, M4C, M5C, M6C, M9C, M1E, M4E, M5E, M6E, M1G, M4G, M5G, M6G, M1H, M2H, M3H, M4H, M5H, M6H, M1J, M2J, M3J, M4J, M5J, M6J, M1K, M2K, M3K, M4K, M5K, M6K, M1L, M2L, M3L, M4L, M5L, M6L, M9L, M1M, M2M, M3M, M4M, M5M, M6M, M9M, M1N, M2N, M3N, M4N, M5N, M6N, M9N, M1P, M2P, M4P, M5P, M6P, M9P, M1R, M2R, M4R, M5R, M6R, M7R, M9R, M1S, M4S, M5S, M6S, M1T, M4T, M5T, M1V, M4V, M5V, M8V, M9V, M1W, M4W, M5W, M8W, M9W, M1X, M4X, M5X, M8X, M4Y, M7Y, M8Y, M8Z, 

In [33]:
toronto_venues.head()

Unnamed: 0,PostalCode,PostalCode Latitude,PostalCode Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,M3A,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
1,M3A,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,M3A,43.753259,-79.329656,TTC stop - 44 Valley Woods,43.755402,-79.333741,Bus Stop
3,M4A,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
4,M4A,43.725882,-79.315572,Tim Hortons,43.725517,-79.313103,Coffee Shop


In [34]:
toronto_venues.shape

(2251, 7)

### Analyze each Postal Code

One Hot Encoding:

In [36]:
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

toronto_onehot['PostalCode'] = toronto_venues['PostalCode']

fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,PostalCode,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,M3A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,M3A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,M3A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,M4A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,M4A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [37]:
toronto_onehot.shape

(2251, 271)

Now, let's group the rows by Postal Code and take the mean of each category's frequency of occurrence.

In [38]:
toronto_grouped = toronto_onehot.groupby('PostalCode').mean().reset_index()
toronto_grouped

Unnamed: 0,PostalCode,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,M1B,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,M1C,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,M1E,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,M1G,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,M1H,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,M9N,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
96,M9P,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
97,M9R,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
98,M9V,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [39]:
toronto_grouped.shape

(100, 271)
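
To see why the mean of the one-hot columns is a frequency, here is a small sketch on made-up data (two hypothetical postal codes and three categories). Each row of the grouped result sums to 1, because every venue belongs to exactly one category.

```python
import pandas as pd

# Hypothetical venues: two postal codes, three categories.
toy_venues = pd.DataFrame({
    'PostalCode': ['M1A', 'M1A', 'M1A', 'M1A', 'M2B', 'M2B'],
    'Venue Category': ['Park', 'Park', 'Cafe', 'Bar', 'Cafe', 'Cafe'],
})

# One indicator column per category (dtype=int keeps the values as 0/1).
toy_onehot = pd.get_dummies(toy_venues['Venue Category'], dtype=int)
toy_onehot.insert(0, 'PostalCode', toy_venues['PostalCode'])

# The mean of a 0/1 column is the share of venues in that category.
toy_grouped = toy_onehot.groupby('PostalCode').mean().reset_index()
print(toy_grouped)
```

In this toy case half of the venues in `M1A` are parks, so its `Park` value is 0.5.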

Let's print the 5 most frequent venue categories for each postal code.

In [41]:
num_top_venues = 5

for code in toronto_grouped['PostalCode']:
    print("----"+code+"----")
    temp = toronto_grouped[toronto_grouped['PostalCode'] == code].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----M1B----
                  venue  freq
0  Fast Food Restaurant   1.0
1         Movie Theater   0.0
2                Market   0.0
3        Massage Studio   0.0
4        Medical Center   0.0


----M1C----
                       venue  freq
0                        Bar  0.33
1             History Museum  0.33
2                Golf Course  0.33
3          Accessories Store  0.00
4  Middle Eastern Restaurant  0.00


----M1E----
                 venue  freq
0                  Spa  0.14
1   Mexican Restaurant  0.14
2         Intersection  0.14
3  Rental Car Location  0.14
4    Electronics Store  0.14


----M1G----
                       venue  freq
0                Coffee Shop  0.50
1          Korean Restaurant  0.25
2          Convenience Store  0.25
3  Middle Eastern Restaurant  0.00
4                      Motel  0.00


----M1H----
                  venue  freq
0      Hakka Restaurant  0.11
1    Athletics & Sports  0.11
2                Lounge  0.11
3   Fried Chicken Joint  0.11
4  Carib

To put the results into a *pandas* dataframe, we are going to reuse the function to sort the venues in descending order.

In [48]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now, let's create the new dataframe and display the top 10 venues for each Postal Code.

In [60]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

columns = ['PostalCode']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

postalcode_venues_sorted = pd.DataFrame(columns=columns)
postalcode_venues_sorted['PostalCode'] = toronto_grouped['PostalCode']

for ind in np.arange(toronto_grouped.shape[0]):
    postalcode_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

postalcode_venues_sorted.head()

Unnamed: 0,PostalCode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1B,Fast Food Restaurant,Drugstore,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Yoga Studio,Dim Sum Restaurant
1,M1C,Bar,Golf Course,History Museum,Yoga Studio,Drugstore,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop
2,M1E,Spa,Electronics Store,Breakfast Spot,Rental Car Location,Medical Center,Mexican Restaurant,Intersection,Yoga Studio,Doner Restaurant,Distribution Center
3,M1G,Coffee Shop,Korean Restaurant,Convenience Store,Yoga Studio,Drugstore,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop
4,M1H,Lounge,Bank,Fried Chicken Joint,Caribbean Restaurant,Thai Restaurant,Athletics & Sports,Gas Station,Bakery,Hakka Restaurant,Dumpling Restaurant
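
The `try`/`except` above works for ten columns because only the first three positions need a special suffix. If more columns were ever needed, a small helper could generate the suffix for any number. This is a sketch only; `ordinal` and `venue_columns` are hypothetical names.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'            # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

venue_columns = ['PostalCode'] + ['{} Most Common Venue'.format(ordinal(i))
                                  for i in range(1, 11)]
print(venue_columns[1], '...', venue_columns[-1])
```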


## Cluster Postal Codes

In [66]:
import matplotlib.cm as cm
import matplotlib.colors as colors
import folium

from sklearn.cluster import KMeans

In [62]:
## Set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop(columns = 'PostalCode')

## Run k-means clustering
kmeans = KMeans(n_clusters = kclusters, random_state = 0).fit(toronto_grouped_clustering)

## Check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 0, 0, 0, 0, 4, 0, 0, 0, 0])
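
As a small illustration of what k-means does with frequency vectors like these, the sketch below clusters four made-up rows into two groups. The numbers are hypothetical, and the label values themselves are arbitrary: only which rows share a label is meaningful.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical category-frequency rows: columns are (Park, Coffee Shop, Bar).
freqs = np.array([
    [1.0, 0.0, 0.0],   # park-only area
    [0.9, 0.1, 0.0],   # mostly parks
    [0.0, 0.6, 0.4],   # coffee shops and bars
    [0.1, 0.5, 0.4],   # coffee shops and bars
])

# n_init is set explicitly because its default changed across scikit-learn versions.
toy_kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(freqs)
labels = toy_kmeans.labels_

# The two park-heavy rows share one label and the other two rows share the other.
print(labels)
```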

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each Postal Code.

In [63]:
## Add k-means labels
postalcode_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
toronto_merged = toronto_data

## Join the cluster labels and top venues onto the postal code data, which holds the lat/long
toronto_merged = toronto_merged.join(postalcode_venues_sorted.set_index('PostalCode'), on = 'PostalCode')
toronto_merged.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,4.0,Bus Stop,Park,Food & Drink Shop,Yoga Studio,Donut Shop,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Drugstore
1,M4A,North York,Victoria Village,43.725882,-79.315572,0.0,Intersection,Coffee Shop,Financial or Legal Service,Portuguese Restaurant,Hockey Arena,Yoga Studio,Donut Shop,Discount Store,Distribution Center,Dog Run
2,M5A,Downtown Toronto,Regent Park,43.65426,-79.360636,0.0,Coffee Shop,Bakery,Café,Pub,Park,Breakfast Spot,Restaurant,Theater,Mexican Restaurant,Yoga Studio
3,M6A,North York,"Lawrence Heights, Lawrence Manor",43.718518,-79.464763,0.0,Accessories Store,Clothing Store,Coffee Shop,Boutique,Miscellaneous Shop,Furniture / Home Store,Event Space,Vietnamese Restaurant,Discount Store,Comic Shop
4,M7A,Downtown Toronto,Queen's Park,43.662301,-79.389494,0.0,Coffee Shop,Park,Burger Joint,Music Venue,Seafood Restaurant,Sandwich Place,Burrito Place,Café,Portuguese Restaurant,Chinese Restaurant
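
The cluster labels appear as floats (`4.0`, `0.0`) because `toronto_data` has 103 rows while `toronto_grouped` has only 100, so postal codes without venues receive `NaN` in the join. The sketch below reproduces the effect on made-up data and shows one way to restore integer labels; `'M0X'` is a hypothetical postal code with no venues.

```python
import pandas as pd

areas = pd.DataFrame({'PostalCode': ['M3A', 'M4A', 'M0X']})   # 'M0X' is made up: no venues
clustered = pd.DataFrame({'PostalCode': ['M3A', 'M4A'], 'Cluster Labels': [4, 0]})

merged = areas.join(clustered.set_index('PostalCode'), on='PostalCode')
print(merged['Cluster Labels'].dtype)      # float64, because NaN forces a float column

# Drop the unmatched rows, then cast the labels back to integers.
labelled = merged.dropna(subset=['Cluster Labels']).copy()
labelled['Cluster Labels'] = labelled['Cluster Labels'].astype(int)
print(labelled)
```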


And we can now visualize the results!

In [108]:
# Create Map
latitude = 43.6532
longitude = -79.3832
toronto_map = folium.Map(location = [latitude, longitude], zoom_start = 10)

## Color scheme
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

## Add markers
markers_colors = []

toronto_dropped = toronto_merged.dropna()

for lat, lon, poi, cluster in zip(toronto_dropped['Latitude'], toronto_dropped['Longitude'], toronto_dropped['Neighborhood'], toronto_dropped['Cluster Labels']):
    
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(toronto_map)

toronto_map
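
In the color scheme above, `x` and `ys` only serve to provide a length equal to the number of clusters. A shorter way to build the same kind of palette, sketched here under that reading, samples the colormap directly; `palette` and `color_for` are hypothetical names.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 5

# Sample the rainbow colormap at kclusters evenly spaced points and convert to hex.
palette = [colors.rgb2hex(rgba) for rgba in cm.rainbow(np.linspace(0, 1, kclusters))]

# Index directly by the cluster label so that label 0 gets the first color.
color_for = {cluster: palette[cluster] for cluster in range(kclusters)}
print(color_for)
```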

## Examine the clusters

### Cluster 0

In [110]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,North York,0.0,Intersection,Coffee Shop,Financial or Legal Service,Portuguese Restaurant,Hockey Arena,Yoga Studio,Donut Shop,Discount Store,Distribution Center,Dog Run
2,Downtown Toronto,0.0,Coffee Shop,Bakery,Café,Pub,Park,Breakfast Spot,Restaurant,Theater,Mexican Restaurant,Yoga Studio
3,North York,0.0,Accessories Store,Clothing Store,Coffee Shop,Boutique,Miscellaneous Shop,Furniture / Home Store,Event Space,Vietnamese Restaurant,Discount Store,Comic Shop
4,Downtown Toronto,0.0,Coffee Shop,Park,Burger Joint,Music Venue,Seafood Restaurant,Sandwich Place,Burrito Place,Café,Portuguese Restaurant,Chinese Restaurant
6,Scarborough,0.0,Fast Food Restaurant,Drugstore,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Yoga Studio,Dim Sum Restaurant
...,...,...,...,...,...,...,...,...,...,...,...,...
97,Downtown Toronto,0.0,Coffee Shop,Café,Restaurant,Steakhouse,Japanese Restaurant,Gym,American Restaurant,Seafood Restaurant,Bar,Gastropub
98,Etobicoke,0.0,River,Pool,Yoga Studio,Doner Restaurant,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Donut Shop
99,Downtown Toronto,0.0,Coffee Shop,Japanese Restaurant,Gay Bar,Sushi Restaurant,Restaurant,Mediterranean Restaurant,Hotel,Café,Pizza Place,Burger Joint
100,East Toronto,0.0,Light Rail Station,Yoga Studio,Spa,Auto Workshop,Brewery,Burrito Place,Comic Shop,Farmers Market,Fast Food Restaurant,Garden


### Cluster 1

In [111]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,Etobicoke,1.0,Jewelry Store,Yoga Studio,Dumpling Restaurant,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Eastern European Restaurant


### Cluster 2

In [112]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
57,North York,2.0,Baseball Field,Yoga Studio,Drugstore,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Fast Food Restaurant
101,Etobicoke,2.0,Baseball Field,Yoga Studio,Drugstore,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Fast Food Restaurant


### Cluster 3

In [113]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
45,North York,3.0,Cafeteria,Drugstore,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Yoga Studio,College Stadium


### Cluster 4

In [114]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,North York,4.0,Bus Stop,Park,Food & Drink Shop,Yoga Studio,Donut Shop,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Drugstore
21,York,4.0,Park,Women's Store,Market,Drugstore,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Eastern European Restaurant
32,Scarborough,4.0,Playground,Convenience Store,Yoga Studio,Drugstore,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop
35,East York,4.0,Park,Coffee Shop,Convenience Store,Yoga Studio,Drugstore,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop
40,North York,4.0,Park,Airport,Snack Place,Yoga Studio,Drugstore,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop
61,Central Toronto,4.0,Park,Swim School,Bus Line,Yoga Studio,Donut Shop,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Drugstore
64,York,4.0,Park,Convenience Store,Yoga Studio,Drugstore,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Eastern European Restaurant
66,North York,4.0,Park,Bank,Convenience Store,Yoga Studio,Drugstore,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop
85,Scarborough,4.0,Park,Playground,Yoga Studio,Drugstore,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop
91,Downtown Toronto,4.0,Park,Trail,Playground,Yoga Studio,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant
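
Before reading the clusters one table at a time, a compact summary can help: how many postal codes fall in each cluster, and which venue category most often ranks first. The sketch below uses a small made-up stand-in for `toronto_merged`, not the real results.

```python
import pandas as pd

# Hypothetical stand-in for toronto_merged with just the columns needed here.
toy_merged = pd.DataFrame({
    'Cluster Labels': [0, 0, 0, 4, 4, 2],
    '1st Most Common Venue': ['Coffee Shop', 'Coffee Shop', 'Intersection',
                              'Park', 'Park', 'Baseball Field'],
})

# Number of postal codes per cluster.
sizes = toy_merged['Cluster Labels'].value_counts().sort_index()

# Most frequent top venue within each cluster.
top_venue = (toy_merged.groupby('Cluster Labels')['1st Most Common Venue']
                       .agg(lambda s: s.value_counts().idxmax()))

print(pd.DataFrame({'size': sizes, 'top venue': top_venue}))
```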
