# Moving to Toronto: Which neighborhood to settle in based on its trending venues

## Problem

Are you moving to Toronto and wondering in which neighborhood to settle, what do neighborhoods have in common and how to they compare based on their trending venues?
Don't look any further, in this report, Toronto neighborhoods are clustered for a better understanding on how to approach making this decision.
The data used is:
- [Canada's open data catalog](https://www.toronto.ca/ext/open_data/catalog/data_set_files/2016_neighbourhood_profiles.csv) for data on 140 different neighborhoods in Toronto 
- Geocoders API for Toronto neighboorhoods' geo-coordinates
- [Foursquare API](https://foursquare.com) for top 10 trending venues in each neighborhood

## Data Loading and Wrangling

#### Data is extracted from [Canada's open data catalog](https://www.toronto.ca/ext/open_data/catalog/data_set_files/2016_neighbourhood_profiles.csv) and pre-processed to be transposed, drop un-necessary columns, rows, update data types and formats and adding column names.

In [1]:
import pandas as pd
import requests
import urllib.request, json
import lxml.html as lh
#!pip install folium==0.10.1 #restart kernel
#! pip install geopy
import folium # map rendering library

#### Dataframe as extracted from Canada's website:

In [2]:
# read toronto data
df = pd.read_csv("https://www.toronto.ca/ext/open_data/catalog/data_set_files/2016_neighbourhood_profiles.csv",encoding='latin1')
df = df.transpose()
df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,2373,2374,2375,2376,2377,2378,2379,2380,2381,2382
Category,Neighbourhood Information,Neighbourhood Information,Population,Population,Population,Population,Population,Population,Population,Population,...,Mobility,Mobility,Mobility,Mobility,Mobility,Mobility,Mobility,Mobility,Mobility,Mobility
Topic,Neighbourhood Information,Neighbourhood Information,Population and dwellings,Population and dwellings,Population and dwellings,Population and dwellings,Population and dwellings,Population and dwellings,Population and dwellings,Age characteristics,...,Mobility status - Place of residence 1 year ago,Mobility status - Place of residence 5 years ago,Mobility status - Place of residence 5 years ago,Mobility status - Place of residence 5 years ago,Mobility status - Place of residence 5 years ago,Mobility status - Place of residence 5 years ago,Mobility status - Place of residence 5 years ago,Mobility status - Place of residence 5 years ago,Mobility status - Place of residence 5 years ago,Mobility status - Place of residence 5 years ago
Data Source,City of Toronto,City of Toronto,Census Profile 98-316-X2016001,Census Profile 98-316-X2016001,Census Profile 98-316-X2016001,Census Profile 98-316-X2016001,Census Profile 98-316-X2016001,Census Profile 98-316-X2016001,Census Profile 98-316-X2016001,Census Profile 98-316-X2016001,...,Census Profile 98-316-X2016001,Census Profile 98-316-X2016001,Census Profile 98-316-X2016001,Census Profile 98-316-X2016001,Census Profile 98-316-X2016001,Census Profile 98-316-X2016001,Census Profile 98-316-X2016001,Census Profile 98-316-X2016001,Census Profile 98-316-X2016001,Census Profile 98-316-X2016001
Characteristic,Neighbourhood Number,TSNS2020 Designation,"Population, 2016","Population, 2011",Population Change 2011-2016,Total private dwellings,Private dwellings occupied by usual residents,Population density per square kilometre,Land area in square kilometres,Children (0-14 years),...,External migrants,Total - Mobility status 5 years ago - 25% samp...,Non-movers,Movers,Non-migrants,Migrants,Internal migrants,Intraprovincial migrants,Interprovincial migrants,External migrants
City of Toronto,,,2731571,2615060,4.50%,1179057,1112929,4334,630.2,398135,...,59945,2556120,1516110,1040015,639060,400950,184120,141135,42985,216835


#### Dataframe after data pre-processing and wrangling:

In [3]:
# clean up the data by adding columns, dropping unneeded rows, reformat numbers and datatypes

df.drop(['Category', 'Topic', 'Data Source'],axis=0, inplace=True)
columns=['Neighborhood','Code']
new_header = df.iloc[0] #grab the first row for the header
df = df[1:] #take the data less the header row
df.columns = new_header #set the header row as the df header
df = df.drop(['City of Toronto'])
df = df[[ 'Neighbourhood Number', 'Population, 2016']]
df = df.reset_index()
df['Population, 2016'] = df['Population, 2016'].str.replace(',', '')
df['Population, 2016'] = df['Population, 2016'].astype({'Population, 2016': 'int64'})
df.rename(columns={"index": "Neighborhood"},inplace=True)
df.head()

Characteristic,Neighborhood,Neighbourhood Number,"Population, 2016"
0,Agincourt North,129,29113
1,Agincourt South-Malvern West,128,23757
2,Alderwood,20,12054
3,Annex,95,30526
4,Banbury-Don Mills,42,27695


#### Next, geopy.geocoders is called to fetch geo-coordinates of each neighborhood and Neighborhood column is renamed to HOOD to match the neighborhood data in geoJSON file that will be imported for plotting the map. 

In [5]:
# obtain lng and lat from geocoders
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter
lat=[]
lng=[]

for a in df['Neighborhood']:
    address = a + ', Canada'
    #print(address)
    
  
    geolocator = Nominatim(user_agent="canada_explorer",timeout=4)
    geocode = RateLimiter(geolocator.geocode, min_delay_seconds=4)
    location = geolocator.geocode(address)
    if location == None:
        lat.append(0)
        lng.append(0)
    else:
        lat.append(location.latitude)
        lng.append(location.longitude)
    
df['Latitude']=lat
df['Longitude']=lng


In [6]:
df.drop(df[df['Latitude'] == 0].index, inplace=True)
df.rename(columns={ 'Neighborhood':'HOOD'},inplace=True)

In [7]:
# import the dataset after correcting the missing geocoordinates for the 35 neighborhoods
import types
import pandas as pd
from botocore.client import Config
import ibm_boto3

def __iter__(self): return 0

# @hidden_cell
# The following code accesses a file in your IBM Cloud Object Storage. It includes your credentials.
# You might want to remove those credentials before you share the notebook.
client_9ec483c0831342be812265e6eb2abb7e = ibm_boto3.client(service_name='s3',
    ibm_api_key_id='Nzfwfm1K8u5-k8lQ31JvQZyJwKhlNstSP4hf4saxiRqU',
    ibm_auth_endpoint="https://iam.ng.bluemix.net/oidc/token",
    config=Config(signature_version='oauth'),
    endpoint_url='https://s3-api.us-geo.objectstorage.service.networklayer.com')

body = client_9ec483c0831342be812265e6eb2abb7e.get_object(Bucket='capstone-donotdelete-pr-cldz9kckcnxrd4',Key='total.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

df_data_2 = pd.read_csv(body)
df_data_2.head()

df=df_data_2
df.rename(columns={ 'Neighborhood':'HOOD'},inplace=True)
df.drop(['Unnamed: 5','Unnamed: 6'], axis=1,inplace=True)
df.head()

Unnamed: 0,HOOD,Neighbourhood Number,"Population, 2016",Latitude,Longitude
0,Agincourt South-Malvern West,128,23757,31.34295,-110.9584
1,Bedford Park-Nortown,39,23236,43.18766,-79.53312
2,Beechborough-Greenbrook,112,6577,43.22394,79.77175
3,Birchcliffe-Cliffside,122,22291,45.18418,-64.35898
4,Blake-Jones,69,7727,49.30445,-121.62225


#### For plotting purposes, [geoJSON for Toronto](https://raw.githubusercontent.com/adamw523/toronto-geojson/master/simple.geojson) is imported and passed to a dataframe. geoJSON dataframe looks as follows:

In [8]:
# import geoJSON file
!wget --quiet https://raw.githubusercontent.com/adamw523/toronto-geojson/master/simple.geojson -O toronto_neigh.json
toronto_geo = r'toronto_neigh.json'

In [9]:
# pass geoJSON to a dataframe
#!pip install geopandas
import geopandas as gpd
geofile = gpd.read_file(toronto_geo)
geofile.head()

Unnamed: 0,DAUID,PRUID,CSDUID,HOODNUM,HOOD,FULLHOOD,geometry
0,35200879,35,3520005,81,Trinity-Bellwoods,Trinity-Bellwoods (81),"POLYGON ((-79.40428 43.64798, -79.40396 43.647..."
1,35201763,35,3520005,1,West Humber-Clairville,West Humber-Clairville (1),"POLYGON ((-79.56668 43.71179, -79.55673 43.714..."
2,35201852,35,3520005,2,Mount Olive-Silverstone-Jamestown,Mount Olive-Silverstone-Jamestown (2),"POLYGON ((-79.57825 43.73552, -79.57739 43.733..."
3,35201872,35,3520005,21,Humber Summit,Humber Summit (21),"POLYGON ((-79.55762 43.74881, -79.56439 43.747..."
4,35201857,35,3520005,3,Thistletown-Beaumond Heights,Thistletown-Beaumond Heights (3),"POLYGON ((-79.55390 43.72960, -79.55505 43.729..."


In [10]:
# clean names in dataset to match in the GeoJSON file
df.loc[df['HOOD']=='Mimico (includes Humber Bay Shores)', 'HOOD']='Mimico'
df.loc[df['HOOD']=='Markland Wood', 'HOOD']='Markland Woods'
df.loc[df['HOOD']=='Dovercourt-Wallace Emerson-Junction', 'HOOD']='Dovercourt-Wallace Emerson-Juncti'
df.loc[df['HOOD']=='Weston-Pelham Park', 'HOOD']='Weston-Pellam Park'
df.loc[df['HOOD']=='Corso Italia-Davenport', 'HOOD']='Corsa Italia-Davenport'
df.loc[df['HOOD']=='Oakwood Village', 'HOOD']='Oakwood-Vaughan'
df.loc[df['HOOD']=='Caledonia-Fairbank', 'HOOD']='Caledonia-Fairbanks'
df.loc[df['HOOD']=='North St. James Town', 'HOOD']='North St.James Town'
df.loc[df['HOOD']=='Cabbagetown-South St. James Town', 'HOOD']='Cabbagetown-South St.James Town'
df.loc[df['HOOD']=='Danforth', 'HOOD']='Danforth Village - Toronto'
df.loc[df['HOOD']=='Danforth East York', 'HOOD']='Danforth Village - East York'

df.loc[141] = ['Crescent Town','',2007,43.695403,-79.293099]


In [11]:
#geofile[~geofile['HOOD'].isin(df['HOOD'])]


In [12]:
from geopy.geocoders import Nominatim

address = 'Toronto, Canada'

geolocator = Nominatim(user_agent="canada_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

## Methodology: Clustering Toronto Neighborhoods using K-means

#### First, a connection to Foursquare is setup to send HTML requests to it in order to obtain a json file with top 10 venues at a radius of 500m of each location.

In [13]:
CLIENT_ID = 'FGAZJB5V2EAYBCSGTJ33WU1J442EAWRUQSCITAVQJOAT30LO' # your Foursquare ID
CLIENT_SECRET = '0BQWLVZVX2KLWJSHKI535ROY2JF0YXW44NBBYK2UWZVJREAA' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Version of Foursquare used: ' + VERSION)

Version of Foursquare used: 20180605


In [14]:
LIMIT = 10
RADIUS = 500

print('Number of trending venues fetched: ' + str(LIMIT) + ' at a radius of '+ str(RADIUS) +'m')

Number of trending venues fetched: 10 at a radius of 500m


In [15]:
def getNearbyVenues(names, latitudes, longitudes, radius=RADIUS):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        items = requests.get(url).json()
        if len(items["response"])==0:
            continue
        else:
            results = items ["response"]['groups'][0]['items']
        
            # return only relevant information for each nearby venue
            venues_list.append([(
                name, 
                lat, 
                lng, 
                v['venue']['name'], 
                v['venue']['location']['lat'], 
                v['venue']['location']['lng'],  
                v['venue']['categories'][0]['name']) for v in results])
        
        
        

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

#### Then Foursquare requests are sent in the form of html and data is extracted from JSON response:

In [16]:
toronto_venues = getNearbyVenues(names=df['HOOD'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  );
toronto_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Agincourt South-Malvern West,31.34295,-110.9584,Bustamante Refrigeration,31.34626,-110.96096,Construction & Landscaping
1,Agincourt South-Malvern West,31.34295,-110.9584,"E.D.S. Manufacturing, Inc.",31.345143,-110.962976,Electronics Store
2,Bedford Park-Nortown,43.18766,-79.53312,The Gas Man,43.183843,-79.534599,Other Repair Shop
3,Bedford Park-Nortown,43.18766,-79.53312,The Express,43.190778,-79.536525,Italian Restaurant
4,Briar Hill-Belgravia,43.699482,-79.454643,La Bicicletta,43.700176,-79.455146,Bike Shop


In [17]:
len(toronto_venues['Venue Category'].unique()) #count of unique categories of venues
print('The count of unique categories of venues is ' + str(len(toronto_venues['Venue Category'].unique())))

The count of unique categories of venues is 198


#### Each neighborhood is analysed and one-hot encoding is used to flatten the dataframe:

In [18]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 
toronto_onehot.columns.get_loc("Neighborhood")


# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[55]] +list(toronto_onehot.columns[0:55])+list(toronto_onehot.columns[56:])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head(5)


Unnamed: 0,Deli / Bodega,Accessories Store,Afghan Restaurant,American Restaurant,Animal Shelter,Antique Shop,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,...,Train Station,Transportation Service,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


#### Neighborhoods are grouped by rows by taking the mean of the frequency of occurrence of each category and then passed to k-means.

In [19]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped.head()

Unnamed: 0,Neighborhood,Deli / Bodega,Accessories Store,Afghan Restaurant,American Restaurant,Animal Shelter,Antique Shop,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,...,Train Station,Transportation Service,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Yoga Studio
0,Agincourt North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Agincourt South-Malvern West,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Alderwood,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Annex,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0
4,Banbury-Don Mills,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


#### K-means is used to cluster and observe the labels of the first 10 rows:

In [20]:
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 3

toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([2, 0, 2, 0, 1, 0, 2, 0, 0, 2], dtype=int32)

#### The venues are sorted in descending order for each neighborhood and added to the dataframe:

In [21]:
import numpy as np

# function to sort the venues in descending order.
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]



num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)



In [22]:
#new dataframe that includes cluster and top 10 venues in each cluster

# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = df
neighborhoods_venues_sorted.rename(columns={'Neighborhood':'HOOD'}, inplace=True)
# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('HOOD'), on='HOOD')

toronto_merged.head() 

Unnamed: 0,HOOD,Neighbourhood Number,"Population, 2016",Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt South-Malvern West,128,23757,31.34295,-110.9584,0.0,Construction & Landscaping,Electronics Store,Filipino Restaurant,Fast Food Restaurant,Farmers Market,Farm,Event Space,Egyptian Restaurant,Eastern European Restaurant,Dumpling Restaurant
1,Bedford Park-Nortown,39,23236,43.18766,-79.53312,0.0,Other Repair Shop,Italian Restaurant,Yoga Studio,Distribution Center,Fast Food Restaurant,Farmers Market,Farm,Event Space,Electronics Store,Egyptian Restaurant
2,Beechborough-Greenbrook,112,6577,43.22394,79.77175,,,,,,,,,,,
3,Birchcliffe-Cliffside,122,22291,45.18418,-64.35898,,,,,,,,,,,
4,Blake-Jones,69,7727,49.30445,-121.62225,,,,,,,,,,,


#### Note that some neighborhoods fetch no trending venues from Foursquare and for this reason we drop those neighborhoods due to the lack of information available.

In [23]:
toronto_merged = toronto_merged[toronto_merged['Cluster Labels'].notna()]
toronto_merged['Cluster Labels'] = toronto_merged['Cluster Labels'].astype(int)

#### Each cluster is examined and it is noticed that cluster 0 is highlighted by 'Cafes/Bakeries/ Mediterranean', cluster 1 by 'Outdoors/Fitness/Fast Food' and cluster 2 by 'Services (Banking, Hotels, Shopping)'.

### Interact with the map below by __hovering__/__zooming__ over to observe the name and top venues of each neighborhood. In black are the neighborhoods which had no trending venues in Foursquare and therefore are not included in this analysis.

In [24]:
Highlight =[]
for l in toronto_merged['Cluster Labels']:
    if l==0:
        Highlight.append('Cafes/Bakeries/ Mediterranean')
    elif l==1:
         Highlight.append('Outdoors/Fitness/Fast Food')
    else:
         Highlight.append('Services (Banking, Hotels, Shopping)')

toronto_merged['Highlight']=Highlight       
    

In [25]:
geodata=geofile.merge(toronto_merged,on="HOOD")

In [26]:
map_toronto = folium.Map(location=[43.653963, -79.387207], zoom_start=11)
toronto_geo = r'toronto_neigh.json'


folium.Choropleth(geo_data=toronto_geo,
    data = geodata,
    columns=['HOOD', 'Cluster Labels'],
    key_on='feature.properties.HOOD',
    fill_color='YlGn',
    fill_opacity=0.8, 
    line_opacity=0.1,
    threshold_scale=[0, 1, 2, 3],
    legend_name='Clutered Toronto Neighborhoods').add_to(map_toronto)    

<folium.features.Choropleth at 0x7fcb2e9c1eb8>

In [27]:
style_function = lambda x: {'fillColor': '#ffffff', 
                            'color':'#000000', 
                            'fillOpacity': 0.1, 
                            'weight': 0.1}
highlight_function = lambda x: {'fillColor': '#000000', 
                                'color':'#000000', 
                                'fillOpacity': 0.50, 
                                'weight': 0.1}


NIL = folium.features.GeoJson(
    
    geodata,
    style_function=style_function, 
    control=False,
    highlight_function=highlight_function, 
    tooltip=folium.features.GeoJsonTooltip(
        fields=['HOOD','Highlight', '1st Most Common Venue','2nd Most Common Venue','3rd Most Common Venue','4th Most Common Venue','5th Most Common Venue'],
        aliases=['Neighborhood: ','Highlight','1st Most Common Venue','2nd Most Common Venue','3rd Most Common Venue','4th Most Common Venue','5th Most Common Venue'],
        style=("background-color: white; color: #333333; font-family: arial; font-size: 12px; padding: 10px;") 
    )
)
map_toronto.add_child(NIL)
map_toronto.keep_in_front(NIL)
folium.LayerControl().add_to(map_toronto)
map_toronto

In [28]:
#toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

In [29]:
#toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]].groupby(['1st Most Common Venue']).count()

In [30]:
#toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]].groupby(['1st Most Common Venue']).count()

In [31]:
#toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

In [32]:
#toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

## Download Data Set

In [33]:
import base64  
from IPython.display import HTML

def create_download_link( df, title = "Download CSV file", filename = "data.csv"):  
    csv = df.to_csv()
    b64 = base64.b64encode(csv.encode())
    payload = b64.decode()
    html = '<a download="{filename}" href="data:text/csv;base64,{payload}" target="_blank">{title}</a>'
    html = html.format(payload=payload,title=title,filename=filename)
    return HTML(html)


create_download_link(toronto_merged)