## Exploring and Clustering the Neighborhoods in Toronto

In this notebook, I'll use the file "Neighborhoods with locations.csv" as the data source and k-means as the tool to cluster the neighborhoods.
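If k-means is new to you, the core loop (Lloyd's algorithm) alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of the points assigned to it. The sketch below is a minimal NumPy illustration of that loop. It is not scikit-learn's `KMeans`, which this notebook actually uses and which defaults to k-means++ initialization. The `kmeans` function name and the farthest-first initialization are my own choices for the example. In this notebook, each row of `X` would be one neighborhood's vector of venue-category frequencies.

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Minimal Lloyd's-algorithm k-means (illustrative sketch, not scikit-learn's implementation)."""
    X = np.asarray(X, dtype=float)

    # Farthest-first initialization: start from the first point, then repeatedly
    # add the point that is farthest from every centroid chosen so far.
    centroids = [X[0]]
    for _ in range(1, k):
        dist_to_nearest = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[np.argmax(dist_to_nearest)])
    centroids = np.array(centroids)

    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        # Update step: move each centroid to the mean of its points
        # (keep the old centroid if a cluster ends up empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: assignments no longer move the centroids
        centroids = new_centroids
    return labels, centroids

X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centroids = kmeans(X, k=2)
print(labels)  # → [0 0 0 1 1 1]
```

The label numbers themselves are arbitrary. Only which points share a label carries meaning.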

In [4]:
import pandas as pd
import numpy as np
import requests
import json
from pandas import json_normalize  # pandas >= 1.0; pandas.io.json.json_normalize is deprecated
import matplotlib.cm as cm
import matplotlib.colors as colors
import folium
from sklearn.cluster import KMeans
from geopy.geocoders import Nominatim

print("all libraries are imported")

all libraries are imported


### Explore the Dataset
##### Load the data into a *pandas* DataFrame

In [46]:
df=pd.read_csv('Neighborhoods with locations.csv')
df.head(3)

|   | PostalCode | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M1B | Scarborough | Malvern, Rouge | 43.806686 | -79.194353 |
| 1 | M1C | Scarborough | Rouge Hill, Port Union, Highland Creek | 43.784535 | -79.160497 |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill | 43.763573 | -79.188711 |


In [48]:
toronto_data=df[df['Borough'].str.contains('Toronto')].reset_index(drop=True)
toronto_data.head(3)

|   | PostalCode | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M4E | East Toronto | The Beaches | 43.676357 | -79.293031 |
| 1 | M4K | East Toronto | The Danforth West, Riverdale | 43.679557 | -79.352188 |
| 2 | M4L | East Toronto | India Bazaar, The Beaches West | 43.668999 | -79.315572 |


##### Create a map of Toronto with a labeled marker for each neighborhood

In [49]:
address='Toronto, Ontario'

geolocator=Nominatim(user_agent="ny_explorer")
location=geolocator.geocode(address)
latitude=location.latitude
longitude=location.longitude
print('The coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The coordinates of Toronto are 43.6534817, -79.3839347.


In [50]:
toronto_map=folium.Map(location=[latitude,longitude],zoom_start=10)

# add a marker for each neighborhood, with a "neighborhood, borough" popup
for lat, lon, borough, hood in zip(toronto_data['Latitude'], toronto_data['Longitude'],
                                   toronto_data['Borough'], toronto_data['Neighborhood']):
    label = '{}, {}'.format(hood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker([lat, lon],
                        radius=5,
                        popup=label,
                        color='green',
                        fill=True,
                        fill_color='red',
                        fill_opacity=0.7).add_to(toronto_map)
toronto_map

#### The neighborhood I'm going to explore is Central Bay Street, since I live there.

In [51]:
bay_data=toronto_data[toronto_data['Neighborhood']=='Central Bay Street'].reset_index(drop=True)
bay_latitude=bay_data.loc[0,'Latitude']
bay_longitude=bay_data.loc[0,'Longitude']

print('The neighborhood I explore is Central Bay Street, with geographical coordinates {}, {}'.format(
        bay_latitude, bay_longitude))

The neighborhood I explore is Central Bay Street, with geographical coordinates 43.6579524, -79.3873826


Next, I'll use the Foursquare API to explore the neighborhoods and segment them.

First, define the Foursquare `CLIENT_ID`, `CLIENT_SECRET`, and `VERSION`:

In [52]:
CLIENT_ID = 'YOUR_CLIENT_ID'          # your Foursquare client ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'  # your Foursquare client secret; never print or publish it
VERSION = '20180605'                  # Foursquare API version
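A safer pattern than pasting keys into the notebook is to read them from the environment at run time, so they never end up in the saved notebook or its printed output. The sketch below assumes the hypothetical variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET`. Adjust them to whatever your setup uses.

```python
import os

def load_foursquare_credentials():
    """Read Foursquare credentials from environment variables (variable names are hypothetical)."""
    client_id = os.environ.get("FOURSQUARE_CLIENT_ID")
    client_secret = os.environ.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError("Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET before running the notebook")
    return client_id, client_secret

# Usage in the notebook (requires both variables to be set):
# CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()
```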


#### Now, get the top 100 venues within 500 meters of Central Bay Street

Create the URL to call the API:

In [53]:
LIMIT = 100   # maximum number of venues to return
radius = 500  # search radius in meters
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, CLIENT_SECRET, VERSION, bay_latitude, bay_longitude, radius, LIMIT)
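Formatting the query string by hand works, but it becomes easy to get wrong as the parameter list grows. A minimal alternative sketch builds the same parameters with `urllib.parse.urlencode` from the standard library. `build_explore_url` is a hypothetical helper. The endpoint and parameter names are simply the ones used in the cell above (Foursquare's v2 `venues/explore` API, which may have changed since this notebook was written). `requests.get` can also take a `params=` dict directly and encode it for you.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Hypothetical helper: build the explore URL with properly encoded query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",  # latitude,longitude of the search center
        "radius": radius,      # search radius in meters
        "limit": limit,        # maximum number of venues to return
    }
    # safe="," keeps the comma in "ll" unescaped, purely for readability
    return EXPLORE_ENDPOINT + "?" + urlencode(params, safe=",")

url = build_explore_url("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", "20180605",
                        43.6579524, -79.3873826)
print(url)
```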

In [35]:
result=requests.get(url).json()
result

{'meta': {'code': 200, 'requestId': '5f2bfd8dfa01662f62b9a399'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Bay Street Corridor',
  'headerFullLocation': 'Bay Street Corridor, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 66,
  'suggestedBounds': {'ne': {'lat': 43.6624524045, 'lng': -79.38117421839567},
   'sw': {'lat': 43.6534523955, 'lng': -79.39359098160432}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '537d4d6d498ec171ba22e7fe',
       'name': "Jimmy's Coffee",
       'location': {'address': '82 Gerrard Street W',
        'crossStreet': 'Gerrard & LaPlante',
        'lat': 43.65842123574496,
        'lng': -79.38561319551111,
        'label

##### Define the **get_category_type** function

In [39]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # after json_normalize the column is named 'venue.categories'
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Flatten the JSON response into a *pandas* DataFrame:

In [42]:
venues=result['response']['groups'][0]['items']
venues=json_normalize(venues)

columns=['venue.name','venue.categories','venue.location.lat','venue.location.lng']
venues=venues.loc[:,columns]

#get the category for each row
venues['venue.categories']=venues.apply(get_category_type, axis=1)

venues.columns=[col.split(".")[-1] for col in venues.columns]
venues.head()

  


|   | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Jimmy's Coffee | Coffee Shop | 43.658421 | -79.385613 |
| 1 | Tim Hortons | Coffee Shop | 43.65857 | -79.385123 |
| 2 | Somethin' 2 Talk About | Middle Eastern Restaurant | 43.658395 | -79.385338 |
| 3 | Hailed Coffee | Coffee Shop | 43.658833 | -79.383684 |
| 4 | Neo Coffee Bar | Coffee Shop | 43.66014 | -79.38587 |
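`json_normalize` turns each nested JSON object into one row and joins nested keys with dots. That is why the columns are called `venue.name`, `venue.location.lat`, and so on, and why `col.split(".")[-1]` recovers the short names. The pure-Python sketch below mimics that behavior for nested dictionaries only. Lists such as `categories` are left as values, as `json_normalize` does by default. `flatten` is a hypothetical helper, and the sample item is a trimmed, hand-written record shaped like the response above.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into dotted keys; lists are kept as plain values."""
    flat = {}
    for key, value in record.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, new_key, sep))  # recurse into nested objects
        else:
            flat[new_key] = value
    return flat

# Trimmed, hand-written item shaped like one entry of result['response']['groups'][0]['items']
item = {
    "venue": {
        "name": "Jimmy's Coffee",
        "location": {"lat": 43.65842123574496, "lng": -79.38561319551111},
        "categories": [{"name": "Coffee Shop"}],
    }
}

row = flatten(item)
print(sorted(row))
# → ['venue.categories', 'venue.location.lat', 'venue.location.lng', 'venue.name']
print([key.split(".")[-1] for key in row])
# → ['name', 'lat', 'lng', 'categories']
```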


### Get Venues for all neighborhoods

In [54]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Query the Foursquare explore endpoint around each neighborhood and collect nearby venues."""
    venues_list = []  # one list of venue tuples per neighborhood
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)

        # make the GET request; the recommended venues are stored under 'items'
        results = requests.get(url).json()['response']['groups'][0]['items']

        # keep only the relevant fields for each nearby venue;
        # a venue with no category gets None instead of raising an IndexError
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name'] if v['venue']['categories'] else None)
            for v in results])

    # flatten the per-neighborhood lists into one row per venue
    nearby_venues = pd.DataFrame([item
                                  for venue_list in venues_list
                                  for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                             'Neighborhood Latitude',
                             'Neighborhood Longitude',
                             'Venue',
                             'Venue Latitude',
                             'Venue Longitude',
                             'Venue Category']

    return nearby_venues
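The nested list comprehension at the end of `getNearbyVenues` turns one list of venue tuples per neighborhood into a single flat list of rows. A stdlib-only sketch of the same idea follows. It uses a hypothetical `venues_to_rows` helper, hand-made sample items (not real API output), and a `None` fallback for venues without a category.

```python
def venues_to_rows(neighborhood, lat, lng, items):
    """Hypothetical helper: convert explore 'items' into one tuple per venue."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None  # tolerate uncategorized venues
        rows.append((neighborhood, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows

# Hand-made sample items (not real API output)
sample_items = [
    {"venue": {"name": "Venue A", "location": {"lat": 1.0, "lng": 2.0},
               "categories": [{"name": "Park"}]}},
    {"venue": {"name": "Venue B", "location": {"lat": 1.1, "lng": 2.1},
               "categories": []}},
]

# One list of rows per neighborhood...
per_neighborhood = [
    venues_to_rows("Hood 1", 1.0, 2.0, sample_items),
    venues_to_rows("Hood 2", 3.0, 4.0, sample_items[:1]),
]
# ...flattened into one row per venue across all neighborhoods
all_rows = [row for hood_rows in per_neighborhood for row in hood_rows]
print(len(all_rows))  # → 3
```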

In [55]:
all_venues=getNearbyVenues(toronto_data['Neighborhood'],
                          toronto_data['Latitude'],
                          toronto_data['Longitude'],
                          radius)

The Beaches
The Danforth West, Riverdale
India Bazaar, The Beaches West
Studio District
Lawrence Park
Davisville North
North Toronto West,  Lawrence Park
Davisville
Moore Park, Summerhill East
Summerhill West, Rathnelly, South Hill, Forest Hill SE, Deer Park
Rosedale
St. James Town, Cabbagetown
Church and Wellesley
Regent Park, Harbourfront
Garden District, Ryerson
St. James Town
Berczy Park
Central Bay Street
Richmond, Adelaide, King
Harbourfront East, Union Station, Toronto Islands
Toronto Dominion Centre, Design Exchange
Commerce Court, Victoria Hotel
Roselawn
Forest Hill North & West, Forest Hill Road Park
The Annex, North Midtown, Yorkville
University of Toronto, Harbord
Kensington Market, Chinatown, Grange Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
Stn A PO Boxes
First Canadian Place, Underground city
Christie
Dufferin, Dovercourt Village
Little Portugal, Trinity
Brockton, Parkdale Village, Exhibition Place
High

In [58]:
print('There are {} unique categories of venues near Toronto'.format(len(all_venues['Venue Category'].unique())))

There are 234 unique categories of venues near Toronto


In [62]:
all_venues.head(3)

|   | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | The Beaches | 43.676357 | -79.293031 | Glen Manor Ravine | 43.676821 | -79.293942 | Trail |
| 1 | The Beaches | 43.676357 | -79.293031 | The Big Carrot Natural Food Market | 43.678879 | -79.297734 | Health Food Store |
| 2 | The Beaches | 43.676357 | -79.293031 | Grover Pub and Grub | 43.679181 | -79.297215 | Pub |


### Analyze Each Neighborhood near Toronto

In [90]:
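# one-hot encode the venue categories; prefix="" and prefix_sep="" keep the bare
# category names as column names, and dtype=int gives 0/1 columns in every pandas version
toronto_onehot = pd.get_dummies(all_venues[['Venue Category']], prefix="", prefix_sep="", dtype=int)

# the data contain a venue category literally named "Neighborhood"; drop it so it
# does not clash with the neighborhood-name column added below
toronto_onehot = toronto_onehot.drop(['Neighborhood'], axis=1, errors='ignore')

# add the neighborhood name as the first column
toronto_onehot.insert(0, 'Neighborhood', all_venues['Neighborhood'])
toronto_onehot.head()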

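|   | Neighborhood | Afghan Restaurant | Airport | Airport Food Court | Airport Gate | Airport Lounge | Airport Service | Airport Terminal | American Restaurant | Antique Shop | ... | Theme Restaurant | Toy / Game Store | Trail | Train Station | Vegetarian / Vegan Restaurant | Video Game Store | Vietnamese Restaurant | Wine Bar | Women's Store | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | The Beaches | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | The Beaches | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | The Beaches | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | The Beaches | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | The Danforth West, Riverdale | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |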


In [91]:
toronto_onehot.shape

(1627, 234)

#### Next, group the rows by neighborhood and calculate the frequency of occurrence of each category

In [95]:
toronto_group = toronto_onehot.groupby('Neighborhood').mean().reset_index()  # mean of 0/1 columns = category frequency
toronto_group.head(3)

|   | Neighborhood | Afghan Restaurant | Airport | Airport Food Court | Airport Gate | Airport Lounge | Airport Service | Airport Terminal | American Restaurant | Antique Shop | ... | Theme Restaurant | Toy / Game Store | Trail | Train Station | Vegetarian / Vegan Restaurant | Video Game Store | Vietnamese Restaurant | Wine Bar | Women's Store | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Berczy Park | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.017544 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | Brockton, Parkdale Village, Exhibition Place | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.041667 |
| 2 | Business reply mail Processing Centre, South C... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0625 |


In [97]:
toronto_group.shape

(39, 234)
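The grouping step is just an average of 0/1 indicators. If a neighborhood has 4 venues and 2 of them are coffee shops, its `Coffee Shop` frequency is 2/4 = 0.5. A toy sketch with hypothetical neighborhoods `A` and `B` runs the same `get_dummies` followed by `groupby(...).mean()` pipeline. Each row of the result sums to 1 because every venue has exactly one category.

```python
import pandas as pd

# Hypothetical venues: neighborhood A has 4 venues, B has 2
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Coffee Shop", "Coffee Shop", "Park", "Pub", "Park", "Pub"],
})

# One 0/1 column per category (dtype=int avoids bool columns in newer pandas)
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# Mean of each indicator column = share of the neighborhood's venues in that category
freq = onehot.groupby("Neighborhood").mean()
print(freq)
```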

Print each neighborhood along with its five most common venues:

In [98]:
num_top_venues = 5

for hood in toronto_group['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_group[toronto_group['Neighborhood'] == hood].T.reset_index()  # .T transposes the single row into a column
    temp.columns = ['venue','freq']   # after transposing: category names and their frequencies
    temp = temp.iloc[1:]              # drop the first row, which holds the neighborhood name
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values(by='freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Berczy Park----
                venue  freq
0         Coffee Shop  0.11
1          Restaurant  0.04
2                Café  0.04
3  Seafood Restaurant  0.04
4              Bakery  0.04


----Brockton, Parkdale Village, Exhibition Place----
            venue  freq
0            Café  0.12
1          Bakery  0.08
2  Breakfast Spot  0.08
3     Coffee Shop  0.08
4   Grocery Store  0.04


----Business reply mail Processing Centre, South Central Letter Processing Plant Toronto----
                  venue  freq
0    Light Rail Station  0.12
1           Yoga Studio  0.06
2         Auto Workshop  0.06
3  Fast Food Restaurant  0.06
4        Farmers Market  0.06


----CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport----
                venue  freq
0     Airport Service  0.13
1    Airport Terminal  0.13
2         Coffee Shop  0.07
3    Sculpture Garden  0.07
4  Airport Food Court  0.07


----Central Bay Street----
                 venue  

##### Next, put the above information into a *pandas* dataframe

In [137]:
# return the names of the num_top_venues most frequent categories, in descending order of frequency
def return_most_common_venues(row, num_top_venues):
    row_categories=row.iloc[1:]  # skip the Neighborhood column
    row_categories_sorted=row_categories.sort_values(ascending=False)

    return row_categories_sorted.index.values[0:num_top_venues]
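`return_most_common_venues` sorts one row of frequencies and keeps the first `num_top_venues` labels. One consequence is worth knowing. When a neighborhood has fewer distinct categories than `num_top_venues`, the remaining slots are filled with zero-frequency categories, and pandas' default sort does not guarantee the order of tied values. This is likely why entries such as Dog Run, Donut Shop, and Doner Restaurant trail several rows in the tables below. The stdlib-only sketch below uses a hypothetical `top_categories` helper that breaks ties alphabetically and can skip zero-frequency categories.

```python
def top_categories(freqs, n, skip_zero=True):
    """Return up to n category names, highest frequency first, ties broken alphabetically."""
    items = [(name, f) for name, f in freqs.items() if f > 0 or not skip_zero]
    items.sort(key=lambda kv: (-kv[1], kv[0]))  # descending frequency, then name
    return [name for name, _ in items[:n]]

row = {"Park": 0.25, "Coffee Shop": 0.5, "Pub": 0.25, "Dog Run": 0.0}
print(top_categories(row, 3))                    # → ['Coffee Shop', 'Park', 'Pub']
print(top_categories(row, 10))                   # → ['Coffee Shop', 'Park', 'Pub']
print(top_categories(row, 10, skip_zero=False))  # → ['Coffee Shop', 'Park', 'Pub', 'Dog Run']
```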

Create the dataframe showing the top 10 venues for each neighborhood

In [147]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create column names according to the number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # try runs the append first; when indicators[ind] does not exist (4th onwards),
        # the except branch runs instead, so no else clause is needed
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_group['Neighborhood']  # fill the first column

for ind in np.arange(toronto_group.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_group.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()


|   | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Berczy Park | Coffee Shop | Café | Restaurant | Cocktail Bar | Beer Bar | Farmers Market | Seafood Restaurant | Cheese Shop | Bakery | Eastern European Restaurant |
| 1 | Brockton, Parkdale Village, Exhibition Place | Café | Bakery | Breakfast Spot | Coffee Shop | Yoga Studio | Gym | Pet Store | Performing Arts Venue | Nightclub | Italian Restaurant |
| 2 | Business reply mail Processing Centre, South C... | Light Rail Station | Yoga Studio | Auto Workshop | Gym / Fitness Center | Garden Center | Garden | Fast Food Restaurant | Farmers Market | Comic Shop | Pizza Place |
| 3 | CN Tower, King and Spadina, Railway Lands, Har... | Airport Service | Airport Terminal | Coffee Shop | Boutique | Rental Car Location | Sculpture Garden | Plane | Boat or Ferry | Harbor / Marina | Airport Lounge |
| 4 | Central Bay Street | Coffee Shop | Sandwich Place | Japanese Restaurant | Café | Italian Restaurant | Salad Place | Bubble Tea Shop | Burger Joint | Department Store | New American Restaurant |
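The `indicators` list in the cell that builds these column names hard-codes the suffixes for 1st to 3rd and falls back to `th` for everything else. That is enough for 10 columns, but it would label a 21st column `21th` if `num_top_venues` grew. A small general-purpose sketch (`ordinal` is a hypothetical helper) handles the teens and the later `st`/`nd`/`rd` cases:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th', 22 -> '22nd'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 10th-20th, including 11th, 12th and 13th, always take 'th'
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

num_top_venues = 10
columns = ["Neighborhood"] + [f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)]
print(columns[1], columns[-1])  # → 1st Most Common Venue 10th Most Common Venue
```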


## Clustering

As in the lab, I'll group the neighborhoods into 5 clusters (k = 5).

In [143]:
k=5
toronto_clustering=toronto_group.drop(['Neighborhood'],axis=1)
kmeans=KMeans(n_clusters=k, random_state=0).fit(toronto_clustering)

print('The neighborhoods were assigned to {} distinct clusters'.format(len(set(kmeans.labels_))))
set(kmeans.labels_)

The neighborhoods were assigned to 5 distinct clusters


{0, 1, 2, 3, 4}

Create a *pandas* DataFrame that joins each neighborhood's postal code, borough, and coordinates with its cluster label and top 10 venues:

In [148]:
# first, add the cluster labels as the first column of the top-venues table
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# then join it to toronto_data on the neighborhood name
toronto_merged = toronto_data.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
toronto_merged.head()

|   | PostalCode | Borough | Neighborhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M4E | East Toronto | The Beaches | 43.676357 | -79.293031 | 0 | Health Food Store | Pub | Trail | Dog Run | Department Store | Dessert Shop | Diner | Discount Store | Distribution Center | Yoga Studio |
| 1 | M4K | East Toronto | The Danforth West, Riverdale | 43.679557 | -79.352188 | 0 | Greek Restaurant | Italian Restaurant | Coffee Shop | Restaurant | Ice Cream Shop | Furniture / Home Store | Fruit & Vegetable Store | Pub | Pizza Place | Lounge |
| 2 | M4L | East Toronto | India Bazaar, The Beaches West | 43.668999 | -79.315572 | 0 | Park | Sushi Restaurant | Food & Drink Shop | Light Rail Station | Burrito Place | Italian Restaurant | Restaurant | Liquor Store | Pub | Ice Cream Shop |
| 3 | M4M | East Toronto | Studio District | 43.659526 | -79.340923 | 0 | Café | Coffee Shop | Bakery | Gastropub | Brewery | American Restaurant | Comfort Food Restaurant | Bookstore | Seafood Restaurant | Sandwich Place |
| 4 | M4N | Central Toronto | Lawrence Park | 43.72802 | -79.38879 | 2 | Park | Bus Line | Swim School | Ethiopian Restaurant | Eastern European Restaurant | Dumpling Restaurant | Donut Shop | Doner Restaurant | Dog Run | Distribution Center |


Now we're ready to visualize the resulting clusters.

In [155]:
cluster_map = folium.Map(location=[latitude, longitude], zoom_start=10)  # centered on Toronto, Ontario

# set the color scheme: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, k))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add a marker for each neighborhood, colored by its cluster label (0 to k-1)
for lat, lon, hood, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'],
                                   toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup('{}, Cluster {}'.format(hood, cluster), parse_html=True)
    folium.CircleMarker([lat, lon],
                        radius=5,
                        popup=label,
                        color=rainbow[int(cluster)],
                        fill=True,
                        fill_color=rainbow[int(cluster)],
                        fill_opacity=0.6).add_to(cluster_map)
cluster_map
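Matplotlib's `rainbow` colormap is one way to get a distinct color per cluster. If you would rather avoid the matplotlib dependency for this step, a stdlib-only sketch can space `k` hues evenly around the HSV color wheel with `colorsys`. `cluster_colors` is a hypothetical helper, and its colors are not the same as those of `cm.rainbow`. The list is meant to be indexed directly by the k-means label (0 to k-1).

```python
import colorsys

def cluster_colors(k, saturation=0.9, value=0.9):
    """Return k evenly spaced hex colors; index the list with a cluster label 0..k-1."""
    hex_colors = []
    for label in range(k):
        r, g, b = colorsys.hsv_to_rgb(label / k, saturation, value)  # hue in [0, 1)
        hex_colors.append("#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255)))
    return hex_colors

palette = cluster_colors(5)
print(palette)
```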

## Finally, Examine the Clusters

We can examine each cluster and identify the venue categories that distinguish it from the others.

To make the distribution of venues in each cluster easier to see, first define a function `get_destribution()` that counts how often each category appears as a neighborhood's 1st most common venue:

In [197]:
def get_destribution(df):
    """Count how many neighborhoods in df have each category as their 1st most common venue."""
    result=pd.DataFrame(columns=['Category','Count as 1st Most Common Venue'])
    result[result.columns[0]]=df['1st Most Common Venue'].unique()  # one row per distinct category

    count_list=[]
    for cat in result.iloc[:,0]:
        count=sum(df['1st Most Common Venue']==cat)  # neighborhoods whose 1st venue is this category
        count_list.append(count)
    result[result.columns[1]]=count_list
    result.sort_values(by='Count as 1st Most Common Venue', ascending=False, axis=0,inplace=True)
    result.set_index('Category',inplace=True)
    return result
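`get_destribution` tallies, for each category, how many neighborhoods in the cluster have it as their 1st most common venue, then sorts by that count. The same tally can be sketched with `collections.Counter`. In pandas, `df['1st Most Common Venue'].value_counts()` does the equivalent. The category list below is a hand-made example, and `first_venue_distribution` is a hypothetical name.

```python
from collections import Counter

def first_venue_distribution(first_venues):
    """Return (category, count) pairs, most frequent first; ties keep first-seen order."""
    return Counter(first_venues).most_common()

first_venues = ["Coffee Shop", "Café", "Coffee Shop", "Park", "Café", "Coffee Shop"]
print(first_venue_distribution(first_venues))
# → [('Coffee Shop', 3), ('Café', 2), ('Park', 1)]
```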

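#### Cluster 1

In [171]:
# keep Borough, Cluster Labels and the ten top-venue columns for neighborhoods with label 0
cluster1=toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]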
cluster1

|   | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | East Toronto | 0 | Health Food Store | Pub | Trail | Dog Run | Department Store | Dessert Shop | Diner | Discount Store | Distribution Center | Yoga Studio |
| 1 | East Toronto | 0 | Greek Restaurant | Italian Restaurant | Coffee Shop | Restaurant | Ice Cream Shop | Furniture / Home Store | Fruit & Vegetable Store | Pub | Pizza Place | Lounge |
| 2 | East Toronto | 0 | Park | Sushi Restaurant | Food & Drink Shop | Light Rail Station | Burrito Place | Italian Restaurant | Restaurant | Liquor Store | Pub | Ice Cream Shop |
| 3 | East Toronto | 0 | Café | Coffee Shop | Bakery | Gastropub | Brewery | American Restaurant | Comfort Food Restaurant | Bookstore | Seafood Restaurant | Sandwich Place |
| 5 | Central Toronto | 0 | Park | Pizza Place | Breakfast Spot | Dog Run | Sandwich Place | Food & Drink Shop | Department Store | Hotel | Gym | Gym / Fitness Center |
| 6 | Central Toronto | 0 | Clothing Store | Coffee Shop | Sporting Goods Shop | Fast Food Restaurant | Diner | Metro Station | Mexican Restaurant | Park | Chinese Restaurant | Café |
| 7 | Central Toronto | 0 | Sandwich Place | Dessert Shop | Gym | Italian Restaurant | Café | Sushi Restaurant | Pizza Place | Coffee Shop | Greek Restaurant | Farmers Market |
| 9 | Central Toronto | 0 | Coffee Shop | Pub | Sushi Restaurant | Bagel Shop | Supermarket | Sports Bar | Bank | Fried Chicken Joint | Pizza Place | Liquor Store |
| 11 | Downtown Toronto | 0 | Bakery | Coffee Shop | Café | Pizza Place | Italian Restaurant | Pub | Park | Restaurant | Convenience Store | Gift Shop |
| 12 | Downtown Toronto | 0 | Coffee Shop | Japanese Restaurant | Sushi Restaurant | Restaurant | Gay Bar | Café | Pub | Men's Store | Mediterranean Restaurant | Hotel |


In [198]:
get_destribution(cluster1)

| Category | Count as 1st Most Common Venue |
|---|---|
| Coffee Shop | 12 |
| Café | 6 |
| Park | 2 |
| Clothing Store | 2 |
| Sandwich Place | 2 |
| Health Food Store | 1 |
| Greek Restaurant | 1 |
| Bakery | 1 |
| Airport Service | 1 |
| Grocery Store | 1 |


#### Cluster 2

In [187]:
cluster2=toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]
cluster2

|   | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8 | Central Toronto | 1 | Restaurant | Yoga Studio | Electronics Store | Eastern European Restaurant | Dumpling Restaurant | Donut Shop | Doner Restaurant | Dog Run | Distribution Center | Discount Store |


In [200]:
get_destribution(cluster2)

| Category | Count as 1st Most Common Venue |
|---|---|
| Restaurant | 1 |


#### Cluster 3

In [189]:
cluster3=toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]
cluster3

|   | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | Central Toronto | 2 | Park | Bus Line | Swim School | Ethiopian Restaurant | Eastern European Restaurant | Dumpling Restaurant | Donut Shop | Doner Restaurant | Dog Run | Distribution Center |


In [199]:
get_destribution(cluster3)

| Category | Count as 1st Most Common Venue |
|---|---|
| Park | 1 |


#### Cluster 4

In [191]:
cluster4=toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]
cluster4

|   | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10 | Downtown Toronto | 3 | Park | Trail | Playground | Cupcake Shop | Eastern European Restaurant | Dumpling Restaurant | Donut Shop | Doner Restaurant | Dog Run | Distribution Center |
| 23 | Central Toronto | 3 | Park | Jewelry Store | Trail | Sushi Restaurant | Dance Studio | Eastern European Restaurant | Dumpling Restaurant | Donut Shop | Doner Restaurant | Dog Run |


In [201]:
get_destribution(cluster4)

| Category | Count as 1st Most Common Venue |
|---|---|
| Park | 2 |


#### Cluster 5

In [193]:
cluster5=toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]
cluster5

|   | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 22 | Central Toronto | 4 | Home Service | Garden | Yoga Studio | Deli / Bodega | Ethiopian Restaurant | Electronics Store | Eastern European Restaurant | Dumpling Restaurant | Donut Shop | Doner Restaurant |


In [202]:
get_destribution(cluster5)

| Category | Count as 1st Most Common Venue |
|---|---|
| Home Service | 1 |
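The five per-cluster cells repeat the same selection with a different label each time. A loop over `groupby` can produce every cluster's distribution in one pass. The sketch below uses a small hypothetical table containing only the two columns it needs. It prints clusters as 1 to 5 to match the headings, while the k-means labels run from 0 to 4.

```python
import pandas as pd

# Hypothetical stand-in for toronto_merged, reduced to the two columns used here
merged = pd.DataFrame({
    "Cluster Labels": [0, 0, 0, 1, 3, 3],
    "1st Most Common Venue": ["Coffee Shop", "Café", "Coffee Shop", "Restaurant", "Park", "Park"],
})

summary = {}
for label, group in merged.groupby("Cluster Labels"):
    counts = group["1st Most Common Venue"].value_counts()  # sorted, most frequent first
    summary[int(label)] = {category: int(n) for category, n in counts.items()}
    print(f"Cluster {int(label) + 1} (label {int(label)}): {summary[int(label)]}")
```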


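From the distribution tables above, Cluster 1 (label 0) contains most of the neighborhoods, and Coffee Shop is by far its most frequent 1st most common venue, followed by Café. The other four clusters contain only one or two neighborhoods each. Their 1st most common venue is a Restaurant, Park, or Home Service, and none of them lists Coffee Shop among its top 10 venues.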