# Segmenting and Clustering Neighborhoods in Toronto

## First part: Scraping the data from the Wikipedia page


In [4]:
import pandas as pd
import numpy as np

wikiFrame=pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')
df=wikiFrame[0]
df.rename(columns={'Postcode':'PostalCode'}, inplace=True)

# Only process the cells that have an assigned borough.
df=df[df['Borough']!='Not assigned'].copy()

# If a cell has a borough but a "Not assigned" neighbourhood, the neighbourhood takes the borough's name.
# This is done before grouping, so "Not assigned" is never joined with other neighbourhood names.
mask=df['Neighbourhood']=='Not assigned'
df.loc[mask,'Neighbourhood']=df.loc[mask,'Borough']

# More than one neighbourhood can exist in one postal code area; group them and join them with commas.
df=df.groupby(['PostalCode','Borough'],as_index=False)['Neighbourhood'].agg(','.join)




In [5]:
df.shape


(103, 3)
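
The first cell applies three cleaning rules: drop rows without a borough, give an unassigned neighbourhood its borough's name, and join neighbourhoods that share a postal code. A minimal sketch of these rules on a hypothetical five-row table (made-up postal codes and names, not the Wikipedia data) shows their effect; the name replacement is done before grouping so that "Not assigned" is never joined with other names.

```python
import pandas as pd

# Hypothetical rows mimicking the layout of the Wikipedia table (made-up data).
raw = pd.DataFrame({
    'PostalCode':    ['X1A', 'X2A', 'X3A', 'X3A', 'X4A'],
    'Borough':       ['Not assigned', 'Downtown', 'East End', 'East End', 'Midtown'],
    'Neighbourhood': ['Not assigned', 'Harbour', 'Rouge', 'Malvern', 'Not assigned'],
})

# Rule 1: keep only rows that have an assigned borough.
clean = raw[raw['Borough'] != 'Not assigned'].copy()

# Rule 2: an unassigned neighbourhood takes the name of its borough.
mask = clean['Neighbourhood'] == 'Not assigned'
clean.loc[mask, 'Neighbourhood'] = clean.loc[mask, 'Borough']

# Rule 3: neighbourhoods sharing a postal code are joined with commas.
grouped = (clean.groupby(['PostalCode', 'Borough'], as_index=False)['Neighbourhood']
                .agg(','.join))
print(grouped)
```

X1A disappears, X3A becomes "Rouge,Malvern" and X4A takes the name "Midtown" from its borough.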

## Second part: Creating a dataframe with the latitude and longitude

As the Geocoder package did not work for me, I use the CSV file at http://cocl.us/Geospatial_data to get the latitude and longitude of each postal code.


In [8]:
csv=pd.read_csv('http://cocl.us/Geospatial_data')
csv.head()

,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [19]:
df2=pd.merge(df,csv,left_on='PostalCode', right_on='Postal Code')
df2.drop('Postal Code', axis=1, inplace=True)
df2.head()

,PostalCode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge,Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
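
By default pd.merge performs an inner join, so any postal code missing from the CSV would be dropped silently. One way to check for this, sketched here on small invented frames (X9Z is a made-up code with no coordinates; the M1B and M1C rows come from the output above), is to merge with how='left' and indicator=True and then look for rows marked 'left_only'.

```python
import pandas as pd

# Invented example frames: postal code X9Z has no coordinates.
codes = pd.DataFrame({'PostalCode': ['M1B', 'M1C', 'X9Z'],
                      'Borough': ['Scarborough', 'Scarborough', 'Nowhere']})
coords = pd.DataFrame({'Postal Code': ['M1B', 'M1C'],
                       'Latitude': [43.806686, 43.784535],
                       'Longitude': [-79.194353, -79.160497]})

# indicator=True adds a '_merge' column telling where each row was found.
check = pd.merge(codes, coords, left_on='PostalCode', right_on='Postal Code',
                 how='left', indicator=True)
missing = check.loc[check['_merge'] == 'left_only', 'PostalCode'].tolist()
print(missing)  # postal codes without coordinates
```

An empty list on the real data would confirm that every postal code found a match.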


## Third part: Clustering the neighborhoods

First, let's create a function that calls the Foursquare API to explore the area around each neighborhood and returns the venues found as a dataframe.


In [55]:
import requests
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

LIMIT=10 # maximum number of venues returned per neighborhood

def get_places_around(neighborhoods, lats, lngs, radius=500):
    """Return one row per venue found within `radius` metres of each (lat, lng)."""
    list_places=[]
    for neighborhood, lat, lng in zip(neighborhoods, lats, lngs):
        # Let requests build and encode the query string.
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT,
        }
        response=requests.get('https://api.foursquare.com/v2/venues/explore', params=params).json()

        items=response['response']['groups'][0]['items']
        for item in items:
            venue=item['venue']
            if not venue['categories']:  # skip venues without a category (avoids an IndexError)
                continue
            list_places.append([neighborhood, lat, lng, venue['name'], venue['location']['lat'], venue['location']['lng'], venue['categories'][0]['name']])
    frame_places=pd.DataFrame(list_places, columns=['Neighborhood','Latitude','Longitude','Name','Place latitude','Place longitude','Categories'])
    return frame_places

In [56]:
frame_places=get_places_around(df2['Neighbourhood'],df2['Latitude'],df2['Longitude'] )
frame_places.head()

,Neighborhood,Latitude,Longitude,Name,Place latitude,Place longitude,Categories
0,"Rouge,Malvern",43.806686,-79.194353,Wendy's,43.807448,-79.199056,Fast Food Restaurant
1,"Rouge,Malvern",43.806686,-79.194353,Interprovincial Group,43.80563,-79.200378,Print Shop
2,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497,Chris Effects Painting,43.784343,-79.163742,Construction & Landscaping
3,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497,Royal Canadian Legion,43.782533,-79.163085,Bar
4,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497,Affordable Toronto Movers,43.787919,-79.162977,Moving Target
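
get_places_around relies on the nesting of the explore response: response → groups[0] → items → venue. The sketch below runs the same extraction on a hand-written mock of that structure (venue names and coordinates are invented, and only the fields the function reads are included), so the parsing can be checked without network access or credentials. It also shows the query string that the function's parameters produce; build_explore_url and parse_explore are hypothetical helper names used only for this illustration.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=10):
    """Build the explore URL with the same query parameters as get_places_around."""
    params = {'client_id': client_id, 'client_secret': client_secret, 'v': version,
              'll': '{},{}'.format(lat, lng), 'radius': radius, 'limit': limit}
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)

def parse_explore(response, neighborhood, lat, lng):
    """Turn one explore response into rows, mirroring get_places_around."""
    rows = []
    for item in response['response']['groups'][0]['items']:
        venue = item['venue']
        if not venue['categories']:  # an empty list would make [0] raise IndexError
            continue
        rows.append([neighborhood, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'],
                     venue['categories'][0]['name']])
    return rows

# Hand-written mock of the response structure (invented values).
mock = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Cafe A', 'location': {'lat': 43.80, 'lng': -79.19},
               'categories': [{'name': 'Coffee Shop'}]}},
    {'venue': {'name': 'Unknown B', 'location': {'lat': 43.81, 'lng': -79.20},
               'categories': []}},
]}]}}

rows = parse_explore(mock, 'Rouge,Malvern', 43.806686, -79.194353)
url = build_explore_url('ID', 'SECRET', '20180605', 43.806686, -79.194353)
print(rows)
print(url)
```

Only "Cafe A" is kept; the uncategorised venue is skipped.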


Next, we one-hot encode the venue categories and average them per neighborhood, so that each neighborhood becomes a vector of category frequencies that the k-means algorithm can cluster.

In [132]:
from sklearn.cluster import KMeans

one_hot_vector_places=pd.get_dummies(frame_places[['Categories']])
one_hot_vector_places['Neighborhood']=frame_places['Neighborhood']
one_hot_vector_places_grouped=one_hot_vector_places.groupby(['Neighborhood']).mean()
one_hot_vector_places_grouped



Neighborhood,Categories_Accessories Store,Categories_Airport,Categories_Airport Food Court,Categories_Airport Gate,Categories_Airport Lounge,Categories_Airport Service,Categories_Airport Terminal,Categories_American Restaurant,Categories_Antique Shop,Categories_Art Gallery,...,Categories_Toy / Game Store,Categories_Trail,Categories_Vegetarian / Vegan Restaurant,Categories_Video Store,Categories_Vietnamese Restaurant,Categories_Warehouse Store,Categories_Wine Bar,Categories_Wings Joint,Categories_Women's Store,Categories_Yoga Studio
"Adelaide,King,Richmond",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.1,0.000000,0.0,0.0,0.0,0.0,0.0,0.0
Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0
"Agincourt North,L'Amoreaux East,Milliken,Steeles East",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0
"Albion Gardens,Beaumond Heights,Humbergate,Jamestown,Mount Olive,Silverstone,South Steeles,Thistletown",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0
"Alderwood,Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Willowdale West,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0
Woburn,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0
"Woodbine Gardens,Parkview Hill",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0
Woodbine Heights,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0
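
Averaging the one-hot columns per neighborhood gives, for each category, the share of that neighborhood's returned venues that fall into it. With at most LIMIT=10 venues per neighborhood, each value is a count divided by the number of venues; for example, 0.111111 in the Woodbine Heights row is one venue out of nine. A small sketch on three invented venues:

```python
import pandas as pd

# Invented venue list: two venues in neighborhood A, one in B.
places = pd.DataFrame({'Neighborhood': ['A', 'A', 'B'],
                       'Categories': ['Park', 'Coffee Shop', 'Park']})

# One column per category (prefixed "Categories_"), as floats.
onehot = pd.get_dummies(places[['Categories']], dtype=float)
onehot['Neighborhood'] = places['Neighborhood']

# Mean of the one-hot columns = frequency of each category per neighborhood.
freq = onehot.groupby('Neighborhood').mean()
print(freq)
```

Neighborhood A gets 0.5 for each of its two categories, and B gets 1.0 for Park.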


I chose to run k-means with 4 clusters; the number of clusters can be changed through the variable kclusters.

In [133]:
kclusters=4
kmean=KMeans(n_clusters=kclusters, random_state=0).fit(one_hot_vector_places_grouped)
kmean.labels_

array([0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 2, 0, 0, 2, 1, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 2, 3, 0, 0, 0, 0, 2, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 2, 0, 0, 0, 2, 0, 0, 0, 0, 0, 2])
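
For intuition about what KMeans does with these frequency vectors, here is a minimal pure-Python sketch of Lloyd's algorithm, which alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. It is a simplification and not scikit-learn's implementation: scikit-learn initialises centroids with k-means++ by default, whereas this sketch simply takes the first k points as starting centroids. The 2-D points are invented.

```python
def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm; the first k points serve as the initial centroids."""

    def sq_dist(p, q):
        # Squared Euclidean distance (same nearest centroid as the plain distance).
        return sum((a - b) ** 2 for a, b in zip(p, q))

    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if the cluster is empty
                centroids[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centroids

# Invented 2-D vectors: (share of parks, share of coffee shops) per neighborhood.
points = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
labels, centroids = kmeans(points, k=2)
print(labels)  # → [0, 0, 1, 1]
```

The park-heavy and coffee-heavy points end up in separate clusters after two update rounds.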

Now that we have the labels, we look for the top venues of each neighborhood and build a new dataframe holding those most common venues. For this example we keep the top 10 (variable num_top_venues).

In [134]:
one_hot_vector_places_grouped=one_hot_vector_places_grouped.reset_index()

In [135]:
def returning_top(dataFrame, num_top_venues):
    # Return the names of the num_top_venues categories with the highest frequency in this row.
    return dataFrame.sort_values(ascending=False).index[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues (1st, 2nd, 3rd, then 4th, 5th, ...)
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    suffix = indicators[ind] if ind < len(indicators) else 'th'
    columns.append('{}{} Most Common Venue'.format(ind+1, suffix))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = one_hot_vector_places_grouped['Neighborhood']
for ind in range(neighborhoods_venues_sorted.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:]=returning_top(one_hot_vector_places_grouped.iloc[ind, 1:], num_top_venues)

In [136]:
neighborhoods_venues_sorted

,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide,King,Richmond",Categories_Steakhouse,Categories_Plaza,Categories_Café,Categories_Hotel,Categories_Opera House,Categories_Concert Hall,Categories_Asian Restaurant,Categories_Vegetarian / Vegan Restaurant,Categories_Speakeasy,Categories_Dessert Shop
1,Agincourt,Categories_Lounge,Categories_Skating Rink,Categories_Latin American Restaurant,Categories_Breakfast Spot,Categories_Clothing Store,Categories_Concert Hall,Categories_Construction & Landscaping,Categories_Comic Shop,Categories_Electronics Store,Categories_Eastern European Restaurant
2,"Agincourt North,L'Amoreaux East,Milliken,Steel...",Categories_Park,Categories_Playground,Categories_Yoga Studio,Categories_Dance Studio,Categories_Eastern European Restaurant,Categories_Drugstore,Categories_Dog Run,Categories_Discount Store,Categories_Diner,Categories_Dessert Shop
3,"Albion Gardens,Beaumond Heights,Humbergate,Jam...",Categories_Grocery Store,Categories_Liquor Store,Categories_Coffee Shop,Categories_Pharmacy,Categories_Pizza Place,Categories_Beer Store,Categories_Fried Chicken Joint,Categories_Fast Food Restaurant,Categories_Sandwich Place,Categories_Deli / Bodega
4,"Alderwood,Long Branch",Categories_Pizza Place,Categories_Coffee Shop,Categories_Gym,Categories_Skating Rink,Categories_Pharmacy,Categories_Pool,Categories_Sandwich Place,Categories_Pub,Categories_Curling Ice,Categories_Discount Store
...,...,...,...,...,...,...,...,...,...,...,...
96,Willowdale West,Categories_Pharmacy,Categories_Butcher,Categories_Discount Store,Categories_Coffee Shop,Categories_Pizza Place,Categories_Grocery Store,Categories_Airport Service,Categories_Department Store,Categories_Empanada Restaurant,Categories_Electronics Store
97,Woburn,Categories_Coffee Shop,Categories_Korean Restaurant,Categories_Deli / Bodega,Categories_Electronics Store,Categories_Eastern European Restaurant,Categories_Drugstore,Categories_Dog Run,Categories_Discount Store,Categories_Diner,Categories_Dessert Shop
98,"Woodbine Gardens,Parkview Hill",Categories_Pizza Place,Categories_Gym / Fitness Center,Categories_Bank,Categories_Gastropub,Categories_Intersection,Categories_Fast Food Restaurant,Categories_Pharmacy,Categories_Café,Categories_Pet Store,Categories_Antique Shop
99,Woodbine Heights,Categories_Park,Categories_Cosmetics Shop,Categories_Beer Store,Categories_Dance Studio,Categories_Curling Ice,Categories_Skating Rink,Categories_Asian Restaurant,Categories_Video Store,Categories_Pharmacy,Categories_Concert Hall
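
Neighborhoods with fewer than ten distinct venue categories have ties at frequency 0 in the tail of their ranking, so the later "most common" columns get filled with categories that do not occur there at all. This is likely why Yoga Studio, Dance Studio, Drugstore or Dog Run recur at the end of many rows. A variant of returning_top that keeps only categories with a non-zero frequency, sketched here under the hypothetical name top_nonzero, leaves those slots empty instead; the frequency row is invented.

```python
import pandas as pd

def top_nonzero(row, num_top_venues):
    """Return up to num_top_venues category names with a frequency above zero,
    padded with None so the result always has num_top_venues entries."""
    ranked = row[row > 0].sort_values(ascending=False)
    names = list(ranked.index[:num_top_venues])
    return names + [None] * (num_top_venues - len(names))

# Invented frequency row for a neighborhood with only two categories.
row = pd.Series({'Categories_Coffee Shop': 0.6, 'Categories_Park': 0.0,
                 'Categories_Korean Restaurant': 0.4, 'Categories_Diner': 0.0})
print(top_nonzero(row, 3))
```

Because it always returns num_top_venues entries, it could be used in place of returning_top in the loop that fills neighborhoods_venues_sorted.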


In [137]:
neighborhoods_venues_sorted['Label']=kmean.labels_

In [138]:
neighborhoods_venues_sorted

,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Label
0,"Adelaide,King,Richmond",Categories_Steakhouse,Categories_Plaza,Categories_Café,Categories_Hotel,Categories_Opera House,Categories_Concert Hall,Categories_Asian Restaurant,Categories_Vegetarian / Vegan Restaurant,Categories_Speakeasy,Categories_Dessert Shop,0
1,Agincourt,Categories_Lounge,Categories_Skating Rink,Categories_Latin American Restaurant,Categories_Breakfast Spot,Categories_Clothing Store,Categories_Concert Hall,Categories_Construction & Landscaping,Categories_Comic Shop,Categories_Electronics Store,Categories_Eastern European Restaurant,0
2,"Agincourt North,L'Amoreaux East,Milliken,Steel...",Categories_Park,Categories_Playground,Categories_Yoga Studio,Categories_Dance Studio,Categories_Eastern European Restaurant,Categories_Drugstore,Categories_Dog Run,Categories_Discount Store,Categories_Diner,Categories_Dessert Shop,2
3,"Albion Gardens,Beaumond Heights,Humbergate,Jam...",Categories_Grocery Store,Categories_Liquor Store,Categories_Coffee Shop,Categories_Pharmacy,Categories_Pizza Place,Categories_Beer Store,Categories_Fried Chicken Joint,Categories_Fast Food Restaurant,Categories_Sandwich Place,Categories_Deli / Bodega,0
4,"Alderwood,Long Branch",Categories_Pizza Place,Categories_Coffee Shop,Categories_Gym,Categories_Skating Rink,Categories_Pharmacy,Categories_Pool,Categories_Sandwich Place,Categories_Pub,Categories_Curling Ice,Categories_Discount Store,0
...,...,...,...,...,...,...,...,...,...,...,...,...
96,Willowdale West,Categories_Pharmacy,Categories_Butcher,Categories_Discount Store,Categories_Coffee Shop,Categories_Pizza Place,Categories_Grocery Store,Categories_Airport Service,Categories_Department Store,Categories_Empanada Restaurant,Categories_Electronics Store,0
97,Woburn,Categories_Coffee Shop,Categories_Korean Restaurant,Categories_Deli / Bodega,Categories_Electronics Store,Categories_Eastern European Restaurant,Categories_Drugstore,Categories_Dog Run,Categories_Discount Store,Categories_Diner,Categories_Dessert Shop,0
98,"Woodbine Gardens,Parkview Hill",Categories_Pizza Place,Categories_Gym / Fitness Center,Categories_Bank,Categories_Gastropub,Categories_Intersection,Categories_Fast Food Restaurant,Categories_Pharmacy,Categories_Café,Categories_Pet Store,Categories_Antique Shop,0
99,Woodbine Heights,Categories_Park,Categories_Cosmetics Shop,Categories_Beer Store,Categories_Dance Studio,Categories_Curling Ice,Categories_Skating Rink,Categories_Asian Restaurant,Categories_Video Store,Categories_Pharmacy,Categories_Concert Hall,0


We now merge the two datasets so that each neighborhood carries its coordinates, its most common venues and the cluster it belongs to.

In [142]:
toronto_merge=pd.merge(df2,neighborhoods_venues_sorted, left_on='Neighbourhood', right_on='Neighborhood')
toronto_merge

,PostalCode,Borough,Neighbourhood,Latitude,Longitude,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Label
0,M1B,Scarborough,"Rouge,Malvern",43.806686,-79.194353,"Rouge,Malvern",Categories_Fast Food Restaurant,Categories_Print Shop,Categories_Yoga Studio,Categories_Dance Studio,Categories_Eastern European Restaurant,Categories_Drugstore,Categories_Dog Run,Categories_Discount Store,Categories_Diner,Categories_Dessert Shop,0
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497,"Highland Creek,Rouge Hill,Port Union",Categories_Construction & Landscaping,Categories_Bar,Categories_Moving Target,Categories_Yoga Studio,Categories_Department Store,Categories_Electronics Store,Categories_Eastern European Restaurant,Categories_Drugstore,Categories_Dog Run,Categories_Discount Store,0
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",43.763573,-79.188711,"Guildwood,Morningside,West Hill",Categories_Electronics Store,Categories_Rental Car Location,Categories_Medical Center,Categories_Mexican Restaurant,Categories_Intersection,Categories_Breakfast Spot,Categories_Pizza Place,Categories_Convenience Store,Categories_Cosmetics Shop,Categories_Drugstore,0
3,M1G,Scarborough,Woburn,43.770992,-79.216917,Woburn,Categories_Coffee Shop,Categories_Korean Restaurant,Categories_Deli / Bodega,Categories_Electronics Store,Categories_Eastern European Restaurant,Categories_Drugstore,Categories_Dog Run,Categories_Discount Store,Categories_Diner,Categories_Dessert Shop,0
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476,Cedarbrae,Categories_Athletics & Sports,Categories_Bakery,Categories_Hakka Restaurant,Categories_Gas Station,Categories_Caribbean Restaurant,Categories_Thai Restaurant,Categories_Fried Chicken Joint,Categories_Bank,Categories_Electronics Store,Categories_Eastern European Restaurant,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
97,M9N,York,Weston,43.706876,-79.518188,Weston,Categories_Park,Categories_Convenience Store,Categories_Yoga Studio,Categories_Deli / Bodega,Categories_Electronics Store,Categories_Eastern European Restaurant,Categories_Drugstore,Categories_Dog Run,Categories_Discount Store,Categories_Diner,2
98,M9P,Etobicoke,Westmount,43.696319,-79.532242,Westmount,Categories_Coffee Shop,Categories_Pizza Place,Categories_Chinese Restaurant,Categories_Sandwich Place,Categories_Intersection,Categories_Middle Eastern Restaurant,Categories_Deli / Bodega,Categories_Drugstore,Categories_Dog Run,Categories_Discount Store,0
99,M9R,Etobicoke,"Kingsview Village,Martin Grove Gardens,Richvie...",43.688905,-79.554724,"Kingsview Village,Martin Grove Gardens,Richvie...",Categories_Mobile Phone Shop,Categories_Pizza Place,Categories_Sandwich Place,Categories_Bus Line,Categories_Yoga Studio,Categories_Drugstore,Categories_Dog Run,Categories_Discount Store,Categories_Diner,Categories_Dessert Shop,0
100,M9V,Etobicoke,"Albion Gardens,Beaumond Heights,Humbergate,Jam...",43.739416,-79.588437,"Albion Gardens,Beaumond Heights,Humbergate,Jam...",Categories_Grocery Store,Categories_Liquor Store,Categories_Coffee Shop,Categories_Pharmacy,Categories_Pizza Place,Categories_Beer Store,Categories_Fried Chicken Joint,Categories_Fast Food Restaurant,Categories_Sandwich Place,Categories_Deli / Bodega,0
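
This merge joins on neighborhood names rather than postal codes, so it relies on each name appearing only once in neighborhoods_venues_sorted. Passing validate='many_to_one' to pd.merge makes pandas raise a MergeError when that assumption is broken. A sketch on small frames (right_dup is invented to trigger the check):

```python
import pandas as pd
from pandas.errors import MergeError

left = pd.DataFrame({'Neighbourhood': ['Woburn', 'Cedarbrae'], 'PostalCode': ['M1G', 'M1H']})
right_ok = pd.DataFrame({'Neighborhood': ['Woburn', 'Cedarbrae'], 'Label': [0, 0]})
right_dup = pd.DataFrame({'Neighborhood': ['Woburn', 'Woburn'], 'Label': [0, 2]})  # invented duplicate

# Passes: every neighborhood name is unique on the right-hand side.
merged = pd.merge(left, right_ok, left_on='Neighbourhood', right_on='Neighborhood',
                  validate='many_to_one')

# Fails: 'Woburn' appears twice on the right-hand side.
try:
    pd.merge(left, right_dup, left_on='Neighbourhood', right_on='Neighborhood',
             validate='many_to_one')
    duplicate_detected = False
except MergeError:
    duplicate_detected = True
print(len(merged), duplicate_detected)
```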


We use folium to draw a map of the neighborhoods, with each point colored according to its cluster.

In [148]:
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]


# create map centered on the mean position of the neighborhoods
map_clusters = folium.Map(location=[toronto_merge['Latitude'].mean(), toronto_merge['Longitude'].mean()], zoom_start=11)
for lat, lon, poi, cluster in zip(toronto_merge['Latitude'], toronto_merge['Longitude'], toronto_merge['Neighborhood'], toronto_merge['Label']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
            [lat, lon],
            radius=5,
            popup=label,
            color=rainbow[cluster],  # labels run from 0 to kclusters-1
            fill=True,
            fill_color=rainbow[cluster],
            fill_opacity=0.7).add_to(map_clusters)

In [149]:
map_clusters
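
KMeans labels run from 0 to kclusters - 1, so they can index a palette of kclusters colors directly. Indexing with cluster - 1 still runs in Python, because index -1 wraps around to the last color, but it shifts every cluster by one palette position. A minimal sketch with a hypothetical four-color palette (the hex values are arbitrary, not the exact cm.rainbow output):

```python
# Hypothetical palette with one color per cluster (kclusters = 4).
palette = ['#ff0000', '#00ff00', '#0000ff', '#ffff00']
labels = [0, 1, 2, 3]

direct = {label: palette[label] for label in labels}       # cluster i -> color i
shifted = {label: palette[label - 1] for label in labels}  # cluster 0 -> last color
print(direct[0], shifted[0])
```

Both mappings give four distinct colors, but only the direct one matches cluster i with color i.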

Next, we inspect each cluster to see whether a meaningful name can be given to it.

In [150]:
neighborhoods_venues_sorted[neighborhoods_venues_sorted['Label']==0]

,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Label
0,"Adelaide,King,Richmond",Categories_Steakhouse,Categories_Plaza,Categories_Café,Categories_Hotel,Categories_Opera House,Categories_Concert Hall,Categories_Asian Restaurant,Categories_Vegetarian / Vegan Restaurant,Categories_Speakeasy,Categories_Dessert Shop,0
1,Agincourt,Categories_Lounge,Categories_Skating Rink,Categories_Latin American Restaurant,Categories_Breakfast Spot,Categories_Clothing Store,Categories_Concert Hall,Categories_Construction & Landscaping,Categories_Comic Shop,Categories_Electronics Store,Categories_Eastern European Restaurant,0
3,"Albion Gardens,Beaumond Heights,Humbergate,Jam...",Categories_Grocery Store,Categories_Liquor Store,Categories_Coffee Shop,Categories_Pharmacy,Categories_Pizza Place,Categories_Beer Store,Categories_Fried Chicken Joint,Categories_Fast Food Restaurant,Categories_Sandwich Place,Categories_Deli / Bodega,0
4,"Alderwood,Long Branch",Categories_Pizza Place,Categories_Coffee Shop,Categories_Gym,Categories_Skating Rink,Categories_Pharmacy,Categories_Pool,Categories_Sandwich Place,Categories_Pub,Categories_Curling Ice,Categories_Discount Store,0
5,"Bathurst Manor,Downsview North,Wilson Heights",Categories_Coffee Shop,Categories_Deli / Bodega,Categories_Bridal Shop,Categories_Diner,Categories_Bank,Categories_Sushi Restaurant,Categories_Middle Eastern Restaurant,Categories_Restaurant,Categories_Ice Cream Shop,Categories_Curling Ice,0
...,...,...,...,...,...,...,...,...,...,...,...,...
95,Willowdale South,Categories_Café,Categories_Grocery Store,Categories_Indonesian Restaurant,Categories_Coffee Shop,Categories_Plaza,Categories_Japanese Restaurant,Categories_Movie Theater,Categories_Steakhouse,Categories_Ramen Restaurant,Categories_Dog Run,0
96,Willowdale West,Categories_Pharmacy,Categories_Butcher,Categories_Discount Store,Categories_Coffee Shop,Categories_Pizza Place,Categories_Grocery Store,Categories_Airport Service,Categories_Department Store,Categories_Empanada Restaurant,Categories_Electronics Store,0
97,Woburn,Categories_Coffee Shop,Categories_Korean Restaurant,Categories_Deli / Bodega,Categories_Electronics Store,Categories_Eastern European Restaurant,Categories_Drugstore,Categories_Dog Run,Categories_Discount Store,Categories_Diner,Categories_Dessert Shop,0
98,"Woodbine Gardens,Parkview Hill",Categories_Pizza Place,Categories_Gym / Fitness Center,Categories_Bank,Categories_Gastropub,Categories_Intersection,Categories_Fast Food Restaurant,Categories_Pharmacy,Categories_Café,Categories_Pet Store,Categories_Antique Shop,0


Cluster 0 is by far the largest (87 of the 101 neighborhoods), and its top venues seem to be mainly coffee shops and other food- and drink-related places.


In [151]:
neighborhoods_venues_sorted[neighborhoods_venues_sorted['Label']==1]

,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Label
36,Downsview Central,Categories_Baseball Field,Categories_Home Service,Categories_Food Truck,Categories_Yoga Studio,Categories_Department Store,Categories_Electronics Store,Categories_Eastern European Restaurant,Categories_Drugstore,Categories_Dog Run,Categories_Discount Store,1
42,"Emery,Humberlea",Categories_Construction & Landscaping,Categories_Baseball Field,Categories_Yoga Studio,Categories_Department Store,Categories_Empanada Restaurant,Categories_Electronics Store,Categories_Eastern European Restaurant,Categories_Drugstore,Categories_Dog Run,Categories_Discount Store,1
56,"Humber Bay,King's Mill Park,Kingsway Park Sout...",Categories_Baseball Field,Categories_Home Service,Categories_Yoga Studio,Categories_Department Store,Categories_Empanada Restaurant,Categories_Electronics Store,Categories_Eastern European Restaurant,Categories_Drugstore,Categories_Dog Run,Categories_Discount Store,1


Cluster 1 seems more related to home services and construction; Baseball Field also ranks first or second in all three of its neighborhoods.

In [156]:
neighborhoods_venues_sorted[neighborhoods_venues_sorted['Label']==2]

,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Label
2,"Agincourt North,L'Amoreaux East,Milliken,Steel...",Categories_Park,Categories_Playground,Categories_Yoga Studio,Categories_Dance Studio,Categories_Eastern European Restaurant,Categories_Drugstore,Categories_Dog Run,Categories_Discount Store,Categories_Diner,Categories_Dessert Shop,2
16,Caledonia-Fairbanks,Categories_Park,Categories_Women's Store,Categories_Fast Food Restaurant,Categories_Market,Categories_Concert Hall,Categories_Dessert Shop,Categories_Comic Shop,Categories_Electronics Store,Categories_Eastern European Restaurant,Categories_Drugstore,2
38,Downsview West,Categories_Grocery Store,Categories_Bank,Categories_Shopping Mall,Categories_Park,Categories_Gift Shop,Categories_Dance Studio,Categories_Drugstore,Categories_Dog Run,Categories_Greek Restaurant,Categories_Discount Store,2
41,East Toronto,Categories_Coffee Shop,Categories_Park,Categories_Convenience Store,Categories_Deli / Bodega,Categories_Electronics Store,Categories_Eastern European Restaurant,Categories_Drugstore,Categories_Dog Run,Categories_Discount Store,Categories_Diner,2
67,"Moore Park,Summerhill East",Categories_Restaurant,Categories_Playground,Categories_Tennis Court,Categories_Park,Categories_Comfort Food Restaurant,Categories_Dance Studio,Categories_Eastern European Restaurant,Categories_Drugstore,Categories_Dog Run,Categories_Discount Store,2
73,Parkwoods,Categories_Park,Categories_Food & Drink Shop,Categories_Yoga Studio,Categories_Dance Studio,Categories_Eastern European Restaurant,Categories_Drugstore,Categories_Dog Run,Categories_Discount Store,Categories_Diner,Categories_Dessert Shop,2
75,Rosedale,Categories_Park,Categories_Playground,Categories_Trail,Categories_Curling Ice,Categories_Eastern European Restaurant,Categories_Drugstore,Categories_Dog Run,Categories_Discount Store,Categories_Diner,Categories_Dessert Shop,2
90,"The Kingsway,Montgomery Road,Old Mill North",Categories_River,Categories_Park,Categories_Yoga Studio,Categories_Eastern European Restaurant,Categories_Drugstore,Categories_Dog Run,Categories_Discount Store,Categories_Diner,Categories_Dessert Shop,Categories_Department Store,2
94,Weston,Categories_Park,Categories_Convenience Store,Categories_Yoga Studio,Categories_Deli / Bodega,Categories_Electronics Store,Categories_Eastern European Restaurant,Categories_Drugstore,Categories_Dog Run,Categories_Discount Store,Categories_Diner,2
100,York Mills West,Categories_Park,Categories_Convenience Store,Categories_Bank,Categories_Yoga Studio,Categories_Department Store,Categories_Electronics Store,Categories_Eastern European Restaurant,Categories_Drugstore,Categories_Dog Run,Categories_Discount Store,2


Cluster 2 seems more related to parks: Park is among the top four venues of every neighborhood in this cluster.

In [153]:
neighborhoods_venues_sorted[neighborhoods_venues_sorted['Label']==3]

,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Label
68,"Newtonbrook,Willowdale",Categories_Gym,Categories_Deli / Bodega,Categories_Empanada Restaurant,Categories_Electronics Store,Categories_Eastern European Restaurant,Categories_Drugstore,Categories_Dog Run,Categories_Discount Store,Categories_Diner,Categories_Dessert Shop,3


Cluster 3 contains a single neighborhood (Newtonbrook, Willowdale), so no general pattern can be extracted from it.
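
Instead of reading each table by eye, the clusters can also be summarised by counting how often each category appears as the 1st most common venue within a cluster. A sketch using groupby and value_counts on an invented excerpt that uses the same column names as neighborhoods_venues_sorted:

```python
import pandas as pd

# Invented excerpt with the same column names as neighborhoods_venues_sorted.
sample = pd.DataFrame({
    'Neighborhood': ['N1', 'N2', 'N3', 'N4', 'N5'],
    '1st Most Common Venue': ['Categories_Park', 'Categories_Park', 'Categories_Coffee Shop',
                              'Categories_Coffee Shop', 'Categories_Pizza Place'],
    'Label': [2, 2, 0, 0, 0],
})

# For each cluster, how often each category is ranked first.
summary = sample.groupby('Label')['1st Most Common Venue'].value_counts()
print(summary)
```

On the full table, neighborhoods_venues_sorted.groupby('Label')['1st Most Common Venue'].value_counts() would give the same kind of summary for all four clusters.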

## Conclusion

We can identify at least three main clusters among these neighborhoods: one related to food and drinks, one related to parks, and one related to building and construction. This could give some insight into what could be offered to people living in those districts.