# Clustering and Cluster Analysis

K-means clustering found 10 to be an appropriate number of clusters. This section visualises the clusters on a map and then analyses four of them (clusters 0, 1, 8 and 2) in more detail.

In [1]:
import pandas as pd
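df = pd.read_csv('toronto.csv')
# Drop the leftover index column; the keyword form replaces the positional axis argument, which newer pandas versions no longer accept
df = df.drop(columns='Unnamed: 0')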
df

Unnamed: 0,Postal Code,Borough,Neighborhood,latitude,longitude,Cluster Label,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.758800,-79.320197,2,Coffee Shop,Pizza Place,Supermarket,Gas Station,Discount Store,Pharmacy,Fast Food Restaurant,Breakfast Spot,Liquor Store,Café
1,M4A,North York,Victoria Village,43.732658,-79.311189,5,Middle Eastern Restaurant,Chinese Restaurant,Mediterranean Restaurant,Thai Restaurant,Thrift / Vintage Store,French Restaurant,Intersection,Asian Restaurant,Indian Restaurant,Bus Line
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.660706,-79.360457,1,Coffee Shop,Restaurant,Bakery,Café,Park,Thai Restaurant,Pub,Gastropub,Breakfast Spot,Diner
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.722079,-79.437507,1,Coffee Shop,Grocery Store,Pizza Place,Fast Food Restaurant,Bagel Shop,Pharmacy,Bank,Bus Stop,Discount Store,Sushi Restaurant
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.659659,-79.390340,0,Coffee Shop,Japanese Restaurant,Park,Sushi Restaurant,Art Gallery,Mexican Restaurant,Ramen Restaurant,Café,Dessert Shop,Burger Joint
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
94,M1N,Scarborough,Birch Cliff,43.691805,-79.264494,1,Park,Thai Restaurant,Restaurant,College Stadium,Diner,Café,Gym,Skating Rink,General Entertainment,Doner Restaurant
95,M5R,Central Toronto,Davenport,43.671545,-79.448322,1,Café,Italian Restaurant,Coffee Shop,Park,Bakery,Sushi Restaurant,Brazilian Restaurant,Bar,Bank,Portuguese Restaurant
96,M9N,York,York South-Weston,43.684466,-79.498818,8,Pizza Place,Park,Coffee Shop,Asian Restaurant,Supermarket,Bus Line,Restaurant,Golf Course,Massage Studio,Fast Food Restaurant
97,M3B,North York,Edwards Gardens,43.731442,-79.358380,6,Intersection,Botanical Garden,Grocery Store,Park,Coffee Shop,Sandwich Place,Bubble Tea Shop,Stables,Fireworks Store,Donut Shop


In [2]:
import matplotlib.cm as cm
import matplotlib.colors as colors
import folium
import numpy as np

In [3]:
map_clusters = folium.Map(location=[43.6534817, -79.3839347], zoom_start=11)

# Set the colour scheme: one rainbow colour per cluster
n_clusters = 10
colors_array = cm.rainbow(np.linspace(0, 1, n_clusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# Add one circle marker per neighbourhood, coloured by its cluster label
for lat, lon, poi, cluster in zip(df['latitude'], df['longitude'], df['Neighborhood'], df['Cluster Label']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
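
One detail of the colour lookup in the cell above is worth spelling out. Cluster labels in this data start at 0, so `rainbow[int(cluster)-1]` reads index -1 for cluster 0, which Python resolves to the last element of the list. The sketch below uses a stand-in palette of placeholder names (an assumption for illustration, not the real hex colours) to show that every cluster still gets its own colour, only shifted by one position.

```python
# Stand-in palette: placeholder names instead of real hex colours (illustrative only).
n_clusters = 10
palette = [f"colour_{i}" for i in range(n_clusters)]

# Same lookup as in the map cell: index = cluster - 1.
shifted = {cluster: palette[cluster - 1] for cluster in range(n_clusters)}

# Direct lookup for comparison: index = cluster.
direct = {cluster: palette[cluster] for cluster in range(n_clusters)}

print(shifted[0])  # cluster 0 wraps around to the last palette entry
print(len(set(shifted.values())) == n_clusters)  # all colours are still distinct
```

Either lookup gives a valid map; the direct form is simply easier to reason about when matching marker colours to cluster numbers.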

# Examining Clusters

# Cluster 0 Analysis

Coffee shops, cafés and Japanese/sushi restaurants seem to be the most popular venues here, whereas breweries are a rare sight. We will return neighbourhoods that do not have

In [99]:
cluster0 = df.loc[df['Cluster Label'] == 0]  # all columns, rows in cluster 0 only
cluster0.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,latitude,longitude,Cluster Label,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.659659,-79.39034,0,Coffee Shop,Japanese Restaurant,Park,Sushi Restaurant,Art Gallery,Mexican Restaurant,Ramen Restaurant,Café,Dessert Shop,Burger Joint
7,M3B,North York,Don Mills,43.775347,-79.345944,0,Clothing Store,Coffee Shop,Fast Food Restaurant,Baseball Field,Restaurant,Japanese Restaurant,Bank,Juice Bar,Sporting Goods Shop,Food Court
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.6565,-79.377114,0,Coffee Shop,Café,Gastropub,Japanese Restaurant,Italian Restaurant,Hotel,Ramen Restaurant,Poke Place,Plaza,Middle Eastern Restaurant
13,M3C,North York,Don Mills,43.775347,-79.345944,0,Clothing Store,Coffee Shop,Fast Food Restaurant,Baseball Field,Restaurant,Japanese Restaurant,Bank,Juice Bar,Sporting Goods Shop,Food Court
15,M5C,Downtown Toronto,St. James Town,43.669403,-79.372704,0,Coffee Shop,Restaurant,Japanese Restaurant,Café,Gay Bar,Diner,Grocery Store,Gastropub,Thai Restaurant,Pub


In [77]:
s=cluster0.apply(lambda x: x.mode()).iloc[0,6:11]
s

1st Most Common Venue            Coffee Shop
2nd Most Common Venue                   Café
3rd Most Common Venue             Restaurant
4th Most Common Venue    Japanese Restaurant
5th Most Common Venue             Restaurant
Name: 0, dtype: object

Cluster 0 seems to be home to a large number of restaurants.

In [95]:
cluster0['3rd Most Common Venue'].value_counts()[:5]

Restaurant              9
Café                    5
Fast Food Restaurant    3
Park                    2
Hotel                   2
Name: 3rd Most Common Venue, dtype: int64

In [98]:
cluster0['2nd Most Common Venue'].value_counts()[:5]

Café              11
Coffee Shop        6
Bar                3
Hotel              3
Clothing Store     2
Name: 2nd Most Common Venue, dtype: int64

In [96]:
cluster0['4th Most Common Venue'].value_counts()[:5]

Japanese Restaurant    5
Gastropub              3
Bar                    2
Park                   2
Baseball Field         2
Name: 4th Most Common Venue, dtype: int64

# Cluster 0 Conclusion

Cluster 0 is defined by its concentration of restaurants, coffee shops, fast food outlets and bars. Almost all Downtown Toronto neighbourhoods belong to this cluster, which makes sense for a dense city centre.
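
The claim that almost all Downtown Toronto neighbourhoods fall in cluster 0 can be checked with a cross-tabulation of borough against cluster label. The following is a minimal sketch on a toy frame: the rows are made up for illustration and are not the real data, and only the column names `Borough` and `Cluster Label` are taken from the notebook.

```python
import pandas as pd

# Toy stand-in for the notebook's df; the rows are illustrative, not the real data.
toy = pd.DataFrame({
    "Borough": ["Downtown Toronto", "Downtown Toronto", "Downtown Toronto",
                "North York", "North York", "Scarborough"],
    "Cluster Label": [0, 0, 1, 0, 2, 8],
})

# Rows: boroughs; columns: cluster labels; values: share of each borough's
# neighbourhoods that fall in each cluster (each row sums to 1).
share = pd.crosstab(toy["Borough"], toy["Cluster Label"], normalize="index")
print(share.round(2))
```

Running the same call on `df` would give the actual share of each borough in each cluster.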

# Cluster 1 Analysis

In [75]:
cluster1 = df.loc[df['Cluster Label'] == 1]  # all columns, rows in cluster 1 only
cluster1.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,latitude,longitude,Cluster Label,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.660706,-79.360457,1,Coffee Shop,Restaurant,Bakery,Café,Park,Thai Restaurant,Pub,Gastropub,Breakfast Spot,Diner
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.722079,-79.437507,1,Coffee Shop,Grocery Store,Pizza Place,Fast Food Restaurant,Bagel Shop,Pharmacy,Bank,Bus Stop,Discount Store,Sushi Restaurant
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village",43.679484,-79.538909,1,Bank,Coffee Shop,Electronics Store,Beer Store,Shopping Mall,Golf Course,Café,Liquor Store,Grocery Store,Sandwich Place
14,M4C,East York,Woodbine Heights,43.69993,-79.319132,1,Pizza Place,Pharmacy,Park,Thai Restaurant,Pub,Spa,Skating Rink,Breakfast Spot,Liquor Store,Café
20,M4G,East York,Leaside,43.704798,-79.36809,1,Bakery,Grocery Store,Coffee Shop,Restaurant,Sushi Restaurant,Sporting Goods Shop,Thai Restaurant,Indian Restaurant,Sandwich Place,Baby Store


Some of the popular venues by count:

In [73]:
s=cluster1.apply(lambda x: x.mode()).iloc[0,6:11]
s

1st Most Common Venue           Coffee Shop
2nd Most Common Venue    Italian Restaurant
3rd Most Common Venue                  Park
4th Most Common Venue                  Park
5th Most Common Venue      Sushi Restaurant
Name: 0, dtype: object

In [86]:
cluster1['5th Most Common Venue'].value_counts()[:5]

Sushi Restaurant      5
Italian Restaurant    2
Park                  2
Bank                  2
Café                  2
Name: 5th Most Common Venue, dtype: int64

In [87]:
cluster1['2nd Most Common Venue'].value_counts()[:5]

Italian Restaurant    6
Coffee Shop           5
Bank                  3
Café                  3
Pizza Place           3
Name: 2nd Most Common Venue, dtype: int64

In [88]:
cluster1['1st Most Common Venue'].value_counts()[:5]

Coffee Shop          17
Park                  3
Café                  2
Pub                   1
Indian Restaurant     1
Name: 1st Most Common Venue, dtype: int64

# Cluster 1 Conclusion

As we can see, coffee shops and Italian/sushi restaurants are the defining features of this cluster of neighbourhoods.
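
The counts above look at one rank column at a time. A complementary view, sketched here on a made-up three-row frame (the rows are assumptions for illustration), pools all the ranked venue columns and counts how often each category appears anywhere in them.

```python
import pandas as pd

# Toy stand-in for one cluster; the rows are illustrative, not the real data.
toy_cluster = pd.DataFrame({
    "Neighborhood": ["A", "B", "C"],
    "1st Most Common Venue": ["Coffee Shop", "Coffee Shop", "Park"],
    "2nd Most Common Venue": ["Italian Restaurant", "Café", "Coffee Shop"],
    "3rd Most Common Venue": ["Park", "Sushi Restaurant", "Italian Restaurant"],
})

# Select every ranked venue column by its shared name suffix.
venue_cols = [c for c in toy_cluster.columns if c.endswith("Most Common Venue")]

# Stack the columns into one long Series, then count each category.
pooled = toy_cluster[venue_cols].stack().value_counts()
print(pooled.head(3))
```

Pooling ignores the rank at which a venue appears, so it is best read alongside the per-rank counts rather than instead of them.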

# Cluster 8 Analysis

In [103]:
cluster8 = df.loc[df['Cluster Label'] == 8]  # all columns, rows in cluster 8 only
cluster8

Unnamed: 0,Postal Code,Borough,Neighborhood,latitude,longitude,Cluster Label,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,M1B,Scarborough,"Malvern, Rouge",43.809196,-79.221701,8,Park,Fast Food Restaurant,Pizza Place,Pharmacy,Grocery Store,Sandwich Place,Restaurant,Bubble Tea Shop,Salon / Barbershop,Skating Rink
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706298,-79.321907,8,Playground,Pharmacy,Park,Skating Rink,Pizza Place,Breakfast Spot,Fast Food Restaurant,Pet Store,Bank,Rock Climbing Spot
10,M6B,North York,Glencairn,43.708712,-79.440685,8,Grocery Store,Pizza Place,Coffee Shop,Asian Restaurant,Shoe Store,Rental Car Location,Japanese Restaurant,Bakery,Trail,Pub
11,M9B,Etobicoke,"West Deane Park, Princess Gardens, Martin Grov...",43.663199,-79.568568,8,Park,Convenience Store,Pharmacy,Eastern European Restaurant,Electronics Store,Skating Rink,Beer Store,Bakery,Home Service,Fast Food Restaurant
12,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.780271,-79.130499,8,Park,Liquor Store,Train Station,Gym,Beer Store,Japanese Restaurant,Coffee Shop,Pet Store,Fast Food Restaurant,Pizza Place
16,M9C,Etobicoke,"Eringate, Bloordale Gardens, Old Burnhamthorpe...",43.662273,-79.576516,8,Convenience Store,Beer Store,Eastern European Restaurant,Electronics Store,Chinese Restaurant,Park,Coffee Shop,Pub,Pizza Place,Filipino Restaurant
17,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.755225,-79.198229,8,Grocery Store,Park,Train Station,Hotel,Storage Facility,Baseball Field,Art Gallery,Coffee Shop,Sandwich Place,Moving Target
24,M2H,North York,Hillcrest Village,43.799664,-79.365019,8,Park,Chinese Restaurant,Bank,Grocery Store,Pizza Place,Shopping Mall,Supermarket,Szechuan Restaurant,Sandwich Place,Korean Restaurant
43,M3L,North York,Jane Sheppard mall,43.740283,-79.51222,8,Park,Coffee Shop,Gym / Fitness Center,Moving Target,Pizza Place,Shopping Mall,Vietnamese Restaurant,Bank,Yoga Studio,Dongbei Restaurant
47,M1M,Scarborough,"Cliffside, Cliffcrest, Scarborough Village West",43.71117,-79.248177,8,Park,Coffee Shop,Sports Bar,Sushi Restaurant,Beach,Sandwich Place,Breakfast Spot,Train Station,Grocery Store,Fish & Chips Shop


In [102]:
s=cluster8.apply(lambda x: x.mode()).iloc[0,6:11]
s

1st Most Common Venue                Park
2nd Most Common Venue         Coffee Shop
3rd Most Common Venue       Train Station
4th Most Common Venue    Asian Restaurant
5th Most Common Venue         Pizza Place
Name: 0, dtype: object

In [104]:
cluster8['5th Most Common Venue'].value_counts()[:5]

Pizza Place                  3
Middle Eastern Restaurant    1
Gas Station                  1
Storage Facility             1
Supermarket                  1
Name: 5th Most Common Venue, dtype: int64

In [110]:
cluster8['1st Most Common Venue'].value_counts()[:5]

Park                 6
Grocery Store        2
Pizza Place          2
Playground           1
Convenience Store    1
Name: 1st Most Common Venue, dtype: int64

# Cluster 8 Conclusion

Cluster 8 has an abundance of parks and pizza places.
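
One way to put a number on "abundance" is the fraction of neighbourhoods in a cluster that list a given category anywhere among their ranked venues. The sketch below uses a toy frame with made-up rows, and `share_with_venue` is a hypothetical helper introduced here for illustration; it is not part of the original notebook.

```python
import pandas as pd

# Toy stand-in for one cluster; the rows are illustrative, not the real data.
toy_cluster = pd.DataFrame({
    "1st Most Common Venue": ["Park", "Playground", "Grocery Store", "Park"],
    "2nd Most Common Venue": ["Fast Food Restaurant", "Pharmacy", "Pizza Place", "Coffee Shop"],
    "3rd Most Common Venue": ["Pizza Place", "Park", "Coffee Shop", "Sports Bar"],
})


def share_with_venue(cluster_df, category):
    """Fraction of rows where `category` appears in any ranked venue column."""
    venue_cols = [c for c in cluster_df.columns if c.endswith("Most Common Venue")]
    # eq() marks matching cells, any(axis=1) collapses to one flag per row,
    # and the mean of the boolean flags is the fraction of matching rows.
    return cluster_df[venue_cols].eq(category).any(axis=1).mean()


print(share_with_venue(toy_cluster, "Park"))
print(share_with_venue(toy_cluster, "Pizza Place"))
```

Applied to `cluster8`, the same helper would report how many of its neighbourhoods feature parks or pizza places at any rank.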

# Cluster 2 Analysis

In [111]:
cluster2 = df.loc[df['Cluster Label'] == 2]  # all columns, rows in cluster 2 only
cluster2

Unnamed: 0,Postal Code,Borough,Neighborhood,latitude,longitude,Cluster Label,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.7588,-79.320197,2,Coffee Shop,Pizza Place,Supermarket,Gas Station,Discount Store,Pharmacy,Fast Food Restaurant,Breakfast Spot,Liquor Store,Café
19,M1G,Scarborough,Woburn,43.759824,-79.225291,2,Fast Food Restaurant,Pizza Place,Park,Sandwich Place,Discount Store,Department Store,Coffee Shop,Grocery Store,Bank,Czech Restaurant
23,M1H,Scarborough,Cedarbrae,43.756467,-79.226692,2,Fast Food Restaurant,Grocery Store,Pizza Place,Sandwich Place,Discount Store,Diner,Coffee Shop,Bank,Vietnamese Restaurant,Thrift / Vintage Store
29,M1J,Scarborough,Scarborough Village,43.743742,-79.211632,2,Coffee Shop,Theater,Fast Food Restaurant,Pub,Grocery Store,Gym,Beer Store,Discount Store,Butcher,Big Box Store
31,M3J,North York,"Northwood Park, York University",43.754135,-79.50448,2,Coffee Shop,Caribbean Restaurant,Beer Store,Shopping Mall,Gas Station,Pizza Place,Fast Food Restaurant,Vietnamese Restaurant,Pharmacy,Fried Chicken Joint
35,M1K,Scarborough,"Kennedy Park, Ionview, East Birchmount Park",43.724878,-79.253969,2,Fast Food Restaurant,Sports Bar,Train Station,Convenience Store,Pizza Place,Sporting Goods Shop,Discount Store,Grocery Store,Chinese Restaurant,Asian Restaurant
53,M3N,North York,Jane and Finch,43.757253,-79.517697,2,Hotel,Grocery Store,Fast Food Restaurant,Pizza Place,Discount Store,Coffee Shop,Theater,Falafel Restaurant,Liquor Store,Fried Chicken Joint
90,M9M,North York,"Pelmo park, Humbermede",43.710801,-79.510631,2,Vietnamese Restaurant,Mexican Restaurant,Gas Station,Convenience Store,Fast Food Restaurant,Pharmacy,Asian Restaurant,Furniture / Home Store,Caribbean Restaurant,Bank


In [113]:
s=cluster2.apply(lambda x: x.mode()).iloc[0,6:11]
s

1st Most Common Venue             Coffee Shop
2nd Most Common Venue           Grocery Store
3rd Most Common Venue    Fast Food Restaurant
4th Most Common Venue       Convenience Store
5th Most Common Venue          Discount Store
Name: 0, dtype: object

In [114]:
cluster2['1st Most Common Venue'].value_counts()[:5]

Fast Food Restaurant     3
Coffee Shop              3
Hotel                    1
Vietnamese Restaurant    1
Name: 1st Most Common Venue, dtype: int64
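
The counts above show Fast Food Restaurant and Coffee Shop tied at 3, yet the mode summary for this cluster reports only Coffee Shop. This follows from how `mode()` behaves: it returns every tied value in sorted order, and taking the first row with `.iloc[0]` keeps only the alphabetically first one. The short sketch below reproduces this with the eight first-rank values of cluster 2.

```python
import pandas as pd

# The eight "1st Most Common Venue" values of cluster 2, as listed in the table above.
first_venue = pd.Series([
    "Coffee Shop", "Fast Food Restaurant", "Fast Food Restaurant", "Coffee Shop",
    "Coffee Shop", "Fast Food Restaurant", "Hotel", "Vietnamese Restaurant",
])

modes = first_venue.mode()  # all tied modes, sorted
print(list(modes))
print(modes.iloc[0])        # what the .iloc[0, ...] summary keeps
```

When a column has ties, the single value shown in the mode summary should therefore be read together with the `value_counts()` output.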

In [116]:
cluster2['2nd Most Common Venue'].value_counts()[:5]

Grocery Store         2
Pizza Place           2
Theater               1
Mexican Restaurant    1
Sports Bar            1
Name: 2nd Most Common Venue, dtype: int64

In [117]:
cluster2['3rd Most Common Venue'].value_counts()[:5]

Fast Food Restaurant    2
Park                    1
Train Station           1
Gas Station             1
Supermarket             1
Name: 3rd Most Common Venue, dtype: int64

In [118]:
cluster2['4th Most Common Venue'].value_counts()[:5]

Sandwich Place       2
Convenience Store    2
Pub                  1
Shopping Mall        1
Gas Station          1
Name: 4th Most Common Venue, dtype: int64

In [119]:
cluster2['5th Most Common Venue'].value_counts()[:5]

Discount Store          4
Fast Food Restaurant    1
Grocery Store           1
Pizza Place             1
Gas Station             1
Name: 5th Most Common Venue, dtype: int64

# Cluster 2 Conclusion

Cluster 2 neighbourhoods have plenty of discount and grocery stores as well as fast food and sandwich places, which is likely why they ended up in the same cluster.