# Segmenting and Clustering Toronto

### First section

We start by scraping the table of Toronto postal codes from Wikipedia into a pandas DataFrame, following the specifications.

In [2]:
#install BeautifulSoup and import library

!conda install beautifulsoup4

from bs4 import BeautifulSoup
import requests


In [3]:
#start parsing the website
import urllib.request as urllib2

url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

with urllib2.urlopen(url) as html_file:
    soup = BeautifulSoup(html_file, 'html.parser')

In [4]:
#import data processing libraries and create Dataframe
import pandas as pd
import numpy as np

table = soup.find('table', class_='wikitable sortable')
df = pd.read_html(str(table), header = 0)[0]
df.rename(columns={'Postcode':'Postalcode'}, inplace=True)

#Clean NA Borough values
df['Borough'].replace('Not assigned', np.nan, inplace=True)
df.dropna(subset=['Borough'], axis = 0, inplace=True)
df.reset_index(drop=True,inplace=True)
df.head()

Unnamed: 0,Postalcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M5A,Downtown Toronto,Regent Park
4,M6A,North York,Lawrence Heights


In [5]:
#For unassigned neighbourhoods, fill with the Borough name of the same row
unassigned = df['Neighbourhood'] == 'Not assigned'
df.loc[unassigned, 'Neighbourhood'] = df.loc[unassigned, 'Borough']
df.head(10)

Unnamed: 0,Postalcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M5A,Downtown Toronto,Regent Park
4,M6A,North York,Lawrence Heights
5,M6A,North York,Lawrence Manor
6,M7A,Queen's Park,Queen's Park
7,M9A,Etobicoke,Islington Avenue
8,M1B,Scarborough,Rouge
9,M1B,Scarborough,Malvern
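
The two cleaning steps above can be checked on a toy table. The sketch below is only an illustration: the rows are made up for the example (they are not the scraped Wikipedia data) and it assumes pandas is installed. It drops rows whose borough is 'Not assigned', then reuses the borough name wherever the neighbourhood is unassigned.

```python
import pandas as pd

# Toy rows for illustration only; not taken from the scraped table
toy = pd.DataFrame({
    'Postalcode': ['M1A', 'M3A', 'M7A'],
    'Borough': ['Not assigned', 'North York', "Queen's Park"],
    'Neighbourhood': ['Not assigned', 'Parkwoods', 'Not assigned'],
})

# Step 1: drop rows without an assigned borough
toy = toy[toy['Borough'] != 'Not assigned'].reset_index(drop=True)

# Step 2: where the neighbourhood is unassigned, reuse the borough name
unassigned = toy['Neighbourhood'] == 'Not assigned'
toy.loc[unassigned, 'Neighbourhood'] = toy.loc[unassigned, 'Borough']

print(toy)
```

Using a boolean mask with `.loc` makes the row-by-row pairing of neighbourhood and borough explicit.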


In [6]:
#For postal codes with more than one neighbourhood, combine the neighbourhoods into one row
df_grouped = df.groupby(['Postalcode','Borough'])[['Neighbourhood']].agg(lambda col: ', '.join(col))
df_grouped.reset_index(inplace=True)
df_grouped.head(10)

Unnamed: 0,Postalcode,Borough,Neighbourhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park"
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge"
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff, Cliffside West"
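
As a small check of the grouping step, the sketch below applies the same idea to three rows copied from the tables above (M1B appears twice, M1G once). It assumes pandas is installed; the variable names `toy` and `combined` are introduced here for illustration.

```python
import pandas as pd

# Three rows from the tables above; two neighbourhoods share postal code M1B
toy = pd.DataFrame({
    'Postalcode': ['M1B', 'M1B', 'M1G'],
    'Borough': ['Scarborough', 'Scarborough', 'Scarborough'],
    'Neighbourhood': ['Rouge', 'Malvern', 'Woburn'],
})

# One row per (Postalcode, Borough); neighbourhood names joined with ', '
combined = (toy.groupby(['Postalcode', 'Borough'])['Neighbourhood']
               .agg(', '.join)
               .reset_index())

print(combined)
```

Within each group the original row order is kept, so 'Rouge' stays ahead of 'Malvern' in the joined string.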


## Using Geocoder to obtain Latitude and Longitude for each Postalcode

In [7]:
#import geocoder library
!pip install geocoder
import geocoder

In [8]:
#Getting lists for longitude and latitude using ArcGIS (World Geocoding Service)

latitude = []
longitude = []

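for postalcode in df_grouped['Postalcode']:
    #retry until the service returns coordinates, then store them exactly once
    g = geocoder.arcgis('{}, Toronto, Ontario'.format(postalcode))
    while g.lat is None:
        g = geocoder.arcgis('{}, Toronto, Ontario'.format(postalcode))
    latitude.append(g.lat)
    longitude.append(g.lng)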
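
Retrying a geocoding request without a limit can hang the notebook if the service never answers. A minimal sketch of a bounded retry follows. The helper `geocode_with_retry`, its parameters, and the stand-in `flaky_lookup` are all hypothetical names introduced for this example; no network call is made.

```python
import time

def geocode_with_retry(lookup, query, max_attempts=5, delay=0.0):
    """Call lookup(query) until it returns coordinates or attempts run out.

    lookup is any callable returning a (lat, lng) tuple whose lat is None
    on failure. Helper and parameters are hypothetical, for illustration.
    """
    for attempt in range(max_attempts):
        lat, lng = lookup(query)
        if lat is not None:
            return lat, lng
        time.sleep(delay)  # brief pause before the next attempt
    return None, None  # give up instead of looping forever


# Stand-in for a geocoding service that fails twice, then succeeds
calls = {'count': 0}

def flaky_lookup(query):
    calls['count'] += 1
    if calls['count'] < 3:
        return None, None
    return 43.81165, -79.195561

result = geocode_with_retry(flaky_lookup, 'M1B, Toronto, Ontario')
print(result, calls['count'])
```

In the notebook, `lookup` could be a small wrapper around `geocoder.arcgis` that returns `(g.lat, g.lng)`; postal codes that still come back as `None` can then be inspected separately.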

In [9]:
#Adding columns

df_grouped['Latitude'] = latitude
df_grouped['Longitude'] = longitude

In [10]:
df_grouped.head(10)

Unnamed: 0,Postalcode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.81165,-79.195561
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.785605,-79.158701
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.76569,-79.175299
3,M1G,Scarborough,Woburn,43.768216,-79.21761
4,M1H,Scarborough,Cedarbrae,43.769608,-79.23944
5,M1J,Scarborough,Scarborough Village,43.743085,-79.232172
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park",43.72626,-79.26367
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge",43.713213,-79.28491
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West",43.723575,-79.234976
9,M1N,Scarborough,"Birch Cliff, Cliffside West",43.69669,-79.260069


#### With the Dataframe ready, we map each Borough's center to decide which ones to analyze

In [11]:
#Getting Borough centers to map them and decide which Neighbourhoods to analyze

new_df = df_grouped.groupby('Borough')[['Latitude','Longitude']].mean()
new_df.reset_index(inplace=True)
new_df

Unnamed: 0,Borough,Latitude,Longitude
0,Central Toronto,43.701806,-79.398985
1,Downtown Toronto,43.654154,-79.384989
2,East Toronto,43.667847,-79.337088
3,East York,43.699376,-79.333363
4,Etobicoke,43.66015,-79.539829
5,Mississauga,43.64869,-79.38544
6,North York,43.75107,-79.42947
7,Queen's Park,43.661072,-79.390895
8,Scarborough,43.767385,-79.248044
9,West Toronto,43.651699,-79.444922


In [12]:
#Import folium library to map Toronto's Boroughs

!conda install folium -c conda-forge
import folium

map1 = folium.Map(location=[43.65,-79.38], zoom_start=11)

for lat, lng, label in zip(new_df['Latitude'], new_df['Longitude'], new_df['Borough']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map1)  
    
map1


#### From the map we decide that central Toronto is an interesting area to analyze. In practice this means keeping only the Boroughs with 'Toronto' in their name.

##### We create a new Dataframe with the data to analyze.

In [13]:
df_toronto = df_grouped[df_grouped['Borough'].str.contains('Toronto')]
df_toronto.head(10)

Unnamed: 0,Postalcode,Borough,Neighbourhood,Latitude,Longitude
37,M4E,East Toronto,The Beaches,43.676531,-79.29541
41,M4K,East Toronto,"The Danforth West, Riverdale",43.683262,-79.35512
42,M4L,East Toronto,"The Beaches West, India Bazaar",43.667985,-79.314642
43,M4M,East Toronto,Studio District,43.662766,-79.33483
44,M4N,Central Toronto,Lawrence Park,43.728135,-79.38709
45,M4P,Central Toronto,Davisville North,43.712755,-79.388514
46,M4R,Central Toronto,North Toronto West,43.714523,-79.40696
47,M4S,Central Toronto,Davisville,43.702765,-79.385769
48,M4T,Central Toronto,"Moore Park, Summerhill East",43.690505,-79.382973
49,M4V,Central Toronto,"Deer Park, Forest Hill SE, Rathnelly, South Hi...",43.686003,-79.402335


## From here on we analyze the venues by category, then cluster the neighbourhoods to find similarities across Toronto

#### Start by defining the Foursquare credentials

In [14]:
#Foursquare APIs credentials

CLIENT_ID = 'YOUR_CLIENT_ID' #replace with your own Foursquare client ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' #replace with your own Foursquare client secret and never publish it
VERSION = '20180605'

In [15]:
#define function to get nearest Venues to each neighbourhood
import json

def getNearbyVenues(names, latitudes, longitudes, radius = 700, limit = 100):
    venues = []

    #loop over each neighbourhood and query the venues around its coordinates
    for name, lat, lng in zip(names, latitudes, longitudes):
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            limit)
        results = requests.get(url).json()['response']['groups'][0]['items']

        #keep only the relevant fields of each venue
        venues.append([(name, lat, lng,
            venue['venue']['categories'][0]['name'],
            venue['venue']['name'],
            venue['venue']['location']['lat'],
            venue['venue']['location']['lng']) for venue in results])
        
    nearby_venues = pd.DataFrame(item for venue_list in venues for item in venue_list)
    nearby_venues.columns = ['Neighbourhood',
        'Neighbourhood Latitude',
        'Neighbourhood Longitude',
        'Venue Category',
        'Venue',
        'Venue Latitude',
        'Venue Longitude']
    
    return nearby_venues
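
The row-building step of the function above can be tried without calling the API. The sketch below is an illustration under assumptions: `flatten_explore_items` is a hypothetical helper, the sample payload is hand-written to match only the fields the function reads, and the 'Unknown' fallback for venues without a category is a choice made here, not Foursquare behaviour.

```python
def flatten_explore_items(name, lat, lng, items):
    """Turn a list of Foursquare 'items' into flat rows for one neighbourhood.

    Field names mirror those read in getNearbyVenues. Venues without any
    category are labelled 'Unknown' (an assumption made for this sketch).
    """
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else 'Unknown'
        rows.append((name, lat, lng, category, venue['name'],
                     venue['location']['lat'], venue['location']['lng']))
    return rows


# Hand-written sample shaped like response['groups'][0]['items'];
# the second venue is made up to exercise the empty-category case
sample_items = [
    {'venue': {'name': 'Beaches Bake Shop',
               'categories': [{'name': 'Bakery'}],
               'location': {'lat': 43.680363, 'lng': -79.289692}}},
    {'venue': {'name': 'Unnamed Venue',
               'categories': [],
               'location': {'lat': 43.67, 'lng': -79.29}}},
]

rows = flatten_explore_items('The Beaches', 43.676531, -79.29541, sample_items)
for row in rows:
    print(row)
```

Guarding the category lookup avoids an `IndexError` if a venue ever arrives with an empty category list.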

In [16]:
#Create new df Toronto Venues

toronto_venues = getNearbyVenues(df_toronto['Neighbourhood'], df_toronto['Latitude'], df_toronto['Longitude'])

toronto_venues.head(10)

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue Category,Venue,Venue Latitude,Venue Longitude
0,The Beaches,43.676531,-79.29541,Bakery,Beaches Bake Shop,43.680363,-79.289692
1,The Beaches,43.676531,-79.29541,Vegetarian / Vegan Restaurant,Tori's Bakeshop,43.672114,-79.290331
2,The Beaches,43.676531,-79.29541,Toy / Game Store,Mastermind Toys,43.671453,-79.293971
3,The Beaches,43.676531,-79.29541,Gastropub,The Beech Tree,43.680493,-79.288846
4,The Beaches,43.676531,-79.29541,Breakfast Spot,Beacher Cafe,43.671938,-79.291238
5,The Beaches,43.676531,-79.29541,French Restaurant,Veloute Bistro,43.672267,-79.289584
6,The Beaches,43.676531,-79.29541,Bar,Castro's Lounge,43.671104,-79.295107
7,The Beaches,43.676531,-79.29541,Japanese Restaurant,Yumei Sushi,43.671108,-79.295064
8,The Beaches,43.676531,-79.29541,Juice Bar,Sanna's Farmacia,43.670929,-79.295969
9,The Beaches,43.676531,-79.29541,Tea Room,Pippins Tea Company,43.670992,-79.295905


In [17]:
#Checking venues per neighbourhood

toronto_venues.groupby('Neighbourhood').count()

Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue Category,Venue,Venue Latitude,Venue Longitude
"Adelaide, King, Richmond",100,100,100,100,100,100
Berczy Park,100,100,100,100,100,100
"Brockton, Exhibition Place, Parkdale Village",100,100,100,100,100,100
Business reply mail Processing Centre969 Eastern,100,100,100,100,100,100
"CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara",100,100,100,100,100,100
"Cabbagetown, St. James Town",68,68,68,68,68,68
Central Bay Street,100,100,100,100,100,100
"Chinatown, Grange Park, Kensington Market",100,100,100,100,100,100
Christie,71,71,71,71,71,71
Church and Wellesley,100,100,100,100,100,100


In [18]:
#Checking some statistics

toronto_venues.describe(include='all')

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue Category,Venue,Venue Latitude,Venue Longitude
count,2406,2406.0,2406.0,2406,2406,2406.0,2406.0
unique,38,,,257,1501,,
top,"Adelaide, King, Richmond",,,Coffee Shop,Starbucks,,
freq,100,,,193,68,,
mean,,43.657529,-79.391008,,,43.657322,-79.391135
std,,0.015769,0.030683,,,0.015455,0.030695
min,,43.62347,-79.475057,,,43.62314,-79.483683
25%,,43.648399,-79.402335,,,43.648292,-79.402996
50%,,43.65121,-79.38493,,,43.651706,-79.385607
75%,,43.66311,-79.37818,,,43.66315,-79.378902


## Analyzing the data

#### We start by one-hot encoding the venue categories, then average them per neighbourhood to find the most common ones

In [184]:
#one-hot encode the venue categories

one_hot_toronto = pd.get_dummies(toronto_venues[['Venue Category']], prefix='', prefix_sep='')
one_hot_toronto['Neighbourhood'] = toronto_venues['Neighbourhood']
columns = np.concatenate((['Neighbourhood'], one_hot_toronto.columns[:-1].values))
one_hot_toronto = one_hot_toronto[columns]

In [185]:
#normalize encoding by obtaining mean grouping by neighbourhood

toronto_venues_by_n = one_hot_toronto.groupby('Neighbourhood').mean()
toronto_venues_by_n.reset_index(inplace=True)
toronto_venues_by_n.head(10)

Unnamed: 0,Neighbourhood,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Animal Shelter,Antique Shop,Arepa Restaurant,Art Gallery,...,University,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,"Adelaide, King, Richmond",0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.01,...,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0
1,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Brockton, Exhibition Place, Parkdale Village",0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,...,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Business reply mail Processing Centre969 Eastern,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.01,...,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01
4,"CN Tower, Bathurst Quay, Island airport, Harbo...",0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01
5,"Cabbagetown, St. James Town",0.0,0.0,0.0,0.0,0.014706,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Central Bay Street,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.02,...,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01
7,"Chinatown, Grange Park, Kensington Market",0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,...,0.01,0.06,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.02
8,Christie,0.0,0.0,0.0,0.0,0.014085,0.0,0.0,0.0,0.0,...,0.0,0.014085,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Church and Wellesley,0.0,0.01,0.01,0.0,0.02,0.0,0.0,0.0,0.0,...,0.0,0.01,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.02
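
To see why averaging the one-hot columns gives category frequencies, consider a toy venue list. The sketch below assumes pandas is installed; the neighbourhood names 'A' and 'B' and their venues are invented for illustration.

```python
import pandas as pd

# Toy venue list for illustration; not the Foursquare results
toy_venues = pd.DataFrame({
    'Neighbourhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Coffee Shop', 'Coffee Shop', 'Park', 'Bar', 'Park', 'Park'],
})

# One indicator column per category (cast to float so the mean is numeric)
one_hot = pd.get_dummies(toy_venues['Venue Category']).astype(float)
one_hot.insert(0, 'Neighbourhood', toy_venues['Neighbourhood'])

# The mean of each indicator is the share of venues in that category
shares = one_hot.groupby('Neighbourhood').mean()
print(shares)
```

Each row of `shares` sums to 1, so neighbourhoods with 68 venues and neighbourhoods with 100 venues are compared on the same scale.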


#### Now we try to obtain a table with the n-th most common venue type per neighbourhood

In [186]:
#We need to sort for each row horizontally the columns to obtain a ranking

#Define function to initialize columns

def columns_rank(n):

    item = []
    suffixes = {1: 'st', 2: 'nd', 3: 'rd'}
    for i in range(1, n + 1):
        #11th, 12th and 13th are exceptions to the st/nd/rd rule
        suffix = 'th' if 10 <= i % 100 <= 20 else suffixes.get(i % 10, 'th')
        item.append('{}{} Most Common Venue'.format(i, suffix))

    return item


#Define function to get top n venues

def top_venues(df, top_n = 10):

    top = []
    for i in range(df.shape[0]):
        #sort the category frequencies of row i from most to least common
        row = df.iloc[i,1:].sort_values(ascending = False)
        top.append(np.array(row.index[:top_n]))

    #one column per rank, with the neighbourhood name as first column
    top_df = pd.DataFrame(top, columns=columns_rank(top_n))
    top_df.insert(0, 'Neighbourhood', df['Neighbourhood'].values)

    return top_df

top_venues_df = top_venues(toronto_venues_by_n)
top_venues_df.head(10)

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Coffee Shop,Hotel,Café,Steakhouse,Deli / Bodega,Restaurant,Breakfast Spot,Gastropub,American Restaurant,Japanese Restaurant
1,Berczy Park,Coffee Shop,Hotel,Restaurant,Café,Cocktail Bar,Italian Restaurant,Japanese Restaurant,Pub,Bakery,Deli / Bodega
2,"Brockton, Exhibition Place, Parkdale Village",Coffee Shop,Bar,Café,Furniture / Home Store,Art Gallery,Restaurant,Bakery,Vegetarian / Vegan Restaurant,Indian Restaurant,Cocktail Bar
3,Business reply mail Processing Centre969 Eastern,Coffee Shop,Café,Restaurant,Steakhouse,Hotel,American Restaurant,Bar,Thai Restaurant,Gastropub,Theater
4,"CN Tower, Bathurst Quay, Island airport, Harbo...",Coffee Shop,Restaurant,Italian Restaurant,Café,Gym,Park,Bakery,Sandwich Place,Spa,Japanese Restaurant
5,"Cabbagetown, St. James Town",Restaurant,Coffee Shop,Café,Park,Pizza Place,Pub,Italian Restaurant,Indian Restaurant,Diner,Bakery
6,Central Bay Street,Coffee Shop,Clothing Store,Restaurant,Chinese Restaurant,Italian Restaurant,Ramen Restaurant,Middle Eastern Restaurant,Thai Restaurant,Theater,Spa
7,"Chinatown, Grange Park, Kensington Market",Café,Vegetarian / Vegan Restaurant,Chinese Restaurant,Dumpling Restaurant,Bakery,Bar,Mexican Restaurant,Vietnamese Restaurant,Coffee Shop,Ramen Restaurant
8,Christie,Korean Restaurant,Grocery Store,Coffee Shop,Indian Restaurant,Pizza Place,Café,Park,Diner,Ice Cream Shop,Mexican Restaurant
9,Church and Wellesley,Coffee Shop,Japanese Restaurant,Burger Joint,Restaurant,Gay Bar,Diner,Café,Sushi Restaurant,Bookstore,Breakfast Spot
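
The row-wise ranking can also be done for all rows at once with NumPy. This is an alternative sketch, not the method used in the notebook: it assumes NumPy and pandas are installed, and the frequency table `freq` is a toy example.

```python
import numpy as np
import pandas as pd

# Toy frequency table for illustration (each row sums to 1)
freq = pd.DataFrame(
    {'Bar': [0.25, 0.0], 'Coffee Shop': [0.5, 0.2], 'Park': [0.25, 0.8]},
    index=['A', 'B'])

top_n = 2
# argsort on the negated values lists column positions from most to least frequent;
# the stable sort keeps the original column order for ties
order = np.argsort(-freq.values, axis=1, kind='stable')[:, :top_n]
top = pd.DataFrame(freq.columns.values[order], index=freq.index,
                   columns=['Rank {}'.format(i + 1) for i in range(top_n)])
print(top)
```

Ties matter here: many categories share the same frequency (often 0.0), so the lower ranks of sparse neighbourhoods depend on how ties are broken rather than on real differences.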


## Machine Learning k-means Clustering

#### We execute the k-means algorithm to cluster the neighbourhoods and map them

In [187]:
#Import KMeans library
from sklearn.cluster import KMeans

#Initialize number of clusters and run KMeans

n_clusters = 6
toronto_cluster_df = toronto_venues_by_n.drop('Neighbourhood', axis = 1)

kmeans = KMeans(n_clusters=n_clusters, init='k-means++',n_init=12)
kmeans.fit(toronto_cluster_df)

KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,
    n_clusters=6, n_init=12, n_jobs=1, precompute_distances='auto',
    random_state=None, tol=0.0001, verbose=0)
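
Because `random_state=None` in the output above, the cluster numbering (and possibly the assignments) can change from run to run. One common way to make a run repeatable is to fix the seed. The sketch below shows this on toy points, assuming NumPy and scikit-learn are installed; the seed value 0 is arbitrary and would not necessarily reproduce the labels shown in this notebook.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data: two well-separated groups of points
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])

# Same seed and same data: both runs assign identical labels
labels_a = KMeans(n_clusters=2, n_init=12, random_state=0).fit(X).labels_
labels_b = KMeans(n_clusters=2, n_init=12, random_state=0).fit(X).labels_

print(labels_a, labels_b)
```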

In [188]:
#Add the cluster labels as the first column of the top10 venues dataframe

top_venues_df['Cluster labels'] = kmeans.labels_
columns = np.concatenate((['Cluster labels'], top_venues_df.columns[:-1].values))
top_venues_df = top_venues_df[columns]

top_venues_df.head(15)

Unnamed: 0,Cluster labels,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1,"Adelaide, King, Richmond",Coffee Shop,Hotel,Café,Steakhouse,Deli / Bodega,Restaurant,Breakfast Spot,Gastropub,American Restaurant,Japanese Restaurant
1,1,Berczy Park,Coffee Shop,Hotel,Restaurant,Café,Cocktail Bar,Italian Restaurant,Japanese Restaurant,Pub,Bakery,Deli / Bodega
2,1,"Brockton, Exhibition Place, Parkdale Village",Coffee Shop,Bar,Café,Furniture / Home Store,Art Gallery,Restaurant,Bakery,Vegetarian / Vegan Restaurant,Indian Restaurant,Cocktail Bar
3,1,Business reply mail Processing Centre969 Eastern,Coffee Shop,Café,Restaurant,Steakhouse,Hotel,American Restaurant,Bar,Thai Restaurant,Gastropub,Theater
4,1,"CN Tower, Bathurst Quay, Island airport, Harbo...",Coffee Shop,Restaurant,Italian Restaurant,Café,Gym,Park,Bakery,Sandwich Place,Spa,Japanese Restaurant
5,5,"Cabbagetown, St. James Town",Restaurant,Coffee Shop,Café,Park,Pizza Place,Pub,Italian Restaurant,Indian Restaurant,Diner,Bakery
6,1,Central Bay Street,Coffee Shop,Clothing Store,Restaurant,Chinese Restaurant,Italian Restaurant,Ramen Restaurant,Middle Eastern Restaurant,Thai Restaurant,Theater,Spa
7,5,"Chinatown, Grange Park, Kensington Market",Café,Vegetarian / Vegan Restaurant,Chinese Restaurant,Dumpling Restaurant,Bakery,Bar,Mexican Restaurant,Vietnamese Restaurant,Coffee Shop,Ramen Restaurant
8,5,Christie,Korean Restaurant,Grocery Store,Coffee Shop,Indian Restaurant,Pizza Place,Café,Park,Diner,Ice Cream Shop,Mexican Restaurant
9,1,Church and Wellesley,Coffee Shop,Japanese Restaurant,Burger Joint,Restaurant,Gay Bar,Diner,Café,Sushi Restaurant,Bookstore,Breakfast Spot


In [189]:
toronto_merged = df_toronto

toronto_merged = toronto_merged.join(top_venues_df.set_index('Neighbourhood'), on='Neighbourhood')
toronto_merged.reset_index(drop=True,inplace=True)
toronto_merged.head(10)

Unnamed: 0,Postalcode,Borough,Neighbourhood,Latitude,Longitude,Cluster labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4E,East Toronto,The Beaches,43.676531,-79.29541,5,Pet Store,Pizza Place,Japanese Restaurant,Bar,Tea Room,Church,Nail Salon,Neighborhood,Juice Bar,French Restaurant
1,M4K,East Toronto,"The Danforth West, Riverdale",43.683262,-79.35512,5,Greek Restaurant,Ice Cream Shop,Restaurant,Café,Yoga Studio,Pub,Juice Bar,Coffee Shop,Tailor Shop,Discount Store
2,M4L,East Toronto,"The Beaches West, India Bazaar",43.667985,-79.314642,5,Pet Store,Coffee Shop,Brewery,Bakery,Fast Food Restaurant,Sandwich Place,Café,Park,Gym,Diner
3,M4M,East Toronto,Studio District,43.662766,-79.33483,5,Coffee Shop,Bakery,Café,Bar,Diner,Sandwich Place,Italian Restaurant,American Restaurant,Fast Food Restaurant,Pizza Place
4,M4N,Central Toronto,Lawrence Park,43.728135,-79.38709,0,Restaurant,Bus Line,Bookstore,Gym / Fitness Center,Park,Café,Coffee Shop,Elementary School,Donut Shop,Dumpling Restaurant
5,M4P,Central Toronto,Davisville North,43.712755,-79.388514,5,Brewery,Dessert Shop,Burger Joint,Café,Pizza Place,Breakfast Spot,Food & Drink Shop,Sushi Restaurant,Dog Run,Sandwich Place
6,M4R,Central Toronto,North Toronto West,43.714523,-79.40696,5,Sporting Goods Shop,Café,Clothing Store,Coffee Shop,Restaurant,Steakhouse,Spa,Sandwich Place,Salon / Barbershop,Chinese Restaurant
7,M4S,Central Toronto,Davisville,43.702765,-79.385769,5,Coffee Shop,Dessert Shop,Sandwich Place,Sushi Restaurant,Gym,Pharmacy,Café,Seafood Restaurant,Pizza Place,Italian Restaurant
8,M4T,Central Toronto,"Moore Park, Summerhill East",43.690505,-79.382973,3,Park,Thai Restaurant,Playground,Gym,Grocery Store,Tennis Court,Dumpling Restaurant,Discount Store,Dive Bar,Dog Run
9,M4V,Central Toronto,"Deer Park, Forest Hill SE, Rathnelly, South Hi...",43.686003,-79.402335,5,Coffee Shop,Skating Rink,Pub,Pizza Place,Fried Chicken Joint,Sandwich Place,Sushi Restaurant,Supermarket,Bagel Shop,Boutique


## Map the resulting clusters

#### We start mapping the clusters according to their neighbourhoods

In [198]:
import matplotlib.cm as cm
import matplotlib.colors as colors

#Get average coordinates for Toronto Center
coord = np.array(df_toronto[['Latitude','Longitude']].mean(axis = 0))

loc_lat = coord[0]
loc_lon = coord[1]


colors_array = cm.rainbow(np.linspace(0, 1, n_clusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

#Map clusters
toronto_map = folium.Map(location=[loc_lat, loc_lon], zoom_start=12)

for borough, name, lat, lon, cluster in zip(toronto_merged['Borough'], toronto_merged['Neighbourhood'],toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Cluster labels']):
    #labels are zero-based: index the palette with the label itself and add 1 only for display
    label = folium.Popup(str(name) + ', ' + str(borough) + ' | Cluster: ' + str(cluster+1), parse_html=True)
    folium.CircleMarker([lat,lon], radius=5, popup=label, color=rainbow[cluster],fill=True,fill_color=rainbow[cluster],fill_opacity=0.7).add_to(toronto_map)
    
toronto_map
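
One way to keep the popup text and the marker colour in step is to index the palette with the zero-based k-means label and add 1 only for the number shown to the reader. The sketch below illustrates that convention; `marker_style` and the six hex colours are hypothetical (the notebook builds its own palette from `cm.rainbow`).

```python
# Hypothetical six-colour palette, one entry per cluster
palette = ['#8000ff', '#1996f3', '#4df3ce', '#b2f396', '#ff964f', '#ff0000']

def marker_style(cluster_label):
    """Return (display_number, colour) for a zero-based k-means label."""
    label = int(cluster_label)        # labels may arrive as floats after a join
    return label + 1, palette[label]  # 1-based for display, 0-based for indexing

for label in [0, 5]:
    print(marker_style(label))
```

With this convention the neighbourhoods listed under "Cluster 1" (label 0) always use the first colour of the palette.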

## Explore each cluster

#### Now we can explore each cluster through its top 10 venue categories. The map shows where each cluster is mostly located, and a clear geographic pattern emerges.

### Cluster 1

In [192]:
toronto_merged[toronto_merged['Cluster labels']==0]

Unnamed: 0,Postalcode,Borough,Neighbourhood,Latitude,Longitude,Cluster labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,M4N,Central Toronto,Lawrence Park,43.728135,-79.38709,0,Restaurant,Bus Line,Bookstore,Gym / Fitness Center,Park,Café,Coffee Shop,Elementary School,Donut Shop,Dumpling Restaurant
23,M5P,Central Toronto,"Forest Hill North, Forest Hill West",43.694785,-79.414405,0,Bank,Ice Cream Shop,Bus Line,Café,Bookstore,Salon / Barbershop,Bakery,Coffee Shop,Park,Burger Joint


### Cluster 2

In [193]:
toronto_merged[toronto_merged['Cluster labels']==1]

Unnamed: 0,Postalcode,Borough,Neighbourhood,Latitude,Longitude,Cluster labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,M4Y,Downtown Toronto,Church and Wellesley,43.666585,-79.381302,1,Coffee Shop,Japanese Restaurant,Burger Joint,Restaurant,Gay Bar,Diner,Café,Sushi Restaurant,Bookstore,Breakfast Spot
13,M5A,Downtown Toronto,"Harbourfront, Regent Park",43.65512,-79.36264,1,Coffee Shop,Café,Park,Italian Restaurant,Pub,Bakery,Gym / Fitness Center,Theater,Restaurant,Breakfast Spot
14,M5B,Downtown Toronto,"Ryerson, Garden District",43.657363,-79.37818,1,Coffee Shop,Clothing Store,Restaurant,Tea Room,Café,Ramen Restaurant,Pizza Place,Middle Eastern Restaurant,Japanese Restaurant,Italian Restaurant
15,M5C,Downtown Toronto,St. James Town,43.65121,-79.375481,1,Coffee Shop,Café,Restaurant,Hotel,Bakery,Italian Restaurant,Breakfast Spot,American Restaurant,Seafood Restaurant,Cosmetics Shop
16,M5E,Downtown Toronto,Berczy Park,43.64516,-79.373675,1,Coffee Shop,Hotel,Restaurant,Café,Cocktail Bar,Italian Restaurant,Japanese Restaurant,Pub,Bakery,Deli / Bodega
17,M5G,Downtown Toronto,Central Bay Street,43.656091,-79.38493,1,Coffee Shop,Clothing Store,Restaurant,Chinese Restaurant,Italian Restaurant,Ramen Restaurant,Middle Eastern Restaurant,Thai Restaurant,Theater,Spa
18,M5H,Downtown Toronto,"Adelaide, King, Richmond",43.6497,-79.382582,1,Coffee Shop,Hotel,Café,Steakhouse,Deli / Bodega,Restaurant,Breakfast Spot,Gastropub,American Restaurant,Japanese Restaurant
20,M5K,Downtown Toronto,"Design Exchange, Toronto Dominion Centre",43.648399,-79.383939,1,Coffee Shop,Café,Hotel,Steakhouse,Restaurant,American Restaurant,Deli / Bodega,Gastropub,Gym,Asian Restaurant
21,M5L,Downtown Toronto,"Commerce Court, Victoria Hotel",43.648395,-79.378865,1,Coffee Shop,Hotel,Café,Restaurant,American Restaurant,Steakhouse,Gastropub,Japanese Restaurant,Deli / Bodega,Beer Bar
24,M5R,Central Toronto,"The Annex, North Midtown, Yorkville",43.67484,-79.403698,1,Coffee Shop,Café,Sandwich Place,Italian Restaurant,Mediterranean Restaurant,Pizza Place,French Restaurant,History Museum,Jewish Restaurant,Park


### Cluster 3

In [194]:
toronto_merged[toronto_merged['Cluster labels']==2]

Unnamed: 0,Postalcode,Borough,Neighbourhood,Latitude,Longitude,Cluster labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
22,M5N,Central Toronto,Roselawn,43.711941,-79.41912,2,Playground,Business Service,Pet Store,Garden,Falafel Restaurant,Farm,Event Space,Ethiopian Restaurant,Dog Run,Farmers Market


### Cluster 4

In [195]:
toronto_merged[toronto_merged['Cluster labels']==3]

Unnamed: 0,Postalcode,Borough,Neighbourhood,Latitude,Longitude,Cluster labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,M4T,Central Toronto,"Moore Park, Summerhill East",43.690505,-79.382973,3,Park,Thai Restaurant,Playground,Gym,Grocery Store,Tennis Court,Dumpling Restaurant,Discount Store,Dive Bar,Dog Run
10,M4W,Downtown Toronto,Rosedale,43.68196,-79.378445,3,Park,Grocery Store,Playground,Candy Store,Bank,Electronics Store,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant


### Cluster 5

In [196]:
toronto_merged[toronto_merged['Cluster labels']==4]

Unnamed: 0,Postalcode,Borough,Neighbourhood,Latitude,Longitude,Cluster labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
19,M5J,Downtown Toronto,"Harbourfront East, Toronto Islands, Union Station",43.62347,-79.393979,4,Sculpture Garden,Harbor / Marina,Boat or Ferry,Music Venue,Falafel Restaurant,Farm,Farmers Market,Event Space,Dive Bar,Ethiopian Restaurant


### Cluster 6

In [197]:
toronto_merged[toronto_merged['Cluster labels']==5]

Unnamed: 0,Postalcode,Borough,Neighbourhood,Latitude,Longitude,Cluster labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4E,East Toronto,The Beaches,43.676531,-79.29541,5,Pet Store,Pizza Place,Japanese Restaurant,Bar,Tea Room,Church,Nail Salon,Neighborhood,Juice Bar,French Restaurant
1,M4K,East Toronto,"The Danforth West, Riverdale",43.683262,-79.35512,5,Greek Restaurant,Ice Cream Shop,Restaurant,Café,Yoga Studio,Pub,Juice Bar,Coffee Shop,Tailor Shop,Discount Store
2,M4L,East Toronto,"The Beaches West, India Bazaar",43.667985,-79.314642,5,Pet Store,Coffee Shop,Brewery,Bakery,Fast Food Restaurant,Sandwich Place,Café,Park,Gym,Diner
3,M4M,East Toronto,Studio District,43.662766,-79.33483,5,Coffee Shop,Bakery,Café,Bar,Diner,Sandwich Place,Italian Restaurant,American Restaurant,Fast Food Restaurant,Pizza Place
5,M4P,Central Toronto,Davisville North,43.712755,-79.388514,5,Brewery,Dessert Shop,Burger Joint,Café,Pizza Place,Breakfast Spot,Food & Drink Shop,Sushi Restaurant,Dog Run,Sandwich Place
6,M4R,Central Toronto,North Toronto West,43.714523,-79.40696,5,Sporting Goods Shop,Café,Clothing Store,Coffee Shop,Restaurant,Steakhouse,Spa,Sandwich Place,Salon / Barbershop,Chinese Restaurant
7,M4S,Central Toronto,Davisville,43.702765,-79.385769,5,Coffee Shop,Dessert Shop,Sandwich Place,Sushi Restaurant,Gym,Pharmacy,Café,Seafood Restaurant,Pizza Place,Italian Restaurant
9,M4V,Central Toronto,"Deer Park, Forest Hill SE, Rathnelly, South Hi...",43.686003,-79.402335,5,Coffee Shop,Skating Rink,Pub,Pizza Place,Fried Chicken Joint,Sandwich Place,Sushi Restaurant,Supermarket,Bagel Shop,Boutique
11,M4X,Downtown Toronto,"Cabbagetown, St. James Town",43.668155,-79.3666,5,Restaurant,Coffee Shop,Café,Park,Pizza Place,Pub,Italian Restaurant,Indian Restaurant,Diner,Bakery
25,M5S,Downtown Toronto,"Harbord, University of Toronto",43.66311,-79.401801,5,Coffee Shop,Café,Bar,Restaurant,Bookstore,Sushi Restaurant,Japanese Restaurant,Bakery,Electronics Store,Music Venue


## Conclusion

We can name the 6 clusters as follows:

1. __Cluster 1__: Financial Center and Bus Stations
1. __Cluster 2__: Coffee and bars
1. __Cluster 3__: Business Center
1. __Cluster 4__: Park and Playground
1. __Cluster 5__: Harbour and airport
1. __Cluster 6__: Restaurant

Geographically, we can conclude that financial centers, bookstores, gyms and bus lines (Cluster 1) are found in two zones of Central Toronto, north of Downtown Toronto. Restaurants (Cluster 6) are spread north, east and west of Downtown Toronto, but the neighbourhoods in this cluster remain close to each other. Coffee shops and nightlife (Cluster 2) are concentrated in Downtown Toronto. The business center (Cluster 3) lies to the northwest, and the harbour and airport (Cluster 5) are easy to locate south of Downtown Toronto. Finally, parks and playgrounds (Cluster 4) are found just north of Downtown Toronto, very close to it.