# Toronto's Neighborhood Analysis

In this notebook I'll be analyzing clusters of Toronto's neighborhoods.

## First Part

In [1]:
#let's import everything needed.
import pandas as pd
import numpy as np
import folium
from bs4 import BeautifulSoup
import geocoder
import requests
from geopy.geocoders import Nominatim
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors

So, first let's get the data from the Wikipedia page.

In [2]:
dfs = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M', header=0)

data = dfs[0]
data

Unnamed: 0,Postal code,Borough,Neighborhood
0,M1A,Not assigned,
1,M2A,Not assigned,
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Regent Park / Harbourfront
...,...,...,...
175,M5Z,Not assigned,
176,M6Z,Not assigned,
177,M7Z,Not assigned,
178,M8Z,Etobicoke,Mimico NW / The Queensway West / South of Bloo...


Before the latitude and longitude data are added, the rows whose borough is 'Not assigned' are removed.

In [3]:
data = data[data.Borough != 'Not assigned']
data

Unnamed: 0,Postal code,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Regent Park / Harbourfront
5,M6A,North York,Lawrence Manor / Lawrence Heights
6,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government
...,...,...,...
160,M8X,Etobicoke,The Kingsway / Montgomery Road / Old Mill North
165,M4Y,Downtown Toronto,Church and Wellesley
168,M7Y,East Toronto,Business reply mail Processing CentrE
169,M8Y,Etobicoke,Old Mill South / King's Mill Park / Sunnylea /...


Let's reset the index and then check for duplicated postal codes, in case neighborhoods that share a code need to be merged.

In [4]:
data = data.reset_index(drop=True)
data

Unnamed: 0,Postal code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Regent Park / Harbourfront
3,M6A,North York,Lawrence Manor / Lawrence Heights
4,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government
...,...,...,...
98,M8X,Etobicoke,The Kingsway / Montgomery Road / Old Mill North
99,M4Y,Downtown Toronto,Church and Wellesley
100,M7Y,East Toronto,Business reply mail Processing CentrE
101,M8Y,Etobicoke,Old Mill South / King's Mill Park / Sunnylea /...


In [5]:
print(data.duplicated(subset = 'Postal code', keep = False).value_counts())
print(data.shape)
print('103 rows and no duplicated postal codes. Therefore good to go')

False    103
dtype: int64
(103, 3)
103 rows and no duplicated postal codes. Therefore good to go
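
No postal code is duplicated here, so no merging was needed. Had there been duplicates, the neighborhoods sharing a code could be combined roughly as follows. This is a minimal sketch on toy rows (the duplicated `M5A` entries are made up for illustration), not something the notebook had to run:

```python
import pandas as pd

# toy rows (hypothetical): 'M5A' appears twice with different neighborhoods
toy = pd.DataFrame({
    'Postal code': ['M5A', 'M5A', 'M3A'],
    'Borough': ['Downtown Toronto', 'Downtown Toronto', 'North York'],
    'Neighborhood': ['Regent Park', 'Harbourfront', 'Parkwoods'],
})

# one row per postal code, neighborhoods joined into a single string
combined = (
    toy.groupby(['Postal code', 'Borough'], sort=False)['Neighborhood']
       .agg(', '.join)
       .reset_index()
)
print(combined)
```

`sort=False` keeps the postal codes in their order of first appearance.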


## Second Part

It was impossible to retrieve the coordinates with geocoder, so the CSV data will be used instead.

In [6]:
latlng_df = pd.read_csv('http://cocl.us/Geospatial_data')
latlng_df

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
...,...,...,...
98,M9N,43.706876,-79.518188
99,M9P,43.696319,-79.532242
100,M9R,43.688905,-79.554724
101,M9V,43.739416,-79.588437


Good, so now the two dataframes will be merged.

In [7]:
merged_df = data.merge(latlng_df, how='left', left_on='Postal code',
                       right_on = 'Postal Code')

In [8]:
merged_df

Unnamed: 0,Postal code,Borough,Neighborhood,Postal Code,Latitude,Longitude
0,M3A,North York,Parkwoods,M3A,43.753259,-79.329656
1,M4A,North York,Victoria Village,M4A,43.725882,-79.315572
2,M5A,Downtown Toronto,Regent Park / Harbourfront,M5A,43.654260,-79.360636
3,M6A,North York,Lawrence Manor / Lawrence Heights,M6A,43.718518,-79.464763
4,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government,M7A,43.662301,-79.389494
...,...,...,...,...,...,...
98,M8X,Etobicoke,The Kingsway / Montgomery Road / Old Mill North,M8X,43.653654,-79.506944
99,M4Y,Downtown Toronto,Church and Wellesley,M4Y,43.665860,-79.383160
100,M7Y,East Toronto,Business reply mail Processing CentrE,M7Y,43.662744,-79.321558
101,M8Y,Etobicoke,Old Mill South / King's Mill Park / Sunnylea /...,M8Y,43.636258,-79.498509


In [9]:
merged_df.drop(labels = 'Postal Code', axis = 1, inplace = True)

In [10]:
merged_df

Unnamed: 0,Postal code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,Regent Park / Harbourfront,43.654260,-79.360636
3,M6A,North York,Lawrence Manor / Lawrence Heights,43.718518,-79.464763
4,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government,43.662301,-79.389494
...,...,...,...,...,...
98,M8X,Etobicoke,The Kingsway / Montgomery Road / Old Mill North,43.653654,-79.506944
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160
100,M7Y,East Toronto,Business reply mail Processing CentrE,43.662744,-79.321558
101,M8Y,Etobicoke,Old Mill South / King's Mill Park / Sunnylea /...,43.636258,-79.498509


Now the merged dataframe is good to go.
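
A left merge silently leaves NaN coordinates for any postal code missing from the CSV. One way to check for that is sketched below on toy data; the code `M9Z` is a made-up example of an unmatched row, and `indicator` and `validate` are standard `DataFrame.merge` parameters:

```python
import pandas as pd

left = pd.DataFrame({'Postal code': ['M3A', 'M4A', 'M9Z'],   # 'M9Z' is hypothetical
                     'Borough': ['North York', 'North York', 'Example']})
right = pd.DataFrame({'Postal Code': ['M3A', 'M4A'],
                      'Latitude': [43.753259, 43.725882],
                      'Longitude': [-79.329656, -79.315572]})

# validate raises if a postal code is repeated on either side;
# indicator adds a '_merge' column telling where each row came from
checked = left.merge(right, how='left', left_on='Postal code', right_on='Postal Code',
                     validate='one_to_one', indicator=True)

unmatched = checked.loc[checked['_merge'] == 'left_only', 'Postal code'].tolist()
print(unmatched)
```

An empty `unmatched` list would mean every postal code received coordinates.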

## Third Part

In [11]:
address = 'Toronto, Canada'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


Let's create a map to see Toronto and its Neighborhoods.

In [12]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(merged_df['Latitude'], merged_df['Longitude'], merged_df['Borough'], merged_df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

### Foursquare credentials

In [13]:
CLIENT_ID = 'TU233DL4CRNQQNZC3LUK0LDIRUCMP5H3KOIXJVUWD04520UN'
# your Foursquare ID
CLIENT_SECRET = 'QQXSBQ0BVKRTQC02YVUDE0FGCIIWT13PIOTHMYHLF5IAYGGU'
# your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: TU233DL4CRNQQNZC3LUK0LDIRUCMP5H3KOIXJVUWD04520UN
CLIENT_SECRET:QQXSBQ0BVKRTQC02YVUDE0FGCIIWT13PIOTHMYHLF5IAYGGU
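
Keeping credentials in the notebook and printing them exposes them to anyone who reads it. A common alternative is to read them from environment variables. The sketch below assumes variables named `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET`; those names and both helper functions are hypothetical, not part of the Foursquare API:

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) from the environment, or raise if one is missing."""
    try:
        # hypothetical variable names, chosen for this sketch
        return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        raise RuntimeError('Missing environment variable: {}'.format(missing)) from None


def mask(secret, visible=4):
    """Show only the first few characters so a value can be checked without exposing it."""
    return secret[:visible] + '*' * (len(secret) - visible)


# demo with a plain dict standing in for os.environ
demo_id, demo_secret = load_foursquare_credentials({
    'FOURSQUARE_CLIENT_ID': 'abcd1234',
    'FOURSQUARE_CLIENT_SECRET': 'wxyz5678',
})
print('CLIENT_SECRET: ' + mask(demo_secret))
```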


In [14]:
# search for coffee venues around the centre of Toronto
query = 'coffee'
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&query={}&ll={},{}&v={}'.format(CLIENT_ID, CLIENT_SECRET, query, latitude, longitude, VERSION)
venues_data = requests.get(url).json()
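
The URL above is assembled with `str.format`, which does not escape special characters in the query. A possible alternative, sketched with the standard library only, builds the query string with `urlencode`. The function name `build_search_url` is hypothetical; the endpoint and parameter names are the ones used in the cell above:

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_search_url(client_id, client_secret, query, lat, lng, version):
    """Build the venues/search URL with properly escaped query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'query': query,
        'll': '{},{}'.format(lat, lng),
        'v': version,
    }
    return 'https://api.foursquare.com/v2/venues/search?' + urlencode(params)


# placeholder credentials; a query with a space shows the escaping at work
example_url = build_search_url('ID', 'SECRET', 'coffee shop',
                               43.6534817, -79.3839347, '20180605')
parsed = parse_qs(urlsplit(example_url).query)
print(parsed['query'], parsed['ll'])
```

`requests.get` also accepts a `params` dictionary, which achieves the same escaping without building the string by hand.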

Let's build a df with the venues data

In [15]:
venues = venues_data['response']['venues']
nearby_venues = pd.json_normalize(venues)
nearby_venues

Unnamed: 0,id,name,categories,referralId,hasPerk,location.address,location.lat,location.lng,location.labeledLatLngs,location.distance,location.cc,location.city,location.state,location.country,location.formattedAddress,location.crossStreet,location.postalCode,location.neighborhood,venuePage.id
0,4b44fc77f964a520cc0026e3,Timothy's World Coffee,"[{'id': '4bf58dd8d48988d1e0931735', 'name': 'C...",v-1587746204,False,427 University Avenue,43.654053,-79.38809,"[{'label': 'display', 'lat': 43.65405317976302...",340,CA,Toronto,ON,Canada,"[427 University Avenue, Toronto ON, Canada]",,,,
1,59f784dd28122f14f9d5d63d,HotBlack Coffee,"[{'id': '4bf58dd8d48988d1e0931735', 'name': 'C...",v-1587746204,False,245 Queen Street West,43.650364,-79.388669,"[{'label': 'display', 'lat': 43.65036434800487...",515,CA,Toronto,ON,Canada,"[245 Queen Street West (at St Patrick St), Tor...",at St Patrick St,M5V 1Z4,Entertainment District,463001529.0
2,4b0aaa8ef964a520272623e3,Timothy's World Coffee,"[{'id': '4bf58dd8d48988d1e0931735', 'name': 'C...",v-1587746204,False,"483 Bay St,Bell Trinity Square",43.653436,-79.382314,"[{'label': 'display', 'lat': 43.653436, 'lng':...",130,CA,Toronto,ON,Canada,"[483 Bay St,Bell Trinity Square (Bell Trinity ...",Bell Trinity Square,M5G 2C9,,
3,4baa9f6cf964a520817a3ae3,Timothy's World Coffee,"[{'id': '4bf58dd8d48988d1e0931735', 'name': 'C...",v-1587746204,False,401 Bay St.,43.652135,-79.381172,"[{'label': 'display', 'lat': 43.65213455850074...",268,CA,Toronto,ON,Canada,"[401 Bay St. (at Richmond St. W), Toronto ON M...",at Richmond St. W,M5H 2Y4,,
4,4c19447c4ff90f4765ac0f49,Timothy's World Coffee,"[{'id': '4bf58dd8d48988d1e0931735', 'name': 'C...",v-1587746204,False,"595 Bay St,Atrium On Bay",43.656219,-79.38329,"[{'label': 'display', 'lat': 43.656219, 'lng':...",309,CA,Toronto,ON,Canada,"[595 Bay St,Atrium On Bay (at Atrium on Bay), ...",at Atrium on Bay,M5G 2C2,,
5,53e8acc4498ee294fb100183,Timothy's World Coffee,"[{'id': '4bf58dd8d48988d1e0931735', 'name': 'C...",v-1587746204,False,425 University Ave,43.65427,-79.387448,"[{'label': 'display', 'lat': 43.65427, 'lng': ...",296,CA,Toronto,ON,Canada,"[425 University Ave (Dundas), Toronto ON M5G 1...",Dundas,M5G 1T6,,
6,4baa31def964a52037523ae3,Coffee office,[],v-1587746204,False,350 Bay St - 7th Floor,43.649498,-79.386479,"[{'label': 'display', 'lat': 43.649498, 'lng':...",488,CA,Toronto,ON,Canada,"[350 Bay St - 7th Floor, Toronto ON, Canada]",,,,
7,4fff1f96e4b042ae8acddca5,Fahrenheit Coffee,"[{'id': '4bf58dd8d48988d1e0931735', 'name': 'C...",v-1587746204,False,120 Lombard St,43.652384,-79.372719,"[{'label': 'display', 'lat': 43.65238358726612...",911,CA,Toronto,ON,Canada,"[120 Lombard St (at Jarvis St), Toronto ON M5C...",at Jarvis St,M5C 3H5,,
8,4ec514ec9911232436e364af,Timothy's World Coffee,"[{'id': '4bf58dd8d48988d1e0931735', 'name': 'C...",v-1587746204,False,Yonge,43.6567,-79.379941,"[{'label': 'display', 'lat': 43.65669995833159...",481,CA,Toronto,ON,Canada,"[Yonge (Dundas), Toronto ON M5B 2G9, Canada]",Dundas,M5B 2G9,,
9,4fccaa8fe4b05a98df3d9417,Sam James Coffee Bar (SJCB),"[{'id': '4bf58dd8d48988d16d941735', 'name': 'C...",v-1587746204,False,150 King St. W,43.647881,-79.384332,"[{'label': 'display', 'lat': 43.64788137014028...",624,CA,Toronto,ON,Canada,"[150 King St. W (in the PATH), Toronto ON M5H ...",in the PATH,M5H 4B6,,


Let's clean and prepare the df

In [16]:
curated_df = nearby_venues.drop(labels = ['id', 'referralId', 'hasPerk', 'location.address', 'location.labeledLatLngs', 'location.distance', 'location.cc', 'location.city', 'location.country', 'location.formattedAddress', 'location.crossStreet', 'location.neighborhood', 'venuePage.id', 'location.state'], axis = 1)
curated_df = curated_df.rename(columns={'location.lat': 'lat', 'location.lng': 'lng', 'location.postalCode': 'Postal code'})

In [17]:
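def get_category_type(row):
    # search results use 'categories'; explore results nest it under 'venue.categories'
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    # a venue may have no category at all
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']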

In [18]:
curated_df['categories'] = curated_df.apply(get_category_type, axis=1)

In [19]:
curated_df

Unnamed: 0,name,categories,lat,lng,Postal code
0,Timothy's World Coffee,Coffee Shop,43.654053,-79.38809,
1,HotBlack Coffee,Coffee Shop,43.650364,-79.388669,M5V 1Z4
2,Timothy's World Coffee,Coffee Shop,43.653436,-79.382314,M5G 2C9
3,Timothy's World Coffee,Coffee Shop,43.652135,-79.381172,M5H 2Y4
4,Timothy's World Coffee,Coffee Shop,43.656219,-79.38329,M5G 2C2
5,Timothy's World Coffee,Coffee Shop,43.65427,-79.387448,M5G 1T6
6,Coffee office,,43.649498,-79.386479,
7,Fahrenheit Coffee,Coffee Shop,43.652384,-79.372719,M5C 3H5
8,Timothy's World Coffee,Coffee Shop,43.6567,-79.379941,M5B 2G9
9,Sam James Coffee Bar (SJCB),Café,43.647881,-79.384332,M5H 4B6


In [20]:
curated_df.dtypes

name            object
categories      object
lat            float64
lng            float64
Postal code     object
dtype: object

In [21]:
curated_df['Postal code'].astype('object')

0         NaN
1     M5V 1Z4
2     M5G 2C9
3     M5H 2Y4
4     M5G 2C2
5     M5G 1T6
6         NaN
7     M5C 3H5
8     M5B 2G9
9     M5H 4B6
10    M5B 1X8
11    M5C 3G8
12        NaN
13    M5G 1X5
14    M5K 1A1
15    M5B 2H4
16    M5S 1Y9
17        NaN
18    M5J 1E6
19    M5V 3K2
20        NaN
21    M5J 1C3
22    M5T 1P7
23    M5E 1M6
24        NaN
25    M4X 1P3
26    M5T 1R5
27        NaN
28        NaN
29    M4Y 1N6
Name: Postal code, dtype: object

In [22]:
def get_postalcode_standarization(row):
    # the column is 'Postal code' here but 'Postal Code' in the coordinates CSV
    try:
        postal_code = row['Postal code']
    except KeyError:
        postal_code = row['Postal Code']
    if postal_code == 'NaN':
        return None
    # keep only the first three characters of the code;
    # a missing value (float NaN) becomes the string 'nan', which is filtered out in a later cell
    return str(postal_code)[0:3]
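
The same truncation can be done without a row-wise function. The sketch below uses the pandas `.str` accessor on a toy series; unlike the function above, it leaves missing values as NaN instead of turning them into the string `'nan'`, so they can be removed with `dropna`:

```python
import numpy as np
import pandas as pd

# toy values taken from the table above, plus one missing entry
codes = pd.Series(['M5V 1Z4', np.nan, 'M5G 2C9'], name='Postal code')

# slice the first three characters; NaN is propagated untouched
fsa = codes.str[:3]
print(fsa.tolist())

# rows without a postal code are dropped explicitly
with_code = fsa.dropna()
print(with_code.tolist())
```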

In [23]:
curated_df['Postal code'] = curated_df.apply(get_postalcode_standarization, axis=1)

In [24]:
curated_df

Unnamed: 0,name,categories,lat,lng,Postal code
0,Timothy's World Coffee,Coffee Shop,43.654053,-79.38809,
1,HotBlack Coffee,Coffee Shop,43.650364,-79.388669,M5V
2,Timothy's World Coffee,Coffee Shop,43.653436,-79.382314,M5G
3,Timothy's World Coffee,Coffee Shop,43.652135,-79.381172,M5H
4,Timothy's World Coffee,Coffee Shop,43.656219,-79.38329,M5G
5,Timothy's World Coffee,Coffee Shop,43.65427,-79.387448,M5G
6,Coffee office,,43.649498,-79.386479,
7,Fahrenheit Coffee,Coffee Shop,43.652384,-79.372719,M5C
8,Timothy's World Coffee,Coffee Shop,43.6567,-79.379941,M5B
9,Sam James Coffee Bar (SJCB),Café,43.647881,-79.384332,M5H


In [25]:
curated_df.dtypes

name            object
categories      object
lat            float64
lng            float64
Postal code     object
dtype: object

In [26]:
# drop the venues whose postal code is missing (stored as the string 'nan' or 'NaN')
curated_venues = curated_df[~curated_df['Postal code'].isin(['NaN', 'nan'])]
curated_venues

Unnamed: 0,name,categories,lat,lng,Postal code
1,HotBlack Coffee,Coffee Shop,43.650364,-79.388669,M5V
2,Timothy's World Coffee,Coffee Shop,43.653436,-79.382314,M5G
3,Timothy's World Coffee,Coffee Shop,43.652135,-79.381172,M5H
4,Timothy's World Coffee,Coffee Shop,43.656219,-79.38329,M5G
5,Timothy's World Coffee,Coffee Shop,43.65427,-79.387448,M5G
7,Fahrenheit Coffee,Coffee Shop,43.652384,-79.372719,M5C
8,Timothy's World Coffee,Coffee Shop,43.6567,-79.379941,M5B
9,Sam James Coffee Bar (SJCB),Café,43.647881,-79.384332,M5H
10,Balzac's Coffee,Coffee Shop,43.657854,-79.3792,M5B
11,Timothy's World Coffee,Coffee Shop,43.650948,-79.376825,M5C


A function will be defined to get the nearby venues of each neighborhood

In [27]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
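
The function above indexes straight into the response, so a venue with an empty `categories` list or a response without `groups` would raise an error. A more defensive way to parse the payload might look like the sketch below. The helper `extract_venues` and the sample payload are hypothetical; the payload only mimics the keys the function above reads:

```python
def extract_venues(payload, neighborhood):
    """Return (neighborhood, name, lat, lng, category) tuples from an explore-style payload."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories', [])
        # fall back to None when the venue has no category
        category = categories[0]['name'] if categories else None
        rows.append((neighborhood, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows


# made-up payload with one categorized and one uncategorized venue
sample_payload = {
    'response': {'groups': [{'items': [
        {'venue': {'name': 'Example Park',
                   'location': {'lat': 43.75, 'lng': -79.33},
                   'categories': [{'name': 'Park'}]}},
        {'venue': {'name': 'Uncategorized Place',
                   'location': {'lat': 43.76, 'lng': -79.34},
                   'categories': []}},
    ]}]}
}

rows = extract_venues(sample_payload, 'Parkwoods')
print(rows)
```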

In [28]:
toronto_venues = getNearbyVenues(names=merged_df['Neighborhood'],
                                   latitudes=merged_df['Latitude'],
                                   longitudes=merged_df['Longitude']
                                  )

Parkwoods
Victoria Village
Regent Park / Harbourfront
Lawrence Manor / Lawrence Heights
Queen's Park / Ontario Provincial Government
Islington Avenue
Malvern / Rouge
Don Mills
Parkview Hill / Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park / Princess Gardens / Martin Grove / Islington / Cloverdale
Rouge Hill / Port Union / Highland Creek
Don Mills
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate / Bloordale Gardens / Old Burnhamthorpe / Markland Wood
Guildwood / Morningside / West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor / Wilson Heights / Downsview North
Thorncliffe Park
Richmond / Adelaide / King
Dufferin / Dovercourt Village
Scarborough Village
Fairview / Henry Farm / Oriole
Northwood Park / York University
East Toronto
Harbourfront East / Union Station / Toronto Islands
Little Portugal / Trinity
Kennedy Park / Ionview / East Birchmount Park
Bayview Village
Do

In [29]:
toronto_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
1,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,Parkwoods,43.753259,-79.329656,Corrosion Service Company Limited,43.752432,-79.334661,Construction & Landscaping
3,Victoria Village,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
4,Victoria Village,43.725882,-79.315572,Tim Hortons,43.725517,-79.313103,Coffee Shop
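
As a sanity check on the 500 m radius, the distance between a neighborhood's coordinates and one of its venues can be computed with the haversine formula. This is a minimal sketch using the first row of the table above; the function name and the Earth radius of 6,371 km are assumptions of the sketch:

```python
from math import radians, sin, cos, asin, sqrt


def haversine_m(lat1, lng1, lat2, lng2, earth_radius_m=6371000.0):
    """Great-circle distance in metres between two (lat, lng) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * earth_radius_m * asin(sqrt(a))


# Parkwoods coordinates vs. Brookbanks Park (row 0 above)
distance = haversine_m(43.753259, -79.329656, 43.751976, -79.33214)
print(round(distance), 'm')
```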


In [30]:
print(toronto_venues.shape)
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

(1340, 7)
There are 232 unique categories.


In [31]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()


Unnamed: 0,Yoga Studio,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Aquarium,...,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [32]:
toronto_onehot.shape

(1340, 232)

In [33]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,...,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0
1,Alderwood / Long Branch,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,...,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0
2,Bathurst Manor / Wilson Heights / Downsview North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,...,0.0,0.0,0.0,0.050000,0.000000,0.0,0.0,0.0,0.0,0.0
3,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,...,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0
4,Bedford Park / Lawrence Manor East,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,...,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
90,Willowdale,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,...,0.0,0.0,0.0,0.000000,0.027027,0.0,0.0,0.0,0.0,0.0
91,Willowdale / Newtonbrook,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,...,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0
92,Woburn,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,...,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0
93,Woodbine Heights,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,...,0.0,0.0,0.0,0.090909,0.000000,0.0,0.0,0.0,0.0,0.0


In [34]:
toronto_grouped.shape

(95, 232)
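
Taking the mean of the one-hot columns per neighborhood gives the share of that neighborhood's venues in each category. The toy example below, with made-up neighborhoods `A` and `B`, shows the idea. It also uses `DataFrame.insert`, which raises an error if a column with that name already exists. That guard may matter here: the shapes above (232 categories, yet 232 columns including `Neighborhood`) suggest that a venue category itself named `Neighborhood` might have been overwritten.

```python
import pandas as pd

# toy venues (hypothetical): A has 4 venues, B has 1
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B'],
    'Venue Category': ['Park', 'Park', 'Café', 'Bank', 'Park'],
})

# one column per category, 1.0 where the venue belongs to it
onehot = pd.get_dummies(venues['Venue Category'], dtype=float)

# insert raises ValueError if a 'Neighborhood' column already exists
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# mean of 0/1 values = fraction of the neighborhood's venues in each category
freq = onehot.groupby('Neighborhood').mean()
print(freq)
```

Each row of `freq` sums to 1, which is a quick way to confirm that no category was lost.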

As there are many categories, let's show only the five most frequent for each neighborhood.

In [35]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Agincourt----
                       venue  freq
0                     Lounge   0.2
1  Latin American Restaurant   0.2
2             Breakfast Spot   0.2
3             Clothing Store   0.2
4               Skating Rink   0.2


----Alderwood / Long Branch----
            venue  freq
0     Pizza Place   0.2
1     Coffee Shop   0.1
2             Gym   0.1
3        Pharmacy   0.1
4  Sandwich Place   0.1


----Bathurst Manor / Wilson Heights / Downsview North----
                venue  freq
0         Coffee Shop  0.10
1                Bank  0.10
2  Frozen Yogurt Shop  0.05
3       Shopping Mall  0.05
4      Sandwich Place  0.05


----Bayview Village----
                 venue  freq
0                 Café  0.25
1  Japanese Restaurant  0.25
2                 Bank  0.25
3   Chinese Restaurant  0.25
4          Yoga Studio  0.00


----Bedford Park / Lawrence Manor East----
                venue  freq
0     Thai Restaurant  0.08
1         Coffee Shop  0.08
2          Restaurant  0.08
3      Sa

4           Mobile Phone Shop  0.00


----Queen's Park / Ontario Provincial Government----
              venue  freq
0       Coffee Shop  0.20
1  Sushi Restaurant  0.10
2             Diner  0.07
3       Yoga Studio  0.03
4      Burger Joint  0.03


----Regent Park / Harbourfront----
            venue  freq
0     Coffee Shop  0.20
1            Park  0.10
2  Breakfast Spot  0.07
3          Bakery  0.07
4     Yoga Studio  0.03


----Richmond / Adelaide / King----
                 venue  freq
0          Coffee Shop  0.10
1                 Café  0.10
2  American Restaurant  0.07
3   Seafood Restaurant  0.07
4        Deli / Bodega  0.03


----Rosedale----
               venue  freq
0               Park  0.50
1         Playground  0.25
2              Trail  0.25
3        Yoga Studio  0.00
4  Mobile Phone Shop  0.00


----Roselawn----
                        venue  freq
0                      Garden   1.0
1                 Yoga Studio   0.0
2  Modern European Restaurant   0.0
3                

In [36]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [37]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions after the third use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Latin American Restaurant,Skating Rink,Clothing Store,Breakfast Spot,Lounge,Department Store,Eastern European Restaurant,Drugstore,Donut Shop,Dog Run
1,Alderwood / Long Branch,Pizza Place,Dance Studio,Sandwich Place,Coffee Shop,Skating Rink,Pharmacy,Gym,Athletics & Sports,Pub,Deli / Bodega
2,Bathurst Manor / Wilson Heights / Downsview North,Coffee Shop,Bank,Gas Station,Middle Eastern Restaurant,Diner,Sandwich Place,Bridal Shop,Restaurant,Ice Cream Shop,Supermarket
3,Bayview Village,Japanese Restaurant,Bank,Chinese Restaurant,Café,Women's Store,Department Store,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop
4,Bedford Park / Lawrence Manor East,Italian Restaurant,Sandwich Place,Coffee Shop,Restaurant,Thai Restaurant,Pizza Place,Spa,Liquor Store,Indian Restaurant,Butcher
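
The column names above rely on a three-item suffix list plus a fallback to `th`, which is fine for ten columns but would label position 21 as "21th". A small helper that handles any position could look like this; the function `ordinal` is a hypothetical name introduced for the sketch:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1st, 2nd, 3rd, 4th, 11th, 21st."""
    if 10 <= n % 100 <= 20:
        # 10th through 20th always take 'th'
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


columns = ['Neighborhood'] + ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)]
print(columns)
```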


Let's run k-means to cluster the neighborhoods.

In [38]:
# set number of clusters
kclusters = 3

toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
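
The first ten labels all fall in the same cluster, so it may be worth checking whether three clusters suit the data. One common heuristic is the silhouette score. The sketch below runs it on synthetic, well-separated points rather than on `toronto_grouped_clustering`, so the result only illustrates the procedure:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# synthetic data: three tight blobs around made-up centres
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 8.66]])
points = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

# silhouette score for several candidate values of k (higher is better)
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(points)
    scores[k] = silhouette_score(points, labels)

best_k = max(scores, key=scores.get)
print(scores, best_k)
```

On the real venue frequencies the scores are likely to be far less clear-cut, so the choice of `kclusters` remains a judgment call.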

Let's merge the cluster labels with the neighborhood df.

In [39]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = merged_df

# join the cluster labels and top venues onto merged_df, which holds latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head() # check the last columns!

Unnamed: 0,Postal code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,0.0,Park,Construction & Landscaping,Food & Drink Shop,Women's Store,Deli / Bodega,Eastern European Restaurant,Drugstore,Donut Shop,Dog Run,Distribution Center
1,M4A,North York,Victoria Village,43.725882,-79.315572,1.0,Hockey Arena,Coffee Shop,Pizza Place,French Restaurant,Intersection,Portuguese Restaurant,Dim Sum Restaurant,Deli / Bodega,Department Store,Dessert Shop
2,M5A,Downtown Toronto,Regent Park / Harbourfront,43.65426,-79.360636,1.0,Coffee Shop,Park,Breakfast Spot,Bakery,Mexican Restaurant,Restaurant,Café,Pub,Chocolate Shop,Performing Arts Venue
3,M6A,North York,Lawrence Manor / Lawrence Heights,43.718518,-79.464763,1.0,Clothing Store,Women's Store,Furniture / Home Store,Gift Shop,Coffee Shop,Miscellaneous Shop,Boutique,Event Space,Accessories Store,Vietnamese Restaurant
4,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government,43.662301,-79.389494,1.0,Coffee Shop,Sushi Restaurant,Diner,Mexican Restaurant,Beer Bar,Spa,Sandwich Place,Burger Joint,Burrito Place,Creperie


Neighborhoods for which no venues were returned have no cluster label (NaN), so the df must be cleaned before mapping.

In [40]:
# drop the neighborhoods that received no cluster label
toronto_merged = toronto_merged.dropna(subset=['Cluster Labels'])

In [41]:
toronto_merged

Unnamed: 0,Postal code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,0.0,Park,Construction & Landscaping,Food & Drink Shop,Women's Store,Deli / Bodega,Eastern European Restaurant,Drugstore,Donut Shop,Dog Run,Distribution Center
1,M4A,North York,Victoria Village,43.725882,-79.315572,1.0,Hockey Arena,Coffee Shop,Pizza Place,French Restaurant,Intersection,Portuguese Restaurant,Dim Sum Restaurant,Deli / Bodega,Department Store,Dessert Shop
2,M5A,Downtown Toronto,Regent Park / Harbourfront,43.654260,-79.360636,1.0,Coffee Shop,Park,Breakfast Spot,Bakery,Mexican Restaurant,Restaurant,Café,Pub,Chocolate Shop,Performing Arts Venue
3,M6A,North York,Lawrence Manor / Lawrence Heights,43.718518,-79.464763,1.0,Clothing Store,Women's Store,Furniture / Home Store,Gift Shop,Coffee Shop,Miscellaneous Shop,Boutique,Event Space,Accessories Store,Vietnamese Restaurant
4,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government,43.662301,-79.389494,1.0,Coffee Shop,Sushi Restaurant,Diner,Mexican Restaurant,Beer Bar,Spa,Sandwich Place,Burger Joint,Burrito Place,Creperie
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
98,M8X,Etobicoke,The Kingsway / Montgomery Road / Old Mill North,43.653654,-79.506944,0.0,Park,River,College Stadium,Dance Studio,Eastern European Restaurant,Drugstore,Donut Shop,Dog Run,Distribution Center,Discount Store
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160,1.0,Salon / Barbershop,Thai Restaurant,Juice Bar,Ramen Restaurant,Pub,Hobby Shop,Diner,Smoke Shop,Café,Sushi Restaurant
100,M7Y,East Toronto,Business reply mail Processing CentrE,43.662744,-79.321558,1.0,Light Rail Station,Yoga Studio,Auto Workshop,Smoke Shop,Spa,Brewery,Burrito Place,Farmers Market,Fast Food Restaurant,Restaurant
101,M8Y,Etobicoke,Old Mill South / King's Mill Park / Sunnylea /...,43.636258,-79.498509,1.0,Locksmith,Construction & Landscaping,Baseball Field,Women's Store,Department Store,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Dog Run


In [42]:
toronto_merged['Cluster Labels'] = toronto_merged['Cluster Labels'].astype(float).astype(int)
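
The labels come out of the join as floats because a left join fills unmatched rows with NaN, which forces the integer column to float. The toy example below (neighborhoods `A`, `B`, `C` are made up) reproduces the effect and the cleanup:

```python
import pandas as pd

hoods = pd.DataFrame({'Neighborhood': ['A', 'B', 'C']})
labels = pd.DataFrame({'Neighborhood': ['A', 'C'], 'Cluster Labels': [0, 1]})

# 'B' has no label, so it gets NaN and the column becomes float64
joined = hoods.join(labels.set_index('Neighborhood'), on='Neighborhood')
print(joined['Cluster Labels'].dtype)

# drop the unlabeled rows, then the labels can be integers again
cleaned = joined.dropna(subset=['Cluster Labels']).copy()
cleaned['Cluster Labels'] = cleaned['Cluster Labels'].astype(int)
print(cleaned)
```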

In [43]:
toronto_merged

Unnamed: 0,Postal code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,0,Park,Construction & Landscaping,Food & Drink Shop,Women's Store,Deli / Bodega,Eastern European Restaurant,Drugstore,Donut Shop,Dog Run,Distribution Center
1,M4A,North York,Victoria Village,43.725882,-79.315572,1,Hockey Arena,Coffee Shop,Pizza Place,French Restaurant,Intersection,Portuguese Restaurant,Dim Sum Restaurant,Deli / Bodega,Department Store,Dessert Shop
2,M5A,Downtown Toronto,Regent Park / Harbourfront,43.654260,-79.360636,1,Coffee Shop,Park,Breakfast Spot,Bakery,Mexican Restaurant,Restaurant,Café,Pub,Chocolate Shop,Performing Arts Venue
3,M6A,North York,Lawrence Manor / Lawrence Heights,43.718518,-79.464763,1,Clothing Store,Women's Store,Furniture / Home Store,Gift Shop,Coffee Shop,Miscellaneous Shop,Boutique,Event Space,Accessories Store,Vietnamese Restaurant
4,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government,43.662301,-79.389494,1,Coffee Shop,Sushi Restaurant,Diner,Mexican Restaurant,Beer Bar,Spa,Sandwich Place,Burger Joint,Burrito Place,Creperie
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
98,M8X,Etobicoke,The Kingsway / Montgomery Road / Old Mill North,43.653654,-79.506944,0,Park,River,College Stadium,Dance Studio,Eastern European Restaurant,Drugstore,Donut Shop,Dog Run,Distribution Center,Discount Store
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160,1,Salon / Barbershop,Thai Restaurant,Juice Bar,Ramen Restaurant,Pub,Hobby Shop,Diner,Smoke Shop,Café,Sushi Restaurant
100,M7Y,East Toronto,Business reply mail Processing CentrE,43.662744,-79.321558,1,Light Rail Station,Yoga Studio,Auto Workshop,Smoke Shop,Spa,Brewery,Burrito Place,Farmers Market,Fast Food Restaurant,Restaurant
101,M8Y,Etobicoke,Old Mill South / King's Mill Park / Sunnylea /...,43.636258,-79.498509,1,Locksmith,Construction & Landscaping,Baseball Field,Women's Store,Department Store,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Dog Run


In [44]:
# create a map centered on Toronto (latitude and longitude were defined earlier)
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# build one hex color per cluster from the rainbow colormap
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(c) for c in colors_array]

# add one circle marker per neighborhood, colored by its cluster label
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],  # index by the label itself so cluster 0 gets the first color
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

In [45]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,North York,0,Park,Construction & Landscaping,Food & Drink Shop,Women's Store,Deli / Bodega,Eastern European Restaurant,Drugstore,Donut Shop,Dog Run,Distribution Center
21,York,0,Park,Pool,Women's Store,Gay Bar,Cupcake Shop,Donut Shop,Dog Run,Distribution Center,Discount Store,Diner
35,East York,0,Park,Convenience Store,Women's Store,Deli / Bodega,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Dog Run,Distribution Center
66,North York,0,Park,Convenience Store,Bank,Deli / Bodega,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Dog Run,Distribution Center
85,Scarborough,0,Park,Playground,Curling Ice,Eastern European Restaurant,Drugstore,Donut Shop,Dog Run,Distribution Center,Discount Store,Diner
91,Downtown Toronto,0,Park,Playground,Trail,Curling Ice,Eastern European Restaurant,Drugstore,Donut Shop,Dog Run,Distribution Center,Discount Store
98,Etobicoke,0,Park,River,College Stadium,Dance Studio,Eastern European Restaurant,Drugstore,Donut Shop,Dog Run,Distribution Center,Discount Store


In [46]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,North York,1,Hockey Arena,Coffee Shop,Pizza Place,French Restaurant,Intersection,Portuguese Restaurant,Dim Sum Restaurant,Deli / Bodega,Department Store,Dessert Shop
2,Downtown Toronto,1,Coffee Shop,Park,Breakfast Spot,Bakery,Mexican Restaurant,Restaurant,Café,Pub,Chocolate Shop,Performing Arts Venue
3,North York,1,Clothing Store,Women's Store,Furniture / Home Store,Gift Shop,Coffee Shop,Miscellaneous Shop,Boutique,Event Space,Accessories Store,Vietnamese Restaurant
4,Downtown Toronto,1,Coffee Shop,Sushi Restaurant,Diner,Mexican Restaurant,Beer Bar,Spa,Sandwich Place,Burger Joint,Burrito Place,Creperie
6,Scarborough,1,Fast Food Restaurant,Women's Store,Deli / Bodega,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Dog Run,Distribution Center,Discount Store
...,...,...,...,...,...,...,...,...,...,...,...,...
97,Downtown Toronto,1,Café,Restaurant,Coffee Shop,Seafood Restaurant,Tea Room,Gastropub,Gym / Fitness Center,Bookstore,Speakeasy,Pub
99,Downtown Toronto,1,Salon / Barbershop,Thai Restaurant,Juice Bar,Ramen Restaurant,Pub,Hobby Shop,Diner,Smoke Shop,Café,Sushi Restaurant
100,East Toronto,1,Light Rail Station,Yoga Studio,Auto Workshop,Smoke Shop,Spa,Brewery,Burrito Place,Farmers Market,Fast Food Restaurant,Restaurant
101,Etobicoke,1,Locksmith,Construction & Landscaping,Baseball Field,Women's Store,Department Store,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Dog Run


In [47]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,Etobicoke,2,Home Service,Women's Store,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Dog Run,Distribution Center,Discount Store,Diner


The first cluster (label 0) is dominated by parks and looks more family-oriented than the second cluster (label 1), which has more cafés, coffee shops and restaurants. The third cluster (label 2) contains a single neighborhood, so it can be merged into one of the other two. It will probably fit the second one, since Park does not appear among its ten most common venues.
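
The comparison above can also be checked with numbers instead of by eye. The following is a minimal sketch, not part of the original analysis: it counts the neighborhoods in each cluster and finds the most frequent top-ranked venue per cluster. The `sample` DataFrame is a hypothetical stand-in built from a few rows copied from the outputs above; in the notebook, `toronto_merged` would take its place.

```python
import pandas as pd

# Hypothetical stand-in for toronto_merged: a few rows copied from the
# cluster tables above (three from cluster 0, three from cluster 1, and
# the single row of cluster 2).
sample = pd.DataFrame({
    "Cluster Labels": [0, 0, 0, 1, 1, 1, 2],
    "1st Most Common Venue": [
        "Park", "Park", "Park",
        "Hockey Arena", "Coffee Shop", "Coffee Shop",
        "Home Service",
    ],
})

# Number of neighborhoods assigned to each cluster.
cluster_sizes = sample["Cluster Labels"].value_counts().sort_index()

# Most frequent top-ranked venue category within each cluster.
top_venue = (
    sample.groupby("Cluster Labels")["1st Most Common Venue"]
    .agg(lambda s: s.value_counts().idxmax())
)

summary = pd.DataFrame({"neighborhoods": cluster_sizes, "top venue": top_venue})
print(summary)
```

On the full `toronto_merged` table, a cluster with very few neighborhoods in this summary (such as label 2 here) is a candidate for merging, or a hint that a smaller `kclusters` may be enough.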