# Segmenting and Clustering Neighborhoods in Toronto

## Introduction

## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

1. <a href="#menu1">Download and Explore Dataset</a>

2. <a href="#menu2">Explore Neighborhoods in City of Toronto</a>

3. <a href="#menu3">Analyze Each Neighborhood</a>

4. <a href="#menu4">Cluster Neighborhoods</a>

5. <a href="#menu5">Examine Clusters</a>    
</font>
</div>

Importing libraries

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
import geopandas as gpd #geospatial data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values


import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import since pandas 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib.pyplot as plt

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

print('Libraries imported.')

Libraries imported.


<a id='menu1'></a>

## 1. Download and Explore Dataset

__Data source__:

1. Community Council: https://portal0.cf.opendata.inter.sandbox-toronto.ca/dataset/community-council-boundaries/
2. Wiki: https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Toronto

In [2]:
gdf_wards = gpd.read_file('data/Community Council Boundaries Data.geojson')
gdf_wards.head()

Unnamed: 0,_id,AREA_ID,AREA_ATTR_ID,PARENT_AREA_ID,AREA_SHORT_CODE,AREA_LONG_CODE,AREA_NAME,AREA_DESC,X,Y,LONGITUDE,LATITUDE,OBJECTID,Shape__Area,Shape__Length,geometry
0,25,2476922,26002906,,EA,EA,Scarborough Community Council,Scarborough Community Council,325977.228,4848710.992,-79.236918,43.778057,17503889,361646300.0,99909.825113,POLYGON ((-79.15179162455914 43.81409053376601...
1,26,2476921,26002905,,SO,SO,Toronto and East York Community Council,Toronto and East York Community Council,313987.302,4836217.155,-79.38608,43.665843,17503905,224308000.0,127947.22322,POLYGON ((-79.29863870134947 43.71514786351841...
2,27,2476920,26002904,,NO,NO,North York Community Council,North York Community Council,312734.494,4845517.247,-79.401477,43.74957,17503921,298087100.0,82762.050989,POLYGON ((-79.31326282774432 43.75221330812462...
3,28,2476919,26002903,,WE,WE,Etobicoke York Community Council,Etobicoke York Community Council,301398.386,4838724.172,-79.542195,43.688458,17503937,347140900.0,124437.576898,POLYGON ((-79.48847930500921 43.75332810175874...


Map of the wards

In [3]:
map_city = folium.Map(location=[43.6532, -79.3832], zoom_start=10)

geojson_wards = gdf_wards.to_crs(epsg=4326)

feat_wards = folium.features.GeoJson(geojson_wards)

map_city.add_child(feat_wards)

for lat, long, label in zip(gdf_wards.LATITUDE, gdf_wards.LONGITUDE, gdf_wards.AREA_NAME):
        folium.CircleMarker(
            [lat, long],
            radius=5,
            color='yellow',
            fill=True,
            popup=label,
            fill_color='blue',
            fill_opacity=0.6
        ).add_to(map_city)

map_city

Next, we scrape Toronto's neighborhood data from Wikipedia, starting with the table of postal codes.

In [4]:
from bs4 import BeautifulSoup

In [51]:
wiki = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

r = requests.get(wiki)

In [81]:
soup = BeautifulSoup(r.content, 'html.parser')  # name the parser explicitly to avoid a warning
trs = soup.find('table', 'wikitable sortable').find_all('tr')
rows = []
for tr in trs:
    tds = tr.find_all('td')
    if len(tds) == 0:  # the header row has only <th> cells
        continue
    postcode = tds[0].get_text().strip()
    borough = tds[1].get_text().strip()
    neighborhood = tds[2].get_text().strip()
    # skip postcodes that have no borough assigned
    if borough != 'Not assigned':
        rows.append({'postcode': postcode, 'borough': borough, 'neighborhood': neighborhood})

# build the dataframe once; DataFrame.append was removed in pandas 2.0
df_postcode = pd.DataFrame(rows, columns=['postcode', 'borough', 'neighborhood'])

print(df_postcode.shape[0])
df_postcode.head()

211


Unnamed: 0,postcode,borough,neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M5A,Downtown Toronto,Regent Park
4,M6A,North York,Lawrence Heights


Find neighborhoods marked 'Not assigned' and replace them with the borough name

In [107]:
df_postcode[df_postcode.neighborhood=='Not assigned']

Unnamed: 0,postcode,borough,neighborhood


In [108]:
df_postcode['neighborhood'] = df_postcode.apply(lambda x: x.borough if x.neighborhood == 'Not assigned' else x.neighborhood, axis=1)

In [109]:
df_postcode[df_postcode.neighborhood=='Not assigned']

Unnamed: 0,postcode,borough,neighborhood
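The row-wise `apply` above can also be written as a vectorized column operation. The following is a minimal sketch on a toy frame (the three rows are hypothetical, not taken from the scraped table), using `Series.mask` to substitute the borough wherever the neighborhood is 'Not assigned'.

```python
import pandas as pd

# Hypothetical toy data, for illustration only
toy = pd.DataFrame({
    'postcode': ['M1A', 'M2A', 'M3A'],
    'borough': ["Queen's Park", 'North York', 'North York'],
    'neighborhood': ['Not assigned', 'Parkwoods', 'Not assigned'],
})

# Boolean mask of the rows whose neighborhood is missing
missing = toy['neighborhood'] == 'Not assigned'

# Where the mask is True, take the value from the borough column instead
toy['neighborhood'] = toy['neighborhood'].mask(missing, toy['borough'])

print(toy)
```

On a frame of a few hundred rows both approaches are fast; the vectorized form mainly makes the intent (replace where a condition holds) easier to read.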


In [110]:
df_postcode.groupby('postcode').count()

postcode,borough,neighborhood
M1B,2,2
M1C,3,3
M1E,3,3
M1G,1,1
M1H,1,1
M1J,1,1
M1K,3,3
M1L,3,3
M1M,3,3
M1N,2,2


In [91]:
df_postcode.groupby('neighborhood').count()

neighborhood,postcode,borough
Adelaide,1,1
Agincourt,1,1
Agincourt North,1,1
Albion Gardens,1,1
Alderwood,1,1
Bathurst Manor,1,1
Bathurst Quay,1,1
Bayview Village,1,1
Beaumond Heights,1,1
Bedford Park,1,1


In [6]:
# DISTRICT = 'Scarborough'
DISTRICT = 'Etobicoke'

# NOTE: `r` is assumed to hold the response for the Wikipedia list of
# Toronto neighbourhoods (data source 2 above), not the postal-code page.
soup = BeautifulSoup(r.content, 'html.parser')
district = soup.find(id=DISTRICT)

# create the geocoder once instead of on every iteration
geolocator = Nominatim(user_agent="toronto_explorer")

rows = []
lis = district.parent.find_next_sibling('div').find_all('li')
for li in lis:
    text = li.a.get_text()
    try:
        location = geolocator.geocode("{}, {}, Toronto, Canada".format(text, DISTRICT))
        latitude = location.latitude
        longitude = location.longitude
    except Exception:  # lookup failed or returned None; skip this neighborhood
        continue
    print("The geographical coordinates of {} are {}, {}.".format(text, latitude, longitude))
    rows.append({'neighborhood': text, 'latitude': latitude, 'longitude': longitude})

# build the dataframe once; DataFrame.append was removed in pandas 2.0
df_neighbourhood = pd.DataFrame(rows, columns=['neighborhood', 'latitude', 'longitude'])


The geographical coordinates of Alderwood are 43.6017173, -79.5452325.
The geographical coordinates of Centennial Park are 43.65359045, -79.5888688813873.
The geographical coordinates of Eatonville are 43.6462843, -79.5600005.
The geographical coordinates of The Elms are 43.6969975, -79.5218834.
The geographical coordinates of Eringate are 43.6622732, -79.5765162.
The geographical coordinates of Humber Bay are 43.6400463, -79.495028.
The geographical coordinates of Humber Heights – Westmount are 43.6860721, -79.5288038497315.
The geographical coordinates of Humber Valley Village are 43.6664717, -79.5243136.
The geographical coordinates of Humberwood are 43.722525, -79.5460241478979.
The geographical coordinates of Islington–City Centre West are 43.6487953, -79.5490002.
The geographical coordinates of Kingsview Village are 43.6995391, -79.5563459.
The geographical coordinates of The Kingsway are 43.6473811, -79.5113328.
The geographical coordinates of Long Branch are 43.5930751, -79.541212.
The geog
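The geocoding loop has two failure modes: the service can raise an error, or it can return `None` when nothing matches the query. The sketch below separates the two. The helper `safe_geocode` and the stand-in `fake_geocode` are hypothetical names introduced here so the example runs without network access; in the notebook, `geolocator.geocode` would be passed in instead.

```python
from collections import namedtuple

# Stand-in for the location object a geocoder returns (demonstration only)
FakeLocation = namedtuple('FakeLocation', ['latitude', 'longitude'])


def safe_geocode(geocode_fn, query):
    """Return (latitude, longitude) for query, or None when the lookup fails."""
    try:
        location = geocode_fn(query)
    except Exception:  # e.g. a timeout or service error raised by the geocoder
        return None
    if location is None:  # the geocoder found no match for the query
        return None
    return location.latitude, location.longitude


def fake_geocode(query):
    """Hypothetical offline geocoder that knows a single address."""
    known = {
        'Alderwood, Etobicoke, Toronto, Canada': FakeLocation(43.6017173, -79.5452325),
    }
    return known.get(query)


found = safe_geocode(fake_geocode, 'Alderwood, Etobicoke, Toronto, Canada')
missing = safe_geocode(fake_geocode, 'Nowhere, Etobicoke, Toronto, Canada')
print(found, missing)
```

Returning `None` for a failed lookup makes it easy to log which neighborhoods were skipped, rather than dropping them silently.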

Neighborhoods scraped from Wikipedia:

In [7]:
df_neighbourhood

Unnamed: 0,neighborhood,latitude,longitude
0,Alderwood,43.601717,-79.545232
1,Centennial Park,43.65359,-79.588869
2,Eatonville,43.646284,-79.560001
3,The Elms,43.696998,-79.521883
4,Eringate,43.662273,-79.576516
5,Humber Bay,43.640046,-79.495028
6,Humber Heights – Westmount,43.686072,-79.528804
7,Humber Valley Village,43.666472,-79.524314
8,Humberwood,43.722525,-79.546024
9,Islington–City Centre West,43.648795,-79.549


Let's add a neighborhood layer on top of the district boundaries on the map.

In [8]:
map_city = folium.Map(location=[43.6532, -79.3832], zoom_start=10)
# gpd_wards.plot(figsize=(20,20))
# plt.show()

geojson_wards = gdf_wards.to_crs(epsg=4326)

feat_wards = folium.features.GeoJson(geojson_wards)

map_city.add_child(feat_wards)

for lat, long, label in zip(df_neighbourhood.latitude, df_neighbourhood.longitude, df_neighbourhood.neighborhood):
        folium.CircleMarker(
            [lat, long],
            radius=5,
            color='yellow',
            fill=True,
            popup=label,
            fill_color='blue',
            fill_opacity=0.6
        ).add_to(map_city)

map_city

#### Define Foursquare Credentials and Version

In [9]:
CLIENT_ID = 'xxxx' # your Foursquare ID
CLIENT_SECRET = 'xxxx' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: xxxx
CLIENT_SECRET: xxxx


#### Let's explore the first neighborhood in our dataframe.

In [10]:
neighborhood_latitude, neighborhood_longitude = df_neighbourhood.iloc[0].latitude, df_neighbourhood.iloc[0].longitude
url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v=20180605&ll={},{}&radius=500&limit=100".format(CLIENT_ID, CLIENT_SECRET, neighborhood_latitude, neighborhood_longitude)
print(url)

https://api.foursquare.com/v2/venues/explore?client_id=xxxx&client_secret=xxxx&v=20180605&ll=43.6017173,-79.5452325&radius=500&limit=100
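The request URL above is assembled by string formatting. As an alternative sketch, the query string can be built from a dictionary with the standard library, which handles escaping and keeps each parameter visible by name. The function name `build_explore_url` is hypothetical, and the credentials are placeholders.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Assemble an explore URL from named parameters (illustrative helper)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),  # latitude,longitude pair
        'radius': radius,                # search radius in metres
        'limit': limit,                  # maximum number of venues to return
    }
    return BASE_URL + '?' + urlencode(params)


url = build_explore_url('xxxx', 'xxxx', '20180605', 43.6017173, -79.5452325)

# Parse the query string back to check that every parameter survived encoding
query = parse_qs(urlsplit(url).query)
print(query['ll'], query['radius'], query['limit'])
```

With `requests`, the same dictionary can be passed directly as `requests.get(BASE_URL, params=params)`, which avoids building the URL by hand at all.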


In [11]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5cd0375ef594df21bfbafa31'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Alderwood',
  'headerFullLocation': 'Alderwood, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 9,
  'suggestedBounds': {'ne': {'lat': 43.6062173045, 'lng': -79.53902992444722},
   'sw': {'lat': 43.597217295499995, 'lng': -79.55143507555277}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4c47927c1ddec928fbec9d32',
       'name': 'Il Paesano Pizzeria & Restaurant',
       'location': {'address': '396 Browns Line',
        'crossStreet': 'at Horner Ave',
        'lat': 43.60128,
        'lng': -79.545028,
        'labeledLatLngs': [{'label': '

We borrow the `get_category_type` function from the course.

In [12]:
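# function that extracts the category of the venue
def get_category_type(row):
    # the category list is stored under 'categories' or 'venue.categories',
    # depending on how the JSON was flattened
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    # a venue may have no category at all
    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']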

In [13]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Il Paesano Pizzeria & Restaurant,Pizza Place,43.60128,-79.545028
1,Timothy's Pub,Pub,43.600165,-79.544699
2,Toronto Gymnastics International,Gym,43.599832,-79.542924
3,Tim Hortons,Coffee Shop,43.602396,-79.545048
4,Subway,Sandwich Place,43.599262,-79.54434


In [14]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

9 venues were returned by Foursquare.


<a id='menu2'></a>

## 2. Explore Neighborhoods in City of Toronto

Again, we borrow a function from the course.

In [15]:
LIMIT = 100
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])


    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues

In [16]:
district_venues = getNearbyVenues(names=df_neighbourhood['neighborhood'], latitudes=df_neighbourhood['latitude'],longitudes=df_neighbourhood['longitude'],radius=1000)

Alderwood
Centennial Park
Eatonville
The Elms
Eringate
Humber Bay
Humber Heights – Westmount
Humber Valley Village
Humberwood
Islington–City Centre West
Kingsview Village
The Kingsway
Long Branch
Markland Wood
Mimico
New Toronto
Princess Gardens
Rexdale
Richview
Smithfield
Stonegate-Queensway
Sunnylea
Thistletown
Thorncrest Village
West Deane Park
Willowridge
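`getNearbyVenues` indexes `categories[0]` directly, which raises an `IndexError` for any venue that has an empty category list. The following sketch shows one way to flatten response items defensively. The helper names and the sample items are hypothetical, and the dictionary layout simply mirrors the fields the function above reads.

```python
def first_category(venue):
    """Return the first category name, or None when the venue has none."""
    categories = venue.get('categories') or []
    return categories[0]['name'] if categories else None


def flatten_items(name, lat, lng, items):
    """Turn explore-style items into one tuple per venue."""
    rows = []
    for item in items:
        venue = item['venue']
        rows.append((
            name,
            lat,
            lng,
            venue['name'],
            venue['location']['lat'],
            venue['location']['lng'],
            first_category(venue),
        ))
    return rows


# Hand-written sample items (the second venue is made up and has no category)
sample_items = [
    {'venue': {'name': 'Tim Hortons',
               'location': {'lat': 43.602396, 'lng': -79.545048},
               'categories': [{'name': 'Coffee Shop'}]}},
    {'venue': {'name': 'Unnamed Spot',
               'location': {'lat': 43.6, 'lng': -79.54},
               'categories': []}},
]

rows = flatten_items('Alderwood', 43.601717, -79.545232, sample_items)
for row in rows:
    print(row)
```

Rows with a `None` category can then be dropped or kept explicitly before the one-hot encoding step.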


The total number of venues returned:

In [17]:
district_venues.shape[0]

514

In [18]:
district_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Alderwood,43.601717,-79.545232,Il Paesano Pizzeria & Restaurant,43.60128,-79.545028,Pizza Place
1,Alderwood,43.601717,-79.545232,Timothy's Pub,43.600165,-79.544699,Pub
2,Alderwood,43.601717,-79.545232,Toronto Gymnastics International,43.599832,-79.542924,Gym
3,Alderwood,43.601717,-79.545232,Farm Boy,43.610012,-79.547581,Grocery Store
4,Alderwood,43.601717,-79.545232,Dollarama,43.60951,-79.547686,Discount Store


Let's check how many venues were returned for each neighborhood

In [19]:
district_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Alderwood,22,22,22,22,22,22
Centennial Park,28,28,28,28,28,28
Eatonville,15,15,15,15,15,15
Eringate,9,9,9,9,9,9
Humber Bay,4,4,4,4,4,4
Humber Heights – Westmount,15,15,15,15,15,15
Humber Valley Village,23,23,23,23,23,23
Humberwood,5,5,5,5,5,5
Islington–City Centre West,8,8,8,8,8,8
Kingsview Village,21,21,21,21,21,21
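Because `count()` reports the same number in every column, a single count per neighborhood is often easier to read. A minimal sketch with `groupby(...).size()` on a toy frame (the rows are illustrative, and 'Hypothetical Marina' is a made-up venue):

```python
import pandas as pd

# Toy stand-in for district_venues, for illustration only
toy_venues = pd.DataFrame({
    'Neighborhood': ['Alderwood', 'Alderwood', 'Humber Bay'],
    'Venue': ['Tim Hortons', 'Subway', 'Hypothetical Marina'],
})

# size() returns one number per group instead of one per column
venue_counts = (
    toy_venues.groupby('Neighborhood')
    .size()
    .sort_values(ascending=False)
)

print(venue_counts)
```

Sorting the result puts sparsely covered neighborhoods at the bottom, which helps when judging how much weight their frequencies deserve.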


#### Let's find out how many unique categories can be curated from all the returned venues

In [20]:
print('There are {} unique categories.'.format(len(district_venues['Venue Category'].unique())))

There are 114 unique categories.


<a id='menu3'></a>

## 3. Analyze Each Neighborhood

In [21]:
# one hot encoding
district_onehot = pd.get_dummies(district_venues[['Venue Category']], prefix="", prefix_sep="")

#Add Neighborhood name
district_onehot = pd.concat([district_venues[['Neighborhood']], district_onehot], axis=1)

district_onehot.head()

Unnamed: 0,Neighborhood,Adult Boutique,American Restaurant,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bank,Bar,Baseball Field,Basketball Court,Beer Store,Breakfast Spot,Buffet,Building,Burger Joint,Burrito Place,Bus Line,Bus Stop,Café,Camera Store,Caribbean Restaurant,Chinese Restaurant,Clothing Store,Coffee Shop,College Rec Center,Concert Hall,Convenience Store,Cosmetics Shop,Dance Studio,Department Store,Dessert Shop,Diner,Discount Store,Donut Shop,Eastern European Restaurant,Electronics Store,Event Service,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Flower Shop,Food Truck,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Garden,Gastropub,Golf Course,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Hardware Store,History Museum,Hobby Shop,Hockey Arena,Hotel,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Intersection,Italian Restaurant,Japanese Restaurant,Laundromat,Light Rail Station,Liquor Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Moroccan Restaurant,Movie Theater,Optical Shop,Park,Pet Store,Pharmacy,Pizza Place,Plaza,Pool,Pool Hall,Pub,Racetrack,Record Shop,Recreation Center,Restaurant,River,Salon / Barbershop,Sandwich Place,Seafood Restaurant,Shopping Mall,Skating Rink,Ski Area,Ski Chalet,Smoothie Shop,Soccer Field,Social Club,South American Restaurant,Spa,Sporting Goods Shop,Stadium,Steakhouse,Storage Facility,Supermarket,Supplement Shop,Sushi Restaurant,Tapas Restaurant,Thai Restaurant,Theater,Thrift / Vintage Store,Train Station,Video Game Store,Wings Joint,Yoga Studio
0,Alderwood,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Alderwood,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Alderwood,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Alderwood,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Alderwood,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


And let's examine the new dataframe size.

In [22]:
district_onehot.shape

(514, 115)

#### Next, let's group rows by neighborhood and take the mean of the frequency of occurrence of each category

In [23]:
district_grouped = district_onehot.groupby('Neighborhood').mean().reset_index()
district_grouped.head()

Unnamed: 0,Neighborhood,Adult Boutique,American Restaurant,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bank,Bar,Baseball Field,Basketball Court,Beer Store,Breakfast Spot,Buffet,Building,Burger Joint,Burrito Place,Bus Line,Bus Stop,Café,Camera Store,Caribbean Restaurant,Chinese Restaurant,Clothing Store,Coffee Shop,College Rec Center,Concert Hall,Convenience Store,Cosmetics Shop,Dance Studio,Department Store,Dessert Shop,Diner,Discount Store,Donut Shop,Eastern European Restaurant,Electronics Store,Event Service,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Flower Shop,Food Truck,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Garden,Gastropub,Golf Course,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Hardware Store,History Museum,Hobby Shop,Hockey Arena,Hotel,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Intersection,Italian Restaurant,Japanese Restaurant,Laundromat,Light Rail Station,Liquor Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Moroccan Restaurant,Movie Theater,Optical Shop,Park,Pet Store,Pharmacy,Pizza Place,Plaza,Pool,Pool Hall,Pub,Racetrack,Record Shop,Recreation Center,Restaurant,River,Salon / Barbershop,Sandwich Place,Seafood Restaurant,Shopping Mall,Skating Rink,Ski Area,Ski Chalet,Smoothie Shop,Soccer Field,Social Club,South American Restaurant,Spa,Sporting Goods Shop,Stadium,Steakhouse,Storage Facility,Supermarket,Supplement Shop,Sushi Restaurant,Tapas Restaurant,Thai Restaurant,Theater,Thrift / Vintage Store,Train Station,Video Game Store,Wings Joint,Yoga Studio
0,Alderwood,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.136364,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.045455,0.0,0.090909,0.090909,0.0,0.045455,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.045455,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Centennial Park,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.107143,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.035714,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.071429,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.035714,0.0,0.0,0.0,0.035714,0.035714,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.035714,0.0,0.035714,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Eatonville,0.0,0.066667,0.0,0.0,0.0,0.0,0.133333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.066667,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.133333,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0
3,Eringate,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.111111,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.111111,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Humber Bay,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


#### Let's confirm the new size

In [24]:
district_grouped.shape

(26, 115)
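To see why the mean of one-hot columns is a frequency, here is a small worked sketch on a toy frame (hypothetical data). Each row is one venue, so the mean of a 0/1 column within a neighborhood is the share of that neighborhood's venues in the category. The same table can be obtained with `pd.crosstab(..., normalize='index')`, which is used here as a cross-check.

```python
import numpy as np
import pandas as pd

# Toy data: neighborhood A has two parks and one pub, B has a single pub
toy = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B'],
    'Venue Category': ['Park', 'Park', 'Pub', 'Pub'],
})

# One 0/1 column per category, one row per venue
onehot = pd.get_dummies(toy['Venue Category']).astype(int)
onehot.insert(0, 'Neighborhood', toy['Neighborhood'])

# Mean of the 0/1 columns = fraction of venues in each category
freq = onehot.groupby('Neighborhood').mean()

# Cross-check: row-normalized contingency table
ct = pd.crosstab(toy['Neighborhood'], toy['Venue Category'], normalize='index')

print(freq)
```

For neighborhood A the Park frequency is 2/3 and the Pub frequency is 1/3, so every row of the grouped table sums to 1.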

#### Let's print each neighborhood along with the top 5 most common venues

In [25]:
num_top_venues = 5

for hood in district_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = district_grouped[district_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Alderwood----
            venue  freq
0  Discount Store  0.14
1     Pizza Place  0.09
2        Pharmacy  0.09
3   Grocery Store  0.09
4            Pool  0.05


----Centennial Park----
                venue  freq
0      Baseball Field  0.14
1         Coffee Shop  0.11
2               Hotel  0.07
3   Fish & Chips Shop  0.04
4  College Rec Center  0.04


----Eatonville----
               venue  freq
0               Bank  0.13
1        Pizza Place  0.13
2  Recreation Center  0.07
3      Grocery Store  0.07
4     Clothing Store  0.07


----Eringate----
                venue  freq
0                 Pub  0.11
1   Convenience Store  0.11
2                Park  0.11
3  Chinese Restaurant  0.11
4          Beer Store  0.11


----Humber Bay----
             venue  freq
0            River  0.25
1  Harbor / Marina  0.25
2    Shopping Mall  0.25
3             Park  0.25
4    Movie Theater  0.00


----Humber Heights – Westmount----
                       venue  freq
0                Pizza Place  0

#### Let's put that into a *pandas* dataframe

First, let's write a function to sort the venues in descending order.

In [26]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
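As a quick illustration of what this function does, the sketch below applies the same steps to a single hand-made row (the neighborhood name and frequencies are hypothetical). It also shows `Series.nlargest` as an equivalent shortcut once the values are cast to float.

```python
import pandas as pd

# One row shaped like district_grouped: a name followed by category frequencies
row = pd.Series({'Neighborhood': 'Toyville', 'Park': 0.5, 'Pub': 0.3, 'Bank': 0.2})

# Drop the name, sort the remaining frequencies, keep the top two labels
frequencies = row.iloc[1:].astype(float)
top_two = frequencies.sort_values(ascending=False).index.values[:2]

# nlargest combines the sort and the slice in one call
top_two_alt = frequencies.nlargest(2).index.tolist()

print(list(top_two), top_two_alt)
```

Note that categories tied at the same frequency (including zero) have no guaranteed order under the default sort, so low-ranked "most common" venues can be arbitrary for neighborhoods with few venues.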

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [27]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # only the first three ordinals have special suffixes
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = district_grouped['Neighborhood']

for ind in np.arange(district_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(district_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Alderwood,Discount Store,Grocery Store,Pizza Place,Pharmacy,Gym,Donut Shop,Pub,Convenience Store,Park,Coffee Shop
1,Centennial Park,Baseball Field,Coffee Shop,Hotel,Pizza Place,Ski Area,Chinese Restaurant,College Rec Center,Record Shop,Pub,Hockey Arena
2,Eatonville,Bank,Pizza Place,Recreation Center,Convenience Store,Coffee Shop,Clothing Store,Farmers Market,Fish & Chips Shop,Mexican Restaurant,Gym
3,Eringate,Pub,Convenience Store,Pizza Place,Coffee Shop,Eastern European Restaurant,Chinese Restaurant,Electronics Store,Beer Store,Park,Fish & Chips Shop
4,Humber Bay,Harbor / Marina,River,Shopping Mall,Park,Yoga Studio,Fast Food Restaurant,Department Store,Dessert Shop,Diner,Discount Store
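The `indicators` list produces correct column names for up to 20 top venues, but beyond that it would yield labels such as "21th". A small sketch of a general ordinal helper follows; the function name `ordinal` is hypothetical and not part of the course code.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th are irregular
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


columns = ['Neighborhood'] + [
    '{} Most Common Venue'.format(ordinal(i)) for i in range(1, 4)
]
print(columns)
```

For the ten columns used here both approaches give the same names, so this only matters if `num_top_venues` grows.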


<a id='menu4'></a>

## 4. Cluster Neighborhoods

Run *k*-means to cluster the neighborhoods into 5 clusters.

In [28]:
# set number of clusters
kclusters = 5

district_grouped_clustering = district_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(district_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 1, 1, 3, 0, 0, 2, 4, 1])
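For intuition about what the clustering step computes, here is a minimal NumPy sketch of the basic *k*-means iteration (Lloyd's algorithm): assign each point to its nearest center, then move each center to the mean of its points. This is an illustration only, not scikit-learn's implementation, which by default uses a smarter initialization (k-means++). The function name `lloyd_kmeans` and the toy points are hypothetical.

```python
import numpy as np


def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd iterations; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    # start from k distinct data points chosen at random
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # distance from every point to every center, shape (n_points, k)
        distances = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # move each center to the mean of its points (keep it if it has none)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # assignments can no longer change
        centers = new_centers
    return labels, centers


# Two well-separated toy groups
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
labels, centers = lloyd_kmeans(X, k=2)
print(labels)
```

The label numbers themselves are arbitrary: only which points share a label is meaningful, which is why the clusters in this notebook still need to be interpreted from their venue profiles.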

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [29]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

district_merged = df_neighbourhood

# join the cluster labels and top venues onto the neighborhood coordinates
district_merged = district_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='neighborhood')

district_merged.head() # check the last columns!

Unnamed: 0,neighborhood,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Alderwood,43.601717,-79.545232,1,Discount Store,Grocery Store,Pizza Place,Pharmacy,Gym,Donut Shop,Pub,Convenience Store,Park,Coffee Shop
1,Centennial Park,43.65359,-79.588869,1,Baseball Field,Coffee Shop,Hotel,Pizza Place,Ski Area,Chinese Restaurant,College Rec Center,Record Shop,Pub,Hockey Arena
2,Eatonville,43.646284,-79.560001,1,Bank,Pizza Place,Recreation Center,Convenience Store,Coffee Shop,Clothing Store,Farmers Market,Fish & Chips Shop,Mexican Restaurant,Gym
3,The Elms,43.696998,-79.521883,1,Pizza Place,Coffee Shop,Train Station,Discount Store,Sandwich Place,Pharmacy,Chinese Restaurant,Café,Soccer Field,Laundromat
4,Eringate,43.662273,-79.576516,1,Pub,Convenience Store,Pizza Place,Coffee Shop,Eastern European Restaurant,Chinese Restaurant,Electronics Store,Beer Store,Park,Fish & Chips Shop


Finally, let's visualize the resulting clusters

In [30]:
# create map
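# center the map on the mean neighborhood coordinates rather than on
# variables left over from the geocoding loop
map_clusters = folium.Map(location=[district_merged['latitude'].mean(), district_merged['longitude'].mean()], zoom_start=11)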

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(district_merged['latitude'], district_merged['longitude'], district_merged['neighborhood'], district_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

<a id='menu5'></a>

## 5. Examine Clusters

Now we can examine each cluster and identify the venue categories that distinguish it. Based on those defining categories, each cluster can then be given a name; that step is left as an exercise.
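One simple way to start naming clusters is to look at the most frequent top-ranked venue category within each one. The sketch below does this on a toy frame with the same column names as `district_merged`; the rows are hypothetical and chosen so that no ties occur.

```python
import pandas as pd

# Toy stand-in for district_merged, for illustration only
toy_merged = pd.DataFrame({
    'neighborhood': ['A', 'B', 'C', 'D', 'E'],
    'Cluster Labels': [0, 0, 0, 1, 1],
    '1st Most Common Venue': ['Park', 'Park', 'Pizza Place',
                              'Coffee Shop', 'Coffee Shop'],
})

# For each cluster, pick the category that appears most often in first place
summary = (
    toy_merged.groupby('Cluster Labels')['1st Most Common Venue']
    .agg(lambda s: s.value_counts().idxmax())
)

print(summary)
```

This is only a first pass: clusters with a single neighborhood, or with ties between categories, still need to be read by hand.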

#### Cluster 1

In [31]:
district_merged.loc[district_merged['Cluster Labels'] == 0, district_merged.columns[[0] + list(range(4, district_merged.shape[1]))]]

Unnamed: 0,neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Humber Heights – Westmount,Pizza Place,Convenience Store,Café,Ice Cream Shop,Middle Eastern Restaurant,Park,Discount Store,Plaza,Restaurant,Sandwich Place
7,Humber Valley Village,Park,Pharmacy,Grocery Store,Shopping Mall,Camera Store,Café,Spa,Liquor Store,Restaurant,Bus Stop
14,Mimico,Park,Mexican Restaurant,Café,Liquor Store,Dessert Shop,Pharmacy,Convenience Store,Restaurant,Sandwich Place,Coffee Shop
15,New Toronto,Park,Pizza Place,Seafood Restaurant,Mexican Restaurant,Café,Liquor Store,Pharmacy,Fast Food Restaurant,Coffee Shop,Italian Restaurant
24,West Deane Park,Park,Sushi Restaurant,Convenience Store,Bakery,Eastern European Restaurant,Beer Store,Pharmacy,Electronics Store,Fast Food Restaurant,Farmers Market
25,Willowridge,Park,Intersection,Pizza Place,Athletics & Sports,Convenience Store,Pharmacy,Liquor Store,Sandwich Place,Coffee Shop,Beer Store


#### Cluster 2

In [32]:
district_merged.loc[district_merged['Cluster Labels'] == 1, district_merged.columns[[0] + list(range(5, district_merged.shape[1]))]]

Unnamed: 0,neighborhood,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Alderwood,Grocery Store,Pizza Place,Pharmacy,Gym,Donut Shop,Pub,Convenience Store,Park,Coffee Shop
1,Centennial Park,Coffee Shop,Hotel,Pizza Place,Ski Area,Chinese Restaurant,College Rec Center,Record Shop,Pub,Hockey Arena
2,Eatonville,Pizza Place,Recreation Center,Convenience Store,Coffee Shop,Clothing Store,Farmers Market,Fish & Chips Shop,Mexican Restaurant,Gym
3,The Elms,Coffee Shop,Train Station,Discount Store,Sandwich Place,Pharmacy,Chinese Restaurant,Café,Soccer Field,Laundromat
4,Eringate,Convenience Store,Pizza Place,Coffee Shop,Eastern European Restaurant,Chinese Restaurant,Electronics Store,Beer Store,Park,Fish & Chips Shop
10,Kingsview Village,Gym,Breakfast Spot,Mobile Phone Shop,Event Service,Chinese Restaurant,Sandwich Place,Middle Eastern Restaurant,Shopping Mall,Café
11,The Kingsway,Pub,Pizza Place,Restaurant,Dessert Shop,Thai Restaurant,Breakfast Spot,Sushi Restaurant,Italian Restaurant,Pool Hall
12,Long Branch,Pharmacy,Coffee Shop,Café,Convenience Store,Sandwich Place,Restaurant,Gym,Pub,Park
16,Princess Gardens,Pharmacy,Coffee Shop,Supermarket,Sandwich Place,Shopping Mall,Café,Smoothie Shop,Beer Store,Pizza Place
17,Rexdale,Pizza Place,Coffee Shop,Convenience Store,Clothing Store,Department Store,Salon / Barbershop,Sandwich Place,Hardware Store,Bakery


#### Cluster 3

In [33]:
district_merged.loc[district_merged['Cluster Labels'] == 2, district_merged.columns[[0] + list(range(5, district_merged.shape[1]))]]

Unnamed: 0,neighborhood,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,Humberwood,Grocery Store,Storage Facility,Sandwich Place,Video Game Store,Department Store,Dessert Shop,Dance Studio,Diner,Fish & Chips Shop


#### Cluster 4

In [34]:
district_merged.loc[district_merged['Cluster Labels'] == 3, district_merged.columns[[0] + list(range(5, district_merged.shape[1]))]]

Unnamed: 0,neighborhood,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Humber Bay,River,Shopping Mall,Park,Yoga Studio,Fast Food Restaurant,Department Store,Dessert Shop,Diner,Discount Store


#### Cluster 5

In [35]:
district_merged.loc[district_merged['Cluster Labels'] == 4, district_merged.columns[[0] + list(range(5, district_merged.shape[1]))]]

Unnamed: 0,neighborhood,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,Islington–City Centre West,Pizza Place,American Restaurant,Park,Mexican Restaurant,Fish & Chips Shop,Yoga Studio,Flower Shop,Diner,Discount Store
13,Markland Wood,Fast Food Restaurant,Discount Store,Park,Grocery Store,Bank,Flower Shop,Baseball Field,Golf Course,Event Service
