<h1 align=left><font size = 5>Segmenting and Clustering Neighborhoods in Toronto</font></h1>


## Introduction

In this project, we will:  
1) Scrape the Wikipedia page to get the postal code, borough, and neighborhood details for Toronto.  
2) Use the Foursquare API to explore the neighborhoods.  
3) Find the most common venue categories in each neighborhood, group the neighborhoods into clusters with the k-means algorithm, and visualize the neighborhoods and their clusters with the Folium library.

## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

1.  <a href="#item1">Scrape and Get Postal Code Data</a>

2.  <a href="#item2">Get Location Coordinates For Each Postal Code</a>

3.  <a href="#item3">Explore and Cluster the Neighborhoods</a>

    </font>
    </div>


In [1]:
import numpy as np

import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import requests
import json
from bs4 import BeautifulSoup

import matplotlib.cm as cm
import matplotlib.colors as colors
import folium

from sklearn.cluster import KMeans

## 1. Scrape and Get Postal Code Data

In [2]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

response = requests.get(url)
content = response.content
data = BeautifulSoup(content, 'html.parser')

In [3]:
data.head.title.text

'List of postal codes of Canada: M - Wikipedia'

In [4]:
tables = data.find_all('table')
len(tables)

3

In [5]:
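from io import StringIO  # newer pandas versions expect a file-like object rather than a raw HTML string
postal_codes = pd.read_html(StringIO(str(tables)), flavor='bs4')[0]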
postal_codes.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8
0,M1ANot assigned,M2ANot assigned,M3ANorth York(Parkwoods),M4ANorth York(Victoria Village),M5ADowntown Toronto(Regent Park / Harbourfront),M6ANorth York(Lawrence Manor / Lawrence Heights),M7AQueen's Park(Ontario Provincial Government),M8ANot assigned,M9AEtobicoke(Islington Avenue)
1,M1BScarborough(Malvern / Rouge),M2BNot assigned,M3BNorth York(Don Mills)North,M4BEast York(Parkview Hill / Woodbine Gardens),"M5BDowntown Toronto(Garden District, Ryerson)",M6BNorth York(Glencairn),M7BNot assigned,M8BNot assigned,M9BEtobicoke(West Deane Park / Princess Garden...
2,M1CScarborough(Rouge Hill / Port Union / Highl...,M2CNot assigned,M3CNorth York(Don Mills)South(Flemingdon Park),M4CEast York(Woodbine Heights),M5CDowntown Toronto(St. James Town),M6CYork(Humewood-Cedarvale),M7CNot assigned,M8CNot assigned,M9CEtobicoke(Eringate / Bloordale Gardens / Ol...
3,M1EScarborough(Guildwood / Morningside / West ...,M2ENot assigned,M3ENot assigned,M4EEast Toronto(The Beaches),M5EDowntown Toronto(Berczy Park),M6EYork(Caledonia-Fairbanks),M7ENot assigned,M8ENot assigned,M9ENot assigned
4,M1GScarborough(Woburn),M2GNot assigned,M3GNot assigned,M4GEast York(Leaside),M5GDowntown Toronto(Central Bay Street),M6GDowntown Toronto(Christie),M7GNot assigned,M8GNot assigned,M9GNot assigned


In [6]:
postal_codes = postal_codes.stack().to_frame().reset_index(drop=True)
postal_codes.head()

Unnamed: 0,0
0,M1ANot assigned
1,M2ANot assigned
2,M3ANorth York(Parkwoods)
3,M4ANorth York(Victoria Village)
4,M5ADowntown Toronto(Regent Park / Harbourfront)


In [7]:
postal_codes.columns = ['details']

In [8]:
postal_codes['PostalCode'] = postal_codes['details'].str[:3]

postal_codes['Borough'] = postal_codes['details'].str[3:].str.split('(').str[0]

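# regex=False treats ')' and '/' as literal characters on every pandas version
postal_codes['Neighborhood'] = postal_codes['details'].str[3:].str.split('(').str[1].str.replace(')', '', regex=False).str.replace('/', ',', regex=False)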

postal_codes.drop(['details'], axis=1, inplace=True)

postal_codes.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1A,Not assigned,
1,M2A,Not assigned,
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park , Harbourfront"
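
The string slicing above can also be written as a single regular expression. The following is a minimal sketch, not the notebook's method: `split_cell` and `CELL_PATTERN` are hypothetical names, and joining any text that trails the closing parenthesis (as in `M3BNorth York(Don Mills)North`) onto the neighborhood with a space is an assumption about the intended result.

```python
import re

# Hypothetical pattern: postal code, borough, optional "(neighborhoods)", optional trailing text.
CELL_PATTERN = re.compile(r'^([A-Z]\d[A-Z])([^(]*)(?:\(([^)]*)\))?(.*)$')


def split_cell(cell):
    """Split one scraped table cell into (postal code, borough, neighborhood)."""
    match = CELL_PATTERN.match(cell.strip())
    if match is None:
        raise ValueError('unrecognized cell: {!r}'.format(cell))
    code, borough, hood, rest = match.groups()
    # Neighborhoods inside the parentheses are separated by '/'.
    parts = [p.strip() for p in (hood or '').split('/') if p.strip()]
    neighborhood = ', '.join(parts)
    # Assumption: text after ')' belongs to the neighborhood name.
    if rest.strip():
        neighborhood = (neighborhood + ' ' + rest.strip()).strip()
    return code, borough.strip(), neighborhood


print(split_cell('M5ADowntown Toronto(Regent Park / Harbourfront)'))
print(split_cell('M1ANot assigned'))
print(split_cell('M3BNorth York(Don Mills)North'))
```

Unlike the slicing approach, this version strips the spaces around each `/` and keeps "Don Mills" and "North" from fusing into one word.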


In [9]:
# chain reset_index so we work on a new dataframe rather than modifying a slice in place
toronto_neighborhoods = postal_codes[postal_codes['Borough'] != 'Not assigned'].reset_index(drop=True)
toronto_neighborhoods.head(10)

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park , Harbourfront"
3,M6A,North York,"Lawrence Manor , Lawrence Heights"
4,M7A,Queen's Park,Ontario Provincial Government
5,M9A,Etobicoke,Islington Avenue
6,M1B,Scarborough,"Malvern , Rouge"
7,M3B,North York,Don MillsNorth
8,M4B,East York,"Parkview Hill , Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"


In [10]:
# Checking for any null values
toronto_neighborhoods.isnull().sum()

PostalCode      0
Borough         0
Neighborhood    0
dtype: int64

In [11]:
# Checking for any rows where Neighborhood is not assigned
toronto_neighborhoods[toronto_neighborhoods['Neighborhood'].isin(['Not Assigned','Not assigned','not assigned'])]

Unnamed: 0,PostalCode,Borough,Neighborhood


In [12]:
# Checking for unexpected values in Borough
toronto_neighborhoods['Borough'].unique()

array(['North York', 'Downtown Toronto', "Queen's Park", 'Etobicoke',
       'Scarborough', 'East York', 'York', 'East Toronto', 'West Toronto',
       'East YorkEast Toronto', 'Central Toronto',
       'MississaugaCanada Post Gateway Processing Centre',
       'Downtown TorontoStn A PO Boxes25 The Esplanade',
       'EtobicokeNorthwest',
       'East TorontoBusiness reply mail Processing Centre969 Eastern'],
      dtype=object)

In [13]:
# Checking for unexpected values in Neighborhood
toronto_neighborhoods['Neighborhood'].unique()

array(['Parkwoods', 'Victoria Village', 'Regent Park , Harbourfront',
       'Lawrence Manor , Lawrence Heights',
       'Ontario Provincial Government', 'Islington Avenue',
       'Malvern , Rouge', 'Don MillsNorth',
       'Parkview Hill , Woodbine Gardens', 'Garden District, Ryerson',
       'Glencairn',
       'West Deane Park , Princess Gardens , Martin Grove , Islington , Cloverdale',
       'Rouge Hill , Port Union , Highland Creek', 'Don MillsSouth',
       'Woodbine Heights', 'St. James Town', 'Humewood-Cedarvale',
       'Eringate , Bloordale Gardens , Old Burnhamthorpe , Markland Wood',
       'Guildwood , Morningside , West Hill', 'The Beaches',
       'Berczy Park', 'Caledonia-Fairbanks', 'Woburn', 'Leaside',
       'Central Bay Street', 'Christie', 'Cedarbrae', 'Hillcrest Village',
       'Bathurst Manor , Wilson Heights , Downsview North',
       'Thorncliffe Park', 'Richmond , Adelaide , King',
       'Dufferin , Dovercourt Village', 'Scarborough Village',
       'Fairv

In [14]:
toronto_neighborhoods.shape

(103, 3)

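Our dataframe has no null or 'Not assigned' values. However, the Borough output above shows a few concatenated values, such as 'East YorkEast Toronto' and 'EtobicokeNorthwest', which are artifacts of how the scraped cell text was split.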
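
One way to surface fused values like these automatically is to look for a lowercase letter immediately followed by an uppercase letter. This is only a heuristic sketch: `fused_values` and `JOINED` are hypothetical names, and the rule would also flag legitimate names that contain such a pair (for example a name beginning with "Mc").

```python
import re

# Heuristic: a lowercase letter directly followed by an uppercase letter
# suggests two strings were joined without a separator.
JOINED = re.compile(r'[a-z][A-Z]')


def fused_values(values):
    """Return the values that look like two strings fused together."""
    return [v for v in values if JOINED.search(v)]


# Borough values copied from the unique() output above.
boroughs = [
    'North York', 'Downtown Toronto', "Queen's Park", 'Etobicoke',
    'Scarborough', 'East York', 'York', 'East Toronto', 'West Toronto',
    'East YorkEast Toronto', 'Central Toronto',
    'MississaugaCanada Post Gateway Processing Centre',
    'Downtown TorontoStn A PO Boxes25 The Esplanade',
    'EtobicokeNorthwest',
    'East TorontoBusiness reply mail Processing Centre969 Eastern',
]

for value in fused_values(boroughs):
    print(value)
```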

## 2. Get Location Coordinates for Each Postal Code

The Geocoder package is not working, so we load the coordinates from a CSV file instead.

In [15]:
coordinates = pd.read_csv('Geospatial_Coordinates.csv')
coordinates.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [16]:
toronto_neighborhoods = pd.merge(left=toronto_neighborhoods, right=coordinates, how='left', left_on='PostalCode', right_on='Postal Code')
toronto_neighborhoods.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Postal Code,Latitude,Longitude
0,M3A,North York,Parkwoods,M3A,43.753259,-79.329656
1,M4A,North York,Victoria Village,M4A,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park , Harbourfront",M5A,43.65426,-79.360636
3,M6A,North York,"Lawrence Manor , Lawrence Heights",M6A,43.718518,-79.464763
4,M7A,Queen's Park,Ontario Provincial Government,M7A,43.662301,-79.389494


In [17]:
toronto_neighborhoods.drop('Postal Code', axis=1, inplace=True)

In [19]:
toronto_neighborhoods.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park , Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor , Lawrence Heights",43.718518,-79.464763
4,M7A,Queen's Park,Ontario Provincial Government,43.662301,-79.389494
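
A left merge silently leaves `NaN` coordinates for any postal code missing from the CSV. As a hedged sketch of how that could be checked, the example below uses the `validate` and `indicator` options of `DataFrame.merge` on a small toy frame; the postal code `M0X` and its borough are made up for illustration, and the variable names are local to this example.

```python
import pandas as pd

# Toy data: two real rows from the tables above plus one made-up postal code.
neighborhoods = pd.DataFrame({
    'PostalCode': ['M3A', 'M4A', 'M0X'],  # 'M0X' is hypothetical and has no coordinates
    'Borough': ['North York', 'North York', 'Hypothetical'],
})
coordinates = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A'],
    'Latitude': [43.753259, 43.725882],
    'Longitude': [-79.329656, -79.315572],
})

# Renaming the key first avoids a duplicate 'Postal Code' column after the merge.
merged = neighborhoods.merge(
    coordinates.rename(columns={'Postal Code': 'PostalCode'}),
    on='PostalCode',
    how='left',
    validate='one_to_one',  # raise if a postal code appears more than once on either side
    indicator=True,         # adds a '_merge' column recording where each row matched
)

unmatched = merged.loc[merged['_merge'] == 'left_only', 'PostalCode'].tolist()
print(unmatched)
```

An empty `unmatched` list would mean every postal code received coordinates.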


## 3. Explore and Cluster the Neighborhoods

#### Use https://nominatim.openstreetmap.org to get the latitude and longitude coordinates of Toronto.

In [20]:
address = 'Toronto, Canada'

url = 'https://nominatim.openstreetmap.org/search'
# pass the query as params so requests URL-encodes the space and comma in the address
response = requests.get(url, params={'q': address, 'format': 'jsonv2'})
content = response.json()
content

[{'place_id': 258679753,
  'licence': 'Data © OpenStreetMap contributors, ODbL 1.0. https://osm.org/copyright',
  'osm_type': 'relation',
  'osm_id': 324211,
  'boundingbox': ['43.5802533', '43.8554425', '-79.6392727', '-79.1132193'],
  'lat': '43.6534817',
  'lon': '-79.3839347',
  'display_name': 'Toronto, Golden Horseshoe, Ontario, Canada',
  'place_rank': 12,
  'category': 'boundary',
  'type': 'administrative',
  'importance': 0.9330149417022804,
  'icon': 'https://nominatim.openstreetmap.org/ui/mapicons//poi_boundary_administrative.p.20.png'},
 {'place_id': 297123388,
  'licence': 'Data © OpenStreetMap contributors, ODbL 1.0. https://osm.org/copyright',
  'osm_type': 'relation',
  'osm_id': 12030531,
  'boundingbox': ['46.4305437', '46.4662257', '-63.3982902', '-63.3671092'],
  'lat': '46.4524682',
  'lon': '-63.3799629',
  'display_name': 'Toronto, Queens County, Prince Edward Island, Canada',
  'place_rank': 16,
  'category': 'boundary',
  'type': 'administrative',
  'importanc

In [21]:
latitude = float(content[0]['lat'])
longitude = float(content[0]['lon'])
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.
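
The two steps above (building the search URL and reading the first result) can be wrapped in small helpers. This is a sketch under stated assumptions: `build_search_url` and `first_lat_lon` are hypothetical names, the response shape is taken from the output shown above, and no network request is made here.

```python
from urllib.parse import urlencode


def build_search_url(address):
    """Build a Nominatim search URL with the address safely URL-encoded."""
    query = urlencode({'q': address, 'format': 'jsonv2'})
    return 'https://nominatim.openstreetmap.org/search?' + query


def first_lat_lon(results):
    """Return (latitude, longitude) of the first result as floats."""
    if not results:
        raise ValueError('no geocoding results returned')
    first = results[0]
    # Nominatim returns coordinates as strings, so convert them explicitly.
    return float(first['lat']), float(first['lon'])


print(build_search_url('Toronto, Canada'))

# Trimmed copy of the first entry in the response shown above.
sample = [{'lat': '43.6534817', 'lon': '-79.3839347'}]
print(first_lat_lon(sample))
```

Raising on an empty result list gives a clearer error than the `IndexError` that `content[0]` would produce.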


#### Create a map of Toronto with neighborhoods superimposed on top.

In [23]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, borough, neighborhood in zip(toronto_neighborhoods['Latitude'], toronto_neighborhoods['Longitude'], toronto_neighborhoods['Borough'], toronto_neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

**Let's work with only boroughs that contain the word Toronto**

In [24]:
neighborhoods = toronto_neighborhoods[toronto_neighborhoods['Borough'].str.contains('Toronto')].reset_index(drop=True)
neighborhoods.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park , Harbourfront",43.65426,-79.360636
1,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
2,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
3,M4E,East Toronto,The Beaches,43.676357,-79.293031
4,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306


In [25]:
neighborhoods.shape

(39, 5)

Next, we will use the Foursquare API to explore and segment the neighborhoods in the filtered boroughs above.

#### Define Foursquare Credentials and Version

In [26]:
CLIENT_ID = 'PW54YAHQ35ZL0CTABP4IO0F0X0J1N4G3TRLFIBDRJ5505XRF' # Foursquare ID
CLIENT_SECRET = 'Y5VT40CIP4JGDQ43RMIQCCDHN4BR2TSGEXRFDLAFWOU5EXLQ' # Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # Foursquare API limit value

#### Let's get the venues in all these neighborhoods

In [27]:
cols = ['Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude', 'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category']

neighborhood_venues = pd.DataFrame(columns = cols)

In [28]:
rows = []  # collect one dict per venue, then build the dataframe once
radius = 500

for _, neighborhood in neighborhoods.iterrows():
    neighborhood_name = neighborhood['Neighborhood']
    neighborhood_lat = neighborhood['Latitude']
    neighborhood_lon = neighborhood['Longitude']

    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'\
    .format(CLIENT_ID, CLIENT_SECRET, VERSION, neighborhood_lat, neighborhood_lon, radius, LIMIT)

    results = requests.get(url).json()

    venues = results['response']['groups'][0]['items']

    for v in venues:
        rows.append({'Neighborhood': neighborhood_name,
                     'Neighborhood Latitude': neighborhood_lat,
                     'Neighborhood Longitude': neighborhood_lon,
                     'Venue': v['venue']['name'],
                     'Venue Latitude': v['venue']['location']['lat'],
                     'Venue Longitude': v['venue']['location']['lng'],
                     'Venue Category': v['venue']['categories'][0]['name']})

# DataFrame.append was removed in pandas 2.0, so build the dataframe from the collected rows
neighborhood_venues = pd.DataFrame(rows, columns=cols)

neighborhood_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Regent Park , Harbourfront",43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
1,"Regent Park , Harbourfront",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
2,"Regent Park , Harbourfront",43.65426,-79.360636,Cooper Koo Family YMCA,43.653249,-79.358008,Distribution Center
3,"Regent Park , Harbourfront",43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa
4,"Regent Park , Harbourfront",43.65426,-79.360636,Impact Kitchen,43.656369,-79.35698,Restaurant


In [29]:
neighborhood_venues.shape

(1590, 7)
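
The loop above indexes `categories[0]` directly, which would fail for a venue that has no category. The sketch below shows one defensive way to flatten a response. `extract_venues` is a hypothetical helper, the payload shape is inferred from the keys used in the loop above, and the second venue in the sample is made up to exercise the empty-category case.

```python
def extract_venues(results, neighborhood_name):
    """Flatten one explore-style response into a list of row dicts."""
    groups = results.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        rows.append({
            'Neighborhood': neighborhood_name,
            'Venue': venue['name'],
            'Venue Latitude': venue['location']['lat'],
            'Venue Longitude': venue['location']['lng'],
            # Fall back to None instead of raising when no category is listed.
            'Venue Category': categories[0]['name'] if categories else None,
        })
    return rows


# Sample payload: the first venue comes from the output above, the second is hypothetical.
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Tandem Coffee',
               'location': {'lat': 43.653559, 'lng': -79.361809},
               'categories': [{'name': 'Coffee Shop'}]}},
    {'venue': {'name': 'Uncategorized Place',
               'location': {'lat': 43.65, 'lng': -79.36},
               'categories': []}},
]}]}}

for row in extract_venues(sample, 'Regent Park , Harbourfront'):
    print(row['Venue'], '|', row['Venue Category'])
```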

#### Let's count the unique neighborhoods

In [61]:
len(neighborhood_venues['Neighborhood'].unique())

39

#### Let's count the unique venue categories

In [31]:
len(neighborhood_venues['Venue Category'].unique())

228

**Let's check how many venues were returned for each neighborhood**

In [30]:
neighborhood_venues.groupby('Neighborhood')['Venue Category'].count()

Neighborhood
Berczy Park                                                                                                          58
Brockton , Parkdale Village , Exhibition Place                                                                       25
CN Tower , King and Spadina , Railway Lands , Harbourfront West , Bathurst Quay , South Niagara , Island airport     16
Central Bay Street                                                                                                   65
Christie                                                                                                             16
Church and Wellesley                                                                                                 82
Commerce Court , Victoria Hotel                                                                                     100
Davisville                                                                                                           37
Davisville North           

Some neighborhoods return only a handful of venues. The next step keeps only neighborhoods with at least 10 distinct venue categories, which excludes 8 of the 39 from clustering.

#### Let's print each neighborhood along with its top 10 most common venue categories (for neighborhoods with at least 10 distinct categories)

In [76]:
cols = ['Neighborhood'] + ['Most Common Venue '+str(i) for i in range(1,11)]

neighborhood_top_venues = pd.DataFrame(columns=cols)

unique_neighborhoods = neighborhood_venues['Neighborhood'].unique()

rows = []  # one dict per neighborhood; DataFrame.append was removed in pandas 2.0

for neighbor in unique_neighborhoods:
    df = neighborhood_venues[neighborhood_venues['Neighborhood'] == neighbor]
    top_venues = df['Venue Category'].value_counts().sort_values(ascending=False).head(10).index.to_list()

    # keep only neighborhoods with at least 10 distinct venue categories
    if len(top_venues) >= 10:
        rows.append(dict(zip(cols, [neighbor] + top_venues)))

neighborhood_top_venues = pd.DataFrame(rows, columns=cols)
neighborhood_top_venues.head()

Unnamed: 0,Neighborhood,Most Common Venue 1,Most Common Venue 2,Most Common Venue 3,Most Common Venue 4,Most Common Venue 5,Most Common Venue 6,Most Common Venue 7,Most Common Venue 8,Most Common Venue 9,Most Common Venue 10
0,"Regent Park , Harbourfront",Coffee Shop,Park,Bakery,Pub,Breakfast Spot,Theater,Café,Spa,Mexican Restaurant,Performing Arts Venue
1,"Garden District, Ryerson",Coffee Shop,Clothing Store,Sandwich Place,Café,Italian Restaurant,Japanese Restaurant,Cosmetics Shop,Middle Eastern Restaurant,Hotel,Movie Theater
2,St. James Town,Café,Coffee Shop,Beer Bar,Cocktail Bar,Restaurant,Gym,Japanese Restaurant,Cosmetics Shop,Bakery,Diner
3,Berczy Park,Coffee Shop,Cocktail Bar,Bakery,Cheese Shop,Restaurant,Pub,Farmers Market,Beer Bar,Seafood Restaurant,Pharmacy
4,Central Bay Street,Coffee Shop,Café,Sandwich Place,Italian Restaurant,Restaurant,Japanese Restaurant,Burger Joint,Bubble Tea Shop,Salad Place,Furniture / Home Store


In [77]:
# Sort the neighborhoods
neighborhood_top_venues = neighborhood_top_venues.sort_values('Neighborhood').reset_index(drop=True)
neighborhood_top_venues.head()

Unnamed: 0,Neighborhood,Most Common Venue 1,Most Common Venue 2,Most Common Venue 3,Most Common Venue 4,Most Common Venue 5,Most Common Venue 6,Most Common Venue 7,Most Common Venue 8,Most Common Venue 9,Most Common Venue 10
0,Berczy Park,Coffee Shop,Cocktail Bar,Bakery,Cheese Shop,Restaurant,Pub,Farmers Market,Beer Bar,Seafood Restaurant,Pharmacy
1,"Brockton , Parkdale Village , Exhibition Place",Café,Breakfast Spot,Coffee Shop,Bakery,Bar,Restaurant,Intersection,Performing Arts Venue,Stadium,Burrito Place
2,"CN Tower , King and Spadina , Railway Lands , ...",Airport Service,Airport Terminal,Airport,Boat or Ferry,Coffee Shop,Airport Lounge,Bar,Harbor / Marina,Airport Food Court,Sculpture Garden
3,Central Bay Street,Coffee Shop,Café,Sandwich Place,Italian Restaurant,Restaurant,Japanese Restaurant,Burger Joint,Bubble Tea Shop,Salad Place,Furniture / Home Store
4,Christie,Grocery Store,Café,Park,Baby Store,Athletics & Sports,Coffee Shop,Nightclub,Candy Store,Italian Restaurant,Restaurant


In [78]:
neighborhood_top_venues.shape

(31, 11)
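
The same top-N table can be built with `groupby`. The following is a minimal sketch on toy data, assuming the column names used above; `top_categories` is a hypothetical helper, and the neighborhoods `A` and `B` are made up.

```python
import pandas as pd


def top_categories(venues, n=10):
    """Return one row per neighborhood with its n most common venue categories.

    Neighborhoods with fewer than n distinct categories are skipped.
    """
    cols = ['Neighborhood'] + ['Most Common Venue ' + str(i) for i in range(1, n + 1)]
    rows = []
    for name, group in venues.groupby('Neighborhood'):
        # value_counts() already sorts categories from most to least frequent.
        top = group['Venue Category'].value_counts().head(n).index.tolist()
        if len(top) >= n:
            rows.append([name] + top)
    return pd.DataFrame(rows, columns=cols)


# Toy data: 'A' has two distinct categories, 'B' has only one.
toy = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B'],
    'Venue Category': ['Cafe', 'Cafe', 'Park', 'Bar'],
})

print(top_categories(toy, n=2))
```

With `n=2`, neighborhood `B` is dropped because it has a single category, which mirrors the `len(top_venues) >= 10` filter above.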

**Now, let's cluster these neighborhoods based on venue categories**

First, we need to filter neighborhood_venues down to the neighborhoods for which the top 10 venues were taken.

In [79]:
top_neighborhoods_list = list(neighborhood_top_venues['Neighborhood'].unique())
top_neighborhoods_list

['Berczy Park',
 'Brockton , Parkdale Village , Exhibition Place',
 'CN Tower , King and Spadina , Railway Lands , Harbourfront West , Bathurst Quay , South Niagara , Island airport',
 'Central Bay Street',
 'Christie',
 'Church and Wellesley',
 'Commerce Court , Victoria Hotel',
 'Davisville',
 'Dufferin , Dovercourt Village',
 'Enclave of M4L',
 'Enclave of M5E',
 'First Canadian Place , Underground city',
 'Garden District, Ryerson',
 'Harbourfront East , Union Station , Toronto Islands',
 'High Park , The Junction South',
 'India Bazaar , The Beaches West',
 'Kensington Market , Chinatown , Grange Park',
 'Little Portugal , Trinity',
 'North Toronto West',
 'Parkdale , Roncesvalles',
 'Regent Park , Harbourfront',
 'Richmond , Adelaide , King',
 'Runnymede , Swansea',
 'St. James Town',
 'St. James Town , Cabbagetown',
 'Studio District',
 'Summerhill West , Rathnelly , South Hill , Forest Hill SE , Deer Park',
 'The Annex , North Midtown , Yorkville',
 'The Danforth West , Riverda

In [80]:
neighborhood_venues_filtered = neighborhood_venues[neighborhood_venues['Neighborhood'].isin(top_neighborhoods_list)].copy()

**Now, we can proceed with One Hot Encoding and Clustering**

In [81]:
# one hot encoding
venues_encoded = pd.get_dummies(neighborhood_venues_filtered[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe at first position
venues_encoded.insert(0, 'Neighborhoods', neighborhood_venues_filtered['Neighborhood'].values)

venues_encoded.head()

Unnamed: 0,Neighborhoods,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Workshop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Stadium,Basketball Stadium,Beach,Bed & Breakfast,Beer Bar,Beer Store,Belgian Restaurant,Bistro,Boat or Ferry,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Stop,Butcher,Café,Cajun / Creole Restaurant,Candy Store,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,College Gym,College Rec Center,Colombian Restaurant,Comfort Food Restaurant,Comic Shop,Concert Hall,Convenience Store,Cosmetics Shop,Coworking Space,Creperie,Cuban Restaurant,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Escape Room,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,General Travel,German Restaurant,Gift Shop,Gluten-free Restaurant,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Harbor / Marina,Health & Beauty Service,Health Food Store,Historic Site,History Museum,Hobby Shop,Hookah Bar,Hospital,Hostel,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Korean Restaurant,Lake,Latin American Restaurant,Light Rail Station,Lingerie Store,Liquor Store,Lounge,Market,Martial Arts School,Massage Studio,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Movie Theater,Museum,Music Venue,Neighborhood,New American Restaurant,Nightclub,Noodle House,Office,Opera House,Optical Shop,Organic Grocery,Park,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Plane,Playground,Plaza,Poke Place,Portuguese Restaurant,Poutine Place,Pub,Ramen Restaurant,Record Shop,Rental Car Location,Restaurant,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Skate Park,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Soup Place,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Stadium,Stationery Store,Steakhouse,Strip Club,Supermarket,Sushi Restaurant,Tailor Shop,Taiwanese Restaurant,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Tibetan Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Yoga Studio
0,"Regent Park , Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,"Regent Park , Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,"Regent Park , Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,"Regent Park , Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,"Regent Park , Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [84]:
venues_encoded.shape

(1562, 225)

#### Next, let's group rows by neighborhood and take the mean of the frequency of occurrence of each category

In [85]:
venues_encoded = venues_encoded.groupby('Neighborhoods').mean().reset_index()
venues_encoded.head()

Unnamed: 0,Neighborhoods,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Workshop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Stadium,Basketball Stadium,Beach,Bed & Breakfast,Beer Bar,Beer Store,Belgian Restaurant,Bistro,Boat or Ferry,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Stop,Butcher,Café,Cajun / Creole Restaurant,Candy Store,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,College Gym,College Rec Center,Colombian Restaurant,Comfort Food Restaurant,Comic Shop,Concert Hall,Convenience Store,Cosmetics Shop,Coworking Space,Creperie,Cuban Restaurant,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Escape Room,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,General Travel,German Restaurant,Gift Shop,Gluten-free Restaurant,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Harbor / Marina,Health & Beauty Service,Health Food Store,Historic Site,History Museum,Hobby Shop,Hookah Bar,Hospital,Hostel,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Korean Restaurant,Lake,Latin American Restaurant,Light Rail Station,Lingerie Store,Liquor Store,Lounge,Market,Martial Arts School,Massage Studio,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Movie Theater,Museum,Music Venue,Neighborhood,New American Restaurant,Nightclub,Noodle House,Office,Opera House,Optical Shop,Organic Grocery,Park,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Plane,Playground,Plaza,Poke Place,Portuguese Restaurant,Poutine Place,Pub,Ramen Restaurant,Record Shop,Rental Car Location,Restaurant,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Skate Park,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Soup Place,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Stadium,Stationery Store,Steakhouse,Strip Club,Supermarket,Sushi Restaurant,Tailor Shop,Taiwanese Restaurant,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Tibetan Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Yoga Studio
0,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017241,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017241,0.051724,0.0,0.0,0.0,0.017241,0.017241,0.0,0.034483,0.0,0.0,0.017241,0.0,0.0,0.0,0.0,0.017241,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017241,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.017241,0.051724,0.051724,0.0,0.0,0.0,0.0,0.017241,0.0,0.017241,0.0,0.0,0.0,0.017241,0.0,0.0,0.0,0.0,0.017241,0.0,0.017241,0.0,0.0,0.0,0.0,0.0,0.0,0.017241,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.017241,0.0,0.0,0.0,0.0,0.0,0.017241,0.017241,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017241,0.017241,0.017241,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017241,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017241,0.017241,0.017241,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017241,0.017241,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017241,0.0,0.017241,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017241,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.017241,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017241,0.0,0.0,0.0,0.017241,0.0,0.0,0.0,0.017241,0.0,0.0,0.017241,0.0,0.0,0.0,0.0,0.0,0.0,0.017241,0.0,0.0,0.0,0.0,0.0
1,"Brockton , Parkdale Village , Exhibition Place",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.12,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.04,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"CN Tower , King and Spadina , Railway Lands , ...",0.0625,0.0625,0.0625,0.1875,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015385,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015385,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030769,0.0,0.030769,0.0,0.0,0.0,0.046154,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.169231,0.0,0.0,0.0,0.0,0.0,0.015385,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015385,0.015385,0.015385,0.015385,0.0,0.0,0.0,0.015385,0.0,0.0,0.0,0.0,0.0,0.0,0.015385,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015385,0.0,0.0,0.0,0.015385,0.0,0.0,0.0,0.0,0.015385,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015385,0.015385,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015385,0.0,0.0,0.015385,0.015385,0.0,0.0,0.0,0.046154,0.030769,0.0,0.0,0.015385,0.015385,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015385,0.015385,0.015385,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015385,0.0,0.0,0.015385,0.0,0.0,0.0,0.015385,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015385,0.015385,0.0,0.0,0.015385,0.0,0.0,0.046154,0.0,0.030769,0.0,0.046154,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015385,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015385,0.0,0.0,0.0,0.015385,0.0,0.0,0.0,0.0,0.0,0.0,0.015385,0.0,0.0,0.015385,0.0,0.015385
4,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1875,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [86]:
venues_encoded.shape

(31, 225)

**Run the _k_-means clustering algorithm to group these neighborhoods into 4 clusters, based on the venue-category frequencies computed above**

In [87]:
X = venues_encoded.drop('Neighborhoods', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=4, random_state=0)
kmeans.fit(X)

# check cluster labels generated for each row in the dataframe
set(kmeans.labels_)

{0, 1, 2, 3}
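The number of clusters above is set by hand. One common way to sanity-check such a choice is to compare silhouette scores across several values of k. The snippet below is a minimal sketch of that idea on synthetic data; the helper name `silhouette_by_k` and the demo points are assumptions made for illustration and are not part of the original notebook.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def silhouette_by_k(X, k_values, random_state=0):
    """Return {k: mean silhouette score} for each candidate number of clusters."""
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, random_state=random_state, n_init=10)
        labels = model.fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return scores


# Synthetic demo data (hypothetical): three tight, well-separated blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X_demo = np.vstack([c + 0.1 * rng.standard_normal((20, 2)) for c in centers])

demo_scores = silhouette_by_k(X_demo, range(2, 6))
best_k = max(demo_scores, key=demo_scores.get)
print(best_k)
```

On the notebook's data the same helper could be called as `silhouette_by_k(X, range(2, 8))`. With only 31 neighborhoods the scores may be noisy, so they are a rough guide rather than a definitive answer.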

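**Let's create a new dataframe that includes the cluster label as well as the top 10 venues for each neighborhood**

First, add the cluster labels to the dataframe holding the top 10 most common venues per neighborhood. This relies on its rows being in the same order as the rows of `venues_encoded`.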

In [88]:
# add clustering labels
neighborhood_top_venues.insert(0, 'Cluster Labels', kmeans.labels_)
neighborhood_top_venues.head()

Unnamed: 0,Cluster Labels,Neighborhood,Most Common Venue 1,Most Common Venue 2,Most Common Venue 3,Most Common Venue 4,Most Common Venue 5,Most Common Venue 6,Most Common Venue 7,Most Common Venue 8,Most Common Venue 9,Most Common Venue 10
0,1,Berczy Park,Coffee Shop,Cocktail Bar,Bakery,Cheese Shop,Restaurant,Pub,Farmers Market,Beer Bar,Seafood Restaurant,Pharmacy
1,1,"Brockton , Parkdale Village , Exhibition Place",Café,Breakfast Spot,Coffee Shop,Bakery,Bar,Restaurant,Intersection,Performing Arts Venue,Stadium,Burrito Place
2,0,"CN Tower , King and Spadina , Railway Lands , ...",Airport Service,Airport Terminal,Airport,Boat or Ferry,Coffee Shop,Airport Lounge,Bar,Harbor / Marina,Airport Food Court,Sculpture Garden
3,1,Central Bay Street,Coffee Shop,Café,Sandwich Place,Italian Restaurant,Restaurant,Japanese Restaurant,Burger Joint,Bubble Tea Shop,Salad Place,Furniture / Home Store
4,3,Christie,Grocery Store,Café,Park,Baby Store,Athletics & Sports,Coffee Shop,Nightclub,Candy Store,Italian Restaurant,Restaurant
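Inserting `kmeans.labels_` positionally, as above, is only correct when both dataframes list the neighborhoods in the same order. A more defensive alternative is to attach the labels by neighborhood name. The following is a minimal sketch under that assumption; the helper `attach_labels` and the toy frames are hypothetical and do not appear in the original notebook.

```python
import pandas as pd


def attach_labels(summary, features, labels, summary_key, features_key):
    """Add a 'Cluster Labels' column to `summary`, matched by name instead of row position."""
    if len(labels) != len(features):
        raise ValueError("labels and features must have the same number of rows")
    # Map each neighborhood name in the feature matrix to its cluster label.
    label_by_name = pd.Series(list(labels), index=features[features_key].to_numpy())
    out = summary.copy()
    out.insert(0, 'Cluster Labels', out[summary_key].map(label_by_name))
    return out


# Toy data (hypothetical): the two frames deliberately use different row orders.
features_demo = pd.DataFrame({'Neighborhoods': ['B', 'A', 'C']})
labels_demo = [2, 0, 1]
summary_demo = pd.DataFrame({'Neighborhood': ['A', 'B', 'C']})

labelled_demo = attach_labels(summary_demo, features_demo, labels_demo,
                              summary_key='Neighborhood', features_key='Neighborhoods')
print(labelled_demo)
```

In the notebook the equivalent call would pass `neighborhood_top_venues`, `venues_encoded` and `kmeans.labels_`, assuming the neighborhood names in both frames are unique.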


Now, merge the top-venues dataset above with the original neighborhoods dataset, which contains the postal code, borough, latitude and longitude of each neighborhood.

In [89]:
neighborhoods.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park , Harbourfront",43.65426,-79.360636
1,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
2,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
3,M4E,East Toronto,The Beaches,43.676357,-79.293031
4,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306


In [91]:
# Keep only the neighborhoods for which the top 10 venues were computed
neighborhood_filtered = neighborhoods[neighborhoods['Neighborhood'].isin(top_neighborhoods_list)].copy()

In [92]:
neighborhood_filtered.shape

(31, 5)

In [93]:
toronto_clusters = pd.merge(left=neighborhood_filtered, right=neighborhood_top_venues, how='left', on='Neighborhood')
toronto_clusters.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,Most Common Venue 1,Most Common Venue 2,Most Common Venue 3,Most Common Venue 4,Most Common Venue 5,Most Common Venue 6,Most Common Venue 7,Most Common Venue 8,Most Common Venue 9,Most Common Venue 10
0,M5A,Downtown Toronto,"Regent Park , Harbourfront",43.65426,-79.360636,1,Coffee Shop,Park,Bakery,Pub,Breakfast Spot,Theater,Café,Spa,Mexican Restaurant,Performing Arts Venue
1,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,1,Coffee Shop,Clothing Store,Sandwich Place,Café,Italian Restaurant,Japanese Restaurant,Cosmetics Shop,Middle Eastern Restaurant,Hotel,Movie Theater
2,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,1,Café,Coffee Shop,Beer Bar,Cocktail Bar,Restaurant,Gym,Japanese Restaurant,Cosmetics Shop,Bakery,Diner
3,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306,1,Coffee Shop,Cocktail Bar,Bakery,Cheese Shop,Restaurant,Pub,Farmers Market,Beer Bar,Seafood Restaurant,Pharmacy
4,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383,1,Coffee Shop,Café,Sandwich Place,Italian Restaurant,Restaurant,Japanese Restaurant,Burger Joint,Bubble Tea Shop,Salad Place,Furniture / Home Store


In [94]:
toronto_clusters.shape

(31, 16)
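A left merge silently leaves `NaN` in the cluster column for any neighborhood whose name has no match on the right-hand side. The sketch below shows one way to detect that with the `indicator` option of `pd.merge`. The helper `merge_with_check` and the toy frames are assumptions for illustration only.

```python
import pandas as pd


def merge_with_check(left, right, key):
    """Left-merge on `key` and report the left-hand keys that found no match."""
    merged = pd.merge(left, right, how='left', on=key, indicator=True)
    unmatched = merged.loc[merged['_merge'] == 'left_only', key].tolist()
    return merged.drop(columns='_merge'), unmatched


# Toy data (hypothetical): neighborhood 'C' has no cluster information.
left_demo = pd.DataFrame({'Neighborhood': ['A', 'B', 'C'],
                          'Borough': ['X', 'X', 'Y']})
right_demo = pd.DataFrame({'Neighborhood': ['A', 'B'],
                           'Cluster Labels': [1, 0]})

merged_demo, unmatched_demo = merge_with_check(left_demo, right_demo, 'Neighborhood')
print(unmatched_demo)
```

An empty `unmatched` list for `neighborhood_filtered` and `neighborhood_top_venues` would indicate that every neighborhood received a cluster label.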

**Finally, let's visualize the resulting clusters**

In [99]:
# create map of Toronto
toronto_clusters_map = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters: one color per cluster label
n_clusters = 4
colors_array = cm.rainbow(np.linspace(0, 1, n_clusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add one circle marker per neighborhood, colored by its cluster label
for lat, lon, poi, cluster in zip(toronto_clusters['Latitude'], toronto_clusters['Longitude'], toronto_clusters['Neighborhood'], toronto_clusters['Cluster Labels']):
    label = folium.Popup(str(poi) + ' : Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(toronto_clusters_map)

toronto_clusters_map

<a id='item5'></a>


**Now, we can examine each cluster and determine the venue categories that distinguish it**

#### Cluster 0


In [100]:
toronto_clusters.loc[toronto_clusters['Cluster Labels'] == 0, toronto_clusters.columns[[1] + list(range(5, toronto_clusters.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,Most Common Venue 1,Most Common Venue 2,Most Common Venue 3,Most Common Venue 4,Most Common Venue 5,Most Common Venue 6,Most Common Venue 7,Most Common Venue 8,Most Common Venue 9,Most Common Venue 10
25,Downtown Toronto,0,Airport Service,Airport Terminal,Airport,Boat or Ferry,Coffee Shop,Airport Lounge,Bar,Harbor / Marina,Airport Food Court,Sculpture Garden


#### Cluster 1


In [101]:
toronto_clusters.loc[toronto_clusters['Cluster Labels'] == 1, toronto_clusters.columns[[1] + list(range(5, toronto_clusters.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,Most Common Venue 1,Most Common Venue 2,Most Common Venue 3,Most Common Venue 4,Most Common Venue 5,Most Common Venue 6,Most Common Venue 7,Most Common Venue 8,Most Common Venue 9,Most Common Venue 10
0,Downtown Toronto,1,Coffee Shop,Park,Bakery,Pub,Breakfast Spot,Theater,Café,Spa,Mexican Restaurant,Performing Arts Venue
1,Downtown Toronto,1,Coffee Shop,Clothing Store,Sandwich Place,Café,Italian Restaurant,Japanese Restaurant,Cosmetics Shop,Middle Eastern Restaurant,Hotel,Movie Theater
2,Downtown Toronto,1,Café,Coffee Shop,Beer Bar,Cocktail Bar,Restaurant,Gym,Japanese Restaurant,Cosmetics Shop,Bakery,Diner
3,Downtown Toronto,1,Coffee Shop,Cocktail Bar,Bakery,Cheese Shop,Restaurant,Pub,Farmers Market,Beer Bar,Seafood Restaurant,Pharmacy
4,Downtown Toronto,1,Coffee Shop,Café,Sandwich Place,Italian Restaurant,Restaurant,Japanese Restaurant,Burger Joint,Bubble Tea Shop,Salad Place,Furniture / Home Store
6,Downtown Toronto,1,Coffee Shop,Café,Thai Restaurant,Restaurant,Clothing Store,Deli / Bodega,Gym,Hotel,Bookstore,Concert Hall
8,Downtown Toronto,1,Coffee Shop,Café,Scenic Lookout,Pizza Place,Hotel,Aquarium,Restaurant,Sandwich Place,Steakhouse,Bank
10,East Toronto,1,Greek Restaurant,Coffee Shop,Italian Restaurant,Furniture / Home Store,Ice Cream Shop,American Restaurant,Tibetan Restaurant,Trail,Juice Bar,Caribbean Restaurant
11,Downtown Toronto,1,Coffee Shop,Hotel,Café,Restaurant,Bakery,Japanese Restaurant,Salad Place,Breakfast Spot,Asian Restaurant,Seafood Restaurant
12,West Toronto,1,Café,Breakfast Spot,Coffee Shop,Bakery,Bar,Restaurant,Intersection,Performing Arts Venue,Stadium,Burrito Place


#### Cluster 2


In [102]:
toronto_clusters.loc[toronto_clusters['Cluster Labels'] == 2, toronto_clusters.columns[[1] + list(range(5, toronto_clusters.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,Most Common Venue 1,Most Common Venue 2,Most Common Venue 3,Most Common Venue 4,Most Common Venue 5,Most Common Venue 6,Most Common Venue 7,Most Common Venue 8,Most Common Venue 9,Most Common Venue 10
7,West Toronto,2,Pharmacy,Bakery,Bus Stop,Supermarket,Park,Brewery,Bank,Pizza Place,Music Venue,Grocery Store
9,West Toronto,2,Bar,Café,Coffee Shop,Asian Restaurant,Men's Store,Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Japanese Restaurant,Yoga Studio
13,East Toronto,2,Fast Food Restaurant,Pub,Pizza Place,Gym,Park,Sushi Restaurant,Brewery,Burrito Place,Fish & Chips Shop,Liquor Store
16,West Toronto,2,Mexican Restaurant,Thai Restaurant,Café,Bakery,Fast Food Restaurant,Park,Furniture / Home Store,Flea Market,Speakeasy,Antique Shop
18,Central Toronto,2,Sandwich Place,Café,Coffee Shop,Park,Pharmacy,Burger Joint,Pizza Place,Donut Shop,BBQ Joint,Middle Eastern Restaurant
20,Central Toronto,2,Sandwich Place,Dessert Shop,Thai Restaurant,Sushi Restaurant,Coffee Shop,Pizza Place,Gym,Italian Restaurant,Café,American Restaurant
21,Downtown Toronto,2,Café,Bar,Bookstore,Japanese Restaurant,Bakery,Nightclub,College Arts Building,Restaurant,Poutine Place,Sushi Restaurant
22,West Toronto,2,Café,Pizza Place,Sushi Restaurant,Pub,Coffee Shop,Italian Restaurant,Bank,Bookstore,Gym,Comic Shop
23,Downtown Toronto,2,Bar,Café,Vietnamese Restaurant,Bakery,Vegetarian / Vegan Restaurant,Coffee Shop,Caribbean Restaurant,Mexican Restaurant,Gaming Cafe,Burger Joint
30,East TorontoBusiness reply mail Processing Cen...,2,Light Rail Station,Park,Comic Shop,Burrito Place,Brewery,Garden Center,Butcher,Farmers Market,Yoga Studio,Auto Workshop


#### Cluster 3

In [103]:
toronto_clusters.loc[toronto_clusters['Cluster Labels'] == 3, toronto_clusters.columns[[1] + list(range(5, toronto_clusters.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,Most Common Venue 1,Most Common Venue 2,Most Common Venue 3,Most Common Venue 4,Most Common Venue 5,Most Common Venue 6,Most Common Venue 7,Most Common Venue 8,Most Common Venue 9,Most Common Venue 10
5,Downtown Toronto,3,Grocery Store,Café,Park,Baby Store,Athletics & Sports,Coffee Shop,Nightclub,Candy Store,Italian Restaurant,Restaurant


**Observations**  
Cluster 0 is a single Downtown Toronto neighborhood dominated by airport venues (airport service, terminal, lounge). It makes sense that it forms its own cluster, as it does not resemble the general commercial areas.  
Cluster 1 is mostly Downtown Toronto, with coffee shops, cafés and restaurants as the most common venues. This appears to be the busiest commercial area, containing most of the hotspots.  
Cluster 2 is a mix of West, East and Central Toronto, with fast food joints, cafés, bars and restaurants.  
Cluster 3 is just one neighborhood in Downtown Toronto, where grocery stores are the most common venue, alongside general shops not seen in other neighborhoods.
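
The observations above come from reading the top-venue tables by eye. A rough way to cross-check them is to average the venue-category frequencies within each cluster and list the highest ones. The snippet below is a minimal sketch of that approach; the helper `top_categories_per_cluster` and the toy frequencies are hypothetical and not part of the original analysis.

```python
import numpy as np
import pandas as pd


def top_categories_per_cluster(freq, labels, n_top=3):
    """Return {cluster: [categories]} ranked by mean frequency within each cluster."""
    # Group rows by an array of cluster labels and average each category column.
    means = freq.groupby(np.asarray(labels)).mean()
    return {int(cluster): row.nlargest(n_top).index.tolist()
            for cluster, row in means.iterrows()}


# Toy frequencies (hypothetical): rows are neighborhoods, columns are venue categories.
freq_demo = pd.DataFrame({
    'Coffee Shop': [0.5, 0.4, 0.0],
    'Airport':     [0.0, 0.0, 0.6],
    'Park':        [0.1, 0.2, 0.1],
})
labels_demo = [1, 1, 0]

profile_demo = top_categories_per_cluster(freq_demo, labels_demo, n_top=2)
print(profile_demo)
```

For the notebook's data, the call would be `top_categories_per_cluster(X, kmeans.labels_)`, where `X` is the frequency matrix passed to k-means.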