# A Consumer Business Comparison of New York and Toronto Based on the Top Venues of Their Key Boroughs

## Abstract

The consumer business in New York and Toronto are studied and compared by analyzing the top venues of their key boroughs---Manhattan and Toronto borough.  The venue data are collected from Foursquare based on which the neighborhoods are clustered and the distinguishing categories of the venues are identified.  The analysis reflects a more developed and well-rounded consumer business in New York suggesting a better opportunity of starting up or expanding a consumer business in Toronto.  The insight obtained from this study should be useful to consumer business managers and/or investors exploring business opportunities in one of these two cities.

## Methodology

The consumer business of the key boroughs of New York and Toronto are depicted using the venue location data retrieved from Foursquare.  The key boroughs chosen are Manhattan and Toronto borough respectively, the latter including east Toronto, west Toronto, central Toronto and downtown Toronto.  The choice of borough is partly based on the business districts, financial status and population of the borough compared to the other boroughs of the cities and partly for the sake of comparison---each borough contains roughly the same number of neighborhoods, 40 for Manhattan borough and 38 for Toronto borough.  To avoid verbose exposition in the following the word borough will be dropped and the two boroughs are simply referred to as Manhattan and Toronto (the latter should not be confused with Toronto as a city).

First a list of top 100 venues within 500 meters of the neighborhood's geographic coordinates (i.e.\ latitude and longitude) are collected from Foursquare for each neighborhood.  The rankings are based on the popularity of the venues among Foursquare users.  Giving that Foursquare is among the top three location data providers hosting over 60 million users the bias should be minimal.  Moreover the number of venues considered is 100 which should be sufficient to portraying the consumer business in the neighborhood.  The radius of the search is chosen according to the typical size of a neighborhood.  Therefore the choice of 500 meters should be appropriate to not only include the vast majority of the consumer business within a particular neighborhood but also prevent the analysis from being affected by the consumer business of nearby neighborhoods.

After the top 100 venues are collected for each neighborhood within a distance of 500 meters the categories of the venues are retrieved and consolidated to reveal the richness of the consumer business of New York and Toronto.  Then the top 10 categories of venues are identified for each neighborhood based on the mean of the frequency of their occurrence.  The purpose of the identification is to characterize the consumer business of the neighborhoods and group neighborhoods with similar characteristics.  The neighborhood clustering is determined using the k-means clustering algorithm where the coordinates of the neighborhoods are the mean of the occurrence frequency of the various venue categories.  Finally the neighborhoods are and plotted in the maps of the two cities and labeled with their cluster labels.  The distributions of the various neighborhood clusters of New York and Toronto are compared to reflect the distribution of the types of consumer business and the business opportunities are assessed.

## Preparation for Analysis

#### Import necessary libraries and modules

In [1]:
import requests # library to handle requests
import pandas as pd # library for data analsysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation
import json # library to handle JSON files

!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image
from IPython.core.display import HTML

# tranforming json file into a pandas dataframe library
from pandas.io.json import json_normalize

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library

print('Folium installed')
print('Libraries imported.')

Solving environment: done

## Package Plan ##

  environment location: /home/jupyterlab/conda

  added / updated specs: 
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    geographiclib-1.49         |             py_0          32 KB  conda-forge
    cryptography-2.4.2         |   py36h1ba5d50_0         618 KB
    openssl-1.1.1a             |    h14c3975_1000         4.0 MB  conda-forge
    libarchive-3.3.3           |       h5d8350f_5         1.5 MB
    grpcio-1.16.1              |   py36hf8bcb03_1         1.1 MB
    geopy-1.18.1               |             py_0          51 KB  conda-forge
    conda-4.6.2                |           py36_0         869 KB  conda-forge
    libssh2-1.8.0              |                1         239 KB  conda-forge
    python-3.6.8               |       h0371630_0        34.4 MB
    ------------------------------------------------------------
      

#### Define custom functions for analysis

In [2]:
# get nearby venues via query to Foursquare
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [3]:
# return the most common venues
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

#### Specify credentials to send query to Foursquare

In [4]:
CLIENT_ID = 'M1TDCQ3JGPKQ3UFHSISGBL3WUSH5PA5D125LMKBESYRYFCAW' # your Foursquare ID
CLIENT_SECRET = 'YL2X5EZYLFGFYEN4GXMUJ3DRYGQ0B32TYDDC4BI0UD2YYY0M' # your Foursquare Secret
VERSION = '20190203'
LIMIT = 100
radius = 500

#### Specify clustering parameters and colors for cluster plotting

In [5]:
num_top_venues = 10
indicators = ['st', 'nd', 'rd']
# create columns according to number of top venues
printcolumns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        printcolumns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        printcolumns.append('{}th Most Common Venue'.format(ind+1))

In [6]:
# set number of clusters
kclusters = 5
# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]
# rainbow = purple, blue, green, orange, red

### Manhattan Neighborhood Analysis 

#### Gather geographic data of New York

In [7]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset

In [8]:
with open('newyork_data.json') as newyork_json_data:
    newyork_data = json.load(newyork_json_data)
neighborhoods_data = newyork_data['features']

In [9]:
# instantiate the dataframe
newyork_nbhd = pd.DataFrame(['Borough', 'Neighborhood', 'Latitude', 'Longitude'])
for data in neighborhoods_data:
    borough = data['properties']['borough'] 
    neighborhood_name = data['properties']['name']
    
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]
    
    newyork_nbhd = newyork_nbhd.append({'Borough': borough,
                                          'Neighborhood': neighborhood_name,
                                          'Latitude': neighborhood_lat,
                                          'Longitude': neighborhood_lon}, ignore_index=True)

In [10]:
address = 'New York City, NY'
geolocator = Nominatim(user_agent="nyc_explorer")
location = geolocator.geocode(address)
newyork_lat = location.latitude
newyork_lng = location.longitude

#### Get the top 100 venues within 500 meters for each neighborhood and extract their categories

In [11]:
manhattan_data = newyork_nbhd[newyork_nbhd['Borough'] == 'Manhattan'].reset_index(drop=True)
manhattan_venues = getNearbyVenues(names=manhattan_data['Neighborhood'],
                                   latitudes=manhattan_data['Latitude'],
                                   longitudes=manhattan_data['Longitude']
                                  )

Marble Hill
Chinatown
Washington Heights
Inwood
Hamilton Heights
Manhattanville
Central Harlem
East Harlem
Upper East Side
Yorkville
Lenox Hill
Roosevelt Island
Upper West Side
Lincoln Square
Clinton
Midtown
Murray Hill
Chelsea
Greenwich Village
East Village
Lower East Side
Tribeca
Little Italy
Soho
West Village
Manhattan Valley
Morningside Heights
Gramercy
Battery Park City
Financial District
Carnegie Hill
Noho
Civic Center
Midtown South
Sutton Place
Turtle Bay
Tudor City
Stuyvesant Town
Flatiron
Hudson Yards


In [12]:
print('There are {} uniques categories.'.format(len(manhattan_venues['Venue Category'].unique())))

There are 331 uniques categories.


In [13]:
# one hot encoding
manhattan_onehot = pd.get_dummies(manhattan_venues[['Venue Category']], prefix="", prefix_sep="")
# add neighborhood column back to dataframe
manhattan_onehot['Neighborhood'] = manhattan_venues['Neighborhood']
# move neighborhood column to the first column
fixed_columns = [manhattan_onehot.columns[-1]] + list(manhattan_onehot.columns[:-1])
manhattan_onehot = manhattan_onehot[fixed_columns]
# normalize using the mean of the occurrence frequency of the venue categories
manhattan_grouped = manhattan_onehot.groupby('Neighborhood').mean().reset_index()

#### List the top 10 venue categories for each neighborhood

In [14]:
# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=printcolumns)
neighborhoods_venues_sorted['Neighborhood'] = manhattan_grouped['Neighborhood']

for ind in np.arange(manhattan_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(manhattan_grouped.iloc[ind, :], num_top_venues)

#### Cluster neighborhoods based on the mean frequency of occurrence of the venue categories

In [15]:
manhattan_grouped_clustering = manhattan_grouped.drop('Neighborhood', 1)
# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(manhattan_grouped_clustering)
# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 4, 0, 4, 0, 0, 0, 0, 4, 4,
       1, 0, 3, 0, 0, 0, 4, 0, 2, 0, 0, 4, 0, 0, 0, 4, 0, 0], dtype=int32)

In [16]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
manhattan_merged = manhattan_data
# add neighborhood names
manhattan_merged = manhattan_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

In [17]:
# create map
map_clusters = folium.Map(location=[newyork_lat, newyork_lng], zoom_start=11)

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(manhattan_merged['Latitude'], manhattan_merged['Longitude'], manhattan_merged['Neighborhood'], manhattan_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
    
map_clusters

#### List the clusters

In [18]:
# cluster 1
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 0, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Manhattan,0,Chinese Restaurant,Bubble Tea Shop,American Restaurant,Cocktail Bar,Vietnamese Restaurant,Dim Sum Restaurant,Hotpot Restaurant,Salon / Barbershop,Noodle House,Bakery
6,Manhattan,0,African Restaurant,Chinese Restaurant,American Restaurant,Fried Chicken Joint,French Restaurant,Cosmetics Shop,Seafood Restaurant,Gym / Fitness Center,Gym,Bar
8,Manhattan,0,Italian Restaurant,Exhibit,Art Gallery,Coffee Shop,Bakery,Juice Bar,Hotel,French Restaurant,Boutique,Gym / Fitness Center
9,Manhattan,0,Gym,Bar,Italian Restaurant,Coffee Shop,Pizza Place,Japanese Restaurant,Mexican Restaurant,Sushi Restaurant,Deli / Bodega,Diner
10,Manhattan,0,Italian Restaurant,Coffee Shop,Sushi Restaurant,Gym / Fitness Center,Pizza Place,Burger Joint,Gym,Sporting Goods Shop,Bakery,Thai Restaurant
12,Manhattan,0,Italian Restaurant,Bar,Coffee Shop,Wine Bar,Bakery,Burger Joint,Indian Restaurant,Vegetarian / Vegan Restaurant,Seafood Restaurant,Gym / Fitness Center
13,Manhattan,0,Theater,Gym / Fitness Center,Concert Hall,Plaza,Italian Restaurant,Café,French Restaurant,Performing Arts Venue,Park,Opera House
14,Manhattan,0,Theater,Coffee Shop,American Restaurant,Italian Restaurant,Gym / Fitness Center,Wine Shop,Gym,Hotel,Spa,Sandwich Place
15,Manhattan,0,Hotel,Food Truck,Coffee Shop,Steakhouse,Theater,Spa,Sporting Goods Shop,Bakery,Clothing Store,Cocktail Bar
16,Manhattan,0,Coffee Shop,Hotel,Spa,Italian Restaurant,Gym,Bar,Japanese Restaurant,French Restaurant,Sandwich Place,Salon / Barbershop


In [19]:
# cluster 2
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 1, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Manhattan,1,Discount Store,Coffee Shop,Yoga Studio,Deli / Bodega,Supplement Shop,Steakhouse,Shopping Mall,Shoe Store,Seafood Restaurant,Sandwich Place


In [20]:
# cluster 3
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 2, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
37,Manhattan,2,Bar,Boat or Ferry,Park,Playground,Basketball Court,Pet Service,Coffee Shop,Cocktail Bar,German Restaurant,Harbor / Marina


In [21]:
# cluster 4
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 3, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
33,Manhattan,3,Korean Restaurant,Cosmetics Shop,Hotel,Hotel Bar,Japanese Restaurant,Coffee Shop,Bakery,Cocktail Bar,Boutique,Gym / Fitness Center


In [22]:
# cluster 5
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 4, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Manhattan,4,Café,Bakery,Mobile Phone Shop,Deli / Bodega,Sandwich Place,Spanish Restaurant,Mexican Restaurant,Grocery Store,Gym,Supermarket
3,Manhattan,4,Mexican Restaurant,Café,Lounge,Pizza Place,Wine Bar,Bakery,Deli / Bodega,Park,Restaurant,Frozen Yogurt Shop
4,Manhattan,4,Mexican Restaurant,Coffee Shop,Café,Deli / Bodega,Pizza Place,Liquor Store,Cocktail Bar,Sandwich Place,School,Chinese Restaurant
5,Manhattan,4,Deli / Bodega,Italian Restaurant,Mexican Restaurant,Seafood Restaurant,Japanese Curry Restaurant,Sushi Restaurant,Liquor Store,Beer Garden,Falafel Restaurant,Other Nightlife
7,Manhattan,4,Mexican Restaurant,Bakery,Deli / Bodega,Latin American Restaurant,Thai Restaurant,Spa,Café,Taco Place,Chinese Restaurant,Street Art
11,Manhattan,4,Sandwich Place,Deli / Bodega,Greek Restaurant,Train,Coffee Shop,Scenic Lookout,School,Gym,Park,Residential Building (Apartment / Condo)
25,Manhattan,4,Pizza Place,Coffee Shop,Yoga Studio,Spa,Szechuan Restaurant,Mexican Restaurant,Thai Restaurant,Italian Restaurant,Deli / Bodega,Bar
36,Manhattan,4,Mexican Restaurant,Park,Café,Pizza Place,Greek Restaurant,Deli / Bodega,Diner,Hotel,Dog Run,Spa


### Toronto Neighborhood Analysis

#### Gather geographic data of Toronto

In [23]:
%run ./WikiTableScrape.ipynb
%run ./GeoCoord.ipynb
df = pd.read_csv('GeoPCodeTable.csv')

['PostalCode', 'Borough', 'Neighbourhood']
--2019-02-04 03:49:26--  http://cocl.us/Geospatial_data/Geospatial_Coordinates.csv
Resolving cocl.us (cocl.us)... 169.48.113.201
Connecting to cocl.us (cocl.us)|169.48.113.201|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://cocl.us/Geospatial_data/Geospatial_Coordinates.csv [following]
--2019-02-04 03:49:26--  https://cocl.us/Geospatial_data/Geospatial_Coordinates.csv
Connecting to cocl.us (cocl.us)|169.48.113.201|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://ibm.box.com/shared/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv [following]
--2019-02-04 03:49:26--  https://ibm.box.com/shared/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv
Resolving ibm.box.com (ibm.box.com)... 107.152.27.197, 107.152.26.197
Connecting to ibm.box.com (ibm.box.com)|107.152.27.197|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: htt

In [24]:
address = 'Toronto, Ontario'
geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
toronto_lat = location.latitude
toronto_lng = location.longitude

#### Get the top 100 venues within 500 meters for each neighborhood of Toronto and extract their categories

In [25]:
toronto_data = df[df['Borough'].str.contains("Toronto")]
toronto_data = toronto_data.reset_index(drop=True)
toronto_venues = getNearbyVenues(names=toronto_data['Neighbourhood'],
                                   latitudes=toronto_data['Latitude'],
                                   longitudes=toronto_data['Longitude']
                                  )
toronto_venues.rename(columns={'Neighborhood': 'Neighbourhood'}, inplace=True)

The Beaches
The Danforth West, Riverdale
The Beaches West, India Bazaar
Studio District
Lawrence Park
Davisville North
North Toronto West
Davisville
Moore Park, Summerhill East
Deer Park, Forest Hill SE, Rathnelly, South Hill, Summerhill West
Rosedale
Cabbagetown, St. James Town
Church and Wellesley
Harbourfront, Regent Park
Ryerson, Garden District
St. James Town
Berczy Park
Central Bay Street
Adelaide, King, Richmond
Harbourfront East, Toronto Islands, Union Station
Design Exchange, Toronto Dominion Centre
Commerce Court, Victoria Hotel
Roselawn
Forest Hill North, Forest Hill West
The Annex, North Midtown, Yorkville
Harbord, University of Toronto
Chinatown, Grange Park, Kensington Market
CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara
Stn A PO Boxes 25 The Esplanade
First Canadian Place, Underground city
Christie
Dovercourt Village, Dufferin
Little Portugal, Trinity
Brockton, Exhibition Place, Parkdale Village
High Park, The 

In [26]:
print('There are {} uniques categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 236 uniques categories.


#### List neighborhoods based on venue categories

In [27]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")
# add neighborhood column back to dataframe
toronto_onehot['Neighbourhood'] = toronto_venues['Neighbourhood']
# move neighborhood column to the first column
fixed_column = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_column]
# normalize using the mean of the occurrence frequency of the venue categories
toronto_grouped = toronto_onehot.groupby('Neighbourhood').mean().reset_index()

#### List the top 10 venue categories for each neighborhood

In [28]:
# create a new dataframe
neighborhood_venues_sorted = pd.DataFrame(columns=printcolumns)
neighborhood_venues_sorted.rename(columns={'Neighborhood': 'Neighbourhood'}, inplace=True)
neighborhood_venues_sorted['Neighbourhood'] = toronto_grouped['Neighbourhood']
for ind in np.arange(toronto_grouped.shape[0]):
    neighborhood_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

#### Cluster neighborhoods based on the mean frequency of occurrence of the venue categories

In [29]:
toronto_grouped_clustering = toronto_grouped.drop('Neighbourhood', 1)
# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)
kmeans.labels_

array([2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2,
       1, 2, 0, 2, 2, 0, 4, 2, 2, 2, 2, 2, 2, 2, 1, 2], dtype=int32)

In [30]:
# add clustering labels
neighborhood_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
toronto_merged = toronto_data
# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhood_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')

In [31]:
# create map
map_clusters = folium.Map(location=[toronto_lat, toronto_lng], zoom_start=11)

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighbourhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

#### List the clusters

In [32]:
# cluster 1
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5,toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,Central Toronto,0,Playground,Park,Restaurant,Yoga Studio,Diner,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store
10,Downtown Toronto,0,Park,Playground,Trail,Yoga Studio,Diner,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store


In [33]:
# cluster 2
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5,toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,East Toronto,1,Sandwich Place,Park,Brewery,Steakhouse,Italian Restaurant,Food & Drink Shop,Fish & Chips Shop,Fast Food Restaurant,Liquor Store,Pet Store
4,Central Toronto,1,Bus Line,Park,Lake,Dim Sum Restaurant,Swim School,Yoga Studio,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space
37,East Toronto,1,Light Rail Station,Yoga Studio,Garden,Pizza Place,Recording Studio,Restaurant,Burrito Place,Brewery,Skate Park,Farmers Market


In [34]:
# cluster 3
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5,toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,East Toronto,2,Neighborhood,Coffee Shop,Pub,Dance Studio,Discount Store,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant
1,East Toronto,2,Greek Restaurant,Coffee Shop,Ice Cream Shop,Bookstore,Italian Restaurant,Grocery Store,Brewery,Bubble Tea Shop,Restaurant,Caribbean Restaurant
3,East Toronto,2,Café,Coffee Shop,Yoga Studio,American Restaurant,Bakery,Italian Restaurant,Convenience Store,Coworking Space,Juice Bar,New American Restaurant
5,Central Toronto,2,Hotel,Gym,Park,Breakfast Spot,Sandwich Place,Restaurant,Food & Drink Shop,Burger Joint,Yoga Studio,Dumpling Restaurant
6,Central Toronto,2,Coffee Shop,Clothing Store,Sporting Goods Shop,Yoga Studio,Bagel Shop,Cosmetics Shop,Gym / Fitness Center,Chinese Restaurant,Dessert Shop,Diner
7,Central Toronto,2,Dessert Shop,Sandwich Place,Pizza Place,Coffee Shop,Restaurant,Café,Pharmacy,Seafood Restaurant,Sushi Restaurant,Italian Restaurant
9,Central Toronto,2,Coffee Shop,Pub,American Restaurant,Sushi Restaurant,Fried Chicken Joint,Bagel Shop,Sports Bar,Supermarket,Pizza Place,Light Rail Station
11,Downtown Toronto,2,Restaurant,Coffee Shop,Park,Bakery,Pizza Place,Café,Pub,Italian Restaurant,Market,Beer Store
12,Downtown Toronto,2,Japanese Restaurant,Sushi Restaurant,Coffee Shop,Gay Bar,Restaurant,Burger Joint,Gastropub,Café,Fast Food Restaurant,Men's Store
13,Downtown Toronto,2,Coffee Shop,Pub,Park,Café,Bakery,Breakfast Spot,Restaurant,Theater,Mexican Restaurant,Shoe Store


In [35]:
# cluster 4
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5,toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
23,Central Toronto,3,Trail,Jewelry Store,Sushi Restaurant,Bus Line,Yoga Studio,Dog Run,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space


In [36]:
# cluster 5
toronto_merged.loc[toronto_merged['Cluster Labels'] == 5, toronto_merged.columns[[1] + list(range(5,toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue


#### Compare the venue categories of Manhattan and Toronto

In [37]:
to = toronto_venues['Venue Category'].unique().tolist()
to = set(to)

In [38]:
ma = manhattan_venues['Venue Category'].unique().tolist()
ma = set(ma)

In [39]:
print(to)

{'Butcher', 'Sporting Goods Shop', 'Creperie', 'Yoga Studio', 'Afghan Restaurant', 'Fountain', 'Recording Studio', 'Bistro', 'Skating Rink', 'Sculpture Garden', 'Wings Joint', 'Thai Restaurant', 'Pool', 'General Travel', 'Harbor / Marina', 'Dessert Shop', 'Pet Store', 'Breakfast Spot', 'Tanning Salon', 'Pharmacy', 'Deli / Bodega', 'Trail', 'Building', 'Salad Place', 'Plane', 'Mexican Restaurant', 'Strip Club', 'Irish Pub', 'Airport Lounge', 'Pub', 'Toy / Game Store', 'Fried Chicken Joint', 'Art Museum', 'Train Station', 'Food & Drink Shop', 'Airport Food Court', 'Burger Joint', 'Airport Service', 'Discount Store', 'Cajun / Creole Restaurant', 'Bubble Tea Shop', 'Doner Restaurant', 'Baby Store', 'Intersection', 'Dance Studio', 'Swim School', 'Food Truck', 'Beer Store', 'Scenic Lookout', 'Italian Restaurant', 'Playground', 'Arts & Crafts Store', 'Monument / Landmark', 'Garden Center', 'Jewish Restaurant', 'Vietnamese Restaurant', 'Taiwanese Restaurant', 'Brazilian Restaurant', 'Boat or F

In [40]:
# common venue categories of Manhattan and Toronto
toma = to.intersection(ma)

In [41]:
len(toma)

195

In [42]:
# venue categories of Toronto that is not found in Manhattan
tomiss = to - toma
print(tomiss)

{'Recording Studio', 'College Arts Building', 'Fish & Chips Shop', 'Hospital', 'Stationery Store', 'Baseball Stadium', 'General Travel', 'Fruit & Vegetable Store', 'Comfort Food Restaurant', 'Airport Terminal', 'Tanning Salon', 'Aquarium', 'Lake', 'Plane', 'Smoothie Shop', 'Airport Gate', 'Light Rail Station', 'Costume Shop', 'Neighborhood', 'Airport Lounge', 'Train Station', 'Airport Food Court', 'College Rec Center', 'Poutine Place', 'Basketball Stadium', 'Airport Service', 'Stadium', 'Cajun / Creole Restaurant', 'Doner Restaurant', 'Comic Shop', 'Intersection', 'Swim School', 'Food', 'Airport', 'Mac & Cheese Joint', 'Brewery', 'Home Service', 'Church', 'College Gym', 'Other Great Outdoors', 'Beach'}


In [43]:
# venue categories of Manhattan that is not found in Toronto
mamiss = ma - toma
print(mamiss)

{'Arcade', 'Sports Club', 'Pet Service', 'Szechuan Restaurant', 'Newsstand', 'Library', 'Bus Station', 'Pet Café', 'Exhibit', 'Lebanese Restaurant', 'Himalayan Restaurant', 'Australian Restaurant', 'Russian Restaurant', 'Train', 'Comedy Club', 'Venezuelan Restaurant', 'Video Store', 'Laundry Service', 'Spanish Restaurant', 'Veterinarian', 'General College & University', 'Rest Area', 'Tennis Court', 'Hardware Store', 'Temple', 'Boxing Gym', 'Camera Store', 'Stables', 'Caucasian Restaurant', 'Hot Dog Joint', 'Club House', 'Tennis Stadium', 'Pilates Studio', 'Scandinavian Restaurant', 'Board Shop', 'Design Studio', 'Peruvian Restaurant', 'Christmas Market', 'Resort', 'Swiss Restaurant', 'Other Nightlife', 'Austrian Restaurant', 'Animal Shelter', 'Bridal Shop', 'Massage Studio', 'Tech Startup', 'Paper / Office Supplies Store', 'Hawaiian Restaurant', 'Rock Club', 'Social Club', 'Non-Profit', 'Public Art', 'Basketball Court', 'College Bookstore', 'Moroccan Restaurant', 'Turkish Restaurant', 