# Capstone Project: The Battle of Neighborhoods

This Notebook is used for the Applied Data Science Capstone Project on Coursera.

## -- Week 1 Submission --

### I. Introduction

In the past decade, Canada has attracted large numbers of migrants from all over the globe due to its high rankings on the Global Liveability Index, an assessment by the Economist Intelligence Unit that scores cities on factors such as stability, healthcare, culture and environment, education, and infrastructure (source: https://www.eiu.com/topic/liveability). In fact, three Canadian cities made it to the top 10 list: Toronto, Vancouver and Calgary (source: https://www.businessinsider.com/most-livable-cities-in-the-world-2018-8).

Immigrants form a large part of the Canadian population. According to Canada's 2016 Census, 7.5 million people had migrated to Canada, representing 1 in 5 people in the country. The Filipino population in Canada has been growing steadily since 1996, and more than 188,805 Filipino immigrants were recorded in the 2016 census (source: https://www150.statcan.gc.ca/n1/pub/11-627-m/11-627-m2017028-eng.htm). Filipinos make up the third-largest Asian group in Canada and are the 7th largest ethnic group in the country (source: https://en.wikipedia.org/wiki/Ethnic_origins_of_people_in_Canada#Evolution_from_1996_to_2016). Filipinos are spread out across Canada, but the largest communities are concentrated in Toronto and Vancouver (source: https://en.wikipedia.org/wiki/Filipino_Canadians).

This report aims to explore the cities of Toronto and Vancouver, both listed as top liveable cities and top immigration destinations for Filipinos. As each city is composed of several neighborhoods, the goal of this report is to assist future Filipino immigrants in choosing a neighborhood to live in, considering (a) the availability of Filipino stores, and (b) the variety and accessibility of essential stores. Furthermore, Canadians relocating to or from the two cities may find it useful to discover what kinds of stores or amenities are available in their prospective neighborhood.

### II. Data

This report uses venue data from the Foursquare API. The analysis therefore only factors in venues listed in Foursquare, and the report may not be as holistic as desired: other factors in choosing a neighborhood, such as work opportunities and transportation, are outside the scope of the available data.

To map out the neighborhoods, geographic locations will be obtained from the GeoPy library and mapped out using Folium.

#### A. Toronto Data

#### A.1. Toronto Neighborhood Dataframe

In [1]:
import pandas as pd
import numpy as np

In [2]:
df_toronto = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')[0]
df_toronto

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1A,Not assigned,
1,M2A,Not assigned,
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
7,M8A,Not assigned,
8,M9A,Etobicoke,Islington Avenue
9,M1B,Scarborough,"Malvern, Rouge"


In [3]:
df_toronto = df_toronto.set_index('Borough')
df_toronto.head()

Unnamed: 0_level_0,Postal Code,Neighborhood
Borough,Unnamed: 1_level_1,Unnamed: 2_level_1
Not assigned,M1A,
Not assigned,M2A,
North York,M3A,Parkwoods
North York,M4A,Victoria Village
Downtown Toronto,M5A,"Regent Park, Harbourfront"


In [4]:
df_toronto.drop(index='Not assigned', inplace=True)
df_toronto.head()

Unnamed: 0_level_0,Postal Code,Neighborhood
Borough,Unnamed: 1_level_1,Unnamed: 2_level_1
North York,M3A,Parkwoods
North York,M4A,Victoria Village
Downtown Toronto,M5A,"Regent Park, Harbourfront"
North York,M6A,"Lawrence Manor, Lawrence Heights"
Downtown Toronto,M7A,"Queen's Park, Ontario Provincial Government"


In [5]:
df_toronto.reset_index(inplace = True)
df_toronto.head()

Unnamed: 0,Borough,Postal Code,Neighborhood
0,North York,M3A,Parkwoods
1,North York,M4A,Victoria Village
2,Downtown Toronto,M5A,"Regent Park, Harbourfront"
3,North York,M6A,"Lawrence Manor, Lawrence Heights"
4,Downtown Toronto,M7A,"Queen's Park, Ontario Provincial Government"
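The three cells above remove unassigned boroughs by temporarily turning `Borough` into the index. As a hedged alternative, the same filtering can be sketched with a boolean mask. The rows below are a small hand-made sample shaped like the Wikipedia table, not the real download, and the names `sample` and `assigned` are illustrative.

```python
import pandas as pd

# Toy rows shaped like the Wikipedia postal-code table (illustrative only).
sample = pd.DataFrame({
    "Postal Code": ["M1A", "M3A", "M4A", "M8A"],
    "Borough": ["Not assigned", "North York", "North York", "Not assigned"],
    "Neighborhood": [None, "Parkwoods", "Victoria Village", None],
})

# Keep rows whose borough is assigned, then renumber the index from 0.
assigned = sample[sample["Borough"] != "Not assigned"].reset_index(drop=True)
print(assigned)
```

The mask keeps the original column order, so the later column reordering step would not be needed with this approach.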


In [6]:
df_toronto['Neighborhood'] = df_toronto['Neighborhood'].astype(str)
df_toronto['Neighborhood'] = df_toronto['Neighborhood'].str.replace(' /', ',')
df_toronto.head()

Unnamed: 0,Borough,Postal Code,Neighborhood
0,North York,M3A,Parkwoods
1,North York,M4A,Victoria Village
2,Downtown Toronto,M5A,"Regent Park, Harbourfront"
3,North York,M6A,"Lawrence Manor, Lawrence Heights"
4,Downtown Toronto,M7A,"Queen's Park, Ontario Provincial Government"


In [7]:
df_toronto = df_toronto[['Postal Code', 'Borough', 'Neighborhood']]

In [8]:
df_geodata = pd.read_csv('http://cocl.us/Geospatial_data')
df_geodata

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
5,M1J,43.744734,-79.239476
6,M1K,43.727929,-79.262029
7,M1L,43.711112,-79.284577
8,M1M,43.716316,-79.239476
9,M1N,43.692657,-79.264848


In [9]:
df_geodata = df_geodata.set_index('Postal Code')
df_geodata.head()

Unnamed: 0_level_0,Latitude,Longitude
Postal Code,Unnamed: 1_level_1,Unnamed: 2_level_1
M1B,43.806686,-79.194353
M1C,43.784535,-79.160497
M1E,43.763573,-79.188711
M1G,43.770992,-79.216917
M1H,43.773136,-79.239476


In [10]:
df_toronto.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


##### Combining the list of Toronto neighborhoods and their respective geographical locations:

In [11]:
df_torontogeodata = df_toronto.join(df_geodata, how='outer', on='Postal Code')
df_torontogeodata

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.654260,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
5,M9A,Etobicoke,Islington Avenue,43.667856,-79.532242
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
7,M3B,North York,Don Mills,43.745906,-79.352188
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
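The join above attaches coordinates to each neighborhood by postal code. The sketch below shows the same idea with `merge` on a few rows copied from the outputs above. It swaps in a left merge (the notebook uses an outer join) so that only postal codes present in the neighborhood table are kept, and it assumes each postal code appears once per table.

```python
import pandas as pd

# Illustrative rows only; coordinates copied from the outputs above.
neighborhoods = pd.DataFrame({
    "Postal Code": ["M3A", "M4A"],
    "Borough": ["North York", "North York"],
    "Neighborhood": ["Parkwoods", "Victoria Village"],
})
coords = pd.DataFrame({
    "Postal Code": ["M4A", "M3A", "M1B"],
    "Latitude": [43.725882, 43.753259, 43.806686],
    "Longitude": [-79.315572, -79.329656, -79.194353],
})

# A left merge keeps every neighborhood row and attaches coordinates by postal code.
# validate="one_to_one" raises if a postal code is duplicated on either side.
combined = neighborhoods.merge(coords, on="Postal Code", how="left", validate="one_to_one")

# Flag neighborhoods that did not find a coordinate match.
missing = combined[combined["Latitude"].isna()]
print(combined)
print("rows without coordinates:", len(missing))
```

Checking `missing` before mapping is a cheap way to catch postal codes that have no coordinates.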


In [12]:
df_torontogeodata[['Neighborhood', 'Latitude', 'Longitude']]

Unnamed: 0,Neighborhood,Latitude,Longitude
0,Parkwoods,43.753259,-79.329656
1,Victoria Village,43.725882,-79.315572
2,"Regent Park, Harbourfront",43.654260,-79.360636
3,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
5,Islington Avenue,43.667856,-79.532242
6,"Malvern, Rouge",43.806686,-79.194353
7,Don Mills,43.745906,-79.352188
8,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
9,"Garden District, Ryerson",43.657162,-79.378937


#### A.2. Toronto Neighborhood Map

In [13]:
!conda install -c conda-forge folium=0.5.0 --yes 
import folium


In [14]:
map_toronto = folium.Map(location=[43.651070, -79.347015], zoom_start=12)
map_toronto

In [15]:
for lat, lng, borough, neighborhood in zip(df_torontogeodata['Latitude'], df_torontogeodata['Longitude'], df_torontogeodata['Borough'], df_torontogeodata['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)

map_toronto

#### A.3. Toronto Venues from Foursquare

In [16]:
CLIENT_ID = '3KUESZ32SIS1FHTZTDDEAPJUB5XJOA3JD1FWZ2AJHDL5K0IT' # your Foursquare ID
CLIENT_SECRET = 'SR1ZQ1UTULNY3D45Q0K2K4P0ID2H2JRYVHVJGU3XT3RF1BU1' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 3KUESZ32SIS1FHTZTDDEAPJUB5XJOA3JD1FWZ2AJHDL5K0IT
CLIENT_SECRET:SR1ZQ1UTULNY3D45Q0K2K4P0ID2H2JRYVHVJGU3XT3RF1BU1


In [17]:
df_torontogeodata.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494


In [18]:
df_torontogeodata.loc[0, 'Neighborhood']

'Parkwoods'

In [19]:
neighborhood_latitude = df_torontogeodata.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = df_torontogeodata.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = df_torontogeodata.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Parkwoods are 43.7532586, -79.3296565.


In [20]:
LIMIT = 100 # limit of number of venues returned by Foursquare API

radius = 500 # define radius

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=3KUESZ32SIS1FHTZTDDEAPJUB5XJOA3JD1FWZ2AJHDL5K0IT&client_secret=SR1ZQ1UTULNY3D45Q0K2K4P0ID2H2JRYVHVJGU3XT3RF1BU1&v=20180605&ll=43.7532586,-79.3296565&radius=500&limit=100'
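The URL above is assembled with `str.format`. A possible alternative, sketched below, is to build the query string with the standard library's `urlencode`, which percent-encodes each value. The helper name `build_explore_url` and the placeholder credentials are assumptions made for illustration; only the endpoint and parameter names come from the cell above.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Return a venues/explore URL with percent-encoded query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),  # latitude,longitude pair
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)

# Placeholder credentials, not real keys.
url = build_explore_url("MY_ID", "MY_SECRET", "20180605", 43.7532586, -79.3296565)
print(url)
```

Keeping the parameters in a dictionary also makes it easier to read credentials from environment variables instead of hard-coding them in the notebook.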

In [21]:
import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (pandas >= 1.0; older versions use pandas.io.json)

In [22]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5eb61ce614a126001b99640a'},
  'headerLocation': 'Parkwoods - Donalda',
  'headerFullLocation': 'Parkwoods - Donalda, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 3,
  'suggestedBounds': {'ne': {'lat': 43.757758604500005,
    'lng': -79.32343823984928},
   'sw': {'lat': 43.7487585955, 'lng': -79.33587476015072}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4e8d9dcdd5fbbbb6b3003c7b',
       'name': 'Brookbanks Park',
       'location': {'address': 'Toronto',
        'lat': 43.751976046055574,
        'lng': -79.33214044722958,
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.751976046055574,
          'lng': -79.33214044722958}],
        'distance': 245,
        'cc': 'CA',
        'c

In [23]:
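# function that extracts the category of the venue
def get_category_type(row):
    # rows may come from a raw venue record or from a flattened explore result
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    # a venue with no categories has no type to report
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']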

In [24]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
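`getNearbyVenues` indexes `categories[0]` directly, which would raise an `IndexError` for any venue returned without a category. The sketch below isolates the row-building step and tolerates that case. The helper name `extract_venue_rows` and the hand-written `fake_items` payload are assumptions for illustration; they mimic the keys used in the function above rather than real API output.

```python
def extract_venue_rows(items, name, lat, lng):
    """Flatten Foursquare 'items' into tuples, tolerating venues without a category."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Use None when the venue carries no category at all.
        category = categories[0]["name"] if categories else None
        rows.append((
            name, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows

# Hand-written payload fragment mimicking the structure used above (not real API output).
fake_items = [
    {"venue": {"name": "Brookbanks Park", "location": {"lat": 43.75, "lng": -79.33},
               "categories": [{"name": "Park"}]}},
    {"venue": {"name": "Unnamed Spot", "location": {"lat": 43.76, "lng": -79.34},
               "categories": []}},
]
rows = extract_venue_rows(fake_items, "Parkwoods", 43.7532586, -79.3296565)
print(rows)
```

Separating the parsing from the HTTP request also makes this step testable without calling the API.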

In [25]:
df_torontovenues = getNearbyVenues(names=df_torontogeodata['Neighborhood'],
                                   latitudes=df_torontogeodata['Latitude'],
                                   longitudes=df_torontogeodata['Longitude']
                                  )

Parkwoods
Victoria Village
Regent Park, Harbourfront
Lawrence Manor, Lawrence Heights
Queen's Park, Ontario Provincial Government
Islington Avenue
Malvern, Rouge
Don Mills
Parkview Hill, Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
Rouge Hill, Port Union, Highland Creek
Don Mills
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Thorncliffe Park
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
East Toronto
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
Kennedy Park, Ionview, East Birchmount Park
Bayview Village
Downsview
The Danforth West, Ri

In [79]:
df_torontovenues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
1,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,Parkwoods,43.753259,-79.329656,TTC stop - 44 Valley Woods,43.755402,-79.333741,Bus Stop
3,Victoria Village,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
4,Victoria Village,43.725882,-79.315572,Tim Hortons,43.725517,-79.313103,Coffee Shop


In [27]:
df_torontovenues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Agincourt,4,4,4,4,4,4
"Alderwood, Long Branch",10,10,10,10,10,10
"Bathurst Manor, Wilson Heights, Downsview North",19,19,19,19,19,19
Bayview Village,4,4,4,4,4,4
"Bedford Park, Lawrence Manor East",22,22,22,22,22,22
Berczy Park,57,57,57,57,57,57
"Birch Cliff, Cliffside West",4,4,4,4,4,4
"Brockton, Parkdale Village, Exhibition Place",22,22,22,22,22,22
Business reply mail Processing Centre,18,18,18,18,18,18
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",16,16,16,16,16,16


In [28]:
print('There are {} unique categories.'.format(len(df_torontovenues['Venue Category'].unique())))

There are 266 unique categories.


In [29]:
# one hot encoding
toronto_onehot = pd.get_dummies(df_torontovenues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = df_torontovenues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Yoga Studio,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
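Note that the output above starts with `Yoga Studio` rather than `Neighborhood`. One plausible explanation, stated here as an assumption and not verified against the data, is that one of the Foursquare venue categories is itself named "Neighborhood", so the assignment overwrites that dummy column in place and the "move last column first" step then moves `Yoga Studio` instead. The sketch below reproduces that scenario on toy data and shows one way to avoid the clash; the renamed column label is hypothetical.

```python
import pandas as pd

# Toy venues; one venue's category is literally "Neighborhood" (assumed scenario).
venues = pd.DataFrame({
    "Neighborhood": ["Parkwoods", "Parkwoods", "Victoria Village"],
    "Venue Category": ["Park", "Neighborhood", "Coffee Shop"],
})

onehot = pd.get_dummies(venues["Venue Category"])

# Rename the clashing dummy column before adding the label column.
onehot = onehot.rename(columns={"Neighborhood": "Neighborhood (category)"})

# insert() places the label column at position 0 regardless of column order.
onehot.insert(0, "Neighborhood", venues["Neighborhood"])
print(onehot.columns.tolist())
```

Using `insert(0, ...)` removes the need to reorder columns by position afterwards.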


#### B. Vancouver Data

#### B.1. Vancouver Neighborhood Data

In [30]:
neighborhood_list = ['Arbutus Ridge, Vancouver',
'Downtown, Vancouver',
'Dunbar-Southlands, Vancouver',
'Fairview, Vancouver',
'Grandview-Woodland, Vancouver',
'Hastings-Sunrise, Vancouver',
'Kensington-Cedar Cottage, Vancouver',
'Kerrisdale, Vancouver',
'Killarney, Vancouver',
'Kitsilano, Vancouver',
'Marpole, Vancouver',
'Mount Pleasant, Vancouver',
'Oakridge, Vancouver',
'Renfrew-Collingwood, Vancouver',
'Riley Park, Vancouver',
'Shaughnessy, Vancouver',
'South Cambie, Vancouver',
'Strathcona, Vancouver',
'Sunset, Vancouver',
'Victoria-Fraserview, Vancouver',
'West End, Vancouver',
'West Point Grey, Vancouver']

(source: https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Vancouver#Official_Neighbourhoods) 

In [31]:
df_vancouver = pd.DataFrame(neighborhood_list, columns=['Neighborhood'])
df_vancouver

Unnamed: 0,Neighborhood
0,"Arbutus Ridge, Vancouver"
1,"Downtown, Vancouver"
2,"Dunbar-Southlands, Vancouver"
3,"Fairview, Vancouver"
4,"Grandview-Woodland, Vancouver"
5,"Hastings-Sunrise, Vancouver"
6,"Kensington-Cedar Cottage, Vancouver"
7,"Kerrisdale, Vancouver"
8,"Killarney, Vancouver"
9,"Kitsilano, Vancouver"


In [32]:
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="explorer")

from geopy.extra.rate_limiter import RateLimiter
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)
df_vancouver['Location'] = df_vancouver['Neighborhood'].apply(geocode)

df_vancouver['Point'] = df_vancouver['Location'].apply(lambda x: (x.latitude, x.longitude))
df_vancouver

Unnamed: 0,Neighborhood,Location,Point
0,"Arbutus Ridge, Vancouver","(Arbutus Ridge, Vancouver, Metro Vancouver Reg...","(49.2409677, -123.1670008)"
1,"Downtown, Vancouver","(Downtown, Vancouver, Metro Vancouver Regional...","(49.283393, -123.1174563)"
2,"Dunbar-Southlands, Vancouver","(Dunbar-Southlands, Vancouver, Metro Vancouver...","(49.2534601, -123.1850439)"
3,"Fairview, Vancouver","(Fairview, Vancouver, Metro Vancouver Regional...","(49.2641128, -123.1268352)"
4,"Grandview-Woodland, Vancouver","(Grandview-Woodland, Vancouver, Metro Vancouve...","(49.2705588, -123.0679417)"
5,"Hastings-Sunrise, Vancouver","(Hastings-Sunrise, Vancouver, Metro Vancouver ...","(49.2775935, -123.0439199)"
6,"Kensington-Cedar Cottage, Vancouver","(Kensington-Cedar Cottage, Vancouver, Metro Va...","(49.2476321, -123.0842067)"
7,"Kerrisdale, Vancouver","(Kerrisdale, Vancouver, Metro Vancouver Region...","(49.2346728, -123.1553893)"
8,"Killarney, Vancouver","(Killarney, Vancouver, Metro Vancouver Regiona...","(49.2242738, -123.0462504)"
9,"Kitsilano, Vancouver","(Kitsilano, Vancouver, Metro Vancouver Regiona...","(49.2694099, -123.155267)"
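The lambda that builds `Point` assumes every lookup succeeded. GeoPy's `geocode` returns `None` when it finds no match, in which case `x.latitude` would raise an `AttributeError`. A minimal sketch of a more defensive conversion follows; the helper name `to_point` is hypothetical, and `SimpleNamespace` only stands in for a geocoder result with the two attributes used here.

```python
from types import SimpleNamespace

def to_point(location):
    """Return (latitude, longitude), or (None, None) when geocoding found nothing."""
    if location is None:
        return (None, None)
    return (location.latitude, location.longitude)

# Stand-ins for geocoder results (illustrative, no network call is made).
found = SimpleNamespace(latitude=49.2409677, longitude=-123.1670008)
print(to_point(found))
print(to_point(None))
```

With such a helper, the column could be built with `df_vancouver['Location'].apply(to_point)`, leaving missing coordinates visible as empty values instead of stopping the notebook.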


In [33]:
df_vancouver[['Latitude', 'Longitude']] = df_vancouver['Point'].apply(pd.Series)
df_vancouver

Unnamed: 0,Neighborhood,Location,Point,Latitude,Longitude
0,"Arbutus Ridge, Vancouver","(Arbutus Ridge, Vancouver, Metro Vancouver Reg...","(49.2409677, -123.1670008)",49.240968,-123.167001
1,"Downtown, Vancouver","(Downtown, Vancouver, Metro Vancouver Regional...","(49.283393, -123.1174563)",49.283393,-123.117456
2,"Dunbar-Southlands, Vancouver","(Dunbar-Southlands, Vancouver, Metro Vancouver...","(49.2534601, -123.1850439)",49.25346,-123.185044
3,"Fairview, Vancouver","(Fairview, Vancouver, Metro Vancouver Regional...","(49.2641128, -123.1268352)",49.264113,-123.126835
4,"Grandview-Woodland, Vancouver","(Grandview-Woodland, Vancouver, Metro Vancouve...","(49.2705588, -123.0679417)",49.270559,-123.067942
5,"Hastings-Sunrise, Vancouver","(Hastings-Sunrise, Vancouver, Metro Vancouver ...","(49.2775935, -123.0439199)",49.277594,-123.04392
6,"Kensington-Cedar Cottage, Vancouver","(Kensington-Cedar Cottage, Vancouver, Metro Va...","(49.2476321, -123.0842067)",49.247632,-123.084207
7,"Kerrisdale, Vancouver","(Kerrisdale, Vancouver, Metro Vancouver Region...","(49.2346728, -123.1553893)",49.234673,-123.155389
8,"Killarney, Vancouver","(Killarney, Vancouver, Metro Vancouver Regiona...","(49.2242738, -123.0462504)",49.224274,-123.04625
9,"Kitsilano, Vancouver","(Kitsilano, Vancouver, Metro Vancouver Regiona...","(49.2694099, -123.155267)",49.26941,-123.155267


In [34]:
df_vancouver = df_vancouver.drop(['Location', 'Point'], axis=1)
df_vancouver.head()

Unnamed: 0,Neighborhood,Latitude,Longitude
0,"Arbutus Ridge, Vancouver",49.240968,-123.167001
1,"Downtown, Vancouver",49.283393,-123.117456
2,"Dunbar-Southlands, Vancouver",49.25346,-123.185044
3,"Fairview, Vancouver",49.264113,-123.126835
4,"Grandview-Woodland, Vancouver",49.270559,-123.067942


In [35]:
address = 'Vancouver, Canada'

geolocator = Nominatim(user_agent="explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Vancouver are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Vancouver are 49.2608724, -123.1139529.


#### B.2. Vancouver Neighborhood Map

In [36]:
map_vancouver = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, label in zip(df_vancouver['Latitude'], df_vancouver['Longitude'], df_vancouver['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_vancouver)  
    
map_vancouver

#### B.3. Vancouver Venues from Foursquare

In [37]:
df_vancouvervenues = getNearbyVenues(names=df_vancouver['Neighborhood'],
                                   latitudes=df_vancouver['Latitude'],
                                   longitudes=df_vancouver['Longitude']
                                  )

Arbutus Ridge, Vancouver
Downtown, Vancouver
Dunbar-Southlands, Vancouver
Fairview, Vancouver
Grandview-Woodland, Vancouver
Hastings-Sunrise, Vancouver
Kensington-Cedar Cottage, Vancouver
Kerrisdale, Vancouver
Killarney, Vancouver
Kitsilano, Vancouver
Marpole, Vancouver
Mount Pleasant, Vancouver
Oakridge, Vancouver
Renfrew-Collingwood, Vancouver
Riley Park, Vancouver
Shaughnessy, Vancouver
South Cambie, Vancouver
Strathcona, Vancouver
Sunset, Vancouver
Victoria-Fraserview, Vancouver
West End, Vancouver
West Point Grey, Vancouver


In [80]:
df_vancouvervenues.groupby('Neighborhood').head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Arbutus Ridge, Vancouver",49.240968,-123.167001,Butter Baked Goods,49.242209,-123.170381,Bakery
1,"Arbutus Ridge, Vancouver",49.240968,-123.167001,The Haven,49.241377,-123.166331,Spa
2,"Arbutus Ridge, Vancouver",49.240968,-123.167001,Barktholomews Pet Supplies,49.242746,-123.170193,Pet Store
3,"Arbutus Ridge, Vancouver",49.240968,-123.167001,The Dragon's Layer,49.238518,-123.169029,Nightlife Spot
4,"Arbutus Ridge, Vancouver",49.240968,-123.167001,The Heights Market,49.237902,-123.170949,Grocery Store
5,"Downtown, Vancouver",49.283393,-123.117456,Rosewood Hotel Georgia,49.283429,-123.118911,Hotel
6,"Downtown, Vancouver",49.283393,-123.117456,Gotham Steakhouse & Cocktail Bar,49.282830,-123.115865,Steakhouse
7,"Downtown, Vancouver",49.283393,-123.117456,Hawksworth Restaurant,49.283362,-123.119462,Lounge
8,"Downtown, Vancouver",49.283393,-123.117456,SEPHORA,49.284092,-123.117204,Cosmetics Shop
9,"Downtown, Vancouver",49.283393,-123.117456,Abercrombie & Fitch,49.282274,-123.118685,Clothing Store


In [39]:
df_vancouvervenues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Arbutus Ridge, Vancouver",5,5,5,5,5,5
"Downtown, Vancouver",100,100,100,100,100,100
"Dunbar-Southlands, Vancouver",10,10,10,10,10,10
"Fairview, Vancouver",26,26,26,26,26,26
"Grandview-Woodland, Vancouver",69,69,69,69,69,69
"Hastings-Sunrise, Vancouver",13,13,13,13,13,13
"Kensington-Cedar Cottage, Vancouver",21,21,21,21,21,21
"Kerrisdale, Vancouver",39,39,39,39,39,39
"Killarney, Vancouver",4,4,4,4,4,4
"Kitsilano, Vancouver",43,43,43,43,43,43


In [40]:
print('There are {} unique categories.'.format(len(df_vancouvervenues['Venue Category'].unique())))

There are 155 unique categories.


In [41]:
# one hot encoding
vancouver_onehot = pd.get_dummies(df_vancouvervenues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
vancouver_onehot['Neighborhood'] = df_vancouvervenues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [vancouver_onehot.columns[-1]] + list(vancouver_onehot.columns[:-1])
vancouver_onehot = vancouver_onehot[fixed_columns]

vancouver_onehot.head()

Unnamed: 0,Neighborhood,American Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Australian Restaurant,BBQ Joint,Bagel Shop,Bakery,...,Thai Restaurant,Theater,Thrift / Vintage Store,Tiki Bar,Toy / Game Store,Track,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Shop,Yoga Studio
0,"Arbutus Ridge, Vancouver",0,0,0,0,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0
1,"Arbutus Ridge, Vancouver",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Arbutus Ridge, Vancouver",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Arbutus Ridge, Vancouver",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Arbutus Ridge, Vancouver",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


## -- Week 2 Submission --

### III. Methodology

To analyze each neighborhood, we first computed the mean frequency of each venue category per neighborhood, and then created a new dataframe consisting of the top 10 venue categories in each neighborhood.
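To make the "mean frequency" concrete: because each one-hot column holds 0 or 1, its mean within a neighborhood equals the share of that neighborhood's venues falling in that category. The sketch below illustrates this on made-up data; the neighborhood names and categories are placeholders.

```python
import pandas as pd

# Toy venue list (made up): neighborhood A has 4 venues, B has 2.
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Park", "Park", "Bank", "Bakery", "Bank", "Bank"],
})

# One-hot encode the category, then average the indicator columns per neighborhood.
onehot = pd.get_dummies(venues["Venue Category"]).astype(float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])
grouped = onehot.groupby("Neighborhood").mean().reset_index()
print(grouped)
```

Here two of A's four venues are parks, so A's `Park` value is 0.5, while every venue in B is a bank, giving B a `Bank` value of 1.0.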

#### A.1. Toronto Neighborhoods Grouped by the Mean Frequency of Each Venue Category

In [42]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store
0,Agincourt,0.000000,0.0,0.000000,0.0000,0.0000,0.0000,0.000,0.000,0.000,...,0.00000,0.00,0.000000,0.000000,0.000000,0.00000,0.000000,0.000000,0.0,0.000000
1,"Alderwood, Long Branch",0.000000,0.0,0.000000,0.0000,0.0000,0.0000,0.000,0.000,0.000,...,0.00000,0.00,0.000000,0.000000,0.000000,0.00000,0.000000,0.000000,0.0,0.000000
2,"Bathurst Manor, Wilson Heights, Downsview North",0.000000,0.0,0.000000,0.0000,0.0000,0.0000,0.000,0.000,0.000,...,0.00000,0.00,0.000000,0.000000,0.052632,0.00000,0.000000,0.000000,0.0,0.000000
3,Bayview Village,0.000000,0.0,0.000000,0.0000,0.0000,0.0000,0.000,0.000,0.000,...,0.00000,0.00,0.000000,0.000000,0.000000,0.00000,0.000000,0.000000,0.0,0.000000
4,"Bedford Park, Lawrence Manor East",0.000000,0.0,0.000000,0.0000,0.0000,0.0000,0.000,0.000,0.000,...,0.00000,0.00,0.000000,0.000000,0.000000,0.00000,0.000000,0.000000,0.0,0.000000
5,Berczy Park,0.000000,0.0,0.000000,0.0000,0.0000,0.0000,0.000,0.000,0.000,...,0.00000,0.00,0.017544,0.000000,0.000000,0.00000,0.000000,0.000000,0.0,0.000000
6,"Birch Cliff, Cliffside West",0.000000,0.0,0.000000,0.0000,0.0000,0.0000,0.000,0.000,0.000,...,0.00000,0.00,0.000000,0.000000,0.000000,0.00000,0.000000,0.000000,0.0,0.000000
7,"Brockton, Parkdale Village, Exhibition Place",0.000000,0.0,0.000000,0.0000,0.0000,0.0000,0.000,0.000,0.000,...,0.00000,0.00,0.000000,0.000000,0.000000,0.00000,0.000000,0.000000,0.0,0.000000
8,Business reply mail Processing Centre,0.055556,0.0,0.000000,0.0000,0.0000,0.0000,0.000,0.000,0.000,...,0.00000,0.00,0.000000,0.000000,0.000000,0.00000,0.000000,0.000000,0.0,0.000000
9,"CN Tower, King and Spadina, Railway Lands, Har...",0.000000,0.0,0.000000,0.0625,0.0625,0.0625,0.125,0.125,0.125,...,0.00000,0.00,0.000000,0.000000,0.000000,0.00000,0.000000,0.000000,0.0,0.000000


Getting the top 5 venues in each neighborhood:

In [43]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Agincourt----
                       venue  freq
0                     Lounge  0.25
1               Skating Rink  0.25
2  Latin American Restaurant  0.25
3             Breakfast Spot  0.25
4   Mediterranean Restaurant  0.00


----Alderwood, Long Branch----
            venue  freq
0     Pizza Place   0.2
1             Gym   0.1
2  Sandwich Place   0.1
3             Pub   0.1
4            Pool   0.1


----Bathurst Manor, Wilson Heights, Downsview North----
                       venue  freq
0                       Bank  0.11
1                Coffee Shop  0.11
2        Fried Chicken Joint  0.05
3  Middle Eastern Restaurant  0.05
4                Gas Station  0.05


----Bayview Village----
                 venue  freq
0                 Café  0.25
1                 Bank  0.25
2  Japanese Restaurant  0.25
3   Chinese Restaurant  0.25
4          Yoga Studio  0.00


----Bedford Park, Lawrence Manor East----
                venue  freq
0  Italian Restaurant  0.09
1         Coffee Shop  0.09

Creating a dataframe with the top 10 venues:

In [44]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [81]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # ranks beyond the 3rd have no entry in indicators
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
toronto_venues_sorted = pd.DataFrame(columns=columns)
toronto_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    toronto_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

toronto_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Latin American Restaurant,Breakfast Spot,Lounge,Skating Rink,Colombian Restaurant,Comfort Food Restaurant,Electronics Store,Eastern European Restaurant,Drugstore,College Rec Center
1,"Alderwood, Long Branch",Pizza Place,Skating Rink,Sandwich Place,Pharmacy,Pool,Pub,Dance Studio,Coffee Shop,Gym,Gas Station
2,"Bathurst Manor, Wilson Heights, Downsview North",Coffee Shop,Bank,Sushi Restaurant,Bridal Shop,Middle Eastern Restaurant,Deli / Bodega,Restaurant,Pizza Place,Pharmacy,Fried Chicken Joint
3,Bayview Village,Chinese Restaurant,Café,Bank,Japanese Restaurant,Women's Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center
4,"Bedford Park, Lawrence Manor East",Coffee Shop,Restaurant,Sandwich Place,Italian Restaurant,Grocery Store,Sushi Restaurant,Comfort Food Restaurant,Pharmacy,Pizza Place,Café
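
The ranked table above fills all ten columns even when a neighborhood has fewer than ten venue categories with a nonzero frequency. Agincourt, for example, shows only four categories at 0.25 in the top-5 printout, so its 5th to 10th entries are ties at zero and their order carries no information. The sketch below is one possible way to build the same kind of table while leaving zero-frequency ranks empty. The helper names `ordinal` and `top_venues_table` and the toy data are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd


def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


def top_venues_table(grouped, num_top_venues):
    """Rank venue categories per neighborhood, leaving zero-frequency ranks empty."""
    freqs = grouped.set_index('Neighborhood')
    rows = []
    for hood, row in freqs.iterrows():
        # keep only categories that actually occur, most frequent first
        ranked = row[row > 0].sort_values(ascending=False).index.tolist()
        # pad with None so every row has the same number of columns
        ranked = (ranked + [None] * num_top_venues)[:num_top_venues]
        rows.append([hood] + ranked)
    columns = ['Neighborhood'] + [
        '{} Most Common Venue'.format(ordinal(i + 1)) for i in range(num_top_venues)
    ]
    return pd.DataFrame(rows, columns=columns)


# toy data (hypothetical values, same layout as toronto_grouped)
toy_grouped = pd.DataFrame({
    'Neighborhood': ['A', 'B'],
    'Bank': [0.6, 0.0],
    'Café': [0.4, 0.2],
    'Park': [0.0, 0.8],
})
toy_sorted = top_venues_table(toy_grouped, 3)
print(toy_sorted)
```

With the real data, `toronto_grouped` would take the place of `toy_grouped`.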


#### A.2. Vancouver Neighborhoods Grouped with the Frequency Mean of Each Venue Category

In [46]:
vancouver_grouped = vancouver_onehot.groupby('Neighborhood').mean().reset_index()
vancouver_grouped

Unnamed: 0,Neighborhood,American Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Australian Restaurant,BBQ Joint,Bagel Shop,Bakery,...,Thai Restaurant,Theater,Thrift / Vintage Store,Tiki Bar,Toy / Game Store,Track,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Shop,Yoga Studio
0,"Arbutus Ridge, Vancouver",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Downtown, Vancouver",0.01,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,...,0.0,0.01,0.0,0.0,0.02,0.0,0.02,0.02,0.0,0.01
2,"Dunbar-Southlands, Vancouver",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Fairview, Vancouver",0.0,0.0,0.0,0.076923,0.0,0.0,0.038462,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0
4,"Grandview-Woodland, Vancouver",0.0,0.0,0.0,0.0,0.0,0.0,0.014493,0.0,0.028986,...,0.014493,0.0,0.0,0.0,0.014493,0.0,0.014493,0.0,0.014493,0.0
5,"Hastings-Sunrise, Vancouver",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.153846,0.0,0.0
6,"Kensington-Cedar Cottage, Vancouver",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.095238,0.0,0.0
7,"Kerrisdale, Vancouver",0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.025641,...,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0
8,"Killarney, Vancouver",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0
9,"Kitsilano, Vancouver",0.046512,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.046512,...,0.046512,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.023256


Getting the top 5 venues in each neighborhood:

In [47]:
num_top_venues = 5

for hood in vancouver_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = vancouver_grouped[vancouver_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Arbutus Ridge, Vancouver----
            venue  freq
0             Spa   0.2
1  Nightlife Spot   0.2
2          Bakery   0.2
3       Pet Store   0.2
4   Grocery Store   0.2


----Downtown, Vancouver----
            venue  freq
0           Hotel  0.09
1     Coffee Shop  0.05
2      Food Truck  0.05
3  Clothing Store  0.03
4       Bookstore  0.03


----Dunbar-Southlands, Vancouver----
                 venue  freq
0   Italian Restaurant   0.1
1         Liquor Store   0.1
2  Japanese Restaurant   0.1
3          Pizza Place   0.1
4          Coffee Shop   0.1


----Fairview, Vancouver----
              venue  freq
0       Coffee Shop  0.15
1  Asian Restaurant  0.08
2              Park  0.08
3          Pharmacy  0.08
4        Restaurant  0.04


----Grandview-Woodland, Vancouver----
                 venue  freq
0          Coffee Shop  0.09
1   Italian Restaurant  0.06
2  Japanese Restaurant  0.04
3          Pizza Place  0.04
4                 Park  0.04


----Hastings-Sunrise, Vancouver---

Create a new dataframe with top 10 venues for each neighborhood:

In [52]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [82]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # ranks beyond the 3rd have no entry in indicators
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
vancouver_venues_sorted = pd.DataFrame(columns=columns)
vancouver_venues_sorted['Neighborhood'] = vancouver_grouped['Neighborhood']

for ind in np.arange(vancouver_grouped.shape[0]):
    vancouver_venues_sorted.iloc[ind, 1:] = return_most_common_venues(vancouver_grouped.iloc[ind, :], num_top_venues)

vancouver_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Arbutus Ridge, Vancouver",Grocery Store,Pet Store,Nightlife Spot,Bakery,Spa,Falafel Restaurant,Food & Drink Shop,Food,Flower Shop,Filipino Restaurant
1,"Downtown, Vancouver",Hotel,Coffee Shop,Food Truck,Restaurant,Seafood Restaurant,Concert Hall,Steakhouse,Clothing Store,Bookstore,Sandwich Place
2,"Dunbar-Southlands, Vancouver",Italian Restaurant,Sushi Restaurant,Indian Restaurant,Japanese Restaurant,Liquor Store,Pizza Place,Coffee Shop,Salon / Barbershop,Sporting Goods Shop,Ice Cream Shop
3,"Fairview, Vancouver",Coffee Shop,Pharmacy,Asian Restaurant,Park,Sushi Restaurant,Szechuan Restaurant,Salon / Barbershop,Malay Restaurant,Restaurant,Camera Store
4,"Grandview-Woodland, Vancouver",Coffee Shop,Italian Restaurant,Sushi Restaurant,Indian Restaurant,Japanese Restaurant,Park,Café,Pizza Place,Clothing Store,Burger Joint


Using K-Means, we clustered the neighborhoods of each city according to the similarity of their venue-category frequencies.

In [54]:
from sklearn.cluster import KMeans

#### B.1. Toronto clustering

In [84]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=int32)
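
The first ten labels above all fall in the same cluster, which suggests that `kclusters = 5` may not split the neighborhoods very evenly. One common way to compare candidate values of k is the silhouette score. The sketch below illustrates it on synthetic data; the function name `score_k_values`, the candidate range, and the synthetic blobs are assumptions for illustration. The notebook itself fixes k at 5 and does not report such a comparison.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def score_k_values(features, k_values, random_state=0):
    """Fit k-means for each k and return {k: silhouette score}."""
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(features)
        scores[k] = silhouette_score(features, labels)
    return scores


# synthetic stand-in for toronto_grouped_clustering: three well-separated blobs
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
toy_features = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

scores = score_k_values(toy_features, range(2, 7))
best_k = max(scores, key=scores.get)
print(scores)
print('best k:', best_k)
```

With the real data, `toronto_grouped_clustering` would take the place of `toy_features`.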

In [85]:
# add clustering labels
toronto_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = df_torontogeodata

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(toronto_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,0.0,Park,Food & Drink Shop,Bus Stop,College Rec Center,Dance Studio,Eastern European Restaurant,Drugstore,Donut Shop,Doner Restaurant,Dog Run
1,M4A,North York,Victoria Village,43.725882,-79.315572,1.0,French Restaurant,Coffee Shop,Hockey Arena,Portuguese Restaurant,Dim Sum Restaurant,Curling Ice,Dance Studio,Deli / Bodega,Department Store,Dessert Shop
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,1.0,Coffee Shop,Park,Pub,Bakery,Café,Theater,Restaurant,Breakfast Spot,Yoga Studio,Event Space
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,1.0,Clothing Store,Accessories Store,Event Space,Shoe Store,Vietnamese Restaurant,Miscellaneous Shop,Coffee Shop,Boutique,Arts & Crafts Store,Furniture / Home Store
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,1.0,Coffee Shop,Sushi Restaurant,Burger Joint,Bar,Beer Bar,Italian Restaurant,Juice Bar,Sandwich Place,Distribution Center,Restaurant


In [86]:
toronto_merged['Cluster Labels']

0      0.0
1      1.0
2      1.0
3      1.0
4      1.0
5      NaN
6      3.0
7      1.0
8      1.0
9      1.0
10     0.0
11     1.0
12     1.0
13     1.0
14     1.0
15     1.0
16     1.0
17     1.0
18     1.0
19     1.0
20     1.0
21     0.0
22     1.0
23     1.0
24     1.0
25     1.0
26     1.0
27     1.0
28     1.0
29     1.0
      ... 
73     1.0
74     1.0
75     1.0
76     1.0
77     1.0
78     1.0
79     1.0
80     1.0
81     1.0
82     1.0
83     2.0
84     1.0
85     0.0
86     1.0
87     1.0
88     1.0
89     1.0
90     1.0
91     0.0
92     1.0
93     1.0
94     1.0
95     NaN
96     1.0
97     1.0
98     0.0
99     1.0
100    1.0
101    1.0
102    1.0
Name: Cluster Labels, Length: 103, dtype: float64

In [87]:
toronto_merged.dropna(subset = ['Cluster Labels'], inplace=True)

In [88]:
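# note: astype returns a new Series; the column in toronto_merged stays float unless the result is assigned back
toronto_merged['Cluster Labels'].astype(int)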

0      0
1      1
2      1
3      1
4      1
6      3
7      1
8      1
9      1
10     0
11     1
12     1
13     1
14     1
15     1
16     1
17     1
18     1
19     1
20     1
21     0
22     1
23     1
24     1
25     1
26     1
27     1
28     1
29     1
30     1
      ..
72     1
73     1
74     1
75     1
76     1
77     1
78     1
79     1
80     1
81     1
82     1
83     2
84     1
85     0
86     1
87     1
88     1
89     1
90     1
91     0
92     1
93     1
94     1
96     1
97     1
98     0
99     1
100    1
101    1
102    1
Name: Cluster Labels, Length: 101, dtype: int64
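
The `NaN` labels arise because the default left join keeps neighborhoods from the coordinate table that have no matching row in `toronto_venues_sorted`, and the missing values force the column to float. The `astype(int)` call above also returns a new Series rather than changing `toronto_merged`. A minimal sketch of two alternatives on toy data follows; all names and values in it are hypothetical.

```python
import pandas as pd

# toy stand-ins for df_torontogeodata and toronto_venues_sorted
toy_geo = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C'],
    'Latitude': [43.70, 43.71, 43.72],
    'Longitude': [-79.40, -79.41, -79.42],
})
toy_sorted = pd.DataFrame({
    'Cluster Labels': [0, 1],  # 'C' has no row here, so it gets no label
    'Neighborhood': ['A', 'B'],
    '1st Most Common Venue': ['Park', 'Coffee Shop'],
})

# a left join (the default) keeps 'C' and turns the labels into floats with NaN
left = toy_geo.join(toy_sorted.set_index('Neighborhood'), on='Neighborhood')

# an inner join keeps only labelled neighborhoods, so the labels stay integers
inner = toy_geo.join(toy_sorted.set_index('Neighborhood'), on='Neighborhood', how='inner')

# equivalent clean-up after a left join: drop NaN, then assign the cast back
cleaned = left.dropna(subset=['Cluster Labels']).copy()
cleaned['Cluster Labels'] = cleaned['Cluster Labels'].astype(int)

print(left['Cluster Labels'].dtype, inner['Cluster Labels'].dtype, cleaned['Cluster Labels'].dtype)
```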

In [89]:
toronto_merged.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,0.0,Park,Food & Drink Shop,Bus Stop,College Rec Center,Dance Studio,Eastern European Restaurant,Drugstore,Donut Shop,Doner Restaurant,Dog Run
1,M4A,North York,Victoria Village,43.725882,-79.315572,1.0,French Restaurant,Coffee Shop,Hockey Arena,Portuguese Restaurant,Dim Sum Restaurant,Curling Ice,Dance Studio,Deli / Bodega,Department Store,Dessert Shop
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,1.0,Coffee Shop,Park,Pub,Bakery,Café,Theater,Restaurant,Breakfast Spot,Yoga Studio,Event Space
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,1.0,Clothing Store,Accessories Store,Event Space,Shoe Store,Vietnamese Restaurant,Miscellaneous Shop,Coffee Shop,Boutique,Arts & Crafts Store,Furniture / Home Store
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,1.0,Coffee Shop,Sushi Restaurant,Burger Joint,Bar,Beer Bar,Italian Restaurant,Juice Bar,Sandwich Place,Distribution Center,Restaurant


Visualize map of neighborhood clusters:

In [90]:
import matplotlib.cm as cm
import matplotlib.colors as colors

In [91]:
# create map
toronto_clusters = folium.Map(location=[43.651070, -79.347015], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(toronto_clusters)
       
toronto_clusters
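
In the map cell above, `ys` is only used for its length, and `rainbow[int(cluster)-1]` shifts each label by one position (label 0 wraps around to the last color). The sketch below shows a simpler mapping from cluster label to hex color. `cluster_palette` is a hypothetical helper name, and the approach is a suggestion rather than the notebook's own code.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors


def cluster_palette(n_clusters):
    """Return one hex color per cluster label, evenly spaced on the rainbow colormap."""
    rgba = cm.rainbow(np.linspace(0, 1, n_clusters))
    return [colors.rgb2hex(c) for c in rgba]


palette = cluster_palette(5)
# color for a neighborhood whose (float) label is 3.0, as stored in toronto_merged
marker_color = palette[int(3.0)]
print(palette)
```

Inside the marker loop, `palette[int(cluster)]` could then be passed as both `color` and `fill_color`.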

#### B.2. Vancouver clustering

In [117]:
# set number of clusters
kclusters = 5

vancouver_grouped_clustering = vancouver_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(vancouver_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([4, 1, 1, 1, 1, 1, 1, 1, 3, 1], dtype=int32)
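
Since only the first ten labels are displayed, it can help to count how many neighborhoods fall into each cluster before interpreting them. In the small sketch below, the ten labels shown above stand in for the full `kmeans.labels_` array; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# the ten labels displayed above, standing in for the full kmeans.labels_ array
toy_labels = np.array([4, 1, 1, 1, 1, 1, 1, 1, 3, 1])

# number of neighborhoods per cluster label, ordered by label
cluster_sizes = pd.Series(toy_labels).value_counts().sort_index()
print(cluster_sizes)

# labels that contain a single neighborhood
singletons = cluster_sizes[cluster_sizes == 1].index.tolist()
print(singletons)
```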

In [118]:
# add clustering labels
vancouver_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

vancouver_merged = df_vancouver

# join vancouver_venues_sorted onto df_vancouver to add latitude/longitude for each neighborhood
vancouver_merged = vancouver_merged.join(vancouver_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

vancouver_merged.head()

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Arbutus Ridge, Vancouver",49.240968,-123.167001,4.0,Grocery Store,Pet Store,Nightlife Spot,Bakery,Spa,Falafel Restaurant,Food & Drink Shop,Food,Flower Shop,Filipino Restaurant
1,"Downtown, Vancouver",49.283393,-123.117456,1.0,Hotel,Coffee Shop,Food Truck,Restaurant,Seafood Restaurant,Concert Hall,Steakhouse,Clothing Store,Bookstore,Sandwich Place
2,"Dunbar-Southlands, Vancouver",49.25346,-123.185044,1.0,Italian Restaurant,Sushi Restaurant,Indian Restaurant,Japanese Restaurant,Liquor Store,Pizza Place,Coffee Shop,Salon / Barbershop,Sporting Goods Shop,Ice Cream Shop
3,"Fairview, Vancouver",49.264113,-123.126835,1.0,Coffee Shop,Pharmacy,Asian Restaurant,Park,Sushi Restaurant,Szechuan Restaurant,Salon / Barbershop,Malay Restaurant,Restaurant,Camera Store
4,"Grandview-Woodland, Vancouver",49.270559,-123.067942,1.0,Coffee Shop,Italian Restaurant,Sushi Restaurant,Indian Restaurant,Japanese Restaurant,Park,Café,Pizza Place,Clothing Store,Burger Joint


In [119]:
vancouver_merged['Cluster Labels']

0     4.0
1     1.0
2     1.0
3     1.0
4     1.0
5     1.0
6     1.0
7     1.0
8     3.0
9     1.0
10    1.0
11    1.0
12    1.0
13    1.0
14    1.0
15    2.0
16    1.0
17    NaN
18    0.0
19    1.0
20    1.0
21    1.0
Name: Cluster Labels, dtype: float64

In [120]:
vancouver_merged.dropna(subset = ["Cluster Labels"], inplace=True)

In [121]:
vancouver_merged.head()

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Arbutus Ridge, Vancouver",49.240968,-123.167001,4.0,Grocery Store,Pet Store,Nightlife Spot,Bakery,Spa,Falafel Restaurant,Food & Drink Shop,Food,Flower Shop,Filipino Restaurant
1,"Downtown, Vancouver",49.283393,-123.117456,1.0,Hotel,Coffee Shop,Food Truck,Restaurant,Seafood Restaurant,Concert Hall,Steakhouse,Clothing Store,Bookstore,Sandwich Place
2,"Dunbar-Southlands, Vancouver",49.25346,-123.185044,1.0,Italian Restaurant,Sushi Restaurant,Indian Restaurant,Japanese Restaurant,Liquor Store,Pizza Place,Coffee Shop,Salon / Barbershop,Sporting Goods Shop,Ice Cream Shop
3,"Fairview, Vancouver",49.264113,-123.126835,1.0,Coffee Shop,Pharmacy,Asian Restaurant,Park,Sushi Restaurant,Szechuan Restaurant,Salon / Barbershop,Malay Restaurant,Restaurant,Camera Store
4,"Grandview-Woodland, Vancouver",49.270559,-123.067942,1.0,Coffee Shop,Italian Restaurant,Sushi Restaurant,Indian Restaurant,Japanese Restaurant,Park,Café,Pizza Place,Clothing Store,Burger Joint


In [122]:
# create map
vancouver_clusters = folium.Map(location=[49.2608724, -123.1139529], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(vancouver_merged['Latitude'], vancouver_merged['Longitude'], vancouver_merged['Neighborhood'], vancouver_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(vancouver_clusters)
       
vancouver_clusters

### IV. Results

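We take a closer look at the clusters in each city. Clusters are numbered from 1 in the text, so Cluster 1 corresponds to `Cluster Labels == 0` in the code, Cluster 2 to `Cluster Labels == 1`, and so on.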

#### A. Toronto

Cluster 1:

In [92]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Parkwoods,0.0,Park,Food & Drink Shop,Bus Stop,College Rec Center,Dance Studio,Eastern European Restaurant,Drugstore,Donut Shop,Doner Restaurant,Dog Run
10,Glencairn,0.0,Park,Pub,Metro Station,Japanese Restaurant,Women's Store,Dim Sum Restaurant,Dance Studio,Deli / Bodega,Department Store,Dessert Shop
21,Caledonia-Fairbanks,0.0,Park,Pool,Women's Store,Afghan Restaurant,Ethiopian Restaurant,Eastern European Restaurant,Drugstore,Donut Shop,Doner Restaurant,Dog Run
35,East Toronto,0.0,Park,Coffee Shop,Convenience Store,Discount Store,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Distribution Center
49,"North Park, Maple Leaf Park, Upwood Park",0.0,Park,Construction & Landscaping,Bakery,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dance Studio
61,Lawrence Park,0.0,Park,Swim School,Bus Line,Drugstore,Donut Shop,Doner Restaurant,Dog Run,Distribution Center,Ethiopian Restaurant,Diner
64,Weston,0.0,Park,Curling Ice,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Doner Restaurant,Dog Run,Distribution Center,Discount Store
66,York Mills West,0.0,Park,Bank,Convenience Store,Discount Store,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Distribution Center,Dance Studio
68,Forest Hill North & West,0.0,Trail,Jewelry Store,Sushi Restaurant,Park,Drugstore,Donut Shop,Doner Restaurant,Dog Run,Distribution Center,Curling Ice
85,"Milliken, Agincourt North, Steeles East, L'Amo...",0.0,Park,Bakery,Playground,Discount Store,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Distribution Center


Cluster 2:

In [93]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Victoria Village,1.0,French Restaurant,Coffee Shop,Hockey Arena,Portuguese Restaurant,Dim Sum Restaurant,Curling Ice,Dance Studio,Deli / Bodega,Department Store,Dessert Shop
2,"Regent Park, Harbourfront",1.0,Coffee Shop,Park,Pub,Bakery,Café,Theater,Restaurant,Breakfast Spot,Yoga Studio,Event Space
3,"Lawrence Manor, Lawrence Heights",1.0,Clothing Store,Accessories Store,Event Space,Shoe Store,Vietnamese Restaurant,Miscellaneous Shop,Coffee Shop,Boutique,Arts & Crafts Store,Furniture / Home Store
4,"Queen's Park, Ontario Provincial Government",1.0,Coffee Shop,Sushi Restaurant,Burger Joint,Bar,Beer Bar,Italian Restaurant,Juice Bar,Sandwich Place,Distribution Center,Restaurant
7,Don Mills,1.0,Coffee Shop,Asian Restaurant,Beer Store,Japanese Restaurant,Gym,Restaurant,Bike Shop,Italian Restaurant,Discount Store,Sporting Goods Shop
8,"Parkview Hill, Woodbine Gardens",1.0,Pizza Place,Fast Food Restaurant,Bank,Gym / Fitness Center,Athletics & Sports,Pet Store,Gastropub,Pharmacy,Intersection,Dim Sum Restaurant
9,"Garden District, Ryerson",1.0,Clothing Store,Coffee Shop,Café,Bubble Tea Shop,Japanese Restaurant,Cosmetics Shop,Restaurant,Middle Eastern Restaurant,Italian Restaurant,Hotel
11,"West Deane Park, Princess Gardens, Martin Grov...",1.0,Golf Course,Women's Store,Discount Store,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Distribution Center
12,"Rouge Hill, Port Union, Highland Creek",1.0,Construction & Landscaping,Bar,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Women's Store,Dance Studio
13,Don Mills,1.0,Coffee Shop,Asian Restaurant,Beer Store,Japanese Restaurant,Gym,Restaurant,Bike Shop,Italian Restaurant,Discount Store,Sporting Goods Shop


Cluster 3:

In [94]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
32,Scarborough Village,2.0,Convenience Store,Playground,Women's Store,Discount Store,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Distribution Center
83,"Moore Park, Summerhill East",2.0,Playground,Women's Store,Discount Store,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Distribution Center


Cluster 4:

In [95]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,"Malvern, Rouge",3.0,Fast Food Restaurant,Women's Store,Curling Ice,Eastern European Restaurant,Drugstore,Donut Shop,Doner Restaurant,Dog Run,Distribution Center,Discount Store
56,"Del Ray, Mount Dennis, Keelsdale and Silverthorn",3.0,Fast Food Restaurant,Fried Chicken Joint,Sandwich Place,Bar,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center


Cluster 5:

In [96]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
57,"Humberlea, Emery",4.0,Baseball Field,Women's Store,Dance Studio,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Doner Restaurant,Dog Run,Distribution Center


Looking at the clusters, we see that Filipino restaurants do not appear among the most common venues in Toronto, so we check whether there are any Filipino restaurants in the city at all.

In [99]:
toronto_grouped['Filipino Restaurant'].sum()

0.017857142857142856

In [104]:
toronto_filipino = toronto_grouped.sort_values(by = 'Filipino Restaurant', ascending = False)
toronto_filipino.head()

Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store
43,"Kensington Market, Chinatown, Grange Park",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.035714,0.0,0.0,0.053571,0.0,0.017857,0.0,0.017857
0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
61,Parkwoods,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
70,Scarborough Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
69,"Runnymede, The Junction North",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [116]:
toronto_filipino = toronto_filipino[['Neighborhood', 'Filipino Restaurant']]
toronto_filipino.head()

Unnamed: 0,Neighborhood,Filipino Restaurant
43,"Kensington Market, Chinatown, Grange Park",0.017857
0,Agincourt,0.0
61,Parkwoods,0.0
70,Scarborough Village,0.0
69,"Runnymede, The Junction North",0.0


After further checking, we find that the column sum (0.017857) equals the value for Kensington Market, Chinatown, Grange Park, so this is the only Toronto neighborhood group in which a Filipino restaurant was found. It is in Cluster 2.

#### B. Vancouver

Cluster 1:

In [123]:
vancouver_merged.loc[vancouver_merged['Cluster Labels'] == 0, vancouver_merged.columns[[0] + list(range(3, vancouver_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
18,"Sunset, Vancouver",0.0,Indian Restaurant,Cosmetics Shop,Dessert Shop,South Indian Restaurant,Falafel Restaurant,Food & Drink Shop,Food,Flower Shop,Filipino Restaurant,Fast Food Restaurant


Cluster 2:

In [124]:
vancouver_merged.loc[vancouver_merged['Cluster Labels'] == 1, vancouver_merged.columns[[0] + list(range(3, vancouver_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,"Downtown, Vancouver",1.0,Hotel,Coffee Shop,Food Truck,Restaurant,Seafood Restaurant,Concert Hall,Steakhouse,Clothing Store,Bookstore,Sandwich Place
2,"Dunbar-Southlands, Vancouver",1.0,Italian Restaurant,Sushi Restaurant,Indian Restaurant,Japanese Restaurant,Liquor Store,Pizza Place,Coffee Shop,Salon / Barbershop,Sporting Goods Shop,Ice Cream Shop
3,"Fairview, Vancouver",1.0,Coffee Shop,Pharmacy,Asian Restaurant,Park,Sushi Restaurant,Szechuan Restaurant,Salon / Barbershop,Malay Restaurant,Restaurant,Camera Store
4,"Grandview-Woodland, Vancouver",1.0,Coffee Shop,Italian Restaurant,Sushi Restaurant,Indian Restaurant,Japanese Restaurant,Park,Café,Pizza Place,Clothing Store,Burger Joint
5,"Hastings-Sunrise, Vancouver",1.0,Vietnamese Restaurant,Coffee Shop,Liquor Store,Gas Station,Event Space,Sandwich Place,Bakery,Park,Sushi Restaurant,Food Truck
6,"Kensington-Cedar Cottage, Vancouver",1.0,Coffee Shop,Vietnamese Restaurant,Chinese Restaurant,Filipino Restaurant,Bus Stop,Ice Cream Shop,Malay Restaurant,Convenience Store,Electronics Store,Restaurant
7,"Kerrisdale, Vancouver",1.0,Chinese Restaurant,Coffee Shop,Tea Room,Sandwich Place,Sushi Restaurant,Pharmacy,Bubble Tea Shop,Café,Portuguese Restaurant,Hobby Shop
9,"Kitsilano, Vancouver",1.0,American Restaurant,Coffee Shop,Sushi Restaurant,Japanese Restaurant,French Restaurant,Food Truck,Tea Room,Thai Restaurant,Bakery,Ice Cream Shop
10,"Marpole, Vancouver",1.0,Pizza Place,Sushi Restaurant,Chinese Restaurant,Dessert Shop,Japanese Restaurant,Bubble Tea Shop,Vietnamese Restaurant,Coffee Shop,Falafel Restaurant,Sporting Goods Shop
11,"Mount Pleasant, Vancouver",1.0,Coffee Shop,Diner,Thrift / Vintage Store,Sandwich Place,Breakfast Spot,Sushi Restaurant,Indian Restaurant,Vietnamese Restaurant,Brewery,Arts & Crafts Store


Cluster 3:

In [125]:
vancouver_merged.loc[vancouver_merged['Cluster Labels'] == 2, vancouver_merged.columns[[0] + list(range(3, vancouver_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
15,"Shaughnessy, Vancouver",2.0,French Restaurant,Bus Stop,Park,Diner,Discount Store,Donut Shop,Electronics Store,Ethiopian Restaurant,Event Space,Food Truck


Cluster 4:

In [126]:
vancouver_merged.loc[vancouver_merged['Cluster Labels'] == 3, vancouver_merged.columns[[0] + list(range(3, vancouver_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,"Killarney, Vancouver",3.0,Italian Restaurant,Pool,Track,Gym,Event Space,Food,Flower Shop,Filipino Restaurant,Fast Food Restaurant,Farmers Market


Cluster 5:

In [127]:
vancouver_merged.loc[vancouver_merged['Cluster Labels'] == 4, vancouver_merged.columns[[0] + list(range(3, vancouver_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Arbutus Ridge, Vancouver",4.0,Grocery Store,Pet Store,Nightlife Spot,Bakery,Spa,Falafel Restaurant,Food & Drink Shop,Food,Flower Shop,Filipino Restaurant


In Vancouver, on the other hand, 4 of the 5 clusters contain at least one neighborhood that lists Filipino Restaurant among its 10 most common venue categories. Five neighborhoods list it: Sunset in Cluster 1; Kensington-Cedar Cottage and Victoria-Fraserview in Cluster 2; Killarney in Cluster 4; and Arbutus Ridge in Cluster 5.

We take a closer look at these five neighborhoods.

In [135]:
vancouver_filipino = vancouver_merged.loc[[0,8,6,18,19]]
vancouver_filipino

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Arbutus Ridge, Vancouver",49.240968,-123.167001,4.0,Grocery Store,Pet Store,Nightlife Spot,Bakery,Spa,Falafel Restaurant,Food & Drink Shop,Food,Flower Shop,Filipino Restaurant
8,"Killarney, Vancouver",49.224274,-123.04625,3.0,Italian Restaurant,Pool,Track,Gym,Event Space,Food,Flower Shop,Filipino Restaurant,Fast Food Restaurant,Farmers Market
6,"Kensington-Cedar Cottage, Vancouver",49.247632,-123.084207,1.0,Coffee Shop,Vietnamese Restaurant,Chinese Restaurant,Filipino Restaurant,Bus Stop,Ice Cream Shop,Malay Restaurant,Convenience Store,Electronics Store,Restaurant
18,"Sunset, Vancouver",49.219593,-123.090239,0.0,Indian Restaurant,Cosmetics Shop,Dessert Shop,South Indian Restaurant,Falafel Restaurant,Food & Drink Shop,Food,Flower Shop,Filipino Restaurant,Fast Food Restaurant
19,"Victoria-Fraserview, Vancouver",49.218416,-123.073287,1.0,Sandwich Place,Convenience Store,Pizza Place,Fast Food Restaurant,Gas Station,Ethiopian Restaurant,Flower Shop,Filipino Restaurant,Farmers Market,Falafel Restaurant
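
A rank in the top-10 table does not by itself confirm that a category is present, because categories tied at a frequency of zero also fill the remaining ranks. Arbutus Ridge, for instance, shows five categories at 0.2 in the top-5 printout, which already sum to 1. The frequency check used for Toronto can be repeated for Vancouver. The sketch below uses toy data with hypothetical values, and `neighborhoods_with_category` is an illustrative helper name.

```python
import pandas as pd


def neighborhoods_with_category(grouped, category):
    """Return neighborhoods where the mean frequency of `category` is above zero."""
    present = grouped.loc[grouped[category] > 0, ['Neighborhood', category]]
    return present.sort_values(category, ascending=False).reset_index(drop=True)


# toy data in the layout of vancouver_grouped (values are hypothetical)
toy_grouped = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C'],
    'Coffee Shop': [0.50, 0.10, 0.00],
    'Filipino Restaurant': [0.00, 0.05, 0.20],
})
result = neighborhoods_with_category(toy_grouped, 'Filipino Restaurant')
print(result)
```

With the real data, the call would be `neighborhoods_with_category(vancouver_grouped, 'Filipino Restaurant')`.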


### V. Discussion

Based on our results, while Toronto has a larger area and a larger population of Filipino immigrants, Vancouver has more neighborhoods in which Filipino restaurants appear among the most common venues. Cluster 2 in both cities also appears more diverse than the other clusters, given the variety of cuisines available in its neighborhoods. In terms of availability of Filipino stores and overall diversity, Vancouver would be a reasonable choice of residence for Filipino immigrants, specifically the neighborhoods of Kensington-Cedar Cottage and Victoria-Fraserview, both of which are in Cluster 2.

On the other side of the country, Toronto appears to have more parks than Vancouver. In this case, a Cluster 2 neighborhood (diversity of venues and cuisines) close to a Cluster 1 neighborhood (access to parks) would be an excellent choice.

Looking at the problem a different way, the places *without* Filipino restaurants and communities can be seen as growth opportunities for business owners as these areas are far from being market-saturated.

### VI. Conclusion

As more and more Filipinos (and people from other Third-World countries) search for better opportunities across the globe, Canada continues to prove itself a melting pot of different cultures, as evidenced by two of its major cities. In this study, we explored the neighborhoods of Toronto and Vancouver to find the ideal place of residence for Filipino immigrants. While we have narrowed down the results to a few neighborhoods, the study could be further improved with additional information on cost of living per city, education and, most importantly, employment opportunities for immigrants.