# Capstone Project: The Battle of Neighborhoods

This Notebook is used for the Applied Data Science Capstone Project on Coursera.

## -- Week 1 Submission --

### Introduction

In the past decade, Canada has attracted large numbers of migrants from all over the globe, thanks in part to the high Global Liveability Index scores of its cities. The index is an assessment by the Economist Intelligence Unit that rates cities on factors such as stability, healthcare, culture and environment, education, and infrastructure (source: https://www.eiu.com/topic/liveability). In fact, three Canadian cities made it into the 2018 top 10: Toronto, Vancouver and Calgary (source: https://www.businessinsider.com/most-livable-cities-in-the-world-2018-8).

Immigrants form a large part of the Canadian population. According to Canada's 2016 Census, 7.5 million people had immigrated to Canada, representing about 1 in 5 people in the country. The Filipino population in Canada has been growing steadily since 1996, and 188,805 Filipino immigrants were recorded in the 2016 Census (source: https://www150.statcan.gc.ca/n1/pub/11-627-m/11-627-m2017028-eng.htm). Filipinos are the third-largest Asian group in Canada and the 7th-largest ethnic group in the country (source: https://en.wikipedia.org/wiki/Ethnic_origins_of_people_in_Canada#Evolution_from_1996_to_2016). Filipinos live throughout Canada, but the largest communities are concentrated in Toronto and Vancouver (source: https://en.wikipedia.org/wiki/Filipino_Canadians).

This report explores the cities of Toronto and Vancouver, both of which rank among the most liveable cities in the world and are top destinations for Filipino immigrants. Each city is made up of many neighborhoods, and the goal of this report is to identify which of them are best suited to future Filipino immigrants.

### Methodology / Data

This report uses venue data from the Foursquare API. As a result, the analysis only considers venues listed on Foursquare, and it is not fully holistic: other factors in choosing a neighborhood, such as job opportunities and transportation, fall outside the scope of the available data.

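To map the neighborhoods, geographic coordinates are needed: for Toronto they come from a geospatial CSV file keyed by postal code, and for Vancouver they are obtained with the GeoPy library (Nominatim geocoder). Both cities are then mapped with Folium.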

#### A. Toronto Data

#### A.1. Toronto Neighborhood Dataframe

In [1]:
import pandas as pd
import numpy as np

In [2]:
df_toronto = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')[0]
df_toronto

,Postal code,Borough,Neighborhood
0,M1A,Not assigned,
1,M2A,Not assigned,
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Regent Park / Harbourfront
5,M6A,North York,Lawrence Manor / Lawrence Heights
6,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government
7,M8A,Not assigned,
8,M9A,Etobicoke,Islington Avenue
9,M1B,Scarborough,Malvern / Rouge


In [3]:
df_toronto = df_toronto.set_index('Borough')
df_toronto.head()

Borough,Postal code,Neighborhood
Not assigned,M1A,
Not assigned,M2A,
North York,M3A,Parkwoods
North York,M4A,Victoria Village
Downtown Toronto,M5A,Regent Park / Harbourfront


In [4]:
df_toronto.drop(index='Not assigned', inplace=True)
df_toronto.head()

Borough,Postal code,Neighborhood
North York,M3A,Parkwoods
North York,M4A,Victoria Village
Downtown Toronto,M5A,Regent Park / Harbourfront
North York,M6A,Lawrence Manor / Lawrence Heights
Downtown Toronto,M7A,Queen's Park / Ontario Provincial Government


In [5]:
df_toronto.reset_index(inplace = True)
df_toronto.head()

,Borough,Postal code,Neighborhood
0,North York,M3A,Parkwoods
1,North York,M4A,Victoria Village
2,Downtown Toronto,M5A,Regent Park / Harbourfront
3,North York,M6A,Lawrence Manor / Lawrence Heights
4,Downtown Toronto,M7A,Queen's Park / Ontario Provincial Government


In [6]:
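df_toronto['Neighborhood'] = df_toronto['Neighborhood'].astype(str)
# separate multiple neighborhood names with ', ' instead of ' / ' (literal match, not a regex)
df_toronto['Neighborhood'] = df_toronto['Neighborhood'].str.replace(' /', ',', regex=False)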
df_toronto.head()

,Borough,Postal code,Neighborhood
0,North York,M3A,Parkwoods
1,North York,M4A,Victoria Village
2,Downtown Toronto,M5A,"Regent Park, Harbourfront"
3,North York,M6A,"Lawrence Manor, Lawrence Heights"
4,Downtown Toronto,M7A,"Queen's Park, Ontario Provincial Government"


In [7]:
df_toronto = df_toronto[['Postal code', 'Borough', 'Neighborhood']]
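
The cleaning steps in A.1 can be collected into a single function, which makes them easy to re-run if the Wikipedia table changes. This is a sketch: `clean_postal_table` is a hypothetical name, and it filters on the `Borough` column directly rather than going through `set_index`/`drop`/`reset_index`. The sample rows are copied from the outputs above.

```python
import pandas as pd


def clean_postal_table(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper condensing the A.1 cleaning steps.

    Assumes the columns 'Postal code', 'Borough' and 'Neighborhood'
    as they appear in the scraped Wikipedia table.
    """
    # drop postal codes that have no assigned borough
    df = raw[raw['Borough'] != 'Not assigned'].copy()
    # use ', ' instead of ' / ' between neighborhood names (literal replacement)
    df['Neighborhood'] = df['Neighborhood'].astype(str).str.replace(' /', ',', regex=False)
    # reorder the columns and renumber the rows from 0
    return df[['Postal code', 'Borough', 'Neighborhood']].reset_index(drop=True)


raw = pd.DataFrame({
    'Postal code': ['M1A', 'M3A', 'M5A'],
    'Borough': ['Not assigned', 'North York', 'Downtown Toronto'],
    'Neighborhood': [None, 'Parkwoods', 'Regent Park / Harbourfront'],
})
clean = clean_postal_table(raw)
print(clean)
```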

In [8]:
df_geodata = pd.read_csv('http://cocl.us/Geospatial_data')
df_geodata

,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
5,M1J,43.744734,-79.239476
6,M1K,43.727929,-79.262029
7,M1L,43.711112,-79.284577
8,M1M,43.716316,-79.239476
9,M1N,43.692657,-79.264848


In [9]:
df_geodata = df_geodata.set_index('Postal Code')
df_geodata.head()

Postal Code,Latitude,Longitude
M1B,43.806686,-79.194353
M1C,43.784535,-79.160497
M1E,43.763573,-79.188711
M1G,43.770992,-79.216917
M1H,43.773136,-79.239476


In [10]:
df_toronto.head()

,Postal code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


##### Combining the list of Toronto neighborhoods and their respective geographical locations:

In [17]:
df_torontogeodata = df_toronto.join(df_geodata, how='outer', on='Postal code')
df_torontogeodata

,Postal code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.654260,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
5,M9A,Etobicoke,Islington Avenue,43.667856,-79.532242
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
7,M3B,North York,Don Mills,43.745906,-79.352188
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937


#### A.2. Toronto Neighborhood Map

In [18]:
!conda install -c conda-forge folium=0.5.0 --yes 
import folium

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - folium=0.5.0


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    openssl-1.1.1g             |       h516909a_0         2.1 MB  conda-forge
    certifi-2020.4.5.1         |   py36h9f0ad1d_0         151 KB  conda-forge
    python_abi-3.6             |          1_cp36m           4 KB  conda-forge
    vincent-0.4.4              |             py_1          28 KB  conda-forge
    branca-0.4.0               |             py_0          26 KB  conda-forge
    altair-4.1.0               |             py_1         614 KB  conda-forge
    ca-certificates-2020.4.5.1 |       hecc5488_0         146 KB  conda-forge
    folium-0.5.0               |             py_0          45 KB  conda-forge
    ------------------------------------------------------------
                       

In [42]:
map_toronto = folium.Map(location=[43.651070, -79.347015], zoom_start=12)
map_toronto

In [21]:
for lat, lng, borough, neighborhood in zip(df_torontogeodata['Latitude'], df_torontogeodata['Longitude'], df_torontogeodata['Borough'], df_torontogeodata['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)

map_toronto
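
The Toronto map is centered on hard-coded coordinates. A minimal alternative, assuming the neighborhoods themselves define the area of interest, is to center the map on the mean of their coordinates. `map_center` is a hypothetical helper; the sample points are the first three rows of `df_torontogeodata`.

```python
from statistics import mean


def map_center(latitudes, longitudes):
    """Hypothetical helper: return [lat, lng] at the mean of the given coordinates.

    Averaging raw degrees is a reasonable approximation for points spread
    over one city, but not for regions that cross the antimeridian.
    """
    return [mean(latitudes), mean(longitudes)]


center = map_center([43.753259, 43.725882, 43.654260],
                    [-79.329656, -79.315572, -79.360636])
print(center)
```

With the full dataframe this would be `folium.Map(location=map_center(df_torontogeodata['Latitude'], df_torontogeodata['Longitude']), zoom_start=12)`.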

#### A.3. Toronto Venues from Foursquare

In [23]:
CLIENT_ID = '3KUESZ32SIS1FHTZTDDEAPJUB5XJOA3JD1FWZ2AJHDL5K0IT' # your Foursquare ID
CLIENT_SECRET = 'SR1ZQ1UTULNY3D45Q0K2K4P0ID2H2JRYVHVJGU3XT3RF1BU1' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 3KUESZ32SIS1FHTZTDDEAPJUB5XJOA3JD1FWZ2AJHDL5K0IT
CLIENT_SECRET:SR1ZQ1UTULNY3D45Q0K2K4P0ID2H2JRYVHVJGU3XT3RF1BU1


In [25]:
df_torontogeodata.head()

,Postal code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494


In [27]:
df_torontogeodata.loc[0, 'Neighborhood']

'Parkwoods'

In [29]:
neighborhood_latitude = df_torontogeodata.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = df_torontogeodata.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = df_torontogeodata.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Parkwoods are 43.7532586, -79.3296565.


In [30]:
LIMIT = 100 # limit of number of venues returned by Foursquare API

radius = 500 # define radius

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=3KUESZ32SIS1FHTZTDDEAPJUB5XJOA3JD1FWZ2AJHDL5K0IT&client_secret=SR1ZQ1UTULNY3D45Q0K2K4P0ID2H2JRYVHVJGU3XT3RF1BU1&v=20180605&ll=43.7532586,-79.3296565&radius=500&limit=100'
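
Building the query string by hand works, but a parameter dictionary is easier to read and escapes each value automatically. The sketch below uses the standard library's `urllib.parse.urlencode`; `build_explore_url` and the `MY_ID`/`MY_SECRET` placeholders are hypothetical, while the endpoint and parameter names are the ones already used in the URL above.

```python
from urllib.parse import urlencode

EXPLORE_URL = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Hypothetical helper: same explore request as above, built from a dict."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': f'{lat},{lng}',
        'radius': radius,
        'limit': limit,
    }
    # urlencode percent-escapes each value, e.g. the comma in 'll' becomes %2C
    return EXPLORE_URL + '?' + urlencode(params)


url = build_explore_url('MY_ID', 'MY_SECRET', '20180605', 43.7532586, -79.3296565)
print(url)
```

With the requests library, the same parameter dictionary can be passed directly, e.g. `requests.get(EXPLORE_URL, params=params)`, which performs the same encoding.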

In [31]:
import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas DataFrame (pandas >= 1.0; pandas.io.json.json_normalize is deprecated)

In [32]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5eafa63f40a7ea001b2a4559'},
  'headerLocation': 'Parkwoods - Donalda',
  'headerFullLocation': 'Parkwoods - Donalda, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 2,
  'suggestedBounds': {'ne': {'lat': 43.757758604500005,
    'lng': -79.32343823984928},
   'sw': {'lat': 43.7487585955, 'lng': -79.33587476015072}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4e8d9dcdd5fbbbb6b3003c7b',
       'name': 'Brookbanks Park',
       'location': {'address': 'Toronto',
        'lat': 43.751976046055574,
        'lng': -79.33214044722958,
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.751976046055574,
          'lng': -79.33214044722958}],
        'distance': 245,
        'cc': 'CA',
        'c

In [33]:
# function that extracts the category of the venue
def get_category_type(row):
    # try the plain 'categories' key first, then the flattened json_normalize key
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [35]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request and keep the list of recommended venues
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue;
        # a venue with an empty 'categories' list gets None instead of raising IndexError
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name'] if v['venue']['categories'] else None) for v in results])

    # flatten the per-neighborhood lists into a single DataFrame
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
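
The parsing inside `getNearbyVenues` can be pulled out into pure functions that are testable without network access. In this sketch (hypothetical names `explore_items` and `parse_items`), the `meta.code` field seen in the Parkwoods response is checked before `response.groups[0].items` is read, and a venue with an empty `categories` list yields `None` as its category. The sample response is hand-made and modeled on the Parkwoods output; its second venue is invented for illustration.

```python
def explore_items(response_json):
    """Hypothetical helper: return the recommended-venue items, or raise on an API error."""
    code = response_json['meta']['code']
    if code != 200:
        raise RuntimeError('Foursquare request failed with code {}'.format(code))
    return response_json['response']['groups'][0]['items']


def parse_items(name, lat, lng, items):
    """Hypothetical helper: turn explore items into the 7-field rows getNearbyVenues builds."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories', [])
        rows.append((
            name, lat, lng,
            venue['name'],
            venue['location']['lat'],
            venue['location']['lng'],
            categories[0]['name'] if categories else None,  # no category -> None
        ))
    return rows


sample = {
    'meta': {'code': 200},
    'response': {'groups': [{'items': [
        {'venue': {'name': 'Brookbanks Park',
                   'location': {'lat': 43.751976, 'lng': -79.33214},
                   'categories': [{'name': 'Park'}]}},
        {'venue': {'name': 'Uncategorized Venue',  # invented example
                   'location': {'lat': 43.75, 'lng': -79.33},
                   'categories': []}},
    ]}]},
}
rows = parse_items('Parkwoods', 43.753259, -79.329656, explore_items(sample))
print(rows)
```

Inside the loop, the two calls would replace the inline parsing: `venues_list.append(parse_items(name, lat, lng, explore_items(requests.get(url).json())))`.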

In [36]:
df_torontovenues = getNearbyVenues(names=df_torontogeodata['Neighborhood'],
                                   latitudes=df_torontogeodata['Latitude'],
                                   longitudes=df_torontogeodata['Longitude']
                                  )

Parkwoods
Victoria Village
Regent Park, Harbourfront
Lawrence Manor, Lawrence Heights
Queen's Park, Ontario Provincial Government
Islington Avenue
Malvern, Rouge
Don Mills
Parkview Hill, Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
Rouge Hill, Port Union, Highland Creek
Don Mills
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Thorncliffe Park
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
East Toronto
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
Kennedy Park, Ionview, East Birchmount Park
Bayview Village
Downsview
The Danforth West, Ri

In [37]:
df_torontovenues.head()

,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
1,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,Victoria Village,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
3,Victoria Village,43.725882,-79.315572,Tim Hortons,43.725517,-79.313103,Coffee Shop
4,Victoria Village,43.725882,-79.315572,Portugril,43.725819,-79.312785,Portuguese Restaurant


In [38]:
df_torontovenues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Agincourt,5,5,5,5,5,5
"Alderwood, Long Branch",10,10,10,10,10,10
"Bathurst Manor, Wilson Heights, Downsview North",19,19,19,19,19,19
Bayview Village,4,4,4,4,4,4
"Bedford Park, Lawrence Manor East",24,24,24,24,24,24
Berczy Park,57,57,57,57,57,57
"Birch Cliff, Cliffside West",4,4,4,4,4,4
"Brockton, Parkdale Village, Exhibition Place",23,23,23,23,23,23
Business reply mail Processing Centre,18,18,18,18,18,18
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",17,17,17,17,17,17


In [57]:
print('There are {} unique categories.'.format(df_torontovenues['Venue Category'].nunique()))

There are 267 unique categories.


In [39]:
# one hot encoding
toronto_onehot = pd.get_dummies(df_torontovenues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = df_torontovenues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

,Yoga Studio,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
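
In the Toronto output, 'Yoga Studio' ends up as the first column, whereas in the Vancouver one-hot table 'Neighborhood' comes first. A likely explanation, inferred from the output rather than confirmed in the notebook, is that one Foursquare venue category is itself called 'Neighborhood': `get_dummies` then creates a 'Neighborhood' column, the assignment overwrites it in place instead of appending a new last column, and the "move" step promotes the last dummy column ('Yoga Studio') instead; the counts for that venue category are also lost. The sketch below reproduces the collision on toy data and avoids it by renaming the dummy column before inserting the labels; the new column name is an arbitrary choice.

```python
import pandas as pd

# toy venues table with one venue whose category is literally 'Neighborhood'
venues = pd.DataFrame({
    'Neighborhood': ['Parkwoods', 'Parkwoods', 'Victoria Village'],
    'Venue Category': ['Park', 'Neighborhood', 'Coffee Shop'],
})

# the pattern used above: the assignment overwrites the 'Neighborhood' dummy column
naive = pd.get_dummies(venues[['Venue Category']], prefix="", prefix_sep="")
naive['Neighborhood'] = venues['Neighborhood']
print(naive.columns[-1])  # Park, not Neighborhood

# keep the venue category under a distinct name, then put the labels first;
# insert() raises ValueError if the column name already exists
onehot = pd.get_dummies(venues[['Venue Category']], prefix="", prefix_sep="")
onehot = onehot.rename(columns={'Neighborhood': 'Neighborhood (venue category)'})
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])
print(onehot)
```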


#### A.4. Toronto Neighborhoods Grouped with the Frequency Mean of Each Venue Category

In [40]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

,Neighborhood,Yoga Studio,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,Agincourt,0.000000,0.000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.00,0.000000,0.000000,0.000000,0.000000,0.00,0.000000,0.000000,0.0,0.000000
1,"Alderwood, Long Branch",0.000000,0.000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.00,0.000000,0.000000,0.000000,0.000000,0.00,0.000000,0.000000,0.0,0.000000
2,"Bathurst Manor, Wilson Heights, Downsview North",0.000000,0.000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.00,0.000000,0.000000,0.052632,0.000000,0.00,0.000000,0.000000,0.0,0.000000
3,Bayview Village,0.000000,0.000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.00,0.000000,0.000000,0.000000,0.000000,0.00,0.000000,0.000000,0.0,0.000000
4,"Bedford Park, Lawrence Manor East",0.000000,0.000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.041667,...,0.00,0.000000,0.000000,0.000000,0.000000,0.00,0.000000,0.000000,0.0,0.000000
5,Berczy Park,0.000000,0.000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.00,0.017544,0.000000,0.000000,0.000000,0.00,0.000000,0.000000,0.0,0.000000
6,"Birch Cliff, Cliffside West",0.000000,0.000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.00,0.000000,0.000000,0.000000,0.000000,0.00,0.000000,0.000000,0.0,0.000000
7,"Brockton, Parkdale Village, Exhibition Place",0.000000,0.000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.00,0.000000,0.000000,0.000000,0.000000,0.00,0.000000,0.000000,0.0,0.000000
8,Business reply mail Processing Centre,0.055556,0.000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.00,0.000000,0.000000,0.000000,0.000000,0.00,0.000000,0.000000,0.0,0.000000
9,"CN Tower, King and Spadina, Railway Lands, Har...",0.000000,0.000,0.058824,0.058824,0.058824,0.117647,0.176471,0.117647,0.000000,...,0.00,0.000000,0.000000,0.000000,0.000000,0.00,0.000000,0.000000,0.0,0.000000
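
Each value in the grouped table is the mean of a 0/1 column, i.e. the share of a neighborhood's returned venues that fall into that category. This is consistent with the venue counts above: "Bathurst Manor, Wilson Heights, Downsview North" returned 19 venues and shows 0.052632 ≈ 1/19 for Video Store, and Berczy Park returned 57 venues and shows 0.017544 ≈ 1/57 for Vegetarian / Vegan Restaurant. A toy example (invented neighborhoods A and B) makes the arithmetic explicit:

```python
import pandas as pd

# one row per venue, one 0/1 column per category
onehot = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B'],
    'Park':         [1, 0, 0, 1],
    'Coffee Shop':  [0, 1, 1, 0],
})

# mean of each 0/1 column = (venues in that category) / (venues returned for the neighborhood)
grouped = onehot.groupby('Neighborhood').mean().reset_index()
print(grouped)
```

Because every venue has exactly one category, each row of category shares sums to 1.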


#### B. Vancouver Data

#### B.1. Vancouver Neighborhood Data

In [44]:
neighborhood_list = ['Arbutus Ridge, Vancouver',
'Downtown, Vancouver',
'Dunbar-Southlands, Vancouver',
'Fairview, Vancouver',
'Grandview-Woodland, Vancouver',
'Hastings-Sunrise, Vancouver',
'Kensington-Cedar Cottage, Vancouver',
'Kerrisdale, Vancouver',
'Killarney, Vancouver',
'Kitsilano, Vancouver',
'Marpole, Vancouver',
'Mount Pleasant, Vancouver',
'Oakridge, Vancouver',
'Renfrew-Collingwood, Vancouver',
'Riley Park, Vancouver',
'Shaughnessy, Vancouver',
'South Cambie, Vancouver',
'Strathcona, Vancouver',
'Sunset, Vancouver',
'Victoria-Fraserview, Vancouver',
'West End, Vancouver',
'West Point Grey, Vancouver']

(source: https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Vancouver#Official_Neighbourhoods) 

In [45]:
df_vancouver = pd.DataFrame(neighborhood_list, columns=['Neighborhood'])
df_vancouver

,Neighborhood
0,"Arbutus Ridge, Vancouver"
1,"Downtown, Vancouver"
2,"Dunbar-Southlands, Vancouver"
3,"Fairview, Vancouver"
4,"Grandview-Woodland, Vancouver"
5,"Hastings-Sunrise, Vancouver"
6,"Kensington-Cedar Cottage, Vancouver"
7,"Kerrisdale, Vancouver"
8,"Killarney, Vancouver"
9,"Kitsilano, Vancouver"


In [46]:
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="explorer")

from geopy.extra.rate_limiter import RateLimiter
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)
df_vancouver['Location'] = df_vancouver['Neighborhood'].apply(geocode)

df_vancouver['Point'] = df_vancouver['Location'].apply(lambda x: (x.latitude, x.longitude))
df_vancouver

,Neighborhood,Location,Point
0,"Arbutus Ridge, Vancouver","(Arbutus Ridge, Vancouver, Metro Vancouver Reg...","(49.2409677, -123.1670008)"
1,"Downtown, Vancouver","(Downtown, Vancouver, Metro Vancouver Regional...","(49.283393, -123.1174563)"
2,"Dunbar-Southlands, Vancouver","(Dunbar-Southlands, Vancouver, Metro Vancouver...","(49.2534601, -123.1850439)"
3,"Fairview, Vancouver","(Fairview, Vancouver, Metro Vancouver Regional...","(49.2641128, -123.1268352)"
4,"Grandview-Woodland, Vancouver","(Grandview-Woodland, Vancouver, Metro Vancouve...","(49.2705588, -123.0679417)"
5,"Hastings-Sunrise, Vancouver","(Hastings-Sunrise, Vancouver, Metro Vancouver ...","(49.2775935, -123.0439199)"
6,"Kensington-Cedar Cottage, Vancouver","(Kensington-Cedar Cottage, Vancouver, Metro Va...","(49.2476321, -123.0842067)"
7,"Kerrisdale, Vancouver","(Kerrisdale, Vancouver, Metro Vancouver Region...","(49.2346728, -123.1553893)"
8,"Killarney, Vancouver","(Killarney, Vancouver, Metro Vancouver Regiona...","(49.2242738, -123.0462504)"
9,"Kitsilano, Vancouver","(Kitsilano, Vancouver, Metro Vancouver Regiona...","(49.2694099, -123.155267)"
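
The lambda in the cell above assumes every lookup succeeds. geopy geocoders return `None` when nothing matches, in which case `x.latitude` raises `AttributeError`. A sketch of a tolerant replacement (`to_point` is a hypothetical name); a namedtuple stands in for geopy's `Location` so the example runs without network access.

```python
import math
from collections import namedtuple


def to_point(location):
    """Hypothetical helper: (latitude, longitude), or (nan, nan) if geocoding found nothing."""
    if location is None:
        return (math.nan, math.nan)
    return (location.latitude, location.longitude)


# stand-in for a geopy Location; only the two attributes used above are modeled
FakeLocation = namedtuple('FakeLocation', ['latitude', 'longitude'])

print(to_point(FakeLocation(49.2409677, -123.1670008)))
print(to_point(None))
```

On the real data this would be `df_vancouver['Point'] = df_vancouver['Location'].apply(to_point)`, after which rows with NaN coordinates can be inspected or dropped.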


In [47]:
df_vancouver[['Latitude', 'Longitude']] = df_vancouver['Point'].apply(pd.Series)
df_vancouver

,Neighborhood,Location,Point,Latitude,Longitude
0,"Arbutus Ridge, Vancouver","(Arbutus Ridge, Vancouver, Metro Vancouver Reg...","(49.2409677, -123.1670008)",49.240968,-123.167001
1,"Downtown, Vancouver","(Downtown, Vancouver, Metro Vancouver Regional...","(49.283393, -123.1174563)",49.283393,-123.117456
2,"Dunbar-Southlands, Vancouver","(Dunbar-Southlands, Vancouver, Metro Vancouver...","(49.2534601, -123.1850439)",49.25346,-123.185044
3,"Fairview, Vancouver","(Fairview, Vancouver, Metro Vancouver Regional...","(49.2641128, -123.1268352)",49.264113,-123.126835
4,"Grandview-Woodland, Vancouver","(Grandview-Woodland, Vancouver, Metro Vancouve...","(49.2705588, -123.0679417)",49.270559,-123.067942
5,"Hastings-Sunrise, Vancouver","(Hastings-Sunrise, Vancouver, Metro Vancouver ...","(49.2775935, -123.0439199)",49.277594,-123.04392
6,"Kensington-Cedar Cottage, Vancouver","(Kensington-Cedar Cottage, Vancouver, Metro Va...","(49.2476321, -123.0842067)",49.247632,-123.084207
7,"Kerrisdale, Vancouver","(Kerrisdale, Vancouver, Metro Vancouver Region...","(49.2346728, -123.1553893)",49.234673,-123.155389
8,"Killarney, Vancouver","(Killarney, Vancouver, Metro Vancouver Regiona...","(49.2242738, -123.0462504)",49.224274,-123.04625
9,"Kitsilano, Vancouver","(Kitsilano, Vancouver, Metro Vancouver Regiona...","(49.2694099, -123.155267)",49.26941,-123.155267
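
An alternative to `.apply(pd.Series)` is to build one DataFrame from the list of tuples, which creates both columns in a single step instead of constructing a Series for every row; on larger tables this is usually faster. The sample rows are copied from the output above.

```python
import pandas as pd

df = pd.DataFrame({
    'Neighborhood': ['Arbutus Ridge, Vancouver', 'Downtown, Vancouver'],
    'Point': [(49.2409677, -123.1670008), (49.283393, -123.1174563)],
})

# one DataFrame from the (lat, lng) tuples, aligned on the original index
df[['Latitude', 'Longitude']] = pd.DataFrame(df['Point'].tolist(), index=df.index)
print(df)
```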


In [48]:
df_vancouver = df_vancouver.drop(['Location', 'Point'], axis=1)
df_vancouver.head()

,Neighborhood,Latitude,Longitude
0,"Arbutus Ridge, Vancouver",49.240968,-123.167001
1,"Downtown, Vancouver",49.283393,-123.117456
2,"Dunbar-Southlands, Vancouver",49.25346,-123.185044
3,"Fairview, Vancouver",49.264113,-123.126835
4,"Grandview-Woodland, Vancouver",49.270559,-123.067942


In [49]:
address = 'Vancouver, Canada'

geolocator = Nominatim(user_agent="explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Vancouver are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Vancouver are 49.2608724, -123.1139529.


#### B.2. Vancouver Neighborhood Map

In [56]:
map_vancouver = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, label in zip(df_vancouver['Latitude'], df_vancouver['Longitude'], df_vancouver['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_vancouver)  
    
map_vancouver

#### B.3. Vancouver Venues from Foursquare

In [51]:
df_vancouvervenues = getNearbyVenues(names=df_vancouver['Neighborhood'],
                                   latitudes=df_vancouver['Latitude'],
                                   longitudes=df_vancouver['Longitude']
                                  )

Arbutus Ridge, Vancouver
Downtown, Vancouver
Dunbar-Southlands, Vancouver
Fairview, Vancouver
Grandview-Woodland, Vancouver
Hastings-Sunrise, Vancouver
Kensington-Cedar Cottage, Vancouver
Kerrisdale, Vancouver
Killarney, Vancouver
Kitsilano, Vancouver
Marpole, Vancouver
Mount Pleasant, Vancouver
Oakridge, Vancouver
Renfrew-Collingwood, Vancouver
Riley Park, Vancouver
Shaughnessy, Vancouver
South Cambie, Vancouver
Strathcona, Vancouver
Sunset, Vancouver
Victoria-Fraserview, Vancouver
West End, Vancouver
West Point Grey, Vancouver


In [52]:
df_vancouvervenues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
"Arbutus Ridge, Vancouver",5,5,5,5,5,5
"Downtown, Vancouver",100,100,100,100,100,100
"Dunbar-Southlands, Vancouver",5,5,5,5,5,5
"Fairview, Vancouver",26,26,26,26,26,26
"Grandview-Woodland, Vancouver",70,70,70,70,70,70
"Hastings-Sunrise, Vancouver",14,14,14,14,14,14
"Kensington-Cedar Cottage, Vancouver",16,16,16,16,16,16
"Kerrisdale, Vancouver",38,38,38,38,38,38
"Killarney, Vancouver",4,4,4,4,4,4
"Kitsilano, Vancouver",44,44,44,44,44,44
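
"Downtown, Vancouver" returned exactly 100 venues, which equals `LIMIT`, so that request was probably capped: Downtown's category frequencies then describe at most 100 venues rather than every venue within the radius. The sketch below flags such neighborhoods; the counts are copied from the table above, and on the real data they would come from `df_vancouvervenues.groupby('Neighborhood').size()`.

```python
import pandas as pd

LIMIT = 100  # same cap as the notebook's LIMIT

# venue counts per neighborhood, copied from the first rows of the table above
counts = pd.Series({
    'Arbutus Ridge, Vancouver': 5,
    'Downtown, Vancouver': 100,
    'Dunbar-Southlands, Vancouver': 5,
    'Fairview, Vancouver': 26,
})

# a count at the cap suggests the API had more venues than it returned
capped = counts[counts >= LIMIT].index.tolist()
print(capped)
```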


In [55]:
print('There are {} unique categories.'.format(df_vancouvervenues['Venue Category'].nunique()))

There are 155 unique categories.


In [53]:
# one hot encoding
vancouver_onehot = pd.get_dummies(df_vancouvervenues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
vancouver_onehot['Neighborhood'] = df_vancouvervenues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [vancouver_onehot.columns[-1]] + list(vancouver_onehot.columns[:-1])
vancouver_onehot = vancouver_onehot[fixed_columns]

vancouver_onehot.head()

,Neighborhood,American Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Australian Restaurant,BBQ Joint,Bagel Shop,Bakery,...,Theater,Thrift / Vintage Store,Tiki Bar,Toy / Game Store,Track,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Shop,Women's Store,Yoga Studio
0,"Arbutus Ridge, Vancouver",0,0,0,0,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0
1,"Arbutus Ridge, Vancouver",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Arbutus Ridge, Vancouver",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Arbutus Ridge, Vancouver",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Arbutus Ridge, Vancouver",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


#### B.4. Vancouver Neighborhoods Grouped with the Frequency Mean of Each Venue Category

In [54]:
vancouver_grouped = vancouver_onehot.groupby('Neighborhood').mean().reset_index()
vancouver_grouped

,Neighborhood,American Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Australian Restaurant,BBQ Joint,Bagel Shop,Bakery,...,Theater,Thrift / Vintage Store,Tiki Bar,Toy / Game Store,Track,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Shop,Women's Store,Yoga Studio
0,"Arbutus Ridge, Vancouver",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Downtown, Vancouver",0.02,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,...,0.01,0.0,0.0,0.02,0.0,0.02,0.02,0.0,0.0,0.01
2,"Dunbar-Southlands, Vancouver",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Fairview, Vancouver",0.0,0.0,0.0,0.076923,0.0,0.0,0.038462,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0
4,"Grandview-Woodland, Vancouver",0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.028571,...,0.0,0.0,0.0,0.014286,0.0,0.014286,0.0,0.014286,0.0,0.0
5,"Hastings-Sunrise, Vancouver",0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.071429,...,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0
6,"Kensington-Cedar Cottage, Vancouver",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0
7,"Kerrisdale, Vancouver",0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.026316,...,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0
8,"Killarney, Vancouver",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0
9,"Kitsilano, Vancouver",0.045455,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.068182,...,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.022727
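
Toronto returned 267 unique venue categories and Vancouver 155, so `toronto_grouped` and `vancouver_grouped` have different columns. If the two cities are later compared in one feature space (an assumption about the next step, not something this section does), the tables need identical columns. A sketch with two toy tables (T1 and V1 are invented neighborhoods): concatenating them yields the union of the category columns, and filling the gaps with 0 is appropriate because a missing category means no venues of that type were returned.

```python
import pandas as pd

toronto = pd.DataFrame({'Neighborhood': ['T1'], 'Park': [0.5], 'Food & Drink Shop': [0.5]})
vancouver = pd.DataFrame({'Neighborhood': ['V1'], 'Park': [0.75], 'Track': [0.25]})

combined = pd.concat(
    [toronto.assign(City='Toronto'), vancouver.assign(City='Vancouver')],
    ignore_index=True,
    sort=False,  # keep columns in order of first appearance
)
# a category absent from one city had no venues there, so its share is 0
combined = combined.fillna(0)
print(combined)
```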
