
<h2 align=center>Segmenting and Exploring specific areas of the city of Vancouver</h2>
<h4 align=center>Second Part Capstone Project - The Battle of Neighborhoods</h4>


### 1. Introduction

Vancouver, in the province of British Columbia, has the highest population density in Canada, with more than 5,400 people per square kilometer. It is also one of the most ethnically and linguistically diverse cities in the country: 52% of the population speaks a first language other than English. With its panoramic views, mild climate and friendly people, Vancouver has hosted major international events such as the 2010 Winter Olympics and the 2010 Winter Paralympic Games.

### 2.   Business Understanding

According to the official website of the City of Vancouver, the city is made up of a number of smaller neighborhoods and communities, grouped into 22 distinct areas. We will focus only on these 22 neighborhoods.

Our client wants to find the neighborhood with the greatest variety of venues and, from there, explore the best-reviewed hotels and restaurants during their stay in Vancouver. We will also look for Italian restaurant options in the area, in case the client is interested.

Our main objective is therefore to explore the neighborhoods of Vancouver, identify the most diverse one, and from there find the best places to visit. To do this, we will first use the k-means clustering algorithm to segment the neighborhoods by their most common venue categories, using the Foursquare API to collect information about venues across Vancouver. We will use the folium mapping library to visualize the results.
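
To make the segmentation idea concrete, here is a minimal sketch on made-up data. It assumes a common recipe: one-hot encode the venue categories, average them per neighborhood, and cluster the resulting frequency profiles. The neighborhood names, the categories and the choice of two clusters are illustrative assumptions, not results from this study.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical toy data: one row per venue (not real Foursquare results).
toy_venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "B", "B", "C", "C", "D", "D"],
    "Venue Category": ["Coffee Shop", "Café", "Coffee Shop", "Café",
                       "Park", "Trail", "Park", "Trail"],
})

# One-hot encode the categories, then average per neighborhood to get
# the share of each category among that neighborhood's venues.
onehot = pd.get_dummies(toy_venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", toy_venues["Neighborhood"])
profiles = onehot.groupby("Neighborhood").mean()

# Cluster the frequency profiles; k=2 is an arbitrary choice for the toy data.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(profiles)
toy_clusters = pd.Series(kmeans.labels_, index=profiles.index, name="Cluster")
print(toy_clusters)
```

Neighborhoods with the same mix of venue categories end up with the same cluster label, which is the property the segmentation relies on.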


### 3.   Data Understanding / Preparation

Before we start collecting the data for this study, let's import the libraries we will use:

In [None]:
#!conda install scikit-learn==0.20
import json  # library to handle JSON files
import zipfile as zp
from itertools import product

import numpy as np  # library to handle data in a vectorized manner
import pandas as pd  # library for data analysis
import requests  # library to handle HTTP requests
from pandas import json_normalize  # transform JSON into a pandas dataframe

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# Elbow method for k-means
try:
    from yellowbrick.cluster import KElbowVisualizer
except ImportError:
    !pip install -U scikit-learn
    !pip install -U yellowbrick
    from yellowbrick.cluster import KElbowVisualizer

# Clustering, scaling and evaluation tools from scikit-learn
import sklearn.utils
from sklearn.cluster import DBSCAN, KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Convert an address into latitude and longitude values
try:
    from geopy.geocoders import Nominatim
except ImportError:
    !conda install -c conda-forge geopy --yes
    from geopy.geocoders import Nominatim

# Map rendering library
try:
    import folium
except ImportError:
    !conda install -c conda-forge folium=0.5.0 --yes
    import folium

print('Libraries imported.')

First, we need the coordinates of the 22 neighborhoods of Vancouver. Since I could not find this information in a ready-made table, I collected the latitude and longitude of each neighborhood from Wikipedia and consolidated them into a CSV file, which we load below:


In [2]:
url = 'https://raw.githubusercontent.com/expressosub/sharing_notebook/main/geocode_vancouver.csv'
vancouver_geo = pd.read_csv(url)


#print size of data
print(vancouver_geo.shape)
vancouver_geo


(22, 3)


| | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | Arbutus Ridge | 49.2571 | -123.1662 |
| 1 | Downtown | 49.284167 | -123.121111 |
| 2 | Dunbar-Southlands | 49.25 | -123.185 |
| 3 | Fairview | 49.264 | -123.13 |
| 4 | Grandview-Woodland | 49.275 | -123.067 |
| 5 | Hastings-Sunrise | 49.281126 | -123.04407 |
| 6 | Kensington-Cedar Cottage | 49.25 | -123.06667 |
| 7 | Kerrisdale | 49.216667 | -123.15 |
| 8 | Killarney | 49.223 | -123.039 |
| 9 | Kitsilano | 49.266667 | -123.166667 |
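
Because the coordinates were entered by hand, a quick sanity check can catch typos before they reach the map. The sketch below is one possible check, run here on three rows copied from the table above. The bounding box for Vancouver is a rough assumption, and `check_coordinates` is a hypothetical helper that is not part of the original notebook.

```python
import pandas as pd

# Rough bounding box around the city of Vancouver (an assumption for illustration).
LAT_RANGE = (49.19, 49.32)
LNG_RANGE = (-123.27, -123.02)

def check_coordinates(df):
    """Return rows with missing or out-of-range coordinates, or a repeated name."""
    out_of_range = (
        ~df["Latitude"].between(*LAT_RANGE)
        | ~df["Longitude"].between(*LNG_RANGE)
    )
    duplicated = df["Neighborhood"].duplicated(keep=False)
    return df[out_of_range | duplicated]

sample = pd.DataFrame({
    "Neighborhood": ["Arbutus Ridge", "Downtown", "Kitsilano"],
    "Latitude": [49.2571, 49.284167, 49.266667],
    "Longitude": [-123.1662, -123.121111, -123.166667],
})
problems = check_coordinates(sample)
print(problems)  # an empty frame means no issue was found
```

In the notebook, the same call would be `check_coordinates(vancouver_geo)`.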


Next, we will plot all of these neighborhoods on a map centered on Vancouver. Let's use the geopy library to get the latitude and longitude of Vancouver (a quick web search would give you these coordinates as well).

In [3]:
address = 'Vancouver, BC'

geolocator = Nominatim(user_agent="vc_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Vancouver are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Vancouver are 49.2608724, -123.1139529.
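
A geocoding lookup can fail: `geocode` returns `None` when nothing matches, and service errors raise exceptions, so the attribute access above would break. A minimal sketch of a guard follows, assuming we are happy to fall back to the coordinates printed above. `geocode_or_default` is a hypothetical helper, and the stand-in geocoders exist only to keep the example runnable offline.

```python
from types import SimpleNamespace

# Fallback taken from the output printed above.
VANCOUVER_FALLBACK = (49.2608724, -123.1139529)

def geocode_or_default(geocode, address, default=VANCOUVER_FALLBACK):
    """Call geocode(address); return (lat, lng), or default if the lookup fails."""
    try:
        location = geocode(address)
    except Exception:  # e.g. a timeout or service error; broad on purpose in this sketch
        return default
    if location is None:  # no match found
        return default
    return (location.latitude, location.longitude)

# Stand-in geocoders so the example runs without network access.
def fake_hit(address):
    return SimpleNamespace(latitude=49.25, longitude=-123.1)

def fake_miss(address):
    return None

print(geocode_or_default(fake_hit, "Vancouver, BC"))
print(geocode_or_default(fake_miss, "Vancouver, BC"))
```

In the notebook this would be called as `geocode_or_default(geolocator.geocode, address)`.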


We will use the Folium library to overlay the neighborhoods of Vancouver as points on the map.
    

In [4]:
# create map of Vancouver using latitude and longitude values
map_vancouver = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, neighborhood in zip(vancouver_geo['Latitude'], vancouver_geo['Longitude'], vancouver_geo['Neighborhood']):
    # the popup shows the neighborhood name when a marker is clicked
    label = folium.Popup(str(neighborhood), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_vancouver)
    
map_vancouver

Now we need to collect information about each neighborhood so that we can understand them and segment them into groups. We will use the Foursquare API, which requires credentials to make calls. The free tier imposes limits on data collection; you can choose to register a credit card (as I did) to get better access to the data.

In [13]:
CLIENT_ID = '' # your Foursquare ID
CLIENT_SECRET = '' # your Foursquare Secret
VERSION = '20210119'
LIMIT = 100

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 
CLIENT_SECRET:
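
Hard-coding credentials in a notebook makes it easy to leak them when the notebook is shared. One common alternative, sketched below, is to read them from environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are assumptions chosen for this example, and `load_credentials` and `mask` are hypothetical helpers.

```python
import os

def load_credentials(env=os.environ):
    """Read the Foursquare credentials from environment variables (names are assumed)."""
    client_id = env.get("FOURSQUARE_CLIENT_ID", "")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET", "")
    if not client_id or not client_secret:
        raise RuntimeError("Foursquare credentials are not set")
    return client_id, client_secret

def mask(value, visible=4):
    """Hide all but the last few characters so the value can be printed safely."""
    return "*" * max(len(value) - visible, 0) + value[-visible:]

# Demonstration with a plain dict standing in for the real environment.
demo_env = {"FOURSQUARE_CLIENT_ID": "abcd1234", "FOURSQUARE_CLIENT_SECRET": "wxyz5678"}
demo_id, demo_secret = load_credentials(demo_env)
print("CLIENT_ID:", mask(demo_id))
```

Calling `load_credentials()` without arguments reads from the real environment and fails loudly if either value is missing.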


Now that we can call the Foursquare API, we will collect the venues returned for each Vancouver neighborhood (within a radius of 600 meters of its coordinates, and at most 100 venues per call) using the function below:

In [6]:
def getNearbyVenues(names, latitudes, longitudes, radius=600):
    """Collect the venues within `radius` meters of each neighborhood's coordinates."""
    url = 'https://api.foursquare.com/v2/venues/explore'
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # build the query parameters for the API request
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT,
        }

        # make the GET request and stop early on an HTTP error
        response = requests.get(url, params=params)
        response.raise_for_status()
        results = response.json()['response']['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        venues_list.extend(
            (name,
             lat,
             lng,
             v['venue']['name'],
             v['venue']['location']['lat'],
             v['venue']['location']['lng'],
             v['venue']['categories'][0]['name'])
            for v in results)

    columns = ['Neighborhood',
               'Neighborhood Latitude',
               'Neighborhood Longitude',
               'Venue',
               'Venue Latitude',
               'Venue Longitude',
               'Venue Category']

    return pd.DataFrame(venues_list, columns=columns)


vancouver_venues = getNearbyVenues(names=vancouver_geo['Neighborhood'],
                                   latitudes=vancouver_geo['Latitude'],
                                   longitudes=vancouver_geo['Longitude']
                                  )

Arbutus Ridge
Downtown
Dunbar-Southlands
Fairview
Grandview-Woodland
Hastings-Sunrise
Kensington-Cedar Cottage
Kerrisdale
Killarney
Kitsilano
Marpole
Mount Pleasant
Oakridge
Renfrew-Collingwood
Riley Park
Shaughnessy
South Cambie
Strathcona
Sunset
Victoria-Fraserview
West End
West Point Grey


In [113]:
print(vancouver_venues.shape)
vancouver_venues.head()

(645, 7)


| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Arbutus Ridge | 49.2571 | -123.1662 | Sweet Obsession Cakes & Pastries | 49.257756 | -123.165314 | Dessert Shop |
| 1 | Arbutus Ridge | 49.2571 | -123.1662 | Yuwa Japanese Cuisine | 49.257938 | -123.167884 | Japanese Restaurant |
| 2 | Arbutus Ridge | 49.2571 | -123.1662 | Carnarvon Park | 49.257678 | -123.171502 | Baseball Field |
| 3 | Arbutus Ridge | 49.2571 | -123.1662 | Starbucks | 49.25792 | -123.16821 | Coffee Shop |
| 4 | Arbutus Ridge | 49.2571 | -123.1662 | Subway | 49.25805 | -123.168586 | Sandwich Place |
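
The 600-meter radius can be checked against the returned rows by computing the great-circle distance between a neighborhood center and a venue. The sketch below uses the haversine formula with a mean Earth radius of 6,371 km, which is a standard approximation. `haversine_m` is a hypothetical helper, and the coordinates are taken from the first row of the table above.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (approximation)

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two (lat, lng) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

# Arbutus Ridge center and the first venue listed for it.
distance = haversine_m(49.2571, -123.1662, 49.257756, -123.165314)
within_radius = distance <= 600
print(f"{distance:.0f} m from the neighborhood center, within radius: {within_radius}")
```

Applied row by row to `vancouver_venues`, the same function would give the distance of every venue from its neighborhood's coordinates.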


Let's find out how many unique venue categories were returned for each neighborhood:

In [8]:
(vancouver_venues
 .groupby('Neighborhood')['Venue Category']
 .agg(['nunique'])
 .sort_values(['nunique'], ascending=False)
 .rename(columns={'nunique': 'Unique Categories Venues'}))

| Neighborhood | Unique Categories Venues |
|---|---|
| Downtown | 60 |
| West End | 43 |
| Kitsilano | 43 |
| Grandview-Woodland | 39 |
| Mount Pleasant | 34 |
| Hastings-Sunrise | 30 |
| Fairview | 26 |
| Sunset | 22 |
| Dunbar-Southlands | 17 |
| Renfrew-Collingwood | 16 |
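
Note that `nunique` counts distinct categories, not venues: a neighborhood with ten coffee shops still contributes a single category. A tiny made-up example shows the difference between the two counts; the data below is hypothetical.

```python
import pandas as pd

# Hypothetical venues: X has three venues but only two distinct categories.
toy = pd.DataFrame({
    "Neighborhood": ["X", "X", "X", "Y", "Y"],
    "Venue Category": ["Coffee Shop", "Coffee Shop", "Park", "Bakery", "Hotel"],
})

summary = toy.groupby("Neighborhood")["Venue Category"].agg(
    venues="count",        # total venues returned
    categories="nunique",  # distinct venue categories
)
print(summary)
```

Here X has more venues than Y, yet both have the same number of distinct categories, which is the quantity used above as a measure of diversity.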


It seems that Downtown has the most distinct venue categories (60), followed by West End and Kitsilano (43 each). Let's see which categories our search returned for the Downtown neighborhood and how many venues fall in each (the 30 most frequent categories).

In [9]:
vancouver_downtown = vancouver_venues[vancouver_venues['Neighborhood'] == 'Downtown']
vancouver_downtown['Venue Category'].value_counts().head(30)

Hotel                  9
Clothing Store         5
Dessert Shop           4
Café                   4
Food Truck             4
Cosmetics Shop         3
Coffee Shop            3
Restaurant             3
Steakhouse             3
Concert Hall           3
Gym                    2
Japanese Restaurant    2
Toy / Game Store       2
Seafood Restaurant     2
Burger Joint           2
Sandwich Place         2
French Restaurant      2
Italian Restaurant     2
Donut Shop             2
Bakery                 1
Lebanese Restaurant    1
Miscellaneous Shop     1
Yoga Studio            1
Jewelry Store          1
Breakfast Spot         1
Optical Shop           1
Art Gallery            1
Hawaiian Restaurant    1
Movie Theater          1
Hot Dog Joint          1
Name: Venue Category, dtype: int64
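
Since the client is specifically interested in hotels and Italian restaurants, the same dataframe can be filtered by category. The sketch below runs on a hand-made stand-in for `vancouver_venues`; the venue names are placeholders, not real results, and `venues_in_categories` is a hypothetical helper.

```python
import pandas as pd

def venues_in_categories(venues, neighborhood, categories):
    """Return the venues of one neighborhood whose category is in `categories`."""
    selected = (
        (venues["Neighborhood"] == neighborhood)
        & venues["Venue Category"].isin(categories)
    )
    return venues.loc[selected, ["Venue", "Venue Category"]].reset_index(drop=True)

# Placeholder rows with the same column names as the real dataframe.
demo_venues = pd.DataFrame({
    "Neighborhood": ["Downtown", "Downtown", "Downtown", "Kitsilano"],
    "Venue": ["Hotel A", "Trattoria B", "Cafe C", "Hotel D"],
    "Venue Category": ["Hotel", "Italian Restaurant", "Café", "Hotel"],
})
picks = venues_in_categories(demo_venues, "Downtown", ["Hotel", "Italian Restaurant"])
print(picks)
```

With the real data, the call would be `venues_in_categories(vancouver_venues, 'Downtown', ['Hotel', 'Italian Restaurant'])`.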

<a id='item4'></a>