## Capstone Project - The Battle of Neighborhoods (IBM Applied Data Science) Week 1
### **The Business Problem: Vending machine stations carrying PPE products**

### *Introduction*:
### The audience for this project is vending machine providers and PPE producers who plan to deploy vending machines carrying PPE (Personal Protective Equipment) products such as masks, gloves and sanitizers in the city of Vancouver, British Columbia.

### In this project I will work with location data for the city of Vancouver to identify the neighborhoods and specific points that are best suited for installing vending machine stations.

### Because of the COVID-19 pandemic, 2020 is an unusual year: demand for PPE is high and the market is financially rewarding.

### *Goal of the project:*
### The goal of the project is to find busy, populated areas with many venues nearby, using the Foursquare API and the Google location data API, so that the stations have a good chance of strong PPE sales and can support a profitable business plan.
### The potential customers are people visiting a venue, who in most cases need to wear a mask, use sanitizer gel and/or preferably wear disposable gloves, all of which the vending machines will carry.

### *Data:*
### The data for this project comes from several sources: the neighborhoods and postal codes of Vancouver are taken from Wikipedia, the latitude/longitude of each neighborhood comes from the Google location data API, and the Foursquare API provides details about the venues in different parts of the city. The main idea is to cluster the city of Vancouver and find 10-20 start-up points for PPE vending machine stations.
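
The clustering idea described above can be sketched in a few lines. This is only an illustration under stated assumptions: the coordinates are made up rather than taken from the project data, and the choice of two clusters is arbitrary. It uses `KMeans` from scikit-learn, the same class imported later in this notebook.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up (latitude, longitude) pairs forming two loose groups; not project data.
coords = np.array([
    [49.280, -123.120], [49.282, -123.118], [49.279, -123.122],
    [49.230, -123.050], [49.232, -123.048], [49.229, -123.052],
])

# Fit k-means with an arbitrary k=2; random_state makes the run repeatable.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(coords)

labels = kmeans.labels_            # cluster index assigned to each point
centers = kmeans.cluster_centers_  # one (lat, lng) centroid per cluster

print(labels)
print(centers)
```

Each centroid could then serve as a candidate spot for a station, although treating raw latitude/longitude as flat coordinates is only a rough approximation over a city-sized area.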


## PART 1 : DATA

In [1]:
# Libraries and settings

import numpy as np # To handle data in a vectorized manner
import pandas as pd # Data analysis

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # To handle JSON files

!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # To convert an address into latitude and longitude values

import requests # To handle HTTP requests
from pandas import json_normalize # To transform JSON into a pandas dataframe (top-level import in pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# k-means for the clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes
import folium # Map rendering

print('Libraries imported.')

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Libraries imported.


In [2]:
! jupyter trust TorontoSegmentingClustering.ipynb

Notebook already signed: TorontoSegmentingClustering.ipynb


### -   Let's get the Vancouver neighborhood data by scraping a Wikipedia page and transform it into a pandas dataframe:

In [4]:
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_V"
dfV = pd.read_html(url)[0]
dfV

Unnamed: 0,0,1,2,3,4,5,6,7,8
0,V1AKimberley,V2APenticton,V3ALangley Township(Langley City),V4ASurreySouthwest,V5ABurnaby(Government Road / Lake City / SFU /...,V6AVancouver(Strathcona / Chinatown / Downtown...,V7ARichmondSouth,V8APowell River,V9AVictoria(Vic West / Esquimalt)Canadian Forc...
1,V1BVernonEast,V2BKamloopsNorthwest,V3BPort CoquitlamCentral,V4BWhite Rock,V5BBurnaby(Parkcrest-Aubrey / Ardingley-Sprott),V6BVancouver(NE Downtown / Gastown / Harbour C...,V7BRichmond(Sea Island / YVR),V8BSquamish,V9BVictoria(West Highlands / North Langford / ...
2,V1CCranbrook,V2CKamloopsCentral and Southeast,V3CPort CoquitlamSouth,V4CDeltaNortheast,V5CBurnaby(Burnaby Heights / Willingdon Height...,V6CVancouver(Waterfront / Coal Harbour / Canad...,V7CRichmondNorthwest,V8CKitimat,V9CVictoria(Colwood / South Langford / Metchosin)
3,V1ESalmon Arm,V2EKamloopsSouth and West,V3ECoquitlamNorth,V4EDeltaEast,V5EBurnaby(Lakeview-Mayfield / Richmond Park /...,V6EVancouver(SE West End / Davie Village),V7ERichmondSouthwest,V8EWhistler,V9EVictoria(East Highlands / NW Saanich)
4,V1GDawson Creek,V2GWilliams Lake,V3GAbbotsfordEast,V4GDeltaEast Central,V5GBurnaby(Cascade-Schou / Douglas-Gilpin),V6GVancouver(NW West End / Stanley Park),V7GNorth Vancouver (district municipality)Oute...,V8GTerrace,V9GLadysmith
5,V1HVernonWest,V2HKamloopsNorth,V3HPort Moody,V4HNot assigned,V5HBurnaby(Maywood / Marlborough / Oakalla / W...,V6HVancouver(West Fairview / Granville Island ...,V7HNorth Vancouver (district municipality)Inne...,V8HNot assigned,V9HCampbell RiverOutskirts
6,V1JFort St. John,V2JQuesnel,V3JCoquitlamNorth,V4JNot assigned,V5JBurnaby(Suncrest / Sussex-Nelson / Clinton-...,V6JVancouver(NW Shaughnessy / East Kitsilano /...,V7JNorth Vancouver (district municipality)East...,V8JPrince Rupert,V9JCourtenayNorthern Outskirts
7,V1KMerritt,V2KPrince GeorgeNorth,V3KCoquitlamSouth,V4KDeltaNorthwest,V5KVancouver(North Hastings-Sunrise),V6KVancouver(Central Kitsilano / Greektown),V7KNorth Vancouver (district municipality)Nort...,V8KSalt Spring Island,V9KQualicum Beach
8,V1LNelson,V2LPrince GeorgeEast Central,V3LNew WestminsterNortheast,V4LDeltaSoutheast,V5LVancouver(North Grandview-Woodland),V6LVancouver(NW Arbutus Ridge / NE Dunbar-Sout...,V7LNorth Vancouver (city)South Central,V8LSidney(North Saanich / YYJ),V9LDuncan
9,V1MLangley TownshipNorth,V2MPrince GeorgeWest Central,V3MNew WestminsterSouthwest(Includes Annacis I...,V4MDeltaSouthwest,V5MVancouver(South Hastings-Sunrise / North Re...,V6MVancouver(South Shaughnessy / NW Oakridge /...,V7MNorth Vancouver (city)Southwest Central,V8MCentral Saanich,V9MComox


### -  Data Wrangling

In [5]:
# Each cell fuses a 3-character postal code with the place name.
# Split every column into (postal code, name) pairs and stack the columns in order.
dfVan = pd.concat(
    [pd.DataFrame(list(dfV[i].apply(lambda x: (x[:3], x[3:])))) for i in range(9)],
    ignore_index=True)
dfVan

Unnamed: 0,0,1
0,V1A,Kimberley
1,V1B,VernonEast
2,V1C,Cranbrook
3,V1E,Salmon Arm
4,V1G,Dawson Creek
5,V1H,VernonWest
6,V1J,Fort St. John
7,V1K,Merritt
8,V1L,Nelson
9,V1M,Langley TownshipNorth
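
The splitting step in the cell above can also be written as a small reusable function. The sketch below is a minimal version run on a toy table instead of the scraped one; `flatten_postal_table` and the sample cells are hypothetical names and values chosen for illustration.

```python
import pandas as pd

# Toy stand-in for the scraped table: each cell fuses a 3-character
# postal code prefix with the place name, as in the output above.
wide = pd.DataFrame({
    0: ["V1AKimberley", "V1BVernonEast"],
    1: ["V6AVancouver(Strathcona / Chinatown)",
        "V6EVancouver(SE West End / Davie Village)"],
})

def flatten_postal_table(table: pd.DataFrame) -> pd.DataFrame:
    """Turn the wide grid into one row per postal code prefix."""
    # Stack the columns one after another, keeping column order.
    cells = pd.concat([table[c] for c in table.columns], ignore_index=True)
    return pd.DataFrame({
        "postal_code": cells.str[:3],   # first three characters
        "Neighborhood": cells.str[3:],  # everything after the prefix
    })

flat = flatten_postal_table(wide)
print(flat)
```

Using the vectorized `.str` accessor avoids the per-cell lambda and names the columns in the same step.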


In [6]:
dfVan = dfVan.rename(columns={0: 'postal_code', 1: 'Neighborhood'})
dfVan

Unnamed: 0,postal_code,Neighborhood
0,V1A,Kimberley
1,V1B,VernonEast
2,V1C,Cranbrook
3,V1E,Salmon Arm
4,V1G,Dawson Creek
5,V1H,VernonWest
6,V1J,Fort St. John
7,V1K,Merritt
8,V1L,Nelson
9,V1M,Langley TownshipNorth


In [7]:
dfVan.shape

(180, 2)

### - The dataframe has 180 rows, one for each postal code area in the scraped table. These are the neighborhoods I'm going to work with.
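
The scraped table covers every postal code prefix starting with V, so rows such as Kimberley lie outside the city. If only the City of Vancouver rows are wanted, one possible filter is sketched below. It assumes that city rows are exactly those whose name starts with "Vancouver", as in the table output above; `keep_city_rows` is a hypothetical helper and the sample rows are copied from that output.

```python
import pandas as pd

rows = pd.DataFrame({
    "postal_code": ["V1A", "V5K", "V6E", "V7L"],
    "Neighborhood": [
        "Kimberley",
        "Vancouver(North Hastings-Sunrise)",
        "Vancouver(SE West End / Davie Village)",
        "North Vancouver (city)South Central",
    ],
})

def keep_city_rows(df: pd.DataFrame, city: str = "Vancouver") -> pd.DataFrame:
    """Keep rows whose name starts with the city name (assumption: this marks city rows)."""
    mask = df["Neighborhood"].str.startswith(city)
    return df[mask].reset_index(drop=True)

van_only = keep_city_rows(rows)
print(van_only)
```

Matching on the start of the string keeps "North Vancouver" out, since that is a separate municipality.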

### - Now, let's get the latitude and longitude of each neighborhood by using the Google location data API.

In [9]:
!conda install -c conda-forge geocoder  --yes

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.



In [10]:

API_KEY = 'YOUR_GOOGLE_API_KEY' # Google location data API key (placeholder; keep the real key out of the notebook)

latitudes = [] # list of latitudes
longitudes = [] # list of longitudes

for nbhd in dfVan["Neighborhood"]:
    place_name = nbhd + ",Vancouver,British Columbia" # forming the place location name
    # Passing the address via params lets requests URL-encode spaces and slashes
    response = requests.get('https://maps.googleapis.com/maps/api/geocode/json',
                            params={'address': place_name, 'key': API_KEY})
    results = response.json()['results'] # extracting the results out of the JSON response

    location = results[0]['geometry']['location'] # coordinates of the first match
    latitudes.append(location['lat']) # appending to the list of latitudes
    longitudes.append(location['lng']) # appending to the list of longitudes
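
# The loop above indexes results[0] directly, which raises an IndexError when the
# geocoder returns no match. Below is a minimal sketch of a more defensive parser.
# It assumes the response has the JSON layout used above, with a "status" string
# next to the "results" list. extract_lat_lng is a hypothetical helper name and
# the two payloads are hand-written examples, not real API responses.

```python
from typing import Optional, Tuple

def extract_lat_lng(payload: dict) -> Optional[Tuple[float, float]]:
    """Return (lat, lng) of the first match, or None if there is no usable match."""
    if payload.get("status") != "OK" or not payload.get("results"):
        return None
    location = payload["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]

# Hand-written payloads mimicking a successful and an empty response.
ok = {"status": "OK",
      "results": [{"geometry": {"location": {"lat": 49.283155, "lng": -123.129024}}}]}
empty = {"status": "ZERO_RESULTS", "results": []}

print(extract_lat_lng(ok))
print(extract_lat_lng(empty))
```

# Appending None for failed lookups keeps the coordinate lists aligned with the
# dataframe rows, so the missing values can be inspected or dropped afterwards.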
    

### - Let's check out the dataframe of Vancouver's neighborhoods, including the latitude and longitude data:

In [11]:
dfVan['Latitude'] = latitudes
dfVan['Longitude'] = longitudes
dfVan

Unnamed: 0,postal_code,Neighborhood,Latitude,Longitude
0,V1A,Kimberley,49.665157,-115.996721
1,V1B,VernonEast,49.277497,-123.079283
2,V1C,Cranbrook,49.512968,-115.7694
3,V1E,Salmon Arm,50.700103,-119.283844
4,V1G,Dawson Creek,49.222747,-123.048877
5,V1H,VernonWest,49.277497,-123.079283
6,V1J,Fort St. John,56.252423,-120.846409
7,V1K,Merritt,50.111308,-120.786222
8,V1L,Nelson,49.283155,-123.129024
9,V1M,Langley TownshipNorth,49.300423,-123.029057
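
In the output above, rows 1 and 5 (VernonEast and VernonWest) share exactly the same coordinates, which suggests the geocoder fell back to a generic location for both queries. One possible sanity check is to list rows with repeated coordinates, sketched here on a few rows copied from that output.

```python
import pandas as pd

# Three rows copied from the geocoded output above.
sample = pd.DataFrame({
    "postal_code": ["V1B", "V1H", "V1J"],
    "Neighborhood": ["VernonEast", "VernonWest", "Fort St. John"],
    "Latitude": [49.277497, 49.277497, 56.252423],
    "Longitude": [-123.079283, -123.079283, -120.846409],
})

# keep=False marks every member of a duplicated group, not just the later ones.
dupes = sample[sample.duplicated(subset=["Latitude", "Longitude"], keep=False)]
print(dupes)
```

Rows flagged this way are worth reviewing by hand before they are used in later steps.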


In [12]:
dfVan.shape

(180, 4)

### - The data is ready to use. In the following sections (Week 2) I'm going to apply the Foursquare API to explore the neighborhoods of Vancouver. Working with the location data will help find the best locations for vending machine stations.