
<h2><b>Capstone Project - The Battle of Neighbourhoods</b></h2>
<h5>This repository contains the Capstone Project towards the IBM Data Science Professional Certification. 
This project aims to provide insights into where to set up a food delivery business in Toronto by examining Toronto's business and demographic profile.
<br>

Demographic data are obtained from City of Toronto census data, and the restaurant business profile will be obtained via the Foursquare API. K-means clustering and choropleth maps will be used to present the potential location options. </h5>

**Part A: Understanding Toronto's Neighbourhood Data**

Data sources to examine: Toronto neighbourhoods by postal code, and neighbourhood population from Toronto's census data.

In [6]:
# importing required libraries
import pandas as pd 
import numpy as np 
import json 
import bs4

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

from bs4 import BeautifulSoup # import BeautifulSoup for web scraping

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import; the pandas.io.json path is deprecated since pandas 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means for the clustering stage
from sklearn.cluster import KMeans

import folium 

from zipfile import ZipFile

1) Obtaining Toronto Postal Code Data

In [7]:
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
r = requests.get(url)

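# passing neighbourhood table into dataframe
# (wrapped in StringIO because recent pandas versions deprecate passing a raw HTML string)
from io import StringIO
content = pd.read_html(StringIO(r.text))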
alldata = content[0]

# drop data in Borough that is "Not Assigned"
dataexclna = alldata[alldata["Borough"]!="Not assigned"]

# merge neighbourhoods that share the same postal code into one row
cdata = dataexclna.groupby(['Postal Code','Borough'], sort=False).agg(', '.join)
cdata.reset_index(inplace=True)

cdata.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
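
The `groupby(...).agg(', '.join)` step above collapses all rows that share a postal code into a single row whose neighbourhood names are joined with commas. A minimal sketch of that behaviour on a few hand-typed rows shaped like the Wikipedia table (the `toy` frame is illustrative only, not the scraped data):

```python
import pandas as pd

# Illustrative rows: two neighbourhoods share postal code M5A
toy = pd.DataFrame({
    "Postal Code": ["M3A", "M5A", "M5A"],
    "Borough": ["North York", "Downtown Toronto", "Downtown Toronto"],
    "Neighbourhood": ["Parkwoods", "Regent Park", "Harbourfront"],
})

# Group by postal code and borough, then join the neighbourhood names with ", "
merged = (
    toy.groupby(["Postal Code", "Borough"], sort=False)["Neighbourhood"]
    .agg(", ".join)
    .reset_index()
)
print(merged)
```

Selecting the `Neighbourhood` column before aggregating makes it explicit which column is being joined, which may be safer if the source table ever gains extra non-text columns.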


2) Obtaining Postal Code Latitude and Longitude, and Merging with Postal Code Data

In [8]:
# examine the latitude and longitude data file
latlongurl = 'https://cocl.us/Geospatial_data'
latlong = pd.read_csv(latlongurl)

latlong.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [9]:
# merge neighborhood data with longlat dataset by Postal Code
tgeodata = pd.merge(left=cdata, right=latlong, how='left', left_on='Postal Code', right_on='Postal Code')
tgeodata.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
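
Because the merge uses `how='left'`, any postal code without a match in the coordinates file would silently end up with missing latitude and longitude. One way to check for that, sketched here on made-up frames (`left`, `right` and the postal code `M9Z` are hypothetical), is the `indicator` option of `merge`:

```python
import pandas as pd

# Hypothetical toy frames; M9Z deliberately has no coordinates
left = pd.DataFrame({"Postal Code": ["M3A", "M4A", "M9Z"]})
right = pd.DataFrame({
    "Postal Code": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# indicator=True adds a "_merge" column recording where each row came from
checked = left.merge(right, on="Postal Code", how="left", indicator=True)

# Rows tagged "left_only" found no match in the coordinates table
unmatched = checked.loc[checked["_merge"] == "left_only", "Postal Code"].tolist()
print(unmatched)
```

Applied to the notebook's own frames, the same pattern would be `cdata.merge(latlong, on='Postal Code', how='left', indicator=True)`.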


3) Obtain Population Data, Merge with Postal Code Data

In [10]:
rawdata = pd.read_csv("CA Pop Data.csv")
popdata = rawdata.rename(columns={'Population 2016': 'Population', 'Geographic code': 'Postal Code'})
del popdata["Geographic name"]
del popdata["Province or territory"]

popdata.head()

Unnamed: 0,Postal Code,Population,Total private dwellings 2016,Private dwellings occupied by usual residents 2016
0,A0A,46587,26155,19426
1,A0B,19792,13658,8792
2,A0C,12587,8010,5606
3,A0E,22294,12293,9603
4,A0G,35266,21750,15200


In [11]:
# merge neighborhood data with population data
torontodata = pd.merge(left=cdata, right=popdata, how='left', left_on='Postal Code', right_on='Postal Code')

torontodata.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Population,Total private dwellings 2016,Private dwellings occupied by usual residents 2016
0,M3A,North York,Parkwoods,34615.0,13847.0,13241.0
1,M4A,North York,Victoria Village,14443.0,6299.0,6170.0
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",41078.0,24186.0,22333.0
3,M6A,North York,"Lawrence Manor, Lawrence Heights",21048.0,8751.0,8074.0
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",10.0,6.0,5.0
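
The population columns display as floats (`34615.0`) after the merge, although `popdata` showed integers. One plausible explanation, which is an assumption and not something the output above confirms, is that some postal codes found no match, so pandas inserted `NaN` and converted the column to float. A small sketch with made-up frames (`areas`, `pop`, and the code `M9Z` are hypothetical) shows the effect and one way to keep whole numbers:

```python
import pandas as pd

areas = pd.DataFrame({"Postal Code": ["M3A", "M4A", "M9Z"]})  # M9Z is made up
pop = pd.DataFrame({"Postal Code": ["M3A", "M4A"], "Population": [34615, 14443]})

merged = areas.merge(pop, on="Postal Code", how="left")

# The unmatched row holds NaN, which forces the column to a float dtype
dtype_after_merge = str(merged["Population"].dtype)
missing = int(merged["Population"].isna().sum())
print(dtype_after_merge, missing)

# The nullable integer dtype keeps whole numbers and marks gaps as <NA>
merged["Population"] = merged["Population"].astype("Int64")
print(merged)
```

Counting the missing values on `torontodata` itself (`torontodata['Population'].isna().sum()`) would show whether this applies to the real data.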


In [12]:
#population data breakdown by neighbourhoods (from CA Census Data)

hood_data = pd.read_csv("Toronto Census Data.csv")
hood_data.head()

Unnamed: 0,City of Toronto,Neighbourhood Number,Population,Average Household Income
0,Agincourt North,129,29113,25005
1,Agincourt South-Malvern West,128,23757,20400
2,Alderwood,20,12054,10265
3,Annex,95,30526,26295
4,Banbury-Don Mills,42,27695,23410


4) Toronto Neighbourhoods' Population & Income Profiles in Choropleth Maps

In [13]:
#get toronto geojson file from adamw523
import urllib.request
import zipfile

zipurl = 'https://github.com/adamw523/toronto-geojson/zipball/master'

urllib.request.urlretrieve(zipurl, filename = 'geojsondata.zip')
zipfile.ZipFile('geojsondata.zip').extractall()

TorontoJSON = 'adamw523-toronto-geojson-3b02b53/simple.geojson'

with open(TorontoJSON) as f: 
    geodata = json.load(f)
    
geodata['features'][0]

{'geometry': {'coordinates': [[[-79.40428280044927, 43.64797961606815],
    [-79.403956753622, 43.64718271074494],
    [-79.42236786578222, 43.643467621011894],
    [-79.42640543946513, 43.65360764326518],
    [-79.41868792113178, 43.65521730993704],
    [-79.41769878521191, 43.65524323486715],
    [-79.41514736685951, 43.65496322517198],
    [-79.40767889826175, 43.65646442447146],
    [-79.40428280044927, 43.64797961606815]]],
  'type': 'Polygon'},
 'properties': {'CSDUID': '3520005',
  'DAUID': '35200879',
  'FULLHOOD': 'Trinity-Bellwoods (81)',
  'HOOD': 'Trinity-Bellwoods',
  'HOODNUM': 81,
  'PRUID': '35'},
 'type': 'Feature'}
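
Each feature carries a `HOODNUM` property, which can serve as the join key to the census table's `Neighbourhood Number` column. Before mapping, it may be worth checking that both sides cover the same keys, since polygons without data are typically left unshaded. A rough sketch using a hand-built stand-in for the loaded file (`geodata_demo` and `census_numbers` are hypothetical and contain only the fields used here):

```python
# Hand-built stand-in for the loaded GeoJSON (only the fields used here)
geodata_demo = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"HOOD": "Trinity-Bellwoods", "HOODNUM": 81}},
        {"type": "Feature", "properties": {"HOOD": "Annex", "HOODNUM": 95}},
    ],
}

# Hypothetical set of neighbourhood numbers present in the census table
census_numbers = {95, 129}

# Collect the join key from every feature
geo_numbers = {f["properties"]["HOODNUM"] for f in geodata_demo["features"]}

missing_from_census = sorted(geo_numbers - census_numbers)   # polygons with no data
missing_from_geojson = sorted(census_numbers - geo_numbers)  # data with no polygon
print(missing_from_census, missing_from_geojson)
```

With the real objects, `census_numbers` would be `set(hood_data['Neighbourhood Number'])` and `geodata_demo` would be `geodata`.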

In [14]:
#Toronto's Population Density Profile

map_toronto = folium.Map(location=[43.653963, -79.387207], zoom_start=11)

folium.Choropleth(
    geo_data=geodata,
    data = hood_data,
    columns=['Neighbourhood Number','Population'],
    name = 'choropleth',
    key_on='feature.properties.HOODNUM',
    fill_color='PuBu',
    fill_opacity=0.7, 
    line_opacity=0.2,
    legend_name='Toronto Population Density by Neighbourhood').add_to(map_toronto)
    
map_toronto

In [15]:
#Toronto's Household Income Profile

map_toronto = folium.Map(location=[43.653963, -79.387207], zoom_start=11)

folium.Choropleth(
    geo_data=geodata,
    data = hood_data,
    columns=['Neighbourhood Number','Average Household Income'],
    name = 'choropleth',
    key_on='feature.properties.HOODNUM',
    fill_color='PuBuGn',
    fill_opacity=0.7, 
    line_opacity=0.2,
    legend_name='Toronto Population Income Profile by Neighbourhood').add_to(map_toronto)
    
map_toronto

Comparing the population and income maps, high-income areas appear to coincide with the most populated neighbourhoods.
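
The visual impression from the two maps can be backed by a number. The sketch below computes the Pearson correlation between the two mapped columns, using only the five rows displayed by `hood_data.head()` earlier in this notebook, so the result is illustrative and not representative of all neighbourhoods:

```python
import pandas as pd

# The five rows displayed by hood_data.head() in this notebook
sample = pd.DataFrame({
    "Population": [29113, 23757, 12054, 30526, 27695],
    "Average Household Income": [25005, 20400, 10265, 26295, 23410],
})

# Pearson correlation between the two mapped columns
r = sample["Population"].corr(sample["Average Household Income"])
print(round(r, 3))
```

On the full table the equivalent call would be `hood_data['Population'].corr(hood_data['Average Household Income'])`. A value very close to 1 could also be a prompt to double-check what the income column actually measures before drawing conclusions from the overlap.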