<h2><b>Capstone Project - The Battle of Neighbourhoods</b></h2>
<h5>This repository is the Capstone Project towards the IBM Data Science Professional Certification.</h5>

**Introduction** 

---


The food delivery segment is a roughly USD 2,500 million business in Canada in 2020, with an expected CAGR of 8.6%. While online food delivery services are growing steadily, the bulk of revenue is still driven by restaurant-to-consumer deliveries. In terms of customer profile, 2019 data suggests that close to 50% of users are below the age of 34, and most fall in the low to medium income brackets. The existing penetration rate is just ~25% for restaurant-to-consumer delivery and below 20% for platform-to-consumer delivery, which suggests there is still considerable potential and appetite for delivery businesses to be set up in Canada. (Data source: https://www.statista.com/outlook/374/108/online-food-delivery/canada)


**The Business Problem**

---
Identify the city in Canada with the best potential for establishing a food delivery business. For the selected city, identify possible food delivery service points within the city limits.


**Target Audience**


---



This report targets potential entrepreneurs who are interested in setting up a food delivery business, and restaurant or online platform owners who wish to understand their food delivery customer base relative to their location.

**The Data & Methodology**

---
To identify the city and service points for this delivery business, the following data will be obtained and analyzed to generate the final recommendation:
* Canada’s population data – used to identify the biggest urban cities in Canada, and to understand which city will give the business the best revenue potential
* Canada’s commercial restaurant establishment data (by city) – used in conjunction with the population data to shortlist a city for the delivery business
* Canada’s provincial census data – used to understand the population density and income profiles within the chosen city limits
* Foursquare API data – used to understand restaurant locations in the chosen city and, in conjunction with the city population profile, to identify setup point(s)

**Statistical Approach**

---
After narrowing down to a city, a clustering approach will be used to segment the city's neighbourhoods. K-means clustering (an unsupervised learning approach) will be used to group neighbourhoods with similar profiles together. Based on the K-means clustering result, an initial service setup point will be identified.

**Libraries for this Project**




---


*  Web scraping libraries: Beautiful Soup, Requests – to scrape web data and handle HTTP requests
*  Pandas: to structure and manipulate data in dataframes
*  Visualization libraries: Folium & Choropleth maps to visualize the neighbourhoods in clusters
*  JSON & Geocoder: to handle JSON files and retrieve location data
*  Scikit-learn: for k-means clustering of the city neighbourhoods

**Part A: Understanding Canada's City Profiles**

Data sources to examine: Canada's city population data (2016 census) and restaurant counts for the most populated cities

In [None]:
!pip install geopandas

In [3]:
# importing required libraries
import pandas as pd 
import numpy as np 
import json 
import geopandas
import bs4

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

from bs4 import BeautifulSoup # import BeautifulSoup for web scraping

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
import seaborn as sns

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium 

from zipfile import ZipFile



1) Narrowing Down a City Location for the Delivery Business
<br></br>
In this section, Canada's population data will be examined to find the top 3 most populated cities. For the shortlisted cities, further data will be collected to identify how many restaurants there are in each.


In [4]:
url = "https://www.todocanada.ca/population-canadas-cities-2016-census/"
r = requests.get(url)

# passing city population table into dataframe
rawdata = pd.read_html(r.text)
citypop = rawdata[1]

#replacing header with top row
citypop.columns = citypop.iloc[0]
citypop = citypop[1:4]

citypop

Unnamed: 0,Position,CSD — City/Municipality/Town,Population
1,1,"Toronto, Ont.",2731579
2,2,"Montreal, Que.",1704694
3,3,"Calgary, Alta.",1239220


Number of restaurants in each of the top 3 cities:
<br></br>


*   Toronto - est. 7500 restaurants (source from: [Toronto.ca](https://www.toronto.ca/311/knowledgebase/kb/docs/articles/economic-development-and-culture/program-support/number-of-restaurants-in-toronto.html#:~:text=Approximately%207%2C500.))
*   Montreal - est. 7182 restaurants (source from: [Montreal.ca](http://ville.montreal.qc.ca/portal/page?_pageid=6897,67887857&_dad=portal&_schema=PORTAL))
*   Calgary - est. 3360 restaurants (as city govt data is not available, this is obtained from listings on [TripAdvisor Calgary](https://en.tripadvisor.com.hk/Restaurants-g154913-Calgary_Alberta.html))


In [5]:
#combine restaurant data with dataframe
# citypop keeps index 1-3 after slicing, so index 0 is padding to align the counts
restaurants = pd.DataFrame({"Restaurants": [0, 7500, 7182, 3360]})
citypoprest = pd.merge(citypop, restaurants, left_index=True, right_index=True)
del citypoprest["Position"]

citypoprest

Unnamed: 0,CSD — City/Municipality/Town,Population,Restaurants
1,"Toronto, Ont.",2731579,7500
2,"Montreal, Que.",1704694,7182
3,"Calgary, Alta.",1239220,3360


**Results - Part A**

---

Based on the data collected, Toronto is the most populous city, with Montreal in second place at roughly a million fewer residents. Given that both cities have a similar number of restaurants, a restaurant in Toronto will, on average, be serving more customers than one in Montreal. ***Therefore, setting up the delivery service in Toronto will likely have the biggest revenue potential.***
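
As a rough check on this reasoning, residents per restaurant can be computed from the figures in the table above. This is only a sketch: the `cities` dictionary is an illustrative name that does not appear in the notebook, and the Calgary restaurant count comes from TripAdvisor listings rather than city data, so its ratio may not be directly comparable with the other two.

```python
# Residents per restaurant, using the population and restaurant figures above.
# `cities` is an illustrative structure, not part of the original notebook.
cities = {
    "Toronto, Ont.": (2731579, 7500),
    "Montreal, Que.": (1704694, 7182),
    "Calgary, Alta.": (1239220, 3360),  # restaurant count from TripAdvisor listings
}

residents_per_restaurant = {
    city: population / restaurants
    for city, (population, restaurants) in cities.items()
}

for city, ratio in residents_per_restaurant.items():
    print(f"{city}: {ratio:.0f} residents per restaurant")
```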

**Part B: Understanding Toronto's Neighbourhood Data**

Data sources to examine: Toronto neighbourhoods by zip code, and neighbourhood population from Toronto's census data. The latest census data is from 2016. Source: www.toronto.ca

1) Obtaining Toronto Zip Code Data - to understand neighbourhoods in Toronto

In [6]:
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
r = requests.get(url)

# passing neighbourhood table into dataframe
content = pd.read_html(r.text)
alldata = content[0]

# drop data in Borough that is "Not Assigned"
dataexclna = alldata[alldata["Borough"]!="Not assigned"]

# merge neighborhood with same post code
cdata = dataexclna.groupby(['Postal Code','Borough'], sort=False).agg(', '.join)
cdata.reset_index(inplace=True)

cdata.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


2) Obtaining Zip Code Latitude and Longitude

In [7]:
# examine the latitude and longitude data file
latlongurl = 'https://cocl.us/Geospatial_data'
latlong = pd.read_csv(latlongurl)

latlong.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


3) Merge Zip Code with Latitude and Longitude

In [8]:
# merge neighborhood data with longlat dataset by Postal Code
tgeodata = pd.merge(left=cdata, right=latlong, how='left', left_on='Postal Code', right_on='Postal Code')
tgeodata.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494


4) Obtain Population Data, Merge with Zip Code Data

In [9]:
rawdata = pd.read_csv("CA Pop Data.csv")
popdata = rawdata.rename(columns={'Population 2016': 'Population', 'Geographic code': 'Postal Code'})
del popdata["Geographic name"]
del popdata["Province or territory"]

popdata.head()

Unnamed: 0,Postal Code,Population,Total private dwellings 2016,Private dwellings occupied by usual residents 2016
0,A0A,46587,26155,19426
1,A0B,19792,13658,8792
2,A0C,12587,8010,5606
3,A0E,22294,12293,9603
4,A0G,35266,21750,15200


In [10]:
# further segment and merge neighborhood data with population data
torontodata = pd.merge(left=cdata, right=popdata, how='left', left_on='Postal Code', right_on='Postal Code')

torontodata.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Population,Total private dwellings 2016,Private dwellings occupied by usual residents 2016
0,M3A,North York,Parkwoods,34615.0,13847.0,13241.0
1,M4A,North York,Victoria Village,14443.0,6299.0,6170.0
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",41078.0,24186.0,22333.0
3,M6A,North York,"Lawrence Manor, Lawrence Heights",21048.0,8751.0,8074.0
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",10.0,6.0,5.0
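
The population columns in the merged output are floats, which may mean that at least one postal code found no match in the census file (a left merge fills missing values with NaN). One way to check this is the `indicator` option of `pd.merge`. The sketch below uses toy stand-ins for `cdata` and `popdata`; the code `M0X` is made up for illustration.

```python
import pandas as pd

# Toy stand-ins for cdata and popdata; "M0X" is a made-up code with no census row.
left = pd.DataFrame({"Postal Code": ["M3A", "M4A", "M0X"]})
right = pd.DataFrame({"Postal Code": ["M3A", "M4A"], "Population": [34615, 14443]})

# indicator=True adds a "_merge" column recording where each row was found
checked = pd.merge(left, right, how="left", on="Postal Code", indicator=True)

# rows present only on the left side had no population match
unmatched = checked.loc[checked["_merge"] == "left_only", "Postal Code"].tolist()
print(unmatched)
```

Running the same check on the real dataframes would list any Toronto postal codes that lack population data before they reach later steps.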


In [11]:
#population data breakdown by neighbourhoods (from CA Census Data)

hood_data = pd.read_csv("Toronto Census Data.csv")
hood_data.head()

Unnamed: 0,City of Toronto,Neighbourhood Number,Population,Average Household Income
0,Agincourt North,129,29113,25005
1,Agincourt South-Malvern West,128,23757,20400
2,Alderwood,20,12054,10265
3,Annex,95,30526,26295
4,Banbury-Don Mills,42,27695,23410


5) Putting Toronto Neighbourhood's Population & Income Profile in Choropleth Map

In [12]:
#get toronto geojson file from adamw523
import urllib.request
import zipfile

zipurl = 'https://github.com/adamw523/toronto-geojson/zipball/master'

urllib.request.urlretrieve(zipurl, filename = 'geojsondata.zip')
zipfile.ZipFile('geojsondata.zip').extractall()

TorontoJSON = 'adamw523-toronto-geojson-3b02b53/simple.geojson'

with open(TorontoJSON) as f: 
    geodata = json.load(f)
    
geodata['features'][0]

{'geometry': {'coordinates': [[[-79.40428280044927, 43.64797961606815],
    [-79.403956753622, 43.64718271074494],
    [-79.42236786578222, 43.643467621011894],
    [-79.42640543946513, 43.65360764326518],
    [-79.41868792113178, 43.65521730993704],
    [-79.41769878521191, 43.65524323486715],
    [-79.41514736685951, 43.65496322517198],
    [-79.40767889826175, 43.65646442447146],
    [-79.40428280044927, 43.64797961606815]]],
  'type': 'Polygon'},
 'properties': {'CSDUID': '3520005',
  'DAUID': '35200879',
  'FULLHOOD': 'Trinity-Bellwoods (81)',
  'HOOD': 'Trinity-Bellwoods',
  'HOODNUM': 81,
  'PRUID': '35'},
 'type': 'Feature'}

In [13]:
#manipulating geojson file with geopandas
geodf = geopandas.read_file(TorontoJSON)
geodf.head(3)

Unnamed: 0,DAUID,PRUID,CSDUID,HOODNUM,HOOD,FULLHOOD,geometry
0,35200879,35,3520005,81,Trinity-Bellwoods,Trinity-Bellwoods (81),"POLYGON ((-79.40428 43.64798, -79.40396 43.647..."
1,35201763,35,3520005,1,West Humber-Clairville,West Humber-Clairville (1),"POLYGON ((-79.56668 43.71179, -79.55673 43.714..."
2,35201852,35,3520005,2,Mount Olive-Silverstone-Jamestown,Mount Olive-Silverstone-Jamestown (2),"POLYGON ((-79.57825 43.73552, -79.57739 43.733..."


In [14]:
#combine geopandas data with hood_data
geodfc = pd.read_csv('geopd.csv')
tgeodata = pd.merge(left=hood_data, right=geodfc, how='left', left_on='Neighbourhood Number', right_on='HOODNUM')
tgeodata.drop(columns=['Unnamed: 0', 'DAUID', 'PRUID', 'CSDUID', 'HOODNUM', 'HOOD', 'FULLHOOD', 'Unnamed: 7'], inplace=True)
tgeodata.head()

Unnamed: 0,City of Toronto,Neighbourhood Number,Population,Average Household Income,Lat,Long
0,Agincourt North,129,29113,25005,43.809923,-79.257681
1,Agincourt South-Malvern West,128,23757,20400,43.789579,-79.237056
2,Alderwood,20,12054,10265,43.606185,-79.52554
3,Annex,95,30526,26295,43.671294,-79.38718
4,Banbury-Don Mills,42,27695,23410,43.723154,-79.33015
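
The `Lat` and `Long` columns come from `geopd.csv`, whose derivation is not shown in this notebook. One plausible way to obtain a representative point per neighbourhood is the centroid of its polygon. The sketch below is an assumption about how such values could be produced, not the method actually used; it applies the planar shoelace formula to the Trinity-Bellwoods ring printed earlier and treats degrees as flat coordinates, which is only reasonable for small areas. The function name `polygon_centroid` is illustrative.

```python
# Ring copied from the first GeoJSON feature shown earlier (Trinity-Bellwoods).
# GeoJSON stores each point as [longitude, latitude].
ring = [
    [-79.40428280044927, 43.64797961606815],
    [-79.403956753622, 43.64718271074494],
    [-79.42236786578222, 43.643467621011894],
    [-79.42640543946513, 43.65360764326518],
    [-79.41868792113178, 43.65521730993704],
    [-79.41769878521191, 43.65524323486715],
    [-79.41514736685951, 43.65496322517198],
    [-79.40767889826175, 43.65646442447146],
    [-79.40428280044927, 43.64797961606815],
]

def polygon_centroid(ring):
    """Area-weighted centroid of a closed ring (planar shoelace formula)."""
    ox, oy = ring[0]  # shift to a local origin for numerical stability
    pts = [(x - ox, y - oy) for x, y in ring]
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return ox + cx / (3 * area2), oy + cy / (3 * area2)

lon, lat = polygon_centroid(ring)
print(lat, lon)
```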


In [15]:
#Toronto's Population Density Profile

map_toronto = folium.Map(location=[43.653963, -79.387207], zoom_start=11)

folium.Choropleth(
    geo_data=geodata,
    data = tgeodata,
    columns=['Neighbourhood Number','Population'],
    name = 'choropleth',
    key_on='feature.properties.HOODNUM',
    fill_color='PuBu',
    fill_opacity=0.7, 
    line_opacity=0.2,
    legend_name='Toronto Population Density by Neighbourhood').add_to(map_toronto)
    
map_toronto

In [16]:
#Toronto's Household Income Profile

map_toronto = folium.Map(location=[43.653963, -79.387207], zoom_start=11)

folium.Choropleth(
    geo_data=geodata,
    data = tgeodata,
    columns=['Neighbourhood Number','Average Household Income'],
    name = 'choropleth',
    key_on='feature.properties.HOODNUM',
    fill_color='PuBuGn',
    fill_opacity=0.7, 
    line_opacity=0.2,
    legend_name='Toronto Population Income Profile by Neighbourhood').add_to(map_toronto)
    
map_toronto


**Part C: Understanding Toronto's Restaurant Presence**

This section looks at restaurants around Toronto's neighbourhoods by examining venues from the Foursquare API.
<br></br>
Venues will be retrieved from the Foursquare API based on each neighbourhood's latitude and longitude. The objective is to identify how many restaurants there are in each neighbourhood, and to identify possible locations to set up delivery operations.

1) Obtaining Toronto's restaurant locations from the Foursquare API

In [17]:
# foursquare credentials, read from environment variables so the keys stay out of the notebook
# (the variable names below are placeholders)
import os
CLIENT_ID = os.environ["FOURSQUARE_CLIENT_ID"]
CLIENT_SECRET = os.environ["FOURSQUARE_CLIENT_SECRET"]
VERSION="20200808"

In [18]:
def getNearbyVenues(names, latitudes, longitudes, radius=2000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request; requests builds the query string from params
        url = 'https://api.foursquare.com/v2/venues/explore'
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT}  # LIMIT is a global set before the function is called
            
        # make the GET request
        results = requests.get(url, params=params).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [19]:
LIMIT = 200
venues = getNearbyVenues(names=tgeodata['City of Toronto'],
                                   latitudes=tgeodata['Lat'],
                                   longitudes=tgeodata['Long']
                                  )

Agincourt North
Agincourt South-Malvern West
Alderwood
Annex
Banbury-Don Mills
Bathurst Manor
Bay Street Corridor
Bayview Village
Bayview Woods-Steeles
Bedford Park-Nortown
Beechborough-Greenbrook
Bendale
Birchcliffe-Cliffside
Black Creek
Blake-Jones
Briar Hill-Belgravia
Bridle Path-Sunnybrook-York Mills
Broadview North
Brookhaven-Amesbury
Cabbagetown-South St. James Town
Caledonia-Fairbank
Casa Loma
Centennial Scarborough
Church-Yonge Corridor
Clairlea-Birchmount
Clanton Park
Cliffcrest
Corso Italia-Davenport
Danforth
Danforth East York
Don Valley Village
Dorset Park
Dovercourt-Wallace Emerson-Junction
Downsview-Roding-CFB
Dufferin Grove
East End-Danforth
Edenbridge-Humber Valley
Eglinton East
Elms-Old Rexdale
Englemount-Lawrence
Eringate-Centennial-West Deane
Etobicoke West Mall
Flemingdon Park
Forest Hill North
Forest Hill South
Glenfield-Jane Heights
Greenwood-Coxwell
Guildwood
Henry Farm
High Park North
High Park-Swansea
Highland Creek
Hillcrest Village
Humber Heights-Westmount
Hu

2) Understanding the Venue Profiles based on Neighbourhoods, filtering results down to restaurants per Neighbourhood

In [20]:
#finding the venues for each neighbourhood
print(venues.shape)
venues.head()

(11112, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Agincourt North,43.809923,-79.257681,Samosa King - Embassy Restaurant,43.810152,-79.257316,Indian Restaurant
1,Agincourt North,43.809923,-79.257681,Menchie's,43.808338,-79.268288,Frozen Yogurt Shop
2,Agincourt North,43.809923,-79.257681,Saravanaa Bhavan South Indian Restaurant,43.810117,-79.269275,Indian Restaurant
3,Agincourt North,43.809923,-79.257681,Fahmee Bakery & Jamaican Foods,43.81017,-79.280113,Caribbean Restaurant
4,Agincourt North,43.809923,-79.257681,Honey B Hives Restaurant,43.822722,-79.248259,Burger Joint


Venues are filtered on Venue Category, keeping those whose category contains either "Restaurant" or "Joint" (e.g. Burger Joint). Reviewing the data, some venue types are not suitable for deliveries (e.g. Frozen Yogurt / Ice Cream shops).

In [21]:
#Filtering for venues that are a restaurant or a joint
venues["Restaurant"] = venues["Venue Category"].str.contains('Restaurant|Joint') 
venues.head(3)

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Restaurant
0,Agincourt North,43.809923,-79.257681,Samosa King - Embassy Restaurant,43.810152,-79.257316,Indian Restaurant,True
1,Agincourt North,43.809923,-79.257681,Menchie's,43.808338,-79.268288,Frozen Yogurt Shop,False
2,Agincourt North,43.809923,-79.257681,Saravanaa Bhavan South Indian Restaurant,43.810117,-79.269275,Indian Restaurant,True
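
To make the effect of this filter concrete, the sketch below applies the same pattern to a handful of category names with the standard `re` module. The first four categories appear in the venue output above; "Pizza Place" is a hypothetical category added to illustrate that a food venue whose category name contains neither word would be left out.

```python
import re

# Same pattern as the filter above; re.search mirrors a regex "contains" test.
pattern = re.compile(r"Restaurant|Joint")

# The first four categories appear in the venue output above;
# "Pizza Place" is a hypothetical category added for illustration.
categories = [
    "Indian Restaurant",
    "Frozen Yogurt Shop",
    "Caribbean Restaurant",
    "Burger Joint",
    "Pizza Place",
]

is_restaurant = {c: bool(pattern.search(c)) for c in categories}

for category, keep in is_restaurant.items():
    print(f"{category}: {'kept' if keep else 'dropped'}")
```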


This step counts how many restaurants there are in each neighbourhood.

In [22]:
rest_1 = venues[venues["Restaurant"]==True]
restperhood = rest_1.groupby("Neighborhood").count()
restperhood["Restaurant"].to_frame()

Unnamed: 0_level_0,Restaurant
Neighborhood,Unnamed: 1_level_1
Agincourt North,16
Agincourt South-Malvern West,27
Alderwood,28
Annex,26
Banbury-Don Mills,33
...,...
Wychwood,32
Yonge-Eglinton,29
Yonge-St.Clair,32
York University Heights,1


3) Putting Toronto's Restaurant Density per Neighbourhood into a Choropleth Map

In [23]:
#merge restaurant count per neighbourhood to neighbourhood data
tfullset = pd.merge(left=tgeodata, right=restperhood, how='left', left_on='City of Toronto', right_on='Neighborhood')
tfullset.drop(columns=['Neighborhood Latitude', 'Neighborhood Longitude', 'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category'], inplace=True)
tfullset.head()

Unnamed: 0,City of Toronto,Neighbourhood Number,Population,Average Household Income,Lat,Long,Restaurant
0,Agincourt North,129,29113,25005,43.809923,-79.257681,16
1,Agincourt South-Malvern West,128,23757,20400,43.789579,-79.237056,27
2,Alderwood,20,12054,10265,43.606185,-79.52554,28
3,Annex,95,30526,26295,43.671294,-79.38718,26
4,Banbury-Don Mills,42,27695,23410,43.723154,-79.33015,33


In [24]:
#Showing Toronto's Restaurant Density per Neighbourhood

map_toronto = folium.Map(location=[43.653963, -79.387207], zoom_start=11)

folium.Choropleth(
    geo_data=geodata,
    data = tfullset,
    columns=['Neighbourhood Number','Restaurant'],
    name = 'choropleth',
    key_on='feature.properties.HOODNUM',
    fill_color='PuBuGn',
    fill_opacity=0.7, 
    line_opacity=0.2,
    legend_name='Toronto Restaurant density by Neighbourhood').add_to(map_toronto)
    
map_toronto

The table below highlights the top 10 neighbourhoods by number of restaurants and population.

In [32]:
tfullset.sort_values(['Restaurant', 'Population'], ascending=False, inplace=True)
top10 = tfullset[0:10]
top10

Unnamed: 0,Cluster Labels,City of Toronto,Neighbourhood Number,Population,Average Household Income,Lat,Long,Restaurant
112,1,Steeles,116,24623,21120,43.808989,-79.306642,47
20,0,Caledonia-Fairbank,109,9955,8420,43.687255,-79.446768,47
114,1,Tam O'Shanter-Sullivan,118,27446,23140,43.779896,-79.286629,43
44,0,Forest Hill South,101,10732,9260,43.689404,-79.409401,42
56,0,Humewood-Cedarvale,106,14365,12065,43.685792,-79.41932,41
103,0,Roncesvalles,86,14974,12565,43.641471,-79.432326,39
86,1,Newtonbrook West,36,23831,20820,43.781663,-79.415981,38
91,2,Oakwood Village,107,21210,17840,43.68076,-79.432693,38
34,0,Dufferin Grove,83,11785,10430,43.653244,-79.428139,38
43,0,Forest Hill North,102,12806,10615,43.702783,-79.417028,37


**Part D: Clustering Neighbourhoods of Similar Profile**

In this section, K-means clustering will be used to identify similar neighbourhoods based on their restaurant profile, in order to narrow down a possible location for the initial business setup.

In [27]:
# set number of clusters
kclusters = 4

toronto_grouped_clustering = tfullset.drop(columns='City of Toronto')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 0, 1, 0, 0, 0, 1, 2, 0, 0], dtype=int32)
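
K-means assigns rows by Euclidean distance, so a column with large values, such as Population, can outweigh a column like Restaurant. The notebook clusters on the raw columns; the sketch below is not part of the original analysis and only illustrates, using three rows from the top-10 table above, how standardizing the columns changes the distances. The helper name `standardize` is illustrative.

```python
import math
import statistics

# (Population, Restaurant) for three rows of the top-10 table above.
rows = {
    "Steeles": (24623, 47),
    "Caledonia-Fairbank": (9955, 47),
    "Forest Hill North": (12806, 37),
}

def standardize(table):
    """Rescale each column to zero mean and unit (population) standard deviation."""
    columns = list(zip(*table.values()))
    means = [statistics.mean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]
    return {
        name: tuple((v - m) / s for v, m, s in zip(values, means, stdevs))
        for name, values in table.items()
    }

scaled = standardize(rows)

# Raw distances are driven almost entirely by the Population column.
raw_a = math.dist(rows["Caledonia-Fairbank"], rows["Steeles"])
raw_b = math.dist(rows["Caledonia-Fairbank"], rows["Forest Hill North"])

# After standardizing, the gap in Restaurant counts also carries weight.
scaled_a = math.dist(scaled["Caledonia-Fairbank"], scaled["Steeles"])
scaled_b = math.dist(scaled["Caledonia-Fairbank"], scaled["Forest Hill North"])

print(raw_a, raw_b)
print(scaled_a, scaled_b)
```

If scaling were adopted, the same idea could be applied to the full feature table before fitting `KMeans`, and identifier-like columns such as the neighbourhood number could be left out of the features.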

In [28]:
#tfullset.drop(columns=['Cluster Labels'], inplace=True)

In [29]:
# add clustering labels
tfullset.insert(0, 'Cluster Labels', kmeans.labels_)

tfullset.head(5)

Unnamed: 0,Cluster Labels,City of Toronto,Neighbourhood Number,Population,Average Household Income,Lat,Long,Restaurant
112,1,Steeles,116,24623,21120,43.808989,-79.306642,47
20,0,Caledonia-Fairbank,109,9955,8420,43.687255,-79.446768,47
114,1,Tam O'Shanter-Sullivan,118,27446,23140,43.779896,-79.286629,43
44,0,Forest Hill South,101,10732,9260,43.689404,-79.409401,42
56,0,Humewood-Cedarvale,106,14365,12065,43.685792,-79.41932,41


In [30]:
# Choose data to plot on choropleth
data = tfullset
variable = "Restaurant"

# Create map of Toronto, ON 
hh_map = folium.Map(location = [43.653963, -79.387207], zoom_start = 11)

folium.Choropleth(
    geo_data = geodata,
    data = tfullset,
    columns = ['Neighbourhood Number', variable],
    name = 'choropleth',
    key_on = 'feature.properties.HOODNUM',
    fill_color='Blues').add_to(hh_map)

# Set color scheme for clusters (one rainbow color per cluster)
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# Add markers to cluster map 
for lat, lon, poi, cluster in zip(tfullset['Lat'], tfullset['Long'], tfullset['Restaurant'], tfullset['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster' + str(cluster), parse_html = True)
    folium.CircleMarker(
        [lat,lon],
        radius = 5,
        popup = label,
        color = rainbow[cluster-1],
        fill = True, 
        fill_color = rainbow[cluster-1],
        fill_opacity=0.7).add_to(hh_map)

hh_map

On the choropleth map, the red dots (Cluster 2) mark the neighbourhoods with the most similar profile in terms of population and number of restaurants. The heaviest concentration is towards the downtown Toronto area. It would therefore be most sensible for the initial setup of this delivery business to focus on serving Cluster 2.

**Part E: Finding a location for Business Setup**

The center of gravity method will be used to identify the best point to serve Cluster 2. Center of gravity is an approach that seeks to compute geographic coordinates for a potential single new facility (in our case, a delivery service point) that will minimize costs, lead times or distances.

In [77]:
#calculating the center of gravity

c2 = tfullset[tfullset["Cluster Labels"]==2]
lat = c2["Lat"].mean()
long = c2["Long"].mean()
print("The Center of Gravity of Cluster 2 is :", str(lat), str(long))

The Center of Gravity of Cluster 2 is : 43.712112603125 -79.37661863541666


In [79]:
# center the map and place the marker on the computed center of gravity
m = folium.Map(location=[lat, long], zoom_start=13)

folium.Marker(
    location=[lat, long],
    popup='Service Point',
    icon=folium.Icon(color='green')
).add_to(m)

m

**Conclusion**

Based on the center of gravity, the green marker is the centroid (i.e. the most optimal location) for serving Cluster 2. It is recommended that the initial setup be made at this location. Based on data collected during actual operation, an additional delivery point can be identified as the business expands.