# The Battle of Neighborhood

## Introduction

Burger Queen, a newly established Canadian burger joint, wants to decide where to build its new restaurants. The restaurants will open in Toronto, so as data scientists we need to analyze which areas are likely to be most profitable. The CEO has set two requirements. First, the location should have few competitors, because we have to build our brand before trying to compete with popular, well-established joints. Second, the area should still be populous so that the restaurant turns a profit. The CEO wants to build three restaurants per borough and would like us to identify the three locations in each borough that best meet these requirements.

## Data

The candidate neighborhood data for this analysis consists of the Toronto postal codes, boroughs, and neighborhoods, extracted from https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, together with their latitudes and longitudes, extracted from http://cocl.us/Geospatial_data. The list of nearby burger joints is then retrieved with the Foursquare API. Once we have the rival joints' locations, we run K-Means clustering on them; the cluster centers (centroids) mark the areas where we would like to open the three new joints.

### Neighborhood Candidate

In [1]:
#Let's import the libraries needed
import pandas as pd
from geopy.geocoders import Nominatim
import folium
import requests
from pandas import json_normalize
import numpy as np
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors

In [2]:
#Then we retrieve the postal code data from Wikipedia
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
df = pd.read_html(url)
df2 = df[0]                                                                                     
df3 = df2[['Postal Code','Borough', 'Neighbourhood']]
df3

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
...,...,...,...
175,M5Z,Not assigned,Not assigned
176,M6Z,Not assigned,Not assigned
177,M7Z,Not assigned,Not assigned
178,M8Z,Etobicoke,"Mimico NW, The Queensway West, South of Bloor,..."


In [3]:
#As you can see, many rows have a 'Not assigned' borough, and we don't need them
#Let's drop those rows
df4 = df3[df3.Borough != 'Not assigned']
df4

Unnamed: 0,Postal Code,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
...,...,...,...
160,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
165,M4Y,Downtown Toronto,Church and Wellesley
168,M7Y,East Toronto,"Business reply mail Processing Centre, South C..."
169,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu..."


Nice! We now have the list of boroughs and neighborhoods in Toronto. Now let's find their latitude and longitude.

In [4]:
geodf=pd.read_csv('http://cocl.us/Geospatial_data')
geodf

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
...,...,...,...
98,M9N,43.706876,-79.518188
99,M9P,43.696319,-79.532242
100,M9R,43.688905,-79.554724
101,M9V,43.739416,-79.588437


In [5]:
#Both dataframes have a 'Postal Code' column
#Let's merge them, using 'Postal Code' as the join key
to_merged = pd.merge(df4, geodf, on="Postal Code")
to_merged

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.654260,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160
100,M7Y,East Toronto,"Business reply mail Processing Centre, South C...",43.662744,-79.321558
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.636258,-79.498509


In [6]:
#We want to know all unique boroughs in the data
to_merged['Borough'].unique()

array(['North York', 'Downtown Toronto', 'Etobicoke', 'Scarborough',
       'East York', 'York', 'East Toronto', 'West Toronto',
       'Central Toronto', 'Mississauga'], dtype=object)

In [7]:
#I am having trouble using the Foursquare API for East Toronto and Central Toronto
#I decided to exclude both boroughs from this data

to_merge1 = to_merged[to_merged.Borough != 'East Toronto']
to_merge = to_merge1[to_merge1.Borough != 'Central Toronto']                    
to_merge

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.654260,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
...,...,...,...,...,...
97,M5X,Downtown Toronto,"First Canadian Place, Underground city",43.648429,-79.382280
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.636258,-79.498509


In [8]:
address = 'Toronto, ON'

geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


Looking good! Now let's move on to the next section.

### Foursquare API

In [9]:
#Let's declare our credentials first
CLIENT_ID = 'WIFWESZITOIN1EVYX3CGVYNM4ZY2QOOFPYK31ZKERQQTJBVH'
CLIENT_SECRET = 'TTED4T1YTWCNZQIGJ2A5R10OCGRHU2WFP5GV2YIEHMZCSAR2'
VERSION = '20180604'
LIMIT = 500
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: WIFWESZITOIN1EVYX3CGVYNM4ZY2QOOFPYK31ZKERQQTJBVH
CLIENT_SECRET:TTED4T1YTWCNZQIGJ2A5R10OCGRHU2WFP5GV2YIEHMZCSAR2
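
Hard-coding and printing API credentials makes them visible to anyone who reads the notebook. As a minimal sketch of an alternative, the snippet below reads them from environment variables and prints only a masked form. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helpers `load_credentials` and `mask` are hypothetical, chosen for illustration; they are not part of the Foursquare API or of this notebook.

```python
import os


def load_credentials(env=os.environ):
    """Return (client_id, client_secret) read from environment variables.

    The variable names are an assumption made for this sketch.
    """
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(
            f"Set the environment variable {missing} before running the notebook"
        ) from None


def mask(secret, visible=4):
    """Keep the first few characters and replace the rest with asterisks."""
    return secret[:visible] + "*" * (len(secret) - visible)


# Demonstration with a plain dict standing in for os.environ
demo_env = {
    "FOURSQUARE_CLIENT_ID": "EXAMPLEID123",
    "FOURSQUARE_CLIENT_SECRET": "EXAMPLESECRET456",
}
client_id, client_secret = load_credentials(demo_env)
print("CLIENT_ID: " + mask(client_id))
print("CLIENT_SECRET: " + mask(client_secret))
```

In the notebook itself, `load_credentials()` would be called without arguments so that the values come from the real environment.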


In [10]:
boroughs = ['North York, ON', 'Downtown Toronto, ON', 'Etobicoke, ON', 'Scarborough, ON', 'East York, ON', 
            'York, ON', 'West Toronto, ON', 'Mississauga, ON']
results = {}
for borough in boroughs:
    url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&near={}&limit={}&categoryId={}'.format(
        CLIENT_ID, 
        CLIENT_SECRET, 
        VERSION, 
        borough,
        LIMIT,
        "4bf58dd8d48988d16c941735") # Burger Joints CATEGORY ID
    results[borough] = requests.get(url).json()
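
The loop above formats every parameter into the URL string by hand. As a hedged sketch of the same query, the snippet below builds the URL with the standard library's `urlencode`, which percent-encodes values such as `'North York, ON'` explicitly. The helper name `build_explore_url` and the placeholder credentials are assumptions for illustration; the endpoint, parameter names, and category ID are the ones used in the cell above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"
BURGER_CATEGORY_ID = "4bf58dd8d48988d16c941735"  # Burger Joints category ID from the cell above


def build_explore_url(client_id, client_secret, version, near, limit):
    """Assemble the explore URL with the same parameters as the loop above."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "near": near,
        "limit": limit,
        "categoryId": BURGER_CATEGORY_ID,
    }
    # urlencode escapes spaces and commas in the borough name
    return EXPLORE_URL + "?" + urlencode(params)


# Placeholder credentials, not real ones
url = build_explore_url("YOUR_ID", "YOUR_SECRET", "20180604", "North York, ON", 500)
print(url)

# Decoding the query string recovers the original borough name
print(parse_qs(urlsplit(url).query)["near"])
```

Keeping the parameters in a dictionary also makes it easier to see at a glance which values are sent with each request.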

In [11]:
#See the sample
results['North York, ON']

{'meta': {'code': 200, 'requestId': '5f205a263ab3975ba8e324b4'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'geocode': {'what': '',
   'where': 'north york on',
   'center': {'lat': 43.76681, 'lng': -79.4163},
   'displayString': 'North York, ON, Canada',
   'cc': 'CA',
   'geometry': {'bounds': {'ne': {'lat': 43.85873, 'lng': -79.296997},
     'sw': {'lat': 43.702629, 'lng': -79.557068}}},
   'slug': 'north-york-ontario-canada',
   'longId': '72057594044019040'},
  'headerLocation': 'North York',
  'headerFullLocation': 'North York',
  'headerLocationGranularity': 'city',
  'query': 'burgers',
  'totalResults': 41,
  'suggestedBounds': {'ne': {'lat': 43.85008397930268,
    'lng': -79.29020232542227},
   'sw': {'lat': 43.70496562920522, 'lng': -79.54638365797548}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This sp

In [12]:
to_venues={}
for borough in boroughs:
    venues = json_normalize(results[borough]['response']['groups'][0]['items'])
    to_venues[borough] = venues[['venue.name', 'venue.location.address', 'venue.location.lat', 'venue.location.lng']]
    to_venues[borough].columns = ['Name', 'Address', 'Latitude', 'Longitude']

In [13]:
#See sample
to_venues['Mississauga, ON'].head()

Unnamed: 0,Name,Address,Latitude,Longitude
0,On the bun,14-5030 Maingate Dr,43.629843,-79.627587
1,Five Guys,"6045 Mavis Rd, Unit 3",43.613328,-79.695366
2,The Burger's Priest,129 Lakeshore Rd E,43.554373,-79.582291
3,Union Burger,4188 Living Arts Dr,43.589846,-79.647793
4,A&W,2930 Argentia Rd,43.598628,-79.781539


In [14]:
#Stack the venue tables of all boroughs into one dataframe
#(DataFrame.append was removed in pandas 2.0, so we use pd.concat)
combined6 = pd.concat([to_venues[borough] for borough in boroughs], ignore_index=True)
combined6

Unnamed: 0,Name,Address,Latitude,Longitude
0,South St. Burger,49 Clock Tower Road G010,43.734274,-79.343655
1,The Burger Cellar,3391 Yonge St,43.732362,-79.403894
2,Johnny's Hamburgers,2595 Victoria Park Ave,43.774833,-79.322365
3,Hero Certified Burgers,4698 Yonge St,43.758885,-79.410249
4,Golden Star,7123 Yonge St,43.801533,-79.420520
...,...,...,...,...
478,Fionn MacCool's Britannia,825 Britannia Road West,43.609581,-79.699215
479,The Yorkshire Arms,1201 Britannia Rd W,43.602963,-79.706652
480,Master Steaks,5895 Dixie Rd.,43.653483,-79.644318
481,Fionn MacCool's,Toronto Pearson International Airport (YYZ),43.685422,-79.621352


In [15]:
map_burger = folium.Map(location=[latitude, longitude], zoom_start=9)

for lat, lng, label in zip(combined6['Latitude'], combined6['Longitude'], combined6['Name']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_burger)  
    
map_burger

As you can see, the map above shows all the burger joints retrieved from Foursquare for the selected boroughs. This map is only for reference; what matters in the end are the maps of each borough and the restaurants inside them.

## Methodology

To find those three locations, I use K-means clustering with k = 3 on the coordinates of the existing burger joints. The reasoning is that an area that already contains burger joints is likely to be a populous, high-demand area, so we are unlikely to end up building in a place with no customers. However, we still do not want to be too close to other joints. K-means produces k centroids, and each centroid is the mean position of the joints assigned to its cluster, so it sits roughly in the middle of a group of nearby joints. We therefore assume that a centroid is not too close to any single joint, although K-means itself does not guarantee this.
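
The clustering step can be sketched without scikit-learn to make the centroid definition concrete. The snippet below is a plain-Python version of Lloyd's iterations, not the `KMeans` implementation used in this notebook. The function name `lloyd_kmeans`, the sample coordinates, and the fixed starting centroids are all made up for illustration. Like the notebook, it treats latitude and longitude as plain Euclidean coordinates.

```python
from math import dist


def lloyd_kmeans(points, centroids, iterations=20):
    """Alternate two steps: assign each point to its nearest centroid,
    then move each centroid to the mean of the points assigned to it."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            # An empty cluster keeps its previous centroid
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else old
            for members, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Illustrative (latitude, longitude) pairs forming three small groups
joints = [
    (43.70, -79.40), (43.71, -79.41),
    (43.75, -79.30), (43.76, -79.31),
    (43.65, -79.50), (43.66, -79.51),
]
start = [joints[0], joints[2], joints[4]]  # fixed start so the run is repeatable

centroids, clusters = lloyd_kmeans(joints, start)
for centre, members in zip(centroids, clusters):
    print(centre, "is the mean of", members)
```

Each printed centroid lies between the joints of its group, which is the property the methodology relies on.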

## Analysis

First, let's define all needed functions.

In [16]:
def findlatlang(borough):
    add1 = borough
    geolocator = Nominatim(user_agent=borough+"explorer")
    loc1 = geolocator.geocode(add1)
    lat1 = loc1.latitude
    long1 = loc1.longitude
    return lat1, long1

In [17]:
def findarea(borough):
    a = to_venues[borough].drop(columns=['Name', 'Address'])
    kclusters = 3
    kmeans = KMeans(n_clusters=kclusters, n_init=10).fit(a)
    centroid = kmeans.cluster_centers_
    Locs = ['Area 1', 'Area 2', 'Area 3']
    CentroFrame = pd.DataFrame(centroid)
    CentroFrame.columns = ['Latitude', 'Longitude']
    CentroFrame['Location'] = Locs
    return CentroFrame

Then, we use these functions on each borough to find the locations and create a map.

### North York

In [18]:
loc_noyo = findlatlang('North York, ON')
print(loc_noyo)
cf_noyo = findarea('North York, ON')
print(cf_noyo)

(43.7543263, -79.44911696639593)
    Latitude  Longitude Location
0  43.751081 -79.484132   Area 1
1  43.752712 -79.340172   Area 2
2  43.764950 -79.405064   Area 3


In [19]:
map_noyo = folium.Map(location=[loc_noyo[0], loc_noyo[1]], zoom_start=12)

for lat, lon, label in zip(cf_noyo['Latitude'], cf_noyo['Longitude'], cf_noyo['Location']):
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_noyo)
       
map_noyo

### Downtown Toronto

In [20]:
loc_doto = findlatlang('Downtown Toronto, ON')
print(loc_doto)
cf_doto = findarea('Downtown Toronto, ON')
print(cf_doto)

(43.6563221, -79.3809161)
    Latitude  Longitude Location
0  43.648052 -79.387662   Area 1
1  43.653827 -79.381233   Area 2
2  43.649311 -79.379159   Area 3


In [21]:
map_doto = folium.Map(location=[loc_doto[0], loc_doto[1]], zoom_start=15)

for lat, lon, label in zip(cf_doto['Latitude'], cf_doto['Longitude'], cf_doto['Location']):
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_doto)
       
map_doto

### Etobicoke

In [22]:
loc_eto = findlatlang('Etobicoke, ON')
print(loc_eto)
cf_eto = findarea('Etobicoke, ON')
print(cf_eto)

(43.6435559, -79.5656326)
    Latitude  Longitude Location
0  43.684930 -79.610814   Area 1
1  43.616143 -79.556612   Area 2
2  43.637102 -79.505608   Area 3


In [23]:
map_eto = folium.Map(location=[loc_eto[0], loc_eto[1]], zoom_start=12)

for lat, lon, label in zip(cf_eto['Latitude'], cf_eto['Longitude'], cf_eto['Location']):
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_eto)
       
map_eto

### Scarborough

In [24]:
loc_sc = findlatlang('Scarborough, ON')
print(loc_sc)
cf_sc = findarea('Scarborough, ON')
print(cf_sc)

(43.773077, -79.257774)
    Latitude  Longitude Location
0  43.790821 -79.296103   Area 1
1  43.726828 -79.280640   Area 2
2  43.789862 -79.183114   Area 3


In [25]:
map_sc = folium.Map(location=[loc_sc[0], loc_sc[1]], zoom_start=12)

for lat, lon, label in zip(cf_sc['Latitude'], cf_sc['Longitude'], cf_sc['Location']):
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_sc)
       
map_sc

### East York

In [26]:
loc_ey = findlatlang('East York, ON')
print(loc_ey)
cf_ey = findarea('East York, ON')
print(cf_ey)

(43.699971000000005, -79.33251996261595)
    Latitude  Longitude Location
0  43.679343 -79.346659   Area 1
1  43.679750 -79.313439   Area 2
2  43.707899 -79.344037   Area 3


In [27]:
map_ey = folium.Map(location=[loc_ey[0], loc_ey[1]], zoom_start=13)

for lat, lon, label in zip(cf_ey['Latitude'], cf_ey['Longitude'], cf_ey['Location']):
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_ey)
       
map_ey

### York

In [28]:
loc_yo = findlatlang('York, ON')
print(loc_yo)
cf_yo = findarea('York, ON')
print(cf_yo)

(43.6896191, -79.479188)
    Latitude  Longitude Location
0  43.684366 -79.437440   Area 1
1  44.353646 -79.649221   Area 2
2  43.886043 -79.304222   Area 3


In [29]:
map_yo = folium.Map(location=[loc_yo[0], loc_yo[1]], zoom_start=8)

for lat, lon, label in zip(cf_yo['Latitude'], cf_yo['Longitude'], cf_yo['Location']):
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_yo)
       
map_yo
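
The York centroids look different from those of the other boroughs: Area 2 sits at latitude 44.35, well north of the geocoded centre printed above. One possible explanation is that the `near` query for 'York, ON' returned venues outside the borough, though the notebook output does not confirm this. As a rough sanity check, the sketch below measures the great-circle distance from the geocoded centre to each centroid with the haversine formula. The function name `haversine_km` and the 15 km threshold `MAX_KM` are assumptions chosen for illustration; the coordinates are copied from the York output above.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


york_center = (43.6896191, -79.479188)  # geocoded centre from the York cell
york_centroids = {
    "Area 1": (43.684366, -79.437440),
    "Area 2": (44.353646, -79.649221),
    "Area 3": (43.886043, -79.304222),
}
MAX_KM = 15  # hypothetical threshold for "still plausibly inside the borough"

for name, (lat, lon) in york_centroids.items():
    d = haversine_km(*york_center, lat, lon)
    flag = "check" if d > MAX_KM else "ok"
    print(f"{name}: {d:.1f} km from the geocoded centre ({flag})")
```

A centroid flagged this way is worth inspecting on the map before it is treated as a candidate location.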

### West Toronto

In [30]:
loc_wt = findlatlang('West Toronto, ON')
print(loc_wt)
cf_wt = findarea('West Toronto, ON')
print(cf_wt)

(43.6534817, -79.3839347)
    Latitude  Longitude Location
0  43.652689 -79.495731   Area 1
1  43.655253 -79.408509   Area 2
2  43.664295 -79.450047   Area 3


In [31]:
map_wt = folium.Map(location=[loc_wt[0], loc_wt[1]], zoom_start=12)

for lat, lon, label in zip(cf_wt['Latitude'], cf_wt['Longitude'], cf_wt['Location']):
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_wt)
       
map_wt

### Mississauga

In [32]:
loc_mi = findlatlang('Mississauga, ON')
print(loc_mi)
cf_mi = findarea('Mississauga, ON')
print(cf_mi)

(43.590338, -79.645729)
    Latitude  Longitude Location
0  43.591486 -79.630629   Area 1
1  43.576159 -79.720029   Area 2
2  43.673471 -79.625325   Area 3


In [33]:
map_mi = folium.Map(location=[loc_mi[0], loc_mi[1]], zoom_start=11)

for lat, lon, label in zip(cf_mi['Latitude'], cf_mi['Longitude'], cf_mi['Location']):
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_mi)
       
map_mi

## Results

Using the data from Wikipedia, we obtained all the boroughs and neighborhoods in Toronto. From it we extracted the unique borough names used in the rest of the analysis.

Using the data from Foursquare, we obtained the locations of rival burger joints in each borough. These are needed to find locations that meet the CEO's requirements.

After that, we used K-Means clustering to cluster the locations of the rival burger joints and find the centroids. The centroids are the locations predicted to meet the CEO's requirements.

After the centroids of each borough were found, we used Geopy and Folium to generate a map to visualize the data.

## Conclusion

Now we have three locations in each borough where we should consider building our first three joints. Since we have the latitude and longitude of each area, it is simple to translate them into street addresses with a user-friendly app such as Google Maps and find the locations in real life.

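Because we use n_init = 10, each result is the best of 10 K-means runs with different initial centroids, judged by the within-cluster sum of squared distances. This makes the centroids a reasonable, though not guaranteed optimal, estimate of locations that balance distance to other joints against the population around the area.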
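
To illustrate why several runs help, the sketch below compares two centroid sets for the same points by their inertia, the sum of squared distances from each point to its nearest centroid, and keeps the lower one. scikit-learn's `n_init` makes the same kind of choice across its runs. The points, the candidate centroids, and the function name `inertia` are made up for illustration; both candidates are stable K-means solutions for these points, but only the first is a good one.

```python
from math import dist


def inertia(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(dist(p, c) for c in centroids) ** 2 for p in points)


# Four illustrative points: two on the left, two on the right
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]

candidates = [
    [(0.0, 0.5), (10.0, 0.5)],  # one centroid per left/right pair
    [(5.0, 0.0), (5.0, 1.0)],   # a poor solution: splits bottom from top
]

for centroids in candidates:
    print(centroids, "inertia =", inertia(points, centroids))

# Keeping the run with the lowest inertia discards the poor solution
best = min(candidates, key=lambda cs: inertia(points, cs))
print("kept:", best)
```

More runs make it less likely that a poor starting position decides the result, but they do not change what K-means optimizes.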

The final decision on the optimal joint locations will be made by stakeholders, based on the specific characteristics of the neighborhoods and locations in each recommended zone. They will take into consideration additional factors such as the attractiveness of each location (proximity to a park or water), noise levels and proximity to major roads, real estate availability, prices, and the social and economic dynamics of each neighborhood.