# Capstone Project - The Battle of the Neighborhoods (Week 2)
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)



## Introduction: Business Problem <a name="introduction"></a>

In this project we will try to find an optimal neighborhood for a restaurant in Toronto (Canada).

Since there are lots of restaurants in Toronto, we will try to detect **locations that are not already crowded with restaurants**.
We will use data science to shortlist a few of the most promising neighborhoods based on this criterion. The advantages of each area will then be clearly laid out so that stakeholders can choose the best final location.
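One way to make "not already crowded with restaurants" measurable is to normalize the competitor count by population. The sketch below assumes a DataFrame with a hypothetical `Restaurants` column (a per-neighborhood competitor count, for example gathered from Foursquare) next to the `Population_2016` column built in the Data section; `rank_by_restaurant_density` is a hypothetical helper and the two rows are made-up toy values.

```python
import pandas as pd

def rank_by_restaurant_density(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df ranked from least to most crowded.

    Expects the columns 'Population_2016' and 'Restaurants'
    ('Restaurants' is a hypothetical competitor count per neighborhood).
    """
    ranked = df.copy()
    # Restaurants per 1,000 residents: lower means less competition per potential customer
    ranked['RestaurantsPer1000'] = ranked['Restaurants'] / ranked['Population_2016'] * 1000
    return ranked.sort_values('RestaurantsPer1000').reset_index(drop=True)

# Toy data (made-up numbers) just to show the ranking
toy = pd.DataFrame({
    'Neighborhood': ['A', 'B'],
    'Population_2016': [10000, 50000],
    'Restaurants': [30, 10],
})
ranked = rank_by_restaurant_density(toy)
print(ranked[['Neighborhood', 'RestaurantsPer1000']])
```

A per-capita ratio is only one possible choice; raw counts would favour sparsely populated areas that may simply lack customers.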

## Data <a name="data"></a>

Based on the definition of our problem, the factors that will influence our decision are:
* Toronto's population
* Toronto demographics by postal code
* Toronto average after-tax income by postal code
* A list of all competitors in each location, by postal code

We decided to use a regularly spaced grid of locations, centered on the city center, to define our neighborhoods.
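A regularly spaced grid like this can be generated without any external service. The sketch below is a hypothetical helper, `grid_around`, that lays out a square grid of points in metres around a centre and converts the offsets to latitude/longitude with a local flat-Earth approximation, which is reasonable over a few kilometres. The centre (roughly Toronto City Hall), radius, and spacing are illustrative assumptions, not values taken from this project.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres

def grid_around(center_lat, center_lon, radius_m, spacing_m):
    """Return (lat, lon) points on a square grid spaced `spacing_m` apart,
    keeping only points within `radius_m` of the centre.
    Uses a local equirectangular (flat-Earth) approximation."""
    steps = int(radius_m // spacing_m)
    points = []
    for i in range(-steps, steps + 1):
        for j in range(-steps, steps + 1):
            dx, dy = i * spacing_m, j * spacing_m  # east / north offsets in metres
            if math.hypot(dx, dy) > radius_m:
                continue  # outside the circular search area
            dlat = math.degrees(dy / EARTH_RADIUS_M)
            dlon = math.degrees(dx / (EARTH_RADIUS_M * math.cos(math.radians(center_lat))))
            points.append((center_lat + dlat, center_lon + dlon))
    return points

# Assumed, approximate city-centre coordinates (Toronto City Hall)
TORONTO_CENTER = (43.6534, -79.3841)
toronto_grid = grid_around(*TORONTO_CENTER, radius_m=6000, spacing_m=600)
print(len(toronto_grid), toronto_grid[:3])
```

A hexagonal layout would cover the circle more evenly; the square grid is used here only because it is the simplest to read.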


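```python
# Install the required packages (`!` lines are IPython shell syntax; run them in Jupyter) and import them
!pip install requests
!pip install lxml
!pip install beautifulsoup4
import lxml  # imported only to confirm the lxml parser used by BeautifulSoup is available
import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup
```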





```python
# Build the dataset of Toronto postal codes by scraping the Wikipedia list of "M" postal codes
source = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M").text
soup = BeautifulSoup(source, 'lxml')

table_contents = []
table = soup.find('table')
for row in table.find_all('td'):
    # Skip postal codes that are not assigned to any borough
    if row.span.text == 'Not assigned':
        continue
    # The span text has the form "Borough(Neighborhood / Neighborhood)"
    parts = row.span.text.split('(')
    cell = {
        'Postal Code': row.p.text[:3],
        'Borough': parts[0],
        'Neighborhood': parts[1].strip(')').replace(' /', ',').replace(')', ' ').strip(' '),
    }
    table_contents.append(cell)

df_Toronto = pd.DataFrame(table_contents)
# Repair borough names that were fused with extra text on the Wikipedia page
df_Toronto['Borough'] = df_Toronto['Borough'].replace({
    'Downtown TorontoStn A PO Boxes25 The Esplanade': 'Downtown Toronto Stn A',
    'East TorontoBusiness reply mail Processing Centre969 Eastern': 'East Toronto Business',
    'EtobicokeNorthwest': 'Etobicoke Northwest',
    'East YorkEast Toronto': 'East York/East Toronto',
    'MississaugaCanada Post Gateway Processing Centre': 'Mississauga',
})
df_Toronto.head()
```

|   | Postal Code | Borough | Neighborhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 4 | M7A | Queen's Park | Ontario Provincial Government |
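Pulling the string handling out of the scraping loop into a small function makes it testable without fetching Wikipedia. This is a sketch: `split_borough_neighborhood` is a hypothetical helper, and the input format `Borough(Neighborhood / Neighborhood)` is inferred from the parsing code and the table above rather than checked against the live page.

```python
def split_borough_neighborhood(span_text):
    """Split 'Borough(Neighborhood / Neighborhood)' into (borough, neighborhoods),
    applying the same string operations as the scraping loop."""
    parts = span_text.split('(')
    borough = parts[0]
    # Drop the closing parenthesis and turn " /" separators into commas
    neighborhoods = parts[1].strip(')').replace(' /', ',').replace(')', ' ').strip(' ')
    return borough, neighborhoods

print(split_borough_neighborhood('Downtown Toronto(Regent Park / Harbourfront)'))
print(split_borough_neighborhood('North York(Parkwoods)'))
```

Note that a span without a `(` makes `parts[1]` raise `IndexError`, so any cell format beyond `Not assigned` would need its own guard.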


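```python
# Download the geospatial data (latitude/longitude per postal code); `!wget` is IPython shell syntax
!wget -O to_geo_space.csv https://cocl.us/Geospatial_data
print('Data Downloaded !')

gf = pd.read_csv('to_geo_space.csv')

# Keep only postal codes present in both tables, then switch to a space-free column name
df_TorontoGeo = pd.merge(df_Toronto, gf, on='Postal Code', how='inner')
df_TorontoGeo = df_TorontoGeo.rename(columns={'Postal Code': 'PostalCode'})
df_TorontoGeo.head()
```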


|   | PostalCode | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 |
| 4 | M7A | Queen's Park | Ontario Provincial Government | 43.662301 | -79.389494 |
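With coordinates attached, each postal code's distance from the city center can be computed with the haversine (great-circle) formula. This is a sketch: the center point (approximately Toronto City Hall) is an assumption, `haversine_km` is a hypothetical helper, and the three rows are copied from the table above.

```python
import math
import pandas as pd

# Assumed, approximate "city center" (Toronto City Hall)
CITY_CENTER = (43.6534, -79.3841)
EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Three rows copied from the merged table above
geo = pd.DataFrame({
    'PostalCode': ['M3A', 'M4A', 'M5A'],
    'Latitude': [43.753259, 43.725882, 43.65426],
    'Longitude': [-79.329656, -79.315572, -79.360636],
})
geo['DistToCenterKm'] = [
    haversine_km(lat, lon, *CITY_CENTER) for lat, lon in zip(geo['Latitude'], geo['Longitude'])
]
print(geo.sort_values('DistToCenterKm'))
```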


```python
# Get the 2016 census population for every Canadian postal code (FSA) from Statistics Canada
import ssl
# Workaround for certificate errors: this disables HTTPS certificate verification for the whole session
ssl._create_default_https_context = ssl._create_unverified_context

df_TorontoPop = pd.read_csv('https://www12.statcan.gc.ca/census-recensement/2016/dp-pd/hlt-fst/pd-pl/Tables/File.cfm?T=1201&SR=1&RPP=9999&PR=0&CMA=0&CSD=0&S=22&O=A&Lang=Eng&OFT=CSV', encoding='unicode_escape')
print('Data Downloaded !')

df_TorontoPop = df_TorontoPop.rename(columns={
    'Geographic code': 'PostalCode',
    'Geographic name': 'PostalCod2',
    'Province or territory': 'Province',
    'Incompletely enumerated Indian reserves and Indian settlements, 2016': 'Incomplete',
    'Population, 2016': 'Population_2016',
    'Total private dwellings, 2016': 'TotalPrivDwellings',
    'Private dwellings occupied by usual residents, 2016': 'PrivDwellingsOccupied',
})
# Keep only the postal code and its population
df_TorontoPop = df_TorontoPop.drop(columns=['PostalCod2', 'Province', 'Incomplete', 'TotalPrivDwellings', 'PrivDwellingsOccupied'])

# Drop the first row; the FSA records start at A0A
df_TorontoPop = df_TorontoPop.iloc[1:]
df_TorontoPop.head()
```

Data Downloaded !


|   | PostalCode | Population_2016 |
|---|---|---|
| 1 | A0A | 46587.0 |
| 2 | A0B | 19792.0 |
| 3 | A0C | 12587.0 |
| 4 | A0E | 22294.0 |
| 5 | A0G | 35266.0 |
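The Statistics Canada file covers all of Canada (the first rows are `A0…` codes), while Toronto postal codes all begin with `M`. The right merge in the next cell already discards non-Toronto rows, but filtering early keeps the frame small. A minimal sketch, with a hypothetical helper `keep_toronto_fsas`; the A0A population is from the table above and the `M` rows use made-up toy values.

```python
import pandas as pd

def keep_toronto_fsas(pop: pd.DataFrame) -> pd.DataFrame:
    """Keep rows whose postal code (FSA) starts with 'M', the letter used for Toronto."""
    mask = pop['PostalCode'].astype(str).str.startswith('M')
    return pop[mask].reset_index(drop=True)

# Toy rows: A0A is from the table above, the M-row populations are made up
pop = pd.DataFrame({
    'PostalCode': ['A0A', 'M3A', 'M4A'],
    'Population_2016': [46587.0, 1000.0, 2000.0],
})
print(keep_toronto_fsas(pop))
```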


```python
# Attach population to each Toronto postal code; how='right' keeps every Toronto code,
# even those with no population match (their population becomes NaN)
df_TorontoGeo1 = pd.merge(df_TorontoPop, df_TorontoGeo, on='PostalCode', how='right')

# Most populated postal codes first
df_TorontoGeo1 = df_TorontoGeo1.sort_values(by=['Population_2016'], ascending=False)
df_TorontoGeo1.head()
```

|   | PostalCode | Population_2016 | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|---|
| 59 | M2N | 75897.0 | North York | Willowdale South | 43.77012 | -79.408493 |
| 6 | M1B | 66108.0 | Scarborough | Malvern, Rouge | 43.806686 | -79.194353 |
| 33 | M2J | 58293.0 | North York | Fairview, Henry Farm, Oriole | 43.778517 | -79.346556 |
| 89 | M9V | 55959.0 | Etobicoke | South Steeles, Silverstone, Humbergate, Jamest... | 43.739416 | -79.588437 |
| 85 | M1V | 54680.0 | Scarborough | Milliken, Agincourt North, Steeles East, L'Amo... | 43.815252 | -79.284577 |
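Because the merge uses `how='right'`, Toronto postal codes without a population match stay in the table with `NaN`. Passing `indicator=True` to `pd.merge` makes those rows easy to list. In this sketch the two toy frames reuse values from the outputs above; M7A is unmatched only by construction of the toy data, not as a claim about the real census file.

```python
import pandas as pd

# Toy subsets of the two tables (values copied from the outputs above)
pop = pd.DataFrame({'PostalCode': ['M2N', 'A0A'], 'Population_2016': [75897.0, 46587.0]})
geo = pd.DataFrame({'PostalCode': ['M2N', 'M7A'],
                    'Neighborhood': ['Willowdale South', 'Ontario Provincial Government']})

# indicator=True adds a '_merge' column with 'both', 'left_only' or 'right_only'
merged = pd.merge(pop, geo, on='PostalCode', how='right', indicator=True)
unmatched = merged.loc[merged['_merge'] == 'right_only', 'PostalCode'].tolist()
print(unmatched)

# sort_values places NaN last by default (na_position='last'), even when descending
print(merged.sort_values('Population_2016', ascending=False))
```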


```python
# Set up Foursquare access and look up the name and coordinates of the first neighborhood in the table
import os

# Read the Foursquare credentials from environment variables instead of hard-coding secrets
# (the variable names FOURSQUARE_CLIENT_ID / FOURSQUARE_CLIENT_SECRET are our own choice)
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', 'YOUR_CLIENT_ID')  # your Foursquare ID
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', 'YOUR_CLIENT_SECRET')  # your Foursquare Secret
VERSION = '20180605'  # Foursquare API version
LIMIT = 150

df_TorontoGeoOnly = df_TorontoGeo.drop(columns='PostalCode')

neighbourhood_latitude = df_TorontoGeoOnly.loc[0, 'Latitude']  # neighborhood latitude value
neighbourhood_longitude = df_TorontoGeoOnly.loc[0, 'Longitude']  # neighborhood longitude value
neighbourhood_name = df_TorontoGeoOnly.loc[0, 'Neighborhood']  # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighbourhood_name,
                                                               neighbourhood_latitude,
                                                               neighbourhood_longitude))
```

Latitude and longitude values of Parkwoods are 43.7532586, -79.3296565.
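The credentials and `VERSION` above target the Foursquare v2 API. As a sketch, assuming the v2 `venues/explore` endpoint and the response layout used in the IBM/Coursera labs (`response` → `groups[0]` → `items` → `venue` → `categories`), a query around the neighborhood above could be built, and its restaurants counted, as follows. Foursquare has since introduced a newer Places API, so treat the endpoint and field names as assumptions to verify; `build_explore_url` and `count_restaurants` are hypothetical helpers, the default radius and limit are arbitrary, and the sample response is a hand-made stub.

```python
from urllib.parse import urlencode

# Assumed Foursquare v2 endpoint, as used in the course labs; verify before relying on it
EXPLORE_URL = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(client_id, client_secret, version, lat, lon, radius_m=500, limit=100):
    """Build a venues/explore request URL centred on (lat, lon)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': f'{lat},{lon}',
        'radius': radius_m,
        'limit': limit,
    }
    return f'{EXPLORE_URL}?{urlencode(params)}'

def count_restaurants(response_json):
    """Count venues with a category name containing 'Restaurant' (assumed response layout)."""
    items = response_json['response']['groups'][0]['items']
    return sum(
        any('Restaurant' in cat.get('name', '') for cat in item['venue'].get('categories', []))
        for item in items
    )

url = build_explore_url('YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET', '20180605', 43.7532586, -79.3296565)
print(url)

# Hand-made stub shaped like an explore response, for offline testing only
stub = {'response': {'groups': [{'items': [
    {'venue': {'categories': [{'name': 'Italian Restaurant'}]}},
    {'venue': {'categories': [{'name': 'Coffee Shop'}]}},
    {'venue': {'categories': [{'name': 'Restaurant'}]}},
]}]}}
print(count_restaurants(stub))
```

In the notebook, the live call would be `requests.get(url).json()` with the real credentials, and the resulting count could feed a per-neighborhood competitor column.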
