# Capstone Project - The Battle of Neighborhoods #

## Week 5 - Opening a new Chinese restaurant in Toronto ##

## 1.0 Introduction / Business Problem ##

### 1.1 Background: ###
Toronto is Canada's largest city by population (and the fourth largest in North America) and is recognized as one of the most multicultural and cosmopolitan cities in the world. Its diverse population reflects its current and historical role as an important destination for immigrants to Canada: more than 50% of residents belong to a visible minority group.

The diversity of the available cuisine reflects Toronto's multiculturalism and is key to maintaining the city's reputation as one of the most immigration-friendly and ethnically diverse cities in Canada. Toronto can be divided into many ethnic neighbourhoods, each mainly serving a different kind of ethnic food, and residents and tourists alike know where to go for the best selection.

Opening a new restaurant (e.g., a Chinese restaurant) is often an immigrant's or resident's dream and could become their main or only source of income. As with any business decision, opening a new restaurant requires serious consideration of many complex factors. In particular, the owner needs to choose the location carefully to support the long-term success of the business. One key consideration in deciding on a location is where the other Chinese restaurants in the city are located.

### 1.2 Problem / Objective: ###

The objective of this project is to analyze and pick the best locations to open a new Chinese restaurant in Toronto. Factors critical in determining the best location to open a Chinese restaurant could include:

- Boroughs/neighbourhoods with the highest/lowest number of similar restaurants
- Boroughs/neighbourhoods with the highest-rated and lowest-rated Chinese restaurants
- Average cost of Chinese dishes by borough/neighbourhood

Applying data science methodologies to Foursquare location data, the project aims to answer the following question: __Where should a new business owner look to open a Chinese restaurant in the city of Toronto?__

### 1.3 Target Audience / Interested Parties: ###

The main stakeholder and target audience is the business owner who is looking to establish a new Chinese restaurant at an optimal location as their main or only source of income. Other interested parties include residents and new immigrants who want to try out a new Chinese restaurant or make it their regular dining spot, tourists who come from all over the world to dine at the best Toronto restaurants, and essentially anyone who likes Chinese food or is interested in trying it.

## 2.0 Data Description / Data Sources ##

Data required for this project includes:

- List of boroughs/neighbourhoods in Toronto, as the project is confined to the city of Toronto, Canada (Wikipedia)
- Latitude and longitude coordinates of these boroughs, which are required to plot the map and to retrieve the venue data (Geocoder package for Python)
- Venue data, specifically data related to Chinese restaurants. This data will be used to get a list of Chinese restaurants in different neighbourhoods (Foursquare)

Sources of data and methods for extraction: 

- A list of boroughs/neighbourhoods in Toronto sorted by postcode can be found on Wikipedia (https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M). The page is downloaded with the requests library and parsed with BeautifulSoup, and the table cells are then loaded into a pandas dataframe.
- The geographical coordinates (latitude and longitude) of each postcode are then added. The Python Geocoder package can provide these; in this notebook they are read from the geospatial data file provided by Coursera.
- The Foursquare API will then be used to get the venue data. It can provide many categories of venue data, but my area of interest is the Chinese restaurant category.

## 3.0 Methodology ##


### Step 1: import the neighbourhood data into a dataframe ###

In this step, I scraped Wikipedia into a dataframe containing postcode, borough and neighbourhood information for the city of Toronto, which is one of my main data sources.

In [116]:
import pandas as pd
from bs4 import BeautifulSoup
import requests

source = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
soup = BeautifulSoup(source,'lxml')

In [5]:
table = soup.find('table')
tablevalues = table.find_all('td') #finds all the cells in table and creates a list

elementcount = len(tablevalues) #total number of cells

postcode,borough,neighborhood = [],[],[] #sets columns to 3 empty lists

for i in range(0, elementcount, 3): #start at cell 0, iterate through cells in increments of 3
    postcode.append(tablevalues[i].text.strip()) #removes white space of text
    borough.append(tablevalues[i+1].text.strip())
    neighborhood.append(tablevalues[i+2].text.strip())

In [7]:
#putting the data into a dataframe
df = pd.DataFrame(data=[postcode, borough, neighborhood]).transpose()
df.columns = ['Postcode', 'Borough', 'Neighborhood']

df.drop(df[df['Borough'] == 'Not assigned'].index, inplace=True)
df.loc[df.Neighborhood == 'Not assigned', "Neighborhood"] = df.Borough

df

,Postcode,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M6A,North York,Lawrence Heights
6,M6A,North York,Lawrence Manor
...,...,...,...
281,M8Z,Etobicoke,Kingsway Park South West
282,M8Z,Etobicoke,Mimico NW
283,M8Z,Etobicoke,The Queensway West
284,M8Z,Etobicoke,Royal York South West
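
The scraping loop above steps through the cells three at a time, which relies on every row having exactly three `td` cells. As a hedged alternative, the sketch below groups cells by table row instead. It uses Python's standard-library `html.parser` rather than BeautifulSoup so that it runs without extra packages. `TableRowParser` and `SAMPLE_HTML` are illustrative names, and the input is a made-up excerpt, not the live Wikipedia page.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one list of cell strings per table row
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows contain only <th>, so they stay empty
                self.rows.append(self._row)
            self._row = None


# Made-up excerpt with the same three columns as the Wikipedia table
SAMPLE_HTML = """
<table>
  <tr><th>Postcode</th><th>Borough</th><th>Neighborhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
  <tr><td>M5A</td><td>Downtown Toronto</td><td>Harbourfront</td></tr>
</table>
"""

parser = TableRowParser()
parser.feed(SAMPLE_HTML)

# Keep complete rows only and drop unassigned boroughs, as the notebook does
records = [row for row in parser.rows
           if len(row) == 3 and row[1] != "Not assigned"]
print(records)
```

Because each row is validated on its own, a row with a missing or extra cell is skipped rather than shifting every later value into the wrong column.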


### Step 2: clean the data ###

I then cleaned the data so that each unique postal code corresponds to a borough and a list of neighbourhoods.

In [211]:
#cleaning the data, and putting all neighbourhoods corresponding to a postcode into a list
newdf = df.groupby(['Postcode', 'Borough'])['Neighborhood'].apply(', '.join).reset_index()
newdf.columns = ['Postcode', 'Borough', 'Neighborhood']
newdf.head()

,Postcode,Borough,Neighborhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


### Step 3: add/append geographic coordinates ###

In this step, I used the geospatial data file provided by Coursera, which contains the latitude and longitude for each postcode, to append coordinates to the dataframe from the previous step so that it can be plotted.

In [12]:
#get latitude data and longitude data and append it to the dataframe
coordinatesfile = 'http://cocl.us/Geospatial_data'
columns = ["Postcode","Latitude","Longitude"]

coordinates_df = pd.read_csv(coordinatesfile,names=columns,skiprows=1)  #getting data from coordinates file
print("Done")

new_coordinates_df = pd.merge(newdf, coordinates_df, on='Postcode', how='inner')  #join the dfs to also show coordinates info

new_coordinates_df

Done


,Postcode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
...,...,...,...,...,...
98,M9N,York,Weston,43.706876,-79.518188
99,M9P,Etobicoke,Westmount,43.696319,-79.532242
100,M9R,Etobicoke,"Kingsview Village, Martin Grove Gardens, Richv...",43.688905,-79.554724
101,M9V,Etobicoke,"Albion Gardens, Beaumond Heights, Humbergate, ...",43.739416,-79.588437
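
The inner join in this step silently drops any postcode that has no match in the coordinates file. One way to check for that, sketched here on toy dataframes, is a left join with the `indicator=True` option of pandas. The `M9Z` row is an invented postcode used only to show an unmatched case, and the dataframe names are illustrative.

```python
import pandas as pd

# Toy stand-in for the grouped neighbourhood dataframe
neighbourhoods = pd.DataFrame({
    "Postcode": ["M1B", "M1C", "M9Z"],  # "M9Z" is made up and has no coordinates
    "Borough": ["Scarborough", "Scarborough", "Hypothetical"],
})

# Toy stand-in for the coordinates file (values taken from the table above)
coordinates = pd.DataFrame({
    "Postcode": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

# A left join keeps every postcode; the _merge column records where each row matched
checked = neighbourhoods.merge(coordinates, on="Postcode", how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "Postcode"].tolist()
print(unmatched)  # ['M9Z']
```

An empty `unmatched` list would indicate that the inner join loses no rows.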


### Step 4: explore the neighbourhoods and superimpose them on a map of Toronto ###

I used the Python folium library to visualize Toronto and its boroughs, creating a map of Toronto with the postal-code areas superimposed on top. The map is centred on the coordinates of Toronto returned by geopy's Nominatim geocoder, and the markers use the latitude and longitude values added in Step 3.


In [13]:
#start by importing all necessary libraries 

import numpy as np
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json 

!pip install geocoder==1.5.0
!pip install geopy
#!conda install -c conda-forge geopy --yes #uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim #convert an address into latitude and longitude values

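import requests # library to handle requests
#json_normalize moved to the top-level pandas namespace in pandas 1.0
try:
    from pandas import json_normalize # transform JSON file into a pandas dataframe
except ImportError:
    from pandas.io.json import json_normalize # fallback for older pandas versions

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
!pip install scikit-learn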

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
!pip install folium
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


In [95]:
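#map Toronto by borough and neighbourhood

df_toronto = new_coordinates_df.reset_index(drop=True) #define df_toronto before it is used
neighborhoods = df_toronto

address = 'Toronto, Ontario' 

#recent geopy versions require a user_agent; the name used here is arbitrary
geolocator = Nominatim(user_agent='toronto_explorer')
location = geolocator.geocode(address)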
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Toronto City are {}, {}.'.format(latitude, longitude))

toronto_map = folium.Map(location=[latitude, longitude], zoom_start=10)
# add markers
for lat, lng, borough, neighborhood in zip(df_toronto['Latitude'], df_toronto['Longitude'], df_toronto['Borough'], df_toronto['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(toronto_map)  
    
toronto_map



The geograpical coordinate of Toronto City are 43.653963, -79.387207.


### Step 5: using the Foursquare API to find all Chinese restaurants in Toronto ###

I then used the Foursquare API to find Chinese restaurants in Toronto. Each request asks for up to 50 venues (`LIMIT`) with a radius of 6,000 metres around the centre of Toronto, which Foursquare resolves from the `near` parameter ('Toronto, ON'), and further pages are fetched until all results have been collected. The category ID corresponding to Chinese restaurants is also supplied to limit the search to Chinese restaurants only. The resulting table lists the venue name, category (in this case Chinese Restaurant only), latitude and longitude returned by the Foursquare API.
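
The request cell in this step contains the Foursquare client ID and secret as literals and prints them. One common alternative, sketched here as an assumption rather than as what the notebook does, is to read them from environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are hypothetical.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) from an environment-style mapping."""
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        # Fail early with the name of the variable that was not set
        raise RuntimeError(
            "Set the {} environment variable first".format(missing.args[0])
        ) from None


# A plain dict stands in for os.environ so the example runs anywhere
demo_env = {
    "FOURSQUARE_CLIENT_ID": "demo-id",
    "FOURSQUARE_CLIENT_SECRET": "demo-secret",
}
client_id, client_secret = load_foursquare_credentials(demo_env)
print("Credentials loaded for client:", client_id)
```

In a real session the function would be called without arguments so that it reads `os.environ`, and the secret would never need to appear in the notebook or its output.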

In [208]:
import math

CLIENT_ID = 'VXKTGFX1QDNVIVEVIO1G3ZRYCWIHP24EHIDQPAKN3G1DRQQZ' # your Foursquare ID
CLIENT_SECRET = 'HKW5U5NNSWCDDGH32FRNHAR4CDJMVKHKSG1UULTHAUBY0BBV' # your Foursquare Secret
VERSION = '20200201' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

df_toronto.loc[0, 'Neighborhood']

neighborhood_latitude = df_toronto.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = df_toronto.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = df_toronto.loc[0, 'Neighborhood'] # neighborhood name

LIMIT = 50 # limit of number of venues returned by Foursquare API

radius = 6000 # define radius

params = {
    'client_id':CLIENT_ID,
    'client_secret':CLIENT_SECRET, 
    'v':VERSION, 
    'near': 'Toronto, ON', 
    'radius':radius,
    'limit':LIMIT,
    'categoryId': '4bf58dd8d48988d145941735'
}

print("Fethching inital page")
results = requests.get("https://api.foursquare.com/v2/venues/explore",params=params).json()
total_count = results['response']['totalResults']
print("Total count: {}".format(total_count))

venues = results['response']['groups'][0]['items']

if total_count > len(venues):
    page_size = len(venues)
    remainder = total_count - page_size
    pages = math.ceil(remainder / page_size)
    for i in range(pages):
        print("Fetching page {}".format(i+1))
        offset = (1 + i) * page_size
        next_params = dict(params, offset=offset) # copy, so the shared params dict is not modified
        next_results = requests.get("https://api.foursquare.com/v2/venues/explore",params=next_params).json()
        
        venues = venues + next_results['response']['groups'][0]['items']


# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError: # flattened rows use the 'venue.categories' key instead
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

print(nearby_venues.info())
nearby_venues.head(10)

Your credentails:
CLIENT_ID: VXKTGFX1QDNVIVEVIO1G3ZRYCWIHP24EHIDQPAKN3G1DRQQZ
CLIENT_SECRET:HKW5U5NNSWCDDGH32FRNHAR4CDJMVKHKSG1UULTHAUBY0BBV
Fethching inital page
Total count: 157
Fetching page 1
Fetching page 2
Fetching page 3
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 157 entries, 0 to 156
Data columns (total 4 columns):
name          157 non-null object
categories    157 non-null object
lat           157 non-null float64
lng           157 non-null float64
dtypes: float64(2), object(2)
memory usage: 5.0+ KB
None


,name,categories,lat,lng
0,Swatow Restaurant 汕頭小食家,Chinese Restaurant,43.653866,-79.398334
1,C'est Bon,Chinese Restaurant,43.716785,-79.400406
2,GB Hand-Pulled Noodles,Chinese Restaurant,43.656434,-79.383783
3,Cho Cho Cho,Chinese Restaurant,43.680139,-79.432895
4,Crown Princess Fine Dining 伯爵名宴,Chinese Restaurant,43.666455,-79.387698
5,Asian Legend 味香村,Chinese Restaurant,43.653603,-79.395047
6,Lee,Chinese Restaurant,43.644375,-79.400425
7,Sea-Hi,Chinese Restaurant,43.73163,-79.433014
8,Yueh Tung Chinese Restaurant,Chinese Restaurant,43.655281,-79.385337
9,House of Gourmet 滿庭芳,Chinese Restaurant,43.653273,-79.39723


As seen above, there are 157 Chinese restaurant venues returned by Foursquare.
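
The paging arithmetic in the request cell can be isolated into a small helper, which makes it easy to check without calling the API. The sketch below is an illustration that assumes the first page was fetched at offset 0. `page_offsets` and `request_params` are hypothetical helper names, not part of the notebook or of the Foursquare API.

```python
import math


def page_offsets(total_results, page_size):
    """Offsets of the follow-up requests needed after the first page."""
    if page_size <= 0 or total_results <= page_size:
        return []
    pages = math.ceil((total_results - page_size) / page_size)
    return [(i + 1) * page_size for i in range(pages)]


def request_params(base_params, offset):
    """Build the parameters for one page without modifying base_params."""
    return {**base_params, "offset": offset}


base = {"near": "Toronto, ON", "limit": 50}
offsets = page_offsets(157, 50)  # 157 results, 50 per page, as in the run above
follow_up_requests = [request_params(base, offset) for offset in offsets]
print(offsets)  # [50, 100, 150]
```

The three offsets correspond to the three "Fetching page" lines in the output above.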

## 4.0 Results ##

In [212]:
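nearby_venues = nearby_venues.reset_index(drop=True) #reset_index returns a new dataframe, so assign it

address = 'Toronto, Ontario' 

geolocator = Nominatim(user_agent='toronto_explorer') #recent geopy versions require a user_agent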
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

cr_map = folium.Map(location=[latitude, longitude], zoom_start=10)
# add markers
for lat, lng, name in zip(nearby_venues['lat'], nearby_venues['lng'], nearby_venues['name']):
    label = folium.Popup(name, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(cr_map)  
    
cr_map

  


The map shows that the Chinese restaurants are fairly scattered but appear to be more concentrated in Downtown Toronto and North York.

In [201]:
import math

def nearest_borough(lat, lng):
    """Return the borough whose postal-code coordinate is closest to (lat, lng)."""
    closest_borough = None
    nearest_distance = None
    for i, row in new_coordinates_df.iterrows():
        b_lat = row['Latitude']
        b_lng = row['Longitude']
        
        #straight-line distance measured in degrees (an approximation, not metres)
        distance = math.sqrt(
            math.pow(b_lng-lng,2) +
            math.pow(b_lat-lat,2)
        )
    
        if nearest_distance is None or distance < nearest_distance:
            nearest_distance = distance
            closest_borough = row["Borough"]
    
    return closest_borough

crvenues_df = nearby_venues

for i, venue in crvenues_df.iterrows():
    borough = nearest_borough(venue['lat'], venue['lng'])
    crvenues_df.loc[i, 'borough'] = borough
    
crvenues_df.head(10)

,name,categories,lat,lng,borough
0,Swatow Restaurant 汕頭小食家,Chinese Restaurant,43.653866,-79.398334,Downtown Toronto
1,C'est Bon,Chinese Restaurant,43.716785,-79.400406,Central Toronto
2,GB Hand-Pulled Noodles,Chinese Restaurant,43.656434,-79.383783,Downtown Toronto
3,Cho Cho Cho,Chinese Restaurant,43.680139,-79.432895,York
4,Crown Princess Fine Dining 伯爵名宴,Chinese Restaurant,43.666455,-79.387698,Downtown Toronto
5,Asian Legend 味香村,Chinese Restaurant,43.653603,-79.395047,Downtown Toronto
6,Lee,Chinese Restaurant,43.644375,-79.400425,Downtown Toronto
7,Sea-Hi,Chinese Restaurant,43.73163,-79.433014,North York
8,Yueh Tung Chinese Restaurant,Chinese Restaurant,43.655281,-79.385337,Downtown Toronto
9,House of Gourmet 滿庭芳,Chinese Restaurant,43.653273,-79.39723,Downtown Toronto


In [210]:
crvenues_df.groupby("borough").count()

borough,name,categories,lat,lng
Central Toronto,9,9,9,9
Downtown Toronto,119,119,119,119
East Toronto,5,5,5,5
East York,2,2,2,2
North York,13,13,13,13
West Toronto,6,6,6,6
York,3,3,3,3


By mapping each restaurant's coordinates to the closest borough and counting the results in a dataframe, we can confirm that Downtown Toronto has the highest number of Chinese restaurants (119), followed by North York (13).
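
The nearest-borough assignment above compares raw differences in degrees. At Toronto's latitude a degree of longitude covers less ground than a degree of latitude (by a factor of roughly cos 43.65° ≈ 0.72), so restaurants near a boundary could in principle be assigned differently under a true ground distance. The sketch below shows the haversine great-circle distance as one possible alternative, not as the method used in this notebook. `haversine_km` and `nearest_label` are hypothetical helper names, and only three postal-code coordinates from the table excerpts above are included, so the answer here is not comparable with the full run.

```python
import math


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    earth_radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lng2 - lng1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def nearest_label(lat, lng, centroids):
    """Return the label of the (label, lat, lng) entry closest to the point."""
    return min(centroids, key=lambda c: haversine_km(lat, lng, c[1], c[2]))[0]


# Three rows copied from the coordinates table above (M1B, M9N, M9V)
centroids = [
    ("Scarborough", 43.806686, -79.194353),
    ("York", 43.706876, -79.518188),
    ("Etobicoke", 43.739416, -79.588437),
]

# Coordinates of the first venue in the results table
closest = nearest_label(43.653866, -79.398334, centroids)
print(closest)
```

With the full set of postal-code coordinates in place of the three-row list, the same function could replace the degree-based distance inside `nearest_borough`.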

## 5.0 Discussion ##

Toronto is a big city with pockets of high population density, including in Downtown Toronto and North York. I first visualized the Foursquare Chinese restaurant data and could see high concentrations of Chinese restaurants in Downtown Toronto and North York.

Then, I grouped the restaurants into their corresponding boroughs. To do this, I assigned each restaurant to the borough containing the postal-code coordinate closest to the restaurant, as measured by a straight line. Counting the results in a dataframe confirmed that Downtown Toronto has by far the highest number of Chinese restaurants (119), with North York a distant second (13), ahead of the other boroughs. Chinatown is also located in Downtown Toronto. This suggests that when people want Chinese food, Downtown Toronto and, to a lesser extent, North York are where they are most likely to find it.

Based on these results, I would recommend opening a Chinese restaurant in the location with the highest population density and the highest number of Chinese restaurants, which is Downtown Toronto. My rationale is as follows:

- Areas with high population density have more mouths to feed. As a result, there is higher demand for restaurants in those areas, where residents are looking for something convenient and delicious without having to travel far.
- Toronto has a Chinatown located Downtown. It is the most popular place for residents and tourists to go when they are craving Chinese food, since it has the largest selection. A Chinese restaurant set up in this area would get a lot of foot traffic from residents and tourists alike who are just passing by and want to try out a new place.

## 6.0 Conclusion ##

To conclude, Downtown Toronto, and Chinatown in particular, is known for its high population density and for being an attraction for locals and visitors alike who want to try different types of Chinese cuisine. This borough also has the highest number of Chinese restaurants.

I recommend opening a new Chinese restaurant in this location since it is the go-to neighbourhood for Chinese cuisine. When the restaurant first opens, it will gain a lot of foot traffic and attention from passing residents and tourists. As long as food quality, reasonable prices (covering costs) and great service can be ensured, the good word will get out and the restaurant can expect to prosper in the long term!