# <center>Capstone Project</center>

# <center>Exploring the Best Places to Establish a Chinese Restaurant in Toronto</center>

### Index

<ol>
    <li>Introduction</li>
    <li>Data Description</li>
</ol>

## 1. Introduction

<p>Toronto, the capital of the province of Ontario, is a major Canadian city along Lake Ontario’s northwestern shore. Toronto also has many green spaces, from the orderly oval of Queen’s Park to 400-acre High Park and its trails, sports facilities and zoo.</p>

<p>Toronto has a large, dense population. According to the <a href="https://en.wikipedia.org/wiki/Chinese_Canadians" target="_blank">Wikipedia page on Chinese Canadians</a>, about <b>631,050</b> Chinese residents live in Toronto. To open a profitable Chinese restaurant in the city, we first need to identify the locations best suited to it.</p>

<p>One way to approach this problem is to find the most promising neighborhoods for a Chinese restaurant based on the number of Chinese restaurants already operating nearby. In other words, we look for neighborhoods with minimal competition and shortlist a few that show business potential for a new Chinese restaurant.</p>
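
<p>The idea of ranking neighborhoods by existing competition can be sketched as follows. The venue table, the column names <code>Neighborhood</code> and <code>Venue Category</code>, and the neighborhood labels A, B and C are made up for illustration; they are not the project's data.</p>

```python
import pandas as pd

# Hypothetical venue listing: one row per venue found near a neighborhood
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B", "B", "C"],
    "Venue Category": [
        "Chinese Restaurant", "Chinese Restaurant", "Cafe",
        "Chinese Restaurant", "Park", "Bakery",
    ],
})

# Flag the Chinese restaurants, then count the flags per neighborhood
is_chinese = venues["Venue Category"] == "Chinese Restaurant"
competition = (
    is_chinese.groupby(venues["Neighborhood"])
    .sum()
    .sort_values()  # fewest competitors first
    .rename("Chinese Restaurants")
)

print(competition)
```

<p>Neighborhoods at the top of the sorted result have the fewest existing Chinese restaurants. A real analysis would probably also weigh demand, since a count of zero can mean either an opportunity or a lack of customers.</p>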

<p>When an investor considers opening a restaurant, one of the most important questions is where to find a location with as little competition as possible. In this project we will identify such locations for a Chinese restaurant.</p>

<p>This project serves two target audiences. The first is individuals and investors who want to establish a new restaurant business and need to find areas with few Chinese restaurants. The second is tourists (for example, Asian tourists) and anyone else who likes Chinese food, who can use it to choose neighborhoods where such restaurants are easy to reach.</p>

## 2. Data Description

<p>The required data comes from three sources, listed below:</p>

<ul>
    <li><a href="https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M" target="_blank">Wikipedia</a> to fetch boroughs and neighborhoods of Toronto.</li>
    <li>A .csv file <a href="https://cocl.us/Geospatial_data" target="_blank">https://cocl.us/Geospatial_data</a> to fetch latitudes and longitudes corresponding to each postal code.</li>
    <li>The <a href="https://developer.foursquare.com/" target="_blank">Foursquare API</a> to fetch different public venues in the vicinity of the neighborhood.</li>
</ul>

<p>The Wikipedia page contains a table of Toronto postal codes along with their boroughs and neighborhoods. The <b>.csv file</b> provides the latitude and longitude coordinates of each postal code in the Toronto region. These coordinates are then passed to the <b>Foursquare API</b> to retrieve a list of popular venues in each neighborhood.
The data is comprehensive and yields the insights about Toronto on which the conclusions and observations of this project rest. As obtained, however, the raw data is not clean: it requires pre-processing before it can serve as a working set for the machine learning algorithms and visualizations applied to it.</p>
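
<p>To make the role of the coordinates concrete, the sketch below builds a venue-search request URL from a latitude and longitude. It assumes the legacy Foursquare v2 <code>venues/explore</code> endpoint and its <code>client_id</code>, <code>client_secret</code>, <code>v</code>, <code>ll</code>, <code>radius</code> and <code>limit</code> query parameters; the source does not say which endpoint or values the project used, and the current Foursquare API may differ. The helper name <code>build_explore_url</code>, the default values and the placeholder credentials are hypothetical.</p>

```python
from urllib.parse import urlencode

# Assumed legacy endpoint; check the current Foursquare documentation before use
EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, client_id, client_secret,
                      version="20180605", radius=500, limit=100):
    """Build a venue-search URL around one coordinate pair (no request is sent)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,           # API version date (assumed value)
        "ll": f"{lat},{lng}",   # latitude,longitude of the postal code
        "radius": radius,       # search radius in metres (assumed value)
        "limit": limit,         # maximum number of venues (assumed value)
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)


# Coordinates of postal code M1B, with placeholder credentials
request_url = build_explore_url(43.806686, -79.194353,
                                "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
print(request_url)
```

<p>The function only assembles the URL, so it can be inspected without credentials or network access.</p>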

### 2.1 Data Pre-Processing

#### First Data Source

<p>First, we scrape the data from the Wikipedia page into a <i>pandas</i> dataframe.</p>

In [4]:
# Importing libraries
import pandas as pd
import numpy as np

In [8]:
# Url to Wikipedia page
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
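# read_html returns a list with one DataFrame per HTML table found on the page
data_table = pd.read_html(url)
# The first table holds the postal codes
df = data_table[0]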
df.head()

|   | Postal Code | Borough | Neighborhood |
|---|---|---|---|
| 0 | M1A | Not assigned | Not assigned |
| 1 | M2A | Not assigned | Not assigned |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park, Harbourfront |
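
<p>Picking the table by position breaks silently if the page layout changes. As a hedged alternative, the sketch below selects the table by its column names instead. The helper <code>pick_table</code> and the constant <code>REQUIRED_COLUMNS</code> are hypothetical names, and the small list of dataframes stands in for what <code>pd.read_html(url)</code> returns.</p>

```python
import pandas as pd

# Columns the postal code table is expected to have (taken from the output above)
REQUIRED_COLUMNS = {"Postal Code", "Borough", "Neighborhood"}


def pick_table(tables, required=REQUIRED_COLUMNS):
    """Return the first DataFrame whose columns include every required name."""
    for table in tables:
        if required.issubset(set(table.columns)):
            return table
    raise ValueError(f"No table with columns {sorted(required)} found")


# Stand-in for the list returned by pd.read_html(url)
tables = [
    pd.DataFrame({"Note": ["navigation box"]}),
    pd.DataFrame({
        "Postal Code": ["M3A", "M4A"],
        "Borough": ["North York", "North York"],
        "Neighborhood": ["Parkwoods", "Victoria Village"],
    }),
]

picked = pick_table(tables)
print(picked.shape)
```

<p>Raising an error when no table matches makes a changed page layout visible immediately, rather than letting a wrong table flow into the later steps.</p>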


In [9]:
# Shape of our dataframe
df.shape

(180, 3)

<p>We do not need rows whose <b>Borough</b> is <i>Not assigned</i>, so we remove them from the dataframe.</p>

In [10]:
df = df[df.Borough != 'Not assigned']
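
<p>The effect of this filter can be checked on a small sample. The sketch below reuses four rows from the scraped table shown earlier and confirms that no <i>Not assigned</i> borough remains. The variable names are illustrative, and the <code>.copy()</code> call is an optional addition that makes the result independent of the original frame.</p>

```python
import pandas as pd

# Four rows copied from the head of the scraped table
sample = pd.DataFrame({
    "Postal Code": ["M1A", "M2A", "M3A", "M4A"],
    "Borough": ["Not assigned", "Not assigned", "North York", "North York"],
    "Neighborhood": ["Not assigned", "Not assigned", "Parkwoods", "Victoria Village"],
})

# The boolean mask keeps only rows with an assigned borough
assigned = sample[sample["Borough"] != "Not assigned"].copy()

# Sanity check: count any unassigned boroughs that survived the filter
remaining_unassigned = int((assigned["Borough"] == "Not assigned").sum())

print(len(sample), "rows before,", len(assigned), "rows after")
```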

<p>Next, we sort the dataframe by <b>Postal Code</b> and <b>Borough</b> and reset the index.</p>

In [12]:
df = df.sort_values(by=['Postal Code', 'Borough'])
# drop=True discards the old index instead of keeping it as an 'index' column
df = df.reset_index(drop=True)
df.head()

|   | Postal Code | Borough | Neighborhood |
|---|---|---|---|
| 0 | M1B | Scarborough | Malvern, Rouge |
| 1 | M1C | Scarborough | Rouge Hill, Port Union, Highland Creek |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill |
| 3 | M1G | Scarborough | Woburn |
| 4 | M1H | Scarborough | Cedarbrae |


#### Second Data Source

<p>Next, we load the second data source, which provides the latitude and longitude of each postal code.</p>

In [15]:
dt = pd.read_csv('https://cocl.us/Geospatial_data')
dt.head()

|   | Postal Code | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


<p>We merge this dataframe with the original one on <b>Postal Code</b>, so that every postal code has its latitude and longitude.</p>

In [16]:
df=pd.merge(df, dt, on='Postal Code')
df.head()

|   | Postal Code | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M1B | Scarborough | Malvern, Rouge | 43.806686 | -79.194353 |
| 1 | M1C | Scarborough | Rouge Hill, Port Union, Highland Creek | 43.784535 | -79.160497 |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill | 43.763573 | -79.188711 |
| 3 | M1G | Scarborough | Woburn | 43.770992 | -79.216917 |
| 4 | M1H | Scarborough | Cedarbrae | 43.773136 | -79.239476 |
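
<p><code>pd.merge</code> performs an inner join by default, so a postal code with no matching coordinates would be dropped without warning. The sketch below shows one way to surface such rows, using a left join with the <code>validate</code> and <code>indicator</code> options. The two small frames are illustrative, and the missing coordinates for M1E are a deliberate, made-up gap rather than a property of the real file.</p>

```python
import pandas as pd

boroughs = pd.DataFrame({
    "Postal Code": ["M1B", "M1C", "M1E"],
    "Borough": ["Scarborough", "Scarborough", "Scarborough"],
})

# M1E is left out on purpose to simulate a postal code without coordinates
coords = pd.DataFrame({
    "Postal Code": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

merged = pd.merge(
    boroughs,
    coords,
    on="Postal Code",
    how="left",              # keep every borough row, matched or not
    validate="one_to_one",   # raise if a postal code is duplicated on either side
    indicator=True,          # adds a "_merge" column describing each row's origin
)

# Postal codes that found no coordinates in the second source
missing = merged.loc[merged["_merge"] == "left_only", "Postal Code"].tolist()
print(missing)
```

<p>If the list of missing codes is empty, the default inner join and the left join give the same rows, and the <code>_merge</code> column can be dropped.</p>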


<p>Data pre-processing is now complete, and the dataframe is ready for analysis.</p>