# Capstone Project
## The Battle of the Neighborhoods - Open a Restaurant Supply Store in Toronto


## 1. Introduction:
### 1.1 Background
Toronto is the largest city in Canada. It is located in the province of Ontario, on the northwestern shore of Lake Ontario. There are many Restaurant Supply Stores in the city.

### 1.2 Business Problem
In this project, we look for an optimal location to open a Restaurant Supply Store. Specifically, this report provides a reference for stakeholders who are interested in opening a Restaurant Supply Store in Toronto, Ontario, Canada.

### 1.3 Interest
In this report, we focus on all areas of the city of Toronto. There are many Restaurant Supply Stores in Toronto, so we first present the distribution of the existing stores. We then use a clustering model to find similar areas in the city, considering the demographic data of each borough and region. The preferred area should be distant from existing Restaurant Supply Stores.
We use data science tools to fetch the raw data, visualize it, and then identify the few most promising areas based on the above criteria. We also explain the advantages and traits of each candidate, so that stakeholders can make the final decision based on the analysis.



Locating a new store according to these requirements should help achieve the following:
- Lowest delivery cost
- Shortest travel time to the store for its clients
- Lower overall running costs
- Increase in overall business
- Greater overall customer satisfaction


## 2. Data:
### 2.1 Data sources
In this project, we will fetch or extract data from the following data sources:
- **Toronto neighborhoods broken down by postal code**<br>
https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M<br>
Here I used BeautifulSoup to scrape the wiki page and extract a working list of Toronto neighborhoods sorted by postal code.
<br>
- **Toronto geospatial coordinates** <br>
http://cocl.us/Geospatial_data<br>
Next, I joined the geospatial coordinates to the Toronto data.
<br>
- **Toronto neighborhood populations broken down by postal code**<br>
https://www12.statcan.gc.ca/census-recensement/2016/dp-pd/hlt-fst/pd-pl/Tables/File.cfm?T=1201&SR=1&RPP=9999&PR=0&CMA=0&CSD=0&S=22&O=A&Lang=Eng&OFT=CSV<br>
Here I used Pandas to load the CSV.
<br>
- **Toronto neighborhood average after-tax income broken down by postal code** <br>
These figures must be downloaded manually from Statistics Canada and then loaded.<br>
https://www12.statcan.gc.ca/census-recensement/2016/dp-pd/prof/search-recherche/change-geo.cfm?Lang=E&Geo1=FSA<br>
See: to_geo_space.csv
<br>
- **Canadian national median after-tax income** <br>
This figure must also be downloaded manually from Statistics Canada and then loaded.<br>
https://www150.statcan.gc.ca/n1/daily-quotidien/180313/dq180313a-eng.htm<br>
Canadian families and unattached individuals had a median after-tax income of $57,000 in 2016.<br><br>

- **Toronto list of Restaurants or Venues that could potentially use Restaurant Equipment** <br>
Foursquare API<br>
https://api.foursquare.com
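
The joins described in the list above can be sketched with Pandas. This is only a minimal illustration under assumptions: the column names (`PostalCode`, `AfterTaxIncome`, and so on) and every sample value are made up for the example, and only the 57,000 median comes from the Statistics Canada figure quoted above.

```python
import pandas as pd

# Illustrative sample rows only; the real tables come from the sources listed above.
neighborhoods = pd.DataFrame({
    "PostalCode": ["M4S", "M5H", "M1B"],
    "Neighborhood": ["Davisville", "Adelaide, King, Richmond", "Rouge, Malvern"],
})
coordinates = pd.DataFrame({
    "PostalCode": ["M1B", "M4S", "M5H"],
    "Latitude": [43.8067, 43.7043, 43.6506],
    "Longitude": [-79.1944, -79.3888, -79.3846],
})
income = pd.DataFrame({
    "PostalCode": ["M4S", "M5H", "M1B"],
    "AfterTaxIncome": [70000, 65000, 50000],  # made-up values
})

# Median after-tax income for 2016, as quoted from Statistics Canada above.
NATIONAL_MEDIAN_AFTER_TAX_INCOME = 57_000

# Left joins keep every neighbourhood even if a lookup table has no match.
toronto = (
    neighborhoods
    .merge(coordinates, on="PostalCode", how="left")
    .merge(income, on="PostalCode", how="left")
)
toronto["AboveNationalMedian"] = (
    toronto["AfterTaxIncome"] > NATIONAL_MEDIAN_AFTER_TAX_INCOME
)
print(toronto)
```

A left join is used here so that neighbourhoods with missing income or coordinate data show up as empty values rather than silently disappearing.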


### 2.2 Data Analysis
Prepare the data for clustering<br><br>
_**Combine all of those into a working data set to cluster, and plot the results on a geospatial map showing the best neighbourhood in which to open a Restaurant Supply Store**_<br>

Combining these disparate data sets will show the following:<br>

- Which neighbourhoods in Toronto have clusters of similar restaurants
- How populated each neighbourhood is
- The average after-tax income in each of these neighbourhoods
- Which neighbourhood to target for the new store
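
One common way to prepare venue data for clustering is to one-hot encode the venue categories and average them per neighbourhood. The sketch below assumes that approach; the column names and rows are hypothetical and only stand in for a flattened Foursquare response.

```python
import pandas as pd

# Hypothetical venue rows, shaped like a flattened Foursquare response.
venues = pd.DataFrame({
    "Neighborhood": ["Davisville", "Davisville", "Rouge, Malvern", "Rouge, Malvern"],
    "VenueCategory": ["Pizza Place", "Sushi Restaurant", "Pizza Place", "Pizza Place"],
})

# One indicator column per venue category.
onehot = pd.get_dummies(venues["VenueCategory"], dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# The mean of the indicators is the share of each category in a neighbourhood.
features = onehot.groupby("Neighborhood").mean()
print(features)
```

Each row of `features` then describes one neighbourhood by its mix of restaurant types, which is the kind of feature similarity a clustering model can work with.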

## 3. Methodology:

### 3.1 Choice of Algorithm
I chose K-Means Clustering. <br>
https://towardsdatascience.com/clustering-algorithms-for-customer-segmentation-af637c6830ac <br>

A backgrounder on K-Means clustering: <br>
“K-means clustering is an iterative clustering algorithm where the number of clusters K is predetermined and the algorithm iteratively assigns each data point to one of the K clusters based on the feature similarity.” <br>

***Key Observation: For my project, feature similarity means restaurant similarity between neighborhoods.*** <br>
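
To make the iterative assignment in the quote concrete, here is a small NumPy sketch of the basic K-Means loop. It is not the code used in the project: the function name `kmeans`, the toy points, and the choice to start from the first `k` points (which keeps the example deterministic) are all assumptions made for illustration.

```python
import numpy as np

def kmeans(points, k, n_iter=100):
    """Basic K-Means loop: assign each point to its nearest centre, then update centres."""
    # Simplification: start from the first k points so the sketch is deterministic.
    centres = points[:k].astype(float).copy()
    for _ in range(n_iter):
        # Distance from every point to every centre, shape (n_points, k).
        distances = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Move each centre to the mean of its points (keep it if the cluster is empty).
        new_centres = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels, centres

# Two obvious groups of toy points.
points = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                   [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
labels, centres = kmeans(points, k=2)
print(labels)
print(centres)
```

In practice a library implementation such as scikit-learn's `KMeans` is preferable, since it handles initialisation and repeated restarts.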

### 3.2 Choosing the correct number of clusters
https://www.jeremyjordan.me/grouping-data-points-with-k-means-clustering/ <br>
Here I use Silhouette analysis to determine the optimum number of clusters to use. <br>

A backgrounder on Silhouette analysis.

“We can use Silhouette analysis to evaluate each model. A Silhouette coefficient is calculated for observation, which is then averaged to determine the Silhouette score. <br>
The coefficient combines the average within-cluster distance with average nearest-cluster distance to assign a value between -1 and 1. A value below zero denotes that the observation is probably in the wrong cluster and a value closer to 1 denotes that the observation is a great fit for the cluster and clearly separated from other clusters. This coefficient essentially measures how close an observation is to neighboring clusters, where it is desirable to be the maximum distance possible from neighboring clusters. <br>
We can automatically determine the best number of clusters, k, by selecting the model which yields the highest Silhouette score.” <br>

***Key Observation: The highest Silhouette score was obtained with 3 clusters.*** <br>
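
The selection of k by Silhouette score can be sketched with scikit-learn as follows. The data here is synthetic (three well-separated blobs from `make_blobs`) and only stands in for the real neighbourhood features; the range of k values tried is also an assumption.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the neighbourhood feature matrix: three separated blobs.
X, _ = make_blobs(
    n_samples=150,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=0.5,
    random_state=0,
)

# Fit one model per candidate k and record its mean Silhouette score.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# The k with the highest score is taken as the number of clusters.
best_k = max(scores, key=scores.get)
print(scores)
print("best k:", best_k)
```

Note that the Silhouette score itself always lies between -1 and 1; what gets selected is the value of k whose model scores highest.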

### 3.3 Run K-Means, segment the data into clusters, and generate labels

### 3.4 Merge the Toronto data with the geo coordinates data and make sure it's the right shape
Here I reshape the Toronto data so that its shape matches the clustered data.<br>
### 3.5 Add the KMeans Labels
Determine the largest cluster. In this case it was cluster number **3**, with a shape of (67, 14).
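
Attaching the labels and picking the largest cluster might look like the sketch below. The feature table, its column names, and the `ClusterLabel` column are hypothetical, and two clusters are used only to keep the toy data small.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical per-neighbourhood features (share of each venue category).
features = pd.DataFrame(
    {
        "Pizza Place": [0.9, 0.8, 0.85, 0.1, 0.0],
        "Sushi Restaurant": [0.1, 0.2, 0.15, 0.9, 1.0],
    },
    index=["A", "B", "C", "D", "E"],
)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)

# Add the K-Means labels as the first column of a copy of the data.
clustered = features.copy()
clustered.insert(0, "ClusterLabel", kmeans.labels_)

# The largest cluster is the label that occurs most often.
sizes = clustered["ClusterLabel"].value_counts()
largest_label = sizes.idxmax()
largest_cluster = clustered[clustered["ClusterLabel"] == largest_label]
print(largest_cluster.shape)
```

The `shape` of the filtered frame gives the number of neighbourhoods in the cluster and the number of columns, in the same form as the figure reported above.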

### 3.6 Find the geographic centroid of the densest cluster
Cluster 2 contains the highest cluster density. We need to find the geographic centroid of this cluster, which we take as the optimum location for a new Restaurant Supply Store.<br>
Here we take the average latitude and longitude to be the centroid.<br>
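
As a small worked example of that averaging step, assuming the cluster is held in a Pandas DataFrame with `Latitude` and `Longitude` columns (the names and coordinates below are illustrative only):

```python
import pandas as pd

# Illustrative coordinates for the neighbourhoods in one cluster.
cluster = pd.DataFrame({
    "Latitude": [43.70, 43.65, 43.66],
    "Longitude": [-79.39, -79.38, -79.40],
})

# Centroid taken as the plain arithmetic mean of the coordinates.
centroid_lat = cluster["Latitude"].mean()
centroid_lng = cluster["Longitude"].mean()
print(centroid_lat, centroid_lng)
```

A plain mean of latitude and longitude is a reasonable approximation at the scale of a single city, where the curvature of the Earth has little effect.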

### 3.7 Install OpenCage to reverse-geocode the coordinates
OpenCage allows me to reverse-geocode the coordinates, that is, to turn the centroid's latitude and longitude into a street address. <br>
***Key Observation: This address is the optimum location for a new Restaurant Supply Store.***
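
The reverse lookup can be wrapped in a small helper like the one below. The helper name `reverse_lookup` is my own, and the geocoder is passed in as an argument so the function can be tried without network access. The commented usage assumes the `opencage` Python package and a valid API key, with a placeholder shown for the key.

```python
def reverse_lookup(geocoder, lat, lng):
    """Return the first formatted address for a coordinate pair, or None."""
    results = geocoder.reverse_geocode(lat, lng)
    if not results:
        return None
    return results[0]["formatted"]

# Usage against the real service (needs `pip install opencage` and an API key):
#   from opencage.geocoder import OpenCageGeocode
#   geocoder = OpenCageGeocode("YOUR_API_KEY")  # placeholder, not a real key
#   print(reverse_lookup(geocoder, 43.6534817, -79.3839347))
```

The service may return several candidate addresses for one coordinate pair; this sketch simply takes the first.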




## 4. Results:

### 4.1 Plot the clusters on a map of Toronto and superimpose the best location for a store

![](https://i.imgur.com/lQnSq7P.png)

### 4.2 Exact Address of desired Location

Based on a reverse lookup, the exact address of the recommended location is: 268 Balliol Street, ON M4S 1C2, Canada (lat: 43.6534817, lng: -79.3839347).

## 5. Discussion:

### 5.1 Explaining the results

Because we built our list of neighborhoods exclusively from restaurant venues, we discovered that most neighborhoods were similar and that the greatest concentration of restaurants was in Central Toronto and Downtown Toronto. This might seem obvious, but these also appear to be some of the most affluent neighborhoods in Toronto, so there appears to be a correlation. By locating in the general vicinity of the exact location, my friend would be geographically centered in this cluster and poised to serve his restaurant customer base with the greatest efficiency.<br>

When we built our K-Means dataset, Silhouette analysis told us that there was a lot of similarity between neighborhoods and the most common restaurants contained within them. There were really only 3 types of clusters, or neighborhoods, in greater Toronto, and the vast majority of neighborhoods were in cluster 3. So Toronto's restaurants may be many, but they are located very homogeneously near the center of Toronto.<br>

Of the 103 Toronto neighborhoods gathered, 55.3% (57 neighborhoods) are above the median after-tax income and 37.9% (39 neighborhoods) are below it. The remaining 6.8% (7 neighborhoods) did not register, apparently because their populations are too low. The greatest concentration of affluence appears to be near central Toronto. We decided to keep all neighborhoods in the dataset regardless of income or population, as the majority were close enough.<br>
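
The shares in the paragraph above can be recomputed from the neighbourhood counts with a few lines of Python. The group names used as dictionary keys are just labels chosen for this sketch.

```python
total = 103
groups = {"above median": 57, "below median": 39, "not registered": 7}

# The three groups should account for every neighbourhood.
assert sum(groups.values()) == total

# Percentage share of each group, rounded to one decimal place.
shares = {name: round(100 * count / total, 1) for name, count in groups.items()}
print(shares)
```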

## 6. Conclusion:

I feel confident in the recommendation I have given my friend, as it is backed by data analysis. While nothing can ever be 100% certain, he will be better informed than he was before asking for my help.

Much more could be inferred with further work. A potential side business for my friend might be advising new restaurant owners on where to locate a new restaurant, who their competition is, and who their clientele might be.
