# Where Do I Open a Sporting Goods Store in Toronto?

   ### Capstone Project by Rakesh Prusty

## Introduction

**Audience/User** - Aspiring Sporting Goods Store Owners

**Business Problem** - A sports enthusiast is planning to open a Sporting Goods store in the city of Toronto. After doing some market research, she arrived at the hypothesis that it is less risky to open a Sporting Goods store in an area where sports and fitness venues are located in close proximity. This project recommends areas where the User can open a store.

## Data

1. Toronto Postal Code and Neighborhood data will be used from the wiki page - https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M
2. Geo Coordinates will be used from the csv file provided - 'Geospatial_Coordinates.csv'
3. Foursquare Data will be used to analyze different venues in the neighborhoods 

## Method

**Python Libraries Used**
1. Pandas and Numpy for Data Analysis
2. Wikipedia for extracting data from Wikipedia Page
3. Geopy to extract Geo Coordinates from an address
4. Sklearn to use K-Means Clustering
5. Matplotlib and Folium for Data Visualization

**Data Collection**
1. Toronto Postal Code and Neighborhood data are collected from the wiki page mentioned above with the help of the 'wikipedia' library
2. 'Geospatial_Coordinates.csv' is imported and merged with the Neighborhood data for further processing
3. Foursquare Data is collected with the help of the Foursquare API

**Data Wrangling and Exploratory Analysis**
1. Neighborhood data is wrangled by removing rows whose Borough is unassigned and by renaming the columns
2. Geospatial coordinates data is then merged with the Neighborhood data to generate the final dataframe

The Toronto Neighborhood data looks like this:

![Toronto_Neighborhood_Data.png](attachment:Toronto_Neighborhood_Data.png)
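
The wrangling and merge steps above can be sketched roughly as follows. This is only an illustration under assumptions: the column names (`Postcode`, `Neighbourhood`, `Postal Code`) and the few rows shown are placeholders standing in for the Wikipedia table and 'Geospatial_Coordinates.csv', and the coordinates are made-up values, not the real ones.

```python
import pandas as pd

# Placeholder rows in the assumed shape of the Wikipedia postal-code table.
raw = pd.DataFrame({
    "Postcode": ["M1A", "M3A", "M4A", "M5A"],
    "Borough": ["Not assigned", "North York", "North York", "Downtown Toronto"],
    "Neighbourhood": ["Not assigned", "Parkwoods", "Victoria Village", "Harbourfront"],
})

# Placeholder rows in the assumed shape of 'Geospatial_Coordinates.csv'
# (illustrative coordinates only).
coords = pd.DataFrame({
    "Postal Code": ["M1A", "M3A", "M4A", "M5A"],
    "Latitude": [43.80, 43.75, 43.73, 43.65],
    "Longitude": [-79.19, -79.33, -79.32, -79.36],
})

# Drop rows whose Borough is unassigned, then rename the columns.
neighborhoods = (
    raw[raw["Borough"] != "Not assigned"]
    .rename(columns={"Postcode": "PostalCode", "Neighbourhood": "Neighborhood"})
    .reset_index(drop=True)
)

# Attach latitude and longitude to each neighborhood by postal code.
toronto = neighborhoods.merge(
    coords.rename(columns={"Postal Code": "PostalCode"}),
    on="PostalCode",
    how="left",
)
print(toronto)
```

A left merge keeps every neighborhood even if a postal code were missing from the coordinates file, which makes such gaps visible as empty values instead of silently dropping rows.
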

3. A function is written to call the Foursquare API and get the top 100 venues in each neighborhood in Toronto.
4. Each Neighborhood is analyzed with one-hot encoding to produce the frequency of occurrence of different venue categories in each neighborhood
5. The data is grouped by Neighborhood and filtered to keep only the venue categories below
    1. Park
    2. Gym
    3. Gym / Fitness Center
    4. Sporting Goods Shop
    5. Yoga Studio
    6. Playground
    7. Trail
    8. Baseball Field
    9. Athletics & Sports
    10. Pool
    11. Basketball Stadium
    12. Hockey Arena
    13. Baseball Stadium
    14. Skate Park
    15. Soccer Field
    16. Stadium
    
The resulting data looks like this:
![Toronto_Neigh_Sports_Data.PNG](attachment:Toronto_Neigh_Sports_Data.PNG)
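
As a rough sketch of the one-hot encoding, grouping, and filtering steps described above, the snippet below works on a small hand-made venue table. The rows, the column name `Venue Category`, and the choice to sum per neighborhood (instead of, say, taking a mean) are assumptions for illustration; only a subset of the sports categories is used to keep it short.

```python
import pandas as pd

# Hypothetical venue rows: one row per venue returned for a neighborhood.
venues = pd.DataFrame({
    "Neighborhood": ["Rosedale", "Rosedale", "Rosedale",
                     "Thorncliffe Park", "Thorncliffe Park"],
    "Venue Category": ["Park", "Trail", "Coffee Shop", "Gym", "Park"],
})

# A subset of the sports categories listed above, for brevity.
sports_categories = ["Park", "Gym", "Trail", "Yoga Studio"]

# One-hot encode: one 0/1 column per venue category.
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# Sum per neighborhood to count the venues of each category.
counts = onehot.groupby("Neighborhood").sum()

# Keep only the sports categories; categories that never occur become 0.
sports = counts.reindex(columns=sports_categories, fill_value=0)
print(sports)
```

Using `reindex` with `fill_value=0` keeps the column set fixed even when a category is absent from the data, so later steps can rely on the same columns for every run.
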

**K-Means Clustering**
1. K-Means clustering with 3 clusters is performed
2. Cluster labels are added to the dataframe

Here is what the data looks like:
![Toronto_Neigh_Cluster.PNG](attachment:Toronto_Neigh_Cluster.PNG)
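
A minimal sketch of the clustering step, assuming a per-neighborhood table of sports venue counts like the one described earlier. The toy values, the neighborhood names "A" to "F", and the `random_state` and `n_init` settings are assumptions, not the values used in the project.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical sports-venue counts per neighborhood.
sports = pd.DataFrame(
    {
        "Park": [3, 2, 0, 0, 1, 0],
        "Gym": [1, 2, 0, 1, 0, 0],
        "Trail": [2, 1, 0, 0, 0, 1],
    },
    index=pd.Index(["A", "B", "C", "D", "E", "F"], name="Neighborhood"),
)

# Fit K-Means with 3 clusters on the venue counts.
# random_state is fixed only to make the sketch repeatable.
kmeans = KMeans(n_clusters=3, random_state=0, n_init=10).fit(sports)

# Add the cluster label of each neighborhood as the first column.
clustered = sports.copy()
clustered.insert(0, "Cluster Labels", kmeans.labels_)
print(clustered)
```

Cluster numbers produced by K-Means are arbitrary, so the label that ends up marking the venue-rich neighborhoods can differ from run to run unless the random seed is fixed.
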

3. Below are the sums of occurrences of each sporting venue in each cluster and the statistics of the clusters

![Sports%20Vanue%20Occurence.PNG](attachment:Sports%20Vanue%20Occurence.PNG)
![Cluster_Stats.png](attachment:Cluster_Stats.png)
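
The per-cluster sums and statistics could be computed along these lines. The labelled table and the helper column `Total Venues` are hypothetical and exist only to show the grouping.

```python
import pandas as pd

# Hypothetical clustered table: a cluster label plus venue counts.
clustered = pd.DataFrame({
    "Cluster Labels": [0, 0, 1, 2, 2],
    "Park": [3, 2, 1, 0, 0],
    "Gym": [1, 2, 0, 0, 1],
})
venue_cols = ["Park", "Gym"]

# Sum of occurrences of each venue category within each cluster.
cluster_sums = clustered.groupby("Cluster Labels")[venue_cols].sum()

# Total sports venues per neighborhood, then summary statistics per cluster.
clustered["Total Venues"] = clustered[venue_cols].sum(axis=1)
cluster_stats = clustered.groupby("Cluster Labels")["Total Venues"].describe()

print(cluster_sums)
print(cluster_stats)
```
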

**Data Visualization**
1. The box plot below shows that Cluster 0 has the widest variation and the highest frequency of Sporting Venues
![Cluster_Analysis_Chart.png](attachment:Cluster_Analysis_Chart.png)

2. The Neighborhood clusters have been plotted on the map of Toronto with the help of matplotlib and folium

![Toronto_Clustering_Map.PNG](attachment:Toronto_Clustering_Map.PNG)

**Red dots belong to Cluster 0**

**Purple dots belong to Cluster 1**

**Green dots belong to Cluster 2**

## Deep Dive for Precise Recommendation

As per our initial assumption, the business owner would like to open the Sporting Goods store in an area where sports-related venues are in close proximity. Based on our analysis, Cluster 0 could be recommended. However, there are several neighborhoods in Cluster 0. In order to recommend the top 5 neighborhoods, the steps below were performed.

1. The dataframe was sorted by the number of venue types and the total count of venues, highest first
2. The top 5 Neighborhoods are recommended to open the store

    1. Business Reply Mail Processing Centre 969 Eastern (Cluster 0)
    2. Rosedale (Cluster 1)
    3. Moore Park, Summerhill East (Cluster 1)
    4. Thorncliffe Park (Cluster 0)
    5. Queen's Park (Cluster 0)
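
The ranking step can be sketched as below, assuming a table of sports venue counts per neighborhood. The toy data, the helper column names `venue_types` and `venue_count`, and the choice to sort by types first and count second are assumptions; the toy table keeps the top 3 only because it has just four rows.

```python
import pandas as pd

# Hypothetical sports-venue counts per neighborhood.
sports = pd.DataFrame(
    {"Park": [3, 1, 0, 2], "Gym": [1, 1, 0, 0], "Trail": [0, 1, 4, 0]},
    index=pd.Index(["A", "B", "C", "D"], name="Neighborhood"),
)

ranked = sports.assign(
    # Number of distinct sports venue types present in the neighborhood.
    venue_types=(sports > 0).sum(axis=1),
    # Total number of sports venues in the neighborhood.
    venue_count=sports.sum(axis=1),
).sort_values(["venue_types", "venue_count"], ascending=False)

# Keep the highest-ranked neighborhoods (the report keeps the top 5).
top = ranked.head(3)
print(top[["venue_types", "venue_count"]])
```
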

So now we have 5 neighborhoods to choose from. What if the Business Owner can't pick just one? To solve this problem, I extracted the neighborhoods that already have a Sports Goods Store (venue category "Sporting Goods Shop"). Then I checked which of the recommended neighborhoods don't have one. Such a neighborhood would be the ideal location to open a store, since people in that area could visit the new store, which could result in good profitability.
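
The comparison described above amounts to a simple set difference. A minimal sketch, with hypothetical neighborhood names and venue rows:

```python
import pandas as pd

# Hypothetical shortlist of recommended neighborhoods.
recommended = ["A", "B", "C"]

# Hypothetical venue rows for the city.
venues = pd.DataFrame({
    "Neighborhood": ["A", "B", "D"],
    "Venue Category": ["Sporting Goods Shop", "Park", "Sporting Goods Shop"],
})

# Neighborhoods that already have a sporting goods store.
is_store = venues["Venue Category"] == "Sporting Goods Shop"
has_store = set(venues.loc[is_store, "Neighborhood"])

# Recommended neighborhoods without an existing store, in ranked order.
candidates = [n for n in recommended if n not in has_store]
print(candidates)
```
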

Below are the images of the top 5 recommended neighborhoods and already existing store locations

1. **Already Existing Sports Stores Locations**
![Existing_Store.PNG](attachment:Existing_Store.PNG)

2. **Top 5 Recommended Neighborhoods**
![Final_Recommendation.PNG](attachment:Final_Recommendation.PNG)

### Final Recommendation
After comparing the Top 5 recommendations with the existing store locations, it is reasonable to conclude that the Sports Store could be opened in the East Toronto area (marked with an arrow). Several Sporting/Fitness venues are present there, but there are no existing Sporting Goods shops. This could result in good profitability and bring convenience to the people living around East Toronto.

## Validation

In order to validate the recommendation, I searched for "Sporting Goods Shop" in Toronto on www.foursquare.com. Below is the map showing all the locations in Toronto.

![foursquare_actual_map.PNG](attachment:foursquare_actual_map.PNG)

The map above shows that there are no sporting goods shops around East Toronto, which supports the recommendation.

## Project Summary & Conclusion

The scope of the project has evolved with time and analysis.

1. Initially, a cluster of neighborhoods was identified
2. Next, the Top 5 neighborhoods were identified
3. The Top 5 neighborhoods were compared to the neighborhoods where Sporting Goods Shops are already available
4. Finally, a single area was recommended for opening the store, and it was validated visually

**Future Scope**

This analysis is not limited to this specific problem statement. Several other questions can be answered by following the same method, e.g. Where can I buy a house that matches my preferences? Where can I go on a Sunday to watch a movie, do some shopping and have a fine dinner?

The recommendation could be strengthened by considering population data as well as tips and rating data from Foursquare. Ideally, a standard benchmark could also be used to validate the recommendation.


## References
1. https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M
2. [Foursquare API](https://developer.foursquare.com/)
3. [Coursera](www.coursera.com)

### Thank You.
**Rakesh Prusty**

22nd Aug 2019