Introduction / Business Problem

The business problem is to find the best location to open an Italian restaurant in Toronto, based on nearby competitors (peers) and on the income and social class of local families. This type of restaurant typically has a well-defined target population, so analyzing these data should show where the restaurant can attract the most customers. Opening in the right spot keeps the restaurant close to its market niche, which means, for example, spending less on advertising and setting prices to match the local clientele.

In [None]:
Data to analyze 

Toronto data can be found here: https://www.toronto.ca/ext/open_data/catalog/data_set_files/2016_neighbourhood_profiles.csv
Using pandas, I will extract the useful information from the CSV file and segment the Toronto population by income, age, family size, etc. This will show which location is best for a restaurant whose target clients are young to middle-aged singles or small families with average income.

In [None]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None) # show every column when printing a DataFrame
pd.set_option('display.max_rows', None) # show every row when printing a DataFrame

!conda install -c conda-forge geopy --yes # comment this line out if geopy is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

!conda install -c conda-forge folium=0.5.0 --yes # comment this line out if folium is already installed
import folium # map rendering library

print('Libraries imported.')


In [None]:
csv_path='https://www.toronto.ca/ext/open_data/catalog/data_set_files/2016_neighbourhood_profiles.csv'
df = pd.read_csv(csv_path,encoding='latin1')


In [None]:
Neighbourhoods = list(df.columns.values)
Neighbourhoods = Neighbourhoods[5:]
print(Neighbourhoods)

In [None]:
dfToronto = pd.DataFrame(index=Neighbourhoods, columns=["Population_2016","Income_2016"])
dfToronto.head()

In [None]:
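# Row positions in the raw CSV used for each column of dfToronto:
#   row 2    -> Population_2016 = Population, 2016
#   row 2264 -> Income_2016 = Total income: Average amount ($)

# Copy the two values for each neighbourhood (each neighbourhood is a column of df)
for neighbourhood in dfToronto.index:
    dfToronto.at[neighbourhood, 'Population_2016'] = df[neighbourhood].iloc[2]
    dfToronto.at[neighbourhood, 'Income_2016'] = df[neighbourhood].iloc[2264]

dfToronto.sort_values('Income_2016')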
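
The lookup above relies on fixed row positions, and the values read from the CSV may be text rather than numbers, in which case sorting would be alphabetical. The sketch below shows one possible alternative on a toy frame: select rows by label, then convert to numeric before sorting. The column name "Characteristic", the neighbourhood labels A/B/C, the comma-formatted strings and every value are assumptions made up for illustration, not taken from the real file.

```python
import pandas as pd

# Toy stand-in for the raw CSV: one row per characteristic, one column per
# neighbourhood. All names and values here are hypothetical.
raw = pd.DataFrame(
    {
        "Characteristic": ["Population, 2016", "Total income: Average amount ($)"],
        "A": ["29,113", "30,414"],
        "B": ["23,757", "102,259"],
        "C": ["12,054", "31,771"],
    }
)

# Map the row labels we want to the column names used in dfToronto.
wanted = {
    "Population, 2016": "Population_2016",
    "Total income: Average amount ($)": "Income_2016",
}

# Select rows by label instead of by position, then transpose so that
# neighbourhoods become the index and characteristics become columns.
subset = raw[raw["Characteristic"].isin(list(wanted))].set_index("Characteristic")
tidy = subset.T.rename(columns=wanted)


def to_number(column):
    """Strip thousands separators and convert to numbers (bad entries become NaN)."""
    cleaned = column.astype(str).str.replace(",", "", regex=False)
    return pd.to_numeric(cleaned, errors="coerce")


tidy = tidy.apply(to_number)

# With numeric values the sort is by amount, not by text.
ranked = tidy.sort_values("Income_2016", ascending=False)
print(ranked)
```

If a label appears more than once in the real file, the selection would return several rows, so the result should be checked before transposing.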

Methodology

To define the best location to open an Italian restaurant in Toronto, I based my decision on the type of population living in each area, on the income people were generating, and on the location of competitors (peers).
I defined the target client as a middle-aged, middle/upper-class worker with higher-than-average income, including people in a similar situation who have kids.
To do so, I used 2016 Census information combined with choropleth maps to check where middle/upper-class, young to middle-aged people were located. At a later stage, Foursquare data was used to locate the competitors. Segmentation was fundamental here: by using an unsupervised ML method, K-means, I was able to find the best spot.
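
As a rough illustration of the K-means step, the sketch below clusters a handful of neighbourhoods on three features. It assumes scikit-learn is available; the feature names, the neighbourhood labels N1 to N6, all values, and the choice of two clusters are hypothetical and are not the features or settings used in this project.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical per-neighbourhood features; every value is invented.
features = pd.DataFrame(
    {
        "avg_income": [30000, 32000, 31000, 95000, 99000, 102000],
        "share_age_25_54": [0.38, 0.40, 0.39, 0.52, 0.55, 0.50],
        "nearby_restaurants": [4, 6, 5, 40, 45, 38],
    },
    index=["N1", "N2", "N3", "N4", "N5", "N6"],
)

# Put all features on the same scale so income does not dominate the distances.
scaled = StandardScaler().fit_transform(features)

# Fit K-means and attach the cluster label to each neighbourhood.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
features["cluster"] = kmeans.fit_predict(scaled)

# Average profile of each cluster, useful for interpreting the segments.
print(features.groupby("cluster").mean())
```

Scaling before clustering is a design choice: K-means uses Euclidean distance, so without it a feature measured in dollars would outweigh one measured as a share.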

Results

Even though most of the wealthiest neighborhoods are located in the north, most restaurants are located on the south side of the city. This is visible in the maps I generated above. Moreover, the main streets of Toronto are also filled with restaurants similar to the one whose location I am advising on.
There is a clear distinction between residential and "social" areas, also visible on the maps.
Thus, I would advise locating the Italian restaurant on the south side of the city, where people like to go and spend some quality time. This way, we would be able to attract customers and gain market share.

Conclusions

This was a simple way to show how data science can support a common and important business decision by considering a few factors, such as the age, class and income of the people living in Toronto. Even though we might think we should open a restaurant right next to where the target client lives, this was not the final conclusion. The restaurant should open next to the other restaurants, since the maps show this is the area where people prefer to go. Of course, I could have extended the model and analyzed more variables that also affect the decision, such as nearby parking and public transport connections. However, this was a fun, short but insightful project.