# Optimizing the Location of New Italian Restaurants in Toronto, CA

### I. Background Information and Business Problem

Several criteria are involved in the process of choosing a location to start a new restaurant. In major cities like Toronto, Canada, a restaurant owner has to be particularly aware of the following factors:

1. Their target demographic
2. Whether or not their restaurant is likely to be frequented by the local population
3. Effective marketing strategies to boost public awareness of their venue
4. The existing competitors to the cuisine offered

Given the difficulty of this task, this study aims to help individuals who are planning to open a new Italian restaurant in Toronto choose the right location by providing and analyzing data on:

1. The per capita income of a given neighborhood
2. The population of a given neighborhood
3. The competitors already present within a given neighborhood

Collectively, the findings in this report are intended to help prospective Italian restaurant owners choose a location that supports the restaurant's long-term profitability.

### II. Analytical Approach and Data Evaluation

To recommend an ideal location for a new Italian restaurant, data on the restaurant's competitors and target clientele are needed. On the consumer side, the two biggest questions a restaurant owner must answer are (1) whether consumers have enough disposable income to be likely to eat at the restaurant, and (2) whether the neighborhood's population is large enough to sustain a diverse customer base. The working hypothesis is that the wealthier the families in a given neighborhood, the more likely they are to eat out at the restaurant. Likewise, the more people who live in the neighborhood, the more customers the restaurant is expected to attract.

Separately from consumer factors, a restaurant owner must also consider potential competitors, that is, whether nearby restaurants already offer the same cuisine as the new restaurant (in this case, Italian food). If they do, the restaurant is expected to be less profitable in that location.

In order to evaluate these hypotheses, data on per capita income for a given neighborhood as well as the population of each neighborhood will be needed; this will come from Canadian census data for the city of Toronto. Additionally, a list of Italian restaurants will be needed for each neighborhood to evaluate the threat competitors pose to the new restaurant's business model; this data will come from the Foursquare location database that is accessible online.

### III. Methodology

#### A. Data Collection

First, all the datasets for use in this study must be downloaded from the appropriate sources. Start with the data on income and population for each of Toronto's neighborhoods.

In [6]:
# First import the necessary libraries
import pandas as pd
import numpy as np
import requests

# Download the file data on Toronto's neighborhoods (income and population)
csv_path='https://www.toronto.ca/ext/open_data/catalog/data_set_files/2016_neighbourhood_profiles.csv'
df_population = pd.read_csv(csv_path,encoding='latin1')
print('Toronto Income and Population Data Loaded!')
df_population.head()

Toronto Income and Population Data Loaded!


Unnamed: 0,Category,Topic,Data Source,Characteristic,City of Toronto,Agincourt North,Agincourt South-Malvern West,Alderwood,Annex,Banbury-Don Mills,...,Willowdale West,Willowridge-Martingrove-Richview,Woburn,Woodbine Corridor,Woodbine-Lumsden,Wychwood,Yonge-Eglinton,Yonge-St.Clair,York University Heights,Yorkdale-Glen Park
0,Neighbourhood Information,Neighbourhood Information,City of Toronto,Neighbourhood Number,,129,128,20,95,42,...,37,7,137,64,60,94,100,97,27,31
1,Neighbourhood Information,Neighbourhood Information,City of Toronto,TSNS2020 Designation,,No Designation,No Designation,No Designation,No Designation,No Designation,...,No Designation,No Designation,NIA,No Designation,No Designation,No Designation,No Designation,No Designation,NIA,Emerging Neighbourhood
2,Population,Population and dwellings,Census Profile 98-316-X2016001,"Population, 2016",2731571,29113,23757,12054,30526,27695,...,16936,22156,53485,12541,7865,14349,11817,12528,27593,14804
3,Population,Population and dwellings,Census Profile 98-316-X2016001,"Population, 2011",2615060,30279,21988,11904,29177,26918,...,15004,21343,53350,11703,7826,13986,10578,11652,27713,14687
4,Population,Population and dwellings,Census Profile 98-316-X2016001,Population Change 2011-2016,4.50%,-3.90%,8.00%,1.30%,4.60%,2.90%,...,12.90%,3.80%,0.30%,7.20%,0.50%,2.60%,11.70%,7.50%,-0.40%,0.80%


Next we load the data on Toronto's restaurants.

In [7]:
# Install and import the libraries used to query Foursquare and map the restaurant competitors
!pip install geopy
!pip install folium
import folium
print('Folium installed')

from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values
# libraries for displaying images
from IPython.display import Image, HTML
# JSON responses are flattened into a pandas dataframe later with pd.json_normalize,
# so the deprecated `pandas.io.json.json_normalize` import is not needed
print('All libraries imported')

Collecting geopy
  Downloading geopy-1.21.0-py2.py3-none-any.whl (104 kB)
Collecting geographiclib<2,>=1.49
  Downloading geographiclib-1.50-py3-none-any.whl (38 kB)
Installing collected packages: geographiclib, geopy
Successfully installed geographiclib-1.50 geopy-1.21.0
Folium installed
All libraries imported


In [8]:
CLIENT_ID = 'P5K4WD4L45UNOPS2NZ21XGKBAPHXSRT5QBMDUKMWUAREXPVK' # your Foursquare ID
CLIENT_SECRET = 'QETU3JMJD1X5LZEOFEDEZOTXLHH421DVHYLO4DVEYXDGV14A' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 30
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: P5K4WD4L45UNOPS2NZ21XGKBAPHXSRT5QBMDUKMWUAREXPVK
CLIENT_SECRET: QETU3JMJD1X5LZEOFEDEZOTXLHH421DVHYLO4DVEYXDGV14A


In [9]:
address = '100 Queen St. W. Toronto, ON M5H 2N2'
geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

43.6536032 -79.38400547469666


In [30]:
search_query = 'Italian'
radius = 1000000
print(search_query + ' .... OK!')

url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url

Italian .... OK!


'https://api.foursquare.com/v2/venues/search?client_id=P5K4WD4L45UNOPS2NZ21XGKBAPHXSRT5QBMDUKMWUAREXPVK&client_secret=QETU3JMJD1X5LZEOFEDEZOTXLHH421DVHYLO4DVEYXDGV14A&ll=43.6536032,-79.38400547469666&v=20180604&query=Italian&radius=1000000&limit=30'
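
The request URL above is assembled by string formatting with the credentials written directly in the notebook. As a minimal sketch of an alternative, the query string can be built with the standard library's `urlencode`, with the credentials read from environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `build_search_url` are hypothetical, introduced only for illustration; the endpoint and parameter names are the ones used in the cell above.

```python
import os
from urllib.parse import urlencode

# Same endpoint as in the cell above
SEARCH_ENDPOINT = "https://api.foursquare.com/v2/venues/search"


def build_search_url(latitude, longitude, query, radius, limit, version="20180604"):
    """Build the venue search URL, letting urlencode escape each parameter."""
    params = {
        # Hypothetical environment variable names; set them outside the notebook
        "client_id": os.environ.get("FOURSQUARE_CLIENT_ID", ""),
        "client_secret": os.environ.get("FOURSQUARE_CLIENT_SECRET", ""),
        "ll": f"{latitude},{longitude}",
        "v": version,
        "query": query,
        "radius": radius,
        "limit": limit,
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


url = build_search_url(43.6536032, -79.38400547469666, "Italian", 1000000, 30)
```

Note that `urlencode` percent-encodes the comma in `ll`, so the resulting string looks slightly different from the one above even though it carries the same values.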

In [31]:
results = requests.get(url).json()

In [32]:
# assign relevant part of JSON to venues
venues = results['response']['venues']
# transform venues into a dataframe
dataframe = pd.json_normalize(venues)
dataframe.head()

Unnamed: 0,id,name,categories,referralId,hasPerk,location.address,location.crossStreet,location.lat,location.lng,location.labeledLatLngs,location.distance,location.postalCode,location.cc,location.city,location.state,location.country,location.formattedAddress
0,52f6816f11d24a43115dc834,Scaddabush Italian Kitchen & Bar,"[{'id': '4bf58dd8d48988d110941735', 'name': 'I...",v-1586117302,False,"382 Yonge Street, Unit #7",Gerrard,43.65892,-79.382891,"[{'label': 'display', 'lat': 43.65892029202872...",598,M5B 1S8,CA,Toronto,ON,Canada,"[382 Yonge Street, Unit #7 (Gerrard), Toronto ..."
1,5b897e92db1d81002c91df8c,Fabbrica Rustic Italian,"[{'id': '4bf58dd8d48988d110941735', 'name': 'I...",v-1586117302,False,66 Wellington St W,,43.647161,-79.381691,"[{'label': 'display', 'lat': 43.647161, 'lng':...",740,M5K 1E7,CA,Toronto,ON,Canada,"[66 Wellington St W, Toronto ON M5K 1E7, Canada]"
2,4e31afdd091a973ec9c5a2b5,"Punto Gelato, Simply Italian","[{'id': '4bf58dd8d48988d1c9941735', 'name': 'I...",v-1586117302,False,146 Cumberland St,btwn Avenue Rd & Bay St,43.669955,-79.392603,"[{'label': 'display', 'lat': 43.66995452843031...",1947,M5R 1A8,CA,Toronto,ON,Canada,"[146 Cumberland St (btwn Avenue Rd & Bay St), ..."
3,5e594c8a3de308000870c948,Elm Street Italian Deli,"[{'id': '4bf58dd8d48988d110941735', 'name': 'I...",v-1586117302,False,15 Elm Street,,43.65769,-79.38248,"[{'label': 'display', 'lat': 43.65769, 'lng': ...",471,M5G 1G7,CA,Toronto,ON,Canada,"[15 Elm Street, Toronto ON M5G 1G7, Canada]"
4,4bfc0289c3ba9521c00f9653,Italian Consulate Toronto,"[{'id': '4bf58dd8d48988d12c951735', 'name': 'E...",v-1586117302,False,136 Beverley St,Dundas Street,43.654027,-79.394104,"[{'label': 'display', 'lat': 43.65402694219784...",814,,CA,Toronto,ON,Canada,"[136 Beverley St (Dundas Street), Toronto ON, ..."


In [34]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns].copy()  # copy so the column assignment below does not modify a view

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
    
# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)
# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]
dataframe_filtered = dataframe_filtered[dataframe_filtered['categories']=='Italian Restaurant']
dataframe_filtered.reset_index()

Unnamed: 0,index,name,categories,address,crossStreet,lat,lng,labeledLatLngs,distance,postalCode,cc,city,state,country,formattedAddress,id
0,0,Scaddabush Italian Kitchen & Bar,Italian Restaurant,"382 Yonge Street, Unit #7",Gerrard,43.65892,-79.382891,"[{'label': 'display', 'lat': 43.65892029202872...",598,M5B 1S8,CA,Toronto,ON,Canada,"[382 Yonge Street, Unit #7 (Gerrard), Toronto ...",52f6816f11d24a43115dc834
1,1,Fabbrica Rustic Italian,Italian Restaurant,66 Wellington St W,,43.647161,-79.381691,"[{'label': 'display', 'lat': 43.647161, 'lng':...",740,M5K 1E7,CA,Toronto,ON,Canada,"[66 Wellington St W, Toronto ON M5K 1E7, Canada]",5b897e92db1d81002c91df8c
2,3,Elm Street Italian Deli,Italian Restaurant,15 Elm Street,,43.65769,-79.38248,"[{'label': 'display', 'lat': 43.65769, 'lng': ...",471,M5G 1G7,CA,Toronto,ON,Canada,"[15 Elm Street, Toronto ON M5G 1G7, Canada]",5e594c8a3de308000870c948
3,5,Scaddabush Italian Kitchen & Bar,Italian Restaurant,200 Front St W,at Simcoe St,43.644737,-79.385355,"[{'label': 'display', 'lat': 43.6447367776608,...",992,M5V 3J1,CA,Toronto,ON,Canada,"[200 Front St W (at Simcoe St), Toronto ON M5V...",581cad6a7c74e15859a6f890
4,7,Mustachio Italian Eatery,Italian Restaurant,595 Bay St,Dundas St,43.65616,-79.38319,"[{'label': 'display', 'lat': 43.65616, 'lng': ...",292,M5G 2C2,CA,Toronto,ON,Canada,"[595 Bay St (Dundas St), Toronto ON M5G 2C2, C...",573df789498e03dd8e54b166
5,8,The Fresh Italian,Italian Restaurant,,,43.654991,-79.387897,"[{'label': 'display', 'lat': 43.65499143746528...",349,,CA,Toronto,ON,Canada,"[Toronto ON, Canada]",51bf3866498e55ee55df8db0
6,9,LA's Italian + Bar,Italian Restaurant,,,43.65054,-79.384603,"[{'label': 'display', 'lat': 43.65053979517576...",344,,CA,,,Canada,[Canada],4f88cf84e4b002b90ab3b9b9
7,10,Kit Kat Italian Bar & Grill,Italian Restaurant,297 King St W,at John St,43.646416,-79.39003,"[{'label': 'display', 'lat': 43.64641598988062...",935,M5V 1J5,CA,Toronto,ON,Canada,"[297 King St W (at John St), Toronto ON M5V 1J...",4b3ace79f964a520ae6e25e3
8,11,The Fresh Italian Eatery,Italian Restaurant,"109 McCaul Street, Unit #42",Dundas Street West,43.653889,-79.390785,"[{'label': 'display', 'lat': 43.653889, 'lng':...",546,M5T 3K5,CA,Toronto,ON,Canada,"[109 McCaul Street, Unit #42 (Dundas Street We...",526fe29411d2aeb3803013b0
9,12,john's italian cafe,Italian Restaurant,27 Baldwin Street,,43.656127,-79.393301,"[{'label': 'display', 'lat': 43.65612672798775...",799,,CA,Toronto,ON,Canada,"[27 Baldwin Street, Toronto ON, Canada]",53daae5b498e9c9597c19b23


In [35]:
venues_map = folium.Map(location=[latitude, longitude], zoom_start=13) # generate map centred on the City of Toronto office geocoded above

# add a red circle marker to represent the City of Toronto Central Office
folium.CircleMarker(
    [latitude, longitude],
    radius=10,
    color='red',
    popup='City of Toronto Central Office',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)

# add the Italian restaurants as blue circle markers
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)
    
# display map
venues_map

The blue dots in the map above mark the Italian restaurants returned by the Foursquare search centered on the City of Toronto's central office. Although the query requested a radius of 1,000,000 meters, the number of results is capped by `LIMIT = 30`, and every venue listed above lies within about 2 km of the search center. The map therefore shows a cluster of Italian restaurants in Toronto's Downtown area, but on its own it cannot show how many operate in the suburbs. The rest of this analysis proceeds on the assumption that Toronto's suburbs have far fewer Italian restaurants.

#### B. Data Analysis

We begin by exploring whether the suburban districts of Toronto have a large enough population and a high enough average income to warrant establishing a new Italian restaurant in the neighborhood.

In [49]:
Neighborhoods = list(df_population.columns.values)
Neighborhoods = Neighborhoods[5:]
print(Neighborhoods)

['Agincourt North', 'Agincourt South-Malvern West', 'Alderwood', 'Annex', 'Banbury-Don Mills', 'Bathurst Manor', 'Bay Street Corridor', 'Bayview Village', 'Bayview Woods-Steeles', 'Bedford Park-Nortown', 'Beechborough-Greenbrook', 'Bendale', 'Birchcliffe-Cliffside', 'Black Creek', 'Blake-Jones', 'Briar Hill-Belgravia', 'Bridle Path-Sunnybrook-York Mills', 'Broadview North', 'Brookhaven-Amesbury', 'Cabbagetown-South St. James Town', 'Caledonia-Fairbank', 'Casa Loma', 'Centennial Scarborough', 'Church-Yonge Corridor', 'Clairlea-Birchmount', 'Clanton Park', 'Cliffcrest', 'Corso Italia-Davenport', 'Danforth', 'Danforth East York', 'Don Valley Village', 'Dorset Park', 'Dovercourt-Wallace Emerson-Junction', 'Downsview-Roding-CFB', 'Dufferin Grove', 'East End-Danforth', 'Edenbridge-Humber Valley', 'Eglinton East', 'Elms-Old Rexdale', 'Englemount-Lawrence', 'Eringate-Centennial-West Deane', 'Etobicoke West Mall', 'Flemingdon Park', 'Forest Hill North', 'Forest Hill South', 'Glenfield-Jane Heig

In [50]:
dfToronto = pd.DataFrame(index=Neighborhoods, columns=["Population_2016","Avg_Income_2016"])
dfToronto

Unnamed: 0,Population_2016,Avg_Income_2016
Agincourt North,,
Agincourt South-Malvern West,,
Alderwood,,
Annex,,
Banbury-Don Mills,,
...,...,...
Wychwood,,
Yonge-Eglinton,,
Yonge-St.Clair,,
York University Heights,,


In [85]:
# Populate the dataframe from df_population, where each neighborhood is a column.
# Row 2 holds "Population, 2016" (see the preview above); row 2264 is the row this
# analysis uses for 2016 average income.
for index in dfToronto.index:
    dfToronto.at[index, 'Population_2016'] = df_population[index][2].replace(",", "")
    dfToronto.at[index, 'Avg_Income_2016'] = df_population[index][2264].replace(",", "")

# Convert both columns to numeric values, then sort by average income
dfToronto['Population_2016'] = dfToronto['Population_2016'].astype('float')
dfToronto['Avg_Income_2016'] = dfToronto['Avg_Income_2016'].astype('float')
df_sorted_income = dfToronto.sort_values('Avg_Income_2016', ascending=False)
df_sorted_income.head(20)  # dfToronto itself has dimensions of 140 x 2

Unnamed: 0,Population_2016,Avg_Income_2016
Bridle Path-Sunnybrook-York Mills,9266.0,308010.0
Rosedale-Moore Park,20923.0,207903.0
Forest Hill South,10732.0,204521.0
Lawrence Park South,15179.0,169203.0
Casa Loma,10968.0,165047.0
Kingsway South,9271.0,144642.0
Leaside-Bennington,16828.0,125564.0
Bedford Park-Nortown,23236.0,123077.0
Yonge-St.Clair,12528.0,114174.0
Annex,30526.0,112766.0
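
The cell above selects rows by position (`2` and `2264`), which silently breaks if the source file is reordered. A minimal sketch of a more defensive lookup is to select the row by its `Characteristic` label instead. The helper `lookup_characteristic` and the two-row `sample` frame are illustrative stand-ins, not part of the original notebook, and the comma-formatted strings are an assumption based on the `.replace(",", "")` calls above.

```python
import pandas as pd

# Tiny stand-in for df_population, using values shown in the preview of the file.
# Assumption: the raw cells are strings that may contain thousands separators.
sample = pd.DataFrame(
    {
        "Characteristic": ["Population, 2016", "Population, 2011"],
        "Agincourt North": ["29,113", "30,279"],
        "Annex": ["30,526", "29,177"],
    }
)


def lookup_characteristic(df, label, neighbourhoods):
    """Return the single row with this Characteristic label as a numeric Series."""
    matches = df.loc[df["Characteristic"] == label, neighbourhoods]
    if len(matches) != 1:
        # Fail loudly when the label is missing or ambiguous
        raise ValueError(f"expected exactly one row labelled {label!r}, found {len(matches)}")
    # Strip thousands separators before converting to numbers
    cleaned = matches.iloc[0].astype(str).str.replace(",", "", regex=False)
    return pd.to_numeric(cleaned)


population_2016 = lookup_characteristic(sample, "Population, 2016", ["Agincourt North", "Annex"])
print(population_2016)
```

The explicit check for exactly one matching row matters because a label could be missing or appear more than once in the full file, and either case should stop the analysis rather than pick an arbitrary row.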


In [86]:
df_sorted_population = dfToronto.sort_values('Population_2016', ascending = False)
df_sorted_population.head(20)

Unnamed: 0,Population_2016,Avg_Income_2016
Waterfront Communities-The Island,65913.0,70600.0
Woburn,53485.0,30878.0
Willowdale East,50434.0,45326.0
Rouge,46496.0,39556.0
L'Amoreaux,43993.0,31826.0
Islington-City Centre West,43965.0,52787.0
Malvern,43794.0,29573.0
Dovercourt-Wallace Emerson-Junction,36625.0,39740.0
Downsview-Roding-CFB,35052.0,34168.0
Parkwoods-Donalda,34805.0,42516.0


The tables above rank Toronto's neighborhoods by (1) average family income and (2) population size. Each cell requests the top twenty neighborhoods, although only the first ten rows of each ranking are reproduced here. Across the full top-twenty lists, the Annex is the only neighborhood that appears in both, and among the high-income neighborhoods shown it has the largest population (30,526). This suggests that the Annex combines a high average family income with a comparatively large population, which would make it a promising location for a new Italian restaurant.
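
The comparison of the two rankings can also be done programmatically rather than by eye. The sketch below intersects the top-`n` neighborhoods by income with the top-`n` by population using `DataFrame.nlargest`. The helper `in_both_top_n` is hypothetical, and `sample` contains only six rows copied from the tables above, so its output illustrates the mechanics and says nothing about the full 140-neighborhood ranking.

```python
import pandas as pd

# Six rows copied from the two tables above (a partial sample, not the full data)
sample = pd.DataFrame(
    {
        "Population_2016": [9266.0, 20923.0, 30526.0, 65913.0, 53485.0, 50434.0],
        "Avg_Income_2016": [308010.0, 207903.0, 112766.0, 70600.0, 30878.0, 45326.0],
    },
    index=[
        "Bridle Path-Sunnybrook-York Mills",
        "Rosedale-Moore Park",
        "Annex",
        "Waterfront Communities-The Island",
        "Woburn",
        "Willowdale East",
    ],
)


def in_both_top_n(df, n):
    """Return the neighborhoods ranked in the top n for both income and population."""
    top_income = set(df.nlargest(n, "Avg_Income_2016").index)
    top_population = set(df.nlargest(n, "Population_2016").index)
    return sorted(top_income & top_population)


shortlist = in_both_top_n(sample, 4)
print(shortlist)
```

In the notebook itself, the equivalent call would be `in_both_top_n(dfToronto, 20)`, assuming `dfToronto` has been populated and converted to numeric values as in the cells above.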

Next, it is important to check how far the Annex is from Downtown Toronto, where the majority of the Italian restaurant competitors are located.

In [94]:
# add a green circle marker to represent the Annex neighborhood
folium.CircleMarker(
    [43.6698, -79.4076], # coordinates for the Annex neighborhood in Toronto, CA
    radius=10,
    color='Green',
    popup='Annex Neighborhood',
    fill = True,
    fill_color = 'Green',
    fill_opacity = 0.6
).add_to(venues_map)

# display map
venues_map

On the map, the Annex marker sits roughly 2.6 km northwest of the central-office marker, outside the cluster of comparable Italian restaurants returned by the search.
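
The visual impression from the map can be checked numerically. The sketch below uses the haversine formula to estimate great-circle distances from the Annex marker to the search center and to a few of the restaurants listed earlier. The coordinates are copied from the cells above; the helper `haversine_km` and the choice of four restaurants are illustrative, and the Earth is treated as a sphere of radius 6371 km, which is an approximation.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    lat1, lng1, lat2, lng2 = map(radians, (lat1, lng1, lat2, lng2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lng2 - lng1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Coordinates taken from the notebook cells above
search_center = (43.6536032, -79.38400547469666)
annex = (43.6698, -79.4076)
restaurants = {
    "Scaddabush Italian Kitchen & Bar": (43.65892, -79.382891),
    "Kit Kat Italian Bar & Grill": (43.646416, -79.39003),
    "The Fresh Italian Eatery": (43.653889, -79.390785),
    "john's italian cafe": (43.656127, -79.393301),
}

annex_to_center_km = haversine_km(*annex, *search_center)
distances_to_annex = {
    name: haversine_km(*annex, *coords) for name, coords in restaurants.items()
}
nearest_name = min(distances_to_annex, key=distances_to_annex.get)

print(f"Annex to search center: {annex_to_center_km:.2f} km")
print(f"Nearest listed restaurant: {nearest_name} ({distances_to_annex[nearest_name]:.2f} km)")
```

Applying the same calculation to every row of `dataframe_filtered` would give a per-restaurant distance column that could be thresholded to count competitors within a chosen radius of the Annex.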

### IV. Results

The data tables and maps generated in the section above indicate that the Italian restaurants returned by the search are concentrated in Downtown Toronto rather than in the surrounding suburban neighborhoods. After investigating which of these neighborhoods would be the most profitable place for a new owner to set up an Italian restaurant, it was found that the neighborhood with the best balance of high average income and high population was the Annex.

The Annex had an average income of $112,766 and a population of 30,526. This is beneficial for two reasons. First, the relatively high average income in the Annex is conducive to customers visiting a restaurant, because a family with a higher disposable income is more able, and therefore more likely, to eat out at a new Italian restaurant. Second, the larger population of the Annex means that there are more potential customers for the new restaurant to serve, making it an attractive location in which to build a strong and loyal customer base.

Finally, with regard to potential competitors, the map above shows none of the Italian restaurants returned by the search near the green Annex marker. Because the search was capped at 30 results around the city center, this suggests, rather than proves, that competition in the neighborhood is limited. If it holds, it is a significant advantage to an owner seeking to establish a new Italian restaurant in Toronto, because they would not have to compete with other restaurants or diners offering similar food at lower prices, which would make it easier to turn a profit.

### V. Discussion and Conclusion

These recommendations offer a logical first approximation of the ideal neighborhood in which to open a profitable restaurant. However, to strengthen the findings of this report, it is important to consider other factors that could influence them. For example, the analytical approach assumed that a restaurant owner should look for a populated suburb in which to open their new Italian restaurant. Yet a larger population does not necessarily mean that a community's residents prefer Italian food. For that reason, future studies should incorporate an analysis of consumer demand segmented by Toronto neighborhood to get a more accurate picture of the profit margins the new restaurant can expect.

Furthermore, on the subject of profit margins, further analysis should identify not just the ideal *location* of a new Italian restaurant but also an estimate of its profits. This would give a restaurant owner a much clearer picture of (1) where to locate their restaurant and (2) how profitable that location is likely to be for their business.