# Market research for opening a restaurant in Toronto

### Introduction / Business Problem

Toronto is the provincial capital of Ontario, the financial capital of Canada, and the most populous city in the country, as well as a city of great diversity. With a population of 2.7 million in the city and 6 million within the Greater Toronto Area, it offers an interesting market for the restaurant business. The aim of this study is to determine which kind of restaurant to open there and to find a suitable location for it.

The type of restaurant to open will be the one with the highest average rating among Foursquare users. Given that choice, a suitable location for the new restaurant will be sought by looking for the site which best meets the following criteria:

1. The close vicinity does not yet contain the selected type of restaurant (in order to avoid unnecessary competition)
2. The nearest current restaurants have low ratings (in order to find competition easier to beat) 
3. There are many rather than few restaurants near the location (the number of restaurants is used as a proxy for the number of potential customers; locations with many restaurants should also be rich in potential clients)
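
As a rough illustration, the three criteria can be turned into numbers for a candidate location. The sketch below is not part of the analysis that follows: the function name `score_location`, the 500 m radius, and the example neighborhood are all assumptions made for illustration, and distances are assumed to have been computed beforehand.

```python
from statistics import mean

def score_location(neighbors, target_type, radius_m=500.0):
    """Summarize one candidate location against the three criteria.

    neighbors: list of (distance_m, rating_or_None, type_name) tuples,
    one per existing restaurant; distances are assumed to be precomputed.
    """
    nearby = [n for n in neighbors if n[0] <= radius_m]
    # Criterion 1: how many restaurants of the selected type are close by
    same_type_nearby = sum(1 for _, _, t in nearby if t == target_type)
    # Criterion 2: mean rating of nearby restaurants (unrated ones are skipped)
    ratings = [r for _, r, _ in nearby if r is not None]
    mean_rating = mean(ratings) if ratings else None
    # Criterion 3: number of restaurants nearby (proxy for potential customers)
    return {
        'same_type_nearby': same_type_nearby,
        'mean_nearby_rating': mean_rating,
        'restaurants_nearby': len(nearby),
    }

# Hypothetical neighborhood, for illustration only
example = [
    (120.0, 6.1, 'Noodle House'),
    (200.0, 6.6, 'Dim Sum Restaurant'),
    (350.0, None, 'Chinese Restaurant'),
    (1900.0, 7.8, 'New American Restaurant'),
]
summary = score_location(example, 'New American Restaurant')
print(summary)
```

A good candidate under these criteria would have `same_type_nearby` equal to zero, a low `mean_nearby_rating`, and a high `restaurants_nearby`.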

### Data

The project will be based on Foursquare data drawn from its API. The dataset will comprise 50 restaurants in downtown Toronto (the search is centered on latitude = 43.651070, longitude = -79.347015).

In [1]:
import pandas as pd
import numpy as np
import math
import json
import getpass
import time
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import requests # library to handle requests

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

import folium # map rendering library

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

print('Libraries imported.')

Libraries imported.


In [2]:
# Settings for Foursquare API

CLIENT_ID = getpass.getpass(prompt='CLIENT_ID: ')
CLIENT_SECRET = getpass.getpass(prompt='CLIENT_SECRET: ')
VERSION = '20180605'
LIMIT = '100'
radius = '500'

CLIENT_ID: ········
CLIENT_SECRET: ········


In [99]:
# Toronto
latitude = 43.651070
longitude = -79.347015

print(latitude, longitude)

url = 'https://api.foursquare.com/v2/venues/search?&client_id={}&client_secret={}&v={}&ll={},{}&query=restaurant&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            latitude, 
            longitude,
            LIMIT)

43.65107 -79.347015
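
The search URL above is assembled by string formatting. A slightly more defensive variant, sketched below, lets the standard library encode the query string so that special characters in any value are escaped. The endpoint and parameter names are taken from the cell above; the helper name `build_search_url` and the placeholder credentials are assumptions for illustration.

```python
from urllib.parse import urlencode

def build_search_url(client_id, client_secret, version, lat, lng, query, limit):
    """Build a venues/search URL with properly encoded query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'query': query,
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/search?' + urlencode(params)

# Placeholder credentials, for illustration only
demo_url = build_search_url('ID', 'SECRET', '20180605',
                            43.651070, -79.347015, 'restaurant', 100)
print(demo_url)
```

The comma inside `ll` is percent-encoded by `urlencode`, which is the standard encoding for a comma in a query string.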


In [5]:
results = requests.get(url).json()

#### Send the GET Request and examine the results
#### Get relevant part of JSON and transform it into a *pandas* dataframe

In [63]:
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
dataframe = pd.json_normalize(venues)

In [64]:
print(dataframe.shape)
dataframe

(50, 19)


| | id | name | categories | referralId | hasPerk | location.address | location.lat | location.lng | location.labeledLatLngs | location.distance | location.postalCode | location.cc | location.city | location.state | location.country | location.formattedAddress | venuePage.id | location.crossStreet | location.neighborhood |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5d3127c715cfea0007e44d52 | est Restaurant | [{'id': '4bf58dd8d48988d14e941735', 'name': 'A... | v-1598811115 | False | 729 Queen St. East | 43.658911 | -79.349035 | [{'label': 'display', 'lat': 43.658911, 'lng':... | 887 | M4M 1H1 | CA | Toronto | ON | Canada | [729 Queen St. East, Toronto ON M4M 1H1, Canada] | 552734373.0 | NaN | NaN |
| 1 | 4ae77362f964a52069ab21e3 | Mi Mi Restaurant | [{'id': '4bf58dd8d48988d14a941735', 'name': 'V... | v-1598811115 | False | 688 Gerrard Street East | 43.666293 | -79.349079 | [{'label': 'display', 'lat': 43.66629336492295... | 1702 | M4M 1Y3 | CA | Toronto | ON | Canada | [688 Gerrard Street East (at De Grassi St.), T... | NaN | at De Grassi St. | NaN |
| 2 | 4ad4c05cf964a520e1f520e3 | Docks Restaurant & Night Club The | [{'id': '4bf58dd8d48988d116941735', 'name': 'B... | v-1598811115 | False | 11 Polson St. | 43.641806 | -79.354171 | [{'label': 'display', 'lat': 43.64180568910036... | 1181 | M5A 1A4 | CA | Toronto | ON | Canada | [11 Polson St. (at Cherry), Toronto ON M5A 1A4... | NaN | at Cherry | NaN |
| 3 | 4ada5d5bf964a520e92121e3 | The Hot House Restaurant & Bar | [{'id': '4bf58dd8d48988d14e941735', 'name': 'A... | v-1598811115 | False | 35 Church St | 43.648824 | -79.373702 | [{'label': 'display', 'lat': 43.64882370529773... | 2164 | M5E 1T3 | CA | Toronto | ON | Canada | [35 Church St (at Front St E), Toronto ON M5E ... | NaN | at Front St E | NaN |
| 4 | 4ad4c05cf964a52006f620e3 | Victoria's Restaurant | [{'id': '4bf58dd8d48988d1c4941735', 'name': 'R... | v-1598811115 | False | 37 King Street East | 43.649298 | -79.376431 | [{'label': 'display', 'lat': 43.64929834396347... | 2377 | M5C 1E9 | CA | Toronto | ON | Canada | [37 King Street East (at Le Meridien King Edwa... | 498556908.0 | at Le Meridien King Edward Hotel | NaN |
| 5 | 4ae094a7f964a5208f8021e3 | SCHOOL Restaurant | [{'id': '4bf58dd8d48988d143941735', 'name': 'B... | v-1598811115 | False | 70 Fraser Ave. | 43.637775 | -79.424297 | [{'label': 'display', 'lat': 43.6377753703417,... | 6398 | M6K 3C5 | CA | Toronto | ON | Canada | [70 Fraser Ave. (at Liberty St.), Toronto ON M... | 38118429.0 | at Liberty St. | NaN |
| 6 | 4ad4c060f964a5207bf720e3 | Gio Rana's Really Really Nice Restaurant | [{'id': '4bf58dd8d48988d110941735', 'name': 'I... | v-1598811115 | False | 1220 Queen St E | 43.663367 | -79.330425 | [{'label': 'display', 'lat': 43.66336657355032... | 1912 | M4M 1L7 | CA | Toronto | ON | Canada | [1220 Queen St E, Toronto ON M4M 1L7, Canada] | NaN | NaN | NaN |
| 7 | 4b7b268cf964a52061542fe3 | Tender Trap Restaurant | [{'id': '4bf58dd8d48988d145941735', 'name': 'C... | v-1598811115 | False | 580 Parliament St | 43.667724 | -79.369485 | [{'label': 'display', 'lat': 43.66772379650776... | 2590 | M4X 1P8 | CA | Toronto | ON | Canada | [580 Parliament St (at Wellesley St), Toronto ... | NaN | at Wellesley St | NaN |
| 8 | 4ad4c05cf964a520cef520e3 | Sassafraz \| Cafe \| Restaurant \| Private Events | [{'id': '4bf58dd8d48988d171941735', 'name': 'E... | v-1598811115 | False | 100 Cumberland Street | 43.670342 | -79.391041 | [{'label': 'display', 'lat': 43.67034226367314... | 4144 | M5R 1A6 | CA | Toronto | ON | Canada | [100 Cumberland Street (Bellair St), Toronto O... | NaN | Bellair St | Yorkville |
| 9 | 4ad4c060f964a5207ff720e3 | Rol San Restaurant 龍笙棧 | [{'id': '4bf58dd8d48988d1f5931735', 'name': 'D... | v-1598811115 | False | 323 Spadina Ave. | 43.654318 | -79.39865 | [{'label': 'display', 'lat': 43.65431754076345... | 4174 | M5T 2E9 | CA | Toronto | ON | Canada | [323 Spadina Ave. (at D'Arcy St.), Toronto ON ... | NaN | at D'Arcy St. | Kensington Market |


In [65]:
category_id = []
category_name = []

for x in range(len(dataframe)):
    category_id.append(dataframe['categories'][x][0].get('id', ''))
    category_name.append(dataframe['categories'][x][0].get('name', ''))

print('The restaurant categories are:\n', category_name)

['Italian Restaurant', 'Chinese Restaurant', 'Thai Restaurant', 'Korean Restaurant', 'Bar', 'Caribbean Restaurant', 'Indian Restaurant', 'American Restaurant', 'Breakfast Spot', 'Dim Sum Restaurant', 'Wine Bar', 'Ethiopian Restaurant', 'Vietnamese Restaurant', 'Diner', 'Japanese Restaurant', 'New American Restaurant', 'Event Space', 'Noodle House', 'Szechuan Restaurant', 'Restaurant']


In [73]:
print('The dataset consists of', len(category_id), 'restaurants.')
print('There are', len(set(category_id)), 'different types of restaurants in the data.')

The dataset consists of 50 restaurants.
There are 20 different types of restaurants in the data.
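
The category extraction above indexes `[0]` on every venue's `categories` list, which would raise an `IndexError` for a venue that has no category at all. A minimal guard, assuming the same list-of-dicts shape shown in the `categories` column, could look like the sketch below; the helper name `primary_category` and the sample records are hypothetical.

```python
def primary_category(categories, field='name', default=''):
    """Return a field of the first category, or a default if the list is empty."""
    if not categories:
        return default
    return categories[0].get(field, default)

# Hypothetical venue records shaped like the 'categories' column above
sample = [
    [{'id': '4bf58dd8d48988d14a941735', 'name': 'Vietnamese Restaurant'}],
    [],  # a venue without any category
]
names = [primary_category(c) for c in sample]
print(names)
```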


In [19]:
# Lists for storing information from venues
r_name = []
r_latitude = []
r_longitude = []
r_rating = []
r_ratingSignals = []
r_type = []
r_shortName = []

In [20]:
for i in dataframe['id']:

    # Init retrial counter once per venue, outside the while loop,
    # so that it is not reset to zero on every retrial
    j = 0

    # While loop to enable retrial if bad response
    while True:

        try:

            # create the API request URL
            temp_url_venue = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(
                i, # venue id
                CLIENT_ID,
                CLIENT_SECRET,
                VERSION)

            print(i)

            temp_response_venue = requests.get(temp_url_venue).json()

            # Look everything up before appending, so that a failed attempt
            # cannot leave the lists with different lengths
            venue = temp_response_venue['response']['venue']
            location = venue['location']
            category = venue['categories'][0]

            r_name.append(venue.get('name', ''))
            r_latitude.append(location.get('lat', np.nan))
            r_longitude.append(location.get('lng', np.nan))
            r_rating.append(venue.get('rating', np.nan))
            r_ratingSignals.append(venue.get('ratingSignals', np.nan))
            r_type.append(category.get('name', ''))
            r_shortName.append(category.get('shortName', ''))

            # Sleep long enough to ensure compliance with Foursquare API guidelines
            time.sleep(0.51)

        except Exception:
            # Sleep long enough to ensure compliance with Foursquare API guidelines
            time.sleep(0.51)

            # Add to retrial counter
            j += 1
            print('Retrial ' + str(j))

            # Get back to while loop = retrial
            continue

        # Break the while loop if successful
        break

5d3127c715cfea0007e44d52
4ae77362f964a52069ab21e3
4ad4c05cf964a520e1f520e3
4ada5d5bf964a520e92121e3
4ad4c05cf964a52006f620e3
4ae094a7f964a5208f8021e3
4ad4c060f964a5207bf720e3
4b7b268cf964a52061542fe3
4ad4c05cf964a520cef520e3
4ad4c060f964a5207ff720e3
4bd47e6fcfa7b7139f2924da
4ad4c05ff964a52018f720e3
4b072e9df964a52009f922e3
4b02a076f964a520234922e3
5c33c7cd8194fc002c66f72e
4f872713e4b0abaa00979185
4ad4c05cf964a520cdf520e3
4b266f05f964a520657b24e3
4ad4c05cf964a520dff520e3
4ba2d00cf964a5203e1b38e3
4d0fb7b7a3d9721e0fb1d0fd
4ae29812f964a520288f21e3
4b223f5af964a520ba4424e3
4c6345c358810f477c71091e
59719d3c65211f46b1172eb2
4bc3ad5e461576b0db037f32
4bce1ec0cc8cd13a9ac8c3cf
576c61e5498ef74633413b66
4aef8854f964a5201cd921e3
5750b013498e755287c6de97
4b0c6669f964a5207e3c23e3
587bae87286804584acd2a7a
4b19c8b7f964a52017e423e3
5e1f32b49dfc69000731470f
4ad4c05ff964a52048f720e3
4b074bb1f964a52077fb22e3
4ada6d8df964a520832221e3
4b22e814f964a520175024e3
4b9fcc3df964a520a43f37e3
4b4d3b51f964a520e4ce26e3
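
The loop above keeps retrying a venue until the request succeeds, so a venue that always fails would stall the notebook. One possible alternative is to cap the number of attempts. The sketch below is only an illustration of that idea: `call_with_retries` and `flaky_fetch` are hypothetical names, and the stand-in function replaces the real API call so the example runs without network access.

```python
import time

def call_with_retries(fetch, max_retries=3, delay=0.51):
    """Call fetch() until it succeeds or max_retries attempts have failed."""
    last_error = None
    for attempt in range(1, max_retries + 1):
        try:
            return fetch()
        except Exception as error:  # broad on purpose in this sketch
            last_error = error
            print('Retrial', attempt)
            time.sleep(delay)
    raise RuntimeError('giving up after {} attempts'.format(max_retries)) from last_error

# Stand-in for a venue request that fails twice before succeeding
calls = {'n': 0}

def flaky_fetch():
    calls['n'] += 1
    if calls['n'] < 3:
        raise ValueError('bad response')
    return {'name': 'demo venue'}

result = call_with_retries(flaky_fetch, max_retries=5, delay=0.0)
print(result, calls['n'])
```

In the notebook, `fetch` would wrap the `requests.get(...).json()` call for one venue, and a venue that still fails after the last attempt could be logged and skipped instead of blocking the run.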


### Methodology

Now that the restaurant data has been gathered, it will be aggregated by restaurant type and the mean rating of each type will be computed. The type with the highest mean rating will be chosen as the type of restaurant to open. Lastly, the restaurants will be placed on a map to scout a suitable place for the new restaurant.

In [74]:
restaurant_data = {'Name': r_name, 
                   'Latitude': r_latitude,
                   'Longitude': r_longitude,
                   'Rating': r_rating,
                   'N': r_ratingSignals,
                   'Type': r_type,
                   'Short': r_shortName}

df_restaurants = pd.DataFrame(data=restaurant_data)
df_restaurants

| | Name | Latitude | Longitude | Rating | N | Type | Short |
|---|---|---|---|---|---|---|---|
| 0 | est Restaurant | 43.658911 | -79.349035 | NaN | NaN | American Restaurant | American |
| 1 | Mi Mi Restaurant | 43.666293 | -79.349079 | 8.3 | 55.0 | Vietnamese Restaurant | Vietnamese |
| 2 | Docks Restaurant & Night Club The | 43.641806 | -79.354171 | NaN | NaN | Bar | Bar |
| 3 | The Hot House Restaurant & Bar | 43.648824 | -79.373702 | 6.8 | 191.0 | American Restaurant | American |
| 4 | Victoria's Restaurant | 43.649298 | -79.376431 | 7.7 | 9.0 | Restaurant | Restaurant |
| 5 | SCHOOL Restaurant | 43.637775 | -79.424297 | 7.7 | 370.0 | Breakfast Spot | Breakfast |
| 6 | Gio Rana's Really Really Nice Restaurant | 43.663367 | -79.330425 | 7.3 | 77.0 | Italian Restaurant | Italian |
| 7 | Tender Trap Restaurant | 43.667724 | -79.369485 | NaN | NaN | Chinese Restaurant | Chinese |
| 8 | Sassafraz \| Cafe \| Restaurant \| Private Events | 43.670342 | -79.391041 | NaN | NaN | Event Space | Event Space |
| 9 | Rol San Restaurant 龍笙棧 | 43.654318 | -79.39865 | 6.6 | 303.0 | Dim Sum Restaurant | Dim Sum |


In [75]:
df_restaurants.dropna()

| | Name | Latitude | Longitude | Rating | N | Type | Short |
|---|---|---|---|---|---|---|---|
| 1 | Mi Mi Restaurant | 43.666293 | -79.349079 | 8.3 | 55.0 | Vietnamese Restaurant | Vietnamese |
| 3 | The Hot House Restaurant & Bar | 43.648824 | -79.373702 | 6.8 | 191.0 | American Restaurant | American |
| 4 | Victoria's Restaurant | 43.649298 | -79.376431 | 7.7 | 9.0 | Restaurant | Restaurant |
| 5 | SCHOOL Restaurant | 43.637775 | -79.424297 | 7.7 | 370.0 | Breakfast Spot | Breakfast |
| 6 | Gio Rana's Really Really Nice Restaurant | 43.663367 | -79.330425 | 7.3 | 77.0 | Italian Restaurant | Italian |
| 9 | Rol San Restaurant 龍笙棧 | 43.654318 | -79.39865 | 6.6 | 303.0 | Dim Sum Restaurant | Dim Sum |
| 12 | Sky Dragon Chinese Restaurant 龍翔酒樓 | 43.652783 | -79.398174 | 6.0 | 72.0 | Dim Sum Restaurant | Dim Sum |
| 13 | ONE Restaurant/Lounge | 43.670809 | -79.393272 | 7.8 | 134.0 | New American Restaurant | New American |
| 16 | Insomnia Restaurant and Lounge | 43.66518 | -79.410966 | 8.8 | 441.0 | Restaurant | Restaurant |
| 17 | Goldstone Noodle Restaurant 金石 | 43.652278 | -79.398039 | 6.1 | 91.0 | Noodle House | Noodles |


In [80]:
print('Of the', df_restaurants.shape[0], 'restaurants', df_restaurants.dropna().shape[0], 'have been rated.')

Of the 50 restaurants 32 have been rated.


In [82]:
df_rated = df_restaurants.dropna()

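# numeric_only=True skips the text columns (Name, Short), which newer pandas versions refuse to average
df_rated_type = df_rated.groupby(['Type']).mean(numeric_only=True)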
df_rated_type = df_rated_type.rename_axis(None, axis=1).reset_index()
df_rated_type = df_rated_type.drop(columns=['Latitude', 'Longitude', 'N'])
df_rated_type

| | Type | Rating |
|---|---|---|
| 0 | American Restaurant | 6.4 |
| 1 | Bar | 8.1 |
| 2 | Breakfast Spot | 6.85 |
| 3 | Chinese Restaurant | 7.066667 |
| 4 | Dim Sum Restaurant | 6.3 |
| 5 | Diner | 7.2 |
| 6 | Indian Restaurant | 6.7 |
| 7 | Italian Restaurant | 7.3 |
| 8 | Japanese Restaurant | 7.7 |
| 9 | Korean Restaurant | 7.55 |
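
To make the aggregation step concrete, the sketch below recomputes a per-type mean by hand for four rated venues that appear in the tables above, using only the standard library. It also shows a mean weighted by the number of ratings (`N`) next to the plain mean. The weighted variant is not what this notebook uses; it is included only as a possible alternative, and the full dataset may contain more venues of each type than the four rows listed here.

```python
from collections import defaultdict
from statistics import mean

# (type, rating, number of ratings) for four rated venues shown above
rows = [
    ('Dim Sum Restaurant', 6.6, 303),
    ('Dim Sum Restaurant', 6.0, 72),
    ('Restaurant', 7.7, 9),
    ('Restaurant', 8.8, 441),
]

# Group the (rating, n) pairs by restaurant type
by_type = defaultdict(list)
for type_name, rating, n in rows:
    by_type[type_name].append((rating, n))

summary = {}
for type_name, pairs in by_type.items():
    plain = mean(r for r, _ in pairs)  # what groupby(...).mean() computes
    weighted = sum(r * n for r, n in pairs) / sum(n for _, n in pairs)
    summary[type_name] = (round(plain, 2), round(weighted, 2))

print(summary)
```

The plain mean treats a venue with 9 ratings the same as one with 441, whereas the weighted mean leans toward the venues that more users have rated.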


In [84]:
# Count number of restaurants by type
df_rated_count = df_rated.groupby(['Type']).count()

# Sum number of ratings by type (numeric columns only, so that text columns are not concatenated)
df_rated_sum = df_rated.groupby(['Type']).sum(numeric_only=True)

In [86]:
# Summary of restaurant ratings
df_summary = pd.merge(df_rated_type, df_rated_count, on='Type')
df_summary = pd.merge(df_summary, df_rated_sum, on='Type')
df_summary = df_summary.drop(columns=['Name', 'Latitude_x', 'Longitude_x', 'Rating_y', 'Short', 'Latitude_y', 'Longitude_y', 'Rating'])
df_summary.rename(columns={"Rating_x": "Average Rating", "N_x": "Count", "N_y": 'Total Ratings'}, inplace=True)
df_summary.sort_values(by=['Average Rating', 'Count'], inplace=True, ascending=[False, False])
df_summary['Average Rating'] = round(df_summary['Average Rating'], 2)
df_summary['Total Ratings'] = df_summary['Total Ratings'].apply(lambda x: int(x))
df_summary = df_summary.rename_axis(None, axis=1).reset_index()
df_summary = df_summary.drop(columns=['index'])

df_summary

| | Type | Average Rating | Count | Total Ratings |
|---|---|---|---|---|
| 0 | New American Restaurant | 8.35 | 2 | 374 |
| 1 | Bar | 8.1 | 1 | 22 |
| 2 | Japanese Restaurant | 7.7 | 1 | 12 |
| 3 | Korean Restaurant | 7.55 | 6 | 374 |
| 4 | Italian Restaurant | 7.3 | 1 | 77 |
| 5 | Wine Bar | 7.3 | 1 | 452 |
| 6 | Vietnamese Restaurant | 7.25 | 2 | 63 |
| 7 | Diner | 7.2 | 1 | 438 |
| 8 | Chinese Restaurant | 7.07 | 3 | 212 |
| 9 | Breakfast Spot | 6.85 | 2 | 402 |


In [96]:
print('Lowest rating for a restaurant was', min(df_rated['Rating']))
print('Highest rating for a restaurant was', max(df_rated['Rating']))
print('Average rating for a restaurant was', round(np.mean(df_rated['Rating']), 2))

# nan = white
# >= 8 = green
# < 8 and >= 7 = yellow
# < 7 = red

colors_rating = []

for y in df_restaurants['Rating']:
    if np.isnan(y):
        colors_rating.append('white')
    elif y >= 8:
        colors_rating.append('green')
    elif y < 7:
        colors_rating.append('red')
    else:
        colors_rating.append('yellow')

Lowest rating for a restaurant was 5.3
Highest rating for a restaurant was 8.9
Average rating for a restaurant was 7.11


In [97]:
colors_type = colors_rating.copy()

for x in range(len(df_restaurants['Type'])):
    if df_restaurants['Type'][x] == 'New American Restaurant':
        colors_type[x] = 'blue'

### Results

It turns out that the type of restaurant with the highest average rating is the New American Restaurant, so the decision is to open a New American Restaurant.

To decide the location of the new restaurant, the 50 restaurants comprising the data are plotted on a map of Toronto. The restaurants are color-coded with a traffic light scheme: restaurants with a rating of 8 or higher are plotted as green dots, restaurants with a rating below 8 but at least 7 as yellow dots, and restaurants with a rating below 7 as red dots. Restaurants which have not been rated are plotted as white dots.

To highlight the locations of the existing New American Restaurants, their dots are given a blue ring so that they can be found on the map.

In [109]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, type1, name1, rating1, ring_color, fill_color in zip(
        df_restaurants['Latitude'], df_restaurants['Longitude'],
        df_restaurants['Type'], df_restaurants['Name'], df_restaurants['Rating'],
        colors_type, colors_rating):
    label = '{}, {}, {}'.format(type1, name1, rating1)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=ring_color,       # ring: blue for New American, otherwise same as fill
        fill=True,
        fill_color=fill_color,  # fill: traffic light color of the rating
        fill_opacity=0.7).add_to(map_toronto)
    

map_toronto

It turns out that there is a cluster of low-rated restaurants in the Chinatown area. This area, around the intersection of Dundas Street West and Spadina Avenue, seems a suitable site for opening a New American Restaurant. The vicinity of this site already has ten restaurants in total, which gives reason to believe that there are also customers in this area. While most of the restaurants are Asian, there is also a Breakfast Spot and a Caribbean Restaurant, so a New American Restaurant should not be out of place in this location.

The competition, that is, the other New American Restaurants, is also suitably far away.
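
How far away the competition is can be checked numerically rather than by eye. The sketch below uses the haversine formula, which is not part of the original analysis, to compute distances from the Chinatown site to a few venues. The site coordinates are the ones this notebook uses to center the Chinatown map, and the venue coordinates are copied from the restaurant table; the function name `haversine_m` is an assumption for illustration.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points."""
    earth_radius_m = 6371000.0  # mean Earth radius
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * earth_radius_m * asin(sqrt(a))

site = (43.653866, -79.398334)  # center of the Chinatown map
venues = {
    'Rol San Restaurant': (43.654318, -79.39865),
    'Sky Dragon Chinese Restaurant': (43.652783, -79.398174),
    'Goldstone Noodle Restaurant': (43.652278, -79.398039),
    'ONE Restaurant/Lounge': (43.670809, -79.393272),  # New American
}
distances = {name: haversine_m(*site, *coords) for name, coords in venues.items()}
for name, d in sorted(distances.items(), key=lambda item: item[1]):
    print('{:<32s}{:8.0f} m'.format(name, d))
```

With these coordinates the three Chinatown venues come out within a few hundred meters of the site, while ONE Restaurant/Lounge, the New American venue listed in the table, is roughly 1.9 km away.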

The second map below is centered and zoomed to show the Chinatown area. The five red dots signifying low-rated restaurants, as well as the absence of green dots representing highly rated restaurants, are clearly visible.

In [110]:
map_toronto_2 = folium.Map(location=[43.653866, -79.398334], zoom_start=16)

# add markers to map
for lat, lng, type1, name1, rating1, ring_color, fill_color in zip(
        df_restaurants['Latitude'], df_restaurants['Longitude'],
        df_restaurants['Type'], df_restaurants['Name'], df_restaurants['Rating'],
        colors_type, colors_rating):
    label = '{}, {}, {}'.format(type1, name1, rating1)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=ring_color,       # ring: blue for New American, otherwise same as fill
        fill=True,
        fill_color=fill_color,  # fill: traffic light color of the rating
        fill_opacity=0.7).add_to(map_toronto_2)
    

map_toronto_2

### Discussion

Since this study covered only 50 restaurants, of which 32 had been rated, all conclusions based on this limited data have to be taken with a grain of salt. Nevertheless, two interesting results stood out. The first was how highly rated the [New American](https://en.wikipedia.org/wiki/New_American_cuisine) restaurants were. The style of cuisine was unknown to me before this project, and my appetite for trying dishes in this style has been awakened. The other surprise was how low the ratings of the different types of Chinese restaurants around Chinatown were. Whether these genuinely reflect low quality or result from disappointment caused by excessively high expectations is an interesting question. Notably, the ratings of Korean restaurants in nearby Koreatown were clearly higher than those of their Chinese counterparts.


### Conclusion

This limited-scope study of restaurants in Toronto demonstrated how useful the Foursquare API can be for location-based market research. While the scope of this study was small, with a little extra effort and the incorporation of other data sources, a commercial-level feasibility study for a business plan could be built on this base.