## The Best Neighborhood for Opening a Chinese Restaurant in Toronto

In [None]:
!conda install -c conda-forge geopy --yes

import requests 
import pandas as pd 
import numpy as np 
from geopy.geocoders import Nominatim
import folium 

### Introduction

The target audience is my friend and his team, who care about one question: where is the best place for a new Chinese restaurant?

My friend is planning to open a Chinese restaurant in Toronto and is still looking for a location. He asked me to recommend the best neighborhood for it.

### Data

I start from a dataset of Toronto postal codes, boroughs, and neighborhoods, together with the latitude and longitude of each neighborhood. It covers 103 neighborhoods, and the goal is to decide which one is best for opening a Chinese restaurant. The venues in each neighborhood come from Foursquare data.

In [300]:
toronto_neigh = pd.read_csv("toronto_neigh.csv")
toronto_neigh.drop('Unnamed: 0', axis=1, inplace=True)

print("Dataset Shape: ", toronto_neigh.shape)
print('\n')
toronto_neigh.head()

Dataset Shape:  (103, 5)

| | PostalCode | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Harbourfront, Regent Park | 43.65426 | -79.360636 |
| 3 | M6A | North York | Lawrence Heights, Lawrence Manor | 43.718518 | -79.464763 |
| 4 | M7A | Queen's Park | Queen's Park | 43.662301 | -79.389494 |

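The `Unnamed: 0` column dropped above usually means the CSV was saved together with its index. A minimal sketch, assuming that is how `toronto_neigh.csv` was produced: passing `index_col=0` to `pd.read_csv` reads that column back as the index, so nothing has to be dropped afterwards. The two rows below are copied from the table above and only stand in for the real file.

```python
import io

import pandas as pd

# Stand-in for toronto_neigh.csv: the first, unnamed column is the saved index
csv_text = """,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
"""

# index_col=0 turns the unnamed first column into the index instead of an 'Unnamed: 0' column
toronto_neigh = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(toronto_neigh.shape)  # (2, 5)
print(list(toronto_neigh.columns))
```
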

#### Getting all venues in Toronto by using Foursquare data.

In [301]:
CLIENT_ID = 'YOUR_CLIENT_ID'          # your Foursquare client ID (keep it out of shared notebooks)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'  # your Foursquare client secret

VERSION = '20180605'
LIMIT = 100
#radius = 1000

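Hard-coding the client ID and secret leaks them whenever the notebook is shared. A minimal sketch that reads them from environment variables instead; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are hypothetical, so any names will do as long as they are exported before Jupyter starts.

```python
import os

# Hypothetical environment variable names; export them in the shell before starting Jupyter
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', '')
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', '')

VERSION = '20180605'  # API version date used throughout this notebook
LIMIT = 100           # maximum number of venues per explore request

if not CLIENT_ID or not CLIENT_SECRET:
    print('Foursquare credentials are not set; the API requests below will fail.')
```
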
In [187]:
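# function that extracts the category of the venue
def get_category_type(row):
    # raw venue dicts keep the list under 'categories'; flattened rows (e.g. from pd.json_normalize) under 'venue.categories'
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    # a venue may have no category at all
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']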

In [247]:
def getNearbyVenues(names, latitudes, longitudes):
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&limit={}'.format(
            CLIENT_ID, CLIENT_SECRET, VERSION, lat, lng, LIMIT)
        # make the GET request (the timeout keeps a stalled request from hanging the notebook)
        results = requests.get(url, timeout=30).json()['response']['groups'][0]['items']
        # keep only the relevant fields of each nearby venue; get_category_type returns
        # None for a venue without categories instead of raising an IndexError
        venues_list.append([(name, lat, lng, v['venue']['id'],
                             v['venue']['name'], v['venue']['location']['lat'], v['venue']['location']['lng'],
                             get_category_type(v['venue'])) for v in results])

    # flatten the per-neighborhood lists into one DataFrame
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list],
                                 columns=['Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude',
                                          'Venue ID', 'Venue Name', 'Venue Latitude', 'Venue Longitude',
                                          'Venue Category'])
    return nearby_venues

In [248]:
toronto_venues = getNearbyVenues(names=toronto_neigh['Neighborhood'],
                                   latitudes=toronto_neigh['Latitude'],
                                   longitudes=toronto_neigh['Longitude']
                                  )

print('dataset shape: ', toronto_venues.shape)
print('\n')
toronto_venues.head()

dataset shape:  (10167, 8)

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue ID | Venue Name | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|---|
| 0 | Parkwoods | 43.753259 | -79.329656 | 4b8991cbf964a520814232e3 | Allwyn's Bakery | 43.75984 | -79.324719 | Caribbean Restaurant |
| 1 | Parkwoods | 43.753259 | -79.329656 | 4bd4846a6798ef3bd0c5618d | Donalda Golf & Country Club | 43.752816 | -79.342741 | Golf Course |
| 2 | Parkwoods | 43.753259 | -79.329656 | 4b8ec91af964a520053733e3 | Graydon Hall Manor | 43.763923 | -79.342961 | Event Space |
| 3 | Parkwoods | 43.753259 | -79.329656 | 4b149ea4f964a52029a523e3 | Darband Restaurant | 43.755194 | -79.348498 | Middle Eastern Restaurant |
| 4 | Parkwoods | 43.753259 | -79.329656 | 4bdccf4cafe8c9b6da285185 | LCBO | 43.757774 | -79.314257 | Liquor Store |

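Since every call to `getNearbyVenues` goes over the network, it helps to check the parsing step offline. A sketch with a hand-written stand-in for one `explore` response that contains only the keys the function reads (`response`, `groups[0]`, `items`, `venue`). The first venue is copied from the table above; the second, category-less venue is made up to show why the category lookup must tolerate an empty `categories` list. `venues_to_rows` is a hypothetical helper name.

```python
import pandas as pd

# Hand-written stand-in for requests.get(url).json(); only the fields read above are included
sample = {'response': {'groups': [{'items': [
    {'venue': {'id': '4b8991cbf964a520814232e3', 'name': "Allwyn's Bakery",
               'location': {'lat': 43.75984, 'lng': -79.324719},
               'categories': [{'name': 'Caribbean Restaurant'}]}},
    # made-up venue without any category
    {'venue': {'id': 'example-id', 'name': 'Example venue',
               'location': {'lat': 43.75, 'lng': -79.33},
               'categories': []}},
]}]}}


def venues_to_rows(name, lat, lng, payload):
    """Flatten one explore response into one tuple per venue."""
    rows = []
    for item in payload['response']['groups'][0]['items']:
        v = item['venue']
        categories = v.get('categories') or []
        # take the first listed category, as getNearbyVenues does; None when the list is empty
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, v['id'], v['name'],
                     v['location']['lat'], v['location']['lng'], category))
    return rows


columns = ['Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude',
           'Venue ID', 'Venue Name', 'Venue Latitude', 'Venue Longitude', 'Venue Category']
df = pd.DataFrame(venues_to_rows('Parkwoods', 43.753259, -79.329656, sample), columns=columns)
print(df[['Venue Name', 'Venue Category']])
```
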

### Methodology

First, I check whether 'Chinese Restaurant' is among the three most common venue categories of each neighborhood; the neighborhoods where it is are the candidates for a new Chinese restaurant.

Then, using Foursquare data, I get the rating and the number of tips of every Chinese restaurant in these candidate neighborhoods.

Finally, I combine the number of Chinese restaurants with their average rating and average number of tips into one score per neighborhood, and rank the neighborhoods by that score to decide the best one.

#### Getting all Chinese restaurants in Toronto (162) and showing them on the map

In [268]:
toronto_venues_map = toronto_venues[toronto_venues['Venue Category']=='Chinese Restaurant']
toronto_venues_map.reset_index(drop=True, inplace = True)

print('dataset shape: ', toronto_venues_map.shape)
print('\n')
toronto_venues_map.head()

dataset shape:  (162, 8)

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue ID | Venue Name | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|---|
| 0 | Parkwoods | 43.753259 | -79.329656 | 5b6a321d340a58002cc0d9db | Omni Palace Noodle House | 43.771047 | -79.33157 | Chinese Restaurant |
| 1 | Parkwoods | 43.753259 | -79.329656 | 584e235102b60e2d40263821 | 天天渔港 Captain's Catch | 43.774961 | -79.333873 | Chinese Restaurant |
| 2 | Parkwoods | 43.753259 | -79.329656 | 4ae71b0cf964a52078a821e3 | Noodle Delight | 43.772399 | -79.320209 | Chinese Restaurant |
| 3 | Victoria Village | 43.725882 | -79.315572 | 55dded07498eecf46ed3e0d9 | Hakka Legend | 43.726046 | -79.286561 | Chinese Restaurant |
| 4 | Victoria Village | 43.725882 | -79.315572 | 5269be82498e1cf7de5d5dd4 | Super Hakka Restaurant | 43.742892 | -79.304949 | Chinese Restaurant |


In [307]:
address = 'Toronto, ON' # 43.653963, -79.387207

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

In [308]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, neighborhood in zip(toronto_venues_map['Venue Latitude'], toronto_venues_map['Venue Longitude'], toronto_venues_map['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker([lat, lng], radius=3, popup=label, color='blue', fill=True, fill_color='#3186cc',
                        fill_opacity=0.7, parse_html=False).add_to(map_toronto)  

map_toronto

#### Analyzing each neighborhood by venue category.

In [269]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# if Foursquare returned a venue category literally named 'Neighborhood', drop that dummy column
# so it does not collide with the neighborhood-name column added below
toronto_onehot = toronto_onehot.drop(columns='Neighborhood', errors='ignore')

# add the neighborhood column back as the first column
toronto_onehot.insert(0, 'Neighborhood', toronto_venues['Neighborhood'])

toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()

print("Dataset Shape: ", toronto_grouped.shape)
print('\n')
toronto_grouped.head()

Dataset Shape:  (103, 320)

| | Neighborhood | Zoo Exhibit | ATM | Afghan Restaurant | African Restaurant | Airport | Airport Lounge | Airport Service | American Restaurant | Amphitheater | ... | Video Store | Vietnamese Restaurant | Warehouse Store | Whisky Bar | Wine Bar | Wings Joint | Women's Store | Xinjiang Restaurant | Yoga Studio | Zoo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Adelaide, King, Richmond | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.03 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | Agincourt | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 | 0.0 | ... | 0.0 | 0.02 | 0.0 | 0.0 | 0.0 | 0.01 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | Agincourt North, L'Amoreaux East, Milliken, St... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.06 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | Albion Gardens, Beaumond Heights, Humbergate, ... | 0.0 | 0.011905 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.011905 | 0.0 | ... | 0.0 | 0.011905 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | Alderwood, Long Branch | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 | 0.0 | ... | 0.0 | 0.01 | 0.01 | 0.0 | 0.0 | 0.01 | 0.0 | 0.0 | 0.01 | 0.0 |

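Averaging the one-hot columns per neighborhood gives, for each category, its share of that neighborhood's venues. A sketch on hypothetical rows showing that `pd.crosstab` with `normalize='index'` yields the same frequency table in one call. One caveat: `crosstab` drops venues whose category is missing, while the `get_dummies`/`mean` route keeps them as all-zero rows in the denominator.

```python
import numpy as np
import pandas as pd

# Hypothetical venue list with the two columns used above
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Chinese Restaurant', 'Coffee Shop', 'Chinese Restaurant', 'Park',
                       'Coffee Shop', 'Bar'],
})

# Route used in the notebook: one-hot encode, then average per neighborhood
onehot = pd.get_dummies(venues['Venue Category']).astype(float)
via_dummies = onehot.groupby(venues['Neighborhood']).mean()

# Same frequencies in a single call: every row is normalized to sum to 1
via_crosstab = pd.crosstab(venues['Neighborhood'], venues['Venue Category'], normalize='index')

print(via_crosstab)
print(np.allclose(via_dummies.to_numpy(), via_crosstab.to_numpy()))  # True
```
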

#### Getting the 3 most common venues of each neighborhood

In [270]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [271]:
num_top_venues = 3

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
toronto_venues_sorted = pd.DataFrame(columns=columns)
toronto_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    toronto_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

print("Dataset Shape: ", toronto_venues_sorted.shape)
print('\n')
toronto_venues_sorted.head()

Dataset Shape:  (103, 4)

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue |
|---|---|---|---|---|
| 0 | Adelaide, King, Richmond | Coffee Shop | Café | Bar |
| 1 | Agincourt | Chinese Restaurant | Coffee Shop | Indian Restaurant |
| 2 | Agincourt North, L'Amoreaux East, Milliken, St... | Chinese Restaurant | Vietnamese Restaurant | Bubble Tea Shop |
| 3 | Albion Gardens, Beaumond Heights, Humbergate, ... | Coffee Shop | Fast Food Restaurant | Sandwich Place |
| 4 | Alderwood, Long Branch | Burger Joint | Coffee Shop | Breakfast Spot |


#### 10 neighborhoods are suitable for the goal

A neighborhood is considered suitable if 'Chinese Restaurant' is one of its three most common venue categories.

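The same rule can be written directly against the frequency table. One detail matters: the default `sort_values` sort is not stable, so when several categories share the third-highest frequency, whether 'Chinese Restaurant' lands in the top 3 depends on an arbitrary order. A sketch using `DataFrame.rank(axis=1, method='min')`, which gives tied categories their best shared rank, so a tie at the boundary counts as being in the top 3. The frequencies are hypothetical, and this tie rule is an assumption of the sketch: it can admit more neighborhoods than the check below.

```python
import pandas as pd

# Hypothetical category frequencies, one row per neighborhood (shaped like toronto_grouped)
freq = pd.DataFrame(
    {'Chinese Restaurant': [0.3, 0.1, 0.2],
     'Coffee Shop':        [0.3, 0.4, 0.4],
     'Café':               [0.2, 0.2, 0.2],
     'Bar':                [0.1, 0.2, 0.1],
     'Park':               [0.1, 0.1, 0.1]},
    index=['A', 'B', 'C'])

# Rank categories within each row; ties share the smallest rank (1 = most common)
ranks = freq.rank(axis=1, ascending=False, method='min')

# A neighborhood qualifies when Chinese Restaurant ranks 3rd or better
ok = ranks['Chinese Restaurant'] <= 3
print(ok.map({True: 'Y', False: 'N'}))
```

Neighborhood B is rejected because two categories tie for second place, which pushes 'Chinese Restaurant' to rank 4.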
In [272]:
venue_cols = toronto_venues_sorted.columns[1:]  # the three "Most Common Venue" columns

# 'Y' when any of the top-3 venue categories is a Chinese restaurant, otherwise 'N'
is_chinese = toronto_venues_sorted[venue_cols].apply(
    lambda column: column.str.contains('Chinese Restaurant', regex=False))
toronto_venues_sorted['OK for the Chinese restaurant'] = np.where(is_chinese.any(axis=1), 'Y', 'N')

# keep only the suitable neighborhoods
chn_rest_neigh_list = toronto_venues_sorted.loc[toronto_venues_sorted['OK for the Chinese restaurant'] == 'Y',
                                                ['Neighborhood', 'OK for the Chinese restaurant']]
chn_rest_neigh_list = chn_rest_neigh_list.reset_index(drop=True)

print("Dataset Shape: ", chn_rest_neigh_list.shape)
print('\n')
chn_rest_neigh_list.head()

Dataset Shape:  (10, 2)

| | Neighborhood | OK for the Chinese restaurant |
|---|---|---|
| 0 | Agincourt | Y |
| 1 | Agincourt North, L'Amoreaux East, Milliken, St... | Y |
| 2 | Bayview Village | Y |
| 3 | Cedarbrae | Y |
| 4 | Clairlea, Golden Mile, Oakridge | Y |


#### Analyzing the Chinese restaurants in these 10 neighborhoods to decide the best one

These 10 neighborhoods contain 75 Chinese restaurants.

In [258]:
# select the Chinese restaurants located in one of the suitable neighborhoods
# (a boolean mask; DataFrame.append no longer exists in pandas 2.0)
is_suitable = toronto_venues['Neighborhood'].isin(chn_rest_neigh_list['Neighborhood'])
is_chinese = toronto_venues['Venue Category'] == 'Chinese Restaurant'
# a stable sort groups the rows by neighborhood and keeps their order within each neighborhood
toronto_venues_chn_rest = toronto_venues[is_suitable & is_chinese].sort_values('Neighborhood', kind='stable')
                
toronto_venues_chn_rest.reset_index(drop=True, inplace=True)

print("Dataset Shape: ", toronto_venues_chn_rest.shape)
print('\n')
toronto_venues_chn_rest.head()

Dataset Shape:  (75, 8)

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue Category | Venue ID | Venue Latitude | Venue Longitude | Venue Name |
|---|---|---|---|---|---|---|---|---|
| 0 | Agincourt | 43.7942 | -79.262029 | Chinese Restaurant | 58388814809a776da0e00646 | 43.797885 | -79.270585 | Grandeur Palace 華丽宮 (Grandeur Palace 華麗宮) |
| 1 | Agincourt | 43.7942 | -79.262029 | Chinese Restaurant | 5377a256498ea252667e0f7b | 43.787392 | -79.268387 | Congee Me 小米粥鋪 |
| 2 | Agincourt | 43.7942 | -79.262029 | Chinese Restaurant | 5aa5a8c85f68b930df32dc53 | 43.801909 | -79.295409 | Fishman Lobster Clubhouse Restaurant 魚樂軒 |
| 3 | Agincourt | 43.7942 | -79.262029 | Chinese Restaurant | 4aedbb5df964a52069ce21e3 | 43.788068 | -79.266768 | Asian Legend 味香村 |
| 4 | Agincourt | 43.7942 | -79.262029 | Chinese Restaurant | 4b2d6bcaf964a5204ed624e3 | 43.784752 | -79.277787 | Maple Yip Seafood 陸羽海鮮酒家 |


#### Getting the rating and tip count of each Chinese restaurant from Foursquare

In [302]:
chn_rest_rating_tips = pd.DataFrame()

for i in range(toronto_venues_chn_rest.shape[0]):
    venue_id = toronto_venues_chn_rest.loc[i,'Venue ID']
    url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)

    result = requests.get(url, timeout=30).json()
    try:
        rating = result['response']['venue']['rating']
    except KeyError:  # the venue has no rating yet
        rating = np.nan
    try:
        tips = result['response']['venue']['tips']['count']
    except KeyError:  # no tip count in the response
        tips = np.nan
    
    chn_rest_rating_tips.loc[i, 'Neighborhood'] = toronto_venues_chn_rest.loc[i,'Neighborhood']
    chn_rest_rating_tips.loc[i, 'Name'] = toronto_venues_chn_rest.loc[i,'Venue Name']
    chn_rest_rating_tips.loc[i, 'rating'] = rating
    chn_rest_rating_tips.loc[i, 'tips'] = tips

chn_rest_rating_tips['rating'] = chn_rest_rating_tips['rating'].astype(np.float64)

print("Dataset Shape: ", chn_rest_rating_tips.shape)
print('\n')
chn_rest_rating_tips.head()

Dataset Shape:  (75, 4)

| | Neighborhood | Name | rating | tips |
|---|---|---|---|---|
| 0 | Agincourt | Grandeur Palace 華丽宮 (Grandeur Palace 華麗宮) | 7.6 | 0.0 |
| 1 | Agincourt | Congee Me 小米粥鋪 | 7.4 | 2.0 |
| 2 | Agincourt | Fishman Lobster Clubhouse Restaurant 魚樂軒 | 8.8 | 7.0 |
| 3 | Agincourt | Asian Legend 味香村 | 7.0 | 20.0 |
| 4 | Agincourt | Maple Yip Seafood 陸羽海鮮酒家 | 7.2 | 12.0 |

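The two `try`/`except` blocks above guard against venue details that come back without a rating or a tip count. The same logic fits in a small helper built on chained `dict.get` calls that returns `NaN` for anything missing. A sketch: `rating_and_tips` is a hypothetical name, and the two payloads are hand-written, holding only the fields the loop reads (the 8.8 rating and 7 tips are taken from the table above).

```python
import numpy as np


def rating_and_tips(payload):
    """Return (rating, tip count) from a venue-details response, NaN where absent."""
    venue = payload.get('response', {}).get('venue', {})
    rating = venue.get('rating', np.nan)
    tips = venue.get('tips', {}).get('count', np.nan)
    return float(rating), float(tips)


# Hand-written payloads: one complete venue, one response without venue data
full = {'response': {'venue': {'rating': 8.8, 'tips': {'count': 7}}}}
empty = {'response': {}}

print(rating_and_tips(full))   # (8.8, 7.0)
print(rating_and_tips(empty))  # (nan, nan)
```
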

#### Getting the number of Chinese restaurants and their average rating and tip count for each neighborhood

In [303]:
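chn_rest_rating_tips_avg = chn_rest_rating_tips.copy()
# average only the numeric columns; pandas 2.x raises a TypeError on the text column 'Name'
chn_rest_rating_tips_avg = chn_rest_rating_tips_avg.groupby('Neighborhood')[['rating', 'tips']].mean().reset_index()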

chn_rest_rating_tips_avg

| | Neighborhood | rating | tips |
|---|---|---|---|
| 0 | Agincourt | 7.177778 | 11.444444 |
| 1 | Agincourt North, L'Amoreaux East, Milliken, St... | 7.207692 | 14.538462 |
| 2 | Bayview Village | 7.26 | 19.8 |
| 3 | Cedarbrae | 7.233333 | 9.333333 |
| 4 | Clairlea, Golden Mile, Oakridge | 7.166667 | 10.0 |
| 5 | Clarks Corners, Sullivan, Tam O'Shanter | 7.5 | 6.777778 |
| 6 | Dorset Park, Scarborough Town Centre, Wexford ... | 7.35 | 14.666667 |
| 7 | Fairview, Henry Farm, Oriole | 7.557143 | 13.142857 |
| 8 | L'Amoreaux West | 7.43 | 7.7 |
| 9 | Maryvale, Wexford | 7.1 | 11.571429 |


In [304]:
chn_rest_num = toronto_venues_chn_rest[['Neighborhood','Venue ID' ]]
chn_rest_num = chn_rest_num.groupby('Neighborhood').count().reset_index()
chn_rest_num.rename(columns={'Venue ID':'Num'}, inplace = True)

chn_rest_num

| | Neighborhood | Num |
|---|---|---|
| 0 | Agincourt | 9 |
| 1 | Agincourt North, L'Amoreaux East, Milliken, St... | 13 |
| 2 | Bayview Village | 5 |
| 3 | Cedarbrae | 6 |
| 4 | Clairlea, Golden Mile, Oakridge | 3 |
| 5 | Clarks Corners, Sullivan, Tam O'Shanter | 9 |
| 6 | Dorset Park, Scarborough Town Centre, Wexford ... | 6 |
| 7 | Fairview, Henry Farm, Oriole | 7 |
| 8 | L'Amoreaux West | 10 |
| 9 | Maryvale, Wexford | 7 |


In [305]:
# join the counts on the neighborhood name instead of relying on both tables having the same row order
neigh_score = chn_rest_rating_tips_avg.merge(chn_rest_num, on='Neighborhood')

neigh_score

| | Neighborhood | rating | tips | Num |
|---|---|---|---|---|
| 0 | Agincourt | 7.177778 | 11.444444 | 9 |
| 1 | Agincourt North, L'Amoreaux East, Milliken, St... | 7.207692 | 14.538462 | 13 |
| 2 | Bayview Village | 7.26 | 19.8 | 5 |
| 3 | Cedarbrae | 7.233333 | 9.333333 | 6 |
| 4 | Clairlea, Golden Mile, Oakridge | 7.166667 | 10.0 | 3 |
| 5 | Clarks Corners, Sullivan, Tam O'Shanter | 7.5 | 6.777778 | 9 |
| 6 | Dorset Park, Scarborough Town Centre, Wexford ... | 7.35 | 14.666667 | 6 |
| 7 | Fairview, Henry Farm, Oriole | 7.557143 | 13.142857 | 7 |
| 8 | L'Amoreaux West | 7.43 | 7.7 | 10 |
| 9 | Maryvale, Wexford | 7.1 | 11.571429 | 7 |

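The three cells above (average rating and tips, restaurant count, and their join) can also be produced by a single `groupby` with named aggregation. A sketch on hypothetical data shaped like `chn_rest_rating_tips`. Note that `mean` skips missing ratings while `count` on the always-present `Name` column counts every restaurant, which matches how the notebook treats venues without a rating.

```python
import numpy as np
import pandas as pd

# Hypothetical per-restaurant table with the same columns as chn_rest_rating_tips
rest = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B', 'B'],
    'Name': ['r1', 'r2', 'r3', 'r4', 'r5'],
    'rating': [7.0, 8.0, np.nan, 7.5, 6.5],
    'tips': [10.0, 20.0, 3.0, 5.0, 7.0],
})

neigh_summary = (rest.groupby('Neighborhood')
                     .agg(rating=('rating', 'mean'),  # NaN ratings are ignored
                          tips=('tips', 'mean'),
                          Num=('Name', 'count'))      # every restaurant has a name, so all are counted
                     .reset_index())
print(neigh_summary)
```
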

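#### Normalization and ordering by total score

Each of the three columns is divided by its sum over the ten neighborhoods, which puts them on a common scale, and the score of neighborhood $i$ is the sum of the three shares:

$$\text{score}_i = \frac{\text{rating}_i}{\sum_j \text{rating}_j} + \frac{\text{tips}_i}{\sum_j \text{tips}_j} + \frac{\text{Num}_i}{\sum_j \text{Num}_j}$$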

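A small worked example of this score on three hypothetical neighborhoods, using the same division by the column sum as the cell below. Because each normalized column sums to 1, the three criteria carry equal weight and the scores always add up to 3.

```python
import pandas as pd

# Hypothetical inputs for three neighborhoods
df = pd.DataFrame({'Neighborhood': ['A', 'B', 'C'],
                   'rating': [7.0, 7.5, 7.5],
                   'tips': [10.0, 20.0, 10.0],
                   'Num': [4, 2, 4]})

cols = ['rating', 'tips', 'Num']
shares = df[cols] / df[cols].sum()  # divide every column by its total
df['score'] = shares.sum(axis=1)    # rating share + tips share + Num share

print(df.sort_values('score', ascending=False)[['Neighborhood', 'score']])
# B ≈ 1.041, C ≈ 0.991, A ≈ 0.968
```

In the real table below, the rating shares barely vary (0.097 to 0.104, since all average ratings lie between 7.1 and 7.56), while the tips shares range from 0.057 to 0.166 and the Num shares from 0.040 to 0.173, so the ranking is driven mostly by tips and restaurant count.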
In [309]:
neigh_score['Num'] = neigh_score['Num']/neigh_score['Num'].sum()
neigh_score['rating'] = neigh_score['rating']/neigh_score['rating'].sum()
neigh_score['tips'] = neigh_score['tips']/neigh_score['tips'].sum()

neigh_score['score'] = neigh_score['rating'] + neigh_score['tips']+ neigh_score['Num']

neigh_score = neigh_score.sort_values('score', ascending = False)
neigh_score.reset_index(drop=True, inplace=True)

neigh_score

| | Neighborhood | rating | tips | Num | score |
|---|---|---|---|---|---|
| 0 | Agincourt North, L'Amoreaux East, Milliken, St... | 0.098759 | 0.122198 | 0.173333 | 0.39429 |
| 1 | Bayview Village | 0.099476 | 0.166422 | 0.066667 | 0.332564 |
| 2 | Agincourt | 0.098349 | 0.096192 | 0.12 | 0.314541 |
| 3 | Fairview, Henry Farm, Oriole | 0.103547 | 0.110467 | 0.093333 | 0.307348 |
| 4 | Dorset Park, Scarborough Town Centre, Wexford ... | 0.100709 | 0.123275 | 0.08 | 0.303984 |
| 5 | L'Amoreaux West | 0.101805 | 0.064719 | 0.133333 | 0.299858 |
| 6 | Maryvale, Wexford | 0.097283 | 0.097259 | 0.093333 | 0.287876 |
| 7 | Clarks Corners, Sullivan, Tam O'Shanter | 0.102764 | 0.056968 | 0.12 | 0.279732 |
| 8 | Cedarbrae | 0.09911 | 0.078448 | 0.08 | 0.257558 |
| 9 | Clairlea, Golden Mile, Oakridge | 0.098197 | 0.084051 | 0.04 | 0.222248 |


### Result

The neighborhood Agincourt North, L'Amoreaux East, Milliken, Steeles East has the highest score (0.394), ahead of Bayview Village (0.333) and Agincourt (0.315).

### Discussion

Based on this score, Agincourt North, L'Amoreaux East, Milliken, Steeles East is the best neighborhood for opening a new Chinese restaurant. Its Chinese restaurants are listed below.

In [312]:
neigh = neigh_score.loc[0, 'Neighborhood']

# take an explicit copy of the filtered rows (avoids SettingWithCopyWarning), then drop the unneeded columns
the_best_one = toronto_venues_chn_rest[toronto_venues_chn_rest['Neighborhood'] == neigh].copy()
the_best_one = the_best_one.drop(['Venue ID', 'Venue Category', 'Venue Latitude', 'Venue Longitude'], axis=1)
the_best_one.reset_index(drop=True, inplace=True)

the_best_one

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue Name |
|---|---|---|---|---|
| 0 | Agincourt North, L'Amoreaux East, Milliken, St... | 43.815252 | -79.284577 | Fishman Lobster Clubhouse Restaurant 魚樂軒 |
| 1 | Agincourt North, L'Amoreaux East, Milliken, St... | 43.815252 | -79.284577 | Sun's Kitchen 拉麵王 |
| 2 | Agincourt North, L'Amoreaux East, Milliken, St... | 43.815252 | -79.284577 | Grandeur Palace 華丽宮 (Grandeur Palace 華麗宮) |
| 3 | Agincourt North, L'Amoreaux East, Milliken, St... | 43.815252 | -79.284577 | Alton Restaurant 益街坊 |
| 4 | Agincourt North, L'Amoreaux East, Milliken, St... | 43.815252 | -79.284577 | Magic Noodle 大槐樹 |
| 5 | Agincourt North, L'Amoreaux East, Milliken, St... | 43.815252 | -79.284577 | D&R Wings 美華茶餐廳 |
| 6 | Agincourt North, L'Amoreaux East, Milliken, St... | 43.815252 | -79.284577 | Sugar Sweet Cafe 八爪魚 |
| 7 | Agincourt North, L'Amoreaux East, Milliken, St... | 43.815252 | -79.284577 | Fishman Lobster Clubhouse Restaurant 魚樂軒 |
| 8 | Agincourt North, L'Amoreaux East, Milliken, St... | 43.815252 | -79.284577 | Mr Congee Chinese Cuisine 龍粥記 |
| 9 | Agincourt North, L'Amoreaux East, Milliken, St... | 43.815252 | -79.284577 | Congee Town 太皇名粥 |


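Showing the Chinese restaurants of the best neighborhood (Agincourt North, L'Amoreaux East, Milliken, Steeles East) in red on the map, on top of all Chinese restaurants in Toronto in blue.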

In [313]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, neighborhood in zip(toronto_venues_map['Venue Latitude'], toronto_venues_map['Venue Longitude'], toronto_venues_map['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker([lat, lng], radius=3, popup=label, color='blue', fill=True, fill_color='#3186cc',
                        fill_opacity=0.7, parse_html=False).add_to(map_toronto)  

solution_map = toronto_venues_map[toronto_venues_map['Neighborhood'] == neigh_score.loc[0,'Neighborhood']]

# add markers to map
for lat, lng, neighborhood in zip(solution_map['Venue Latitude'], solution_map['Venue Longitude'], solution_map['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker([lat, lng], radius=3, popup=label, color='red', fill=True, fill_color='#3186cc',
                        fill_opacity=0.7, parse_html=False).add_to(map_toronto)  

map_toronto

### Conclusion

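Using Foursquare data, I obtained the number, average rating, and average number of tips of the Chinese restaurants in each candidate neighborhood. Following a simple content-based recommendation approach, I combined them into one score per neighborhood and chose the neighborhood with the best score.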