# A Recommender System for Chinese Restaurants in Toronto
## The capstone project of IBM Data Science Professional Certificate

In this project, I will build a recommender system for anyone who wants to enjoy Chinese food in one of the best restaurants in Toronto. The project will use the Foursquare free API to acquire the information about the restaurants, and give recommendations based on customers' past ratings.

### I. Prepare the data

#### Foursquare API

In [1]:
import requests
import urllib.request
import pandas as pd

In [2]:
CLIENT_ID = 'NELDHJBYBEKTRSPQ3FSZIQAX2FIQMPTUPTJC3G51J2IUYPBP' # your Foursquare ID
CLIENT_SECRET = 'PZUK3RM22COSNA3JXB0KVXFLR1TW0PVQRCPFVPHGMLRO4YJN' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

\* The client id and secret will be hided when this file is uploaded.

#### Get the restaurants list

We will request for the Chinese restaurants near Yonge-Finch (Finch subway station) of Toronto.  
The coordinate is 43.7807387,-79.4162661.   
The category id for Chinese restaurants is 4bf58dd8d48988d145941735.  
We can only get 50 results from Foursquare's free API.

In [3]:
LIMIT = 50 # limit of number of venues returned by Foursquare API
radius = 8000 # define radius
category_id = "4bf58dd8d48988d145941735"
# Yonge-Finch lat and long 43.7807387,-79.4162661
#

# create URL
url = 'https://api.foursquare.com/v2/venues/search?&client_id={}&client_secret={}&v={}&ll={},{}&intent=browse&categoryId={}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    43.7807387,-79.4162661, 
    category_id,
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/search?&client_id=NELDHJBYBEKTRSPQ3FSZIQAX2FIQMPTUPTJC3G51J2IUYPBP&client_secret=PZUK3RM22COSNA3JXB0KVXFLR1TW0PVQRCPFVPHGMLRO4YJN&v=20180605&ll=43.7807387,-79.4162661&intent=browse&categoryId=4bf58dd8d48988d145941735&radius=8000&limit=50'

In [4]:
results = requests.get(url).json()

We only need the basic information of the restaurants. Let's extract these values from the results.

In [6]:
df = []
for venue in results['response']['venues']:
    dict = {
            'id': venue['id'],
            'name' : venue['name']
        }
    for i, ctg in enumerate(venue['categories']):
        dict['ctg'+str(i)] = ctg['name']
    if len(venue['location']['labeledLatLngs']) != 0:
        dict['lat'] = venue['location']['labeledLatLngs'][0]['lat']
        dict['lng'] = venue['location']['labeledLatLngs'][0]['lng']
    df.append(dict)

df = pd.DataFrame.from_dict(df)

In [8]:
df.head()

Unnamed: 0,ctg0,id,lat,lng,name
0,Cantonese Restaurant,4b29222af964a520679924e3,43.81215,-79.357462,Congee Queen 皇后名粥
1,Dim Sum Restaurant,56ace162498e88c7171dd72e,43.844362,-79.387251,漁膳房 Yu Seafood
2,Chinese Restaurant,5c4a5f28acc5f5002c185358,43.85138,-79.408195,The One Fusion Cuisine 聚龍軒
3,Chinese Restaurant,5b634b5f95a722002c977b3e,43.822392,-79.351106,Sam's Congee Delight 黃三記
4,Cantonese Restaurant,534089f4498e9792e622eead,43.843543,-79.377949,Congee Queen 皇后名粥


In [9]:
len(df)

50

#### Get details of the restaurants

In [10]:
data_v=[]
for row in df.iterrows():
    print("Request details of id:", row[1]['id'])
    # create URL
    url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(
        row[1]['id'],
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION
    )
    results = requests.get(url).json()
    dict={}
    dict['name'] = results['response']['venue']['name']

    try:
        dict['parent'] = results['response']['venue']['parent']['categories'][0]['name']
    except:
        pass

    try:
        dict['hours'] = results['response']['venue']['hours']['status']
    except:
        pass

    try:
        l = []
        for attr in results['response']['venue']['attributes']['groups']:
            l.append(attr['summary'])
        dict['attributes'] = l
    except:
        pass

    try:
        l = []
        for cat in results['response']['venue']['categories']:
            l.append(cat['name'])
        dict['categories'] = l
    except:
        pass

    try:
        l = []
        for tip in results['response']['venue']['tips']['groups'][0]['items']:
            l.append(tip['text'])
        dict['tips'] = l
    except:
        pass

    try:
        dict['likes',results['response']['venue']['likes']['count']]
    except:
        pass

    try:
        dict['rating',results['response']['venue']['rating']]
    except:
        pass
    
    data_v.append(dict)

Request details of id: 4b29222af964a520679924e3
Request details of id: 56ace162498e88c7171dd72e
Request details of id: 5c4a5f28acc5f5002c185358
Request details of id: 5b634b5f95a722002c977b3e
Request details of id: 534089f4498e9792e622eead
Request details of id: 4bc1cfa4b492d13a6f76a660
Request details of id: 53ece74a498edc23fc8556f9
Request details of id: 4b0ec556f964a520c95a23e3
Request details of id: 4aebd5a7f964a520f0c421e3
Request details of id: 4b300a45f964a520f0f424e3
Request details of id: 4c90e5059087199ca2c6af31
Request details of id: 4b51613af964a520164c27e3
Request details of id: 537795de498e50bda372186e
Request details of id: 4bd396d041b9ef3b799c00e6
Request details of id: 4b91e835f964a520c9de33e3
Request details of id: 59a04cdaad910e6c34fb49ae
Request details of id: 4b2e5594f964a52069de24e3
Request details of id: 524b46f611d2999c1d6d7a24
Request details of id: 4ae71b0cf964a52078a821e3
Request details of id: 4ae66ccbf964a520eba621e3
Request details of id: 4ceea52a7db3224b4

In [11]:
details_df = pd.DataFrame(data_v)
details_df.head()

Unnamed: 0,attributes,categories,hours,name,parent,tips
0,"[$$, Reservations, Credit Cards, Outdoor Seati...",[Cantonese Restaurant],Open until 1:00 AM,Congee Queen 皇后名粥,Shopping Mall,[Try the congee]
1,"[$$, Reservations, Dinner, Lunch & more, Parking]","[Dim Sum Restaurant, Seafood Restaurant]",Open until Midnight,漁膳房 Yu Seafood,Shopping Mall,"[Very good dimsum, service and setting!]"
2,"[$$, Dinner, Lunch & more]","[Chinese Restaurant, Dim Sum Restaurant]",,The One Fusion Cuisine 聚龍軒,,"[Be prepare to wait, table will be given to fr..."
3,"[$, Brunch & Lunch]","[Chinese Restaurant, Cantonese Restaurant]",Closed until 8:30 AM tomorrow,Sam's Congee Delight 黃三記,,[The best congee in town! The wait will be qui...
4,,"[Cantonese Restaurant, Chinese Restaurant]",Open until 1:00 AM,Congee Queen 皇后名粥,,[Turnip and congee]


In [12]:
len(details_df)

50

There are much more information available for us to build the model. However, Foursquare only give us a few for free.

In this case, we found that for each restaurant there are more than one value in the categories, like Cantonese Restaurant, Dim Sum Restaurant, and so on. This is because even they are all Chinese restaurants, they are very different! As a Chinese food lover, one should understand that these subcategoies are very different. So it will be very helpful if the system can recommend restaurants for customers that have particular interest in some of the categories.

Before next step, let's save it in csv files first.

In [15]:
df.to_csv("restaurants.csv")

In [16]:
details_df.to_csv("res-details.csv")

#### Creating labels
First, we need to find out how many subcategories there are in the dataset.

In [21]:
l = list(details_df.categories)
labels = list(set([j for i in l for j in i]))
labels

['Bubble Tea Shop',
 'Asian Restaurant',
 'Cantonese Restaurant',
 'Event Space',
 'Cha Chaan Teng',
 'Halal Restaurant',
 'Dessert Shop',
 'Szechuan Restaurant',
 'Convention Center',
 'Dim Sum Restaurant',
 'Seafood Restaurant',
 'Chinese Restaurant',
 'Hong Kong Restaurant',
 'Noodle House',
 'BBQ Joint',
 'Buffet',
 'Bakery',
 'Taiwanese Restaurant',
 'Thai Restaurant',
 'Shanghai Restaurant']

In [28]:
labels_df = pd.DataFrame()
for l in labels:
    for i in range(0,50):
        if l in details_df.categories[i]:
            labels_df.at[i,l] = 1

In [29]:
labels_table = labels_df.sort_index(axis = 0).fillna(0)

In [37]:
res_df = df[['id','name']].join(labels_table)
res_df.head()

Unnamed: 0,id,name,Bubble Tea Shop,Asian Restaurant,Cantonese Restaurant,Event Space,Cha Chaan Teng,Halal Restaurant,Dessert Shop,Szechuan Restaurant,...,Seafood Restaurant,Chinese Restaurant,Hong Kong Restaurant,Noodle House,BBQ Joint,Buffet,Bakery,Taiwanese Restaurant,Thai Restaurant,Shanghai Restaurant
0,4b29222af964a520679924e3,Congee Queen 皇后名粥,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,56ace162498e88c7171dd72e,漁膳房 Yu Seafood,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,5c4a5f28acc5f5002c185358,The One Fusion Cuisine 聚龍軒,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,5b634b5f95a722002c977b3e,Sam's Congee Delight 黃三記,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,534089f4498e9792e622eead,Congee Queen 皇后名粥,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


### II. Create the user profile

In [32]:
userInput = [
    {'id':'534089f4498e9792e622eead', 'rating': 5},
    {'id':'4b2e5594f964a52069de24e3', 'rating': 4.5},
    {'id':'4ae66ccbf964a520eba621e3', 'rating': 5},
    {'id':'4b37b700f964a520e94425e3', 'rating': 3.5},
    {'id':'543cb4f5498ee8034fabc321', 'rating': 2},
]

In [33]:
inputRes = pd.DataFrame(userInput)

In [34]:
inputRes

Unnamed: 0,id,rating
0,534089f4498e9792e622eead,5.0
1,4b2e5594f964a52069de24e3,4.5
2,4ae66ccbf964a520eba621e3,5.0
3,4b37b700f964a520e94425e3,3.5
4,543cb4f5498ee8034fabc321,2.0


Filter out the restaurants from the input.

In [38]:
userRes = res_df[res_df['id'].isin(inputRes['id'].tolist())]

In [39]:
userRes

Unnamed: 0,id,name,Bubble Tea Shop,Asian Restaurant,Cantonese Restaurant,Event Space,Cha Chaan Teng,Halal Restaurant,Dessert Shop,Szechuan Restaurant,...,Seafood Restaurant,Chinese Restaurant,Hong Kong Restaurant,Noodle House,BBQ Joint,Buffet,Bakery,Taiwanese Restaurant,Thai Restaurant,Shanghai Restaurant
4,534089f4498e9792e622eead,Congee Queen 皇后名粥,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
16,4b2e5594f964a52069de24e3,Crown Prince Fine Dining 紫京盛宴,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
19,4ae66ccbf964a520eba621e3,Congee Wong 天皇名粥,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
36,4b37b700f964a520e94425e3,Golden Court Abalone Restaurant 黃金閣鮑翅海鮮酒家,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
46,543cb4f5498ee8034fabc321,ZenQ,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0


Reset the index and drop all the id & name columns, leaving only the labels>

In [42]:
userLabelTable = userRes.reset_index(drop=True).drop('id',1).drop('name',1)

In [43]:
userLabelTable

Unnamed: 0,Bubble Tea Shop,Asian Restaurant,Cantonese Restaurant,Event Space,Cha Chaan Teng,Halal Restaurant,Dessert Shop,Szechuan Restaurant,Convention Center,Dim Sum Restaurant,Seafood Restaurant,Chinese Restaurant,Hong Kong Restaurant,Noodle House,BBQ Joint,Buffet,Bakery,Taiwanese Restaurant,Thai Restaurant,Shanghai Restaurant
0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0


Now we can start to learn the user's preferences.

In [45]:
inputRes['rating']

0    5.0
1    4.5
2    5.0
3    3.5
4    2.0
Name: rating, dtype: float64

In [46]:
userProfile = userLabelTable.transpose().dot(inputRes['rating'])
userProfile

Bubble Tea Shop          2.0
Asian Restaurant         0.0
Cantonese Restaurant     8.5
Event Space              0.0
Cha Chaan Teng           0.0
Halal Restaurant         0.0
Dessert Shop             2.0
Szechuan Restaurant      0.0
Convention Center        0.0
Dim Sum Restaurant       8.0
Seafood Restaurant       0.0
Chinese Restaurant      13.5
Hong Kong Restaurant     0.0
Noodle House             0.0
BBQ Joint                0.0
Buffet                   0.0
Bakery                   0.0
Taiwanese Restaurant     2.0
Thai Restaurant          0.0
Shanghai Restaurant      0.0
dtype: float64

Now, we have the weights for every of the user's preferences. This is known as the User Profile. Using this, we can recommend restaurants that satisfy the user's preferences.

### III. Recommend restaurants

In [58]:
rec_df = res_df.set_index('id').drop('name',1)
rec_df.head()

Unnamed: 0_level_0,Bubble Tea Shop,Asian Restaurant,Cantonese Restaurant,Event Space,Cha Chaan Teng,Halal Restaurant,Dessert Shop,Szechuan Restaurant,Convention Center,Dim Sum Restaurant,Seafood Restaurant,Chinese Restaurant,Hong Kong Restaurant,Noodle House,BBQ Joint,Buffet,Bakery,Taiwanese Restaurant,Thai Restaurant,Shanghai Restaurant
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1
4b29222af964a520679924e3,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
56ace162498e88c7171dd72e,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5c4a5f28acc5f5002c185358,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5b634b5f95a722002c977b3e,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
534089f4498e9792e622eead,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [59]:
recommendationTable = ((rec_df*userProfile).sum(axis=1))/(userProfile.sum())
recommendationTable.head()

id
4b29222af964a520679924e3    0.236111
56ace162498e88c7171dd72e    0.222222
5c4a5f28acc5f5002c185358    0.597222
5b634b5f95a722002c977b3e    0.611111
534089f4498e9792e622eead    0.611111
dtype: float64

In [60]:
recommendationTable = recommendationTable.sort_values(ascending=False)
recommendationTable.head()

id
4b37b700f964a520e94425e3    0.833333
5b634b5f95a722002c977b3e    0.611111
534089f4498e9792e622eead    0.611111
4b7895d2f964a52082d82ee3    0.611111
5c4a5f28acc5f5002c185358    0.597222
dtype: float64

In [62]:
#The final recommendation table
df.loc[res_df['id'].isin(recommendationTable.head().keys())]

Unnamed: 0,ctg0,id,lat,lng,name
2,Chinese Restaurant,5c4a5f28acc5f5002c185358,43.85138,-79.408195,The One Fusion Cuisine 聚龍軒
3,Chinese Restaurant,5b634b5f95a722002c977b3e,43.822392,-79.351106,Sam's Congee Delight 黃三記
4,Cantonese Restaurant,534089f4498e9792e622eead,43.843543,-79.377949,Congee Queen 皇后名粥
36,Chinese Restaurant,4b37b700f964a520e94425e3,43.844003,-79.388244,Golden Court Abalone Restaurant 黃金閣鮑翅海鮮酒家
40,Chinese Restaurant,4b7895d2f964a52082d82ee3,43.841508,-79.399478,John's Chinese BBQ Restaurant 敍香園


Now we give 5 recommendations based on user's preference.