# Restaurant Recommendation System

## 1. Introduction

Have you ever wanted to go out to eat but didn't know exactly where? Or perhaps you are in a new city and want to find the best restaurants for your tastes? In this report we build a content-based recommendation system that gives the customer personalized restaurant recommendations. It uses data from my hometown (Braga, Portugal) and assumes the user has already rated several restaurants. Based on those ratings, the system recommends other restaurants located in Braga. The same approach could be applied to other cities, provided we gather the information and ratings of the restaurants located there, which would be useful when travelling to a new city and looking for personalized suggestions of where to eat.

## 2. Data

### 2.1 Acquiring the Data

We will use the Foursquare API to obtain the required data:
* Name, type and rating of different restaurants located in Braga, Portugal.

Import all needed libraries

In [99]:
!pip install geopy
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

import requests # get information stored in a url

from pandas import json_normalize # used to transform JSON data into a pandas dataframe

import pandas as pd # build and manipulate dataframes

import numpy as np

from bs4 import BeautifulSoup # web scraping tool



Foursquare API credentials

In [100]:
CLIENT_ID = 'OA3HCOKYGV2111ZUPWOJLQRPYQF4ZOBLQ2SAWDPTPM41SLVV' # your Foursquare ID
CLIENT_SECRET = 'JRFG4OPQ2XACID1CF10R00KWZNDCME1XEQEDUFQAYYKKCXLQ' # your Foursquare Secret
ACCESS_TOKEN = 'HJ3E4T0ZUJSQNLQRP1PVI5PIIFW30WQEXMZRY5VBVBOHRCFD' # your FourSquare Access Token
VERSION = '20180604'
LIMIT = 100
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: OA3HCOKYGV2111ZUPWOJLQRPYQF4ZOBLQ2SAWDPTPM41SLVV
CLIENT_SECRET:JRFG4OPQ2XACID1CF10R00KWZNDCME1XEQEDUFQAYYKKCXLQ


Get the geospatial coordinates of the city of Braga and use the Foursquare API to explore the venues within a given radius. Convert the relevant information into a pandas dataframe.

In [101]:
# get the geospatial coordinates of the city of Braga, Portugal
address = 'Braga, Portugal'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude_braga = location.latitude
longitude_braga = location.longitude

# explore the venues in a given radius and get them in json format 

radius = 1000
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude_braga, longitude_braga, VERSION, radius, LIMIT)
results = requests.get(url).json()

# assign relevant part of JSON to venues
venues = results['response']['groups'][0]['items']

# transform the JSON data into a pandas dataframe
dataframe = json_normalize(venues)

# filter the information you are interested in and store it in new dataframe columns
dataframe['Category'] = [item['venue']['categories'][0]['name'] for item in venues]
dataframe['Name'] = [item['venue']['name'] for item in venues]
dataframe['Address'] = [item['venue']['location']['formattedAddress'][0] for item in venues]
dataframe['Latitude'] = [item['venue']['location']['lat'] for item in venues]
dataframe['Longitude'] = [item['venue']['location']['lng'] for item in venues]

filtered_columns = ['Name','Category','Address','Latitude','Longitude']
dataframe_filtered = dataframe.loc[:,filtered_columns]

dataframe_filtered

Unnamed: 0,Name,Category,Address,Latitude,Longitude
0,Jardim de Sta. Bárbara,Garden,"R. Dr. Justino da Cruz, 127 (R. Eça de Queiroz)",41.551308,-8.425489
1,Sé de Braga,Church,R. D. Paio Mendes,41.549835,-8.427574
2,DeGema Hamburgueria Artesanal,Burger Joint,Pç. Dr. José Augusto Ferreira Salgado (R. Dr. ...,41.551292,-8.425344
3,Dona Petisca,Restaurant,Braga,41.549792,-8.427953
4,Setra,Bar,"Rua de São João, 15",41.550058,-8.425931
...,...,...,...,...,...
95,A Astoria,Portuguese Restaurant,Portugal,41.551572,-8.423084
96,Lado B,Brewery,4700-422 Braga,41.550167,-8.429622
97,Confeitaria Sto. António,Coffee Shop,Portugal,41.552217,-8.427534
98,Lusitana,Bakery,"Rua Dr. Justino Cruz, 127 (Jardim Sta. Bárbara)",41.551325,-8.425460
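
The request URL in the cell above is assembled with `str.format`. As a minimal sketch, the same URL can be built from a dictionary of query parameters using only the standard library, so that each value is URL-encoded. The helper name `build_explore_url`, the placeholder credentials and the example coordinates are assumptions made for illustration; the parameter names are the ones already used in this report.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, lat, lng,
                      version="20180604", radius=1000, limit=100):
    """Return a venues/explore URL with URL-encoded query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": f"{lat},{lng}",   # latitude,longitude pair
        "v": version,
        "radius": radius,       # search radius in metres
        "limit": limit,         # maximum number of venues to return
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)

# placeholder credentials and coordinates, for illustration only
url = build_explore_url("MY_ID", "MY_SECRET", 41.5454, -8.4265)
print(url)
```

Keeping the parameters in a dictionary makes it easier to change the radius or the limit without editing a long format string.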


### 2.2 Data Preprocessing

Now, let's select all the venues with 'Restaurant' in the Category column.

In [102]:
# create a function that filters a dataframe based on a specific word located in a specific column
def search_word(word, dataframe, column):
    """
    Given a word (string), a pandas dataframe and a column (string), return a dataframe with only
    the entries that contain the input word in the input column.
    """
    # boolean mask: True where the column value contains the word (missing values count as no match)
    mask = dataframe[column].str.contains(word, na=False)
    return dataframe.loc[mask, :]

dataframe_braga_restaurants = search_word("Restaurant", dataframe_filtered, "Category")
dataframe_braga_restaurants

Unnamed: 0,Name,Category,Address,Latitude,Longitude
3,Dona Petisca,Restaurant,Braga,41.549792,-8.427953
6,Michizaki,Japanese Restaurant,"R. D. Frei Caetano Brandão, 169",41.548418,-8.428323
11,Cozinha da Sé,Portuguese Restaurant,"R. D. Frei Caetano Brandão, 95",41.550021,-8.428584
13,Anjo Verde,Vegetarian / Vegan Restaurant,"Lg. da Praça Velha, 21 (ao Arco da Porta Nova)",41.550206,-8.429015
21,Adega Malhoa,Portuguese Restaurant,"R. D. Paio Mendes, 17",41.549642,-8.428807
23,Lakkana,Thai Restaurant,"R. Don Gualdim Pais, 34",41.549186,-8.427645
25,La Piola,Italian Restaurant,R. D. Afonso Henriques,41.548922,-8.427519
26,Lapa Sushi,Asian Restaurant,"Praça da República, 4",41.551688,-8.423488
27,Nikko,Sushi Restaurant,Largo de S. Paulo,41.548215,-8.427458
28,Casa de Pasto das Carvalheiras,Tapas Restaurant,"R. D. Afonso Henriques, 8",41.548484,-8.428799


Let's check if ratings are available for these venues. 

In [103]:
for i in dataframe_braga_restaurants.index.values:
    venue_id = venues[i]['venue']['id']  # ID of the i-th venue (not the ID of its category)

    url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&oauth_token={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET,ACCESS_TOKEN, VERSION)

    result = requests.get(url).json()
    try:
        print("Rating is:", result['response']['venue']['rating'])
    except KeyError:
        # the response holds no rating (or no venue details) for this venue
        pass

print("If only this output is generated, none of the venues has been rated yet.")

If only this output is generated, none of the venues has been rated yet.


Since no ratings were returned, we need to obtain them elsewhere. Here, I will scrape the information stored in this [link](https://d7leadfinder.com/app/view-leads/1680829/), which contains a list of restaurants in Braga with their respective ratings.

In [104]:
url_ratings = 'https://d7leadfinder.com/app/view-leads/1680829/'

# obtain the data from the url and convert it to a BeautifulSoup object
data  = requests.get(url_ratings).text 
soup = BeautifulSoup(data, 'html5lib')

tables = soup.find_all("tbody") #find tables
table = tables[0] #select table of interest

table_rows = table.find_all("tr") #select table rows

# get the name and rating of the restaurants and store them in separate lists
name = []
rating = []

for row in table_rows:
    cells = row.find_all("td")
    link = cells[0].a
    # the restaurant name is sometimes embedded in a hyperlink and sometimes plain text
    name.append(link.text if link is not None else cells[0].text)
    rating.append(cells[11].text)
        
# create a dataframe with the name and rating of each restaurant
braga_restaurants_rating = {"Name": name,
                            "Rating": rating }

df_restaurants_braga_rating = pd.DataFrame(braga_restaurants_rating)
df_restaurants_braga_rating

Unnamed: 0,Name,Rating
0,Bira dos Namorados,4.8 (2308)
1,Michizaki,4.8 (573)
2,Restaurant,4.4 (799)
3,Restaurante O Jacó,4.5 (833)
4,Lakkana,4.6 (354)
...,...,...
200,Take Away Wok-Grill,4.3 (881)
201,Mammamia - Gelato I...,4.4 (36)
202,Carreira do Tiro,4.3 (258)
203,Boutique do Leitão,4 (1)


Let us clean the data: drop the rows that have no rating and convert each rating to a single number (this means we exclude the number of votes from the rating column). The number of votes could be used to build a better recommendation system, but for now let's keep it simple.

In [105]:
# dropping the rows with null rating values
df_restaurants_braga_rating = (df_restaurants_braga_rating[df_restaurants_braga_rating != '']
                               .dropna()
                               .reset_index(drop=True))

# dropping the number of votes from the rating column, e.g. '4.8 (2308)' -> 4.8
df_restaurants_braga_rating["Rating"] = (df_restaurants_braga_rating["Rating"]
                                         .str.split(" ").str[0]
                                         .astype(float))
df_restaurants_braga_rating

Unnamed: 0,Name,Rating
0,Bira dos Namorados,4.8
1,Michizaki,4.8
2,Restaurant,4.4
3,Restaurante O Jacó,4.5
4,Lakkana,4.6
...,...,...
196,Take Away Wok-Grill,4.3
197,Mammamia - Gelato I...,4.4
198,Carreira do Tiro,4.3
199,Boutique do Leitão,4.0
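
The cleaning step above discards the number of votes. As a minimal sketch of how both values could be kept, the helper below splits a scraped string such as `4.8 (2308)` into a rating and a vote count. The name `parse_rating` and the regular expression are illustrative assumptions about the scraped format, not part of the original notebook.

```python
import re

# rating, optional decimals, then the vote count in parentheses, e.g. "4.8 (2308)"
RATING_PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*\((\d+)\)\s*$")

def parse_rating(text):
    """Return (rating, votes) for a string like '4.8 (2308)', or None if it does not match."""
    match = RATING_PATTERN.match(text)
    if match is None:
        return None
    return float(match.group(1)), int(match.group(2))

print(parse_rating("4.8 (2308)"))
print(parse_rating("4 (1)"))
print(parse_rating(""))
```

With the votes available, a restaurant rated 4 by a single voter could later be treated with more caution than one rated 4.8 by thousands.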


We want to include the ratings in the main dataframe `dataframe_braga_restaurants`. This will be accomplished by:

1. Editing the values of the **Name** column of the rating dataframe `df_restaurants_braga_rating` so that they match the values of the **Name** column of the `dataframe_braga_restaurants` dataframe. *Be aware that the previously defined function search_word() can be very helpful for this step*.

2. Creating a new rating dataframe with the names of the main dataframe `dataframe_braga_restaurants` and the respective ratings.

3. Filtering the entries of the main dataframe based on the new rating dataframe, and adding the ratings of the respective restaurants.
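
Step 1 is done by hand in this report. As a rough sketch of how candidate renames could be suggested automatically, the standard-library `difflib` module can pair each main-dataframe name with its closest rating-dataframe name. The function name `suggest_renames`, the case-insensitive comparison and the `cutoff` value are assumptions; the suggestions would still need a manual check.

```python
from difflib import get_close_matches

def suggest_renames(main_names, rating_names, cutoff=0.6):
    """Map rating-table names to the closest main-table names (case-insensitive)."""
    lowered = {name.lower(): name for name in rating_names}
    exact = set(rating_names)
    suggestions = {}
    for target in main_names:
        if target in exact:
            continue  # already identical, nothing to rename
        close = get_close_matches(target.lower(), list(lowered), n=1, cutoff=cutoff)
        if close:
            suggestions[lowered[close[0]]] = target
    return suggestions

# small illustrative sample of names from both tables
renames = suggest_renames(["Brac", "Nikko"], ["BRAC", "Lakkana"])
print(renames)
```

The resulting dictionary has the same shape as the mapping passed to `replace()` in the next cell, so accepted suggestions can be copied into it.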

In [106]:
### 1. replace the names of the rating dataframe by the names of the main dataframe 
df_restaurants_braga_rating["Name"] = df_restaurants_braga_rating["Name"].replace({
                                             'BLB ( Braga Loves Bifa...':'BLB - Braga Loves Bifana',
                                             'Sale&Dolce / Nikko' : 'Nikko',
                                             'Tasquinha Dom Ferreira' : 'Tasca D. Ferreira',
                                             'Restaurante Dom Aug...' : 'Dom Augusto',
                                             'Restaurante Ignacio' : 'Ignácio',
                                             'Restaurante Trotas' : 'Trotas',
                                             'Alma dEça': "Alma d'Eça",
                                             'Restaurante O Norte da...':'Norte da China',
                                             'Cibo No Prato. Rest...':'Um Cibo no Prato',
                                             'Restaurante O Bacalhau':'O Bacalhau',
                                             'Café Astória': 'A Astoria',
                                             'Botafogo':'Restaurante Botafogo',
                                             'BRAC' : 'Brac'})

# correct entries such as the one named *Sale&Dolce / Nikko*, which represents two different restaurants
saledolce = {"Name": 'Sale & Dolce',
             "Rating": 4.1}

# pd.concat returns a new dataframe, so the result must be assigned back
# (DataFrame.append was removed in pandas 2.0)
df_restaurants_braga_rating = pd.concat([df_restaurants_braga_rating, pd.DataFrame([saledolce])], ignore_index=True)

### 2. create a new rating dataframe with the corrected names and respective ratings
df_rating = (df_restaurants_braga_rating[df_restaurants_braga_rating["Name"].isin(dataframe_braga_restaurants["Name"])]
             .sort_values(by="Name")
             .reset_index(drop=True))

df_rating

Unnamed: 0,Name,Rating
0,A Astoria,3.8
1,Adega Malhoa,4.2
2,Alfacinha,4.7
3,Alma d'Eça,4.5
4,BLB - Braga Loves Bifana,4.3
5,Boutique do Leitão,4.0
6,Brac,4.6
7,Dom Augusto,4.6
8,Dona Petisca,4.6
9,Gosto Superior,4.7


In [107]:
### 3. Select the entries of the main dataframe that have a rating, and include its value in a new column named "Rating"
dataframe_braga_restaurants = (dataframe_braga_restaurants[dataframe_braga_restaurants["Name"].isin(df_rating["Name"])]
                               .sort_values(by="Name")
                               .reset_index(drop=True))

# both dataframes are sorted by name and re-indexed from 0, so the ratings align row by row
# (this assumes each name appears exactly once in both dataframes)
dataframe_braga_restaurants["Rating"] = df_rating["Rating"]
dataframe_braga_restaurants

Unnamed: 0,Name,Category,Address,Latitude,Longitude,Rating
0,A Astoria,Portuguese Restaurant,Portugal,41.551572,-8.423084,3.8
1,Adega Malhoa,Portuguese Restaurant,"R. D. Paio Mendes, 17",41.549642,-8.428807,4.2
2,Alfacinha,Vegetarian / Vegan Restaurant,R. D. Gonçalo Pereira,41.548897,-8.427012,4.7
3,Alma d'Eça,Sushi Restaurant,R. Eça de Queirós,41.551688,-8.425949,4.5
4,BLB - Braga Loves Bifana,Modern European Restaurant,Largo Da Senhora A Branca,41.551941,-8.416705,4.3
5,Boutique do Leitão,Restaurant,R. Eça de Queirós,41.551668,-8.426193,4.0
6,Brac,Restaurant,Campo das Carvalheiras,41.548646,-8.428869,4.6
7,Dom Augusto,Portuguese Restaurant,Rua de São Vicente 222,41.555397,-8.422152,4.6
8,Dona Petisca,Restaurant,Braga,41.549792,-8.427953,4.6
9,Gosto Superior,Vegetarian / Vegan Restaurant,Praça Mouzinho de Albuquerque,41.553455,-8.420367,4.7
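
The step above attaches the ratings by relying on both dataframes being sorted identically. A sketch of an alternative, assuming names are unique in both tables, is to join on the **Name** column with `merge`, which does not depend on row order. The two small dataframes below are illustrative samples, not the full data.

```python
import pandas as pd

# small illustrative samples of the main table and the rating table
restaurants = pd.DataFrame({
    "Name": ["Brac", "Adega Malhoa", "Setra"],
    "Category": ["Restaurant", "Portuguese Restaurant", "Bar"],
})
ratings = pd.DataFrame({
    "Name": ["Adega Malhoa", "Brac", "Lakkana"],
    "Rating": [4.2, 4.6, 4.6],
})

# inner join keeps only names present in both tables;
# validate raises an error if a name is duplicated on either side
merged = (restaurants.merge(ratings, on="Name", how="inner", validate="one_to_one")
          .sort_values("Name")
          .reset_index(drop=True))
print(merged)
```

The `validate` argument turns a silent misalignment into an explicit error, which is useful when names come from two independently scraped sources.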


Now, let's look at the different categories and edit them so we can build a better recommendation system. Some editing will be made based on the following category descriptions:
* **Portuguese Restaurant**: Traditional Portuguese cuisine.
* **Vegetarian / Vegan Restaurant**: Vegetarian / Vegan options only.
* **Modern European Restaurant**: Modern approach to Mediterranean cuisine.
* **Healthy Food Restaurant**: Includes vegetarian / vegan options as well as non-vegetarian healthy food.
* **Sushi Restaurant vs Japanese Restaurant**: Sushi restaurants focus mainly on sushi, while Japanese restaurants serve traditional Japanese dishes.

**Important Note**: If a restaurant fits in more than one category, it is reasonable to insert it more than once under different categories (e.g. Michizaki serves traditional Japanese cuisine as well as sushi, so we will insert two different entries for this restaurant).

In [108]:
index_portuguese_restaurants = [5,17,19,21]
index_modern_european = [6,22]
index_tapas = [8]
index_healthy = [20]

dataframe_braga_restaurants.loc[index_portuguese_restaurants,"Category"] = 'Portuguese Restaurant'
dataframe_braga_restaurants.loc[index_modern_european,"Category"] = 'Modern European Restaurant'
dataframe_braga_restaurants.loc[index_tapas,"Category"] = 'Tapas Restaurant'
dataframe_braga_restaurants.loc[index_healthy,"Category"] = 'Healthy Food Restaurant'

michizaki = {"Name": dataframe_braga_restaurants.loc[14,"Name"],
             "Category": 'Sushi Restaurant',
             "Address" : dataframe_braga_restaurants.loc[14,"Address"],
             "Latitude" : dataframe_braga_restaurants.loc[14,"Latitude"],
             "Longitude" : dataframe_braga_restaurants.loc[14,"Longitude"],
             "Rating": 4.8}

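# DataFrame.append was removed in pandas 2.0, so the extra row is added with pd.concat
dataframe_braga_restaurants_new = pd.concat([dataframe_braga_restaurants, pd.DataFrame([michizaki])], ignore_index=True)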
dataframe_braga_restaurants_new

Unnamed: 0,Name,Category,Address,Latitude,Longitude,Rating
0,A Astoria,Portuguese Restaurant,Portugal,41.551572,-8.423084,3.8
1,Adega Malhoa,Portuguese Restaurant,"R. D. Paio Mendes, 17",41.549642,-8.428807,4.2
2,Alfacinha,Vegetarian / Vegan Restaurant,R. D. Gonçalo Pereira,41.548897,-8.427012,4.7
3,Alma d'Eça,Sushi Restaurant,R. Eça de Queirós,41.551688,-8.425949,4.5
4,BLB - Braga Loves Bifana,Modern European Restaurant,Largo Da Senhora A Branca,41.551941,-8.416705,4.3
5,Boutique do Leitão,Portuguese Restaurant,R. Eça de Queirós,41.551668,-8.426193,4.0
6,Brac,Modern European Restaurant,Campo das Carvalheiras,41.548646,-8.428869,4.6
7,Dom Augusto,Portuguese Restaurant,Rua de São Vicente 222,41.555397,-8.422152,4.6
8,Dona Petisca,Tapas Restaurant,Braga,41.549792,-8.427953,4.6
9,Gosto Superior,Vegetarian / Vegan Restaurant,Praça Mouzinho de Albuquerque,41.553455,-8.420367,4.7


Since we are building a content-based recommendation system, the categorical data should be encoded with the One Hot Encoding technique. Each feature (in this case, each restaurant category) becomes a column that holds 1 if the restaurant matches that category and 0 if it doesn't.

At this point you may be wondering why we are doing this if each restaurant fits in just one category. A restaurant can have different menu options, such as Michizaki (which serves both sushi and traditional Japanese food). However, attributing different categories to the same restaurant has a drawback: since each restaurant has only one general rating, the rating becomes ambiguous (the rating **4.8** for the restaurant **Michizaki** applies to both its sushi and its traditional Japanese food, despite these being different categories). Still, it is reasonable to assume that if a restaurant has a very high or very low rating, you will probably enjoy or not enjoy its food across all categories. On a separate note, this feature could be useful for the user rating input, since the user can give the same restaurant different ratings under different categories.

In [109]:
# work on an explicit copy so that adding columns does not trigger a SettingWithCopyWarning
dataframe_braga_restaurants_freq = dataframe_braga_restaurants_new.copy()

for category in dataframe_braga_restaurants_freq["Category"].unique():
    # 1.0 where the restaurant belongs to this category, 0.0 otherwise
    dataframe_braga_restaurants_freq[category] = (dataframe_braga_restaurants_freq["Category"] == category).astype(float)

dataframe_braga_restaurants_onehot = dataframe_braga_restaurants_freq.drop(columns=['Category','Address','Latitude','Longitude','Rating'])
dataframe_braga_restaurants_onehot


Unnamed: 0,Name,Portuguese Restaurant,Vegetarian / Vegan Restaurant,Sushi Restaurant,Modern European Restaurant,Tapas Restaurant,Italian Restaurant,Thai Restaurant,Japanese Restaurant,Chinese Restaurant,Healthy Food Restaurant
0,A Astoria,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Adega Malhoa,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Alfacinha,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Alma d'Eça,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,BLB - Braga Loves Bifana,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Boutique do Leitão,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Brac,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Dom Augusto,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Dona Petisca,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
9,Gosto Superior,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
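
The encoding above is built with an explicit loop. As a minimal sketch, pandas also offers `get_dummies`, which produces the same kind of table in one call. The small dataframe below is an illustrative sample; note that `get_dummies` orders the category columns alphabetically, unlike the loop, which follows order of first appearance.

```python
import pandas as pd

# illustrative sample: Michizaki appears once per category, as in this report
restaurants = pd.DataFrame({
    "Name": ["Adega Malhoa", "Michizaki", "Michizaki"],
    "Category": ["Portuguese Restaurant", "Japanese Restaurant", "Sushi Restaurant"],
})

# one column per category, holding 1.0 where the row matches and 0.0 otherwise
onehot = pd.get_dummies(restaurants["Category"], dtype=float)
onehot.insert(0, "Name", restaurants["Name"])
print(onehot)
```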


## 3. Content-based recommendation system

A content-based recommendation system recommends new items to the user based on the user's preferences. In this case, we are going to figure out which restaurant categories the user prefers and use them to suggest new restaurants.

In order to build a content-based recommendation system, the user's ratings of different restaurants are needed. I will construct a user dataset named `userInput` based on my previous experiences.

*Note*: As the user visits more restaurants and provides a rating for each, the dataset `userInput` will grow and more accurate suggestions will be made.

In [110]:
name = ['Il Fiume', 'Sale & Dolce', 'Otsu Biru', 'Pórtico', 'Temple San', 'Café Aires', 'Pausa Útil']
category = ['Italian Restaurant', 'Italian Restaurant', 'Sushi Restaurant', 'Portuguese Restaurant', 'Sushi Restaurant', 'Portuguese Restaurant','Vegetarian / Vegan Restaurant']
rating = [4,4.9,4.5,4.7,4,3,4.5]

userInput = {"Name": name,
               "Category": category,
               "Rating": rating}

dataframe_user_rating = pd.DataFrame(userInput)
dataframe_user_rating

Unnamed: 0,Name,Category,Rating
0,Il Fiume,Italian Restaurant,4.0
1,Sale & Dolce,Italian Restaurant,4.9
2,Otsu Biru,Sushi Restaurant,4.5
3,Pórtico,Portuguese Restaurant,4.7
4,Temple San,Sushi Restaurant,4.0
5,Café Aires,Portuguese Restaurant,3.0
6,Pausa Útil,Vegetarian / Vegan Restaurant,4.5


Let's apply the One Hot Encoding technique to the user's input dataset.

In [111]:
# copy the user table so that the original dataframe is left untouched
dataframe_user_rating_freq = dataframe_user_rating.copy()

for category in dataframe_braga_restaurants_onehot.drop(columns='Name').columns.values:
    # 1.0 where the rated restaurant belongs to this category, 0.0 otherwise
    dataframe_user_rating_freq[category] = (dataframe_user_rating_freq["Category"] == category).astype(float)

dataframe_user_rating_onehot = dataframe_user_rating_freq.drop(columns=['Category','Rating'])
dataframe_user_rating_onehot

Unnamed: 0,Name,Portuguese Restaurant,Vegetarian / Vegan Restaurant,Sushi Restaurant,Modern European Restaurant,Tapas Restaurant,Italian Restaurant,Thai Restaurant,Japanese Restaurant,Chinese Restaurant,Healthy Food Restaurant
0,Il Fiume,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
1,Sale & Dolce,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
2,Otsu Biru,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Pórtico,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Temple San,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Café Aires,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Pausa Útil,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Now, let's calculate the weights using the user's rating dataset.

In [112]:
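# rows = categories, columns = rated restaurants
weights_df = dataframe_user_rating_onehot.drop(columns="Name").transpose()
# weight of a category = sum of the user's ratings for restaurants in that category
weights_df["Weights"] = weights_df.dot(dataframe_user_rating["Rating"])
userCategory = weights_df["Weights"].sort_index(ascending=True)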
userCategory

Chinese Restaurant               0.0
Healthy Food Restaurant          0.0
Italian Restaurant               8.9
Japanese Restaurant              0.0
Modern European Restaurant       0.0
Portuguese Restaurant            7.7
Sushi Restaurant                 8.5
Tapas Restaurant                 0.0
Thai Restaurant                  0.0
Vegetarian / Vegan Restaurant    4.5
Name: Weights, dtype: float64

In [113]:
dataframe_restaurants_onehot_sorted = dataframe_braga_restaurants_onehot.drop(columns='Name').sort_index(axis=1)
weights_df_sorted = weights_df.sort_index(axis=0)
categoryTable = dataframe_restaurants_onehot_sorted.copy()
categoryTable

Unnamed: 0,Chinese Restaurant,Healthy Food Restaurant,Italian Restaurant,Japanese Restaurant,Modern European Restaurant,Portuguese Restaurant,Sushi Restaurant,Tapas Restaurant,Thai Restaurant,Vegetarian / Vegan Restaurant
0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
3,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
5,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
6,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
7,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
8,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
9,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0


We now compute a score for each restaurant: the sum of the user's weights over the restaurant's categories, divided by the total of all weights. These scores are used to build the recommendation table.

In [114]:
recommendationValues = ((userCategory*categoryTable).sum(axis=1))/userCategory.sum()
recommendationValues

0     0.260135
1     0.260135
2     0.152027
3     0.287162
4     0.000000
5     0.260135
6     0.000000
7     0.260135
8     0.000000
9     0.152027
10    0.152027
11    0.260135
12    0.300676
13    0.000000
14    0.000000
15    0.287162
16    0.000000
17    0.260135
18    0.260135
19    0.260135
20    0.000000
21    0.260135
22    0.000000
23    0.260135
24    0.260135
25    0.287162
dtype: float64
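
The weights and scores above can be reproduced by hand from the user's ratings. The sketch below uses plain Python and the seven ratings from `userInput`; the helper name `score` is an assumption made for illustration. Because a weight is a sum of ratings, a category the user has rated more often weighs more: Portuguese (4.7 + 3.0 = 7.7) outranks Vegetarian / Vegan (4.5) despite its lower average.

```python
from collections import defaultdict

# (category, rating) pairs taken from the user's input table
user_ratings = [
    ("Italian Restaurant", 4.0),
    ("Italian Restaurant", 4.9),
    ("Sushi Restaurant", 4.5),
    ("Portuguese Restaurant", 4.7),
    ("Sushi Restaurant", 4.0),
    ("Portuguese Restaurant", 3.0),
    ("Vegetarian / Vegan Restaurant", 4.5),
]

# weight of a category = sum of the user's ratings in that category
weights = defaultdict(float)
for category, rating in user_ratings:
    weights[category] += rating

total = sum(weights.values())  # normalising constant

def score(category):
    """Score of a single-category restaurant: its category weight over the total weight."""
    return weights.get(category, 0.0) / total

for category in ["Italian Restaurant", "Sushi Restaurant", "Portuguese Restaurant", "Thai Restaurant"]:
    print(category, round(score(category), 6))
```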

In [115]:
recommendationValues.sort_values(ascending=False)

12    0.300676
15    0.287162
3     0.287162
25    0.287162
21    0.260135
24    0.260135
1     0.260135
23    0.260135
5     0.260135
7     0.260135
11    0.260135
0     0.260135
17    0.260135
18    0.260135
19    0.260135
9     0.152027
2     0.152027
10    0.152027
13    0.000000
14    0.000000
22    0.000000
20    0.000000
8     0.000000
16    0.000000
6     0.000000
4     0.000000
dtype: float64

In [116]:
recommendationTable = dataframe_braga_restaurants_new.loc[recommendationValues.sort_values(ascending=False).index.values,:]
recommendationTable

Unnamed: 0,Name,Category,Address,Latitude,Longitude,Rating
12,La Piola,Italian Restaurant,R. D. Afonso Henriques,41.548922,-8.427519,4.4
15,Nikko,Sushi Restaurant,Largo de S. Paulo,41.548215,-8.427458,4.1
3,Alma d'Eça,Sushi Restaurant,R. Eça de Queirós,41.551688,-8.425949,4.5
25,Michizaki,Sushi Restaurant,"R. D. Frei Caetano Brandão, 169",41.548418,-8.428323,4.8
21,Tasca D. Ferreira,Portuguese Restaurant,R. S. Vicente,41.554479,-8.423207,4.6
24,Um Cibo no Prato,Portuguese Restaurant,Braga,41.551852,-8.416348,4.5
1,Adega Malhoa,Portuguese Restaurant,"R. D. Paio Mendes, 17",41.549642,-8.428807,4.2
23,Trotas,Portuguese Restaurant,Rua do Raio,41.551326,-8.417657,4.5
5,Boutique do Leitão,Portuguese Restaurant,R. Eça de Queirós,41.551668,-8.426193,4.0
7,Dom Augusto,Portuguese Restaurant,Rua de São Vicente 222,41.555397,-8.422152,4.6


Note that the same restaurant can be recommended to you more than once (e.g. **Michizaki**), and each category of that restaurant has its own place in the recommendation table. In contrast to content-based movie recommendation systems, where all categories/genres are combined in one movie entry, here the categories are evaluated separately. This makes sense: when you watch a movie you have no choice over the genres it incorporates, whereas when you go to a restaurant you can choose what type of food to order.

We can further improve the recommendation system by sorting restaurants with equal scores by rating. In this way, among restaurants with the same score, the higher-rated ones appear first in the recommendation table.

In [157]:
recommendationTable["weighted"] = recommendationValues.sort_values(ascending=False)
grouped = recommendationTable.groupby('weighted', sort=False).apply(lambda x: x.sort_values(by="Rating", ascending = False))
recommendationTableFinal = grouped.droplevel("weighted").drop(columns='weighted')
recommendationTableFinal

Unnamed: 0,Name,Category,Address,Latitude,Longitude,Rating
12,La Piola,Italian Restaurant,R. D. Afonso Henriques,41.548922,-8.427519,4.4
25,Michizaki,Sushi Restaurant,"R. D. Frei Caetano Brandão, 169",41.548418,-8.428323,4.8
3,Alma d'Eça,Sushi Restaurant,R. Eça de Queirós,41.551688,-8.425949,4.5
15,Nikko,Sushi Restaurant,Largo de S. Paulo,41.548215,-8.427458,4.1
21,Tasca D. Ferreira,Portuguese Restaurant,R. S. Vicente,41.554479,-8.423207,4.6
7,Dom Augusto,Portuguese Restaurant,Rua de São Vicente 222,41.555397,-8.422152,4.6
24,Um Cibo no Prato,Portuguese Restaurant,Braga,41.551852,-8.416348,4.5
23,Trotas,Portuguese Restaurant,Rua do Raio,41.551326,-8.417657,4.5
19,Restaurante Silvas,Portuguese Restaurant,C. C. Granjinhos (R. 25 de Abril),41.547361,-8.421606,4.4
1,Adega Malhoa,Portuguese Restaurant,"R. D. Paio Mendes, 17",41.549642,-8.428807,4.2
