# Making Recommendations Based on Popularity

These datasets are hosted on: https://archive.ics.uci.edu/ml/datasets/Restaurant+%26+consumer+data

They were originally published by: Blanca Vargas-Govea, Juan Gabriel González-Serna, Rafael Ponce-Medellín. Effects of relevant contextual features in the performance of a restaurant recommender system. In RecSys11: Workshop on Context Aware Recommender Systems (CARS-2011), Chicago, IL, USA, October 23, 2011.

## Restaurants data

In [2]:
import numpy as np
import pandas as pd

In [3]:
# rating_final.csv
url = 'https://drive.google.com/file/d/1ptu4AlEXO4qQ8GytxKHoeuS1y4l_zWkC/view?usp=sharing' 
path = 'https://drive.google.com/uc?export=download&id='+url.split('/')[-2]
frame = pd.read_csv(path)

# chefmozcuisine.csv
url = 'https://drive.google.com/file/d/1S0_EGSRERIkSKW4D8xHPGZMqvlhuUzp1/view?usp=sharing' 
path = 'https://drive.google.com/uc?export=download&id='+url.split('/')[-2]
cuisine = pd.read_csv(path)

# geoplaces2.csv
url = 'https://drive.google.com/file/d/1ee3ib7LqGsMUksY68SD9yBItRvTFELxo/view?usp=sharing' 
path = 'https://drive.google.com/uc?export=download&id='+url.split('/')[-2]
geodata = pd.read_csv(path, encoding = 'CP1252') # change encoding to 'mbcs' in Windows
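The three loads above repeat the same string manipulation: the file id is taken from the Google Drive share link and appended to a direct-download prefix. As a minimal sketch, that step could be wrapped in a small helper. The function name `drive_download_url` is hypothetical, and it assumes the share link always has the shape `.../file/d/<FILE_ID>/view?usp=sharing`.

```python
def drive_download_url(share_url: str) -> str:
    """Turn a Google Drive share link into a direct-download link."""
    # The file id is the second-to-last path segment of a link shaped like
    # https://drive.google.com/file/d/<FILE_ID>/view?usp=sharing
    file_id = share_url.split('/')[-2]
    return 'https://drive.google.com/uc?export=download&id=' + file_id


# Same share link as the one used for rating_final.csv
share_url = 'https://drive.google.com/file/d/1ptu4AlEXO4qQ8GytxKHoeuS1y4l_zWkC/view?usp=sharing'
download_url = drive_download_url(share_url)
print(download_url)
```

The returned string can be passed to `pd.read_csv` exactly as `path` is in the cell above.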

In the `frame` dataset we have the ratings users have given to places. Ratings go from 0 to 2.

In [4]:
frame.head(3)

Unnamed: 0,userID,placeID,rating,food_rating,service_rating
0,U1077,135085,2,2,2
1,U1077,135038,2,2,1
2,U1077,132825,2,2,2
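The stated 0 to 2 range can be checked directly. The sketch below uses a small made-up table with the same three rating columns as `frame`; the same `agg` call should apply to the real data.

```python
import pandas as pd

# Toy stand-in for `frame` (values are made up for illustration)
toy = pd.DataFrame({
    'rating': [2, 2, 0, 1],
    'food_rating': [2, 1, 0, 2],
    'service_rating': [2, 1, 0, 1],
})

rating_cols = ['rating', 'food_rating', 'service_rating']

# Smallest and largest value of each rating column
value_range = toy[rating_cols].agg(['min', 'max'])
print(value_range)
```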


In the `geodata` dataset we have info about the places. We will only use the `name` column.

In [5]:
geodata.head(2)

Unnamed: 0,placeID,latitude,longitude,the_geom_meter,name,address,city,state,country,fax,...,alcohol,smoking_area,dress_code,accessibility,price,url,Rambience,franchise,area,other_services
0,134999,18.915421,-99.184871,0101000020957F000088568DE356715AC138C0A525FC46...,Kiku Cuernavaca,Revolucion,Cuernavaca,Morelos,Mexico,?,...,No_Alcohol_Served,none,informal,no_accessibility,medium,kikucuernavaca.com.mx,familiar,f,closed,none
1,132825,22.147392,-100.983092,0101000020957F00001AD016568C4858C1243261274BA5...,puesto de tacos,esquina santos degollado y leon guzman,s.l.p.,s.l.p.,mexico,?,...,No_Alcohol_Served,none,informal,completely,low,?,familiar,f,open,none


In [6]:
places =  geodata[['placeID', 'name']]
places.head()

Unnamed: 0,placeID,name
0,134999,Kiku Cuernavaca
1,132825,puesto de tacos
2,135106,El Rincón de San Francisco
3,132667,little pizza Emilio Portes Gil
4,132613,carnitas_mata


In the `cuisine` dataset we have the type of cuisine that restaurants offer.

In [7]:
cuisine.head(3)

Unnamed: 0,placeID,Rcuisine
0,135110,Spanish
1,135109,Italian
2,135107,Latin_American


## Popularity/Quality based recommender system

Let's group the ratings by place and look at each place's average rating. This is an **explicit** rating given by users.

In [8]:
rating = pd.DataFrame(frame.groupby('placeID')['rating'].mean())
rating.sort_values("rating", ascending=False).head()

placeID,rating
132955,2.0
135034,2.0
134986,2.0
132922,1.833333
132755,1.8


The top rated places have a perfect score of 2/2. But how many reviews do these places have?

In [9]:
frame.query("placeID==132955")

Unnamed: 0,userID,placeID,rating,food_rating,service_rating
934,U1004,132955,2,2,2
960,U1061,132955,2,2,2
996,U1059,132955,2,1,2
1014,U1097,132955,2,2,1
1080,U1096,132955,2,2,2


Looks like only 5 people rated this place. Maybe they're just the owner's friends! Or maybe it really is a top-quality place, but too niche to recommend to the masses.

We can also look at how many times each restaurant has received a rating. The ratings count is an **implicit** rating.

In [10]:
rating['rating_count'] = frame.groupby('placeID')['rating'].count()
rating.sort_values("rating_count", ascending=False).head()

placeID,rating,rating_count
135085,1.333333,36
132825,1.28125,32
135032,1.178571,28
135052,1.28,25
132834,1.0,25


Some places have been rated around 30 times. They are more popular than the top-rated places, but received lower explicit ratings.
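The mean and the count can also be computed in a single pass with named aggregation, and the count can then be used to drop places whose average rests on too few ratings. This is only a sketch on made-up data: the threshold `min_ratings` is a hypothetical value, not something taken from the dataset.

```python
import pandas as pd

# Toy ratings table with the same column names as `frame` (made-up values)
toy = pd.DataFrame({
    'placeID': [1, 1, 1, 1, 2, 2, 3],
    'rating':  [2, 1, 2, 1, 2, 2, 0],
})

# Explicit signal (mean) and implicit signal (count) in one groupby
stats = toy.groupby('placeID').agg(
    rating=('rating', 'mean'),
    rating_count=('rating', 'count'),
)

min_ratings = 3  # hypothetical cut-off, to be tuned on the real data

# Keep only places with enough ratings, best average first
reliable = stats[stats['rating_count'] >= min_ratings].sort_values('rating', ascending=False)
print(reliable)
```

In this toy table, place 2 has a perfect average but only two ratings, so it is filtered out and only place 1 remains.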

Let's locate the most popular place, and get some info about it:

In [12]:
# placeID of the most popular place
top_popular_placeID = rating.sort_values('rating_count', ascending=False).head(1).index[0]

# name of the most popular place
places[places['placeID']==top_popular_placeID]

Unnamed: 0,placeID,name
121,135085,Tortas Locas Hipocampo


In [13]:
# cuisine of the most popular place
cuisine[cuisine['placeID']==top_popular_placeID]

Unnamed: 0,placeID,Rcuisine
44,135085,Fast_Food


The most popular place is "Tortas Locas Hipocampo", a fast food place that has received 36 ratings, with an average score of 1.33.

### Challenge:

Find a hybrid system to sort restaurants, so that you can recommend the "best" places: restaurants that are both highly rated and popular.

In [14]:
# The three most common cuisine types are Mexican, International and American.
cuisine.groupby('Rcuisine').count().sort_values(by='placeID', ascending=False).head(3)

Rcuisine,placeID
Mexican,239
International,62
American,59


In [15]:
# my attempt: total service and food ratings per place
# (a sum grows with both the rating values and the number of ratings)
rating['service'] = frame.groupby('placeID')['service_rating'].sum()
rating['food'] = frame.groupby('placeID')['food_rating'].sum()
#rating['rating'] = frame.groupby('placeID')['rating'].sum()

# hybrid score: product of the two totals
rating['popular'] = rating['service']*rating['food']
rating['popular'].sort_values(ascending=False).head()

placeID
135085    2226
132825    1290
135032     900
135052     841
132862     650
Name: popular, dtype: int64

In [16]:
top = rating.merge(places, on='placeID', how='left').sort_values('popular',ascending=False)
most = top.merge(cuisine,on='placeID',how='left').sort_values('popular',ascending=False)
most.head()

Unnamed: 0,placeID,rating,rating_count,service,food,popular,name,Rcuisine
0,135085,1.333333,36,42,53,2226,Tortas Locas Hipocampo,Fast_Food
1,132825,1.28125,32,30,43,1290,puesto de tacos,Mexican
2,135032,1.178571,28,30,30,900,Cafeteria y Restaurant El Pacifico,Cafeteria
3,135032,1.178571,28,30,30,900,Cafeteria y Restaurant El Pacifico,Contemporary
4,135052,1.28,25,29,29,841,La Cantina Restaurante,Bar
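"Cafeteria y Restaurant El Pacifico" appears twice in the output above because it has two rows in `cuisine`, and the left merge repeats the place once per cuisine. If one row per place is wanted, the cuisine labels can be collapsed before merging. The sketch below uses small made-up tables; `toy_cuisine`, `toy_scores` and `cuisine_per_place` are illustrative names.

```python
import pandas as pd

# Toy stand-ins for `cuisine` and a per-place score table (made-up values)
toy_cuisine = pd.DataFrame({
    'placeID': [10, 10, 20],
    'Rcuisine': ['Cafeteria', 'Contemporary', 'Bar'],
})
toy_scores = pd.DataFrame({'placeID': [10, 20, 30], 'popular': [900, 841, 5]})

# One row per place: join all cuisine labels into a single string
cuisine_per_place = (
    toy_cuisine.groupby('placeID')['Rcuisine']
    .agg(', '.join)
    .reset_index()
)

# The left merge now keeps exactly one row per place;
# places without any cuisine entry get NaN in Rcuisine
merged = toy_scores.merge(cuisine_per_place, on='placeID', how='left')
print(merged)
```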


In [17]:
# attempt by Angelos
# number of service ratings per place
rating['rating_count_service'] = frame.groupby(
    'placeID')['service_rating'].count()
# number of food ratings per place
rating['rating_count_food'] = frame.groupby(
    'placeID')['food_rating'].count()
# number of overall-experience ratings per place
rating["rating_count"] = frame.groupby('placeID')['rating'].count()
# hybrid score: mean rating times the three counts
# (for the top rows shown below the three counts are equal, so this is rating * rating_count**3)
rating["popularity"] = rating["rating"] * rating["rating_count"] * rating["rating_count_service"] * rating["rating_count_food"]
# merge in the place names, then the cuisines
total = rating.merge(places, on='placeID', how='left').sort_values(
    "popularity", ascending=False)
top = total.merge(cuisine, on='placeID', how='left')
# select name, cuisine and score
top[['name', 'Rcuisine', 'popularity']]

Unnamed: 0,name,Rcuisine,popularity
0,Tortas Locas Hipocampo,Fast_Food,62208.0
1,puesto de tacos,Mexican,41984.0
2,Cafeteria y Restaurant El Pacifico,Cafeteria,25872.0
3,Cafeteria y Restaurant El Pacifico,Contemporary,25872.0
4,La Cantina Restaurante,Bar,20000.0
...,...,...,...
142,TACOS EL GUERO,Mexican,27.0
143,Arrachela Grill,,27.0
144,Mikasa,Japanese,18.0
145,Carnitas Mata Calle 16 de Septiembre,,16.0
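Both scores above multiply quality by volume, so places with many ratings dominate the ranking. Another possible answer to the challenge, not used in this notebook, is a damped (Bayesian-style) average: each place's mean is pulled toward the global mean, and the pull weakens as the place collects more ratings. The sketch below runs on made-up data, and the prior weight `m` is a hypothetical value that would need tuning.

```python
import pandas as pd

# Toy ratings table with the same column names as `frame` (made-up values)
toy = pd.DataFrame({
    'placeID': [1, 1, 1, 1, 2, 3, 3, 3],
    'rating':  [2, 1, 2, 1, 2, 0, 1, 0],
})

stats = toy.groupby('placeID')['rating'].agg(['mean', 'count'])

C = toy['rating'].mean()  # global mean over all ratings
m = 4                     # hypothetical prior weight ("virtual" ratings at the global mean)

# Damped average: (count * mean + m * C) / (count + m)
stats['weighted'] = (stats['count'] * stats['mean'] + m * C) / (stats['count'] + m)
ranked = stats.sort_values('weighted', ascending=False)
print(ranked)
```

In this toy table, place 2 has a perfect mean from a single rating, yet it ranks just below place 1, whose slightly lower mean is backed by four ratings.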
