# Making Recommendations Based on Popularity

These datasets are hosted on: https://archive.ics.uci.edu/ml/datasets/Restaurant+%26+consumer+data

They were originally published by: Blanca Vargas-Govea, Juan Gabriel González-Serna, Rafael Ponce-Medellín. Effects of relevant contextual features in the performance of a restaurant recommender system. In RecSys11: Workshop on Context Aware Recommender Systems (CARS-2011), Chicago, IL, USA, October 23, 2011.

## Restaurants data

In [None]:
import numpy as np
import pandas as pd

In [None]:
# rating_final.csv
url = 'https://drive.google.com/file/d/1ptu4AlEXO4qQ8GytxKHoeuS1y4l_zWkC/view?usp=sharing' 
path = 'https://drive.google.com/uc?export=download&id='+url.split('/')[-2]
frame = pd.read_csv(path)

# chefmozcuisine.csv
url = 'https://drive.google.com/file/d/1S0_EGSRERIkSKW4D8xHPGZMqvlhuUzp1/view?usp=sharing' 
path = 'https://drive.google.com/uc?export=download&id='+url.split('/')[-2]
cuisine = pd.read_csv(path)

# 'geoplaces2.csv'
url = 'https://drive.google.com/file/d/1ee3ib7LqGsMUksY68SD9yBItRvTFELxo/view?usp=sharing' 
path = 'https://drive.google.com/uc?export=download&id='+url.split('/')[-2]
geodata = pd.read_csv(path, encoding = 'CP1252') # change encoding to 'mbcs' in Windows

In the `frame` dataset we have the ratings users have given to places. Ratings go from 0 to 2.

In [None]:
frame.head(3)

,userID,placeID,rating,food_rating,service_rating
0,U1077,135085,2,2,2
1,U1077,135038,2,2,1
2,U1077,132825,2,2,2


In the `geodata` dataset we have info about the places. We will only use the `name` column.

In [None]:
geodata.head(2)

,placeID,latitude,longitude,the_geom_meter,name,address,city,state,country,fax,...,alcohol,smoking_area,dress_code,accessibility,price,url,Rambience,franchise,area,other_services
0,134999,18.915421,-99.184871,0101000020957F000088568DE356715AC138C0A525FC46...,Kiku Cuernavaca,Revolucion,Cuernavaca,Morelos,Mexico,?,...,No_Alcohol_Served,none,informal,no_accessibility,medium,kikucuernavaca.com.mx,familiar,f,closed,none
1,132825,22.147392,-100.983092,0101000020957F00001AD016568C4858C1243261274BA5...,puesto de tacos,esquina santos degollado y leon guzman,s.l.p.,s.l.p.,mexico,?,...,No_Alcohol_Served,none,informal,completely,low,?,familiar,f,open,none


In [None]:
places =  geodata[['placeID', 'name']]
places.head()

,placeID,name
0,134999,Kiku Cuernavaca
1,132825,puesto de tacos
2,135106,El Rincón de San Francisco
3,132667,little pizza Emilio Portes Gil
4,132613,carnitas_mata


In the `cuisine` dataset we have the type of cuisine that restaurants offer.

In [None]:
cuisine.head(3)

,placeID,Rcuisine
0,135110,Spanish
1,135109,Italian
2,135107,Latin_American


## Popularity/Quality based recommender system

Let's group the ratings by place and look at each place's average rating. This is an **explicit** rating given by users.

In [None]:
rating = pd.DataFrame(frame.groupby('placeID')['rating'].mean())
rating.sort_values("rating", ascending=False).head()

placeID,rating
132955,2.0
135034,2.0
134986,2.0
132922,1.833333
132755,1.8


The top-rated places have a perfect score of 2 out of 2. But how many reviews do these places have?

In [None]:
frame.query("placeID==132955")

,userID,placeID,rating,food_rating,service_rating
934,U1004,132955,2,2,2
960,U1061,132955,2,2,2
996,U1059,132955,2,1,2
1014,U1097,132955,2,2,1
1080,U1096,132955,2,2,2


Looks like only 5 people rated this place. Maybe they're just the owner's friends! Or maybe places like this really are top quality, but too niche to recommend to the masses.

We can also look at how many times each restaurant has received a rating. The rating count acts as an **implicit** rating: it reflects popularity, whatever score was given.

In [None]:
rating['rating_count'] = frame.groupby('placeID')['rating'].count()
rating.sort_values("rating_count", ascending=False).head()

placeID,rating,rating_count
135085,1.333333,36
132825,1.28125,32
135032,1.178571,28
135052,1.28,25
132834,1.0,25


Some places have been rated around 30 times. They are more popular than the top-rated places, but received lower explicit ratings.
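
The mean and the count can also be computed in a single pass with named aggregation. The sketch below is only an illustration: the `toy` frame and its values are made up and are not rows from the dataset.

```python
import pandas as pd

# Hypothetical toy ratings, not taken from the dataset.
toy = pd.DataFrame({
    "placeID": [1, 1, 1, 2, 2, 3],
    "rating":  [2, 1, 0, 2, 2, 1],
})

# One groupby yields both the explicit signal (mean rating)
# and the implicit signal (number of ratings).
summary = toy.groupby("placeID").agg(
    rating=("rating", "mean"),
    rating_count=("rating", "count"),
)
print(summary.sort_values("rating_count", ascending=False))
```

Keeping both columns side by side makes it easy to spot places whose high average rests on very few ratings.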

Let's locate the most popular place, and get some info about it:

In [None]:
# placeID of the most popular place (the index label with the highest rating_count)
top_popular_placeID = rating['rating_count'].idxmax()

# name of the most popular place
places[places['placeID']==top_popular_placeID]

,placeID,name
121,135085,Tortas Locas Hipocampo


In [None]:
# cuisine of the most popular place
cuisine[cuisine['placeID']==top_popular_placeID]

,placeID,Rcuisine
44,135085,Fast_Food


The most popular place is "Tortas Locas Hipocampo", a fast-food place that has received 36 ratings and has an average score of 1.33.
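
Instead of looking up the name and cuisine one place at a time, the three tables can be joined on `placeID`. This is a minimal sketch under assumptions: `rating_toy`, `places_toy` and `cuisine_toy` are made-up stand-ins for `rating`, `places` and `cuisine`, with invented values.

```python
import pandas as pd

# Hypothetical stand-ins for `rating`, `places` and `cuisine`.
rating_toy = pd.DataFrame(
    {"rating": [1.5, 2.0], "rating_count": [30, 5]},
    index=pd.Index([101, 102], name="placeID"),
)
places_toy = pd.DataFrame({"placeID": [101, 102], "name": ["Tacos A", "Bistro B"]})
cuisine_toy = pd.DataFrame({"placeID": [101], "Rcuisine": ["Fast_Food"]})

# Left joins keep every rated place, even one without a cuisine entry.
overview = (
    rating_toy.reset_index()
    .merge(places_toy, on="placeID", how="left")
    .merge(cuisine_toy, on="placeID", how="left")
    .sort_values("rating_count", ascending=False)
)
print(overview)
```

If a place is listed with several cuisines, the second merge would repeat its row once per cuisine, so the result may need deduplicating before ranking.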

### Challenge:

Find a hybrid system to sort restaurants, so that you can recommend the "best" places: restaurants that are both highly rated and popular.

In [None]:
# Alex
rating['rating_alex'] = ((rating.rating_count / 10) + (rating.rating * 10)) / 10
# keep only places at or above the mean in both count and rating
popular = rating.rating_count >= rating.rating_count.mean()
well_rated = rating.rating >= rating.rating.mean()
rating.loc[popular & well_rated].sort_values(by='rating_alex', ascending=False).head(10)

placeID,rating,rating_count,rating_alex
135075,1.692308,13,1.822308
135025,1.666667,15,1.816667
135059,1.666667,9,1.756667
135030,1.583333,12,1.703333
132768,1.6,10,1.7
135085,1.333333,36,1.693333
135028,1.533333,15,1.683333
134996,1.555556,9,1.645556
135066,1.5,12,1.62
132825,1.28125,32,1.60125


In [None]:
# Pat
rating['rating_pat'] = (rating['rating'] * rating['rating_count']) / (rating['rating'] + rating['rating_count'])
rating.sort_values(by='rating_pat', ascending=False).head(10)

placeID,rating,rating_count,rating_alex,rating_pat
134986,2.0,8,2.08,1.6
135025,1.666667,15,1.816667,1.5
135075,1.692308,13,1.822308,1.497382
135034,2.0,5,2.05,1.428571
132955,2.0,5,2.05,1.428571
135059,1.666667,9,1.756667,1.40625
132922,1.833333,6,1.893333,1.404255
135030,1.583333,12,1.703333,1.398773
135028,1.533333,15,1.683333,1.391129
132768,1.6,10,1.7,1.37931
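
One possible reading of Pat's formula (this interpretation is an added note, not something stated in the notebook): writing $r$ for a place's average rating and $n$ for its rating count, the score is half the harmonic mean of the two. The second form assumes $r > 0$.

```latex
s_{\text{pat}} = \frac{r\,n}{r + n} = \frac{1}{\frac{1}{r} + \frac{1}{n}},
\qquad
\text{e.g. } r = 2,\ n = 8:\quad s_{\text{pat}} = \frac{2 \cdot 8}{2 + 8} = 1.6
```

Because $n/(r+n) < 1$, the score always stays below $r$ and approaches it as $n$ grows, so the count acts as a mild penalty on places with few ratings.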


In [None]:
# Tal 1
rating['rating_tal_1'] = (rating['rating_count'] * rating['rating']) / 2
rating.sort_values(by='rating_tal_1', ascending=False).head(10)

placeID,rating,rating_count,rating_alex,rating_pat,rating_tal_1
135085,1.333333,36,1.693333,1.285714,24.0
132825,1.28125,32,1.60125,1.231925,20.5
135032,1.178571,28,1.458571,1.130967,16.5
135052,1.28,25,1.53,1.217656,16.0
135038,1.208333,24,1.448333,1.150413,14.5
135062,1.238095,21,1.448095,1.169165,13.0
135060,1.136364,22,1.356364,1.08055,12.5
135042,1.25,20,1.45,1.176471,12.5
132862,1.388889,18,1.568889,1.289398,12.5
132834,1.0,25,1.25,0.961538,12.5


In [None]:
# Tal 2
rating['rating_tal_2'] = (rating['rating_count'] * rating['rating']) / 130
rating.sort_values(by='rating_tal_2', ascending=False).head(10)

placeID,rating,rating_count,rating_alex,rating_pat,rating_tal_1,rating_tal_2
135085,1.333333,36,1.693333,1.285714,24.0,0.369231
132825,1.28125,32,1.60125,1.231925,20.5,0.315385
135032,1.178571,28,1.458571,1.130967,16.5,0.253846
135052,1.28,25,1.53,1.217656,16.0,0.246154
135038,1.208333,24,1.448333,1.150413,14.5,0.223077
135062,1.238095,21,1.448095,1.169165,13.0,0.2
135060,1.136364,22,1.356364,1.08055,12.5,0.192308
135042,1.25,20,1.45,1.176471,12.5,0.192308
132862,1.388889,18,1.568889,1.289398,12.5,0.192308
132834,1.0,25,1.25,0.961538,12.5,0.192308
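
Since `rating` is a per-place mean, `rating_count * rating` is simply the sum of the ratings a place received, and dividing by a positive constant (2 or 130) rescales the score without changing the order. A minimal check on made-up data follows; the `toy` frame is an assumption for illustration.

```python
import pandas as pd

# Hypothetical toy ratings, not taken from the dataset.
toy = pd.DataFrame({
    "placeID": [1, 1, 1, 2, 2, 3],
    "rating":  [2, 1, 0, 2, 2, 1],
})
grouped = toy.groupby("placeID")["rating"]
stats = pd.DataFrame({"rating": grouped.mean(), "rating_count": grouped.count()})

# mean * count recovers the plain sum of ratings per place.
stats["total"] = stats["rating"] * stats["rating_count"]
same_as_sum = bool((stats["total"] - grouped.sum()).abs().max() < 1e-9)

# Dividing by a positive constant leaves the ranking untouched.
order_by_2 = (stats["total"] / 2).sort_values(ascending=False).index.tolist()
order_by_130 = (stats["total"] / 130).sort_values(ascending=False).index.tolist()
print(same_as_sum, order_by_2 == order_by_130)
```

Both variants therefore reward volume heavily: a place with many mediocre ratings can outrank a place with fewer but better ones.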


In [None]:
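# Liane
# highest rating count of any place (36 here); avoids shadowing the built-in max()
max_count = rating['rating_count'].max()
rating_pop = rating.loc[rating.rating_count >= 0.5 * max_count, :]
rating_pop.sort_values('rating', ascending=False)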

placeID,rating,rating_count,rating_alex,rating_pat,rating_tal_1,rating_tal_2
132862,1.388889,18,1.568889,1.289398,12.5,0.192308
135085,1.333333,36,1.693333,1.285714,24.0,0.369231
132825,1.28125,32,1.60125,1.231925,20.5,0.315385
135052,1.28,25,1.53,1.217656,16.0,0.246154
135042,1.25,20,1.45,1.176471,12.5,0.192308
135062,1.238095,21,1.448095,1.169165,13.0,0.2
135038,1.208333,24,1.448333,1.150413,14.5,0.223077
135032,1.178571,28,1.458571,1.130967,16.5,0.253846
135060,1.136364,22,1.356364,1.08055,12.5,0.192308
135058,1.111111,18,1.291111,1.046512,10.0,0.153846


In [None]:
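# Dani
# copy the top-5 slice so new columns are added to an independent frame
df = rating.sort_values("rating_count", ascending=False).head().copy()
df['score'] = df['rating'] * df['rating_count']  # sum of the ratings each place received
df['% of all'] = df['score'] / frame.shape[0] * 100  # score relative to the total number of ratings
df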

placeID,rating,rating_count,rating_alex,rating_pat,rating_tal_1,rating_tal_2,score,% of all
135085,1.333333,36,1.693333,1.285714,24.0,0.369231,48.0,4.134367
132825,1.28125,32,1.60125,1.231925,20.5,0.315385,41.0,3.531438
135032,1.178571,28,1.458571,1.130967,16.5,0.253846,33.0,2.842377
135052,1.28,25,1.53,1.217656,16.0,0.246154,32.0,2.756245
132834,1.0,25,1.25,0.961538,12.5,0.192308,25.0,2.153316


In [None]:
# Kaj
# Global average rating across all reviews
avg_rating = frame['rating'].mean()

# Places with fewer than `min_ratings` ratings
min_ratings = 10
rating_counts = frame.groupby('placeID')['rating'].count()
rater_deficient_places = rating_counts[rating_counts < min_ratings]

# Pad each of those places up to `min_ratings` ratings with the global average.
# DataFrame.append was removed in pandas 2.0, so build the padding rows and concat once.
padding = pd.DataFrame({
    'placeID': rater_deficient_places.index.repeat((min_ratings - rater_deficient_places).to_numpy()),
    'rating': avg_rating,
})
rating_fluffed = pd.concat([frame[['placeID', 'rating']], padding], ignore_index=True)

average_ratings = rating_fluffed.groupby('placeID').mean()
average_ratings.sort_values(by='rating', ascending=False)

placeID,rating
134986.0,1.839966
135075.0,1.692308
135025.0,1.666667
135059.0,1.619983
132768.0,1.600000
...,...
135040.0,0.819897
135086.0,0.800000
132663.0,0.779931
132732.0,0.739966
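
Kaj's padding has a closed form: a place with n < 10 ratings and mean r ends up with (n*r + (10 - n)*C) / 10, where C is the global mean, while places with 10 or more ratings keep their mean. This is consistent with the 1.839966 shown for place 134986 (eight ratings averaging 2.0) if the global mean is about 1.2. The sketch below applies the formula to made-up per-place statistics; `padded_mean` and `min_ratings` are names introduced here for illustration.

```python
import pandas as pd

def padded_mean(stats: pd.DataFrame, global_mean: float, min_ratings: int = 10) -> pd.Series:
    """Mean rating after padding each place up to `min_ratings` with `global_mean`."""
    n = stats["rating_count"]
    missing = (min_ratings - n).clip(lower=0)  # pseudo-ratings to add per place
    total = stats["rating"] * n + missing * global_mean
    return total / (n + missing)

# Hypothetical per-place stats (mean rating and count), not from the dataset.
toy = pd.DataFrame(
    {"rating": [2.0, 1.5, 1.0], "rating_count": [5, 10, 20]},
    index=pd.Index([1, 2, 3], name="placeID"),
)
toy["rating_padded"] = padded_mean(toy, global_mean=1.2)
print(toy.sort_values("rating_padded", ascending=False))
```

The vectorized form avoids appending rows one at a time and makes the threshold an explicit parameter that can be tuned.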
