# Making Recommendations Based on Popularity

These datasets are hosted on: https://archive.ics.uci.edu/ml/datasets/Restaurant+%26+consumer+data

They were originally published by: Blanca Vargas-Govea, Juan Gabriel González-Serna, Rafael Ponce-Medellín. Effects of relevant contextual features in the performance of a restaurant recommender system. In RecSys11: Workshop on Context Aware Recommender Systems (CARS-2011), Chicago, IL, USA, October 23, 2011.

## Restaurants data

In [1]:
import numpy as np
import pandas as pd

In [2]:
# rating_final.csv
url = 'https://drive.google.com/file/d/1ptu4AlEXO4qQ8GytxKHoeuS1y4l_zWkC/view?usp=sharing' 
path = 'https://drive.google.com/uc?export=download&id='+url.split('/')[-2]
frame = pd.read_csv(path)

# chefmozcuisine.csv
url = 'https://drive.google.com/file/d/1S0_EGSRERIkSKW4D8xHPGZMqvlhuUzp1/view?usp=sharing' 
path = 'https://drive.google.com/uc?export=download&id='+url.split('/')[-2]
cuisine = pd.read_csv(path)

# 'geoplaces2.csv'
url = 'https://drive.google.com/file/d/1ee3ib7LqGsMUksY68SD9yBItRvTFELxo/view?usp=sharing' 
path = 'https://drive.google.com/uc?export=download&id='+url.split('/')[-2]
geodata = pd.read_csv(path, encoding = 'CP1252') # change encoding to 'mbcs' in Windows
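
The three loading steps above repeat the same URL transformation. A minimal sketch of that step as a helper, assuming the share links always have the form `.../file/d/<FILE_ID>/view?...` (the function name `drive_download_url` is hypothetical, not part of the original notebook):

```python
def drive_download_url(share_url: str) -> str:
    """Turn a Google Drive share link into a direct-download link."""
    # Share links look like .../file/d/<FILE_ID>/view?usp=sharing,
    # so the file ID is the second-to-last segment when splitting on '/'.
    file_id = share_url.split('/')[-2]
    return 'https://drive.google.com/uc?export=download&id=' + file_id


# The rating_final.csv share link used in this notebook
share_url = 'https://drive.google.com/file/d/1ptu4AlEXO4qQ8GytxKHoeuS1y4l_zWkC/view?usp=sharing'
download_url = drive_download_url(share_url)
print(download_url)
```

The resulting string can be passed to `pd.read_csv` exactly as `path` is in the cell above.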

In the `frame` dataset we have the ratings users have given to places. Ratings range from 0 to 2.

In [5]:
frame.head(3)

Unnamed: 0,userID,placeID,rating,food_rating,service_rating
0,U1077,135085,2,2,2
1,U1077,135038,2,2,1
2,U1077,132825,2,2,2


In the `geodata` dataset we have info about the places. We will only use the `name` column.

In [27]:
geodata.shape

(130, 21)

In [21]:
geodata.columns


Index(['placeID', 'latitude', 'longitude', 'the_geom_meter', 'name', 'address',
       'city', 'state', 'country', 'fax', 'zip', 'alcohol', 'smoking_area',
       'dress_code', 'accessibility', 'price', 'url', 'Rambience', 'franchise',
       'area', 'other_services'],
      dtype='object')

In [8]:
places =  geodata[['placeID', 'name']]
places.head(10)

Unnamed: 0,placeID,name
0,134999,Kiku Cuernavaca
1,132825,puesto de tacos
2,135106,El Rincón de San Francisco
3,132667,little pizza Emilio Portes Gil
4,132613,carnitas_mata
5,135040,Restaurant los Compadres
6,132732,Taqueria EL amigo
7,132875,shi ro ie
8,132609,Pollo_Frito_Buenos_Aires
9,135082,la Estrella de Dimas


In the `cuisine` dataset we have the type of cuisine that restaurants offer.

In [9]:
cuisine.head(3)

Unnamed: 0,placeID,Rcuisine
0,135110,Spanish
1,135109,Italian
2,135107,Latin_American


## Popularity/Quality based recommender system

Let's group the ratings by place and look at each place's average rating. This is an **explicit** rating given by users.

In [10]:
rating = pd.DataFrame(frame.groupby('placeID')['rating'].mean())
rating.sort_values("rating", ascending=False).head()

placeID,rating
132955,2.0
135034,2.0
134986,2.0
132922,1.833333
132755,1.8


The top rated places have a perfect score of 2/2. But how many reviews do these places have?

In [11]:
frame.query("placeID==132955")

Unnamed: 0,userID,placeID,rating,food_rating,service_rating
934,U1004,132955,2,2,2
960,U1061,132955,2,2,2
996,U1059,132955,2,1,2
1014,U1097,132955,2,2,1
1080,U1096,132955,2,2,2


Only 5 people rated this place. Maybe they're just the owner's friends! Or maybe it really is a top-quality place, but too niche to recommend to the masses.

We can also count how many ratings each restaurant has received. The rating count is an **implicit** rating: it reflects how popular a place is rather than how good users said it was.

In [12]:
rating['rating_count'] = frame.groupby('placeID')['rating'].count()
rating.sort_values("rating_count", ascending=False).head()

placeID,rating,rating_count
135085,1.333333,36
132825,1.28125,32
135032,1.178571,28
135052,1.28,25
132834,1.0,25


The most-rated places have between 25 and 36 ratings. They are more popular than the top-rated places, but received lower explicit ratings.
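
As a side note, the average rating and the rating count can be computed in a single pass with named aggregation. This is a small sketch on a made-up table (`toy` and its values are assumptions for illustration, not data from the restaurant dataset):

```python
import pandas as pd

# Toy ratings table with the same column names the notebook uses (values are made up)
toy = pd.DataFrame({
    'placeID': [1, 1, 1, 2, 2, 3],
    'rating':  [2, 1, 1, 2, 2, 0],
})

# Named aggregation: the mean is the explicit signal, the count the implicit one
summary = toy.groupby('placeID')['rating'].agg(rating='mean', rating_count='count')

print(summary.sort_values('rating_count', ascending=False))
```

The same call on `frame` should give a table equivalent to the `rating` DataFrame built above.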

Let's locate the most popular place, and get some info about it:

In [16]:
# placeId of most popular place
top_popular_placeID = rating.sort_values('rating_count', ascending=False).head(1).index[0]

# name of the most popular place
places[places['placeID']==top_popular_placeID]

Unnamed: 0,placeID,name
121,135085,Tortas Locas Hipocampo


In [17]:
top_popular_placeID

135085

In [None]:
# cuisine of the most popular place
cuisine[cuisine['placeID']==top_popular_placeID]

Unnamed: 0,placeID,Rcuisine
44,135085,Fast_Food


In [30]:
name_cuisine = pd.merge(places, cuisine, on = ['placeID'])
name_cuisine

Unnamed: 0,placeID,name,Rcuisine
0,134999,Kiku Cuernavaca,Japanese
1,132825,puesto de tacos,Mexican
2,135106,El Rincón de San Francisco,Mexican
3,132667,little pizza Emilio Portes Gil,Armenian
4,132613,carnitas_mata,Mexican
...,...,...,...
107,132866,Chaires,Bakery
108,132866,Chaires,Cafeteria
109,135072,Sushi Itto,Japanese
110,135109,Paniroles,Italian


In [37]:
# name and cuisine of the most popular place
name_cuisine[name_cuisine['placeID'] == top_popular_placeID]

Unnamed: 0,placeID,name,Rcuisine
103,135085,Tortas Locas Hipocampo,Fast_Food


The most popular place is "Tortas Locas Hipocampo", a fast food place that has received 36 ratings and has an average score of 1.33 out of 2.

### Challenge:

Find a hybrid system to sort restaurants, so that you can recommend the "best" places: restaurants that are both highly rated and popular.
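
One possible direction, not the notebook's own solution, is a shrinkage-style weighted average (often described as a Bayesian average): each place's mean rating is pulled toward the global mean, and the pull is weaker the more ratings the place has. The sketch below uses a made-up table, and the prior strength `m` is a hypothetical tuning parameter:

```python
import pandas as pd

# Toy per-place summary (made-up values), shaped like the `rating` table
toy = pd.DataFrame({
    'placeID': [1, 2, 3],
    'rating': [2.0, 1.6, 1.2],
    'rating_count': [2, 20, 40],
})

# Global mean over all individual ratings (count-weighted mean of the place means)
C = (toy['rating'] * toy['rating_count']).sum() / toy['rating_count'].sum()

# Hypothetical prior strength: acts like m extra ratings at the global mean
m = 10

v = toy['rating_count']
toy['weighted_rating'] = (v * toy['rating'] + m * C) / (v + m)

ranked = toy.sort_values('weighted_rating', ascending=False)
print(ranked)
```

In this toy example the place with a perfect score but only 2 ratings no longer ranks first, because its score is shrunk toward the global mean. A larger `m` demands more ratings before a high average is trusted.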

In [38]:
rating.sort_values('rating_count', ascending=False).head()

placeID,rating,rating_count
135085,1.333333,36
132825,1.28125,32
135032,1.178571,28
135052,1.28,25
132834,1.0,25


In [48]:
# turn placeID from the index into a regular column
rating.reset_index(inplace=True)

In [49]:
rating.sort_values('rating', ascending=False).head()

Unnamed: 0,placeID,rating,rating_count
57,132955,2.0,5
82,135034,2.0,5
62,134986,2.0,8
52,132922,1.833333,6
26,132755,1.8,5


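In [None]:
# Unfinished draft, completed under an assumed intent: rank by rating, then by rating count
best_recommendation = rating.sort_values(['rating', 'rating_count'], ascending=False)

In [None]:
# Assumed intent: keep the rating only for places with an above-average rating count, else 0
# ('score' is a hypothetical column name; the original expression was incomplete)
best_recommendation['score'] = np.where(best_recommendation['rating_count'] > rating['rating_count'].mean(), best_recommendation['rating'], 0)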

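In [51]:
# cut_off_rating_count is not defined in the cells shown; the mean rating count is assumed here,
# which is consistent with the output below
cut_off_rating_count = rating['rating_count'].mean()
df = rating[rating['rating_count'] > cut_off_rating_count]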

In [57]:
df.sort_values('rating', ascending=False).head(10)


Unnamed: 0,placeID,rating,rating_count
117,135075,1.692308,13
104,135059,1.666667,9
75,135025,1.666667,15
29,132768,1.6,10
79,135030,1.583333,12
65,134996,1.555556,9
78,135028,1.533333,15
110,135066,1.5,12
25,132754,1.461538,13
91,135045,1.461538,13


In [None]:
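# Approach shared by Therese Andrä: blend normalized popularity and normalized rating (weight 0.5 each)
# The original divided rating_count by itself, a constant 0.5; dividing by the maximum count is assumed to be the intent
rating["new_rating_3"] = 0.5 * rating["rating_count"] / rating["rating_count"].max() + 0.5 * rating["rating"] / 2
rating.loc[rating["rating_count"] <= 10, "new_rating_3"] = 0  # ignore places with 10 or fewer ratings
rating.sort_values(by="new_rating_3", ascending=False).head(10)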

In [None]:
# score each place by its rescaled average rating times its number of ratings
rating['product_scaled'] = (rating['rating'] * 5) * rating['rating_count']
prep_5to20 = rating.sort_values('product_scaled', ascending=False).head(20)

# highly rated places with enough ratings go first
top_5 = rating.loc[(rating['rating']>1.5) & (rating['rating_count']>9)].sort_values('rating', ascending=False)
top_5 = top_5.merge(places, on='placeID', how='inner')
top_5to20 = prep_5to20.sort_values('rating', ascending=False)
top_5to20 = top_5to20.merge(places, on='placeID', how='inner')

# combine both lists, keeping the first occurrence of each place
top_20 = pd.concat([top_5, top_5to20])
top_20 = top_20.drop_duplicates(subset=['placeID'])

In [None]:
# Approach shared by Crystal: fixed thresholds on both rating and rating count
rating.loc[(rating['rating']>1.4) & (rating['rating_count']>10)].sort_values("rating", ascending=False).head(5)

In [59]:

rating.loc[(rating['rating']>rating.rating.mean()) & (rating['rating_count']>rating.rating_count.mean())].sort_values("rating", ascending=False).head(5)

Unnamed: 0,placeID,rating,rating_count
117,135075,1.692308,13
104,135059,1.666667,9
75,135025,1.666667,15
29,132768,1.6,10
79,135030,1.583333,12


In [69]:
rating

Unnamed: 0,placeID,rating,rating_count
0,132560,0.500000,4
1,132561,0.750000,4
2,132564,1.250000,4
3,132572,1.000000,15
4,132583,1.000000,4
...,...,...,...
125,135088,1.000000,6
126,135104,0.857143,7
127,135106,1.200000,10
128,135108,1.181818,11
