# 4th Study Case: Recommender System

Restaurant and Consumer Recommender System<br>
*with Content-Based Filtering*

Naufal Mu'afi<br>
nmuafi1@gmail.com

---

In [21]:
import zipfile
import pandas as pd
import numpy as np

## 1. Data Understanding
---

### Load the Data

In [3]:
import urllib.request

# download with the standard library so the cell does not depend on an external `wget` binary
urllib.request.urlretrieve('https://archive.ics.uci.edu/ml/machine-learning-databases/00232/RCdata.zip', './RCdata.zip')


In [7]:
local_zip = './RCdata.zip'
# extract all CSV files into ./data; the context manager closes the archive afterwards
with zipfile.ZipFile(local_zip, 'r') as zip_ref:
    zip_ref.extractall('./data')

### Read the Data

In [10]:
accepts = pd.read_csv('./data/chefmozaccepts.csv')
cuisine = pd.read_csv('./data/chefmozcuisine.csv')
hours = pd.read_csv('./data/chefmozhours4.csv')
parking = pd.read_csv('./data/chefmozparking.csv')
geo = pd.read_csv('./data/geoplaces2.csv', encoding = "ISO-8859-1")
usercuisine = pd.read_csv('./data/usercuisine.csv')
payment = pd.read_csv('./data/userpayment.csv')
profile = pd.read_csv('./data/userprofile.csv')
rating = pd.read_csv('./data/rating_final.csv')
 
print('Number of restaurants in payment-method data:', len(accepts.placeID.unique()))
print('Number of restaurants in cuisine data:', len(cuisine.placeID.unique()))
print('Number of restaurants in opening-hours data:', len(hours.placeID.unique()))
print('Number of restaurants in location data:', len(geo.placeID.unique()))
print('Number of users in cuisine-preference data:', len(usercuisine.userID.unique()))
print('Number of users in profile data:', len(profile.userID.unique()))
print('Number of users who gave ratings:', len(rating.userID.unique()))
print('Number of restaurants that were rated:', len(rating.placeID.unique()))

Number of restaurants in payment-method data: 615
Number of restaurants in cuisine data: 769
Number of restaurants in opening-hours data: 694
Number of restaurants in location data: 130
Number of users in cuisine-preference data: 138
Number of users in profile data: 138
Number of users who gave ratings: 138
Number of restaurants that were rated: 130


## 2. Univariate Exploratory Data Analysis (EDA)
---

In this project, we explore only four of the loaded DataFrames: `accepts`, `cuisine`, `profile`, and `rating`.

The `accepts` and `cuisine` DataFrames describe restaurants, while `profile` and `rating` describe users and the ratings they give.

### 2.1. Accept Variable

In [11]:
accepts.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1314 entries, 0 to 1313
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   placeID   1314 non-null   int64 
 1   Rpayment  1314 non-null   object
dtypes: int64(1), object(1)
memory usage: 20.7+ KB


In [13]:
print(f"Number of restaurants: {len(accepts.placeID.unique())}")
print(f"Number of accepted payment types: {len(accepts.Rpayment.unique())}")
print(f"Accepted payment types: {accepts.Rpayment.unique()}")

Number of restaurants: 615
Number of accepted payment types: 12
Accepted payment types: ['cash' 'VISA' 'MasterCard-Eurocard' 'American_Express' 'bank_debit_cards'
 'checks' 'Discover' 'Carte_Blanche' 'Diners_Club' 'Visa'
 'Japan_Credit_Bureau' 'gift_certificates']
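The output lists both `'VISA'` and `'Visa'`, which most likely denote the same payment method. A minimal sketch of one way to merge such case variants, assuming case is the only difference and using a small hypothetical sample rather than the real `accepts` table. Applied to the 12 labels above, lower-casing would leave 11 distinct types.

```python
import pandas as pd

# Hypothetical sample with the same columns as `accepts`
sample = pd.DataFrame({
    'placeID': [101, 101, 102, 103],
    'Rpayment': ['cash', 'VISA', 'Visa', 'cash'],
})

# Lower-case every label so 'VISA' and 'Visa' become the same category
sample['Rpayment'] = sample['Rpayment'].str.lower()

print(sample['Rpayment'].unique())   # ['cash' 'visa']
print(sample['Rpayment'].nunique())  # 2
```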


### 2.2. Cuisine Variable

In [14]:
cuisine.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 916 entries, 0 to 915
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   placeID   916 non-null    int64 
 1   Rcuisine  916 non-null    object
dtypes: int64(1), object(1)
memory usage: 14.4+ KB


In [15]:
print(f"Number of cuisine types: {len(cuisine.Rcuisine.unique())}")
print(f"Cuisine types: {cuisine.Rcuisine.unique()}")

Number of cuisine types: 59
Cuisine types: ['Spanish' 'Italian' 'Latin_American' 'Mexican' 'Fast_Food' 'Burgers'
 'Dessert-Ice_Cream' 'Hot_Dogs' 'Steaks' 'Asian' 'International'
 'Mongolian' 'Vegetarian' 'Brazilian' 'Cafe-Coffee_Shop' 'Cafeteria'
 'Contemporary' 'Deli-Sandwiches' 'Diner' 'Japanese' 'Sushi' 'Seafood'
 'Chinese' 'Bar' 'Bar_Pub_Brewery' 'Pizzeria' 'Mediterranean' 'American'
 'Family' 'Caribbean' 'African' 'Breakfast-Brunch' 'Regional' 'Afghan'
 'Bakery' 'Game' 'Armenian' 'Vietnamese' 'Korean' 'Thai' 'Barbecue'
 'Polish' 'Dutch-Belgian' 'French' 'German' 'Southwestern' 'Persian'
 'Ethiopian' 'Juice' 'Soup' 'Continental-European' 'Greek' 'Southern'
 'Eastern_European' 'California' 'Bagels' 'Turkish' 'Organic-Healthy'
 'Fine_Dining']


### 2.3. Profile Variable

In [16]:
print(profile.shape)

(138, 19)


In [17]:
profile.head()

,userID,latitude,longitude,smoker,drink_level,dress_preference,ambience,transport,marital_status,hijos,birth_year,interest,personality,religion,activity,color,weight,budget,height
0,U1001,22.139997,-100.978803,False,abstemious,informal,family,on foot,single,independent,1989,variety,thrifty-protector,none,student,black,69,medium,1.77
1,U1002,22.150087,-100.983325,False,abstemious,informal,family,public,single,independent,1990,technology,hunter-ostentatious,Catholic,student,red,40,low,1.87
2,U1003,22.119847,-100.946527,False,social drinker,formal,family,public,single,independent,1989,none,hard-worker,Catholic,student,blue,60,low,1.69
3,U1004,18.867,-99.183,False,abstemious,informal,family,public,single,independent,1940,variety,hard-worker,none,professional,green,44,medium,1.53
4,U1005,22.183477,-100.959891,False,abstemious,no preference,family,public,single,independent,1992,none,thrifty-protector,Catholic,student,black,65,medium,1.69


### 2.4. Rating Variable

In [18]:
rating.head()

,userID,placeID,rating,food_rating,service_rating
0,U1077,135085,2,2,2
1,U1077,135038,2,2,1
2,U1077,132825,2,2,2
3,U1077,135060,1,2,2
4,U1068,135104,1,1,2


In [19]:
rating.describe()

,placeID,rating,food_rating,service_rating
count,1161.0,1161.0,1161.0,1161.0
mean,134192.041344,1.199828,1.215332,1.090439
std,1100.916275,0.773282,0.792294,0.790844
min,132560.0,0.0,0.0,0.0
25%,132856.0,1.0,1.0,0.0
50%,135030.0,1.0,1.0,1.0
75%,135059.0,2.0,2.0,2.0
max,135109.0,2.0,2.0,2.0


In [20]:
print(f"Number of unique users: {len(rating.userID.unique())}")
print(f"Number of unique restaurants: {len(rating.placeID.unique())}")
print(f"Total number of ratings: {len(rating)}")

Number of unique users: 138
Number of unique restaurants: 130
Total number of ratings: 1161


## 3. Data Preprocessing
---

### Concatenate All Restaurants

In [23]:
# concatenate the placeIDs from every restaurant table
resto_all = np.concatenate((
  accepts.placeID.unique(),
  cuisine.placeID.unique(),
  hours.placeID.unique(),
  parking.placeID.unique(),
  geo.placeID.unique(),
))

# np.unique removes duplicates and returns the values sorted
resto_all = np.unique(resto_all)

print(f"Total number of unique restaurants (placeID): {len(resto_all)}")

Total number of unique restaurants (placeID): 938


### Concatenate All Users

In [24]:
# concatenate the userIDs from every user table
user_all = np.concatenate((
  usercuisine.userID.unique(),
  payment.userID.unique(),
  profile.userID.unique(),
))

# np.unique removes duplicates and returns the values sorted
user_all = np.unique(user_all)

print(f"Total number of unique users (userID): {len(user_all)}")

Total number of unique users (userID): 138


### Rating Totals per Restaurant

In [25]:
# concatenate restaurant info
resto_info = pd.concat([accepts, geo, parking, hours])

# merge rating df with resto_info by placeID value
resto = pd.merge(rating, resto_info, on='placeID', how='left')
resto

,userID,placeID,rating,food_rating,service_rating,Rpayment,latitude,longitude,the_geom_meter,name,...,accessibility,price,url,Rambience,franchise,area,other_services,parking_lot,hours,days
0,U1077,135085,2,2,2,cash,,,,,...,,,,,,,,,,
1,U1077,135085,2,2,2,,22.150802,-100.982680,0101000020957F00009F823DA6094858C18A2D4D37F9A4...,Tortas Locas Hipocampo,...,no_accessibility,medium,?,familiar,f,closed,none,,,
2,U1077,135085,2,2,2,,,,,,...,,,,,,,,public,,
3,U1077,135085,2,2,2,,,,,,...,,,,,,,,,00:00-00:00;,Mon;Tue;Wed;Thu;Fri;
4,U1077,135085,2,2,2,,,,,,...,,,,,,,,,00:00-00:00;,Sat;
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8073,U1068,132660,0,0,0,,23.752943,-99.164679,0101000020957F00003D7905C9DC8157C13FCD1AB7334E...,carnitas mata calle Emilio Portes Gil,...,completely,low,?,familiar,f,closed,none,,,
8074,U1068,132660,0,0,0,,,,,,...,,,,,,,,none,,
8075,U1068,132660,0,0,0,,,,,,...,,,,,,,,,00:00-23:30;,Mon;Tue;Wed;Thu;Fri;
8076,U1068,132660,0,0,0,,,,,,...,,,,,,,,,00:00-23:30;,Sat;


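Because `pd.concat` stacks the four info tables row-wise instead of joining them column-wise, each row of `resto_info` carries values from only one source table, and the left merge then repeats every rating once per matching info row. The result therefore contains many missing values: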

In [26]:
resto.isnull().sum()

userID               0
placeID              0
rating               0
food_rating          0
service_rating       0
Rpayment          5781
latitude          6917
longitude         6917
the_geom_meter    6917
name              6917
address           6917
city              6917
state             6917
country           6917
fax               6917
zip               6917
alcohol           6917
smoking_area      6917
dress_code        6917
accessibility     6917
price             6917
url               6917
Rambience         6917
franchise         6917
area              6917
other_services    6917
parking_lot       6917
hours             4619
days              4619
dtype: int64
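The effect can be reproduced on two tiny hypothetical tables: stacking them row-wise with `pd.concat` leaves each row with only one table's columns filled in, whereas joining them column-wise on the key with `pd.merge` keeps one complete row per restaurant. This is only an illustration of where the NaNs come from. In the real data some tables (e.g. `hours`) have several rows per restaurant, so a column-wise merge would still multiply rows.

```python
import pandas as pd

# Two hypothetical info tables sharing the placeID key
toy_payments = pd.DataFrame({'placeID': [1, 2], 'Rpayment': ['cash', 'VISA']})
toy_parking = pd.DataFrame({'placeID': [1, 2], 'parking_lot': ['none', 'public']})

# Row-wise stacking: 4 rows, each missing the other table's column
stacked = pd.concat([toy_payments, toy_parking], ignore_index=True)
print(stacked.isnull().sum())

# Column-wise join on placeID: 2 complete rows, no missing values
joined = pd.merge(toy_payments, toy_parking, on='placeID')
print(joined.isnull().sum())
```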

In [27]:
# group by placeID and sum every column (numbers are added, strings are concatenated)
resto.groupby('placeID').sum()

placeID,userID,rating,food_rating,service_rating,Rpayment,latitude,longitude,the_geom_meter,name,address,...,accessibility,price,url,Rambience,franchise,area,other_services,parking_lot,hours,days
132560,U1067U1067U1067U1067U1067U1067U1082U1082U1082U...,12,24,6,cashcashcashcash,95.009216,-396.667653,0101000020957F0000FC60BDA8E88157C1B2C357D6DA4E...,puesto de gorditaspuesto de gorditaspuesto de ...,frente al tecnologicofrente al tecnologicofren...,...,no_accessibilityno_accessibilityno_accessibili...,lowlowlowlow,????,familiarfamiliarfamiliarfamiliar,ffff,openopenopenopen,nonenonenonenone,publicpublicpublicpublic,08:00-12:00;00:00-00:00;00:00-00:00;08:00-12:0...,Mon;Tue;Wed;Thu;Fri;Sat;Sun;Mon;Tue;Wed;Thu;Fr...
132561,U1026U1026U1026U1026U1026U1129U1129U1129U1129U...,15,20,20,0,94.907276,-396.506024,0101000020957F000004457BB7AA8657C15F10835CD944...,cafe ambarcafe ambarcafe ambarcafe ambar,????,...,completelycompletelycompletelycompletely,lowlowlowlow,????,familiarfamiliarfamiliarfamiliar,ffff,closedclosedclosedclosed,nonenonenonenone,nonenonenonenone,00:00-23:30;00:00-23:30;00:00-23:30;00:00-23:3...,Mon;Tue;Wed;Thu;Fri;Sat;Sun;Mon;Tue;Wed;Thu;Fr...
132564,U1060U1060U1060U1060U1060U1080U1080U1080U1080U...,25,25,30,0,94.923698,-396.580739,0101000020957F0000EA4F00C5A08557C140085474D949...,churchschurchschurchschurchs,????,...,completelycompletelycompletelycompletely,lowlowlowlow,????,familiarfamiliarfamiliarfamiliar,ffff,closedclosedclosedclosed,nonenonenonenone,nonenonenonenone,00:00-23:30;00:00-23:30;00:00-23:30;00:00-23:3...,Mon;Tue;Wed;Thu;Fri;Sat;Sun;Mon;Tue;Wed;Thu;Fr...
132572,U1108U1108U1108U1108U1108U1108U1055U1055U1055U...,90,90,84,cashcashcashcashcashcashcashcashcashcashcashca...,332.124707,-1514.890677,0101000020957F00005D19BF45294958C18FF7F8E260A8...,Cafe ChairesCafe ChairesCafe ChairesCafe Chair...,???????????????,...,completelycompletelycompletelycompletelycomple...,lowlowlowlowlowlowlowlowlowlowlowlowlowlowlow,???????????????,familiarfamiliarfamiliarfamiliarfamiliarfamili...,fffffffffffffff,closedclosedclosedclosedclosedclosedclosedclos...,nonenonenonenonenonenonenonenonenonenonenoneno...,yesyesyesyesyesyesyesyesyesyesyesyesyesyesyes,00:00-23:30;00:00-23:30;00:00-23:30;00:00-23:3...,Mon;Tue;Wed;Thu;Fri;Sat;Sun;Mon;Tue;Wed;Thu;Fr...
132583,U1044U1044U1044U1044U1044U1044U1118U1118U1118U...,24,24,30,cashVISAMasterCard-Eurocardbank_debit_cardscas...,75.689162,-396.937328,0101000020957F0000FBE7171F056F5AC1E8A6C0A5AF55...,McDonalds CentroMcDonalds CentroMcDonalds Cent...,Rayon sn col. CentroRayon sn col. CentroRayon ...,...,partiallypartiallypartiallypartially,lowlowlowlow,nononono,familiarfamiliarfamiliarfamiliar,tttt,closedclosedclosedclosed,nonenonenonenone,nonenonenonenone,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
135088,U1044U1044U1044U1044U1044U1044U1030U1030U1030U...,36,42,36,cashcashcashcashcashcash,113.256068,-595.319338,0101000020957F0000E14AD4DBC7765AC1F7B33C85B153...,Cafeteria cenidetCafeteria cenidetCafeteria ce...,Interior Internado Palmira SNInterior Internad...,...,no_accessibilityno_accessibilityno_accessibili...,lowlowlowlowlowlow,www.cenidet.edu.mxwww.cenidet.edu.mxwww.cenide...,quietquietquietquietquietquiet,ffffff,closedclosedclosedclosedclosedclosed,nonenonenonenonenonenone,publicpublicpublicpublicpublicpublic,09:00-16:00;00:00-00:00;00:00-00:00;09:00-16:0...,Mon;Tue;Wed;Thu;Fri;Sat;Sun;Mon;Tue;Wed;Thu;Fr...
135104,U1068U1068U1068U1068U1068U1068U1068U1068U1067U...,48,80,48,cashVISAMasterCard-EurocardcashVISAMasterCard-...,166.270875,-694.179039,0101000020957F00007CDF5EAFC58157C1645743B23E4F...,vipsvipsvipsvipsvipsvipsvips,???????,...,completelycompletelycompletelycompletelycomple...,mediummediummediummediummediummediummedium,???????,familiarfamiliarfamiliarfamiliarfamiliarfamili...,ttttttt,closedclosedclosedclosedclosedclosedclosed,varietyvarietyvarietyvarietyvarietyvarietyvariety,yesyesyesyesyesyesyes,00:00-23:30;00:00-23:30;00:00-23:30;00:00-23:3...,Mon;Tue;Wed;Thu;Fri;Sat;Sun;Mon;Tue;Wed;Thu;Fr...
135106,U1055U1055U1055U1055U1055U1055U1055U1055U1126U...,96,96,96,cashVISAMasterCard-EurocardcashVISAMasterCard-...,221.497088,-1009.760928,0101000020957F0000649D6F21634858C119AE9BF528A3...,El Rincón de San FranciscoEl Rincón de San Fra...,Universidad 169Universidad 169Universidad 169U...,...,partiallypartiallypartiallypartiallypartiallyp...,mediummediummediummediummediummediummediummedi...,??????????,familiarfamiliarfamiliarfamiliarfamiliarfamili...,ffffffffff,openopenopenopenopenopenopenopenopenopen,nonenonenonenonenonenonenonenonenonenone,nonenonenonenonenonenonenonenonenonenone,18:00-23:30;18:00-23:30;18:00-21:00;18:00-23:3...,Mon;Tue;Wed;Thu;Fri;Sat;Sun;Mon;Tue;Wed;Thu;Fr...
135108,U1088U1088U1088U1088U1088U1126U1126U1126U1126U...,65,65,55,0,243.498787,-1110.269437,0101000020957F00008FAE40D59E4B58C112C66046D597...,PotzocalliPotzocalliPotzocalliPotzocalliPotzoc...,Carretera Central SnCarretera Central SnCarret...,...,completelycompletelycompletelycompletelycomple...,lowlowlowlowlowlowlowlowlowlowlow,???????????,familiarfamiliarfamiliarfamiliarfamiliarfamili...,fffffffffff,closedclosedclosedclosedclosedclosedclosedclos...,nonenonenonenonenonenonenonenonenonenonenone,nonenonenonenonenonenonenonenonenonenonenone,00:00-23:30;00:00-23:30;00:00-23:30;00:00-23:3...,Mon;Tue;Wed;Thu;Fri;Sat;Sun;Mon;Tue;Wed;Thu;Fr...
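Two caveats apply to this table. `sum()` also runs over the text columns, which is why their strings are concatenated. More importantly, the rating totals are inflated, because the left merge repeated every rating once per matching row of `resto_info`. For `placeID` 132560, the four ratings in the data (1, 0, 1, 0, visible in the `fix_resto` output later) sum to 2, but each matched six info rows (one each from `accepts`, `geo`, and `parking`, three from `hours`), giving the 12 shown above. A minimal sketch on hypothetical data contrasting the two computations, restricted to the score columns:

```python
import pandas as pd

# Hypothetical ratings: restaurant 1 is rated by two users, restaurant 2 by one
toy_ratings = pd.DataFrame({
    'userID': ['U1', 'U2', 'U1'],
    'placeID': [1, 1, 2],
    'rating': [1, 0, 2],
    'food_rating': [2, 2, 1],
    'service_rating': [1, 0, 2],
})
# Hypothetical info table: three rows for restaurant 1 (e.g. one per opening-hours entry)
toy_info = pd.DataFrame({
    'placeID': [1, 1, 1, 2],
    'days': ['Mon;Tue;Wed;Thu;Fri;', 'Sat;', 'Sun;', 'Mon;'],
})

score_cols = ['rating', 'food_rating', 'service_rating']

# Summing after the merge counts each rating once per matching info row
inflated = (
    pd.merge(toy_ratings, toy_info, on='placeID', how='left')
    .groupby('placeID')[score_cols]
    .sum()
)

# Summing the raw ratings, restricted to the score columns, gives the true totals
totals = toy_ratings.groupby('placeID')[score_cols].sum()

print(inflated)
print(totals)
```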


### Merge Ratings with Restaurant Names

First, assign the `rating` DataFrame loaded earlier to the variable `all_resto_rate` (this is a second name for the same object, not a copy).

In [28]:
all_resto_rate = rating
all_resto_rate

,userID,placeID,rating,food_rating,service_rating
0,U1077,135085,2,2,2
1,U1077,135038,2,2,1
2,U1077,132825,2,2,2
3,U1077,135060,1,2,2
4,U1068,135104,1,1,2
...,...,...,...,...,...
1156,U1043,132630,1,1,1
1157,U1011,132715,1,1,0
1158,U1068,132733,1,1,0
1159,U1068,132594,1,1,1


In [29]:
# Merge 'all_resto_rate' with the 'geo' DataFrame on placeID to attach each restaurant's name
all_resto_name = pd.merge(all_resto_rate, geo[['placeID','name']], on='placeID', how='left') 
all_resto_name

,userID,placeID,rating,food_rating,service_rating,name
0,U1077,135085,2,2,2,Tortas Locas Hipocampo
1,U1077,135038,2,2,1,Restaurant la Chalita
2,U1077,132825,2,2,2,puesto de tacos
3,U1077,135060,1,2,2,Restaurante Marisco Sam
4,U1068,135104,1,1,2,vips
...,...,...,...,...,...,...
1156,U1043,132630,1,1,1,palomo tec
1157,U1011,132715,1,1,0,tacos de la estacion
1158,U1068,132733,1,1,0,Little Cesarz
1159,U1068,132594,1,1,1,tacos de barbacoa enfrente del Tec


### Merge Ratings with Cuisine Types

In [30]:
# Merge the 'cuisine' dataframe with 'all_resto_name' and store it in the variable 'all_resto'
all_resto = pd.merge(all_resto_name, cuisine, on='placeID', how='left')
all_resto

,userID,placeID,rating,food_rating,service_rating,name,Rcuisine
0,U1077,135085,2,2,2,Tortas Locas Hipocampo,Fast_Food
1,U1077,135038,2,2,1,Restaurant la Chalita,
2,U1077,132825,2,2,2,puesto de tacos,Mexican
3,U1077,135060,1,2,2,Restaurante Marisco Sam,Seafood
4,U1068,135104,1,1,2,vips,Mexican
...,...,...,...,...,...,...,...
1326,U1043,132630,1,1,1,palomo tec,Mexican
1327,U1011,132715,1,1,0,tacos de la estacion,Mexican
1328,U1068,132733,1,1,0,Little Cesarz,Pizzeria
1329,U1068,132594,1,1,1,tacos de barbacoa enfrente del Tec,Mexican
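The merged table is longer than `all_resto_name` (the last index shown grows from 1159 to 1329) because a left merge produces one output row per matching pair, and `cuisine` has several rows for some restaurants: 916 rows for 769 unique `placeID`s. A small sketch on hypothetical data reproduces the effect and shows that the `validate` argument of `pd.merge` can flag such a one-to-many key before merging:

```python
import pandas as pd

# Hypothetical ratings and a cuisine table where restaurant 1 has two cuisines
toy_ratings = pd.DataFrame({'userID': ['U1', 'U2'], 'placeID': [1, 2], 'rating': [2, 1]})
toy_cuisine = pd.DataFrame({
    'placeID': [1, 1, 2],
    'Rcuisine': ['Fast_Food', 'American', 'Mexican'],
})

# The rating for restaurant 1 is repeated once per cuisine: 2 rows in, 3 rows out
merged = pd.merge(toy_ratings, toy_cuisine, on='placeID', how='left')
print(len(toy_ratings), len(merged))  # 2 3

# validate='many_to_one' requires placeID to be unique on the right side
raised = False
try:
    pd.merge(toy_ratings, toy_cuisine, on='placeID', how='left', validate='many_to_one')
except pd.errors.MergeError:
    raised = True
print('many_to_one check failed:', raised)  # True
```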


## 4. Data Preparation
---

### Handle Missing Values

In [31]:
all_resto.isnull().sum()

userID              0
placeID             0
rating              0
food_rating         0
service_rating      0
name                0
Rcuisine          288
dtype: int64

In [32]:
all_resto_clean = all_resto.dropna()
all_resto_clean

,userID,placeID,rating,food_rating,service_rating,name,Rcuisine
0,U1077,135085,2,2,2,Tortas Locas Hipocampo,Fast_Food
2,U1077,132825,2,2,2,puesto de tacos,Mexican
3,U1077,135060,1,2,2,Restaurante Marisco Sam,Seafood
4,U1068,135104,1,1,2,vips,Mexican
5,U1068,132740,0,0,0,Carreton de Flautas y Migadas,Mexican
...,...,...,...,...,...,...,...
1325,U1043,132732,1,1,1,Taqueria EL amigo,Mexican
1326,U1043,132630,1,1,1,palomo tec,Mexican
1327,U1011,132715,1,1,0,tacos de la estacion,Mexican
1328,U1068,132733,1,1,0,Little Cesarz,Pizzeria


In [33]:
all_resto_clean.isnull().sum()

userID            0
placeID           0
rating            0
food_rating       0
service_rating    0
name              0
Rcuisine          0
dtype: int64

### Standardizing Cuisine Types

In [34]:
# Sort the rows by placeID and store the result in 'fix_resto'
fix_resto = all_resto_clean.sort_values('placeID', ascending=True)
fix_resto

,userID,placeID,rating,food_rating,service_rating,name,Rcuisine
1303,U1087,132560,1,2,1,puesto de gorditas,Regional
1288,U1050,132560,0,2,0,puesto de gorditas,Regional
14,U1067,132560,1,0,0,puesto de gorditas,Regional
42,U1082,132560,0,0,0,puesto de gorditas,Regional
1052,U1013,132572,1,1,0,Cafe Chaires,Cafeteria
...,...,...,...,...,...,...,...
438,U1024,135106,1,1,1,El Rincón de San Francisco,Mexican
178,U1020,135109,2,2,1,Paniroles,Italian
1071,U1041,135109,1,2,1,Paniroles,Italian
99,U1030,135109,0,0,0,Paniroles,Italian


In [35]:
# Count the unique restaurants (placeID) in 'fix_resto'
len(fix_resto.placeID.unique())

95

In [36]:
# List the unique cuisine categories
fix_resto.Rcuisine.unique()

array(['Regional', 'Cafeteria', 'American', 'Mexican', 'Fast_Food',
       'Italian', 'Armenian', 'Pizzeria', 'Japanese', 'Vietnamese',
       'Family', 'International', 'Game', 'Burgers', 'Bakery', 'Bar',
       'Breakfast-Brunch', 'Bar_Pub_Brewery', 'Mediterranean',
       'Cafe-Coffee_Shop', 'Contemporary', 'Seafood', 'Chinese'],
      dtype=object)

Among these cuisine categories, one stands out: `Game`. Which restaurant is labeled with it?

In [37]:
# checking `Game` cuisine
fix_resto[fix_resto['Rcuisine'] == 'Game']

,userID,placeID,rating,food_rating,service_rating,name,Rcuisine
781,U1015,132851,1,1,1,KFC,Game
509,U1052,132851,1,0,2,KFC,Game
708,U1008,132851,1,1,1,KFC,Game
770,U1037,132851,2,2,1,KFC,Game
574,U1069,132851,1,0,0,KFC,Game
1188,U1131,132851,2,2,2,KFC,Game
764,U1111,132851,2,1,0,KFC,Game


It turns out that every `Game` row belongs to the restaurant named KFC. Does KFC have any other cuisine category (`Rcuisine`)?

In [38]:
# Show every row for the restaurant named KFC
fix_resto[fix_resto['name'] == 'KFC']

,userID,placeID,rating,food_rating,service_rating,name,Rcuisine
781,U1015,132851,1,1,1,KFC,Game
508,U1052,132851,1,0,2,KFC,American
780,U1015,132851,1,1,1,KFC,American
509,U1052,132851,1,0,2,KFC,Game
708,U1008,132851,1,1,1,KFC,Game
707,U1008,132851,1,1,1,KFC,American
770,U1037,132851,2,2,1,KFC,Game
769,U1037,132851,2,2,1,KFC,American
1187,U1131,132851,2,2,2,KFC,American
574,U1069,132851,1,0,0,KFC,Game


Indeed, KFC is listed under two cuisine categories, `Game` and `American`. To give it one consistent label, we relabel `Game` as `American`.

In [39]:
# relabel `Game` as `American`, only within the Rcuisine column
fix_resto = fix_resto.replace({'Rcuisine': {'Game': 'American'}})
fix_resto[fix_resto['name'] == 'KFC']

,userID,placeID,rating,food_rating,service_rating,name,Rcuisine
781,U1015,132851,1,1,1,KFC,American
508,U1052,132851,1,0,2,KFC,American
780,U1015,132851,1,1,1,KFC,American
509,U1052,132851,1,0,2,KFC,American
708,U1008,132851,1,1,1,KFC,American
707,U1008,132851,1,1,1,KFC,American
770,U1037,132851,2,2,1,KFC,American
769,U1037,132851,2,2,1,KFC,American
1187,U1131,132851,2,2,2,KFC,American
574,U1069,132851,1,0,0,KFC,American
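`DataFrame.replace('Game', 'American')` rewrites every cell equal to `'Game'` in any column, not only in `Rcuisine`. Since the goal is only to fix the cuisine label, a nested dictionary `{column: {old: new}}` restricts the change to that column. A sketch with hypothetical rows (the restaurant literally named `'Game'` is invented to show the difference):

```python
import pandas as pd

# Hypothetical rows: a restaurant named 'Game' and a KFC row whose cuisine is 'Game'
df = pd.DataFrame({
    'name': ['Game', 'KFC'],
    'Rcuisine': ['Mexican', 'Game'],
})

# Whole-frame replace also rewrites the restaurant name
whole = df.replace('Game', 'American')

# Nested dict {column: {old: new}} only touches Rcuisine
scoped = df.replace({'Rcuisine': {'Game': 'American'}})

print(whole)
print(scoped)
```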


### Build the Restaurant Table

In [40]:
preparation = fix_resto
preparation.sort_values('placeID')

,userID,placeID,rating,food_rating,service_rating,name,Rcuisine
1303,U1087,132560,1,2,1,puesto de gorditas,Regional
1288,U1050,132560,0,2,0,puesto de gorditas,Regional
14,U1067,132560,1,0,0,puesto de gorditas,Regional
42,U1082,132560,0,0,0,puesto de gorditas,Regional
184,U1055,132572,2,2,2,Cafe Chaires,Cafeteria
...,...,...,...,...,...,...,...
1224,U1002,135106,1,1,1,El Rincón de San Francisco,Mexican
99,U1030,135109,0,0,0,Paniroles,Italian
178,U1020,135109,2,2,1,Paniroles,Italian
1071,U1041,135109,1,2,1,Paniroles,Italian


In [41]:
preparation = preparation.drop_duplicates('placeID')
preparation

,userID,placeID,rating,food_rating,service_rating,name,Rcuisine
1303,U1087,132560,1,2,1,puesto de gorditas,Regional
1052,U1013,132572,1,1,0,Cafe Chaires,Cafeteria
168,U1118,132583,0,0,0,McDonalds Centro,American
24,U1107,132584,2,2,2,Gorditas Dona Tota,Mexican
1329,U1068,132594,1,1,1,tacos de barbacoa enfrente del Tec,Mexican
...,...,...,...,...,...,...,...
681,U1095,135086,1,2,1,Mcdonalds Parque Tangamanga,Fast_Food
175,U1020,135088,1,2,0,Cafeteria cenidet,Cafeteria
4,U1068,135104,1,1,2,vips,Mexican
488,U1004,135106,2,2,2,El Rincón de San Francisco,Mexican
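`drop_duplicates('placeID')` keeps the first row for each restaurant (the default `keep='first'`), so a restaurant listed under several cuisines keeps only one of them, and the rating columns of the kept row belong to a single user. If every cuisine should survive for the content-based model, one option, sketched here on hypothetical rows, is to join the cuisines into a single string per restaurant:

```python
import pandas as pd

# Hypothetical rows: restaurant 1 is listed under two cuisines and rated by two users
toy_rows = pd.DataFrame({
    'placeID': [1, 1, 1, 1, 2],
    'name': ['Tortas', 'Tortas', 'Tortas', 'Tortas', 'Paniroles'],
    'Rcuisine': ['Fast_Food', 'Mexican', 'Fast_Food', 'Mexican', 'Italian'],
})

# drop_duplicates keeps only the first row, so 'Mexican' is lost for restaurant 1
first_only = toy_rows.drop_duplicates('placeID')

# Alternative: one row per restaurant with its unique cuisines sorted and space-joined
all_cuisines = (
    toy_rows.groupby(['placeID', 'name'])['Rcuisine']
    .apply(lambda s: ' '.join(sorted(s.unique())))
    .reset_index()
)

print(first_only)
print(all_cuisines)
```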


Next, convert the `placeID`, `name`, and `Rcuisine` columns into Python lists.

In [42]:
resto_id = preparation['placeID'].tolist() 
resto_name = preparation['name'].tolist()
resto_cuisine = preparation['Rcuisine'].tolist()
 
print(len(resto_id))
print(len(resto_name))
print(len(resto_cuisine))

95
95
95
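Going through Python lists is not strictly necessary: selecting the three columns, renaming them, and resetting the index yields the same table. A sketch, assuming a DataFrame with the same `placeID`, `name`, and `Rcuisine` columns as `preparation` (the stand-in `prep` below uses a few values from the output above):

```python
import pandas as pd

# Hypothetical stand-in for `preparation`
prep = pd.DataFrame({
    'userID': ['U1087', 'U1013'],
    'placeID': [132560, 132572],
    'name': ['puesto de gorditas', 'Cafe Chaires'],
    'Rcuisine': ['Regional', 'Cafeteria'],
})

# List-based construction, as in the cells above and below
via_lists = pd.DataFrame({
    'id': prep['placeID'].tolist(),
    'resto_name': prep['name'].tolist(),
    'cuisine': prep['Rcuisine'].tolist(),
})

# Direct column selection + rename; reset_index gives the same 0..n-1 index as the list version
direct = (
    prep[['placeID', 'name', 'Rcuisine']]
    .rename(columns={'placeID': 'id', 'name': 'resto_name', 'Rcuisine': 'cuisine'})
    .reset_index(drop=True)
)

print(direct.equals(via_lists))  # True
```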


In [43]:
# build a DataFrame from the 'resto_id', 'resto_name', and 'resto_cuisine' lists
resto_new = pd.DataFrame({
    'id': resto_id,
    'resto_name': resto_name,
    'cuisine': resto_cuisine
})
resto_new

,id,resto_name,cuisine
0,132560,puesto de gorditas,Regional
1,132572,Cafe Chaires,Cafeteria
2,132583,McDonalds Centro,American
3,132584,Gorditas Dona Tota,Mexican
4,132594,tacos de barbacoa enfrente del Tec,Mexican
...,...,...,...
90,135086,Mcdonalds Parque Tangamanga,Fast_Food
91,135088,Cafeteria cenidet,Cafeteria
92,135104,vips,Mexican
93,135106,El Rincón de San Francisco,Mexican


## 5. Model Development with Content-Based Filtering
---
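A minimal sketch of content-based filtering on a table shaped like `resto_new` (`id`, `resto_name`, `cuisine`), assuming cuisine is the only content feature: each restaurant is one-hot encoded by cuisine, restaurants are compared with cosine similarity, and the k most similar other restaurants are returned. The toy rows, the name `toy_resto`, and the helper `recommend` are illustrative; cosine similarity is computed directly with NumPy, and TF-IDF weighting (for example scikit-learn's `TfidfVectorizer`) is a common alternative when restaurants carry several cuisines.

```python
import numpy as np
import pandas as pd

# Hypothetical table shaped like `resto_new`
toy_resto = pd.DataFrame({
    'id': [132560, 132572, 132584, 132594, 135088],
    'resto_name': ['puesto de gorditas', 'Cafe Chaires', 'Gorditas Dona Tota',
                   'tacos de barbacoa enfrente del Tec', 'Cafeteria cenidet'],
    'cuisine': ['Regional', 'Cafeteria', 'Mexican', 'Mexican', 'Cafeteria'],
})

# One-hot encode cuisine: one column per category, 1.0 where the restaurant has it
features = pd.get_dummies(toy_resto['cuisine']).to_numpy(dtype=float)

# Cosine similarity: normalise each row to unit length, then take dot products
unit = features / np.linalg.norm(features, axis=1, keepdims=True)
similarity = unit @ unit.T


def recommend(resto_name, k=2):
    """Hypothetical helper: the k restaurants most similar to `resto_name` by cuisine."""
    pos = np.flatnonzero(toy_resto['resto_name'].to_numpy() == resto_name)[0]
    scores = pd.Series(similarity[pos], index=toy_resto.index).drop(toy_resto.index[pos])
    top = scores.sort_values(ascending=False, kind='mergesort').head(k)
    return toy_resto.loc[top.index, ['resto_name', 'cuisine']].assign(similarity=top.to_numpy())


print(recommend('Cafe Chaires'))
```

With a single cuisine per restaurant the similarity is simply 1 for a shared cuisine and 0 otherwise, so richer features (for example, all cuisines per restaurant joined into one string) are needed for finer-grained rankings.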

## 6. Model Development with Collaborative Filtering
---