# 4th Study Case: Recommender System

Restaurant and Consumer Recommender System<br>
*with Content-Based Filtering*

Naufal Mu'afi<br>
nmuafi1@gmail.com

---

In [8]:
import zipfile
import pandas as pd

## 1. Data Understanding
---

### Load the Data

In [3]:
!wget https://archive.ics.uci.edu/ml/machine-learning-databases/00232/RCdata.zip

'wget' is not recognized as an internal or external command,
operable program or batch file.
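The output above shows that `wget` is not available in this Windows shell. A minimal sketch of a portable alternative, using only the Python standard library, is given below. The helper name `download_if_missing` is hypothetical and not part of the original notebook; the URL is the one from the `wget` command.

```python
import os
import urllib.request

# URL copied from the wget command above.
RCDATA_URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/00232/RCdata.zip"


def download_if_missing(url, dest):
    """Download `url` to `dest` unless `dest` already exists; return `dest`.

    Hypothetical helper for illustration only.
    """
    if not os.path.exists(dest):
        urllib.request.urlretrieve(url, dest)
    return dest
```

Calling `download_if_missing(RCDATA_URL, './RCdata.zip')` before the extraction cell should produce the same `RCdata.zip` file that `wget` would have saved, assuming the URL is still reachable.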


In [7]:
local_zip = './RCdata.zip'
# The context manager closes the archive even if extraction fails.
with zipfile.ZipFile(local_zip, 'r') as zip_ref:
    zip_ref.extractall('./data')

### Read the Data

In [10]:
accepts = pd.read_csv('./data/chefmozaccepts.csv')
cuisine = pd.read_csv('./data/chefmozcuisine.csv')
hours = pd.read_csv('./data/chefmozhours4.csv')
parking = pd.read_csv('./data/chefmozparking.csv')
geo = pd.read_csv('./data/geoplaces2.csv', encoding = "ISO-8859-1")
usercuisine = pd.read_csv('./data/usercuisine.csv')
payment = pd.read_csv('./data/userpayment.csv')
profile = pd.read_csv('./data/userprofile.csv')
rating = pd.read_csv('./data/rating_final.csv')
 
print('Restaurants with payment data: ', len(accepts.placeID.unique()))
print('Restaurants with cuisine data: ', len(cuisine.placeID.unique()))
print('Restaurants with opening-hours data: ', len(hours.placeID.unique()))
print('Restaurants with location data: ', len(geo.placeID.unique()))
print('Users with cuisine-preference data: ', len(usercuisine.userID.unique()))
print('Users with profile data: ', len(profile.userID.unique()))
print('Users who gave ratings: ', len(rating.userID.unique()))
print('Restaurants that received ratings: ', len(rating.placeID.unique()))

Restaurants with payment data:  615
Restaurants with cuisine data:  769
Restaurants with opening-hours data:  694
Restaurants with location data:  130
Users with cuisine-preference data:  138
Users with profile data:  138
Users who gave ratings:  138
Restaurants that received ratings:  130
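The counts above differ between tables: the cuisine table covers 769 restaurants, while only 130 restaurants appear in the rating table. One way to quantify how much two tables overlap is sketched below. The helper `coverage` is hypothetical, and the IDs are toy values for illustration, not data from the files.

```python
def coverage(reference_ids, other_ids):
    """Fraction of unique `reference_ids` that also appear in `other_ids`.

    Hypothetical helper; accepts any iterable of IDs, including a pandas Series.
    """
    reference = set(reference_ids)
    if not reference:
        return 0.0
    return len(reference & set(other_ids)) / len(reference)


# Toy IDs for illustration only (not taken from the dataset).
rated = [1, 2, 3, 4]
with_cuisine = [2, 3, 4, 5, 6]
print(coverage(rated, with_cuisine))  # 0.75
```

In the notebook, `coverage(rating.placeID, cuisine.placeID)` would give the share of rated restaurants that have at least one cuisine entry.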


## 2. Univariate Exploratory Data Analysis (EDA)
---

In this project we explore only a subset of the variables: `accepts`, `cuisine`, `profile`, and `rating`.

The `accepts` and `cuisine` variables hold restaurant data, while the `profile` and `rating` variables hold user data.

### 2.1. Accept Variable

In [11]:
accepts.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1314 entries, 0 to 1313
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   placeID   1314 non-null   int64 
 1   Rpayment  1314 non-null   object
dtypes: int64(1), object(1)
memory usage: 20.7+ KB


In [13]:
print(f"Number of restaurants: {len(accepts.placeID.unique())}")
print(f"Number of accepted payment types: {len(accepts.Rpayment.unique())}")
print(f"Accepted payment types: {accepts.Rpayment.unique()}")

Number of restaurants: 615
Number of accepted payment types: 12
Accepted payment types: ['cash' 'VISA' 'MasterCard-Eurocard' 'American_Express' 'bank_debit_cards'
 'checks' 'Discover' 'Carte_Blanche' 'Diners_Club' 'Visa'
 'Japan_Credit_Bureau' 'gift_certificates']
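The output above lists both `VISA` and `Visa`, which differ only in letter case, so the count of 12 may overstate the number of distinct payment types. The sketch below lower-cases the labels to check this. It assumes the two spellings refer to the same card network, which the dataset does not confirm.

```python
import pandas as pd

# Labels copied from the output above.
labels = pd.Series([
    'cash', 'VISA', 'MasterCard-Eurocard', 'American_Express',
    'bank_debit_cards', 'checks', 'Discover', 'Carte_Blanche',
    'Diners_Club', 'Visa', 'Japan_Credit_Bureau', 'gift_certificates',
])

# Lower-casing merges labels that differ only in case.
normalized = labels.str.lower()
print(labels.nunique(), normalized.nunique())  # 12 11
```

Applied to the real table, the same idea would read `accepts['Rpayment'].str.lower()`.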


### 2.2. Cuisine Variable

In [14]:
cuisine.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 916 entries, 0 to 915
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   placeID   916 non-null    int64 
 1   Rcuisine  916 non-null    object
dtypes: int64(1), object(1)
memory usage: 14.4+ KB


In [15]:
print(f"Number of cuisine types: {len(cuisine.Rcuisine.unique())}")
print(f"Cuisine types: {cuisine.Rcuisine.unique()}")

Number of cuisine types: 59
Cuisine types: ['Spanish' 'Italian' 'Latin_American' 'Mexican' 'Fast_Food' 'Burgers'
 'Dessert-Ice_Cream' 'Hot_Dogs' 'Steaks' 'Asian' 'International'
 'Mongolian' 'Vegetarian' 'Brazilian' 'Cafe-Coffee_Shop' 'Cafeteria'
 'Contemporary' 'Deli-Sandwiches' 'Diner' 'Japanese' 'Sushi' 'Seafood'
 'Chinese' 'Bar' 'Bar_Pub_Brewery' 'Pizzeria' 'Mediterranean' 'American'
 'Family' 'Caribbean' 'African' 'Breakfast-Brunch' 'Regional' 'Afghan'
 'Bakery' 'Game' 'Armenian' 'Vietnamese' 'Korean' 'Thai' 'Barbecue'
 'Polish' 'Dutch-Belgian' 'French' 'German' 'Southwestern' 'Persian'
 'Ethiopian' 'Juice' 'Soup' 'Continental-European' 'Greek' 'Southern'
 'Eastern_European' 'California' 'Bagels' 'Turkish' 'Organic-Healthy'
 'Fine_Dining']


### 2.3. Profile Variable

In [16]:
print(profile.shape)

(138, 19)


In [17]:
profile.head()

| | userID | latitude | longitude | smoker | drink_level | dress_preference | ambience | transport | marital_status | hijos | birth_year | interest | personality | religion | activity | color | weight | budget | height |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | U1001 | 22.139997 | -100.978803 | False | abstemious | informal | family | on foot | single | independent | 1989 | variety | thrifty-protector | none | student | black | 69 | medium | 1.77 |
| 1 | U1002 | 22.150087 | -100.983325 | False | abstemious | informal | family | public | single | independent | 1990 | technology | hunter-ostentatious | Catholic | student | red | 40 | low | 1.87 |
| 2 | U1003 | 22.119847 | -100.946527 | False | social drinker | formal | family | public | single | independent | 1989 | none | hard-worker | Catholic | student | blue | 60 | low | 1.69 |
| 3 | U1004 | 18.867 | -99.183 | False | abstemious | informal | family | public | single | independent | 1940 | variety | hard-worker | none | professional | green | 44 | medium | 1.53 |
| 4 | U1005 | 22.183477 | -100.959891 | False | abstemious | no preference | family | public | single | independent | 1992 | none | thrifty-protector | Catholic | student | black | 65 | medium | 1.69 |


### 2.4. Rating Variable

In [18]:
rating.head()

| | userID | placeID | rating | food_rating | service_rating |
|---|---|---|---|---|---|
| 0 | U1077 | 135085 | 2 | 2 | 2 |
| 1 | U1077 | 135038 | 2 | 2 | 1 |
| 2 | U1077 | 132825 | 2 | 2 | 2 |
| 3 | U1077 | 135060 | 1 | 2 | 2 |
| 4 | U1068 | 135104 | 1 | 1 | 2 |


In [19]:
rating.describe()

| | placeID | rating | food_rating | service_rating |
|---|---|---|---|---|
| count | 1161.0 | 1161.0 | 1161.0 | 1161.0 |
| mean | 134192.041344 | 1.199828 | 1.215332 | 1.090439 |
| std | 1100.916275 | 0.773282 | 0.792294 | 0.790844 |
| min | 132560.0 | 0.0 | 0.0 | 0.0 |
| 25% | 132856.0 | 1.0 | 1.0 | 0.0 |
| 50% | 135030.0 | 1.0 | 1.0 | 1.0 |
| 75% | 135059.0 | 2.0 | 2.0 | 2.0 |
| max | 135109.0 | 2.0 | 2.0 | 2.0 |


In [20]:
print(f"Number of UserID: {len(rating.userID.unique())}")
print(f"Number of PlaceID: {len(rating.placeID.unique())}")
print(f"Total amount of rating data: {len(rating)}")

Number of UserID: 138
Number of PlaceID: 130
Total amount of rating data: 1161
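The three numbers above also indicate how sparse the user-restaurant rating matrix is. The short calculation below is a sketch that assumes each row of `rating` is a distinct (user, restaurant) pair, which the notebook has not verified.

```python
# Values printed above.
n_users, n_places, n_ratings = 138, 130, 1161

# Share of all possible user-restaurant pairs that carry a rating.
density = n_ratings / (n_users * n_places)
print(f"Density: {density:.2%}")  # Density: 6.47%
print(f"Average ratings per user: {n_ratings / n_users:.1f}")  # 8.4
```

Under that assumption, roughly 6.5% of the possible pairs are rated, so most of the matrix is empty.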


## 3. Data Preprocessing
---

## 4. Data Preparation
---

## 5. Model Development with Content Based Filtering
---

## 6. Model Development with Collaborative Filtering
---