<a id='Top'></a>

# Yelp Recommender System Project
## Part 3 Recommender System

### Overview

1. <a href='#1'>Data Preparation for Recommender system</a>
1. <a href='#2'>Collaborative filtering</a>
1. <a href='#3'>Content-based filtering</a> 

_The datasets were cleaned and explored in the data wrangling and EDA parts of this project (see Data wrangling and EDA notebooks for details). Note that the datasets used here contain only __food and restaurant__ related businesses, users, and reviews._

In [1]:
import pandas as pd
import random

# Surprise packages
from surprise import Reader
from surprise import Dataset
from surprise import accuracy
from surprise.model_selection import GridSearchCV

from surprise import NormalPredictor
from surprise import BaselineOnly
from surprise import SVD
from surprise import SVDpp
from surprise import NMF

<a id='1'></a>
### Data Preparation for Recommender system

In [2]:
df = pd.read_csv('reviews_business_user_info.csv', index_col=0)



In [3]:
df.head(3)

Unnamed: 0,review_id,user_id,business_id,stars,date,business_name,city,state,city_state,latitude,longitude,price_range,categories,user_name,yelping_since
0,x7mDIiDB3jEiPGPHOmDzyw,msQe1u7Z_XuqjGoqhB0J5g,iCQpiavjjPzJ5_3gPD5Ebg,2,2011-02-25,Secret Pizza,Las Vegas,NV,"Las Vegas, NV",36.109837,-115.174212,1.0,"['Pizza', 'Restaurants']",Melissa,2011-02-24
1,dDl8zu1vWPdKGihJrwQbpw,msQe1u7Z_XuqjGoqhB0J5g,pomGBqfbxcqPv14c3XH-ZQ,5,2012-11-13,Leticia's Mexican Cocina,Las Vegas,NV,"Las Vegas, NV",36.298875,-115.280088,2.0,"['Restaurants', 'Mexican', 'Bars', 'Nightlife']",Melissa,2011-02-24
2,LZp4UX5zK3e-c5ZGSeo3kA,msQe1u7Z_XuqjGoqhB0J5g,jtQARsP6P-LbkyjbO1qNGg,1,2014-10-23,H&H BBQ Plus 2,Las Vegas,NV,"Las Vegas, NV",36.241809,-115.234495,2.0,"['American (New)', 'Barbeque', 'Restaurants']",Melissa,2011-02-24


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4017884 entries, 0 to 4017883
Data columns (total 15 columns):
review_id        object
user_id          object
business_id      object
stars            int64
date             object
business_name    object
city             object
state            object
city_state       object
latitude         float64
longitude        float64
price_range      float64
categories       object
user_name        object
yelping_since    object
dtypes: float64(3), int64(1), object(11)
memory usage: 490.5+ MB


#### Selecting one metropolitan area

To build the recommender system, I will use food and restaurant businesses from a single metropolitan area, for two reasons. First, people usually want recommendations for the area they plan to visit. Second, if all cities are included, the user (rows) by business (columns) rating matrix becomes very sparse, which makes star ratings hard to predict.

The city with the most reviews is Las Vegas.

In [5]:
df.city_state.value_counts()[:20]

Las Vegas, NV          1139203
Phoenix, AZ             408423
Toronto, ON             380025
Scottsdale, AZ          217414
Charlotte, NC           198690
Pittsburgh, PA          157270
Montréal, QC            115208
Tempe, AZ               115027
Henderson, NV           109831
Mesa, AZ                 88576
Chandler, AZ             86319
Cleveland, OH            81433
Madison, WI              71896
Gilbert, AZ              68186
Calgary, AB              63940
Glendale, AZ             53566
Mississauga, ON          39518
Markham, ON              39319
Peoria, AZ               29418
North Las Vegas, NV      25355
Name: city_state, dtype: int64

Some of the other cities above are also in the [Las Vegas–Henderson–Paradise, NV metropolitan area](https://en.wikipedia.org/wiki/Las_Vegas%E2%80%93Henderson%E2%80%93Paradise,_NV_Metropolitan_Statistical_Area). The cities that belong to this metropolitan area are Henderson, North Las Vegas, Paradise, Las Vegas, and Boulder City, and I will include reviews from all of them.

In the EDA part, I already merged 5 different spellings of Las Vegas into 'Las Vegas', and fixed 'Henderson and Las vegas', which actually refers to Henderson. Here, I will further clean up the spellings of North Las Vegas. I did not find multiple spellings for Paradise or Boulder City. I will also check whether any cities in other states share these names.

##### Fixing city names

In [6]:
df[df.city=='Las Vegas'].city_state.value_counts()

Las Vegas, NV    1139203
Name: city_state, dtype: int64

In [7]:
df[df.city=='Paradise'].city_state.value_counts()

Paradise, NV    110
Name: city_state, dtype: int64

In [8]:
df[df.city=='Boulder City'].city_state.value_counts()

Boulder City, NV    5746
Name: city_state, dtype: int64

In [9]:
df[df.city=='Henderson'].city_state.value_counts()

Henderson, NV    109831
Henderson, VA         3
Name: city_state, dtype: int64

There is also a Henderson in Virginia! Thus, I need to use the city_state column I created, rather than the city column, to select the 5 cities of interest.
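The difference between the two columns can be illustrated on a few made-up rows. This is only a toy sketch: the DataFrame `toy` and its contents are hypothetical and not taken from the review data.

```python
import pandas as pd

# Toy rows (hypothetical) that mimic the ambiguity found above
toy = pd.DataFrame({
    "city": ["Henderson", "Henderson", "Las Vegas"],
    "state": ["NV", "VA", "NV"],
})
toy["city_state"] = toy["city"] + ", " + toy["state"]

# Filtering on the city name alone also picks up Henderson, VA
by_city = toy[toy["city"].isin(["Henderson", "Las Vegas"])]

# Filtering on city_state keeps only the Nevada rows
by_city_state = toy[toy["city_state"].isin(["Henderson, NV", "Las Vegas, NV"])]

print(len(by_city), len(by_city_state))
```

Selecting on the combined column avoids having to filter on city and state separately.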

In [10]:
df[df.city.isin(['N Las Vegas', 'N. Las Vegas', 'North Las Vegas'])].city_state.value_counts()

North Las Vegas, NV    25355
N. Las Vegas, NV         287
N Las Vegas, NV          113
Name: city_state, dtype: int64

Altogether there are 25,755 such reviews. The 400 labeled 'N Las Vegas' or 'N. Las Vegas' will be changed to 'North Las Vegas'.

In [11]:
# function from the EDA part
def unify_city_names(df, col_name, possible_names, correct_name):
    '''
    Replace every variant in possible_names with correct_name in df[col_name].
    The DataFrame is modified in place.
    '''
    correct_dict = dict(zip(possible_names,[correct_name]*len(possible_names)))
    print(correct_dict)
    df[col_name]=df[col_name].replace(correct_dict)

In [12]:
unify_city_names(df, 'city', ['N Las Vegas', 'N. Las Vegas'], 'North Las Vegas')

{'N Las Vegas': 'North Las Vegas', 'N. Las Vegas': 'North Las Vegas'}


In [13]:
sum(df.city.isin(['N Las Vegas', 'N. Las Vegas'])) 

0

In [14]:
sum(df.city=='North Las Vegas')

25755

The number looks correct! I also need to fix the city_state column.

In [15]:
sum(df.city_state=='North Las Vegas, NV')

25355

In [16]:
sum(df.city_state.isin(['N Las Vegas, NV', 'N. Las Vegas, NV']))

400

In [17]:
unify_city_names(df, 'city_state', ['N Las Vegas, NV', 'N. Las Vegas, NV'], 'North Las Vegas, NV')

{'N Las Vegas, NV': 'North Las Vegas, NV', 'N. Las Vegas, NV': 'North Las Vegas, NV'}


In [18]:
sum(df.city_state.isin(['N Las Vegas, NV', 'N. Las Vegas, NV']))

0

In [19]:
sum(df.city_state=='North Las Vegas, NV')

25755

##### Selecting reviews in those cities

In [20]:
city_states_Vegas = ['Las Vegas, NV','North Las Vegas, NV','Paradise, NV',
                    'Boulder City, NV', 'Henderson, NV']

In [21]:
1139203+110+5746+109831+25755

1280645

In [22]:
df_Vegas = df[df.city_state.isin(city_states_Vegas)]
len(df_Vegas)

1280645

In [23]:
len(df_Vegas) == 1139203+110+5746+109831+25755

True

The number of reviews, 1,280,645, looks alright!

In [24]:
# Number of users
df_Vegas.user_id.nunique()

422409

In [25]:
# Number of businesses
df_Vegas.business_id.nunique()

9674

The Las Vegas metropolitan area data contain
- 1,280,645 reviews
- 422,409 users  
- 9,674 businesses 
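These counts give a feel for the sparsity mentioned earlier. The following is a rough sketch that assumes each review fills a distinct (user, business) cell; the variable names are my own.

```python
# Counts reported above for the Las Vegas metropolitan area
n_reviews = 1_280_645
n_users = 422_409
n_businesses = 9_674

# Total number of cells in a users x businesses matrix
n_cells = n_users * n_businesses

# Upper bound on the share of filled cells: a user who reviewed the same
# business more than once would fill only one cell
density = n_reviews / n_cells
print(f"density <= {density:.4%}")
```

Even within one metropolitan area, only a tiny fraction of the matrix is observed, which motivates the filtering in the next step.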

In [26]:
# save Vegas reviews
#df_Vegas.to_csv('reviews_business_user_info_Vegas.csv')

#### Selecting businesses and users with enough reviews

In [28]:
def ids_with_enough_reviews(df, column, threshold):
    review_counts = df[column].value_counts() 
    return review_counts[review_counts >= threshold].index

In [27]:
sum(df_Vegas.user_id.value_counts()>=10)

20340

In [29]:
user_ids_10more = ids_with_enough_reviews(df_Vegas, 'user_id', 10)

In [30]:
df_Vegas[df_Vegas.user_id.isin(user_ids_10more)].user_id.nunique()

20340

In [31]:
sum(df_Vegas.business_id.value_counts()>=20)

6268

In [32]:
business_ids_20more = ids_with_enough_reviews(df_Vegas, 'business_id', 20)

In [33]:
df_Vegas[df_Vegas.business_id.isin(business_ids_20more)].business_id.nunique()

6268

In [34]:
len(df_Vegas[(df_Vegas.business_id.isin(business_ids_20more))&(df_Vegas.user_id.isin(user_ids_10more))])

493658

In [35]:
df_Vegas_over_10_20 = df_Vegas[(df_Vegas.business_id.isin(business_ids_20more))&(df_Vegas.user_id.isin(user_ids_10more))]

In [36]:
df_Vegas_over_10_20.user_id.nunique(), df_Vegas_over_10_20.business_id.nunique(), len(df_Vegas_over_10_20)

(20340, 6266, 493658)

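After removing users with fewer than 10 reviews and businesses with fewer than 20 reviews, the dataset has
- 493,658 reviews
- 20,340 users
- 6,266 businesses

Two of the 6,268 businesses with at least 20 reviews drop out because none of their reviews come from users with at least 10 reviews.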
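Note that the filter above is a single pass: both thresholds are counted on the unfiltered Las Vegas data, so a business that is kept may end up with fewer than 20 reviews among the kept users. If strict thresholds were needed, the filter could be repeated until nothing changes. The sketch below shows that idea on toy data; `filter_until_stable` is a hypothetical helper and is not what this notebook uses.

```python
import pandas as pd

def filter_until_stable(df, min_user_reviews, min_business_reviews):
    """Hypothetical helper: re-apply both thresholds until no row is dropped."""
    while True:
        user_counts = df["user_id"].value_counts()
        business_counts = df["business_id"].value_counts()
        keep = (
            df["user_id"].map(user_counts).ge(min_user_reviews)
            & df["business_id"].map(business_counts).ge(min_business_reviews)
        )
        if keep.all():
            return df
        df = df[keep]

# Toy reviews: u4's second review is for b3, which only u4 reviewed
toy = pd.DataFrame({
    "user_id":     ["u1", "u1", "u2", "u2", "u3", "u3", "u4", "u4"],
    "business_id": ["b1", "b2", "b1", "b2", "b1", "b2", "b1", "b3"],
})

stable = filter_until_stable(toy, min_user_reviews=2, min_business_reviews=3)
print(sorted(stable["user_id"].unique()), len(stable))
```

In the toy data, dropping b3 leaves u4 with a single review, so a second pass removes u4 as well. A single pass would have kept that review.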

In [56]:
df_Vegas_over_10_20.to_csv('reviews_business_user_info_Vegas_over_10_20.csv')

<a id='2'></a>
### Collaborative Filtering

Now it's time to build recommender systems!

I am going to use the recommender system package called [Surprise](https://surprise.readthedocs.io/en/stable/index.html) for collaborative filtering algorithms.

#### Split test and training sets

In [37]:
# Load the dataset
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(df_Vegas_over_10_20[['user_id', 'business_id', 'stars']], reader)
raw_ratings = data.raw_ratings

# shuffle ratings 
random.Random(32).shuffle(raw_ratings)

# 90% training and 10% test data
threshold = int(.9 * len(raw_ratings))
train_raw_ratings = raw_ratings[:threshold]
test_raw_ratings = raw_ratings[threshold:]

In [38]:
raw_ratings[:5]

[('U4INQZOPSUaj8hMjLlZ3KA', '4JNXUYY8wbaaDmk3BPzlWw', 3.0, None),
 ('cg2P244yON3-_GXWkgAgsw', 'umXvdus9LbC6oxtLdXelFQ', 4.0, None),
 ('YHdXkAmndIfuIczWOnsjeQ', '7HIa2lYy5jgcZuADlRjKSg', 1.0, None),
 ('CstEf6M4JSom9Msm0qIYew', 'wdOOK3K6vzQy1d_OIk-U9w', 3.0, None),
 ('Cwkkowhq9MZue1Xyk57BMg', '7sPNbCx7vGAaH7SbNPZ6oA', 4.0, None)]
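The split above relies on a seeded shuffle, so it is reproducible. Here is a minimal sketch of the same idea on toy tuples; `shuffle_and_split` and `toy_ratings` are illustrative names, not part of Surprise.

```python
import random

# Toy stand-in for raw_ratings: (user, item, rating, timestamp) tuples
toy_ratings = [(f"u{i}", f"b{i}", float(1 + i % 5), None) for i in range(10)]

def shuffle_and_split(ratings, train_fraction, seed):
    """Return (train, test) after a seeded shuffle of a copy of `ratings`."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    cut = int(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

train, test = shuffle_and_split(toy_ratings, 0.9, seed=32)
train_again, test_again = shuffle_and_split(toy_ratings, 0.9, seed=32)
print(len(train), len(test))
```

Using the same seed gives the same split every time, which keeps the test set fixed while different algorithms are compared.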

#### Make helper functions

In [69]:
def training_test_algorithm(train_set, test_set, algorithm, param_grid, n_cv, cv_result=True):
    '''
    Tune `algorithm` with grid-search cross-validation on train_set, refit the
    best model on the whole training set, and print its RMSE on test_set.
    Uses the global Surprise Dataset `data` loaded above.
    '''
    data.raw_ratings = train_set  # data now contains only the training ratings

    # grid search cross validation
    gs = GridSearchCV(algorithm, param_grid, measures=['rmse'], cv=n_cv, n_jobs=-1)
    gs.fit(data)
    best_algo = gs.best_estimator['rmse']
    # hyper-parameters for the best RMSE score
    best_params = gs.best_params['rmse']
    print("Best hyper-parameters:", best_params)
    print()
    
    # show the cross-validated RMSE for each hyper-parameter combination if cv_result=True
    # if cv_result=False, only show the RMSE for the best combination
    if cv_result:
        print("Training set RMSE for each hyper-parameter combination:")
    else:
        print("Training set RMSE for the best hyper-parameter combination:")
    means = gs.cv_results['mean_test_rmse']
    stds = gs.cv_results['std_test_rmse']
    parameters = gs.cv_results['params']
    for mean, std, params in zip(means, stds, parameters):
        if cv_result or params == best_params:
            print("%0.4f (+/-%0.04f) for %r" % (mean, std * 2, params))                
    print()

    # Refit the best model on the whole training set
    trainset = data.build_full_trainset()
    best_algo.fit(trainset)
    pred = best_algo.test(trainset.build_testset())
#    print('Train set', end='  ')
#    accuracy.rmse(pred)
    
    # Compute performance on the held-out test set
    testset = data.construct_testset(test_set)
    pred = best_algo.test(testset)
    print('Test set', end='  ')
    accuracy.rmse(pred)

Surprise's GridSearchCV takes an algorithm class rather than an instance, so every hyper-parameter of an algorithm (e.g., random_state) has to be passed through the parameter grid, whether it is tuned or not (see [Surprise issue #212](https://github.com/NicolasHug/Surprise/issues/212)).
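To see what passing a fixed value through the grid means, here is a small sketch of how a dictionary of lists expands into individual combinations. `expand_grid` is a hypothetical helper written for illustration; it is not Surprise's code and it does not handle nested options such as bsl_options.

```python
from itertools import product

def expand_grid(param_grid):
    """Hypothetical helper: list every combination in a dict of lists."""
    names = list(param_grid)
    return [dict(zip(names, values)) for values in product(*param_grid.values())]

grid = {"n_epochs": [10, 20, 30], "lr_all": [0.002, 0.005], "random_state": [32]}
combos = expand_grid(grid)
print(len(combos))
print(combos[0])
```

A one-element list such as `[32]` does not add combinations; it only fixes that value in every run. An empty grid expands to a single empty combination.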

#### NormalPredictor Algorithm

NormalPredictor predicts ratings at random from a normal distribution whose mean and standard deviation are estimated from the training set. It is a good base model to compare more complex models against.
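The idea can be sketched in a few lines of plain Python. This is a simplified illustration on toy ratings, assuming predictions are clipped to the 1 to 5 scale; it is not Surprise's implementation.

```python
import random
import statistics

train_ratings = [5, 4, 5, 2, 1, 2, 4, 3, 4]  # toy ratings, not from the dataset

mu = statistics.fmean(train_ratings)
sigma = statistics.pstdev(train_ratings)  # population standard deviation

rng = random.Random(32)

def predict_random(low=1.0, high=5.0):
    """Draw one rating from N(mu, sigma^2), clipped to the rating scale."""
    return min(high, max(low, rng.gauss(mu, sigma)))

predictions = [predict_random() for _ in range(5)]
print(predictions)
```

Because the prediction ignores both the user and the business, any model that learns something about either should beat it.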

In [40]:
%%time
param_grid = {}
training_test_algorithm(train_raw_ratings, test_raw_ratings, NormalPredictor, param_grid, n_cv=3)

Best hyper-parameters: {}

Training set RMSE for each hyper-parameter combination:
1.6269 (+/-0.0041) for {}

Test set  RMSE: 1.6317
Wall time: 26.1 s


#### Baseline Algorithm

In [41]:
%%time
param_grid = {}
training_test_algorithm(train_raw_ratings, test_raw_ratings, BaselineOnly, param_grid, n_cv=3)

Best hyper-parameters: {}

Training set RMSE for each hyper-parameter combination:
1.1062 (+/-0.0029) for {}

Estimating biases using als...
Test set  RMSE: 1.0941
Wall time: 27 s


- The baseline model is much better than the NormalPredictor (test RMSE 1.0941 vs. 1.6317)!
- By default, BaselineOnly uses Alternating Least Squares (ALS), as above. Next, I am going to check how Stochastic Gradient Descent (SGD) performs.
- I also tried n_cv=5 for many algorithms. It did not improve the performance on the test set, although the performance on the training set was a little better with 5. Thus, I decided to use 3, since it saves time.
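All comparisons in this notebook use the root mean squared error (RMSE). For reference, here is a minimal sketch of the metric on made-up ratings; the `rmse` function below is my own and is separate from `surprise.accuracy.rmse`.

```python
import math

def rmse(true_ratings, predicted_ratings):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(t - p) ** 2 for t, p in zip(true_ratings, predicted_ratings)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

true_ratings = [5, 3, 1, 4]
close_guess = rmse(true_ratings, [4, 3, 2, 4])
poor_guess = rmse(true_ratings, [1, 5, 5, 1])
print(close_guess, poor_guess)
```

Lower is better, and because errors are squared before averaging, a few large misses weigh more than many small ones.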

In [52]:
%%time
param_grid = {'bsl_options':{'method': ['als','sgd']}}
training_test_algorithm(train_raw_ratings, test_raw_ratings, BaselineOnly, param_grid, n_cv=3)

Best hyper-parameters: {'bsl_options': {'method': 'sgd'}}

Training set RMSE for each hyper-parameter combination:
1.1063 (+/-0.0027) for {'bsl_options': {'method': 'als'}}
1.1005 (+/-0.0025) for {'bsl_options': {'method': 'sgd'}}

Estimating biases using sgd...
Test set  RMSE: 1.0891
Wall time: 42.4 s


In [61]:
%%time
param_grid = {'bsl_options':{'method': ['als','sgd']}}
training_test_algorithm(train_raw_ratings, test_raw_ratings, BaselineOnly, param_grid, n_cv=3)

Best hyper-parameters: {'bsl_options': {'method': 'sgd'}}

Training set RMSE for each hyper-parameter combination:
1.1064 (+/-0.0035) for {'bsl_options': {'method': 'als'}}
1.1006 (+/-0.0034) for {'bsl_options': {'method': 'sgd'}}

Estimating biases using sgd...
Test set  RMSE: 1.0891
Wall time: 43.6 s


SGD is slightly better (lower RMSE) than the default method ALS for the baseline model!
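The options tuned for the SGD baseline are n_epochs, learning_rate, and reg. The sketch below shows what they control in a baseline of the form r_hat = mu + b_u + b_i. It is a simplified illustration on toy data with illustrative default values, not Surprise's actual implementation.

```python
import math

# Toy (user, business, stars) triples, not taken from the dataset
ratings = [
    ("u1", "b1", 5), ("u1", "b2", 4), ("u1", "b3", 5),
    ("u2", "b1", 2), ("u2", "b2", 1), ("u2", "b3", 2),
    ("u3", "b1", 4), ("u3", "b2", 3), ("u3", "b3", 4),
]

def fit_baseline_sgd(ratings, n_epochs=20, learning_rate=0.005, reg=0.02):
    """Fit r_hat = mu + b_u + b_i with plain SGD on the regularized squared error."""
    mu = sum(r for _, _, r in ratings) / len(ratings)
    b_user = {u: 0.0 for u, _, _ in ratings}
    b_item = {i: 0.0 for _, i, _ in ratings}
    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - (mu + b_user[u] + b_item[i])
            # Move each bias toward reducing the error, shrunk by the penalty
            b_user[u] += learning_rate * (err - reg * b_user[u])
            b_item[i] += learning_rate * (err - reg * b_item[i])
    return mu, b_user, b_item

def train_rmse(ratings, mu, b_user, b_item):
    """RMSE of the baseline prediction on the given ratings."""
    errors = [r - (mu + b_user[u] + b_item[i]) for u, i, r in ratings]
    return math.sqrt(sum(e * e for e in errors) / len(errors))

mu, b_user, b_item = fit_baseline_sgd(ratings, n_epochs=100, learning_rate=0.01)
rmse_mean_only = train_rmse(ratings, mu, dict.fromkeys(b_user, 0.0), dict.fromkeys(b_item, 0.0))
rmse_with_biases = train_rmse(ratings, mu, b_user, b_item)
print(rmse_mean_only, rmse_with_biases)
```

More epochs or a larger learning rate move the biases further per run, while a larger reg pulls them back toward zero.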

In [70]:
%%time
param_grid = {'bsl_options':{'method': ['sgd'],'n_epochs': [10,20,30]}}
training_test_algorithm(train_raw_ratings, test_raw_ratings, BaselineOnly, param_grid, n_cv=3)

Best hyper-parameters: {'bsl_options': {'method': 'sgd', 'n_epochs': 20}}

Training set RMSE for each hyper-parameter combination:
1.1078 (+/-0.0031) for {'bsl_options': {'method': 'sgd', 'n_epochs': 10}}
1.0999 (+/-0.0025) for {'bsl_options': {'method': 'sgd', 'n_epochs': 20}}
1.1002 (+/-0.0024) for {'bsl_options': {'method': 'sgd', 'n_epochs': 30}}

Estimating biases using sgd...
Test set  RMSE: 1.0891
Wall time: 59.8 s


In [57]:
%%time
param_grid = {'bsl_options':{'method': ['sgd'],'n_epochs': [20, 30, 40],
                            'reg':[.05,.1,.15], 'learning_rate':[.002,.005,.01]}}
training_test_algorithm(train_raw_ratings, test_raw_ratings, BaselineOnly, param_grid, n_cv=3)

Best hyper-parameters: {'bsl_options': {'method': 'sgd', 'n_epochs': 30, 'reg': 0.1, 'learning_rate': 0.005}}

Training set RMSE for each hyper-parameter combination:
1.1130 (+/-0.0023) for {'bsl_options': {'method': 'sgd', 'n_epochs': 20, 'reg': 0.05, 'learning_rate': 0.002}}
1.1002 (+/-0.0024) for {'bsl_options': {'method': 'sgd', 'n_epochs': 20, 'reg': 0.05, 'learning_rate': 0.005}}
1.1024 (+/-0.0026) for {'bsl_options': {'method': 'sgd', 'n_epochs': 20, 'reg': 0.05, 'learning_rate': 0.01}}
1.1137 (+/-0.0022) for {'bsl_options': {'method': 'sgd', 'n_epochs': 20, 'reg': 0.1, 'learning_rate': 0.002}}
1.1004 (+/-0.0023) for {'bsl_options': {'method': 'sgd', 'n_epochs': 20, 'reg': 0.1, 'learning_rate': 0.005}}
1.1017 (+/-0.0024) for {'bsl_options': {'method': 'sgd', 'n_epochs': 20, 'reg': 0.1, 'learning_rate': 0.01}}
1.1146 (+/-0.0021) for {'bsl_options': {'method': 'sgd', 'n_epochs': 20, 'reg': 0.15, 'learning_rate': 0.002}}
1.1009 (+/-0.0022) for {'bsl_options': {'method': 'sgd', 'n_e

In [71]:
%%time
param_grid = {'bsl_options':{'method': ['sgd'],'n_epochs': [20,30,40,50,70,100],
                            'reg':[.01,.015,.02,.05,.1,.15,.2], 'learning_rate':[.005]}}
training_test_algorithm(train_raw_ratings, test_raw_ratings, BaselineOnly, param_grid, n_cv=3)

Best hyper-parameters: {'bsl_options': {'method': 'sgd', 'n_epochs': 30, 'reg': 0.1, 'learning_rate': 0.005}}

Training set RMSE for each hyper-parameter combination:
1.0999 (+/-0.0046) for {'bsl_options': {'method': 'sgd', 'n_epochs': 20, 'reg': 0.01, 'learning_rate': 0.005}}
1.0999 (+/-0.0046) for {'bsl_options': {'method': 'sgd', 'n_epochs': 20, 'reg': 0.015, 'learning_rate': 0.005}}
1.0999 (+/-0.0046) for {'bsl_options': {'method': 'sgd', 'n_epochs': 20, 'reg': 0.02, 'learning_rate': 0.005}}
1.0998 (+/-0.0046) for {'bsl_options': {'method': 'sgd', 'n_epochs': 20, 'reg': 0.05, 'learning_rate': 0.005}}
1.1000 (+/-0.0045) for {'bsl_options': {'method': 'sgd', 'n_epochs': 20, 'reg': 0.1, 'learning_rate': 0.005}}
1.1006 (+/-0.0044) for {'bsl_options': {'method': 'sgd', 'n_epochs': 20, 'reg': 0.15, 'learning_rate': 0.005}}
1.1015 (+/-0.0043) for {'bsl_options': {'method': 'sgd', 'n_epochs': 20, 'reg': 0.2, 'learning_rate': 0.005}}
1.1005 (+/-0.0046) for {'bsl_options': {'method': 'sgd', 

#### SVD algorithm

For an accessible introduction to SVD in recommender systems, see [this explanation](https://medium.com/@m_n_malaeb/singular-value-decomposition-svd-in-recommender-systems-for-non-math-statistics-programming-4a622de653e9).
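As a rough sketch of what the SVD algorithm in this context learns: each rating is predicted as r_hat = mu + b_u + b_i + q_i · p_u, and the biases and factor vectors are updated by SGD. The code below follows that standard formulation for a single made-up rating. All values are illustrative, and it is not Surprise's implementation.

```python
import random

rng = random.Random(32)
n_factors = 3

# Toy model state for one user and one business (all values illustrative)
mu = 3.7                      # global mean rating
b_u, b_i = 0.0, 0.0           # user and item biases
p_u = [rng.gauss(0, 0.1) for _ in range(n_factors)]  # user factors
q_i = [rng.gauss(0, 0.1) for _ in range(n_factors)]  # item factors

def predict(mu, b_u, b_i, p_u, q_i):
    """r_hat = mu + b_u + b_i + q_i . p_u"""
    return mu + b_u + b_i + sum(q * p for q, p in zip(q_i, p_u))

def sgd_step(r, mu, b_u, b_i, p_u, q_i, lr_all=0.005, reg_all=0.02):
    """One SGD update for a single observed rating r."""
    err = r - predict(mu, b_u, b_i, p_u, q_i)
    new_b_u = b_u + lr_all * (err - reg_all * b_u)
    new_b_i = b_i + lr_all * (err - reg_all * b_i)
    # Both factor updates use the old values of p_u and q_i
    new_p_u = [p + lr_all * (err * q - reg_all * p) for p, q in zip(p_u, q_i)]
    new_q_i = [q + lr_all * (err * p - reg_all * q) for p, q in zip(p_u, q_i)]
    return new_b_u, new_b_i, new_p_u, new_q_i

r = 5.0
error_before = abs(r - predict(mu, b_u, b_i, p_u, q_i))
for _ in range(30):
    b_u, b_i, p_u, q_i = sgd_step(r, mu, b_u, b_i, p_u, q_i)
error_after = abs(r - predict(mu, b_u, b_i, p_u, q_i))
print(error_before, error_after)
```

Here lr_all and reg_all play the same roles as the grid parameters of the same names tuned in the cells below: one step size and one penalty shared by all parameters.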

In [42]:
%%time
param_grid = {'random_state':[32]}
training_test_algorithm(train_raw_ratings, test_raw_ratings, SVD, param_grid, n_cv=3)

Best hyper-parameters: {'random_state': 32}

Training set RMSE for each hyper-parameter combination:
1.1151 (+/-0.0057) for {'random_state': 32}

Test set  RMSE: 1.1066
Wall time: 1min 3s


SVD without tuning hyper-parameters is not better than the baseline models.

In [43]:
%%time
param_grid = {'n_epochs':[10, 20, 30], 'lr_all': [0.002, 0.005,.01], 
              'reg_all': [0.02, 0.05, .1], 'random_state':[32]}
training_test_algorithm(train_raw_ratings, test_raw_ratings, SVD, param_grid, n_cv=3)

Best hyper-parameters: {'n_epochs': 20, 'lr_all': 0.005, 'reg_all': 0.1, 'random_state': 32}

Training set RMSE for each hyper-parameter combination:
1.1367 (+/-0.0066) for {'n_epochs': 10, 'lr_all': 0.002, 'reg_all': 0.02, 'random_state': 32}
1.1364 (+/-0.0066) for {'n_epochs': 10, 'lr_all': 0.002, 'reg_all': 0.05, 'random_state': 32}
1.1364 (+/-0.0066) for {'n_epochs': 10, 'lr_all': 0.002, 'reg_all': 0.1, 'random_state': 32}
1.1149 (+/-0.0063) for {'n_epochs': 10, 'lr_all': 0.005, 'reg_all': 0.02, 'random_state': 32}
1.1133 (+/-0.0063) for {'n_epochs': 10, 'lr_all': 0.005, 'reg_all': 0.05, 'random_state': 32}
1.1124 (+/-0.0063) for {'n_epochs': 10, 'lr_all': 0.005, 'reg_all': 0.1, 'random_state': 32}
1.1147 (+/-0.0059) for {'n_epochs': 10, 'lr_all': 0.01, 'reg_all': 0.02, 'random_state': 32}
1.1080 (+/-0.0059) for {'n_epochs': 10, 'lr_all': 0.01, 'reg_all': 0.05, 'random_state': 32}
1.1045 (+/-0.0059) for {'n_epochs': 10, 'lr_all': 0.01, 'reg_all': 0.1, 'random_state': 32}
1.1188 (+/

In [45]:
%%time
param_grid = {'n_epochs':[30,50,70,90], 'lr_all': [.001,.002, 0.005], 
              'reg_all': [ .1, .2, .3, .5], 'random_state':[32]}
training_test_algorithm(train_raw_ratings, test_raw_ratings, SVD, param_grid, n_cv=3)

Best hyper-parameters: {'n_epochs': 90, 'lr_all': 0.002, 'reg_all': 0.2, 'random_state': 32}

Training set RMSE for each hyper-parameter combination:
1.1247 (+/-0.0007) for {'n_epochs': 30, 'lr_all': 0.001, 'reg_all': 0.1, 'random_state': 32}
1.1254 (+/-0.0008) for {'n_epochs': 30, 'lr_all': 0.001, 'reg_all': 0.2, 'random_state': 32}
1.1271 (+/-0.0009) for {'n_epochs': 30, 'lr_all': 0.001, 'reg_all': 0.3, 'random_state': 32}
1.1313 (+/-0.0010) for {'n_epochs': 30, 'lr_all': 0.001, 'reg_all': 0.5, 'random_state': 32}
1.1094 (+/-0.0012) for {'n_epochs': 30, 'lr_all': 0.002, 'reg_all': 0.1, 'random_state': 32}
1.1098 (+/-0.0013) for {'n_epochs': 30, 'lr_all': 0.002, 'reg_all': 0.2, 'random_state': 32}
1.1116 (+/-0.0013) for {'n_epochs': 30, 'lr_all': 0.002, 'reg_all': 0.3, 'random_state': 32}
1.1168 (+/-0.0013) for {'n_epochs': 30, 'lr_all': 0.002, 'reg_all': 0.5, 'random_state': 32}
1.1045 (+/-0.0020) for {'n_epochs': 30, 'lr_all': 0.005, 'reg_all': 0.1, 'random_state': 32}
1.1022 (+/-0.

In [58]:
%%time
param_grid = {'n_epochs':[80,90,100], 'lr_all': [.001,.002, 0.005], 
              'reg_all': [ .1, .2, .3], 'random_state':[32]}
training_test_algorithm(train_raw_ratings, test_raw_ratings, SVD, param_grid, n_cv=3)

Best hyper-parameters: {'n_epochs': 80, 'lr_all': 0.002, 'reg_all': 0.2, 'random_state': 32}

Training set RMSE for each hyper-parameter combination:
1.1054 (+/-0.0022) for {'n_epochs': 80, 'lr_all': 0.001, 'reg_all': 0.1, 'random_state': 32}
1.1055 (+/-0.0021) for {'n_epochs': 80, 'lr_all': 0.001, 'reg_all': 0.2, 'random_state': 32}
1.1074 (+/-0.0021) for {'n_epochs': 80, 'lr_all': 0.001, 'reg_all': 0.3, 'random_state': 32}
1.1042 (+/-0.0025) for {'n_epochs': 80, 'lr_all': 0.002, 'reg_all': 0.1, 'random_state': 32}
1.1017 (+/-0.0024) for {'n_epochs': 80, 'lr_all': 0.002, 'reg_all': 0.2, 'random_state': 32}
1.1031 (+/-0.0023) for {'n_epochs': 80, 'lr_all': 0.002, 'reg_all': 0.3, 'random_state': 32}
1.1168 (+/-0.0028) for {'n_epochs': 80, 'lr_all': 0.005, 'reg_all': 0.1, 'random_state': 32}
1.1049 (+/-0.0028) for {'n_epochs': 80, 'lr_all': 0.005, 'reg_all': 0.2, 'random_state': 32}
1.1045 (+/-0.0025) for {'n_epochs': 80, 'lr_all': 0.005, 'reg_all': 0.3, 'random_state': 32}
1.1044 (+/-0.

In [73]:
%%time
param_grid = {'n_epochs':[70,80,90], 'lr_all': [.001,.002, 0.005], 
              'reg_all': [ .15, .2, .25], 'random_state':[32]}
training_test_algorithm(train_raw_ratings, test_raw_ratings, SVD, param_grid, n_cv=3)

Best hyper-parameters: {'n_epochs': 80, 'lr_all': 0.002, 'reg_all': 0.2, 'random_state': 32}

Training set RMSE for each hyper-parameter combination:
1.1068 (+/-0.0063) for {'n_epochs': 70, 'lr_all': 0.001, 'reg_all': 0.15, 'random_state': 32}
1.1073 (+/-0.0063) for {'n_epochs': 70, 'lr_all': 0.001, 'reg_all': 0.2, 'random_state': 32}
1.1081 (+/-0.0063) for {'n_epochs': 70, 'lr_all': 0.001, 'reg_all': 0.25, 'random_state': 32}
1.1022 (+/-0.0058) for {'n_epochs': 70, 'lr_all': 0.002, 'reg_all': 0.15, 'random_state': 32}
1.1022 (+/-0.0059) for {'n_epochs': 70, 'lr_all': 0.002, 'reg_all': 0.2, 'random_state': 32}
1.1027 (+/-0.0060) for {'n_epochs': 70, 'lr_all': 0.002, 'reg_all': 0.25, 'random_state': 32}
1.1075 (+/-0.0055) for {'n_epochs': 70, 'lr_all': 0.005, 'reg_all': 0.15, 'random_state': 32}
1.1048 (+/-0.0058) for {'n_epochs': 70, 'lr_all': 0.005, 'reg_all': 0.2, 'random_state': 32}
1.1043 (+/-0.0059) for {'n_epochs': 70, 'lr_all': 0.005, 'reg_all': 0.25, 'random_state': 32}
1.1052 

In [74]:
%%time
param_grid = {'n_epochs':[80], 'lr_all': [.002], 
              'reg_all': [ .18, .2, .22], 'random_state':[32]}
training_test_algorithm(train_raw_ratings, test_raw_ratings, SVD, param_grid, n_cv=3)

Best hyper-parameters: {'n_epochs': 80, 'lr_all': 0.002, 'reg_all': 0.18, 'random_state': 32}

Training set RMSE for each hyper-parameter combination:
1.1016 (+/-0.0048) for {'n_epochs': 80, 'lr_all': 0.002, 'reg_all': 0.18, 'random_state': 32}
1.1016 (+/-0.0048) for {'n_epochs': 80, 'lr_all': 0.002, 'reg_all': 0.2, 'random_state': 32}
1.1017 (+/-0.0048) for {'n_epochs': 80, 'lr_all': 0.002, 'reg_all': 0.22, 'random_state': 32}

Test set  RMSE: 1.0914
Wall time: 4min 54s


With a higher learning rate (0.005), 30 epochs perform about as well as 80 epochs, as the runs below show.

In [47]:
%%time
param_grid = {'n_epochs':[30], 'lr_all': [0.005], 
              'reg_all': [.15,.2,.3,.5], 'random_state':[32]}
training_test_algorithm(train_raw_ratings, test_raw_ratings, SVD, param_grid, n_cv=3)

Best hyper-parameters: {'n_epochs': 30, 'lr_all': 0.005, 'reg_all': 0.2, 'random_state': 32}

Training set RMSE for each hyper-parameter combination:
1.1024 (+/-0.0037) for {'n_epochs': 30, 'lr_all': 0.005, 'reg_all': 0.15, 'random_state': 32}
1.1022 (+/-0.0036) for {'n_epochs': 30, 'lr_all': 0.005, 'reg_all': 0.2, 'random_state': 32}
1.1035 (+/-0.0034) for {'n_epochs': 30, 'lr_all': 0.005, 'reg_all': 0.3, 'random_state': 32}
1.1084 (+/-0.0032) for {'n_epochs': 30, 'lr_all': 0.005, 'reg_all': 0.5, 'random_state': 32}

Test set  RMSE: 1.0916
Wall time: 2min 29s


In [48]:
%%time
param_grid = {'n_epochs':[30], 'lr_all': [0.005], 
              'reg_all': [.18,.2,.25], 'random_state':[32]}
training_test_algorithm(train_raw_ratings, test_raw_ratings, SVD, param_grid, n_cv=3)

Best hyper-parameters: {'n_epochs': 30, 'lr_all': 0.005, 'reg_all': 0.18, 'random_state': 32}

Training set RMSE for each hyper-parameter combination:
1.1021 (+/-0.0028) for {'n_epochs': 30, 'lr_all': 0.005, 'reg_all': 0.18, 'random_state': 32}
1.1021 (+/-0.0027) for {'n_epochs': 30, 'lr_all': 0.005, 'reg_all': 0.2, 'random_state': 32}
1.1026 (+/-0.0027) for {'n_epochs': 30, 'lr_all': 0.005, 'reg_all': 0.25, 'random_state': 32}

Test set  RMSE: 1.0914
Wall time: 2min 5s


In [72]:
%%time
param_grid = {'n_epochs':[20,30,40], 'lr_all': [.002, 0.005], 
              'reg_all': [.17,.18,.19,.2], 'random_state':[32]}
training_test_algorithm(train_raw_ratings, test_raw_ratings, SVD, param_grid, n_cv=3)

Best hyper-parameters: {'n_epochs': 30, 'lr_all': 0.005, 'reg_all': 0.18, 'random_state': 32}

Training set RMSE for each hyper-parameter combination:
1.1176 (+/-0.0034) for {'n_epochs': 20, 'lr_all': 0.002, 'reg_all': 0.17, 'random_state': 32}
1.1177 (+/-0.0034) for {'n_epochs': 20, 'lr_all': 0.002, 'reg_all': 0.18, 'random_state': 32}
1.1178 (+/-0.0034) for {'n_epochs': 20, 'lr_all': 0.002, 'reg_all': 0.19, 'random_state': 32}
1.1180 (+/-0.0034) for {'n_epochs': 20, 'lr_all': 0.002, 'reg_all': 0.2, 'random_state': 32}
1.1035 (+/-0.0029) for {'n_epochs': 20, 'lr_all': 0.005, 'reg_all': 0.17, 'random_state': 32}
1.1035 (+/-0.0028) for {'n_epochs': 20, 'lr_all': 0.005, 'reg_all': 0.18, 'random_state': 32}
1.1036 (+/-0.0028) for {'n_epochs': 20, 'lr_all': 0.005, 'reg_all': 0.19, 'random_state': 32}
1.1037 (+/-0.0028) for {'n_epochs': 20, 'lr_all': 0.005, 'reg_all': 0.2, 'random_state': 32}
1.1094 (+/-0.0032) for {'n_epochs': 30, 'lr_all': 0.002, 'reg_all': 0.17, 'random_state': 32}
1.109

#### SVDpp algorithm

In [50]:
%%time
param_grid = {'random_state':[32]}
training_test_algorithm(train_raw_ratings, test_raw_ratings, SVDpp, param_grid, n_cv=3)

Best hyper-parameters: {'random_state': 32}

Training set RMSE for each hyper-parameter combination:
1.1175 (+/-0.0042) for {'random_state': 32}

Test set  RMSE: 1.1108
Wall time: 9min 37s


Untuned SVDpp is much slower than untuned SVD (9min 37s vs. 1min 3s) and has a higher test RMSE (1.1108 vs. 1.1066), so I will not use this algorithm.

#### NMF algorithm

In [51]:
%%time
param_grid = {'random_state':[32]}
training_test_algorithm(train_raw_ratings, test_raw_ratings, NMF, param_grid, n_cv=3)

Best hyper-parameters: {'random_state': 32}

Training set RMSE for each hyper-parameter combination:
1.2139 (+/-0.0019) for {'random_state': 32}

Test set  RMSE: 1.1755
Wall time: 1min 8s


- Untuned NMF has a much higher test RMSE than untuned SVD (1.1755 vs. 1.1066) and is slightly slower, so I will not use this algorithm.
- I also tried the 4 KNN-based algorithms in Surprise, but they threw a memory error even though only 50% of the 24 GB of memory was in use.
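One possible explanation for the memory error, which I have not verified, is the size of the pairwise similarity matrix that neighborhood methods need. The back-of-the-envelope estimate below assumes one dense float64 matrix per similarity computation; the actual memory use depends on the library's internals and on how many copies are alive at once.

```python
# Sizes of the filtered dataset reported earlier in this notebook
n_users = 20_340
n_businesses = 6_266
bytes_per_float64 = 8

def dense_square_matrix_gib(n):
    """Memory of one dense n x n float64 matrix, in GiB."""
    return n * n * bytes_per_float64 / 2**30

user_sim_gib = dense_square_matrix_gib(n_users)
item_sim_gib = dense_square_matrix_gib(n_businesses)
print(f"user-user: {user_sim_gib:.2f} GiB, item-item: {item_sim_gib:.2f} GiB")
```

If several such user-user arrays exist at the same time (for example, intermediate arrays or one copy per parallel worker with n_jobs=-1), the total could plausibly exceed the available memory. Item-based similarity (`sim_options={'user_based': False}` in Surprise) would need a much smaller matrix and might be worth trying, although I have not tested it here.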

<a id='3'></a>
### Content-Based Filtering algorithms 

To be added

### References

- [Surprise documentation](https://surprise.readthedocs.io/en/stable/index.html)
