# Singular Value Decomposition Model

Previously we saw an RMSE of 3.2, which on the surface may seem alright. However, if we interpret it, an error of 3.2 is about 64% of the maximum rating of 5, which means our memory-based collaborative filtering predictions are typically off by a large fraction of the rating scale. This is not ideal, and there are a few possible reasons for this:

1. Overfitting.
2. Poor scalability: memory-based approaches don't scale well to massive datasets. The dataset we have elected to use is the 100k dataset, and real-life production datasets are many times larger.

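As such, we most likely need to explore some dimensionality reduction techniques to help us improve our RMSE score. Representing each user and movie by a small number of latent factors gives a far more compact model than working with the full, sparse user-movie rating matrix, which should help with both of the issues above.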
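To make "dimensionality reduction" concrete, the NumPy sketch below keeps only the k largest singular values of a small, fully observed toy rating matrix, which gives its best rank-k approximation. It is for intuition only and the toy matrix is made up: real rating matrices are mostly empty, so a plain SVD cannot be applied to them directly, which is why the `SVD` algorithm in surprise instead learns latent factors from the observed ratings alone.

```python
import numpy as np

def low_rank_approx(matrix, k):
    """Best rank-k approximation of a dense matrix via truncated SVD."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # keep only the k largest singular values and their singular vectors
    return u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]

# toy, fully observed 4 users x 3 movies rating matrix (illustrative values)
ratings = np.array([[5.0, 4.0, 1.0],
                    [4.0, 5.0, 1.0],
                    [1.0, 1.0, 5.0],
                    [2.0, 1.0, 4.0]])
approx = low_rank_approx(ratings, k=2)
print(np.round(approx, 2))
```

A smaller k gives a more compressed model; with k equal to the full rank, the original matrix is reproduced exactly.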

To achieve this, we will make use of the `surprise` library, which is available in Python for building and analyzing recommender systems that deal with explicit rating data. [A paper going into much greater detail about this awesome library is cited here!](https://doi.org/10.21105/joss.02174) This is relevant here, as our 100k dataset contains explicit ratings for movies.

### I will be running ALL the surprise models. I think this will actually be quite simple; what I am concerned about is error analysis. How can this be done? Additionally, I think using neural networks for recommender systems is not going to be as plug-and-play as some previous examples that I saw...

In [1]:
from surprise import (BaselineOnly, KNNBasic, KNNWithMeans, KNNWithZScore, KNNBaseline, SVD, SVDpp, NMF, SlopeOne, 
CoClustering)
from surprise import Dataset, Reader
from surprise.model_selection import cross_validate, GridSearchCV, train_test_split
from surprise.accuracy import rmse, mae
from sklearn.metrics import mean_squared_error

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from math import sqrt

In [3]:
import statistics

To have reproducible results, we will set a random seed for this whole notebook.

In [4]:
import random

my_seed = 42
random.seed(my_seed)
np.random.seed(my_seed)

In [5]:
df = pd.read_csv('./datasets/merged_users+movies.csv')

In [4]:
df.head()

Unnamed: 0,userId,movieId,rating,timestamp,title,genres
0,1,1,4.0,964982703,Toy Story (1995),Adventure Animation Children Comedy Fantasy
1,5,1,4.0,847434962,Toy Story (1995),Adventure Animation Children Comedy Fantasy
2,7,1,4.5,1106635946,Toy Story (1995),Adventure Animation Children Comedy Fantasy
3,15,1,2.5,1510577970,Toy Story (1995),Adventure Animation Children Comedy Fantasy
4,17,1,4.5,1305696483,Toy Story (1995),Adventure Animation Children Comedy Fantasy


# Start HERE

Even though the surprise library has been modeled on and inspired by the excellent scikit-learn library, some of its methods are used differently, since it has been built specifically for recommender systems on datasets with explicit ratings. We will make use of the documentation found [here](http://surpriselib.com/).

In [159]:
# we will use the min and max ratings observed from our data
reader = Reader(rating_scale=(min(df['rating']), max(df['rating'])))
data = Dataset.load_from_df(df[['userId', 'movieId', 'rating']], reader)

One of the metrics that we are interested in is precision@k and recall@k. For a given user, precision@k is the fraction of the top-k recommended movies that are relevant, and recall@k is the fraction of all the relevant movies that appear in the top k. To use these metrics, we need to choose a `threshold`: the rating at or above which we consider a movie relevant to a user.
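The per-user computation can be sketched in pure Python. This is a minimal sketch adapted from the approach in the surprise FAQ, assuming predictions are `(uid, iid, true_r, est, details)` tuples (the field order of surprise's `Prediction` objects). An item counts as relevant when its true rating is at least `threshold`, and as recommended when it is in the top k by estimated rating and its estimate is at least `threshold`. The threshold of 3.5 and the toy predictions are illustrative assumptions.

```python
from collections import defaultdict

def precision_recall_at_k(predictions, k=10, threshold=3.5):
    """Return per-user dicts of precision@k and recall@k."""
    user_est_true = defaultdict(list)
    for uid, _, true_r, est, _ in predictions:
        user_est_true[uid].append((est, true_r))

    precisions, recalls = {}, {}
    for uid, user_ratings in user_est_true.items():
        # rank this user's items by estimated rating, highest first
        user_ratings.sort(key=lambda x: x[0], reverse=True)
        top_k = user_ratings[:k]
        n_rel = sum(true_r >= threshold for _, true_r in user_ratings)
        n_rec_k = sum(est >= threshold for est, _ in top_k)
        n_rel_and_rec_k = sum(true_r >= threshold and est >= threshold
                              for est, true_r in top_k)
        # define each metric as 0 when its denominator is 0
        precisions[uid] = n_rel_and_rec_k / n_rec_k if n_rec_k else 0.0
        recalls[uid] = n_rel_and_rec_k / n_rel if n_rel else 0.0
    return precisions, recalls

# toy predictions: (user, movie, true rating, estimated rating, details)
preds = [("u1", "i1", 5.0, 4.8, None), ("u1", "i2", 2.0, 4.0, None),
         ("u1", "i3", 4.0, 3.0, None), ("u1", "i4", 1.0, 1.5, None),
         ("u2", "i1", 2.0, 2.0, None)]
p, r = precision_recall_at_k(preds, k=2, threshold=3.5)
print(p, r)
```

In the notebook, `predictions` would be the output of `algo.test(testset)`, and averaging the per-user values gives a single precision@k and recall@k for the model.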

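Now we are ready to start modeling. Due to the way the surprise library is built, the way we fit our data is different with and without `GridSearchCV`: an algorithm's own `fit()` method expects a `Trainset` (for example from `data.build_full_trainset()`), whereas `cross_validate()` and `GridSearchCV.fit()` take the `Dataset` object directly. So we will need to pass our `data` in the right form once we are ready to perform hyperparameter tuning. Let's start with all the base models first.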

In [26]:
svd = SVD()
svd.fit(data.build_full_trainset())

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x254649e23a0>

In [28]:
svd.sim_options

{'user_based': True}

In [37]:
cross_val = cross_validate(svd, data, measures=['rmse', 'mae'], cv=3, verbose=True, return_train_measures=True)

Evaluating RMSE, MAE of algorithm SVD on 3 split(s).

                  Fold 1  Fold 2  Fold 3  Mean    Std     
RMSE (testset)    0.8646  0.8615  0.8646  0.8636  0.0015  
MAE (testset)     0.6633  0.6608  0.6638  0.6626  0.0013  
RMSE (trainset)   0.6432  0.6422  0.6417  0.6424  0.0006  
MAE (trainset)    0.4981  0.4976  0.4966  0.4974  0.0006  
Fit time          2.21    2.29    2.32    2.27    0.05    
Test time         0.20    0.14    0.13    0.16    0.03    
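For reference, RMSE is the square root of the mean squared difference between true and estimated ratings, and MAE is the mean absolute difference. Because RMSE squares the errors, it weights large errors more heavily and is never smaller than MAE, which matches the output above (0.8636 vs 0.6626 on the test folds). A minimal NumPy sketch of both definitions, using made-up toy ratings:

```python
import numpy as np

def rmse_mae(true_ratings, est_ratings):
    """Return (RMSE, MAE) between true and estimated ratings."""
    errors = (np.asarray(true_ratings, dtype=float)
              - np.asarray(est_ratings, dtype=float))
    rmse = float(np.sqrt(np.mean(errors ** 2)))  # root of mean squared error
    mae = float(np.mean(np.abs(errors)))         # mean absolute error
    return rmse, mae

rmse_val, mae_val = rmse_mae([4.0, 3.0, 5.0], [3.5, 3.0, 4.0])
print(round(rmse_val, 4), round(mae_val, 4))  # 0.6455 0.5
```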


In [35]:
pd.DataFrame(cross_val)

Unnamed: 0,test_rmse,test_mae,fit_time,test_time
0,0.856328,0.65722,2.272714,0.203294
1,0.87041,0.666395,2.458013,0.20689
2,0.859516,0.660783,2.348071,0.142336


In [92]:
models = {'baseline': BaselineOnly(),
          'knn_basic': KNNBasic(),
          'knn_means': KNNWithMeans(),
          'knn_zscore': KNNWithZScore(),
          'knn_baseline': KNNBaseline(),
          'svd': SVD(),
          'svdpp': SVDpp(),
          'nmf': NMF(),
          'slope': SlopeOne(),
          'cluster': CoClustering(),
         }

In [93]:
model_eval = []

In [404]:
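# Cross-validate a base model and append its mean scores to model_eval
def get_base_model_scores(model_name,
                          mod):
    
    """Run 3-fold cross-validation on one of the base models.

    Inputs: a label for the results (str) and the key of the
    model in the `models` dict (str). Uses the global `data`
    and appends the mean scores to the global `model_eval`."""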
    
    # empty dict for appending results
    results = {}
    
    cross_val_model = cross_validate(models[mod],
                                     data,
                                     measures=['rmse'],
                                     cv=3,
                                     return_train_measures=True,
                                     n_jobs=-1,
                                     verbose=False
                                    )
    
    results['model_name'] = model_name
    results['model'] = mod
    results['train_rmse'] = cross_val_model['train_rmse'].mean()
    results['test_rmse'] = cross_val_model['test_rmse'].mean()
    
    # using statistics method as the below two attributes
    # are tuples, so the .mean() method will not work
    # the above are arrays, so the .mean() method works
    
    results['fit_time'] = statistics.mean(cross_val_model['fit_time'])
    results['test_time'] = statistics.mean(cross_val_model['test_time'])
    
    model_eval.append(results)
    print('Base model fit is complete.')
    
    return cross_val_model

In [414]:
base_baseline_only = get_base_model_scores('base_baseline_only', 'baseline')

Base model fit is complete.


In [413]:
base_knn_basic = get_base_model_scores('base_knn_basic', 'knn_basic')

Base model fit is complete.


In [412]:
base_knn_means = get_base_model_scores('base_knn_means', 'knn_means')

Base model fit is complete.


In [411]:
base_knn_zscore = get_base_model_scores('base_knn_zscore', 'knn_zscore')

Base model fit is complete.


In [410]:
base_knn_baseline = get_base_model_scores('base_knn_baseline', 'knn_baseline')

Base model fit is complete.


In [409]:
base_svd = get_base_model_scores('base_svd', 'svd')

Base model fit is complete.


In [408]:
base_svdpp = get_base_model_scores('base_svdpp', 'svdpp')

Base model fit is complete.


In [407]:
base_nmf = get_base_model_scores('base_nmf', 'nmf')

Base model fit is complete.


In [406]:
base_slop = get_base_model_scores('base_slop', 'slope')

Base model fit is complete.


In [405]:
base_cluster = get_base_model_scores('base_cluster', 'cluster')

Base model fit is complete.


In [417]:
base_model_scores = pd.DataFrame(model_eval)

In [418]:
base_model_scores.sort_values('test_rmse')

Unnamed: 0,model_name,model,train_rmse,test_rmse,fit_time,test_time
6,base_svdpp,svdpp,0.675951,0.866679,302.668346,6.992896
0,base_baseline_only,baseline,0.837641,0.876241,0.050725,0.111516
5,base_svd,svd,0.639353,0.879827,2.926874,0.167982
4,base_knn_baseline,knn_baseline,0.655359,0.882639,0.160189,1.92627
3,base_knn_zscore,knn_zscore,0.668391,0.903679,0.157216,1.621699
2,base_knn_means,knn_means,0.673119,0.904506,0.11151,1.488612
8,base_slop,slope,0.551231,0.909538,2.661436,4.934569
7,base_nmf,nmf,0.577744,0.937035,3.727779,0.152249
9,base_cluster,cluster,0.812592,0.954283,1.698625,0.132729
1,base_knn_basic,knn_basic,0.697346,0.959823,0.094301,1.339809


Based on the above table, we can see that the lowest `rmse` scores (below 0.90) are for `SVD++`, `BaselineOnly`, `SVD` and `KNNBaseline`. As `BaselineOnly` predicts a rating using only the baseline estimate $\hat{r}_{ui} = \mu + b_u + b_i$ (the global mean plus a user bias and an item bias learned from the training set), it does not model any interaction between users and movies. We will therefore not tune its hyperparameters further, and will instead focus on the other three models.

The reason for this is that we would like this approach to be applicable to other movie rating datasets, whose ratings may follow a different pattern from ours.
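To make the baseline estimate concrete, here is a minimal pure-Python sketch. It computes $\mu$, then shrunken per-item and per-user mean deviations. This is a simple one-pass heuristic and not the ALS or SGD procedure that surprise's `BaselineOnly` actually uses; the helper names are hypothetical and the default shrinkage constants `reg_i`/`reg_u` are arbitrary illustrative values.

```python
from collections import defaultdict

def baseline_estimates(ratings, reg_i=10.0, reg_u=15.0):
    """One-pass estimate of mu, b_u and b_i from (user, item, rating) triples.

    A simplified heuristic, not surprise's ALS/SGD fitting procedure.
    """
    mu = sum(r for _, _, r in ratings) / len(ratings)
    # item bias: shrunken mean deviation of the item's ratings from mu
    item_dev = defaultdict(list)
    for _, i, r in ratings:
        item_dev[i].append(r - mu)
    b_i = {i: sum(d) / (reg_i + len(d)) for i, d in item_dev.items()}
    # user bias: shrunken mean of what remains after removing mu and b_i
    user_dev = defaultdict(list)
    for u, i, r in ratings:
        user_dev[u].append(r - mu - b_i[i])
    b_u = {u: sum(d) / (reg_u + len(d)) for u, d in user_dev.items()}
    return mu, b_u, b_i

def predict_baseline(mu, b_u, b_i, user, item):
    # unknown users or items fall back to a bias of 0
    return mu + b_u.get(user, 0.0) + b_i.get(item, 0.0)

toy = [("a", "x", 5.0), ("a", "y", 4.0), ("b", "x", 3.0), ("b", "y", 2.0)]
mu, b_u, b_i = baseline_estimates(toy, reg_i=0.0, reg_u=0.0)
print(predict_baseline(mu, b_u, b_i, "a", "x"))  # 5.0
```

With `reg_i = reg_u = 0`, as in the toy call, the biases reduce to plain mean deviations; larger values pull the biases of users or movies with few ratings towards zero.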

In the following section, we will tune the hyperparameters of these three models using the `surprise` library `GridSearchCV` method.

Considering that the base `SVD++` model took about 302 s (roughly 5 minutes) on average to fit one fold, we will start with `SVD`, then move on to `KNNBaseline`, and finally `SVD++`. Tuning the `SVD++` hyperparameters will most likely take a lot of time, even though it is our best-performing model so far.

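**Observations about fit_time.** The k-NN models fit quickly (0.09 to 0.16 s per fold) but are comparatively slow to test (1.3 to 1.9 s), and `SlopeOne` is also slow to test (about 4.9 s). `SVD++` is by far the slowest model on both counts, at about 303 s to fit and 7 s to test per fold.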

---

# Tuning our best models

Since we are using `GridSearchCV`, we will need to make some changes to the modeling function we used previously. We will also need to redefine our models in the way `GridSearchCV` requires, i.e. as algorithm classes rather than instances.

In [440]:
# as per the surprise documentation, GridSearchCV
# takes the algorithm class itself, so the models
# are listed here without parentheses (not instantiated)
model_tuned = {'knn_baseline': KNNBaseline,
               'svd': SVD,
               'svdpp': SVDpp,
              }

In [319]:
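# Tune a model with GridSearchCV and append its best mean scores to model_eval
def tuned_model_scores(model_name, 
                       mod, 
                       mod_params=None):
    
    """Function accepts following inputs:
    Name of model (str), key of the model class in `model_tuned` (str), 
    model params (dict, optional)."""
    
    # avoid a mutable default argument
    if mod_params is None:
        mod_params = {}
    
    # empty dict for appending results
    results = {}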
    
    # instantiate GridSearchCV
    gs = GridSearchCV(model_tuned[mod], 
                      param_grid=mod_params,
                      measures=['rmse'],
                      cv=3,
                      n_jobs=-1,
                      return_train_measures=True
                     )

    # fit model
    gs.fit(data)
    
    temp_df = pd.DataFrame(gs.cv_results)
    num_fits = len(temp_df)*3  # parameter combinations x 3 CV folds

    temp_df = temp_df.loc[gs.best_index['rmse'], 
                          ['mean_train_rmse', 
                           'mean_test_rmse', 
                           'mean_fit_time', 
                           'mean_test_time', 
                           'params']].values.tolist()
    
    # Retrieve metrics and add to results
    results['model_name'] = model_name
    results['model'] = mod
    results['train_rmse'] = temp_df[0]
    results['test_rmse'] = temp_df[1]
    results['fit_time'] = temp_df[2]
    results['test_time'] = temp_df[3]
    
    # add results to list for model evaluation later
    model_eval.append(results)
    
    print(f'{num_fits} fits have been completed.')
    
    return gs

With this function, we are now ready to begin with hyperparameter tuning for our 3 selected models.

In [304]:
# defining SVD parameters
svd_params = {'n_factors': [80, 90, 100, 110, 120], 
              'n_epochs': [15, 20, 25, 30, 35], 
              'biased': [True, False], 
              'lr_all': [0.003, 0.004, 0.005, 0.006, 0.007],
              'reg_all': [0.01, 0.02, 0.03, 0.04]
             }
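It helps to know what these hyperparameters control. surprise's `SVD` is not a literal singular value decomposition: following the surprise documentation, it predicts $\hat{r}_{ui} = \mu + b_u + b_i + q_i^T p_u$ and learns the biases and factor vectors by stochastic gradient descent over the observed ratings only. `lr_all` and `reg_all` are the learning rate and regularization term for all parameters, `n_factors` is the length of $p_u$ and $q_i$, `n_epochs` is the number of passes over the data, and `biased=False` drops $\mu$, $b_u$ and $b_i$. The NumPy sketch below is a simplified re-implementation for intuition, not surprise's code (which is written in Cython and also allows a separate learning rate and regularization per parameter); the function names and toy data are hypothetical.

```python
import numpy as np

def fit_funk_svd(ratings, n_users, n_items, n_factors=100, n_epochs=20,
                 lr=0.005, reg=0.02, biased=True, seed=0):
    """Learn mu, b_u, b_i, P (user factors), Q (item factors) by SGD.

    ratings: list of (user_index, item_index, rating), observed ratings only.
    Defaults mirror surprise's documented SVD defaults.
    """
    rng = np.random.default_rng(seed)
    mu = float(np.mean([r for _, _, r in ratings])) if biased else 0.0
    b_u = np.zeros(n_users)
    b_i = np.zeros(n_items)
    # factors start as small Gaussian noise (init_mean=0, init_std_dev=0.1)
    P = rng.normal(0.0, 0.1, (n_users, n_factors))
    Q = rng.normal(0.0, 0.1, (n_items, n_factors))
    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - (mu + b_u[u] + b_i[i] + Q[i] @ P[u])
            if biased:
                b_u[u] += lr * (err - reg * b_u[u])
                b_i[i] += lr * (err - reg * b_i[i])
            # both factor updates use the values from before this step
            p_old = P[u].copy()
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * p_old - reg * Q[i])
    return mu, b_u, b_i, P, Q

def predict(model, u, i):
    mu, b_u, b_i, P, Q = model
    return float(mu + b_u[u] + b_i[i] + Q[i] @ P[u])

# toy data: (user, movie, rating); values chosen for illustration
toy = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 1, 2.0), (2, 0, 1.0)]
model = fit_funk_svd(toy, n_users=3, n_items=2, n_factors=2,
                     n_epochs=500, lr=0.02)
train_rmse = float(np.sqrt(np.mean([(r - predict(model, u, i)) ** 2
                                    for u, i, r in toy])))
print(round(train_rmse, 3))
```

Increasing `lr` speeds up learning but can make it unstable, while increasing `reg` shrinks the biases and factors towards zero; this is the trade-off the `lr_all`/`reg_all` grids here are exploring.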

In [338]:
# first tuning attempt
svd_tuned_1 = tuned_model_scores('svd_tuned_1', 
                                 'svd', 
                                 mod_params=svd_params)

3000 fits have been completed.
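The fit counts printed by `tuned_model_scores` are just (number of parameter combinations) x (3 folds). The hypothetical helper below computes the combination count up front, assuming, as the surprise documentation describes for `GridSearchCV`, that nested `bsl_options`/`sim_options` grids are expanded as well. Running it before a search makes it easy to spot an expensive grid, such as the 1,875 combinations of the first `KNNBaseline` ALS grid further down.

```python
def n_combinations(param_grid):
    """Number of parameter combinations a grid search will try.

    Nested dicts (e.g. bsl_options) are expanded too, so their
    combinations multiply with the top-level ones.
    """
    total = 1
    for values in param_grid.values():
        if isinstance(values, dict):
            total *= n_combinations(values)
        else:
            total *= len(values)
    return total

svd_params = {'n_factors': [80, 90, 100, 110, 120],
              'n_epochs': [15, 20, 25, 30, 35],
              'biased': [True, False],
              'lr_all': [0.003, 0.004, 0.005, 0.006, 0.007],
              'reg_all': [0.01, 0.02, 0.03, 0.04]}

knnbase_params_als = {'k': [20, 30, 40, 50, 60],
                      'min_k': [1, 2, 3],
                      'bsl_options': {'method': ['als'],
                                      'n_epochs': [8, 9, 10, 11, 12],
                                      'reg_u': [13, 14, 15, 16, 17],
                                      'reg_i': [8, 9, 10, 11, 12]}}

knnbase_params_sgd = {'k': [20, 30, 40, 50, 60],
                      'min_k': [1, 2, 3],
                      'bsl_options': {'method': ['sgd'],
                                      'learning_rate': [0.00003, 0.00004, 0.00005,
                                                        0.00006, 0.00007]}}

cv = 3
print(n_combinations(svd_params) * cv)          # 3000
print(n_combinations(knnbase_params_als) * cv)  # 5625
print(n_combinations(knnbase_params_sgd) * cv)  # 225
```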


In [339]:
# getting best params
svd_tuned_1.best_params

{'rmse': {'n_factors': 80,
  'n_epochs': 25,
  'biased': True,
  'lr_all': 0.007,
  'reg_all': 0.04}}

In [420]:
# getting rmse, base rmse was 0.879827
svd_tuned_1.best_score

{'rmse': 0.8705537233081375}

We have managed to see an improvement in `rmse` of almost **0.01**. <br>
However, if we look at the best parameters chosen by `GridSearchCV`, we can see that `n_factors` (80), `lr_all` (0.007) and `reg_all` (0.04) all landed on the edge of their search ranges, which suggests the optimum may lie beyond them, so more tuning can be done. Let's continue tuning the model by defining new params for `GridSearchCV` to search through.
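Checking which best values sit on the edge of their grid can be automated. The sketch below uses a hypothetical helper, `params_on_edge`, that only looks at flat lists of at least two numbers (booleans and nested dicts such as `bsl_options` are skipped); the `best` dict is copied from the `svd_tuned_1.best_params` output above.

```python
def params_on_edge(param_grid, best_params):
    """Names of numeric params whose best value is the min or max of its grid."""
    on_edge = []
    for name, values in param_grid.items():
        # only flat lists with at least two candidate values
        if not isinstance(values, list) or len(values) < 2:
            continue
        # skip non-numeric grids (bools are a subclass of int, so exclude them)
        if not all(isinstance(v, (int, float)) and not isinstance(v, bool)
                   for v in values):
            continue
        if best_params[name] in (min(values), max(values)):
            on_edge.append(name)
    return on_edge

svd_params = {'n_factors': [80, 90, 100, 110, 120],
              'n_epochs': [15, 20, 25, 30, 35],
              'biased': [True, False],
              'lr_all': [0.003, 0.004, 0.005, 0.006, 0.007],
              'reg_all': [0.01, 0.02, 0.03, 0.04]}
best = {'n_factors': 80, 'n_epochs': 25, 'biased': True,
        'lr_all': 0.007, 'reg_all': 0.04}
print(params_on_edge(svd_params, best))  # ['n_factors', 'lr_all', 'reg_all']
```

In the notebook, `best` would be `svd_tuned_1.best_params['rmse']`. A parameter on the edge is a hint, not proof, that the search range should be extended.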

In [341]:
# redefining new params based on previous tuned model
svd_params_V2 = {'n_factors': [30, 50, 80], 
                 'n_epochs': [25], 
                 'biased': [True], 
                 'lr_all': [0.007, 0.008, 0.009],
                 'reg_all': [0.04, 0.05, 0.06, 0.07, 0.08]
                }

In [342]:
# second tuning attempt
svd_tuned_2 = tuned_model_scores('svd_tuned_2', 
                                 'svd', 
                                 mod_params=svd_params_V2)

135 fits have been completed.


In [344]:
svd_tuned_2.best_params

{'rmse': {'n_factors': 80,
  'n_epochs': 25,
  'biased': True,
  'lr_all': 0.009,
  'reg_all': 0.08}}

In [421]:
# getting rmse, base rmse was 0.879827
svd_tuned_2.best_score

{'rmse': 0.8655018320388175}

We have managed to see a further improvement in `rmse` of **0.005**. <br>
Again, the best `lr_all` (0.009) and `reg_all` (0.08) are at the upper edge of their search ranges, so more tuning can be done for these two. Let's continue tuning the model by defining new params for `GridSearchCV` to search through.

In [346]:
# redefining new params based on previous tuned model
svd_params_V3 = {'n_factors': [60, 70, 80], 
                 'n_epochs': [25], 
                 'biased': [True], 
                 'lr_all': [0.0085, 0.009, 0.01],
                 'reg_all': [0.075, 0.08, 0.09]
                }

In [347]:
# third tuning attempt
svd_tuned_3 = tuned_model_scores('svd_tuned_3', 
                                 'svd', 
                                 mod_params=svd_params_V3)

81 fits have been completed.


In [348]:
# getting best params
svd_tuned_3.best_params

{'rmse': {'n_factors': 80,
  'n_epochs': 25,
  'biased': True,
  'lr_all': 0.01,
  'reg_all': 0.075}}

In [422]:
# getting rmse, base rmse was 0.879827
svd_tuned_3.best_score

{'rmse': 0.8629833150886892}

We have managed to see a further improvement in `rmse` of **0.003**. <br>
Again, the best `lr_all` (0.01) is at the upper edge of its search range, so more tuning can be done for `lr_all`. Let's continue tuning the model by defining new params for `GridSearchCV` to search through.

In [350]:
# redefining new params based on previous tuned model
svd_params_V4 = {'n_factors': [80], 
                 'n_epochs': [25], 
                 'biased': [True], 
                 'lr_all': [0.0095, 0.01, 0.012, 0.014],
                 'reg_all': [0.075]
                }

In [351]:
# fourth tuning attempt
svd_tuned_4 = tuned_model_scores('svd_tuned_4', 
                                 'svd', 
                                 mod_params=svd_params_V4)

12 fits have been completed.


In [353]:
# getting best params
svd_tuned_4.best_params

{'rmse': {'n_factors': 80,
  'n_epochs': 25,
  'biased': True,
  'lr_all': 0.014,
  'reg_all': 0.075}}

In [423]:
# getting rmse, base rmse was 0.879827
svd_tuned_4.best_score

{'rmse': 0.8621820643257244}

We have managed to see a further improvement in `rmse` of **0.0008**, which is now a very small improvement. <br>
Again, the best `lr_all` (0.014) is at the upper edge of its search range, so let's make one final attempt to tune our `SVD` model.

In [None]:
# redefining new params based on previous tuned model
svd_params_V5 = {'n_factors': [80], 
                 'n_epochs': [25], 
                 'biased': [True], 
                 'lr_all': [0.014, 0.018, 0.02, 0.03, 0.05],
                 'reg_all': [0.075]
                }

In [367]:
# fifth tuning attempt
svd_tuned_5 = tuned_model_scores('svd_tuned_5', 
                                 'svd', 
                                 mod_params=svd_params_V5)

15 fits have been completed.


In [368]:
svd_tuned_5.best_params

{'rmse': {'n_factors': 80,
  'n_epochs': 25,
  'biased': True,
  'lr_all': 0.014,
  'reg_all': 0.075}}

In [380]:
svd_tuned_5.best_score

{'rmse': 0.861168716056909}

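We have managed to see an improvement in `rmse` of **0.001** more. Oddly enough, we are seeing the same parameters as before. Since the best parameters are identical to those of the previous attempt, this difference most likely comes from randomness (different cross-validation splits and random factor initialization between runs) rather than from a genuine improvement. Passing a fixed `random_state` to the cross-validation iterator (e.g. `surprise.model_selection.KFold(n_splits=3, random_state=42)` as the `cv` argument) would make such comparisons more reliable.<br>
Most likely we are close to the best possible model for `SVD`, if not at it. As such, this will be our final `SVD` tuned model. Let's take a look at all the scores so far.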

In [34]:
model_scores = pd.DataFrame(model_eval)
model_scores[model_scores['model'] == 'svd']

Unnamed: 0,model_name,model,train_rmse,test_rmse,fit_time,test_time
5,base_svd,svd,0.639353,0.879827,2.926874,0.167982
10,svd_tuned_1,svd,0.569728,0.870554,7.784396,0.455209
11,svd_tuned_2,svd,0.644704,0.865502,6.249327,0.207198
12,svd_tuned_3,svd,0.603065,0.862983,7.057509,0.37629
13,svd_tuned_4,svd,0.521653,0.862182,6.787233,0.303089
14,svd_tuned_5,svd,0.521521,0.861169,6.284704,0.372672


As we can see from the above, we have managed to see an improvement in our `test_rmse` score for the `SVD` model, by **0.01866**. I believe we have pushed this model as far as it can go. Let's move on to tuning the `KNNBaseline` model.

In [438]:
knnbase_params_als = {'k': [20, 30, 40, 50, 60], 
                      'min_k': [1, 2, 3],
                      'bsl_options': {'method': ['als'],
                                      'n_epochs': [8, 9, 10, 11, 12], 
                                      'reg_u': [13, 14, 15, 16, 17],
                                      'reg_i': [8, 9, 10, 11, 12]
                                  }
                     }

knnbase_params_sgd = {'k': [20, 30, 40, 50, 60], 
                      'min_k': [1, 2, 3], 
                      'bsl_options': {'method': ['sgd'], 
                                      'learning_rate': [0.00003, 0.00004, 0.00005, 0.00006, 0.00007]
                                     }
                     }

In [441]:
knn_als_tuned_1 = tuned_model_scores('knn_als_tuned_1', 
                                     'knn_baseline', 
                                     mod_params=knnbase_params_als)

5625 fits have been completed.


In [442]:
knn_sgd_tuned_1 = tuned_model_scores('knn_sgd_tuned_1', 
                                     'knn_baseline', 
                                     mod_params=knnbase_params_sgd)

225 fits have been completed.


In [33]:
model_scores = pd.DataFrame(model_eval)
model_scores

Unnamed: 0,model_name,model,train_rmse,test_rmse,fit_time,test_time
0,base_baseline_only,baseline,0.837641,0.876241,0.050725,0.111516
1,base_knn_basic,knn_basic,0.697346,0.959823,0.094301,1.339809
2,base_knn_means,knn_means,0.673119,0.904506,0.11151,1.488612
3,base_knn_zscore,knn_zscore,0.668391,0.903679,0.157216,1.621699
4,base_knn_baseline,knn_baseline,0.655359,0.882639,0.160189,1.92627
5,base_svd,svd,0.639353,0.879827,2.926874,0.167982
6,base_svdpp,svdpp,0.675951,0.866679,302.668346,6.992896
7,base_nmf,nmf,0.577744,0.937035,3.727779,0.152249
8,base_slop,slope,0.551231,0.909538,2.661436,4.934569
9,base_cluster,cluster,0.812592,0.954283,1.698625,0.132729


In [444]:
knn_als_tuned_1.best_params

{'rmse': {'k': 30,
  'min_k': 3,
  'bsl_options': {'method': 'als', 'n_epochs': 12, 'reg_u': 13, 'reg_i': 8}}}

In [446]:
knnbase_params_als_V2 = {'k': [30], 
                         'min_k': [3, 4, 5], 
                         'bsl_options': {'method': ['als'], 
                                         'n_epochs': [12, 13, 14], 
                                         'reg_u': [11, 12, 13], 
                                         'reg_i': [6, 7, 8] 
                                        }
                        }

In [447]:
knn_als_tuned_2 = tuned_model_scores('knn_als_tuned_2', 
                                     'knn_baseline', 
                                     mod_params=knnbase_params_als_V2)

243 fits have been completed.


In [450]:
knn_als_tuned_2.best_score

{'rmse': 0.8700311882994001}

In [451]:
knn_als_tuned_2.best_params

{'rmse': {'k': 30,
  'min_k': 5,
  'bsl_options': {'method': 'als', 'n_epochs': 14, 'reg_u': 11, 'reg_i': 6}}}

In [452]:
knnbase_params_als_V3 = {'k': [30], 
                         'min_k': [5, 10, 15], 
                         'bsl_options': {'method': ['als'], 
                                         'n_epochs': [14, 18, 22], 
                                         'reg_u': [7, 9, 11], 
                                         'reg_i': [2, 4, 6] 
                                        }
                        }

In [453]:
knn_als_tuned_3 = tuned_model_scores('knn_als_tuned_3', 
                                     'knn_baseline', 
                                     mod_params=knnbase_params_als_V3)

243 fits have been completed.


In [455]:
knn_als_tuned_3.best_params

{'rmse': {'k': 30,
  'min_k': 10,
  'bsl_options': {'method': 'als', 'n_epochs': 22, 'reg_u': 7, 'reg_i': 2}}}

In [457]:
knn_als_tuned_3.best_score

{'rmse': 0.86391808648022}

In [458]:
knnbase_params_als_V4 = {'k': [30], 
                         'min_k': [10], 
                         'bsl_options': {'method': ['als'], 
                                         'n_epochs': [22, 25, 30], 
                                         'reg_u': [3, 5, 7], 
                                         'reg_i': [0.5, 1, 1.5, 2] 
                                        }
                        }

In [459]:
knn_als_tuned_4 = tuned_model_scores('knn_als_tuned_4', 
                                     'knn_baseline', 
                                     mod_params=knnbase_params_als_V4)

108 fits have been completed.


In [461]:
knn_als_tuned_4.best_params

{'rmse': {'k': 30,
  'min_k': 10,
  'bsl_options': {'method': 'als', 'n_epochs': 30, 'reg_u': 7, 'reg_i': 2}}}

In [462]:
knn_als_tuned_4.best_score

{'rmse': 0.8647604619133166}

We will attempt tuning one more time, as I think we can make just a bit more improvement on this model.

In [483]:
knnbase_params_als_V5 = {'k': [30], 
                         'min_k': [10], 
                         'bsl_options': {'method': ['als'], 
                                         'n_epochs': [30, 50, 100, 120], 
                                         'reg_u': [7], 
                                         'reg_i': [2] 
                                        }
                        }

In [484]:
knn_als_tuned_5 = tuned_model_scores('knn_als_tuned_5', 
                                     'knn_baseline', 
                                     mod_params=knnbase_params_als_V5)

12 fits have been completed.


In [485]:
knn_als_tuned_5.best_score

{'rmse': 0.8637914129499696}

In [486]:
knn_als_tuned_5.best_params

{'rmse': {'k': 30,
  'min_k': 10,
  'bsl_options': {'method': 'als', 'n_epochs': 120, 'reg_u': 7, 'reg_i': 2}}}

Based on this result, even after tuning this model, the tuned `SVD` model still has the lowest `rmse` score so far, so we will stop tuning here. <br>
Let's compare the score of this model with all the tuning iterations that we have done so far.

In [32]:
model_scores = pd.DataFrame(model_eval)
model_scores[model_scores['model'] == 'knn_baseline']

Unnamed: 0,model_name,model,train_rmse,test_rmse,fit_time,test_time
4,base_knn_baseline,knn_baseline,0.655359,0.882639,0.160189,1.92627
15,knn_als_tuned_1,knn_baseline,0.684323,0.872226,0.39073,4.714972
16,knn_sgd_tuned_1,knn_baseline,0.691578,0.909773,0.749931,4.241354
17,knn_als_tuned_2,knn_baseline,0.701165,0.870031,0.392658,4.737912
18,knn_als_tuned_3,knn_baseline,0.701315,0.863918,0.667222,4.990317
19,knn_als_tuned_4,knn_baseline,0.701389,0.86476,0.633972,5.16285
20,knn_als_tuned_5,knn_baseline,0.701122,0.863791,1.595689,3.787077


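From this we can see that we have improved the `knn_baseline` model's `test_rmse` by **0.0188**. This is a great improvement; however, our best model so far is still the tuned `SVD` model.<br>
The `sgd` variant (`knn_sgd_tuned_1`) did noticeably worse than even the base model. One likely reason is that our learning-rate grid (0.00003 to 0.00007) sits about two orders of magnitude below surprise's default of 0.005, so the learned baselines probably stay close to their initial value of zero.<br>
Let's move on to the final model we are going to tune from the surprise library, `SVD++`.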

---

## SVDpp Tuning

When we ran the base models, our `SVD++` model took the longest time to fit: **302 s** per fold on average, much longer than any of the other models. As such, let's first try out the hyperparameters that we tuned for our `SVD` model, and see if they can help us tune the `SVD++` model.

In [498]:
svd_tuned_5.best_params

{'rmse': {'n_factors': 80,
  'n_epochs': 25,
  'biased': True,
  'lr_all': 0.014,
  'reg_all': 0.075}}

In [501]:
svdpp_params = {'n_factors': [80], 
                'n_epochs': [25], 
                'lr_all': [0.014],
                'reg_all': [0.075], 
                'init_mean': [0, 5, 10], 
                'init_std_dev': [0.05, 0.1, 0.2]
               }

In [502]:
svdpp_tuned_1 = tuned_model_scores('svdpp_tuned_1', 
                                   'svdpp', 
                                   mod_params=svdpp_params
                                  )

27 fits have been completed.


In [503]:
svdpp_tuned_1.best_score

{'rmse': 0.8552280403135537}

In [504]:
svdpp_tuned_1.best_params

{'rmse': {'n_factors': 80,
  'n_epochs': 25,
  'lr_all': 0.014,
  'reg_all': 0.075,
  'init_mean': 0,
  'init_std_dev': 0.05}}

This fit took a couple of hours, so let's just try one more tuning attempt. So far this is the best model we have with the lowest `test_rmse` score.

In [507]:
svdpp_params_V2 = {'n_factors': [80], 
                   'n_epochs': [25], 
                   'lr_all': [0.014], 
                   'reg_all': [0.075], 
                   'init_mean': [0], 
                   'init_std_dev': [0.01, 0.025, 0.05]
                  }

In [508]:
svdpp_tuned_2 = tuned_model_scores('svdpp_tuned_2', 
                                   'svdpp', 
                                   mod_params=svdpp_params_V2
                                  )

9 fits have been completed.


In [509]:
svdpp_tuned_2.best_params

{'rmse': {'n_factors': 80,
  'n_epochs': 25,
  'lr_all': 0.014,
  'reg_all': 0.075,
  'init_mean': 0,
  'init_std_dev': 0.025}}

In [510]:
svdpp_tuned_2.best_score

{'rmse': 0.853009224619437}

Let's take a look at all the `SVD++` scores so far.

In [29]:
model_scores = pd.DataFrame(model_eval)
model_scores[model_scores['model'] == 'svdpp']

Unnamed: 0,model_name,model,train_rmse,test_rmse,fit_time,test_time
6,base_svdpp,svdpp,0.675951,0.866679,302.668346,6.992896
21,svdpp_tuned_1,svdpp,0.543026,0.855228,2312.05477,18.585806
22,svdpp_tuned_2,svdpp,0.613652,0.853009,1801.290747,14.459661


From this we can see that we managed to improve our base model's `test_rmse` score by **0.0137**. Considering the amount of time taken to train this model, we will stop tuning it here.

Now let's take a look at our three best models and compare their `test_rmse` scores.

In [31]:
model_scores.loc[[14, 20, 22],['model_name', 'model', 'test_rmse']]

Unnamed: 0,model_name,model,test_rmse
14,svd_tuned_5,svd,0.861169
20,knn_als_tuned_5,knn_baseline,0.863791
22,svdpp_tuned_2,svdpp,0.853009


From the above table, we can see that our best model is indeed the `SVD++` model, with a `test_rmse` of **0.853**. Interpreting this with reference to our dataset, our model's predicted rating of a movie for a specific user is typically off by about 0.85 stars, or roughly **17%** of the maximum rating of 5. <br>
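The percentage interpretation is simple arithmetic: divide the RMSE by the maximum rating. The sketch below follows the convention used in this notebook (dividing by the maximum rating of 5 rather than by the width of the rating scale), and the helper name is hypothetical. It is only a rough guide, since RMSE describes a typical error size, not a bound on every prediction.

```python
def rmse_as_share_of_scale(rmse_value, max_rating=5.0):
    """Express an RMSE as a percentage of the maximum rating."""
    return 100.0 * rmse_value / max_rating

memory_based = rmse_as_share_of_scale(3.2)   # earlier memory-based CF
best_svdpp = rmse_as_share_of_scale(0.853)   # tuned SVD++
print(f"{memory_based:.0f}% vs {best_svdpp:.0f}%")  # 64% vs 17%
```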

Let's take a closer look at our best model and conduct some error analysis on our model.

In [521]:
best_model = svdpp_tuned_2.best_estimator['rmse']

*What I am going to do here is stop working at this point and instead instantiate a new SVDpp() model, entering all of the hyperparameters of the best model, and treat it as if it had been loaded from the "best model". This will ensure some sort of flow in my notebook. All that is pending for this notebook now is error analysis of the SVD++ model, and then I will start my research for the neural network. By Friday EOD, the neural network should have been fit and its rmse obtained. Then I have Saturday to finish up everything to do with the neural network and check through everything once, to ensure that the whole project is fine. I might want to check with Chee Yong whether he has time for a quick 10-minute call on Saturday. After that, my main work will be markdown, comments, cleanup, README, and slides. Make sure all the formulas and hyperparameters have been entered. I may have to write down the formulas for each of the models whose hyperparameters I am tuning. That might be required, and it's good practice for me to know them too.*

**Just realized while looking at the test times that SVDpp takes about 14 to 18 SECONDS per fold to produce the test results. That's way too long, mate!! This means we are going to use our second-best model, SVD.**

In [522]:
# fit() requires a Trainset, here built from the full dataset
best_model.fit(data.build_full_trainset())

<surprise.prediction_algorithms.matrix_factorization.SVDpp at 0x25470d06ca0>

In [523]:
svdpp_tuned_2.best_params

{'rmse': {'n_factors': 80,
  'n_epochs': 25,
  'lr_all': 0.014,
  'reg_all': 0.075,
  'init_mean': 0,
  'init_std_dev': 0.025}}

# To add: for the ratings to be comparable across users, we most likely need to look at each user's rating history and adjust their ratings accordingly. For example, if a specific user's mean rating is 3.5, we would bring it down to 2.5, and replicate this for every user. If this isn't done, recommendations may be mismatched, because a 5 from one user could be equivalent to a 4 from another. So we need to look at the distribution of each user's ratings and shift (rather than "normalize") them so that, across the board, a 2.5 refers to an average movie. I will try this during the error evaluation portion.

In [6]:
# we will use the min and max ratings observed from our data
reader = Reader(rating_scale=(min(df['rating']), max(df['rating'])))
data = Dataset.load_from_df(df[['userId', 'movieId', 'rating']], reader)

In [8]:
trainset, testset = train_test_split(data,
                                    test_size=0.2,
                                    random_state=42
                                    )

We will define all the parameters of the `SVDpp` model as per the best params that we saw earlier.

In [26]:
algo_best_model = SVDpp(n_factors=80, 
                        n_epochs=25, 
                        lr_all=0.014, 
                        reg_all=0.075, 
                        init_mean=0, 
                        init_std_dev=0.025, 
                        verbose=False
                       )

In [10]:
algo_best_model.fit(trainset);

<surprise.prediction_algorithms.matrix_factorization.SVDpp at 0x28057f60b80>

In [13]:
predictions = algo_best_model.test(testset)

In [14]:
rmse(predictions)

RMSE: 0.8442


0.8442193432294534

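Here we can see that the `rmse` (0.8442) is even better than the cross-validated score we saw while tuning (0.8530). This is likely partly because the model is now trained on 80% of the data rather than about 67% in each 3-fold split, and because a single train/test split gives a noisier estimate than an average over folds. However, the predictions took about **8 seconds** to generate (for a test set of roughly 20,000 ratings, i.e. 20% of the 100k dataset), and for that reason this model may not be the best choice for movie recommendations. Users will not wait around for **8 seconds** for an app to load on their mobile device; they tend to just switch apps, and with so much fierce competition in the video streaming space, this length of time is too long.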

Let's test out how long the `SVD` model (our 2nd best model) takes to create predictions.

In [498]:
svd_tuned_5.best_params

{'rmse': {'n_factors': 80,
  'n_epochs': 25,
  'biased': True,
  'lr_all': 0.014,
  'reg_all': 0.075}}

In [29]:
algo2 = SVD(n_factors=80,
            n_epochs=25,
            biased=True,
            lr_all=0.014,
            reg_all=0.075,
            verbose=False
)

In [30]:
algo2.fit(trainset);

In [17]:
predictions2 = algo2.test(testset)

In [18]:
rmse(predictions2)

RMSE: 0.8533


0.8533256808314602

The cell generating the predictions took a split second to run. If we refer back to the model scores table, we can see that the test times for these two models were very different: about 0.37 s per fold for `svd_tuned_5` versus about 14.5 s per fold for `svdpp_tuned_2`.
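To put numbers on "a split second" versus several seconds, the prediction step can be timed directly. A minimal sketch with a hypothetical `timed` helper built on `time.perf_counter()`: the commented lines show how it might wrap `algo2.test(testset)`, while the runnable demo times a stand-in workload instead.

```python
import time

def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

# Hypothetical usage with the models above:
# predictions2, secs = timed(algo2.test, testset)
# print(f"SVD predicted {len(predictions2)} ratings in {secs:.2f} s")

# Self-contained demo with a stand-in workload
total, secs = timed(sum, range(1_000_000))
print(f"sum took {secs:.4f} s")
```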

## Next step: Error Analysis. Check where the predicted ratings are quite wrong, then mean-center all the user ratings, rerun the model, and see if it improves the `rmse` and the error analysis. I think it should.


After consideration of these results, I realized something about the ratings, which was also made clear by the EDA done earlier. 

The ratings for each user might mean something different. 

For example, user A may give a 3 for an average movie, and a 4 for movies they really like, maybe a maximum rating of 4.5.

User B may give a rating of 2.5 for an average movie, 3 for a movie they like, and 5 for a movie they will watch again.

User C may give a 2 for any movie they would not watch again (below average) and a 4 for any movie they like, regardless of how much they like it. 

Given these discrepancies between users, it might be better for us to do some feature engineering on the ratings given by all the users. This would have to be done on a per-user basis, by analyzing the distribution of each user's past ratings and shifting it to a mean of 3, i.e. users who have a mean of 2.5 will get 0.5 added to all their ratings, and users who have a mean of 3.5 will get 0.5 subtracted from their ratings.

These new ratings would then "mean" the same across all the users.
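The per-user shift described above can be sketched with pandas `groupby(...).transform('mean')`, assuming the `userId`/`rating` column names of our `df`; the helper name `shift_to_common_mean`, the new `rating_shifted` column and the toy data are illustrative. Note that the biased `SVD`/`SVD++` models and `KNNBaseline` already learn a per-user bias $b_u$ (and `KNNWithMeans` uses each user's mean rating), so part of this effect may already be captured.

```python
import pandas as pd

def shift_to_common_mean(ratings, target_mean=3.0,
                         user_col="userId", rating_col="rating"):
    """Shift each user's ratings so that the user's mean equals target_mean."""
    out = ratings.copy()
    # each row gets its own user's mean rating
    user_mean = out.groupby(user_col)[rating_col].transform("mean")
    out["rating_shifted"] = out[rating_col] - user_mean + target_mean
    return out

toy = pd.DataFrame({"userId": [1, 1, 2, 2],
                    "movieId": [10, 20, 10, 20],
                    "rating": [2.0, 3.0, 4.0, 3.0]})
shifted = shift_to_common_mean(toy)
print(shifted["rating_shifted"].tolist())  # [2.5, 3.5, 3.5, 2.5]
```

Two practical caveats: shifted ratings can fall outside the original scale (a user with a mean of 2.5 who gives a 5 would get 5.5), so the `Reader`'s `rating_scale` would need to come from the shifted column's min and max; and predictions made on the shifted scale would need each user's offset (their mean minus 3) added back before comparing them with the original ratings.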

### my thoughts: I think this is relevant because the model recommends movies based on other users' ratings of those movies and on the movies that the user has watched.

# From Chee Yong for Neural Networks

1. key thing to tune in the NN is the learning rate. 
2. look at regularization aspects to make sure that you don't overfit
3. depending on architecture, how many layers? more/less complex
4. depending on how fast the model runs, batch size (and whether to use batch normalization)

# Everything below this is meant to be removed!

## Testing out the surprise library

In [7]:
model_df = df[['userId', 'movieId', 'rating']]
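# use the observed min and max ratings, as earlier,
# instead of hard-coding a 1 to 5 scale
reader = Reader(rating_scale=(min(model_df['rating']), max(model_df['rating'])))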

data = Dataset.load_from_df(model_df, reader)

In [14]:
cross_validate(
    SVD(),
    data,
    cv=2)

{'test_rmse': array([0.88826356, 0.89173906]),
 'test_mae': array([0.68305157, 0.68694068]),
 'fit_time': (2.442307233810425, 2.249831438064575),
 'test_time': (0.43283677101135254, 0.3179023265838623)}

In [15]:
model_eval = []

In [16]:
model_eval.append(cross_validate(
    SVD(),
    data,
    cv=2))

In [17]:
model_eval

[{'test_rmse': array([0.88875795, 0.89281425]),
  'test_mae': array([0.68532927, 0.68730914]),
  'fit_time': (3.0132787227630615, 3.4397733211517334),
  'test_time': (0.3871781826019287, 0.30802178382873535)}]

In [31]:
# `algo` is not defined in any earlier cell; a default SVD matches the output below
algo = SVD()
algo.fit(trainset)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x2e801dd3e50>

In [41]:
rmse(algo.test(trainset.build_testset()))

RMSE: 0.6363


0.6363030508187035

In [40]:
rmse(algo.test(testset))

RMSE: 0.8726


0.8726184158765463

In [27]:
train_data

<surprise.dataset.DatasetAutoFolds at 0x2e8041683a0>

In [29]:
# please take note that this train_test_split is from the 
# SURPRISE library, it is built to accept the type
# of data that the surprise algorithms take
trainset, testset = train_test_split(data,
                                    test_size=0.2, 
                                    random_state=42)

In [30]:
trainset

<surprise.trainset.Trainset at 0x2e801ee34c0>

In [110]:
# method for splitting the data if we want to
# tune the hyperparameters for our model
raw_ratings = data.raw_ratings

# shuffling our raw_ratings
random.shuffle(raw_ratings)

# train = 80% of the data, test = 20% of the data
# this gets the last row of the train set
train_rows = int(0.8 * len(raw_ratings))

# splitting raw_ratings into train and test set
train_raw_ratings = raw_ratings[:train_rows]
test_raw_ratings = raw_ratings[train_rows:]  # later converted with data.construct_testset(test_raw_ratings)

# redefining 'data' as our train set
data.raw_ratings = train_raw_ratings 