# MovieLens

Evaluating k-nearest neighbors and singular value decomposition techniques for collaborative filtering recommender systems

#### Variables

*int* `userId`: the integer ID of the anonymized user  
*int* `movieId`: the integer ID of the movie  
*int* `rating`: integer rating ranging from 1 to 5 given by the user to the movie  
*int* `timestamp`: the number of seconds elapsed between the Unix epoch and the moment the user rated the movie  
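
As a minimal sketch of how these four fields fit together, the snippet below parses one tab-separated record and converts its `timestamp` to a UTC datetime. It assumes the raw ratings are stored one record per line in the order listed above; the `Rating` type, the `parse_rating` helper, and the sample record are hypothetical and are not part of this notebook.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class Rating(NamedTuple):
    """One rating record (hypothetical helper type, for illustration only)."""
    userId: int
    movieId: int
    rating: int
    timestamp: int


def parse_rating(line: str) -> Rating:
    """Parse a tab-separated 'userId, movieId, rating, timestamp' record."""
    user_id, movie_id, rating, timestamp = (int(x) for x in line.split("\t"))
    return Rating(user_id, movie_id, rating, timestamp)


# Illustrative record; assumed layout, not necessarily a row of the dataset.
record = parse_rating("196\t242\t3\t881250949")

# Interpret the timestamp as seconds since the Unix epoch, in UTC.
rated_at = datetime.fromtimestamp(record.timestamp, tz=timezone.utc)
print(record.rating, rated_at.isoformat())
```

Passing `tz=timezone.utc` keeps the result independent of the local time zone of the machine running the notebook.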

<br>

### Setting up

Install the required libraries.

In [None]:
%pip install -r requirements.txt

Import the dependencies to be used.

In [3]:
import numpy as np

from surprise import Dataset
from surprise import KNNBasic
from surprise import SVDpp
from surprise.model_selection import cross_validate
from surprise.model_selection import GridSearchCV

For reproducible results, set the seed for the pseudorandom number generator.

In [4]:
np.random.seed(0)

<br>

### Loading the dataset

Use the built-in MovieLens 100K dataset from the `surprise` library.

In [11]:
data = Dataset.load_builtin('ml-100k')

<br>

### Implementing the benchmark model

Implement a user-based neighborhood method with cosine as its similarity measure and run a 3-fold cross-validation on the model.

In [13]:
algo = KNNBasic(sim_options={'name': 'cosine'}, verbose=False)
cross_validate(algo, data, measures=['rmse'], cv=3, verbose=True)

Evaluating RMSE of algorithm KNNBasic on 3 split(s).

                  Fold 1  Fold 2  Fold 3  Mean    Std     
RMSE (testset)    1.0174  1.0276  1.0208  1.0219  0.0042  
Fit time          2.29    1.27    1.29    1.61    0.47    
Test time         5.59    4.88    4.56    5.01    0.43    
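
To make the benchmark more concrete, here is a rough pure-Python sketch of a user-based neighborhood prediction with cosine similarity. It is not Surprise's implementation: the helper names, the toy ratings, and `k=2` are assumptions chosen for illustration, and details such as minimum support are left out.

```python
import math


def cosine_similarity(ratings_u: dict, ratings_v: dict) -> float:
    """Cosine similarity between two users over the movies both have rated."""
    common = ratings_u.keys() & ratings_v.keys()
    if not common:
        return 0.0  # no co-rated movies: treat the users as dissimilar
    dot = sum(ratings_u[m] * ratings_v[m] for m in common)
    norm_u = math.sqrt(sum(ratings_u[m] ** 2 for m in common))
    norm_v = math.sqrt(sum(ratings_v[m] ** 2 for m in common))
    return dot / (norm_u * norm_v)


def predict_rating(user, movie, all_ratings, k=2):
    """Similarity-weighted mean of the k most similar users' ratings of `movie`."""
    candidates = [
        (cosine_similarity(all_ratings[user], ratings), ratings[movie])
        for other, ratings in all_ratings.items()
        if other != user and movie in ratings
    ]
    top = sorted(candidates, reverse=True)[:k]
    total = sum(sim for sim, _ in top)
    if total == 0:
        return None  # no usable neighbors for this prediction
    return sum(sim * rating for sim, rating in top) / total


# Toy data (hypothetical): user -> {movieId: rating}
toy = {
    "alice": {1: 5, 2: 3},
    "bob": {1: 5, 2: 3, 3: 4},
    "carol": {1: 1, 2: 5, 3: 2},
}
print(predict_rating("alice", 3, toy))
```

Because Bob's ratings match Alice's on the movies they share, his rating of movie 3 carries more weight than Carol's, so the prediction lands closer to 4 than to 2.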


<br>

### Implementing SVD++

Perform a grid search with 3-fold cross-validation to find the best hyperparameter combination for the SVD++ model among the candidate values below.

In [None]:
param_grid = {
    'n_factors': [10, 15],
    'n_epochs': [10, 15],
    'lr_all': [0.005, 0.0075],
    'reg_all': [0.02, 0.03],
    'verbose': [True]
}
gs = GridSearchCV(SVDpp, param_grid, measures=['rmse'], cv=3, joblib_verbose=2)
gs.fit(data)
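
To get a sense of the cost of this search, the following sketch enumerates the candidate combinations with the standard library. It only counts configurations and does not train anything; `n_folds` and the variable names are introduced here for illustration.

```python
from itertools import product

param_grid = {
    'n_factors': [10, 15],
    'n_epochs': [10, 15],
    'lr_all': [0.005, 0.0075],
    'reg_all': [0.02, 0.03],
    'verbose': [True],
}

# Cartesian product of all candidate values, one dict per configuration.
names = list(param_grid)
combinations = [dict(zip(names, values)) for values in product(*param_grid.values())]

n_folds = 3  # matches cv=3 in the grid search above
print(len(combinations), "combinations,", len(combinations) * n_folds, "model fits")
```

Four parameters with two candidates each give 16 combinations, and every combination is fitted once per fold, which is why the search takes noticeably longer than the single benchmark run.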

This is the parameter combination with the lowest mean RMSE across the three folds, as selected by the grid search.

In [20]:
print("Best SVD++ model:", gs.best_params['rmse'])
print("RMSE:", gs.best_score['rmse'], end="\n\n")

Best SVD++ model: {'n_factors': 10, 'n_epochs': 15, 'lr_all': 0.0075, 'reg_all': 0.02, 'verbose': True}
RMSE: 0.9266467179657721



<br>

### Results

The SVD++ model reaches an RMSE of 0.9266, a 9.33% relative reduction in error compared with the benchmark k-NN model's 1.0219.
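
The comparison can be reproduced with a few lines of plain Python. The sketch below defines RMSE and the relative improvement; the helper names and the toy ratings are illustrative assumptions, while the two RMSE values are the rounded figures reported in this notebook.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length rating sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_improvement(baseline, candidate):
    """Percentage reduction of `candidate` relative to `baseline` (lower is better)."""
    return (baseline - candidate) / baseline * 100


# Toy ratings to illustrate the metric (not taken from the dataset).
toy_rmse = rmse([4, 3, 5, 2], [3.5, 3.0, 4.0, 2.5])

# Rounded mean RMSE values from the two cross-validation runs.
knn_rmse, svdpp_rmse = 1.0219, 0.9266
print(f"{relative_improvement(knn_rmse, svdpp_rmse):.2f}%")
```

The improvement is measured relative to the benchmark's error, so it is the absolute gap of about 0.0953 divided by 1.0219.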

<br>

- - -

#### Code authorship

2022 © Jessan Rendell G. Belenzo

<br>

#### Terms of use

Licensed under the GNU General Public License v3.0. See [LICENSE](https://github.com/jessanrendell/movielens/blob/main/LICENSE).

<br>

## Acknowledgments

Hug, Nicolas. "[Surprise: A Python library for recommender systems.](https://surpriselib.com/)" Journal of Open Source Software 5.52 (2020): 2174.

Harper, F. Maxwell, and Joseph A. Konstan. "[The MovieLens datasets: History and context.](https://grouplens.org/datasets/movielens/100k/)" ACM Transactions on Interactive Intelligent Systems (TiiS) 5.4 (2015): 1-19.

Ricci, Francesco, Lior Rokach, and Bracha Shapira. "[Recommender systems handbook.](https://link.springer.com/chapter/10.1007/978-0-387-85820-3_1)" Springer, Boston, MA, 2011. 1-35.

Koren, Yehuda. "[Factorization meets the neighborhood: a multifaceted collaborative filtering model.](https://dl.acm.org/doi/abs/10.1145/1401890.1401944)" Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. 2008.