### Building a Recommender System with Surprise

This try-it focuses on exploring additional algorithms with the `Surprise` library to generate recommendations. Your goal is to identify the best algorithm by minimizing the mean squared error under cross-validation. You will also select a dataset from the [GroupLens](https://grouplens.org/datasets/movielens/) example datasets.
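The cells in this notebook report RMSE rather than MSE. RMSE is the square root of MSE, so on any single test set the algorithm with the lower MSE also has the lower RMSE. A minimal sketch in plain Python shows the relationship. The ratings and the names `model_a` and `model_b` are invented for illustration and do not come from the dataset.

```python
import math

def mse(actual, predicted):
    """Mean squared error between two equal-length rating sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(actual, predicted))

# Hypothetical ratings and predictions, invented for illustration only.
actual = [4.0, 3.0, 5.0, 2.0]
model_a = [3.0, 3.0, 4.0, 4.0]   # squared errors: 1, 0, 1, 4
model_b = [4.0, 3.5, 4.5, 2.5]   # squared errors: 0, 0.25, 0.25, 0.25

for name, preds in [("model_a", model_a), ("model_b", model_b)]:
    print(f"{name}: MSE={mse(actual, preds):.4f}  RMSE={rmse(actual, preds):.4f}")
```

RMSE is expressed in the same units as the ratings (stars), which makes it easier to interpret than MSE.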

To begin, head over to GroupLens and examine the datasets available. Choose one that is easy to load into the format `Surprise` expects: user, item, and rating columns. Then compare the performance of at least the `KNNBasic`, `SVD`, `NMF`, `SlopeOne`, and `CoClustering` algorithms to build your recommendations. For more information on the algorithms, see the documentation for the prediction algorithms package [here](https://surprise.readthedocs.io/en/stable/prediction_algorithms_package.html).

Share the results of your investigation with your peers, including your cross-validation results and a basic description of your dataset.



In [10]:
from surprise import Dataset, Reader, SVD, NMF, KNNBasic, SlopeOne, CoClustering
from surprise.model_selection import cross_validate

import pandas as pd

In [11]:
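# Load the MovieLens "latest small" ratings file.
df = pd.read_csv('ml-latest-small/ratings.csv')
# Ratings in this file run from 0.5 to 5.0, so rating_scale=(0.5, 5) would be the tighter choice.
reader = Reader(rating_scale=(0, 5))
# Surprise expects the columns in user, item, rating order.
sf = Dataset.load_from_df(df[['userId', 'movieId', 'rating']], reader)
# Full training set; cross_validate below builds its own folds from sf instead.
train = sf.build_full_trainset()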

In [12]:
# Convenience wrapper: cross-validate one algorithm and report RMSE on each fold.
def cv(algo, data):
    return cross_validate(algo, data, measures=['RMSE'], verbose=True)
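Besides printing its table, `cross_validate` returns a dictionary of per-fold scores. The sketch below condenses such a dictionary into a mean and a standard deviation. It assumes the scores sit under the key `test_rmse`, as the Surprise documentation describes for `measures=['RMSE']`. The helper `summarize_cv` is a hypothetical name, and `example_result` is a hand-built stand-in that reuses the KNNBasic fold scores printed in this notebook, so the block runs without Surprise installed.

```python
from statistics import mean, pstdev

def summarize_cv(result, measure="rmse"):
    """Return the mean and population std of a per-fold test metric."""
    # Convert to plain floats so lists and NumPy arrays are handled alike.
    scores = [float(s) for s in result[f"test_{measure}"]]
    return {"mean": mean(scores), "std": pstdev(scores), "folds": len(scores)}

# Stand-in for a cross_validate result (KNNBasic fold scores from this notebook).
example_result = {"test_rmse": [0.9621, 0.9442, 0.9410, 0.9490, 0.9446]}

summary = summarize_cv(example_result)
print(f"RMSE: {summary['mean']:.4f} +/- {summary['std']:.4f} "
      f"over {summary['folds']} folds")
```

With the wrapper defined above, a real run could be summarized as `summarize_cv(cv(SVD(), sf))`. The population standard deviation is used here because it appears to match the `Std` column in the printed tables.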

### KNNBasic

In [13]:
cv_knnbasic = cross_validate(KNNBasic(), sf, measures=['RMSE'], verbose=True)

Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Evaluating RMSE of algorithm KNNBasic on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.9621  0.9442  0.9410  0.9490  0.9446  0.9482  0.0074  
Fit time          0.08    0.08    0.08    0.09    0.08    0.08    0.00    
Test time         0.89    0.74    0.73    0.76    0.83    0.79    0.06    


### SVD

In [21]:
cv_svd = cross_validate(SVD(), sf, measures=['RMSE'], verbose=True)

Evaluating RMSE of algorithm SVD on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.8754  0.8713  0.8740  0.8765  0.8726  0.8739  0.0019  
Fit time          0.82    0.81    0.84    0.96    0.82    0.85    0.05    
Test time         0.11    0.17    0.11    0.20    0.11    0.14    0.04    


### NMF

In [22]:
cv_nmf = cross_validate(NMF(), sf, measures=['RMSE'], verbose=True)

Evaluating RMSE of algorithm NMF on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.9278  0.9239  0.9211  0.9161  0.9193  0.9216  0.0040  
Fit time          1.48    1.48    1.59    1.53    1.46    1.51    0.05    
Test time         0.09    0.17    0.09    0.16    0.09    0.12    0.04    


### SlopeOne

In [16]:
cv_slopeone = cross_validate(SlopeOne(), sf, measures=['RMSE'], verbose=True)

Evaluating RMSE of algorithm SlopeOne on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.9059  0.9023  0.9067  0.8961  0.9021  0.9026  0.0038  
Fit time          2.02    2.12    2.18    2.18    2.06    2.11    0.07    
Test time         3.90    3.91    3.96    3.93    3.82    3.90    0.05    


### CoClustering

In [17]:
cv_coclustering = cross_validate(CoClustering(), sf, measures=['RMSE'], verbose=True)

Evaluating RMSE of algorithm CoClustering on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.9521  0.9390  0.9507  0.9445  0.9505  0.9473  0.0049  
Fit time          1.78    1.80    1.78    1.76    1.83    1.79    0.02    
Test time         0.13    0.08    0.15    0.07    0.07    0.10    0.03    
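
### Summary

To make the comparison easier to share, the mean values from the five tables above can be collected and ranked. This is a small sketch in plain Python: the numbers are copied from the printed outputs, while the variable names are assumptions made for this example.

```python
# Mean RMSE, fit time (s), and test time (s) copied from the tables above.
reported = {
    "KNNBasic":     {"rmse": 0.9482, "fit": 0.08, "test": 0.79},
    "SVD":          {"rmse": 0.8739, "fit": 0.85, "test": 0.14},
    "NMF":          {"rmse": 0.9216, "fit": 1.51, "test": 0.12},
    "SlopeOne":     {"rmse": 0.9026, "fit": 2.11, "test": 3.90},
    "CoClustering": {"rmse": 0.9473, "fit": 1.79, "test": 0.10},
}

# Rank algorithms from lowest (best) to highest mean RMSE.
ranking = sorted(reported, key=lambda name: reported[name]["rmse"])
best = ranking[0]

print(f"{'Algorithm':<14}{'RMSE':>8}{'Fit (s)':>10}{'Test (s)':>10}")
for name in ranking:
    row = reported[name]
    print(f"{name:<14}{row['rmse']:>8.4f}{row['fit']:>10.2f}{row['test']:>10.2f}")
print(f"Lowest mean RMSE: {best}")
```

On this dataset, with default hyperparameters, `SVD` has the lowest mean RMSE, followed by `SlopeOne`, `NMF`, `CoClustering`, and `KNNBasic`.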
