### Codio Activity 19.6: Using SURPRISE

**Expected Time = 60 minutes**

**Total Points = 50**

This activity focuses on using the `Surprise` library to predict user ratings. You will use a dataset derived from the MovieLens data, a common benchmark for recommendation algorithms. Using `Surprise`, you will load the data, create a train set and test set, make predictions for the test set, and cross-validate the model on the dataset.

#### Index

- [Problem 1](#-Problem-1)
- [Problem 2](#-Problem-2)
- [Problem 3](#-Problem-3)
- [Problem 4](#-Problem-4)
- [Problem 5](#-Problem-5)

In [2]:
from surprise import Dataset, Reader, SVD
import pandas as pd
import numpy as np

### The Data

The data is derived from the MovieLens data [here](https://grouplens.org/datasets/movielens/). A smaller sample has been drawn so that processing is faster. Each row is one user's review of a movie: we have information on the user, the movie, and the associated rating when one exists.

In [None]:
movie_ratings = pd.read_csv("./data/movie_ratings.csv", index_col=0)

In [None]:
movie_ratings.head()

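|   | movieId | title | userId | rating |
|---|---------|-------|--------|--------|
| 0 | 1 | Toy Story (1995) | 1 | 4.0 |
| 1 | 1 | Toy Story (1995) | 5 | 4.0 |
| 2 | 1 | Toy Story (1995) | 7 | 4.5 |
| 3 | 1 | Toy Story (1995) | 15 | 2.5 |
| 4 | 1 | Toy Story (1995) | 17 | 4.5 |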


[Back to top](#-Index)

### Problem 1

#### Loading a Dataset

**10 Points**

Use the `Reader` and `Dataset` objects to create a dataset object named `sf` below. Then use the dataset to construct a train set named `train`.

In [None]:
### GRADED
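# Reader declares the rating scale; load_from_df expects columns in user, item, rating order.
reader = Reader(rating_scale=(0, 5))
sf = Dataset.load_from_df(movie_ratings[["userId", "title", "rating"]], reader)
# Build a trainset from every rating in the dataset (nothing is held out).
train = sf.build_full_trainset()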

### ANSWER CHECK
print(type(sf))
print(type(train))

<class 'surprise.dataset.DatasetAutoFolds'>
<class 'surprise.trainset.Trainset'>


[Back to top](#-Index)

### Problem 2

#### Instantiate the `SVD` model

**10 Points**

Create an `SVD` object with 2 factors and assign it to `model` below.

In [None]:
### GRADED
model = SVD(n_factors=2)

### ANSWER CHECK
print(model.n_factors)

2
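To make the role of `n_factors` concrete, here is a minimal sketch of how a biased matrix factorization model of this kind estimates a single rating: the global mean, plus a user bias and an item bias, plus the dot product of the user's and the item's factor vectors. The function name `predict_rating` and every numeric value are hypothetical, chosen only for illustration. The sketch also assumes the estimate is clipped to the reader's rating scale, which is how `Surprise` behaves by default as far as we know.

```python
def predict_rating(global_mean, user_bias, item_bias,
                   user_factors, item_factors, rating_scale=(0, 5)):
    """Estimate one rating as mean + biases + factor dot product, then clip."""
    # With n_factors=2, each factor vector has exactly two entries.
    interaction = sum(p * q for p, q in zip(user_factors, item_factors))
    raw = global_mean + user_bias + item_bias + interaction
    low, high = rating_scale
    # Clip so the estimate never leaves the declared rating scale.
    return min(high, max(low, raw))


# Hypothetical learned parameters for two user/movie pairs.
clipped_est = predict_rating(3.5, 0.4, 0.6, [0.9, -0.2], [1.1, 0.3])
typical_est = predict_rating(3.5, 0.1, -0.2, [0.5, 0.5], [0.2, 0.4])
```

With only two factors, each user and each movie is summarized by two numbers beyond its bias, so the model is deliberately small. Clipping is one reason an estimate can land exactly on the top of the scale.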


[Back to top](#-Index)

### Problem 3

#### Fitting the Model

**10 Points**

Below, fit the model on the training data. 

In [None]:
### GRADED
model.fit(train)

### ANSWER CHECK
print(model)

<surprise.prediction_algorithms.matrix_factorization.SVD object at 0x794138058cf8>


[Back to top](#-Index)

### Problem 4

#### Making Predictions

**10 Points**

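Build a testset named `test` from `train` and use it to create a list of predictions. Assign the list to `predictions_list` below. Because `test` is built from the training ratings, these predictions measure fit to the training data rather than performance on held-out data.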

In [None]:
### GRADED
test = train.build_testset()
predictions_list = model.test(test)


### ANSWER CHECK
print(predictions_list[:5])

[Prediction(uid=1, iid='Toy Story (1995)', r_ui=4.0, est=4.687038954261898, details={'was_impossible': False}), Prediction(uid=1, iid='Grumpier Old Men (1995)', r_ui=4.0, est=4.016474540158991, details={'was_impossible': False}), Prediction(uid=1, iid='Heat (1995)', r_ui=4.0, est=4.750845084940402, details={'was_impossible': False}), Prediction(uid=1, iid='Seven (a.k.a. Se7en) (1995)', r_ui=5.0, est=4.7561820985923315, details={'was_impossible': False}), Prediction(uid=1, iid='Usual Suspects, The (1995)', r_ui=5.0, est=5, details={'was_impossible': False})]
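Each prediction carries the true rating `r_ui` and the estimate `est`, which is all that is needed to score the model. The sketch below computes RMSE by hand on the five predictions printed above, with `est` rounded to four decimals. `Pred` is a hypothetical stand-in for the `Prediction` tuple, reduced to the two fields used here, and `rmse` is our own helper rather than a library function.

```python
from collections import namedtuple
from math import sqrt

# Hypothetical stand-in for Surprise's Prediction tuple (only two fields kept).
Pred = namedtuple("Pred", ["r_ui", "est"])


def rmse(predictions):
    """Root mean squared error between true ratings and estimates."""
    squared_errors = [(p.r_ui - p.est) ** 2 for p in predictions]
    return sqrt(sum(squared_errors) / len(squared_errors))


# The five (r_ui, est) pairs from the printed output, est rounded to 4 decimals.
sample = [
    Pred(4.0, 4.6870),
    Pred(4.0, 4.0165),
    Pred(4.0, 4.7508),
    Pred(5.0, 4.7562),
    Pred(5.0, 5.0),
]
sample_rmse = rmse(sample)
print(f"RMSE on the five sample predictions: {sample_rmse:.3f}")
```

On the full list, `surprise.accuracy.rmse(predictions_list)` should report the same quantity without the hand-written helper.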


[Back to top](#-Index)

### Problem 5

#### Cross Validate the Model

**10 Points**

You may use the test data to evaluate the model, but we can also cross-validate the model using the data object `sf`. Use `RMSE` as the measure and assign the cross-validation results to `cross_val_results` below.
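As a rough picture of what cross-validation does with the ratings, the sketch below splits row indices into five folds and, for each fold, holds that fold out for testing while the rest is used for fitting. The function `k_fold_indices`, its parameters, and the toy size of 10 ratings are all hypothetical. This is not how `Surprise` implements its splitter, only the general idea.

```python
import random


def k_fold_indices(n_ratings, n_splits=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each index is tested exactly once."""
    indices = list(range(n_ratings))
    # Shuffle with a fixed seed so the folds are reproducible.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into n_splits folds, round-robin.
    folds = [indices[i::n_splits] for i in range(n_splits)]
    for i, test_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


# Toy example: 10 ratings split into 5 folds of 2 test ratings each.
splits = list(k_fold_indices(10, n_splits=5))
for train_idx, test_idx in splits:
    print(f"train on {len(train_idx)} ratings, test on {len(test_idx)}")
```

Because every rating is held out exactly once, the per-fold scores estimate performance on unseen ratings, which predictions on the training set cannot do.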

In [None]:
from surprise.model_selection import cross_validate

In [None]:
### GRADED
cross_val_results = cross_validate(model, sf, measures=["RMSE"])

### ANSWER CHECK
print(cross_val_results)

Evaluating RMSE of algorithm SVD on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.8765  0.8653  0.8765  0.8723  0.8724  0.8726  0.0041  
Fit time          6.45    6.79    6.43    6.84    6.24    6.55    0.23    
Test time         0.19    0.36    0.20    0.31    0.22    0.26    0.07    
{'test_rmse': array([0.87653033, 0.86525245, 0.87653556, 0.87225129, 0.87241005]), 'fit_time': (6.452810525894165, 6.787092447280884, 6.427112579345703, 6.836379289627075, 6.240561008453369), 'test_time': (0.18954253196716309, 0.3618159294128418, 0.1995084285736084, 0.31185436248779297, 0.21920013427734375)}
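The returned dictionary holds one RMSE per fold, so a single summary figure has to be computed from it. The sketch below does this with the standard library, using the five `test_rmse` values copied from the output above. The variable names are our own, and the population standard deviation is used on the assumption that it matches the `Std` column of the printed table.

```python
from statistics import mean, pstdev

# Per-fold RMSE values copied from the printed cross-validation output.
test_rmse = [0.87653033, 0.86525245, 0.87653556, 0.87225129, 0.87241005]

mean_rmse = mean(test_rmse)
# Population standard deviation (divide by n, not n - 1).
std_rmse = pstdev(test_rmse)

print(f"RMSE across folds: {mean_rmse:.4f} +/- {std_rmse:.4f}")
```

In the notebook itself, the same summary can be taken directly from `cross_val_results["test_rmse"]`, which is a NumPy array.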
