### Required Codio Assignment 19.2: Using SURPRISE

**Expected Time = 60 minutes**

**Total Points = 50**

This activity focuses on using the `Surprise` library to predict user ratings. You will use a dataset derived from the MovieLens data, a common benchmark for recommendation algorithms. Using `Surprise`, you will load the data, build a training set and a test set, make predictions on the test set, and cross-validate the model on the dataset.

#### Index

- [Problem 1](#-Problem-1)
- [Problem 2](#-Problem-2)
- [Problem 3](#-Problem-3)
- [Problem 4](#-Problem-4)
- [Problem 5](#-Problem-5)

In [1]:
from surprise import Dataset, Reader, SVD
import pandas as pd
import numpy as np

### The Data

The data is derived from the MovieLens data [here](https://grouplens.org/datasets/movielens/).  The original dataset has been sampled so the processing is faster.

The dataframe contains the user, the movie, and the associated rating when one exists.

In [2]:
movie_ratings = pd.read_csv('data/movie_ratings.csv', index_col=0)

In [3]:
movie_ratings.head()

| | movieId | title | userId | rating |
|---|---|---|---|---|
| 0 | 1 | Toy Story (1995) | 1 | 4.0 |
| 1 | 1 | Toy Story (1995) | 5 | 4.0 |
| 2 | 1 | Toy Story (1995) | 7 | 4.5 |
| 3 | 1 | Toy Story (1995) | 15 | 2.5 |
| 4 | 1 | Toy Story (1995) | 17 | 4.5 |


[Back to top](#-Index)

### Problem 1

#### Loading a Dataset

**10 Points**

Extract the columns `userId`, `title`, and `rating` from the `movie_ratings` dataframe and assign them to the variable `a`.

Initialize a `Reader` object, specifying that the ratings are on a scale from 0 to 5, and assign the result to `reader`. Next, use the `Dataset` class to convert the selected dataframe `a` into the format expected by `Surprise`, passing in the `reader` object. Assign this result to `sf`.

Finally, call the `build_full_trainset` method on `sf` to build the full training set from the dataset, making it ready for training a recommendation algorithm. Assign this result to `train`.


In [4]:
### GRADED
reader = ''
sf = ''
train = ''


# YOUR CODE HERE
a = movie_ratings[['userId', 'title', 'rating']]
reader = Reader(rating_scale=(0, 5))
sf = Dataset.load_from_df(a, reader)
train = sf.build_full_trainset()

### ANSWER CHECK
print(type(sf))
print(type(train))

<class 'surprise.dataset.DatasetAutoFolds'>
<class 'surprise.trainset.Trainset'>
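As a rough picture of what the `Trainset` holds, the sketch below mimics, in plain Python, the kind of bookkeeping Surprise does when it builds a training set: raw user and item IDs are mapped to consecutive integer "inner" IDs, and the global mean rating is recorded. It uses the five rows shown by `movie_ratings.head()`. The helper name `build_id_maps` is hypothetical, and this is a simplified illustration rather than Surprise's implementation.

```python
# Simplified illustration only; not Surprise's actual code.
# Rows are (userId, title, rating) taken from movie_ratings.head().
rows = [
    (1, "Toy Story (1995)", 4.0),
    (5, "Toy Story (1995)", 4.0),
    (7, "Toy Story (1995)", 4.5),
    (15, "Toy Story (1995)", 2.5),
    (17, "Toy Story (1995)", 4.5),
]


def build_id_maps(rows):
    """Map raw user/item IDs to consecutive inner IDs (hypothetical helper)."""
    user_to_inner, item_to_inner = {}, {}
    for user, item, _ in rows:
        # setdefault assigns the next free integer the first time an ID is seen.
        user_to_inner.setdefault(user, len(user_to_inner))
        item_to_inner.setdefault(item, len(item_to_inner))
    return user_to_inner, item_to_inner


user_to_inner, item_to_inner = build_id_maps(rows)
global_mean = sum(rating for _, _, rating in rows) / len(rows)

print(len(user_to_inner), "users,", len(item_to_inner), "item")
print("inner ID of user 15:", user_to_inner[15])
print("global mean rating:", global_mean)
```

On a real `Trainset`, the corresponding information is available through attributes such as `n_users`, `n_items`, and `global_mean`, and through methods such as `to_inner_uid`.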


[Back to top](#-Index)

### Problem 2

#### Instantiate the `SVD` model

**10 Points**

Create an `SVD` object with 2 factors and assign it to `model` below.

In [5]:
### GRADED
model = ''


# YOUR CODE HERE
model = SVD(n_factors=2)

### ANSWER CHECK
print(model.n_factors)

2
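For intuition about what `n_factors=2` means: Surprise's documentation describes the `SVD` estimate as the global mean plus a user bias, an item bias, and the dot product of a user factor vector and an item factor vector. With two factors, each of those vectors has length 2. The sketch below evaluates that rule for made-up numbers; every value, and the function name `predict_rating`, is an assumption for illustration and not something learned from this dataset.

```python
def predict_rating(mu, b_u, b_i, p_u, q_i, rating_scale=(0, 5)):
    """Biased matrix-factorization estimate, clipped to the rating scale."""
    dot = sum(p * q for p, q in zip(p_u, q_i))
    estimate = mu + b_u + b_i + dot
    low, high = rating_scale
    return min(high, max(low, estimate))


# Made-up values for one user/item pair with 2 latent factors.
mu = 3.5             # global mean rating
b_u, b_i = 0.2, 0.3  # user and item biases
p_u = [0.5, -0.1]    # user factors
q_i = [0.4, 0.2]     # item factors

print(round(predict_rating(mu, b_u, b_i, p_u, q_i), 2))

# An estimate above the top of the scale is clipped to it.
print(predict_rating(4.8, 0.4, 0.3, p_u, q_i))
```

Clipping to the rating scale is why an estimate can come out as exactly the maximum rating.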


[Back to top](#-Index)

### Problem 3

#### Fitting the Model

**10 Points**

Below, fit `model` on the training data `train`. 

In [6]:
### GRADED
#fit your model below. No variable needs to be assigned.


# YOUR CODE HERE
model.fit(train)

### ANSWER CHECK
print(model)

<surprise.prediction_algorithms.matrix_factorization.SVD object at 0x1378b7650>


[Back to top](#-Index)

### Problem 4

#### Making Predictions

**10 Points**

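Call the `build_testset` method on `train` to build a test set named `test`. Next, pass `test` to the model's `test` method to create a list of predictions, and assign the result to `predictions_list` below. Because `test` is built from `train`, these predictions are made on ratings the model has already seen.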

In [7]:
### GRADED
test = ''
predictions_list = ''


# YOUR CODE HERE
test = train.build_testset()
predictions_list = model.test(test)

### ANSWER CHECK
print(predictions_list[:5])

[Prediction(uid=1, iid='Toy Story (1995)', r_ui=4.0, est=4.702848699875371, details={'was_impossible': False}), Prediction(uid=1, iid='Grumpier Old Men (1995)', r_ui=4.0, est=4.0579502169927535, details={'was_impossible': False}), Prediction(uid=1, iid='Heat (1995)', r_ui=4.0, est=4.741655353170765, details={'was_impossible': False}), Prediction(uid=1, iid='Seven (a.k.a. Se7en) (1995)', r_ui=5.0, est=4.815078083414714, details={'was_impossible': False}), Prediction(uid=1, iid='Usual Suspects, The (1995)', r_ui=5.0, est=5, details={'was_impossible': False})]
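Each `Prediction` pairs the true rating `r_ui` with the estimate `est`, so an error metric can be computed directly from the list. As a minimal sketch, the plain-Python code below computes the RMSE over the five predictions printed above, with the estimates copied from that output and rounded to four decimals. Since `test` was built from `train`, this is an in-sample error and will tend to look more optimistic than a held-out estimate.

```python
import math

# (r_ui, est) pairs copied from the first five predictions above,
# with the estimates rounded to four decimal places.
pairs = [
    (4.0, 4.7028),
    (4.0, 4.0580),
    (4.0, 4.7417),
    (5.0, 4.8151),
    (5.0, 5.0),
]


def rmse(pairs):
    """Root mean squared error between true ratings and estimates."""
    squared_errors = [(true - est) ** 2 for true, est in pairs]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


print(f"RMSE on these five predictions: {rmse(pairs):.4f}")
```

For the full list, Surprise offers `accuracy.rmse(predictions_list)` in its `surprise.accuracy` module.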


[Back to top](#-Index)

### Problem 5

#### Cross Validate the Model

**10 Points**

You can evaluate the model on the test data, and you can also cross-validate it using the data object `sf`.

In the code cell below, use the `cross_validate` function to calculate the RMSE of the model. Assign the result to `cross_val_results` below. 

In [8]:
from surprise.model_selection import cross_validate

In [9]:
### GRADED
cross_val_results = ''


# YOUR CODE HERE
cross_val_results = cross_validate(model, sf, measures=['RMSE'], cv=5, verbose=True)

### ANSWER CHECK
print(cross_val_results)

Evaluating RMSE of algorithm SVD on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.8735  0.8687  0.8630  0.8727  0.8711  0.8698  0.0038  
Fit time          0.17    0.17    0.17    0.16    0.18    0.17    0.01    
Test time         0.04    0.04    0.04    0.04    0.04    0.04    0.00    
{'test_rmse': array([0.87352313, 0.86867322, 0.8629549 , 0.8726912 , 0.87108131]), 'fit_time': (0.17071104049682617, 0.17007899284362793, 0.16532516479492188, 0.15677118301391602, 0.18244409561157227), 'test_time': (0.040605783462524414, 0.03585195541381836, 0.04009294509887695, 0.0357818603515625, 0.03801274299621582)}
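The `Mean` and `Std` columns in the table can be reproduced from the `test_rmse` values in the returned dictionary. The sketch below does this with the standard library, using the five fold scores copied from the output above. The population standard deviation is used here because it matches the printed `Std` value.

```python
import statistics

# Per-fold RMSE values copied from the printed cross_val_results above.
fold_rmse = [0.87352313, 0.86867322, 0.8629549, 0.8726912, 0.87108131]

mean_rmse = statistics.mean(fold_rmse)
# Population standard deviation (divides by n, not n - 1).
std_rmse = statistics.pstdev(fold_rmse)

print(f"Mean RMSE: {mean_rmse:.4f}")
print(f"Std RMSE:  {std_rmse:.4f}")
```

With the actual result object, `cross_val_results['test_rmse'].mean()` gives the same mean, since that entry is a NumPy array.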


# Using Surprise for Recommendation Systems - Summary

This notebook demonstrates how to implement a basic recommendation system using the Surprise library with MovieLens movie ratings data. The exercises walk through the complete workflow of building and evaluating a recommendation system.

## Key Steps Covered
1. Data Preparation
   - Loading MovieLens data into pandas
   - Converting data to Surprise's expected format using Reader and Dataset objects
   - Building training datasets

2. Model Implementation
   - Creating an SVD (Singular Value Decomposition) model
   - Setting model parameters (factors)
   - Training the model on prepared data

3. Prediction & Evaluation
   - Generating predictions on test data
   - Implementing cross-validation
   - Evaluating model performance using RMSE

## Key Takeaways
- Surprise requires specific data formatting through Reader and Dataset objects
- SVD is a common algorithm for recommendation systems
- Model evaluation can be done through both direct predictions and cross-validation
- The workflow demonstrates both in-sample predictions on a test set built from the training data and k-fold cross-validation
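
To make the k-fold idea concrete, here is a minimal sketch of how a list of ratings can be partitioned into k folds, with each fold serving once as the held-out test set. The function `k_fold_splits` is hypothetical and only illustrates the principle; Surprise's `cross_validate` handles the splitting internally.

```python
import random


def k_fold_splits(ratings, k=5, seed=0):
    """Yield (train, test) pairs; each rating lands in exactly one test fold."""
    shuffled = ratings[:]
    random.Random(seed).shuffle(shuffled)
    # Striding by k gives k folds whose sizes differ by at most one.
    folds = [shuffled[i::k] for i in range(k)]
    for i, test_fold in enumerate(folds):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield train, test_fold


# Ten placeholder values stand in for (user, item, rating) rows.
ratings = list(range(10))
for train, test in k_fold_splits(ratings, k=5):
    print(len(train), "train /", len(test), "test")
```

Unlike predicting on a test set built from the training data, every rating here is scored by a model that did not see it during fitting.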

This exercise provides hands-on experience with the fundamental components of building a recommendation system, from data preparation through model evaluation.