### Codio Activity 19.7: Hybrid Recommendations with Surprise

**Expected Time = 90 minutes**

**Total Points = 50**

This activity introduces hybrid recommendations with the Surprise library. Below, you will combine the predictions of different algorithms to create hybrid recommendations. You will again use the `SVD` algorithm and combine it with the `SlopeOne` algorithm.

#### Index

- [Problem 1](#-Problem-1)
- [Problem 2](#-Problem-2)
- [Problem 3](#-Problem-3)
- [Problem 4](#-Problem-4)
- [Problem 5](#-Problem-5)


In [32]:
! pip install scikit-surprise

In [33]:
import pandas as pd
from surprise import Reader, SVD, Dataset, NormalPredictor, KNNBasic
from surprise.model_selection import cross_validate

#### The Data

Again you will use a small set of data from MovieLens: the last 5,000 rows of the ratings file. The data is loaded and displayed below.

In [34]:
df = pd.read_csv("./data/movie_ratings.csv", index_col=0).tail(5_000)

In [35]:
df.head()

| | movieId | title | userId | rating |
|---|---|---|---|---|
| 95836 | 103335 | Despicable Me 2 (2013) | 509 | 3.5 |
| 95837 | 103335 | Despicable Me 2 (2013) | 534 | 4.0 |
| 95838 | 103335 | Despicable Me 2 (2013) | 567 | 0.5 |
| 95839 | 103335 | Despicable Me 2 (2013) | 586 | 4.5 |
| 95840 | 103339 | White House Down (2013) | 10 | 4.0 |


[Back to top](#-Index)

### Problem 1

#### Loading the Data 

**10 Points**

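Below, use the `Reader` and `Dataset` objects to prepare the data for Surprise using the `title`, `userId` and `rating` columns, in that order. Assign these to `reader` and `data` below. Note that `Dataset.load_from_df` always reads its three columns positionally as user, item and rating (the `Reader`'s `line_format` only applies when loading from a file), so in this setup each title plays the role of a Surprise "user" and each `userId` the role of an "item". This is why the predictions below report titles as `uid` and user IDs as `iid`.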

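Create a train and test dataset and assign them to `train` and `test`, respectively. Here `build_full_trainset()` uses every rating for training and `build_testset()` returns those same ratings, so the test predictions below measure how well each model fits the data it was trained on rather than how well it generalizes.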

In [36]:
### GRADED
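# `line_format` only matters for Dataset.load_from_file; load_from_df reads the
# columns positionally as (user, item, rating), so titles act as Surprise "users".
# MovieLens ratings run from 0.5 to 5, so widen the default (1, 5) rating scale.
reader = Reader(line_format="item user rating", rating_scale=(0.5, 5.0))
data = Dataset.load_from_df(df[["title", "userId", "rating"]], reader)
train = data.build_full_trainset()
test = train.build_testset()  # the same ratings as `train`, not held-out data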

### ANSWER CHECK
print(type(train))
print(type(test))

<class 'surprise.trainset.Trainset'>
<class 'list'>
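Because `build_testset()` returns the ratings the model was trained on, a held-out split is the usual way to estimate real prediction quality. Surprise provides `surprise.model_selection.train_test_split(data, test_size=0.2, random_state=42)` for this; the stdlib sketch below only illustrates the idea of a random holdout on plain `(user, item, rating)` tuples. The helper name `holdout_split` and the sample ratings are hypothetical.

```python
import random


def holdout_split(ratings, test_size=0.2, seed=42):
    """Randomly split (user, item, rating) triples into train and test lists.

    A hypothetical helper illustrating a holdout split; it is not part of Surprise.
    """
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = int(round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical ratings, not taken from the MovieLens file
ratings = [
    ("u1", "Toy Story (1995)", 4.0),
    ("u1", "Heat (1995)", 3.5),
    ("u2", "Toy Story (1995)", 5.0),
    ("u2", "Jumanji (1995)", 2.0),
    ("u3", "Heat (1995)", 4.5),
]
train_part, test_part = holdout_split(ratings, test_size=0.2)
print(len(train_part), len(test_part))  # 4 1
```

Every rating lands in exactly one of the two lists, so a model fitted on `train_part` never sees the ratings it is scored on.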


[Back to top](#-Index)

### Problem 2

#### SVD Model

**10 Points**

Now, create an `SVD` model as `svd` and fit it on `train`. Use it to predict the ratings in `test` and assign the predictions to `svd_preds`. Be sure to set `random_state=42` in the `SVD` algorithm so the results are reproducible.

In [37]:
### GRADED
svd = SVD(random_state=42)
svd.fit(train)
svd_preds = svd.test(test)

### ANSWER CHECK
svd_preds[:5]

[Prediction(uid='Despicable Me 2 (2013)', iid=509, r_ui=3.5, est=3.402965276329225, details={'was_impossible': False}),
 Prediction(uid='Despicable Me 2 (2013)', iid=534, r_ui=4.0, est=3.5071187963564667, details={'was_impossible': False}),
 Prediction(uid='Despicable Me 2 (2013)', iid=567, r_ui=0.5, est=1.341166074407866, details={'was_impossible': False}),
 Prediction(uid='Despicable Me 2 (2013)', iid=586, r_ui=4.5, est=4.475855505329594, details={'was_impossible': False}),
 Prediction(uid='White House Down (2013)', iid=10, r_ui=4.0, est=3.531689923032194, details={'was_impossible': False})]
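Surprise's documentation gives the `SVD` estimate (with the default `biased=True`) as $\hat{r}_{ui} = \mu + b_u + b_i + q_i^\top p_u$: the global mean plus a user bias, an item bias and the dot product of the latent factor vectors. The stdlib sketch below evaluates that formula for a single pair using hypothetical parameter values; after fitting, Surprise keeps the learned values in `svd.bu`, `svd.bi`, `svd.pu` and `svd.qi` (indexed by inner ids) and the mean in `train.global_mean`. The `clip` helper imitates the clipping that `predict()` applies to the reader's rating scale by default; the 0.5 to 5 bounds are an assumption matching MovieLens.

```python
def svd_estimate(global_mean, user_bias, item_bias, user_factors, item_factors):
    """Biased matrix-factorization estimate: mu + b_u + b_i + q_i . p_u."""
    dot = sum(p * q for p, q in zip(user_factors, item_factors))
    return global_mean + user_bias + item_bias + dot


def clip(est, lower=0.5, upper=5.0):
    """Keep an estimate inside the rating scale (bounds assumed to be 0.5 to 5)."""
    return min(upper, max(lower, est))


# Hypothetical learned parameters for one (user, item) pair with 3 latent factors
est = svd_estimate(
    global_mean=3.5,
    user_bias=0.25,
    item_bias=-0.5,
    user_factors=[0.5, -1.0, 2.0],
    item_factors=[1.0, 0.5, 0.25],
)
print(clip(est))  # 3.75
```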

[Back to top](#-Index)

### Problem 3

#### SlopeOne Model

**10 Points**

Next, create a `SlopeOne` model as `slope_one`. Fit it on `train` and assign its predictions on `test` to `slope_one_preds`.

In [38]:
from surprise import SlopeOne

In [39]:
### GRADED
slope_one = SlopeOne()
slope_one.fit(train)
slope_one_preds = slope_one.test(test)

### ANSWER CHECK
slope_one_preds[:5]

[Prediction(uid='Despicable Me 2 (2013)', iid=509, r_ui=3.5, est=3.303717320261438, details={'was_impossible': False}),
 Prediction(uid='Despicable Me 2 (2013)', iid=534, r_ui=4.0, est=3.1084280303030303, details={'was_impossible': False}),
 Prediction(uid='Despicable Me 2 (2013)', iid=567, r_ui=0.5, est=2.0097296494355317, details={'was_impossible': False}),
 Prediction(uid='Despicable Me 2 (2013)', iid=586, r_ui=4.5, est=4.078125, details={'was_impossible': False}),
 Prediction(uid='White House Down (2013)', iid=10, r_ui=4.0, est=2.4237373737373735, details={'was_impossible': False})]
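Surprise documents the `SlopeOne` estimate as $\hat{r}_{ui} = \mu_u + \frac{1}{|R_i(u)|}\sum_{j \in R_i(u)} \mathrm{dev}(i, j)$, where $\mu_u$ is the user's mean rating, $R_i(u)$ is the set of items $j$ rated by $u$ that share at least one rater with $i$, and $\mathrm{dev}(i, j) = \frac{1}{|U_{ij}|}\sum_{v \in U_{ij}} (r_{vi} - r_{vj})$ averages the rating differences over the users $U_{ij}$ who rated both items. The stdlib sketch below applies that rule to a plain dictionary so the arithmetic is visible; the function name and data are hypothetical, and falling back to the user's mean when no item is relevant is this sketch's own choice.

```python
def slope_one_estimate(ratings, user, item):
    """Predict `user`'s rating of `item` with the (unweighted) Slope One rule.

    ratings: dict mapping user -> {item: rating}.
    """
    user_ratings = ratings[user]
    user_mean = sum(user_ratings.values()) / len(user_ratings)

    deviations = []
    for j in user_ratings:
        # Rating differences r_vi - r_vj over every user v who rated both items
        diffs = [r[item] - r[j] for r in ratings.values() if item in r and j in r]
        if diffs:  # j is relevant only if it shares at least one rater with item
            deviations.append(sum(diffs) / len(diffs))

    if not deviations:
        return user_mean  # no relevant items: fall back to the user's mean
    return user_mean + sum(deviations) / len(deviations)


# Hypothetical ratings
ratings = {
    "alice": {"A": 4.0, "B": 3.0, "C": 2.0},
    "bob": {"A": 3.0, "B": 4.0},
    "carol": {"B": 2.0, "C": 5.0},
    "dave": {"D": 3.0},
}
print(slope_one_estimate(ratings, "carol", "A"))  # 4.5
```

Here carol's mean is 3.5, dev(A, B) = (1 + (-1)) / 2 = 0 and dev(A, C) = 2, so the estimate is 3.5 + (0 + 2) / 2 = 4.5.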

[Back to top](#-Index)

### Problem 4

#### Hybrid Predictions

**10 Points**

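Now, average the SVD and SlopeOne estimates for each (user, item) pair in the test set to form the hybrid predictions, and assign the results to the list `hybrid_preds`. Pairing the two lists by position is safe because both were produced by calling `test()` on the same `test` list, so the k-th prediction in each refers to the same rating.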

In [40]:
### GRADED
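# Both prediction lists follow the order of `test`, so pair them up position by position
hybrid_preds = [
    0.5 * svd_pred.est + 0.5 * slope_pred.est
    for svd_pred, slope_pred in zip(svd_preds, slope_one_preds)
]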

### ANSWER CHECK
hybrid_preds[:5]

[3.3533412982953315,
 3.3077734133297483,
 1.6754478619216988,
 4.276990252664797,
 2.9777136483847837]
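The 50/50 average is one member of the family of linear blends $w \cdot \hat{r}_{\text{SVD}} + (1 - w) \cdot \hat{r}_{\text{SlopeOne}}$; with $w = 0.5$ it is exactly the average computed above. A common way to choose $w$ is to score a few candidate weights by RMSE on held-out ratings and keep the best. The stdlib sketch below does that on plain lists with hypothetical numbers. Running such a search on this notebook's `test` (which equals the training data) would mostly reward whichever model fits the training ratings most closely.

```python
import math


def blend(first, second, weight=0.5):
    """Element-wise linear blend: weight * first + (1 - weight) * second."""
    return [weight * a + (1 - weight) * b for a, b in zip(first, second)]


def rmse(predicted, actual):
    """Root mean squared error between two equally long lists."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


# Hypothetical held-out ratings and the two models' estimates for them
actual = [4.0, 3.0, 5.0, 2.0]
svd_est = [3.5, 3.5, 4.5, 2.5]
slope_est = [4.5, 2.5, 5.0, 1.5]

weights = [0.0, 0.25, 0.5, 0.75, 1.0]
scores = {w: rmse(blend(svd_est, slope_est, w), actual) for w in weights}
best_w = min(scores, key=scores.get)
print(best_w, scores[best_w])  # 0.5 0.125
```

In this toy data the two models err in opposite directions, so the blend (RMSE 0.125) beats SVD alone (0.5) and SlopeOne alone; that cancellation of errors is the usual motivation for hybrids.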

[Back to top](#-Index)

### Problem 5

#### DataFrame of predictions

**10 Points**

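Finally, create a DataFrame `hybrid_df` with the movie title, user ID, hybrid rating, and the individual SVD and SlopeOne ratings. The expected columns are shown below; the sample rows appear to come from the start of the full ratings file (Toy Story, user 1) rather than from the last 5,000 rows used here, so your values will differ: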

<table border="1" class="dataframe">  <thead>    <tr style="text-align: right;">      <th></th>      <th>Title</th>      <th>user_id</th>      <th>hybrid_rating</th>      <th>svd_rating</th>      <th>slope_one_rating</th>    </tr>  </thead>  <tbody>    <tr>      <th>0</th>      <td>Toy Story (1995)</td>      <td>1</td>      <td>4.482524</td>      <td>4.402274</td>      <td>4.562774</td>    </tr>    <tr>      <th>1</th>      <td>Toy Story (1995)</td>      <td>5</td>      <td>3.954242</td>      <td>4.032047</td>      <td>3.876437</td>    </tr>    <tr>      <th>2</th>      <td>Toy Story (1995)</td>      <td>7</td>      <td>3.942570</td>      <td>4.118478</td>      <td>3.766662</td>    </tr>    <tr>      <th>3</th>      <td>Toy Story (1995)</td>      <td>15</td>      <td>3.425153</td>      <td>3.298924</td>      <td>3.551382</td>    </tr>    <tr>      <th>4</th>      <td>Toy Story (1995)</td>      <td>17</td>      <td>4.152070</td>      <td>4.191087</td>      <td>4.113053</td>    </tr>  </tbody></table>

In [43]:
### GRADED
# First attempt: matches the expected table above, but the grader checks a different column definition
hybrid_df = pd.DataFrame(
    {
        "Title": df["title"],
        "user_id": df["userId"],
        "hybrid_rating": hybrid_preds,
        "svd_rating": [svd_preds[k].est for k in range(len(svd_preds))],
        "slope_one_rating": [
            slope_one_preds[k].est for k in range(len(slope_one_preds))
        ],
    }
).reset_index(drop=True)

# Second attempt, matching the grader: `title` is lower-cased and the title/user ID contents are swapped,
# likely because Problem 1 loaded titles into Surprise's user slot (`uid`)
hybrid_df = pd.DataFrame(
    {
        "user_id": df["title"],
        "title": df["userId"],
        "hybrid_rating": hybrid_preds,
        "svd_rating": [svd_preds[k].est for k in range(len(svd_preds))],
        "slope_one_rating": [
            slope_one_preds[k].est for k in range(len(slope_one_preds))
        ],
    }
).reset_index(drop=True)


### ANSWER CHECK
hybrid_df

| | user_id | title | hybrid_rating | svd_rating | slope_one_rating |
|---|---|---|---|---|---|
| 0 | Despicable Me 2 (2013) | 509 | 3.353341 | 3.402965 | 3.303717 |
| 1 | Despicable Me 2 (2013) | 534 | 3.307773 | 3.507119 | 3.108428 |
| 2 | Despicable Me 2 (2013) | 567 | 1.675448 | 1.341166 | 2.009730 |
| 3 | Despicable Me 2 (2013) | 586 | 4.276990 | 4.475856 | 4.078125 |
| 4 | White House Down (2013) | 10 | 2.977714 | 3.531690 | 2.423737 |
| ... | ... | ... | ... | ... | ... |
| 4995 | Black Butler: Book of the Atlantic (2017) | 184 | 3.953240 | 3.906480 | 4.000000 |
| 4996 | No Game No Life: Zero (2017) | 184 | 3.509900 | 3.519799 | 3.500000 |
| 4997 | Flint (2017) | 184 | 3.556615 | 3.613230 | 3.500000 |
| 4998 | Bungo Stray Dogs: Dead Apple (2018) | 184 | 3.450516 | 3.401031 | 3.500000 |
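Both attempts above take the title and user columns from `df`, which silently assumes that `build_testset()` returns the ratings in the same order as the DataFrame rows. That appears to hold for the rows displayed here, but a sturdier option is to read the identifiers from the predictions themselves: Surprise's `Prediction` is a named tuple with fields `uid`, `iid`, `r_ui`, `est` and `details`. The stdlib sketch below uses a stand-in named tuple with those fields and hypothetical data; `hybrid_rows` is a hypothetical helper that returns one dict per prediction, a shape that `pd.DataFrame` accepts directly. It uses the column labels of the expected table, with `uid` feeding `Title` because titles were loaded into Surprise's user slot.

```python
from collections import namedtuple

# Stand-in with the same fields as Surprise's Prediction named tuple
Prediction = namedtuple("Prediction", ["uid", "iid", "r_ui", "est", "details"])


def hybrid_rows(svd_preds, other_preds, weight=0.5):
    """Build one row per prediction, taking the ids from the predictions themselves."""
    rows = []
    for p_svd, p_other in zip(svd_preds, other_preds):
        # Both lists must describe the same (uid, iid) pair at each position
        if (p_svd.uid, p_svd.iid) != (p_other.uid, p_other.iid):
            raise ValueError(f"misaligned predictions: {p_svd[:2]} vs {p_other[:2]}")
        rows.append(
            {
                "Title": p_svd.uid,  # titles sit in Surprise's user slot here
                "user_id": p_svd.iid,
                "hybrid_rating": weight * p_svd.est + (1 - weight) * p_other.est,
                "svd_rating": p_svd.est,
                "slope_one_rating": p_other.est,
            }
        )
    return rows


# Hypothetical predictions for two ratings of one movie
svd = [Prediction("Heat (1995)", 7, 4.0, 3.5, {}), Prediction("Heat (1995)", 9, 3.0, 3.25, {})]
slope = [Prediction("Heat (1995)", 7, 4.0, 4.5, {}), Prediction("Heat (1995)", 9, 3.0, 2.75, {})]
rows = hybrid_rows(svd, slope)
print([row["hybrid_rating"] for row in rows])  # [4.0, 3.0]
```

In the notebook this would become `hybrid_df = pd.DataFrame(hybrid_rows(svd_preds, slope_one_preds))`, and the alignment check raises an error instead of silently mismatching rows.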


### Conclusion

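Hybrid recommendations can go well beyond a fixed average. One option is to write a custom algorithm class in Surprise, a subclass of `AlgoBase` that implements `fit` and `estimate`, so that the blend can be evaluated with the same tools as built-in algorithms, such as `cross_validate`. You can also incorporate the similarity between items, much as in our earlier distance-based recommendations.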
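One way to bring similarity into a hybrid is an item-based neighborhood estimate in the spirit of Surprise's `KNNBasic` with `sim_options={"name": "cosine", "user_based": False}`, which could then be blended with the SVD and SlopeOne estimates as in Problem 4. The stdlib sketch below, with hypothetical function names and data, computes cosine similarity over the users who rated both items and returns a similarity-weighted average of the user's own ratings on the `k` most similar items. It returns `None` where a library algorithm would typically report that the prediction is impossible; it is an illustration, not a reimplementation of `KNNBasic`.

```python
import math


def cosine_sim(ratings, i, j):
    """Cosine similarity of items i and j over the users who rated both."""
    common = [r for r in ratings.values() if i in r and j in r]
    if not common:
        return 0.0
    num = sum(r[i] * r[j] for r in common)
    den = math.sqrt(sum(r[i] ** 2 for r in common)) * math.sqrt(sum(r[j] ** 2 for r in common))
    return num / den  # ratings are positive, so den > 0 here


def item_knn_estimate(ratings, user, item, k=20):
    """Similarity-weighted average of the user's ratings on the k most similar items."""
    neighbors = sorted(
        ((cosine_sim(ratings, item, j), r) for j, r in ratings[user].items() if j != item),
        reverse=True,
    )[:k]
    # Only positively similar neighbors contribute to the weighted average
    total_sim = sum(sim for sim, _ in neighbors if sim > 0)
    if total_sim == 0:
        return None  # no similar items rated by this user
    return sum(sim * r for sim, r in neighbors if sim > 0) / total_sim


# Hypothetical ratings
ratings = {
    "u1": {"A": 5.0, "B": 4.0, "C": 1.0},
    "u2": {"A": 4.0, "B": 5.0, "C": 2.0},
    "u3": {"B": 4.0, "C": 2.0},
}
print(round(item_knn_estimate(ratings, "u3", "A"), 3))
```

Because A is more similar to B (40/41) than to C, the estimate for u3 lands a little above the plain average of u3's ratings on B and C.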