# Implementing Recommender Systems - Lab

## Introduction

In this lab, you'll practice creating a recommender system model using Surprise. You'll also get the chance to build a more complete recommender system pipeline that returns the top recommendations for a specific user.


## Objectives
You will be able to:
* Fit a recommender system model to a set of data
* Create a function that will return the top recommendations for a user
* Introduce a new user to a rating matrix and make recommendations for them

For this lab, we will be using the famous MovieLens dataset, specifically the `ml-latest-small` version with roughly 100,000 ratings. It contains a collection of user ratings for many different movies. In the last lesson, you were introduced to Surprise's built-in datasets. In this lab, you will also go through the process of reading a dataset into the Surprise dataset format. To begin, load the dataset into a pandas DataFrame. Determine which columns are necessary for your recommendation system and drop any extraneous ones.

In [1]:
import pandas as pd
df = pd.read_csv('./ml-latest-small/ratings.csv')
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100836 entries, 0 to 100835
Data columns (total 4 columns):
userId       100836 non-null int64
movieId      100836 non-null int64
rating       100836 non-null float64
timestamp    100836 non-null int64
dtypes: float64(1), int64(3)
memory usage: 3.1 MB


In [2]:
#drop unnecessary columns
new_df = df.drop('timestamp', axis=1)
new_df.head()

Unnamed: 0,userId,movieId,rating
0,1,1,4.0
1,1,3,4.0
2,1,6,4.0
3,1,47,5.0
4,1,50,5.0


It's now time to transform the dataset into something compatible with Surprise. To do this, you're going to need the `Reader` and `Dataset` classes. `Dataset` has a method, `load_from_df`, specifically for loading DataFrames.

In [3]:
from surprise import Reader, Dataset

# read in values as Surprise dataset 
reader = Reader()
data = Dataset.load_from_df(new_df, reader)

Let's look at how many users and items we have in our dataset. If you use neighborhood-based methods, these counts help decide whether to compute user-user or item-item similarity.

In [4]:
dataset = data.build_full_trainset()
print('Number of users: ',dataset.n_users)
print('Number of items: ',dataset.n_items)

Number of users:  610
Number of items:  9724
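One way to read these counts: a user-based neighborhood model compares pairs of users, while an item-based one compares pairs of items, so the similarity matrix grows with the square of whichever dimension you pick. The sketch below works through that arithmetic with the counts printed above. The helper name `similarity_matrix_entries` is made up for illustration, and the figures assume a dense matrix, ignoring any symmetry or sparsity savings a library might apply.

```python
def similarity_matrix_entries(n):
    """Number of entries in a dense n x n similarity matrix."""
    return n * n

n_users, n_items = 610, 9724  # counts printed by the cell above

user_user = similarity_matrix_entries(n_users)
item_item = similarity_matrix_entries(n_items)

print(f"user-user entries: {user_user:,}")
print(f"item-item entries: {item_item:,}")
print(f"item-item is roughly {item_item / user_user:.0f}x larger")
```

With far fewer users than items in this dataset, user-user similarity is the cheaper option, which is consistent with the `'user_based': True` setting used in the KNN cells later in the lab.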


## Determine the Best Model
Now, compare the different models and see which ones perform best. For consistency's sake, use RMSE to evaluate models. Remember to cross-validate! Can you get a model with a lower average RMSE on test data than 0.869?
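As a reminder of what is being measured, RMSE is the square root of the mean squared difference between actual and predicted ratings, so smaller values mean better predictions. The following is a minimal pure-Python sketch of averaging RMSE across folds. The `rmse` helper and the ratings in `folds` are invented for illustration and are not taken from the MovieLens data or from Surprise's internals.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length rating lists."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical (actual, predicted) ratings for two held-out folds
folds = [
    ([4.0, 3.0, 5.0], [3.5, 3.0, 4.0]),
    ([2.0, 4.0, 1.0], [2.5, 3.5, 2.0]),
]

fold_scores = [rmse(actual, predicted) for actual, predicted in folds]
mean_rmse = sum(fold_scores) / len(fold_scores)

print("per-fold RMSE:", fold_scores)
print("mean RMSE:", mean_rmse)
```

Averaging over folds, as `cross_validate` results are averaged later in this lab with `np.mean`, gives a steadier estimate than a single train/test split.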

In [5]:
# importing relevant libraries
from surprise.model_selection import cross_validate
from surprise.prediction_algorithms import SVD
from surprise.prediction_algorithms import KNNWithMeans, KNNBasic, KNNBaseline
from surprise.model_selection import GridSearchCV
import numpy as np

In [None]:
## Perform a gridsearch with SVD
params = {'n_factors' :[20, 50, 100],
         'reg_all':[0.02, 0.05, 0.1]}
g_s_svd = GridSearchCV(SVD, param_grid=params, n_jobs=-1)
g_s_svd.fit(data)

In [None]:
# print out optimal parameters for SVD after GridSearch
print(g_s_svd.best_score)
print(g_s_svd.best_params)
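To get a feel for how much work the grid search above does, the sketch below enumerates the parameter combinations it would try, using only `itertools.product`. This is an illustration of the idea of a grid, not a copy of how Surprise builds it internally.

```python
from itertools import product

params = {'n_factors': [20, 50, 100], 'reg_all': [0.02, 0.05, 0.1]}

# One dict per candidate model: every combination of the listed values
names = list(params)
combinations = [dict(zip(names, values)) for values in product(*params.values())]

print(len(combinations), "candidate models")
for combo in combinations[:3]:
    print(combo)
```

Each candidate is cross-validated, so assuming a five-fold split the search fits 9 x 5 = 45 models. That is why adding even one more parameter to the grid can noticeably increase the run time.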

In [6]:
# cross validating with KNNBasic
knn_basic = KNNBasic(sim_options={'name':'pearson','user_based':True})
cv_knn_basic= cross_validate(knn_basic,data,n_jobs=-1)

In [7]:
# print out the average RMSE score for the test set
for i in cv_knn_basic.items():
    print(i)
print('-----------------------')
print(np.mean(cv_knn_basic['test_rmse']))

('test_rmse', array([0.97536243, 0.97487089, 0.96943037, 0.97721424, 0.96876573]))
('test_mae', array([0.75275869, 0.75176691, 0.74811418, 0.75505071, 0.74974807]))
('fit_time', (0.3779940605163574, 0.37999534606933594, 0.3849964141845703, 0.3709995746612549, 0.3379640579223633))
('test_time', (1.2250373363494873, 1.1830697059631348, 1.1610040664672852, 1.0650453567504883, 1.034999132156372))
-----------------------
0.973128730806397


In [None]:
# cross validating with KNNBaseline
knn_baseline = KNNBaseline(sim_options={'name':'pearson', 'user_based':True})
cv_knn_baseline= cross_validate(knn_baseline, data, n_jobs=-1)

In [None]:
# print out the average score for the test set
for i in cv_knn_baseline.items():
    print(i)
print('-----------------------')
print(np.mean(cv_knn_baseline['test_rmse']))

Based on these outputs, it seems like the best-performing model is the SVD model with `n_factors = 50` and a regularization rate (`reg_all`) of 0.05. Let's use that model to make some predictions. If you found a model that performs better, feel free to use it instead.

## Making Recommendations

This next section is going to involve making recommendations, and it's important that the output for the recommendation is interpretable to people. Rather than returning the movie_id values, it would be far more valuable to return the actual title of the movie. As a first step, let's read in the movies to a dataframe and take a peek at what information we have about them.

In [8]:
df_movies = pd.read_csv('./ml-latest-small/movies.csv')

In [9]:
df_movies.head()

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy


## Making simple predictions
Just as a reminder, let's look at how you make a prediction for an individual user and item. First, we'll fit the SVD model we had from before.

In [22]:
svd = SVD(n_factors= 50, reg_all=0.05)
svd.fit(dataset)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x1e592f70358>

In [23]:
svd.predict(2,4)

Prediction(uid=2, iid=4, r_ui=None, est=3.1575135089923005, details={'was_impossible': False})

The returned `Prediction` is a named tuple, so each value within it can be accessed by field name (for example, `.est` for the estimated rating) or by index. Now let's put all of our knowledge of recommendation systems to use and do something interesting: making predictions for a new user!
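As a quick illustration of indexing into a prediction, here is a small sketch that rebuilds the result shown above with a plain `collections.namedtuple`. The stand-in class is an assumption made so the snippet runs without Surprise installed; the field names and values are copied from the printed output.

```python
from collections import namedtuple

# Stand-in with the same field names as the Prediction printed above
Prediction = namedtuple('Prediction', ['uid', 'iid', 'r_ui', 'est', 'details'])

pred = Prediction(uid=2, iid=4, r_ui=None, est=3.1575135089923005,
                  details={'was_impossible': False})

print(pred.est)  # access by field name
print(pred[3])   # same value, accessed by position

# Tuple unpacking works as well
uid, iid, r_ui, est, details = pred
```

Accessing `pred.est` tends to be easier to read than `pred[3]`, and it keeps working if you forget which position holds the estimate.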

## Obtaining User Ratings 

It's great that we have working models and everything, but wouldn't it be nice to get recommendations specifically tailored to your preferences? That's what we'll be doing now. As a first step, let's create a function that presents a user with randomly selected movies and asks them to rate each one. If they have not seen a movie, they should be able to skip rating it.

The function `movie_rater` should take as parameters:
* movie_df : DataFrame - a dataframe containing the movie ids, name of movie, and genres
* num : int - number of ratings
* genre : string - a specific genre from which to draw movies

The function returns:
* rating_list : list - a collection of dictionaries in the format of {'userId': int  , 'movieId': int  ,'rating': float  }

#### This function is optional, but fun :) 

In [15]:
def movie_rater(movie_df, num, genre=None):
    userID = 1000
    rating_list = []
    while num > 0:
        if genre:
            movie = movie_df[movie_df['genres'].str.contains(genre)].sample(1)
        else:
            movie = movie_df.sample(1)
        display(movie)
        rating = input('How do you rate this movie on a scale of 1-5, press n if you have not seen it: ')
        if rating == 'n':
            continue
        else:
            rating_one_movie = {'userId':userID,'movieId':movie['movieId'].values[0],'rating':rating}
            rating_list.append(rating_one_movie) 
            num -= 1
    return rating_list
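The function above stores whatever string `input()` returns, so a typo such as `44` or `four` would end up in the ratings list. One possible refinement, sketched here under the assumption that valid ratings are numbers from 1 to 5, is a small validation helper. The name `parse_rating` and its `low`/`high` parameters are hypothetical and not part of the lab.

```python
def parse_rating(raw, low=1.0, high=5.0):
    """Return the rating as a float, or None if the input is a skip or invalid."""
    text = raw.strip().lower()
    if text == 'n':
        return None  # user has not seen the movie
    try:
        value = float(text)
    except ValueError:
        return None  # not a number at all
    if not low <= value <= high:
        return None  # outside the allowed scale
    return value

print(parse_rating('4'))     # a valid rating
print(parse_rating('n'))     # a skip
print(parse_rating('four'))  # invalid input
```

Inside `movie_rater`, you could call this helper on the `input()` result and `continue` whenever it returns `None`, which would also store the rating as a float rather than a string.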

In [16]:
user_rating = movie_rater(df_movies, 10, 'Action')

Unnamed: 0,movieId,title,genres
9722,189547,Iron Soldier (2010),Action|Sci-Fi


How do you rate this movie on a scale of 1-5, press n if you have not seen it: n


Unnamed: 0,movieId,title,genres
2779,3717,Gone in 60 Seconds (2000),Action|Crime


How do you rate this movie on a scale of 1-5, press n if you have not seen it: 4


Unnamed: 0,movieId,title,genres
7646,88140,Captain America: The First Avenger (2011),Action|Adventure|Sci-Fi|Thriller|War


How do you rate this movie on a scale of 1-5, press n if you have not seen it: 3


Unnamed: 0,movieId,title,genres
9035,141408,Scouts Guide to the Zombie Apocalypse (2015),Action|Comedy|Horror


How do you rate this movie on a scale of 1-5, press n if you have not seen it: n


Unnamed: 0,movieId,title,genres
5621,27156,Neon Genesis Evangelion: The End of Evangelion...,Action|Animation|Drama|Fantasy|Sci-Fi


How do you rate this movie on a scale of 1-5, press n if you have not seen it: n


Unnamed: 0,movieId,title,genres
9703,185435,"Game Over, Man! (2018)",Action|Comedy


How do you rate this movie on a scale of 1-5, press n if you have not seen it: n


Unnamed: 0,movieId,title,genres
2038,2716,Ghostbusters (a.k.a. Ghost Busters) (1984),Action|Comedy|Sci-Fi


How do you rate this movie on a scale of 1-5, press n if you have not seen it: 5


Unnamed: 0,movieId,title,genres
7162,71810,Legionnaire (1998),Action|Adventure|Drama|War


How do you rate this movie on a scale of 1-5, press n if you have not seen it: 3


Unnamed: 0,movieId,title,genres
1173,1556,Speed 2: Cruise Control (1997),Action|Romance|Thriller


How do you rate this movie on a scale of 1-5, press n if you have not seen it: n


Unnamed: 0,movieId,title,genres
3029,4053,Double Take (2001),Action|Comedy


How do you rate this movie on a scale of 1-5, press n if you have not seen it: n


Unnamed: 0,movieId,title,genres
431,494,Executive Decision (1996),Action|Adventure|Thriller


How do you rate this movie on a scale of 1-5, press n if you have not seen it: n


Unnamed: 0,movieId,title,genres
7770,91542,Sherlock Holmes: A Game of Shadows (2011),Action|Adventure|Comedy|Crime|Mystery|Thriller


How do you rate this movie on a scale of 1-5, press n if you have not seen it: n


Unnamed: 0,movieId,title,genres
8477,112897,The Expendables 3 (2014),Action|Adventure


How do you rate this movie on a scale of 1-5, press n if you have not seen it: n


Unnamed: 0,movieId,title,genres
9287,158254,Kindergarten Cop 2 (2016),Action|Comedy


How do you rate this movie on a scale of 1-5, press n if you have not seen it: n


Unnamed: 0,movieId,title,genres
9184,149406,Kung Fu Panda 3 (2016),Action|Adventure|Animation


How do you rate this movie on a scale of 1-5, press n if you have not seen it: n


Unnamed: 0,movieId,title,genres
2452,3265,Hard-Boiled (Lat sau san taam) (1992),Action|Crime|Drama|Thriller


How do you rate this movie on a scale of 1-5, press n if you have not seen it: n


Unnamed: 0,movieId,title,genres
296,338,Virtuosity (1995),Action|Sci-Fi|Thriller


How do you rate this movie on a scale of 1-5, press n if you have not seen it: n


Unnamed: 0,movieId,title,genres
6425,51412,Next (2007),Action|Sci-Fi|Thriller


How do you rate this movie on a scale of 1-5, press n if you have not seen it: n


Unnamed: 0,movieId,title,genres
7018,68358,Star Trek (2009),Action|Adventure|Sci-Fi|IMAX


How do you rate this movie on a scale of 1-5, press n if you have not seen it: 4


Unnamed: 0,movieId,title,genres
59,66,Lawnmower Man 2: Beyond Cyberspace (1996),Action|Sci-Fi|Thriller


How do you rate this movie on a scale of 1-5, press n if you have not seen it: n


Unnamed: 0,movieId,title,genres
3217,4344,Swordfish (2001),Action|Crime|Drama


How do you rate this movie on a scale of 1-5, press n if you have not seen it: 3


Unnamed: 0,movieId,title,genres
1058,1375,Star Trek III: The Search for Spock (1984),Action|Adventure|Sci-Fi


How do you rate this movie on a scale of 1-5, press n if you have not seen it: 4


Unnamed: 0,movieId,title,genres
7211,72982,Alice (2009),Action|Adventure|Fantasy


How do you rate this movie on a scale of 1-5, press n if you have not seen it: n


Unnamed: 0,movieId,title,genres
6731,59037,Speed Racer (2008),Action|Children|Sci-Fi|IMAX


How do you rate this movie on a scale of 1-5, press n if you have not seen it: n


Unnamed: 0,movieId,title,genres
5956,34450,Miracles - Mr. Canton and Lady Rose (1989),Action|Comedy|Crime|Drama


How do you rate this movie on a scale of 1-5, press n if you have not seen it: n


Unnamed: 0,movieId,title,genres
5465,26159,Tokyo Drifter (Tôkyô nagaremono) (1966),Action|Crime|Drama


How do you rate this movie on a scale of 1-5, press n if you have not seen it: n


Unnamed: 0,movieId,title,genres
6013,37853,Into the Blue (2005),Action|Adventure|Crime|Thriller


How do you rate this movie on a scale of 1-5, press n if you have not seen it: n


Unnamed: 0,movieId,title,genres
4175,6014,National Security (2003),Action|Comedy


How do you rate this movie on a scale of 1-5, press n if you have not seen it: n


Unnamed: 0,movieId,title,genres
6274,47538,Crime Busters (1977),Action|Adventure|Comedy|Crime


How do you rate this movie on a scale of 1-5, press n if you have not seen it: n


Unnamed: 0,movieId,title,genres
5985,36529,Lord of War (2005),Action|Crime|Drama|Thriller|War


How do you rate this movie on a scale of 1-5, press n if you have not seen it: n


Unnamed: 0,movieId,title,genres
5057,7925,"Hidden Fortress, The (Kakushi-toride no san-ak...",Action|Adventure


How do you rate this movie on a scale of 1-5, press n if you have not seen it: n


Unnamed: 0,movieId,title,genres
8704,123553,In the Name of the King III (2014),Action|Adventure|Drama|Fantasy


How do you rate this movie on a scale of 1-5, press n if you have not seen it: n


Unnamed: 0,movieId,title,genres
9713,188301,Ant-Man and the Wasp (2018),Action|Adventure|Comedy|Fantasy|Sci-Fi


How do you rate this movie on a scale of 1-5, press n if you have not seen it: n


Unnamed: 0,movieId,title,genres
5852,32596,Sahara (2005),Action|Adventure|Comedy


How do you rate this movie on a scale of 1-5, press n if you have not seen it: n


Unnamed: 0,movieId,title,genres
7526,84523,Kill! (Kiru) (1968),Action|Comedy|Drama


How do you rate this movie on a scale of 1-5, press n if you have not seen it: n


Unnamed: 0,movieId,title,genres
4940,7445,Man on Fire (2004),Action|Crime|Drama|Mystery|Thriller


How do you rate this movie on a scale of 1-5, press n if you have not seen it: n


Unnamed: 0,movieId,title,genres
193,227,Drop Zone (1994),Action|Thriller


How do you rate this movie on a scale of 1-5, press n if you have not seen it: n


Unnamed: 0,movieId,title,genres
800,1047,"Long Kiss Goodnight, The (1996)",Action|Drama|Thriller


How do you rate this movie on a scale of 1-5, press n if you have not seen it: n


Unnamed: 0,movieId,title,genres
6664,57368,Cloverfield (2008),Action|Mystery|Sci-Fi|Thriller


How do you rate this movie on a scale of 1-5, press n if you have not seen it: 4


Unnamed: 0,movieId,title,genres
8451,112175,How to Train Your Dragon 2 (2014),Action|Adventure|Animation


How do you rate this movie on a scale of 1-5, press n if you have not seen it: n


Unnamed: 0,movieId,title,genres
3283,4443,Outland (1981),Action|Sci-Fi|Thriller


How do you rate this movie on a scale of 1-5, press n if you have not seen it: n


Unnamed: 0,movieId,title,genres
7097,70451,Max Manus (2008),Action|Drama|War


How do you rate this movie on a scale of 1-5, press n if you have not seen it: n


Unnamed: 0,movieId,title,genres
7399,79895,"Extraordinary Adventures of Adèle Blanc-Sec, T...",Action|Adventure|Fantasy|Mystery


How do you rate this movie on a scale of 1-5, press n if you have not seen it: n


Unnamed: 0,movieId,title,genres
615,780,Independence Day (a.k.a. ID4) (1996),Action|Adventure|Sci-Fi|Thriller


How do you rate this movie on a scale of 1-5, press n if you have not seen it: 5


Unnamed: 0,movieId,title,genres
7063,69524,Raiders of the Lost Ark: The Adaptation (1989),Action|Adventure|Thriller


How do you rate this movie on a scale of 1-5, press n if you have not seen it: n


Unnamed: 0,movieId,title,genres
9330,160527,Sympathy for the Underdog (1971),Action|Crime|Drama


How do you rate this movie on a scale of 1-5, press n if you have not seen it: n


Unnamed: 0,movieId,title,genres
8597,117646,Dragonheart 2: A New Beginning (2000),Action|Adventure|Comedy|Drama|Fantasy|Thriller


How do you rate this movie on a scale of 1-5, press n if you have not seen it: n


Unnamed: 0,movieId,title,genres
124,151,Rob Roy (1995),Action|Drama|Romance|War


How do you rate this movie on a scale of 1-5, press n if you have not seen it: n


Unnamed: 0,movieId,title,genres
7799,92198,Seeking Justice (2011),Action|Drama|Thriller


How do you rate this movie on a scale of 1-5, press n if you have not seen it: n


Unnamed: 0,movieId,title,genres
6825,61210,Mutant Chronicles (2008),Action|Adventure|Sci-Fi


How do you rate this movie on a scale of 1-5, press n if you have not seen it: n


Unnamed: 0,movieId,title,genres
1979,2628,Star Wars: Episode I - The Phantom Menace (1999),Action|Adventure|Sci-Fi


How do you rate this movie on a scale of 1-5, press n if you have not seen it: 3


If you're struggling to come up with the above function, you can use this list of user ratings to complete the next segment.

In [17]:
user_rating

[{'userId': 1000, 'movieId': 3717, 'rating': '4'},
 {'userId': 1000, 'movieId': 88140, 'rating': '3'},
 {'userId': 1000, 'movieId': 2716, 'rating': '5'},
 {'userId': 1000, 'movieId': 71810, 'rating': '3'},
 {'userId': 1000, 'movieId': 68358, 'rating': '4'},
 {'userId': 1000, 'movieId': 4344, 'rating': '3'},
 {'userId': 1000, 'movieId': 1375, 'rating': '4'},
 {'userId': 1000, 'movieId': 57368, 'rating': '4'},
 {'userId': 1000, 'movieId': 780, 'rating': '5'},
 {'userId': 1000, 'movieId': 2628, 'rating': '3'}]

### Making Predictions With the New Ratings
Now that you have new ratings, you can use them to make predictions for this new user. The proper way this should work is:

* add the new ratings to the original ratings DataFrame, read into a Surprise dataset
* train a model using the new combined DataFrame
* make predictions for the user
* order those predictions from highest rated to lowest rated
* return the top n recommendations with the text of the actual movie (rather than just the index number)

In [18]:
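## add the new ratings to the original ratings DataFrame
# DataFrame.append was removed in pandas 2.0; input() gave string ratings, so cast to float
user_ratings_df = pd.DataFrame(user_rating).astype({'rating': float})
newnew_df = pd.concat([new_df, user_ratings_df], ignore_index=True)
new_data = Dataset.load_from_df(newnew_df, reader)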

In [26]:
# train a model using the new combined DataFrame
# fit() expects a Trainset, not the Dataset returned by load_from_df
svd = SVD(n_factors= 50, reg_all=0.05)
svd.fit(new_data.build_full_trainset())

In [20]:
# make predictions for the user
# you'll probably want to create a list of tuples in the format (movie_id, predicted_score)
user_preds = []
for movie in list(df_movies.movieId.unique()):
    pred = svd.predict(1000, movie)
    user_preds.append((movie, pred.est))

In [None]:
# order the predictions from highest to lowest rated
ranked_movies = sorted(user_preds, key=lambda x: x[1], reverse=True)
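Sorting the full list is perfectly fine at this scale. If only the top few entries are needed, `heapq.nlargest` from the standard library is an alternative that avoids ordering every prediction. The sketch below uses invented `(movie_id, predicted_score)` pairs in place of `user_preds`.

```python
import heapq

# Hypothetical (movie_id, predicted_score) pairs standing in for user_preds
example_preds = [(1, 3.9), (2, 4.6), (3, 2.8), (4, 4.1), (5, 4.4)]

# Keep only the three highest predicted scores, best first
top_3 = heapq.nlargest(3, example_preds, key=lambda pair: pair[1])
print(top_3)
```

The result is ordered from highest to lowest score, the same ordering that `sorted(..., reverse=True)` produces for the entries it keeps.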

For the final component of this challenge, it could be useful to create a function `recommended_movies` that takes in the parameters:
* `user_predictions` : list - list of tuples formulated as (movie_id, predicted_score), ordered from best to worst for this individual
* `movie_title_df` : DataFrame - a dataframe containing the movie ids and titles
* `n` : int - number of recommended movies

The function should use a for loop to print the top *n* recommended movies in order from best to worst.

In [None]:
# print the top n recommendations using the movie titles
def recommended_movies(user_predictions, movie_title_df, n):
    # slicing stops after n entries, so no manual counter is needed
    for idx, rec in enumerate(user_predictions[:n]):
        title = movie_title_df.loc[movie_title_df.movieId == int(rec[0])].title.values[0]
        print(f"Recommendation #{idx+1}: {title}")


recommended_movies(ranked_movies, df_movies, 10)

## Level Up

* Try to chain all of the steps together into one function that asks a user to rate a certain number of movies, then performs all of the above steps to return the top n recommendations
* Make a recommender system that only returns items that come from a specified genre
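For the genre-restricted variant, one possible approach is to filter the ranked predictions before printing them. The sketch below is a hedged starting point rather than a full solution: `filter_by_genre`, `ranked`, and `genres_by_id` are hypothetical names, the scores are invented, and the genre strings are copied from movie rows shown earlier in this lab.

```python
# Hypothetical stand-ins for ranked_movies and a movieId -> genres lookup
ranked = [(2716, 4.7), (3, 4.5), (780, 4.3), (5, 4.0)]
genres_by_id = {
    2716: 'Action|Comedy|Sci-Fi',
    3: 'Comedy|Romance',
    780: 'Action|Adventure|Sci-Fi|Thriller',
    5: 'Comedy',
}

def filter_by_genre(ranked_preds, genres_lookup, genre):
    """Keep predictions whose pipe-separated genre string includes `genre`."""
    return [
        (movie_id, score)
        for movie_id, score in ranked_preds
        if genre in genres_lookup.get(movie_id, '').split('|')
    ]

print(filter_by_genre(ranked, genres_by_id, 'Action'))
```

With the real data, a lookup like `genres_by_id` could be built with `dict(zip(df_movies.movieId, df_movies.genres))`. Splitting on `|` matches whole genre names, whereas a substring check would also match partial names.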

In [None]:
def make_movie_recommendations():
    
    
    
    
    
    
    
    
    

## Summary

In this lab, you got the chance to implement a collaborative filtering model and retrieve recommendations from that model. You also got the opportunity to add your own ratings to the system to get new recommendations for yourself! Next, you will be introduced to using Spark to build recommender systems.