# Implementing Recommender Systems - Lab

## Introduction

In this lab, you'll practice creating a recommender system model using Surprise. You'll also get the chance to build a more complete recommender system pipeline that returns the top recommendations for a specific user.


## Objectives
You will be able to:
* Fit a recommender system model to a set of data
* Create a function that will return the top recommendations for a user
* Introduce a new user to a rating matrix and make recommendations for them

For this lab, we will be using the well-known MovieLens dataset, here the `ml-latest-small` version with roughly 100,000 ratings. It contains a collection of user ratings for many different movies. In the last lesson, you worked with Surprise's built-in datasets. In this lab, you will also go through the process of reading a dataset into the Surprise dataset format. To begin, load the dataset into a pandas DataFrame. Determine which columns are necessary for your recommendation system and drop any extraneous ones.

In [1]:
import pandas as pd
df = pd.read_csv('./ml-latest-small/ratings.csv')
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100836 entries, 0 to 100835
Data columns (total 4 columns):
userId       100836 non-null int64
movieId      100836 non-null int64
rating       100836 non-null float64
timestamp    100836 non-null int64
dtypes: float64(1), int64(3)
memory usage: 3.1 MB


In [2]:
#drop unnecessary columns
new_df=df.drop(columns='timestamp')

It's now time to transform the dataset into something compatible with Surprise. To do this, you're going to need the `Reader` and `Dataset` classes. There's a method in `Dataset` specifically for loading DataFrames.

In [3]:
from surprise import Reader, Dataset
reader = Reader()
data = Dataset.load_from_df(new_df,reader)
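One detail worth checking at this step is the rating scale. `Reader` accepts a `rating_scale` argument, and a quick look at the lowest and highest ratings in the data shows what that scale could be. The sketch below uses only plain Python; `rating_bounds` is a hypothetical helper and the sample ratings are made up for illustration.

```python
def rating_bounds(ratings):
    """Return (lowest, highest) rating observed in an iterable of numbers."""
    values = list(ratings)
    if not values:
        raise ValueError("no ratings supplied")
    return min(values), max(values)


# Made-up ratings for illustration; in the lab this could be new_df['rating'].
sample = [4.0, 0.5, 3.5, 5.0, 2.0]
scale = rating_bounds(sample)
print(scale)  # (0.5, 5.0)

# With Surprise installed, this tuple could be passed as
# Reader(rating_scale=scale) before calling Dataset.load_from_df.
```

If the observed bounds differ from the scale the reader was created with, passing them explicitly is one way to keep predictions within the range that actually occurs in the data.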

Let's look at how many users and items we have in our dataset. If using neighborhood-based methods, this will help us determine whether we should compute user-user or item-item similarity.

In [4]:
dataset = data.build_full_trainset()
print('Number of users: ',dataset.n_users,'\n')
print('Number of items: ',dataset.n_items)

Number of users:  610 

Number of items:  9724
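To see why these counts matter for neighborhood-based methods, it can help to compare how large a dense similarity matrix would be in each case. This is a rough back-of-the-envelope sketch, assuming one similarity value per pair of entities; `similarity_matrix_entries` is a hypothetical helper, not part of Surprise.

```python
def similarity_matrix_entries(n_entities: int) -> int:
    """Number of cells in a dense n x n similarity matrix."""
    return n_entities * n_entities


# Counts printed by the trainset above.
n_users, n_items = 610, 9724

user_user = similarity_matrix_entries(n_users)
item_item = similarity_matrix_entries(n_items)

print(f"user-user: {user_user:,} entries")
print(f"item-item: {item_item:,} entries")
print(f"item-item is roughly {item_item / user_user:.0f}x larger")
```

With far fewer users than items, user-user similarity is the cheaper option here, which is consistent with the `user_based: True` setting used for the KNN models later in the lab.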


## Determine the Best Model
Now, compare the different models and see which ones perform best. For consistency's sake, use RMSE to evaluate models. Remember to cross-validate! Can you get a model with a lower average RMSE on test data than 0.869?
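As a reminder of what the metric measures, RMSE is the square root of the mean squared difference between actual and predicted ratings, so lower values are better. The following is a minimal plain-Python sketch with made-up ratings; Surprise computes these metrics for you during cross-validation.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up ratings for illustration.
actual = [4.0, 3.0, 5.0, 2.0]
predicted = [3.5, 3.0, 4.0, 3.0]

print(rmse(actual, predicted))  # 0.75
print(mae(actual, predicted))   # 0.625
```

Because errors are squared before averaging, RMSE penalizes large misses more heavily than MAE does, which is why the two numbers differ on the same predictions.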

In [5]:
# importing relevant libraries
from surprise.model_selection import cross_validate
from surprise.prediction_algorithms import SVD
from surprise.prediction_algorithms import KNNWithMeans, KNNBasic, KNNBaseline
from surprise.model_selection import GridSearchCV
import numpy as np

In [6]:
## Perform a gridsearch with SVD
params = {'n_factors' :[20, 50, 100],
         'reg_all':[0.02, .05, 0.1]}
g_s_svd = GridSearchCV(SVD,param_grid=params,n_jobs=-1)
g_s_svd.fit(data)


In [7]:
print(g_s_svd.best_score)
print(g_s_svd.best_params)

{'rmse': 0.8686406406496305, 'mae': 0.6676277997997128}
{'rmse': {'n_factors': 50, 'reg_all': 0.05}, 'mae': {'n_factors': 50, 'reg_all': 0.05}}
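It may help to see what the grid search is iterating over. The sketch below expands the same `params` dictionary into every combination of values; `expand_grid` is a hypothetical helper written for illustration and is not how Surprise necessarily implements the search internally.

```python
from itertools import product

params = {'n_factors': [20, 50, 100],
          'reg_all': [0.02, 0.05, 0.1]}


def expand_grid(param_grid):
    """Return one dict per combination of parameter values."""
    names = list(param_grid)
    value_lists = [param_grid[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


combos = expand_grid(params)
print(len(combos))  # 9
for combo in combos:
    print(combo)
```

Each of the nine combinations is cross-validated, so the cost of the search grows quickly as more parameters or values are added to the grid.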


In [8]:
# cross validating with KNNBasic
knn_basic = KNNBasic(sim_options={'name':'pearson','user_based':True})
cv_knn_basic= cross_validate(knn_basic,data,n_jobs=-1)

In [9]:
for i in cv_knn_basic.items():
    print(i)
print('-----------------------')
print(np.mean(cv_knn_basic['test_rmse']))

('test_rmse', array([0.96320397, 0.97966465, 0.98280133, 0.97615425, 0.96385807]))
('test_mae', array([0.74434932, 0.75560264, 0.75877868, 0.7517816 , 0.74427725]))
('fit_time', (0.3370227813720703, 0.3568861484527588, 0.3402731418609619, 0.3506131172180176, 0.35271596908569336))
('test_time', (1.1722910404205322, 1.101898193359375, 1.159066915512085, 1.1417760848999023, 1.1343178749084473))
-----------------------
0.9731364552887186


In [10]:
# cross validating with KNNBaseline
knn_baseline = KNNBaseline(sim_options={'name':'pearson','user_based':True})
cv_knn_baseline = cross_validate(knn_baseline,data)

Estimating biases using als...
Computing the pearson similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson similarity matrix...
Done computing similarity matrix.


In [11]:
for i in cv_knn_baseline.items():
    print(i)

np.mean(cv_knn_baseline['test_rmse'])

('test_rmse', array([0.88104598, 0.86780799, 0.87606128, 0.88092119, 0.87823101]))
('test_mae', array([0.67199015, 0.6620757 , 0.6707229 , 0.67179252, 0.67223474]))
('fit_time', (0.6505217552185059, 0.5830690860748291, 0.5861270427703857, 0.6293549537658691, 0.5915780067443848))
('test_time', (1.715900182723999, 1.687485933303833, 1.6478428840637207, 1.6702032089233398, 1.7350661754608154))


0.8768134891456789

Based on these outputs, the best-performing model appears to be the SVD model with n_factors = 50 and a regularization rate of 0.05. Let's use that model to make some predictions. If you found a model that performs better, feel free to use it instead.
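The comparison can also be made programmatically. This small sketch collects the mean test RMSE values reported above (rounded to four decimals) and picks the lowest; the dictionary labels are just descriptive strings chosen for this example.

```python
# Mean test RMSE per model, rounded from the outputs above.
mean_rmse = {
    'SVD (n_factors=50, reg_all=0.05)': 0.8686,
    'KNNBasic (pearson, user-based)': 0.9731,
    'KNNBaseline (pearson, user-based)': 0.8768,
}

# Lower RMSE is better, so take the minimum.
best_model = min(mean_rmse, key=mean_rmse.get)

for name, score in sorted(mean_rmse.items(), key=lambda item: item[1]):
    print(f"{score:.4f}  {name}")
print('Best:', best_model)
```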

## Making Recommendations

This next section involves making recommendations, and it's important that the output is interpretable to people. Rather than returning the movie_id values, it would be far more valuable to return the actual title of each movie. As a first step, let's read the movies into a DataFrame and take a peek at what information we have about them.

In [12]:
df_movies = pd.read_csv('./ml-latest-small/movies.csv')

In [13]:
df_movies.head()

   movieId                               title                                       genres
0        1                    Toy Story (1995)  Adventure|Animation|Children|Comedy|Fantasy
1        2                      Jumanji (1995)                   Adventure|Children|Fantasy
2        3             Grumpier Old Men (1995)                               Comedy|Romance
3        4            Waiting to Exhale (1995)                         Comedy|Drama|Romance
4        5  Father of the Bride Part II (1995)                                       Comedy


## Making simple predictions
Just as a reminder, let's look at how you make a prediction for an individual user and item. First, we'll fit the SVD model we had from before.

In [14]:

# use the best parameters found by the grid search above
svd = SVD(n_factors=50, reg_all=0.05)
svd.fit(dataset)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x11c338eb8>

In [15]:
svd.predict(2,4)

Prediction(uid=2, iid=4, r_ui=None, est=3.023343288592047, details={'was_impossible': False})

This prediction value is a named tuple, so each of its values can be accessed by index or by field name (for example, `est` holds the estimated rating). Now let's put all of our knowledge of recommendation systems to use and do something interesting: making predictions for a new user!
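As a brief aside on accessing the prediction's values, the sketch below builds a stand-in with the same field names as the `Prediction` shown above, so it runs without Surprise installed. The `est` value is shortened for readability.

```python
from collections import namedtuple

# Stand-in with the same field names as the Prediction printed above.
Prediction = namedtuple('Prediction', ['uid', 'iid', 'r_ui', 'est', 'details'])

pred = Prediction(uid=2, iid=4, r_ui=None, est=3.02,
                  details={'was_impossible': False})

print(pred[3])   # access by position
print(pred.est)  # access by field name, same value

# Tuple unpacking also works.
uid, iid, r_ui, est, details = pred
print(uid, iid, est)
```

Access by field name tends to be easier to read than a bare index such as `[3]`, especially when the prediction is used several lines away from where it was created.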

## Obtaining User Ratings 

It's great that we have working models, but wouldn't it be nice to get recommendations specifically tailored to your preferences? That's what we'll be doing now. As a first step, let's create a function that presents users with randomly selected movies and asks them to rate each one. If they have not seen a movie, they should be able to skip rating it.

The function `movie_rater` should take as parameters:
* movie_df : DataFrame - a dataframe containing the movie ids, name of movie, and genres
* num : int - number of ratings
* genre : string - a specific genre from which to draw movies

The function returns:
* rating_list : list - a collection of dictionaries in the format of {'userId': int  , 'movieId': int  ,'rating': float  }

#### This function is optional, but fun :) 

In [16]:
def movie_rater(movie_df,num, genre=None):
    userID = 1000
    rating_list = []
    while num > 0:
        if genre:
            movie = movie_df[movie_df['genres'].str.contains(genre)].sample(1)
        else:
            movie = movie_df.sample(1)
        print(movie)
        rating = input('How do you rate this movie on a scale of 1-5, press n if you have not seen :\n')
        if rating == 'n':
            continue
        else:
            rating_one_movie = {'userId':userID,'movieId':movie['movieId'].values[0],'rating':rating}
            rating_list.append(rating_one_movie) 
            num -= 1
    return rating_list
        

In [17]:
user_rating = movie_rater(df_movies,4)

      movieId                 title     genres
6919    64499  Che: Part One (2008)  Drama|War


How do you rate this movie on a scale of 1-5, press n if you have not seen :
 n


      movieId         title genres
5984    36527  Proof (2005)  Drama


How do you rate this movie on a scale of 1-5, press n if you have not seen :
 n


      movieId                      title                genres
2618     3503  Solaris (Solyaris) (1972)  Drama|Mystery|Sci-Fi


How do you rate this movie on a scale of 1-5, press n if you have not seen :
 n


      movieId                title         genres
3590     4921  Little Women (1933)  Drama|Romance


How do you rate this movie on a scale of 1-5, press n if you have not seen :
 n


      movieId                title  genres
4481     6619  Uptown Girls (2003)  Comedy


How do you rate this movie on a scale of 1-5, press n if you have not seen :
 n


      movieId               title          genres
1819     2419  Extremities (1986)  Drama|Thriller


How do you rate this movie on a scale of 1-5, press n if you have not seen :
 n


      movieId                                           title  \
8638   119155  Night at the Museum: Secret of the Tomb (2014)   

                                 genres  
8638  Adventure|Children|Comedy|Fantasy  


How do you rate this movie on a scale of 1-5, press n if you have not seen :
 n


      movieId           title              genres
8252   104841  Gravity (2013)  Action|Sci-Fi|IMAX


How do you rate this movie on a scale of 1-5, press n if you have not seen :
 2


      movieId                          title              genres
1998     2659  It Came from Hollywood (1982)  Comedy|Documentary


How do you rate this movie on a scale of 1-5, press n if you have not seen :
 n


      movieId                                  title       genres
6168    44633  Devil and Daniel Johnston, The (2005)  Documentary


How do you rate this movie on a scale of 1-5, press n if you have not seen :
 n


      movieId                   title         genres
4075     5812  Far from Heaven (2002)  Drama|Romance


How do you rate this movie on a scale of 1-5, press n if you have not seen :
 n


      movieId                    title          genres
4831     7211  People Will Talk (1951)  Comedy|Romance


How do you rate this movie on a scale of 1-5, press n if you have not seen :
 n


      movieId                 title                 genres
5889    33237  San Francisco (1936)  Drama|Musical|Romance


How do you rate this movie on a scale of 1-5, press n if you have not seen :
 n


      movieId              title genres
6110    42730  Glory Road (2006)  Drama


How do you rate this movie on a scale of 1-5, press n if you have not seen :
 n


      movieId                            title genres
3102     4164  Caveman's Valentine, The (2001)  Drama


How do you rate this movie on a scale of 1-5, press n if you have not seen :
 n


      movieId                             title           genres
7935    95690  Some Guy Who Kills People (2011)  Comedy|Thriller


How do you rate this movie on a scale of 1-5, press n if you have not seen :
 n


      movieId                  title            genres
2986     3999  Vertical Limit (2000)  Action|Adventure


How do you rate this movie on a scale of 1-5, press n if you have not seen :
 n


      movieId              title           genres
4362     6379  Wrong Turn (2003)  Horror|Thriller


How do you rate this movie on a scale of 1-5, press n if you have not seen :
 n


      movieId                  title                            genres
3019     4037  House of Games (1987)  Crime|Film-Noir|Mystery|Thriller


How do you rate this movie on a scale of 1-5, press n if you have not seen :
 n


      movieId                            title        genres
3103     4166  Series 7: The Contenders (2001)  Action|Drama


How do you rate this movie on a scale of 1-5, press n if you have not seen :
 n


      movieId                   title  genres
2622     3507  Odd Couple, The (1968)  Comedy


How do you rate this movie on a scale of 1-5, press n if you have not seen :
 n


      movieId            title  genres
3595     4929  Toy, The (1982)  Comedy


How do you rate this movie on a scale of 1-5, press n if you have not seen :
 n


      movieId               title                           genres
4030     5700  The Pumaman (1980)  Action|Adventure|Fantasy|Sci-Fi


How do you rate this movie on a scale of 1-5, press n if you have not seen :
 n


      movieId                                              title  \
5556    26701  Patlabor: The Movie (Kidô keisatsu patorebâ: T...   

                                                 genres  
5556  Action|Animation|Crime|Drama|Film-Noir|Mystery...  


How do you rate this movie on a scale of 1-5, press n if you have not seen :
 n


      movieId                                title                  genres
5337     8879  Murder on the Orient Express (1974)  Crime|Mystery|Thriller


How do you rate this movie on a scale of 1-5, press n if you have not seen :
 n


      movieId                   title                 genres
6399    50800  Messengers, The (2007)  Drama|Horror|Thriller


How do you rate this movie on a scale of 1-5, press n if you have not seen :
 n


     movieId                  title               genres
128      155  Beyond Rangoon (1995)  Adventure|Drama|War


How do you rate this movie on a scale of 1-5, press n if you have not seen :
 n


      movieId              title genres
5389     8982  I Am David (2003)  Drama


How do you rate this movie on a scale of 1-5, press n if you have not seen :
 n


      movieId              title         genres
8345   108156  Ride Along (2014)  Action|Comedy


How do you rate this movie on a scale of 1-5, press n if you have not seen :
 n


     movieId                           title  \
783     1025  Sword in the Stone, The (1963)   

                                 genres  
783  Animation|Children|Fantasy|Musical  


How do you rate this movie on a scale of 1-5, press n if you have not seen :
 n


      movieId          title                    genres
1644     2193  Willow (1988)  Action|Adventure|Fantasy


How do you rate this movie on a scale of 1-5, press n if you have not seen :
 4


      movieId                                              title  \
6075    41566  Chronicles of Narnia: The Lion, the Witch and ...   

                          genres  
6075  Adventure|Children|Fantasy  


How do you rate this movie on a scale of 1-5, press n if you have not seen :
 4.5


      movieId               title                           genres
3437     4683  Wizard, The (1989)  Adventure|Children|Comedy|Drama


How do you rate this movie on a scale of 1-5, press n if you have not seen :
 n


      movieId        title                          genres
1662     2232  Cube (1997)  Horror|Mystery|Sci-Fi|Thriller


How do you rate this movie on a scale of 1-5, press n if you have not seen :
 n


     movieId                         title         genres
290      332  Village of the Damned (1995)  Horror|Sci-Fi


How do you rate this movie on a scale of 1-5, press n if you have not seen :
 n


      movieId            title            genres
4502     6664  Commando (1985)  Action|Adventure


How do you rate this movie on a scale of 1-5, press n if you have not seen :
 n


     movieId              title          genres
543      640  Diabolique (1996)  Drama|Thriller


How do you rate this movie on a scale of 1-5, press n if you have not seen :
 n


      movieId                                            title  \
6472    52730  It's a Very Merry Muppet Christmas Movie (2002)   

               genres  
6472  Children|Comedy  


How do you rate this movie on a scale of 1-5, press n if you have not seen :
 n


      movieId                                              title  \
1549     2085  101 Dalmatians (One Hundred and One Dalmatians...   

                            genres  
1549  Adventure|Animation|Children  


How do you rate this movie on a scale of 1-5, press n if you have not seen :
 n


      movieId                                 title  \
8657   120827  The Hound of the Baskervilles (1988)   

                          genres  
8657  Crime|Drama|Horror|Mystery  


How do you rate this movie on a scale of 1-5, press n if you have not seen :
 n


      movieId             title                genres
7621    87234  Submarine (2010)  Comedy|Drama|Romance


How do you rate this movie on a scale of 1-5, press n if you have not seen :
 n


      movieId               title                  genres
8286   105954  All Is Lost (2013)  Action|Adventure|Drama


How do you rate this movie on a scale of 1-5, press n if you have not seen :
 n


      movieId                          title                  genres
8574   116887  Exodus: Gods and Kings (2014)  Action|Adventure|Drama


How do you rate this movie on a scale of 1-5, press n if you have not seen :
 n


      movieId             title               genres
9065   142420  High Rise (2015)  Action|Drama|Sci-Fi


How do you rate this movie on a scale of 1-5, press n if you have not seen :
 n


      movieId                        title        genres
7539    84950  Take Me Home Tonight (2011)  Comedy|Drama


How do you rate this movie on a scale of 1-5, press n if you have not seen :
 n


      movieId                      title        genres
9425   166015  The African Doctor (2016)  Comedy|Drama


How do you rate this movie on a scale of 1-5, press n if you have not seen :
 n


      movieId                                              title  genres
1036     1348  Nosferatu (Nosferatu, eine Symphonie des Graue...  Horror


How do you rate this movie on a scale of 1-5, press n if you have not seen :
 n


      movieId                 title                               genres
9708   187541  Incredibles 2 (2018)  Action|Adventure|Animation|Children


How do you rate this movie on a scale of 1-5, press n if you have not seen :
 5


If you're struggling to come up with the above function, you can use this list of user ratings to complete the next segment.

In [18]:
user_rating

[{'userId': 1000, 'movieId': 104841, 'rating': '2'},
 {'userId': 1000, 'movieId': 2193, 'rating': '4'},
 {'userId': 1000, 'movieId': 41566, 'rating': '4.5'},
 {'userId': 1000, 'movieId': 187541, 'rating': '5'}]
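Note that the ratings above are stored exactly as they were typed, as strings. One way to handle this is a small parsing helper that converts an answer to a float and rejects anything outside the prompt's 1-5 scale. This is a sketch under those assumptions; `parse_rating` is a hypothetical helper and is not part of the original `movie_rater`.

```python
def parse_rating(raw, low=1.0, high=5.0):
    """Convert a typed answer to a float rating.

    Returns None for 'n' (not seen), non-numeric text,
    or a number outside [low, high].
    """
    text = raw.strip().lower()
    if text == 'n':
        return None
    try:
        value = float(text)
    except ValueError:
        return None
    if not (low <= value <= high):
        return None
    return value


# The list shown above, with ratings as typed strings.
user_rating = [{'userId': 1000, 'movieId': 104841, 'rating': '2'},
               {'userId': 1000, 'movieId': 2193, 'rating': '4'},
               {'userId': 1000, 'movieId': 41566, 'rating': '4.5'},
               {'userId': 1000, 'movieId': 187541, 'rating': '5'}]

# Build a copy of each record with a numeric rating.
cleaned = [{**r, 'rating': parse_rating(r['rating'])} for r in user_rating]
print(cleaned)
```

Returning `None` for invalid input lets the calling loop treat a typo the same way as a skipped movie and simply ask about another one.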

### Making Predictions With the New Ratings
Now that you have new ratings, you can use them to make predictions for this new user. The process should work as follows:

* add the new ratings to the original ratings DataFrame and read the result into a Surprise dataset
* train a model using the new combined DataFrame
* make predictions for the user
* order those predictions from highest rated to lowest rated
* return the top n recommendations with the text of the actual movie (rather than just the index number)

In [19]:
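## add the new ratings to the original ratings DataFrame
# DataFrame.append was removed in pandas 2.0, so use pd.concat instead
new_ratings_df = pd.concat([new_df, pd.DataFrame(user_rating)], ignore_index=True)
# ratings typed at the prompt are strings, so make the column numeric
new_ratings_df['rating'] = new_ratings_df['rating'].astype(float)
new_data = Dataset.load_from_df(new_ratings_df,reader)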

In [20]:
# train a model using the new combined DataFrame
svd_ = SVD(n_factors= 50, reg_all=0.05)
svd_.fit(new_data.build_full_trainset())

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x11c3387b8>

In [21]:
# make predictions for the user
# you'll probably want to create a list of tuples in the format (movie_id, predicted_score)
list_of_movies = []
for m_id in new_df['movieId'].unique():
    list_of_movies.append( (m_id,svd_.predict(1000,m_id)[3]))


In [22]:
# order the predictions from highest to lowest rated

ranked_movies = sorted(list_of_movies,key=lambda x:x[1],reverse=True)
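Sorting the full list works well at this scale. As an optional variation, the sketch below keeps only the top *n* entries and skips movies the user has already rated, since recommending something they have just scored is rarely useful. `top_n_unseen` is a hypothetical helper and the scores are made up for illustration.

```python
import heapq


def top_n_unseen(scored_movies, seen_ids, n):
    """Return the n highest-scored (movie_id, score) pairs not in seen_ids."""
    unseen = [(m_id, score) for m_id, score in scored_movies
              if m_id not in seen_ids]
    return heapq.nlargest(n, unseen, key=lambda pair: pair[1])


# Made-up (movie_id, predicted_score) pairs for illustration.
scored = [(1, 4.1), (2, 3.2), (3, 4.8), (4, 4.5)]
seen = {3}  # movie ids the user has already rated

print(top_n_unseen(scored, seen, 2))  # [(4, 4.5), (1, 4.1)]
```

In the lab, `seen` could be built from the `movieId` values in `user_rating`, and `scored` would be `list_of_movies`.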

For the final component of this challenge, it could be useful to create a function `recommended_movies` that takes in the parameters:
* `user_ratings` : list - list of tuples formatted as (movie_id, predicted_score), ordered from best to worst for this individual
* `movie_title_df` : DataFrame
* `n` : int - number of recommended movies

The function should use a for loop to print each of the *n* recommended movies in order from best to worst.

In [23]:
# print the top n recommendations
def recommended_movies(user_ratings,movie_title_df,n):
        for idx, rec in enumerate(user_ratings):
            title = movie_title_df.loc[movie_title_df['movieId'] == int(rec[0])]['title']
            print('Recommendation # ',idx+1,': ',title,'\n')
            n-= 1
            if n == 0:
                break
            
recommended_movies(ranked_movies,df_movies,5)

Recommendation #  1 :  841    Streetcar Named Desire, A (1951)
Name: title, dtype: object 

Recommendation #  2 :  602    Dr. Strangelove or: How I Learned to Stop Worr...
Name: title, dtype: object 

Recommendation #  3 :  909    Apocalypse Now (1979)
Name: title, dtype: object 

Recommendation #  4 :  906    Lawrence of Arabia (1962)
Name: title, dtype: object 

Recommendation #  5 :  680    Philadelphia Story, The (1940)
Name: title, dtype: object 
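The output above includes the pandas index and `Name: title, dtype: object` because each `title` is a one-element Series rather than a plain string. One possible alternative is to look titles up in a dictionary keyed by movie id, which yields clean strings. This is a sketch with made-up scores; `top_titles` is a hypothetical helper, and the lookup could be built from the lab's data with `dict(zip(df_movies['movieId'], df_movies['title']))`.

```python
def top_titles(ranked, title_by_id, n):
    """Return the titles of the first n (movie_id, score) pairs in ranked."""
    titles = []
    for movie_id, _score in ranked[:n]:
        # Fall back to a placeholder if the id has no known title.
        titles.append(title_by_id.get(int(movie_id),
                                      f"<unknown movie {movie_id}>"))
    return titles


# Small lookup using titles from the movies table shown earlier.
title_by_id = {1: 'Toy Story (1995)',
               2: 'Jumanji (1995)',
               3: 'Grumpier Old Men (1995)'}

# Made-up ranking, already sorted from best to worst.
ranked = [(2, 4.6), (1, 4.4), (3, 3.9)]

for rank, title in enumerate(top_titles(ranked, title_by_id, 2), start=1):
    print(f"Recommendation #{rank}: {title}")
```

Returning a list instead of printing inside the function also makes the result easier to reuse, for example in the chained function suggested in the Level Up section.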



## Level Up

* Try to chain all of the steps together into one function that asks users to rate a certain number of movies and then performs all of the above steps to return the top n recommendations
* Make a recommender system that only returns items that come from a specified genre
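For the genre-restricted recommender, one possible starting point is to filter the ranked list before taking the top *n*. The sketch below assumes genres are stored as a pipe-separated string, as in `movies.csv`, and matches whole genre names rather than substrings. `has_genre` and `filter_by_genre` are hypothetical helpers, and the scores are made up.

```python
def has_genre(genres_field, genre):
    """True if genre is one of the pipe-separated entries in genres_field."""
    return genre in genres_field.split('|')


def filter_by_genre(ranked, movies, genre):
    """Keep only (movie_id, score) pairs whose movie belongs to genre."""
    allowed = {m_id for m_id, _title, genres in movies
               if has_genre(genres, genre)}
    return [(m_id, score) for m_id, score in ranked if m_id in allowed]


# (movieId, title, genres) rows taken from the movies table shown earlier.
movies = [
    (1, 'Toy Story (1995)', 'Adventure|Animation|Children|Comedy|Fantasy'),
    (2, 'Jumanji (1995)', 'Adventure|Children|Fantasy'),
    (3, 'Grumpier Old Men (1995)', 'Comedy|Romance'),
]

# Made-up ranking, sorted from best to worst.
ranked = [(3, 4.7), (1, 4.5), (2, 4.0)]

print(filter_by_genre(ranked, movies, 'Fantasy'))  # [(1, 4.5), (2, 4.0)]
```

Splitting on `|` avoids accidental partial matches that a plain substring test could produce, and the filtered list keeps its original best-to-worst order.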

## Summary

In this lab, you got the chance to implement a collaborative filtering model and retrieve recommendations from that model. You also got the opportunity to add your own ratings to the system to get new recommendations for yourself! Next, you will be introduced to using Spark to build recommender systems.