# Recommendation System

In this lab, we will use [Surprise](http://surpriselib.com/), an easy-to-use Python scikit for building recommender systems. It implements several commonly used algorithms, including neighborhood-based [collaborative filtering](https://surprise.readthedocs.io/en/stable/knn_inspired.html) and [matrix factorization-based algorithms](https://surprise.readthedocs.io/en/stable/matrix_factorization.html).

In [1]:
# Install the package if it is not already available:
# !pip3 install scikit-surprise

In [2]:
from surprise.prediction_algorithms.matrix_factorization import SVD
from surprise.prediction_algorithms.knns import KNNBasic
from surprise.prediction_algorithms.knns import KNNWithMeans
from surprise.prediction_algorithms.knns import KNNBaseline
from surprise import Dataset
from surprise import accuracy
from surprise.model_selection import cross_validate
from surprise.model_selection import train_test_split
from surprise.model_selection import GridSearchCV

-----

## Load data from the Surprise package

First, we load the ml-100k dataset bundled with Surprise; if it is not already present, it is downloaded and saved in the `.surprise_data` folder in your home directory. We then use the package's `train_test_split` function to draw a random training set and test set, with the test set containing 20% of the ratings.

In [3]:
# Load the movielens-100k dataset (download it if needed).
data = Dataset.load_builtin('ml-100k')

# sample random trainset and testset where test set is made of 20% of the ratings.
trainset, testset = train_test_split(data, test_size=0.20)

In [4]:
print("Number of users: {}".format(trainset.n_users))
print("Number of items: {}".format(trainset.n_items))
print("Number of ratings: {}".format(trainset.n_ratings))

Number of users: 943
Number of items: 1644
Number of ratings: 80000
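The item count reported for the training set can be lower than the 1,682 movies in ml-100k, because a movie whose ratings all fall into the held-out 20% never appears in the training data. The idea of a random split can be sketched in plain Python as follows; `random_split` and the toy ratings are hypothetical names and data made up for illustration, not part of Surprise.

```python
import random

def random_split(ratings, test_size=0.2, seed=0):
    """Shuffle (user, item, rating) tuples and hold out a test_size fraction."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(test_size * len(shuffled)))
    # The first n_test shuffled ratings form the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical toy data: 10 users who each rate the same 5 items.
ratings = [(f"u{u}", f"i{i}", 1 + (u + i) % 5) for u in range(10) for i in range(5)]
train, test = random_split(ratings)

# Only items that occur in the training ratings are "known" to a model fit on it.
train_items = {item for _, item, _ in train}
all_items = {item for _, item, _ in ratings}
print(len(train), len(test), len(train_items), len(all_items))
```

On a sparse dataset, rarely rated items are the ones most likely to drop out of the training set this way.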


-----

## Collaborative Filtering

First, we will apply three different flavors of collaborative filtering to this data and evaluate their performance using RMSE and MAE. For each of these algorithms, the actual number of neighbors that are aggregated to compute an estimate is necessarily less than or equal to `k`.
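As a reminder of what the two metrics measure, here is a minimal sketch that computes them by hand. The function names and the toy (true rating, estimate) pairs are made up for illustration; in the lab itself, Surprise's `accuracy` module does this for you.

```python
import math

def rmse(pairs):
    """Root mean squared error over (true_rating, estimate) pairs."""
    return math.sqrt(sum((true - est) ** 2 for true, est in pairs) / len(pairs))

def mae(pairs):
    """Mean absolute error over (true_rating, estimate) pairs."""
    return sum(abs(true - est) for true, est in pairs) / len(pairs)

# Hypothetical predictions: errors are 0.5, -1.0, 0.5 and 0.0.
pairs = [(4.0, 3.5), (2.0, 3.0), (5.0, 4.5), (3.0, 3.0)]
print("RMSE:", rmse(pairs))
print("MAE: ", mae(pairs))
```

Because RMSE squares each error before averaging, it penalizes large mistakes more heavily than MAE and is never smaller than MAE on the same predictions.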

### The basic collaborative filtering algorithm

**TODO**: Study the [KNNBasic](https://surprise.readthedocs.io/en/stable/knn_inspired.html) API, choose the number of neighbors and the similarity measure, train the model on the training set, and make predictions on the test set. Finally, evaluate the model's performance using RMSE and MAE.

Experiment with different numbers of neighbors and different similarity measures to see how they affect model performance.
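To make the idea behind the basic algorithm concrete, the sketch below predicts a rating as the similarity-weighted average of the ratings given by the `k` most similar users who rated the item. It is a simplified user-based illustration, not Surprise's implementation; `msd_similarity`, `knn_basic_predict`, and the toy ratings are hypothetical. The similarity is assumed to be 1 / (mean squared difference + 1) over commonly rated items.

```python
import heapq

def msd_similarity(ratings_u, ratings_v):
    """1 / (MSD + 1) over the items both users rated; 0 if they share none."""
    common = ratings_u.keys() & ratings_v.keys()
    if not common:
        return 0.0
    msd = sum((ratings_u[i] - ratings_v[i]) ** 2 for i in common) / len(common)
    return 1.0 / (msd + 1.0)

def knn_basic_predict(ratings, user, item, k=2):
    """Similarity-weighted average over the k most similar users who rated item."""
    candidates = [
        (msd_similarity(ratings[user], ratings[v]), ratings[v][item])
        for v in ratings
        if v != user and item in ratings[v]
    ]
    # Keep at most k neighbors, and only those with positive similarity.
    neighbors = [(s, r) for s, r in heapq.nlargest(k, candidates) if s > 0]
    if not neighbors:
        return None  # no usable neighbor: the prediction is impossible
    return sum(s * r for s, r in neighbors) / sum(s for s, _ in neighbors)

# Hypothetical toy ratings: user -> {item: rating}.
ratings = {
    "alice": {"m1": 5, "m2": 3},
    "bob":   {"m1": 5, "m2": 3, "m3": 4},
    "carol": {"m1": 1, "m2": 5, "m3": 2},
    "dave":  {"m2": 3, "m3": 5},
}
print(knn_basic_predict(ratings, "alice", "m3", k=2))
```

Here bob and dave agree perfectly with alice on the shared items, so with `k=2` the estimate is the plain average of their ratings for `m3`; carol's dissimilar taste only enters once `k` is raised to 3, and then with a small weight.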

In [5]:
# Use the basic collaborative filtering algorithm. 
# See https://surprise.readthedocs.io/en/stable/knn_inspired.html for more details.

# TODO
# Reference: https://realpython.com/build-recommendation-engine-collaborative-filtering/

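# Grid over similarity measures; user_based=False selects item-item similarities.
sim_options = {
    "name": ["msd", "cosine", "pearson", "pearson_baseline"],
    "user_based": [False],
}
param_grid = {"sim_options": sim_options}
# refit=True retrains the best configuration (by the first measure, RMSE) on the full dataset.
knnbasic = GridSearchCV(KNNBasic, param_grid, measures=["rmse", "mae"], refit=True)
knnbasic.fit(data)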

Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the cosine similarity matrix...
Done computing similarity matrix.
Computing the cosine similarity matrix...
Done computing similarity matrix.
Computing the cosine similarity matrix...
Done computing similarity matrix.
Computing the cosine similarity matrix...
Done computing similarity matrix.
Computing the cosine similarity matrix...
Done computing similarity matrix.
Computing the pearson similarity matrix...
Done computing similarity matrix.
Computing the pearson similarity matrix...
Done computing similarity matrix.
Computing the pearson similarity matrix...
Done computing similarity matrix.
Computing the pearson si

In [7]:
import pprint
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(knnbasic.best_score)
pp.pprint(knnbasic.best_params)

{'mae': 0.7704533026794103, 'rmse': 0.974906783491272}
{   'mae': {'sim_options': {'name': 'msd', 'user_based': False}},
    'rmse': {'sim_options': {'name': 'msd', 'user_based': False}}}


In [8]:
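# 'E' and 2 are not raw ids in ml-100k (its raw ids are strings such as '196'),
# so the prediction is flagged as impossible and falls back to the global mean.
knnbasic.predict('E', 2)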

Prediction(uid='E', iid=2, r_ui=None, est=3.52986, details={'was_impossible': True, 'reason': 'User and/or item is unknown.'})

### The basic collaborative filtering algorithm with user mean ratings

**TODO**: A variation of the basic CF model takes into account the mean rating of each user. Study the [KNNWithMeans](https://surprise.readthedocs.io/en/stable/knn_inspired.html) API, choose the number of neighbors and the similarity measure, train the model on the training set, and make predictions on the test set. Finally, evaluate the model's performance using RMSE and MAE.

Experiment with different numbers of neighbors and different similarity measures to see how they affect model performance.
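The effect of user means can be illustrated with a small sketch: instead of averaging neighbors' raw ratings, we average how far each neighbor's rating deviates from that neighbor's own mean, then add the result to the target user's mean. This is a simplified user-based illustration under the same MSD-similarity assumption, not Surprise's code; all names and data are hypothetical.

```python
import heapq

def msd_similarity(ratings_u, ratings_v):
    """1 / (MSD + 1) over the items both users rated; 0 if they share none."""
    common = ratings_u.keys() & ratings_v.keys()
    if not common:
        return 0.0
    msd = sum((ratings_u[i] - ratings_v[i]) ** 2 for i in common) / len(common)
    return 1.0 / (msd + 1.0)

def user_mean(ratings, user):
    return sum(ratings[user].values()) / len(ratings[user])

def knn_with_means_predict(ratings, user, item, k=2):
    """User mean plus the weighted average of neighbors' mean-centered ratings."""
    candidates = [
        (msd_similarity(ratings[user], ratings[v]),
         ratings[v][item] - user_mean(ratings, v))
        for v in ratings
        if v != user and item in ratings[v]
    ]
    neighbors = [(s, d) for s, d in heapq.nlargest(k, candidates) if s > 0]
    estimate = user_mean(ratings, user)
    if neighbors:  # with no usable neighbor, fall back to the user's own mean
        estimate += sum(s * d for s, d in neighbors) / sum(s for s, _ in neighbors)
    return estimate

# Hypothetical toy ratings: alice rates low on average, bob and carol higher.
ratings = {
    "alice": {"m1": 2, "m2": 1},
    "bob":   {"m1": 3, "m2": 2, "m3": 4},
    "carol": {"m1": 5, "m2": 5, "m3": 2},
}
print(knn_with_means_predict(ratings, "alice", "m3", k=1))
```

With `k=1`, the only neighbor is bob, who rated `m3` one point above his own average, so alice's estimate is one point above her average of 1.5. Copying bob's raw rating of 4 would ignore that alice rates everything lower than he does.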

In [9]:
# Use the basic collaborative filtering algorithm, taking into account the mean ratings of each user.
# See https://surprise.readthedocs.io/en/stable/knn_inspired.html for more details.

# TODO
sim_options = {
    "name": ["msd", "cosine", "pearson", "pearson_baseline"],
    "user_based": [True],
    "min_support": [3, 4, 5],
}
param_grid = {"sim_options": sim_options}
knnmeans = GridSearchCV(KNNWithMeans, param_grid, measures=["rmse", "mae"], refit=True)
knnmeans.fit(data)

Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computi

In [10]:
pp.pprint(knnmeans.best_score)
pp.pprint(knnmeans.best_params)

{'mae': 0.729090409403397, 'rmse': 0.9360174316773222}
{   'mae': {   'sim_options': {   'min_support': 4,
                                  'name': 'pearson_baseline',
                                  'user_based': True}},
    'rmse': {   'sim_options': {   'min_support': 3,
                                   'name': 'pearson_baseline',
                                   'user_based': True}}}
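The long run of "Computing the ... similarity matrix" messages follows from the size of the search. A rough sketch of how a grid expands into individual configurations is shown below; `expand_grid` is a hypothetical helper, and the fold count assumes the default of 5 cross-validation folds when `cv` is not passed to `GridSearchCV`.

```python
from itertools import product

def expand_grid(grid):
    """Return one dict per combination of the listed values (Cartesian product)."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]

sim_options_grid = {
    "name": ["msd", "cosine", "pearson", "pearson_baseline"],
    "user_based": [True],
    "min_support": [3, 4, 5],
}
combos = expand_grid(sim_options_grid)
n_folds = 5  # assumption: default number of cross-validation folds

# 4 similarity measures x 1 x 3 min_support values = 12 configurations,
# each fit once per fold.
print(len(combos), len(combos) * n_folds)  # prints: 12 60
```

Adding `k` to the grid multiplies the number of fits again, so it is worth keeping each list of candidate values short.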


In [11]:
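# 'E' and 2 are not raw ids in ml-100k (its raw ids are strings such as '196'),
# so the prediction is flagged as impossible and falls back to the global mean.
knnmeans.predict('E', 2)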

Prediction(uid='E', iid=2, r_ui=None, est=3.52986, details={'was_impossible': True, 'reason': 'User and/or item is unknown.'})

-----

## Matrix Factorization

Next, we will explore matrix factorization techniques for recommendation. Matrix factorization algorithms work by decomposing the user-item interaction matrix into the product of two lower-dimensional rectangular matrices. The SVD algorithm for matrix factorization was popularized by Simon Funk during the Netflix Prize.
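As a rough illustration of the idea, the sketch below learns user and item factor vectors (plus bias terms) with stochastic gradient descent on the observed ratings only. It is a toy version written for this lab, not Surprise's implementation; `train_mf`, `predict`, and the data are hypothetical, and the `lr` and `reg` arguments play the role of a single learning rate and regularization strength shared by all parameters.

```python
import math
import random

def train_mf(ratings, n_factors=2, n_epochs=200, lr=0.01, reg=0.02, seed=0):
    """Fit r_ui ~ mu + b_u + b_i + p_u . q_i by SGD on (user, item, rating) tuples."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    bu = {u: 0.0 for u in users}
    bi = {i: 0.0 for i in items}
    p = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}
    for _ in range(n_epochs):
        for u, i, r in ratings:
            est = mu + bu[u] + bi[i] + sum(pf * qf for pf, qf in zip(p[u], q[i]))
            err = r - est
            # Gradient step on the regularized squared error.
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(n_factors):
                puf, qif = p[u][f], q[i][f]
                p[u][f] += lr * (err * qif - reg * puf)
                q[i][f] += lr * (err * puf - reg * qif)
    return mu, bu, bi, p, q

def predict(model, user, item):
    """Unknown users or items contribute nothing, leaving the global mean."""
    mu, bu, bi, p, q = model
    est = mu + bu.get(user, 0.0) + bi.get(item, 0.0)
    if user in p and item in q:
        est += sum(pf * qf for pf, qf in zip(p[user], q[item]))
    return est

# Hypothetical toy ratings.
ratings = [("u1", "i1", 5), ("u1", "i2", 3), ("u2", "i1", 4), ("u2", "i3", 1),
           ("u3", "i2", 2), ("u3", "i3", 5), ("u4", "i1", 5), ("u4", "i3", 2)]
model = train_mf(ratings)
mu = model[0]
train_rmse = math.sqrt(sum((r - predict(model, u, i)) ** 2 for u, i, r in ratings) / len(ratings))
baseline_rmse = math.sqrt(sum((r - mu) ** 2 for _, _, r in ratings) / len(ratings))
print("training RMSE:", train_rmse, "global-mean RMSE:", baseline_rmse)
print("unknown user and item:", predict(model, "nobody", "i9"))
```

Note how a user and item that were never seen during training receive the global mean as their estimate, since no bias or factor term applies to them.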

**TODO**: In this task, you will use the SVD algorithm to implement the matrix factorization model. Study the [SVD](https://surprise.readthedocs.io/en/stable/matrix_factorization.html) API, choose the number of factors and the other hyperparameters (such as the number of epochs, learning rate, and regularization), train the model on the training set, and make predictions on the test set. Finally, evaluate the model's performance using RMSE and MAE.

Experiment with different numbers of factors, and also try the [SVD++ algorithm](https://surprise.readthedocs.io/en/stable/matrix_factorization.html) and [Non-negative Matrix Factorization](https://surprise.readthedocs.io/en/stable/matrix_factorization.html) to see if you can improve the model's performance.

In [12]:
# We'll use the famous SVD algorithm.

# TODO
param_grid = {
    "n_epochs": [5, 10],
    "lr_all": [0.002, 0.005],
    "reg_all": [0.4, 0.6]
}
svd = GridSearchCV(SVD, param_grid, measures=["rmse", "mae"], refit=True)
svd.fit(data)

In [13]:
pp.pprint(svd.best_score)
pp.pprint(svd.best_params)

{'mae': 0.770307947978311, 'rmse': 0.9611619892993749}
{   'mae': {'lr_all': 0.005, 'n_epochs': 10, 'reg_all': 0.4},
    'rmse': {'lr_all': 0.005, 'n_epochs': 10, 'reg_all': 0.4}}


In [17]:
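# 'A' and 1 are not raw ids in ml-100k (its raw ids are strings such as '196'),
# so no user or item terms apply and the estimate is the global mean rating.
svd.predict('A', 1)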

Prediction(uid='A', iid=1, r_ui=None, est=3.52986, details={'was_impossible': False})

## [BONUS]
Implement your own version of User-User or Item-Item Collaborative Filtering and compare its performance against the Surprise package's implementation.

In [None]:
# TODO

# End of Lab: Recommendation System

This week I learned that recommendation systems are divided into collaborative, content-based, and knowledge-based approaches. Collaborative systems use data from the community (peers, friends) to make recommendations. Content-based systems recommend items similar to those the user liked or viewed in the past. Knowledge-based systems recommend based on the user's stated information needs.