# Recommendation Systems based on explicit feedback

## Preparation
* Python packages: see requirements.txt
* Dataset: you can use MovieLens small (ml-latest-small), which contains 100,000 ratings and 3,600 tag applications applied to 9,000 movies by 600 users. It was last updated in September 2018.

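The surprise built-in dataset utilities can download the classic MovieLens 100k dataset (`ml-100k`: 100,000 ratings from 943 users on 1,682 movies). Note that this is an older release of similar size, not ml-latest-small: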

In [4]:
from surprise import Dataset
data = Dataset.load_builtin('ml-100k')

Alternatively, download ml-latest-small from https://files.grouplens.org/datasets/movielens/ml-latest-small.zip and load it into surprise through pandas:

In [17]:
import pandas as pd
df = pd.read_csv("ml-latest-small/ratings.csv")
df.head()

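|   | userId | movieId | rating | timestamp |
|---|--------|---------|--------|-----------|
| 0 | 1      | 1       | 4.0    | 964982703 |
| 1 | 1      | 3       | 4.0    | 964981247 |
| 2 | 1      | 6       | 4.0    | 964982224 |
| 3 | 1      | 47      | 5.0    | 964983815 |
| 4 | 1      | 50      | 5.0    | 964982931 |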


In [18]:
df.drop(['timestamp'], axis=1, inplace=True)

In [20]:
from surprise import Reader
from surprise import Dataset

# ml-latest-small uses half-star ratings from 0.5 to 5.0
reader = Reader(rating_scale=(0.5, 5))
data = Dataset.load_from_df(df[['userId', 'movieId', 'rating']], reader)

## Exploratory Data Analysis

## Algorithms

## Evaluation

Several metrics can be used to evaluate the quality of a recommendation system:
* Accuracy metrics (root mean square error, mean absolute error) measure the difference between the predicted rating and the actual rating. RMSE was the metric used for the $1 million Netflix Prize. However, RMSE turned out to be a poor evaluation metric for recommendation systems, because it focuses on rating prediction rather than on which content is most likely to be useful to the user.
* Hit rate is the frequency with which a recommended result converts into a click, a rating, or a purchase. For evaluation purposes, it can be computed as the ratio $\frac{hits}{users}$.
* Average reciprocal hit rank (ARHR) is similar to hit rate, but for top-n recommenders it also accounts for the rank at which each hit appears. It is computed as $\frac{\sum_{i=1}^{hits} \frac{1}{rank_i}}{users}$, where $rank_i$ is the position of hit $i$ in its top-n list, so hits near the top of the list count more.
* Coverage gives a sense of how much of the catalog is returned by the recommender and how quickly new content can appear in the results.
* Novelty tells how popular the returned recommendations are, under the strong hypothesis that users are familiar with popular content and may tend to reject unknown items. Popularity tends to be a good indicator of success and therefore exploits user trust in the system, whereas novelty implies exploring new things the user may like but that are less well known.
* Diversity tells how different the returned recommendations are from each other. It is computed as $1 - S$, with $S$ being the average similarity over all pairs of items in the returned recommendation.
* Churn indicates how often recommendations change, that is, how quickly the results change in response to user actions.
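
As a minimal sketch of the ranking metrics above, the following pure-Python functions compute hit rate, ARHR, and diversity. It assumes a leave-one-out setup in which one rated item per user is held out and a hit means that item appears in the user's top-n list. The names `top_n`, `held_out`, `genres`, and the choice of Jaccard similarity are illustrative assumptions, not part of surprise.

```python
from itertools import combinations


def hit_rate(top_n, held_out):
    """Fraction of users whose held-out item appears in their top-n list."""
    hits = sum(1 for user, item in held_out.items() if item in top_n.get(user, []))
    return hits / len(held_out)


def arhr(top_n, held_out):
    """Average reciprocal hit rank: each hit contributes 1 / rank (rank starts at 1)."""
    total = 0.0
    for user, item in held_out.items():
        recs = top_n.get(user, [])
        if item in recs:
            total += 1.0 / (recs.index(item) + 1)
    return total / len(held_out)


def diversity(items, similarity):
    """1 - S, where S is the average similarity over all pairs of items."""
    pairs = list(combinations(items, 2))
    if not pairs:
        return 0.0  # a single item has no pairs to compare
    avg_similarity = sum(similarity(a, b) for a, b in pairs) / len(pairs)
    return 1.0 - avg_similarity


# Hypothetical toy data: ranked recommendations and one held-out item per user.
top_n = {
    "u1": ["a", "b", "c"],
    "u2": ["d", "e", "f"],
    "u3": ["g", "h", "i"],
    "u4": ["j", "k", "l"],
}
held_out = {"u1": "a", "u2": "e", "u3": "z", "u4": "y"}

# Hypothetical genre sets, compared with Jaccard similarity.
genres = {"a": {"comedy"}, "b": {"comedy"}, "c": {"drama"}}


def jaccard(x, y):
    return len(genres[x] & genres[y]) / len(genres[x] | genres[y])


print(hit_rate(top_n, held_out))        # 2 hits / 4 users = 0.5
print(arhr(top_n, held_out))            # (1/1 + 1/2) / 4 = 0.375
print(diversity(top_n["u1"], jaccard))  # 1 - 1/3
```

In the toy data both `u1` and `u2` count as one hit each, but ARHR gives `u1` full credit (rank 1) and `u2` half credit (rank 2), which is the difference between the two metrics.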

While the metrics above are good evaluation indicators, results from A/B tests matter most, because they measure the quality of the system on actual users rather than on a test set. This is both because the test set may not be representative enough and because A/B tests allow for the collection of actual sales data, which is what matters most in the evaluation.

The surprise `accuracy` module provides a number of evaluation metrics:
* `rmse`: compute RMSE (Root Mean Squared Error).
* `mse`: compute MSE (Mean Squared Error).
* `mae`: compute MAE (Mean Absolute Error).
* `fcp`: compute FCP (Fraction of Concordant Pairs).
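
To make the first three metrics concrete, here is a minimal pure-Python sketch of the formulas. It is not surprise's implementation. The `(actual, predicted)` pair format and the toy ratings are assumptions made for illustration.

```python
import math


def mse(pairs):
    """Mean of the squared errors over (actual, predicted) pairs."""
    return sum((predicted - actual) ** 2 for actual, predicted in pairs) / len(pairs)


def rmse(pairs):
    """Square root of MSE, expressed in the same units as the ratings."""
    return math.sqrt(mse(pairs))


def mae(pairs):
    """Mean of the absolute errors over (actual, predicted) pairs."""
    return sum(abs(predicted - actual) for actual, predicted in pairs) / len(pairs)


# Hypothetical (actual rating, predicted rating) pairs.
pairs = [(4.0, 3.5), (5.0, 4.0), (3.0, 3.0), (2.0, 3.5)]

print(mse(pairs))   # (0.25 + 1.0 + 0.0 + 2.25) / 4 = 0.875
print(rmse(pairs))  # sqrt(0.875)
print(mae(pairs))   # (0.5 + 1.0 + 0.0 + 1.5) / 4 = 0.75
```

Because errors are squared before averaging, the single large miss (2.0 predicted as 3.5) weighs more in RMSE than in MAE.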

In [21]:
# https://surprise.readthedocs.io/en/stable/getting_started.html#train-test-split-and-the-fit-method
from surprise import SVD, accuracy
from surprise.model_selection import train_test_split

# sample random trainset and testset
# test set is made of 25% of the ratings.
trainset, testset = train_test_split(data, test_size=.25)

# We'll use the famous SVD algorithm.
algo = SVD()

# Train the algorithm on the trainset, and predict ratings for the testset
algo.fit(trainset)
predictions = algo.test(testset)

# Then compute RMSE
accuracy.rmse(predictions)

RMSE: 0.8757


0.8757482354259867

In [23]:
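from surprise import KNNBasic

# Same train/test split as above, with a basic k-nearest-neighbours model
algo = KNNBasic()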
algo.fit(trainset)
predictions = algo.test(testset)
accuracy.rmse(predictions)

Computing the msd similarity matrix...
Done computing similarity matrix.
RMSE: 0.9542


0.9541696731691143

## Benchmarking algorithm performance

In [24]:
# https://surprise.readthedocs.io/en/stable/getting_started.html#automatic-cross-validation
from surprise import SVD

from surprise.model_selection import cross_validate

# We'll use the famous SVD algorithm.
algo = SVD()

# Run 5-fold cross-validation and print results
cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

Evaluating RMSE, MAE of algorithm SVD on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.8786  0.8631  0.8686  0.8787  0.8754  0.8729  0.0061  
MAE (testset)     0.6768  0.6631  0.6676  0.6740  0.6735  0.6710  0.0050  
Fit time          4.27    4.20    4.22    4.11    4.06    4.17    0.08    
Test time         0.27    0.11    0.11    0.11    0.12    0.14    0.07    


{'test_rmse': array([0.87857721, 0.86310031, 0.86864053, 0.87866605, 0.87539098]),
 'test_mae': array([0.67680962, 0.66308103, 0.66758322, 0.67396526, 0.67349298]),
 'fit_time': (4.271094083786011,
  4.204852104187012,
  4.21667218208313,
  4.109671115875244,
  4.059509038925171),
 'test_time': (0.2738502025604248,
  0.10553693771362305,
  0.10870695114135742,
  0.10980987548828125,
  0.11524701118469238)}
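
The `Mean` and `Std` columns of the table can be reproduced from the per-fold values. The following check uses only the standard library, with the fold RMSEs copied from the `test_rmse` array above. That the population standard deviation matches the printed value is an observation about these numbers, not a statement about surprise's internals.

```python
from statistics import mean, pstdev

# Per-fold RMSE values copied from the cross_validate output.
fold_rmse = [0.87857721, 0.86310031, 0.86864053, 0.87866605, 0.87539098]

mean_rmse = mean(fold_rmse)
std_rmse = pstdev(fold_rmse)  # population standard deviation (divides by 5, not 4)

print(f"{mean_rmse:.4f}")  # 0.8729
print(f"{std_rmse:.4f}")   # 0.0061
```

A standard deviation this small relative to the mean suggests the RMSE is stable across folds for this run.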

## References
  * http://surprise.readthedocs.io/en/stable/
  * http://media.sundog-soft.com/RecSys/RecSys.pdf