# Comparative study of frameworks specialized in recommender systems

We will mainly look at four frameworks:
* Surprise
* LightFM
* Spotlight
* CaseRecommender

[Surprise](http://surpriselib.com/) is a Python library dedicated to recommender systems that makes it quick to experiment with a range of widely used algorithms.

[LightFM](https://github.com/lyst/lightfm) is a Python implementation of several of the main algorithms used in recommender systems. It supports both implicit and explicit user feedback. The paper describing LightFM's approach is available [here](https://arxiv.org/pdf/1507.08439.pdf).

[Spotlight](https://maciejkula.github.io/spotlight/) takes a different approach and relies on neural networks. It is mainly developed by Maciej Kula, a well-known researcher in the field of recommender systems, and is built on the PyTorch library.

[CaseRecommender](https://github.com/caserec/CaseRecommender) is similar in spirit to LightFM. It provides a number of well-known recommendation algorithms and also supports both implicit and explicit feedback. A description of CaseRecommender's approach is available [here](http://www.lbd.dcc.ufmg.br/colecoes/wfa/2016/002.pdf).

The goal is to evaluate, on the MovieLens 100K dataset, the quality of the results each framework produces and its execution time.
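The cells below rely on the IPython `%time` magic to measure training time. Outside a notebook, a comparable wall-clock measurement can be sketched with the standard library. The helper `timed` and the stand-in function `fake_fit` are hypothetical names introduced for illustration; they are not part of any of the four frameworks.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Hypothetical stand-in for a training call such as `algo.fit(trainset)`.
def fake_fit(n):
    return sum(i * i for i in range(n))


result, seconds = timed(fake_fit, 100_000)
print(f"fit took {seconds:.3f} s")
```

Unlike `%time`, this sketch reports only wall-clock time, not the user/sys CPU breakdown shown in the notebook outputs.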

## Surprise

In [1]:
from surprise import SVD
from surprise import Dataset
from surprise import accuracy
from surprise.model_selection import train_test_split

# Load the movielens-100k dataset (download it if needed).
data = Dataset.load_builtin('ml-100k')

# sample random trainset and testset
# test set is made of 25% of the ratings.
trainset, testset = train_test_split(data, test_size=.25)

# We'll use the famous SVD algorithm.
algo = SVD()

# Train the algorithm on the trainset, and predict ratings for the testset
%time algo.fit(trainset)
predictions = algo.test(testset)

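# Then compute RMSE. accuracy.rmse prints the score itself and also returns it,
# which is why the value appears twice in the output.
print(accuracy.rmse(predictions))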

CPU times: user 5.47 s, sys: 37.2 ms, total: 5.51 s
Wall time: 5.57 s
RMSE: 0.9368
0.9367797388116315
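To make the reported score easier to interpret, here is a minimal sketch of how RMSE (and MAE, which CaseRecommender also reports) can be computed from pairs of true and estimated ratings. The function names and the toy values are assumptions made for illustration; this is not Surprise's own implementation.

```python
import math


def rmse(pairs):
    """Root mean squared error over (true_rating, estimated_rating) pairs."""
    pairs = list(pairs)
    return math.sqrt(sum((r - est) ** 2 for r, est in pairs) / len(pairs))


def mae(pairs):
    """Mean absolute error over (true_rating, estimated_rating) pairs."""
    pairs = list(pairs)
    return sum(abs(r - est) for r, est in pairs) / len(pairs)


# Toy (true, estimated) ratings, invented for this example.
toy = [(4.0, 3.5), (2.0, 3.0), (5.0, 4.5)]
print(rmse(toy), mae(toy))
```

With Surprise, each prediction exposes the true rating as `r_ui` and the estimate as `est`, so the same helper could be fed with `rmse((p.r_ui, p.est) for p in predictions)`.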


## LightFM

In [2]:
import numpy as np
from lightfm.datasets import fetch_movielens
from lightfm import LightFM
from lightfm.evaluation import precision_at_k

data = fetch_movielens(min_rating=0.5)

model = LightFM(loss='warp')
%time model.fit(data['train'], epochs=30, num_threads=2)

print("Test precision: {:.2f}".format(precision_at_k(model, data['test'], k=5).mean()))



CPU times: user 1.86 s, sys: 7.02 ms, total: 1.87 s
Wall time: 1.88 s
Test precision: 0.12
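Precision at k is a ranking metric: for each user, it is the fraction of the first k recommended items that are known positives in the test set. It lies between 0 and 1 (higher is better), so it cannot be compared directly with an RMSE. The sketch below illustrates the idea on toy data; `precision_at_k_sketch` and the item ids are hypothetical and do not reproduce LightFM's implementation.

```python
def precision_at_k_sketch(ranked_items, relevant_items, k=5):
    """Fraction of the first k ranked items that appear in relevant_items."""
    top_k = ranked_items[:k]
    hits = sum(1 for item in top_k if item in relevant_items)
    return hits / k


# Toy data (invented item ids): ranked recommendations per user, and the
# items each user actually interacted with in the test set.
ranked = {"u1": [10, 11, 12, 13, 14], "u2": [20, 21, 22, 23, 24]}
relevant = {"u1": {11, 14, 99}, "u2": {20}}

per_user = [precision_at_k_sketch(ranked[u], relevant[u], k=5) for u in ranked]
mean_precision = sum(per_user) / len(per_user)
print(f"Mean precision@5: {mean_precision:.2f}")
```

Averaging the per-user values mirrors the `.mean()` call applied to the array returned by `precision_at_k` in the cell above.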


## Spotlight

In [3]:
from spotlight.cross_validation import random_train_test_split
from spotlight.datasets.movielens import get_movielens_dataset
from spotlight.evaluation import rmse_score
from spotlight.evaluation import precision_recall_score
from spotlight.factorization.explicit import ExplicitFactorizationModel

dataset = get_movielens_dataset(variant='100K')

train, test = random_train_test_split(dataset)

model = ExplicitFactorizationModel(n_iter=1)
%time model.fit(train)

rmse = rmse_score(model, test)
print(rmse)

precision_at_k, recall_at_k = precision_recall_score(model, test, k=5)
print(precision_at_k.mean())



CPU times: user 3.3 s, sys: 279 ms, total: 3.58 s
Wall time: 1.21 s
1.0015708
0.04553191489361702


## CaseRecommender

In [4]:
from caserec.utils.split_database import SplitDatabase
from caserec.recommenders.rating_prediction.itemknn import ItemKNN
from caserec.recommenders.rating_prediction.svd import SVD

In [5]:
db = '/Users/alessandro/lightfm_data/movielens100k/ml-100k/u.data'
folds_path = '/Users/alessandro/lightfm_data/movielens100k/ml-100k/'

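# Split the data into a single train/test fold (10% of the ratings go to the test set)
SplitDatabase(input_file=db, dir_folds=folds_path, n_splits=1).shuffle_split(test_size=0.1)
# Assumption: SplitDatabase writes its folds under <dir_folds>/folds/<n>/,
# so the train and test files are read back from folds_path
tr = folds_path + 'folds/0/train.dat'
te = folds_path + 'folds/0/test.dat'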

# Run Rating Prediction Algorithm
ItemKNN(tr, te).compute()
SVD(tr, te).compute()




[Case Recommender: Rating Prediction > ItemKNN Algorithm]

train data:: 943 users and 1659 items (90000 interactions) | sparsity:: 94.25%
test data:: 922 users and 1269 items (10000 interactions) | sparsity:: 99.15%

training_time:: 2.570840 sec
prediction_time:: 1.842921 sec
Eval:: MAE: 0.804285 RMSE: 1.049476 
[Case Recommender: Rating Prediction > SVD]

train data:: 943 users and 1659 items (90000 interactions) | sparsity:: 94.25%
test data:: 922 users and 1269 items (10000 interactions) | sparsity:: 99.15%

training_time:: 0.085531 sec
prediction_time:: 0.013282 sec


Eval:: MAE: 1.168677 RMSE: 1.490471 
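The sparsity figures in the log can be checked by hand. Assuming sparsity is defined as the share of empty cells in the user-item matrix (a common definition, not confirmed here against CaseRecommender's source), the following sketch reproduces the printed values from the user, item and interaction counts.

```python
def sparsity(n_users, n_items, n_interactions):
    """Percentage of empty cells in an n_users x n_items interaction matrix."""
    return 100.0 * (1.0 - n_interactions / (n_users * n_items))


# Counts copied from the CaseRecommender log above.
train_sparsity = sparsity(943, 1659, 90000)
test_sparsity = sparsity(922, 1269, 10000)
print(f"train: {train_sparsity:.2f}% | test: {test_sparsity:.2f}%")
```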
