# Explicit feedback movie recommendations
In this example, we'll build a quick explicit feedback recommender system: that is, a model that takes into account explicit feedback signals (like ratings) to recommend new content.

We'll use an approach first made popular by the [Netflix prize](http://www.netflixprize.com/) contest: [matrix factorization](https://datajobs.com/data-science-repo/Recommender-Systems-[Netflix].pdf). 

<img src="static/matrix_factorization.png" alt="Matrix factorization" style="width: 600px;"/>

In matrix factorization, we start with user-item-rating triplets, each conveying that user _i_ gave item _j_ rating _r_. We then try to estimate representations for both users and items in a shared latent space (of far lower dimension than the number of users or items) so that the dot product of a user's vector and an item's vector recovers the original rating. The model is useful because we can then take the dot product of a user's vector with the vector of an item they _have not_ rated, and hope to obtain a prediction of the rating they would give it if they saw it.
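To make the idea concrete, here is a minimal NumPy sketch with made-up sizes and random (untrained) vectors. Each user and each item gets a $k$-dimensional vector, a predicted rating is the dot product of the two, and multiplying the full factor matrices scores every user-item pair at once. Many implementations also add per-user and per-item bias terms; this sketch leaves them out.

```python
import numpy as np

# Hypothetical sizes for illustration: 4 users, 5 items, 3 latent dimensions.
n_users, n_items, k = 4, 5, 3
rng = np.random.RandomState(0)

# One latent vector per user (rows of U) and per item (rows of V).
U = rng.normal(scale=0.1, size=(n_users, k))
V = rng.normal(scale=0.1, size=(n_items, k))

def predict(user, item):
    """Predicted rating: dot product of the user's and the item's vectors."""
    return U[user] @ V[item]

# Multiplying the factor matrices scores every (user, item) pair at once,
# including pairs with no observed rating.
all_scores = U @ V.T  # shape (n_users, n_items)
print(all_scores.shape)  # (4, 5)
print(np.isclose(predict(2, 3), all_scores[2, 3]))  # True
```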

We start by importing a well-known dataset, the [Movielens 100k dataset](https://grouplens.org/datasets/movielens/100k/). It contains 100,000 ratings (from 1 to 5) given by 943 users to 1,682 movies (the shape printed below is one larger in each dimension because the IDs start at 1):

In [3]:
import numpy as np

from spotlight.datasets.movielens import get_movielens_dataset

dataset = get_movielens_dataset(variant='100K')
print(dataset)

<Interactions dataset (944 users x 1683 items x 100000 interactions)>
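Our reading of the 944 × 1683 shape is that Spotlight's `Interactions` object, when not given explicit sizes, sizes each dimension as the largest ID plus one, so row and column 0 simply go unused for 1-based MovieLens IDs. A minimal sketch of that convention on a few illustrative triplets:

```python
import numpy as np

# Illustrative (user, item, rating) triplets; MovieLens IDs start at 1.
user_ids = np.array([1, 196, 943])
item_ids = np.array([1, 242, 1682])
ratings = np.array([5.0, 3.0, 4.0])

# Assumed convention: each dimension must hold the largest ID as an index.
num_users = int(user_ids.max() + 1)
num_items = int(item_ids.max() + 1)
print(num_users, num_items)  # 944 1683
```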


To evaluate the model, we'll randomly split the interactions into a training set and a test set (by default, 20% of the interactions go to the test set):

In [12]:
random_state = np.random.RandomState(42)

from spotlight.cross_validation import random_train_test_split

train, test = random_train_test_split(dataset, random_state=random_state)

print('Split into \n {} and \n {}'.format(train, test))

Split into 
 <Interactions dataset (944 users x 1683 items x 80000 interactions)> and 
 <Interactions dataset (944 users x 1683 items x 20000 interactions)>
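Conceptually, a random split shuffles the interactions and holds out a fixed fraction of them. The sketch below does this on interaction indices with NumPy; the function name `random_split` and its 20% default are our own choices, picked to reproduce the 80,000/20,000 split above, and it is not Spotlight's code. Because the split is over individual ratings, the same user (or movie) will usually appear in both sets.

```python
import numpy as np

def random_split(n_interactions, test_fraction=0.2, seed=42):
    """Hypothetical helper: shuffle indices and hold out a test fraction."""
    rng = np.random.RandomState(seed)
    shuffled = rng.permutation(n_interactions)
    n_test = int(round(test_fraction * n_interactions))
    # Everything after the first n_test shuffled indices is training data.
    return shuffled[n_test:], shuffled[:n_test]

train_idx, test_idx = random_split(100000)
print(len(train_idx), len(test_idx))  # 80000 20000
```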


We're going to fit a classic factorization model with a regression loss: that is, we'll be trying to fit latent representations to users and items in such a way that the squared difference between actual and predicted ratings is minimized.
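Written out, one standard form of this objective (ignoring bias terms) is the following, where $\mathcal{R}$ is the set of observed training triplets $(i, j, r)$, $\mathbf{u}_i$ and $\mathbf{v}_j$ are the user and item vectors, and $\lambda$ is an L2 regularization weight. This is a textbook formulation rather than a transcription of Spotlight's loss; the `l2` argument below plays a role similar to $\lambda$.

$$
\min_{\mathbf{U}, \mathbf{V}} \; \frac{1}{|\mathcal{R}|} \sum_{(i, j, r) \in \mathcal{R}} \left( r - \mathbf{u}_i^\top \mathbf{v}_j \right)^2 + \lambda \left( \sum_i \lVert \mathbf{u}_i \rVert^2 + \sum_j \lVert \mathbf{v}_j \rVert^2 \right)
$$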

In [27]:
from spotlight.factorization.explicit import ExplicitFactorizationModel

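model = ExplicitFactorizationModel(loss='regression',   # squared-error loss on ratings
                                   embedding_dim=128,   # length of each user/item latent vector
                                   n_iter=10,           # passes (epochs) over the training data
                                   batch_size=1024,     # interactions per minibatch
                                   l2=1e-9,             # L2 regularization strength
                                   learning_rate=1e-3,  # optimizer step size
                                   use_cuda=False)      # train on the CPU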

In [28]:
model.fit(train, verbose=True)

Epoch 0: loss 1034.392520904541
Epoch 1: loss 569.925882101059
Epoch 2: loss 136.80290246009827
Epoch 3: loss 84.37578642368317
Epoch 4: loss 74.30708056688309
Epoch 5: loss 70.70860320329666
Epoch 6: loss 68.6357769370079
Epoch 7: loss 67.51402765512466
Epoch 8: loss 66.53516966104507
Epoch 9: loss 65.70937591791153
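Roughly speaking, `fit` repeatedly draws minibatches of training triplets and nudges the user and item vectors to reduce the squared error. The following plain-NumPy sketch shows that idea with minibatch SGD on synthetic data. It is not Spotlight's implementation (which is written in PyTorch and, as far as we know, uses an adaptive optimizer by default), and the data, sizes, and learning rate are hypothetical, so its loss values are not comparable to the log above.

```python
import numpy as np

rng = np.random.RandomState(0)

# Hypothetical synthetic data standing in for the training triplets.
n_users, n_items, k, n_obs = 50, 40, 8, 1000
users = rng.randint(n_users, size=n_obs)
items = rng.randint(n_items, size=n_obs)
ratings = rng.randint(1, 6, size=n_obs).astype(float)  # ratings 1..5

# Small random initial embeddings: one row per user / item.
U = rng.normal(scale=0.1, size=(n_users, k))
V = rng.normal(scale=0.1, size=(n_items, k))

lr, l2, batch_size, n_epochs = 0.01, 1e-4, 100, 10
losses = []
for epoch in range(n_epochs):
    order = rng.permutation(n_obs)  # reshuffle every epoch
    batch_losses = []
    for start in range(0, n_obs, batch_size):
        b = order[start:start + batch_size]
        u, i, r = users[b], items[b], ratings[b]
        err = np.sum(U[u] * V[i], axis=1) - r  # predicted minus actual
        batch_losses.append(np.mean(err ** 2))
        # Per-example gradients of the squared error plus an L2 penalty,
        # computed before either matrix is updated.
        grad_u = 2 * err[:, None] * V[i] + 2 * l2 * U[u]
        grad_v = 2 * err[:, None] * U[u] + 2 * l2 * V[i]
        # np.add.at accumulates correctly when a user/item repeats in a batch.
        np.add.at(U, u, -lr * grad_u)
        np.add.at(V, i, -lr * grad_v)
    losses.append(np.mean(batch_losses))
    print('Epoch {}: loss {:.3f}'.format(epoch, losses[-1]))
```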


In [29]:
from spotlight.evaluation import rmse_score

train_rmse = rmse_score(model, train)
test_rmse = rmse_score(model, test)

print('Train RMSE {:.3f}, test RMSE {:.3f}'.format(train_rmse, test_rmse))

Train RMSE 0.897, test RMSE 0.940
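RMSE is the square root of the mean squared difference between predicted and actual ratings, so it is expressed in rating units: a test RMSE of 0.940 means predictions are off by a little under one star in a root-mean-square sense. Assuming `rmse_score` computes this over the observed ratings in the split it is given, the calculation amounts to the following (the `rmse` helper and its inputs are hypothetical):

```python
import numpy as np

def rmse(predicted, actual):
    """Root-mean-squared error between predicted and observed ratings."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return np.sqrt(np.mean((predicted - actual) ** 2))

# Hypothetical predictions for four held-out ratings.
print(rmse([3.5, 4.0, 2.0, 5.0], [4, 4, 3, 5]))  # about 0.559
```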
