In [2]:
### Matrix factorization: the approach first made popular by the Netflix Prize contest. ###

import numpy as np

from spotlight.datasets.movielens import get_movielens_dataset

dataset = get_movielens_dataset(variant='100K')
print(dataset)

# The dataset object is an instance of the Interactions class, a fairly lightweight wrapper that
# Spotlight uses to hold the arrays that contain information about an interactions dataset (such as user
# and item ids, ratings, and timestamps).

<Interactions dataset (944 users x 1683 items x 100000 interactions)>
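To make the wrapper idea concrete, here is a minimal sketch of a container in the same spirit: parallel lists of user ids, item ids, ratings, and timestamps. `ToyInteractions` is hypothetical and much simpler than Spotlight's `Interactions`. It also suggests a plausible reading of the shape printed above. MovieLens 100K has 943 users and 1,682 movies, with ids starting at 1. If the number of users and items is inferred as the largest id plus one (an assumption here, not something this notebook shows), the result is 944 and 1,683.

```python
class ToyInteractions:
    """Hypothetical, simplified stand-in for an interactions wrapper (not Spotlight's class)."""

    def __init__(self, user_ids, item_ids, ratings=None, timestamps=None):
        # Parallel arrays: position k describes the k-th interaction.
        if len(user_ids) != len(item_ids):
            raise ValueError("user_ids and item_ids must have the same length")
        self.user_ids = list(user_ids)
        self.item_ids = list(item_ids)
        self.ratings = None if ratings is None else list(ratings)
        self.timestamps = None if timestamps is None else list(timestamps)
        # Assumed convention: dimension = largest id + 1, so ids index embedding rows directly.
        self.num_users = max(self.user_ids) + 1
        self.num_items = max(self.item_ids) + 1

    def __len__(self):
        return len(self.user_ids)

    def __repr__(self):
        return "<ToyInteractions ({} users x {} items x {} interactions)>".format(
            self.num_users, self.num_items, len(self))


# Ids start at 1, as in MovieLens, so row 0 exists but is never used.
toy = ToyInteractions(user_ids=[1, 2, 3], item_ids=[10, 20, 10], ratings=[4.0, 3.0, 5.0])
print(toy)  # <ToyInteractions (4 users x 21 items x 3 interactions)>
```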


# Taken from https://maciejkula.github.io/spotlight/factorization/explicit.html#spotlight.factorization.explicit.ExplicitFactorizationModel

Parameters:
- loss (string, optional) – One of ‘regression’, ‘poisson’, ‘logistic’ corresponding to losses from spotlight.losses.
- embedding_dim (int, optional) – Number of embedding dimensions to use for users and items.
- n_iter (int, optional) – Number of iterations to run.
- batch_size (int, optional) – Minibatch size.
- l2 (float, optional) – L2 loss penalty.
- learning_rate (float, optional) – Initial learning rate.
- optimizer_func (function, optional) – Function that takes in module parameters as the first argument and returns an instance of a PyTorch optimizer. Overrides l2 and learning rate if supplied. If no optimizer supplied, then use ADAM by default.
- use_cuda (boolean, optional) – Run the model on a GPU.
- representation (a representation module, optional) – If supplied, will override default settings and be used as the main network module in the model. Intended to be used as an escape hatch when you want to reuse the model’s training functions but want full freedom to specify your network topology.
- sparse (boolean, optional) – Use sparse gradients for embedding layers.
- random_state (instance of numpy.random.RandomState, optional) – Random state to use when fitting.
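Both `l2` and `learning_rate` act through the optimizer, which is Adam by default. The sketch below is a plain-Python, single-scalar Adam step showing one common way an L2 penalty enters such an update: as weight decay that adds `l2 * param` to the gradient, which is how PyTorch's `torch.optim.Adam` applies its `weight_decay` argument. That Spotlight passes `l2` to Adam in exactly this way is an assumption here, and `adam_step` is a hypothetical helper.

```python
import math


def adam_step(param, grad, state, learning_rate=1e-3, l2=0.0,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter (hypothetical helper).

    `state` holds the running first/second moments ("m", "v") and the step count ("t").
    """
    # L2 penalty applied as weight decay: its gradient contribution is l2 * param.
    grad = grad + l2 * param
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2
    # Bias correction compensates for the moments starting at zero.
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return param - learning_rate * m_hat / (math.sqrt(v_hat) + eps)


state = {"m": 0.0, "v": 0.0, "t": 0}
p = adam_step(1.0, grad=0.5, state=state, learning_rate=1e-3, l2=1e-9)
print(round(p, 6))  # 0.999
```

On the first step `m_hat / sqrt(v_hat)` is essentially the sign of the gradient, so each parameter moves by roughly `learning_rate` whatever the gradient's scale; this is why Adam's learning rate behaves more like a step size than a gradient multiplier.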

In [15]:
import torch

from spotlight.factorization.explicit import ExplicitFactorizationModel

model = ExplicitFactorizationModel(loss='regression',
                                   embedding_dim=128,  # latent dimensionality
                                   n_iter=10,  # number of epochs of training
                                   batch_size=1024,  # minibatch size
                                   l2=1e-9,  # strength of L2 regularization
                                   learning_rate=1e-3,
                                   use_cuda=torch.cuda.is_available())
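With `batch_size=1024` and `n_iter=10`, it helps to know how many gradient updates training actually performs. Assuming each epoch visits every training interaction once in minibatches of `batch_size` (the last one possibly smaller), the 80,000-interaction training split created below yields 79 minibatches per epoch: 78 full ones and a final one of 128. `minibatch_sizes` is a hypothetical helper for this arithmetic only.

```python
def minibatch_sizes(num_interactions, batch_size):
    """Sizes of consecutive minibatches that together cover every interaction once."""
    return [min(batch_size, num_interactions - start)
            for start in range(0, num_interactions, batch_size)]


sizes = minibatch_sizes(80000, 1024)  # training-set size from the split below
n_iter = 10
print(len(sizes), sizes[-1])  # 79 128
print(n_iter * len(sizes))    # 790
```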

The model
We can feed our dataset to the ExplicitFactorizationModel class, an sklearn-like object that allows us to train and evaluate explicit factorization models.

Internally, the model uses the BilinearNet class to represent users and items. It is composed of four embedding layers:

- a (num_users x latent_dim) embedding layer to represent users,
- a (num_items x latent_dim) embedding layer to represent items,
- a (num_users x 1) embedding layer to represent user biases, and
- a (num_items x 1) embedding layer to represent item biases.

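Together, these layers produce the predictions: the predicted rating for a (user, item) pair is the dot product of the user and item embeddings plus the user and item biases. How closely the predictions match the observed ratings is measured with one of the Spotlight losses. Here we use the regression loss, which is the squared difference between the true and the predicted rating, averaged over each minibatch.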
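The two steps just described, scoring a pair and measuring the error, can be sketched in plain Python. `predict` and `regression_loss` are hypothetical helpers operating on toy lists rather than embedding tables; the sketch assumes the usual biased matrix-factorization score (dot product plus user and item biases) and a loss averaged over the batch.

```python
def predict(user_vec, item_vec, user_bias, item_bias):
    """Biased matrix-factorization score: dot(user, item) + user bias + item bias."""
    return sum(u * v for u, v in zip(user_vec, item_vec)) + user_bias + item_bias


def regression_loss(observed, predicted):
    """Mean squared difference between true and predicted ratings."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


# Toy latent_dim = 3 vectors; in the model these would be rows of the embedding tables.
user_vec, user_bias = [0.5, 1.0, -0.5], 0.25
item_vecs = [[2.0, 1.0, 2.0], [1.0, 2.0, 0.0]]
item_biases = [1.5, 0.5]

preds = [predict(user_vec, v, user_bias, b) for v, b in zip(item_vecs, item_biases)]
print(preds)                                 # [2.75, 3.25]
print(regression_loss([3.0, 4.0], preds))    # 0.3125
```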

In [16]:
# Split into training and test set
from spotlight.cross_validation import random_train_test_split

train, test = random_train_test_split(dataset, random_state=np.random.RandomState(42))
print('Split into \n {} and \n {}'.format(train, test))


Split into 
 <Interactions dataset (944 users x 1683 items x 80000 interactions)> and 
 <Interactions dataset (944 users x 1683 items x 20000 interactions)>
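Note that both halves keep the full 944 x 1683 shape: the split is random over interactions, so the same user or item can appear in both. The idea can be sketched as shuffling the interaction indices with a seeded generator and cutting off the last fraction as the test set. `random_split_indices` is a hypothetical helper; it uses Python's `random.Random` rather than NumPy's `RandomState`, so it will not reproduce Spotlight's partition even with the same seed, and the 20% default is chosen to match the 80,000/20,000 sizes above.

```python
import random


def random_split_indices(n, test_fraction=0.2, seed=42):
    """Shuffle 0..n-1 with a seeded RNG, then split into (train, test) index lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    cutoff = int((1.0 - test_fraction) * n)
    return indices[:cutoff], indices[cutoff:]


train_idx, test_idx = random_split_indices(100000)
print(len(train_idx), len(test_idx))  # 80000 20000
```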


In [17]:
# Train the model
model.fit(train, verbose=True)

Epoch 0: loss 13.129207152354565
Epoch 1: loss 7.510111075413378
Epoch 2: loss 1.791443021991585
Epoch 3: loss 1.0732096886333031
Epoch 4: loss 0.9460989712159845
Epoch 5: loss 0.8974253467366665
Epoch 6: loss 0.8725832889351663
Epoch 7: loss 0.8591318568096885
Epoch 8: loss 0.8486642867703981
Epoch 9: loss 0.8395742698560787
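The notebook does not show what `fit` does inside each epoch. As an illustration only, here is a small plain-Python biased matrix factorization trained with per-interaction stochastic gradient descent on a toy rating list. It is a deliberately simplified stand-in, not Spotlight's implementation: it uses plain SGD instead of the Adam default, no minibatching, and hypothetical names (`train_mf`, `toy_ratings`).

```python
import random


def train_mf(ratings, num_users, num_items, dim=2, n_iter=200,
             learning_rate=0.05, l2=1e-4, seed=0):
    """Fit a biased matrix factorization with plain SGD; return per-epoch MSE."""
    rng = random.Random(seed)
    ratings = list(ratings)  # copy so shuffling doesn't mutate the caller's list
    # Small random initial factors, zero initial biases.
    U = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(num_users)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(num_items)]
    bu = [0.0] * num_users
    bi = [0.0] * num_items
    losses = []
    for _ in range(n_iter):  # one pass over the data = one epoch
        rng.shuffle(ratings)
        total = 0.0
        for u, i, r in ratings:
            pred = sum(a * b for a, b in zip(U[u], V[i])) + bu[u] + bi[i]
            err = pred - r  # gradient of 0.5 * err**2 with respect to pred
            total += err ** 2
            bu[u] -= learning_rate * err
            bi[i] -= learning_rate * err
            for k in range(dim):
                uk, vk = U[u][k], V[i][k]
                # Factor updates include an L2 penalty of strength l2.
                U[u][k] -= learning_rate * (err * vk + l2 * uk)
                V[i][k] -= learning_rate * (err * uk + l2 * vk)
        losses.append(total / len(ratings))  # mean squared error over the epoch
    return losses


toy_ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
               (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
losses = train_mf(toy_ratings, num_users=3, num_items=3)
print("first epoch {:.3f}, last epoch {:.3f}".format(losses[0], losses[-1]))
```

A similar effect plausibly explains the large epoch-0 loss in the log above: when predictions start near zero, early squared errors are on the order of the squared ratings themselves (a 4-star rating predicted as 0 costs 16), and the loss drops quickly once the biases absorb the overall rating level.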


In [18]:
# Model evaluation using root mean square error (RMSE)
from spotlight.evaluation import rmse_score

train_rmse = rmse_score(model, train)
test_rmse = rmse_score(model, test)

print('Train RMSE {:.3f}, test RMSE {:.3f}'.format(train_rmse, test_rmse))

Train RMSE 0.902, test RMSE 0.945
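RMSE is the square root of the mean squared difference between observed and predicted ratings, so it is measured in the same units as the ratings themselves. A plain-Python version, using a hypothetical `rmse` helper that works on two lists rather than on a model and an `Interactions` object:

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equal-length rating sequences."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    return math.sqrt(mse)


print(rmse([4.0, 3.0, 5.0, 2.0], [3.0, 3.0, 4.0, 2.0]))  # 0.7071067811865476
```

Read this way, a test RMSE of 0.945 means predictions are typically off by a little under one star on MovieLens's 1 to 5 scale, and the small gap to the train RMSE of 0.902 suggests only mild overfitting.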
