## SpotlightRec MVP
Purpose: To construct a *sequential* movie recommender that returns a recommendation based on a list of liked films.

In [1]:
# General libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Spotlight specific tools
from spotlight.cross_validation import user_based_train_test_split
############################################################
# Goal: remove the need for synthetic data
#from spotlight.datasets.synthetic import generate_sequential
# Actual data set
from spotlight.datasets.movielens import get_movielens_dataset
############################################################
from spotlight.evaluation import sequence_mrr_score, sequence_precision_recall_score
from spotlight.sequence.implicit import ImplicitSequenceModel

Load the MovieLens dataset using Spotlight's built-in loader (this replaces the synthetic data generator).

In [2]:
#help(get_movielens_dataset)
movielens = get_movielens_dataset()

Format of the MovieLens dataset (default 100K variant):
 - number of rows: 100000
 - columns include: user id | item id | rating | timestamp

Construct train and test splits

In [3]:
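#help(user_based_train_test_split)
# Split by user, so each user's interactions fall entirely in train or in test
train, test = user_based_train_test_split(movielens)

# Convert the interactions into zero-padded, time-ordered item sequences
train = train.to_sequence()
test = test.to_sequence()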
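
To make the sequence conversion concrete, here is a simplified NumPy sketch of the idea: a user's time-ordered item IDs are cut to the most recent `max_len` entries and left-padded with zeros. The helper name `pad_sequence` and the sample IDs are hypothetical, and Spotlight's own `to_sequence` does more than this (for example, it can emit several windows per user), so treat it as an illustration only.

```python
import numpy as np

def pad_sequence(item_ids, max_len=10):
    """Keep the last `max_len` item IDs and left-pad with zeros (hypothetical helper)."""
    recent = np.asarray(item_ids, dtype=np.int64)[-max_len:]
    padded = np.zeros(max_len, dtype=np.int64)
    if len(recent) > 0:
        # Write the real IDs into the right-hand end; zeros remain on the left.
        padded[-len(recent):] = recent
    return padded

example = pad_sequence([1, 11, 28], max_len=5)
print(example)
```

Because zero marks padding, real item IDs must start at 1 for this layout to be unambiguous.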


Train the model on the user-based training split

In [16]:
#help(ImplicitSequenceModel)
# CNN sequence representation trained with a BPR (Bayesian personalised ranking) loss
model = ImplicitSequenceModel(n_iter=50,
                              representation='cnn',
                              loss='bpr')


In [17]:
#help(model.fit)
model.fit(train, verbose=True)

Epoch 0: loss 0.3051327263767069
Epoch 1: loss 0.21001177981044306
Epoch 2: loss 0.20228308935960135
Epoch 3: loss 0.20180538748249863
Epoch 4: loss 0.19824618372050198
Epoch 5: loss 0.19474790267872089
Epoch 6: loss 0.19241349173314642
Epoch 7: loss 0.19096127197597967
Epoch 8: loss 0.18391118853381186
Epoch 9: loss 0.16757668554782867
Epoch 10: loss 0.15247104700767633
Epoch 11: loss 0.14214567298238928
Epoch 12: loss 0.13556200220729364
Epoch 13: loss 0.12837566074096796
Epoch 14: loss 0.12227996032346379
Epoch 15: loss 0.11606480903697736
Epoch 16: loss 0.112983236032905
Epoch 17: loss 0.11190626499327747
Epoch 18: loss 0.11051565995722105
Epoch 19: loss 0.10940220590793726
Epoch 20: loss 0.10865532764882753
Epoch 21: loss 0.10883934931321577
Epoch 22: loss 0.10666036470369859
Epoch 23: loss 0.10645296447204822
Epoch 24: loss 0.10542114259618701
Epoch 25: loss 0.10586479193333423
Epoch 26: loss 0.10272521841706651
Epoch 27: loss 0.10313971972826755
Epoch 28: loss 0.1029671647331931

In [18]:
sequence_mrr_score(model, test)

array([0.00980392, 0.05      , 0.01010101, ..., 0.06666667, 0.04      ,
       0.25      ])
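
The call above returns one reciprocal-rank value per test sequence rather than a single number; taking the mean of the array gives an overall score. A value such as 0.25 is consistent with the held-out item being ranked 4th. The sketch below shows how a single reciprocal rank can be computed from a score vector. The function name `reciprocal_rank` and the toy scores are hypothetical, and Spotlight's implementation may treat ties and previously seen items differently.

```python
import numpy as np

def reciprocal_rank(scores, target_item):
    """Return 1 / rank of `target_item`, where rank 1 is the highest score."""
    scores = np.asarray(scores, dtype=float)
    # Rank = 1 + number of items scoring strictly higher than the target.
    rank = 1 + np.sum(scores > scores[target_item])
    return 1.0 / rank

toy_scores = [0.1, 0.9, 0.4, 0.7]  # made-up scores for items 0..3
rr = reciprocal_rank(toy_scores, target_item=2)
print(rr)
```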

In [35]:
#help(model.predict)
# Score every item as a candidate next item after the liked item IDs 1, 11, 28
predVals = model.predict(sequences=np.array([1,11,28]))

In [36]:
np.argmax(predVals)

172
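
`np.argmax` yields only the single highest-scoring item. A top-k list can be sketched with `np.argsort`. The snippet below uses a made-up score vector in place of `predVals`; the helper name `top_k_items` is hypothetical, and masking index 0 assumes that ID 0 is reserved for padding in the sequence representation.

```python
import numpy as np

def top_k_items(scores, k=5, exclude=(0,)):
    """Return the IDs of the k highest-scoring items, skipping `exclude` (hypothetical helper)."""
    scores = np.array(scores, dtype=float)  # copy so the caller's array is untouched
    skip = np.asarray(list(exclude), dtype=np.int64)
    scores[skip] = -np.inf                  # excluded IDs can never be selected
    return np.argsort(-scores)[:k]          # indices sorted by descending score

toy_scores = np.array([9.0, 0.2, 1.5, -0.3, 1.1])  # made-up scores for items 0..4
best = top_k_items(toy_scores, k=3)
print(best)
```

Passing the liked item IDs in `exclude` as well (for example `exclude=(0, 1, 11, 28)` with real scores) would keep the model from recommending films the user has already listed.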

The returned prediction is simply an item ID (here, 172). Because the data is stored as a sequential interaction database of integer IDs, we currently cannot look up pertinent metadata, such as the title, for the predicted movie.

Further work is being done on this problem.
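
One possible direction, sketched under assumptions: the raw MovieLens 100K distribution ships a pipe-separated `u.item` file whose first two columns are the movie ID and title. The snippet below assumes that the item IDs returned by Spotlight match those original movie IDs, which has not been verified here, and it reads a tiny in-memory stand-in with placeholder titles rather than the real file.

```python
import io
import pandas as pd

# Stand-in for `u.item`; these rows and titles are placeholders, not real data.
fake_u_item = io.StringIO(
    "1|Placeholder Title A (1995)|01-Jan-1995\n"
    "2|Placeholder Title B (1995)|01-Jan-1995\n"
)

# For the real file, pass its path instead and add encoding="latin-1".
titles = pd.read_csv(
    fake_u_item,
    sep="|",
    header=None,
    usecols=[0, 1],
    names=["item_id", "title"],
)

# Series indexed by item ID for direct lookups.
title_by_id = titles.set_index("item_id")["title"]

predicted_id = 2  # would be np.argmax(predVals) in the notebook
print(title_by_id.get(predicted_id, "unknown"))
```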