# Purpose

The purpose of this notebook is to build a factorization machine model using the MovieLens dataset. It consists of the following steps:
1. Load the MovieLens data
2. Preprocess the data and format it into a sparse matrix
3. Split the sparse data into train and test sets
4. Calculate baseline scores for a popularity model vs. the factorization machine model
5. Tune the model

In [4]:
cd ../

/Users/scottcronin/gh/recommender_deployed


In [5]:
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import copy
import pandas as pd
import pickle
import numpy as np
import os
import matplotlib.pyplot as plt
from matplotlib import cm
import seaborn as sns
import scipy.sparse as scs
from sklearn.base import TransformerMixin
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
from lightfm import LightFM, cross_validation, evaluation

sns.set_context('notebook', font_scale=1.4)

# Load in the data

In [6]:
interactions = pd.read_csv('data/ratings.dat',
                           sep='::', engine='python',
                           header=None,
                           names=['uid', 'iid', 'rating', 'timestamp'],
                           usecols=['uid', 'iid', 'rating'],
                          )
display(interactions.sample(5))
print('Shape: {:>9,} x {}'.format(*interactions.shape))

| | uid | iid | rating |
|---|---|---|---|
| 2187027 | 15969 | 246 | 4.0 |
| 4675054 | 33395 | 4478 | 4.0 |
| 5167350 | 36957 | 488 | 4.0 |
| 4087236 | 29268 | 3745 | 4.0 |
| 6446600 | 46071 | 3873 | 3.0 |


Shape: 10,000,054 x 3


# Preprocess data into a sparse matrix

In [5]:
class Preprocessor(TransformerMixin):
    def __init__(self, copy=True, min_rating=4.0):
        self.copy = copy
        self.min_rating = min_rating
        self.uid_to_idx = None
        self.iid_to_idx = None
    
    def fit(self, df, y=None, **kwargs):
        self._validate_df(df)
        if self.copy:
            df = df.copy()
        df = self._filter_interactions_to_min_rating(df)
        df = self._drop_duplicate_user_item_interactions(df)

        # create uid to indx mapping
        uniq_uids = df['uid'].unique()
        self.uid_to_idx = dict(zip(uniq_uids, np.arange(len(uniq_uids))))

        # create iid to indx mapping
        uniq_iids = df['iid'].unique()
        self.iid_to_idx = dict(zip(uniq_iids, np.arange(len(uniq_iids))))        
        return self
    
    def transform(self, df, **kwargs):
        self._validate_df(df)
        if self.copy:
            df = df.copy()

        df = self._filter_interactions_to_min_rating(df)
        df = self._drop_duplicate_user_item_interactions(df)
        
        # generate sparse matrix
        row = df['uid'].map(self.uid_to_idx)
        col = df['iid'].map(self.iid_to_idx)
        assert len(row) == len(col)
        data = np.ones(len(row))
        shape = (len(self.uid_to_idx), len(self.iid_to_idx))
        csr = scs.coo_matrix((data, (row, col)), shape=shape).tocsr()
        return csr

    def _drop_duplicate_user_item_interactions(self, df):
        if df.duplicated().sum() != 0:
            df = df.drop_duplicates()  # the pandas method is drop_duplicates
        return df
    
    def _filter_interactions_to_min_rating(self, df):
        df = df.loc[df['rating'] >= self.min_rating, ['uid', 'iid']]
        return df
    
    def _validate_df(self, df):
        assert 'uid' in df.columns
        assert 'iid' in df.columns
        assert 'rating' in df.columns

In [13]:
pp = Preprocessor(min_rating=4.0)
csr = pp.fit_transform(interactions)
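
To make the preprocessing step concrete, here is a minimal sketch of the same idea on a toy DataFrame: filter to positive ratings, map raw ids to contiguous indices, and build a binary sparse matrix. The toy values and the names `toy`, `liked`, and `toy_csr` are made up for illustration and are not part of the notebook's data.

```python
import numpy as np
import pandas as pd
import scipy.sparse as scs

# Toy interactions (made-up values, not taken from MovieLens)
toy = pd.DataFrame({
    'uid':    [10, 10, 42, 42, 7],
    'iid':    [500, 501, 500, 777, 501],
    'rating': [5.0, 3.0, 4.0, 4.5, 2.0],
})

# Keep only positive interactions, mirroring min_rating=4.0
liked = toy.loc[toy['rating'] >= 4.0, ['uid', 'iid']].drop_duplicates()

# Map raw ids to contiguous row/column indices in order of first appearance
uid_to_idx = {uid: i for i, uid in enumerate(liked['uid'].unique())}
iid_to_idx = {iid: i for i, iid in enumerate(liked['iid'].unique())}

rows = liked['uid'].map(uid_to_idx).to_numpy()
cols = liked['iid'].map(iid_to_idx).to_numpy()

# One stored 1.0 per (user, item) pair that passed the rating filter
toy_csr = scs.coo_matrix(
    (np.ones(len(rows)), (rows, cols)),
    shape=(len(uid_to_idx), len(iid_to_idx)),
).tocsr()

print(toy_csr.toarray())
```

User 7 has no rating at or above the threshold, so it gets no row at all. The same is true of item 501, which is why the resulting matrix is 2 x 2.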

# Train Test Split for Evaluation

Let's begin by creating a simple train/test split.

In [7]:
tr, te = cross_validation.random_train_test_split(csr)
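
The sketch below illustrates what a random split of a sparse interaction matrix amounts to: each stored interaction goes to either train or test, and both outputs keep the full user x item shape. It is an illustrative re-implementation under that assumption, not LightFM's own code, and `split_interactions` is a hypothetical helper name. A fixed seed makes the split reproducible.

```python
import numpy as np
import scipy.sparse as scs

def split_interactions(matrix, test_fraction=0.2, seed=0):
    """Hypothetical helper: randomly assign each stored interaction to train or test."""
    coo = matrix.tocoo()
    rng = np.random.RandomState(seed)
    order = rng.permutation(coo.nnz)               # shuffled positions of the nonzeros
    cutoff = int((1.0 - test_fraction) * coo.nnz)  # number of interactions kept for training
    train_idx, test_idx = order[:cutoff], order[cutoff:]

    def subset(idx):
        # Rebuild a matrix of the original shape from the selected interactions
        return scs.coo_matrix(
            (coo.data[idx], (coo.row[idx], coo.col[idx])), shape=coo.shape
        ).tocsr()

    return subset(train_idx), subset(test_idx)

dense = np.array([[1, 0, 1, 1],
                  [0, 1, 1, 0],
                  [1, 1, 0, 1]], dtype=np.float64)
toy_train, toy_test = split_interactions(scs.csr_matrix(dense), test_fraction=0.25)

print(toy_train.nnz, toy_test.nnz)
```

Because both matrices share the original shape, a model fitted on the training matrix can score every user and item that appears in the test matrix.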

Let's fit a model on the training set and evaluate it on the test set.

In [6]:
%%time
lfm = LightFM(no_components=30, loss='warp', learning_rate=0.05)
lfm.fit(tr, epochs=3)

CPU times: user 26.9 s, sys: 237 ms, total: 27.2 s
Wall time: 27.3 s


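Let's build a popularity baseline by zeroing out the user and item embedding vectors in a copy of the fitted model. With the embeddings at zero, each score reduces to the learned user and item bias terms, so every user receives the same item ranking, ordered by item bias.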

In [7]:
pop = copy.deepcopy(lfm)
pop.user_embeddings[:, :] = 0.0
pop.item_embeddings[:, :] = 0.0
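
A small NumPy sketch of why zeroing the embeddings yields a non-personalized ranking, assuming the score has the form "dot product of embeddings plus user bias plus item bias". The parameters below are random stand-ins, not values from the fitted model, and the `scores` function is a hypothetical helper.

```python
import numpy as np

rng = np.random.RandomState(0)
n_users, n_items, n_components = 3, 5, 4

# Made-up parameters standing in for a fitted model
user_emb = rng.normal(size=(n_users, n_components))
item_emb = rng.normal(size=(n_items, n_components))
user_bias = rng.normal(size=n_users)
item_bias = rng.normal(size=n_items)

def scores(user_emb, item_emb, user_bias, item_bias):
    # score[u, i] = <user_emb[u], item_emb[i]> + user_bias[u] + item_bias[i]
    return user_emb @ item_emb.T + user_bias[:, None] + item_bias[None, :]

full = scores(user_emb, item_emb, user_bias, item_bias)
bias_only = scores(np.zeros_like(user_emb), np.zeros_like(item_emb),
                   user_bias, item_bias)

# Item ordering per user, best first
rankings = np.argsort(-bias_only, axis=1)
print(rankings)
```

The user bias shifts all of a user's scores by the same constant, so it does not change the ordering. Only the item bias determines the ranking, which is what makes this a popularity-style baseline.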

# Evaluate model performance

In [8]:
def evaluate_model(model, train, test):
    model_rr = evaluation.reciprocal_rank(
        model=model,
        test_interactions=test,
        train_interactions=train,
        num_threads=2
    )
    model_auc = evaluation.auc_score(
        model=model,
        test_interactions=test,
        train_interactions=train,
        num_threads=2
    )
    return model_rr, model_auc
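
For intuition about the second metric, here is a minimal sketch of a per-user ROC-AUC: the fraction of (positive, negative) item pairs that the model orders correctly. It is a simplified illustration with made-up scores, and `user_auc` is a hypothetical helper. Unlike the library call above, it does not exclude training interactions.

```python
import numpy as np

def user_auc(scores, positives):
    """Share of (positive, negative) item pairs ranked in the right order."""
    scores = np.asarray(scores, dtype=float)
    pos_mask = np.zeros(len(scores), dtype=bool)
    pos_mask[list(positives)] = True
    pos, neg = scores[pos_mask], scores[~pos_mask]
    # A pair counts as correct when the positive outscores the negative; ties count as half
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

auc_perfect = user_auc([0.9, 0.1, 0.4, 0.3], {0, 2})   # both positives above both negatives
auc_partial = user_auc([0.2, 0.8, 0.5, 0.1], {2})      # one negative outranks the positive
auc_worst = user_auc([0.3, 0.6, 0.1, 0.7], {0, 2})     # both positives below both negatives

print(auc_perfect, auc_partial, auc_worst)
```

An AUC of 0.5 corresponds to random ordering, which is why even a simple baseline can score well above it on sparse data.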

In [9]:
%%time
fm_rr, fm_auc = evaluate_model(lfm, tr, te)
pop_rr, pop_auc = evaluate_model(pop, tr, te)

CPU times: user 4min 53s, sys: 1.82 s, total: 4min 55s
Wall time: 4min 56s


In [24]:
print('{:>10}:\n\t{:>20}: {:0.3}\n\t{:>20}: {:0.3}'.format(
    'Factorization Machine',
        'Mean Reciprocal Rank', fm_rr.mean(),
        'Mean ROC-AUC', fm_auc.mean()
))
print('{:>10}:\n\t{:>20}: {:0.3}\n\t{:>20}: {:0.3}'.format(
    'Popularity Model',
        'Mean Reciprocal Rank', pop_rr.mean(),
        'Mean ROC-AUC', pop_auc.mean()
))

Factorization Machine:
	Mean Reciprocal Rank: 0.41
	        Mean ROC-AUC: 0.972
Popularity Model:
	Mean Reciprocal Rank: 0.281
	        Mean ROC-AUC: 0.944


As you can see, the factorization machine model outperforms the popularity model by roughly 46% in mean reciprocal rank (0.41 vs. 0.281), while the gap in ROC-AUC is much smaller (0.972 vs. 0.944). Mean reciprocal rank is a useful metric for this type of model because it is heavily weighted toward getting the first few recommendations correct.
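
To show why reciprocal rank rewards getting the top of the list right, here is a minimal sketch on made-up scores, followed by the relative lift computed from the two mean reciprocal rank values printed above. The `reciprocal_rank` function here is a hypothetical helper, not the library's implementation.

```python
import numpy as np

def reciprocal_rank(scores, positives):
    """1 / rank of the best-scored positive item (rank 1 = top of the list)."""
    order = np.argsort(-np.asarray(scores))        # item indices, best first
    ranks = np.empty_like(order)
    ranks[order] = np.arange(1, len(order) + 1)    # rank assigned to each item
    return 1.0 / ranks[list(positives)].min()

# Made-up scores for three users over four items
toy_scores = [
    [0.9, 0.1, 0.4, 0.3],
    [0.2, 0.8, 0.5, 0.1],
    [0.3, 0.6, 0.1, 0.7],
]
toy_positives = [{0}, {2}, {0, 2}]

rr = [reciprocal_rank(s, p) for s, p in zip(toy_scores, toy_positives)]
mrr = float(np.mean(rr))

# Relative lift of the factorization model over the popularity baseline,
# using the mean reciprocal rank values reported in the output above
lift = 0.41 / 0.281 - 1.0

print(rr, mrr, lift)
```

Moving the first relevant item from rank 1 to rank 2 halves the score, while moving it from rank 10 to rank 11 barely changes it. That is the sense in which the metric concentrates on the first few recommendations.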