# Useful Advice

- How can we update or store the model when new users, new items, or new interactions come in?
    - The most robust answer is that you periodically recompute the model from scratch to handle all three. For just new interactions, you can run additional fitting iterations on the same model with the new data. (You can persist the model by pickling it.) Adding new users/items (this is called fold-in) is somewhat tricky, and not explicitly supported. For models that naturally take in new information (and new users), you should have a look at [sequence-based models](https://maciejkula.github.io/spotlight/index.html#sequential-models).
- Getting similar items
    - https://github.com/lyst/lightfm/issues/244
- bdicts library to convert UUIDs to 32 bit ints for recommender libraries
    - https://stackoverflow.com/questions/48068147/recommender-systems-convert-uuids-to-32-bit-ints-for-recommender-libraries
- Evaluation:
    - The scores themselves have no meaning in isolation; they are only meaningful because they define a ranking over items for a given user. The scale they take depends on the loss you specify, the learning rate, the regularization parameters, and the data itself. I recommend keeping an eye on the MRR/AUC scores of your model, and comparing them with what a random or a popularity model would achieve.
- Mini-Batch Training for large dataset:
    - https://github.com/lyst/lightfm/issues/234
- Item-Item Recommendation ???
    - https://github.com/lyst/lightfm/issues/239#issuecomment-352774493
- Sample weight for different interactions
    - https://github.com/lyst/lightfm/issues/260
- Handling non-categorical features
    - https://github.com/lyst/lightfm/issues/261
- Explaining Recommendation
    - Try using item similarity to present support for your recommendations. For example, you could use cosine similarity between the embedding of the item you are recommending and items that the user has interacted with in the past to find items from the user's history that are most similar to the recommendation you are making.
    - https://github.com/lyst/lightfm/issues/251#issuecomment-363314396
- One interesting characteristic of WARP is that it is effectively playing against itself: it is preferentially using examples that it scores highly as negative examples. Up to a certain point, this allows it to express user preferences better; beyond that point, it is self-defeating, as it is selecting the examples that it should recommend and treating them as negatives. The extent to which that happens is governed by the max_sampled hyperparameter. I suspect that this is what is happening in your case. Instead of switching to BPR, can you try decreasing that hyperparameter? Otherwise, I don't see any obvious problems with switching losses. It's interesting that you find different results with adagrad and adadelta; I haven't observed that.
    - https://github.com/lyst/lightfm/issues/258#issuecomment-363313831
- Use sample weight to incorporate multiple feedbacks
    - https://github.com/lyst/lightfm/issues/258#issuecomment-363313831
- Contextual-Awareness: If you want to have contextual features (time-varying, or specific to a given interaction instance), you could integrate them in either user or item features. This way you would have more than one feature row for a given user or item. For example, instead of having one row for item 10: {'id': 10, 'category': 'foo'} you could have two rows: {'id': 10, 'category': 'foo', 'time-of-week': 'friday'} and {'id': 10, 'category': 'foo', 'time-of-week': 'tuesday'}. With a little ingenuity in feature engineering you can make the model very flexible.
    - https://github.com/lyst/lightfm/issues/238#issuecomment-352774054
- To incorporate numerical side features, we may find that discretization works better.
    - https://github.com/lyst/lightfm/issues/261
- Using predict with feature matrix
    - https://github.com/lyst/lightfm/issues/240
- Get item recommendation for a particular user on a particular item page
    - https://github.com/lyst/lightfm/issues/266#issuecomment-378243505
- Discussion on combining interaction matrix
    - https://github.com/lyst/lightfm/issues/176
- Loss function for "mostly implicit" data
    - In addition to positive implicit feedback data, I also have implicit negative feedback. The way I approach the problem is that, instead of sampling negative items at random, I slightly modified the sampling strategy to sample from the implicit negative feedback. This is somewhat intuitive IMO, because if we already have negative feedback for each user, why would we choose at random? This is even more relevant if we have a sufficient amount of negative feedback.
    - https://github.com/lyst/lightfm/issues/310
- Whether to exclude the items the user already interacted with from the evaluation score.
    - A good recommender model will definitely recommend items the user has already seen. The test set does not contain the items the user has already seen, so if they are ranked highly, this will reduce the measured accuracy on the test set. To eliminate this effect, we can exclude the already-seen items from the evaluation. This can be done by setting the scores of items already in the training set to MIN_FLOAT.
    - https://github.com/lyst/lightfm/issues/349
    - https://github.com/lyst/lightfm/issues/326
- Normalized outputs are often easier to reason about and to threshold in recommendation systems, and I've observed in other models that training with normalization included can improve precision (compared to normalizing afterwards).
    - https://github.com/lyst/lightfm/pull/309
- I'm uncomfortable with calling seed a tunable hyperparameter. With the right (other) hyperparameters and enough data, the choice of random seed should have a small effect on the final accuracy. In fact, the setting is mostly there for testing purposes: in applications, you would not hold it fixed, but let it vary from fitting to fitting.
    - https://github.com/lyst/lightfm/issues/355
- Formula for lightfm's predict function
    - https://github.com/lyst/lightfm/issues/363
- Recommendations are a ranking problem, not a classification problem, so things like thresholds don't really make sense. The idea is that you have, say, 4 slots to fill in your interface, and you sort all products by their recommendation score and pick the top 4; you never set a threshold.
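
The advice above about excluding already-seen items before ranking can be sketched in plain NumPy. This is only an illustration under stated assumptions, not LightFM's internal code: `mask_seen_items` and `top_k` are hypothetical helper names, and the scores are made-up numbers.

```python
import numpy as np
import scipy.sparse as sp


def mask_seen_items(scores, train_interactions, user_id):
    """Return a copy of `scores` with the user's training items pushed to the bottom.

    Hypothetical helper: `scores` holds one score per item and
    `train_interactions` is a (n_users, n_items) sparse matrix.
    """
    train = train_interactions.tocsr()
    seen = train.indices[train.indptr[user_id]:train.indptr[user_id + 1]]
    masked = scores.astype(np.float32, copy=True)
    # The smallest float32 guarantees seen items never outrank unseen ones.
    masked[seen] = np.finfo(np.float32).min
    return masked


def top_k(scores, k):
    """Indices of the k highest-scoring items, best first."""
    return np.argsort(-scores)[:k]


# Toy data: one user, five items; the user already interacted with items 1 and 3.
train = sp.csr_matrix(np.array([[0, 1, 0, 1, 0]], dtype=np.float32))
scores = np.array([0.2, 0.9, 0.5, 0.8, 0.1], dtype=np.float32)

raw_top = top_k(scores, 2)
masked_top = top_k(mask_seen_items(scores, train, user_id=0), 2)
print(raw_top, masked_top)
```

Without masking, the two slots are filled by items the user has already seen; with masking, they go to the best unseen items. When using LightFM's own evaluation functions, passing `train_interactions` serves the same purpose.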

# Implementation

In [1]:
%load_ext cython

In [12]:
%%cython -a
#cython: boundscheck=False, wraparound=False, cdivision=True

ctypedef float flt
from libc.stdlib cimport free, malloc


cdef class FastLightFM:
    """
    Class holding all the model state.
    """

    cdef flt[:, :] item_features
    cdef flt[:, :] item_feature_gradients
    cdef flt[:, :] item_feature_momentum

    cdef flt[:] item_biases
    cdef flt[:] item_bias_gradients
    cdef flt[:] item_bias_momentum

    cdef flt[:, :] user_features
    cdef flt[:, :] user_feature_gradients
    cdef flt[:, :] user_feature_momentum

    cdef flt[:] user_biases
    cdef flt[:] user_bias_gradients
    cdef flt[:] user_bias_momentum

    cdef int no_components
    cdef int adadelta
    cdef flt learning_rate
    cdef flt rho
    cdef flt eps
    cdef int max_sampled

    cdef double item_scale
    cdef double user_scale

    def __init__(self,
                 flt[:, :] item_features,
                 flt[:, :] item_feature_gradients,
                 flt[:, :] item_feature_momentum,
                 flt[:] item_biases,
                 flt[:] item_bias_gradients,
                 flt[:] item_bias_momentum,
                 flt[:, :] user_features,
                 flt[:, :] user_feature_gradients,
                 flt[:, :] user_feature_momentum,
                 flt[:] user_biases,
                 flt[:] user_bias_gradients,
                 flt[:] user_bias_momentum,
                 int no_components,
                 int adadelta,
                 flt learning_rate,
                 flt rho,
                 flt epsilon,
                 int max_sampled):

        self.item_features = item_features
        self.item_feature_gradients = item_feature_gradients
        self.item_feature_momentum = item_feature_momentum
        self.item_biases = item_biases
        self.item_bias_gradients = item_bias_gradients
        self.item_bias_momentum = item_bias_momentum
        self.user_features = user_features
        self.user_feature_gradients = user_feature_gradients
        self.user_feature_momentum = user_feature_momentum
        self.user_biases = user_biases
        self.user_bias_gradients = user_bias_gradients
        self.user_bias_momentum = user_bias_momentum

        self.no_components = no_components
        self.learning_rate = learning_rate
        self.rho = rho
        self.eps = epsilon

        self.item_scale = 1.0
        self.user_scale = 1.0

        self.adadelta = adadelta

        self.max_sampled = max_sampled


cdef class CSRMatrix:
    """
    Utility class for accessing elements
    of a CSR matrix.
    """

    cdef int[:] indices
    cdef int[:] indptr
    cdef flt[:] data

    cdef int rows
    cdef int cols
    cdef int nnz

    def __init__(self, csr_matrix):

        self.indices = csr_matrix.indices
        self.indptr = csr_matrix.indptr
        self.data = csr_matrix.data

        self.rows, self.cols = csr_matrix.shape
        self.nnz = len(self.data)

    cdef int get_row_start(self, int row) nogil:
        """
        Return the pointer to the start of the
        data for row.
        """

        return self.indptr[row]

    cdef int get_row_end(self, int row) nogil:
        """
        Return the pointer to the end of the
        data for row.
        """

        return self.indptr[row + 1]
    

import numpy as np
from lightfm import LightFM
from lightfm.datasets import fetch_stackexchange

# Set the number of threads; you can increase this
# if you have more physical cores available.
NUM_THREADS = 2
NUM_COMPONENTS = 30
NUM_EPOCHS = 3
ITEM_ALPHA = 1e-6

data = fetch_stackexchange('crossvalidated',
                           test_set_fraction=0.1,
                           indicator_features=False,
                           tag_features=True)
train = data['train']
test = data['test']
user_features = None
item_features = data['item_features']  # shape n_items * n_item_features (tags)
tag_labels = data['item_feature_labels']

# Define a new model instance
model = LightFM(loss='warp',
                item_alpha=ITEM_ALPHA,
                no_components=NUM_COMPONENTS)

# Fit the hybrid model. Note that this time, we pass
# in the item features matrix.
model = model.fit(train,
                  item_features=item_features,
                  epochs=NUM_EPOCHS,
                  num_threads=NUM_THREADS)

user_ids = np.array([0])
item_ids = np.array([0])
print(model.predict(user_ids, item_ids, item_features))

[0.39820984]


In [13]:
user_ids = np.array([0])
item_ids = np.array([0])
user_ids = np.repeat(np.int32(user_ids), len(item_ids))
user_ids

array([0], dtype=int32)

In [20]:
import scipy.sparse as sp


def _construct_feature_matrices(n_users, n_items, user_features, item_features):
    if user_features is None:
        user_features = sp.identity(n_users, dtype = np.float32, format = 'csr')
    else:
        user_features = user_features.tocsr()

    if item_features is None:
        item_features = sp.identity(n_items, dtype = np.float32, format = 'csr')
    else:
        item_features = item_features.tocsr()
        
    return user_features, item_features


user_ids = np.repeat(np.int32(user_ids), len(item_ids))
n_users = user_ids.max() + 1
n_items = item_ids.max() + 1
user_features, item_features = _construct_feature_matrices(
    n_users, n_items, user_features, item_features)

CSRMatrix(item_features)
predictions = np.empty(len(user_ids), dtype=np.float64)
predictions.shape

(1,)

In [18]:
item_features

<72360x1246 sparse matrix of type '<class 'numpy.float32'>'
	with 198963 stored elements in Compressed Sparse Row format>
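
The `CSRMatrix` helper above reads one row of a sparse matrix through `indptr`: the entries of row `r` live in `indices[indptr[r]:indptr[r + 1]]` and `data[indptr[r]:indptr[r + 1]]`. A small SciPy sketch of the same access pattern follows; `row_features` is a hypothetical helper and the matrix is toy data, not the StackExchange features.

```python
import numpy as np
import scipy.sparse as sp

# Toy item-feature matrix: 3 items x 4 tags.
dense = np.array([[1, 0, 0, 2],
                  [0, 0, 0, 0],
                  [0, 3, 4, 0]], dtype=np.float32)
features = sp.csr_matrix(dense)


def row_features(csr, row):
    """Return (feature_ids, weights) for one row of a CSR matrix."""
    start = csr.indptr[row]      # what get_row_start(row) returns
    stop = csr.indptr[row + 1]   # what get_row_end(row) returns
    return csr.indices[start:stop], csr.data[start:stop]


for row in range(features.shape[0]):
    ids, weights = row_features(features, row)
    print(row, ids.tolist(), weights.tolist())
```

An item with no features (row 1 here) has `start == stop`, so the slice is empty and a loop over its features simply does nothing.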

In [None]:
import numpy as np
from lightfm.datasets import fetch_stackexchange

data = fetch_stackexchange('crossvalidated',
                           test_set_fraction=0.1,
                           indicator_features=False,
                           tag_features=True)
train = data['train']
test = data['test']


print('The dataset has %s users and %s items, '
      'with %s interactions in the test and %s interactions in the training set.'
      % (train.shape[0], train.shape[1], test.getnnz(), train.getnnz()))

In [None]:
# Import the model
from lightfm import LightFM

# Set the number of threads; you can increase this
# if you have more physical cores available.
NUM_THREADS = 2
NUM_COMPONENTS = 30
NUM_EPOCHS = 3
ITEM_ALPHA = 1e-6

# Let's fit a WARP model: these generally have the best performance.
model = LightFM(loss='warp',
                item_alpha=ITEM_ALPHA,
                no_components=NUM_COMPONENTS)

# Run NUM_EPOCHS epochs of training.
model = model.fit(train, epochs=NUM_EPOCHS, num_threads=NUM_THREADS)
print(model.item_embeddings.shape)
print(model.user_embeddings.shape)

In [None]:
model.user_embeddings[0].shape

In [None]:
model.item_embeddings[0].shape

In [None]:
user_ids = np.array([0])
item_ids = np.array([0])
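# This model was fit without item features, so none are passed to predict.
print(model.predict(user_ids, item_ids))

# Reproduce the score by hand: embedding dot product plus both bias terms.
print(model.user_embeddings[0].dot(model.item_embeddings[0])
      + model.user_biases[0] + model.item_biases[0])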

In [None]:
from lightfm.evaluation import auc_score

# Compute and print the AUC score
train_auc = auc_score(model, train, num_threads=NUM_THREADS).mean()
print('Collaborative filtering train AUC: %s' % train_auc)

In [None]:
test_auc = auc_score(model, test, train_interactions=train, num_threads=NUM_THREADS).mean()
print('Collaborative filtering test AUC: %s' % test_auc)

In [None]:
# Set biases to zero
model.item_biases *= 0.0

test_auc = auc_score(model, test, train_interactions=train, num_threads=NUM_THREADS).mean()
print('Collaborative filtering test AUC: %s' % test_auc)
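
To make the AUC numbers easier to interpret: for a single user, AUC is the probability that a randomly chosen positive item is scored above a randomly chosen negative item, so a random scorer sits around 0.5. The sketch below computes that quantity by brute force on toy scores; `user_auc` is a hypothetical helper for illustration and is not how `auc_score` is implemented.

```python
import numpy as np


def user_auc(scores, positive_items):
    """Fraction of (positive, negative) item pairs where the positive scores higher.

    Ties count as half a win. Hypothetical helper for illustration only.
    """
    is_positive = np.zeros(len(scores), dtype=bool)
    is_positive[positive_items] = True
    pos_scores = scores[is_positive]
    neg_scores = scores[~is_positive]
    # Compare every positive score against every negative score.
    wins = (pos_scores[:, None] > neg_scores[None, :]).sum()
    ties = (pos_scores[:, None] == neg_scores[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos_scores) * len(neg_scores))


# Five items; the user's held-out positives are items 0 and 2.
scores = np.array([0.9, 0.1, 0.4, 0.7, 0.3])
auc = user_auc(scores, positive_items=[0, 2])
print(auc)
```

Here item 0 beats all three negatives and item 2 beats two of them, so five of the six pairs are ordered correctly.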

In [None]:
item_features = data['item_features']
tag_labels = data['item_feature_labels']

print('There are %s distinct tags, with values like %s.' % (item_features.shape[1], tag_labels[:3].tolist()))

In [None]:
item_features.shape

In [None]:
tag_labels.shape

In [None]:
# Define a new model instance
model = LightFM(loss='warp',
                item_alpha=ITEM_ALPHA,
                no_components=NUM_COMPONENTS)

# Fit the hybrid model. Note that this time, we pass
# in the item features matrix.
model = model.fit(train,
                  item_features=item_features,
                  epochs=NUM_EPOCHS,
                  num_threads=NUM_THREADS)

In [None]:
user_ids = np.array([0])
item_ids = np.array([0])
t = model.predict(user_ids, item_ids, item_features)
t

In [None]:
model.item_embeddings.shape

In [None]:
model.user_embeddings.shape

In [None]:
model.user_embeddings[0] @ model.item_embeddings[0]
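
In the hybrid model the rows of `item_embeddings` are tag embeddings rather than item embeddings, so the dot product above involves tag 0, not item 0. As I understand the model, an item's vector is the feature-weighted sum of its tag embeddings, and its bias is the matching sum of tag biases. The NumPy sketch below illustrates that formula with random stand-in parameters; all names and shapes are assumptions for the example, and library details such as dtypes and scaling are left out.

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
n_users, n_tags, dim = 3, 5, 2

# Stand-ins for fitted parameters (random here, learned in a real model).
user_embeddings = rng.normal(size=(n_users, dim))  # one row per user
item_embeddings = rng.normal(size=(n_tags, dim))   # one row per tag, not per item
user_biases = rng.normal(size=n_users)
item_biases = rng.normal(size=n_tags)

# Each of the 4 items is described by the tags it carries.
item_features = sp.csr_matrix(np.array([[1, 0, 1, 0, 0],
                                        [0, 1, 0, 0, 0],
                                        [0, 0, 0, 1, 1],
                                        [1, 1, 0, 0, 0]], dtype=np.float32))


def score(user_id, item_id):
    """Score = <user vector, item vector> + user bias + item bias."""
    weights = item_features[item_id].toarray().ravel()  # tag weights of this item
    item_vector = weights @ item_embeddings              # weighted sum of tag embeddings
    item_bias = weights @ item_biases                    # weighted sum of tag biases
    return user_embeddings[user_id] @ item_vector + user_biases[user_id] + item_bias


print(score(user_id=1, item_id=0))
```

With no user features, the user side reduces to a plain lookup because each user has exactly one indicator feature.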

In [None]:
# Don't forget to pass in the item features again!
train_auc = auc_score(model,
                      train,
                      item_features=item_features,
                      num_threads=NUM_THREADS).mean()
print('Hybrid training set AUC: %s' % train_auc)

In [None]:
test_auc = auc_score(model,
                     test,
                     train_interactions=train,
                     item_features=item_features,
                     num_threads=NUM_THREADS).mean()
print('Hybrid test set AUC: %s' % test_auc)

In [None]:
def get_similar_tags(model, tag_id):
    # Define similarity as the cosine of the angle
    # between the tag latent vectors
    
    # Normalize the vectors to unit length
    tag_embeddings = (model.item_embeddings.T
                      / np.linalg.norm(model.item_embeddings, axis=1)).T
    
    query_embedding = tag_embeddings[tag_id]
    similarity = np.dot(tag_embeddings, query_embedding)
    most_similar = np.argsort(-similarity)[1:4]
    
    return most_similar


for tag in (u'bayesian', u'regression', u'survival'):
    tag_id = tag_labels.tolist().index(tag)
    print('Most similar tags for %s: %s' % (tag_labels[tag_id],
                                            tag_labels[get_similar_tags(model, tag_id)]))