# Implicit feedback recommender for events

In this notebook we build a recommender system from an implicit feedback dataset. The data is a list of events, each given as an artist ID and a venue ID, so it records observed behavior: which artists perform at which venues. We can learn from these observed preferences to make recommendations. For this we use matrix factorization, a method that decomposes the interaction matrix to discover the latent factors that describe artist and venue preferences.

Matrix factorization is a popular algorithm for building implicit feedback recommender systems. The basic idea behind matrix factorization is to represent the artist-venue interaction matrix as the product of two low-rank matrices. This allows the algorithm to find underlying latent features that describe the preferences of artists and venues.

To illustrate, let's say we have an artist-venue interaction matrix R, where each row represents an artist, each column represents a venue, and each entry represents the artist's interaction with the venue (e.g., an event). The goal of matrix factorization is to factorize this matrix into two matrices, U and V, such that $R ≈ UV^T$. Here, U is a matrix where each row represents an artist and each column represents a latent feature, and V is a matrix where each row represents a venue and each column represents a latent feature. The latent features are learned during the training process and are not directly observed in the data.
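
As a small illustration of the shapes involved, the following sketch builds factor matrices for 3 artists, 4 venues and 2 latent features with NumPy. The numbers are invented for the example and are not learned from any data.

```python
import numpy as np

# Hypothetical factors: 3 artists and 4 venues, 2 latent features each
U = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.5]])
V = np.array([[2.0, 0.0],
              [0.0, 3.0],
              [1.0, 1.0],
              [0.0, 0.5]])

# The product reconstructs a full artist-venue matrix (3 rows, 4 columns)
R_hat = U @ V.T

# Each entry is the dot product of one artist row and one venue row
entry = U[2] @ V[1]  # artist 2, venue 1
```

Because U and V together hold far fewer numbers than a full artist-venue matrix for realistic sizes, the model has to generalize instead of memorizing each entry.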

The factorization is obtained by minimizing a loss function that measures the difference between the predicted and actual interactions in the artist-venue matrix. A common choice is the squared error, summed over all artist-venue pairs:

$$L = \sum_{u,i} \left(r_{ui} - U_u V_i^T\right)^2$$

Here, $r_{ui}$ is the actual interaction between artist $u$ and venue $i$, and $U_u$ and $V_i$ are the corresponding row vectors of the matrices U and V, respectively. The goal is to find the values of U and V that minimize this loss function.
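
To make the minimization concrete, here is a minimal sketch of alternating least squares on a small dense toy matrix. It is not the implicit library's implementation: the helper names `squared_error` and `als_step`, the toy counts, and the small ridge penalty `reg` (added to keep the linear systems well conditioned) are all assumptions made for this example.

```python
import numpy as np

def squared_error(R, U, V):
    """Sum of squared differences between R and its reconstruction U V^T."""
    return float(np.sum((R - U @ V.T) ** 2))

def als_step(R, U, V, reg=0.1):
    """One alternating least squares update (reg is an assumed ridge penalty)."""
    ridge = reg * np.eye(U.shape[1])
    # Hold V fixed and solve the least-squares problem for every artist row
    U = np.linalg.solve(V.T @ V + ridge, V.T @ R.T).T
    # Hold the new U fixed and solve for every venue row
    V = np.linalg.solve(U.T @ U + ridge, U.T @ R).T
    return U, V

# Toy interaction counts: 5 artists (rows) x 4 venues (columns)
R = np.array([[3, 0, 1, 0],
              [0, 2, 0, 1],
              [1, 0, 4, 0],
              [0, 1, 0, 2],
              [2, 0, 0, 0]], dtype=float)

# Start from small random factors with 2 latent features
rng = np.random.default_rng(0)
U = 0.1 * rng.standard_normal((5, 2))
V = 0.1 * rng.standard_normal((4, 2))

loss_before = squared_error(R, U, V)
for _ in range(20):
    U, V = als_step(R, U, V)
loss_after = squared_error(R, U, V)
```

Each half-step is an ordinary least-squares problem with a closed-form solution, which is why the method alternates between U and V instead of updating both at once.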

Once the matrix factorization has been performed, the recommendation engine can make predictions for new venues that an artist has not yet booked an event with by multiplying the artist's latent feature vector by the venue matrix $V^T$. The resulting vector represents the predicted interactions between the artist and each venue, and the top venues can be recommended to the artist.
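
The prediction step can be sketched in a few lines of NumPy. The function name `recommend_venues`, the factor values and the list of already booked venues are hypothetical and chosen only to illustrate the ranking.

```python
import numpy as np

def recommend_venues(artist_vec, V, booked, n=3):
    """Rank venues for one artist, skipping venues the artist already booked."""
    scores = (V @ artist_vec).astype(float)  # one predicted score per venue
    booked = np.asarray(booked, dtype=int)
    scores[booked] = -np.inf                 # never recommend a known venue
    top = np.argsort(-scores)[:n]            # indices of the highest scores
    return top, scores[top]

# Hypothetical venue factors (4 venues, 2 latent features) and one artist vector
V = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.5],
              [0.9, 0.1]])
artist_vec = np.array([1.0, 0.0])

top, top_scores = recommend_venues(artist_vec, V, booked=[0], n=2)
```

In this toy case venue 0 would score highest, but it is excluded because the artist has already performed there, so venues 3 and 2 are returned.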

Matrix factorization has proven to be an effective algorithm for implicit feedback recommender systems, particularly for large and sparse datasets. It has been used in many real-world applications, such as movie and music recommendations on platforms like Netflix and Spotify.

## The training process

First we read the training data, a tab-separated file without a header row:

In [None]:
import pandas as pd
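# The file has no header row, so name the columns here; later cells refer to them.
# (Assumes every line holds exactly two fields: an artist ID and a venue ID.)
implicit_data = pd.read_csv(
    "data.csv", sep="\t", header=None, names=["artists", "venues"]
)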

We then aggregate the events into a pandas DataFrame with one row per artist-venue pair. The first column is the artist's ID, the second is the venue's ID, and the third (called 'plays') is the number of times the artist has performed at that venue.

In [None]:
# Count the events per (artist, venue) pair; the counts become the 'plays' column
implicit_data = (
    implicit_data.groupby(['artists', 'venues']).size().reset_index(name='plays')
)

Next, we initialise a model from the [implicit](https://implicit.readthedocs.io/en/latest/) library with 50 latent factors.

In [None]:
import implicit
model = implicit.als.AlternatingLeastSquares(factors=50)

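To train the model, we convert the data into a CSR (compressed sparse row) matrix, the sparse matrix format used by the implicit library. We start by collecting the distinct artist and venue IDs, which determine the shape of the matrix.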

In [None]:
from pandas.api.types import CategoricalDtype
from scipy import sparse
artists = implicit_data["artists"].unique()
venues = implicit_data["venues"].unique()
shape = (len(artists), len(venues))

We then map each artist ID and venue ID to a consecutive integer index (its category code) and build the sparse matrix itself, with artists as rows and venues as columns.

In [None]:
artist_cat = CategoricalDtype(categories=sorted(artists), ordered=True)
venue_cat = CategoricalDtype(categories=sorted(venues), ordered=True)
artist_index = implicit_data["artists"].astype(artist_cat).cat.codes
venue_index = implicit_data["venues"].astype(venue_cat).cat.codes
coo = sparse.coo_matrix(
    (implicit_data["plays"], (artist_index, venue_index)), shape=shape
)
csr = coo.tocsr()
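
The mapping from raw IDs to matrix positions can be checked on a toy table. The IDs and counts below are invented; the point of the sketch is that category code k always refers to the k-th ID in sorted order, which is what allows a row or column index to be translated back to a raw ID later.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype
from scipy import sparse

# Hypothetical aggregated data; venue IDs deliberately appear out of order
toy = pd.DataFrame(
    {"artists": [7, 7, 9], "venues": [200, 100, 300], "plays": [1, 3, 1]}
)

artist_ids = sorted(toy["artists"].unique())  # position = row index
venue_ids = sorted(toy["venues"].unique())    # position = column index

# Category codes are positions in the sorted ID lists
rows = toy["artists"].astype(CategoricalDtype(artist_ids, ordered=True)).cat.codes
cols = toy["venues"].astype(CategoricalDtype(venue_ids, ordered=True)).cat.codes

matrix = sparse.coo_matrix(
    (toy["plays"].to_numpy(), (rows.to_numpy(), cols.to_numpy())),
    shape=(len(artist_ids), len(venue_ids)),
).tocsr()
dense = matrix.toarray()
```

Here venue 200 ends up in column 1 even though it is the first venue in the table, so looking up IDs by order of appearance would give the wrong venue.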

The model can be trained with a single command.

In [None]:
model.fit(csr)

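The top 10 venue recommendations for an artist can be obtained as follows:

In [None]:
import numpy as np
artist_id = 10  # row index (category code) of the artist, not the raw artist ID
# recommend() takes the artist's row index and that artist's row of the matrix
ids, scores = model.recommend(
  artist_id, csr[artist_id], N=10, filter_already_liked_items=True
)
pd.DataFrame(
  {# venue_cat.categories is sorted, so position k is the venue with code k
   "venue": venue_cat.categories[ids],
   "score": scores,
   # always False here, because already booked venues are filtered out above
   "already_liked": np.isin(ids, csr[artist_id].indices)
  }
)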

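Recommending artists for a venue works in the opposite direction. `model.recommend` ranks venues for an artist, so for a venue we instead score every artist against the venue's latent vector:

In [None]:
venue_id = 2  # column index (category code) of the venue, not the raw venue ID
# Assumes the CPU model, whose user_factors and item_factors are NumPy arrays
scores = model.user_factors @ model.item_factors[venue_id]
# Row indices of the artists that have already played at this venue
played = csr[:, venue_id].nonzero()[0]
scores[played] = -np.inf  # same effect as filter_already_liked_items=True
ids = np.argsort(-scores)[:10]
pd.DataFrame(
  {# artist_cat.categories is sorted, so position k is the artist with code k
   "artist": artist_cat.categories[ids],
   "score": scores[ids],
   "already_liked": np.isin(ids, played)
  }
)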

Finally, we export the model to be used in the production recommender system.

In [None]:
from joblib import dump
dump(model, 'model.joblib')
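
On the consuming side, the file can be read back with `joblib.load`. The indices the model works with are only meaningful together with the ID lookups, so one option is to store them in the same file. This is a sketch of that idea and an assumption, not part of the pipeline above: the `bundle` layout, the file name and the placeholder standing in for the trained model are all hypothetical.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load

# Placeholder standing in for the trained model in this sketch
placeholder_model = {"factors": 50}

bundle = {
    "model": placeholder_model,
    "artists": np.array([7, 9]),          # hypothetical raw artist IDs, in row order
    "venues": np.array([100, 200, 300]),  # hypothetical raw venue IDs, in column order
}

# Write to a temporary directory so the sketch does not touch real files
path = os.path.join(tempfile.mkdtemp(), "bundle.joblib")
dump(bundle, path)

# Later, in the serving process
restored = load(path)
```

Keeping the lookups next to the model avoids a mismatch between the indices returned by the model and the IDs shown to users.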