## FastAI Recommender

This notebook shows how to use the [FastAI](https://fast.ai) recommender, which uses [PyTorch](https://pytorch.org/) under the hood.

In [1]:
import torch, fastai
from fastai.collab import *
from fastai.tabular import *

# set the environment path to find Recommenders
import sys
sys.path.append("../../")
import time
import os
import itertools
import pandas as pd
import papermill as pm

from reco_utils.dataset import movielens
from reco_utils.dataset.python_splitters import python_random_split
from reco_utils.evaluation.python_evaluation import map_at_k, ndcg_at_k, precision_at_k, recall_at_k
from reco_utils.evaluation.python_evaluation import rmse, mae, rsquared, exp_var

print("System version: {}".format(sys.version))
print("Pandas version: {}".format(pd.__version__))
print("Fast AI version: {}".format(fastai.__version__))
print("Torch version: {}".format(torch.__version__))
print("Cuda Available: {}".format(torch.cuda.is_available()))
print("CuDNN Enabled: {}".format(torch.backends.cudnn.enabled))

System version: 3.6.0 | packaged by conda-forge | (default, Feb  9 2017, 14:36:55) 
[GCC 4.8.2 20140120 (Red Hat 4.8.2-15)]
Pandas version: 0.23.4
Fast AI version: 1.0.39
Torch version: 1.0.0
Cuda Available: True
CuDNN Enabled: True


Define some constants for the column names of our dataset.

In [2]:
USER,ITEM,RATING,TIMESTAMP,PREDICTION,TITLE = 'UserId','MovieId','Rating','Timestamp','Prediction','Title'

In [3]:
# top k items to recommend
TOP_K = 10

# Select Movielens data size: 100k, 1m, 10m, or 20m
MOVIELENS_DATA_SIZE = '100k'

In [4]:
ratings_df = movielens.load_pandas_df(
    size=MOVIELENS_DATA_SIZE,
    header=[USER,ITEM,RATING,TIMESTAMP]
)

ratings_df.head()

# make sure the IDs are loaded as strings to better prevent confusion with embedding ids
ratings_df[USER] = ratings_df[USER].astype('str')
ratings_df[ITEM] = ratings_df[ITEM].astype('str')

## Training

In [5]:
# fix random seeds to make sure our runs are reproducible
np.random.seed(101)
torch.manual_seed(101)
torch.cuda.manual_seed_all(101)

In [6]:
start_time = time.time()

data = CollabDataBunch.from_df(ratings_df, pct_val=0.25, user_name=USER, item_name=ITEM, rating_name=RATING)

preprocess_time = time.time() - start_time

In [7]:
data.show_batch()

UserId,MovieId,target
785,423,2.0
110,230,3.0
72,553,5.0
871,202,4.0
268,558,3.0


Now we create a `collab_learner` for the data, using 40 latent factors. This creates one embedding for the users and one for the items, each mapping an ID to a vector of 40 floats, as can be seen below. Note that the embedding parameters are not predefined; they are learned by the model.

Although ratings only range from 1 to 5, we set the range of possible predictions to 0 to 5.5. The extra headroom lets the model predict values close to 1 and 5, which improves accuracy. Lastly, we set a weight-decay value for regularization.

In [8]:
learn = collab_learner(data, n_factors=40, y_range=[0,5.5], wd=1e-1)
learn.model

EmbeddingDotBias(
  (u_weight): Embedding(944, 40)
  (i_weight): Embedding(1628, 40)
  (u_bias): Embedding(944, 1)
  (i_bias): Embedding(1628, 1)
)
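
To make the role of `y_range` more concrete, here is a minimal sketch in plain Python of how a dot-product-with-bias model can turn a user vector and an item vector into a rating. It assumes the raw score is squashed into the range with a scaled sigmoid, which is my understanding of what `EmbeddingDotBias` does. The function name `predict_rating` and the toy vectors are hypothetical and not part of fastai.

```python
import math

def predict_rating(user_vec, item_vec, user_bias, item_bias, y_range=(0.0, 5.5)):
    """Dot product of the latent factors plus both biases, squashed into y_range."""
    raw = sum(u * i for u, i in zip(user_vec, item_vec)) + user_bias + item_bias
    low, high = y_range
    # Scaled sigmoid: the bounds themselves are only approached asymptotically.
    return low + (high - low) / (1.0 + math.exp(-raw))

# A raw score of 0 lands exactly in the middle of the range.
print(predict_rating([0.0, 0.0], [0.0, 0.0], 0.0, 0.0))  # 2.75

# Toy 2-factor example (hypothetical values); with the upper bound at 5.5,
# a moderate raw score is enough to get close to a rating of 5.
print(predict_rating([1.0, 1.0], [1.0, 1.0], 0.2, 0.1))
```

Under this assumption, a range of exactly 1 to 5 would never let the model output a 1 or a 5, because a sigmoid reaches its limits only for infinitely large scores. Widening the range moves the real rating values into the reachable part of the curve.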

Now train the model for 5 epochs with `fit_one_cycle`, setting the maximal learning rate. Under the one-cycle policy, the learner first warms the learning rate up to this maximum and then reduces it again using cosine annealing.

In [9]:
start_time = time.time()

learn.fit_one_cycle(5, max_lr=5e-3)

train_time = time.time() - start_time + preprocess_time
print("Took {} seconds for training.".format(train_time))

epoch,train_loss,valid_loss
1,0.933093,0.952120
2,0.870814,0.879554
3,0.744948,0.835027
4,0.646840,0.816982
5,0.548141,0.816484


Took 31.75955605506897 seconds for training.
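
The shape of a one-cycle learning-rate schedule can be sketched in a few lines of plain Python. This is an illustration, not fastai's implementation: the helper names are hypothetical, and the values `pct_start=0.3`, `div_factor=25` and the final divisor of `1e4` are assumptions based on my recollection of the fastai v1 defaults.

```python
import math

def cosine_anneal(start, end, pct):
    """Move from start to end along half a cosine wave as pct goes from 0 to 1."""
    return end + (start - end) / 2.0 * (math.cos(math.pi * pct) + 1.0)

def one_cycle_lr(step, total_steps, max_lr=5e-3, pct_start=0.3, div_factor=25.0):
    """Learning rate at a given step: warm up to max_lr, then anneal back down."""
    warmup_steps = int(total_steps * pct_start)
    low_lr = max_lr / div_factor        # assumed starting learning rate
    final_lr = low_lr / 1e4             # assumed final learning rate
    if step < warmup_steps:
        return cosine_anneal(low_lr, max_lr, step / warmup_steps)
    pct = (step - warmup_steps) / (total_steps - warmup_steps)
    return cosine_anneal(max_lr, final_lr, pct)

# Print the learning rate at a few points of a 100-step run.
for step in (0, 15, 30, 65, 99):
    print(step, one_cycle_lr(step, total_steps=100))
```

In this sketch the learning rate peaks at `max_lr` after 30% of the steps and is far below its starting value by the end, which matches the slowing improvement of `valid_loss` in the last epochs.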


## Generating Recommendations

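Define two helper functions: `cartesian_product` builds every combination of the given arrays, and `score` predicts a rating for each user/item pair and optionally keeps only the top k items per user.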

In [10]:
def cartesian_product(*arrays):
    la = len(arrays)
    dtype = np.result_type(*arrays)
    arr = np.empty([len(a) for a in arrays] + [la], dtype=dtype)
    for i, a in enumerate(np.ix_(*arrays)):
        arr[...,i] = a
    return arr.reshape(-1, la)  

def score(learner, userIds, movieIds, user_col, item_col, prediction_col, top_k=0):
    """score all users+movies provided and reduce to top_k items per user if top_k>0"""
    u = learner.get_idx(userIds, is_item=False)
    m = learner.get_idx(movieIds, is_item=True)
    
    pred = learner.model.forward(u, m)
    scores = pd.DataFrame({user_col: userIds, item_col:movieIds, prediction_col:pred})
    scores =  scores.sort_values([user_col,prediction_col],ascending=[True,False])
    if top_k > 0:
        top_scores = scores.groupby(user_col).head(top_k).reset_index(drop=True)
    else:
        top_scores = scores
    return top_scores

Get the validation and training sets from the learner's data bunch:

In [11]:
valid_df = pd.DataFrame({USER:[row.classes[USER][row.cats[0]] for row in learn.data.valid_ds.x], 
                        ITEM:[row.classes[ITEM][row.cats[1]] for row in learn.data.valid_ds.x], 
                        RATING: [row.obj for row in data.valid_ds.y]})

train_df = pd.DataFrame({USER:[row.classes[USER][row.cats[0]] for row in learn.data.train_ds.x], 
                        ITEM:[row.classes[ITEM][row.cats[1]] for row in learn.data.train_ds.x], 
                        RATING: [row.obj for row in data.train_ds.y]})

Get all users from the validation set and all items known to the data bunch. The first entry of the item classes is skipped because it is the placeholder for unknown IDs.

In [12]:
valid_users = valid_df[USER].unique()
_, total_items = data.classes.values()
total_items = np.array(total_items[1:])

Build the Cartesian product of users and items so that every item is scored for every user.


In [13]:
users_items = cartesian_product(np.array(valid_users),np.array(total_items))
users_items = pd.DataFrame(users_items, columns=[USER,ITEM])
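
For small inputs, the same rows can be produced with `itertools.product` from the standard library. The sketch below uses made-up IDs and is only meant to show what the resulting table looks like. The NumPy helper above is the one used in this notebook because it is vectorized.

```python
from itertools import product

users = ["1", "2"]            # hypothetical user IDs
items = ["10", "20", "30"]    # hypothetical item IDs

# Every user is paired with every item; the user varies slowest.
pairs = list(product(users, items))

print(len(pairs))   # 6 = 2 users x 3 items
print(pairs[:3])    # [('1', '10'), ('1', '20'), ('1', '30')]
```

The number of rows is the number of users times the number of items, which is why scoring happens on well over a million pairs even for the 100k dataset.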


Lastly, remove the user/item combinations that are in the training set, since we don't want to recommend a movie that the user has already rated.

In [14]:
training_removed = pd.concat([users_items, train_df[[USER,ITEM]], train_df[[USER,ITEM]]]).drop_duplicates(keep=False)
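
The cell above concatenates the training pairs twice before calling `drop_duplicates(keep=False)`. The reason can be shown with a small stand-in written in plain Python: the helper `keep_unique` is hypothetical and mimics `keep=False` by keeping only rows that occur exactly once, and the pairs are made up. The sketch assumes the candidate pairs contain no duplicates of their own, which holds for a Cartesian product of unique IDs.

```python
from collections import Counter

candidates = [("1", "10"), ("1", "20"), ("2", "10")]   # hypothetical pairs to score
train = [("1", "10"), ("3", "30")]                     # hypothetical training pairs

def keep_unique(rows):
    """Mimic drop_duplicates(keep=False): keep only rows that occur exactly once."""
    counts = Counter(rows)
    return [row for row, n in counts.items() if n == 1]

# Adding the training pairs once leaves ('3', '30') behind, because it
# occurs only in the training data and therefore only once overall.
print(keep_unique(candidates + train))          # [('1', '20'), ('2', '10'), ('3', '30')]

# Adding them twice guarantees every training pair occurs at least twice.
print(keep_unique(candidates + train + train))  # [('1', '20'), ('2', '10')]
```

With the double concatenation, only candidate pairs that never appear in the training set survive.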

### Score the model to find the top K recommendations

In [15]:
start_time = time.time()

top_k_scores = score(learn, training_removed[USER], training_removed[ITEM], 
                     user_col=USER, item_col=ITEM, prediction_col=PREDICTION, top_k=TOP_K)

test_time = time.time() - start_time
print("Took {} seconds for {} predictions.".format(test_time, len(training_removed)))

Took 1.7312288284301758 seconds for 1459261 predictions.


Calculate some ranking metrics for our model:

In [16]:
eval_map = map_at_k(valid_df, top_k_scores, col_user=USER, col_item=ITEM, 
                    col_rating=RATING, col_prediction=PREDICTION, 
                    relevancy_method="top_k", k=TOP_K)

In [17]:
eval_ndcg = ndcg_at_k(valid_df, top_k_scores, col_user=USER, col_item=ITEM, 
                      col_rating=RATING, col_prediction=PREDICTION, 
                      relevancy_method="top_k", k=TOP_K)

In [18]:
eval_precision = precision_at_k(valid_df, top_k_scores, col_user=USER, col_item=ITEM, 
                                col_rating=RATING, col_prediction=PREDICTION, 
                                relevancy_method="top_k", k=TOP_K)

In [19]:
eval_recall = recall_at_k(valid_df, top_k_scores, col_user=USER, col_item=ITEM, 
                          col_rating=RATING, col_prediction=PREDICTION, 
                          relevancy_method="top_k", k=TOP_K)

In [20]:
print("Model:\t" + learn.__class__.__name__,
      "Top K:\t%d" % TOP_K,
      "MAP:\t%f" % eval_map,
      "NDCG:\t%f" % eval_ndcg,
      "Precision@K:\t%f" % eval_precision,
      "Recall@K:\t%f" % eval_recall, sep='\n')

Model:	CollabLearner
Top K:	10
MAP:	0.024682
NDCG:	0.153864
Precision@K:	0.139236
Recall@K:	0.054513
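
To give a feel for what Precision@K and Recall@K measure, here is a simplified single-user version in plain Python. It is not the `reco_utils` implementation, which works on DataFrames and averages over all users. The function name and the example IDs are hypothetical.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision@k and recall@k for one user.

    recommended: item IDs ordered by predicted score, best first
    relevant:    set of item IDs the user actually interacted with
    """
    top_k = recommended[:k]
    hits = len(set(top_k) & set(relevant))
    precision = hits / k                                  # share of the top k that was relevant
    recall = hits / len(relevant) if relevant else 0.0    # share of relevant items that was found
    return precision, recall

recommended = ["10", "20", "30", "40", "50"]
relevant = {"20", "50", "70", "80"}

p, r = precision_recall_at_k(recommended, relevant, k=5)
print(p, r)  # 0.4 0.5
```

Two of the five recommendations are relevant, so precision is 0.4. Two of the four relevant items were found, so recall is 0.5.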


The above numbers are lower than those of SAR, but that is expected, since the model is explicitly trying to generalize the users and items to the latent factors. Next, we look at how well the model predicts the rating a user would give a movie. We need to score `training_removed` again, but this time we keep all predictions instead of asking for the top k. The evaluation functions then merge the validation set, which contains the true ratings, with the predicted values.

In [21]:
scores = score(learn, training_removed[USER], training_removed[ITEM], 
               user_col=USER, item_col=ITEM, prediction_col=PREDICTION)

Now calculate some regression metrics

In [22]:
eval_r2 = rsquared(valid_df, scores, col_user=USER, col_item=ITEM, col_rating=RATING, col_prediction=PREDICTION)
eval_rmse = rmse(valid_df, scores, col_user=USER, col_item=ITEM, col_rating=RATING, col_prediction=PREDICTION)
eval_mae = mae(valid_df, scores, col_user=USER, col_item=ITEM, col_rating=RATING, col_prediction=PREDICTION)
eval_exp_var = exp_var(valid_df, scores, col_user=USER, col_item=ITEM, col_rating=RATING, col_prediction=PREDICTION)

pd.DataFrame([
    ["RMSE",eval_rmse],
    ["MAE",eval_mae],
    ["R2",eval_r2],
    ["ExpVar",eval_exp_var]], 
    columns=['Metric', 'Value'])

Metric,Value
RMSE,0.902846
MAE,0.713283
R2,0.363257
ExpVar,0.363808


That RMSE is actually quite good when compared to these benchmarks: https://www.librec.net/release/v1.3/example.html
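
As a reminder of how the two error metrics differ, here is a minimal sketch of RMSE and MAE in plain Python. The helper `rmse_mae` and the toy ratings are hypothetical. The `reco_utils` functions used above first join the true and predicted ratings on user and item before computing the metrics.

```python
import math

def rmse_mae(actual, predicted):
    """Root mean squared error and mean absolute error of paired ratings."""
    errors = [p - a for a, p in zip(actual, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    return rmse, mae

actual = [4.0, 3.0, 5.0, 1.0]       # made-up true ratings
predicted = [3.0, 3.0, 4.0, 3.0]    # made-up predictions

rmse_value, mae_value = rmse_mae(actual, predicted)
print(rmse_value, mae_value)
```

Because the errors are squared before averaging, the single prediction that is off by 2 weighs more heavily in RMSE than in MAE. That is also why RMSE is never smaller than MAE, as in the table above.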

In [23]:
# Record results with papermill for tests
pm.record("map", eval_map)
pm.record("ndcg", eval_ndcg)
pm.record("precision", eval_precision)
pm.record("recall", eval_recall)
pm.record("rmse", eval_rmse)
pm.record("mae", eval_mae)
pm.record("exp_var", eval_exp_var)
pm.record("rsquared", eval_r2)
pm.record("train_time", train_time)
pm.record("test_time", test_time)