<i>Copyright (c) Microsoft Corporation. All rights reserved.</i>

<i>Licensed under the MIT License.</i>

# FBT Single Node on MovieLens (Python, CPU)

FBT (Frequently Bought together) recommender can be thought of as a even simpler restriction of the SAR (Simple Algorithm for Recommendation). Like SAR, FBT is a fast and scalable algorithm for personalized recommendations based on user transaction history. SAR leverages user ratings of items and timestamp information of when user rated an item to produce easily explainable and interpretable recommendations. But there are many scenarios where we may not have reliable rating information or timestamps. All we have is user interactions with items and we need a simple recommendation engine that can leverage this interaction information without regard to context or quality of interaction or when in history did this interaction happen.

This is where we can leverage FBT. Like SAR, FBT recommends items that are most ***similar*** to the ones that the user already has an existing ***affinity*** for. Two items are ***similar*** if the users that interacted with one item are also likely to have interacted with the other. Unlike SAR though, user ***affinity*** to an item is simply binary - 1 if the user has interacted with an item in the past, 0 otherwise.

### Advantages of FBT:
- A simple first algorithm to implement when all you have is users and items and no more information. Covers a broad range of customer scenarios.
- High accuracy for an easy to train and deploy algorithm
- Fast training and scoring, only requiring simple counting to construct matrices used at prediction time.

### Notes to use FBT properly:
- Since FBT uses very little information, recommendations will likely not have more context than historical interactions. If we can leverage useful information from item or user features, more sohisticated algorithms will have an edge in performance.

- It's memory-hungry, requiring the creation of an $mxm$ sparse square matrix (where $m$ is the number of items). This can also be a problem for many matrix factorization algorithms.
- FBT does not need ratings information, hence we can't predict ratings either. Evaluation can best happen with user studies. We can still look at offline evaluation methods like Precision@K, Recall@K.

This notebook provides an example of how to utilize and evaluate FBT in Python on a CPU.

# 0 Global Settings and Imports

In [13]:
%load_ext autoreload
%autoreload 2

# set the environment path to find Recommenders
import sys
sys.path.append("../../")

import logging
import numpy as np
import pandas as pd
import scrapbook as sb
from sklearn.preprocessing import minmax_scale

from reco_utils.common.python_utils import binarize
from reco_utils.common.timer import Timer
from reco_utils.dataset import movielens
from reco_utils.dataset.python_splitters import python_stratified_split
from reco_utils.evaluation.python_evaluation import (
    map_at_k,
    ndcg_at_k,
    precision_at_k,
    recall_at_k,
    rmse,
    mae,
    logloss,
    rsquared,
    exp_var,
    get_top_k_items
)
from reco_utils.recommender.fbt.fbt import FBT

print("System version: {}".format(sys.version))
print("Pandas version: {}".format(pd.__version__))

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
System version: 3.6.11 | packaged by conda-forge | (default, Nov 27 2020, 18:51:43) 
[GCC Clang 11.0.0]
Pandas version: 1.1.5


# 1 Load Data

FBT is intended to be used on a very simple schema: `<User ID>, <Item ID>.` Each row represents a single interaction between a user and an item. These interactions might be different types of events on an e-commerce website, such as a user clicking to view an item, adding it to a shopping basket, following a recommendation link, and so on. 

The MovieLens dataset is well formatted interactions of Users providing Ratings to Movies (movie ratings are used as the event weight). We will swap out the movielens ratings information with a dummy rating of 1.0 (if user interacted with an item) in order to leverage SAR signature and to showcase simple FBT with only users and items. 

In [14]:
# top k items to recommend
TOP_K = 10

# Select MovieLens data size: 100k, 1m, 10m, or 20m
MOVIELENS_DATA_SIZE = '100k'

### 1.1 Download and use the MovieLens Dataset

In [15]:
data = movielens.load_pandas_df(
    size=MOVIELENS_DATA_SIZE,
    header=('UserID', 'ItemID'),
    title_col="Title"
)
data['rating'] = 1
data.head()

2021-05-06 22:41:26,844 DEBUG    Starting new HTTP connection (1): files.grouplens.org:80
2021-05-06 22:41:27,060 DEBUG    http://files.grouplens.org:80 "GET /datasets/movielens/ml-100k.zip HTTP/1.1" 200 4924029
100%|██████████| 4.81k/4.81k [00:01<00:00, 3.27kKB/s]


Unnamed: 0,UserID,ItemID,Title,rating
0,196,242,Kolya (1996),1
1,63,242,Kolya (1996),1
2,226,242,Kolya (1996),1
3,154,242,Kolya (1996),1
4,306,242,Kolya (1996),1


### 1.2 Split the data using the python random splitter provided in utilities:

We split the full dataset into a `train` and `test` dataset to evaluate performance of the algorithm against a held-out set not seen during training. Because FBT generates recommendations based on user preferences, all users that are in the test set must also exist in the training set. For this case, we can use the provided `python_stratified_split` function which holds out a percentage (in this case 25%) of items from each user, but ensures all users are in both `train` and `test` datasets. Other options are available in the `dataset.python_splitters` module which provide more control over how the split occurs.

In [16]:
train, test = python_stratified_split(data, ratio=0.75, col_user='UserID', col_item='ItemID', seed=42)

In [17]:
print("""
Train:
Total Ratings: {train_total}
Unique Users: {train_users}
Unique Items: {train_items}

Test:
Total Ratings: {test_total}
Unique Users: {test_users}
Unique Items: {test_items}
""".format(
    train_total=len(train),
    train_users=len(train['UserID'].unique()),
    train_items=len(train['ItemID'].unique()),
    test_total=len(test),
    test_users=len(test['UserID'].unique()),
    test_items=len(test['ItemID'].unique()),
))


Train:
Total Ratings: 74992
Unique Users: 943
Unique Items: 1601

Test:
Total Ratings: 25008
Unique Users: 943
Unique Items: 1532



# 2 Train the FBT Model

### 2.1 Instantiate the SAR algorithm and set the index

We will leverage the single node implementation of SAR and specify the column names to match our dataset. Since we don't need timestamps, we will not use it and turn off timestamp related flags. (`timedecay_formula = False`) We will also switch `normalize = False` which is associated to renormalizing ratings predictions obtained form user affinity computation to an appropriate scale.

Other options are specified to control the behavior of the algorithm as described in the [deep dive notebook](../02_model_collaborative_filtering/sar_deep_dive.ipynb).

In [18]:
logging.basicConfig(level=logging.DEBUG, 
                    format='%(asctime)s %(levelname)-8s %(message)s')

model = FBT(
    col_user="UserID",
    col_item="ItemID",
    col_prediction="score",
    col_rating="rating"
)

### 2.2 Train the FBT model on our training data, and get the top-k recommendations for our testing data

To train (fit) a model, FBT computes an item-to-item ***co-occurence matrix***. Co-occurence represents the number of times two items appear together for any given user. With SAR, we have the option to rescale the co-occurence matrix to an ***item similarity matrix*** by a given metric (Jaccard similarity for example). We will not do this with our vanilla FBT example. 

SAR also computes an ***affinity matrix*** to capture the strength of the relationship between each user and each item. Affinity is driven by different types (like *rating* or *viewing* a movie). Since for our example, any interaction is Boolean, the affinity matrix is identity.

Recommendations are achieved by computing a cooccurrence score per user and course and filtering the ***top-k*** results for each user in the `recommend_k_items` function seen below.

In [19]:
with Timer() as train_time:
    model.fit(train)

print("Took {} seconds for training.".format(train_time.interval))

2021-05-06 22:41:35,000 INFO     De-duplicating the user-item counts
2021-05-06 22:41:35,007 INFO     Check dataframe is of the type, schema we expect
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  .rank('dense', ascending=False)
2021-05-06 22:41:37,374 INFO     Done training
Took 2.3939198610023595 seconds for training.


In [43]:
with Timer() as test_time:
    preds = model.recommend_k_items(test, top_k=20, remove_seen=False)

print("Took {} seconds for prediction.".format(test_time.interval))

2021-05-06 22:55:03,694 INFO     Calculating recommendation scores
Took 4.739804231001472 seconds for prediction.


In [44]:
preds.head()

Unnamed: 0,UserID,ItemID,score,rank
0,1,1,56.985294,4
6,1,7,45.264706,20
49,1,50,66.735294,1
55,1,56,50.283582,11
68,1,69,47.925373,14


In [45]:
preds_with_titles = (
    preds.merge(data[['ItemID', 'Title']].drop_duplicates().set_index('ItemID'), 
               on='ItemID',
               how='inner')
         .sort_values(by=['UserID', 'score'], ascending=False)
         .rename(columns={'Title': 'Item_title_recommended', 'ItemID_paired': 'ItemID_recommended'})
)
        
display(preds_with_titles.head(10))

Unnamed: 0,UserID,ItemID,score,rank,Item_title_recommended
2226,943,50,81.170732,1,Star Wars (1977)
9583,943,174,73.731707,2,Raiders of the Lost Ark (1981)
10526,943,181,72.785714,3,Return of the Jedi (1983)
11606,943,210,70.097561,4,Indiana Jones and the Last Crusade (1989)
4962,943,98,69.512195,5,"Silence of the Lambs, The (1991)"
939,943,1,69.341463,6,Toy Story (1995)
8330,943,172,68.926829,7,"Empire Strikes Back, The (1980)"
6745,943,121,66.073171,8,Independence Day (ID4) (1996)
4119,943,79,65.560976,9,"Fugitive, The (1993)"
12484,943,222,64.414634,10,Star Trek: First Contact (1996)


### 2.3. Evaluate how well FBT performs

We evaluate how well SAR performs for a few common ranking metrics provided in the `python_evaluation` module in reco_utils. We will consider the Mean Average Precision (MAP), Normalized Discounted Cumalative Gain (NDCG), Precision, and Recall for the top-k items per user we computed with SAR. User, item and rating column names are specified in each evaluation method.

In [46]:
test

Unnamed: 0,UserID,ItemID,Title,rating
60240,1,49,I.Q. (1994),1
30162,1,69,Forrest Gump (1994),1
72658,1,221,Breaking the Waves (1996),1
11920,1,5,Copycat (1995),1
85977,1,139,"Love Bug, The (1969)",1
...,...,...,...,...
43302,943,402,Ghost (1990),1
59385,943,739,Pretty Woman (1990),1
10424,943,219,"Nightmare on Elm Street, A (1984)",1
53633,943,184,Army of Darkness (1993),1


In [47]:
test

Unnamed: 0,UserID,ItemID,Title,rating
60240,1,49,I.Q. (1994),1
30162,1,69,Forrest Gump (1994),1
72658,1,221,Breaking the Waves (1996),1
11920,1,5,Copycat (1995),1
85977,1,139,"Love Bug, The (1969)",1
...,...,...,...,...
43302,943,402,Ghost (1990),1
59385,943,739,Pretty Woman (1990),1
10424,943,219,"Nightmare on Elm Street, A (1984)",1
53633,943,184,Army of Darkness (1993),1


In [48]:
preds

Unnamed: 0,UserID,ItemID,score,rank
0,1,1,56.985294,4
6,1,7,45.264706,20
49,1,50,66.735294,1
55,1,56,50.283582,11
68,1,69,47.925373,14
...,...,...,...,...
1487308,943,195,56.780488,20
1487317,943,204,59.780488,15
1487323,943,210,70.097561,4
1487335,943,222,64.414634,10


In [49]:
eval_map_k = map_at_k(test, preds, col_user='UserID', col_item='ItemID', col_prediction='score',k=TOP_K)
eval_map_k

0.011704675310227435

In [50]:
eval_ndcg = ndcg_at_k(test, preds, col_user='UserID', col_item='ItemID', col_prediction='score', k=TOP_K)
eval_ndcg

0.08749225059640534

In [51]:
eval_precision = precision_at_k(test, preds, col_user='UserID', col_item='ItemID', col_prediction='score', k=TOP_K)
eval_precision

0.09141039236479323

In [52]:
eval_recall = recall_at_k(test, preds, col_user='UserID', col_item='ItemID', col_prediction='score', k=TOP_K)
eval_recall

0.04438263793463333

In [134]:
eval_rmse = rmse(test_with_rating, top_k, col_user='UserID', col_item='ItemID', col_prediction='score')
eval_rmse

78.67274761249735

In [135]:
eval_mae = mae(test_with_rating, top_k, col_user='UserID', col_item='ItemID', col_prediction='score')
eval_mae

74.87890428926127

In [136]:
print("Model:\t",
      "Top K:\t%d" % TOP_K,
      "MAP:\t%f" % eval_map_k,
      "NDCG:\t%f" % eval_ndcg,
      "Precision@K:\t%f" % eval_precision,
      "Recall@K:\t%f" % eval_recall,
      "RMSE:\t%f" % eval_rmse,
      "MAE:\t%f" % eval_mae,
      sep='\n')

Model:	
Top K:	10
MAP:	0.010292
NDCG:	0.080831
Precision@K:	0.089502
Recall@K:	0.038034
RMSE:	78.672748
MAE:	74.878904


In [65]:
# Now let's look at the results for a specific user
user_id = 876

ground_truth = test[test['UserID']==user_id].sort_values(by='score', ascending=False)[:TOP_K]
prediction = model.recommend_k_items(pd.DataFrame(dict(userID=[user_id])), remove_seen=True) 
test_user_movie_watched_prediction = (
    pd.merge(ground_truth, prediction, on=['userID', 'itemID'], how='left')
      .drop(columns=['rating'])
)
display(test_user_movie_watched_prediction.head())

2021-04-28 17:06:43,348 INFO     Calculating recommendation scores
2021-04-28 17:06:43,369 INFO     Removing seen items


Unnamed: 0,userID,itemID,Title,prediction
0,876,604,It Happened One Night (1934),
1,876,187,"Godfather: Part II, The (1974)",
2,876,523,Cool Hand Luke (1967),
3,876,878,That Darn Cat! (1997),
4,876,286,"English Patient, The (1996)",1304.0


Above, we see that one of the movies from the test set was recovered by the model's top-k recommendations, however the others were not. Offline evaluations are difficult as they can only use what was seen previously in the test set and may not represent the user's actual preferences across the entire set of items. Adjustments to how the data is split, algorithm is used and hyper-parameters can improve the results here. 

In [47]:
# Record results with papermill for tests - ignore this cell
sb.glue("map", eval_map)
sb.glue("ndcg", eval_ndcg)
sb.glue("precision", eval_precision)
sb.glue("recall", eval_recall)
sb.glue("train_time", train_time.interval)
sb.glue("test_time", test_time.interval)