<i>Copyright (c) Microsoft Corporation. All rights reserved.</i>

<i>Licensed under the MIT License.</i>

# FBT Single Node on MovieLens (Python, CPU)

The FBT (Frequently Bought Together) recommender can be thought of as an even simpler restriction of SAR (Simple Algorithm for Recommendation). Like SAR, FBT is a fast and scalable algorithm for personalized recommendations based on user transaction history. SAR uses user ratings of items, plus timestamps of when each user rated an item, to produce easily explainable and interpretable recommendations. In many scenarios, however, we do not have reliable rating information or timestamps. All we have are user interactions with items. What we need then is a simple recommendation engine that uses this interaction information regardless of the context or quality of an interaction, or of when it happened.

This is where we can leverage FBT. Like SAR, FBT recommends items that are most ***similar*** to the ones that the user already has an existing ***affinity*** for. Two items are ***similar*** if the users who interacted with one item are also likely to have interacted with the other. Unlike SAR, though, user ***affinity*** to an item is simply binary: 1 if the user has interacted with the item in the past, 0 otherwise.
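The binary affinity described above can be illustrated with a minimal NumPy sketch. It builds a user-item matrix from a toy interaction log. The log and the helper name `binary_affinity` are hypothetical and not part of reco_utils. Note that repeated interactions with the same item still produce a single 1.

```python
import numpy as np

def binary_affinity(interactions, n_users, n_items):
    """Hypothetical helper: build a binary user-item affinity matrix.

    interactions: list of (user_index, item_index) pairs; duplicates are allowed.
    """
    affinity = np.zeros((n_users, n_items), dtype=np.float32)
    users, items = zip(*interactions)
    # Assigning (not adding) 1.0 means repeated interactions still count only once.
    affinity[list(users), list(items)] = 1.0
    return affinity

# Toy log: user 0 interacted with item 1 twice, which still yields a single 1.
log = [(0, 0), (0, 1), (0, 1), (1, 1), (1, 2), (2, 0)]
A = binary_affinity(log, n_users=3, n_items=3)
print(A)
```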

### Advantages of FBT:
- A simple first algorithm to try when all you have is users and items and no other information; it covers a broad range of customer scenarios.
- High accuracy for an algorithm that is easy to train and deploy.
- Fast training and scoring, requiring only simple counting to construct the matrices used at prediction time.

### Notes to use FBT properly:
- Since FBT uses very little information, its recommendations carry no more context than the historical interactions themselves. If useful item or user features are available, more sophisticated algorithms that exploit them will likely have an edge in performance.
- It is memory-hungry, requiring the creation of an $m \times m$ sparse square matrix (where $m$ is the number of items). This can also be a problem for many matrix factorization algorithms.
- FBT does not use rating information, so it cannot predict ratings either. Evaluation is best done with user studies, although we can still use offline ranking metrics such as Precision@K and Recall@K.
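To make the memory concern concrete, here is a back-of-the-envelope sketch (not part of the library). It computes the memory a *dense* $m \times m$ float64 matrix would need. The sparse matrix actually built stores only nonzero entries, so its real footprint depends on how many item pairs co-occur; the dense size is an upper bound. The item counts are illustrative. 1,601 matches the number of unique training items reported later in this notebook, and 100,000 is a hypothetical catalog size.

```python
def dense_item_matrix_bytes(n_items, bytes_per_value=8):
    """Bytes needed by a dense n_items x n_items matrix (float64 by default)."""
    return n_items * n_items * bytes_per_value

for m in (1_601, 100_000):
    gb = dense_item_matrix_bytes(m) / 1e9
    print(f"m = {m:>7,}: {gb:,.3f} GB as dense float64")
```

The cost grows quadratically with the number of items, which is why catalog size, not user count, usually drives memory use here.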

This notebook provides an example of how to utilize and evaluate FBT in Python on a CPU.

# 0 Global Settings and Imports

```python
%load_ext autoreload
%autoreload 2

# set the environment path to find Recommenders
import sys
sys.path.append("../../")

import logging
import numpy as np
import pandas as pd
import scrapbook as sb
from sklearn.preprocessing import minmax_scale

from reco_utils.common.python_utils import binarize
from reco_utils.common.timer import Timer
from reco_utils.dataset import movielens
from reco_utils.dataset.python_splitters import python_stratified_split
from reco_utils.evaluation.python_evaluation import (
    map_at_k,
    ndcg_at_k,
    precision_at_k,
    recall_at_k,
    rmse,
    mae,
    logloss,
    rsquared,
    exp_var
)
from reco_utils.recommender.sar import SAR

print("System version: {}".format(sys.version))
print("Pandas version: {}".format(pd.__version__))
```

```
System version: 3.6.11 | packaged by conda-forge | (default, Nov 27 2020, 18:51:43) 
[GCC Clang 11.0.0]
Pandas version: 1.1.5
```


# 1 Load Data

FBT is intended to be used on a very simple schema: `<User ID>, <Item ID>`. Each row represents a single interaction between a user and an item. These interactions might be different types of events on an e-commerce website, such as a user clicking to view an item, adding it to a shopping basket, or following a recommendation link.
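Here is a minimal pandas sketch of reducing a richer event log to this two-column schema. The event log and its `event` column are hypothetical. The idea is simply to drop the event type and keep each user-item pair once, since FBT treats every kind of interaction the same way.

```python
import pandas as pd

# Hypothetical e-commerce event log with several event types per user-item pair.
events = pd.DataFrame({
    "userID": [1, 1, 1, 2, 2],
    "itemID": [10, 10, 20, 10, 30],
    "event": ["view", "add_to_basket", "view", "click_recommendation", "view"],
})

# FBT ignores what kind of interaction happened: keep one row per (user, item) pair.
interactions = (
    events[["userID", "itemID"]]
    .drop_duplicates()
    .reset_index(drop=True)
)
print(interactions)
```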

The MovieLens dataset consists of well-formatted interactions of users rating movies (SAR uses these ratings as event weights). We replace the MovieLens ratings with a dummy rating of 1.0 for every user-item interaction. This lets us reuse SAR's interface, which expects a rating column, while showcasing a simple FBT that relies only on users and items.

```python
# top k items to recommend
TOP_K = 10

# Select MovieLens data size: 100k, 1m, 10m, or 20m
MOVIELENS_DATA_SIZE = '100k'
```

### 1.1 Download and use the MovieLens Dataset

```python
data = movielens.load_pandas_df(
    size=MOVIELENS_DATA_SIZE,
    header=('userID', 'itemID'),
    title_col="Title"
)

# Add a dummy rating of 1.0 for every user-item interaction
# (stored as a 32-bit float to reduce memory consumption)
data['rating'] = np.float32(1.0)
data.head()
```

| | userID | itemID | Title | rating |
|---|---|---|---|---|
| 0 | 196 | 242 | Kolya (1996) | 1.0 |
| 1 | 63 | 242 | Kolya (1996) | 1.0 |
| 2 | 226 | 242 | Kolya (1996) | 1.0 |
| 3 | 154 | 242 | Kolya (1996) | 1.0 |
| 4 | 306 | 242 | Kolya (1996) | 1.0 |


### 1.2 Split the data using the Python stratified splitter provided in the utilities

We split the full dataset into `train` and `test` sets so that we can evaluate the algorithm on held-out data not seen during training. FBT generates recommendations from the items a user has already interacted with, so every user in the test set must also exist in the training set. For this case we can use the provided `python_stratified_split` function. It holds out a percentage (here 25%) of each user's items while ensuring that all users appear in both the `train` and `test` sets. Other options in the `dataset.python_splitters` module provide more control over how the split occurs.
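For intuition, here is a simplified pandas sketch of a per-user (stratified) split. Each user's interactions are shuffled and roughly 75% go to `train`, with at least one interaction always kept in training. This is an illustrative re-implementation with a hypothetical helper name (`per_user_split`), not the code of `python_stratified_split`; that function's exact rounding and ordering may differ.

```python
import numpy as np
import pandas as pd

def per_user_split(df, ratio=0.75, col_user="userID", seed=42):
    """Hypothetical helper: hold out about (1 - ratio) of each user's rows for testing."""
    rng = np.random.default_rng(seed)
    train_parts, test_parts = [], []
    for _, group in df.groupby(col_user):
        shuffled = group.iloc[rng.permutation(len(group))]
        # Keep at least one interaction per user in train so every test user is known.
        n_train = max(1, int(round(ratio * len(shuffled))))
        train_parts.append(shuffled.iloc[:n_train])
        test_parts.append(shuffled.iloc[n_train:])
    return pd.concat(train_parts), pd.concat(test_parts)

toy = pd.DataFrame({
    "userID": [1, 1, 1, 1, 2, 2, 2, 2],
    "itemID": [10, 11, 12, 13, 10, 14, 15, 16],
})
train_toy, test_toy = per_user_split(toy)
print(len(train_toy), len(test_toy))
```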

```python
train, test = python_stratified_split(data, ratio=0.75, col_user='userID', col_item='itemID', seed=42)
```

```python
print("""
Train:
Total Ratings: {train_total}
Unique Users: {train_users}
Unique Items: {train_items}

Test:
Total Ratings: {test_total}
Unique Users: {test_users}
Unique Items: {test_items}
""".format(
    train_total=len(train),
    train_users=len(train['userID'].unique()),
    train_items=len(train['itemID'].unique()),
    test_total=len(test),
    test_users=len(test['userID'].unique()),
    test_items=len(test['itemID'].unique()),
))
```

```
Train:
Total Ratings: 74992
Unique Users: 943
Unique Items: 1601

Test:
Total Ratings: 25008
Unique Users: 943
Unique Items: 1532
```



# 2 Train the FBT Model

### 2.1 Instantiate the SAR algorithm and set the index

We use the single-node implementation of SAR and specify the column names to match our dataset. Our data has no timestamps, so we turn off the timestamp-related option (`timedecay_formula=False`). We also set `normalize=False`. That option rescales the scores obtained from the user affinity computation back to the scale of the original ratings, which is unnecessary here because every rating is 1.0.

Other options are specified to control the behavior of the algorithm as described in the [deep dive notebook](../02_model_collaborative_filtering/sar_deep_dive.ipynb).

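```python
logging.basicConfig(level=logging.DEBUG, 
                    format='%(asctime)s %(levelname)-8s %(message)s')

model = SAR(
    col_user="userID",
    col_item="itemID",
    col_rating="rating",
    similarity_type="jaccard",   # rescale co-occurrence with Jaccard; "cooccurrence" keeps raw counts
    time_decay_coefficient=30,   # has no effect because time decay is turned off below
    timedecay_formula=False,     # no timestamps: do not apply time decay to affinities
    normalize=False              # do not rescale scores to the rating scale
)
```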

### 2.2 Train the FBT model on our training data, and get the top-k recommendations for our testing data

To train (fit) a model, FBT computes an item-to-item ***co-occurrence matrix***. Each entry counts the number of users who have interacted with both items of a pair. With SAR, we have the option to rescale the co-occurrence matrix into an ***item similarity matrix*** using a given metric. The model above is configured with Jaccard similarity (`similarity_type="jaccard"`), and the results below were produced with that setting. For vanilla FBT on raw co-occurrence counts, set `similarity_type="cooccurrence"` instead.
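With binary affinity, the co-occurrence matrix is $C = A^\top A$, where $A$ is the user-item matrix. Entry $C_{ij}$ counts the users who interacted with both items $i$ and $j$, and the diagonal entry $C_{ii}$ counts the users of item $i$. The NumPy sketch below computes $C$ and its Jaccard rescaling $C_{ij} / (C_{ii} + C_{jj} - C_{ij})$ on hypothetical toy data. It uses dense arrays for readability, whereas SAR itself works with sparse matrices.

```python
import numpy as np

# Hypothetical binary affinity matrix: 4 users x 3 items (1 = interacted).
A = np.array([
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
    [1, 0, 0],
], dtype=np.float64)

# Co-occurrence: C[i, j] = number of users who interacted with both item i and item j.
C = A.T @ A

# Jaccard similarity: |users(i) & users(j)| / |users(i) | users(j)|.
diag = np.diag(C)
J = C / (diag[:, None] + diag[None, :] - C)

print(C)
print(np.round(J, 3))
```

Jaccard divides by the size of the union, so a popular item no longer looks similar to everything merely because many users touched it.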

SAR also computes an ***affinity matrix*** to capture the strength of the relationship between each user and each item. In general, affinity is driven by different types of events (like *rating* or *viewing* a movie) and by when they occurred. Every interaction in our example is Boolean, so the affinity matrix is simply the binary user-item interaction matrix: 1 if the user interacted with the item, 0 otherwise.

Recommendations are produced by computing a score for every user-item pair (the user's affinity vector multiplied by the item similarity matrix). The ***top-k*** results for each user are then kept, as done by the `recommend_k_items` function below.
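Here is a minimal NumPy sketch of that scoring step on hypothetical toy data. Each user's score vector is their binary affinity row times the item similarity matrix. Items the user has already seen are masked out (the equivalent of `remove_seen=True`), and the `k` highest scores are kept. The raw co-occurrence matrix serves as the similarity here. The helper name `recommend_top_k` is an assumption, not part of reco_utils.

```python
import numpy as np

def recommend_top_k(A, S, k):
    """Hypothetical helper: top-k unseen items per user.

    A: (n_users x n_items) binary affinity matrix.
    S: (n_items x n_items) item similarity (here: co-occurrence) matrix.
    Returns an (n_users x k) array of item indices, best first.
    """
    scores = A @ S                              # sum of similarities to the user's items
    scores = np.where(A > 0, -np.inf, scores)   # remove items the user has already seen
    return np.argsort(-scores, axis=1, kind="stable")[:, :k]

A = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
], dtype=np.float64)
S = A.T @ A  # co-occurrence used as the similarity matrix

print(recommend_top_k(A, S, k=1))
```

If `k` exceeds a user's number of unseen items, this sketch would pad the result with seen (masked) items. A production implementation should cap `k` per user.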

```python
with Timer() as train_time:
    model.fit(train)

print("Took {} seconds for training.".format(train_time.interval))
```

```
2021-05-07 21:11:00,089 INFO     Collecting user affinity matrix
2021-05-07 21:11:00,091 INFO     De-duplicating the user-item counts
2021-05-07 21:11:00,097 INFO     Creating index columns
2021-05-07 21:11:00,183 INFO     Building user affinity sparse matrix
2021-05-07 21:11:00,188 INFO     Calculating item co-occurrence
2021-05-07 21:11:00,331 INFO     Calculating item similarity
2021-05-07 21:11:00,332 INFO     Using jaccard based similarity
2021-05-07 21:11:00,361 INFO     Done training
Took 0.27520909799204674 seconds for training.
```


```python
with Timer() as test_time:
    top_k = model.recommend_k_items(test, top_k=TOP_K, remove_seen=True)

print("Took {} seconds for prediction.".format(test_time.interval))
```

```
2021-05-07 21:11:53,158 INFO     Calculating recommendation scores
2021-05-07 21:11:53,183 INFO     Removing seen items
Took 0.05055569099204149 seconds for prediction.
```


```python
top_k.head()
```

| | userID | itemID | prediction |
|---|---|---|---|
| 0 | 1 | 69 | 37.659954 |
| 1 | 1 | 423 | 37.453304 |
| 2 | 1 | 403 | 37.092041 |
| 3 | 1 | 238 | 36.293091 |
| 4 | 1 | 568 | 36.273216 |


```python
top_k_with_titles = (
    top_k.join(data[['itemID', 'Title']].drop_duplicates().set_index('itemID'), 
               on='itemID', 
               how='inner')
         .sort_values(by=['userID', 'prediction'], ascending=False)
         .rename(columns={'Title': 'itemTitle'})
)

display(top_k_with_titles.head(10))
```

| | userID | itemID | prediction | itemTitle |
|---|---|---|---|---|
| 9420 | 943 | 82 | 26.735399 | Jurassic Park (1993) |
| 9421 | 943 | 403 | 26.123484 | Batman (1989) |
| 9422 | 943 | 568 | 26.019278 | Speed (1994) |
| 9423 | 943 | 423 | 25.228394 | E.T. the Extra-Terrestrial (1982) |
| 9424 | 943 | 393 | 25.134943 | Mrs. Doubtfire (1993) |
| 9425 | 943 | 11 | 24.651745 | Seven (Se7en) (1995) |
| 9426 | 943 | 89 | 24.606472 | Blade Runner (1982) |
| 9427 | 943 | 71 | 24.545057 | Lion King, The (1994) |
| 9428 | 943 | 202 | 24.437031 | Groundhog Day (1993) |
| 9429 | 943 | 95 | 24.141525 | Aladdin (1992) |


### 2.3 Evaluate how well FBT performs

We evaluate how well FBT performs on a few common ranking metrics provided in the `python_evaluation` module of reco_utils. These are Mean Average Precision (MAP), Normalized Discounted Cumulative Gain (NDCG), Precision, and Recall, computed on the top-k items per user produced above. The user, item and rating column names are specified in each evaluation call.
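As a minimal sketch of what two of these metrics measure, the plain-Python helpers below compute Precision@K and Recall@K for a single user. They take a ranked recommendation list and the set of held-out items. They are illustrative only, and their names are hypothetical (chosen so they do not shadow the reco_utils imports). The reco_utils versions also average over users and operate on DataFrames, and their edge-case handling may differ.

```python
def user_precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that are in the held-out set."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k

def user_recall_at_k(recommended, relevant, k):
    """Fraction of the held-out items that appear in the top-k recommendations."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)

# Hypothetical ranked list and held-out items for one user.
recommended = ["a", "b", "c", "d", "e"]
relevant = {"b", "e", "x"}
print(user_precision_at_k(recommended, relevant, k=5))
print(user_recall_at_k(recommended, relevant, k=5))
```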

```python
eval_map = map_at_k(test, top_k, col_user='userID', col_item='itemID', col_rating='rating', k=TOP_K)
eval_map
```

```
0.09602432512049103
```

```python
eval_ndcg = ndcg_at_k(test, top_k, col_user='userID', col_item='itemID', col_rating='rating', k=TOP_K)
eval_ndcg
```

```
0.3507570877288949
```

```python
eval_precision = precision_at_k(test, top_k, col_user='userID', col_item='itemID', col_rating='rating', k=TOP_K)
```

```python
eval_recall = recall_at_k(test, top_k, col_user='userID', col_item='itemID', col_rating='rating', k=TOP_K)
```

```python
# Rating-error metrics compare the dummy 1.0 ratings with the model's similarity-based
# scores. FBT does not predict ratings, so these values are not meaningful and are not
# reported below.
eval_rmse = rmse(test, top_k, col_user='userID', col_item='itemID', col_rating='rating')
eval_mae = mae(test, top_k, col_user='userID', col_item='itemID', col_rating='rating')
```

```python
print("Model:\t",
      "Top K:\t%d" % TOP_K,
      "MAP:\t%f" % eval_map,
      "NDCG:\t%f" % eval_ndcg,
      "Precision@K:\t%f" % eval_precision,
      "Recall@K:\t%f" % eval_recall,
      sep='\n')
```

```
Model:	
Top K:	10
MAP:	0.096024
NDCG:	0.350757
Precision@K:	0.308271
Recall@K:	0.169100
```


```python
# Now let's look at the results for a specific user
user_id = 876

# All ratings are 1.0, so this simply takes up to TOP_K held-out items for the user
ground_truth = test[test['userID']==user_id].sort_values(by='rating', ascending=False)[:TOP_K]
prediction = model.recommend_k_items(pd.DataFrame(dict(userID=[user_id])), top_k=TOP_K, remove_seen=True) 
test_user_movie_watched_prediction = (
    pd.merge(ground_truth, prediction, on=['userID', 'itemID'], how='left')
      .drop(columns=['rating'])
)
display(test_user_movie_watched_prediction.head())
```

```
2021-05-07 18:06:43,193 INFO     Calculating recommendation scores
2021-05-07 18:06:43,193 INFO     Removing seen items
```

| | userID | itemID | Title | prediction |
|---|---|---|---|---|
| 0 | 876 | 604 | It Happened One Night (1934) | NaN |
| 1 | 876 | 187 | Godfather: Part II, The (1974) | NaN |
| 2 | 876 | 523 | Cool Hand Luke (1967) | NaN |
| 3 | 876 | 878 | That Darn Cat! (1997) | NaN |
| 4 | 876 | 286 | English Patient, The (1996) | NaN |


Above, we see that one of the movies from the test set was recovered by the model's top-k recommendations, while the others were not. A missing `prediction` (NaN) means the item is not among the user's top-k recommendations. Offline evaluation is difficult because it can only use the interactions held out in the test set, and these may not represent the user's actual preferences across the entire set of items. Adjusting how the data is split, which algorithm is used, and its hyperparameters may improve the results here.
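Recovered items can also be counted programmatically instead of by reading `NaN` values by eye. In this hypothetical pandas sketch, a left merge with `indicator=True` marks which held-out items also appear in the recommendations. The toy user and item IDs are made up for illustration.

```python
import pandas as pd

# Hypothetical held-out items and top-k recommendations for one user.
ground_truth = pd.DataFrame({"userID": [7, 7, 7], "itemID": [1, 2, 3]})
prediction = pd.DataFrame({"userID": [7, 7], "itemID": [2, 9], "prediction": [0.9, 0.5]})

# "_merge" is "both" when a held-out item was also recommended.
check = ground_truth.merge(prediction, on=["userID", "itemID"], how="left", indicator=True)
check["recovered"] = check["_merge"] == "both"

print(check[["itemID", "recovered"]])
print("hits:", int(check["recovered"].sum()))
```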

```python
# Record results with papermill for tests - ignore this cell
sb.glue("map", eval_map)
sb.glue("ndcg", eval_ndcg)
sb.glue("precision", eval_precision)
sb.glue("recall", eval_recall)
sb.glue("train_time", train_time.interval)
sb.glue("test_time", test_time.interval)
```