# Offline Batch Recommender System

In this notebook, we will build a simple offline batch recsys that writes recommendations to Redis for quick retrieval later. The architecture diagram below shows how it comes together from a bird's eye view.

![](./img/OfflineBatchRecsys.png)

## Architecture
Recommender systems commonly have a *multi-stage pipeline*:
1) A fast **Candidate Retrieval Model** quickly truncates the large item catalog to a relevant set of hundreds (or thousands) of items.
2) Filtering is performed to remove undesirable or already-seen items.
3) A more powerful, finely-tuned deep learning **Ranking Model** scores and ranks the candidates by how likely the user is to interact with them.
4) Results are ordered and returned to the user.
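
As a rough illustration of the four stages above, here is a minimal pure-Python sketch. The `recommend` function, its parameters, and the toy scoring lambdas are hypothetical stand-ins used only for illustration; in this notebook the retrieval and ranking stages are a Two-Tower model and a DLRM.

```python
from typing import Callable, Iterable, List, Set


def recommend(
    user_id: int,
    catalog: Iterable[int],
    retrieval_score: Callable[[int, int], float],
    ranking_score: Callable[[int, int], float],
    seen: Set[int],
    num_candidates: int = 100,
    num_results: int = 10,
) -> List[int]:
    # 1) Candidate retrieval: a cheap score over the whole catalog, keep the best few.
    candidates = sorted(
        catalog, key=lambda item: retrieval_score(user_id, item), reverse=True
    )[:num_candidates]
    # 2) Filtering: drop items the user has already seen.
    candidates = [item for item in candidates if item not in seen]
    # 3) Ranking: a more expensive score, computed only for the surviving candidates.
    scored = [(ranking_score(user_id, item), item) for item in candidates]
    # 4) Ordering: best-scored items first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored[:num_results]]


# Toy usage with made-up scoring functions (assumptions, not real models).
example = recommend(
    user_id=10,
    catalog=range(1, 21),
    retrieval_score=lambda user, item: -abs(item - user),
    ranking_score=lambda user, item: float(item % 4),
    seen={10},
    num_candidates=5,
    num_results=3,
)
print(example)
```

The point of the split is cost: the cheap scorer touches every item, while the expensive scorer only sees the handful of candidates that survive retrieval and filtering.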

In this case, we will write recommendations to a key-value store (Redis) such that a client can request these in near "realtime" at a later point.

> WHY? This is especially useful for developers who can't afford the complexity of hosting a live multi-stage recsys OR need to get something up and running *quickly*

In this notebook, we will:

1) Prepare the [**Dataset**](#Dataset-Preparation)
2) Build a [**Candidate Retrieval Model**](#Candidate-Retrieval-Model)
3) Build a [**Ranking Model**](#Ranking-Model)
4) [**Write Recommendations**](#Write-Recommendations-to-Redis) to Redis (offline)
5) [**Fetch Recommendations**](#Fetch-Recommendations-from-Redis) from Redis
6) [**Export Models**](#Export-Models) for later use

*This notebook was created using the latest stable [merlin-tensorflow](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-tensorflow/tags) container and was heavily based on the work done by the NVIDIA Merlin team [here](https://github.com/NVIDIA-Merlin/models/blob/main/examples/05-Retrieval-Model.ipynb)*

## Dataset Preparation

We will use a synthetic dataset that mimics the [Ali-CCP: Alibaba Click and Conversion Prediction](https://tianchi.aliyun.com/dataset/dataDetail?dataId=408#1) dataset. This allows us to tune it to our exact needs for demonstration/learning purposes.


### Importing Libraries

In [1]:
import os
import logging
import time
import warnings
warnings.filterwarnings('ignore')

import nvtabular as nvt
import merlin.models.tf as mm
import tensorflow as tf

from nvtabular.ops import *

from merlin.datasets.synthetic import generate_data
from merlin.datasets.ecommerce import transform_aliccp
from merlin.models.utils.example_utils import workflow_fit_transform
from merlin.models.utils.dataset import unique_rows_by_features
from merlin.schema.tags import Tags
from merlin.io.dataset import Dataset


# disable DEBUG, INFO and WARNING logging everywhere
logging.disable(logging.WARNING)

2023-01-13 19:23:36.190654: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-01-13 19:23:37.629389: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:991] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-13 19:23:37.631284: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:991] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-13 19:23:37.632724: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:991] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-13 19:23:38.043617: I tensorflow/core/

### Generate Synthetic Ali-CCP Dataset

In [2]:
# Generate the synthetic data
NUM_ROWS = 1000000
TRAIN_SIZE = 0.7
VALID_SIZE = 0.3

train, valid = generate_data("aliccp-raw", NUM_ROWS, set_sizes=(TRAIN_SIZE, VALID_SIZE))

In [3]:
# Define output paths for data
DATA_DIR = os.environ['PWD'] +"/data/"
OUTPUT_DATA_DIR = os.path.join(DATA_DIR, "processed")
OUTPUT_RETRIEVAL_DATA_DIR = os.path.join(OUTPUT_DATA_DIR, "retrieval")
CATEGORY_TEMP_DIR = os.path.join(DATA_DIR, "categories")

In [4]:
# Define NVTabular Feature Transformation Pipeline

user_id_raw = ["user_id"] >> Rename(postfix='_raw') >> LambdaOp(lambda col: col.astype("int32")) >> TagAsUserFeatures()
item_id_raw = ["item_id"] >> Rename(postfix='_raw') >> LambdaOp(lambda col: col.astype("int32")) >> TagAsItemFeatures()

user_id = ["user_id"] >> Categorify(dtype="int32", out_path=CATEGORY_TEMP_DIR) >> TagAsUserID()
item_id = ["item_id"] >> Categorify(dtype="int32", out_path=CATEGORY_TEMP_DIR) >> TagAsItemID()

item_features = (
    ["item_category", "item_shop", "item_brand"] >> Categorify(dtype="int32", out_path=CATEGORY_TEMP_DIR) >> TagAsItemFeatures()
)

user_features = (
    [
        "user_shops",
        "user_profile",
        "user_group",
        "user_gender",
        "user_age",
        "user_consumption_2",
        "user_is_occupied",
        "user_geography",
        "user_intentions",
        "user_brands",
        "user_categories",
    ]
    >> Categorify(dtype="int32", out_path=CATEGORY_TEMP_DIR)
    >> TagAsUserFeatures()
)

targets = ["click"] >> AddMetadata(tags=[Tags.BINARY_CLASSIFICATION, "target"])

outputs = user_id + item_id + item_features + user_features +  user_id_raw + item_id_raw + targets

# add dropna op to filter rows with nulls
outputs = outputs >> Dropna()

With the `transform_aliccp` function, we transform the raw dataset by applying the operators defined in the NVTabular workflow pipeline above. The processed parquet files are saved to the output path.

In [5]:
# Transform data and create files
transform_aliccp((train, valid), OUTPUT_DATA_DIR, nvt_workflow=outputs)

## Candidate Retrieval Model

We will use a **Two-Tower** model to infer a subset of relevant items from a large item corpus for a given user.

A Two-Tower Model consists of item (candidate) and user (query) encoder towers. With two towers, the model can learn representations (embeddings) for queries and candidates separately. 

<img src="https://d3i71xaburhd42.cloudfront.net/8c32706b6af49db5d9cd9217e5196f701e473537/2-Figure1-1.png"  width="30%">

Image from: [Mixed Negative Sampling for Learning Two-tower Neural Networks in Recommendations](https://www.semanticscholar.org/paper/Mixed-Negative-Sampling-for-Learning-Two-tower-in-Yang-Yi/29f080a1bb6df6f45afd82c443f72da745983bee)

### Feature Engineering for Candidate Retrieval

In [6]:
# Load Datasets from generated files
train_retrieval = Dataset(os.path.join(OUTPUT_DATA_DIR, "train", "*.parquet"))
valid_retrieval = Dataset(os.path.join(OUTPUT_DATA_DIR, "valid", "*.parquet"))

In [7]:
# Define NVTabular Feature Transformation Pipeline

inputs = train_retrieval.schema.column_names

# Select only positive interaction rows where click==1 in the dataset with Filter() operator
outputs = inputs >> Filter(f=lambda df: df["click"] == 1)

# Execute the transformation workflow for both datasets
workflow = nvt.Workflow(outputs)
workflow.fit(train_retrieval)
workflow.transform(train_retrieval).to_parquet(
    output_path=os.path.join(OUTPUT_RETRIEVAL_DATA_DIR, "train")
)
workflow.transform(valid_retrieval).to_parquet(
    output_path=os.path.join(OUTPUT_RETRIEVAL_DATA_DIR, "valid")
)

*NVTabular exported the schema file of our processed dataset, `schema.pbtxt` (a protobuf text file). To learn more about the schema object and schema file, you can explore [this notebook](https://github.com/NVIDIA-Merlin/models/blob/main/examples/02-Merlin-Models-and-NVTabular-integration.ipynb).*

In [8]:
# Read transformed parquet files as Dataset objects
train_retrieval = Dataset(os.path.join(OUTPUT_RETRIEVAL_DATA_DIR, "train", "*.parquet"), part_size="500MB")
valid_retrieval = Dataset(os.path.join(OUTPUT_RETRIEVAL_DATA_DIR, "valid", "*.parquet"), part_size="500MB")

Now we can use the `schema` object to define the model inputs. We select features with user and item tags, and exclude the raw IDs and the target column.

In [9]:
# Create model input schema
schema = train_retrieval.schema.select_by_tag([Tags.ITEM_ID, Tags.USER_ID, Tags.ITEM, Tags.USER]).without(['user_id_raw', 'item_id_raw', 'click'])
train_retrieval.schema = schema
valid_retrieval.schema = schema

In [10]:
# Inspect the schema!
schema

Unnamed: 0,name,tags,dtype,is_list,is_ragged,properties.num_buckets,properties.freq_threshold,properties.max_size,properties.start_index,properties.cat_path,properties.embedding_sizes.cardinality,properties.embedding_sizes.dimension,properties.domain.min,properties.domain.max,properties.domain.name
0,user_id,"(Tags.USER_ID, Tags.ID, Tags.USER, Tags.CATEGO...",int32,False,False,,0.0,0.0,0.0,/workdir/data/categories/categories/unique.use...,772.0,66.0,0,771,user_id
1,item_id,"(Tags.ID, Tags.CATEGORICAL, Tags.ITEM_ID, Tags...",int32,False,False,,0.0,0.0,0.0,/workdir/data/categories/categories/unique.ite...,754.0,65.0,0,753,item_id
2,item_category,"(Tags.ITEM, Tags.CATEGORICAL)",int32,False,False,,0.0,0.0,0.0,/workdir/data/categories/categories/unique.ite...,754.0,65.0,0,753,item_category
3,item_shop,"(Tags.ITEM, Tags.CATEGORICAL)",int32,False,False,,0.0,0.0,0.0,/workdir/data/categories/categories/unique.ite...,754.0,65.0,0,753,item_shop
4,item_brand,"(Tags.ITEM, Tags.CATEGORICAL)",int32,False,False,,0.0,0.0,0.0,/workdir/data/categories/categories/unique.ite...,754.0,65.0,0,753,item_brand
5,user_shops,"(Tags.CATEGORICAL, Tags.USER)",int32,False,False,,0.0,0.0,0.0,/workdir/data/categories/categories/unique.use...,772.0,66.0,0,771,user_shops
6,user_profile,"(Tags.CATEGORICAL, Tags.USER)",int32,False,False,,0.0,0.0,0.0,/workdir/data/categories/categories/unique.use...,34.0,16.0,0,33,user_profile
7,user_group,"(Tags.CATEGORICAL, Tags.USER)",int32,False,False,,0.0,0.0,0.0,/workdir/data/categories/categories/unique.use...,8.0,16.0,0,7,user_group
8,user_gender,"(Tags.CATEGORICAL, Tags.USER)",int32,False,False,,0.0,0.0,0.0,/workdir/data/categories/categories/unique.use...,3.0,16.0,0,2,user_gender
9,user_age,"(Tags.CATEGORICAL, Tags.USER)",int32,False,False,,0.0,0.0,0.0,/workdir/data/categories/categories/unique.use...,6.0,16.0,0,5,user_age


### Designing Retrieval Model Architecture

The **Two-Tower** model consists of a **User tower** (where all user features are fed) and an **Item tower** (where all item features are fed).

The User tower generates an embedding for the User, and the Item tower generates one for the Item. The model then computes the positive interaction "score" (the likelihood of an interaction event) as the dot product between the User embedding and the Item embedding, and scores sampled "negative" Items from the same batch in the same way.

##### About Negative Sampling

Many datasets for recommender systems contain implicit feedback, i.e. logs of user interactions such as clicks, add-to-cart events, purchases, and music listening events, rather than explicit ratings that reflect user preferences over items. Such logs contain only positive interactions, so negative examples have to be sampled for the model to learn from.

Merlin Models provides some scalable negative sampling algorithms for the item retrieval task. In this example, we use the `in-batch` sampling algorithm, which uses the items that other users in the same mini-batch interacted with as negatives.
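
To make the idea of in-batch negatives more concrete, here is a minimal pure-Python sketch of a softmax (categorical cross-entropy) loss over dot-product scores. It assumes that row `i` of the user embeddings is paired with row `i` of the item embeddings, so every other item in the batch acts as a negative. The function names and toy embeddings are hypothetical, and this is not the Merlin Models implementation.

```python
import math
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def in_batch_softmax_loss(
    user_embs: List[Sequence[float]], item_embs: List[Sequence[float]]
) -> float:
    """Mean cross-entropy where user i's positive is item i and the
    remaining items in the batch serve as negatives."""
    losses = []
    for i, user in enumerate(user_embs):
        # Score this user against every item in the batch.
        logits = [dot(user, item) for item in item_embs]
        # Log-sum-exp with the max subtracted for numerical stability.
        max_logit = max(logits)
        log_norm = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
        # Negative log-probability assigned to the positive item (index i).
        losses.append(log_norm - logits[i])
    return sum(losses) / len(losses)


# Toy batch: each user embedding is aligned with its own positive item.
users = [[1.0, 0.0], [0.0, 1.0]]
items = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = in_batch_softmax_loss(users, items)
# Swapping the items pairs each user with the "wrong" positive.
swapped_loss = in_batch_softmax_loss(users, list(reversed(items)))
print(aligned_loss, swapped_loss)
```

The loss is lower when each user scores its own item above the other items in the batch, which is the behaviour training pushes the two towers toward.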

In [11]:
# Function to create the two tower retrieval model

def create_two_tower(tower_dim: int, encoder_dim: int, optimizer: str, k: int, tags) -> mm.TwoTowerModelV2:
    # User/Query Tower
    user_schema = schema.select_by_tag(tags.USER)
    # create user (query) tower input block
    user_inputs = mm.InputBlockV2(user_schema)
    # create user (query) encoder block
    query = mm.Encoder(
        user_inputs,
        mm.MLPBlock([encoder_dim, tower_dim], no_activation_last_layer=True)
    )

    # Item/Candidate Tower
    item_schema = schema.select_by_tag(tags.ITEM)
    # create item (candidate) tower input block
    item_inputs = mm.InputBlockV2(item_schema)
    # create item (candidate) encoder block
    candidate = mm.Encoder(
        item_inputs,
        mm.MLPBlock([encoder_dim, tower_dim], no_activation_last_layer=True)
    )
    
    # Build Model Class
    model = mm.TwoTowerModelV2(query, candidate)
    model.compile(
        optimizer=optimizer,
        run_eagerly=False,
        loss="categorical_crossentropy",
        metrics=[mm.RecallAt(k), mm.NDCGAt(k)]
    )
    return model

**Notes:**
- `no_activation_last_layer`: when set to `True`, no activation is applied to the top hidden layer. Learn more [here](https://storage.googleapis.com/pub-tools-public-publication-data/pdf/b9f4e78a8830fe5afcf2f0452862fb3c0d6584ea.pdf).
- We did not set the `negative_samplers` argument of `TwoTowerModelV2`. By default, it uses contrastive learning with the `in-batch` negative sampling strategy.
- Two metrics are used to judge the quality of the recommendations: **Normalized Discounted Cumulative Gain (NDCG@K)** and **Recall@K**.
    - NDCG@K accounts for the rank of the relevant item in the recommendation list, which makes it more fine-grained than Hit Rate (HR), which only verifies whether the relevant item is among the top-k items.
    - Recall@K is equivalent to HitRate@K when there is only one relevant item per recommendation list: it just verifies whether the relevant item is among the top-k items.
- When we pass `validation_data=valid_retrieval` to `model.fit()`, the evaluation metrics on the validation set are computed with the same negative sampling strategy used for training.
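
The difference between the two metrics can be sketched for the single-relevant-item case described above. This is a simplified illustration in plain Python, not the `mm.RecallAt` / `mm.NDCGAt` implementation; the function names and the toy ranking are hypothetical.

```python
import math
from typing import Sequence


def recall_at_k(ranked_items: Sequence[int], relevant_item: int, k: int) -> float:
    # With one relevant item, Recall@K is 1 if it appears in the top k, else 0.
    return 1.0 if relevant_item in ranked_items[:k] else 0.0


def ndcg_at_k(ranked_items: Sequence[int], relevant_item: int, k: int) -> float:
    # With one relevant item the ideal DCG is 1 (relevant item at rank 1),
    # so NDCG@K reduces to 1 / log2(rank + 1) for a hit and 0 for a miss.
    for rank, item in enumerate(ranked_items[:k], start=1):
        if item == relevant_item:
            return 1.0 / math.log2(rank + 1)
    return 0.0


ranked = [24, 314, 78, 2]  # toy ranking, best first
print(recall_at_k(ranked, 78, k=4), ndcg_at_k(ranked, 78, k=4))
print(recall_at_k(ranked, 24, k=4), ndcg_at_k(ranked, 24, k=4))
```

Both calls count as a hit for Recall@K, but NDCG@K rewards the item found at rank 1 more than the one found at rank 3.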

In [12]:
# Initialize model
retrieval_model = create_two_tower(
    tower_dim=64,
    encoder_dim=128,
    optimizer="adam",
    k=10,
    tags=Tags
)

# Fit model
retrieval_model.fit(train_retrieval, validation_data=valid_retrieval, batch_size=4096, epochs=2)

Epoch 1/2
Epoch 2/2


<keras.callbacks.History at 0x7f83338ff1c0>

### Evaluate the model accuracy

The validation metric values during training are calculated given the positive and negative scores in each batch, and then averaged over batches per epoch. **That means validation metrics are not computed using the entire item catalog.**

To determine the exact accuracy, we need to compute the similarity score between a given query and all possible candidates. Below, by using the `topk_model` we can evaluate the trained retrieval model using the entire item catalog (brute force).

In [13]:
# Create candidate/item features for evaluation
candidate_features = unique_rows_by_features(train_retrieval, Tags.ITEM, Tags.ITEM_ID)

# Here's a sneak peek of the item features data
candidate_features.head()

Unnamed: 0,item_id,item_category,item_shop,item_brand
49,1,1,1,1
63,2,2,2,2
51,3,3,3,3
2,4,4,4,4
31,5,5,5,5


In [14]:
# Convert model to a top_k_encoder
topk_model = retrieval_model.to_top_k_encoder(candidate_features, k=20, batch_size=128)
topk_model.compile(run_eagerly=False)

In [15]:
# Create data loader for validation data
eval_loader = mm.Loader(valid_retrieval, batch_size=1024).map(mm.ToTarget(schema, "item_id"))

# Evaluation
metrics = topk_model.evaluate(eval_loader, return_dict=True)
metrics



{'loss': 0.4654218852519989,
 'recall_at_10': 0.10574013739824295,
 'mrr_at_10': 0.04078298062086105,
 'ndcg_at_10': 0.05588269233703613,
 'map_at_10': 0.04078298062086105,
 'precision_at_10': 0.010574014857411385,
 'regularization_loss': 0.0,
 'loss_batch': 0.5699992775917053}

### Generate top-K recommendations

Let's generate top-K (k=20 in our example) recommendations for a given batch of 8 samples.
- The `to_top_k_encoder()` method uses the item/candidate features dataset to compute and store all item/candidate embeddings in an index.
- The forward method of `topk_model` takes the query/user features as input and computes the dot product scores between the given query/user embeddings and all the candidates in the top-k index.
- Then, it returns the top-k (k=20) item ids with the highest scores.
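
The brute-force lookup described in the steps above can be sketched in plain Python as follows. The `brute_force_top_k` helper and the toy embeddings are hypothetical; the real `topk_model` operates on batched tensors.

```python
import heapq
from typing import Dict, List, Sequence, Tuple


def brute_force_top_k(
    query_emb: Sequence[float],
    item_index: Dict[int, Sequence[float]],
    k: int,
) -> Tuple[List[float], List[int]]:
    # Score the query against every candidate embedding in the index.
    scored = (
        (sum(q * c for q, c in zip(query_emb, emb)), item_id)
        for item_id, emb in item_index.items()
    )
    # Keep the k highest (score, item_id) pairs, best first.
    top = heapq.nlargest(k, scored)
    return [score for score, _ in top], [item_id for _, item_id in top]


# Toy index of precomputed item embeddings keyed by item id.
index = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [0.5, 0.5]}
top_scores, top_item_ids = brute_force_top_k([1.0, 0.2], index, k=2)
print(top_item_ids, top_scores)
```

Scoring every candidate is exact but scales linearly with the catalog size, which is why larger catalogs usually move to an approximate nearest neighbor index.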

In [16]:
# Create query/user features for evaluation
user_features = unique_rows_by_features(valid_retrieval, Tags.USER, Tags.USER_ID)
user_features.head()

Unnamed: 0,user_id,user_shops,user_profile,user_group,user_gender,user_age,user_consumption_2,user_is_occupied,user_geography,user_intentions,user_brands,user_categories
6227,0,0,29,4,1,2,1,1,2,0,0,0
1,1,1,1,1,1,1,1,1,1,1,1,1
0,2,2,1,1,1,1,1,1,1,2,2,2
41,3,3,1,1,1,1,1,1,1,3,3,3
17,4,4,1,1,1,1,1,1,1,4,4,4


In [17]:
# Check out one batch of 8 Users
loader = mm.Loader(user_features, batch_size=8, shuffle=False)
batch = next(iter(loader))
print(batch[0]['user_id'])

tf.Tensor(
[[0]
 [1]
 [2]
 [3]
 [4]
 [5]
 [6]
 [7]], shape=(8, 1), dtype=int32)


The recommended top 20 item ids and scores are returned below from the candidate retrieval model for each of the 8 selected users (from the validation set).

In [18]:
scores, recommended_item_ids = topk_model(batch[0])

In [19]:
# Recommended Items for the batch of 8 Users
recommended_item_ids

<tf.Tensor: shape=(8, 20), dtype=int32, numpy=
array([[ 24, 314,  78,   2, 322, 144, 338, 265, 231, 200,  91,  68, 193,
        237, 191, 100,  52, 268,  69, 102],
       [258,  34,  10,   6,  16,   8,  12,  32, 290, 155, 229,   4,  43,
        105,  31,  26,  42, 449,  89,  93],
       [154,   7,   8, 277,  22, 245, 364,   3, 212, 614,  15, 333,  24,
        272,   2,   1, 144, 371, 251, 475],
       [  2,  28, 144, 293,  15, 100, 191, 296, 314, 265, 287,   7,  24,
        281, 342, 372, 413, 424,  68, 322],
       [ 26,  71, 335,   6, 429, 155, 112,   1, 320,   8,  43,   3, 436,
         34,  23,   5, 290, 180, 126, 349],
       [ 37,  11,   8,  23, 142, 259, 115, 216, 614,  70, 651, 370, 225,
        634, 171,  29, 186, 411, 373, 122],
       [ 20, 153,  46, 576,  11, 401,  29, 131, 246,  45,  77, 108, 207,
         19, 203, 615, 135, 309, 391, 439],
       [  9,   5, 263,  12, 249, 151,  10, 139, 152, 264, 235, 392, 173,
         87, 388, 452, 103, 215,   4,  36]], dtype=int32)>

In [20]:
# Recommendation "scores" (higher is better) for each recommended item for the batch of 8 Users
scores

<tf.Tensor: shape=(8, 20), dtype=float32, numpy=
array([[0.05185889, 0.05088262, 0.05043829, 0.049939  , 0.04359741,
        0.04357419, 0.04341645, 0.04305017, 0.04195669, 0.04081468,
        0.04068989, 0.0406682 , 0.04022066, 0.0392919 , 0.03873269,
        0.03855398, 0.03844743, 0.03838059, 0.03832123, 0.03831603],
       [0.08404511, 0.0709916 , 0.07060404, 0.07024504, 0.06460899,
        0.06457698, 0.06397193, 0.06343734, 0.0625853 , 0.05872864,
        0.05808752, 0.05773585, 0.0560155 , 0.05289734, 0.0501873 ,
        0.04955173, 0.04784145, 0.0427202 , 0.0426748 , 0.04224171],
       [0.0498568 , 0.04736774, 0.04562012, 0.03990443, 0.03915498,
        0.03853892, 0.03846797, 0.03740597, 0.03711938, 0.03681709,
        0.03671035, 0.03667554, 0.03652515, 0.03576012, 0.03540184,
        0.03381001, 0.03287342, 0.03243016, 0.03212929, 0.03158545],
       [0.10922823, 0.08259409, 0.06778173, 0.06300528, 0.0629309 ,
        0.05904691, 0.05811433, 0.05744144, 0.05609805, 0.053072

At this point, we have a trained **Candidate Retrieval Model**: given a **User** as input, it finds the top-K **Items** most likely to be interacted with. These will serve as inputs to the next model in the pipeline...

## Ranking Model

We will use a Deep Learning Recommendation Model [(DLRM)](https://arxiv.org/abs/1906.00091) architecture (published by Facebook/Meta in 2019) to score and rank User/Item pairs. The model was introduced as a personalization deep learning model that uses embeddings to process sparse features that represent categorical data and a multilayer perceptron (MLP) to process dense features, and then interacts these features explicitly using the statistical techniques proposed [here](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5694074). To learn more about the DLRM architecture, please visit [this notebook](https://github.com/NVIDIA-Merlin/models/blob/main/examples/04-Exporting-ranking-models.ipynb) in the Merlin Models GitHub repo.
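
To make the "explicit interaction" step more concrete, the sketch below computes the pairwise dot products between same-sized feature vectors, which is the kind of second-order interaction the DLRM paper describes. It is a simplified illustration in plain Python, not the `mm.DLRMModel` implementation; `pairwise_interactions` and the toy vectors are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence


def pairwise_interactions(vectors: Sequence[Sequence[float]]) -> List[float]:
    # One dot product per unordered pair of feature vectors:
    # n vectors yield n * (n - 1) / 2 interaction terms.
    return [
        sum(a * b for a, b in zip(u, v))
        for u, v in combinations(vectors, 2)
    ]


# Toy example: three feature vectors of the same dimension, e.g. two
# categorical embeddings plus the output of the bottom MLP (assumed here).
feature_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
interactions = pairwise_interactions(feature_vectors)
print(interactions)
```

In the DLRM design these interaction terms are combined with the processed dense features and passed to the top MLP, which produces the final click probability.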


### Feature Engineering for DLRM

In [21]:
# Define train and valid dataset objects
train_rank = Dataset(os.path.join(OUTPUT_DATA_DIR, "train", "*.parquet"), part_size="500MB")
valid_rank = Dataset(os.path.join(OUTPUT_DATA_DIR, "valid", "*.parquet"), part_size="500MB")

# Define schema object
schema = train_rank.schema.without(['user_id_raw', 'item_id_raw'])

In [22]:
# Inspect schema - DLRM takes all of these features as input!
schema

Unnamed: 0,name,tags,dtype,is_list,is_ragged,properties.num_buckets,properties.freq_threshold,properties.max_size,properties.start_index,properties.cat_path,properties.embedding_sizes.cardinality,properties.embedding_sizes.dimension,properties.domain.min,properties.domain.max,properties.domain.name
0,user_id,"(Tags.USER_ID, Tags.ID, Tags.USER, Tags.CATEGO...",int32,False,False,,0.0,0.0,0.0,/workdir/data/categories/categories/unique.use...,772.0,66.0,0,771,user_id
1,item_id,"(Tags.ID, Tags.CATEGORICAL, Tags.ITEM_ID, Tags...",int32,False,False,,0.0,0.0,0.0,/workdir/data/categories/categories/unique.ite...,754.0,65.0,0,753,item_id
2,item_category,"(Tags.ITEM, Tags.CATEGORICAL)",int32,False,False,,0.0,0.0,0.0,/workdir/data/categories/categories/unique.ite...,754.0,65.0,0,753,item_category
3,item_shop,"(Tags.ITEM, Tags.CATEGORICAL)",int32,False,False,,0.0,0.0,0.0,/workdir/data/categories/categories/unique.ite...,754.0,65.0,0,753,item_shop
4,item_brand,"(Tags.ITEM, Tags.CATEGORICAL)",int32,False,False,,0.0,0.0,0.0,/workdir/data/categories/categories/unique.ite...,754.0,65.0,0,753,item_brand
5,user_shops,"(Tags.CATEGORICAL, Tags.USER)",int32,False,False,,0.0,0.0,0.0,/workdir/data/categories/categories/unique.use...,772.0,66.0,0,771,user_shops
6,user_profile,"(Tags.CATEGORICAL, Tags.USER)",int32,False,False,,0.0,0.0,0.0,/workdir/data/categories/categories/unique.use...,34.0,16.0,0,33,user_profile
7,user_group,"(Tags.CATEGORICAL, Tags.USER)",int32,False,False,,0.0,0.0,0.0,/workdir/data/categories/categories/unique.use...,8.0,16.0,0,7,user_group
8,user_gender,"(Tags.CATEGORICAL, Tags.USER)",int32,False,False,,0.0,0.0,0.0,/workdir/data/categories/categories/unique.use...,3.0,16.0,0,2,user_gender
9,user_age,"(Tags.CATEGORICAL, Tags.USER)",int32,False,False,,0.0,0.0,0.0,/workdir/data/categories/categories/unique.use...,6.0,16.0,0,5,user_age


In [23]:
# Target column here is "click" (binary classification)
target_column = schema.select_by_tag(Tags.TARGET).column_names[0]
target_column

'click'

### Building DLRM

In [24]:
def create_dlrm(optimizer: str, schema, target_column: str, embedding_dim: int):
    model = mm.DLRMModel(
        schema,
        embedding_dim=embedding_dim,
        bottom_block=mm.MLPBlock([128, 64]),
        top_block=mm.MLPBlock([128, 64, 32]),
        prediction_tasks=mm.BinaryClassificationTask(target_column),
    )
    model.compile(optimizer=optimizer, run_eagerly=False, metrics=[tf.keras.metrics.AUC()])
    return model


In [25]:
dlrm = create_dlrm(
    optimizer="adam",
    schema=schema,
    target_column=target_column,
    embedding_dim=64
)

dlrm.fit(train_rank, validation_data=valid_rank, batch_size=16 * 1024, epochs=3)

Epoch 1/3
Epoch 2/3
Epoch 3/3


<keras.callbacks.History at 0x7f8312a58d30>

In [26]:
# Check out one batch of users
loader = mm.Loader(valid_rank, batch_size=8, shuffle=False)
batch = next(iter(loader))
print(batch[0]['user_id'], batch[0]['item_id'])

tf.Tensor(
[[  2]
 [  1]
 [  5]
 [ 13]
 [168]
 [ 25]
 [ 95]
 [  3]], shape=(8, 1), dtype=int32) tf.Tensor(
[[142]
 [ 15]
 [ 11]
 [  6]
 [ 31]
 [ 39]
 [ 20]
 [ 17]], shape=(8, 1), dtype=int32)


In [27]:
# Test the DLRM model I/O with the user batch
dlrm(batch[0])

<tf.Tensor: shape=(8, 1), dtype=float32, numpy=
array([[0.50545496],
       [0.4798867 ],
       [0.50957555],
       [0.5093797 ],
       [0.4888947 ],
       [0.51890856],
       [0.52987057],
       [0.48951036]], dtype=float32)>

### Test the entire pipeline

Now that we have a **Candidate Retrieval Model** and a **Ranking Model** (DLRM), we can test the end-to-end flow locally. To do this live, as in the next multi-stage section, you need a **Feature Store** (Redis/Feast), an **ANN Index** (Redis), and a means to host/serve the entire DAG of operations (Triton).

*Below is a very brute force approach for a single batch of users.*

In [28]:
# Test full pipeline
import numpy as np

# Reload Dataset
train_retrieval = Dataset(os.path.join(OUTPUT_RETRIEVAL_DATA_DIR, "train", "*.parquet"), part_size="500MB")
schema = train_retrieval.schema.select_by_tag([Tags.ITEM_ID, Tags.USER_ID, Tags.ITEM, Tags.USER]).without(['click'])
train_retrieval.schema = schema

# User and Item Offline Feature "Stores"
item_fs = unique_rows_by_features(train_retrieval, Tags.ITEM, Tags.ITEM_ID).to_ddf().compute()
user_fs = unique_rows_by_features(train_retrieval, Tags.USER, Tags.USER_ID).to_ddf().compute()

In [29]:
# User batch loader
user_loader = mm.Loader(unique_rows_by_features(train_retrieval, Tags.USER, Tags.USER_ID), batch_size=8, shuffle=False)

# Sample User batch
user_batch = next(iter(user_loader))
users = user_batch[0]['user_id']
users

<tf.Tensor: shape=(8, 1), dtype=int32, numpy=
array([[1],
       [2],
       [3],
       [4],
       [5],
       [6],
       [7],
       [8]], dtype=int32)>

In [30]:
# Retrieve candidate Items for each User
_, candidate_item_ids = topk_model(user_batch[0])
candidate_item_ids

<tf.Tensor: shape=(8, 20), dtype=int32, numpy=
array([[258,  34,  10,   6,  16,   8,  12,  32, 290, 155, 229,   4,  43,
        105,  31,  26,  42, 449,  89,  93],
       [154,   7,   8, 277,  22, 245, 364,   3, 212, 614,  15, 333,  24,
        272,   2,   1, 144, 371, 251, 475],
       [  2,  28, 144, 293,  15, 100, 191, 296, 314, 265, 287,   7,  24,
        281, 342, 372, 413, 424,  68, 322],
       [ 26,  71, 335,   6, 429, 155, 112,   1, 320,   8,  43,   3, 436,
         34,  23,   5, 290, 180, 126, 349],
       [ 37,  11,   8,  23, 142, 259, 115, 216, 614,  70, 651, 370, 225,
        634, 171,  29, 186, 411, 373, 122],
       [ 20, 153,  46, 576,  11, 401,  29, 131, 246,  45,  77, 108, 207,
         19, 203, 615, 135, 309, 391, 439],
       [  9,   5, 263,  12, 249, 151,  10, 139, 152, 264, 235, 392, 173,
         87, 388, 452, 103, 215,   4,  36],
       [ 11, 576, 153,  53,  30, 174,  64, 109, 218, 338, 480, 191, 406,
        409, 316,  29, 527, 309,   9,  20]], dtype=int32)>

In [31]:
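# Softmax "sample" function for ordering.
# Note: softmax is monotonic, so this is a deterministic sort by descending score (no random sampling).
def softmax_sample(recs: np.ndarray, scores: np.ndarray) -> np.ndarray:
    exp_scores = np.exp(scores - np.max(scores))  # subtract the max for numerical stability
    arr = exp_scores / exp_scores.sum()
    top_item_idx = (-arr).argsort()
    return recs[top_item_idx]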

# For each user + candidate items, score with the DLRM
for user, candidates in zip(users.numpy(), candidate_item_ids.numpy()):
    
    num_recs = len(candidates)
    user_id = user[0]
    
    # Pull user features from offline feature store
    user_features = user_fs[user_fs.user_id == user_id]
    raw_user_id = user_features.user_id_raw.to_numpy()[0]
    user_features = user_features.append([user_features]*(num_recs-1), ignore_index=True)

    # Pull item features from offline feature store
    item_features = item_fs[item_fs.item_id.isin(candidates)].reset_index(drop=True)
    raw_item_ids = item_features.item_id_raw.to_numpy()
    
    # combined features
    item_features[user_features.columns] = user_features
    item_features = Dataset(item_features)
    item_features.schema = schema.without(['click'])
    
    # Score with DLRM - TODO -- can we do this without the Loader???
    inputs = mm.Loader(item_features, batch_size=num_recs)
    inputs = next(iter(inputs))
    scores = dlrm(inputs[0]).numpy().reshape(-1)
    recs = softmax_sample(raw_item_ids, scores)

In [32]:
# Look at one of the recommendations for a User
recs

array([111, 448, 171, 287,  31,  14, 322,   2,  64, 676,  12, 302,  53,
       579,  30, 368, 190, 154, 488, 229], dtype=int32)

## Write Recommendations to Redis

Redis, a low-latency key-value store, is used to persist recommendations for each User.
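
The layout used below is one JSON document per User, stored under a `USER:<id>` key. The sketch shows that shape with a plain dictionary standing in for Redis, so it runs without a server; `user_key`, `build_entry`, and `fake_store` are hypothetical names used only for illustration.

```python
import json
from typing import Dict, List


def user_key(user_id: int) -> str:
    # Key naming convention: one key per user.
    return f"USER:{user_id}"


def build_entry(user_id: int, recs: List[int]) -> dict:
    # Cast to plain ints so the payload is JSON-serializable
    # (model outputs are typically NumPy integer types).
    return {"user_id": int(user_id), "recommendations": [int(r) for r in recs]}


# A dictionary standing in for the key-value store (assumption for this sketch).
fake_store: Dict[str, str] = {}
fake_store[user_key(1)] = json.dumps(
    build_entry(1, [219, 392, 174]), separators=(",", ":")
)

# A client later reads the document back and takes the first few items.
entry = json.loads(fake_store[user_key(1)])
top_3 = entry["recommendations"][:3]
print(fake_store[user_key(1)], top_3)
```

Because the list is stored already ranked, the client only needs a single key lookup and a slice to serve recommendations.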


In [33]:
import asyncio
import redis.asyncio as redis
from redis.commands.json.path import Path

def generate_recs(topk_model, dlrm, input_data, batch_size: int, tags):
    user_loader = mm.Loader(unique_rows_by_features(input_data, tags.USER, tags.USER_ID), batch_size=batch_size, shuffle=False)
    # Load a batch
    for batch in user_loader:
        
        users = batch[0]['user_id']
        
        # Generate candidates per user
        _, candidate_item_ids = topk_model(batch[0])
        
        # For each user + candidate items, score with the DLRM
        for user, candidates in zip(users.numpy(), candidate_item_ids.numpy()):
            try:
                num_recs = len(candidates)
                user_id = user[0]

                # Pull user features from feature store
                user_features = user_fs[user_fs.user_id == user_id]
                raw_user_id = user_features.user_id_raw.to_numpy()[0]
                user_features = user_features.append([user_features]*(num_recs-1), ignore_index=True)

                # Pull item features from feature store
                item_features = item_fs[item_fs.item_id.isin(candidates)].reset_index(drop=True)
                raw_item_ids = item_features.item_id_raw.to_numpy()

                # combined features
                item_features[user_features.columns] = user_features
                item_features = Dataset(item_features)
                item_features.schema = schema.without(['click'])

                # Score with DLRM - TODO -- can we do this without the Loader???
                inputs = mm.Loader(item_features, batch_size=num_recs)
                inputs = next(iter(inputs))
                scores = dlrm(inputs[0]).numpy().reshape(-1)

                # Rank
                recs = softmax_sample(raw_item_ids, scores)

                yield raw_user_id, recs
            except Exception as e:
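                # Lazy %-style formatting; ERROR level is not suppressed by logging.disable(logging.WARNING)
                logging.error("Failed to generate recommendations for user %s: %s", user_id, e)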

In [34]:
# Test Recommendation Generator
next(generate_recs(topk_model, dlrm, train_retrieval, 32, Tags))

# SEE BELOW: User ID --> Recommended IDs

(7,
 array([ 13,  91,  32,   1,  42,   4, 106, 456, 245,  43, 264,  34,  18,
         88,  15,   9,  27,  10, 155, 269], dtype=int32))

In [35]:
async def store_recommendations(rec_gen, redis_conn: redis.Redis):
    """
    Store recommendations generated for each User.
    """
    async def store(user_id: str, recs: list):
        """
        Store an individual User's latest recommendations in Redis.
        """
        entry = {
            "user_id": int(user_id),
            "recommendations": [int(rec) for rec in recs]
        }
        # Set the JSON object in Redis
        await redis_conn.json().set(f"USER:{user_id}", Path.root_path(), entry)
    # Write each User's recommendations to Redis
    for user_id, recs in rec_gen:
        await store(user_id, recs)

In [36]:
redis_conn = redis.Redis(
    host="redis-inference-store",
    port=6379,
    decode_responses=True
)

# Create Recommendation Generator
top_recs_per_user = generate_recs(topk_model, dlrm, train_retrieval, 32, Tags)

# Run the process - may take a few minutes
await store_recommendations(top_recs_per_user, redis_conn=redis_conn)

## Fetch Recommendations from Redis

Now that we have written a list of recommended item IDs for each User, we can use the Redis CLI to take a peek at one example. Your application can now fetch these when needed (at low latency).

In [37]:
!redis-cli -h redis-inference-store -p 6379 JSON.GET USER:1

"{\"user_id\":1,\"recommendations\":[219,392,174,143,262,57,30,77,14,117,123,12,173,375,5,157,114,503,266,204]}"


In [38]:
# Simple benchmark with async Python client
times = []
for i in range(300):
    t = time.perf_counter()  # monotonic, high-resolution timer suited to sub-millisecond measurements
    await redis_conn.json().get(f"USER:{i}")
    times.append(time.perf_counter() - t)

In [39]:
# Average read time from Redis --> less than a ms likely
np.average(times)

0.00019850492477416993
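
An average can hide occasional slow reads, so it may be worth looking at percentiles as well. The sketch below summarizes a list of timings like `times` using only the standard library; the `latency_summary` helper and the synthetic timings are hypothetical, not measurements from this notebook.

```python
import statistics
from typing import Dict, Sequence


def latency_summary(times_s: Sequence[float]) -> Dict[str, float]:
    # Convert seconds to milliseconds and sort once.
    ms = sorted(t * 1000.0 for t in times_s)
    # quantiles(n=100) returns the 99 percentile cut points; index 98 is p99.
    cuts = statistics.quantiles(ms, n=100, method="inclusive")
    return {
        "mean_ms": statistics.fmean(ms),
        "p50_ms": statistics.median(ms),
        "p99_ms": cuts[98],
        "max_ms": ms[-1],
    }


# Synthetic timings (seconds): mostly fast reads plus one slow outlier.
synthetic_times = [0.0002] * 99 + [0.005]
summary = latency_summary(synthetic_times)
print(summary)
```

In the synthetic example the single slow read pulls the mean above the median, which is the kind of skew a mean alone would not reveal.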

## Export Models

So far we have trained and evaluated our models for the recsys pipeline. We need to export the models and artifacts in order to deploy them "online" in the next notebook.

### Export Retrieval Model

We are able to save the user tower model as a TF model to disk. The user tower model is needed to generate a user embedding vector when a user feature vector <i>x</i> is fed into that model.

In [40]:
query_tower = retrieval_model.query_encoder
query_tower.save(os.path.join(DATA_DIR, "query_tower"))

# We can load the saved model back with the following line:
# query_tower_loaded = tf.keras.models.load_model(os.path.join(DATA_DIR, "query_tower"))

### Export User features

With the `unique_rows_by_features` utility function, we can extract the unique user and item feature tables as cuDF dataframes. For the user features table, we pass the `USER` and `USER_ID` tags. These features will be stored in our Feature Store later on.

In [41]:
user_features = (
    unique_rows_by_features(train_retrieval, Tags.USER, Tags.USER_ID).compute().reset_index(drop=True)
)
user_features.head()

,user_id,user_shops,user_profile,user_group,user_gender,user_age,user_consumption_2,user_is_occupied,user_geography,user_intentions,user_brands,user_categories,user_id_raw
0,1,1,1,1,1,1,1,1,1,1,1,1,7
1,2,2,1,1,1,1,1,1,1,2,2,2,8
2,3,3,1,1,1,1,1,1,1,3,3,3,6
3,4,4,1,1,1,1,1,1,1,4,4,4,9
4,5,5,1,1,1,1,1,1,1,5,5,5,5
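Conceptually, this extraction amounts to selecting the feature columns and keeping one row per ID. The sketch below illustrates that idea on plain Python dictionaries. It is an assumption about the behavior, not the Merlin implementation, which operates on a `Dataset` and picks columns through schema tags; `unique_rows_by_id` and the toy rows are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Sequence


def unique_rows_by_id(
    rows: Iterable[Dict[str, Any]],
    id_col: str,
    feature_cols: Sequence[str],
) -> List[Dict[str, Any]]:
    """Keep the first row seen for each ID, restricted to feature_cols."""
    seen = set()
    unique = []
    for row in rows:
        key = row[id_col]
        if key in seen:
            continue  # a later interaction by the same user: skip it
        seen.add(key)
        unique.append({col: row[col] for col in feature_cols})
    return unique


# Toy interaction log: user 1 appears twice with different items
interactions = [
    {"user_id": 1, "user_age": 3, "item_id": 10},
    {"user_id": 2, "user_age": 5, "item_id": 11},
    {"user_id": 1, "user_age": 3, "item_id": 12},
]
print(unique_rows_by_id(interactions, "user_id", ["user_id", "user_age"]))
```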


In [42]:
# save to disk
user_features.to_parquet(os.path.join(DATA_DIR, "user_features.parquet"))

### Export User Embeddings

In [43]:
queries = retrieval_model.query_embeddings(Dataset(user_features, schema=schema), batch_size=1024, index=Tags.USER_ID)
query_embs_df = queries.compute(scheduler="synchronous").reset_index()

In [44]:
query_embs_df.head()

,user_id,0,1,2,3,4,5,6,7,8,...,54,55,56,57,58,59,60,61,62,63
0,1,0.086162,0.026745,-0.123944,0.014076,0.045792,0.000901,0.017545,0.045882,0.059931,...,0.036983,0.000207,-0.015586,0.019044,0.045006,0.076797,-0.08192,0.020308,-0.078107,-0.061659
1,2,0.000388,0.049235,-0.084484,0.060385,0.035075,0.029591,0.046608,-0.022123,0.007028,...,-0.043972,0.050407,-0.013752,0.048218,0.024069,0.02187,-0.065556,0.030787,0.009482,-0.010763
2,3,0.037499,0.021247,-0.149984,0.016978,-0.011332,0.012765,-0.004894,-0.030394,0.092035,...,-0.017439,0.091743,0.028586,0.007643,-0.013833,0.017256,-0.047512,-0.06276,-0.075953,-0.131814
3,4,0.114573,0.110715,-0.051466,0.127951,0.024169,-0.067083,0.000945,0.002651,0.017218,...,-0.060393,-0.037879,0.009158,-0.052927,0.027579,0.009222,0.000308,0.061314,0.02356,0.033158
4,5,-0.036998,0.069898,-0.13894,0.025963,-0.008821,-0.066008,-0.005975,-0.062152,-0.036975,...,-0.006486,-0.019937,-0.02411,0.067773,0.095708,0.011955,-0.004036,0.011086,0.025387,0.00261


### Export Item Features

In [45]:
item_features = (
    unique_rows_by_features(train_retrieval, Tags.ITEM, Tags.ITEM_ID).compute().reset_index(drop=True)
)
item_features.head()

,item_id,item_category,item_shop,item_brand,item_id_raw
0,1,1,1,1,6
1,2,2,2,2,7
2,3,3,3,3,8
3,4,4,4,4,9
4,5,5,5,5,5


In [46]:
# save to disk
item_features.to_parquet(os.path.join(DATA_DIR, "item_features.parquet"))

### Export Item Embeddings
These will get stored in our ANN Index (in Redis).

In [47]:
item_embs = retrieval_model.candidate_embeddings(Dataset(item_features, schema=schema), batch_size=1024, index=Tags.ITEM_ID)
item_embs_df = item_embs.compute(scheduler="synchronous")
item_embs_df.head()

item_id,0,1,2,3,4,5,6,7,8,9,...,54,55,56,57,58,59,60,61,62,63
1,0.034978,-0.006806,0.013266,0.029613,-0.005842,0.017935,-0.00058,0.030151,-0.028896,0.012738,...,-0.082827,-0.042844,-0.019296,0.016262,0.034285,0.011797,-0.01443,-0.062068,0.100875,-0.03034
2,0.024937,-0.056926,-0.030055,0.01336,-0.054057,0.043873,0.036582,-0.005764,0.027453,0.035188,...,-0.079094,0.082841,0.058747,-0.075936,-0.024765,0.00904,-0.003463,0.008759,-0.042019,-0.064045
3,0.021218,-0.020515,-0.022223,0.016823,-0.055377,-0.002074,0.030507,0.019177,-0.01749,0.031237,...,-0.045328,-0.033176,-0.058325,-0.011805,0.029296,-0.025039,0.003592,0.039303,0.002172,0.044691
4,-0.008819,-0.069213,-0.052063,0.014789,0.018065,-0.023755,-0.045055,-0.012592,-0.023765,0.028149,...,-0.044515,-0.001543,-0.01011,0.01117,0.074124,-0.026557,0.00613,-0.007254,-0.003723,-0.006843
5,0.008794,0.06521,0.002912,0.037226,-0.052118,-0.082463,-0.029555,0.028421,-0.005797,0.04333,...,-0.00379,-0.005912,0.070558,0.027136,0.061371,-0.006306,0.004901,0.064061,-0.035719,-0.038692


In [48]:
# save to disk
item_embs_df.to_parquet(os.path.join(DATA_DIR, "item_embeddings.parquet"))
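To make the role of the ANN index concrete: it approximates an exact nearest-neighbor search between one user embedding and all item embeddings. The sketch below shows the exact (brute-force) version on toy 2-dimensional vectors, assuming dot-product similarity between the two towers. `top_k_items` and the toy vectors are hypothetical; the real embeddings here have 64 dimensions.

```python
import heapq
from typing import Dict, List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k_items(
    user_vec: Sequence[float],
    item_vecs: Dict[int, Sequence[float]],
    k: int,
) -> List[Tuple[int, float]]:
    """Score every item against the user vector and return the k best."""
    scored = ((item_id, dot(user_vec, vec)) for item_id, vec in item_vecs.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy embeddings (made up for illustration)
user = [1.0, 0.0]
items = {1: [0.9, 0.1], 2: [-0.5, 0.5], 3: [0.2, 0.9]}
print(top_k_items(user, items, k=2))
```

Brute force scores every item for every query, which becomes expensive for large catalogs. An ANN index trades a small amount of recall for much faster lookups.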

### Export DLRM

In [49]:
dlrm.save(os.path.join(DATA_DIR, "dlrm"))

## Conclusion

Now we have learned how to train and evaluate our Two-Tower **Candidate Retrieval Model** and **DLRM**. We also wrote recommendations "offline" to a key-value store (Redis) for use in our applications.

![img](./img/OfflineBatchRecsys.png)

*This completes the **"Offline Batch Recommender System"** intro with Redis + NVIDIA*. In the next tutorial, we will focus on deploying these assets to the [Triton Inference Server](https://github.com/triton-inference-server/server) for live "online" recommendations.