# Offline Batch Recommender System

In this notebook, we will build a system that generates recommendations "offline". This means that at some interval (for example, via a cron job), the system writes recommendations to a database (Redis) for later retrieval. The architecture diagram below shows how it comes together from a bird's-eye view.

![](../assets/OfflineBatchRecsys.png)

## 1.0 - Architecture

A multi-stage pipeline for a recommender system is a common approach to efficiently retrieve relevant items from a large catalog. The pipeline includes several stages:

1) A fast **Candidate Retrieval Model** quickly truncates the large item catalog to a relevant set of hundreds (or thousands) of items.
2) Filtering is performed to remove undesirable or already-seen items.
3) A finely-tuned (i.e. more powerful) deep learning **Ranking Model** ranks the candidates by how likely they are to be interacted with.
4) Results are ordered and returned to the user.

Writing the recommendations to a key-value store like Redis lets developers retrieve them in near real-time at a later point, without the complexity of hosting a live multi-stage recommendation system. This is especially useful for teams that need to get something up and running quickly. Serving precomputed recommendations from a key-value store, rather than serving the models directly, also saves on the cost of hosting the models. The next notebook in this series will dive into when "online" recommendation systems are valuable, using the same models created below.

In this notebook, we will:

1) Prepare the [**Dataset**](#2.0---Dataset-Preparation)
2) Build a [**Candidate Retrieval Model**](#3.0---Candidate-Retrieval-Model)
3) Build a [**Ranking Model**](#4.0---Ranking-Model)
4) [**Write Recommendations**](#5.0---Write-Recommendations-to-Redis) to Redis (offline)
5) [**Fetch Recommendations**](#6.0---Fetch-Recommendations-from-Redis) from Redis
6) [**Export Models**](#7.0---Export-Models) for later use

*This notebook was created using the latest stable [merlin-tensorflow](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-tensorflow/tags) container and was heavily based on the work done by the NVIDIA Merlin team [here](https://github.com/NVIDIA-Merlin/models/blob/main/examples/05-Retrieval-Model.ipynb)*

## 2.0 - Dataset Preparation

We will use a synthetic dataset that mimics the [Ali-CCP: Alibaba Click and Conversion Prediction](https://tianchi.aliyun.com/dataset/dataDetail?dataId=408#1) dataset. This allows us to tune it to our exact needs for demonstration/learning purposes.


### 2.1 - Importing Libraries

In [1]:
import os
import logging
import time
import warnings
warnings.filterwarnings('ignore')

import nvtabular as nvt
import merlin.models.tf as mm
import tensorflow as tf

from nvtabular.ops import *

from merlin.datasets.synthetic import generate_data
from merlin.datasets.ecommerce import transform_aliccp
from merlin.models.utils.example_utils import workflow_fit_transform
from merlin.models.utils.dataset import unique_rows_by_features
from merlin.schema.tags import Tags
from merlin.io.dataset import Dataset


# disable WARNING, INFO, and DEBUG logging everywhere
logging.disable(logging.WARNING)


### 2.2 - Generate and Process Synthetic Ali-CCP Dataset

The Merlin ecosystem, built by NVIDIA, provides a number of pre-built datasets that come with convenience functions for generating and processing them for use in recommendation systems.

With the `transform_aliccp` function, we transform the raw dataset by applying the operators defined in the NVTabular workflow pipeline below. The processed parquet files are saved to the output path.

This pipeline is used to process the data and prepare it for use in a recommendation system by converting columns to appropriate types, categorifying features, tagging features and targets, and removing null values.
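
To make the "categorify" and "drop nulls" steps concrete, here is a toy pure-Python approximation. It is not NVTabular's implementation: the helper names are hypothetical, and the rule used here (ids assigned in order of first appearance, with 0 kept for unknown values) is an assumption of this sketch that may differ from how `Categorify` assigns ids.

```python
from typing import Dict, Hashable, List, Optional


def drop_null_rows(rows: List[dict]) -> List[dict]:
    """Keep only rows in which every column has a value."""
    return [row for row in rows if all(v is not None for v in row.values())]


def fit_categories(values: List[Optional[Hashable]]) -> Dict[Hashable, int]:
    """Map each distinct non-null value to a contiguous integer id starting at 1."""
    mapping: Dict[Hashable, int] = {}
    for value in values:
        if value is not None and value not in mapping:
            mapping[value] = len(mapping) + 1  # 0 is kept for unknown values in this sketch
    return mapping


def encode(values: List[Optional[Hashable]], mapping: Dict[Hashable, int]) -> List[int]:
    """Replace raw values with their integer ids (0 for anything unseen)."""
    return [mapping.get(value, 0) for value in values]


rows = [
    {"user_id": "u9", "item_brand": "acme"},
    {"user_id": "u3", "item_brand": None},  # dropped: contains a null
    {"user_id": "u9", "item_brand": "zeta"},
]
clean_rows = drop_null_rows(rows)
brand_mapping = fit_categories([row["item_brand"] for row in clean_rows])
encoded_brands = encode(["zeta", "acme", "never-seen"], brand_mapping)
print(brand_mapping, encoded_brands)
```

Contiguous integer ids matter because the models later use them as row indices into embedding tables.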

In [2]:
# Generate the synthetic data
NUM_ROWS = 1000000
TRAIN_SIZE = 0.7
VALID_SIZE = 0.3

train, valid = generate_data("aliccp-raw", NUM_ROWS, set_sizes=(TRAIN_SIZE, VALID_SIZE))

In [3]:
# Define output path for data
DATA_DIR = "/model-data/aliccp"
OUTPUT_DATA_DIR = os.path.join(DATA_DIR, "processed")
OUTPUT_RETRIEVAL_DATA_DIR = os.path.join(OUTPUT_DATA_DIR, "retrieval")
CATEGORY_TEMP_DIR = os.path.join(DATA_DIR, "categories")

In [4]:
# Define NVTabular Feature Transformation Pipeline

user_id_raw = ["user_id"] >> Rename(postfix='_raw') >> LambdaOp(lambda col: col.astype("int32")) >> TagAsUserFeatures()
item_id_raw = ["item_id"] >> Rename(postfix='_raw') >> LambdaOp(lambda col: col.astype("int32")) >> TagAsItemFeatures()

user_id = ["user_id"] >> Categorify(dtype="int32", out_path=CATEGORY_TEMP_DIR) >> TagAsUserID()
item_id = ["item_id"] >> Categorify(dtype="int32", out_path=CATEGORY_TEMP_DIR) >> TagAsItemID()

item_features = (
    ["item_category", "item_shop", "item_brand"] >> Categorify(dtype="int32", out_path=CATEGORY_TEMP_DIR) >> TagAsItemFeatures()
)

user_features = (
    [
        "user_shops",
        "user_profile",
        "user_group",
        "user_gender",
        "user_age",
        "user_consumption_2",
        "user_is_occupied",
        "user_geography",
        "user_intentions",
        "user_brands",
        "user_categories",
    ]
    >> Categorify(dtype="int32", out_path=CATEGORY_TEMP_DIR)
    >> TagAsUserFeatures()
)

targets = ["click"] >> AddMetadata(tags=[Tags.BINARY_CLASSIFICATION, "target"])

outputs = user_id + item_id + item_features + user_features +  user_id_raw + item_id_raw + targets

# add dropna op to filter rows with nulls
outputs = outputs >> Dropna()

In [5]:
# Transform data and create files
transform_aliccp((train, valid), OUTPUT_DATA_DIR, nvt_workflow=outputs)

## 3.0 - Candidate Retrieval Model

We will use a **Two-Tower** model to infer a subset of relevant items from a large item corpus for a given user.

A Two-Tower recommendation system is a type of recommendation system that uses two neural network architectures, or "towers," to generate recommendations. One tower, called the "user tower," is used to model the user's preferences, while the other tower, called the "item tower," is used to model the characteristics of the items being recommended.

The two towers are typically trained jointly, but each tower only sees its own features, and their outputs are combined only at the final scoring step. The user tower produces a user representation vector, while the item tower produces an item representation vector. The similarity between these vectors (for example, their dot product) is then used to rank the items and generate recommendations.

Two-Tower recommendation systems are good for several use cases, such as:

1. Handling large scale recommendation systems with millions of items and users by providing an efficient way to model users and items individually.
2. Incorporating both user-specific and item-specific information to generate more accurate recommendations.
3. Handling the cold-start problem: because users and items are modeled separately from their features, the system can make recommendations for new users or new items without historical interactions.
4. Handling both explicit and implicit feedback, as it can be trained on both types of data.

In summary, Two-Tower recommendation systems are a powerful approach for generating recommendations by modeling users and items separately and combining their outputs to generate more accurate recommendations.
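
The similarity computation described above can be sketched in a few lines of plain Python. Each "tower" here is a single hypothetical linear layer with hand-picked weights (`USER_WEIGHTS`, `ITEM_WEIGHTS`); real towers are trained multi-layer networks, so treat this only as an illustration of how two independently computed embeddings meet in a dot product.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def tower(features: Sequence[float], weights: Matrix) -> Vector:
    """Stand-in for a tower: one linear layer projecting features to an embedding."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


# Hypothetical, hand-picked weights (a trained model would learn these)
USER_WEIGHTS = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]  # 3 user features -> 2 dims
ITEM_WEIGHTS = [[1.0, 0.0], [0.0, 1.0]]            # 2 item features -> 2 dims

# The user tower only sees user features
user_embedding = tower([1.0, 0.0, 2.0], USER_WEIGHTS)

# The item tower only sees item features
item_embeddings = {
    "item_a": tower([1.0, 0.0], ITEM_WEIGHTS),
    "item_b": tower([0.0, 1.0], ITEM_WEIGHTS),
    "item_c": tower([1.0, 1.0], ITEM_WEIGHTS),
}

# The two sides only meet here, in the similarity score
scores = {item: dot(user_embedding, emb) for item, emb in item_embeddings.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Because the item side never depends on the user, item embeddings can be computed once and reused for every user, which is what makes this architecture practical for large catalogs.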

<img src="https://d3i71xaburhd42.cloudfront.net/8c32706b6af49db5d9cd9217e5196f701e473537/2-Figure1-1.png"  width="30%">

Image from: [Mixed Negative Sampling for Learning Two-tower Neural Networks in Recommendations](https://www.semanticscholar.org/paper/Mixed-Negative-Sampling-for-Learning-Two-tower-in-Yang-Yi/29f080a1bb6df6f45afd82c443f72da745983bee)

### 3.1 - Feature Engineering for Candidate Retrieval

In [6]:
# Load Datasets from generated files
train_retrieval = Dataset(os.path.join(OUTPUT_DATA_DIR, "train", "*.parquet"))
valid_retrieval = Dataset(os.path.join(OUTPUT_DATA_DIR, "valid", "*.parquet"))

In [7]:
# Define NVTabular Feature Transformation Pipeline

inputs = train_retrieval.schema.column_names

# Select only positive interaction rows where click==1 in the dataset with Filter() operator
outputs = inputs >> Filter(f=lambda df: df["click"] == 1)

# Execute the transformation workflow for both datasets
workflow = nvt.Workflow(outputs)
workflow.fit(train_retrieval)
workflow.transform(train_retrieval).to_parquet(
    output_path=os.path.join(OUTPUT_RETRIEVAL_DATA_DIR, "train")
)
workflow.transform(valid_retrieval).to_parquet(
    output_path=os.path.join(OUTPUT_RETRIEVAL_DATA_DIR, "valid")
)

*NVTabular exported the schema file of our processed dataset, `schema.pbtxt`, which is a protobuf text file. To learn more about the schema object and schema file you can explore [this notebook](https://github.com/NVIDIA-Merlin/models/blob/main/examples/02-Merlin-Models-and-NVTabular-integration.ipynb).*

In [8]:
# Read transformed parquet files as Dataset objects
train_retrieval = Dataset(os.path.join(OUTPUT_RETRIEVAL_DATA_DIR, "train", "*.parquet"), part_size="500MB")
valid_retrieval = Dataset(os.path.join(OUTPUT_RETRIEVAL_DATA_DIR, "valid", "*.parquet"), part_size="500MB")

Now we can use the `schema` object to define the model inputs. We select features with user and item tags, and exclude the raw IDs and the target column.

In [9]:
# Create model input schema
schema = train_retrieval.schema.select_by_tag([Tags.ITEM_ID, Tags.USER_ID, Tags.ITEM, Tags.USER]).without(['user_id_raw', 'item_id_raw', 'click'])
train_retrieval.schema = schema
valid_retrieval.schema = schema

In [10]:
# Inspect the schema!
schema

| | name | tags | dtype | is_list | is_ragged | properties.num_buckets | properties.freq_threshold | properties.max_size | properties.start_index | properties.cat_path | properties.embedding_sizes.cardinality | properties.embedding_sizes.dimension | properties.domain.min | properties.domain.max | properties.domain.name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | user_id | (Tags.CATEGORICAL, Tags.USER, Tags.USER_ID, Ta... | int32 | False | False | | 0.0 | 0.0 | 0.0 | /model-data/aliccp/categories/categories/uniqu... | 753.0 | 65.0 | 0 | 752 | user_id |
| 1 | item_id | (Tags.ITEM, Tags.CATEGORICAL, Tags.ID, Tags.IT... | int32 | False | False | | 0.0 | 0.0 | 0.0 | /model-data/aliccp/categories/categories/uniqu... | 754.0 | 65.0 | 0 | 753 | item_id |
| 2 | item_category | (Tags.ITEM, Tags.CATEGORICAL) | int32 | False | False | | 0.0 | 0.0 | 0.0 | /model-data/aliccp/categories/categories/uniqu... | 754.0 | 65.0 | 0 | 753 | item_category |
| 3 | item_shop | (Tags.ITEM, Tags.CATEGORICAL) | int32 | False | False | | 0.0 | 0.0 | 0.0 | /model-data/aliccp/categories/categories/uniqu... | 754.0 | 65.0 | 0 | 753 | item_shop |
| 4 | item_brand | (Tags.ITEM, Tags.CATEGORICAL) | int32 | False | False | | 0.0 | 0.0 | 0.0 | /model-data/aliccp/categories/categories/uniqu... | 754.0 | 65.0 | 0 | 753 | item_brand |
| 5 | user_shops | (Tags.CATEGORICAL, Tags.USER) | int32 | False | False | | 0.0 | 0.0 | 0.0 | /model-data/aliccp/categories/categories/uniqu... | 753.0 | 65.0 | 0 | 752 | user_shops |
| 6 | user_profile | (Tags.CATEGORICAL, Tags.USER) | int32 | False | False | | 0.0 | 0.0 | 0.0 | /model-data/aliccp/categories/categories/uniqu... | 51.0 | 16.0 | 0 | 50 | user_profile |
| 7 | user_group | (Tags.CATEGORICAL, Tags.USER) | int32 | False | False | | 0.0 | 0.0 | 0.0 | /model-data/aliccp/categories/categories/uniqu... | 10.0 | 16.0 | 0 | 9 | user_group |
| 8 | user_gender | (Tags.CATEGORICAL, Tags.USER) | int32 | False | False | | 0.0 | 0.0 | 0.0 | /model-data/aliccp/categories/categories/uniqu... | 3.0 | 16.0 | 0 | 2 | user_gender |
| 9 | user_age | (Tags.CATEGORICAL, Tags.USER) | int32 | False | False | | 0.0 | 0.0 | 0.0 | /model-data/aliccp/categories/categories/uniqu... | 7.0 | 16.0 | 0 | 6 | user_age |


### 3.2 - Designing Retrieval Model Architecture

The **Two-Tower** model consists of a **User tower** (where all user features are fed) and an **Item tower** (where all item features are fed).

The User tower generates an embedding for the User, and the Item tower generates an embedding for the Item. The model then computes the positive interaction "score" (likelihood of an interaction event) as the dot product between the User embedding and the Item embedding, and computes the same score for sampled "negative" Items within the batch.

##### About Negative Sampling

Many datasets for recommender systems contain implicit feedback, such as logs of clicks, add-to-cart events, purchases, and music listening events, rather than explicit ratings that reflect user preferences over items. Because such logs record only positive interactions, negative examples have to be sampled for training.

Merlin Models provides several scalable negative sampling algorithms for this Item Retrieval task. In this example, we use the `in-batch` sampling algorithm, which treats the items that other users in the same mini-batch interacted with as negatives.
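
The core idea of in-batch negatives can be sketched without any framework. In the hypothetical function below, row `i` of the batch holds a user embedding and the embedding of the item that user interacted with; every other row's item acts as a negative for user `i`. This is only the basic idea: the Merlin Models implementation may apply additional corrections that are not shown here.

```python
import math
from typing import List, Sequence


def in_batch_softmax_loss(
    user_vecs: List[Sequence[float]], item_vecs: List[Sequence[float]]
) -> float:
    """Mean cross-entropy where row i's positive is item i and all other items are negatives."""
    losses = []
    for i, user in enumerate(user_vecs):
        # Score this user against every item in the batch
        logits = [sum(u * v for u, v in zip(user, item)) for item in item_vecs]
        # Numerically stable log-sum-exp over all in-batch items
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        # Cross-entropy with the positive item at position i
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


users = [[1.0, 0.0], [0.0, 1.0]]
aligned_items = [[1.0, 0.0], [0.0, 1.0]]     # each user's positive item matches the user
misaligned_items = [[0.0, 1.0], [1.0, 0.0]]  # each user's positive item matches the other user

aligned_loss = in_batch_softmax_loss(users, aligned_items)
misaligned_loss = in_batch_softmax_loss(users, misaligned_items)
print(aligned_loss, misaligned_loss)
```

The loss is lower when each user's embedding is closer to its own item than to the other items in the batch, which is exactly what training pushes the two towers toward.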

In [11]:
# Function to create the two tower retrieval model

def create_two_tower(tower_dim: int, encoder_dim: int, optimizer: str, k: int, tags) -> mm.TwoTowerModelV2:
    # User/Query Tower
    user_schema = schema.select_by_tag(tags.USER)
    # create user (query) tower input block
    user_inputs = mm.InputBlockV2(user_schema)
    # create user (query) encoder block
    query = mm.Encoder(
        user_inputs,
        mm.MLPBlock([encoder_dim, tower_dim], no_activation_last_layer=True)
    )

    # Item/Candidate Tower
    item_schema = schema.select_by_tag(tags.ITEM)
    # create item (candidate) tower input block
    item_inputs = mm.InputBlockV2(item_schema)
    # create item (candidate) encoder block
    candidate = mm.Encoder(
        item_inputs,
        mm.MLPBlock([encoder_dim, tower_dim], no_activation_last_layer=True)
    )
    
    # Build Model Class
    model = mm.TwoTowerModelV2(query, candidate)
    model.compile(
        optimizer=optimizer,
        run_eagerly=False,
        loss="categorical_crossentropy",
        metrics=[mm.RecallAt(k), mm.NDCGAt(k)]
    )
    return model

**Notes:**
- `no_activation_last_layer:` when set True, no activation is used for top hidden layer. Learn more [here](https://storage.googleapis.com/pub-tools-public-publication-data/pdf/b9f4e78a8830fe5afcf2f0452862fb3c0d6584ea.pdf).
- We did not set the `negative_samplers` argument of `TwoTowerModelV2`. By default, it uses contrastive learning and the `in-batch` negative sampling strategy.
- Two metrics are used to judge the quality of the recommendations: **Normalized Discounted Cumulative Gain (NDCG@K)** and **Recall@K**.
    - NDCG@K accounts for the rank of the relevant item in the recommendation list. It is a more fine-grained metric than hit rate (HR), which only verifies whether the relevant item is among the top-k items.
    - Recall@K is equivalent to HitRate@K when there is only one relevant item in the recommendation list. It only verifies whether the relevant item is among the top-k items.
- When we set `validation_data=valid_retrieval` in `model.fit()`, evaluation metrics are computed on the validation set using the same negative sampling strategy used for training.
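
For the single-relevant-item case described in the notes, both metrics reduce to very small formulas. The sketch below uses hypothetical helper names and is meant to show the definitions, not how `mm.RecallAt` and `mm.NDCGAt` are implemented.

```python
import math
from typing import List


def recall_at_k(ranked_items: List[int], relevant_item: int, k: int) -> float:
    """1.0 if the relevant item appears in the top-k, else 0.0 (same as HitRate@K here)."""
    return 1.0 if relevant_item in ranked_items[:k] else 0.0


def ndcg_at_k(ranked_items: List[int], relevant_item: int, k: int) -> float:
    """With one relevant item the ideal DCG is 1, so NDCG@K = 1 / log2(rank + 1)."""
    for position, item in enumerate(ranked_items[:k], start=1):
        if item == relevant_item:
            return 1.0 / math.log2(position + 1)
    return 0.0


ranked = [7, 3, 9, 1]  # hypothetical model output, best first

# Item 9 is in the top 3, so Recall@3 counts it as a hit regardless of position
hit = recall_at_k(ranked, relevant_item=9, k=3)
# NDCG@3 rewards higher positions: rank 1 scores 1.0, rank 3 scores 0.5
ndcg_first = ndcg_at_k(ranked, relevant_item=7, k=3)
ndcg_third = ndcg_at_k(ranked, relevant_item=9, k=3)
# Item 1 sits at rank 4, outside the top 3
miss = recall_at_k(ranked, relevant_item=1, k=3)
print(hit, ndcg_first, ndcg_third, miss)
```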

In [12]:
# Initialize model
retrieval_model = create_two_tower(
    tower_dim=64,
    encoder_dim=128,
    optimizer="adam",
    k=10,
    tags=Tags
)

# Fit model
retrieval_model.fit(train_retrieval, validation_data=valid_retrieval, batch_size=4096, epochs=2)

Epoch 1/2
Epoch 2/2


<keras.callbacks.History at 0x7fdb549578e0>

### 3.3 - Evaluate the model accuracy

The validation metric values during training are calculated given the positive and negative scores in each batch, and then averaged over batches per epoch. **That means validation metrics are not computed using the entire item catalog.**

To determine the exact accuracy, we need to compute the similarity score between a given query and all possible candidates. Below, by using the `topk_model` we can evaluate the trained retrieval model using the entire item catalog (brute force).

In [13]:
# Create candidate/item features for evaluation
candidate_features = unique_rows_by_features(train_retrieval, Tags.ITEM, Tags.ITEM_ID)

# Here's a sneak peek of the item features data
candidate_features.head()

| | item_id | item_category | item_shop | item_brand |
|---|---|---|---|---|
| 3 | 1 | 1 | 1 | 1 |
| 6 | 2 | 2 | 2 | 2 |
| 38 | 3 | 3 | 3 | 3 |
| 56 | 4 | 4 | 4 | 4 |
| 50 | 5 | 5 | 5 | 5 |


In [14]:
# Convert model to a top_k_encoder
topk_model = retrieval_model.to_top_k_encoder(candidate_features, k=20, batch_size=128)
topk_model.compile(run_eagerly=False)

In [15]:
# Create data loader for validation data
eval_loader = mm.Loader(valid_retrieval, batch_size=1024).map(mm.ToTarget(schema, "item_id"))

# Evaluation
metrics = topk_model.evaluate(eval_loader, return_dict=True)
metrics



{'loss': 0.43882274627685547,
 'recall_at_10': 0.09288118034601212,
 'mrr_at_10': 0.03513556718826294,
 'ndcg_at_10': 0.04852905124425888,
 'map_at_10': 0.03513556718826294,
 'precision_at_10': 0.00928812101483345,
 'regularization_loss': 0.0,
 'loss_batch': 0.6025397777557373}

### 3.4 - Generate top-K candidates

Let's generate top-K (k=20 in our example) recommendation candidates for a given batch of 8 samples.
- The `to_top_k_encoder()` method uses the item/candidate features dataset to compute and store all item/candidate embeddings in an index.
- The forward method of `topk_model` takes the query/user features as input and computes the dot product scores between the resulting query/user embeddings and all the candidates in the top-k index.
- Then, it returns the top-k (k=20) item ids with the highest scores.
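
The index-then-query behavior in the list above can be imitated with a small brute-force class. `BruteForceTopK` is a hypothetical name, and unlike the real `topk_model` it starts from ready-made query embeddings rather than running user features through the query tower first.

```python
import heapq
from typing import Dict, List, Sequence, Tuple


class BruteForceTopK:
    """Stores precomputed item embeddings and scores each query against all of them."""

    def __init__(self, item_embeddings: Dict[int, Sequence[float]], k: int):
        self.item_embeddings = dict(item_embeddings)
        self.k = k

    def __call__(
        self, query_embeddings: List[Sequence[float]]
    ) -> Tuple[List[List[float]], List[List[int]]]:
        all_scores, all_ids = [], []
        for query in query_embeddings:
            # Dot product between this query and every indexed item
            scored = [
                (sum(q * v for q, v in zip(query, emb)), item_id)
                for item_id, emb in self.item_embeddings.items()
            ]
            # Keep the k highest-scoring items, best first
            top = heapq.nlargest(self.k, scored, key=lambda pair: pair[0])
            all_scores.append([score for score, _ in top])
            all_ids.append([item_id for _, item_id in top])
        return all_scores, all_ids


index = BruteForceTopK({1: [1.0, 0.0], 2: [0.0, 1.0], 3: [1.0, 1.0]}, k=2)
top_scores, top_ids = index([[2.0, 1.0], [-1.0, 3.0]])
print(top_ids, top_scores)
```

Like the notebook's output, the result has one row per query and `k` columns, with scores sorted from highest to lowest.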

In [16]:
# Create query/user features for evaluation
user_features = unique_rows_by_features(valid_retrieval, Tags.USER, Tags.USER_ID)
user_features.head()

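| | user_id | user_shops | user_profile | user_group | user_gender | user_age | user_consumption_2 | user_is_occupied | user_geography | user_intentions | user_brands | user_categories |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3372 | 0 | 0 | 39 | 5 | 1 | 3 | 2 | 1 | 2 | 0 | 0 | 0 |
| 18 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 22 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 2 |
| 2 | 3 | 3 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 3 | 3 | 3 |
| 11 | 4 | 4 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 4 | 4 | 4 |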


In [17]:
# Check out one batch of 8 Users
loader = mm.Loader(user_features, batch_size=8, shuffle=False)
batch = next(iter(loader))
print(batch[0]['user_id'])

tf.Tensor(
[[0]
 [1]
 [2]
 [3]
 [4]
 [5]
 [6]
 [7]], shape=(8, 1), dtype=int32)


The recommended top 20 item ids and scores are returned below from the candidate retrieval model for each of the 8 selected users (from the validation set).

In [18]:
scores, recommended_item_ids = topk_model(batch[0])

In [19]:
# Recommended Items for the batch of 8 Users
recommended_item_ids

<tf.Tensor: shape=(8, 20), dtype=int32, numpy=
array([[  2,  53, 219, 339,  17, 203,  88,   8,  10, 591, 102,   9, 193,
        263, 296, 313, 366, 170, 208, 104],
       [  2, 339,  53,  15,   7,   8, 104, 219, 313,  44, 391,  16, 581,
        367, 122,  36, 131,  12,  49,  88],
       [184, 267, 189,  14,  37, 242,   5,  13,   7,  43, 182, 209,  24,
        323, 469, 344,  61, 315, 202, 346],
       [  5,  39,  12, 353,  71, 384,  98,  47, 182, 496, 276, 226,  37,
         67, 552, 196,   3, 423, 359,  86],
       [  2,   4, 102,  53,   1, 239,  75,   3, 207, 291, 349,  31, 416,
        303,  26, 103, 283, 131, 751, 328],
       [ 11,   5, 421,  16,  37, 496, 226,  39, 448, 274,   7, 201, 251,
        360, 279,  14, 625,  12, 150, 210],
       [182,  71,  20, 384, 184,  61,   3,   5,  14, 315,  25,  39, 271,
        234, 746,  33, 115, 253, 189, 285],
       [ 53, 391,  49,  42, 339,  13,  44,  36,  77, 124,  15, 131,   2,
         91,  64, 300, 335, 388, 177, 367]], dtype=int32)>

In [20]:
# Recommendation "scores" (higher is better) for each recommended item for the batch of 8 Users
scores

<tf.Tensor: shape=(8, 20), dtype=float32, numpy=
array([[0.06132706, 0.05495376, 0.05188668, 0.04804639, 0.04525334,
        0.04467893, 0.04359638, 0.0429577 , 0.04236389, 0.04190419,
        0.04150067, 0.03876395, 0.03788529, 0.03769156, 0.03755951,
        0.03725768, 0.03700415, 0.03667157, 0.03543973, 0.03480223],
       [0.06057974, 0.05873544, 0.05457463, 0.04985515, 0.0474914 ,
        0.04664886, 0.04660673, 0.04630806, 0.04628112, 0.04596485,
        0.04518457, 0.04455723, 0.04452522, 0.04431298, 0.04381118,
        0.04294571, 0.04294293, 0.0429119 , 0.04242484, 0.04143431],
       [0.08359075, 0.08266199, 0.07058696, 0.06940745, 0.06668985,
        0.0660269 , 0.06584951, 0.06411734, 0.06376088, 0.06375296,
        0.06344488, 0.06105455, 0.06002223, 0.05888034, 0.05773782,
        0.05707346, 0.05687609, 0.05232539, 0.05226946, 0.05199999],
       [0.11616069, 0.08919618, 0.08878286, 0.08471525, 0.08420472,
        0.08336139, 0.08041547, 0.07971578, 0.07602289, 0.074058

At this point, we have a trained **Candidate Retrieval Model**: given a **User**, it finds the top-K **Items** most likely to be interacted with. These will serve as inputs to the next model in the pipeline.

## 4.0 - Ranking Model

Ranking models are machine learning models commonly used in recommendation systems to rank items based on their relevance or likelihood of being interacted with by a user. These models can be used to generate personalized recommendations for individual users by taking into account their preferences and past interactions such as clicks, purchases, or ratings.

There are several types of ranking models that can be used in recommendation systems, such as:

1. Collaborative Filtering models, which use the past interactions of users to generate recommendations
2. Content-Based models, which generate recommendations based on the characteristics of the items
3. Hybrid models, which combine the above two approaches (we will use this one)

We will use a Deep Learning Recommendation Model [(DLRM)](https://arxiv.org/abs/1906.00091) architecture (published by Facebook/Meta in 2019) to score and rank User/Item pairs. The model was introduced as a personalization deep learning model that uses embeddings to process sparse features that represent categorical data and a multilayer perceptron (MLP) to process dense features, then interacts these features explicitly using the statistical techniques proposed [here](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5694074). To learn more about the DLRM architecture, please visit [this notebook](https://github.com/NVIDIA-Merlin/models/blob/main/examples/04-Exporting-ranking-models.ipynb) in the Merlin Models GitHub repo.
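
The explicit feature interaction is the distinctive part of DLRM, and it can be sketched in plain Python. Following the description in the DLRM paper, the processed dense vector and the categorical embeddings are interacted through pairwise dot products, and the results are concatenated with the dense vector before the top MLP. The function names and the single-layer "top MLP" below are hypothetical simplifications; the internals of `mm.DLRMModel` may differ.

```python
import math
from itertools import combinations
from typing import List, Sequence


def dlrm_interaction(
    dense_vector: Sequence[float], embeddings: List[Sequence[float]]
) -> List[float]:
    """Concatenate the dense vector with the dot product of every pair of vectors."""
    vectors = [dense_vector] + list(embeddings)
    pairwise = [sum(a * b for a, b in zip(u, v)) for u, v in combinations(vectors, 2)]
    return list(dense_vector) + pairwise


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical inputs: the output of the bottom MLP and two categorical embeddings
dense = [1.0, 2.0]
categorical_embeddings = [[0.5, 0.5], [1.0, -1.0]]

interaction_features = dlrm_interaction(dense, categorical_embeddings)

# Stand-in for the top MLP: a single linear unit with hypothetical weights, then a sigmoid
top_weights = [0.1] * len(interaction_features)
click_probability = sigmoid(sum(w * f for w, f in zip(top_weights, interaction_features)))
print(interaction_features, click_probability)
```

Note that all vectors must share the same dimension for the dot products to be defined, which is why the bottom MLP in DLRM ends in a layer whose width matches the embedding dimension.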


### 4.1 - Feature Engineering for DLRM

In [21]:
# Define train and valid dataset objects
train_rank = Dataset(os.path.join(OUTPUT_DATA_DIR, "train", "*.parquet"), part_size="500MB")
valid_rank = Dataset(os.path.join(OUTPUT_DATA_DIR, "valid", "*.parquet"), part_size="500MB")

# Define schema object
schema = train_rank.schema.without(['user_id_raw', 'item_id_raw'])

In [22]:
# Inspect schema - DLRM takes all of these features as input!
schema

| | name | tags | dtype | is_list | is_ragged | properties.num_buckets | properties.freq_threshold | properties.max_size | properties.start_index | properties.cat_path | properties.embedding_sizes.cardinality | properties.embedding_sizes.dimension | properties.domain.min | properties.domain.max | properties.domain.name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | user_id | (Tags.CATEGORICAL, Tags.USER, Tags.USER_ID, Ta... | int32 | False | False | | 0.0 | 0.0 | 0.0 | /model-data/aliccp/categories/categories/uniqu... | 753.0 | 65.0 | 0 | 752 | user_id |
| 1 | item_id | (Tags.ITEM, Tags.CATEGORICAL, Tags.ID, Tags.IT... | int32 | False | False | | 0.0 | 0.0 | 0.0 | /model-data/aliccp/categories/categories/uniqu... | 754.0 | 65.0 | 0 | 753 | item_id |
| 2 | item_category | (Tags.ITEM, Tags.CATEGORICAL) | int32 | False | False | | 0.0 | 0.0 | 0.0 | /model-data/aliccp/categories/categories/uniqu... | 754.0 | 65.0 | 0 | 753 | item_category |
| 3 | item_shop | (Tags.ITEM, Tags.CATEGORICAL) | int32 | False | False | | 0.0 | 0.0 | 0.0 | /model-data/aliccp/categories/categories/uniqu... | 754.0 | 65.0 | 0 | 753 | item_shop |
| 4 | item_brand | (Tags.ITEM, Tags.CATEGORICAL) | int32 | False | False | | 0.0 | 0.0 | 0.0 | /model-data/aliccp/categories/categories/uniqu... | 754.0 | 65.0 | 0 | 753 | item_brand |
| 5 | user_shops | (Tags.CATEGORICAL, Tags.USER) | int32 | False | False | | 0.0 | 0.0 | 0.0 | /model-data/aliccp/categories/categories/uniqu... | 753.0 | 65.0 | 0 | 752 | user_shops |
| 6 | user_profile | (Tags.CATEGORICAL, Tags.USER) | int32 | False | False | | 0.0 | 0.0 | 0.0 | /model-data/aliccp/categories/categories/uniqu... | 51.0 | 16.0 | 0 | 50 | user_profile |
| 7 | user_group | (Tags.CATEGORICAL, Tags.USER) | int32 | False | False | | 0.0 | 0.0 | 0.0 | /model-data/aliccp/categories/categories/uniqu... | 10.0 | 16.0 | 0 | 9 | user_group |
| 8 | user_gender | (Tags.CATEGORICAL, Tags.USER) | int32 | False | False | | 0.0 | 0.0 | 0.0 | /model-data/aliccp/categories/categories/uniqu... | 3.0 | 16.0 | 0 | 2 | user_gender |
| 9 | user_age | (Tags.CATEGORICAL, Tags.USER) | int32 | False | False | | 0.0 | 0.0 | 0.0 | /model-data/aliccp/categories/categories/uniqu... | 7.0 | 16.0 | 0 | 6 | user_age |


In [23]:
# Target column here is "click" (binary classification)
target_column = schema.select_by_tag(Tags.TARGET).column_names[0]
target_column

'click'

### 4.2 - Train DLRM

In [24]:
def create_dlrm(optimizer: str, schema, target_column: str, embedding_dim: int):
    model = mm.DLRMModel(
        schema,
        embedding_dim=embedding_dim,
        bottom_block=mm.MLPBlock([128, 64]),
        top_block=mm.MLPBlock([128, 64, 32]),
        prediction_tasks=mm.BinaryClassificationTask(target_column),
    )
    model.compile(optimizer=optimizer, run_eagerly=False, metrics=[tf.keras.metrics.AUC()])
    return model


In [25]:
dlrm = create_dlrm(
    optimizer="adam",
    schema=schema,
    target_column=target_column,
    embedding_dim=64
)

dlrm.fit(train_rank, validation_data=valid_rank, batch_size=16 * 1024, epochs=3)

Epoch 1/3
Epoch 2/3
Epoch 3/3


<keras.callbacks.History at 0x7fdb2d8d7880>

In [26]:
# Check out one batch of users
loader = mm.Loader(valid_rank, batch_size=8, shuffle=False)
batch = next(iter(loader))
print(batch[0]['user_id'], batch[0]['item_id'])

tf.Tensor(
[[24]
 [ 5]
 [ 5]
 [ 2]
 [ 1]
 [35]
 [ 3]
 [14]], shape=(8, 1), dtype=int32) tf.Tensor(
[[ 4]
 [ 3]
 [10]
 [ 1]
 [ 1]
 [31]
 [14]
 [ 2]], shape=(8, 1), dtype=int32)


In [27]:
# Test the DLRM model I/O with the user batch
dlrm(batch[0])

<tf.Tensor: shape=(8, 1), dtype=float32, numpy=
array([[0.48569584],
       [0.51287526],
       [0.49173698],
       [0.51630616],
       [0.5096519 ],
       [0.50963724],
       [0.48768458],
       [0.49276185]], dtype=float32)>

### 4.3 - Test Retrieval and Rank together

Now that we have a **Candidate Retrieval Model** and a **Ranking Model** (DLRM), we can test the end-to-end flow locally. To do this live, as in the next multi-stage section, you need a **Feature Store** (Redis/Feast), an **ANN Index** (Redis), and a means to host/serve the entire DAG of operations (Triton).

*Below is a very brute force approach for a single batch of users.*

In [28]:
# Test full pipeline
import numpy as np

# Reload Dataset
train_retrieval = Dataset(os.path.join(OUTPUT_RETRIEVAL_DATA_DIR, "train", "*.parquet"), part_size="500MB")
schema = train_retrieval.schema.select_by_tag([Tags.ITEM_ID, Tags.USER_ID, Tags.ITEM, Tags.USER]).without(['click'])
train_retrieval.schema = schema

# User and Item Offline Feature "Stores"
item_fs = unique_rows_by_features(train_retrieval, Tags.ITEM, Tags.ITEM_ID).to_ddf().compute()
user_fs = unique_rows_by_features(train_retrieval, Tags.USER, Tags.USER_ID).to_ddf().compute()

In [29]:
# User batch loader
user_loader = mm.Loader(unique_rows_by_features(train_retrieval, Tags.USER, Tags.USER_ID), batch_size=8, shuffle=False)

# Sample User batch
user_batch = next(iter(user_loader))
users = user_batch[0]['user_id']
users

<tf.Tensor: shape=(8, 1), dtype=int32, numpy=
array([[1],
       [2],
       [3],
       [4],
       [5],
       [6],
       [7],
       [8]], dtype=int32)>

In [30]:
# Retrieve candidate Items for each User
_, candidate_item_ids = topk_model(user_batch[0])
candidate_item_ids

<tf.Tensor: shape=(8, 20), dtype=int32, numpy=
array([[  2, 339,  53,  15,   7,   8, 104, 219, 313,  44, 391,  16, 581,
        367, 122,  36, 131,  12,  49,  88],
       [184, 267, 189,  14,  37, 242,   5,  13,   7,  43, 182, 209,  24,
        323, 469, 344,  61, 315, 202, 346],
       [  5,  39,  12, 353,  71, 384,  98,  47, 182, 496, 276, 226,  37,
         67, 552, 196,   3, 423, 359,  86],
       [  2,   4, 102,  53,   1, 239,  75,   3, 207, 291, 349,  31, 416,
        303,  26, 103, 283, 131, 751, 328],
       [ 11,   5, 421,  16,  37, 496, 226,  39, 448, 274,   7, 201, 251,
        360, 279,  14, 625,  12, 150, 210],
       [182,  71,  20, 384, 184,  61,   3,   5,  14, 315,  25,  39, 271,
        234, 746,  33, 115, 253, 189, 285],
       [ 53, 391,  49,  42, 339,  13,  44,  36,  77, 124,  15, 131,   2,
         91,  64, 300, 335, 388, 177, 367],
       [  2,   4, 334, 360,  27, 244, 688,  53, 243,  79, 287,   1, 416,
         78,   3, 283,   8, 102, 751,  15]], dtype=int32)>

### 4.4 - Define Functions to run Recommendation Pipeline

Next we define our functions to run the entire pipeline (as tested above).

In [31]:
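# Order recommendations by softmax-normalized score (highest first).
# Note: despite the name, no random sampling takes place.
def softmax_sample(recs: np.ndarray, scores: np.ndarray) -> np.ndarray:
    # Subtract the max before exponentiating for numerical stability
    exp_scores = np.exp(scores - np.max(scores))
    arr = exp_scores / np.sum(exp_scores)
    top_item_idx = (-arr).argsort()
    return recs[top_item_idx]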

def generate_recommendations_batch(topk_model, dlrm, input_data, batch_size: int, tags):
    user_loader = mm.Loader(unique_rows_by_features(input_data, tags.USER, tags.USER_ID), batch_size=batch_size, shuffle=False)
    
    # Load a batch
    for batch in user_loader:
            
        # Generate candidates for this batch of users
        users = batch[0]['user_id']
        _, candidate_item_ids = topk_model(batch[0])
        
        # For each user + candidate items, score with the DLRM
        for user, candidates in zip(users.numpy(), candidate_item_ids.numpy()):
            try:
                num_recs = len(candidates)
                user_id = user[0]

                # get user features
                user_features = user_fs[user_fs.user_id == user_id]
                raw_user_id = user_features.user_id_raw.to_numpy()[0]
                user_features = user_features.append([user_features]*(num_recs-1), ignore_index=True)

                # get item features
                item_features = item_fs[item_fs.item_id.isin(candidates)].reset_index(drop=True)
                raw_item_ids = item_features.item_id_raw.to_numpy()

                # combine into feature vectors
                item_features[user_features.columns] = user_features
                item_features = Dataset(item_features)
                item_features.schema = schema.without(['click'])

                # Score with ranking model
                inputs = mm.Loader(item_features, batch_size=num_recs)
                inputs = next(iter(inputs))
                scores = dlrm(inputs[0]).numpy().reshape(-1)

                # Rank
                recs = softmax_sample(raw_item_ids, scores)

                yield raw_user_id, recs
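            except Exception as e:
                # INFO is suppressed by logging.disable(logging.WARNING) above, so log failures at ERROR level
                logging.error("Failed to generate recommendations for user %s: %s", user_id, e)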

In [32]:
# Test Recommendation Generator
next(generate_recommendations_batch(topk_model, dlrm, train_retrieval, 32, Tags))

(7,
 array([ 53, 132, 122,   6,  87,  18, 624,  11,  36,  15,  17, 409, 329,
        363,  49,  44, 330, 104, 222,   4], dtype=int32))
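
One property of the ordering step is worth spelling out: softmax is a monotonic transformation, so sorting by softmax-normalized scores gives the same order as sorting by the raw ranking scores. The short check below uses made-up scores and hypothetical helper names; it also shows why subtracting the maximum before exponentiating avoids overflow.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: shift by the max so exp() never overflows."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def order_by(values: List[float]) -> List[int]:
    """Indices that sort the values from highest to lowest."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)


raw_scores = [0.2, 1.5, -0.3, 0.9]  # made-up ranking scores
probabilities = softmax(raw_scores)

# The ordering is identical with or without the softmax
same_order = order_by(raw_scores) == order_by(probabilities)

# Without the shift, math.exp(1000.0) would raise OverflowError
large = softmax([1000.0, 1001.0])
print(same_order, probabilities, large)
```

The normalization therefore matters only if the scores are later treated as probabilities, for example to sample items instead of sorting them.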

## 5.0 - Write Recommendations to Redis

In a batch-offline recommendation system, recommendations are generated in batches rather than in real-time. The system runs on a schedule (e.g. once a day) to generate recommendations for all users at once. Redis stores these recommendations so that they can be retrieved quickly when a user requests them.

Redis is a high-performance in-memory data store that is well-suited for this use case because it can handle high read and write loads and can retrieve data very quickly. This allows the recommendation system to return recommendations to users in real-time, while still taking advantage of the offline batch processing to generate the recommendations.


In [33]:
# Import needed Redis libraries
import asyncio
import typing as t
import redis.asyncio as redis
from redis.commands.json.path import Path


In [34]:

async def store_recommendations(recommendations: t.Iterable, redis_conn: redis.Redis):
    """
    Store the recommendations generated for each User in Redis.

    Parameters:
        recommendations (t.Iterable): An iterable (e.g. a generator) of (user_id, recommendations) pairs
        redis_conn (redis.Redis): A Redis connection object used to store the recommendations in Redis
    """
    async def store_as_json(user_id: str, recs: list):
        """
        Store an individual user's latest recommendations in Redis.

        Parameters:
            user_id (str): The user id of the user whose recommendations are being stored
            recs (list): A list of item_ids representing the recommendations for the user
        """
        entry = {
            "user_id": int(user_id),
            "recommendations": [int(rec) for rec in recs]
        }
        # Set the JSON object in Redis
        await redis_conn.json().set(f"USER:{user_id}", Path.root_path(), entry)
        
    # Write the recommendations to Redis as a JSON object
    for user_id, recs in recommendations:
        await store_as_json(user_id, recs)

In [35]:
redis_conn = redis.Redis(
    host="redis-inference-store",
    port=6379,
    decode_responses=True
)

# Create Recommendation Generator
recommendations = generate_recommendations_batch(topk_model, dlrm, train_retrieval, 32, Tags)

# Run the process - may take a few minutes
await store_recommendations(recommendations, redis_conn=redis_conn)
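
The loop in `store_recommendations` awaits each write before starting the next, so every user costs one full round-trip to Redis. A minimal sketch of one alternative is shown below: it groups the `(user_id, recs)` pairs into batches and issues each batch's writes concurrently with `asyncio.gather`. The names `store_in_batches`, `store_one`, and `batch_size` are hypothetical and not part of the notebook. Note that the generator itself still scores users synchronously, so this only overlaps the Redis round-trips.

```python
import asyncio
import itertools
import typing as t


async def store_in_batches(
    recommendations: t.Iterable[t.Tuple[t.Any, t.Sequence]],
    store_one: t.Callable[[t.Any, t.Sequence], t.Awaitable[None]],
    batch_size: int = 100,
) -> int:
    """Write (user_id, recs) pairs in concurrent batches; return the number written."""
    iterator = iter(recommendations)
    written = 0
    while True:
        # Pull at most `batch_size` pairs from the (possibly lazy) generator
        batch = list(itertools.islice(iterator, batch_size))
        if not batch:
            break
        # Start all writes for this batch, then wait for every one to finish
        await asyncio.gather(*(store_one(user_id, recs) for user_id, recs in batch))
        written += len(batch)
    return written
```

Here `store_one` is assumed to be any coroutine function that persists one user's list, for example a small wrapper around the `redis_conn.json().set(...)` call used above. In a notebook cell it could be run as `await store_in_batches(recommendations, store_one)`.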

## 6.0 - Fetch Recommendations from Redis

Now that we've written a list of recommended item IDs for each user, we can use the Redis CLI to take a peek at one example. Your application can fetch these lists when needed, at low latency.

In [36]:
!redis-cli -h redis-inference-store -p 6379 JSON.GET USER:1

"{\"user_id\":1,\"recommendations\":[26,222,377,18,443,12,612,62,289,25,615,14,257,13,459,624,419,19,8,244]}"

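In application code, the same lookup could be wrapped in a small helper. The sketch below assumes a connection object that behaves like the async client used in this notebook (it exposes `json().get(key)`), and it returns a caller-supplied fallback when no key exists, for example for a user who signed up after the last batch run. The function name `fetch_recommendations` and the `fallback` parameter are hypothetical.

```python
import typing as t


async def fetch_recommendations(
    conn: t.Any,
    user_id: int,
    fallback: t.Sequence[int] = (),
) -> t.List[int]:
    """Return the stored item IDs for `user_id`, or `fallback` if none are stored."""
    # `conn` is assumed to behave like the async Redis client created earlier
    entry = await conn.json().get(f"USER:{user_id}")
    if not entry:
        # No document for this user, so fall back (e.g. to popular items)
        return list(fallback)
    return list(entry.get("recommendations", fallback))
```

In a notebook cell this could be called as `await fetch_recommendations(redis_conn, 1)`. What to use as the fallback (popular items, an empty list, and so on) is an application decision that this notebook does not prescribe.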

In [37]:
# Simple benchmark with async Python client
times = []
for i in range(300):
    start = time.perf_counter()  # avoid reusing `t`, which aliases `typing` above
    await redis_conn.json().get(f"USER:{i}")
    times.append(time.perf_counter() - start)

In [38]:
# Average read time per request from Redis, in seconds (well under 1 ms here)
np.average(times)

0.00020427783330281574
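
An average can hide occasional slow reads, so it may be useful to look at percentiles as well. The sketch below summarizes a list of per-request timings such as `times` using only the standard library. The helper name `summarize_latencies` is hypothetical, and it assumes the timings are in seconds and that there are at least two of them.

```python
import statistics
import typing as t


def summarize_latencies(times_s: t.Sequence[float]) -> t.Dict[str, float]:
    """Summarize per-request latencies (given in seconds) in milliseconds."""
    times_ms = [x * 1000.0 for x in times_s]
    # quantiles(n=100) returns 99 cut points; index k-1 is the k-th percentile
    cuts = statistics.quantiles(times_ms, n=100)
    return {
        "mean_ms": statistics.fmean(times_ms),
        "p50_ms": cuts[49],
        "p95_ms": cuts[94],
        "p99_ms": cuts[98],
        "max_ms": max(times_ms),
    }
```

Calling `summarize_latencies(times)` after the benchmark loop gives a quick view of both typical and tail latency.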

## 7.0 - Export Models

So far we have trained and evaluated our models for the recsys pipeline. We need to export the models and artifacts in order to deploy them "online" in the next notebook.

### 7.1 - Export Retrieval Model

We can save the user tower (query tower) to disk as a TensorFlow model. The user tower generates a user embedding vector when a user feature vector <i>x</i> is fed into it.

In [39]:
query_tower = retrieval_model.query_encoder
query_tower.save(os.path.join(DATA_DIR, "query_tower"))

# We can load the saved model back with the following line.
#query_tower_loaded = tf.keras.models.load_model(os.path.join(DATA_DIR, 'query_tower'))

### 7.2 - Export User features

With the `unique_rows_by_features` utility function, we can easily extract the unique user and item feature tables as cuDF dataframes. Note that for the user features table, we use the `USER` and `USER_ID` tags. These features will get stored in our Feature Store later on.

In [40]:
user_features = (
    unique_rows_by_features(train_retrieval, Tags.USER, Tags.USER_ID).compute().reset_index(drop=True)
)
user_features.head()

Unnamed: 0,user_id,user_shops,user_profile,user_group,user_gender,user_age,user_consumption_2,user_is_occupied,user_geography,user_intentions,user_brands,user_categories,user_id_raw
0,1,1,1,1,1,1,1,1,1,1,1,1,7
1,2,2,1,1,1,1,1,1,1,2,2,2,8
2,3,3,1,1,1,1,1,1,1,3,3,3,6
3,4,4,1,1,1,1,1,1,1,4,4,4,9
4,5,5,1,1,1,1,1,1,1,5,5,5,5


In [41]:
# save to disk
user_features.to_parquet(os.path.join(DATA_DIR, "user_features.parquet"))

### 7.3 - Export User Embeddings

In [42]:
queries = retrieval_model.query_embeddings(Dataset(user_features, schema=schema), batch_size=1024, index=Tags.USER_ID)
query_embs_df = queries.compute(scheduler="synchronous").reset_index()

In [43]:
query_embs_df.head()

Unnamed: 0,user_id,0,1,2,3,4,5,6,7,8,...,54,55,56,57,58,59,60,61,62,63
0,1,-0.005831,-0.017723,-0.005784,0.049866,0.02628,0.049169,-0.054866,0.02213,0.036576,...,-0.021381,-0.000743,-0.060986,-0.037169,0.040978,0.017704,0.003427,-0.028408,0.047877,-0.025089
1,2,-0.050697,0.068015,-0.071699,0.043323,-0.001567,0.032226,-0.120845,-0.016043,0.016297,...,0.055824,0.003816,-0.061081,-0.002499,0.051202,-0.060756,-0.022746,0.015439,0.078743,-0.0371
2,3,0.039121,-0.006548,-0.148172,0.098371,-0.025619,0.046444,-0.166427,-0.044223,0.086496,...,0.000787,0.06598,-0.131302,-0.064237,0.037129,-0.035728,-0.08238,0.062503,0.179321,-0.09111
3,4,-0.031507,-0.061412,0.008147,0.063988,0.013863,0.10459,-0.05323,0.054759,0.083436,...,-0.03158,-0.020756,-0.025745,0.008281,0.030552,0.033858,0.061298,0.085968,0.093174,-0.027022
4,5,-0.029882,0.066317,-0.077858,0.026251,-0.046349,0.006982,-0.123309,-0.120219,0.059101,...,-0.074184,0.012038,0.067444,-0.047427,0.108614,0.067772,0.048659,-0.034214,0.144912,-0.046109


### 7.4 - Export Item Features

In [44]:
item_features = (
    unique_rows_by_features(train_retrieval, Tags.ITEM, Tags.ITEM_ID).compute().reset_index(drop=True)
)
item_features.head()

Unnamed: 0,item_id,item_category,item_shop,item_brand,item_id_raw
0,1,1,1,1,7
1,2,2,2,2,6
2,3,3,3,3,8
3,4,4,4,4,9
4,5,5,5,5,5


In [45]:
# save to disk
item_features.to_parquet(os.path.join(DATA_DIR, "item_features.parquet"))

### 7.5 - Export Item Embeddings
These will get stored in our ANN Index (in Redis).

In [46]:
item_embs = retrieval_model.candidate_embeddings(Dataset(item_features, schema=schema), batch_size=1024, index=Tags.ITEM_ID)
item_embs_df = item_embs.compute(scheduler="synchronous")
item_embs_df.head()

item_id,0,1,2,3,4,5,6,7,8,9,...,54,55,56,57,58,59,60,61,62,63
1,-0.034885,-0.000131,0.018455,0.03743,0.026332,0.012729,0.00676,0.069112,0.044133,0.031141,...,-0.027153,-0.02995,-0.02007,-0.067773,0.00242,-0.001353,-0.055582,0.042481,0.013875,0.021228
2,0.021357,-0.026375,0.06909,-0.011445,0.025277,-0.010337,0.008437,0.042574,0.060663,-0.004809,...,-0.03727,-0.039209,0.013558,-0.006484,-0.029601,0.073999,0.009857,-0.022534,-0.00944,-0.025069
3,-0.018197,0.017502,0.002263,0.008534,0.015912,0.00636,-0.00166,0.007613,0.054932,0.042344,...,-0.045789,0.033707,-0.025606,-0.020231,0.068983,0.030158,-0.054312,-0.006741,0.026637,-0.040934
4,-0.018756,-0.057435,0.027142,0.069214,-0.014137,0.063484,0.049648,-0.000459,0.04144,0.020341,...,-0.050948,-0.007804,0.001069,-0.059237,-0.018273,-0.005572,-0.017192,0.033178,0.05067,0.040354
5,0.044985,0.015847,-0.041081,-0.00662,-0.003196,-0.04521,-0.031615,-0.093638,0.007464,0.03394,...,-0.014779,0.057923,-0.015743,-0.048929,0.000438,-0.043618,-0.137103,-0.01958,0.025585,0.028937


In [47]:
# save to disk
item_embs_df.to_parquet(os.path.join(DATA_DIR, "item_embeddings.parquet"))
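
To make the role of the ANN index concrete, the sketch below shows the exact (brute-force) search that such an index approximates: score every item embedding against one user embedding and keep the top `k`. It assumes dot-product similarity between the two towers, uses plain Python lists rather than the dataframes above, and the function name `top_k_by_dot_product` is hypothetical. A real ANN index trades a little accuracy for much faster lookups on large catalogs.

```python
import heapq
import typing as t


def top_k_by_dot_product(
    user_embedding: t.Sequence[float],
    item_embeddings: t.Mapping[int, t.Sequence[float]],
    k: int = 10,
) -> t.List[t.Tuple[int, float]]:
    """Return the k (item_id, score) pairs with the highest dot product."""

    def dot(a: t.Sequence[float], b: t.Sequence[float]) -> float:
        return sum(x * y for x, y in zip(a, b))

    # Score every item exactly; an ANN index avoids this full scan
    scored = (
        (item_id, dot(user_embedding, emb))
        for item_id, emb in item_embeddings.items()
    )
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])
```

For example, `top_k_by_dot_product([1.0, 0.2], {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [0.5, 0.5]}, k=2)` ranks item 1 first and item 3 second.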

### 7.6 - Export DLRM

In [48]:
dlrm.save(os.path.join(DATA_DIR, "dlrm"))

## Conclusion

Now we have learned how to train and evaluate our Two-Tower **Candidate Retrieval Model** and **DLRM**. We also wrote recommendations "offline" to a key-value store (Redis) for use in our applications.

![img](./img/OfflineBatchRecsys.png)

*This completes the **"Offline Batch Recommender System"** intro with Redis + NVIDIA*. In the next tutorial, we will focus on deploying these assets to the [Triton Inference Server](https://github.com/triton-inference-server/server) for live "online" recommendations.