# Nexus Tutorial for Recommendation

The key industry features are as follows:

- It supports reading data from local and distributed file systems, such as HDFS. Unlike the small datasets common in academia, industrial data is often very large and is written daily to the HDFS distributed file system. Nexus therefore provides HDFS reading interfaces for quick integration with industrial data, while still supporting local files for debugging.

- It supports single-machine single-card, single-machine multi-card, and distributed multi-machine multi-card training to meet engineers' diverse development needs. The sheer volume of industrial data leads to long training times, so Nexus offers distributed training interfaces to speed up the training of industrial recommendation models. In addition, the training process is wrapped with [Accelerate](https://huggingface.co/docs/transformers/accelerate), which lets engineers switch between training and debugging by modifying a few lines of code.

- It supports deploying recommendation models to industrial online services and serving customer requests. Nexus contains a high-performance inference engine to meet the latency requirements of online requests. The inference engine first compresses the data using [Protocol Buffers](https://github.com/protocolbuffers/protobuf), then stores the compressed data in the key-value database [Redis](https://redis.io/). Finally, [ONNX](https://onnx.ai/), [TensorRT](https://github.com/NVIDIA/TensorRT), and [Faiss](https://github.com/facebookresearch/faiss) are integrated into the inference engine to speed up inference.

This tutorial explains in detail how to use Nexus for model training, covering the following topics:

1. Configuration of training data.
2. Model configuration and custom model building.
3. Launching training in local and distributed environments.
4. Saving and reading of models.


## Data Configuration

Because industrial recommendation systems handle vast amounts of data, distributed systems are often used for storage and retrieval, with HDFS being a common choice. Nexus supports storing data in HDFS and using it for training. Below, we use RecFlow, an industrial full-flow recommendation dataset published by KuaiShou, as an example to illustrate how the data is organized.

1. Daily Interaction Logs. This part of the data is generally used to record user interactions with items, such as clicks and conversions. Specifically, each time a user refreshes while browsing videos, a request is sent to the system, which is then processed algorithmically. The system is funnel-shaped with multiple stages, ultimately returning 10-20 candidate items to the user, and the user's interactions with these items are fed back into the system, forming a data record. This typically includes: Request ID, User ID, User Features, Item ID, Item Features, User Historical Behavior, Interaction Time, etc. Due to the volume of user data logs, they are often divided into files on a daily basis, such as 2019-07-01.csv, 2019-07-02.csv, etc. In the RecFlow dataset, daily user data logs are stored as YYYY-MM-DD.feather files. [Feather](https://arrow.apache.org/docs/python/feather.html) is a compact file format for storing Arrow tables or data frames and can save lots of storage space. Such logs are mainly used for training and testing of recommendation models.

2. Item Corpus Information. This part of the data contains information about the item corpus on the platform, organized in key-value format, where the key is the item ID and the value is a set of item features. On a short video platform, for example, these include the creator, category, duration, and tags of the video. This data is mostly used for training and inference of retrieval models.

3. Behavior Sequence Records. This part of the data stores the user's behavior sequence, representing the user's historical interactions, organized in key-value format, where the key is the request ID and the value is the behavior sequence corresponding to that request ID. We store behavior sequences separately instead of integrating them into the interaction logs to reduce storage costs: sequences are large, and integrating them into the interaction logs would duplicate a lot of sequence data. Behavior sequence records play an important role in user modeling and appear throughout the pipeline of industrial recommendation systems.
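To make the split between interaction logs and behavior sequence records concrete, here is a minimal sketch on toy in-memory data. All values are made up, and the helper `attach_sequences` is hypothetical rather than part of Nexus. It illustrates why keying sequences by request ID avoids duplication: every item exposed in the same request shares one stored sequence.

```python
# Toy interaction log: one row per (request, item) pair. Values are illustrative only.
interaction_log = [
    {"request_id": 101, "user_id": 7, "video_id": 5001, "like": 1},
    {"request_id": 101, "user_id": 7, "video_id": 5002, "like": 0},
    {"request_id": 102, "user_id": 9, "video_id": 5003, "like": 1},
]

# Behavior sequences are stored once per request, keyed by request_id.
behavior_sequences = {
    101: [4001, 4002, 4003],
    102: [4010],
}


def attach_sequences(rows, sequences, key="request_id"):
    """Return new rows with the behavior sequence looked up by `key`.

    Rows whose key has no stored sequence get an empty list.
    """
    return [{**row, "behavior_seq": sequences.get(row[key], [])} for row in rows]


joined = attach_sequences(interaction_log, behavior_sequences)
for row in joined:
    print(row["request_id"], row["video_id"], row["behavior_seq"])
```

The two rows of request 101 resolve to the same sequence, which is stored only once on disk.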

The template for the dataset configuration file is as follows:

```json
{
    "name": "Dataset Name (required)",
    "type": "Dataset type, such as hdfs or file (required)",
    "url": "The location of the dataset interaction data, such as hdfs://ip_address:port/Nexus/recflow/daily_logs (required)",
    "file_partition": {
        "type": "date",
        "format": "Date format, such as %Y-%m-%d. Default is %Y-%m-%d."
    },
    "item_col": "Item ID column name (required)",
    "item_batch_size": "Int, the batch size of item dataloader",
    "context_features": ["List of context (user-side and contextual) features used", "Feature 1", "Feature 2", "(required)"],
    "item_features": ["List of item features used", "context_features and item_features must not overlap, both are column names in the main table", "(required)"],
    "labels": ["List of labels used", "Multiple labels generally indicate multi-task training", "The label list must not be empty", "(required)"],
    "filter_settings": {
        "Filter feature name": ["Filter condition 1", "Filter condition 2", "Filter conditions are in the form of (==, !=, >=, <=, >, <)[number]"],
        "effective_view": ["==1"],
        "purpose": "Generally used for filtering by label, for example, the recall model needs to retain only samples with label=1, and negative samples are sampled from the candidate item set"
    },
    "item_info": {
        "url": "The storage location of the candidate item information data for the recall model, such as hdfs://ip_address:port/Nexus/recflow/others/video_info.pkl, required for recall models",
        "key": "The column name of the item ID. Must be provided for dataframe-style files, not needed for dict-style files",
        "columns": ["Column names of the item feature table, required in item_info, especially for dict file feature naming"],
        "use_cols": ["List of features to be used in item_info, if empty, all columns are used"]
    },
    "user_sequential_info": [
        {
            "name": "user sequence name",
            "url": "The storage location of the user sequence data, such as hdfs://ip_address:port/Nexus/recflow/seq_effective_50. Setting user_sequential_info to null indicates that an independent sequence file is not used",
            "key": "The key value for querying sequence data index, such as request_id. This value must also exist in the interaction data table.",
            "columns": ["Column names of the sequence feature table, required in user_sequential_info, especially for dict file feature naming, generally the same as or a subset of item_features"],
            "use_cols": ["List of features to be used in user_sequential_info, if empty, all columns are used"],
            "length": 50
        }
    ],
    "stats": {
        "Feature 1": 6,
        "Feature 2": 10,
        "(required)": "The cardinality of each feature"
    },
    "train_period": {
        "start_date": "2024-01-13 (required), the start date of the training data",
        "end_date": "2024-02-08, the end date of the training data. Data for this date is not included (required)"
    },
    "test_period": {
        "start_date": "2024-02-08 (required), the start date of the test data",
        "end_date": "2024-02-09, the end date of the test data. Data for this date is not included (required)"
    }
    
}
```
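The `filter_settings` entries in the template use conditions of the form `(==, !=, >=, <=, >, <)[number]`. The following is a minimal sketch of how such strings could be interpreted. The helpers `parse_condition` and `keep_row` are hypothetical, and combining all conditions with a logical AND is an assumption, so Nexus may evaluate them differently.

```python
import operator
import re

# Map the comparison symbols from the template to Python functions.
_OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}

# Two-character operators come first so ">=" is not parsed as ">" followed by "=".
_CONDITION_RE = re.compile(r"^\s*(==|!=|>=|<=|>|<)\s*(-?\d+(?:\.\d+)?)\s*$")


def parse_condition(condition):
    """Turn a string such as "==1" into a predicate over a single value."""
    match = _CONDITION_RE.match(condition)
    if match is None:
        raise ValueError(f"Unsupported filter condition: {condition!r}")
    op, number = _OPS[match.group(1)], float(match.group(2))
    return lambda value: op(value, number)


def keep_row(row, filter_settings):
    """Assumption: a row is kept only if it satisfies every listed condition."""
    return all(
        parse_condition(cond)(row[feature])
        for feature, conditions in filter_settings.items()
        for cond in conditions
    )


rows = [{"like": 1, "age": 3}, {"like": 0, "age": 5}, {"like": 1, "age": 9}]
settings = {"like": ["==1"], "age": [">=2", "<8"]}
kept = [row for row in rows if keep_row(row, settings)]
print(kept)
```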

Specifically, when the RecFlow dataset is used to train a retriever model, the data configuration is as follows. Note that retriever models are often trained only on exposed data, hence the need to set `filter_settings`.

```json
{
    "name": "recflow",
    "type": "hdfs",
    "url": "hdfs://node1:8020/Nexus/recflow/realshow",
    "file_partition": {
        "type": "date",
        "format": "%Y-%m-%d"
    },
    "item_col": "video_id",
    "item_batch_size": 2048,
    "context_features": ["user_id", "device_id", "age", "gender", "province"],
    "item_features": ["video_id", "author_id", "category_level_two", "upload_type", "category_level_one"],
    "labels": ["like"],
    "filter_settings": {
        "like": ["==1"]
    },
    "item_info": {
        "url": "hdfs://node1:8020/Nexus/recflow/others/video_info.pkl",
        "key": "video_id",
        "columns": ["video_id", "author_id", "category_level_two", "upload_type", "upload_timestamp", "category_level_one"],
        "use_cols": ["video_id", "author_id", "category_level_two", "upload_type", "category_level_one"]
    },
    "user_sequential_info": [
        {
            "name": "user_seq_effective_50",
            "url": "hdfs://node1:8020/Nexus/recflow/seq_effective_50",
            "key": "request_id",
            "columns": ["video_id", "author_id", "category_level_two", "category_level_one", "upload_type", "upload_timestamp", "duration", "request_timestamp", "playing_time", "request_id"],
            "use_cols": ["video_id", "author_id", "category_level_two", "category_level_one", "upload_type"],
            "length": 50
        }
    ],
    "stats": {
        "request_id": 9370581,
        "user_id": 42472,
        "device_id": 42561,
        "age": 8,
        "gender": 3,
        "province": 79,
        "video_id": 82216301,
        "author_id": 33474011,
        "category_level_one": 140,
        "category_level_two": 784,
        "upload_type": 40
    },
    "train_period": {
        "start_date": "2024-02-01",
        "end_date": "2024-02-08"
    },
    "test_period": {
        "start_date": "2024-02-08",
        "end_date": "2024-02-09"
    }
    
}
```
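The `train_period` and `test_period` settings above treat `end_date` as exclusive. As a minimal sketch of what that means for daily partitions, the hypothetical helper below lists one file per day in the half-open interval `[start_date, end_date)`. The `<url>/<date>.feather` naming follows the RecFlow layout described earlier, but how Nexus itself resolves partitions is an assumption here.

```python
from datetime import datetime, timedelta


def list_partition_files(url, period, date_format="%Y-%m-%d", suffix=".feather"):
    """List one file path per day in [start_date, end_date)."""
    start = datetime.strptime(period["start_date"], date_format)
    end = datetime.strptime(period["end_date"], date_format)
    n_days = (end - start).days
    return [
        f"{url}/{(start + timedelta(days=i)).strftime(date_format)}{suffix}"
        for i in range(n_days)
    ]


train_files = list_partition_files(
    "hdfs://node1:8020/Nexus/recflow/realshow",
    {"start_date": "2024-02-01", "end_date": "2024-02-08"},
)
test_files = list_partition_files(
    "hdfs://node1:8020/Nexus/recflow/realshow",
    {"start_date": "2024-02-08", "end_date": "2024-02-09"},
)
print(len(train_files), train_files[0], train_files[-1])
print(test_files)
```

With the periods from the example configuration, training covers the seven days from 2024-02-01 through 2024-02-07, and testing covers 2024-02-08 only, so the two sets do not overlap.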

With this, the dataset configuration file is complete. Subsequently, Nexus can automatically generate a DataLoader based on the configuration.
Before going into the details of Nexus, download the data in [RecFlow's learning folder](https://rec.ustc.edu.cn/share/f8e5adc0-2e57-11ef-bea5-3b4cac9d110e) to follow along. You can place it on your server's local file system or on [HDFS](https://hadoop.apache.org/). Once you are familiar with Nexus, you can download the whole RecFlow dataset or other recommendation datasets for further research.

## Model Configuration and Building Custom Models
This section describes how to train the models implemented in the library and how to inherit from base classes to build custom models. It is divided into two parts accordingly.

1. First, you need to clone Nexus to your local machine and install the dependencies.

    ```bash
    pip install -r requirements.txt
    ```

2. Add Nexus to the Python path to facilitate calling.

    ```bash
    export PYTHONPATH=$PYTHONPATH:/path/to/Nexus
    ```

3. Configure the model configuration file, which is used to define the structural parameters of the model, such as embedding size, hidden size, etc. An example is as follows:

    ```json
    {
        "embedding_dim": 2,
        "num_neg": 50,
        "mlp_layers": [64, 64, 8],
        "activation": "relu",
        "dropout": 0.3,
        "batch_norm": true
    }
    ```

4. Configure the training parameters, which are used to define the hyperparameters for training, such as batch size, learning rate, etc. An example is as follows:

    ```json
    {
        "num_train_epochs": 1,
        "max_steps": 900,
        "per_device_train_batch_size": 1024,
        "output_dir": "./saves/recommender_results/mlp_retriever",
        "checkpoint_steps": 150,
        "learning_rate": 0.001,
        "weight_decay": 0.0,
        "warmup_steps": 1000
    }
    ```
    
For more information on configuration parameters, you can refer to [Configuration Parameters](../../Nexus/training/embedder/recommendation/arguments.py).

5. Create a new Python script to import the dataset and model using Nexus and perform training.


In [None]:
from Nexus.training.embedder.recommendation.runner import RetrieverRunner
from Nexus.training.embedder.recommendation.modeling import MLPRetriever


data_config_path = "./examples/recommendation/config/data/recflow_retriever.json"
train_config_path = "./examples/recommendation/config/mlp_retriever/train.json"
model_config_path = "./examples/recommendation/config/mlp_retriever/model.json"

runner = RetrieverRunner(
    model_config_or_path=model_config_path,
    data_config_or_path=data_config_path,
    train_config_or_path=train_config_path,
    model_class=MLPRetriever,
)
runner.run()


6. At this point, the model training script is complete, and you can run the script to train the model. Executing the script with the Python command will default to single-machine single-GPU training. If you need single-machine multi-GPU or multi-machine multi-GPU training, you can refer to Distributed Training for configuration.

    ```bash
    python train.py
    ```
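As a rough illustration of how the model configuration from step 3 translates into layer shapes, the sketch below computes the linear layers of the item tower for the RecFlow item features. It assumes that concatenated feature embeddings have size `len(features) * embedding_dim`; the helper `mlp_layer_shapes` is hypothetical and not part of Nexus.

```python
def mlp_layer_shapes(input_dim, mlp_layers):
    """Return (in_features, out_features) for each linear layer of the MLP."""
    dims = [input_dim] + list(mlp_layers)
    return list(zip(dims[:-1], dims[1:]))


model_config = {"embedding_dim": 2, "mlp_layers": [64, 64, 8]}
item_features = ["video_id", "author_id", "category_level_two", "upload_type", "category_level_one"]

# Assumption: concatenating per-feature embeddings gives len(features) * embedding_dim.
item_input_dim = len(item_features) * model_config["embedding_dim"]
shapes = mlp_layer_shapes(item_input_dim, model_config["mlp_layers"])
print(item_input_dim, shapes)
```

Under these assumptions the five item features produce a 10-dimensional input, and the last entry of `mlp_layers` (8) is the size of the final item vector.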

### Customize Your Models

This section will demonstrate how to train custom models by inheriting base classes. We will show the custom usage of recall and ranking models, to illustrate the interfaces that need to be configured for two-tower and single-tower models.

The first two steps are the same as for built-in models and set up the environment, if you have not already done so:

1. Clone Nexus to your local machine and install the dependencies.

In [None]:
!pip install -r requirements.txt 
!pip install -e . 

2. Add Nexus to the Python path to facilitate calling.

In [None]:
export PYTHONPATH=$PYTHONPATH:/path/to/Nexus

#### Retriever Model (Two-tower Model)

3. Import the BaseRetriever class and inherit from it to implement your custom model. A recall model is typically composed of four main modules:

- query_encoder: The context (query) feature encoder, which encodes user and context features into vector representations.
- item_encoder: The item feature encoder, which encodes item features into vector representations.
- score_function: The scoring function, which calculates the match degree between user-item pairs.
- loss_function: The loss function, which calculates the difference between the model's predicted values and the true labels.

Therefore, you need to override the following methods. The configuration parameters required when defining the model structure come from the model.json file.

In [None]:
import torch
from collections import OrderedDict
from Nexus.training.embedder.recommendation.modeling import BaseRetriever
from Nexus.modules.arguments import get_modules
from Nexus.modules.embedding import MultiFeatEmbedding
from Nexus.modules.layer import MLPModule

class MYMLPRetriever(BaseRetriever):
    def __init__(self, config, *args, **kwargs):
        super().__init__(config, *args, **kwargs)

    def get_item_encoder(self):
        item_emb = MultiFeatEmbedding(
            features=self.data_config.item_features,
            stats=self.data_config.stats,
            embedding_dim=self.model_config.embedding_dim,
            concat_embeddings=True
        )
        mlp = MLPModule(
            mlp_layers= [item_emb.total_embedding_dim] + self.model_config.mlp_layers,
            activation_func=self.model_config.activation,
            dropout=self.model_config.dropout,
            bias=True,
            batch_norm=self.model_config.batch_norm,
            last_activation=False,
            last_bn=False
        )
        return torch.nn.Sequential(OrderedDict([
            ("item_embedding", item_emb),
            ("mlp", mlp)
            ]))
    

    def get_query_encoder(self):
        context_emb = MultiFeatEmbedding(
            features=self.data_config.context_features,
            stats=self.data_config.stats,
            embedding_dim=self.model_config.embedding_dim
        )
        base_encoder = get_modules("encoder", "BaseQueryEncoderWithSeq")(
            context_embedding=context_emb,
            item_encoder=self.item_encoder
        )
        output_dim = self.model_config.mlp_layers[-1] + context_emb.total_embedding_dim
        mlp = MLPModule(
            mlp_layers= [output_dim] + self.model_config.mlp_layers,
            activation_func=self.model_config.activation,
            dropout=self.model_config.dropout,
            bias=True,
            batch_norm=self.model_config.batch_norm,
            last_activation=False,
            last_bn=False
        )

        return torch.nn.Sequential(OrderedDict([
            ("encoder", base_encoder),
            ("mlp", mlp)
            ]))

    def get_score_function(self):
        return get_modules("score", "InnerProductScorer")()
    
    def get_loss_function(self):
        return get_modules("loss", "BPRLoss")()
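To make the score and loss functions above concrete, the following dependency-free sketch computes an inner-product score and a BPR-style loss for one query with one positive and two negative items. It uses the textbook formulation, the mean of `-log(sigmoid(pos - neg))` over the negatives, which is an assumption: the `InnerProductScorer` and `BPRLoss` modules in Nexus may differ in reduction or other details.

```python
import math


def inner_product(query_vec, item_vec):
    """Score a (query, item) pair as the dot product of their embeddings."""
    return sum(q * i for q, i in zip(query_vec, item_vec))


def bpr_loss(pos_score, neg_scores):
    """BPR loss for one positive item against sampled negatives.

    Computes the mean of -log(sigmoid(pos - neg)) over the negatives,
    written as log1p(exp(-(pos - neg))).
    """
    losses = [math.log1p(math.exp(-(pos_score - neg))) for neg in neg_scores]
    return sum(losses) / len(losses)


query = [0.5, -1.0, 2.0]
pos_item = [1.0, 0.0, 1.5]
neg_items = [[-1.0, 1.0, 0.0], [0.0, 0.5, -0.5]]

pos_score = inner_product(query, pos_item)
neg_scores = [inner_product(query, item) for item in neg_items]
loss = bpr_loss(pos_score, neg_scores)
print(pos_score, neg_scores, loss)
```

The loss shrinks as the positive item is scored further above the negatives, which is the pairwise ranking behavior a retriever needs.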

4. After implementing your custom recall model by inheriting from BaseRetriever, create the training script in the same way as for built-in models, using the dataset, model, and training configuration files:

In [None]:
# train.py
from Nexus.training.embedder.recommendation.runner import RetrieverRunner


data_config_path = "./config/data/recflow_retriever.json"
train_config_path = "./config/mlp_retriever/train.json"
model_config_path = "./config/mlp_retriever/model.json"

runner = RetrieverRunner(
    model_config_path=model_config_path,
    data_config_path=data_config_path,
    train_config_path=train_config_path,
    model_class=MYMLPRetriever,
)
runner.run()

#### Ranker Model

Unlike retriever models, ranker models typically focus on the interaction between features and the combination of features. Therefore, the functions that need to be overridden are different, and the modules that need to be built include:

- Sequence Feature Aggregator: Used to aggregate a feature sequence of shape (L,D) into a single feature of shape (D) for subsequent feature interaction.
- Feature Interaction Module: Used to interact a series of features, usually the single feature output by the Sequence Feature Aggregator. Common modules include MLP, FM, etc.
- Prediction Module: Used for the final prediction after feature interaction, typically a fully connected layer, following the feature interaction module.
- Loss Function: Used to calculate the loss between predicted values and true labels.
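The shape changes performed by the first two modules can be sketched in plain Python. This is a toy illustration for a single sample, not the Nexus implementation: the functions below are hypothetical, and a real aggregator would typically operate on batched tensors and may handle padding.

```python
def average_aggregate(sequence):
    """Average a sequence of L vectors (each of size D) into one vector of size D."""
    length = len(sequence)
    return [sum(column) / length for column in zip(*sequence)]


def flatten(features):
    """Concatenate N feature vectors of size D into one vector of size N * D."""
    return [value for vector in features for value in vector]


# One sample: a behavior sequence with L=3 steps and D=2 embedding dimensions.
behavior_seq = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
seq_feature = average_aggregate(behavior_seq)  # shape (D,)

# Two ordinary feature embeddings plus the aggregated sequence feature: N=3.
features = [[0.1, 0.2], [0.3, 0.4], seq_feature]
interaction_input = flatten(features)  # shape (N * D,)
print(seq_feature, interaction_input)
```

The flattened vector of size `N * D` is what an MLP-based feature interaction module would consume.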


3. Import the BaseRanker class and inherit from the BaseRanker class to implement a custom model:

In [None]:
import torch
from Nexus.training.reranker.recommendation.modeling import BaseRanker
from Nexus.modules.arguments import get_modules
from Nexus.modules.layer import MLPModule, LambdaModule



class MYMLPRanker(BaseRanker):
    def get_sequence_encoder(self):
        cls = get_modules("module", "AverageAggregator")
        encoder = cls(dim=1)
        return encoder
    
    def get_feature_interaction_layer(self):
        flatten_layer = LambdaModule(lambda x: x.flatten(start_dim=1))  # [B, N, D] -> [B, N*D]
        mlp_layer = MLPModule(
            mlp_layers= [self.num_feat * self.model_config.embedding_dim] + self.model_config.mlp_layers,
            activation_func=self.model_config.activation,
            dropout=self.model_config.dropout,
            bias=True,
            batch_norm=self.model_config.batch_norm,
            last_activation=False,
            last_bn=False
        )
        return torch.nn.Sequential(flatten_layer, mlp_layer)
    
    def get_prediction_layer(self):
        pred_mlp = MLPModule(
            mlp_layers=[self.model_config.mlp_layers[-1]] + self.model_config.prediction_layers + [1],
            activation_func=self.model_config.activation,
            dropout=self.model_config.dropout,
            bias=True,
            batch_norm=self.model_config.batch_norm,
            last_activation=False,
            last_bn=False
        )
        return pred_mlp

    def get_loss_function(self):
        return get_modules("loss", "BCEWithLogitLoss")(reduction='mean')

4. Then, consistent with training built-in models, by using the dataset, model, and training configuration file, you can quickly complete the training script with Nexus.

In [None]:
# train.py
from Nexus.training.reranker.recommendation.runner import RankerRunner


def main():
    data_config_path = "./config/data/recflow_ranker.json"
    train_config_path = "./config/mlp_ranker/train.json"
    model_config_path = "./config/mlp_ranker/model.json"
    
    runner = RankerRunner(
        model_config_path=model_config_path,
        data_config_path=data_config_path,
        train_config_path=train_config_path,
        model_class=MYMLPRanker
    )
    runner.run()


if __name__ == "__main__":
    main()

## Single-Machine Training and Distributed Multi-Machine Training of Models

Nexus supports basic single-machine single-GPU training, single-machine multi-GPU training, and distributed training.

1. Single-machine single-GPU training: start directly with the Python command or with the `accelerate` command (for the accelerate configuration file, see [single_gpu.json](config/distributed_training/single_gpu.json)).

   ```shell
   # start with Python command
   CUDA_VISIBLE_DEVICES=1 python main.py
   # start with accelerate command
   accelerate launch --config_file single_gpu.json main.py
   ```

2. Single-machine multi-GPU training: first create a single-machine multi-GPU configuration, using the example [configuration file single_node.json](config/distributed_training/single_node.json) as a reference. Then start with the accelerate command.

    ```shell
    accelerate launch --config_file single_node.json main.py
    ```

    Note that multi-GPU training on a single machine will by default occupy port 29500 on the local machine. If you need to run multiple tasks, you need to specify different port numbers in the command or in the JSON file: --main_process_port 29501 (specified in the command line) or "main_process_port": 29501 (JSON file).

    In addition, both single-machine multi-GPU and multi-machine multi-GPU training currently use DistributedDataParallel (DDP). During training, each process keeps a complete copy of the model and optimizer on its GPU, and each GPU also maintains a "bucket" to gather gradients from the other GPUs. As a result, model preparation occupies roughly twice as much GPU memory as single-GPU training. For more details, please refer to: [blog1](https://discuss.pytorch.org/t/memory-consumption-for-the-model-get-doubled-after-wrapped-with-ddp/130837), [blog2](https://medium.com/deep-learning-for-protein-design/a-comprehensive-guide-to-memory-usage-in-pytorch-b9b7c78031d3).

3. Multi-machine multi-GPU distributed training:
    - Configure the environment on multiple machines, download Nexus, and install dependencies.
    - Create a multi-machine multi-GPU configuration on each machine, using the example files [configuration file multi_node_rank0.json](config/distributed_training/multi_nodes_rank0.json) and [configuration file multi_node_rank1.json](config/distributed_training/multi_nodes_rank1.json) as a reference. Then run the accelerate command on the rank0 machine first, followed by the other machines in sequence:
    
    ```shell
    accelerate launch --config_file multi_node_rank0.json main.py
    ```
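The DDP memory note in step 2 can be turned into a back-of-the-envelope estimate. The sketch below counts only the embedding tables for the RecFlow item features, assuming float32 weights, one embedding row per feature value in `stats`, and one extra model-sized buffer for DDP gradient buckets. It ignores MLP weights, optimizer state, and activations, and both helper functions are hypothetical.

```python
def embedding_parameters(stats, features, embedding_dim):
    """Number of embedding parameters: one row per feature value, embedding_dim columns."""
    return sum(stats[name] for name in features) * embedding_dim


def model_prep_bytes(num_params, bytes_per_param=4, use_ddp=False):
    """Rough memory at model preparation (float32 weights by default).

    Assumption: DDP gradient buckets add about one extra model-sized buffer.
    """
    weights = num_params * bytes_per_param
    buckets = weights if use_ddp else 0
    return weights + buckets


# Cardinalities taken from the "stats" field of the RecFlow data configuration.
stats = {"video_id": 82216301, "author_id": 33474011, "category_level_two": 784,
         "upload_type": 40, "category_level_one": 140}
num_params = embedding_parameters(stats, list(stats), embedding_dim=2)

single = model_prep_bytes(num_params)
ddp = model_prep_bytes(num_params, use_ddp=True)
print(f"single GPU: {single / 2**30:.2f} GiB, DDP per GPU: {ddp / 2**30:.2f} GiB")
```

Because the ID embedding tables dominate the parameter count, increasing `embedding_dim` scales this estimate almost linearly.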


Note:
All the accelerate configuration files mentioned above are created with the `accelerate config` command.

```shell
accelerate config --config_file xxx.json
```

Then select the options that match your needs in the interactive prompt.
For more details, please refer to the [accelerate](https://github.com/huggingface/accelerate) documentation.


# Inference
In an online recommendation system, handling a single request typically involves the following steps:
- **Receiving the request header**: The request header includes the user ID and context-specific features (e.g., location and timestamp of the request).
- **Obtaining the Candidate Item Set**: At each stage, the recommendation model receives the candidate item set from the previous stage (for the retrieval model, it is the entire item pool).
- **Retrieving Features**: At each stage, the system retrieves user- and item-related features required by the recommendation model based on the user ID and candidate item IDs. To enable fast access, user and item features are stored in a cache database (e.g., Redis) in a key-value format.
- **Sorting the Candidate Item Set**: At each stage, the recommendation model ranks the candidate items using the retrieved features and selects the top-k items to pass to the next stage (for the final stage, the top-k items are directly presented to the user).
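The steps above can be sketched as a small multi-stage funnel. This is a toy illustration only: a Python dict stands in for the feature cache, and the scoring functions are hypothetical placeholders for trained retrieval and ranking models.

```python
import heapq


def run_stage(candidates, features, score_fn, top_k):
    """Score every candidate with its cached features and keep the top_k."""
    scored = ((score_fn(features[item_id]), item_id) for item_id in candidates)
    return [item_id for _, item_id in heapq.nlargest(top_k, scored)]


# In-memory stand-in for the key-value feature cache (e.g. Redis).
feature_cache = {
    item_id: {"popularity": item_id % 7, "freshness": item_id % 5}
    for item_id in range(1, 21)
}

# Hypothetical per-stage scoring functions; real stages would call trained models.
stages = [
    (lambda f: f["popularity"], 10),                      # retrieval: whole pool -> 10
    (lambda f: f["popularity"] + 2 * f["freshness"], 3),  # ranking: 10 -> 3
]

candidates = list(feature_cache)  # the retrieval stage starts from the entire item pool
for score_fn, top_k in stages:
    candidates = run_stage(candidates, feature_cache, score_fn, top_k)
print(candidates)
```

Each stage narrows the candidate set, so the more expensive scoring function only runs on the items that survived the cheaper one.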

## Storing Features in Cache Database
### Defining message in protobuf
To reduce the cache size occupied by features, Protobuf is used to serialize the features before storing them in the cache database. To use Protobuf, the message data structures must first be defined.

In the .proto file, the user and item message data structures are defined. For example, in recflow.proto:

Each feature of user and item is treated as a field of the message structure.

Then, generate Python code from the .proto file using protoc:

In [None]:
# create proto
!protoc --python_out=. ./inference/feature_insert/protos/recflow.proto

### Inserting Features into Redis Database
When storing user-side or item-side features in a Redis database, the process typically involves several steps:

1. Create a message object.
2. Assign values to each field of the message object.
3. Serialize the message object.
4. Store the serialized message object in the Redis database. The key is usually set as `{dataset_name}:{object_name}:{object_primary_key}`.

An example of inserting features into the Redis database using recflow is shown below:

In [None]:
import redis
import numpy as np
import pandas as pd
from tqdm import tqdm

import recflow_pb2

r = redis.Redis(host='localhost', port=6379, db=0)

# Item
test_video_info = pd.read_feather('./inference/feature_data/recflow/realshow_test_video_info.feather')
for row in tqdm(test_video_info.itertuples(), total=len(test_video_info)):

    # 0. Create a message object
    item = recflow_pb2.Item()
    item.video_id = getattr(row, 'video_id')
    item.author_id = getattr(row, 'author_id')
    item.category_level_two = getattr(row, '_3')
    item.upload_type = getattr(row, 'upload_type')
    item.upload_timestamp = getattr(row, 'upload_timestamp')
    item.category_level_one = getattr(row, 'category_level_one')
    
    # 1. Serialize the Protobuf object into binary data
    serialized_data = item.SerializeToString()

    # 2. Store the compressed data in Redis
    r.set(f"recflow:item:{item.video_id}", serialized_data)
    

print("Item features are stored in Redis.")

# User
test_user_info = np.load('./inference/feature_data/recflow/test_user_info.npz')['arr_0']
for row in tqdm(test_user_info):

    # 0. Create a message object 
    user_timestamp = recflow_pb2.UserTimestamp()
    user_timestamp.request_id = row[0]
    user_timestamp.user_id = row[1]
    user_timestamp.request_timestamp = row[2]
    user_timestamp.device_id = row[3]
    user_timestamp.age = row[4]
    user_timestamp.gender = row[5]
    user_timestamp.province = row[6]
    
    # Split this row's flattened history into behaviors of 6 values each
    for behavior in np.split(row[7:], len(row[7:]) // 6):
        item = user_timestamp.seq_effective_50.add()
        item.video_id = behavior[0]
        item.author_id = behavior[1]
        item.category_level_two = behavior[2]
        item.category_level_one = behavior[3]
        item.upload_type = behavior[4]
        item.request_timestamp = behavior[5]

    # 1. Serialize the Protobuf object into binary data
    serialized_data = user_timestamp.SerializeToString()

    # 2. Store the compressed data in Redis
    r.set(f"recflow:user_timestamp:{row[1]}_{row[2]}", serialized_data)

print("UserTimestamp features are stored in Redis.")

### Generate cache configuration file `feature_cache_config.yaml`

To enable the use of features stored in the cache, we need to generate a configuration file `feature_cache_config.yaml` for each dataset.

Taking Recflow as an example:

The `host`, `port`, and `db` fields specify the connection details of the Redis database. `features` specifies the storage details for each feature. Within `features`, `key_temp` is the key template for the feature in the Redis database, where the content inside {} is replaced with specific item or user information, and `field` specifies the attribute name of the feature in the message object. `key_temp2proto` maps each key template to the corresponding message class name, which is used to create message objects.
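To illustrate how these fields work together, here is a minimal sketch that resolves a feature name to its Redis key, message class name, and field. The dictionary layout and the `resolve` helper are hypothetical: they mirror the field names described above, but the real `feature_cache_config.yaml` may be structured differently.

```python
# Hypothetical structure mirroring the fields described above; the real
# feature_cache_config.yaml may be laid out differently.
cache_config = {
    "host": "localhost",
    "port": 6379,
    "db": 0,
    "features": {
        "video_id": {"key_temp": "recflow:item:{video_id}", "field": "video_id"},
        "author_id": {"key_temp": "recflow:item:{video_id}", "field": "author_id"},
    },
    "key_temp2proto": {"recflow:item:{video_id}": "Item"},
}


def resolve(feature_name, config, **ids):
    """Return (redis_key, message_class_name, field_name) for one feature."""
    spec = config["features"][feature_name]
    key = spec["key_temp"].format(**ids)
    return key, config["key_temp2proto"][spec["key_temp"]], spec["field"]


print(resolve("author_id", cache_config, video_id=12345))
```

The resolved key has the same `recflow:item:<video_id>` shape as the keys written by the insertion example earlier in this section.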

Running ./inference/feature_insert/recflow_script/run.sh completes the three steps mentioned above.

## InferenceEngine

The [InferenceEngine](../../UniRetrieval/abc/inference/inference_engine.py) class can be initialized to perform the inference process. By inheriting from InferenceEngine, we further define BaseEmbedderInferenceEngine and BaseRerankerInferenceEngine and use them for inference, which involves the following steps:

1. Converting a checkpoint of the recommendation model to an `onnxruntime.InferenceSession`.
2. Performing batch inference.
3. Outputting the top-k candidate items.

We can initialize the InferenceEngine class and perform batch inference as follows:

### Inference: Embedder

In [None]:
import yaml
import pandas as pd
import pycuda.driver as cuda
from Nexus.inference.embedder.recommendation import BaseEmbedderInferenceEngine

# Paths are relative to the Nexus repository root; adjust them to your checkout.
infer_config_path = "./examples/recommendation/inference/config/recflow_infer_retrieval_config.yaml"

with open(infer_config_path, 'r') as f:
    config = yaml.safe_load(f)

retriever_inference_engine = BaseEmbedderInferenceEngine(config)

infer_df = pd.read_feather('./examples/recommendation/inference/inference_data/recflow/recflow_infer_data.feather')
# Run inference on the first 10 batches of 128 requests each
for batch_idx in range(10):
    print(f"This is batch {batch_idx}")
    batch_st = batch_idx * 128
    batch_ed = (batch_idx + 1) * 128
    batch_infer_df = infer_df.iloc[batch_st:batch_ed]
    retriever_outputs = retriever_inference_engine.batch_inference(batch_infer_df)
    print(type(retriever_outputs), retriever_outputs.shape, retriever_outputs)

# In TensorRT mode, pop the CUDA context once inference is done
if retriever_inference_engine.config['infer_mode'] == 'trt':
    cuda.Context.pop()

### Inference: Ranker

In [None]:
import yaml
import numpy as np
import pandas as pd
import pycuda.driver as cuda
from Nexus.inference.reranker.recommendation import BaseRerankerInferenceEngine

# Paths are relative to the Nexus repository root; adjust them to your checkout.
infer_config_path = "./examples/recommendation/inference/config/recflow_infer_ranker_config.yaml"

with open(infer_config_path, 'r') as f:
    config = yaml.safe_load(f)
    print(config)

rank_inference_engine = BaseRerankerInferenceEngine(config)

infer_df = pd.read_feather('./examples/recommendation/inference/inference_data/recflow/recflow_infer_data.feather')
item_df = pd.read_feather('./examples/recommendation/inference/inference_data/recflow/realshow_test_video_info.feather')
all_item_ids = np.array(item_df['video_id'])
for batch_idx in range(10):
    print(f"This is batch {batch_idx}")
    batch_st = batch_idx * 128
    batch_ed = (batch_idx + 1) * 128
    batch_infer_df = infer_df.iloc[batch_st:batch_ed]
    # Sample 50 random candidates per request; resetting the seed makes every batch draw the same candidates
    np.random.seed(42)
    batch_candidates = np.random.choice(all_item_ids, size=(128, 50))
    batch_candidates_df = pd.DataFrame({rank_inference_engine.feature_config['fiid']: batch_candidates.tolist()})
    ranker_outputs = rank_inference_engine.batch_inference(batch_infer_df, batch_candidates_df)
    print(type(ranker_outputs), ranker_outputs.shape, ranker_outputs[-5:])

# In TensorRT mode, pop the CUDA context once inference is done
if rank_inference_engine.config['infer_mode'] == 'trt':
    cuda.Context.pop()

We support ONNX Runtime and TensorRT for inference acceleration. To switch between them, set the `infer_mode` parameter in [config.yaml](./inference/config/recflow_infer_ranker_config.yaml) to "ort" or "trt".

## Evaluation

We designed [RecommenderAbsEvaluator](../../UniRetrieval/evaluation/recommendation/evaluator.py) to evaluate model checkpoints. During the evaluation process, it is necessary to provide [eval_config.json](./eval/eval_config.json) and [eval_model_config.json](./eval/eval_model_config.json) to configure certain evaluation hyperparameters, evaluation dataset paths, checkpoint paths, and other related settings. Then, use these configs to initialize a [RecommenderEvalRunner](../../UniRetrieval/evaluation/recommendation/runner.py) class, and you can simply call the run() function to perform the model evaluation.

In [None]:
from Nexus.evaluation.recommendation import RecommenderEvalArgs, RecommenderEvalModelArgs, RecommenderEvalRunner

eval_config_path = "./examples/recommendation/eval/eval_config.json"
model_config_path = "./examples/recommendation/eval/eval_model_config.json"

eval_args = RecommenderEvalArgs.from_json(eval_config_path)
model_args = RecommenderEvalModelArgs.from_json(model_config_path)
    
runner = RecommenderEvalRunner(
    eval_args=eval_args,
    model_args=model_args
)

runner.run()