In [1]:
#pip install nextrec

import logging
import sys

logger = logging.getLogger() 
logger.setLevel(logging.INFO)
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(logging.Formatter('%(asctime)s %(levelname)s %(message)s'))
logger.handlers = [handler] 

# 5-Minute Quick Start

This notebook introduces NextRec, a unified, efficient, and scalable recommender-system framework, and walks you through training and building a production-ready model from scratch. The example uses internal feature definitions and online samples from E-commerce scenario.

Before getting started, install nextrec from the command line:

```bash
# Release
pip install nextrec

# Test
pip install -i https://test.pypi.org/simple/ nextrec
```

Here is a quick primer on the signals we usually process in recommendation. We handle several input types, transform them, and then feed vectors into the network:

- Dense features (numeric): continuous or ordered values such as age, price, duration, or scores; typically standardized/normalized or log-transformed.
- Sparse features (categorical/ID): high-cardinality discrete fields such as user ID, item ID, gender, occupation, or device type; typically indexed and embedded via an embedding lookup matrix.
- Sequence features (behavior history): variable-length histories such as browse/click/purchase lists. They capture user behavior and interest drift; we usually truncate/pad, embed, and then aggregate (mean/sum/attention) to get a fixed-length vector.
- Context features: environment information such as time, geography, or slot position; can be dense or sparse and often interacts with the main features.
- Multi-modal features: vectors from pre-trained models on text, images, or video; they can be used directly as dense inputs or interact with IDs.

A typical training data format looks like this:

```text
user_id,item_id,gender,age,occupation,history_seq,label
1024,501,1,28,3,"[12,45,18,77]",1
2048,777,0,35,5,"[8,99]",0
```

We provide a desensitized e-commerce dataset with user IDs, item IDs, dense features, sparse features, and sequence features. The labels include both click and conversion.


In [2]:
pip show nextrec

Name: nextrec
Version: 0.4.21
Summary: A comprehensive recommendation library with match, ranking, and multi-task learning models
Home-page: https://github.com/zerolovesea/NextRec
Author: Yang Zhou
Author-email: zyaztec@gmail.com
License: 
Location: /opt/anaconda3/envs/nextrec/lib/python3.10/site-packages
Editable project location: /Users/zyaztec/DailyWork/建模代码整理/NextRec
Requires: numpy, pandas, pyarrow, pyyaml, rich, scikit-learn, scipy, swanlab, torch, torchvision, transformers, wandb
Required-by: 
Note: you may need to restart the kernel to use updated packages.


In [13]:
import pandas as pd
from nextrec.data.preprocessor import DataProcessor

df = pd.read_csv('/Users/zyaztec/DailyWork/建模代码整理/NextRec/dataset/multitask_task.csv')
df.head()

Unnamed: 0,user_id,item_id,dense_0,dense_1,dense_2,dense_3,dense_4,dense_5,dense_6,dense_7,...,sparse_5,sparse_6,sparse_7,sparse_8,sparse_9,sparse_10,sparse_11,sequence_0,click,conversion
0,1,7817,0.147041,0.310204,0.777809,0.944897,0.623154,0.571242,0.770095,0.321103,...,161,138,88,5,312,416,188,"[90, 54, 86, 5, 121, 138, 45, 100, 0, 0, 0, 0,...",1,0
1,1,3579,0.778112,0.803593,0.51852,0.910912,0.043562,0.821427,0.880369,0.337482,...,252,25,402,7,168,155,154,"[3, 95, 31, 124, 56, 79, 109, 0, 0, 0, 0, 0, 0...",1,1
2,1,2657,0.586647,0.123208,0.203636,0.116398,0.240645,0.882588,0.062836,0.629869,...,27,62,145,109,432,170,133,"[139, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]",1,0
3,1,2689,0.337401,0.705511,0.138758,0.945233,0.330333,0.377462,0.121577,0.427124,...,241,144,40,6,333,175,210,"[59, 29, 34, 106, 4, 103, 0, 0, 0, 0, 0, 0, 0,...",1,0
4,1,2495,0.669473,0.564266,0.006319,0.255851,0.698055,0.052065,0.583597,0.590456,...,152,27,204,129,319,97,168,"[52, 122, 104, 116, 5, 138, 37, 30, 59, 10, 19...",1,0


In [14]:
task_labels = ['click', 'conversion']
dense_features_list = [col for col in df.columns if 'dense' in col]
sparse_features_list = [col for col in df.columns if 'sparse' in col] + ['user_id', 'item_id']
sequence_features_list = [col for col in df.columns if 'sequence' in col]

# we need to convert the sequence features stored as string representations of lists in the CSV to actual list objects
for col in df.columns:
    if 'sequence' in col:
        df[col] = df[col].apply(lambda x: eval(x) if isinstance(x, str) else x)

After processing the data into the required format, split out training and inference sets so the model can be evaluated on metrics.

In [15]:
from sklearn.model_selection import train_test_split

train_df, valid_df = train_test_split(df, test_size=0.2, random_state=2025)

Next we prepare the model by defining the different feature types it needs and passing them into the model. Here we use the built-in DenseFeature, SequenceFeature, and SparseFeature classes from nextrec.

In [33]:
from nextrec.basic.features import DenseFeature, SequenceFeature, SparseFeature

# we treat all dense features as DenseFeature, proj_dim=1 means no projection is performed. When proj_dim is greater than 1, 
# it indicates that a linear transformation is performed on the dense features, similar to the effect of embedding
dense_features = [DenseFeature(name=feat, proj_dim=1) for feat in dense_features_list] 

# Sparse features and sequence features are generally embedded, and the embedding_dim can be adjusted according to actual needs
sparse_features = []
for feat in sparse_features_list:
    vocab_size = 20001 # assuming the vocabulary size for each sparse feature is 20001
    # SparseFeature can also set some other parameters, such as initializer, regularization, and embedding_name, etc. 
    # When two features share embedding, the same embedding_name can be set       
    sparse_features.append(SparseFeature(name=feat, vocab_size=vocab_size, embedding_dim=4, embedding_name=feat)) 

# Sequence features are handled similarly to sparse features, but you also need to set the maximum length max_len and padding_idx parameters
sequence_features = []
for feat in sequence_features_list:
    vocab_size = 500 # assuming the vocabulary size for each sequence feature is 500
    sequence_features.append(
        SequenceFeature(
            name=feat,
            vocab_size=vocab_size,
            max_len=20,
            embedding_dim=8,
            padding_idx=0
        )
    )

Time to use the DataLoader. A DataLoader prepares iterative batches for the model. To keep things simple, we provide RecDataLoader.

RecDataLoader is a powerful utility that accepts a dict, DataFrame, DataLoader, or a path. It can also stream data by setting streaming=True. This instance fits every training scenario in the NextRec framework, and we strongly recommend giving it a try to avoid unnecessary hassle.

You can skip it if you prefer—NextRec also supports training directly with a dict or DataFrame, as shown later.


In [29]:
from nextrec.data.dataloader import RecDataLoader

task_labels = ['click', 'conversion']

dataloader = RecDataLoader(
    dense_features=dense_features,
    sparse_features=sparse_features,
    sequence_features=sequence_features,
    target=task_labels,
)

# We need to create dataloaders for the training set and validation set separately
train_loader = dataloader.create_dataloader(
    data=train_df,
    batch_size=512,
    shuffle=True,
)

valid_loader = dataloader.create_dataloader(
    data=valid_df,
    batch_size=512,
    shuffle=False,
)


# you can also pass in a path to configure a streaming data loader
# train_loader = dataloader.create_dataloader(
#     data='/path/to/train/data',
#     batch_size=512,
#     shuffle=True,
#     streaming=True
# )

Now let's choose a model to train. NextRec offers more than 20 industry-standard models for retrieval, ranking, and multi-task learning. Here we start with a classic MMOE model. Before training, we need to instantiate the model and assign parameters.

After instantiation we compile the model, assigning the optimizer, scheduler, and loss functions to the trainer. NextRec supports more than 8 optimizers, 10 schedulers, 20 loss functions, and imbalance-aware losses.


In [30]:
from nextrec.models.multi_task.mmoe import MMOE

# we need to set the parameters of the expert network and task tower for mmoe. Here we set 4 expert networks, each containing two layers,
# and the task tower also contains two layers. We have two tasks, click and conversion, 
# each of which is a binary classification task, so the task parameter is set to ['binary', 'binary']
model = MMOE(
    dense_features=dense_features,
    sparse_features=sparse_features,
    sequence_features=sequence_features,
    expert_params= {"dims": [128, 64],  "activation": "leaky_relu", "dropout": 0.3},
    num_experts=4,  # 4 expert networks
    tower_params_list=[{"dims": [64, 32], "activation": "leaky_relu", "dropout": 0.2},  # click task
                       {"dims": [64, 32], "activation": "leaky_relu", "dropout": 0.2},  # conversion task
                        ],
    target=task_labels,  # multiple task labels
    task=['binary', 'binary'],  # each task type
    device='cpu',  
    embedding_l1_reg=1e-6,
    embedding_l2_reg=1e-5,
    dense_l1_reg=1e-5,
    dense_l2_reg=1e-4,
    session_id="mmoe_task"    # session id is used to distinguish different training tasks, and will save training logs, checkpoints, model parameters, etc. in a folder named after the session_id
    
)

# Compile the model to set the optimizer and loss function. Configure them via compile().
# but we recommend passing them in compile for clarity.
# Here we use the Adam optimizer with a learning rate of 1e-3 and weight decay of 1e-5
# Each task uses binary cross-entropy loss
model.compile(
    optimizer="adam",
    optimizer_params={"lr": 1e-3, "weight_decay": 1e-5},
    loss=['bce', 'bce'],  # loss for each task
    loss_weights="grad_norm" # use grad_norm to automatically adjust the weights of each task's loss during training. you can also set it to a list of fixed weights, e.g., [1.0, 0.5]
)

# Now we can start training the model. Here we set the training to 3 epochs, but you can adjust it according to your actual situation.
# At the same time, we can also set evaluation metrics for each task. Here we set AUC, Recall, and Precision metrics for each task.
# Note you can check the training logs and model checkpoints in the nextrec_logs/mmoe_iflytek folder
# pass use_tensorboard，use_wandb，use_swanlab and wandb_kwargs, swanlab_kwargs to enable logging to TensorBoard, Weights & Biases, and SwanLab respectively.
# for exameple, use_swanlab=True, swanlab_kwargs={"project": "NextRec", "name": "MMOE_experiment"}
model.fit(
    train_data=train_loader, 
    valid_data=valid_loader,
    metrics={
        'click': ['auc', 'recall', 'precision'],
        'conversion': ['auc', 'recall', 'precision']
    },
    epochs=1,
)


[1m[94mModel Summary: MMOE[0m


[1m[36mFeature Configuration[0m
[36m--------------------------------------------------------------------------------[0m
Dense Features (8):
  1. dense_0             
  2. dense_1             
  3. dense_2             
  4. dense_3             
  5. dense_4             
  6. dense_5             
  7. dense_6             
  8. dense_7             

Sparse Features (14):
  #    Name           Vocab Size        Embed Name  Embed Dim
  ---- ------------ ------------ ----------------- ----------
  1    sparse_0           200002          sparse_0          4
  2    sparse_1           200002          sparse_1          4
  3    sparse_2           200002          sparse_2          4
  4    sparse_3           200002          sparse_3          4
  5    sparse_4           200002          sparse_4          4
  6    sparse_5           200002          sparse_5          4
  7    sparse_6           200002          sparse_6          4
  8    sparse_7           2000

Epoch 1: 84/157 elapsed=0:00:10 speed=8.38/s ETA=0:00:08
Epoch 1: 157/157 elapsed=0:00:19 speed=8.20/s ETA=0:00:00








Saved checkpoint to /Users/zyaztec/DailyWork/建模代码整理/NextRec/tutorials/notebooks/en/nextrec_logs/mmoe_task/MMOE_checkpoint.pt
Saved checkpoint to /Users/zyaztec/DailyWork/建模代码整理/NextRec/tutorials/notebooks/en/nextrec_logs/mmoe_task/MMOE_best.pt
[94mSaved best model to:               /Users/zyaztec/DailyWork/建模代码整理/NextRec/tutorials/notebooks/en/nextrec_logs/mmoe_task/MMOE_best.pt with val_auc_click: 0.855681[0m
[94mRestoring model weights from epoch: 1 with best val_auc_click: 0.855681[0m

[1m[94mTraining finished.[0m

Load best model from:              /Users/zyaztec/DailyWork/建模代码整理/NextRec/tutorials/notebooks/en/nextrec_logs/mmoe_task/MMOE_best.pt


MMOE(
  (embedding): EmbeddingLayer(
    (embed_dict): ModuleDict(
      (sparse_0): Embedding(200002, 4, padding_idx=0)
      (sparse_1): Embedding(200002, 4, padding_idx=0)
      (sparse_2): Embedding(200002, 4, padding_idx=0)
      (sparse_3): Embedding(200002, 4, padding_idx=0)
      (sparse_4): Embedding(200002, 4, padding_idx=0)
      (sparse_5): Embedding(200002, 4, padding_idx=0)
      (sparse_6): Embedding(200002, 4, padding_idx=0)
      (sparse_7): Embedding(200002, 4, padding_idx=0)
      (sparse_8): Embedding(200002, 4, padding_idx=0)
      (sparse_9): Embedding(200002, 4, padding_idx=0)
      (sparse_10): Embedding(200002, 4, padding_idx=0)
      (sparse_11): Embedding(200002, 4, padding_idx=0)
      (user_id): Embedding(200002, 4, padding_idx=0)
      (item_id): Embedding(200002, 4, padding_idx=0)
      (sequence_0): Embedding(200, 8, padding_idx=0)
    )
    (dense_transforms): ModuleDict()
    (sequence_poolings): ModuleDict(
      (sequence_0): AveragePooling()
    )
 

Next we train a ranking model using AutoINT as the example, switching the task from multi-task to single-task. This model comes from a Peking University paper published at CIKM 2019; you can read an explainer [here](https://guyuecanhui.github.io/2020/05/09/paper-2019-pku-autoint/).


In [31]:
from nextrec.models.ranking.autoint import AutoInt

target = 'conversion'

# Since the target has changed, we recreate the dataloader
dataloader = RecDataLoader(
    dense_features=dense_features,
    sparse_features=sparse_features,
    sequence_features=sequence_features,
    target=target,
)

train_loader = dataloader.create_dataloader(
    data=train_df,
    batch_size=512,
    shuffle=True,
)

valid_loader = dataloader.create_dataloader(
    data=train_df,
    batch_size=512,
    shuffle=False,
)

model = AutoInt(
    dense_features=dense_features,
    sparse_features=sparse_features,
    sequence_features=sequence_features,
    att_layer_num=3,
    att_embedding_dim=8,
    att_head_num=2,
    att_dropout=0.0,
    att_use_residual=True,
    target=target,
    device='cpu',
    embedding_l1_reg=1e-6,
    dense_l1_reg=1e-5,
    embedding_l2_reg=1e-5,
    dense_l2_reg=1e-4,
    session_id="autoint_iflytek"
)

# compile the model to set the optimizer and loss function
model.compile(
    optimizer="adam",
    optimizer_params={
        "lr": 1e-3,
        "weight_decay": 1e-5
    },
    loss="bce",
)

# training the model
model.fit(
    train_data=train_loader,
    valid_data=valid_loader,
    metrics=['auc',
             'recall',
             'precision'],
    epochs=1,
    batch_size=512,
    shuffle=True,
)


[1m[94mModel Summary: AUTOINT[0m


[1m[36mFeature Configuration[0m
[36m--------------------------------------------------------------------------------[0m
Dense Features (8):
  1. dense_0             
  2. dense_1             
  3. dense_2             
  4. dense_3             
  5. dense_4             
  6. dense_5             
  7. dense_6             
  8. dense_7             

Sparse Features (14):
  #    Name           Vocab Size        Embed Name  Embed Dim
  ---- ------------ ------------ ----------------- ----------
  1    sparse_0           200002          sparse_0          4
  2    sparse_1           200002          sparse_1          4
  3    sparse_2           200002          sparse_2          4
  4    sparse_3           200002          sparse_3          4
  5    sparse_4           200002          sparse_4          4
  6    sparse_5           200002          sparse_5          4
  7    sparse_6           200002          sparse_6          4
  8    sparse_7           2

Epoch 1: 157/157 elapsed=0:00:07 speed=20.65/s ETA=0:00:00








Saved checkpoint to /Users/zyaztec/DailyWork/建模代码整理/NextRec/tutorials/notebooks/en/nextrec_logs/autoint_iflytek/AUTOINT_checkpoint.pt
Saved checkpoint to /Users/zyaztec/DailyWork/建模代码整理/NextRec/tutorials/notebooks/en/nextrec_logs/autoint_iflytek/AUTOINT_best.pt
[94mSaved best model to:               /Users/zyaztec/DailyWork/建模代码整理/NextRec/tutorials/notebooks/en/nextrec_logs/autoint_iflytek/AUTOINT_best.pt with val_auc: 0.679912[0m
[94mRestoring model weights from epoch: 1 with best val_auc: 0.679912[0m

[1m[94mTraining finished.[0m

Load best model from:              /Users/zyaztec/DailyWork/建模代码整理/NextRec/tutorials/notebooks/en/nextrec_logs/autoint_iflytek/AUTOINT_best.pt


AutoInt(
  (embedding): EmbeddingLayer(
    (embed_dict): ModuleDict(
      (sparse_0): Embedding(200002, 4, padding_idx=0)
      (sparse_1): Embedding(200002, 4, padding_idx=0)
      (sparse_2): Embedding(200002, 4, padding_idx=0)
      (sparse_3): Embedding(200002, 4, padding_idx=0)
      (sparse_4): Embedding(200002, 4, padding_idx=0)
      (sparse_5): Embedding(200002, 4, padding_idx=0)
      (sparse_6): Embedding(200002, 4, padding_idx=0)
      (sparse_7): Embedding(200002, 4, padding_idx=0)
      (sparse_8): Embedding(200002, 4, padding_idx=0)
      (sparse_9): Embedding(200002, 4, padding_idx=0)
      (sparse_10): Embedding(200002, 4, padding_idx=0)
      (sparse_11): Embedding(200002, 4, padding_idx=0)
      (user_id): Embedding(200002, 4, padding_idx=0)
      (item_id): Embedding(200002, 4, padding_idx=0)
      (sequence_0): Embedding(200, 8, padding_idx=0)
    )
    (dense_transforms): ModuleDict()
    (sequence_poolings): ModuleDict(
      (sequence_0): AveragePooling()
    

Prefer not to build the dataloader manually? NextRec also supports passing a DataFrame or dict directly—as long as you have enough memory. (That said, RecDataLoader remains the better choice.)

You can also omit valid_data, in which case the model trains on the full dataset.

Or set valid_split to let the model automatically carve out a validation set from the training data.


In [32]:
model.fit(
    train_data=train_df,
    metrics=['auc',
             'recall',
             'precision'],
    epochs=1,
    batch_size=512,
    shuffle=True,
    # valid_split=0.2
)


[1m[94mModel Summary: AUTOINT[0m


[1m[36mFeature Configuration[0m
[36m--------------------------------------------------------------------------------[0m
Dense Features (8):
  1. dense_0             
  2. dense_1             
  3. dense_2             
  4. dense_3             
  5. dense_4             
  6. dense_5             
  7. dense_6             
  8. dense_7             

Sparse Features (14):
  #    Name           Vocab Size        Embed Name  Embed Dim
  ---- ------------ ------------ ----------------- ----------
  1    sparse_0              100          sparse_0          4
  2    sparse_1              100          sparse_1          4
  3    sparse_2              100          sparse_2          4
  4    sparse_3              100          sparse_3          4
  5    sparse_4              100          sparse_4          4
  6    sparse_5              100          sparse_5          4
  7    sparse_6              100          sparse_6          4
  8    sparse_7            

Epoch 1: 157/157 elapsed=0:00:07 speed=20.68/s ETA=0:00:00





[33mEarly stopping conditioned on metric `val_auc` which is not available. Available metrics are: loss,auc,recall,precision[0m

[1m[94mTraining finished.[0m



AutoInt(
  (embedding): EmbeddingLayer(
    (embed_dict): ModuleDict(
      (sparse_0): Embedding(200002, 4, padding_idx=0)
      (sparse_1): Embedding(200002, 4, padding_idx=0)
      (sparse_2): Embedding(200002, 4, padding_idx=0)
      (sparse_3): Embedding(200002, 4, padding_idx=0)
      (sparse_4): Embedding(200002, 4, padding_idx=0)
      (sparse_5): Embedding(200002, 4, padding_idx=0)
      (sparse_6): Embedding(200002, 4, padding_idx=0)
      (sparse_7): Embedding(200002, 4, padding_idx=0)
      (sparse_8): Embedding(200002, 4, padding_idx=0)
      (sparse_9): Embedding(200002, 4, padding_idx=0)
      (sparse_10): Embedding(200002, 4, padding_idx=0)
      (sparse_11): Embedding(200002, 4, padding_idx=0)
      (user_id): Embedding(200002, 4, padding_idx=0)
      (item_id): Embedding(200002, 4, padding_idx=0)
      (sequence_0): Embedding(200, 8, padding_idx=0)
    )
    (dense_transforms): ModuleDict()
    (sequence_poolings): ModuleDict(
      (sequence_0): AveragePooling()
    

Below are the models currently supported—feel free to try them out.

### Ranking models

| Model | Paper | Year | Status |
|------|------|------|------|
| **FM** | Factorization Machines | ICDM 2010 | Supported |
| **AFM** | Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks | IJCAI 2017 | Supported |
| **DeepFM** | DeepFM: A Factorization-Machine based Neural Network for CTR Prediction | IJCAI 2017 | Supported |
| **Wide&Deep** | Wide & Deep Learning for Recommender Systems | DLRS 2016 | Supported |
| **xDeepFM** | xDeepFM: Combining Explicit and Implicit Feature Interactions | KDD 2018 | Supported |
| **FiBiNET** | FiBiNET: Combining Feature Importance and Bilinear Feature Interaction for CTR Prediction | RecSys 2019 | Supported |
| **PNN** | Product-based Neural Networks for User Response Prediction | ICDM 2016 | Supported |
| **AutoInt** | AutoInt: Automatic Feature Interaction Learning | CIKM 2019 | Supported |
| **DCN** | Deep & Cross Network for Ad Click Predictions | ADKDD 2017 | Supported |
| **DIN** | Deep Interest Network for Click-Through Rate Prediction | KDD 2018 | Supported |
| **DIEN** | Deep Interest Evolution Network for Click-Through Rate Prediction | AAAI 2019 | Supported |
| **MaskNet** | MaskNet: Introducing Feature-wise Gating Blocks for High-dimensional Sparse Recommendation Data | 2020 | Supported |

### Retrieval models

| Model | Paper | Year | Status |
|------|------|------|------|
| **DSSM** | Learning Deep Structured Semantic Models | CIKM 2013 | Supported |
| **DSSM v2** | DSSM with pairwise BPR-style optimization | - | Supported |
| **YouTube DNN** | Deep Neural Networks for YouTube Recommendations | RecSys 2016 | Supported |
| **MIND** | Multi-Interest Network with Dynamic Routing | CIKM 2019 | Supported |
| **SDM** | Sequential Deep Matching Model | - | Supported |

### Multi-task models

| Model | Paper | Year | Status |
|------|------|------|------|
| **MMOE** | Modeling Task Relationships in Multi-task Learning | KDD 2018 | Supported |
| **PLE** | Progressive Layered Extraction | RecSys 2020 | Supported |
| **ESMM** | Entire Space Multi-Task Model | SIGIR 2018 | Supported |
| **ShareBottom** | Multitask Learning | - | Supported |
