# LambdaRank step-by-step tutorial

Step 0: Create the configuration file in the `configs` folder.

**Note**: The `README.md` file contains a lot of explanations.

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import datetime
import json
import os
import torch

from torch.utils.tensorboard import SummaryWriter
from lambdarank.data_loader import dataloader, load_numpy_data
from lambdarank.utils import device_loader, count_trainable_parameters
from lambdarank.model_loader import load_model_and_optim
from lambdarank.trainer import train_model
from lambdarank.metrics import eval_model

ngpu, device = device_loader()

## Configurations

- `run_name` (str): identifier of the run used as folder name in `runs` folder.
- `data_parameters`:
    - `data_path` (str): path indicating the `data` folder.
        - `train_data`:
            - `features_file (str)`: name of the `X_train` file.
            - `label_file (str)`:  name of the `y_train` file.
            - `query_file (str)`:  name of the `q_train` file.
            - `soft_label_file (str)`:  name of the `ps_train` file.
        - `test_data`: same pattern as `train_data`.
        
- `train_parameters`:
    - `training_epochs` (int): number of training epochs.
    - `query_batch_size` (int): number of queries processed before each update of the model weights.
    - `ps_rate` (float): a number between zero and one indicating the percentage of the iterations that will use the soft label. Set this number to zero for a full conventional LambdaRank.
    - `per_query_sample_size` (int): the number of items randomly sampled for each query in the training loop. This is especially useful if:
        - the average number of items per query is high (50+);
        - or the average number of relevant items per query is close to half the value of the `k` you want to optimize in your `nDCG@k`, i.e. close to half the size of the list the model will ideally rank.
    Decreasing this value speeds up training. When you decrease it, we recommend increasing the number of epochs to improve model generalization.
    
- `model_configs`:
    - `alpha` (float): is a positive hyperparameter used to compute the lambdas:
    $$\lambda_{ij} = \alpha \left( \frac{1}{2}\left( 1 - S_{ij} \right) - \frac{1}{1 + e^{ \alpha \left( s_{i} - s_{j} \right)}}\right).$$
        Default value is `1.0`. You can decrease it if the number of items per query is very high.
    - `train_label_gain` (list(ints)): a list containing the gain for each label, in order of relevance. The label in the `y` vector is used as an index into this list. Ex: `label_gain=[0,1,3]` means that a label `0` in `y` will have `gain=0`, a label `1` will have `gain=1`, and a label `2` will have `gain=3`. Typically $gain_{i} = 2^{l_{i}} -1$, where $l_{i}$ is the label of observation $i$, i.e. $l_{i}=y_{i}$.
        **Note**: gains could be floats, for now only integers are implemented.
    - `train_eval_at` (list(ints)): the values of `k` to compute `nDCG@k` used in training for train and test loaders. This is used for log purposes.
    - `layers` (list(dicts)): the description of the PyTorch neural network you want to use in sequential mode. Ex:
    ```json
    [
        {"type": "Linear", "params": {"in_features": null, "out_features": 16}},
        {"type": "Dropout", "params": {"p": 0.1}},
        {"type": "LeakyReLU", "params": {}},
        {"type": "Linear", "params": {"in_features": 16, "out_features": 1}}
    ]
    ```
- `optimizer_configs` (dict): the description of the PyTorch optimizer used in training. Only `Adam` and `RMSProp` are accepted for now. Ex:
    ```json
    {"type": "Adam",
     "params": {
         "lr": 0.0001,
         "betas": [
             0.9,
             0.999
         ],
         "eps": 1e-08,
         "weight_decay": 1e-06,
         "amsgrad": false
         }
     }
    ```

- `models_path` (str): path indicating the `models` folder.
- `model_name` (str): the name of the model (pickle) to be saved. Ex: `"ranker"`. A suffix will be added with the current date.
- `label_gain` (list(ints)): same as `train_label_gain`, but used only for the final model evaluation.
- `eval_at` (list(ints)): same as `train_eval_at`, but used only for the final model evaluation.
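
As a small illustration of the typical gain formula $gain_{i} = 2^{l_{i}} - 1$ mentioned for `train_label_gain`, the sketch below builds the gain list and uses the labels as indices into it. The helper name `exponential_gains` is hypothetical and not part of the `lambdarank` package.

```python
def exponential_gains(num_labels):
    """Return the gain list [2**l - 1 for l = 0 .. num_labels - 1]."""
    return [2 ** label - 1 for label in range(num_labels)]

# Three relevance levels (0, 1, 2) reproduce the example list given above.
label_gain = exponential_gains(3)
print(label_gain)  # [0, 1, 3]

# The label stored in `y` is used as an index into the gain list.
y = [0, 2, 1, 0, 0]
print([label_gain[label] for label in y])  # [0, 3, 1, 0, 0]
```

The resulting list is what you would paste into `train_label_gain` or `label_gain` in the config file.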

In [3]:
config_file_name = "config_1.json"
with open(os.path.join("./configs", config_file_name), 'r') as f:
    run_configs = json.load(f)

## Dataset:
1. `X` and `y`: the same as always, the feature matrix and the label vector. The labels in `y` must be sequential integers starting at zero, in increasing order of relevance.
2. `q`: is a vector (with the same format as `y`) where each element $q_{i}$ is a string identifier (not necessarily the query text, you can use a hashcode) indicating the query for the observations $X_{i}$ and $y_{i}$. So, the dataset must be created in a way that a label is assigned to each `(query,item)` pair, $(q_{i},X_{i}) \rightarrow y_{i}$.  
    Note that elements in `q` should repeat, since LambdaRank only makes pairwise comparisons of items associated with the same query.
3. `ps`: stands for propensity score (or soft label). It is a vector with the same size as `y` and a similar meaning: it is another kind of label that can be used in training. Unlike the labels in `y`, these labels can be floats. If you do not have this vector available, make sure to set the following in your config file:
    - `train_parameters > ps_rate` to `0.0`
    - `data_parameters > train_data > soft_label_file` to the same file as `y_train`
    - `data_parameters > test_data > soft_label_file` to the same file as `y_test`
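
The conventions above can be checked before training. The following is a minimal plain-Python sketch, assuming the vectors are available as ordinary sequences; `check_ranking_data` is a hypothetical helper (not part of the `lambdarank` package) and the toy values are made up for illustration.

```python
from collections import Counter

def check_ranking_data(y, q, ps=None):
    """Raise ValueError if the vectors break the conventions listed above.

    Returns the queries that own a single item, since such queries
    produce no pairs to compare.
    """
    if len(y) != len(q):
        raise ValueError("y and q must have the same length")
    if ps is not None and len(ps) != len(y):
        raise ValueError("ps must have the same length as y")
    labels = sorted(set(y))
    if labels != list(range(len(labels))):
        raise ValueError(f"labels must be sequential integers from 0, got {labels}")
    return [query for query, count in Counter(q).items() if count < 2]

# Toy data: query "q1" has three items, "q2" and "q3" have one item each.
y = [0, 2, 1, 0, 1]
q = ["q1", "q1", "q1", "q2", "q3"]
singletons = check_ranking_data(y, q, ps=[0.1, 0.9, 0.4, 0.0, 0.7])
print(singletons)  # ['q2', 'q3']
```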

In [4]:
# Load numpy data
train_data, test_data, vali_data    = load_numpy_data(run_configs["data_parameters"])
X_train, y_train, ps_train, q_train = train_data
X_test, y_test, ps_test, q_test     = test_data
X_vali, y_vali, ps_vali, q_vali     = vali_data

# Create torch data loaders
train_loader = dataloader(
    X_train,
    y_train,
    q_train,
    ps_train
)
test_loader = dataloader(
    X_test,
    y_test,
    q_test,
    ps_test
)
vali_loader = dataloader(
    X_vali,
    y_vali,
    q_vali,
    ps_vali
)

In [5]:
X_train.shape,y_train.shape,q_train.shape

((15623, 46), (15623,), (15623,))

In [6]:
import numpy as np
# Number of queries
np.unique(q_train).shape

(693,)

In [7]:
X_train.shape[0]/np.unique(q_train).shape[0]

22.544011544011543

## Model

In [8]:
# Load model and optimizer
model, optimizer = load_model_and_optim(
    input_dim = train_loader.dataset.width,
    model_configs = run_configs["model_configs"],
    optimizer_configs = run_configs["optimizer_configs"],        
    device = device,
    ngpu = ngpu
)

In [9]:
count_trainable_parameters(model)

16897

In [10]:
model

Sequential(
  (0): Linear(in_features=46, out_features=128, bias=True)
  (1): Dropout(p=0.1, inplace=False)
  (2): LeakyReLU(negative_slope=0.01)
  (3): Linear(in_features=128, out_features=64, bias=True)
  (4): Dropout(p=0.1, inplace=False)
  (5): LeakyReLU(negative_slope=0.01)
  (6): Linear(in_features=64, out_features=32, bias=True)
  (7): Dropout(p=0.1, inplace=False)
  (8): LeakyReLU(negative_slope=0.01)
  (9): Linear(in_features=32, out_features=16, bias=True)
  (10): Dropout(p=0.1, inplace=False)
  (11): LeakyReLU(negative_slope=0.01)
  (12): Linear(in_features=16, out_features=1, bias=True)
)

In [9]:
optimizer

Adam (
Parameter Group 0
    amsgrad: False
    betas: [0.9, 0.999]
    capturable: False
    differentiable: False
    eps: 1e-08
    foreach: None
    fused: None
    lr: 0.0001
    maximize: False
    weight_decay: 1e-06
)

## Training

Note: We will run the model trainer step-by-step here. 

Note: This is for a binary label case.
    
Definitions and utilities:
    
- $\alpha \in \mathbb{R}$
- $s_{i} = f(x_{i})$
- $S_{ij} \in \left \{0, \pm 1 \right \}$
- $\bar{P}_{ij} = \frac{1}{2} \left( 1 + S_{ij} \right)$
- $1 -\bar{P}_{ij} = \frac{1}{2} \left( 1 - S_{ij} \right)$
- $P_{ij} = \sigma \left( s_{i},s_{j} \right) = \frac{1}{1+e^{-\alpha \left(s_{i} - s_{j} \right)}}$
- $\log P_{ij} = - \log \left( 1+e^{-\alpha \left(s_{i} - s_{j} \right)} \right)$
- $1- P_{ij} = P_{ji} = \frac{1}{1+e^{\alpha \left(s_{i} - s_{j} \right)}} = \frac{ e^{-\alpha \left(s_{i} - s_{j} \right)} }{1+e^{-\alpha \left(s_{i} - s_{j} \right)}}$
- $\log \left( 1 - P_{ij} \right) = - \alpha \left(s_{i} - s_{j} \right) - \log \left( 1+e^{-\alpha \left(s_{i} - s_{j} \right)} \right)$
    

The binary cross-entropy loss assumes the form:

$$
\begin{align*}
C &= - \bar{P}_{ij} \log{P_{ij}} - \left(1 - \bar{P}_{ij} \right) \log{ \left(1 - P_{ij} \right ) } \\
  &= \bar{P}_{ij} \log \left( 1+e^{-\alpha \left(s_{i} - s_{j} \right)} \right) - \left(1 - \bar{P}_{ij} \right) \left[ - \alpha \left(s_{i} - s_{j} \right) - \log \left( 1+e^{-\alpha \left(s_{i} - s_{j} \right)} \right) \right] \\
  &= \bar{P}_{ij} \log \left( 1+e^{-\alpha \left(s_{i} - s_{j} \right)} \right) - \bar{P}_{ij} \log \left( 1+e^{-\alpha \left(s_{i} - s_{j} \right)} \right) + \log \left( 1+e^{-\alpha \left(s_{i} - s_{j} \right)} \right) + \alpha \left(s_{i} - s_{j} \right) \left( 1 - \bar{P}_{ij} \right ) \\
  &= \log \left( 1+e^{-\alpha \left(s_{i} - s_{j} \right)} \right) + \alpha \left(s_{i} - s_{j} \right) \left( 1 - \bar{P}_{ij} \right ) \\
  &= \log \left( 1+e^{-\alpha \left(s_{i} - s_{j} \right)} \right) + \frac{1}{2} \left( 1 - S_{ij} \right ) \alpha \left(s_{i} - s_{j} \right)
\end{align*}
$$


We can have the following situations:
- If $S_{ij} = 0$, then $C = \log \left( 1+e^{-\alpha \left(s_{i} - s_{j} \right)} \right) + \frac{1}{2} \alpha \left(s_{i} - s_{j} \right)$.

- If $S_{ij} = -1$, then $C = \log \left( 1+e^{-\alpha \left(s_{i} - s_{j} \right)} \right) + \alpha \left(s_{i} - s_{j} \right)$.

- If $S_{ij} = +1$, then $C = \log \left( 1+e^{-\alpha \left(s_{i} - s_{j} \right)} \right)$.
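
To make the simplification concrete, here is a minimal plain-Python sketch that evaluates both the simplified cost and the original cross-entropy form for the three values of $S_{ij}$. The function names and the scores `0.3` and `-0.2` are illustrative assumptions, not part of the `lambdarank` package.

```python
import math

def pairwise_cost(s_i, s_j, S_ij, alpha=1.0):
    """Simplified cost: log(1 + exp(-alpha*d)) + 0.5*(1 - S_ij)*alpha*d."""
    d = s_i - s_j
    return math.log1p(math.exp(-alpha * d)) + 0.5 * (1 - S_ij) * alpha * d

def cross_entropy_cost(s_i, s_j, S_ij, alpha=1.0):
    """Unsimplified form: -P_bar*log(P) - (1 - P_bar)*log(1 - P)."""
    p = 1.0 / (1.0 + math.exp(-alpha * (s_i - s_j)))
    p_bar = 0.5 * (1 + S_ij)
    return -p_bar * math.log(p) - (1 - p_bar) * math.log(1 - p)

# Both forms should agree for every value of S_ij.
for S_ij in (-1, 0, 1):
    simplified = pairwise_cost(0.3, -0.2, S_ij)
    direct = cross_entropy_cost(0.3, -0.2, S_ij)
    print(S_ij, round(simplified, 6), round(direct, 6))
```

With $s_{i} > s_{j}$, the cost is smallest for $S_{ij} = +1$ (the pair is already ordered correctly) and largest for $S_{ij} = -1$.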

In order to use gradient descent, we need the gradient of the cost w.r.t. the model's parameters $\theta_{k}$ to update them:
    
$$
\theta_{k} \leftarrow \theta_{k} - \gamma \nabla_{\theta_{k}}C
$$

We first need to compute the derivative of the cost w.r.t. the model's outputs:

$$
\frac{\partial C}{\partial s_{i}} = \alpha \left( \frac{1}{2}\left( 1 - S_{ij} \right) - \frac{1}{1 + e^{ \alpha \left( s_{i} - s_{j} \right)}}\right) = - \frac{\partial C}{\partial s_{j}}
$$

And since the outputs are functions of the parameters $\theta$, the gradient is

$$
\begin{align*}
\nabla_{\theta_{k}}C = \frac{\partial C}{\partial \theta_{k}} &= \frac{\partial C}{\partial s_{i}}\frac{\partial s_{i}}{\partial \theta_{k}} + \frac{\partial C}{\partial s_{j}}\frac{\partial s_{j}}{\partial \theta_{k}} \\
 &=  \frac{\partial C}{\partial s_{i}} \left( \frac{\partial s_{i}}{\partial \theta_{k}} -\frac{\partial s_{j}}{\partial \theta_{k}} \right)\\
 &= \left[ \alpha \left( \frac{1}{2}\left( 1 - S_{ij} \right) - \frac{1}{1 + e^{ \alpha \left( s_{i} - s_{j} \right)}}\right) \right] \left( \frac{\partial s_{i}}{\partial \theta_{k}} -\frac{\partial s_{j}}{\partial \theta_{k}} \right).
\end{align*}
$$

If it were not for the term in the brackets, this would look like a conventional gradient. This term is called `lambdas`.
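
The closed-form derivative can be sanity-checked numerically. The sketch below, in plain Python with hypothetical function names, compares the lambda expression with a central finite difference of the cost; it is only a check of the formula, not the code used by the trainer.

```python
import math

def cost(s_i, s_j, S_ij, alpha=1.0):
    """Pairwise cost from the derivation above."""
    d = s_i - s_j
    return math.log1p(math.exp(-alpha * d)) + 0.5 * (1 - S_ij) * alpha * d

def lambda_ij(s_i, s_j, S_ij, alpha=1.0):
    """Closed-form dC/ds_i (the term in the brackets)."""
    return alpha * (0.5 * (1 - S_ij) - 1.0 / (1.0 + math.exp(alpha * (s_i - s_j))))

def numeric_dC_ds_i(s_i, s_j, S_ij, alpha=1.0, h=1e-6):
    """Central finite difference of the cost with respect to s_i."""
    return (cost(s_i + h, s_j, S_ij, alpha) - cost(s_i - h, s_j, S_ij, alpha)) / (2 * h)

for S_ij in (-1, 0, 1):
    analytic = lambda_ij(0.3, -0.2, S_ij, alpha=1.5)
    numeric = numeric_dC_ds_i(0.3, -0.2, S_ij, alpha=1.5)
    print(S_ij, round(analytic, 6), round(numeric, 6))
```

For $S_{ij} = +1$ the lambda is always negative, so a gradient-descent step pushes $s_{i}$ up and $s_{j}$ down.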

In [11]:
writer = SummaryWriter(log_dir=f"runs/{run_configs['run_name']}")
print(f"Number of trainable parameters: {count_trainable_parameters(model)}")
writer.add_text('Configs', str(run_configs))

Number of trainable parameters: 16897


In [78]:
from torch.optim.lr_scheduler import ExponentialLR
import numpy as np
from lambdarank.metrics import max_dcg_k, eval_model, compute_ndcg
from lambdarank.trainer import sample_per_query_data, max_ndcg_tensor
from tqdm import tqdm

In [12]:
# Helper function: drop the leading batch dimension of a loader item
def format_data(data):
    X, y, ps = data
    return X[0], y[0], ps[0]

Let us take one example query from the loader and work with it:

In [22]:
alpha = model.alpha
for query_epoch, data in enumerate(train_loader):
    X, y, _ = format_data(data)
    break

In [29]:
X_sample, y_sample, _ = sample_per_query_data(X,y,y,per_query_sample_size=5)
X_sample.shape,y_sample.shape

(torch.Size([5, 46]), torch.Size([5]))

Now, we need to compute the following:

$$\lambda_{ij} = \alpha \left( \frac{1}{2}\left( 1 - S_{ij} \right) - \frac{1}{1 + e^{ \alpha \left( s_{i} - s_{j} \right)}}\right) \left| \Delta NDCG \right|_{ij}$$

$$S_{ij} = ?$$

$$\left| \Delta NDCG \right|_{ij}=?$$

In [30]:
y_sample

tensor([0., 2., 1., 0., 0.])

In [32]:
s = model(X_sample.to(device))
s

tensor([[-0.0672],
        [-0.1669],
        [-0.1705],
        [ 0.1853],
        [-0.1285]], grad_fn=<AddmmBackward0>)

In [81]:
for k in range(1,7):
    print(f"k={k} -> {compute_ndcg(s,y_sample,k):.4f}")

k=1 -> 0.0000
k=2 -> 0.0000
k=3 -> 0.0000
k=4 -> 0.3274
k=5 -> 0.3274
k=6 -> 0.3274


In [35]:
Pji = 1.0 / (1.0 + torch.exp(alpha * (s - s.t())))
Pji

tensor([[0.5000, 0.4751, 0.4742, 0.5628, 0.4847],
        [0.5249, 0.5000, 0.4991, 0.5872, 0.5096],
        [0.5258, 0.5009, 0.5000, 0.5880, 0.5105],
        [0.4372, 0.4128, 0.4120, 0.5000, 0.4222],
        [0.5153, 0.4904, 0.4895, 0.5778, 0.5000]], grad_fn=<MulBackward0>)

Relevance difference and gain difference. $S_{ij}$ is the sign of the relevance difference.
If the label is binary, relevance is equal to gain.

In [36]:
# Relevance difference
rel_diff = y_sample.view(-1,1) - y_sample.view(-1,1).t()
rel_diff

tensor([[ 0., -2., -1.,  0.,  0.],
        [ 2.,  0.,  1.,  2.,  2.],
        [ 1., -1.,  0.,  1.,  1.],
        [ 0., -2., -1.,  0.,  0.],
        [ 0., -2., -1.,  0.,  0.]])

In [45]:
# Gain difference
gain_diff = model.gain[y_sample.long()].reshape(-1,1) - model.gain[y_sample.long()].reshape(1,-1)
gain_diff

tensor([[ 0, -3, -1,  0,  0],
        [ 3,  0,  2,  3,  3],
        [ 1, -2,  0,  1,  1],
        [ 0, -3, -1,  0,  0],
        [ 0, -3, -1,  0,  0]])

In [46]:
Sij = (rel_diff > 0).int() - (rel_diff < 0).int()
Sij

tensor([[ 0, -1, -1,  0,  0],
        [ 1,  0,  1,  1,  1],
        [ 1, -1,  0,  1,  1],
        [ 0, -1, -1,  0,  0],
        [ 0, -1, -1,  0,  0]], dtype=torch.int32)

**Compute $\left| \Delta NDCG \right|_{ij}$**:

Sort list by $s_{i}$

In [47]:
rank_order = (s.reshape(-1).argsort(descending=True).argsort() + 1).reshape(-1, 1)
rank_order

tensor([[2],
        [4],
        [5],
        [1],
        [3]])
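
The double `argsort` used above turns scores into 1-based ranks. As a plain-Python sketch of the same idea (the helper `ranks_from_scores` is hypothetical), the first sort lists the items from best to worst, and the second pass records the position of each item in that list:

```python
def ranks_from_scores(scores):
    """1-based rank of each item when sorted by descending score."""
    # First argsort: item indices listed from best score to worst.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Second argsort (inverse permutation): position of each item in `order`.
    ranks = [0] * len(scores)
    for position, item in enumerate(order, start=1):
        ranks[item] = position
    return ranks

# Scores copied (rounded) from the output of `s` above.
s = [-0.0672, -0.1669, -0.1705, 0.1853, -0.1285]
print(ranks_from_scores(s))  # [2, 4, 5, 1, 3]
```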

Score decay by position based on the prediction order

In [49]:
decay_diff = (1.0 / torch.log2(rank_order + 1.0)) - (1.0 / torch.log2(rank_order.t() + 1.0))
decay_diff

tensor([[ 0.0000,  0.2003,  0.2441, -0.3691,  0.1309],
        [-0.2003,  0.0000,  0.0438, -0.5693, -0.0693],
        [-0.2441, -0.0438,  0.0000, -0.6131, -0.1131],
        [ 0.3691,  0.5693,  0.6131,  0.0000,  0.5000],
        [-0.1309,  0.0693,  0.1131, -0.5000,  0.0000]])

Compute Max DCG normalization factor

In [67]:
k = 5
discount = torch.log2(torch.arange(start=1, end=k+1, step=1) + 1)
discount

tensor([1.0000, 1.5850, 2.0000, 2.3219, 2.5850])

In [68]:
sorted_label_tensor = model.gain[y_sample.long()].sort(descending=True)[0][:k]
sorted_label_tensor

tensor([3, 1, 0, 0, 0])

In [69]:
sorted_label_tensor / discount

tensor([3.0000, 0.6309, 0.0000, 0.0000, 0.0000])

In [70]:
max_ndcg_value = (sorted_label_tensor / discount).sum()
max_ndcg_value, 1.0/max_ndcg_value

(tensor(3.6309), tensor(0.2754))

In [122]:
for k in range(1,7):
    ndcg_norm = 1.0 / max_ndcg_tensor(y_sample,k,model.gain)
    print(f"k:{k} -> {ndcg_norm.item():.4f}")

k:1 -> 0.3333
k:2 -> 0.2754
k:3 -> 0.2754
k:4 -> 0.2754
k:5 -> 0.2754
k:6 -> 0.2754
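
The normalization factor can also be reproduced by hand. The plain-Python sketch below assumes the gain list `[0, 1, 3]`, which is consistent with the `gain_diff` output above; `max_dcg_at_k` is a hypothetical stand-in written for illustration, not the package's `max_ndcg_tensor`.

```python
import math

def max_dcg_at_k(labels, k, gain):
    """DCG of the ideal ordering (gains sorted descending), truncated at k."""
    ideal = sorted((gain[int(label)] for label in labels), reverse=True)[:k]
    return sum(g / math.log2(position + 1) for position, g in enumerate(ideal, start=1))

y = [0, 2, 1, 0, 0]
for k in (1, 2, 5):
    print(k, round(1.0 / max_dcg_at_k(y, k, gain=[0, 1, 3]), 4))
# 1 0.3333
# 2 0.2754
# 5 0.2754
```

The value stops changing after `k=2` because only two items in the sample have a non-zero gain.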


In [74]:
k = 5
ndcg_norm = 1.0 / max_ndcg_tensor(y_sample,k,model.gain)
ndcg_norm

tensor(0.2754)

$\left| \Delta NDCG \right|_{ij}$:

In [75]:
delta_ndcg = torch.abs(ndcg_norm * gain_diff.to(device) * decay_diff)
delta_ndcg

tensor([[0.0000, 0.1655, 0.0672, 0.0000, 0.0000],
        [0.1655, 0.0000, 0.0241, 0.4704, 0.0573],
        [0.0672, 0.0241, 0.0000, 0.1689, 0.0312],
        [0.0000, 0.4704, 0.1689, 0.0000, 0.0000],
        [0.0000, 0.0573, 0.0312, 0.0000, 0.0000]])
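
Why is the product `gain_diff * decay_diff` the change in nDCG? Swapping items $i$ and $j$ changes the DCG by $(g_{i} - g_{j})(d_{i} - d_{j})$, where $g$ is the gain and $d$ the positional decay. The plain-Python sketch below checks this for one pair by actually swapping the ranks; the helper names are hypothetical, and it assumes `k` equals the list length, as in the cells above.

```python
import math

def dcg(gains, ranks):
    """DCG of items with the given gains placed at the given 1-based ranks."""
    return sum(g / math.log2(r + 1) for g, r in zip(gains, ranks))

def delta_ndcg_by_swap(gains, ranks, i, j, max_dcg):
    """|change in nDCG| when items i and j exchange ranks."""
    swapped = list(ranks)
    swapped[i], swapped[j] = swapped[j], swapped[i]
    return abs(dcg(gains, swapped) - dcg(gains, ranks)) / max_dcg

gains = [0, 3, 1, 0, 0]   # gains of y_sample, assuming the gain list [0, 1, 3]
ranks = [2, 4, 5, 1, 3]   # rank_order computed above
max_dcg = 3 / math.log2(2) + 1 / math.log2(3)  # ideal order: gains 3, 1, 0, 0, 0

# Closed form used in the notebook: |gain_diff * decay_diff| / max_dcg.
i, j = 1, 3
closed_form = abs(
    (gains[i] - gains[j])
    * (1 / math.log2(ranks[i] + 1) - 1 / math.log2(ranks[j] + 1))
) / max_dcg
by_swap = delta_ndcg_by_swap(gains, ranks, i, j, max_dcg)
print(round(closed_form, 4), round(by_swap, 4))  # 0.4704 0.4704
```

The printed value matches the entry at row 1, column 3 of `delta_ndcg`.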

Finally compute $\lambda_{ij}= \alpha \left( \frac{1}{2}\left( 1 - S_{ij} \right) - \frac{1}{1 + e^{ \alpha \left( s_{i} - s_{j} \right)}}\right) \left| \Delta NDCG \right|_{ij}$:

In [91]:
lambda_matrix = (alpha * delta_ndcg) * (0.5 * (1 - Sij.to(device)) - Pji).detach()
lambda_matrix

tensor([[ 0.0000,  0.0869,  0.0353, -0.0000,  0.0000],
        [-0.0869,  0.0000, -0.0120, -0.2762, -0.0292],
        [-0.0353,  0.0120,  0.0000, -0.0993, -0.0159],
        [ 0.0000,  0.2762,  0.0993,  0.0000,  0.0000],
        [-0.0000,  0.0292,  0.0159, -0.0000,  0.0000]])

In [76]:
lambda_update = (
    (alpha * delta_ndcg) * (0.5 * (1 - Sij.to(device)) - Pji)
).sum(dim=1,keepdim=True)
lambda_update

tensor([[ 0.1222],
        [-0.4043],
        [-0.1385],
        [ 0.3755],
        [ 0.0451]], grad_fn=<SumBackward1>)
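
The whole computation of the per-item lambdas can be collected into one function. The following plain-Python sketch follows the cells above step by step; it assumes `alpha = 1.0` (the documented default), the gain list `[0, 1, 3]`, and `k` equal to the list length. `lambda_per_item` is a hypothetical helper, not the package's `compute_lambda`.

```python
import math

def lambda_per_item(scores, labels, gain, alpha=1.0):
    """Sum over j of lambda_ij for every item i, following the cells above."""
    n = len(scores)
    gains = [gain[int(label)] for label in labels]
    # 1-based rank of each item by descending score.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    ranks = [0] * n
    for position, item in enumerate(order, start=1):
        ranks[item] = position
    decay = [1.0 / math.log2(r + 1) for r in ranks]
    # Max DCG over the full list (k = n).
    max_dcg = sum(g / math.log2(p + 1)
                  for p, g in enumerate(sorted(gains, reverse=True), start=1))
    lambdas = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            S_ij = (labels[i] > labels[j]) - (labels[i] < labels[j])
            P_ji = 1.0 / (1.0 + math.exp(alpha * (scores[i] - scores[j])))
            delta = abs((gains[i] - gains[j]) * (decay[i] - decay[j])) / max_dcg
            total += alpha * delta * (0.5 * (1 - S_ij) - P_ji)
        lambdas.append(total)
    return lambdas

# Scores and labels copied (rounded) from the outputs above.
s = [-0.0672, -0.1669, -0.1705, 0.1853, -0.1285]
y = [0, 2, 1, 0, 0]
print([round(v, 3) for v in lambda_per_item(s, y, gain=[0, 1, 3])])
```

Up to the rounding of the copied scores, the result should agree with `lambda_update`. Because $\lambda_{ij} = -\lambda_{ji}$, the per-item lambdas sum to zero.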

Now we can use the lambda to update the model's weights:

In [77]:
s.backward(lambda_update)
optimizer.step()
model.zero_grad()
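
Calling `s.backward(lambda_update)` uses each $\lambda_{i}$ as the upstream gradient of $s_{i}$, so the parameter gradient becomes $\sum_{i} \lambda_{i} \, \partial s_{i} / \partial \theta_{k}$. The plain-Python sketch below illustrates the sign convention under simplifying assumptions: a hypothetical linear scorer $s_{i} = w \cdot x_{i}$, made-up features and lambdas, and a plain gradient-descent step instead of the Adam optimizer used in this notebook.

```python
def sgd_step_linear(w, X, lambdas, lr):
    """One plain gradient-descent step for a linear scorer s_i = w . x_i.

    The parameter gradient is sum_i lambda_i * ds_i/dw = sum_i lambda_i * x_i.
    """
    grad = [sum(lam * x[k] for lam, x in zip(lambdas, X)) for k in range(len(w))]
    return [w_k - lr * g_k for w_k, g_k in zip(w, grad)]

def scores(w, X):
    """Scores s_i = w . x_i for every item."""
    return [sum(w_k * x_k for w_k, x_k in zip(w, x)) for x in X]

# Hypothetical two-item query: item 1 is the relevant one, so its lambda is negative.
X = [[1.0, 0.0], [0.0, 1.0]]
lambdas = [0.4, -0.4]
w = [0.0, 0.0]
w_new = sgd_step_linear(w, X, lambdas, lr=0.1)
print(scores(w, X), scores(w_new, X))
```

An item with a negative lambda sees its score increase after the step, and an item with a positive lambda sees its score decrease.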

In [82]:
with torch.no_grad():
    s_new = model(X_sample.to(device))

In [87]:
y_sample

tensor([0., 2., 1., 0., 0.])

In [96]:
# Old value
s.detach()

tensor([[-0.0672],
        [-0.1669],
        [-0.1705],
        [ 0.1853],
        [-0.1285]])

In [85]:
s_new

tensor([[ 0.1748],
        [-0.2454],
        [ 0.1484],
        [-0.0591],
        [-0.2722]])

In [97]:
s_new - s.detach()

tensor([[ 0.2420],
        [-0.0784],
        [ 0.3189],
        [-0.2443],
        [-0.1437]])

In [95]:
rank_order = (s_new.reshape(-1).argsort(descending=True).argsort() + 1).reshape(-1, 1)
rank_order

tensor([[1],
        [4],
        [2],
        [3],
        [5]])

In [83]:
for k in range(1,7):
    print(f"k={k} -> {compute_ndcg(s_new,y_sample,k):.4f}")

k=1 -> 0.0000
k=2 -> 0.2398
k=3 -> 0.2398
k=4 -> 0.5672
k=5 -> 0.5672
k=6 -> 0.5672


## Training algorithm

Putting it all together:

```python
def train_model(model,optimizer,train_loader,test_loader,epochs,batch_size,ps_rate,per_query_sample_size,device,writer):
    """
    Trains a ranking model using a Learn to Rank approach, evaluating its performance over a specified number of epochs.

    Parameters
    ----------
    model : torch.nn.Module
        The neural network model to be trained.
    optimizer : torch.optim.Optimizer
        The optimization algorithm used to update model parameters.
    train_loader : torch.utils.data.DataLoader
        DataLoader for the training dataset, which yields batches of data.
    test_loader : torch.utils.data.DataLoader
        DataLoader for the testing dataset, used for evaluation.
    epochs : int
        Number of full training cycles on the entire dataset.
    batch_size : int
        Number of queries per batch for updating model parameters.
    ps_rate : float
        Probability rate at which the sampled data uses propensity scores rather than actual relevance scores.
    per_query_sample_size : int
        The number of instances to sample per query for training.
    device : torch.device
        The device (CPU or GPU) on which the computations will be performed.
    writer : torch.utils.tensorboard.SummaryWriter
        A writer for logging metrics and training progress to TensorBoard.

    Returns
    -------
    model : torch.nn.Module
        The trained model.

    Notes
    -----
    This function trains a ranking model via grouping query results, applying preference pairs, and optimizing with gradient updates.
    Propensity-based sampling and relevance-based sampling are used interchangeably based on the ps_rate.
    Performance metrics for each epoch are logged using TensorBoard.

    The function is part of a machine learning pipeline for ranking tasks in an e-commerce or similar environment where models
    are trained to automatically rank items based on predicted scores generated from user interaction data.
    """
    alpha = model.alpha
    scheduler = ExponentialLR(optimizer, gamma=0.9)
    for epoch in tqdm(range(1,epochs+1)):
        grad_batch, y_pred_batch = [], []
        query_count = 0
        for query_epoch, train_data in enumerate(train_loader):
            X_train_, y_train_, ps_train_ = train_data
            X_train_, y_train_, ps_train_ = X_train_[0], y_train_[0], ps_train_[0]
            if y_train_.sum() == 0.0:
                continue
            
            X_train,y_train,ps_train = sample_per_query_data(X_train_,y_train_,ps_train_,per_query_sample_size)

            y_pred = model(X_train.to(device))
            y_pred_batch.append(y_pred)
            
            with torch.no_grad():
                if np.random.rand() < ps_rate:
                    rel_diff = ps_train.view(-1,1) - ps_train.view(-1,1).t()
                    gain_diff = rel_diff
                else:
                    rel_diff = y_train.view(-1,1) - y_train.view(-1,1).t()
                    gain_diff = model.gain[y_train.long()].reshape(-1,1) - model.gain[y_train.long()].reshape(1,-1)
                    
                Sij = (rel_diff > 0).int() - (rel_diff < 0).int()
                Pji = 1.0 / (1.0 + torch.exp(alpha * (y_pred - y_pred.t())))

                delta_ndcg = compute_delta_ndcg(y_pred,y_train,gain_diff,len(y_train),model.gain,device)
                lambda_update = compute_lambda(alpha,delta_ndcg,Sij,Pji,device)
                grad_batch.append(lambda_update)
                
                writer.add_scalar("Lambda/train", lambda_update.detach().mean(), epoch * (query_epoch+1))
                writer.add_scalar("Pji/train", Pji.detach().mean(), epoch * (query_epoch+1))
                writer.add_scalar("Abs_delta_nDCG/train", delta_ndcg.detach().mean(), epoch * (query_epoch+1))
            
            query_count += 1
            if query_count % batch_size == 0:
                for grad,y_pred in zip(grad_batch,y_pred_batch):
                    y_pred.backward(grad)
                optimizer.step()
                model.zero_grad()
                grad_batch, y_pred_batch = [], []
        
        for k in model.eval_at:
            writer.add_scalar(f"nDCG@{k}/train", eval_model(model,train_loader,k,device), epoch)
            writer.add_scalar(f"nDCG@{k}/test", eval_model(model,test_loader,k,device), epoch)
    
        scheduler.step()
        writer.add_scalar("LR", scheduler.get_last_lr()[0], epoch)
    return model
```

## Running experiments

In [9]:
!python app.py --config config_1.json

Device: cpu
Number of trainable parameters: 16897
100%|█████████████████████████████████████████████| 1/1 [00:13<00:00, 13.59s/it]
ndcg@1   -> train: 0.3990 | test: 0.3810 | vali: 0.4222
ndcg@3   -> train: 0.4260 | test: 0.4433 | vali: 0.4730
ndcg@10  -> train: 0.5587 | test: 0.5732 | vali: 0.6085
ndcg@50  -> train: 0.6365 | test: 0.6503 | vali: 0.6743


In [10]:
!python app.py --config config_2.json

Device: cpu
Number of trainable parameters: 9793
100%|███████████████████████████████████████████| 10/10 [02:05<00:00, 12.55s/it]
ndcg@1   -> train: 0.5325 | test: 0.5357 | vali: 0.5133
ndcg@3   -> train: 0.5718 | test: 0.5666 | vali: 0.5766
ndcg@10  -> train: 0.6900 | test: 0.6829 | vali: 0.7035
ndcg@50  -> train: 0.7408 | test: 0.7187 | vali: 0.7489


In [11]:
!python app.py --config config_3.json

Device: cpu
Number of trainable parameters: 417
100%|█████████████████████████████████████████████| 1/1 [00:11<00:00, 11.33s/it]
ndcg@1   -> train: 0.5029 | test: 0.5262 | vali: 0.4800
ndcg@3   -> train: 0.5483 | test: 0.5677 | vali: 0.5509
ndcg@10  -> train: 0.6699 | test: 0.6787 | vali: 0.6708
ndcg@50  -> train: 0.7269 | test: 0.7308 | vali: 0.7173


## Load and check model

In [14]:
def max_ndcg_tensor(y,k,gain=None):
    k = min(len(y),k)
    discount = torch.log2(torch.arange(start=1, end=k+1, step=1) + 1)
    
    if gain is None:
        sorted_label_tensor = y.sort(descending=True)[0][:k]
    else:
        sorted_label_tensor = gain[y.long()].sort(descending=True)[0][:k]
    
    return (sorted_label_tensor / discount).sum()

def compute_block_ndcg(y_pred,y,k,gain,device):
    rank_order = (y_pred.reshape(-1).argsort(descending=True).argsort() + 1).reshape(-1, 1)
    rank_order_idx = torch.argwhere(rank_order<=k)[:,0]
    rank_order_ = rank_order[rank_order_idx]
    decay = 1.0 / torch.log2(rank_order_ + 1.0)
    max_ndcg = max_ndcg_tensor(y,k,gain)
    if max_ndcg == torch.tensor(0.0):
        ndcg_norm = torch.tensor(0.0)
    else:
        ndcg_norm = 1.0 / max_ndcg
    if gain is None:
        block_gain = y[rank_order_idx].view(-1,1)
    else:
        block_gain = gain[y[rank_order_idx].long()].reshape(-1,1)
    ndcg = ndcg_norm * (block_gain * decay).sum()
    return ndcg

In [15]:
config_file_name = "config_3.json"
with open(os.path.join("./configs", config_file_name), 'r') as f:
    run_configs = json.load(f)

# Load model and optimizer
model, optimizer = load_model_and_optim(
    input_dim = train_loader.dataset.width,
    model_configs = run_configs["model_configs"],
    optimizer_configs = run_configs["optimizer_configs"],        
    device = device,
    ngpu = ngpu
)

In [16]:
model.load_state_dict(torch.load("./models/ranker_03",weights_only=True))

<All keys matched successfully>

In [17]:
vali_ndcg, random_ndcg = [], []
k = 10
with torch.no_grad():
    for data in vali_loader:
        X, y, _ = format_data(data)
        preds = model(X.to(device)).detach().cpu()
        random_preds = torch.randn((len(y),1))
        vali_ndcg.append(compute_block_ndcg(preds,y,k,model.gain,device).item())
        random_ndcg.append(compute_block_ndcg(random_preds,y,k,model.gain,device).item())

In [21]:
print(f"ndcg@{k:<4}-> validation: {np.mean(vali_ndcg):.4f} | random sort: {np.mean(random_ndcg):.4f}")

ndcg@10  -> validation: 0.7006 | random sort: 0.5315
