# TSLib for v2 - Example notebook for full pipeline

## Basic imports for getting started

This notebook is a basic vignette showing how to use the `tslib` data module with the `TimeXer` model in v2 of PyTorch Forecasting. The v2 API is experimental and unstable, so it may still change.

Feedback and suggestions on this pipeline are welcome on PR [#1836](https://github.com/sktime/pytorch-forecasting/pull/1836).

In [1]:
from typing import Any, Optional, Union

import numpy as np
import pandas as pd
from sklearn.preprocessing import RobustScaler, StandardScaler
import torch
from torch.optim import Optimizer
from torch.utils.data import Dataset

from pytorch_forecasting.data._tslib_data_module import TslibDataModule
from pytorch_forecasting.data.encoders import (
    EncoderNormalizer,
    NaNLabelEncoder,
    TorchNormalizer,
)
from pytorch_forecasting.data.timeseries import TimeSeries
from pytorch_forecasting.models.timexer._timexer_v2 import TimeXer

## Construct a time series dataset

This step builds a `TimeSeries` object, which wraps a raw time series dataset and identifies the role of each feature in it. The cell below first generates a synthetic sample dataset to work with.

In [2]:
num_series = 100
seq_length = 50
data_list = []
for i in range(num_series):
    x = np.arange(seq_length)
    y = np.sin(x / 5.0) + np.random.normal(scale=0.1, size=seq_length)
    category = i % 5
    static_value = np.random.rand()
    for t in range(seq_length - 1):
        data_list.append(
            {
                "series_id": i,
                "time_idx": t,
                "x": y[t],
                "y": y[t + 1],
                "category": category,
                "future_known_feature": np.cos(t / 10),
                "static_feature": static_value,
                "static_feature_cat": i % 3,
            }
        )
data_df = pd.DataFrame(data_list)
data_df.head()

|   | series_id | time_idx | x | y | category | future_known_feature | static_feature | static_feature_cat |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | -0.033191 | 0.22982 | 0 | 1.0 | 0.494593 | 0 |
| 1 | 0 | 1 | 0.22982 | 0.461287 | 0 | 0.995004 | 0.494593 | 0 |
| 2 | 0 | 2 | 0.461287 | 0.538736 | 0 | 0.980067 | 0.494593 | 0 |
| 3 | 0 | 3 | 0.538736 | 0.836834 | 0 | 0.955336 | 0.494593 | 0 |
| 4 | 0 | 4 | 0.836834 | 0.770511 | 0 | 0.921061 | 0.494593 | 0 |
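The generator above pairs each observation with the next one in the same series, so `x` is a one-step lag of `y` and every series contributes `seq_length - 1` rows. The following is a minimal stdlib-only sketch of that construction, using fewer series than the notebook and a fixed seed (both choices are assumptions made for brevity, not part of the original cell):

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible

num_series = 3   # assumption: fewer series than the notebook, for brevity
seq_length = 50

rows = []
for series_id in range(num_series):
    # Noisy sine wave, mirroring the notebook's generator.
    signal = [math.sin(t / 5.0) + random.gauss(0.0, 0.1) for t in range(seq_length)]
    # The last step has no "next" value, so only seq_length - 1 rows are emitted.
    for t in range(seq_length - 1):
        rows.append(
            {"series_id": series_id, "time_idx": t, "x": signal[t], "y": signal[t + 1]}
        )

rows_per_series = len(rows) // num_series
print("rows per series:", rows_per_series)

# Within one series, the target at step t is the input at step t + 1.
first_series = [r for r in rows if r["series_id"] == 0]
lag_holds = all(a["y"] == b["x"] for a, b in zip(first_series, first_series[1:]))
print("x is a one-step lag of y:", lag_holds)
```

With the notebook's 100 series, the same arithmetic gives 100 × 49 = 4900 rows in `data_df`.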


## Feature Categories and Definitions

### **`time_idx`**
- **Definition**: The temporal index column that orders observations chronologically
- **Example**: Sequential time steps (0, 1, 2, ...) or timestamps
- **Usage**: Identifies the temporal ordering of data points within each time series

### **`target`** 
- **Definition**: The variable you want to predict/forecast
- **Example**: Sales volume, stock price, temperature readings
- **Usage**: The dependent variable that the model learns to forecast

### **`group`**
- **Definition**: Categorical variables that identify different time series entities
- **Example**: `series_id`, `store_id`, `product_id`, `customer_id`
- **Usage**: Distinguishes between multiple time series in the dataset

### **`num`**
- **Definition**: Numerical/continuous features used as model inputs
- **Example**: Price, quantity, weather data, economic indicators  
- **Usage**: Continuous variables that provide numerical context for predictions

### **`cat`**
- **Definition**: Categorical features that represent discrete classes or labels
- **Example**: Product category, day of week, seasonal indicators, region
- **Usage**: Discrete variables that provide categorical context for predictions

### **`known`**
- **Definition**: Future values that are known at prediction time (exogenous variables)
- **Example**: Holidays, planned promotions, scheduled events, calendar features
- **Usage**: Information available for both historical and future periods

### **`unknown`**
- **Definition**: Variables only available during training/historical periods
- **Example**: Past weather conditions, historical prices, competitor actions
- **Usage**: Features that help with training but aren't available for future predictions

### **`static`**
- **Definition**: Time-invariant features that remain constant for each time series
- **Example**: Store size, product attributes, geographic location, customer demographics
- **Usage**: Entity-specific characteristics that don't change over time
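Because the roles above are passed as plain lists of column names, a misspelt name is easy to miss. The sketch below shows one way to sanity-check a role assignment before building the dataset. `check_roles` is a hypothetical helper written for this illustration, not part of PyTorch Forecasting, and the two consistency rules it enforces are assumptions drawn from the definitions above.

```python
def check_roles(columns, roles):
    """Return a list of problems found in a column-role assignment (hypothetical helper)."""
    problems = []
    available = set(columns)
    # Every name listed under any role must exist in the data.
    for role, names in roles.items():
        for name in names:
            if name not in available:
                problems.append(f"{role}: unknown column {name!r}")
    # Assumed rule: a feature is either numerical or categorical, never both.
    for name in sorted(set(roles.get("num", [])) & set(roles.get("cat", []))):
        problems.append(f"{name!r} is listed as both num and cat")
    # Assumed rule: a feature is either known in the future or unknown, never both.
    for name in sorted(set(roles.get("known", [])) & set(roles.get("unknown", []))):
        problems.append(f"{name!r} is listed as both known and unknown")
    return problems


columns = [
    "series_id", "time_idx", "x", "y", "category",
    "future_known_feature", "static_feature", "static_feature_cat",
]
roles = {
    "num": ["x", "future_known_feature", "static_feature"],
    "cat": ["category", "static_feature_cat"],
    "known": ["future_known_feature"],
    "unknown": ["x", "category"],
    "static": ["static_feature", "static_feature_cat"],
}
print(check_roles(columns, roles))

# A single misspelt column name is reported instead of silently ignored.
misspelt = dict(roles, num=["x", "future_know_feature", "static_feature"])
print(check_roles(columns, misspelt))
```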

In [20]:
dataset = TimeSeries(
    data=data_df,
    time="time_idx",
    target="y",
    group=["series_id"],
    num=["x", "future_known_feature", "static_feature"],
    cat=["category", "static_feature_cat"],
    known=["future_known_feature"],
    unknown=["x", "category"],
    static=["static_feature", "static_feature_cat"],
)



## Initialise the `TslibDataModule` using the dataset

This step initialises a basic data module built specifically for `tslib` models. It provides all the metadata required to train the `tslib` model of your choice.
Refer to the implementation of `TslibDataModule` for more information.

In [4]:
data_module = TslibDataModule(
    time_series_dataset=dataset,
    context_length=30,
    prediction_length=1,
    add_relative_time_idx=True,
    target_normalizer=TorchNormalizer(),
    categorical_encoders={
        "category": NaNLabelEncoder(add_nan=True),
        "static_feature_cat": NaNLabelEncoder(add_nan=True),
    },
    scalers={
        "x": StandardScaler(),
        "future_known_feature": StandardScaler(),
        "static_feature": StandardScaler(),
    },
    batch_size=32,
)
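To get a feel for what `context_length=30` and `prediction_length=1` mean for this dataset, the sketch below counts how many (context, horizon) windows fit into each 49-step series. It is plain arithmetic under stated assumptions, not the data module's code: `count_windows` is a hypothetical helper, a stride of 1 is assumed, and the 70-series training share is an assumption that may not match the module's actual split logic.

```python
def count_windows(series_length, context_length, prediction_length):
    """Windows of context + horizon that fit in one series, assuming stride 1."""
    return max(0, series_length - context_length - prediction_length + 1)


rows_per_series = 49   # seq_length - 1 rows per series in the sample data
num_series = 100
batch_size = 32

windows_per_series = count_windows(rows_per_series, context_length=30, prediction_length=1)
total_windows = num_series * windows_per_series
print("windows per series:", windows_per_series)
print("total windows:", total_windows)

# Assumption: 70 of the 100 series end up in the training split.
train_series = 70
train_windows = train_series * windows_per_series
train_batches = -(-train_windows // batch_size)  # ceiling division
print("training batches:", train_batches)
```

Under these assumptions the count comes to 42 training batches, which agrees with the number reported in the fit log later in this notebook.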



In [5]:
data_module.metadata

{'feature_names': {'categorical': ['category', 'static_feature_cat'],
  'continuous': ['x', 'future_known_feature', 'static_feature'],
  'static': ['static_feature', 'static_feature_cat'],
  'known': ['future_known_feature'],
  'unknown': ['x', 'category', 'static_feature', 'static_feature_cat'],
  'target': ['y'],
  'all': ['x',
   'category',
   'future_known_feature',
   'static_feature',
   'static_feature_cat'],
  'static_categorical': ['static_feature_cat'],
  'static_continuous': ['static_feature']},
 'feature_indices': {'categorical': [1, 4],
  'continuous': [0, 2, 3],
  'static': [],
  'known': [2],
  'unknown': [0, 1, 3, 4],
  'target': [0]},
 'n_features': {'categorical': 2,
  'continuous': 3,
  'static': 2,
  'known': 1,
  'unknown': 4,
  'target': 1,
  'all': 5,
  'static_categorical': 1,
  'static_continuous': 1},
 'context_length': 30,
 'prediction_length': 1,
 'freq': 'h',
 'features': 'MS'}
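The entries under `feature_indices` appear to be positions into the `feature_names['all']` list. The sketch below copies the relevant parts of the printed metadata into a plain dictionary and resolves each index list back to column names; `names_for` is a hypothetical helper used only for this illustration.

```python
# Subset of the metadata printed above, copied by hand.
metadata = {
    "feature_names": {
        "categorical": ["category", "static_feature_cat"],
        "continuous": ["x", "future_known_feature", "static_feature"],
        "known": ["future_known_feature"],
        "unknown": ["x", "category", "static_feature", "static_feature_cat"],
        "all": [
            "x",
            "category",
            "future_known_feature",
            "static_feature",
            "static_feature_cat",
        ],
    },
    "feature_indices": {
        "categorical": [1, 4],
        "continuous": [0, 2, 3],
        "known": [2],
        "unknown": [0, 1, 3, 4],
    },
}


def names_for(role):
    """Map the index list of a role back to column names via the 'all' list."""
    all_names = metadata["feature_names"]["all"]
    return [all_names[i] for i in metadata["feature_indices"][role]]


for role in ("categorical", "continuous", "known", "unknown"):
    resolved = names_for(role)
    # Each resolved list should agree with the names stored for that role.
    print(role, resolved, resolved == metadata["feature_names"][role])
```

Note that the static columns are also reported under `unknown` in this output, since they were not declared as `known`.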

## Initialise the model

We shall try out two versions of this model: one trained with `nn.MSELoss()` and one with `QuantileLoss()`.

Let us quickly import the required packages for the next steps.

In [22]:
import torch.nn as nn

from pytorch_forecasting.metrics import MAE, SMAPE, QuantileLoss

In [23]:
model1 = TimeXer(
    loss=nn.MSELoss(),
    hidden_size=64,
    nhead=4,
    e_layers=2,
    d_ff=256,
    dropout=0.1,
    patch_length=4,
    logging_metrics=[MAE(), SMAPE()],
    optimizer="adam",
    optimizer_params={"lr": 1e-3},
    lr_scheduler="reduce_lr_on_plateau",
    lr_scheduler_params={
        "mode": "min",
        "factor": 0.5,
        "patience": 5,
    },
    metadata=data_module.metadata,
)



In [8]:
model2 = TimeXer(
    loss=QuantileLoss(quantiles=[0.1, 0.5, 0.9]),  # quantiles of 0.1, 0.5 and 0.9 used.
    hidden_size=64,
    nhead=4,
    e_layers=2,
    d_ff=256,
    dropout=0.1,
    patch_length=4,
    logging_metrics=[MAE(), SMAPE()],
    optimizer="adam",
    optimizer_params={"lr": 1e-3},
    lr_scheduler="reduce_lr_on_plateau",
    lr_scheduler_params={
        "mode": "min",
        "factor": 0.5,
        "patience": 5,
    },
    metadata=data_module.metadata,
)
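For intuition about what the second model optimises, the sketch below implements the standard pinball (quantile) loss for a single prediction in plain Python. This is the textbook definition and only an illustration: `pinball_loss` is a hypothetical helper, and the library's `QuantileLoss` may scale or reduce the values differently.

```python
def pinball_loss(y_true, y_pred, quantile):
    """Textbook pinball loss for one observation and one quantile level."""
    error = y_true - y_pred
    # Positive errors (under-prediction) are weighted by q,
    # negative errors (over-prediction) by 1 - q.
    return max(quantile * error, (quantile - 1.0) * error)


quantiles = [0.1, 0.5, 0.9]
y_true = 1.0

for y_pred in (0.5, 1.5):
    losses = [pinball_loss(y_true, y_pred, q) for q in quantiles]
    print(f"prediction={y_pred}: ", [round(value, 3) for value in losses])
```

A high quantile such as 0.9 penalises under-prediction more heavily than over-prediction, which pushes that output above most observed values; a low quantile such as 0.1 does the opposite, and 0.5 treats both directions equally.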

In [None]:
from lightning.pytorch import Trainer

trainer1 = Trainer(
    max_epochs=5,
    accelerator="auto",
    devices=1,
    enable_progress_bar=True,
    enable_model_summary=True,
)

trainer2 = Trainer(
    max_epochs=4,
    accelerator="auto",
    devices=1,
    enable_progress_bar=True,
    enable_model_summary=True,
)

Using default `ModelCheckpoint`. Consider installing `litmodels` package to enable `LitModelCheckpoint` for automatic upload to the Lightning model registry.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs


## Fit the trainer on the model and feed data using the data module

In [10]:
trainer1.fit(model1, data_module)

You are using a CUDA device ('NVIDIA GeForce RTX 3050 6GB Laptop GPU') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name         | Type                   | Params | Mode 
----------------------------------------------------------------
0 | loss         | MSELoss                | 0      | train
1 | en_embedding | EnEmbedding            | 320    | train
2 | ex_embedding | DataEmbedding_inverted | 2.0 K  | train
3 | encoder      | Encoder                | 133 K  | train
4 | head         | FlattenHead            | 513    | train
----------------------------------------------------------------
136 K     Trainable params
0         Non-trainable params
136 K     Total params
0.546     Tot


c:\Users\prana\Desktop\code\pytorch-forecasting\.venv\Lib\site-packages\lightning\pytorch\trainer\connectors\data_connector.py:425: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=15` in the `DataLoader` to improve performance.
c:\Users\prana\Desktop\code\pytorch-forecasting\.venv\Lib\site-packages\lightning\pytorch\trainer\connectors\data_connector.py:425: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=15` in the `DataLoader` to improve performance.
c:\Users\prana\Desktop\code\pytorch-forecasting\.venv\Lib\site-packages\lightning\pytorch\loops\fit_loop.py:310: The number of training batches (42) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.



`Trainer.fit` stopped: `max_epochs=5` reached.


Now let us train the model using `QuantileLoss`.

In [11]:
trainer2.fit(model2, data_module)

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name         | Type                   | Params | Mode 
----------------------------------------------------------------
0 | loss         | QuantileLoss           | 0      | train
1 | en_embedding | EnEmbedding            | 320    | train
2 | ex_embedding | DataEmbedding_inverted | 2.0 K  | train
3 | encoder      | Encoder                | 133 K  | train
4 | head         | FlattenHead            | 1.5 K  | train
----------------------------------------------------------------
137 K     Trainable params
0         Non-trainable params
137 K     Total params
0.550     Total estimated model params size (MB)
57        Modules in train mode
0         Modules in eval mode
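Several parameter counts in the two model summaries can be cross-checked by hand. The sketch below is a back-of-the-envelope calculation, not the library's code. `linear_params` is a hypothetical helper; the layer shapes `Linear(4, 64, bias=False)` and `Linear(30, 64, bias=True)` come from the model printout in the testing section; the learnable global token and the head input size are assumptions inferred from the printed totals rather than confirmed from the `TimeXer` source.

```python
def linear_params(in_features, out_features, bias=True):
    """Parameter count of a dense layer: weights plus optional bias."""
    return in_features * out_features + (out_features if bias else 0)


hidden_size = 64
patch_length = 4
context_length = 30
n_quantiles = 3

# ex_embedding: Linear(in_features=30, out_features=64, bias=True)
ex_embedding = linear_params(context_length, hidden_size)

# en_embedding: Linear(in_features=4, out_features=64, bias=False) ...
value_embedding = linear_params(patch_length, hidden_size, bias=False)
# ... plus, by assumption, one learnable global token of size hidden_size.
en_embedding = value_embedding + hidden_size

# Assumption: the head flattens all patch tokens plus the global token.
n_patches = context_length // patch_length
head_inputs = hidden_size * (n_patches + 1)
head_point = linear_params(head_inputs, 1)               # single-value output
head_quantile = linear_params(head_inputs, n_quantiles)  # one output per quantile

print("ex_embedding:", ex_embedding)
print("en_embedding:", en_embedding)
print("head (MSELoss):", head_point)
print("head (QuantileLoss):", head_quantile)
```

Under these assumptions the numbers line up with the summaries: 320 for `en_embedding`, roughly 2.0 K for `ex_embedding`, 513 for the `MSELoss` head and roughly 1.5 K for the `QuantileLoss` head. The larger head is the only difference between the two models' totals.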



c:\Users\prana\Desktop\code\pytorch-forecasting\.venv\Lib\site-packages\lightning\pytorch\trainer\connectors\data_connector.py:425: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=15` in the `DataLoader` to improve performance.
c:\Users\prana\Desktop\code\pytorch-forecasting\.venv\Lib\site-packages\lightning\pytorch\trainer\connectors\data_connector.py:425: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=15` in the `DataLoader` to improve performance.
c:\Users\prana\Desktop\code\pytorch-forecasting\.venv\Lib\site-packages\lightning\pytorch\loops\fit_loop.py:310: The number of training batches (42) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.



`Trainer.fit` stopped: `max_epochs=4` reached.


## Test the model

In [12]:
test_metrics = trainer1.test(model1, data_module)

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
c:\Users\prana\Desktop\code\pytorch-forecasting\.venv\Lib\site-packages\lightning\pytorch\trainer\connectors\data_connector.py:425: The 'test_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=15` in the `DataLoader` to improve performance.



────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       Test metric             DataLoader 0
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
        test_MAE            0.46785134077072144
       test_SMAPE           1.0638009309768677
        test_loss          0.014495044946670532
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
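As a reminder of what the two logged metrics measure, the sketch below computes MAE and SMAPE on a tiny hand-made example. It uses common textbook definitions in plain Python; `mae` and `smape` are hypothetical helpers, the example values are made up, and the library's `MAE` and `SMAPE` classes may differ in details such as epsilon handling, scaling or reduction.

```python
def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def smape(y_true, y_pred):
    """Symmetric MAPE as mean of 2 * |t - p| / (|t| + |p|), bounded by 0 and 2."""
    terms = []
    for t, p in zip(y_true, y_pred):
        denominator = abs(t) + abs(p)
        # Treat the 0/0 case as a perfect prediction.
        terms.append(0.0 if denominator == 0 else 2.0 * abs(t - p) / denominator)
    return sum(terms) / len(terms)


# Made-up values for illustration only.
y_true = [1.0, 2.0, 4.0]
y_pred = [1.0, 1.0, 2.0]

print("MAE:  ", mae(y_true, y_pred))
print("SMAPE:", round(smape(y_true, y_pred), 4))
```

MAE is expressed in the units of the target, while SMAPE is relative to the magnitude of target and prediction, so the two can rank models differently when the target is close to zero.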


In [13]:
model1.eval()

TimeXer(
  (loss): MSELoss()
  (en_embedding): EnEmbedding(
    (value_embedding): Linear(in_features=4, out_features=64, bias=False)
    (position_embedding): PositionalEmbedding()
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (ex_embedding): DataEmbedding_inverted(
    (value_embedding): Linear(in_features=30, out_features=64, bias=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (encoder): Encoder(
    (layers): ModuleList(
      (0-1): 2 x EncoderLayer(
        (self_attention): AttentionLayer(
          (inner_attention): FullAttention(
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (query_projection): Linear(in_features=64, out_features=64, bias=True)
          (key_projection): Linear(in_features=64, out_features=64, bias=True)
          (value_projection): Linear(in_features=64, out_features=64, bias=True)
          (out_projection): Linear(in_features=64, out_features=64, bias=True)
        )
        (cross_attention): AttentionLayer(
 

In [14]:
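with torch.no_grad():
    # Pull one batch from the test dataloader; it unpacks into inputs and targets.
    test_batch = next(iter(data_module.test_dataloader()))
    x_test, y_test = test_batch
    # Forward pass; the model returns a dict with the forecast under "prediction".
    y_pred = model1(x_test)

    print("Prediction:", y_pred["prediction"])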

Prediction: tensor([[[-3.8579e-02]],

        [[ 1.3515e-01]],

        [[ 2.7090e-01]],

        [[ 4.3945e-01]],

        [[ 5.7105e-01]],

        [[ 7.0694e-01]],

        [[ 8.1090e-01]],

        [[ 8.7570e-01]],

        [[ 9.0934e-01]],

        [[ 9.0872e-01]],

        [[ 8.6581e-01]],

        [[ 7.9358e-01]],

        [[ 6.9972e-01]],

        [[ 5.8747e-01]],

        [[ 4.4550e-01]],

        [[ 2.9315e-01]],

        [[ 1.5351e-01]],

        [[-5.8678e-04]],

        [[-1.5129e-01]],

        [[ 1.4533e-02]],

        [[ 1.7025e-01]],

        [[ 3.5256e-01]],

        [[ 5.0771e-01]],

        [[ 6.4501e-01]],

        [[ 7.4584e-01]],

        [[ 8.4855e-01]],

        [[ 8.7391e-01]],

        [[ 9.2469e-01]],

        [[ 8.8924e-01]],

        [[ 8.6606e-01]],

        [[ 7.7753e-01]],

        [[ 6.8279e-01]]])


In [15]:
y_pred["prediction"].shape

torch.Size([32, 1, 1])

Let us do the same for `QuantileLoss` predictions.

In [16]:
test_metrics = trainer2.test(model2, data_module)

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
c:\Users\prana\Desktop\code\pytorch-forecasting\.venv\Lib\site-packages\lightning\pytorch\trainer\connectors\data_connector.py:425: The 'test_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=15` in the `DataLoader` to improve performance.



────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       Test metric             DataLoader 0
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
        test_MAE            14.947474479675293
       test_SMAPE            32.57101821899414
        test_loss            5.774611473083496
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────


In [17]:
model2.eval()

TimeXer(
  (loss): QuantileLoss(quantiles=[0.1, 0.5, 0.9])
  (en_embedding): EnEmbedding(
    (value_embedding): Linear(in_features=4, out_features=64, bias=False)
    (position_embedding): PositionalEmbedding()
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (ex_embedding): DataEmbedding_inverted(
    (value_embedding): Linear(in_features=30, out_features=64, bias=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (encoder): Encoder(
    (layers): ModuleList(
      (0-1): 2 x EncoderLayer(
        (self_attention): AttentionLayer(
          (inner_attention): FullAttention(
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (query_projection): Linear(in_features=64, out_features=64, bias=True)
          (key_projection): Linear(in_features=64, out_features=64, bias=True)
          (value_projection): Linear(in_features=64, out_features=64, bias=True)
          (out_projection): Linear(in_features=64, out_features=64, bias=True)
        )
        (cross

In [18]:
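with torch.no_grad():
    # Pull one batch from the test dataloader; it unpacks into inputs and targets.
    test_batch = next(iter(data_module.test_dataloader()))
    x_test, y_test = test_batch
    # Forward pass; with QuantileLoss the last axis holds one value per quantile.
    y_pred = model2(x_test)

    print("Prediction:", y_pred["prediction"])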

Prediction: tensor([[[[-0.1741, -0.0312,  0.2449]]],


        [[[-0.0194,  0.1198,  0.3921]]],


        [[[ 0.1472,  0.2544,  0.5401]]],


        [[[ 0.3183,  0.4101,  0.6707]]],


        [[[ 0.4626,  0.5497,  0.8223]]],


        [[[ 0.5880,  0.6819,  0.9794]]],


        [[[ 0.7212,  0.7909,  1.0700]]],


        [[[ 0.8104,  0.8627,  1.1342]]],


        [[[ 0.8615,  0.9050,  1.1836]]],


        [[[ 0.8919,  0.9103,  1.1939]]],


        [[[ 0.8414,  0.8754,  1.1404]]],


        [[[ 0.7774,  0.8125,  1.0497]]],


        [[[ 0.6535,  0.7326,  0.9382]]],


        [[[ 0.5000,  0.6076,  0.7917]]],


        [[[ 0.3172,  0.4677,  0.6275]]],


        [[[ 0.1383,  0.3008,  0.4571]]],


        [[[-0.0549,  0.1177,  0.2809]]],


        [[[-0.2488, -0.0911,  0.0679]]],


        [[[-0.4082, -0.2451, -0.0699]]],


        [[[-0.2056, -0.0571,  0.2309]]],


        [[[-0.0128,  0.0945,  0.3519]]],


        [[[ 0.1674,  0.2839,  0.5486]]],


        [[[ 0.3257,  0.4233,  0.7065]]],



In [19]:
y_pred["prediction"].shape

torch.Size([32, 1, 1, 3])
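The extra trailing axis of size 3 matches the three quantiles passed to `QuantileLoss`. The sketch below shows one way to read such an output, using the first three samples printed above as nested Python lists. The axis interpretation (batch, prediction step, target, quantile) is an assumption inferred from the shapes and from `batch_size=32`, not something confirmed by the library documentation.

```python
quantiles = [0.1, 0.5, 0.9]

# First three samples from the printed tensor, as nested lists.
# Assumed axis order: (batch, prediction step, target, quantile).
prediction = [
    [[[-0.1741, -0.0312, 0.2449]]],
    [[[-0.0194, 0.1198, 0.3921]]],
    [[[0.1472, 0.2544, 0.5401]]],
]

median_index = quantiles.index(0.5)

# Median forecast for the single prediction step and single target.
medians = [sample[0][0][median_index] for sample in prediction]
# Width of the 10% to 90% interval for each sample.
widths = [round(sample[0][0][-1] - sample[0][0][0], 4) for sample in prediction]
# Quantile outputs are expected to be non-decreasing along the last axis.
ordered = all(sample[0][0] == sorted(sample[0][0]) for sample in prediction)

print("medians:", medians)
print("interval widths:", widths)
print("quantiles ordered:", ordered)
```

With tensors, the same selection would be an index on the last axis. The point-forecast model lacks that axis, which is why its output shape is `[32, 1, 1]`.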