# `pytorch-forecasting v2` Model Training and Inference - Beta API

<div class="alert alert-block alert-info">
:warning: The vignette showcased here is part of an experimental rework of the `pytorch-forecasting` data layer, planned for release in v2.0.0. The API is currently unstable and subject to change without prior notice.

Feedback and suggestions are highly encouraged — please share them in <a href="https://github.com/sktime/pytorch-forecasting/issues/1736">issue 1736</a>.
</div>


In this vignette, we demonstrate how to train and evaluate the **Temporal Fusion Transformer (TFT)** using the new `TimeSeries` and `DataModule` API from the v2 pipeline.


## Steps

1. **Load Data**  
2. **Create Dataset & DataModule**  
3. **Initialize, Train & Run Inference with the Model**



### Load Data

We generate a synthetic dataset using `load_toydata` which returns a `pandas` DataFrame with purely numerical values.  
*(Note: The current pipeline assumes all inputs are numerical only.)*




### Create Dataset & DataModule

- `TimeSeries` returns the raw data as tensors.
- `DataModule` wraps the dataset, handles splits, preprocessing, and batching, and exposes `metadata` for model initialisation.



### Initialize the Model

We initialize the TFT model using the `metadata` provided by the `DataModule`. This metadata includes the sequence lengths and feature counts the model needs to size its encoder, decoder, and static-input layers.



### Train the Model

We use a `Trainer` from PyTorch Lightning to train the model.

### Run Inference

After training, we can make predictions using the trained model.


# 1. Load Data
We generate a synthetic dataset using `load_toydata`, which creates a `pandas` DataFrame containing only numerical values, because for now **the pipeline assumes all data to be numerical**.
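Since the pipeline currently expects numerical inputs only, string-valued columns have to be converted before a dataset is built. In the toy data, `category` and `static_feature_cat` are already integer-coded, as the preview below shows. For your own data, a minimal pandas sketch might look like this; the `color` column and the frame itself are hypothetical and not part of `load_toydata`'s output:

```python
import pandas as pd

# Hypothetical raw frame with one string-valued categorical column
raw = pd.DataFrame(
    {
        "series_id": [0, 0, 1, 1],
        "time_idx": [0, 1, 0, 1],
        "y": [0.17, 0.47, 0.30, 0.25],
        "color": ["red", "blue", "red", "green"],
    }
)

# Collect the columns that are not numeric yet
non_numeric = [c for c in raw.columns if not pd.api.types.is_numeric_dtype(raw[c])]
print(non_numeric)  # ['color']

# Replace each string category with an integer code (categories sorted: blue, green, red)
encoded = raw.copy()
for col in non_numeric:
    encoded[col] = encoded[col].astype("category").cat.codes

print(encoded["color"].tolist())  # [2, 0, 2, 1]
```

Integer codes are only one option; whatever encoding you choose, the columns you list under `cat` must end up numeric.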

```python
from pytorch_forecasting.data.examples import load_toydata
```


```python
num_series = 100  # Number of individual time series to generate
seq_length = 50  # Length of each time series
data_df = load_toydata(num_series, seq_length)
data_df.head()
```

|   | series_id | time_idx | x | y | category | future_known_feature | static_feature | static_feature_cat |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0.167712 | 0.172154 | 0 | 1.0 | 0.300509 | 0 |
| 1 | 0 | 1 | 0.172154 | 0.467233 | 0 | 0.995004 | 0.300509 | 0 |
| 2 | 0 | 2 | 0.467233 | 0.554952 | 0 | 0.980067 | 0.300509 | 0 |
| 3 | 0 | 3 | 0.554952 | 0.746529 | 0 | 0.955336 | 0.300509 | 0 |
| 4 | 0 | 4 | 0.746529 | 0.711745 | 0 | 0.921061 | 0.300509 | 0 |


# 2. Create the dataset and datamodule
We create a `TimeSeries` dataset that returns the raw data as tensors, then pass this raw data to the `data_module`, which internally handles the dataloaders and preprocessing.

Key arguments of the `TimeSeries` dataset:
- `data`: DataFrame containing the sequence data.
- `time`: Integer-typed column denoting the time index within `data`.
- `target`: Column(s) in `data` denoting the forecasting target.
- `group`: List of column names identifying a time series instance within `data`.
- `num`: List of numerical features.
- `cat`: List of categorical features.
- `known`: Features whose values are known in the future.
- `unknown`: Features whose values are not known in the future.
- `static`: List of variables that do not change over time.
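To make the roles concrete, here is a rough sketch of how one frame could be split into per-series tensors by role, plus a check that a `static` column really is constant within each series. It uses plain pandas and torch on a hypothetical miniature frame and is not how `TimeSeries` is implemented internally:

```python
import pandas as pd
import torch

# Hypothetical miniature frame mirroring the toy data's column layout
df = pd.DataFrame(
    {
        "series_id": [0, 0, 0, 1, 1, 1],
        "time_idx": [0, 1, 2, 0, 1, 2],
        "x": [0.1, 0.2, 0.3, 1.0, 1.1, 1.2],
        "y": [0.2, 0.3, 0.4, 1.1, 1.2, 1.3],
        "future_known_feature": [1.0, 0.99, 0.98, 1.0, 0.99, 0.98],
        "static_feature": [0.5, 0.5, 0.5, 0.7, 0.7, 0.7],
    }
)
known, unknown, static = ["future_known_feature"], ["x"], ["static_feature"]

# A static column must take exactly one value within each series
assert (df.groupby("series_id")[static].nunique() == 1).all().all()

# One dict of tensors per series, ordered by time and grouped by role
series = {}
for sid, g in df.sort_values("time_idx").groupby("series_id"):
    series[sid] = {
        "y": torch.tensor(g["y"].to_numpy(), dtype=torch.float32),
        "known": torch.tensor(g[known].to_numpy(), dtype=torch.float32),
        "unknown": torch.tensor(g[unknown].to_numpy(), dtype=torch.float32),
        # static values are stored once per series, not once per time step
        "static": torch.tensor(g[static].iloc[0].to_numpy(), dtype=torch.float32),
    }

print(series[0]["known"].shape)  # torch.Size([3, 1])
```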

```python
from pytorch_forecasting.data.timeseries import TimeSeries
```

```python
# create `TimeSeries` dataset that returns the raw data in terms of tensors
dataset = TimeSeries(
    data=data_df,
    time="time_idx",
    target="y",
    group=["series_id"],
    num=["x", "future_known_feature", "static_feature"],
    cat=["category", "static_feature_cat"],
    known=["future_known_feature"],
    unknown=["x", "category"],
    static=["static_feature", "static_feature_cat"],
)
```


`EncoderDecoderTimeSeriesDataModule` key arguments:
- `time_series_dataset`: `TimeSeries` dataset instance.
- `max_encoder_length`: Maximum length of the encoder input sequence.
- `max_prediction_length`: Maximum length of the decoder output sequence.
- `batch_size`: Batch size for the DataLoader.
- `categorical_encoders`: Dictionary mapping categorical column names to encoders.
- `scalers`: Dictionary mapping feature column names to scalers.
- `target_normalizer`: Normalizer for the target variable.
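As a rough illustration of how `max_encoder_length` and `max_prediction_length` relate, the sketch below cuts one encoder/decoder window out of a single series. `split_window` is a hypothetical helper, not part of `pytorch-forecasting`; the data module's own windowing (for example, how many start positions it uses per series) may differ:

```python
import torch


def split_window(series: torch.Tensor, start: int, max_encoder_length: int, max_prediction_length: int):
    """Hypothetical helper: return (encoder_input, target) for one window of a 1-D series."""
    enc_end = start + max_encoder_length
    dec_end = enc_end + max_prediction_length
    if dec_end > len(series):
        raise ValueError("window runs past the end of the series")
    # The encoder sees the past; the decoder target is the span right after it
    return series[start:enc_end], series[enc_end:dec_end]


series = torch.arange(50, dtype=torch.float32)  # stands in for one 50-step series
encoder, target = split_window(series, start=0, max_encoder_length=30, max_prediction_length=1)
print(encoder.shape, target)  # torch.Size([30]) tensor([30.])
```

With the values used below (`max_encoder_length=30`, `max_prediction_length=1`), each window spans 31 consecutive time steps of a 50-step series.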

```python
from sklearn.preprocessing import StandardScaler
from pytorch_forecasting.data.data_module import EncoderDecoderTimeSeriesDataModule
from pytorch_forecasting.data.encoders import (
    NaNLabelEncoder,
    TorchNormalizer,
)
```

```python
# create the `data_module` that handles the dataloaders and preprocessing
data_module = EncoderDecoderTimeSeriesDataModule(
    time_series_dataset=dataset,
    max_encoder_length=30,
    max_prediction_length=1,
    batch_size=32,
    categorical_encoders={
        "category": NaNLabelEncoder(add_nan=True),
        "static_feature_cat": NaNLabelEncoder(add_nan=True),
    },
    scalers={
        "x": StandardScaler(),
        "future_known_feature": StandardScaler(),
        "static_feature": StandardScaler(),
    },
    target_normalizer=TorchNormalizer(),
)
```
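The `scalers` and `target_normalizer` arguments both follow the usual fit-then-transform pattern: statistics are learned from data, applied to the inputs, and (for the target) undone on the predictions. A minimal sketch of that idea, using scikit-learn's `StandardScaler` directly and a hand-written standardization in place of `TorchNormalizer` (whose exact statistics are not shown in this notebook):

```python
import numpy as np
import torch
from sklearn.preprocessing import StandardScaler

# Feature scaling: StandardScaler learns mean and std from training values
x_train = np.array([[0.0], [1.0], [2.0], [3.0]])
scaler = StandardScaler().fit(x_train)
print(scaler.transform([[1.5]]))  # [[0.]]  (1.5 is the training mean)

# Target normalization idea: standardize y for training ...
y = torch.tensor([2.0, 4.0, 6.0])
center, scale = y.mean(), y.std()
y_norm = (y - center) / scale

# ... and map model outputs back to the original scale afterwards
y_back = y_norm * scale + center
print(torch.allclose(y_back, y))  # True
```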


# 3. Initialise and train the model

To initialise the model, you no longer have to pass arguments such as `encoder_cont` or `decoder_cont`: they are computed internally from the `metadata` property [[source]](https://github.com/sktime/pytorch-forecasting/blob/4a34931e499c2b59de3939fcffcaabd75204b045/pytorch_forecasting/data/data_module.py#L264-L292) of `EncoderDecoderTimeSeriesDataModule`. You still have to pass the other parameters, such as `loss` and `optimizer`.


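```python
from pytorch_forecasting.metrics import MAE, SMAPE
from pytorch_forecasting.models.temporal_fusion_transformer._tft_v2 import TFT

model = TFT(
    loss=MAE(),
    logging_metrics=[MAE(), SMAPE()],
    metadata=data_module.metadata,  # <-- crucial for model setup
    # ... plus the remaining hyperparameters (optimizer, hidden_size, etc.)
)
```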

The `metadata` includes:
- `max_encoder_length`, `max_prediction_length`
- number of continuous/categorical variables in encoder/decoder
- number of static features

The model uses these counts (for example `encoder_cont` and `decoder_cat`) to size its internal layers, such as the encoder and decoder variable-selection networks.
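To see how a handful of integers in `metadata` can determine layer sizes, the sketch below rebuilds three layers from feature counts. The key names, the split of counts between roles, and the Linear, ReLU, Linear, Sigmoid structure of the selection block are assumptions for illustration, not the TFT source. With `hidden_size=64` they happen to reproduce the parameter counts of `encoder_var_selection` (709), `decoder_var_selection` (193) and `static_context_linear` (192) in the model summary printed during training:

```python
import torch.nn as nn

# Assumed metadata for this notebook's features (key names are illustrative)
metadata = {
    "encoder_cont": 3,  # x, future_known_feature, static_feature
    "encoder_cat": 2,  # category, static_feature_cat
    "decoder_cont": 1,  # future_known_feature
    "decoder_cat": 0,
    "static_continuous_features": 1,
    "static_categorical_features": 1,
}
hidden_size = 64


def selection_block(n_inputs: int) -> nn.Sequential:
    # Hypothetical gating block: one weight in (0, 1) per input variable
    return nn.Sequential(
        nn.Linear(n_inputs, hidden_size),
        nn.ReLU(),
        nn.Linear(hidden_size, n_inputs),
        nn.Sigmoid(),
    )


def n_params(module: nn.Module) -> int:
    return sum(p.numel() for p in module.parameters())


encoder_in = metadata["encoder_cont"] + metadata["encoder_cat"]
decoder_in = metadata["decoder_cont"] + metadata["decoder_cat"]
static_in = metadata["static_continuous_features"] + metadata["static_categorical_features"]

print(n_params(selection_block(encoder_in)))  # 709
print(n_params(selection_block(decoder_in)))  # 193
print(n_params(nn.Linear(static_in, hidden_size)))  # 192
```

The point is the dependency, not the exact architecture: change the feature lists passed to `TimeSeries` and every one of these input dimensions changes with them, which is why the model takes them from `metadata` instead of from hand-written arguments.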


```python
import torch
from pytorch_forecasting.metrics import MAE, SMAPE
from pytorch_forecasting.models.temporal_fusion_transformer._tft_v2 import TFT
```

```python
# Initialise the Model
model = TFT(
    loss=MAE(),
    logging_metrics=[MAE(), SMAPE()],
    optimizer="adam",
    optimizer_params={"lr": 1e-3},
    lr_scheduler="reduce_lr_on_plateau",
    lr_scheduler_params={"mode": "min", "factor": 0.1, "patience": 10},
    hidden_size=64,
    num_layers=2,
    attention_head_size=4,
    dropout=0.1,
    metadata=data_module.metadata,  # pass the metadata from the datamodule to the model
    # to initialise important params like `encoder_cont` etc
)
```
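The `lr_scheduler="reduce_lr_on_plateau"` and `lr_scheduler_params` arguments presumably map onto PyTorch's `ReduceLROnPlateau` (an assumption about the v2 model's internals). The standalone sketch below uses that scheduler directly, with the same `mode`, `factor` and `patience`, to show when the learning rate actually drops: only once the monitored loss has failed to improve for more than `patience` consecutive steps.

```python
import torch

# A dummy parameter, just so an optimizer can be built
param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.Adam([param], lr=1e-3)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.1, patience=10
)

lrs = []
for epoch in range(12):
    # Constant validation loss: only the first call counts as an improvement
    scheduler.step(1.0)
    lrs.append(optimizer.param_groups[0]["lr"])

print(lrs[10], lrs[11])  # learning rate before and after the first reduction
```

Assuming the scheduler is stepped once per epoch on the validation loss, the `max_epochs=5` run below ends well before `patience=10` could trigger a reduction.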


We use a `Trainer` from PyTorch Lightning to train the model:

```python
from lightning.pytorch import Trainer

trainer = Trainer(max_epochs=5)  # plus any other Trainer options
trainer.fit(model, data_module)
```

The `Trainer`:
- Calls the `data_module` setup and dataloader hooks to obtain training and validation batches
- Handles device placement
- Logs training progress and metrics
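For intuition, the loop below is a heavily simplified, hand-written version of what `trainer.fit` automates: placing the model and batches on a device, stepping the optimizer, and recording an epoch-level loss. It uses a toy linear model and synthetic data as stand-ins for the TFT and the data module, and omits validation, callbacks, logging backends and checkpointing:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Toy regression data and model standing in for the data module and the TFT
X = torch.randn(256, 4)
y = X @ torch.tensor([1.0, -2.0, 0.5, 3.0]) + 0.1 * torch.randn(256)
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

device = "cuda" if torch.cuda.is_available() else "cpu"  # device placement
model = nn.Linear(4, 1).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=5e-2)
loss_fn = nn.L1Loss()  # mean absolute error, like MAE()

epoch_losses = []
for epoch in range(5):  # corresponds to max_epochs=5
    model.train()
    total = 0.0
    for xb, yb in loader:
        xb, yb = xb.to(device), yb.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(xb).squeeze(-1), yb)
        loss.backward()
        optimizer.step()
        total += loss.item() * len(xb)
    epoch_losses.append(total / len(loader.dataset))
    print(f"epoch {epoch}: train MAE {epoch_losses[-1]:.3f}")
```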


```python
from lightning.pytorch import Trainer
```

```python
# Train the model
print("\nTraining model...")
trainer = Trainer(
    max_epochs=5,
    accelerator="auto",
    devices=1,
    enable_progress_bar=True,
    log_every_n_steps=10,
)

trainer.fit(model, data_module)
```

```
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
You are using a CUDA device ('NVIDIA GeForce RTX 4050 Laptop GPU') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision

Training model...

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name                  | Type               | Params | Mode 
---------------------------------------------------------------------
0 | loss                  | MAE                | 0      | train
1 | encoder_var_selection | Sequential         | 709    | train
2 | decoder_var_selection | Sequential         | 193    | train
3 | static_context_linear | Linear             | 192    | train
4 | lstm_encoder          | LSTM               | 51.5 K | train
5 | lstm_decoder          | LSTM               | 50.4 K | train
6 | self_attention        | MultiheadAttention | 16.6 K | train
7 | pre_output            | Linear             | 4.2 K  | train
8 | output_layer          | Linear             | 65     | train
---------------------------------------------------------------------
123 K     Trainable params
0         Non-trainable params
123 K     Total params
0.495     Total estimated model params size (MB)
18        Modules in train mode
0         Modul

/home/aryan/pytorch-forecasting/.venv/lib/python3.12/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:425: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=15` in the `DataLoader` to improve performance.
/home/aryan/pytorch-forecasting/.venv/lib/python3.12/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:425: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=15` in the `DataLoader` to improve performance.

Epoch 4: 100%|██████████| 42/42 [00:02<00:00, 16.95it/s, v_num=2, train_loss_step=0.0977, val_loss=0.120, val_MAE=0.120, val_SMAPE=0.467, train_loss_epoch=0.133, train_MAE=0.133, train_SMAPE=0.473]
`Trainer.fit` stopped: `max_epochs=5` reached.
```


After training, we can make predictions using the trained model:

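```python
import torch

model.eval()
with torch.no_grad():
    # assumes the test split is already set up, e.g. by `trainer.test(model, data_module)`
    batch = next(iter(data_module.test_dataloader()))
    x, y = batch
    y_pred = model(x)
```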

#### Output
The TFT model returns a `dict` whose `prediction` key holds the forecast:

- `y_pred["prediction"]`: Tensor of shape `(batch_size, prediction_length, output_size)`
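For a single-target point forecast (`output_size == 1`), the prediction carries one more trailing dimension than the target: the evaluation cell below prints a prediction of shape `(32, 1, 1)` and a per-sample target of shape `(1,)`. A minimal sketch, using stand-in tensors with those shapes, of aligning the two before computing an error by hand:

```python
import torch

batch_size, prediction_length, output_size = 32, 1, 1

# Stand-in tensors with the shapes printed later in this notebook
y_pred = {"prediction": torch.zeros(batch_size, prediction_length, output_size)}
y_true = torch.ones(batch_size, prediction_length)

# Drop the trailing output dimension so prediction and target line up
pred = y_pred["prediction"].squeeze(-1)  # (batch_size, prediction_length)
assert pred.shape == y_true.shape

mae = (pred - y_true).abs().mean()
print(mae.item())  # 1.0
```

Without the `squeeze(-1)`, subtracting a `(32, 1)` target from a `(32, 1, 1)` prediction would silently broadcast to `(32, 32, 1)` and produce a meaningless error.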


```python
# Evaluate the model
print("\nEvaluating model...")
test_metrics = trainer.test(model, data_module)

model.eval()
with torch.no_grad():
    test_batch = next(iter(data_module.test_dataloader()))
    x_test, y_test = test_batch
    y_pred = model(x_test)

    print("\nPrediction shape:", y_pred["prediction"].shape)
    print("First prediction values:", y_pred["prediction"][0].cpu().numpy())
    print("First true values:", y_test[0].cpu().numpy())
print("\nTFT model test complete!")
```

```
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
/home/aryan/pytorch-forecasting/.venv/lib/python3.12/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:425: The 'test_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=15` in the `DataLoader` to improve performance.

Evaluating model...
Testing DataLoader 0: 100%|██████████| 9/9 [00:00<00:00, 27.38it/s]
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       Test metric             DataLoader 0
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
        test_MAE            0.11830110847949982
       test_SMAPE           0.4569336473941803
        test_loss           0.11830110847949982
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

Prediction shape: torch.Size([32, 1, 1])
First prediction values: [[0.03997546]]
First true values: [-0.16256696]

TFT model test complete!
```
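SMAPE can be large even when MAE is small, because it divides each error by the magnitudes of the values themselves. Assuming the metric uses the common symmetric form 2|ŷ − y| / (|ŷ| + |y|) with a small epsilon in the denominator, the first prediction printed above already illustrates this: prediction and target lie on opposite sides of zero, so that single term reaches the upper bound of 2 even though the absolute error is only about 0.2. The `smape` function below is a hypothetical per-point helper, not the library's metric class:

```python
def smape(y_pred: float, y_true: float, eps: float = 1e-8) -> float:
    # Symmetric MAPE term: 2 * |error| / (|prediction| + |target|)
    return 2 * abs(y_pred - y_true) / (abs(y_pred) + abs(y_true) + eps)


# First prediction and true value from the output above: opposite signs
print(smape(0.03997546, -0.16256696))  # ~2.0, the maximum possible value

# Same absolute error (0.2), but both values positive
print(smape(0.36, 0.16))
```

With targets that hover around zero, as in this toy data, `test_MAE` is usually the more informative of the two numbers.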
