# Basic vignette: model training and inference with `pytorch-forecasting` v2

<div class="alert alert-block alert-info">
:warning: The "Data Pipeline" showcased here is part of an experimental rework of the `pytorch-forecasting` data layer, planned for release in v2.0.0. The API is currently unstable and may change without prior notice. This notebook is a basic demonstration of the intended workflow and is not recommended for use in production environments. Feedback and suggestions are highly encouraged; please share them in <a href="https://github.com/sktime/pytorch-forecasting/issues/1736">issue 1736</a>.
</div>


In this notebook, we demonstrate how to train and evaluate the **Temporal Fusion Transformer (TFT)** using the new `TimeSeries` and `DataModule` API from the v2 pipeline.
There are two ways to do this:
1. **High-level package API:**

    This approach handles data loading, dataloader creation, and model training internally. It provides a simple, `scikit-learn`-like `fit` → `predict` workflow.
    Users can still configure key training options (such as `trainer` settings, callbacks, and training parameters), but cannot plug in fully custom `trainer` implementations or override the internal pipeline logic.

2. **Low-level 3-stage pipeline:**

    This approach involves explicitly constructing:
    * a `TimeSeries` object
    * a `DataModule`
    * the model (e.g., `TFT`)

    This workflow is ideal if you need custom setups such as custom trainers, callbacks, or advanced data preprocessing.
    It requires a deeper understanding of how the three layers (`TimeSeries`, `DataModule`, and the model) interact, but offers maximum flexibility.

# Create synthetic data
We generate a synthetic dataset with `load_toydata`, which returns a `pandas` DataFrame containing only numerical values, because **for now the pipeline assumes all data to be numerical**.

In [2]:
from pytorch_forecasting.data.examples import load_toydata

In [3]:
num_series = 100  # Number of individual time series to generate
seq_length = 50  # Length of each time series
data_df = load_toydata(num_series, seq_length)
data_df.head()

|   | series_id | time_idx | x | y | category | future_known_feature | static_feature | static_feature_cat |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | -0.030643 | 0.14828 | 0 | 1.0 | 0.039213 | 0 |
| 1 | 0 | 1 | 0.14828 | 0.433029 | 0 | 0.995004 | 0.039213 | 0 |
| 2 | 0 | 2 | 0.433029 | 0.742511 | 0 | 0.980067 | 0.039213 | 0 |
| 3 | 0 | 3 | 0.742511 | 0.72927 | 0 | 0.955336 | 0.039213 | 0 |
| 4 | 0 | 4 | 0.72927 | 0.628604 | 0 | 0.921061 | 0.039213 | 0 |
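The first rows suggest how the toy columns relate. `x` looks like `y` lagged by one step, and `future_known_feature` looks like cos(0.1 · `time_idx`). These patterns are observed in the five rows shown; they are not documented properties of `load_toydata`. They do fit the roles assigned in the next section, though: a deterministic function of time can be known in advance, while a lagged target cannot. The following check confirms both patterns on the printed values:

```python
import math

# The five rows printed above: (time_idx, x, y, future_known_feature)
rows = [
    (0, -0.030643, 0.14828, 1.0),
    (1, 0.14828, 0.433029, 0.995004),
    (2, 0.433029, 0.742511, 0.980067),
    (3, 0.742511, 0.72927, 0.955336),
    (4, 0.72927, 0.628604, 0.921061),
]

# Observation 1: x at step t equals y at step t-1 (x is the lagged target).
lag_matches = all(
    math.isclose(rows[t][1], rows[t - 1][2], abs_tol=1e-6) for t in range(1, len(rows))
)

# Observation 2: future_known_feature matches cos(0.1 * time_idx) to ~6 decimals.
cos_matches = all(
    math.isclose(f, math.cos(0.1 * t), abs_tol=1e-6) for t, _, _, f in rows
)

print(lag_matches, cos_matches)  # True True
```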


# High-level API


## Steps
* Create the `TimeSeries` object
* Create `configs` for model, `datamodule`, `trainer` etc.
* Create the `model_pkg` object
* perform `pkg.fit` and `pkg.predict`.

## Create the dataset object

`TimeSeries` returns the raw data as tensors.

---

Key arguments of the `TimeSeries` dataset:
- `data`: DataFrame with the sequence data.
- `time`: Integer-typed column denoting the time index within `data`.
- `target`: Column(s) in `data` denoting the forecasting target.
- `group`: List of column names identifying a time series instance within `data`.
- `num`: List of numerical features.
- `cat`: List of categorical features.
- `known`: Features whose future values are known.
- `unknown`: Features whose future values are not known.
- `static`: List of variables that do not change over time.
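Here is a concrete view of `time` and `group`: each distinct `group` value identifies one series, and `time` orders the rows within that series. The helper `check_time_index` below is hypothetical and is not part of `pytorch-forecasting`. It runs a simple pre-flight check on plain-dict rows before you build a `TimeSeries`. It assumes you want integer time indices with no duplicates inside each series, which matches the argument description above. The library's own validation may differ.

```python
from collections import defaultdict


def check_time_index(rows, time="time_idx", group=("series_id",)):
    """Hypothetical pre-flight check: time values must be integers and unique per group.

    Returns a dict mapping each group key (a tuple) to its sorted time indices.
    """
    per_group = defaultdict(list)
    for row in rows:
        t = row[time]
        if not isinstance(t, int):
            raise TypeError(f"{time} must be an integer, got {type(t).__name__}")
        per_group[tuple(row[g] for g in group)].append(t)
    for key, times in per_group.items():
        if len(times) != len(set(times)):
            raise ValueError(f"duplicate {time} values in group {key}")
        times.sort()
    return dict(per_group)


rows = [
    {"series_id": 0, "time_idx": 1},
    {"series_id": 0, "time_idx": 0},
    {"series_id": 1, "time_idx": 0},
]
print(check_time_index(rows))  # {(0,): [0, 1], (1,): [0]}
```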



In [4]:
from pytorch_forecasting.data.timeseries import TimeSeries

In [5]:
# create `TimeSeries` dataset that returns the raw data in terms of tensors
dataset = TimeSeries(
    data=data_df,
    time="time_idx",
    target="y",
    group=["series_id"],
    num=["x", "future_known_feature", "static_feature"],
    cat=["category", "static_feature_cat"],
    known=["future_known_feature"],
    unknown=["x", "category"],
    static=["static_feature", "static_feature_cat"],
)



## Create the configs


In [13]:
from sklearn.preprocessing import StandardScaler
from pytorch_forecasting.data.encoders import (
    EncoderNormalizer,
    NaNLabelEncoder,
    TorchNormalizer,
)
from pytorch_forecasting.metrics import MAE, SMAPE

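Here we use `EncoderDecoderTimeSeriesDataModule`. In the high-level API, we only pass its configuration as a dictionary (`datamodule_cfg`); the dataset itself is passed later to `fit`.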


`EncoderDecoderTimeSeriesDataModule` key arguments:
- `time_series_dataset`: `TimeSeries` dataset instance.
- `max_encoder_length`: Maximum length of the encoder input sequence.
- `max_prediction_length`: Maximum length of the decoder output sequence (the forecast horizon).
- `batch_size`: Batch size for the DataLoaders.
- `categorical_encoders`: Dictionary mapping categorical column names to encoders.
- `scalers`: Dictionary mapping numerical column names to scalers.
- `target_normalizer`: Normalizer for the target variable.
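The settings below are `max_encoder_length=30` and `max_prediction_length=1`, so each sample pairs 30 past steps with 1 future step. Suppose windows are always full length and slide forward one step at a time. Then each toy series of length 50 yields 50 - 31 + 1 = 20 samples. The data module's actual sampling may produce different counts, for example because of minimum lengths or train/validation/test splits. A minimal sketch under that assumption:

```python
def make_windows(series_length, max_encoder_length, max_prediction_length):
    """Return (encoder_range, decoder_range) index pairs for fixed-length windows.

    Simplifying assumption: every window uses the full encoder and prediction
    lengths, and consecutive windows start one time step apart.
    """
    window = max_encoder_length + max_prediction_length
    windows = []
    for start in range(series_length - window + 1):
        enc = range(start, start + max_encoder_length)
        dec = range(start + max_encoder_length, start + window)
        windows.append((enc, dec))
    return windows


w = make_windows(series_length=50, max_encoder_length=30, max_prediction_length=1)
print(len(w))  # 20 windows per series
enc, dec = w[0]
print(enc.start, enc.stop - 1, list(dec))  # 0 29 [30]
```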

In [14]:
datamodule_cfg = dict(
    max_encoder_length=30,
    max_prediction_length=1,
    batch_size=32,
    categorical_encoders={
        "category": NaNLabelEncoder(add_nan=True),
        "static_feature_cat": NaNLabelEncoder(add_nan=True),
    },
    scalers={
        "x": StandardScaler(),
        "future_known_feature": StandardScaler(),
        "static_feature": StandardScaler(),
    },
    target_normalizer=TorchNormalizer(),
)
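The encoders and scalers above are fitted on the data and then applied per column. The sketch below is a simplified, pure-Python illustration of two ideas; it is not the library code:

- label encoding that reserves index 0 for missing or unseen categories, which is the idea behind `NaNLabelEncoder(add_nan=True)`;
- standardisation to zero mean and unit variance, as `StandardScaler` does.

```python
import statistics


def fit_label_encoder(values, add_nan=True):
    """Map each category to an integer; with add_nan, index 0 is reserved (None = missing)."""
    offset = 1 if add_nan else 0
    classes = sorted(set(v for v in values if v is not None))
    return {c: i + offset for i, c in enumerate(classes)}


def encode(mapping, values, add_nan=True):
    # With add_nan=True, unseen or missing values fall back to the reserved index 0.
    if add_nan:
        return [mapping.get(v, 0) for v in values]
    return [mapping[v] for v in values]


def standardize(values):
    """Zero mean, unit (population) variance; constant columns keep scale 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0
    return [(v - mean) / std for v in values]


mapping = fit_label_encoder([2, 0, 1, 0])
print(mapping)                         # {0: 1, 1: 2, 2: 3}
print(encode(mapping, [0, 5, None]))   # [1, 0, 0]
print(standardize([1.0, 2.0, 3.0]))
```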

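In this tutorial we use the `TFT` model. `model_cfg` holds its hyperparameters together with the loss, logging metrics, optimizer, and learning-rate scheduler settings.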

In [15]:
model_cfg = dict(
    loss=MAE(),
    logging_metrics=[MAE(), SMAPE()],
    optimizer="adam",
    optimizer_params={"lr": 1e-3},
    lr_scheduler="reduce_lr_on_plateau",
    lr_scheduler_params={"mode": "min", "factor": 0.1, "patience": 10},
    hidden_size=64,
    num_layers=2,
    attention_head_size=4,
    dropout=0.1,
)
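`lr_scheduler="reduce_lr_on_plateau"` is presumably backed by PyTorch's `ReduceLROnPlateau`. It lowers the learning rate by `factor` once the monitored loss has not improved for more than `patience` epochs. With `patience=10` and the `max_epochs=5` set in `trainer_cfg`, no reduction can happen in this run. The sketch below shows the rule in simplified form. It ignores the improvement threshold, cooldown, and minimum learning rate that the PyTorch scheduler supports.

```python
def plateau_schedule(losses, lr=1e-3, factor=0.1, patience=10):
    """Simplified 'reduce on plateau' rule with mode='min'.

    Returns the learning rate in effect after each epoch's loss is observed.
    """
    best = float("inf")
    bad_epochs = 0
    history = []
    for loss in losses:
        if loss < best:
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
        if bad_epochs > patience:  # more than `patience` epochs without improvement
            lr *= factor
            bad_epochs = 0
        history.append(lr)
    return history


# With max_epochs=5 (see trainer_cfg), patience=10 can never trigger a reduction.
print(plateau_schedule([1.0] * 5)[-1])  # 0.001
# A longer run that stops improving after epoch 0 is reduced after 11 bad epochs.
print(plateau_schedule([1.0] * 12)[-1])
```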

In [16]:
trainer_cfg = dict(
    max_epochs=5,
    accelerator="auto",
    devices=1,
    enable_progress_bar=True,
    log_every_n_steps=10,
)

In [17]:
from pytorch_forecasting.models.temporal_fusion_transformer._tft_pkg_v2 import (
    TFT_pkg_v2,
)

## Create the `model_pkg` object

The `pkg` class wraps the whole ML pipeline in `pytorch-forecasting`. We define it once from the three configs and then call `pkg.fit` to train and `pkg.predict` to run inference.
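Conceptually, the wrapper chains the same three stages as the low-level pipeline described later in this notebook. The sketch below is a hypothetical, heavily simplified stand-in: `MiniPkg` and its factory arguments are invented for illustration and are not the actual `TFT_pkg_v2` implementation. It only shows how `fit` might:

1. build a DataModule;
2. size a model from the DataModule's `metadata`;
3. hand both to a trainer.

```python
from types import SimpleNamespace


class MiniPkg:
    """Hypothetical fit/predict wrapper; NOT the actual TFT_pkg_v2 implementation.

    The three factories stand in for building a DataModule, a model and a trainer.
    """

    def __init__(self, make_datamodule, make_model, make_trainer):
        self.make_datamodule = make_datamodule
        self.make_model = make_model
        self.make_trainer = make_trainer
        self.model = None

    def fit(self, dataset):
        datamodule = self.make_datamodule(dataset)         # stage 2: batching/preprocessing
        self.model = self.make_model(datamodule.metadata)  # stage 3: model sized from metadata
        self.make_trainer().fit(self.model, datamodule)    # training loop
        return self

    def predict(self, dataset):
        if self.model is None:
            raise RuntimeError("call fit() before predict()")
        return self.model.predict(dataset)


# Usage with trivial stubs in place of the real components.
calls = []
pkg = MiniPkg(
    make_datamodule=lambda ds: SimpleNamespace(metadata={"n_rows": len(ds)}),
    make_model=lambda meta: SimpleNamespace(meta=meta, predict=lambda ds: [0.0] * len(ds)),
    make_trainer=lambda: SimpleNamespace(fit=lambda model, dm: calls.append("fit")),
)
print(pkg.fit([1, 2, 3]).predict([4, 5]))  # [0.0, 0.0]
```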

In [18]:
model_pkg = TFT_pkg_v2(
    model_cfg=model_cfg,
    trainer_cfg=trainer_cfg,
    datamodule_cfg=datamodule_cfg,
)

{'loss': MAE(), 'logging_metrics': [MAE(), SMAPE()], 'optimizer': 'adam', 'optimizer_params': {'lr': 0.001}, 'lr_scheduler': 'reduce_lr_on_plateau', 'lr_scheduler_params': {'mode': 'min', 'factor': 0.1, 'patience': 10}, 'hidden_size': 64, 'num_layers': 2, 'attention_head_size': 4, 'dropout': 0.1}


In [None]:
model_pkg.fit(dataset)  # You can also pass in a DataModule here

INFO: GPU available: True (cuda), used: True
INFO: TPU available: False, using: 0 TPU cores
INFO: LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
INFO: 
  | Name                  | Type               | Params | Mode 
---------------------------------------------------------------------
0 | loss                  | MAE                | 0      | train
1 | encoder_var_selection | Sequential         | 709    | train
2 | decoder_var_selection | Sequential         | 193    | train
3 | static_context_linear | Linear             | 192    | train
4 | lstm_encoder          | LSTM               | 51.5 K | train
5 | lstm_decoder          | LSTM               | 50.4 K | train
6 | self_attention        | MultiheadAttention | 16.6 K | train
7 | pr

INFO: `Trainer.fit` stopped: `max_epochs=5` reached.


Artifacts saved in: /content/pytorch-forecasting/checkpoints


PosixPath('/content/pytorch-forecasting/checkpoints/best-epoch=3-step=168.ckpt')
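The checkpoint file name also tells us how many optimizer steps one epoch took. Lightning numbers epochs from 0, so `epoch=3` marks the fourth epoch. The arithmetic below assumes default settings, meaning no gradient accumulation and no batch limits; `trainer_cfg` does not change these. Assuming the checkpoint monitors a validation metric, the name also suggests the best model came from the fourth of five epochs rather than the last.

```python
import re

ckpt = "best-epoch=3-step=168.ckpt"  # file name from the output above
match = re.search(r"epoch=(\d+)-step=(\d+)", ckpt)
epoch, step = int(match.group(1)), int(match.group(2))

# Epochs are 0-indexed, so epoch=3 means 4 completed epochs.
steps_per_epoch = step / (epoch + 1)
print(steps_per_epoch)       # 42.0
# With batch_size=32, at most this many training windows per epoch (last batch may be partial).
print(steps_per_epoch * 32)  # 1344.0
```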


#### Output
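The output of the TFT model is a `dict` with the key `prediction`:

- `y_pred["prediction"]`: Tensor of shape `(batch_size, prediction_length, output_size)`

When `predict` is called with `return_info`, the requested entries (here `index`, `x`, and `y`) are returned in the same `dict` alongside `prediction`.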
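Why does `preds["prediction"][0].item()` work in the cell below? On a PyTorch tensor, `.item()` only succeeds when the tensor holds exactly one element. Indexing the first sample leaves shape `(prediction_length, output_size)`. Here that should be `(1, 1)`, assuming a point loss such as MAE gives one output per step and that `predict` stacks samples along the first dimension. The pure-Python sketch below mimics these shapes with nested lists, so no torch is needed. With a longer horizon, you would have to select a step explicitly instead of calling `.item()` on the whole slice.

```python
def shape(nested):
    """Shape of a rectangular, non-empty nested list, e.g. [[[0.0]]] -> (1, 1, 1)."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


# Assumed sizes for this notebook: batch_size=32, max_prediction_length=1,
# and one output value per time step for a point loss such as MAE.
batch_size, prediction_length, output_size = 32, 1, 1
prediction = [
    [[0.0] * output_size for _ in range(prediction_length)] for _ in range(batch_size)
]
y_pred = {"prediction": prediction}

first = y_pred["prediction"][0]
print(shape(y_pred["prediction"]))  # (32, 1, 1)
print(shape(first))                 # (1, 1): a single value, so .item() works on the tensor equivalent
```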


In [None]:
preds = model_pkg.predict(dataset, return_info=["index", "x", "y"])
# You can also pass in a DataModule or Dataloader here

INFO: 💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
INFO: GPU available: True (cuda), used: True
INFO: TPU available: False, using: 0 TPU cores
INFO: LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]



In [21]:
print("First Predicted Value:")
print("Index:", preds["index"][0].item())
print("Prediction:", preds["prediction"][0].item())
print("Actual:", preds["y"][0].item())

First Predicted Value:
Index: -0.0801810473203659
Prediction: 0.11192154139280319
Actual: -0.1557866632938385
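A single sample says little about model quality, but it illustrates the two logged metrics. The sketch below computes MAE and a symmetric MAPE on a 0-to-2 scale from the printed values. It assumes that `SMAPE` in `pytorch-forecasting` uses a similar formula; details such as the epsilon may differ. The prediction and the actual value have opposite signs, so |ŷ - y| = |ŷ| + |y| and SMAPE reaches its maximum of 2.

```python
def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def smape(y_true, y_pred, eps=1e-8):
    """Symmetric MAPE on the 0..2 scale: mean of 2|p - t| / (|p| + |t|)."""
    terms = (2 * abs(p - t) / (abs(p) + abs(t) + eps) for t, p in zip(y_true, y_pred))
    return sum(terms) / len(y_true)


actual = [-0.1557866632938385]     # "Actual" printed above
predicted = [0.11192154139280319]  # "Prediction" printed above
print(mae(actual, predicted))    # about 0.2677
print(smape(actual, predicted))  # about 2.0, the maximum, because the signs differ
```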


# 3-stage pipeline




## Steps

1. Create `TimeSeries` Dataset object
2. Create DataModule object
3. Initialize, Train & Run Inference with the Model




### Create Dataset & DataModule

- `TimeSeries` returns the raw data as tensors.
- `DataModule` wraps the dataset, handles splits, preprocessing, and batching, and exposes `metadata` for model initialisation.



### Initialize the Model

We initialize the TFT model using the `metadata` provided by the `DataModule`. This metadata includes all required dimensional info for the encoder, decoder, and static inputs.



### Train the Model

We use a `Trainer` from PyTorch Lightning to train the model.

### Run Inference

After training, we can make predictions with the trained model.


## 1. Create the dataset
We create a `TimeSeries` dataset instance that returns the raw data as tensors. This raw data is then passed to the `data_module`, which internally handles the dataloaders and preprocessing.

Key arguments of the `TimeSeries` dataset:
- `data`: DataFrame with the sequence data.
- `time`: Integer-typed column denoting the time index within `data`.
- `target`: Column(s) in `data` denoting the forecasting target.
- `group`: List of column names identifying a time series instance within `data`.
- `num`: List of numerical features.
- `cat`: List of categorical features.
- `known`: Features whose future values are known.
- `unknown`: Features whose future values are not known.
- `static`: List of variables that do not change over time.

In [22]:
from pytorch_forecasting.data.timeseries import TimeSeries

In [23]:
# create `TimeSeries` dataset that returns the raw data in terms of tensors
dataset = TimeSeries(
    data=data_df,
    time="time_idx",
    target="y",
    group=["series_id"],
    num=["x", "future_known_feature", "static_feature"],
    cat=["category", "static_feature_cat"],
    known=["future_known_feature"],
    unknown=["x", "category"],
    static=["static_feature", "static_feature_cat"],
)



## 2. Create datamodule

`EncoderDecoderTimeSeriesDataModule` key arguments:
- `time_series_dataset`: `TimeSeries` dataset instance.
- `max_encoder_length`: Maximum length of the encoder input sequence.
- `max_prediction_length`: Maximum length of the decoder output sequence (the forecast horizon).
- `batch_size`: Batch size for the DataLoaders.
- `categorical_encoders`: Dictionary mapping categorical column names to encoders.
- `scalers`: Dictionary mapping numerical column names to scalers.
- `target_normalizer`: Normalizer for the target variable.

In [24]:
from sklearn.preprocessing import StandardScaler
from pytorch_forecasting.data.data_module import EncoderDecoderTimeSeriesDataModule
from pytorch_forecasting.data.encoders import (
    EncoderNormalizer,
    NaNLabelEncoder,
    TorchNormalizer,
)

In [None]:
# create the `data_module` that handles the dataloaders and preprocessing
data_module = EncoderDecoderTimeSeriesDataModule(
    time_series_dataset=dataset,
    max_encoder_length=30,
    max_prediction_length=1,
    batch_size=32,
    categorical_encoders={
        "category": NaNLabelEncoder(add_nan=True),
        "static_feature_cat": NaNLabelEncoder(add_nan=True),
    },
    scalers={
        "x": StandardScaler(),
        "future_known_feature": StandardScaler(),
        "static_feature": StandardScaler(),
    },
    target_normalizer=TorchNormalizer(),
)



## 3. Initialise and train the model

To initialise the model, you no longer have to pass arguments like `encoder_cont` and `decoder_cont`; they are computed internally from the `metadata` property [[source]](https://github.com/sktime/pytorch-forecasting/blob/4a34931e499c2b59de3939fcffcaabd75204b045/pytorch_forecasting/data/data_module.py#L264-L292) of `EncoderDecoderTimeSeriesDataModule`. You still have to pass the other parameters, such as `loss` and `optimizer`.


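```python
from pytorch_forecasting.metrics import MAE, SMAPE
from pytorch_forecasting.models.temporal_fusion_transformer._tft_v2 import TFT

model = TFT(
    loss=MAE(),
    logging_metrics=[MAE(), SMAPE()],
    metadata=data_module.metadata,  # <-- crucial for model setup
    # ... plus the remaining hyperparameters (optimizer, hidden_size, etc.)
)
```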

The `metadata` includes:
- `max_encoder_length`, `max_prediction_length`
- number of continuous/categorical variables in encoder/decoder
- number of static features

These values set dimension parameters such as `encoder_cont` and `decoder_cat`, which in turn size the model's internal layers.
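To see which counts the column roles imply, consider the hypothetical sketch below. The key names echo those used above, but the real `metadata` may use other keys and counting rules, so check the linked source. The sketch rests on three assumptions that are not taken from the library:

- the encoder sees every time-varying feature;
- the decoder sees only known time-varying features;
- static features are counted separately.

```python
def sketch_metadata(num, cat, known, static, max_encoder_length, max_prediction_length):
    """Hypothetical dimension info derived from column roles (not the library's code)."""
    time_varying_num = [c for c in num if c not in static]
    time_varying_cat = [c for c in cat if c not in static]
    return {
        "max_encoder_length": max_encoder_length,
        "max_prediction_length": max_prediction_length,
        # Encoder: all time-varying features (known and unknown).
        "encoder_cont": len(time_varying_num),
        "encoder_cat": len(time_varying_cat),
        # Decoder: only features whose future values are known.
        "decoder_cont": sum(c in known for c in time_varying_num),
        "decoder_cat": sum(c in known for c in time_varying_cat),
        # Static features, counted separately.
        "static_cont": sum(c in static for c in num),
        "static_cat": sum(c in static for c in cat),
    }


meta = sketch_metadata(
    num=["x", "future_known_feature", "static_feature"],
    cat=["category", "static_feature_cat"],
    known=["future_known_feature"],
    static=["static_feature", "static_feature_cat"],
    max_encoder_length=30,
    max_prediction_length=1,
)
print(meta)
```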


In [None]:
import torch
import torch.nn as nn
from pytorch_forecasting.metrics import MAE, SMAPE
from pytorch_forecasting.models.temporal_fusion_transformer._tft_v2 import TFT

In [60]:
# Initialise the Model
model = TFT(
    loss=MAE(),
    logging_metrics=[MAE(), SMAPE()],
    optimizer="adam",
    optimizer_params={"lr": 1e-3},
    lr_scheduler="reduce_lr_on_plateau",
    lr_scheduler_params={"mode": "min", "factor": 0.1, "patience": 10},
    hidden_size=64,
    num_layers=2,
    attention_head_size=4,
    dropout=0.1,
    metadata=data_module.metadata,  # pass the metadata from the datamodule to the model
    # to initialise important params like `encoder_cont` etc
)



We use a `Trainer` from PyTorch Lightning to train the model:

```python
from lightning.pytorch import Trainer

trainer = Trainer(max_epochs=5)  # plus further options, e.g. accelerator="auto"
trainer.fit(model, data_module)
```

The `Trainer`:
- Pulls data from `data_module`
- Handles device placement
- Logs training progress and metrics
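Schematically, `trainer.fit(model, data_module)` does three things: it runs epochs over batches from the DataModule's `train_dataloader()`, evaluates on `val_dataloader()`, and counts optimizer steps. The toy sketch below mimics that control flow in plain Python. `ToyDataModule` and `toy_fit` are hypothetical, and the "model" is a single constant trained with the MAE subgradient. Everything Lightning adds (device placement, logging, checkpointing, callbacks) is omitted.

```python
class ToyDataModule:
    """Stand-in exposing the two DataModule hooks the loop below uses."""

    def __init__(self, train, val, batch_size):
        self.train, self.val, self.batch_size = train, val, batch_size

    def _batches(self, data):
        return [data[i:i + self.batch_size] for i in range(0, len(data), self.batch_size)]

    def train_dataloader(self):
        return self._batches(self.train)

    def val_dataloader(self):
        return self._batches(self.val)


def toy_fit(datamodule, max_epochs, lr=0.5):
    """Schematic of what Trainer.fit automates: epochs, batches, validation, step counting.

    The 'model' is one constant c, updated with the MAE subgradient sign(c - y).
    """
    c, global_step, val_history = 0.0, 0, []
    for _ in range(max_epochs):
        for batch in datamodule.train_dataloader():
            grad = sum((c > y) - (c < y) for y in batch) / len(batch)
            c -= lr * grad
            global_step += 1
        val = [y for batch in datamodule.val_dataloader() for y in batch]
        val_history.append(sum(abs(c - y) for y in val) / len(val))  # validation MAE
    return c, global_step, val_history


dm = ToyDataModule(train=[1.0] * 10, val=[1.0] * 4, batch_size=4)
c, steps, val_losses = toy_fit(dm, max_epochs=5)
print(c, steps, val_losses)  # 1.0 15 [0.0, 0.0, 0.0, 0.0, 0.0]
```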


In [None]:
from lightning.pytorch import Trainer

In [61]:
# Train the model
print("\nTraining model...")
trainer = Trainer(
    max_epochs=5,
    accelerator="auto",
    devices=1,
    enable_progress_bar=True,
    log_every_n_steps=10,
)

trainer.fit(model, data_module)

INFO: 💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
INFO: GPU available: True (cuda), used: True
INFO: TPU available: False, using: 0 TPU cores
INFO: LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
INFO: 
  | Name                  | Type               | Params | Mode 
---------------------------------------------------------------------
0 | l


Training model...


INFO: `Trainer.fit` stopped: `max_epochs=5` reached.



#### Output
The output of the TFT model is a `dict` with the key `prediction`:

- `y_pred["prediction"]`: Tensor of shape `(batch_size, prediction_length, output_size)`


In [None]:
data_module.setup(stage="test")
test_dataloader = data_module.test_dataloader()

In [None]:
preds = model.predict(test_dataloader, return_info=["index", "x", "y"])

INFO: 💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
INFO: GPU available: True (cuda), used: True
INFO: TPU available: False, using: 0 TPU cores
INFO: LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]



In [None]:
print("First Predicted Value:")
print("Index:", preds["index"][0].item())
print("Prediction:", preds["prediction"][0].item())
print("Actual:", preds["y"][0].item())

First Predicted Value:
Index: 0.11104673147201538
Prediction: -0.001255139708518982
Actual: 0.07348770648241043
