# How to implement custom models

## Building a simple, first model

For demonstration purposes, we will choose a simple fully connected model. It takes a timeseries of size `input_size` as input and outputs a new timeseries of size `output_size`. You can think of these as `input_size` encoding steps and `output_size` decoding/prediction steps.

In [36]:
import os
import warnings

warnings.filterwarnings("ignore")

os.chdir("../../..")

In [2]:
import torch
from torch import nn


class FullyConnectedModule(nn.Module):
    def __init__(self, input_size: int, output_size: int, hidden_size: int, n_hidden_layers: int):
        super().__init__()

        # input layer
        module_list = [nn.Linear(input_size, hidden_size), nn.ReLU()]
        # hidden layers
        for _ in range(n_hidden_layers):
            module_list.extend([nn.Linear(hidden_size, hidden_size), nn.ReLU()])
        # output layer
        module_list.append(nn.Linear(hidden_size, output_size))

        self.sequential = nn.Sequential(*module_list)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x of shape: batch_size x n_timesteps_in
        # output of shape batch_size x n_timesteps_out
        return self.sequential(x)


# test that network works as intended
network = FullyConnectedModule(input_size=5, output_size=2, hidden_size=10, n_hidden_layers=2)
x = torch.rand(20, 5)
network(x).shape

torch.Size([20, 2])
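As a sanity check on the architecture, the number of trainable parameters can be worked out by hand: an `nn.Linear(i, o)` layer holds `i * o` weights plus `o` biases. The sketch below uses plain Python and two illustrative helper names (`linear_params`, `fully_connected_params`) that are not part of any library.

```python
def linear_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features


def fully_connected_params(input_size, output_size, hidden_size, n_hidden_layers):
    """Per-layer parameter counts, mirroring the layer order in FullyConnectedModule."""
    counts = [linear_params(input_size, hidden_size)]  # input layer
    counts += [linear_params(hidden_size, hidden_size)] * n_hidden_layers  # hidden layers
    counts.append(linear_params(hidden_size, output_size))  # output layer
    return counts


counts = fully_connected_params(input_size=5, output_size=2, hidden_size=10, n_hidden_layers=2)
print(counts, sum(counts))  # [60, 110, 110, 22] 302
```

The ReLU activations contribute no parameters, so only the linear layers appear in the list.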

In [3]:
from typing import Dict

from pytorch_forecasting.models import BaseModel


class FullyConnectedModel(BaseModel):
    def __init__(self, input_size: int, output_size: int, hidden_size: int, n_hidden_layers: int, **kwargs):
        # saves arguments in signature to `.hparams` attribute, mandatory call - do not skip this
        self.save_hyperparameters()
        # pass additional arguments to BaseModel.__init__, mandatory call - do not skip this
        super().__init__(**kwargs)
        self.network = FullyConnectedModule(
            input_size=self.hparams.input_size,
            output_size=self.hparams.output_size,
            hidden_size=self.hparams.hidden_size,
            n_hidden_layers=self.hparams.n_hidden_layers,
        )

    def forward(self, x: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:
        # x is a batch generated based on the TimeSeriesDataset
        network_input = x["encoder_cont"].squeeze(-1)
        prediction = self.network(network_input)

        # We need to return a dictionary that at least contains the prediction and the target_scale.
        # The parameter can be directly forwarded from the input.
        return dict(prediction=prediction, target_scale=x["target_scale"])


model = FullyConnectedModel(input_size=5, output_size=2, hidden_size=10, n_hidden_layers=2)

This is a very basic implementation that could be readily used for training. But before we add additional features, let's first have a look at how we pass data to this model.

### Passing data to a model

In [4]:
import numpy as np
import pandas as pd

test_data = pd.DataFrame(
    dict(
        value=np.random.rand(30) - 0.5,
        group=np.repeat(np.arange(3), 10),
        time_idx=np.tile(np.arange(10), 3),
    )
)
test_data

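| | value | group | time_idx |
|---|---|---|---|
| 0 | 0.346013 | 0 | 0 |
| 1 | 0.49893 | 0 | 1 |
| 2 | 0.267375 | 0 | 2 |
| 3 | 0.474667 | 0 | 3 |
| 4 | 0.041184 | 0 | 4 |
| 5 | 0.171251 | 0 | 5 |
| 6 | -0.371327 | 0 | 6 |
| 7 | 0.006309 | 0 | 7 |
| 8 | -0.406183 | 0 | 8 |
| 9 | 0.104648 | 0 | 9 |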


In [5]:
from pytorch_forecasting import TimeSeriesDataSet

# create the dataset from the pandas dataframe
dataset = TimeSeriesDataSet(
    test_data,
    group_ids=["group"],
    target="value",
    time_idx="time_idx",
    min_encoder_length=5,
    max_encoder_length=5,
    min_prediction_length=2,
    max_prediction_length=2,
    time_varying_unknown_reals=["value"],
)

In [6]:
dataset.get_parameters()

{'time_idx': 'time_idx',
 'target': 'value',
 'group_ids': ['group'],
 'weight': None,
 'max_encoder_length': 5,
 'min_encoder_length': 5,
 'min_prediction_idx': 0,
 'min_prediction_length': 2,
 'max_prediction_length': 2,
 'static_categoricals': [],
 'static_reals': [],
 'time_varying_known_categoricals': [],
 'time_varying_known_reals': [],
 'time_varying_unknown_categoricals': [],
 'time_varying_unknown_reals': ['value'],
 'variable_groups': {},
 'dropout_categoricals': [],
 'constant_fill_strategy': {},
 'allow_missings': False,
 'add_relative_time_idx': False,
 'add_target_scales': False,
 'add_encoder_length': False,
 'target_normalizer': GroupNormalizer(),
 'categorical_encoders': {'__group_id__group': NaNLabelEncoder(),
  'group': NaNLabelEncoder()},
 'scalers': {'value': GroupNormalizer()},
 'randomize_length': None,
 'predict_mode': False}

Now, we take a look at the output of the dataloader. Its `x` will be fed to the model's forward method, which is why it is so important to understand it.

In [7]:
# convert the dataset to a dataloader
dataloader = dataset.to_dataloader(batch_size=4)

# and load the first batch
x, y = next(iter(dataloader))
print("x =", x)
print("\ny =", y)
print("\nsizes of x =")
for key, value in x.items():
    print(f"\t{key} = {value.size()}")

x = {'encoder_cat': tensor([], size=(4, 5, 0), dtype=torch.int64), 'encoder_cont': tensor([[[-1.1748],
         [-0.6134],
         [ 1.1748],
         [-0.7944],
         [ 0.7807]],

        [[-0.7793],
         [-0.7359],
         [-0.1038],
         [ 0.2095],
         [-0.2538]],

        [[-0.7461],
         [-0.7793],
         [-0.7359],
         [-0.1038],
         [ 0.2095]],

        [[ 2.0526],
         [ 0.9672],
         [ 1.9389],
         [-0.0930],
         [ 0.5167]]]), 'encoder_target': tensor([[-0.1896, -0.0698,  0.3117, -0.1084,  0.2276],
        [-0.1052, -0.0960,  0.0389,  0.1057,  0.0069],
        [-0.0981, -0.1052, -0.0960,  0.0389,  0.1057],
        [ 0.4989,  0.2674,  0.4747,  0.0412,  0.1713]]), 'encoder_lengths': tensor([5, 5, 5, 5]), 'decoder_cat': tensor([], size=(4, 2, 0), dtype=torch.int64), 'decoder_cont': tensor([[[-0.3466],
         [-0.0228]],

        [[ 0.2434],
         [ 0.8438]],

        [[-0.2538],
         [ 0.2434]],

        [[-2.0266],
   

This explains why we had to first extract the correct input in our simple `FullyConnectedModel` above before passing it to our `FullyConnectedModule`.
As a reminder:
       

In [8]:
def forward(self, x: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:
    # x is a batch generated based on the TimeSeriesDataset
    network_input = x["encoder_cont"].squeeze(-1)
    prediction = self.network(network_input)

    # We need to return a dictionary that at least contains the prediction and the target_scale.
    # The parameter can be directly forwarded from the input.
    return dict(prediction=prediction, target_scale=x["target_scale"])

For such a simple architecture, we can ignore most of the inputs in `x`. You do not have to worry about moving tensors to specific GPUs; [PyTorch Lightning](https://pytorch-lightning.readthedocs.io) will take care of this for you.
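To make the shape handling concrete, here is a minimal sketch of what `squeeze(-1)` does to `encoder_cont`. The dictionary below is a hypothetical stand-in for a real batch, filled with random numbers and using the same sizes as the printout above (4 samples, 5 encoder steps, 1 continuous variable).

```python
import torch

# Hypothetical stand-in for a batch: 4 samples, 5 encoder steps, 1 continuous feature.
x = {"encoder_cont": torch.rand(4, 5, 1)}

# squeeze(-1) drops the trailing feature dimension of size 1,
# leaving the (batch_size, n_timesteps_in) layout that the network expects.
network_input = x["encoder_cont"].squeeze(-1)
print(network_input.shape)  # torch.Size([4, 5])
```

Note that `squeeze(-1)` only removes the last dimension when it has size 1, so this shortcut relies on the dataset containing exactly one continuous variable.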

Now, let's check if our model works:

In [9]:
x, y = next(iter(dataloader))
model(x)

{'prediction': tensor([[-0.2281, -0.2557],
         [-0.2806, -0.3446],
         [-0.2609, -0.3056],
         [-0.1920, -0.1979]], grad_fn=<AddmmBackward>),
 'target_scale': tensor([[0.0610, 0.2133],
         [0.0610, 0.2133],
         [0.0610, 0.2133],
         [0.0610, 0.2133]])}
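The `target_scale` entry is what links normalized and raw values. The sketch below assumes that each row holds `(center, scale)` and that normalization is `(value - center) / scale`; this is an assumption about the normalizer, but it is consistent with the numbers printed above: the first raw `encoder_target` values map closely onto the first `encoder_cont` values.

```python
# Assumption: each row of target_scale is (center, scale).
center, scale = 0.0610, 0.2133


def normalize(value: float) -> float:
    """Map a raw target value to the normalized scale fed to the network."""
    return (value - center) / scale


def rescale(normalized: float) -> float:
    """Map a normalized value (e.g. a network prediction) back to the raw scale."""
    return normalized * scale + center


# First three raw encoder_target values from the batch printed earlier.
raw = [-0.1896, -0.0698, 0.3117]
normalized = [normalize(v) for v in raw]
print([round(v, 2) for v in normalized])  # close to the encoder_cont values -1.1748, -0.6134, 1.1748
```

Small differences in the last digits come from the rounding of the printed tensors.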

In [10]:
dataset.x_to_index(x)

| | time_idx | group |
|---|---|---|
| 0 | 8 | 0 |
| 1 | 6 | 1 |
| 2 | 8 | 1 |
| 3 | 6 | 0 |
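The `time_idx` values returned here appear to be the index of the first prediction step of each sample; this is an assumption about the library's behavior rather than something stated above. Under that assumption, and treating each sample as one contiguous window of encoder plus prediction steps, the possible values can be enumerated in plain Python. The helper name `first_prediction_indices` is illustrative only.

```python
def first_prediction_indices(n_steps: int, encoder_length: int, prediction_length: int):
    """First prediction time index of every full window that fits into one series."""
    window = encoder_length + prediction_length
    return [start + encoder_length for start in range(n_steps - window + 1)]


# Each group in test_data has 10 time steps; the dataset uses 5 encoder and 2 prediction steps.
indices = first_prediction_indices(n_steps=10, encoder_length=5, prediction_length=2)
print(indices)  # [5, 6, 7, 8]

# With 3 groups this would give 3 * 4 = 12 samples in total.
n_samples = 3 * len(indices)
print(n_samples)  # 12
```

The values 6 and 8 in the table above fall inside this range.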


### Coupling datasets and models

In [11]:
class FullyConnectedModel(BaseModel):
    def __init__(self, input_size: int, output_size: int, hidden_size: int, n_hidden_layers: int, **kwargs):
        # saves arguments in signature to `.hparams` attribute, mandatory call - do not skip this
        self.save_hyperparameters()
        # pass additional arguments to BaseModel.__init__, mandatory call - do not skip this
        super().__init__(**kwargs)
        self.network = FullyConnectedModule(
            input_size=self.hparams.input_size,
            output_size=self.hparams.output_size,
            hidden_size=self.hparams.hidden_size,
            n_hidden_layers=self.hparams.n_hidden_layers,
        )

    def forward(self, x: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:
        # x is a batch generated based on the TimeSeriesDataset
        network_input = x["encoder_cont"].squeeze(-1)
        prediction = self.network(network_input).unsqueeze(-1)

        # We need to return a dictionary that at least contains the prediction and the target_scale.
        # The parameter can be directly forwarded from the input.
        return dict(prediction=prediction, target_scale=x["target_scale"])

    @classmethod
    def from_dataset(cls, dataset: TimeSeriesDataSet, **kwargs):
        new_kwargs = {
            "output_size": dataset.max_prediction_length,
            "input_size": dataset.max_encoder_length,
        }
        new_kwargs.update(kwargs)  # use to pass real hyperparameters and override defaults set by dataset
        # example for dataset validation
        assert dataset.max_prediction_length == dataset.min_prediction_length, "Decoder only supports a fixed length"
        assert dataset.min_encoder_length == dataset.max_encoder_length, "Encoder only supports a fixed length"
        assert (
            len(dataset.time_varying_known_categoricals) == 0
            and len(dataset.time_varying_known_reals) == 0
            and len(dataset.time_varying_unknown_categoricals) == 0
            and len(dataset.static_categoricals) == 0
            and len(dataset.static_reals) == 0
            and len(dataset.time_varying_unknown_reals) == 1
            and dataset.time_varying_unknown_reals[0] == dataset.target
        ), "Only covariate should be the target in 'time_varying_unknown_reals'"

        return super().from_dataset(dataset, **new_kwargs)
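The `new_kwargs.update(kwargs)` idiom deserves a closer look: values derived from the dataset act as defaults, and anything the caller passes explicitly wins. A minimal sketch in plain Python, where `fake_dataset` and `build_kwargs` are hypothetical stand-ins and not part of the library:

```python
from types import SimpleNamespace


def build_kwargs(dataset, **kwargs):
    """Derive defaults from the dataset, then let explicit arguments override them."""
    new_kwargs = {
        "output_size": dataset.max_prediction_length,
        "input_size": dataset.max_encoder_length,
    }
    new_kwargs.update(kwargs)
    return new_kwargs


# Hypothetical stand-in exposing only the two attributes read above.
fake_dataset = SimpleNamespace(max_encoder_length=5, max_prediction_length=2)

derived = build_kwargs(fake_dataset, hidden_size=10, n_hidden_layers=2)
print(derived)  # {'output_size': 2, 'input_size': 5, 'hidden_size': 10, 'n_hidden_layers': 2}

overridden = build_kwargs(fake_dataset, hidden_size=10, input_size=3)
print(overridden["input_size"])  # 3
```

Because explicit arguments take precedence, the validation asserts in `from_dataset` are what protect against a dataset that the architecture cannot handle.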

Now, let's initialize from our dataset:

In [12]:
model = FullyConnectedModel.from_dataset(dataset, hidden_size=10, n_hidden_layers=2)
model.summarize("full")  # print model summary
model.hparams


   | Name                 | Type                 | Params
---------------------------------------------------------------
0  | loss                 | SMAPE                | 0     
1  | logging_metrics      | ModuleList           | 0     
2  | network              | FullyConnectedModule | 302   
3  | network.sequential   | Sequential           | 302   
4  | network.sequential.0 | Linear               | 60    
5  | network.sequential.1 | ReLU                 | 0     
6  | network.sequential.2 | Linear               | 110   
7  | network.sequential.3 | ReLU                 | 0     
8  | network.sequential.4 | Linear               | 110   
9  | network.sequential.5 | ReLU                 | 0     
10 | network.sequential.6 | Linear               | 22    


"hidden_size":                10
"input_size":                 5
"learning_rate":              0.001
"log_gradient_flow":          False
"log_interval":               -1
"log_val_interval":           -1
"logging_metrics":            ModuleList()
"loss":                       SMAPE()
"monotone_constaints":        {}
"n_hidden_layers":            2
"optimizer":                  ranger
"output_size":                2
"output_transformer":         GroupNormalizer()
"reduce_on_plateau_min_lr":   1e-05
"reduce_on_plateau_patience": 1000
"weight_decay":               0.0

### Defining additional hyperparameters

In [13]:
model.hparams

"hidden_size":                10
"input_size":                 5
"learning_rate":              0.001
"log_gradient_flow":          False
"log_interval":               -1
"log_val_interval":           -1
"logging_metrics":            ModuleList()
"loss":                       SMAPE()
"monotone_constaints":        {}
"n_hidden_layers":            2
"optimizer":                  ranger
"output_size":                2
"output_transformer":         GroupNormalizer()
"reduce_on_plateau_min_lr":   1e-05
"reduce_on_plateau_patience": 1000
"weight_decay":               0.0

In [14]:
print(BaseModel.__init__.__doc__)


        BaseModel for timeseries forecasting from which to inherit from

        Args:
            log_interval (Union[int, float], optional): Batches after which predictions are logged. If < 1.0, will log
                multiple entries per batch. Defaults to -1.
            log_val_interval (Union[int, float], optional): batches after which predictions for validation are
                logged. Defaults to None/log_interval.
            learning_rate (float, optional): Learning rate. Defaults to 1e-3.
            log_gradient_flow (bool): If to log gradient flow, this takes time and should be only done to diagnose
                training failures. Defaults to False.
            loss (Metric, optional): metric to optimize. Defaults to SMAPE().
            logging_metrics (nn.ModuleList[MultiHorizonMetric]): list of metrics that are logged during training.
                Defaults to [].
            reduce_on_plateau_patience (int): patience after which learning rate is reduced by a

## Using covariates

In [15]:
from pytorch_forecasting.models.base_model import BaseModelWithCovariates

print(BaseModelWithCovariates.__doc__)


    Model with additional methods using covariates.

    Assumes the following hyperparameters:

    Args:
        static_categoricals (List[str]): names of static categorical variables
        static_reals (List[str]): names of static continuous variables
        time_varying_categoricals_encoder (List[str]): names of categorical variables for encoder
        time_varying_categoricals_decoder (List[str]): names of categorical variables for decoder
        time_varying_reals_encoder (List[str]): names of continuous variables for encoder
        time_varying_reals_decoder (List[str]): names of continuous variables for decoder
        x_reals (List[str]): order of continuous variables in tensor passed to forward function
        x_categoricals (List[str]): order of categorical variables in tensor passed to forward function
        embedding_sizes (Dict[str, Tuple[int, int]]): dictionary mapping categorical variables to tuple of integers
            where the first integer denotes the nu

In [16]:
from typing import Dict, List, Tuple

from pytorch_forecasting.models.nn import MultiEmbedding


class FullyConnectedModelWithCovariates(BaseModelWithCovariates):
    def __init__(
        self,
        input_size: int,
        output_size: int,
        hidden_size: int,
        n_hidden_layers: int,
        x_reals: List[str],
        x_categoricals: List[str],
        embedding_sizes: Dict[str, Tuple[int, int]],
        embedding_labels: Dict[str, List[str]],
        static_categoricals: List[str],
        static_reals: List[str],
        time_varying_categoricals_encoder: List[str],
        time_varying_categoricals_decoder: List[str],
        time_varying_reals_encoder: List[str],
        time_varying_reals_decoder: List[str],
        embedding_paddings: List[str],
        categorical_groups: Dict[str, List[str]],
        **kwargs,
    ):
        # saves arguments in signature to `.hparams` attribute, mandatory call - do not skip this
        self.save_hyperparameters()
        # pass additional arguments to BaseModel.__init__, mandatory call - do not skip this
        super().__init__(**kwargs)

        # create embedder - can be fed with x["encoder_cat"] or x["decoder_cat"] and will return
        # dictionary of category names mapped to embeddings
        self.input_embeddings = MultiEmbedding(
            embedding_sizes=self.hparams.embedding_sizes,
            categorical_groups=self.hparams.categorical_groups,
            embedding_paddings=self.hparams.embedding_paddings,
            x_categoricals=self.hparams.x_categoricals,
            max_embedding_size=self.hparams.hidden_size,
        )

        # calculate the size of all concatenated embeddings + continuous variables
        n_features = sum(
            embedding_size for classes_size, embedding_size in self.hparams.embedding_sizes.values()
        ) + len(self.reals)

        # create network that will be fed with continuous variables and embeddings
        self.network = FullyConnectedModule(
            input_size=self.hparams.input_size * n_features,
            output_size=self.hparams.output_size,
            hidden_size=self.hparams.hidden_size,
            n_hidden_layers=self.hparams.n_hidden_layers,
        )

    def forward(self, x: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:
        # x is a batch generated based on the TimeSeriesDataset
        batch_size = x["encoder_lengths"].size(0)
        embeddings = self.input_embeddings(x["encoder_cat"])  # returns dictionary with embedding tensors
        network_input = torch.cat(
            [x["encoder_cont"]]
            + [
                emb
                for name, emb in embeddings.items()
                if name in self.encoder_variables or name in self.static_variables
            ],
            dim=-1,
        )
        prediction = self.network(network_input.view(batch_size, -1))

        # We need to return a dictionary that at least contains the prediction and the target_scale.
        # The parameter can be directly forwarded from the input.
        return dict(prediction=prediction, target_scale=x["target_scale"])

    @classmethod
    def from_dataset(cls, dataset: TimeSeriesDataSet, **kwargs):
        new_kwargs = {
            "output_size": dataset.max_prediction_length,
            "input_size": dataset.max_encoder_length,
        }
        new_kwargs.update(kwargs)  # use to pass real hyperparameters and override defaults set by dataset
        # example for dataset validation
        assert dataset.max_prediction_length == dataset.min_prediction_length, "Decoder only supports a fixed length"
        assert dataset.min_encoder_length == dataset.max_encoder_length, "Encoder only supports a fixed length"

        return super().from_dataset(dataset, **new_kwargs)

Note that the model does not make use of the known covariates in the decoder. This is obviously suboptimal but beyond the scope of this tutorial. Anyway, let us create a new dataset with categorical variables and see how the model can be instantiated from it.

In [17]:
import numpy as np
import pandas as pd

from pytorch_forecasting import TimeSeriesDataSet

test_data_with_covariates = pd.DataFrame(
    dict(
        # as before
        value=np.random.rand(30),
        group=np.repeat(np.arange(3), 10),
        time_idx=np.tile(np.arange(10), 3),
        # now adding covariates
        categorical_covariate=np.random.choice(["a", "b"], size=30),
        real_covariate=np.random.rand(30),
    )
).astype(
    dict(group=str)
)  # categorical covariates have to be of string type
test_data_with_covariates

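| | value | group | time_idx | categorical_covariate | real_covariate |
|---|---|---|---|---|---|
| 0 | 0.007754 | 0 | 0 | a | 0.193218 |
| 1 | 0.941164 | 0 | 1 | a | 0.572405 |
| 2 | 0.308594 | 0 | 2 | a | 0.728801 |
| 3 | 0.887074 | 0 | 3 | a | 0.023298 |
| 4 | 0.341287 | 0 | 4 | b | 0.148565 |
| 5 | 0.526935 | 0 | 5 | a | 0.563323 |
| 6 | 0.870729 | 0 | 6 | a | 0.255756 |
| 7 | 0.846605 | 0 | 7 | b | 0.16057 |
| 8 | 0.151567 | 0 | 8 | a | 0.597226 |
| 9 | 0.816219 | 0 | 9 | a | 0.411941 |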


In [18]:
# create the dataset from the pandas dataframe
dataset_with_covariates = TimeSeriesDataSet(
    test_data_with_covariates,
    group_ids=["group"],
    target="value",
    time_idx="time_idx",
    min_encoder_length=5,
    max_encoder_length=5,
    min_prediction_length=2,
    max_prediction_length=2,
    time_varying_unknown_reals=["value"],
    time_varying_known_reals=["real_covariate"],
    time_varying_known_categoricals=["categorical_covariate"],
    static_categoricals=["group"],
)

model = FullyConnectedModelWithCovariates.from_dataset(dataset_with_covariates, hidden_size=10, n_hidden_layers=2)
model.summarize("full")  # print model summary
model.hparams


   | Name                                              | Type                 | Params
--------------------------------------------------------------------------------------------
0  | loss                                              | SMAPE                | 0     
1  | logging_metrics                                   | ModuleList           | 0     
2  | input_embeddings                                  | MultiEmbedding       | 11    
3  | input_embeddings.embeddings                       | ModuleDict           | 11    
4  | input_embeddings.embeddings.group                 | Embedding            | 9     
5  | input_embeddings.embeddings.categorical_covariate | Embedding            | 2     
6  | network                                           | FullyConnectedModule | 552   
7  | network.sequential                                | Sequential           | 552   
8  | network.sequential.0                              | Linear               | 310   
9  | network.sequential.1           

"categorical_groups":                {}
"embedding_labels":                  {'group': {'0': 0, '1': 1, '2': 2}, 'categorical_covariate': {'a': 0, 'b': 1}}
"embedding_paddings":                []
"embedding_sizes":                   {'group': [3, 3], 'categorical_covariate': [2, 1]}
"hidden_size":                       10
"input_size":                        5
"learning_rate":                     0.001
"log_gradient_flow":                 False
"log_interval":                      -1
"log_val_interval":                  -1
"logging_metrics":                   ModuleList()
"loss":                              SMAPE()
"monotone_constaints":               {}
"n_hidden_layers":                   2
"optimizer":                         ranger
"output_size":                       2
"output_transformer":                GroupNormalizer(transformation='relu')
"reduce_on_plateau_min_lr":          1e-05
"reduce_on_plateau_patience":        1000
"static_categoricals":               ['group']
"stati

To test that the model could be trained, pass a sample batch.

In [19]:
x, y = next(iter(dataset_with_covariates.to_dataloader(batch_size=4)))  # generate batch
model(x)  # pass batch through model

{'prediction': tensor([[-0.2650, -0.0057],
         [-0.2587, -0.0262],
         [-0.2486, -0.0395],
         [-0.2714, -0.0140]], grad_fn=<AddmmBackward>),
 'target_scale': tensor([[0.4589, 0.2995],
         [0.4589, 0.2995],
         [0.4589, 0.2995],
         [0.4589, 0.2995]])}
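The figures in the model summary above can be reproduced by hand from the `embedding_sizes` hyperparameter. The sketch below assumes that the model sees two continuous variables, `value` and `real_covariate`; that assumption is what makes the first linear layer come out at the 310 parameters reported in the summary.

```python
# Values copied from the hparams printout: name -> (number of classes, embedding dimension).
embedding_sizes = {"group": (3, 3), "categorical_covariate": (2, 1)}
# Assumption: these are the continuous variables passed to the network.
reals = ["value", "real_covariate"]
input_size, hidden_size, output_size, n_hidden_layers = 5, 10, 2, 2

# One embedding table holds n_classes * embedding_dim parameters.
embedding_params = sum(n_classes * dim for n_classes, dim in embedding_sizes.values())

# Features per time step: all embedding dimensions plus all continuous variables.
n_features = sum(dim for _, dim in embedding_sizes.values()) + len(reals)

# The encoder window is flattened, so the first layer sees input_size * n_features inputs.
first_layer_params = (input_size * n_features) * hidden_size + hidden_size
hidden_params = n_hidden_layers * (hidden_size * hidden_size + hidden_size)
output_params = hidden_size * output_size + output_size
network_params = first_layer_params + hidden_params + output_params

print(embedding_params, n_features, first_layer_params, network_params)  # 11 6 310 552
```

This also shows why `forward` reshapes the concatenated input with `view(batch_size, -1)`: the network expects one flat vector of `input_size * n_features` values per sample.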

## Implementing an autoregressive / recurrent model

In [20]:
from torch.nn.utils import rnn

from pytorch_forecasting.models.base_model import AutoRegressiveBaseModel


class LSTMModel(AutoRegressiveBaseModel):
    def __init__(self, target: str, n_layers: int, hidden_size: int, dropout: float = 0.1, **kwargs):
        # saves arguments in signature to `.hparams` attribute, mandatory call - do not skip this
        self.save_hyperparameters()
        # pass additional arguments to BaseModel.__init__, mandatory call - do not skip this
        super().__init__(**kwargs)

        # use pytorch implementation of LSTM
        self.lstm = nn.LSTM(
            hidden_size=self.hparams.hidden_size,
            input_size=1,
            num_layers=self.hparams.n_layers,
            dropout=self.hparams.dropout,
            batch_first=True,
        )
        self.output_layer = nn.Linear(self.hparams.hidden_size, 1)

    @property
    def target_position(self):
        # position of target within reals vector: with covariates: self.hparams.x_reals.index(self.hparams.target)
        return 0

    def encode(self, x: Dict[str, torch.Tensor]):
        # we need at least one usable encoding step because the target is lagged by one time step
        # and the first time step is dropped afterwards. As we are lazy, we require that the effective
        # encoder length (after dropping that step) is at least 1, so we can easily generate a
        # hidden state here. See the DeepAR implementation for how to use a minimal encoder length of 1
        assert x["encoder_lengths"].min() >= 2
        input_vector = x["encoder_cont"].clone()
        # lag target by one
        input_vector[..., self.target_position] = torch.roll(input_vector[..., self.target_position], shifts=1, dims=1)
        input_vector = input_vector[:, 1:]  # first time step cannot be used because of lagging

        # determine effective encoder_length length
        effective_encoder_lengths = x["encoder_lengths"] - 1
        # run through LSTM network
        _, hidden_state = self.lstm(
            rnn.pack_padded_sequence(
                input_vector, effective_encoder_lengths.cpu(), enforce_sorted=False, batch_first=True
            )
        )  # only the second output (the hidden state) is needed; the first output is discarded
        return hidden_state

    def decode(self, x: Dict[str, torch.Tensor], hidden_state):
        # again lag target by one
        input_vector = x["decoder_cont"].clone()
        input_vector[..., self.target_position] = torch.roll(input_vector[..., self.target_position], shifts=1, dims=1)
        # but this time fill in missing target from encoder_cont at the first time step instead of throwing it away
        last_encoder_target = x["encoder_cont"][
            torch.arange(x["encoder_cont"].size(0)), x["encoder_lengths"] - 1, self.target_position
        ]
        input_vector[:, 0, self.target_position] = last_encoder_target

        if self.training:  # training attribute is provided from PyTorch and indicates if module is in training mode
            packed_decoder = rnn.pack_padded_sequence(
                input_vector, lengths=x["decoder_lengths"].cpu(), batch_first=True, enforce_sorted=False
            )
            # run through same lstm
            lstm_output, _ = self.lstm(packed_decoder, hidden_state)
            # unpack sequence
            lstm_output, _ = rnn.pad_packed_sequence(lstm_output, batch_first=True)
            # transform into right shape
            prediction = self.output_layer(lstm_output)

        else:  # if not training, need to predict in autoregressive manner
            # predict one by one
            max_decoder_length = x["decoder_lengths"].max()
            # initialize previous target and hidden state
            last_target = last_encoder_target
            last_hidden_state = hidden_state

            predictions = []

            # for each time step run prediction
            for i in range(max_decoder_length):
                current_input_vector = input_vector[:, i].unsqueeze(1)  # select time step in decoder
                current_input_vector[:, 0, self.target_position] = last_target  # insert previous target

                # make lstm prediction
                lstm_prediction, new_hidden_state = self.lstm(current_input_vector, last_hidden_state)
                prediction = self.output_layer(lstm_prediction).squeeze(1)

                # save prediction
                predictions.append(prediction)

                # prepare for next time step
                last_hidden_state = new_hidden_state

                # Prediction should be passed through transformer and then inversely transformed.
                # The inverse transformation might be only approximately the inverse of the
                # forward transformation making this step important.
                rescaled_prediction = self.transform_output(
                    dict(prediction=prediction, target_scale=x["target_scale"])
                )  # inverse transform
                normalized_prediction = self.output_transformer.transform(
                    rescaled_prediction, target_scale=x["target_scale"]
                )  # transform

                last_target = normalized_prediction.squeeze(1)

            # stack all predictions
            prediction = torch.stack(predictions, dim=1)

        return prediction

    def forward(self, x: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:
        hidden_state = self.encode(x)  # encode to hidden state
        prediction = self.decode(x, hidden_state)  # decode leveraging hidden state
        return dict(prediction=prediction, target_scale=x["target_scale"])


model = LSTMModel.from_dataset(dataset, n_layers=2, hidden_size=10)
model.summarize("full")
model.hparams


  | Name            | Type       | Params
-----------------------------------------------
0 | loss            | SMAPE      | 0     
1 | logging_metrics | ModuleList | 0     
2 | lstm            | LSTM       | 1 K   
3 | output_layer    | Linear     | 11    


"dropout":                    0.1
"hidden_size":                10
"learning_rate":              0.001
"log_gradient_flow":          False
"log_interval":               -1
"log_val_interval":           -1
"logging_metrics":            ModuleList()
"loss":                       SMAPE()
"monotone_constaints":        {}
"n_layers":                   2
"optimizer":                  ranger
"output_transformer":         GroupNormalizer()
"reduce_on_plateau_min_lr":   1e-05
"reduce_on_plateau_patience": 1000
"target":                     value
"weight_decay":               0.0

In [21]:
x, y = next(iter(dataloader))

print(
    "prediction shape in training:", model(x)["prediction"].size()
)  # batch_size x decoder time steps x 1 (1 for one target dimension)
model.eval()  # set model into eval mode to use autoregressive prediction
print("prediction shape in inference:", model(x)["prediction"].size())  # should be the same as in training

prediction shape in training: torch.Size([4, 2, 1])
prediction shape in inference: torch.Size([4, 2, 1])
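The lagging step in `encode` and `decode` relies on `torch.roll`, which shifts values along the time axis and wraps the last value around to the front. A minimal sketch on a toy tensor (one series with four time steps, values chosen for illustration) shows why the first time step has to be dropped or overwritten afterwards:

```python
import torch

# Toy target series: batch of 1, four time steps.
target = torch.tensor([[1.0, 2.0, 3.0, 4.0]])

# Shift by one step along the time axis; the last value wraps around to position 0.
rolled = torch.roll(target, shifts=1, dims=1)
print(rolled)  # tensor([[4., 1., 2., 3.]])

# The wrapped-around first step is not a valid lag, so the encoder discards it.
lagged = rolled[:, 1:]
print(lagged)  # tensor([[1., 2., 3.]])
```

In `decode`, the same invalid first position is not discarded but overwritten with the last target value seen by the encoder.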


## Using and defining a custom/non-trivial metric

To use a different metric, simply pass it to the model when initializing it (preferably via the `from_dataset()` method). For example, to use mean absolute error with our `FullyConnectedModel` from the beginning of this tutorial, type

In [22]:
from pytorch_forecasting.metrics import MAE

model = FullyConnectedModel.from_dataset(dataset, hidden_size=10, n_hidden_layers=2, loss=MAE())
model.hparams

"hidden_size":                10
"input_size":                 5
"learning_rate":              0.001
"log_gradient_flow":          False
"log_interval":               -1
"log_val_interval":           -1
"logging_metrics":            ModuleList()
"loss":                       MAE()
"monotone_constaints":        {}
"n_hidden_layers":            2
"optimizer":                  ranger
"output_size":                2
"output_transformer":         GroupNormalizer()
"reduce_on_plateau_min_lr":   1e-05
"reduce_on_plateau_patience": 1000
"weight_decay":               0.0

Note that some metrics might require a certain form of model prediction, e.g. quantile prediction assumes an output of shape `batch_size x n_decoder_timesteps x n_quantiles` instead of `batch_size x n_decoder_timesteps`. For the `FullyConnectedModel`, this means that we need to use a modified `FullyConnectedModule` network. Here `n_outputs` corresponds to the number of quantiles.

In [23]:
import torch
from torch import nn


class FullyConnectedMultiOutputModule(nn.Module):
    def __init__(self, input_size: int, output_size: int, hidden_size: int, n_hidden_layers: int, n_outputs: int):
        super().__init__()

        # input layer
        module_list = [nn.Linear(input_size, hidden_size), nn.ReLU()]
        # hidden layers
        for _ in range(n_hidden_layers):
            module_list.extend([nn.Linear(hidden_size, hidden_size), nn.ReLU()])
        # output layer
        self.n_outputs = n_outputs
        module_list.append(
            nn.Linear(hidden_size, output_size * n_outputs)
        )  # <<<<<<<< modified: replaced output_size with output_size * n_outputs

        self.sequential = nn.Sequential(*module_list)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x of shape: batch_size x n_timesteps_in
        # output of shape batch_size x n_timesteps_out x n_outputs
        return self.sequential(x).reshape(x.size(0), -1, self.n_outputs)  # <<<<<<<< modified: added reshape


# test that network works as intended
network = FullyConnectedMultiOutputModule(input_size=5, output_size=2, hidden_size=10, n_hidden_layers=2, n_outputs=7)
x = torch.rand(20, 5)
network(x).shape  # <<<<<<<<<< instead of shape (20, 2), returning additional dimension for quantiles

torch.Size([20, 2, 7])
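
To see why the trailing dimension matters, here is a minimal sketch of a quantile (pinball) loss in plain PyTorch that consumes an output of shape `batch_size x n_decoder_timesteps x n_quantiles`. It is an illustration under stated assumptions, not the library's own quantile loss; the helper name `pinball_loss` and the chosen quantiles are hypothetical.

```python
import torch


def pinball_loss(y_pred: torch.Tensor, target: torch.Tensor, quantiles: list) -> torch.Tensor:
    # hypothetical helper for illustration, not part of PyTorch Forecasting
    # y_pred: batch_size x n_decoder_timesteps x n_quantiles
    # target: batch_size x n_decoder_timesteps
    q = torch.tensor(quantiles, dtype=y_pred.dtype)
    errors = target.unsqueeze(-1) - y_pred  # broadcast target over the quantile dimension
    # under-prediction is weighted by q, over-prediction by (1 - q)
    losses = torch.max(q * errors, (q - 1) * errors)
    return losses.mean()


quantiles = [0.1, 0.5, 0.9]  # assumed quantiles, one per output channel
y_pred = torch.rand(20, 2, len(quantiles))
target = torch.rand(20, 2)
loss_value = pinball_loss(y_pred, target, quantiles)
```

Each slice `y_pred[..., i]` is scored against the same target with its own asymmetric weight, which is why a single value per time step would not be enough.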

### Simple case: model output can be readily converted to prediction

In [24]:
from pytorch_forecasting.metrics import MultiHorizonMetric


class MAE(MultiHorizonMetric):
    def loss(self, y_pred, target):
        loss = (self.to_prediction(y_pred) - target).abs()
        return loss

### Advanced case: model output cannot be readily converted to prediction

In [52]:
from copy import copy

from pytorch_forecasting.metrics import NormalDistributionLoss


class FullyConnectedForDistributionLossModel(BaseModel):  # we inherit the `from_dataset` method
    def __init__(self, input_size: int, output_size: int, hidden_size: int, n_hidden_layers: int, **kwargs):
        # saves arguments in signature to `.hparams` attribute, mandatory call - do not skip this
        self.save_hyperparameters()
        # pass additional arguments to BaseModel.__init__, mandatory call - do not skip this
        super().__init__(**kwargs)
        self.network = FullyConnectedMultiOutputModule(
            input_size=self.hparams.input_size,
            output_size=self.hparams.output_size,
            hidden_size=self.hparams.hidden_size,
            n_hidden_layers=self.hparams.n_hidden_layers,
            n_outputs=2,  # <<<<<<<< we predict two outputs for mean and scale of the normal distribution
        )
        self.loss = NormalDistributionLoss()

    @classmethod
    def from_dataset(cls, dataset: TimeSeriesDataSet, **kwargs):
        new_kwargs = {
            "output_size": dataset.max_prediction_length,
            "input_size": dataset.max_encoder_length,
        }
        new_kwargs.update(kwargs)  # use to pass real hyperparameters and override defaults set by dataset
        # example for dataset validation
        assert dataset.max_prediction_length == dataset.min_prediction_length, "Decoder only supports a fixed length"
        assert dataset.min_encoder_length == dataset.max_encoder_length, "Encoder only supports a fixed length"
        assert (
            len(dataset.time_varying_known_categoricals) == 0
            and len(dataset.time_varying_known_reals) == 0
            and len(dataset.time_varying_unknown_categoricals) == 0
            and len(dataset.static_categoricals) == 0
            and len(dataset.static_reals) == 0
            and len(dataset.time_varying_unknown_reals) == 1
            and dataset.time_varying_unknown_reals[0] == dataset.target
        ), "Only covariate should be the target in 'time_varying_unknown_reals'"

        return super().from_dataset(dataset, **new_kwargs)

    def forward(self, x: Dict[str, torch.Tensor], n_samples: int = None) -> Dict[str, torch.Tensor]:
        # x is a batch generated based on the TimeSeriesDataSet
        network_input = x["encoder_cont"].squeeze(-1)
        prediction = self.network(network_input)  # shape batch_size x n_decoder_steps x 2
        if (
            self.training or n_samples is None
        ):  # training is a PyTorch variable indicating if a module is being trained (tracing gradients) or evaluated
            assert n_samples is None, "We need to predict parameters when training"
            prediction_type = "parameters"
        else:
            # let's sample from our distribution - first we need to scale the parameters to real space
            scaled_parameters = self.transform_output(
                dict(
                    prediction=prediction,
                    target_scale=x["target_scale"],
                    prediction_type="parameters",
                )
            )
            # and then sample from distribution
            prediction = self.loss.sample(scaled_parameters, n_samples)
            prediction_type = "samples"
        return dict(prediction=prediction, target_scale=x["target_scale"], prediction_type=prediction_type)

    def transform_output(self, out: Dict[str, torch.Tensor]) -> torch.Tensor:
        # input is forward's output
        # depending on output, transform differently
        if out["prediction_type"] == "samples":  # samples are already rescaled
            out = out["prediction"]
        else:  # parameters need to be rescaled
            out = self.loss.rescale_parameters(
                out["prediction"], target_scale=out["target_scale"], encoder=self.output_transformer
            )
        return out

    def log_prediction(self, x, out, batch_idx) -> None:
        if (
            out["prediction_type"] == "parameters"
            and (batch_idx % self.log_interval == 0 or self.log_interval < 1.0)
            and self.log_interval > 0
        ):
            out = copy(out)  # copy to avoid side-effects but do not deep copy to re-use references
            # sample from distribution to create valid prediction
            y_hat_detached = out["prediction"].detach()
            y_hat_samples = self.loss.sample(y_hat_detached, 100)
            out["prediction"] = y_hat_samples
            out["prediction_type"] = "samples"
        super().log_prediction(x, out, batch_idx=batch_idx)

    def log_metrics(
        self,
        x: Dict[str, torch.Tensor],
        y: torch.Tensor,
        out: Dict[str, torch.Tensor],
    ) -> None:
        # Metrics (in contrast to the training loss, which is a distribution loss) are calculated based on point predictions.
        # Therefore, we need to convert parameter outputs to point predictions.
        if out["prediction_type"] == "parameters":
            # use distribution properties to create point prediction
            out = copy(out)  # copy to avoid side-effects but do not deep copy to re-use references
            y_hat_detached = out["prediction"].detach()
            y_hat_point_detached = self.loss.map_x_to_distribution(y_hat_detached).mean.unsqueeze(-1)
            out["prediction"] = y_hat_point_detached
            out["prediction_type"] = "samples"
        super().log_metrics(x, y, out)


model = FullyConnectedForDistributionLossModel.from_dataset(dataset, hidden_size=10, n_hidden_layers=2)
model.hparams

"hidden_size":                10
"input_size":                 5
"learning_rate":              0.001
"log_gradient_flow":          False
"log_interval":               -1
"log_val_interval":           -1
"logging_metrics":            ModuleList()
"loss":                       SMAPE()
"monotone_constaints":        {}
"n_hidden_layers":            2
"optimizer":                  ranger
"output_size":                2
"output_transformer":         GroupNormalizer()
"reduce_on_plateau_min_lr":   1e-05
"reduce_on_plateau_patience": 1000
"weight_decay":               0.0

In [44]:
x, y = next(iter(dataloader))

print("parameter prediction shape: ", model(x)["prediction"].size())
model.eval()  # set model into eval mode for sampling
print("sample prediction shape: ", model(x, n_samples=200)["prediction"].size())

parameter prediction shape:  torch.Size([4, 2, 2])
sample prediction shape:  torch.Size([4, 2, 200])
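
The change from a trailing dimension of 2 (distribution parameters) to 200 (samples) can be reproduced in plain PyTorch. The following is a minimal sketch using `torch.distributions.Normal`; it is not the internal implementation of `NormalDistributionLoss`, and passing the raw scale through a softplus is an assumption made here only to keep the scale positive.

```python
import torch
from torch.distributions import Normal

n_samples = 200
# stand-in for the network output: batch_size x n_decoder_steps x (loc, raw scale)
parameters = torch.rand(4, 2, 2)

loc = parameters[..., 0]
# assumption: softplus keeps the scale strictly positive
scale = torch.nn.functional.softplus(parameters[..., 1])
distribution = Normal(loc=loc, scale=scale)

# sample() prepends the sample dimension: n_samples x batch_size x n_decoder_steps
samples = distribution.sample((n_samples,))
# move the sample dimension to the end: batch_size x n_decoder_steps x n_samples
samples = samples.permute(1, 2, 0)
```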


In [45]:
model.predict(dataloader, n_samples=100, mode="quantiles").shape

torch.Size([12, 2, 7])

The returned quantiles are determined by the quantiles defined in the loss function and can be modified by passing a list of quantiles at initialization.

In [46]:
model.loss.quantiles

[0.02, 0.1, 0.25, 0.5, 0.75, 0.9, 0.98]

In [47]:
NormalDistributionLoss(quantiles=[0.2, 0.8]).quantiles

[0.2, 0.8]
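
One way to turn samples into a quantile prediction of shape `batch_size x n_decoder_steps x n_quantiles` is to take empirical quantiles over the sample dimension. The sketch below uses `torch.quantile` on random stand-in samples; whether the library computes its quantiles in exactly this way is an assumption.

```python
import torch

quantiles = [0.02, 0.1, 0.25, 0.5, 0.75, 0.9, 0.98]
# stand-in for sampled predictions: batch_size x n_decoder_steps x n_samples
samples = torch.randn(12, 2, 100)

q = torch.tensor(quantiles, dtype=samples.dtype)
# torch.quantile places the quantile dimension first: n_quantiles x batch_size x n_decoder_steps
quantile_prediction = torch.quantile(samples, q, dim=-1)
# reorder to batch_size x n_decoder_steps x n_quantiles
quantile_prediction = quantile_prediction.permute(1, 2, 0)
```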

## Adding custom plotting and interpretation

PyTorch Forecasting supports plotting of predictions and interpretations. The figures can also be logged to tensorboard to monitor training progress. Sometimes, the output of the network cannot be plotted directly alongside the observed time series. In these cases (such as our `FullyConnectedForDistributionLossModel` from the previous section), we need to adapt the plotting function. Further, sometimes we want to visualize certain properties of the network every other batch or after every epoch. It is easy to make this happen with PyTorch Forecasting.

## Minimal testing of models

Testing models is essential to detect problems early and iterate quickly. Some issues can only be identified after lengthy training, but many show up after one or two batches. PyTorch Lightning, on which PyTorch Forecasting is built, makes it easy to set up such tests.

In [53]:
from pytorch_lightning import Trainer

model = FullyConnectedForDistributionLossModel.from_dataset(dataset, hidden_size=10, n_hidden_layers=2, log_interval=1)
trainer = Trainer(fast_dev_run=True)
trainer.fit(model, train_dataloader=dataloader, val_dataloaders=dataloader)

GPU available: False, used: False
TPU available: False, using: 0 TPU cores
Running in fast_dev_run mode: will run a full train, val and test loop using a single batch

  | Name            | Type                            | Params
--------------------------------------------------------------------
0 | loss            | NormalDistributionLoss          | 0     
1 | logging_metrics | ModuleList                      | 0     
2 | network         | FullyConnectedMultiOutputModule | 324   


Epoch 0:   0%|          | 0/2 [28:47<?, ?it/s] 
Epoch 0:  50%|█████     | 1/2 [00:00<00:00,  6.90it/s, loss=-0.155, v_num=9, train_loss_step=-.155]
Validating: 0it [00:00, ?it/s]
Epoch 0: 100%|██████████| 2/2 [00:00<00:00,  6.68it/s, loss=-0.155, v_num=9, train_loss_step=-.155, val_loss=0.136]
Epoch 0: 100%|██████████| 2/2 [00:00<00:00,  6.53it/s, loss=-0.155, v_num=9, train_loss_step=-.155, val_loss=0.136]


1
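
The same single-batch idea can be sketched without Lightning. The following plain-PyTorch smoke test runs one forward and backward pass on random data and checks that the loss is finite and that every parameter receives a gradient. The small network, the L1 loss and the variable names are illustrative assumptions, not part of the models defined above.

```python
import torch
from torch import nn

torch.manual_seed(0)

# small stand-in network: 5 encoder steps in, 2 decoder steps out
network = nn.Sequential(nn.Linear(5, 10), nn.ReLU(), nn.Linear(10, 2))
optimizer = torch.optim.SGD(network.parameters(), lr=0.01)

# a single random batch of 4 series
x = torch.rand(4, 5)
y = torch.rand(4, 2)

loss = nn.functional.l1_loss(network(x), y)
optimizer.zero_grad()
loss.backward()

# checks that typically fail fast when shapes or the computation graph are broken
loss_is_finite = bool(torch.isfinite(loss))
all_have_grads = all(p.grad is not None for p in network.parameters())

optimizer.step()
```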