# Forecasting with Deep Learning models using GluonTS

GitHub repo: https://github.com/awslabs/gluon-ts

Documentation: https://ts.gluon.ai

In [None]:
!pip install -U "gluonts[torch]~=0.13.2" matplotlib orjson tensorboard optuna datasets

GluonTS is a Python library for deep learning based forecasting models. It provides:

1. Model implementations (initially in MXNet, now moving to PyTorch)
    
    - DeepAR (RNN, sampling based)
    
    - MQ-CNN (CNN encoder + MLP decoder, quantile regression based)
    
    - WaveNet (data quantization + dilated convolutions, sampling based)
    
    - Transformer-based architectures (vanilla encoder/decoder transformer, TFT, PatchTST)


1. Tools to construct data pipelines for the models
    
    - Missing value imputation and masking

    - Adding calendar features

    - Sampling and batching training instances

    - Different forecast types (e.g. samples vs quantiles)


1. Evaluation utils
    
    - Splitting data for training/validation/test
    
    - Evaluating common metrics


1. Datasets for experiments


1. Model "infrastructure"
    
    - Serialization/deserialization of full model pipeline
    
    - Docker container to train/deploy model in the cloud (e.g. Amazon SageMaker)

<img src="https://github.com/lostella/isf-deep-learning-workshop/blob/main/notebooks/figures/flow.png?raw=true" alt="flow" width="50%"/>

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from itertools import islice
from pathlib import Path
from pprint import pprint
from tqdm import tqdm

In [None]:
from gluonts.dataset.pandas import PandasDataset

## M5 dataset

This case study uses the [M5 forecasting competition dataset](https://www.kaggle.com/competitions/m5-forecasting-accuracy). The M-competitions are named after Spyros Makridakis and are currently in their [6th edition](https://mofc.unic.ac.cy/the-m6-competition/). The M5 data was provided by Walmart. We assume the dataset has been downloaded locally (we can't redistribute it because of Kaggle's licensing terms).


* 42,840 hierarchical time series, 3,049 products from 3 categories and 7 departments

* 3 US states: California (CA), Texas (TX), and Wisconsin (WI), 10 stores

* “Hierarchical” levels: item level, department level, product category level, and state level.

* Daily sales: Jan 2011 to June 2016.

* Included covariates: prices, promotions, and holidays.

* No missing values

### Loading the data

We mostly use standard pandas operations to load and reshape the data.

Goal: bring the data into a format that GluonTS models can consume, for training or for prediction. That is, a collection of `dict` objects, each with
* a `start` attribute for the timestamp of the first observation (`pd.Period`)
* a `target` attribute for the sequence we want to model (`np.ndarray`)
* other attributes for features (in this example, `feat_dynamic_real` and `feat_static_cat`)
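
To make the target format concrete, here is a minimal sketch of a single entry built by hand. All values are made up for illustration, and the layout of `feat_dynamic_real` (one row per feature, one column per time step) is stated as an assumption about the convention rather than taken from this notebook.

```python
import numpy as np
import pandas as pd

# One dataset entry: a plain dict describing a single time series (toy values).
entry = {
    "start": pd.Period("2011-01-29", freq="D"),  # timestamp of the first observation
    "target": np.array([3.0, 0.0, 1.0, 2.0, 5.0], dtype=np.float32),  # values to model
    # Assumed layout: one row per feature, one column per time step.
    "feat_dynamic_real": np.array(
        [[2.5, 2.5, 2.5, 3.0, 3.0],   # e.g. a price
         [1.0, 1.0, 1.0, 1.0, 1.0]],  # e.g. a for-sale flag
        dtype=np.float32,
    ),
    "feat_static_cat": np.array([0, 2, 1, 0]),  # integer-encoded categories
    "item_id": "toy_item",  # hypothetical identifier
}

# The time index is implied by `start`, the frequency, and the length of `target`.
index = pd.period_range(entry["start"], periods=len(entry["target"]), freq="D")
series = pd.Series(entry["target"], index=index)
print(series)
```

Note that no explicit time index is stored: only the start timestamp and the values are needed, which keeps entries compact.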

In [None]:
m5_files_path = Path("m5-forecasting-accuracy")

In [None]:
cal = pd.read_csv(m5_files_path / "calendar.csv")
weekly_prices = pd.read_csv(m5_files_path / "sell_prices.csv")
sales_and_features = pd.read_csv(m5_files_path / "sales_train_validation.csv")

In [None]:
assert len(sales_and_features["item_id"].unique()) == 3049
assert len(sales_and_features["store_id"].unique()) == 10
assert len(sales_and_features) == 30490

In [None]:
sales_and_features

Let's take a subset of this to make things a bit faster:

In [None]:
sales_and_features = sales_and_features[sales_and_features.dept_id == "FOODS_3"]

We want to split the data into static (categorical features) vs dynamic (sales data). We keep the 'id' column in both, to be able to join the two. We also keep 'item_id' and 'store_id' in the sales data, to be able to join with prices later.

In [None]:
features_columns = ["id", "dept_id", "cat_id", "store_id", "state_id"]
sales_columns = ["id", "item_id", "store_id"] + [f"d_{k}" for k in range(1, 1914)]

Now we apply the split, casting the static features to the `category` dtype.

In [None]:
features = sales_and_features[features_columns].set_index("id").astype("category")
sales = sales_and_features[sales_columns]

Turn sales data into long format, to join with prices more easily.

In [None]:
sales_long = sales.melt(id_vars=["id", "item_id", "store_id"], var_name="d", value_name="sales")
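
The effect of `melt` is easiest to see on a tiny table. The sketch below uses a made-up two-series frame whose column names mimic the M5 layout.

```python
import pandas as pd

# Toy wide table: one row per series, one column per day.
wide = pd.DataFrame({
    "id": ["A_CA_1", "B_CA_1"],
    "item_id": ["A", "B"],
    "store_id": ["CA_1", "CA_1"],
    "d_1": [3, 0],
    "d_2": [1, 2],
})

# melt keeps the id columns and stacks the day columns into (d, sales) pairs,
# producing one row per (series, day) combination.
long = wide.melt(id_vars=["id", "item_id", "store_id"], var_name="d", value_name="sales")
print(long)
```

Two series with two days each become four rows, with the day label moved into the `d` column.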

In [None]:
weekly_prices

To join sales data with prices, first we add the `"wm_yr_wk"` column from `cal`. We also add the `"date"` column to build the time index. Then we join with `weekly_prices` on `"store_id"`, `"item_id"`, `"wm_yr_wk"`, to get the `"sell_price"` column in.

In [None]:
temp = sales_long.merge(
    cal[["d", "wm_yr_wk", "date"]], on="d", how="left", suffixes=(None, "_right")
)

In [None]:
sales_with_prices = temp.merge(weekly_prices, on=["store_id", "item_id", "wm_yr_wk"], how="left", suffixes=(None, "_right"))
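
The two-step join can be sketched on toy data as follows. The frames and values are invented for illustration; only the column names follow the M5 files. Because both merges use `how="left"`, every sales row is kept, and rows without a matching price end up with `NaN` in `sell_price`.

```python
import pandas as pd

sales_long = pd.DataFrame({
    "item_id": ["A", "A", "A"],
    "store_id": ["CA_1"] * 3,
    "d": ["d_1", "d_2", "d_8"],
    "sales": [3, 1, 4],
})
cal = pd.DataFrame({
    "d": ["d_1", "d_2", "d_8"],
    "wm_yr_wk": [11101, 11101, 11102],  # toy week identifiers
    "date": ["2011-01-29", "2011-01-30", "2011-02-05"],
})
prices = pd.DataFrame({
    "store_id": ["CA_1"], "item_id": ["A"], "wm_yr_wk": [11102], "sell_price": [2.5],
})

# Step 1: attach the week identifier and the date to each daily sales row.
temp = sales_long.merge(cal, on="d", how="left")
# Step 2: attach the weekly price; weeks without a price entry yield NaN.
joined = temp.merge(prices, on=["store_id", "item_id", "wm_yr_wk"], how="left")
print(joined)
```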

In [None]:
sales_with_prices.index = pd.to_datetime(sales_with_prices["date"])

In [None]:
len(sales_with_prices)

Some rows have a missing price, which means the item was not for sale at that time. Let's replace the price there with a constant, and add a column indicating whether the product was for sale.

In [None]:
sales_with_prices["for_sale"] = sales_with_prices["sell_price"].notna()
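# Assign the result back: calling fillna(inplace=True) on a selected column may not update the frame
sales_with_prices["sell_price"] = sales_with_prices["sell_price"].fillna(0.0)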

We also want to keep our target and feature columns as float32, to be compatible with the model later.

In [None]:
sales_with_prices["sales"] = sales_with_prices["sales"].astype(np.float32)
sales_with_prices["sell_price"] = sales_with_prices["sell_price"].astype(np.float32)
sales_with_prices["for_sale"] = sales_with_prices["for_sale"].astype(np.float32)
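
The order of the steps above matters: the availability flag has to be computed before the missing prices are filled, otherwise the information is lost. A minimal sketch on a made-up frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"sales": [0, 0, 4], "sell_price": [np.nan, np.nan, 2.5]})

# Record availability first, while missing prices are still visible as NaN.
df["for_sale"] = df["sell_price"].notna()
# Then replace the missing prices with a constant.
df["sell_price"] = df["sell_price"].fillna(0.0)

# Cast everything the model will consume to float32 (booleans become 0.0 / 1.0).
df = df.astype({"sales": np.float32, "sell_price": np.float32, "for_sale": np.float32})
print(df.dtypes)
```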

In [None]:
sales_with_prices

We're ready to construct our dataset object.

In [None]:
from gluonts.dataset.pandas import PandasDataset

In [None]:
dataset = PandasDataset.from_long_dataframe(
    sales_with_prices,
    item_id="id",
    target="sales",
    feat_dynamic_real=["sell_price", "for_sale"],
    static_features=features,
)

In [None]:
len(dataset)

In [None]:
dataset

In [None]:
for entry in dataset:
    pprint(entry)
    break

Let's store some metadata and turn the dataset into a list: this will be faster to iterate compared to `PandasDataset` (good for model training and evaluation).

In [None]:
num_feat_dynamic_real = dataset.num_feat_dynamic_real
static_cardinalities = dataset.static_cardinalities.tolist()
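
As a rough picture of what the static metadata describes: each categorical column is represented by integer codes, and its cardinality is the number of distinct categories. The sketch below computes both with plain pandas on a toy frame; it is an illustration of the concept, not necessarily how `PandasDataset` does it internally.

```python
import pandas as pd

# Toy static features, indexed by series id and stored as pandas categoricals.
features = pd.DataFrame(
    {"dept_id": ["FOODS_3", "FOODS_3", "FOODS_3"], "store_id": ["CA_1", "TX_1", "CA_1"]},
    index=["a", "b", "c"],
).astype("category")

# Integer code of each category, per column.
codes = features.apply(lambda col: col.cat.codes)
# Number of distinct categories per column.
cardinalities = [len(features[c].cat.categories) for c in features.columns]

print(codes)
print(cardinalities)
```

Models with embedding layers for categorical inputs typically need these cardinalities up front to size the embeddings.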

In [None]:
dataset = list(dataset)

## A transformer model

We will train a transformer-based architecture ([Temporal Fusion Transformer model](https://arxiv.org/abs/1912.09363), TFT) on the above data.

Models in GluonTS are exposed as "estimator" objects. These define the full model pipeline:

* data pre-processing (replacing missing values in the data, adding other calendar-related features, ...)

* how data is sampled for training

* the specific deep learning model to use

* any post-processing to the model output to turn it into a forecast

An estimator is trained on a training dataset (and, optionally, a validation dataset), and produces a "predictor" that contains the trained model to be used for prediction.
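
The estimator/predictor split can be illustrated with a deliberately tiny stand-in. The classes below (`ToyEstimator`, `ToyPredictor`) are hypothetical and are not part of GluonTS; they only mimic the pattern: the estimator holds the configuration, `train` fits something on data and returns a separate object that can produce forecasts.

```python
import numpy as np


class ToyPredictor:
    """Holds the 'trained' state and turns input series into forecasts."""

    def __init__(self, prediction_length, window):
        self.prediction_length = prediction_length
        self.window = window

    def predict(self, dataset):
        for entry in dataset:
            # Forecast the mean of the last `window` observations, repeated.
            level = float(np.mean(entry["target"][-self.window:]))
            yield np.full(self.prediction_length, level)


class ToyEstimator:
    """Holds the configuration; `train` selects a window and returns a predictor."""

    def __init__(self, prediction_length, candidate_windows=(1, 7)):
        self.prediction_length = prediction_length
        self.candidate_windows = candidate_windows

    def train(self, training_data):
        def holdout_error(window):
            # Hold out the last `prediction_length` points of each series.
            errors = []
            for entry in training_data:
                history = entry["target"][: -self.prediction_length]
                future = entry["target"][-self.prediction_length:]
                errors.append(np.mean(np.abs(future - np.mean(history[-window:]))))
            return float(np.mean(errors))

        best_window = min(self.candidate_windows, key=holdout_error)
        return ToyPredictor(self.prediction_length, best_window)


data = [{"target": np.arange(20, dtype=float)}]
predictor = ToyEstimator(prediction_length=3).train(data)
forecasts = list(predictor.predict(data))
print(predictor.window, forecasts[0])
```

Keeping the two roles separate means the predictor can be saved, shipped, and used without any of the training machinery.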

<img src="https://github.com/lostella/isf-deep-learning-workshop/blob/main/notebooks/figures/flow.png?raw=true" alt="flow" width="50%"/>

In [None]:
from gluonts.torch.model.tft import TemporalFusionTransformerEstimator
from pytorch_lightning.loggers import TensorBoardLogger

In [None]:
estimator = TemporalFusionTransformerEstimator(
    freq="1D",
    prediction_length=7,
    context_length=180,
    quantiles=[0.1, 0.5, 0.9],
    static_cardinalities=static_cardinalities,
    dynamic_dims=[num_feat_dynamic_real],
    batch_size=32,
    trainer_kwargs={
        "max_epochs": 20,
        "logger": TensorBoardLogger("tb_logs"),
    }
)

### Split data for training and evaluation

In [None]:
from gluonts.dataset.split import split

training_dataset, test_gen = split(dataset, offset=-21)

![Link Name](https://github.com/lostella/isf-deep-learning-workshop/blob/main/notebooks/figures/split1.png?raw=true)

In [None]:
test_data = test_gen.generate_instances(prediction_length=7, windows=3)

![Link Name](https://github.com/lostella/isf-deep-learning-workshop/blob/main/notebooks/figures/split2.png?raw=true)
![Link Name](https://github.com/lostella/isf-deep-learning-workshop/blob/main/notebooks/figures/split3.png?raw=true)
![Link Name](https://github.com/lostella/isf-deep-learning-workshop/blob/main/notebooks/figures/split4.png?raw=true)
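
The figures above can be mimicked with plain array slicing. The helper `rolling_windows` below is a hypothetical illustration, not the GluonTS implementation, and it assumes consecutive windows are spaced `prediction_length` steps apart (no overlap between labels).

```python
import numpy as np


def rolling_windows(series, offset, prediction_length, windows):
    """Yield (input, label) pairs for consecutive windows after the split point."""
    split_point = len(series) + offset  # offset is negative: counted from the end
    for k in range(windows):
        end_input = split_point + k * prediction_length
        yield series[:end_input], series[end_input:end_input + prediction_length]


series = np.arange(100)
training_part = series[:-21]  # everything before the split point
pairs = list(rolling_windows(series, offset=-21, prediction_length=7, windows=3))

for inp, label in pairs:
    print(len(inp), label)
```

With an offset of 21 steps and a prediction length of 7, exactly three windows fit: each later window sees 7 more observations as input than the previous one.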

Again, to keep runtime low, let's generate only a single backtest window per series in the dataset.

In [None]:
test_data = test_gen.generate_instances(prediction_length=7, windows=1)

### Model training

In [None]:
predictor = estimator.train(training_dataset)

In [None]:
predictor

### Saving and loading models

In [None]:
model_path = Path("tft_predictor")
model_path.mkdir(exist_ok=True)

predictor.serialize(model_path)

In [None]:
from gluonts.model import Predictor
predictor = Predictor.deserialize(model_path)

In [None]:
predictor

### What's inside a model

* a [`torch.nn.Module` class](https://github.com/awslabs/gluonts/blob/3ccb6d377a5bf9b27de74a47cdab295f4d61f7a7/src/gluonts/torch/model/tft/module.py#L35), implementing the network itself

* a [`pytorch_lightning.LightningModule`](https://github.com/awslabs/gluonts/blob/3ccb6d377a5bf9b27de74a47cdab295f4d61f7a7/src/gluonts/torch/model/tft/lightning_module.py#L24), defining how the model is to be trained

* a [data preprocessing pipeline](https://github.com/awslabs/gluonts/blob/3ccb6d377a5bf9b27de74a47cdab295f4d61f7a7/src/gluonts/torch/model/tft/estimator.py#L211-L286), used to construct batches to feed the network

## Forecasting, evaluating, comparing

We will plot forecasts, evaluate accuracy, identify worst cases, and compare models.

In [None]:
forecasts_tft = list(predictor.predict(test_data.input))

In [None]:
forecasts_tft[0]

In [None]:
import matplotlib.pyplot as plt
from gluonts.dataset.util import to_pandas

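# Plot the first three test windows: recent history, actual values, and forecast intervals
for (inp, target), forecast in islice(zip(test_data, forecasts_tft), 3):
    plt.figure()
    plt.plot(to_pandas(inp)[-100:].to_timestamp())  # last 100 observed values
    plt.plot(to_pandas(target).to_timestamp())  # ground truth over the forecast horizon
    forecast.plot(intervals=(0.3, 0.8), color="green")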

### Evaluating and comparing models

In [None]:
from gluonts.ev.metrics import RMSE, MASE, MeanWeightedSumQuantileLoss
from gluonts.model.evaluation import evaluate_forecasts

In [None]:
evaluate_forecasts(
    forecasts_tft,
    test_data=test_data,
    metrics=[RMSE(), MASE(), MeanWeightedSumQuantileLoss([0.1, 0.5, 0.9])],
    seasonality=1,
    axis=1  # aggregate over time axis
)

Omitting the `axis` argument, we get metrics aggregated over all axes (time + dataset dimensions).

In [None]:
metrics_tft = evaluate_forecasts(
    forecasts_tft,
    test_data=test_data,
    metrics=[RMSE(), MASE(), MeanWeightedSumQuantileLoss([0.1, 0.5, 0.9])],
    seasonality=1,
)
metrics_tft
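
For intuition about what these numbers measure, here is a minimal NumPy sketch using textbook definitions on made-up values. The GluonTS implementations may differ in scaling and aggregation (for instance, the weighted quantile loss is additionally normalized by the target magnitude), so treat this as an illustration of the concepts only.

```python
import numpy as np


def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def mase(y, y_hat, history, seasonality=1):
    # Scale: in-sample error of the (seasonal) naive forecast on the history.
    scale = np.mean(np.abs(history[seasonality:] - history[:-seasonality]))
    return float(np.mean(np.abs(y - y_hat)) / scale)


def pinball_loss(y, y_hat_q, q):
    # Penalizes under-prediction with weight q and over-prediction with weight 1 - q.
    diff = y - y_hat_q
    return float(np.mean(np.maximum(q * diff, (q - 1) * diff)))


history = np.array([2.0, 4.0, 3.0, 5.0])  # observed values before the forecast
y = np.array([4.0, 6.0])                  # actual values over the horizon
y_hat = np.array([5.0, 5.0])              # point (or quantile) forecast

print(rmse(y, y_hat))
print(mase(y, y_hat, history, seasonality=1))
print(pinball_loss(y, y_hat, q=0.9))
```

A MASE below 1 means the forecast beats the in-sample naive forecast on average, which is why it is a convenient scale-free metric when series have very different magnitudes.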

Let's do the same for a baseline model (naive) and compare accuracy.

In [None]:
from gluonts.model.seasonal_naive import SeasonalNaivePredictor

forecasts_naive = list(tqdm(
    SeasonalNaivePredictor(freq="D", prediction_length=7, season_length=1).predict(test_data.input)
))

metrics_naive = evaluate_forecasts(
    forecasts_naive,
    test_data=test_data,
    metrics=[RMSE(), MASE(), MeanWeightedSumQuantileLoss([0.1, 0.5, 0.9])],
    seasonality=1,
)
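
The baseline itself is simple enough to sketch in a few lines. The function below is a hypothetical stand-in for illustration, not the GluonTS class: it repeats the last observed season over the forecast horizon. With `season_length=1`, as used above, this reduces to repeating the last observed value.

```python
import numpy as np


def seasonal_naive(history, prediction_length, season_length):
    """Repeat the last `season_length` observations over the forecast horizon."""
    last_season = history[-season_length:]
    repeats = -(-prediction_length // season_length)  # ceiling division
    return np.tile(last_season, repeats)[:prediction_length]


history = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0])

print(seasonal_naive(history, prediction_length=7, season_length=1))
print(seasonal_naive(history, prediction_length=7, season_length=7))
```

For daily retail data, a weekly season (`season_length=7`) is often a stronger baseline than repeating the last value, since it captures day-of-week effects.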

In [None]:
df = pd.concat({"TFT": metrics_tft, "Naive": metrics_naive})
df

## Hyperparameter Tuning

Tuning the model hyperparameters (e.g. architectural choices, number of layers, hidden layer sizes, etc.) is often important to get the best results.

GluonTS **does not** provide model tuning features out of the box, but interfaces easily with dedicated packages.

In [None]:
import optuna

In [None]:
def tft_tuning_objective(trial):
    # get suggested hyperparameters values
    context_length = trial.suggest_int("context_length", 30, 180)
    variable_dim = trial.suggest_int("variable_dim", 10, 50)

    # set up model
    estimator = TemporalFusionTransformerEstimator(
        freq="1D",
        prediction_length=7,
        context_length=context_length,
        quantiles=[0.1, 0.5, 0.9],
        static_cardinalities=static_cardinalities,
        dynamic_dims=[num_feat_dynamic_real],
        variable_dim=variable_dim,
        batch_size=32,
        trainer_kwargs={
            "max_epochs": 5,  # TODO set larger
        }
    )

    # train model
    predictor = estimator.train(training_dataset)

    # predict
    forecasts = list(predictor.predict(test_data.input))

    # evaluate model
    df = evaluate_forecasts(forecasts, test_data=test_data, metrics=[MASE()], seasonality=1)
    return df["MASE"].iloc[0]
    

In [None]:
study = optuna.create_study()

In [None]:
study.optimize(tft_tuning_objective, n_trials=5)  # returns None; results are stored on the study
study.best_params

## Other datasets for experiments

It is important to validate the performance of a model class against multiple datasets. This is especially true when working on novel architectures, or adapting architectures from other domains (NLP, computer vision) to time series.

Examples of available public datasets include the [Monash Time Series Repository](https://forecastingdata.org/), and several others are available.

Many of these are accessible directly through GluonTS or HuggingFace:

### GluonTS dataset repository

In [None]:
from gluonts.dataset.repository import get_dataset, dataset_names

In [None]:
len(dataset_names)

In [None]:
dataset_names[:10]

In [None]:
solar = get_dataset("solar-energy")

In [None]:
solar.metadata

In [None]:
for entry in solar.train:
    print(entry)
    break

In [None]:
for entry in solar.test:
    print(entry)
    break

### HuggingFace datasets

In [None]:
from gluonts.dataset.common import ListDataset
from datasets import load_dataset

traffic = load_dataset("monash_tsf", "traffic_hourly")
dataset_training = ListDataset(traffic["train"], freq="H")
dataset_testing = ListDataset(traffic["test"], freq="H")

In [None]:
for entry in dataset_training:
    print(entry)
    break

In [None]:
for entry in dataset_testing:
    print(entry)
    break