# Forecasting, hierarchical data, tuning, and more

## Introduction

In this notebook, we cover more advanced forecasting topics, focusing on hierarchical data, tuning, and reconciliation.
We use sales data from [this Kaggle dataset](https://www.kaggle.com/datasets/utathya/future-volume-prediction?resource=download), which contains sales volumes for different products (SKUs) and agencies.

## Agenda

1. Simple forecasting with builtin parallelization
2. Reconciliation
3. Tuning with Optuna
4. Advanced patterns in hierarchical forecasting
5. Benchmarking


In [2]:
import warnings
import logging

warnings.filterwarnings("ignore")
logger = logging.getLogger('cmdstanpy')
logger.setLevel(logging.ERROR)

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

## Loading and preparing the data

The dataset is a 3-level hierarchical time series, with the following levels:

1. Total sales for all SKUs and agencies
2. Sales for each agency
3. Sales for each SKU in each agency


![hierarchy diagram](img/mermaid_hierarchy.png)

In sktime, we use a pandas `MultiIndex` to represent the hierarchy, where each level of the index represents a level of the hierarchy. The last level is reserved for the time index.
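To make the index layout concrete, here is a minimal sketch that builds a toy frame with the same structure. The values and the variable names (`toy`, `series_ids`) are made up for illustration and are not part of the Stallion dataset.

```python
import pandas as pd

# Toy hierarchy: 2 agencies x 2 SKUs x 3 months (values are made up).
periods = pd.period_range("2013-01", periods=3, freq="M")
index = pd.MultiIndex.from_product(
    [["Agency_01", "Agency_02"], ["SKU_01", "SKU_02"], periods],
    names=["agency", "sku", "date"],
)
toy = pd.DataFrame(
    {"volume": [float(i) for i in range(len(index))]}, index=index
)

# The last level is the time index; the remaining levels identify a series.
series_ids = toy.index.droplevel(-1).unique()
print(toy.index.nlevels)  # number of index levels
print(len(series_ids))    # number of distinct (agency, sku) series
```

Dropping the last level and taking the unique values gives one entry per individual time series, which is how the number of series in the hierarchy can be counted.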

In [4]:
from utils import load_stallion

_, y = load_stallion()
y

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,volume
agency,sku,date,Unnamed: 3_level_1
Agency_01,SKU_01,2013-01,80.676
Agency_01,SKU_01,2013-02,98.064
Agency_01,SKU_01,2013-03,133.704
Agency_01,SKU_01,2013-04,147.312
Agency_01,SKU_01,2013-05,175.608
...,...,...,...
Agency_60,SKU_23,2017-08,1.980
Agency_60,SKU_23,2017-09,1.260
Agency_60,SKU_23,2017-10,0.990
Agency_60,SKU_23,2017-11,0.090


### Aggregating the data

Since the dataset does not come with totals for each level, we need to add them.
This is easily done with the `Aggregator` transformer from sktime.

In [5]:
from sktime.transformations.hierarchical.aggregate import Aggregator

y = Aggregator().fit_transform(y)
y

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,volume
agency,sku,date,Unnamed: 3_level_1
Agency_01,SKU_01,2013-01,80.676000
Agency_01,SKU_01,2013-02,98.064000
Agency_01,SKU_01,2013-03,133.704000
Agency_01,SKU_01,2013-04,147.312000
Agency_01,SKU_01,2013-05,175.608000
...,...,...,...
__total,__total,2017-08,599553.665250
__total,__total,2017-09,556966.701300
__total,__total,2017-10,542554.007475
__total,__total,2017-11,457914.412950
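Conceptually, the aggregation adds one `__total` row block per parent node. The sketch below reproduces that idea with plain pandas on made-up numbers. It illustrates the result only and is not how `Aggregator` is implemented.

```python
import pandas as pd

periods = pd.period_range("2013-01", periods=2, freq="M")
index = pd.MultiIndex.from_product(
    [["Agency_01", "Agency_02"], ["SKU_01", "SKU_02"], periods],
    names=["agency", "sku", "date"],
)
bottom = pd.DataFrame(
    {"volume": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]}, index=index
)
levels = ["agency", "sku", "date"]

# Agency totals: sum over SKUs and label the SKU level "__total".
agency_totals = bottom.groupby(level=["agency", "date"]).sum()
agency_totals["sku"] = "__total"
agency_totals = agency_totals.set_index("sku", append=True).reorder_levels(levels)

# Grand total: sum over all series for each date.
grand_total = bottom.groupby(level="date").sum()
grand_total["agency"] = "__total"
grand_total["sku"] = "__total"
grand_total = grand_total.set_index(["agency", "sku"], append=True).reorder_levels(levels)

aggregated = pd.concat([bottom, agency_totals, grand_total])
print(aggregated.xs(("__total", "__total"), level=["agency", "sku"]))
```

The concatenated frame has the original bottom-level rows plus one total series per agency and one grand total, all in the same three-level index.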


### Some useful pandas multiindex operations

Multiindex is a powerful tool in pandas, and knowing its operations can be very useful when working with hierarchical data.

In [None]:
y.index

MultiIndex([('Agency_01',  'SKU_01', '2013-01'),
            ('Agency_01',  'SKU_01', '2013-02'),
            ('Agency_01',  'SKU_01', '2013-03'),
            ('Agency_01',  'SKU_01', '2013-04'),
            ('Agency_01',  'SKU_01', '2013-05'),
            ('Agency_01',  'SKU_01', '2013-06'),
            ('Agency_01',  'SKU_01', '2013-07'),
            ('Agency_01',  'SKU_01', '2013-08'),
            ('Agency_01',  'SKU_01', '2013-09'),
            ('Agency_01',  'SKU_01', '2013-10'),
            ...
            (  '__total', '__total', '2017-03'),
            (  '__total', '__total', '2017-04'),
            (  '__total', '__total', '2017-05'),
            (  '__total', '__total', '2017-06'),
            (  '__total', '__total', '2017-07'),
            (  '__total', '__total', '2017-08'),
            (  '__total', '__total', '2017-09'),
            (  '__total', '__total', '2017-10'),
            (  '__total', '__total', '2017-11'),
            (  '__total', '__total', '2017-12')],
           names=['agency', 'sku', 'date'], length=24540)

In [None]:
y.index.get_level_values(0)

Index(['Agency_01', 'Agency_01', 'Agency_01', 'Agency_01', 'Agency_01',
       'Agency_01', 'Agency_01', 'Agency_01', 'Agency_01', 'Agency_01',
       ...
       '__total', '__total', '__total', '__total', '__total', '__total',
       '__total', '__total', '__total', '__total'],
      dtype='object', name='agency', length=24540)

In [None]:
y.loc[("Agency_01", "SKU_01"),].head()

Unnamed: 0_level_0,volume
date,Unnamed: 1_level_1
2013-01,80.676
2013-02,98.064
2013-03,133.704
2013-04,147.312
2013-05,175.608


In [None]:
y.loc[pd.IndexSlice[:, "SKU_01"],].head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,volume
agency,sku,date,Unnamed: 3_level_1
Agency_01,SKU_01,2013-01,80.676
Agency_01,SKU_01,2013-02,98.064
Agency_01,SKU_01,2013-03,133.704
Agency_01,SKU_01,2013-04,147.312
Agency_01,SKU_01,2013-05,175.608


In [None]:
y.index.get_level_values(-1)

PeriodIndex(['2013-01', '2013-02', '2013-03', '2013-04', '2013-05', '2013-06',
             '2013-07', '2013-08', '2013-09', '2013-10',
             ...
             '2017-03', '2017-04', '2017-05', '2017-06', '2017-07', '2017-08',
             '2017-09', '2017-10', '2017-11', '2017-12'],
            dtype='period[M]', name='date', length=24540)

In [None]:
y.index.droplevel(-1).unique()

MultiIndex([('Agency_01',  'SKU_01'),
            ('Agency_01',  'SKU_02'),
            ('Agency_01',  'SKU_03'),
            ('Agency_01',  'SKU_04'),
            ('Agency_01',  'SKU_05'),
            ('Agency_01',  'SKU_11'),
            ('Agency_01', '__total'),
            ('Agency_02',  'SKU_01'),
            ('Agency_02',  'SKU_02'),
            ('Agency_02',  'SKU_03'),
            ...
            ('Agency_59', '__total'),
            ('Agency_60',  'SKU_01'),
            ('Agency_60',  'SKU_02'),
            ('Agency_60',  'SKU_03'),
            ('Agency_60',  'SKU_04'),
            ('Agency_60',  'SKU_05'),
            ('Agency_60',  'SKU_07'),
            ('Agency_60',  'SKU_23'),
            ('Agency_60', '__total'),
            (  '__total', '__total')],
           names=['agency', 'sku'], length=409)

### Train-test split and visualization

In [6]:
from sktime.forecasting.model_selection import temporal_train_test_split

y_train, y_test = temporal_train_test_split(y, test_size=18)

test_fh = y_test.index.get_level_values(-1).unique()
test_fh

PeriodIndex(['2016-07', '2016-08', '2016-09', '2016-10', '2016-11', '2016-12',
             '2017-01', '2017-02', '2017-03', '2017-04', '2017-05', '2017-06',
             '2017-07', '2017-08', '2017-09', '2017-10', '2017-11', '2017-12'],
            dtype='period[M]', name='date')

In [7]:
from utils import display_hierarchical_timeseries

display_hierarchical_timeseries(y_train, y_test)

interactive(children=(Dropdown(description='Level 0:', options=('Agency_01', 'Agency_02', 'Agency_03', 'Agency…

## 1. Upcasting and Parallelization

Instead of iterating over the series manually, we can let sktime's built-in broadcasting and parallelization handle this 🙂.

When a univariate forecasting model is fitted to a hierarchical time series, one copy of the model is created for each series in the hierarchy, and each copy is fitted separately. All copies share the same hyperparameters.
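The idea can be sketched in plain pandas: loop over the series, fit an independent copy of the same model to each, and stitch the forecasts back together. `LastValueModel` is a hypothetical stand-in for a univariate forecaster, and the data is made up. sktime performs the equivalent bookkeeping internally.

```python
import pandas as pd

periods = pd.period_range("2016-01", periods=3, freq="M")
index = pd.MultiIndex.from_product(
    [["Agency_01", "Agency_02"], ["SKU_01"], periods],
    names=["agency", "sku", "date"],
)
y_toy = pd.DataFrame({"volume": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]}, index=index)


class LastValueModel:
    """Hypothetical univariate forecaster: repeats the last observed value."""

    def __init__(self, horizon=2):
        self.horizon = horizon  # hyperparameter shared by all copies

    def fit(self, series):
        self.last_ = series.iloc[-1]
        self.cutoff_ = series.index[-1]
        return self

    def predict(self):
        future = pd.period_range(self.cutoff_ + 1, periods=self.horizon, freq="M")
        return pd.Series(self.last_, index=future, name="volume")


# One independent model copy per (agency, sku) series.
fitted = {}
for key, group in y_toy.groupby(level=["agency", "sku"]):
    series = group["volume"].droplevel(["agency", "sku"])
    fitted[key] = LastValueModel(horizon=2).fit(series)

# Stitch the per-series forecasts back into one hierarchical object.
forecasts = pd.concat({key: model.predict() for key, model in fitted.items()})
forecasts.index.names = ["agency", "sku", "date"]
print(forecasts)
```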

In [14]:
from sktime.forecasting.exp_smoothing import ExponentialSmoothing
from sktime.forecasting.fbprophet import Prophet

import logging

logger = logging.getLogger("cmdstanpy")
logger.addHandler(logging.NullHandler())
logger.propagate = False
logger.setLevel(logging.CRITICAL)

### Upcasting without parallelization

In [15]:
model = Prophet()

model.fit(y_train)
model.predict(fh=test_fh)

Importing plotly failed. Interactive plots will not work.


### With parallelization

Since these models are independent of each other, we can fit them in parallel. This is done by setting the parallelization config before calling `fit`.

**Warning**: if the model you are using already uses joblib internally, this parallelization won't work.
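Why independence matters can be shown with a small standard-library sketch: because no fit depends on another, the per-series fits can be handed to a pool of workers in any order. The `fit_mean` function and the toy series are hypothetical, and a thread pool stands in here for the joblib backend that the config selects.

```python
from concurrent.futures import ThreadPoolExecutor


def fit_mean(series):
    """Hypothetical 'fit': the model parameter is just the series mean."""
    return sum(series) / len(series)


toy_series = {
    ("Agency_01", "SKU_01"): [1.0, 2.0, 3.0],
    ("Agency_01", "SKU_02"): [4.0, 5.0, 6.0],
    ("Agency_02", "SKU_01"): [10.0, 20.0, 30.0],
}

# Each series is fitted independently, so the work can be spread over workers.
with ThreadPoolExecutor(max_workers=2) as pool:
    fitted_means = pool.map(fit_mean, toy_series.values())
    params = dict(zip(toy_series, fitted_means))

print(params)
```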

In [16]:
parallel_config = {
    "backend:parallel": "joblib",
    "backend:parallel:params": {"backend": "loky", "n_jobs": -1},
}

In [17]:
model = Prophet()

model.set_config(**parallel_config)

model.fit(y_train)

Importing plotly failed. Interactive plots will not work.


You can define more specific parallelization configurations, for example to use dask as the backend. Check the [docs](https://www.sktime.net/en/latest/api_reference/auto_generated/sktime.forecasting.base.BaseForecaster.html#sktime.forecasting.base.BaseForecaster.set_config) for more information.

In [18]:
y_pred = model.predict(fh=test_fh)
y_pred

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,volume
agency,sku,date,Unnamed: 3_level_1
Agency_01,SKU_01,2016-07,34.567985
Agency_01,SKU_01,2016-08,82.860599
Agency_01,SKU_01,2016-09,62.245607
Agency_01,SKU_01,2016-10,76.752518
Agency_01,SKU_01,2016-11,13.102124
...,...,...,...
__total,__total,2017-08,549435.017496
__total,__total,2017-09,485786.520166
__total,__total,2017-10,522701.951932
__total,__total,2017-11,474187.879867


In [19]:
display_hierarchical_timeseries(y_train, y_test, {"Prophet": y_pred})

interactive(children=(Dropdown(description='Level 0:', options=('Agency_01', 'Agency_02', 'Agency_03', 'Agency…

After fitting, we can easily access the fitted models and their parameters using `get_fitted_params`. The inner forecasters are stored in a pandas DataFrame whose index mirrors the structure of the time series.

In [20]:
fitted_params = model.get_fitted_params()
fitted_params["forecasters"]

Unnamed: 0,Unnamed: 1,forecasters
Agency_01,SKU_01,Prophet()
Agency_01,SKU_02,Prophet()
Agency_01,SKU_03,Prophet()
Agency_01,SKU_04,Prophet()
Agency_01,SKU_05,Prophet()
...,...,...
Agency_60,SKU_05,Prophet()
Agency_60,SKU_07,Prophet()
Agency_60,SKU_23,Prophet()
Agency_60,__total,Prophet()


## 2. Reconciliation

A common problem in hierarchical forecasting is obtaining `coherent` forecasts to share. Independently fitted forecasts will most likely not be _coherent_ with respect to the hierarchy: the sum of the bottom-level forecasts will not equal the total forecast.


In [21]:
def get_difference_between_total_and_bottom_up(y_pred):
    bottom_up = (
        Aggregator().fit_transform(y_pred).loc[("__total", "__total"), "volume"]
    )
    total_forecast = y_pred.loc[("__total", "__total"), "volume"]
    difference = total_forecast - bottom_up
    return difference

get_difference_between_total_and_bottom_up(y_pred).head()

date
2016-07    501.557149
2016-08    301.663973
2016-09    715.300614
2016-10     92.243533
2016-11    360.956793
Freq: M, Name: volume, dtype: float64


This difference means two things:

1. At least one of the two forecasts must be wrong.
2. The users of the forecasts will be confused.

There are, fortunately, techniques to fix this. We call them `reconciliation` techniques and they are easy to use in sktime.


In [22]:
from sktime.transformations.hierarchical.reconcile import Reconciler

reconciler = Reconciler()
y_pred_reconciled = reconciler.fit_transform(y_pred)
y_pred_reconciled.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,volume
agency,sku,date,Unnamed: 3_level_1
Agency_01,SKU_01,2016-07,34.567985
Agency_01,SKU_01,2016-08,82.860599
Agency_01,SKU_01,2016-09,62.245607
Agency_01,SKU_01,2016-10,76.752518
Agency_01,SKU_01,2016-11,13.102124


In [23]:
get_difference_between_total_and_bottom_up(y_pred_reconciled).head()

date
2016-07   -2.328306e-10
2016-08   -2.328306e-10
2016-09    0.000000e+00
2016-10    1.164153e-10
2016-11    1.164153e-10
Freq: M, Name: volume, dtype: float64

### How does reconciliation work?


* The hierarchy constraints are linear, so the `coherent` forecasts lie in a linear subspace defined by these constraints.

* The general idea is to project the base forecasts onto this subspace.

* The projection yields new forecasts that can improve on the base forecasts by sharing information across nodes.
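The projection idea can be worked through by hand on a two-leaf hierarchy. The sketch below uses the textbook OLS projection with a summing matrix `S`. The forecast numbers are made up, and the mapping to `Reconciler(method="ols")` is our assumption rather than a statement about sktime's internals.

```python
import numpy as np

# Summing matrix S maps the bottom-level series to every node.
# Rows: [total, A, B]; columns: bottom-level series [A, B].
S = np.array([
    [1.0, 1.0],
    [1.0, 0.0],
    [0.0, 1.0],
])

# Incoherent base forecasts (made-up numbers): total != A + B.
y_hat = np.array([105.0, 40.0, 50.0])

# OLS reconciliation: orthogonal projection onto the column space of S,
# which is exactly the set of coherent forecasts.
P = S @ np.linalg.inv(S.T @ S) @ S.T
y_tilde = P @ y_hat

print(y_tilde)  # approximately [100., 45., 55.]
```

The base forecasts disagree by 15 units (105 versus 40 + 50). The projection spreads that discrepancy evenly: the total moves down by 5 and each leaf moves up by 5, so the reconciled total equals the sum of the reconciled leaves.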


In [24]:
Reconciler(method="bu").fit_transform(y_pred)
Reconciler(method="ols").fit_transform(y_pred)
Reconciler(method="wls_str").fit_transform(y_pred)
Reconciler(method="td_fcst").fit_transform(y_pred)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,volume
agency,sku,date,Unnamed: 3_level_1
Agency_01,SKU_01,2016-07,34.112841
Agency_01,SKU_01,2016-08,82.065487
Agency_01,SKU_01,2016-09,61.643588
Agency_01,SKU_01,2016-10,76.026624
Agency_01,SKU_01,2016-11,12.675994
...,...,...,...
__total,__total,2017-08,549435.017496
__total,__total,2017-09,485786.520166
__total,__total,2017-10,522701.951932
__total,__total,2017-11,474187.879867


### Optimal reconciliation

We can also use `ReconcilerForecaster` for more advanced reconciliation with the `mint_shrink`, `mint_cov` or `wls_var` methods, which take the errors of each model into account when adjusting the forecasts.

Disadvantages:

* May be unstable when there are constant series
* May not respect positivity constraints if this is important for your application
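A minimal sketch of how error information can enter the projection, using the generalized least squares form $\tilde{y} = S (S^\top W^{-1} S)^{-1} S^\top W^{-1} \hat{y}$ with a diagonal `W`. The variances below are made up, whereas in practice they would be estimated from in-sample residuals. Treating this as the idea behind a variance-weighted method such as `wls_var` is our assumption.

```python
import numpy as np

# Rows: [total, A, B]; columns: bottom-level series [A, B].
S = np.array([
    [1.0, 1.0],
    [1.0, 0.0],
    [0.0, 1.0],
])
y_hat = np.array([105.0, 40.0, 50.0])  # incoherent base forecasts (made up)

# Hypothetical error variances per node: the total is the least reliable.
error_variances = np.array([25.0, 1.0, 4.0])
W_inv = np.diag(1.0 / error_variances)

# Weighted projection: nodes with larger error variance are adjusted more.
bottom = np.linalg.solve(S.T @ W_inv @ S, S.T @ W_inv @ y_hat)
y_tilde = S @ bottom

print(y_tilde)  # approximately [92.5, 40.5, 52.0]
```

Compared with an unweighted projection, the 15-unit discrepancy is now split in proportion to the variances: the noisy total absorbs most of the correction, while the more reliable leaf `A` barely moves.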

In [25]:
from sktime.forecasting.reconcile import ReconcilerForecaster
from sktime.forecasting.naive import NaiveForecaster

reconciler_model = ReconcilerForecaster(
    forecaster=Prophet().set_config(**parallel_config),
    method="mint_shrink",
)

reconciler_model.fit(y_train)

In [26]:
reconciler_model.get_fitted_params(deep=False)

{'forecaster': Prophet(),
 'residuals':                                  volume
 agency    sku     date                 
 Agency_01 SKU_01  2013-01    -10.362352
                   2013-02    -15.338475
                   2013-03    -20.481117
                   2013-04      9.760680
                   2013-05     45.244104
 ...                                 ...
 __total   __total 2016-02   -691.201695
                   2016-03  14099.975581
                   2016-04   -440.371876
                   2016-05  -5486.103783
                   2016-06  -1369.538206
 
 [17178 rows x 1 columns]}

## 3. Tuning hyperparameters with Optuna

* Optuna is a hyperparameter optimization framework that supports many sampling strategies.
* Its default sampler is the Tree-structured Parzen Estimator (TPE), a Bayesian-style optimization algorithm.
* sktime also implements grid search and randomized search, and provides an interface to skopt.

In [27]:
# The tuner forecaster
from sktime.forecasting.model_selection import ForecastingOptunaSearchCV
# The cross-validation strategy for the tuner
from sktime.split import ExpandingWindowSplitter
# Optuna interface for defining the search space
from optuna.distributions import (
    CategoricalDistribution
)

First, we need to define which cross-validation strategy to use for evaluating the models. In this case, we use `ExpandingWindowSplitter`.

In [28]:
cv = ExpandingWindowSplitter(fh=[1, 2, 3, 4], initial_window=36, step_length=24)
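To see what an expanding-window scheme produces, here is a simplified sketch that generates integer train and test positions. The function `expanding_window_splits` is hypothetical and only mimics the general logic. It is not sktime's implementation and ignores edge cases the real splitter handles.

```python
def expanding_window_splits(n_timepoints, fh, initial_window, step_length):
    """Yield (train_positions, test_positions) for an expanding window.

    The training window always starts at position 0 and grows by
    `step_length`; test positions are cutoff + h for each step h in `fh`.
    """
    train_end = initial_window  # exclusive end of the training window
    while train_end - 1 + max(fh) < n_timepoints:
        cutoff = train_end - 1  # last position used for training
        yield list(range(train_end)), [cutoff + h for h in fh]
        train_end += step_length


# Same settings as the cell above, applied to 42 monthly observations.
splits = list(
    expanding_window_splits(42, fh=[1, 2, 3, 4], initial_window=36, step_length=24)
)
for train, test in splits:
    print(len(train), test)

# A smaller example where the window grows several times.
small_splits = list(
    expanding_window_splits(12, fh=[1, 2], initial_window=6, step_length=2)
)
```

Under this simplified logic, a long `step_length` relative to the series length leaves very few folds, so the tuning score rests on little evidence. A shorter `step_length` gives more folds at the cost of more fitting time.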

Then, create the tuning forecaster by passing the model and the search space (the `param_grid` argument). The number of evaluations is set with the `n_evals` parameter.

In [29]:
tuning_model = ForecastingOptunaSearchCV(
    forecaster=model * Reconciler(),
    param_grid={
        "method": CategoricalDistribution(
            ["ols", "bu", "td_fcst"]
        ),
    },
    cv=cv,
    n_evals=2
)

tuning_model.fit(y_train)

[I 2024-08-23 09:03:15,132] A new study created in memory with name: no-name-7b3ae9d8-a192-4009-bb59-9cae712d3f9d


In [31]:
fitted_params = tuning_model.get_fitted_params(deep=False)
fitted_params["cv_results"]

Unnamed: 0,number,mean_test_MeanAbsolutePercentageError,datetime_start,datetime_complete,duration,params_method,state,params,rank_test_MeanAbsolutePercentageError
0,0,940551800000000.0,2024-08-23 09:03:15.134247,2024-08-23 09:03:28.667969,0 days 00:00:13.533722,bu,COMPLETE,{'method': 'bu'},1.0
1,1,1.622193e+16,2024-08-23 09:03:28.668039,2024-08-23 09:03:42.434539,0 days 00:00:13.766500,ols,COMPLETE,{'method': 'ols'},2.0


In [32]:

fitted_params["best_params"]

{'method': 'bu'}

In [33]:
best_forecaster = fitted_params["best_forecaster"]
best_forecaster

## 4. Advanced patterns in hierarchical forecasting

In [34]:
from sktime.forecasting.compose import HierarchyEnsembleForecaster
from sktime.forecasting.exp_smoothing import ExponentialSmoothing
from sktime.forecasting.arima import AutoARIMA

ensemble_by_level = HierarchyEnsembleForecaster(
    # We pass a tuple of (name, forecaster, level) for each level
    # The name could be anything, but it should be unique
    forecasters=[("total", Prophet(), 0),
                 ("agency", AutoARIMA(), 1),
                 ("sku", ExponentialSmoothing(), 2)],
    by="level",
)

ensemble_by_level.fit(y_train)

In [35]:
ensemble_by_level.predict(test_fh)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,volume
agency,sku,date,Unnamed: 3_level_1
Agency_01,SKU_01,2016-07,34.567985
Agency_01,SKU_01,2016-08,82.860599
Agency_01,SKU_01,2016-09,62.245607
Agency_01,SKU_01,2016-10,76.752518
Agency_01,SKU_01,2016-11,13.102124
...,...,...,...
__total,__total,2017-08,549435.017496
__total,__total,2017-09,485786.520166
__total,__total,2017-10,522701.951932
__total,__total,2017-11,474187.879867


In [36]:
ensemble_by_level.forecasters_

[('level_0', Prophet(), 0),
 ('level_1', Prophet(), 1),
 ('level_2', Prophet(), 2)]

### Finding the best model for each level

* Possibly, the best model for aggregated levels isn't the best for the bottom levels.
* We can combine `MultiplexForecaster`, `HierarchyEnsembleForecaster` and Optuna for more advanced tuning.

In [37]:
from sktime.forecasting.compose import MultiplexForecaster
from sktime.forecasting.naive import NaiveForecaster

multiplex_forecaster = MultiplexForecaster(
    forecasters=[
        ("ets", ExponentialSmoothing(trend="add", sp=12)),
        ("prophet", Prophet(freq="M")),
        ("naive", NaiveForecaster(strategy="last"))
    ],
    selected_forecaster="ets"
)

multiplex_forecaster

In [38]:
from sktime.forecasting.compose import ForecastByLevel

multiplex_ensemble = HierarchyEnsembleForecaster(
    forecasters=[
        ("total", multiplex_forecaster, 0),
        ("agency", multiplex_forecaster, 1),
        ("sku", multiplex_forecaster, 2),
    ],
    by="level",
)

multiplex_ensemble.set_config(**parallel_config)


In [46]:
tune_ensemble = ForecastingOptunaSearchCV(
    forecaster=multiplex_ensemble,
    param_grid={
        "total__selected_forecaster": CategoricalDistribution(
            ["ets", "prophet", "naive"]
        ),
        "agency__selected_forecaster": CategoricalDistribution(
            ["ets", "prophet", "naive"]
        ),
        "sku__selected_forecaster": CategoricalDistribution(
            ["ets", "prophet", "naive"]
        ),
    },
    cv=cv,
    n_evals=10,
    error_score="raise",
)


tune_ensemble.fit(y_train)

[I 2024-08-23 09:32:21,688] A new study created in memory with name: no-name-2c87de14-42dd-4816-9870-b60dd5ec2082


In [47]:
tune_ensemble.best_params_

{'level_0__selected_forecaster': 'prophet',
 'level_1__selected_forecaster': 'prophet',
 'level_2__selected_forecaster': 'ets'}

## 5. Benchmarking

* `evaluate` provides a simple way to evaluate the performance of a forecasting strategy.
* When forecasting hierarchies, pay attention to the `multilevel` argument of the metrics (e.g. `uniform_average` or `uniform_average_time`), which controls how errors are aggregated across the series in the hierarchy.
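The choice of aggregation can change the reported number considerably. The sketch below contrasts two options on made-up data using RMSE in plain pandas: computing the metric per series and then averaging, versus pooling all rows. As we understand the sktime documentation, these correspond to `uniform_average` and `uniform_average_time` respectively, but treat that mapping as an assumption.

```python
import numpy as np
import pandas as pd

periods = pd.period_range("2017-01", periods=2, freq="M")
index = pd.MultiIndex.from_product(
    [["Agency_01", "__total"], periods], names=["agency", "date"]
)
# Made-up values: a small bottom-level series and a large total series.
y_true = pd.Series([1.0, 2.0, 100.0, 200.0], index=index)
y_pred = pd.Series([2.0, 3.0, 110.0, 190.0], index=index)

squared_error = (y_true - y_pred) ** 2

# Option 1: RMSE per series, then a plain mean over the series.
per_series_rmse = np.sqrt(squared_error.groupby(level="agency").mean())
average_over_series = per_series_rmse.mean()

# Option 2: pool all rows into one sample, ignoring the hierarchy levels.
pooled = np.sqrt(squared_error.mean())

print(average_over_series, pooled)
```

With an unscaled metric, the large `__total` series dominates either number, which is one reason scaled metrics are a common choice when series in the hierarchy differ by orders of magnitude.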

In [48]:
from sktime.split import ExpandingWindowSplitter

cv = ExpandingWindowSplitter(
    initial_window=len(y_train.index.get_level_values(-1).unique()),
    fh=[0, 1, 2, 3],
)

In [49]:
from sktime.forecasting.model_evaluation import evaluate
from sktime.performance_metrics.forecasting import MeanSquaredScaledError

results = evaluate(
    tune_ensemble.best_forecaster_,
    cv=cv,
    y=y,
    scoring=MeanSquaredScaledError(multilevel="uniform_average")
)



In [50]:
results

Unnamed: 0,test_MeanSquaredScaledError,fit_time,pred_time,len_train_window,cutoff
0,2.449228e+17,12.417812,1.286826,17178,2016-06
1,324977600000000.0,12.168213,1.246374,17587,2016-07
2,5512198000000000.0,12.501674,1.26625,17996,2016-08
3,1.648265e+16,12.115915,1.292988,18405,2016-09
4,1.979087e+16,11.951772,1.49997,18814,2016-10
5,125661000000000.0,12.110751,1.260057,19223,2016-11
6,23349690000000.0,12.333579,1.352688,19632,2016-12
7,41500420000000.0,12.558406,1.252721,20041,2017-01
8,79734230000000.0,12.867914,1.299742,20450,2017-02
9,27.35448,12.198798,1.25196,20859,2017-03


## Recap

- Broadcasting and parallelization to get strong baselines fast with sktime
- Reconciliation to get coherent forecasts
- Tuning with Optuna
- Scaling to advanced patterns and tuning
- Benchmarking

## Next:

- Global Forecasting
- Creating 2nd party libraries

## Credits notebook 2:

- Notebook creation: felipeangelimvieira, fkiraly
- Sktime forecasting module: [many contributors](https://www.sktime.net/en/latest/about/contributors.html)