# Forecasting, hierarchical data, tuning, and more

In this notebook, we will cover more advanced forecasting topics, specially focused on hierarchical data, tuning, and reconciliation.
We will use sales data from [this kaggle dataset](https://www.kaggle.com/datasets/utathya/future-volume-prediction?resource=download), which contains sales data for different products (SKUs) and agencies.

## Agenda

1. Data preparation for hierarchical forecasting
2. Simple forecasting with builtin parallelization
3. Reconciliation
4. Tuning with Optuna
5. Tuning indivually for each timeseries
6. Benchmarking


In [1]:
import warnings
import logging

warnings.filterwarnings("ignore")
logger = logging.getLogger('cmdstanpy')
logger.setLevel(logging.ERROR)

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

## Loading and preparing the data

The dataset is a 3-level hierarchical time series, with the following levels:

1. Total sales for all SKUs and agencies
2. Sales for each agency
3. Sales for each SKU in each agency


```mermaid
graph TD
    Root["__total"] --> Agency_01
    Root --> Agency_02
    Root --> Agency_60
    
    Agency_01 --> SKU_01_A01["SKU_01"]
    Agency_01 --> SKU_02_A01["SKU_02"]
    Agency_01 --> SKU_11_A01["SKU_11"]
    Agency_01 --> Agency_01_Total["__total"]
    
    Agency_02 --> SKU_01_A02["SKU_01"]
    Agency_02 --> SKU_02_A02["SKU_02"]
    Agency_02 --> SKU_03_A02["SKU_03"]
    Agency_02 --> Agency_02_Total["__total"]
    
    Agency_60 --> SKU_01_A60["SKU_01"]
    Agency_60 --> SKU_02_A60["SKU_02"]
    Agency_60 --> SKU_23_A60["SKU_23"]
    Agency_60 --> Agency_60_Total["__total"]

```

In sktime, we use pandas multiindex to represent the hierarchy, where each level in the index represent a level in the hierarchy. The last level is reserved to the time index.

In [3]:
from utils import load_stallion

_, y = load_stallion()
y

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,volume
agency,sku,date,Unnamed: 3_level_1
Agency_01,SKU_01,2013-01-01,80.676
Agency_01,SKU_01,2013-02-01,98.064
Agency_01,SKU_01,2013-03-01,133.704
Agency_01,SKU_01,2013-04-01,147.312
Agency_01,SKU_01,2013-05-01,175.608
...,...,...,...
Agency_60,SKU_23,2017-08-01,1.980
Agency_60,SKU_23,2017-09-01,1.260
Agency_60,SKU_23,2017-10-01,0.990
Agency_60,SKU_23,2017-11-01,0.090


### Aggregating and visualizing the data

Since the dataset do not come with totals for each level, we will need to add them.
It can be easily done with `Aggregator` transformer from sktime.

In [4]:
from sktime.transformations.hierarchical.aggregate import Aggregator

y = Aggregator().fit_transform(y)
y

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,volume
agency,sku,date,Unnamed: 3_level_1
Agency_01,SKU_01,2013-01-01,80.676000
Agency_01,SKU_01,2013-02-01,98.064000
Agency_01,SKU_01,2013-03-01,133.704000
Agency_01,SKU_01,2013-04-01,147.312000
Agency_01,SKU_01,2013-05-01,175.608000
...,...,...,...
__total,__total,2017-08-01,599553.665250
__total,__total,2017-09-01,556966.701300
__total,__total,2017-10-01,542554.007475
__total,__total,2017-11-01,457914.412950


In [5]:
from sktime.forecasting.model_selection import temporal_train_test_split

y_train, y_test = temporal_train_test_split(y, test_size=18)

test_fh = y_test.index.get_level_values(-1).unique()
test_fh

DatetimeIndex(['2016-07-01', '2016-08-01', '2016-09-01', '2016-10-01',
               '2016-11-01', '2016-12-01', '2017-01-01', '2017-02-01',
               '2017-03-01', '2017-04-01', '2017-05-01', '2017-06-01',
               '2017-07-01', '2017-08-01', '2017-09-01', '2017-10-01',
               '2017-11-01', '2017-12-01'],
              dtype='datetime64[ns]', name='date', freq=None)

In [6]:
from utils import display_hierarchical_timeseries

display_hierarchical_timeseries(y_train, y_test)

interactive(children=(Dropdown(description='Level 0:', options=('Agency_01', 'Agency_02', 'Agency_03', 'Agency…

### Some useful pandas multiindex operations

Multiindex is a powerful tool in pandas, and knowing its operations can be very useful when working with hierarchical data.

In [7]:
y.index

MultiIndex([('Agency_01',  'SKU_01', '2013-01-01'),
            ('Agency_01',  'SKU_01', '2013-02-01'),
            ('Agency_01',  'SKU_01', '2013-03-01'),
            ('Agency_01',  'SKU_01', '2013-04-01'),
            ('Agency_01',  'SKU_01', '2013-05-01'),
            ('Agency_01',  'SKU_01', '2013-06-01'),
            ('Agency_01',  'SKU_01', '2013-07-01'),
            ('Agency_01',  'SKU_01', '2013-08-01'),
            ('Agency_01',  'SKU_01', '2013-09-01'),
            ('Agency_01',  'SKU_01', '2013-10-01'),
            ...
            (  '__total', '__total', '2017-03-01'),
            (  '__total', '__total', '2017-04-01'),
            (  '__total', '__total', '2017-05-01'),
            (  '__total', '__total', '2017-06-01'),
            (  '__total', '__total', '2017-07-01'),
            (  '__total', '__total', '2017-08-01'),
            (  '__total', '__total', '2017-09-01'),
            (  '__total', '__total', '2017-10-01'),
            (  '__total', '__total', '2017-11-01

In [8]:
y.index.get_level_values(0)

Index(['Agency_01', 'Agency_01', 'Agency_01', 'Agency_01', 'Agency_01',
       'Agency_01', 'Agency_01', 'Agency_01', 'Agency_01', 'Agency_01',
       ...
       '__total', '__total', '__total', '__total', '__total', '__total',
       '__total', '__total', '__total', '__total'],
      dtype='object', name='agency', length=24540)

In [9]:
y.loc[("Agency_01", "SKU_01"),].head()

Unnamed: 0_level_0,volume
date,Unnamed: 1_level_1
2013-01-01,80.676
2013-02-01,98.064
2013-03-01,133.704
2013-04-01,147.312
2013-05-01,175.608


In [10]:
y.loc[pd.IndexSlice[:, "SKU_01"],].head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,volume
agency,sku,date,Unnamed: 3_level_1
Agency_01,SKU_01,2013-01-01,80.676
Agency_01,SKU_01,2013-02-01,98.064
Agency_01,SKU_01,2013-03-01,133.704
Agency_01,SKU_01,2013-04-01,147.312
Agency_01,SKU_01,2013-05-01,175.608


In [11]:
y.index.get_level_values(-1)

DatetimeIndex(['2013-01-01', '2013-02-01', '2013-03-01', '2013-04-01',
               '2013-05-01', '2013-06-01', '2013-07-01', '2013-08-01',
               '2013-09-01', '2013-10-01',
               ...
               '2017-03-01', '2017-04-01', '2017-05-01', '2017-06-01',
               '2017-07-01', '2017-08-01', '2017-09-01', '2017-10-01',
               '2017-11-01', '2017-12-01'],
              dtype='datetime64[ns]', name='date', length=24540, freq=None)

In [12]:
y.index.droplevel(-1).unique()

MultiIndex([('Agency_01',  'SKU_01'),
            ('Agency_01',  'SKU_02'),
            ('Agency_01',  'SKU_03'),
            ('Agency_01',  'SKU_04'),
            ('Agency_01',  'SKU_05'),
            ('Agency_01',  'SKU_11'),
            ('Agency_01', '__total'),
            ('Agency_02',  'SKU_01'),
            ('Agency_02',  'SKU_02'),
            ('Agency_02',  'SKU_03'),
            ...
            ('Agency_59', '__total'),
            ('Agency_60',  'SKU_01'),
            ('Agency_60',  'SKU_02'),
            ('Agency_60',  'SKU_03'),
            ('Agency_60',  'SKU_04'),
            ('Agency_60',  'SKU_05'),
            ('Agency_60',  'SKU_07'),
            ('Agency_60',  'SKU_23'),
            ('Agency_60', '__total'),
            (  '__total', '__total')],
           names=['agency', 'sku'], length=409)

## Upcasting and Parallelization

Instead of needing to manually iterate over the series, we can use the builtin parallelization to handle this 🙂.

When a univariate forecasting model is fitted to a hierarchical time series, one model copy is created for each series in the hierarchy and fitted separately. All models share the same hyperparameter.

In [13]:
from sktime.forecasting.exp_smoothing import ExponentialSmoothing
from sktime.forecasting.fbprophet import Prophet

import logging

logger = logging.getLogger("cmdstanpy")
logger.addHandler(logging.NullHandler())
logger.propagate = False
logger.setLevel(logging.CRITICAL)

### Broadcasting without parallelization

In [14]:
model = Prophet(freq="Q")

model.fit(y_train)

Importing plotly failed. Interactive plots will not work.
22:02:33 - cmdstanpy - INFO - Chain [1] start processing
22:02:33 - cmdstanpy - INFO - Chain [1] done processing
22:02:33 - cmdstanpy - INFO - Chain [1] start processing
22:02:33 - cmdstanpy - INFO - Chain [1] done processing
22:02:34 - cmdstanpy - INFO - Chain [1] start processing
22:02:34 - cmdstanpy - INFO - Chain [1] done processing
22:02:34 - cmdstanpy - INFO - Chain [1] start processing
22:02:34 - cmdstanpy - INFO - Chain [1] done processing
22:02:34 - cmdstanpy - INFO - Chain [1] start processing
22:02:34 - cmdstanpy - INFO - Chain [1] done processing
22:02:34 - cmdstanpy - INFO - Chain [1] start processing
22:02:34 - cmdstanpy - INFO - Chain [1] done processing
22:02:34 - cmdstanpy - INFO - Chain [1] start processing
22:02:34 - cmdstanpy - INFO - Chain [1] done processing
22:02:34 - cmdstanpy - INFO - Chain [1] start processing
22:02:34 - cmdstanpy - INFO - Chain [1] done processing
22:02:35 - cmdstanpy - INFO - Chain [1

KeyboardInterrupt: 

### With parallelization

Since this set of models is independent, we can fit them in parallel, which can be done by setting the config before calling fit.

In [None]:
model = Prophet(freq="Q")

model.set_config(
    **{
        "backend:parallel": "joblib",
        "backend:parallel:params": {"backend": "loky", "n_jobs": -1},
    }
)

model.fit(y_train)

You can define more specific parallelization configurations, and use dask for example. Check the [docs](https://www.sktime.net/en/latest/api_reference/auto_generated/sktime.forecasting.base.BaseForecaster.html#sktime.forecasting.base.BaseForecaster.set_config) for more information.

In [None]:
y_pred = model.predict(fh=test_fh)
y_pred

After fitting, we can easily access fitted models and their parameters using `get_fitted_params`. The inner forecasters are stored in pandas dataframe in a structure similar to the timeseries.

In [None]:
fitted_params = model.get_fitted_params()
fitted_params["forecasters"]

## 2. Reconciliation

A common problem in hierarchical forecasting in obtaining a `coherent` forecast to share. Probably, your forecasts won't be _coherent_ with respect to the hierarchy. If we take the sum of the bottom levels, it won't be equal to our total forecast.


In [None]:
def get_difference_between_total_and_bottom_up(y_pred):
    bottom_up = (
        Aggregator().fit_transform(y_pred).loc[("__total", "__total"), "volume"]
    )
    total_forecast = y_pred.loc[("__total", "__total"), "volume"]
    difference = total_forecast - bottom_up
    return difference

get_difference_between_total_and_bottom_up(y_pred).head()


This difference mean two things:

1. By definition, one of them is wrong.
2. The users of the forecasts will be confused.

There are, fortunately, techniques to fix this. We call them `reconciliation` techniques and they are easy to use in sktime.


In [None]:
from sktime.transformations.hierarchical.reconcile import Reconciler

reconciler = Reconciler()
y_pred_reconciled = reconciler.fit_transform(y_pred)
y_pred_reconciled.head()

In [None]:
get_difference_between_total_and_bottom_up(y_pred_reconciled).head()

### How does reconciliation work?


* The hierarchy constrains are a set of linear constraints, and the `coherent` forecasts actually lie in a hyperplane defined by these constraints

* Sktime provides a transformer that can apply these simple constraints to our forecasts.

In [None]:
Reconciler(method="bu").fit_transform(y_pred)
Reconciler(method="ols").fit_transform(y_pred)
Reconciler(method="wls_str").fit_transform(y_pred)
Reconciler(method="td_fcst").fit_transform(y_pred)

And we can use `ReconcilerForecaster` to do more advanced reconciliation, using `mint_shrink`, `mint_cov` or `wls_var` methods, that consider the errors of each model to adjust the forecasts.

In [None]:
from sktime.forecasting.reconcile import ReconcilerForecaster

reconciler_model = ReconcilerForecaster(
    forecaster=model,
    method="mint_shrink",
)

reconciler_model.fit(y_train)

Importing plotly failed. Interactive plots will not work.
Importing plotly failed. Interactive plots will not work.
Importing plotly failed. Interactive plots will not work.
Importing plotly failed. Interactive plots will not work.
Importing plotly failed. Interactive plots will not work.
Importing plotly failed. Interactive plots will not work.
Importing plotly failed. Interactive plots will not work.
Importing plotly failed. Interactive plots will not work.


In [None]:
reconciler_model.get_fitted_params(deep=False)

{'forecaster': Prophet(freq='Q'),
 'residuals':                                     volume
 agency    sku     date                    
 Agency_01 SKU_01  2013-01-01    -10.362352
                   2013-02-01    -15.338475
                   2013-03-01    -20.481117
                   2013-04-01      9.760680
                   2013-05-01     45.244104
 ...                                    ...
 __total   __total 2016-02-01   -691.201695
                   2016-03-01  14099.975581
                   2016-04-01   -440.371876
                   2016-05-01  -5486.103783
                   2016-06-01  -1369.538206
 
 [17178 rows x 1 columns]}

## 3. Tuning hyperparameters with Optuna

* Optuna is a hyperparameter optimization framework that supports many sampling strategies. 
* Default is Tree of Parzen Estimators (TPE), which is a Bayesian-like optimization algorithm.

In [None]:
from sktime.forecasting.model_selection import ForecastingOptunaSearchCV
from sktime.split import ExpandingWindowSplitter
from optuna.distributions import (
    CategoricalDistribution,
    IntUniformDistribution,
    LogUniformDistribution,
)
from sktime.performance_metrics.forecasting import MeanSquaredScaledError

First, we need to define with cross validation strategy we will use to evaluate the models. In this case, we will use `ExpandingWindowSplitter`.

In [None]:
cv = ExpandingWindowSplitter(fh=[0, 1, 2, 3], initial_window=36, step_length=12)

In [None]:
tuning_model = ForecastingOptunaSearchCV(
    forecaster=reconciler_model,
    param_grid={
        "n_changepoints": IntUniformDistribution(2, 20),
        "method": CategoricalDistribution(
            ["mint_shrink", "mint_cov", "ols", "bu", "td_fcst"]
        ),
    },
    cv=cv,
    n_evals=2,
)

tuning_model.fit(y_train)

[I 2024-08-21 21:01:46,349] A new study created in memory with name: no-name-3ed89a00-944b-4d41-b52b-af9fc398eec3
Importing plotly failed. Interactive plots will not work.
Importing plotly failed. Interactive plots will not work.
Importing plotly failed. Interactive plots will not work.
Importing plotly failed. Interactive plots will not work.
Importing plotly failed. Interactive plots will not work.
Importing plotly failed. Interactive plots will not work.
Importing plotly failed. Interactive plots will not work.
Importing plotly failed. Interactive plots will not work.


In [None]:
fitted_params = tuning_model.get_fitted_params(deep=False)
fitted_params["best_params"]

{'n_changepoints': 17, 'method': 'bu'}

In [None]:
best_forecaster = fitted_params["best_forecaster"]
best_forecaster

In [None]:
fitted_params["cv_results"]

Unnamed: 0,number,mean_test_MeanAbsolutePercentageError,datetime_start,datetime_complete,duration,params_method,params_n_changepoints,state,params,rank_test_MeanAbsolutePercentageError
0,0,940348700000000.0,2024-08-21 21:01:46.350583,2024-08-21 21:02:05.638387,0 days 00:00:19.287804,bu,17,COMPLETE,"{'n_changepoints': 17, 'method': 'bu'}",1.0
1,1,1.315362e+16,2024-08-21 21:02:05.638647,2024-08-21 21:02:15.507548,0 days 00:00:09.868901,ols,2,COMPLETE,"{'n_changepoints': 2, 'method': 'ols'}",2.0


## 4. Advanced patterns in hierarchical forecasting

In [None]:
from sktime.forecasting.compose import HierarchyEnsembleForecaster

ensemble_by_level = HierarchyEnsembleForecaster(
    forecasters=[("level_0", model, 0),
                 ("level_1", model, 1),
                 ("level_2", model, 2)],
    by="level",
)

ensemble_by_level.fit(y_train)

Importing plotly failed. Interactive plots will not work.
Importing plotly failed. Interactive plots will not work.
Importing plotly failed. Interactive plots will not work.
Importing plotly failed. Interactive plots will not work.
Importing plotly failed. Interactive plots will not work.
Importing plotly failed. Interactive plots will not work.
Importing plotly failed. Interactive plots will not work.
Importing plotly failed. Interactive plots will not work.


In [None]:
ensemble_by_level.get_params(deep=False)

{'by': 'level',
 'default': None,
 'forecasters': [('level_0', Prophet(freq='Q'), 0),
  ('level_1', Prophet(freq='Q'), 1),
  ('level_2', Prophet(freq='Q'), 2)]}

In [None]:
ensemble_by_level.forecasters_

[('level_0', Prophet(freq='Q'), 0),
 ('level_1', Prophet(freq='Q'), 1),
 ('level_2', Prophet(freq='Q'), 2)]

### Finding the best model for each level

* Possibly, the best model for aggregated levels isn't the best for the bottom levels.
* We can use `MultiplexForecaster`, `HierarchyEnsembleForecaster` and optuna to do some more advanced tuning.

In [None]:
from sktime.forecasting.compose import MultiplexForecaster
from sktime.forecasting.naive import NaiveForecaster

multiplex_forecaster = MultiplexForecaster(
    forecasters=[
        ("ets", ExponentialSmoothing(trend="add", sp=12)),
        ("prophet", Prophet()),
        ("naive", NaiveForecaster(strategy="last"))
    ]
)

multiplex_forecaster

In [None]:
from sktime.forecasting.compose import ForecastByLevel

multiplex_ensemble = HierarchyEnsembleForecaster(
    forecasters=[
        ("level_0", multiplex_forecaster, 0),
        ("level_1", multiplex_forecaster, 1),
        ("level_2", multiplex_forecaster, 2),
    ],
    by="level",
)

multiplex_ensemble


In [None]:
tune_ensemble = ForecastingOptunaSearchCV(
    forecaster=multiplex_ensemble * Reconciler(),
    param_grid={
        "HierarchyEnsembleForecaster__level_0__selected_forecaster": CategoricalDistribution(
            ["ets", "prophet", "naive"]
        ),
        "HierarchyEnsembleForecaster__level_1__selected_forecaster": CategoricalDistribution(
            ["ets", "prophet", "naive"]
        ),
        "HierarchyEnsembleForecaster__level_2__selected_forecaster": CategoricalDistribution(
            ["ets", "prophet", "naive"]
        ),
        "Reconciler__method": CategoricalDistribution(["ols", "bu", "td_fcst"]),
    },
    cv=cv,
    n_evals=2,
    error_score="raise"
)


tune_ensemble.fit(y_train)


[I 2024-08-21 21:51:27,006] A new study created in memory with name: no-name-3ecfda98-0b8e-4367-b4c0-a31abac4dfb7
21:51:27 - cmdstanpy - INFO - Chain [1] start processing
21:51:27 - cmdstanpy - INFO - Chain [1] done processing
21:51:27 - cmdstanpy - INFO - Chain [1] start processing
21:51:27 - cmdstanpy - INFO - Chain [1] done processing
21:51:27 - cmdstanpy - INFO - Chain [1] start processing
21:51:28 - cmdstanpy - INFO - Chain [1] done processing
21:51:28 - cmdstanpy - INFO - Chain [1] start processing
21:51:28 - cmdstanpy - INFO - Chain [1] done processing
21:51:28 - cmdstanpy - INFO - Chain [1] start processing
21:51:28 - cmdstanpy - INFO - Chain [1] done processing
21:51:28 - cmdstanpy - INFO - Chain [1] start processing
21:51:28 - cmdstanpy - INFO - Chain [1] done processing
21:51:28 - cmdstanpy - INFO - Chain [1] start processing
21:51:28 - cmdstanpy - INFO - Chain [1] done processing
21:51:28 - cmdstanpy - INFO - Chain [1] start processing
21:51:28 - cmdstanpy - INFO - Chain [1

In [None]:
tune_ensemble.best_params_

{'HierarchyEnsembleForecaster__level_0__selected_forecaster': 'prophet',
 'HierarchyEnsembleForecaster__level_1__selected_forecaster': 'prophet',
 'HierarchyEnsembleForecaster__level_2__selected_forecaster': 'naive',
 'Reconciler__method': 'bu'}

## 5. Benchmarking

In [None]:
from sktime.forecasting.model_evaluation import evaluate
from sktime.performance_metrics.forecasting import MeanSquaredScaledError

results = evaluate(
    tune_ensemble.best_forecaster_,
    cv=cv,
    y=y,
    scoring=MeanSquaredScaledError(multilevel="uniform_average")
)

21:57:09 - cmdstanpy - INFO - Chain [1] start processing
21:57:09 - cmdstanpy - INFO - Chain [1] done processing
21:57:10 - cmdstanpy - INFO - Chain [1] start processing
21:57:10 - cmdstanpy - INFO - Chain [1] done processing
21:57:10 - cmdstanpy - INFO - Chain [1] start processing
21:57:10 - cmdstanpy - INFO - Chain [1] done processing
21:57:10 - cmdstanpy - INFO - Chain [1] start processing
21:57:10 - cmdstanpy - INFO - Chain [1] done processing
21:57:10 - cmdstanpy - INFO - Chain [1] start processing
21:57:10 - cmdstanpy - INFO - Chain [1] done processing
21:57:10 - cmdstanpy - INFO - Chain [1] start processing
21:57:10 - cmdstanpy - INFO - Chain [1] done processing
21:57:10 - cmdstanpy - INFO - Chain [1] start processing
21:57:10 - cmdstanpy - INFO - Chain [1] done processing
21:57:10 - cmdstanpy - INFO - Chain [1] start processing
21:57:11 - cmdstanpy - INFO - Chain [1] done processing
21:57:11 - cmdstanpy - INFO - Chain [1] start processing
21:57:11 - cmdstanpy - INFO - Chain [1]

In [None]:
results

Unnamed: 0,test_MeanSquaredScaledError,fit_time,pred_time,len_train_window,cutoff
0,45813340000000.0,12.127365,3.036484,14724,2015-12-01 00:00:00
1,23349690000000.0,12.698082,2.864668,19632,2016-12-01 00:00:00


## Recap

- Broadcasting and parallelization to get strong baselines fast with sktime
- Reconciliation to get coherent forecasts
- Tuning with Optuna
- Scaling to advanced patterns and tuning
- Benchmarking

## Next:

- Global Forecasting
- Creating 2nd party libraries