# sktime - A Unified Framework for Machine Learning with Time Series

Tutorial at the PyData Global 2021

Find out more at: https://github.com/alan-turing-institute/sktime

---
## Hierarchical time series

In typical business use cases, time series often come in hierarchical form.

### Data

<img src="./img/hierarchytree.png" width="1200" alt="arrow heads">

Examples include:
* Product sales in different categories (e.g. M5 time series competition)
* Tourism demand in different regions
* Balance sheet structures across cost centers / accounts

Many hierarchical time series datasets can be found here:
https://forecastingdata.org/
(SKTIME is working on a loader to easily access data in this repo)


For literature see also:
https://otexts.com/fpp2/hierarchical.html

The above example shows a clean hierarchy tree leading from the top level strictly
downwards to lower level branches. In practice, we can also often see a more complicated 
aggregation structure where the product hierarchy and the geographic hierarchy can 
both be used together. 

<img src="./img/hierarchytree_grouped.png" width="1200" alt="arrow heads">


# How does sktime represent hierarchical data?
# General intro to sktime datatypes

`sktime` provides modules for a number of time series related learning tasks.

These modules use `sktime` specific in-memory (i.e., python workspace) representations for time series and related objects, most importantly individual time series and time series panels. `sktime`'s in-memory representations rely on `pandas` and `numpy`, with additional conventions on the `pandas` and `numpy` object.

Users of `sktime` should be aware of these representations, since presenting the data in an `sktime` compatible representation is usually the first step in using any of the `sktime` modules.

This notebook introduces the data types used in `sktime`, related functionality such as converters and validity checkers, and common workflows for loading and conversion:

**Section 1** introduces in-memory data containers used in `sktime`, with examples.

**Section 2** introduces validity checkers and conversion functionality for in-memory data containers.

**Section 3** introduces common workflows to load data from file formats

In [None]:
# import to retrieve examples
from sktime.datatypes import get_examples

### Section 1.1.1: Time series - the `"pd.DataFrame"` mtype

In the `"pd.DataFrame"` mtype, time series are represented by an in-memory container `obj: pandas.DataFrame` as follows.

* structure convention: `obj.index` must be monotonic, and one of `Int64Index`, `RangeIndex`, `DatetimeIndex`, `PeriodIndex`.
* variables: columns of `obj` correspond to different variables
* variable names: column names `obj.columns`
* time points: rows of `obj` correspond to different, distinct time points
* time index: `obj.index` is interpreted as a time index.
* capabilities: can represent multivariate series; can represent unequally spaced series

In [None]:
get_examples(mtype="pd.DataFrame", as_scitype="Series")[0]

Example of a bivariate series in `"pd.DataFrame"` representation.
This series has two variables, named `"a"` and `"b"`. Both are observed at the same four time points 0, 1, 2, 3.
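
The structure conventions above can be checked by hand with plain `pandas`. The following is a minimal sketch, not the validity checker that `sktime` ships: the helper name `looks_like_series_frame` is hypothetical, the values are illustrative, and "monotonic" is assumed to mean monotonically increasing.

```python
import pandas as pd

# Bivariate series observed at four time points (values are illustrative).
series = pd.DataFrame(
    {"a": [1.0, 4.0, 0.5, -3.0], "b": [3.0, 7.0, 2.0, -0.5]},
    index=pd.RangeIndex(4),
)


def looks_like_series_frame(obj):
    """Hypothetical helper: check the structure conventions listed above."""
    allowed = (pd.RangeIndex, pd.DatetimeIndex, pd.PeriodIndex)
    # plain integer indices are accepted as well
    is_int_index = pd.api.types.is_integer_dtype(obj.index.dtype)
    return (
        isinstance(obj, pd.DataFrame)
        and (isinstance(obj.index, allowed) or is_int_index)
        and obj.index.is_monotonic_increasing  # assumption: increasing order
    )


print(looks_like_series_frame(series))             # conforming container
print(looks_like_series_frame(series.iloc[::-1]))  # reversed index violates monotonicity
```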

### Workflow

1. Model specification
2. Fitting
3. Prediction
4. Evaluation

In [None]:
get_examples(mtype="pd.DataFrame", as_scitype="Series")[1]

Unnamed: 0,a,b
0,1.0,3.0
1,4.0,7.0
2,0.5,2.0
3,-3.0,-0.428571


### Section 1.2: Time series panels - the `"Panel"` scitype

The major representations of time series panels in `sktime` are:

* `"pd-multiindex"` - a `pandas.DataFrame`, with row multi-index (`instances`, `timepoints`), cols = variables
* `"numpy3D"` - a 3D `np.ndarray`, with axis 0 = instances, axis 1 = variables, axis 2 = time points
* `"df-list"` - a `list` of `pandas.DataFrame`, with list index = instances, data frame rows = time points, data frame cols = variables

These representations are considered primary representations in `sktime` and are core to internal computations.

There are further, minor representations of time series panels in `sktime`:

* `"nested_univ"` - a `pandas.DataFrame`, with `pandas.Series` in cells. data frame rows = instances, data frame cols = variables, and series axis = time points
* `"numpyflat"` - a 2D `np.ndarray` with rows = instances, and columns indexed by a pair index of (variables, time points). This format is only being converted to and cannot be converted from (since number of variables and time points may be ambiguous).
* `"pd-wide"` - a `pandas.DataFrame` in wide format: has column multi-index (variables, time points), rows = instances; the "variables" index can be omitted for univariate time series
* `"pd-long"` - a `pandas.DataFrame` in long format: has cols `instances`, `timepoints`, `variable`, `value`; entries in `value` are indexed by tuples of values in (`instances`, `timepoints`, `variable`).

The minor representations are currently not fully consolidated in-code and are not discussed further below. Contributions are appreciated.
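
To make the axis conventions of the primary representations concrete, here is a hand-rolled sketch that moves a toy panel from the `"numpy3D"` layout to the `"pd-multiindex"` layout and back, using only `numpy` and `pandas`. It is meant to illustrate the layouts and is not the converter that `sktime` uses internally; all variable names are illustrative.

```python
import numpy as np
import pandas as pd

n_instances, n_vars, n_timepoints = 3, 2, 4

# numpy3D layout: axis 0 = instances, axis 1 = variables, axis 2 = time points
arr = np.arange(n_instances * n_vars * n_timepoints).reshape(
    n_instances, n_vars, n_timepoints
)

# pd-multiindex layout: rows = (instances, timepoints), columns = variables
index = pd.MultiIndex.from_product(
    [range(n_instances), range(n_timepoints)], names=["instances", "timepoints"]
)
panel = pd.DataFrame(
    # move variables to the last axis, then flatten instances x timepoints into rows
    arr.transpose(0, 2, 1).reshape(-1, n_vars),
    index=index,
    columns=[f"var_{i}" for i in range(n_vars)],
)

# and back again: un-flatten the rows, then restore the variables axis
arr_back = (
    panel.to_numpy().reshape(n_instances, n_timepoints, n_vars).transpose(0, 2, 1)
)
```

The round trip only works this simply because all instances share the same time points; unequally supported panels cannot be stored in a single 3D array.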

### Section 1.2.1: Time series panels - the `"pd-multiindex"` mtype

In the `"pd-multiindex"` mtype, time series panels are represented by an in-memory container `obj: pandas.DataFrame` as follows.

* structure convention: `obj.index` must be a pair multi-index of type `(RangeIndex, t)`, where `t` is one of `Int64Index`, `RangeIndex`, `DatetimeIndex`, `PeriodIndex` and monotonic. `obj.index` must have names `("instances", "timepoints")`.
* instances: rows with the same `"instances"` index correspond to the same instance; rows with different `"instances"` index correspond to different instances.
* instance index: the first element of pairs in `obj.index` is interpreted as an instance index. 
* variables: columns of `obj` correspond to different variables
* variable names: column names `obj.columns`
* time points: rows of `obj` with the same `"timepoints"` index correspond to the same time point; rows of `obj` with different `"timepoints"` index correspond to different time points.
* time index: the second element of pairs in `obj.index` is interpreted as a time index. 
* capabilities: can represent panels of multivariate series; can represent unequally spaced series; can represent panels of unequally supported series; cannot represent panels of series with different sets of variables.

Example of a panel of multivariate series in `"pd-multiindex"` mtype representation.
The panel contains three multivariate series, with instance indices 0, 1, 2. All series have two variables with names `"var_0"`, `"var_1"`. All series are observed at three time points 0, 1, 2.

In [None]:
get_examples(mtype="pd-multiindex", as_scitype="Panel")[0]

### Section 1.2.2: Hierarchical types

In the `"pd_multiindex_hier"` mtype, hierarchical time series are represented by an in-memory container `obj: pandas.DataFrame` as follows.

* structure convention: `obj.index` must be a multi-index of type `(RangeIndex, ..., RangeIndex, t)`, where `t` is one of `Int64Index`, `RangeIndex`, `DatetimeIndex`, `PeriodIndex` and monotonic. All levels of `obj.index` must be named, with the last level being named `"timepoints"`.
* instances: rows with the same values in all levels except the last correspond to the same instance; rows that differ in any of these levels correspond to different instances.
* instance index: all but the last element of the tuples in `obj.index` are jointly interpreted as an instance index.
* variables: columns of `obj` correspond to different variables
* variable names: column names `obj.columns`
* time points: rows of `obj` with the same `"timepoints"` index correspond to the same time point; rows of `obj` with different `"timepoints"` index correspond to different time points.
* time index: the last element of the tuples in `obj.index` is interpreted as a time index.
* capabilities: can represent hierarchical collections of multivariate series; can represent unequally spaced series; can represent collections of unequally supported series; cannot represent collections of series with different sets of variables.

In [6]:
"""Example generation for testing.
Example of two-dimensional hierarchy
"""

import pandas as pd

###
# example 0: multivariate, equally sampled

cols = ["instances_0", "instances_1", "timepoints"] + [f"var_{i}" for i in range(2)]

Xlist = [
    pd.DataFrame(
        [["a", 0, 0, 1, 4], ["a", 0, 1, 2, 5], ["a", 0, 2, 3, 6]], columns=cols
    ),
    pd.DataFrame(
        [["a", 1, 0, 1, 4], ["a", 1, 1, 2, 55], ["a", 1, 2, 3, 6]], columns=cols
    ),
    pd.DataFrame(
        [["a", 2, 0, 1, 42], ["a", 2, 1, 2, 5], ["a", 2, 2, 3, 6]], columns=cols
    ),
    pd.DataFrame(
        [["b", 0, 0, 1, 4], ["b", 0, 1, 2, 5], ["b", 0, 2, 3, 6]], columns=cols
    ),
    pd.DataFrame(
        [["b", 1, 0, 1, 4], ["b", 1, 1, 2, 55], ["b", 1, 2, 3, 6]], columns=cols
    ),
    pd.DataFrame(
        [["b", 2, 0, 1, 42], ["b", 2, 1, 2, 5], ["b", 2, 2, 3, 6]], columns=cols
    ),
]

X = pd.concat(Xlist)
X = X.set_index(["instances_0", "instances_1", "timepoints"])
display(X)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,var_0,var_1
instances_0,instances_1,timepoints,Unnamed: 3_level_1,Unnamed: 4_level_1
a,0,0,1,4
a,0,1,2,5
a,0,2,3,6
a,1,0,1,4
a,1,1,2,55
a,1,2,3,6
a,2,0,1,42
a,2,1,2,5
a,2,2,3,6
b,0,0,1,4


In [None]:
X.index.get_level_values(level=-1)

### Hierarchical forecasting
* data with hierarchical format can be forecast based on one of the different panel
mtypes
* forecasts that are generated this way are independent of each other
* if we add up these forecasts, we get consistent forecasts that sum up across
dimensions.
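
The "adding up" step in the last bullet can be sketched with plain `pandas`. The forecast values and level names below are made up for illustration; in practice `y_pred` would come from a forecaster fitted on the panel data.

```python
import pandas as pd

# Hypothetical point forecasts for two stores per region and two future time points.
idx = pd.MultiIndex.from_product(
    [["north", "south"], ["store_1", "store_2"], [1, 2]],
    names=["region", "store", "timepoints"],
)
y_pred = pd.DataFrame(
    {"sales": [10.0, 11.0, 20.0, 21.0, 5.0, 6.0, 7.0, 8.0]}, index=idx
)

# Sum the leaf forecasts up to region level and to the grand total.
region_pred = y_pred.groupby(level=["region", "timepoints"]).sum()
total_pred = y_pred.groupby(level="timepoints").sum()

print(region_pred)
print(total_pred)
```

Because the higher levels are computed from the leaves, the region forecasts sum to the total by construction.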

In [None]:
from sktime.datatypes import check_is_mtype, convert
from sktime.forecasting.arima import ARIMA
from sktime.utils._testing.hierarchical import _make_hierarchical
from sktime.utils._testing.panel import _make_panel_X
n_instances = 10
PANEL_MTYPES = ["pd-multiindex", "nested_univ", "numpy3D"]
HIER_MTYPES = ["pd_multiindex_hier"]

y = _make_panel_X(n_instances=n_instances, random_state=42)
y = convert(y, from_type="nested_univ", to_type=PANEL_MTYPES[0])

y_pred = ARIMA().fit(y).predict([1, 2, 3])
valid, _, metadata = check_is_mtype(y_pred, PANEL_MTYPES[0], return_metadata=True)



ModuleNotFoundError: ARIMA requires package 'pmdarima' in python environment to be instantiated, but 'pmdarima' was not found. 'pmdarima' is a soft dependency and not included in the base sktime installation. Please run: `pip install pmdarima` to install the pmdarima package. To install all soft dependencies, run: `pip install sktime[all_extras]`

### Hierarchical forecasting - challenges
* The above approach is called "bottom up reconciliation"
* There exist other reconciliation approaches 


### Global forecasting vs Univariate Forecasting



"Many businesses nowadays rely on large quantities of time series data making time 
series forecasting an important research area. Global forecasting models [that] are trained
 **across sets of time series** have shown huge potential in providing accurate forecasts compared
 with the univariate forecasting models that work on isolated series."

<img src="./img/flow.png" width="400" alt="arrow heads">

https://robjhyndman.com/publications/monash-forecasting-data/

 Why does global forecasting matter?
 * In practice, we often have time series of limited range
 * Estimation is difficult, and we cannot model complex dependencies
 * Assumption of global forecasting: We can observe the identical data generating process (DGP) multiple times
 * Non-identical DGPs can be fine too, as long as the degree of dissimilarity is captured by exogenous information
 * Now we have much more information and can estimate more reliably and more complex models (caveat: unless complexity is purely driven by time dynamics)
 
As a result of these advantages, global forecasting models have been very successful in competitions, e.g.
* Rossmann Store Sales
* Walmart Sales in Stormy Weather
* M5 competition

Many business problems in practice are essentially global forecasting problems, often also reflecting hierarchical information (see above)
* Product sales in different categories (e.g. M5 time series competition)
* Balance sheet structures across cost centers / accounts
* Dynamics of pandemics observed at different points in time

Distinction to multivariate forecasting
* Multivariate forecasting focuses on modeling interdependence between time series
* Global forecasting can model interdependence, but the focus lies on enhancing the observation space

Implementation in sktime
* Multivariate forecasting models are supported in sktime via ? VAR...* 
* Global forecasting

For the following example we will use the `"pd-multiindex"` representation of the `"Panel"` scitype discussed in Section 1.2. 

In that case, `"instances"` is the unique identifier for the individual time series, while `"timepoints"` captures the time index.

In [101]:
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline

from sktime.forecasting.compose import ForecastingPipeline, make_reduction
from sktime.forecasting.model_selection import temporal_train_test_split
from sktime.transformations.series.summarize import WindowSummarizer
from sktime.transformations.series.date import DateTimeFeatures


pd.options.mode.chained_assignment = None
pd.set_option("display.max_columns", None)

# %%
# Load M5 Data and prepare
y = pd.read_pickle("global_fc/y.pkl")
X = pd.read_pickle("global_fc/X.pkl")


The dataset is based on the M5 competition. The data features sales of products in 
different stores, different states and different product categories. 

For a detailed analysis of the competition please take a look at the paper 
"M5 accuracy competition: Results, findings, and conclusions".


https://doi.org/10.1016/j.ijforecast.2021.11.013


You can see a glimpse of the data here:

In [52]:
print(y.head())
print(X.head())

                            y
instances timepoints         
1         2016-03-15   756.67
          2016-03-16   679.13
          2016-03-17   633.40
          2016-03-18  1158.04
          2016-03-19   914.24
                      dept_id  cat_id  store_id  state_id  event_name_1  \
instances timepoints                                                      
1         2016-03-15        1       1        10         3             1   
          2016-03-16        1       1        10         3             1   
          2016-03-17        1       1        10         3             7   
          2016-03-18        1       1        10         3             1   
          2016-03-19        1       1        10         3             1   

                      event_type_1  event_name_2  event_type_2  snap  \
instances timepoints                                                   
1         2016-03-15             1             1             1     3   
          2016-03-16             1             1


The data set consists of time series grouped via the `instances` level, the first level of the multi-index. We will focus on modeling individual products. The hierarchical information is provided as exogenous information.

For the M5 competition, the winning solution used exogenous features about the hierarchies like `"dept_id"`, `"store_id"` etc. to capture similarities and dissimilarities of the products. Other features include holiday events and snap days (a specific assistance program of US social security paid on certain days).

We can split into test and train sets using `temporal_train_test_split`. SKTIME supports splitting panel data by instance, i.e. every instance of the time series is cut individually. Splitting is as simple as this:


In [102]:
y_train, y_test, X_train, X_test = temporal_train_test_split(y, X)
display(y_train.head(5))
display(y_test.head(5))

Unnamed: 0_level_0,Unnamed: 1_level_0,y
instances,timepoints,Unnamed: 2_level_1
1,2016-03-15,756.67
1,2016-03-16,679.13
1,2016-03-17,633.4
1,2016-03-18,1158.04
1,2016-03-19,914.24


Unnamed: 0_level_0,Unnamed: 1_level_0,y
instances,timepoints,Unnamed: 2_level_1
1,2016-04-14,874.57
1,2016-04-15,895.29
1,2016-04-16,1112.63
1,2016-04-17,1014.86
1,2016-04-18,691.91


SKTIME will make sure that both y and X are split in the same way, and also preserve the structure of the hierarchies.
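
To illustrate what "cutting every instance individually" means, the following sketch performs a per-instance temporal split with plain `pandas` on a toy panel. It is not the implementation behind `temporal_train_test_split`; the toy data and the `test_size` of two observations are assumptions chosen for illustration.

```python
import pandas as pd

idx = pd.MultiIndex.from_product(
    [[1, 2], pd.date_range("2016-03-15", periods=8, freq="D")],
    names=["instances", "timepoints"],
)
y_toy = pd.DataFrame({"y": range(16)}, index=idx)

test_size = 2  # hold out the last two observations of every instance

# position of each row within its own instance, and the length of that instance
position = y_toy.groupby(level="instances").cumcount()
length = y_toy.groupby(level="instances")["y"].transform("count")
is_test = position >= (length - test_size)

y_toy_train, y_toy_test = y_toy[~is_test], y_toy[is_test]
print(y_toy_test)
```

Applying the same boolean mask to `X` would keep `y` and `X` aligned, since both share the same multi-index.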

### Rationale for tree based models

Tree based models possess some unique advantages and disadvantages when applied to the domain of time series.
They are able to exploit complex non-linear relationships / dependencies between time series and covariates, but they also have plenty of hyperparameters and, at least out of the box, cannot extrapolate.

In **univariate time series forecasting**, tree based models often do not have enough data to reliably train hyperparameters, and statistical models like ARIMA or ETS are often superior. 

But due to the abundance of data in **global forecasting** - the M5 competition contained 42,840 time series -  the advantages of those models can begin to shine.

SKTIME supports all major Python implementations of tree based models. In this example we will use a Random Forest Regressor.

In [103]:
regressor = make_pipeline(
    RandomForestRegressor(random_state=1),
)

One big caveat with tree based models is the fact they are not time series models per se and therefore do not understand concepts like autocorrelation or seasonalities. 

As a result, we need to generate appropriate features that capture the dynamics of the time series. In sktime we have the transformer  `"WindowSummarizer"` to capture that time dependence.

The `"WindowSummarizer"` can be used to generate features useful for time series forecasting based on a provided dictionary of functions, window shifts and window lengths.

See the following example for an application of `"WindowSummarizer"`:

In [None]:
import pandas as pd
from sktime.transformations.series.summarize import WindowSummarizer
from sktime.forecasting.base import ForecastingHorizon
from sktime.forecasting.compose import ForecastingPipeline
#y = load_airline()
kwargs = {
        "lag_feature": {
            "lag": [1],
            "mean": [[1, 3], [3, 6]],
            "std": [[1, 4]],
        }
    }

transformer = WindowSummarizer(**kwargs)
y_transformed = transformer.fit_transform(y_train)

display(y_transformed.head(10))

The notation `"mean": [[1, 3]]` (captured in the column `"y_mean_1_3"`) can be interpreted in the following way:

The summarization function `"mean": [[1, 3]]` is applied to a window of length 3, lagged by one period relative to the observation we want to forecast. This can be visualized in the following way (from the docs):

    For `window = [1, 3]`, we have a `lag` of 1 and `window_length` of 3 to target the three last days (exclusive of `z`) that were observed. Summarization is done across windows like this:
    |---------------------------|
    | x x x x x x x x * * * z x |
    |---------------------------|
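
The same window logic can be reproduced by hand with `pandas`: shift the series by the lag so that the current observation `z` is excluded, then take a rolling mean over the window length. This is only a sketch of the idea on a toy series; how `WindowSummarizer` treats the incomplete windows at the start of the series may differ.

```python
import pandas as pd

y_toy = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0], name="y")

lag, window_length = 1, 3
# shift by `lag` to exclude the current observation, then average the window
y_mean_1_3 = y_toy.shift(lag).rolling(window_length).mean()

print(y_mean_1_3)
```

At position 3 the feature is the mean of the observations at positions 0, 1 and 2; the first three positions have no complete window and are therefore missing.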

By default, `"WindowSummarizer"` uses pandas rolling window functions to allow for a speedy generation of features. The following function names map to native pandas rolling functions:
* "sum",
* "mean",
* "median",
* "std",
* "var",
* "kurt",
* "min",
* "max",
* "corr",
* "cov",
* "skew",
* "sem"

These functions are typically very fast since they are optimized for rolling, grouped operations. 

In the M5 competition, arguably the most relevant features were:

* **mean** calculations to capture level shifts, e.g. last week sales, sales of the week prior to the last month etc.
* **standard deviation** to capture increases / decreases in volatility in sales, and how it impacts future sales
* rolling **skewness** / **kurtosis** calculations, to capture changes in store sales tendencies.
* various different calculations to capture periods of zero sales (e.g. out of stock scenarios)

Only the first three calculations can be implemented using native pandas functions. You can, however, also provide any other arbitrary function to WindowSummarizer, either programmed by you, or from an external package. 

In this example, we will define the function `count_gt700` to count how many observations lie above the threshold of 700 within a window of length 3, lagged by 2 periods.

In [None]:
import numpy as np
def count_gt700(x):
    """Count how many observations lie above threshold 700."""
    return np.sum(x > 700)

kwargs = {
        "lag_feature": {
            "lag": [1],
            count_gt700: [[2, 3]],
            "std": [[1, 4]],
        }
    }

transformer = WindowSummarizer(**kwargs)
y_transformed = transformer.fit_transform(y_train)

display(y_transformed.head(10))
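
For intuition, a custom window function of this kind can be applied by hand with `pandas`: shift by the lag, then use `rolling(...).apply(...)`. The helper name `count_above`, the toy values and the threshold are assumptions for illustration; this shows the window arithmetic only, not how `WindowSummarizer` dispatches custom functions internally.

```python
import numpy as np
import pandas as pd


def count_above(x, threshold=700):
    """Hypothetical helper: count the observations in a window above the threshold."""
    return np.sum(x > threshold)


y_toy = pd.Series([650.0, 720.0, 810.0, 690.0, 605.0, 600.0, 900.0])

lag, window_length = 2, 3
# window [2, 3]: skip the two most recent observations, then look at three values
feature = y_toy.shift(lag).rolling(window_length).apply(count_above, raw=True)

print(feature)
```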

Other arguments you can provide to `"WindowSummarizer"` are:

    n_jobs : int, optional (default=-1)
        The number of jobs to run in parallel for applying the window functions.
        ``-1`` means using all processors.
    target_cols: list of str, optional (default = None)
        Specifies which columns in X to target for applying the window functions.
        ``None`` will target the first column

In a global forecasting setting, you typically want to target primarily `y`. However, sometimes you also want to apply `"WindowSummarizer"` to some columns in `X`. Use 
`"target_cols"` to specify which columns and apply `"WindowSummarizer"` within a `"ForecastingPipeline"`.

In the M5 competition, lagging of exogenous features was especially useful for lags around holiday dummies (often sales are affected for a few days before and after major holidays) as well as changes in item prices (discounts as well as persistent price changes).
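
As a sketch of what lagging a holiday dummy looks like on panel data, the snippet below shifts a made-up `holiday` column within each instance using `pandas`. Grouping by instance matters: without it, values from the end of one series would leak into the start of the next. Column names and dates are illustrative assumptions.

```python
import pandas as pd

idx = pd.MultiIndex.from_product(
    [[1, 2], pd.date_range("2016-03-24", periods=5, freq="D")],
    names=["instances", "timepoints"],
)
X_toy = pd.DataFrame({"holiday": [0, 0, 0, 1, 0] * 2}, index=idx)

by_instance = X_toy.groupby(level="instances")["holiday"]
X_toy["holiday_lag_1"] = by_instance.shift(1)    # flags the day after the holiday
X_toy["holiday_lead_1"] = by_instance.shift(-1)  # flags the day before the holiday

print(X_toy)
```

The lead feature is only usable when holidays are known in advance, which is normally the case for calendar events.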

In [None]:
from sktime.forecasting.naive import NaiveForecaster
from sktime.datasets import load_longley
y_ll, X_ll = load_longley()
y_train_ll, y_test_ll, X_train_ll, X_test_ll = temporal_train_test_split(y_ll, X_ll)
fh = ForecastingHorizon(X_test_ll.index, is_relative=False)
# Example transforming only X
pipe = ForecastingPipeline(
    steps=[
        ("a", WindowSummarizer(n_jobs=1, target_cols=["POP", "GNPDEFL"])),
        ("b", WindowSummarizer(n_jobs=1, target_cols=["GNP"], **kwargs)),
        ("forecaster", NaiveForecaster(strategy="drift")),
    ]
)
pipe_return = pipe.fit(y_train_ll, X_train_ll)
y_pred1 = pipe_return.predict(fh=fh, X=X_test_ll)
display(y_pred1)

1959    67075.727273
1960    67638.454545
1961    68201.181818
1962    68763.909091
Freq: A-DEC, dtype: float64

Global forecasting models are often very performance and resource intensive. To address that, we have implemented an en-bloc approach in sktime to directly compute 
the relevant features in a parallel way. You can use that functionality by passing the WindowSummarizer as a transformer within our make_reduction function. 

In this case, you do not need to set a window_length, since it will be inferred from the WindowSummarizer (by looking at the feature that reaches furthest back in time).

In [104]:
forecaster = make_reduction(
    regressor,
    scitype="tabular-regressor",
    transformers=[WindowSummarizer(**kwargs, n_jobs=1)],
    window_length=None,
    strategy="recursive",
)
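
To give some intuition for what a recursive reduction does, here is a dependency-light sketch with `numpy`: the series is turned into a table of lagged windows, a tabular regressor is fitted on it, and each prediction is fed back in as the newest lag. Ordinary least squares stands in for the regressor purely to keep the example small; the helper `make_lag_table` is hypothetical and this is not how `make_reduction` is implemented.

```python
import numpy as np


def make_lag_table(y, window_length):
    """Hypothetical helper: turn a series into (lagged windows, next value) pairs."""
    X = np.array([y[i : i + window_length] for i in range(len(y) - window_length)])
    target = y[window_length:]
    return X, target


y_toy = np.arange(1.0, 21.0)  # 1, 2, ..., 20: a simple linear trend
window_length = 3
X_tab, target = make_lag_table(y_toy, window_length)

# fit a tabular regressor; least squares with an intercept is used as a stand-in
design = np.column_stack([np.ones(len(X_tab)), X_tab])
coef, *_ = np.linalg.lstsq(design, target, rcond=None)

# recursive strategy: every prediction becomes the newest lag for the next step
window = list(y_toy[-window_length:])
preds = []
for _ in range(3):
    next_value = coef[0] + np.dot(coef[1:], window)
    preds.append(next_value)
    window = window[1:] + [next_value]

print(np.round(preds, 6))
```

The linear stand-in continues the trend beyond the observed range. A tree based regressor plugged into the same loop could not, which is the extrapolation caveat mentioned earlier.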


As mentioned, concepts relating to calendar seasonalities are also not understood by tree based models and need to be provided by means of feature engineering. This relates for example to:
* day of the week effects historically observed for stock prices (prices on Fridays used to differ from Monday prices).
* used car prices being higher in spring than in summer
* spending at the beginning of the month differing from the end of the month due to salary effects.


Calendar seasonalities can be modeled by means of dummy variables or Fourier terms. As a rule of thumb, use dummy variables for discontinuous effects and Fourier terms when you believe there is a certain degree of smoothness in the seasonality.
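
The two encodings can be compared on a weekly seasonality with plain `pandas` and `numpy`. This is a minimal sketch: the column prefixes and the number of harmonics `K` are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

timepoints = pd.date_range("2016-03-14", periods=14, freq="D")  # starts on a Monday
day_of_week = timepoints.dayofweek.to_numpy()  # Monday=0, ..., Sunday=6

# dummy encoding: one indicator column per weekday, allows abrupt day-to-day jumps
dummies = pd.get_dummies(pd.Series(day_of_week, index=timepoints), prefix="dow")

# Fourier encoding: K sine/cosine pairs with period 7, yields a smooth weekly profile
period, K = 7, 2
fourier = pd.DataFrame(
    {
        f"{fn.__name__}_{k}": fn(2 * np.pi * k * day_of_week / period)
        for k in range(1, K + 1)
        for fn in (np.sin, np.cos)
    },
    index=timepoints,
)

print(dummies.head(7))
print(fourier.head(7).round(3))
```

The dummy encoding needs one column per weekday, while the Fourier encoding trades flexibility for fewer columns (here `2 * K = 4`).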

SKTIME currently supports the generation of calendar dummy variables via the DateTimeFeatures transformer. You can either manually specify the desired seasonality or provide to DateTimeFeatures the base frequency of the time series (daily, weekly etc.) and the desired complexity (few vs many features) and DateTimeFeatures will pick a set of sensible seasonalities. SKTIME will support Fourier terms in a future release.

In [76]:
transformer = DateTimeFeatures(ts_freq="D")
X_hat = transformer.fit_transform(X_train)

new_cols = [i for i in X_hat if i not in X_train.columns]
display(X_hat[new_cols])

Unnamed: 0_level_0,Unnamed: 1_level_0,year,month,weekday
instances,timepoints,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
1,2016-03-15,2016,3,1
1,2016-03-16,2016,3,2
1,2016-03-17,2016,3,3
1,2016-03-18,2016,3,4
1,2016-03-19,2016,3,5
1,2016-03-20,2016,3,6
1,2016-03-21,2016,3,0
1,2016-03-22,2016,3,1
1,2016-03-23,2016,3,2
1,2016-03-24,2016,3,3


DateTimeFeatures supports the following frequencies:
* Y - year
* Q - quarter
* M - month
* W - week
* D - day
* H - hour
* T - minute
* S - second
* L - millisecond

You can specify the manual generation of dummy features with the notation e.g. "day_of_month", "day_of_week", "week_of_quarter". 

In [86]:
transformer = DateTimeFeatures(manual_selection=["week_of_month", "day_of_quarter"])
X_hat = transformer.fit_transform(X_train)

new_cols = [i for i in X_hat if i not in X_train.columns]
display(X_hat[new_cols])

Unnamed: 0_level_0,Unnamed: 1_level_0,day_of_quarter,week_of_month
instances,timepoints,Unnamed: 2_level_1,Unnamed: 3_level_1
1,2016-03-15,75,3
1,2016-03-16,76,3
1,2016-03-17,77,3
1,2016-03-18,78,3
1,2016-03-19,79,3
1,2016-03-20,80,3
1,2016-03-21,81,3
1,2016-03-22,82,4
1,2016-03-23,83,4
1,2016-03-24,84,4
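
The values in the output above can be reproduced by hand with `pandas`, which helps when interpreting the features. The definitions below are assumptions inferred from that output (days 1 to 7 of a month form week 1, and days of the quarter are counted from 1); they are not taken from the `DateTimeFeatures` source.

```python
import pandas as pd

timepoints = pd.date_range("2016-03-15", periods=10, freq="D")

# assumed definition: days 1-7 are week 1, days 8-14 are week 2, and so on
week_of_month = (timepoints.day - 1) // 7 + 1

# assumed definition: 1-based day count since the first day of the quarter
quarter_start = timepoints.to_period("Q").start_time
day_of_quarter = (timepoints - quarter_start).days + 1

print(pd.DataFrame(
    {"day_of_quarter": day_of_quarter, "week_of_month": week_of_month},
    index=timepoints,
))
```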


### Putting it all together

Using the `"WindowSummarizer"`, `"DateTimeFeatures"` and the `"make_reduction"` function, we can now set up a working example of an end-to-end global forecasting pipeline based on a sample of the M5 competition data:

In [107]:
# fh = ForecastingHorizon(X_test.index, is_relative=False)
pipe = ForecastingPipeline(
    steps=[
        ("event_dynamics", WindowSummarizer(n_jobs=-1, **kwargs, target_cols=["event_type_1","event_type_2"])),
        ("snap_dynamics", WindowSummarizer(n_jobs=-1, target_cols=["snap"])),
        ("daily_season", DateTimeFeatures(ts_freq="D")),
        ("forecaster", forecaster),
    ]
)

#display(X_train)
pipe_return = pipe.fit(y_train, X_train)
y_pred1 = pipe_return.predict(fh=1, X=X_test)
display(y_pred1)



Unnamed: 0_level_0,Unnamed: 1_level_0,y
instances,timepoints,Unnamed: 2_level_1
1,2016-03-15,756.67
2,2016-03-15,1901.15


### Road ahead

The major steps needed for global forecasting have been implemented, but a key challenge remains. 

<img src="./img/road_ahead.PNG" width="1000" alt="road ahead">

Due to the complexity of time series feature generation, **cross validation** for tree based models is still a largely manual process that requires expertise and often trial and error.

We intend to integrate the following global forecasting features into the next SKTIME releases:
* Automated tuning strategy:
    * complexity argument taking into account the frequency / length of time series and internal heuristic to generate appropriate features
    * autodetect appropriate features based on the autocorrelation function and seasonality detection
    * Using feature importances to identify relevant features


* Strategies for extrapolation 
    * Trend removal and addition
    * Postprocessing based on other models that recognize trends
    * Multivariate trend removal to detect / remove common trends

* Quantile Regression 
    * Tree based models were the winning solution in M5 not only for point forecasts, but also for quantile regression

## Contributions welcome!


---

### Hierarchical reconciliation

forecast reconciliation = ensuring that linear hierarchy dependencies are met,\
e.g., "sum of individual shop sales in Berlin must equal sum of total sales in Berlin"\
requires hierarchical (or panel) data, usually involves totals

sktime provides functionality for reconciliation:

* data container convention for node-wise aggregates
* functionality to compute node-wise aggregates - `Aggregator`
* transformer implementing reconciliation logic - `Reconciler`

#### The node-wise aggregate data format

`sktime` uses a special case of the `pd_multiindex_hier` format to store node-wise aggregates:

* a `__total` index element in an instance (non-time-like) level indicates summation over all instances below that level
* the `__total` index element is reserved and cannot be used for anything else
* entries below a `__total` index element are sums of entries over all other instances in the same levels where a `__total` element is found

example:

In [None]:
from utils import load_hier_total_example

load_hier_total_example()
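
For readers without access to the local `utils` module, a container following the `__total` convention can be assembled by hand with `pandas`. This is a sketch on made-up data with illustrative level names; it mimics the format described above and is not the `Aggregator` implementation.

```python
import pandas as pd

levels = ["region", "store", "timepoints"]
idx = pd.MultiIndex.from_product(
    [["north", "south"], ["store_1", "store_2"], [0, 1]], names=levels
)
y_toy = pd.DataFrame({"sales": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]}, index=idx)

# totals per region: sum over stores and label the store level "__total"
region_totals = y_toy.groupby(level=["region", "timepoints"]).sum()
region_totals["store"] = "__total"
region_totals = region_totals.set_index("store", append=True).reorder_levels(levels)

# grand total: sum over all instances and label both instance levels "__total"
grand_total = y_toy.groupby(level="timepoints").sum()
grand_total["region"] = "__total"
grand_total["store"] = "__total"
grand_total = grand_total.set_index(["region", "store"], append=True).reorder_levels(
    levels
)

y_agg = pd.concat([y_toy, region_totals, grand_total]).sort_index()
print(y_agg)
```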

#### The aggregation transformer

The node-wise aggregated format can be obtained by applying the `Aggregator` transformer.

In a pipeline with non-aggregate input, this allows making forecasts by totals.

In [None]:
from sktime.datatypes import get_examples

y_hier = get_examples("pd_multiindex_hier")[1]
y_hier

In [None]:
from sktime.transformations.hierarchical.aggregate import Aggregator

Aggregator().fit_transform(y_hier)

If used at the start of a pipeline, forecasts are made for node `__total`-s as well as individual instances.

Note: in general, this does not result in a reconciled forecast, i.e., forecast totals will not add up.

In [None]:
from sktime.forecasting.naive import NaiveForecaster

pipeline_to_forecast_totals = Aggregator() * NaiveForecaster()

pipeline_to_forecast_totals.fit(y_hier, fh=[1, 2])
pipeline_to_forecast_totals.predict()
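
Whether forecasts of totals add up depends on the forecaster. The toy sketch below uses a made-up "median of the history" forecaster, which is not additive, to show how forecasting each node separately can break coherence. The numbers and the forecaster are assumptions for illustration only.

```python
import numpy as np

# Two bottom-level series and their total (toy numbers).
shop_a = np.array([1.0, 9.0, 2.0])
shop_b = np.array([8.0, 1.0, 3.0])
total = shop_a + shop_b

# Hypothetical forecaster: predict the median of the history at every node.
pred_a, pred_b, pred_total = np.median(shop_a), np.median(shop_b), np.median(total)

print("sum of leaf forecasts:", pred_a + pred_b)
print("forecast of the total:", pred_total)

# bottom-up reconciliation replaces the total forecast by the sum of the leaves
pred_total_bottom_up = pred_a + pred_b
```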

If used at the end of a pipeline, forecasts are reconciled bottom-up.

That will result in a reconciled forecast, although bottom-up may not be the method of choice.

In [None]:
from sktime.forecasting.naive import NaiveForecaster

pipeline_to_forecast_totals = NaiveForecaster() * Aggregator()

pipeline_to_forecast_totals.fit(y_hier, fh=[1, 2])
pipeline_to_forecast_totals.predict()

#### Advanced reconciliation

For transformer-like reconciliation, use the `Reconciler`.
It supports advanced techniques such as OLS and WLS:

In [None]:
from sktime.transformations.hierarchical.reconcile import Reconciler

pipeline_with_reconciliation = Aggregator() * NaiveForecaster() * Reconciler(method="ols")

In [None]:
pipeline_with_reconciliation.fit(y_hier, fh=[1, 2])
pipeline_with_reconciliation.predict()
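
As background, the textbook form of OLS reconciliation projects incoherent base forecasts onto the set of coherent forecasts: with a summing matrix $S$ that maps bottom-level values to all nodes, the reconciled forecasts are $\tilde{y} = S (S^\top S)^{-1} S^\top \hat{y}$. The sketch below applies this formula to a three-node toy hierarchy with `numpy`. It illustrates the technique only and makes no claim about how `Reconciler` computes it; the node order and numbers are assumptions.

```python
import numpy as np

# Summing matrix S maps bottom-level values (shop_a, shop_b) to all nodes,
# in the order (total, shop_a, shop_b).
S = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])

# Incoherent base forecasts for (total, shop_a, shop_b): 9 is not 2 + 3.
y_hat = np.array([9.0, 2.0, 3.0])

# OLS reconciliation: regress the base forecasts on S, then map back to all nodes.
bottom, *_ = np.linalg.lstsq(S, y_hat, rcond=None)
y_tilde = S @ bottom

print(y_tilde)
```

Unlike bottom-up reconciliation, the OLS variant adjusts all nodes, so the discrepancy between the total and the leaves is spread across the hierarchy rather than resolved in favour of the leaves.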

Roadmap items:

* reconciliation of wrapper type
* reconciliation & global forecasting
* probabilistic reconciliation

---

### Credits

notebook creation: danbartl, fkiraly

hierarchical forecasting framework: ciaran-g, fkiraly\
reduction compatibility with hierarchical forecasting: danbartl\
window summarizer, reduction with transform-from-y: danbartl\
aggregation and reconciliation: ciaran-g