# sktime - A Unified Framework for Machine Learning with Time Series

Tutorial at the PyData Global 2021

Find out more at: https://github.com/alan-turing-institute/sktime

---
## Hierarchical time series

In typical business use cases, time series often come in hierarchical form.

### Data

<img src="./img/hierarchytree.png" width="1200" alt="arrow heads">

Examples include:
* Product sales in different categories (e.g. M5 time series competition)
* Tourism demand in different regions
* Balance sheet structures across cost centers / accounts

Many hierarchical time series datasets can be found here:
https://forecastingdata.org/
(`sktime` is working on a loader to easily access data in this repository.)


For literature see also:
https://otexts.com/fpp2/hierarchical.html

The above example shows a clean hierarchy tree leading from the top level strictly
downwards to lower level branches. In practice, we can also often see a more complicated 
aggregation structure where the product hierarchy and the geographic hierarchy can 
both be used together. 

<img src="./img/hierarchytree_grouped.png" width="1200" alt="arrow heads">


# How does sktime represent hierarchical data?
# General intro to sktime datatypes

`sktime` provides modules for a number of time series related learning tasks.

These modules use `sktime` specific in-memory (i.e., python workspace) representations for time series and related objects, most importantly individual time series and time series panels. `sktime`'s in-memory representations rely on `pandas` and `numpy`, with additional conventions on the `pandas` and `numpy` object.

Users of `sktime` should be aware of these representations, since presenting the data in an `sktime` compatible representation is usually the first step in using any of the `sktime` modules.

This notebook introduces the data types used in `sktime`, related functionality such as converters and validity checkers, and common workflows for loading and conversion:

**Section 1** introduces in-memory data containers used in `sktime`, with examples.

**Section 2** introduces validity checkers and conversion functionality for in-memory data containers.

**Section 3** introduces common workflows to load data from file formats

In [None]:
# import to retrieve examples
from sktime.datatypes import get_examples

### Section 1.1.1: Time series - the `"pd.DataFrame"` mtype

In the `"pd.DataFrame"` mtype, time series are represented by an in-memory container `obj: pandas.DataFrame` as follows.

* structure convention: `obj.index` must be monotonic, and one of `Int64Index`, `RangeIndex`, `DatetimeIndex`, `PeriodIndex`.
* variables: columns of `obj` correspond to different variables
* variable names: column names `obj.columns`
* time points: rows of `obj` correspond to different, distinct time points
* time index: `obj.index` is interpreted as a time index.
* capabilities: can represent multivariate series; can represent unequally spaced series

In [None]:
get_examples(mtype="pd.DataFrame", as_scitype="Series")[0]

Example of a bivariate series in `"pd.DataFrame"` representation.
This series has two variables, named `"a"` and `"b"`. Both are observed at the same four time points 0, 1, 2, 3.
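
The conventions above can also be checked by hand with plain `pandas`. The following is a minimal sketch, not `sktime`'s own validity checker; the values and the names `series`, `index_is_sorted`, `index_is_unique` are made up for illustration.

```python
import pandas as pd

# Two variables "a" and "b", observed at time points 0..3 (values are made up).
series = pd.DataFrame(
    {"a": [1.0, 4.0, 0.5, -3.0], "b": [3.0, 7.0, 2.0, -0.4]},
    index=pd.RangeIndex(4),
)

# Conventions from the list above, checked with plain pandas.
index_is_sorted = series.index.is_monotonic_increasing  # monotonic time index
index_is_unique = series.index.is_unique  # distinct time points
n_variables = series.shape[1]  # one column per variable
```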

### Workflow

1. Model specification
2. Fitting
3. Prediction
4. Evaluation

In [None]:
get_examples(mtype="pd.DataFrame", as_scitype="Series")[1]

### Section 1.2: Time series panels - the `"Panel"` scitype

The major representations of time series panels in `sktime` are:

* `"pd-multiindex"` - a `pandas.DataFrame`, with row multi-index (`instances`, `timepoints`), cols = variables
* `"numpy3D"` - a 3D `np.ndarray`, with axis 0 = instances, axis 1 = variables, axis 2 = time points
* `"df-list"` - a `list` of `pandas.DataFrame`, with list index = instances, data frame rows = time points, data frame cols = variables

These representations are considered primary representations in `sktime` and are core to internal computations.
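
To make the axis conventions concrete, here is a hand-written sketch that turns a `"numpy3D"`-style array into a `"pd-multiindex"`-style data frame using only `numpy` and `pandas`. It is an illustration of the layouts described above, not the conversion code that `sktime` uses internally; the array contents are arbitrary.

```python
import numpy as np
import pandas as pd

n_instances, n_variables, n_timepoints = 3, 2, 4
# numpy3D layout: axis 0 = instances, axis 1 = variables, axis 2 = time points
X_3d = np.arange(n_instances * n_variables * n_timepoints).reshape(
    n_instances, n_variables, n_timepoints
)

# Move time to axis 1, so that each row of the flattened array is one
# (instance, time point) pair and each column is one variable.
values = X_3d.transpose(0, 2, 1).reshape(-1, n_variables)
index = pd.MultiIndex.from_product(
    [range(n_instances), range(n_timepoints)], names=["instances", "timepoints"]
)
X_multiindex = pd.DataFrame(values, index=index, columns=["var_0", "var_1"])
```

The entry for instance 1, variable `"var_1"`, time point 2 is the same in both containers: `X_3d[1, 1, 2]` and `X_multiindex.loc[(1, 2), "var_1"]`.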

There are further, minor representations of time series panels in `sktime`:

* `"nested_univ"` - a `pandas.DataFrame`, with `pandas.Series` in cells. data frame rows = instances, data frame cols = variables, and series axis = time points
* `"numpyflat"` - a 2D `np.ndarray` with rows = instances, and columns indexed by a pair index of (variables, time points). This format is only being converted to and cannot be converted from (since number of variables and time points may be ambiguous).
* `"pd-wide"` - a `pandas.DataFrame` in wide format: has column multi-index (variables, time points), rows = instances; the "variables" index can be omitted for univariate time series
* `"pd-long"` - a `pandas.DataFrame` in long format: has cols `instances`, `timepoints`, `variable`, `value`; entries in `value` are indexed by tuples of values in (`instances`, `timepoints`, `variable`).

The minor representations are currently not fully consolidated in-code and are not discussed further below. Contributions are appreciated.

### Section 1.2.1: Time series panels - the `"pd-multiindex"` mtype

In the `"pd-multiindex"` mtype, time series panels are represented by an in-memory container `obj: pandas.DataFrame` as follows.

* structure convention: `obj.index` must be a pair multi-index of type `(RangeIndex, t)`, where `t` is one of `Int64Index`, `RangeIndex`, `DatetimeIndex`, `PeriodIndex` and monotonic. `obj.index` must have names `("instances", "timepoints")`.
* instances: rows with the same `"instances"` index correspond to the same instance; rows with different `"instances"` index correspond to different instances.
* instance index: the first element of pairs in `obj.index` is interpreted as an instance index. 
* variables: columns of `obj` correspond to different variables
* variable names: column names `obj.columns`
* time points: rows of `obj` with the same `"timepoints"` index correspond to the same time point; rows of `obj` with different `"timepoints"` index correspond to different time points.
* time index: the second element of pairs in `obj.index` is interpreted as a time index. 
* capabilities: can represent panels of multivariate series; can represent unequally spaced series; can represent panels of unequally supported series; cannot represent panels of series with different sets of variables.

Example of a panel of multivariate series in `"pd-multiindex"` mtype representation.
The panel contains three multivariate series, with instance indices 0, 1, 2. All series have two variables with names `"var_0"`, `"var_1"`. All series are observed at three time points 0, 1, 2.

In [None]:
get_examples(mtype="pd-multiindex", as_scitype="Panel")[0]

### Section 1.2.2: Hierarchical types

In the `"pd_multiindex_hier"` mtype, hierarchical time series are represented by an in-memory container `obj: pandas.DataFrame` as follows.

* structure convention: `obj.index` must be a multi-index of type `(RangeIndex, ..., RangeIndex, t)`, where `t` is one of `Int64Index`, `RangeIndex`, `DatetimeIndex`, `PeriodIndex` and monotonic. The index levels must be named, and the last level must be named `"timepoints"`.
* hierarchy levels: rows with the same values in all non-time index levels belong to the same series; rows that differ in any non-time index level belong to different series.
* hierarchy index: all levels of `obj.index` except the last are interpreted as identifying a node in the hierarchy.
* variables: columns of `obj` correspond to different variables
* variable names: column names `obj.columns`
* time points: rows of `obj` with the same `"timepoints"` index correspond to the same time point; rows of `obj` with different `"timepoints"` index correspond to different time points.
* time index: the last level of `obj.index` is interpreted as a time index.
* capabilities: can represent hierarchical panels of multivariate series; can represent unequally spaced series; can represent hierarchies of unequally supported series; cannot represent hierarchies of series with different sets of variables.

In [None]:
"""Example generation for testing.
Example of two-dimensional hierarchy
"""

import pandas as pd

###
# example 0: multivariate, equally sampled

cols = ["instances_0", "instances_1", "timepoints"] + [f"var_{i}" for i in range(2)]

Xlist = [
    pd.DataFrame(
        [["a", 0, 0, 1, 4], ["a", 0, 1, 2, 5], ["a", 0, 2, 3, 6]], columns=cols
    ),
    pd.DataFrame(
        [["a", 1, 0, 1, 4], ["a", 1, 1, 2, 55], ["a", 1, 2, 3, 6]], columns=cols
    ),
    pd.DataFrame(
        [["a", 2, 0, 1, 42], ["a", 2, 1, 2, 5], ["a", 2, 2, 3, 6]], columns=cols
    ),
    pd.DataFrame(
        [["b", 0, 0, 1, 4], ["b", 0, 1, 2, 5], ["b", 0, 2, 3, 6]], columns=cols
    ),
    pd.DataFrame(
        [["b", 1, 0, 1, 4], ["b", 1, 1, 2, 55], ["b", 1, 2, 3, 6]], columns=cols
    ),
    pd.DataFrame(
        [["b", 2, 0, 1, 42], ["b", 2, 1, 2, 5], ["b", 2, 2, 3, 6]], columns=cols
    ),
]

X = pd.concat(Xlist)
X = X.set_index(["instances_0", "instances_1", "timepoints"])


In [None]:
X

In [None]:
X.index.get_level_values(level=-1)
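
Since the last index level is time and all other levels identify a node, the individual series can be pulled out with a plain `pandas` group-by. This is a small sketch on a stand-in frame `X_demo` (made-up values, same index names as `X` above), not an `sktime` utility.

```python
import pandas as pd

# Small stand-in for X above: two hierarchy levels plus a time level.
index = pd.MultiIndex.from_product(
    [["a", "b"], [0, 1, 2], [0, 1, 2]],
    names=["instances_0", "instances_1", "timepoints"],
)
X_demo = pd.DataFrame(
    {"var_0": list(range(18)), "var_1": list(range(18, 36))}, index=index
)

# All levels except the last identify a node; the last level is time.
hierarchy_levels = list(X_demo.index.names[:-1])

# Each group is one multivariate time series at a bottom-level node,
# indexed by time only after dropping the hierarchy levels.
series_by_node = {
    node: frame.droplevel(hierarchy_levels)
    for node, frame in X_demo.groupby(level=hierarchy_levels)
}
```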

### Hierarchical forecasting
* data in hierarchical format can be forecast using any of the panel mtypes
* forecasts generated this way are independent of each other
* if we add up these forecasts, we get consistent forecasts that sum up across dimensions

In [None]:
from sktime.datatypes import check_is_mtype, convert
from sktime.forecasting.arima import ARIMA
from sktime.utils._testing.hierarchical import _make_hierarchical
from sktime.utils._testing.panel import _make_panel_X
n_instances = 10
PANEL_MTYPES = ["pd-multiindex", "nested_univ", "numpy3D"]
HIER_MTYPES = ["pd_multiindex_hier"]

y = _make_panel_X(n_instances=n_instances, random_state=42)
y = convert(y, from_type="nested_univ", to_type=PANEL_MTYPES[0])

y_pred = ARIMA().fit(y).predict([1, 2, 3])
valid, _, metadata = check_is_mtype(y_pred, PANEL_MTYPES[0], return_metadata=True)

### Hierarchical forecasting - challenges
* The above approach is called "bottom up reconciliation"
* There exist other reconciliation approaches 
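
The bottom-up idea can be sketched with plain `pandas`: forecasts at higher levels are obtained by summing the bottom-level forecasts. The level names (`region`, `store`) and the forecast values below are hypothetical.

```python
import pandas as pd

# Hypothetical bottom-level forecasts for two stores in each of two regions.
index = pd.MultiIndex.from_product(
    [["north", "south"], ["store_1", "store_2"], [1, 2]],
    names=["region", "store", "timepoints"],
)
y_pred_bottom = pd.Series(
    [10.0, 11.0, 20.0, 21.0, 5.0, 6.0, 7.0, 8.0], index=index
)

# Bottom-up: forecasts at higher levels are sums of the bottom-level forecasts.
y_pred_region = y_pred_bottom.groupby(level=["region", "timepoints"]).sum()
y_pred_total = y_pred_bottom.groupby(level="timepoints").sum()
```

By construction, the regional forecasts add up to the total forecast at every time point.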


### Global forecasting vs Univariate Forecasting



"Many businesses nowadays rely on large quantities of time series data making time 
series forecasting an important research area. Global forecasting models [that] are trained
 **across sets of time series** have shown huge potential in providing accurate forecasts compared
 with the univariate forecasting models that work on isolated series."

<img src="./img/flow.png" width="400" alt="arrow heads">

https://robjhyndman.com/publications/monash-forecasting-data/

Why does global forecasting matter?
* In practice, we often have time series of limited length
* Estimation is then difficult, and we cannot model complex dependencies
* Assumption of global forecasting: we can observe the identical data generating process (DGP) multiple times
* Non-identical DGPs can be fine too, as long as the degree of dissimilarity is captured by exogenous information
* With pooled series, we have much more information and can estimate more reliably and fit more complex models (caveat: unless complexity is purely driven by time dynamics)
 
As a result of these advantages, global forecasting models have been very successful in competitions, e.g.
* Rossmann Store Sales
* Walmart Sales in Stormy Weather
* M5 competition

Many business problems in practice are essentially global forecasting problems, often also reflecting hierarchical information (see above)
* Product sales in different categories (e.g. M5 time series competition)
* Balance sheet structures across cost centers / accounts
* Dynamics of pandemics observed at different points in time

Distinction to multivariate forecasting
* Multivariate forecasting focuses on modeling interdependence between time series
* Global forecasting can model interdependence, but the focus lies on enlarging the observation space

Implementation in sktime
* Multivariate forecasting models are supported in sktime, e.g., VAR ...
* Global forecasting

For the following example we will use the `"pd-multiindex"` representation of the `"Panel"` scitype discussed in Section 1.2. 

In that case, `"instances"` is the unique identifier for the individual time series, while `"timepoints"` captures the time index.

In [None]:
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline

from sktime.forecasting.compose import ForecastingPipeline, make_reduction
from sktime.forecasting.model_selection import temporal_train_test_split
from sktime.transformations.series.summarize import WindowSummarizer
from sktime.transformations.series.date import DateTimeFeatures


pd.options.mode.chained_assignment = None
pd.set_option("display.max_columns", None)

# %%
# Load M5 Data and prepare
y = pd.read_pickle("global_fc/y.pkl")
X = pd.read_pickle("global_fc/X.pkl")


The dataset is based on the M5 competition. The data features sales of products in 
different stores, different states and different product categories. 

For a detailed analysis of the competition please take a look at the paper 
"M5 accuracy competition: Results, findings, and conclusions".


https://doi.org/10.1016/j.ijforecast.2021.11.013


You can see a glimpse of the data here:

In [None]:
print(y.head())
print(X.head())


The data features time series grouped via the instances argument in the first level
of the multi-index. We will focus on modeling individual products. The hierarchical
information is provided as exogenous information.

For the M5 competition, many approaches did not use hierarchical reconciliation. They
did, however, use exogenous features like `"dept_id"`, `"store_id"` etc. to capture
similarities and dissimilarities of the products.

We can split the data into train and test sets using `temporal_train_test_split`. `sktime` supports
splitting panel time series data by instance, i.e., every group is cut individually. Splitting
is as simple as this:


In [None]:
y_train, y_test, X_train, X_test = temporal_train_test_split(y, X)
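
To illustrate what "cutting every group individually" means, here is a plain `pandas` sketch that holds out the last time points of each instance. The helper `split_each_instance` and the data are hypothetical and only illustrate the idea; this is not how `temporal_train_test_split` is implemented. The sketch assumes the rows of each instance are sorted by time.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["item_a", "item_b"], range(8)], names=["instances", "timepoints"]
)
y_demo = pd.Series(range(16), index=index, dtype="float64")


def split_each_instance(y, test_size):
    """Hold out the last ``test_size`` time points of every instance."""
    # Position of each row counted from the end of its own instance.
    from_end = y.groupby(level="instances").cumcount(ascending=False)
    is_test = from_end < test_size
    return y[~is_test], y[is_test]


y_demo_train, y_demo_test = split_each_instance(y_demo, test_size=2)
```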

In univariate forecasting, tree-based models often do not perform well, but they have been shown to be great candidates for global forecasting.
In the univariate case, there is often not enough data to reliably fit the large number of parameters in tree-based models.
This is typically not a problem in global forecasting, so the advantages of those models, namely being able to exploit non-linear relationships and dependence between and within time series and covariates, can begin to shine.

`sktime` supports all major Python implementations of tree-based models. In this example we will use a Random Forest Regressor.

In [None]:
regressor = make_pipeline(
    RandomForestRegressor(random_state=1),
)

However, one big caveat with tree-based models is that they are not time series models per se, so out of the box they have no notion of autocorrelation or other time series concepts. We therefore need to generate appropriate features that capture the dynamics of the time series. In sktime we have the transformer `"WindowSummarizer"` to capture that time dependence.

The `"WindowSummarizer"` transforms input series to features based on a provided dictionary of summarization functions, window shifts (lags)
and window lengths.

        The summarization function is applied to the window consisting of * and
        potentially z.

        For `window = [1, 3]`, we have a `lag` of 1 and
        `window_length` of 3 to target the three last days (exclusive z) that were
        observed. Summarization is done across windows like this:
        |---------------------------|
        | x x x x x x x x * * * z x |
        |---------------------------|

See the following example for an application of `"WindowSummarizer"`:

In [None]:
import pandas as pd
from sktime.transformations.series.summarize import WindowSummarizer
from sktime.datasets import load_airline, load_longley
from sktime.forecasting.naive import NaiveForecaster
from sktime.forecasting.base import ForecastingHorizon
from sktime.forecasting.compose import ForecastingPipeline
from sktime.forecasting.model_selection import temporal_train_test_split
y = load_airline()
kwargs = {
    "lag_feature": {
        "lag": [1],
        "mean": [[1, 3], [3, 6]],
        "std": [[1, 4]],
    }
}

transformer = WindowSummarizer(**kwargs)
y_transformed = transformer.fit_transform(y)

display(y_transformed.head(10))

By default, `"WindowSummarizer"` uses pandas rolling window functions to allow for a speedy generation of features. The following function names can be passed as strings:
* "sum"
* "mean"
* "median"
* "std"
* "var"
* "kurt"
* "min"
* "max"
* "corr"
* "cov"
* "skew"
* "sem"

These functions are typically very fast since they are optimized for rolling, grouped operations. 
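
The `[lag, window_length]` convention can be emulated in plain `pandas` by shifting first and then applying a rolling function. The sketch below only illustrates the window logic described above on a made-up series; it makes no claim about how `"WindowSummarizer"` handles edge cases internally.

```python
import pandas as pd

y_demo = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], name="y")

# window = [1, 3]: lag of 1, window length of 3.
# Shifting by the lag first keeps the current observation out of the window.
mean_1_3 = y_demo.shift(1).rolling(window=3).mean()

# A plain lag of 1 is simply the series shifted by one period.
lag_1 = y_demo.shift(1)
```

The first three entries of `mean_1_3` are missing, because a full window of three lagged observations is only available from the fourth time point onwards.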

In the M5 competition, arguably the most relevant features were:

* rolling mean calculations to capture level shifts, e.g. last week sales, sales of the week prior to the last month etc.
* rolling standard deviation to capture increases / decreases in volatility in sales, and how it impacts future sales
* rolling skewness / kurtosis calculations, to capture changes in store sales tendencies.
* various different calculations to capture periods of zero sales (e.g. out of stock scenarios)

Only the first three calculations can be implemented using native pandas functions. You can, however, also provide any other function to `WindowSummarizer`, either written by you or taken from an external package. In this example, we provide the function `count_gt130`, which counts how many observations lie above the threshold of 130. Following the `[lag, window_length]` convention described above, `[[3, 2]]` applies it to a window of length 2, lagged by 3 periods.

In [None]:
import numpy as np
def count_gt130(x):
    """Count how many observations lie above threshold 130."""
    return np.sum((x > 130)[::-1])

y = load_airline()
kwargs = {
    "lag_feature": {
        "lag": [1],
        count_gt130: [[3, 2]],
        "std": [[1, 4]],
    }
}

transformer = WindowSummarizer(**kwargs)
y_transformed = transformer.fit_transform(y)

display(y_transformed.head(10))
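
The same pattern covers the zero-sales features mentioned above. As a sketch in plain `pandas`, a custom function can be applied per window with `rolling(...).apply`; the helper `count_zero` and the sales figures are hypothetical.

```python
import numpy as np
import pandas as pd


def count_zero(x):
    """Count observations equal to zero in one window (hypothetical helper)."""
    return np.sum(x == 0)


sales = pd.Series([3, 0, 0, 2, 0, 1, 0, 0], dtype="float64")

# Lag of 1, window length of 3; raw=True passes each window as a NumPy array.
zeros_last_3 = sales.shift(1).rolling(window=3).apply(count_zero, raw=True)
```

A function of this shape (array in, scalar out) is the kind of callable that could be used as a dictionary key in place of `count_gt130` in the cell above.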

Other arguments you can provide to `"WindowSummarizer"` are:

    n_jobs : int, optional (default=-1)
        The number of jobs to run in parallel for applying the window functions.
        ``-1`` means using all processors.
    target_cols: list of str, optional (default = None)
        Specifies which columns in X to target for applying the window functions.
        ``None`` will target the first column

In a global forecasting setting, you typically want to target primarily `y`. However, sometimes you also want to apply `"WindowSummarizer"` to some columns in `X`. Use
`"target_cols"` to specify which columns and apply `"WindowSummarizer"` within a `"ForecastingPipeline"`.

In the M5 competition, lagging of exogenous features was especially useful for lags around holiday dummies (often sales are affected for a few days before and after major holidays) as well as changes in item prices (discounts as well as persistent price changes).
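
As a rough illustration of such features, the sketch below builds a lagged and a leading holiday dummy and a relative price change with plain `pandas`. The column names `holiday` and `price` and all values are made up; this is not the feature set used by any M5 entry.

```python
import pandas as pd

X_demo = pd.DataFrame(
    {"holiday": [0, 0, 1, 0, 0], "price": [2.0, 2.0, 2.0, 1.5, 1.5]},
    index=pd.RangeIndex(5, name="timepoints"),
)

features = pd.DataFrame(index=X_demo.index)
# Day after a holiday: lag the dummy by one period.
features["holiday_lag_1"] = X_demo["holiday"].shift(1, fill_value=0)
# Day before a holiday: holidays are known in advance, so a lead is legitimate.
features["holiday_lead_1"] = X_demo["holiday"].shift(-1, fill_value=0)
# Relative price change versus the previous period (0 where no predecessor).
price_ratio = X_demo["price"] / X_demo["price"].shift(1)
features["price_change"] = (price_ratio - 1.0).fillna(0.0)
```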

In [None]:
y, X = load_longley()
y_train, y_test, X_train, X_test = temporal_train_test_split(y, X)
fh = ForecastingHorizon(X_test.index, is_relative=False)
# Example transforming only X
pipe = ForecastingPipeline(
    steps=[
        ("a", WindowSummarizer(n_jobs=-1, target_cols=["POP", "GNPDEFL"])),
        ("b", WindowSummarizer(n_jobs=-1, target_cols=["GNP"], **kwargs)),
        ("forecaster", NaiveForecaster(strategy="drift")),
    ]
)
pipe_return = pipe.fit(y_train, X_train)
y_pred1 = pipe_return.predict(fh=fh, X=X_test)

Global forecasting models are often computationally expensive and resource intensive. To address that, we have implemented an en-bloc approach in sktime that computes
the relevant features directly and in parallel. You can use that functionality by passing the `WindowSummarizer` as a transformer to our `make_reduction` function.

In [None]:
# make_reduction intro

In [None]:
#DateTimeFeatures explanation

In [None]:
forecaster = make_reduction(
    regressor,
    scitype="tabular-regressor",
    transformers=[WindowSummarizer(**kwargs, n_jobs=1)],
    window_length=None,
    strategy="recursive",
)


You can then use this forecaster just like you would for univariate forecasting.

In [None]:
forecaster.fit(y_train, fh=[1, 2])
y_pred2 = forecaster.predict(fh=[1, 2, 12])
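
The idea behind the `"recursive"` strategy can be sketched without `sktime`: fit a tabular regressor on lagged values, then feed each one-step prediction back in as an input for the next step. The sketch below swaps in ordinary least squares (without intercept) for the random forest to stay dependency-light; the helpers `make_lag_table` and `recursive_forecast` are hypothetical and not part of `sktime`.

```python
import numpy as np


def make_lag_table(y, window_length):
    """Turn a series into (features, target) rows of lagged values."""
    X = np.array([y[i : i + window_length] for i in range(len(y) - window_length)])
    target = y[window_length:]
    return X, target


def recursive_forecast(y, window_length, steps):
    """Fit a linear model on lags, then feed predictions back as inputs."""
    X, target = make_lag_table(y, window_length)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    history = list(y)
    preds = []
    for _ in range(steps):
        x_new = np.array(history[-window_length:])
        y_next = float(x_new @ coef)
        preds.append(y_next)
        history.append(y_next)  # the prediction becomes a lag for the next step
    return np.array(preds)


y_demo = np.arange(1.0, 11.0)  # 1, 2, ..., 10: a noise-free linear trend
y_demo_pred = recursive_forecast(y_demo, window_length=2, steps=3)
```

On this noise-free trend, the fitted model continues the series with values close to 11, 12 and 13.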

In [None]:
### Forecasting Pipeline
# How to chain
# Why does it make sense

In [None]:
# fh = ForecastingHorizon(X_test.index, is_relative=False)
pipe = ForecastingPipeline(
    steps=[
        ("a", WindowSummarizer(n_jobs=-1, target_cols=["event_type_1", "snap"])),
        ("forecaster", forecaster),
    ]
)


pipe_return = pipe.fit(y_train, X_train)
y_pred1 = pipe_return.predict(fh=1, X=X_test)


---

### Hierarchical reconciliation

forecast reconciliation = ensuring that linear hierarchy dependencies are met,\
e.g., "sum of individual shop sales in Berlin must equal total sales in Berlin"\
requires hierarchical (or panel) data, usually involves totals
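
A quick way to see whether a set of forecasts needs reconciliation is to compare the forecast total with the sum of the forecasts below it. The shop names and numbers in this sketch are made up.

```python
import pandas as pd

# Hypothetical forecasts for two Berlin shops and for the Berlin total.
shop_forecasts = pd.DataFrame(
    {"shop_1": [100.0, 110.0], "shop_2": [50.0, 55.0]}, index=[1, 2]
)
total_forecast = pd.Series([155.0, 165.0], index=[1, 2])

# Gap between the forecast total and the sum of the shop forecasts.
gap = total_forecast - shop_forecasts.sum(axis=1)
is_coherent = bool((gap.abs() < 1e-9).all())
```

Here the forecasts for horizon 1 are off by 5, so the set of forecasts is not coherent and would need reconciliation.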

sktime provides functionality for reconciliation:

* data container convention for node-wise aggregates
* functionality to compute node-wise aggregates - `Aggregator`
* transformer implementing reconciliation logic - `Reconciler`

#### The node-wise aggregate data format

`sktime` uses a special case of the `pd_multiindex_hier` format to store node-wise aggregates:

* a `__total` index element in an instance (non-time-like) level indicates summation over all instances below that level
* the `__total` index element is reserved and cannot be used for anything else
* entries below a `__total` index element are sums of entries over all other instances in the same levels where a `__total` element is found

example:

In [None]:
from utils import load_hier_total_example

load_hier_total_example()
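
To show how the `__total` convention works, the sketch below builds node-wise aggregates by hand with `pandas`. The level names and values are made up, and the result is only meant to mirror the convention described above; details such as row ordering in the `Aggregator` output are not claimed.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["north", "south"], ["store_1", "store_2"], [0, 1]],
    names=["region", "store", "timepoints"],
)
y_bottom = pd.DataFrame(
    {"sales": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]}, index=index
)
levels = ["region", "store", "timepoints"]

# Totals per region: sum over stores, stored under store == "__total".
region_totals = y_bottom.groupby(level=["region", "timepoints"]).sum()
region_totals["store"] = "__total"
region_totals = region_totals.set_index("store", append=True).reorder_levels(levels)

# Grand total: sum over everything, "__total" in every non-time level.
grand_total = y_bottom.groupby(level="timepoints").sum()
grand_total["region"] = "__total"
grand_total["store"] = "__total"
grand_total = grand_total.set_index(["region", "store"], append=True)
grand_total = grand_total.reorder_levels(levels)

y_with_totals = pd.concat([y_bottom, region_totals, grand_total]).sort_index()
```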

#### The aggregation transformer

The node-wise aggregated format can be obtained by applying the `Aggregator` transformer.

In a pipeline with non-aggregate input, this allows making forecasts by totals.

In [None]:
from sktime.datatypes import get_examples

y_hier = get_examples("pd_multiindex_hier")[1]
y_hier

In [None]:
from sktime.transformations.hierarchical.aggregate import Aggregator

Aggregator().fit_transform(y_hier)

If used at the start of a pipeline, forecasts are made for node `__total`-s as well as individual instances.

Note: in general, this does not result in a reconciled forecast, i.e., forecast totals will not add up.

In [None]:
from sktime.forecasting.naive import NaiveForecaster

pipeline_to_forecast_totals = Aggregator() * NaiveForecaster()

pipeline_to_forecast_totals.fit(y_hier, fh=[1, 2])
pipeline_to_forecast_totals.predict()

If used at the end of a pipeline, forecasts are reconciled bottom-up.

That will result in a reconciled forecast, although bottom-up may not be the method of choice.

In [None]:
from sktime.forecasting.naive import NaiveForecaster

pipeline_to_forecast_totals = NaiveForecaster() * Aggregator()

pipeline_to_forecast_totals.fit(y_hier, fh=[1, 2])
pipeline_to_forecast_totals.predict()

In [None]:
pipeline_to_forecast_totals

#### Advanced reconciliation

For transformer-like reconciliation, use the `Reconciler`.
It supports advanced techniques such as OLS and WLS:

In [None]:
from sktime.transformations.hierarchical.reconcile import Reconciler

pipeline_with_reconciliation = Aggregator() * NaiveForecaster() * Reconciler(method="ols")

In [None]:
pipeline_with_reconciliation.fit(y_hier, fh=[1, 2])
pipeline_with_reconciliation.predict()
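
For intuition, the textbook OLS reconciliation step can be written in a few lines of `numpy`: base forecasts are projected onto the space of coherent forecasts spanned by the summing matrix `S`. This is a sketch of the standard formula on a toy hierarchy with made-up numbers; it is not claimed to be what `Reconciler(method="ols")` computes internally.

```python
import numpy as np

# Summing matrix S for a hierarchy with one total and two bottom series:
# rows are [total, bottom_1, bottom_2], columns are the bottom series.
S = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])

# Incoherent base forecasts: 10 + 20 != 33.
y_hat = np.array([33.0, 10.0, 20.0])

# OLS reconciliation: project y_hat onto the column space of S.
P = S @ np.linalg.inv(S.T @ S) @ S.T
y_tilde = P @ y_hat
```

The reconciled forecasts are close to `[32, 11, 21]`: the discrepancy of 3 is spread evenly across the three nodes, and the bottom-level forecasts now add up to the total.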

Roadmap items:

* reconciliation of wrapper type
* reconciliation & global forecasting
* probabilistic reconciliation