# 0 Introduction

## 0.1 What is `sktime`?
* sktime is a Python library for time series learning tasks!
* If you are interested in a more in-depth introduction to sktime, check out a previous PyData tutorial of ours, and of course visit our website!
* We love new contributors, even if you are new to open source software development! Check out our website for tips on how to get started.
* sktime follows the design of scikit-learn (sklearn), a popular data science library. Why we like sklearn:
    * unified interface
    * modular design
    * parts are composable
    * simple specification language


To use scikit-learn-like estimators, you need to do three things:

* Instantiate your model of choice
* Fit the instance of your model
* Use that fitted instance to predict new data!
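
The three steps can be sketched with a toy estimator. `MeanForecaster` below is a hypothetical class written only for illustration; it is not part of sktime or scikit-learn and merely mimics the instantiate / fit / predict pattern.

```python
class MeanForecaster:
    """Toy estimator (hypothetical): predicts the mean of recent training data."""

    def __init__(self, window=None):
        # hyper-parameters are stored at construction, nothing is learned yet
        self.window = window

    def fit(self, y):
        # learn state from data; fitted attributes end in an underscore
        recent = y if self.window is None else y[-self.window:]
        self.mean_ = sum(recent) / len(recent)
        return self

    def predict(self, fh):
        # one prediction per step of the forecasting horizon
        return [self.mean_ for _ in fh]


toy_model = MeanForecaster(window=3)       # 1. instantiate
toy_model.fit([1.0, 2.0, 3.0, 4.0])        # 2. fit
toy_forecast = toy_model.predict([1, 2])   # 3. predict
```

Real sktime forecasters follow the same order of calls, with richer data types and horizon handling.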


## 0.2 What is forecasting?
In forecasting, past data is used to make temporal forward predictions of a time series. This is notably different from tabular prediction tasks supported by scikit-learn and similar libraries.

<img src="img/forecasting.png" width=750 />


sktime provides a common, scikit-learn-like interface to a variety of classical and ML-style forecasting algorithms, together with tools for building pipelines and composite machine learning models, including temporal tuning schemes, or reductions such as walk-forward application of scikit-learn regressors.



## 0.3 Agenda
1. Transformers and Forecasting in `sktime`
    *  Transformers
    *  Forecasting
2. Advanced Forecasting
    *  AutoML
    *  Graphical Pipelines
3. Evaluation and Benchmarking

In [None]:
import warnings

warnings.filterwarnings("ignore")

---

# 1. Transformers in `sktime`
* overview of transformer features

    * types of transformers - input types, output types
    * broadcasting/vectorization to panel, hierarchical, multivariate
    * searching for transformers using `all_estimators`

## 1.1 Wherefore transformers?

In `sktime`, "transformers" is a catch-all term for modular data processing steps.

We use this term in the `sklearn` sense, so this is unrelated to transformers in NLP or deep learning.

Suppose we want to forecast this well-known dataset
(monthly totals of international airline passengers, 1949 to 1960)

In [None]:
from sktime.datasets import load_airline
from sktime.utils.plotting import plot_series

y = load_airline()
plot_series(y)

observations:

* there is seasonal periodicity, 12 month period
* seasonal periodicity looks multiplicative (not additive) to trend

idea: forecast might be easier

* with seasonality removed
* on logarithmic value scale (multiplication becomes addition)
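
Why the logarithm helps can be checked on a few made-up numbers. The sketch below assumes a purely multiplicative model (value = trend × seasonal factor); the numbers are invented for illustration and are not taken from the airline data.

```python
import math

toy_trend = [100.0, 200.0, 400.0]   # made-up trend levels
toy_factor = [0.8, 1.0, 1.25]       # made-up multiplicative seasonal factors

# multiplicative model: value = trend * seasonal factor
toy_values = [t * s for t, s in zip(toy_trend, toy_factor)]

# on the log scale the same model is additive:
# log(value) = log(trend) + log(seasonal factor)
log_values = [math.log(v) for v in toy_values]
log_sums = [math.log(t) + math.log(s) for t, s in zip(toy_trend, toy_factor)]

assert all(math.isclose(a, b) for a, b in zip(log_values, log_sums))
```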

### 1.1.1 Manual transformations: doing things the wrong way

Maybe doing this manually step by step is a good idea?

In [None]:
import numpy as np

y_log = np.log(y)

fig, ax = plot_series(y_log, title="log(y)")

this looks additive now!

ok, what next - deseasonalization

In [None]:
from statsmodels.tsa.seasonal import seasonal_decompose

seasonal_result = seasonal_decompose(y_log, period=12)
seasonal = seasonal_result.seasonal
y_log_deseasonalised = y_log - seasonal

fig, ax = plot_series(y_log_deseasonalised, title="log(y) - seasonality")

In [None]:
# trend component of the decomposition computed above
trend = seasonal_result.trend
plot_series(trend)

In [None]:
# residual component of the decomposition computed above
resid = seasonal_result.resid
plot_series(seasonal, resid, labels=["seasonal component", "residual component"])

now:

* forecast on this
* add back seasonal component
* invert logarithm (exponentiate)

start with forecast...

In [None]:
from sktime.forecasting.trend import TrendForecaster

forecaster = TrendForecaster()

fh = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12] # Alternatively: list(range(1, 13))
y_pred = forecaster.fit_predict(y_log_deseasonalised, fh=fh)

fig, ax = plot_series(
    y_log_deseasonalised,
    y_pred,
    labels=["log(y) - seasonality", "y_pred"],
    title="Trend forecast over log(y) - seasonality",
)

looks reasonable!

Now to turn this into a forecast of the original y ...

* add seasonal
* invert the logarithm

In [None]:
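# the seasonal component repeats every 12 months; its first 12 values
# (January to December) line up with the 12 forecast months
y_pred_add_seasonality = y_pred + seasonal[0:12].values
y_pred_orig = np.exp(y_pred_add_seasonality)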

fig, ax = plot_series(y, y_pred_orig, labels=["y", "y_pred"], title="Final forecast (manual approach)")

ok, done! and it only took us 10 years.

Maybe there is a better way?

### 1.1.2 `sktime` transformers: doing things the right way


Solution: use transformers & pipelines!

Same interface at every step! Easily composable!


In [None]:
from sktime.forecasting.trend import TrendForecaster
from sktime.transformations.series.boxcox import LogTransformer
from sktime.transformations.series.detrend import Deseasonalizer

y = load_airline()

forecaster = LogTransformer() * Deseasonalizer(sp=12) * TrendForecaster()

fh = list(range(1, 13))
y_pred = forecaster.fit_predict(y, fh=fh)

fig, ax = plot_series(y, y_pred, labels=["y", "y_pred"], title="Final forecast with sktime")

what happened here?

The "chain" operator `*` creates a "forecasting pipeline"

Has the same interface as all other forecasters! No additional data fiddling!

Transformers "slot in" as standardized components.
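
Conceptually, such a pipeline transforms the series step by step, fits the forecaster on the result, and then undoes the transformations in reverse order on the forecast. The following is a minimal sketch of that idea in plain Python; the classes `Log`, `Shift` and `LastValue` and the function `fit_predict_pipeline` are hypothetical stand-ins, not sktime's implementation.

```python
import math


class Log:
    def fit_transform(self, y):
        return [math.log(v) for v in y]

    def inverse_transform(self, y):
        return [math.exp(v) for v in y]


class Shift:
    """Subtracts a constant learned during fitting (stand-in for a fitted transformer)."""

    def fit_transform(self, y):
        self.offset_ = y[0]
        return [v - self.offset_ for v in y]

    def inverse_transform(self, y):
        return [v + self.offset_ for v in y]


class LastValue:
    """Naive forecaster: repeats the last observed value."""

    def fit(self, y):
        self.last_ = y[-1]
        return self

    def predict(self, fh):
        return [self.last_ for _ in fh]


def fit_predict_pipeline(transformers, forecaster, y, fh):
    # forward pass: each transformer is fitted and applied in order
    for t in transformers:
        y = t.fit_transform(y)
    y_pred = forecaster.fit(y).predict(fh)
    # backward pass: inverse transforms are applied in reverse order
    for t in reversed(transformers):
        y_pred = t.inverse_transform(y_pred)
    return y_pred


toy_pred = fit_predict_pipeline([Log(), Shift()], LastValue(), [2.0, 4.0, 8.0], fh=[1, 2])
```

The forecast comes back on the original scale, so the caller never handles the intermediate log or shifted values.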

In [None]:
forecaster

Let's look at this in more detail:

* `sktime` transformers interface
* `sktime` pipeline building

## 1.2 Transformers - More Detailed

* transformer interface
* transformer types
* searching transformers by type
* broadcasting/vectorization to panel & hierarchical data
* transformers and pipelines

### 1.2.1 What are transformers? <a class="anchor" id="section_1_1"></a>

Transformer = modular data processing steps commonly used in machine learning

("transformer" used in the sense of `scikit-learn`)

Transformers are estimators that:

* are fitted to a batch of data via `fit(data)`, changing its state
* are applied to another batch of data via `transform(X)`, producing transformed data
* may have an `inverse_transform(X)`

In `sktime`, input `X` to `fit` and `transform` is typically a time series or a panel (collection of time series).
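
The interface can be illustrated with a toy transformer in plain Python. `MeanCenterer` is a hypothetical class for illustration only; it works on lists rather than on `sktime` data containers.

```python
class MeanCenterer:
    """Toy transformer (hypothetical): subtracts the mean seen in fit."""

    def fit(self, X):
        # state learned from the batch passed to fit
        self.mean_ = sum(X) / len(X)
        return self

    def transform(self, X):
        # uses the fitted state; X may be a different batch than in fit
        return [x - self.mean_ for x in X]

    def inverse_transform(self, X):
        # undoes transform, using the same fitted state
        return [x + self.mean_ for x in X]


centerer = MeanCenterer().fit([1.0, 2.0, 3.0])
X_new = [10.0, 20.0]
X_centered = centerer.transform(X_new)
X_restored = centerer.inverse_transform(X_centered)
```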

Basic use of an `sktime` time series transformer is as follows:

In [None]:
# 1. prepare the data
from sktime.utils._testing.series import _make_series

X = _make_series()
X_train = X[:7]
X_test = X[7:12]
# X_train and X_test are both pandas.Series

X_train, X_test

In [None]:
# 2. construct the transformer
from sktime.transformations.series.boxcox import BoxCoxTransformer

# trafo is an sktime estimator inheriting from BaseTransformer
# Box-Cox transform with lambda parameter fitted via mle
trafo = BoxCoxTransformer(method="mle")

In [None]:
# 3. fit the transformer to training data
trafo.fit(X_train)

# 4. apply the transformer to transform test data
# Box-Cox transform with lambda fitted on X_train
X_transformed = trafo.transform(X_test)

X_transformed

If the training and test sets are the same, steps 3 and 4 can be carried out more concisely (and sometimes more efficiently) by using `fit_transform`:

In [None]:
# 3+4. apply the transformer to fit and transform on the same data, X
X_transformed = trafo.fit_transform(X)

### 1.2.2 Different types of transformers <a class="anchor" id="section_1_2"></a>

`sktime` distinguishes different types of transformer, depending on the input type of `fit` and `transform`, and the output type of `transform`.

Common types of transformation in `sktime`:

| from | to | base class | examples (sci) | examples (`sktime`) |
| --- | --- | --- | --- | --- |
| time series | scalar features | `BaseTransformer` (`Primitives` output) | `tsfresh`, or 7-number-summary | `Catch22`,`SummaryTransformer` |
| time series | time series | `BaseTransformer` (`Series`, `instancewise`)  | detrending, smoothing, filtering, lagging | `Detrender`,`Differencer`, `Lag`, `Filter` |
| time series panel | also a panel | `BaseTransformer` (`Series` output)  | principal component projection | `PCATransformer`,`PaddingTransformer` |
| two feature vectors | a scalar | `BasePairwiseTransformer` | Euclidean distance, L1 distance | `ScipyDist`, `AggrDist`, `FlatDist` |
| two time series | a scalar | `BasePairwiseTransformerPanel` | DTW distance, alignment kernel | `DtwDist`, `EditDist` |

To illustrate the difference, we compare two transformers with different output:

* the Box-Cox transformer `BoxCoxTransformer`, which transforms a time series to a time series
* the summary transformer `SummaryTransformer`, which transforms a time series to scalars such as the mean
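
The difference in output type can be seen in a plain-Python sketch. The functions `box_cox` and `summarize` below are simplified, hypothetical stand-ins: `box_cox` uses a fixed lambda instead of estimating it, and `summarize` computes only three features.

```python
import math
import statistics


def box_cox(values, lam):
    """Box-Cox transform with a fixed lambda (no estimation step in this sketch)."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


def summarize(values):
    """A few scalar summary features of a series."""
    return {"mean": statistics.mean(values), "min": min(values), "max": max(values)}


toy_series = [1.0, 4.0, 9.0]
toy_transformed = box_cox(toy_series, lam=0.5)   # series -> series, same length as input
toy_features = summarize(toy_series)             # series -> fixed number of scalars
```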


In [None]:
from sktime.transformations.series.boxcox import BoxCoxTransformer
from sktime.transformations.series.summarize import SummaryTransformer

y = load_airline()

boxcox = BoxCoxTransformer()
summary = SummaryTransformer()

In [None]:
# BoxCoxTransformer() produces a pd.Series
boxcox.fit_transform(y)

In [None]:
# SummaryTransformer() produces a (set of) scalar (values)
summary.fit_transform(y)

For time series transformers, the metadata tags describe the expected output of `transform`:

In [None]:
boxcox.get_tag("scitype:transform-output")

In [None]:
summary.get_tag("scitype:transform-output")

To find transformers, use `all_estimators` and filter by tags:

* `"scitype:transform-output"` - the output scitype. `Series` for time series, `Primitives` for primitive features (float, categories), `Panel` for collections of time series.
* `"scitype:transform-input"` - the input scitype. `Series` for time series.
* `"scitype:instancewise"` - If `True`, vectorized operation per series. If `False`, uses multiple time series non-trivially.

Example: find all transformers that output time series

In [None]:
from sktime.registry import all_estimators

# subset to transformers that output time series
all_estimators(
    "transformer",
    as_dataframe=True,
    filter_tags={"scitype:transform-output": "Series"},
    suppress_import_stdout=False,
)

A more complete overview of transformer types and tags is given in the `sktime` transformers tutorial.


### 1.2.3 Broadcasting aka vectorization of transformers <a class="anchor" id="section_1_3"></a>

`sktime` transformers may be natively univariate, or apply only to a single time series.

Even if this is the case, they broadcast across variables and instances of time series, where applicable (also known as vectorization in `numpy` parlance).

This ensures that all `sktime` transformers can be applied to multivariate and multi-instance (panel, hierarchical) time series data.
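
The idea behind broadcasting can be sketched in plain Python. The nested dictionary used as a panel and the helper `broadcast` are assumptions made for illustration; `sktime` works on `pandas` containers and handles this internally.

```python
def demean(series):
    """A transformation that only knows how to handle one univariate series."""
    mean = sum(series) / len(series)
    return [v - mean for v in series]


def broadcast(func, panel):
    """Apply `func` separately to every (instance, variable) pair."""
    return {
        instance: {var: func(values) for var, values in variables.items()}
        for instance, variables in panel.items()
    }


# toy panel: 2 instances, each with 2 variables
toy_panel = {
    "store_a": {"sales": [1.0, 3.0], "visits": [10.0, 30.0]},
    "store_b": {"sales": [2.0, 6.0], "visits": [5.0, 15.0]},
}
toy_result = broadcast(demean, toy_panel)
```

The output keeps the structure of the input: one transformed series per instance and variable.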

Example 1: broadcasting/vectorization of time series to time series transformer

The `BoxCoxTransformer` from previous sections applies to single instances of univariate time series. When multiple instances or variables are seen, it broadcasts across both:

In [None]:
from sktime.transformations.series.boxcox import BoxCoxTransformer
from sktime.utils._testing.hierarchical import _make_hierarchical

# hierarchical data with 2 variables and 2 levels
X = _make_hierarchical(n_columns=2)

X

In [None]:
# constructing the transformers
boxcox_trafo = BoxCoxTransformer(method="mle")

# applying to X results in hierarchical data
boxcox_trafo.fit_transform(X)

Fitted model components of vectorized transformers can be found in the `transformers_` attribute, or accessed via the universal `get_fitted_params` interface:

In [None]:
boxcox_trafo.transformers_
# this is a pandas.DataFrame that contains the fitted transformers
# one per time series instance and variable

In [None]:
boxcox_trafo.get_fitted_params()
# this returns a dictionary
# the transformers DataFrame is available at the key "transformers"
# individual transformers are available at dataframe-like keys
# it also contains all fitted lambdas as keyed parameters

Example 2: broadcasting/vectorization of time series to scalar features transformer

The `SummaryTransformer` behaves similarly.
Multiple time series instances are transformed to different rows of the resulting data frame, and multiple variables to different columns.

In [None]:
from sktime.transformations.series.summarize import SummaryTransformer

summary_trafo = SummaryTransformer()

# this produces a pandas DataFrame with more rows and columns
# rows correspond to different instances in X
# columns are multiplied and names prefixed by [variablename]__
# there is one column per variable and transformed feature
summary_trafo.fit_transform(X)

## 1.3 Sequential Pipelines, Combining Forecasters, and Feature Engineering

`sktime` transformers can be pipelined with any other `sktime` estimator type, including forecasters, classifiers, and other transformers.

A pipeline is itself an estimator of the same type as its final component, with the same interface as a specialized class of that type.

Pipelines are built with `make_pipeline` or with the `*` dunder.

Pipelining `pipe = trafo * est` produces `pipe` of same type as `est`.

In `pipe.fit`, first `trafo.fit_transform`, then `est.fit` is executed on the result.

In `pipe.predict`, first `trafo.transform`, then `est.predict` is executed.

(the arguments that are piped differ by type and can be looked up in the docstrings of pipeline classes, or specialized tutorials)
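
The order of calls described above, and how a `*` operator can build a pipeline, can be sketched with toy classes. `ToyPipeline`, `Doubler` and `ThresholdClassifier` are hypothetical and only illustrate the mechanism; they are not how `sktime` implements it.

```python
class ToyPipeline:
    """Toy pipeline (hypothetical): one transformer followed by one estimator."""

    def __init__(self, trafo, est):
        self.trafo, self.est = trafo, est

    def fit(self, X, y):
        Xt = self.trafo.fit_transform(X)   # first trafo.fit_transform ...
        self.est.fit(Xt, y)                # ... then est.fit on the result
        return self

    def predict(self, X):
        Xt = self.trafo.transform(X)       # first trafo.transform ...
        return self.est.predict(Xt)        # ... then est.predict


class Doubler:
    def fit_transform(self, X):
        return self.transform(X)

    def transform(self, X):
        return [2 * x for x in X]

    def __mul__(self, other):
        # `trafo * est` builds a pipeline
        return ToyPipeline(self, other)


class ThresholdClassifier:
    def fit(self, X, y):
        # threshold halfway between the class means of the (transformed) feature
        pos = [x for x, label in zip(X, y) if label == 1]
        neg = [x for x, label in zip(X, y) if label == 0]
        self.threshold_ = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
        return self

    def predict(self, X):
        return [int(x > self.threshold_) for x in X]


toy_pipe = Doubler() * ThresholdClassifier()
toy_pipe.fit([1, 2, 9, 10], [0, 0, 1, 1])
toy_labels = toy_pipe.predict([3, 8])
```

The pipeline exposes `fit` and `predict` like the classifier it wraps, so it can be used wherever the classifier could.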


transformers are natural pipeline components

* data processing steps
* feature engineering steps
* post processing steps

they can be combined in a number of other ways:

* pipelining = sequential chaining
* feature union = parallel, addition of features
* feature subsetting = selecting columns
* inversion = switch transform and inverse
* multiplexing = switching between transformers
* passthrough = switch on/ off

### 1.3.1 Chaining transformers via `*`

In [None]:
from sktime.transformations.series.difference import Differencer
from sktime.transformations.series.summarize import SummaryTransformer

pipe = Differencer() * SummaryTransformer()

# this constructs a TransformerPipeline, which is also a transformer
pipe

In [None]:
from sktime.utils._testing.hierarchical import _bottom_hier_datagen

X = _bottom_hier_datagen(no_levels=1, no_bottom_nodes=2)

# this is a transformer with the same interface
# first applies differencer, then summary transform
pipe.fit_transform(X)

Chaining is also compatible with `sklearn` transformers!

By default, the `sklearn` transformer is applied to each individual time series, treated as a data frame table.

In [None]:
from sklearn.preprocessing import StandardScaler

pipe = Differencer() * StandardScaler()

pipe

In [None]:
pipe.fit_transform(X)

pipeline-adaptor chains can be constructed manually:

* `sktime.transformations.compose.TransformerPipeline`
* `sktime.transformations.series.adapt.TabularToSeriesAdaptor` for `sklearn`

composites are compatible with `get_params` / `set_params` parameter interface:

In [None]:
pipe.get_params()

### 1.3.2 Feature union via `+`

In [None]:
from sktime.transformations.series.difference import Differencer
from sktime.transformations.series.lag import Lag

pipe = Differencer() + Lag()

# this constructs a FeatureUnion, which is also a transformer
pipe

In [None]:
from sktime.utils._testing.hierarchical import _bottom_hier_datagen

X = _bottom_hier_datagen(no_levels=1, no_bottom_nodes=2)

# applies both Differencer and Lag, returns transformed in different columns
pipe.fit_transform(X)

to retain the original columns, use the `Id` transformer:

In [None]:
from sktime.transformations.compose import Id
from sktime.transformations.series.difference import Differencer
from sktime.transformations.series.lag import Lag

pipe = Id() + Differencer() + Lag([1, 2], index_out="original")

pipe.fit_transform(X)

In [None]:
# parameter inspection
pipe.get_params()

### 1.3.3 Subset input columns via `[colname]`

let's say we want to apply `Differencer` to column 0, and `Lag` to column 1

also we keep the original columns for illustration

In [None]:
from sktime.utils._testing.hierarchical import _make_hierarchical

X = _make_hierarchical(
    hierarchy_levels=(2, 2), n_columns=2, min_timepoints=3, max_timepoints=3
)

X

In [None]:
from sktime.transformations.compose import Id
from sktime.transformations.series.difference import Differencer
from sktime.transformations.series.lag import Lag

pipe = Id() + Differencer()["c0"] + Lag([1, 2], index_out="original")["c1"]

pipe.fit_transform(X)

auto-generated names can be replaced by using `FeatureUnion` explicitly:

In [None]:
from sktime.transformations.compose import FeatureUnion

pipe = FeatureUnion(
    [
        ("original", Id()),
        ("diff", Differencer()["c0"]),
        ("lag", Lag([1, 2], index_out="original")),
    ]
)

pipe.fit_transform(X)

See the AutoML material later in this tutorial for how to use this with tuning for full structural AutoML!

### 1.3.4 Combining Transformers And Estimators (Example: forecaster pipeline)

we have seen this example above

In [None]:
from sktime.forecasting.trend import PolynomialTrendForecaster
from sktime.transformations.series.boxcox import LogTransformer
from sktime.transformations.series.detrend import Deseasonalizer

y = load_airline()

pipe = LogTransformer() * Deseasonalizer(sp=12) * PolynomialTrendForecaster(degree=2)

pipe

### 1.3.5 ColumnEnsembleTransformer and ColumnEnsembleForecaster

### 1.3.6 Forecasting Exogenous Variables



In [None]:
from sktime.datasets import load_longley
from sktime.forecasting.model_selection import temporal_train_test_split
from sktime.forecasting.base import ForecastingHorizon
from sktime.utils.plotting import plot_series

y, X = load_longley()
y_train, y_test, X_train, X_test = temporal_train_test_split(y=y, X=X, test_size=6)
fh = ForecastingHorizon(y_test.index, is_relative=False)

In [None]:
plot_series(y_train, y_test, labels=["y_train", "y_test"]);


In [None]:
X.head()


In [None]:
from sktime.forecasting.arima import AutoARIMA
from sktime.forecasting.compose import ForecastX
from sktime.forecasting.var import VAR

forecaster_X = ForecastX(
    forecaster_y=AutoARIMA(sp=1, suppress_warnings=True),
    forecaster_X=VAR(),
)
# fit on the training data only, so that the test period is a genuine forecast
forecaster_X.fit(y=y_train, X=X_train, fh=fh)
# now in predict() we don't need to pass X
y_pred = forecaster_X.predict(fh=fh)

In [None]:
# back to the forecaster pipeline `pipe` from section 1.3.4, on the airline data
# this is a forecaster with the same interface as PolynomialTrendForecaster
y = load_airline()
pipe.fit(y, fh=[1, 2, 3])
y_pred = pipe.predict()

plot_series(y, y_pred)

## 1.4 Tuning
* A pipeline has many hyperparameters. We want to optimise them:

### 1.4.1 Temporal Cross Validation

In `sktime`, three types of temporal cross-validation splitters are available:
- `SingleWindowSplitter`, which is equivalent to a single train-test split
- `SlidingWindowSplitter`, which uses a rolling window approach and "forgets" the oldest observations as we move further into the future
- `ExpandingWindowSplitter`, which uses an expanding window approach and keeps all observations in the training set as we move further into the future
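
The difference between the sliding and expanding schemes can be sketched on integer positions. The two generator functions below are simplified, hypothetical versions written for illustration; the `sktime` splitters offer more options and may treat edge cases differently.

```python
def sliding_windows(n, window_length, fh, step_length=1):
    """Yield (train, test) index lists with a fixed-length training window."""
    start = 0
    while start + window_length + max(fh) <= n:
        end = start + window_length
        train = list(range(start, end))
        test = [end - 1 + h for h in fh]   # horizon steps counted from the last training point
        yield train, test
        start += step_length


def expanding_windows(n, initial_window, fh, step_length=1):
    """Yield (train, test) index lists with a training window that only grows."""
    end = initial_window
    while end + max(fh) <= n:
        train = list(range(0, end))
        test = [end - 1 + h for h in fh]
        yield train, test
        end += step_length


toy_sliding = list(sliding_windows(n=6, window_length=3, fh=[1, 2]))
toy_expanding = list(expanding_windows(n=6, initial_window=3, fh=[1, 2]))
```

In both schemes the test indices are the same for each fold; only the training window differs, dropping old points in the sliding case and keeping them in the expanding case.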


In [None]:
from sktime.datasets import load_shampoo_sales

y = load_shampoo_sales()
y_train, y_test = temporal_train_test_split(y=y, test_size=6)
plot_series(y_train, y_test, labels=["y_train", "y_test"]);

In [None]:
from sktime.forecasting.base import ForecastingHorizon
from sktime.forecasting.model_selection import (
    ExpandingWindowSplitter,
    SlidingWindowSplitter,
    SingleWindowSplitter,
)
from sktime.utils.plotting import plot_windows

fh = ForecastingHorizon(y_test.index, is_relative=False).to_relative(
    cutoff=y_train.index[-1]
)

In [None]:
cv = SingleWindowSplitter(fh=fh, window_length=len(y_train) - 6)
plot_windows(cv=cv, y=y_train)

In [None]:
cv = SlidingWindowSplitter(fh=fh, window_length=12, step_length=1)
plot_windows(cv=cv, y=y_train)

In [None]:
cv = ExpandingWindowSplitter(fh=fh, initial_window=12, step_length=1)
plot_windows(cv=cv, y=y_train)

In [None]:
# get number of total splits (folds)
cv.get_n_splits(y=y_train)

### 1.4.2 Grid Search

For tuning parameters with compositions such as pipelines, we can use the \<estimator\>__\<parameter\> syntax known from [scikit-learn](https://scikit-learn.org/stable/modules/grid_search.html#composite-estimators-and-parameter-spaces). For multiple levels of nesting, we can use the same syntax with two underscores, e.g. `forecaster__transformer__parameter`.
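
How a double-underscore name addresses a nested parameter can be sketched with plain objects. `Node` and `set_nested_param` are hypothetical helpers for illustration; in practice `set_params` on the estimator does this resolution.

```python
class Node:
    """Toy estimator (hypothetical) whose parameters are plain attributes."""

    def __init__(self, **params):
        self.__dict__.update(params)


def set_nested_param(obj, name, value):
    """Resolve 'a__b__c' by walking attributes a, then b, and setting c."""
    *path, leaf = name.split("__")
    for part in path:
        obj = getattr(obj, part)
    setattr(obj, leaf, value)


toy_composite = Node(
    deseasonalizer=Node(model="additive"),
    power=Node(transformer=Node(method="yeo-johnson")),
)
# one level of nesting
set_nested_param(toy_composite, "deseasonalizer__model", "multiplicative")
# two levels of nesting
set_nested_param(toy_composite, "power__transformer__method", "box-cox")
```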



In [None]:
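from sklearn.preprocessing import MinMaxScaler, PowerTransformer, RobustScaler

from sktime.forecasting.compose import TransformedTargetForecaster
from sktime.forecasting.exp_smoothing import ExponentialSmoothing
from sktime.forecasting.model_selection import ForecastingGridSearchCV
from sktime.performance_metrics.forecasting import MeanSquaredError
from sktime.transformations.series.adapt import TabularToSeriesAdaptor
from sktime.transformations.series.detrend import Deseasonalizer, Detrender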

forecaster = TransformedTargetForecaster(
    steps=[
        ("detrender", Detrender()),
        ("deseasonalizer", Deseasonalizer()),
        ("minmax", TabularToSeriesAdaptor(MinMaxScaler((1, 10)))),
        ("power", TabularToSeriesAdaptor(PowerTransformer())),
        ("scaler", TabularToSeriesAdaptor(RobustScaler())),
        ("forecaster", ExponentialSmoothing()),
    ]
)

# using dunder notation to access inner objects/params as in sklearn
param_grid = {
    # deseasonalizer
    "deseasonalizer__model": ["multiplicative", "additive"],
    # power
    "power__transformer__method": ["yeo-johnson", "box-cox"],
    "power__transformer__standardize": [True, False],
    # forecaster
    "forecaster__sp": [4, 6, 12],
    "forecaster__seasonal": ["add", "mul"],
    "forecaster__trend": ["add", "mul"],
    "forecaster__damped_trend": [True, False],
}

gscv = ForecastingGridSearchCV(
    forecaster=forecaster,
    param_grid=param_grid,
    cv=cv,
    n_jobs=-1,
    verbose=1,
    scoring=MeanSquaredError(square_root=True),  # set custom scoring function
)
gscv.fit(y_train)
y_pred = gscv.predict(fh=fh)

In [None]:
plot_series(y_train, y_test, y_pred, labels=["y_train", "y_test", "y_pred"]);


In [None]:
gscv.best_params_


In [None]:
gscv.best_forecaster_


In [None]:
gscv.cv_results_.head()

## 1.5 Summary<a class="anchor" id="chapter5"></a>

* transformers are data processing steps with unified interface - `fit`, `transform`, and optional `inverse_transform`

* used as pipeline components for any learning task, forecasting, classification

* different types by input/output - time series, primitives, pairs of time series, panels/hierarchical.

* find transformers by tags such as `scitype:transform-output` and `scitype:instancewise` using `all_estimators`

* rich composition syntax - `*` for pipe, `+` for featureunion, `[in, out]` for variable subset, `|` for multiplex/switch

* `sktime` provides easy-to-use extension templates for transformers, build your own, plug and play

## 1.6 Appendix - cheat sheets and extension guide

### dunders glossary

| Type | Dunder | Meaning | `sktime` class |
| --- | --- | --- | --- |
| compose | `*` | chaining/pipeline - also works with other estimator types | type dependent |
| compose | `**` | chaining to secondary input of another estimator | type dependent |
| compose | `+` | feature union | `FeatureUnion` |
| interface | `~` | invert | `InvertTransform` |
| structural | `¦` | multiplexing ("switch") | type dependent |
| structural | `-` | optional passthrough ("on/off") | `OptionalPassthrough` |

### selected useful transformers, compositors, adapters

* delay fitting to `transform` via `sktime.transformations.compose.FitInTransform`
* any `pandas` method via `sktime.transformations.series.adapt.PandasTransformAdaptor`
* date/time features via `sktime.transformations.series.date.DateTimeFeatures`
* lags via `sktime.transformations.series.lag.Lag`
* differences, first and n-th, via `sktime.transformations.series.difference.Differencer`
* scaled logit via `sktime.transformations.series.scaledlogit.ScaledLogitTransformer`

### Extension guide - implementing your own transformer<a class="anchor" id="chapter4"></a>

`sktime` is meant to be easily extensible, for direct contribution to `sktime` as well as for local/private extension with custom methods.

To extend `sktime` with a new local or contributed transformer, a good workflow to follow is:

1. read through the [transformer extension template](https://github.com/alan-turing-institute/sktime/blob/main/extension_templates/transformer.py) - this is a `python` file with `todo` blocks that mark the places in which changes need to be added.
2. optionally, if you are planning any major surgeries to the interface: look at the [base class architecture](https://github.com/alan-turing-institute/sktime/blob/main/sktime/transformations/base.py) - note that "ordinary" extension (e.g., new algorithm) should be easily doable without this.
3. copy the transformer extension template to a local folder in your own repository (local/private extension), or to a suitable location in your clone of the `sktime` or affiliated repository (if contributed extension), inside `sktime.transformations`; rename the file and update the file docstring appropriately.
4. address the "todo" parts. Usually, this means: changing the name of the class, setting the tag values, specifying hyper-parameters, filling in `__init__`, `_fit`, `_transform`, and optional methods such as `_inverse_transform` or `_update` (for details see the extension template). You can add private methods as long as they do not override the default public interface.
5. to test your estimator manually: import your estimator and run it in the workflows in Section 1.2; then use it in the compositors in Section 1.3.
6. to test your estimator automatically: call `sktime.utils.estimator_checks.check_estimator` on your estimator. You can call this on a class or object instance. Ensure you have specified test parameters in the `get_test_params` method, according to the extension template.

In case of direct contribution to `sktime` or one of its affiliated packages, additionally:
* add yourself as an author to the code, and to the `CODEOWNERS` for the new estimator file(s).
* create a pull request that contains only the new estimators (and their inheritance tree, if it's not just one class), as well as the automated tests as described above.
* in the pull request, describe the estimator and optimally provide a publication or other technical reference for the strategy it implements.
* before making the pull request, ensure that you have all necessary permissions to contribute the code to a permissive license (BSD-3) open source project.

---

### Credits: notebook 2 - transformers

notebook creation: fkiraly

transformer pipelines & compositors: fkiraly, mloning, miraep8\
forecaster pipelines: fkiraly, aiwalter\
classifier/regressor pipelines: fkiraly\
transformer base interface: mloning, fkiraly\
dunder interface: fkiraly, miraep8

Based on design ideas: sklearn, magrittr, mlr, mlj