# Pipeline examples for graph pipeline design

Purpose: minimal set of examples that the graph pipeline API should cover.

A more extensive introduction to pipelines in sklearn can be found in the `03_transformers` tutorial.

In [None]:
import numpy as np

Note: for the formalism below, we are using simple composition notation for functions.

In deviation from mathematical convention, and in line with the sktime Python dunder convention,
the function that is applied first is written on the left.

That is, `a * b` means "first apply `a` to the input, then `b` to the output of `a`",
rather than the other way round.
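
The left-to-right reading can be illustrated with a minimal sketch in plain Python. The `Step` class below is a hypothetical helper written only for this illustration; it is not part of sktime.

```python
class Step:
    """Hypothetical wrapper that gives plain functions a left-to-right `*`."""

    def __init__(self, func):
        self.func = func

    def __call__(self, x):
        return self.func(x)

    def __mul__(self, other):
        # `self` runs first, `other` runs on the output of `self`
        return Step(lambda x: other(self(x)))


add_one = Step(lambda x: x + 1)
double = Step(lambda x: 2 * x)

left_to_right = add_one * double  # first add 1, then double
right_to_left = double * add_one  # first double, then add 1

print(left_to_right(3))  # (3 + 1) * 2 = 8
print(right_to_left(3))  # 3 * 2 + 1 = 7
```

Swapping the operands changes the result, which is why the order convention needs to be fixed up front.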

## 1 Simple pipelines

These are most similar to sklearn pipelines. For `pipe = A * B`:

* pipe.fit = A.fit * A.transform * B.fit
* pipe.end = A.transform * B.end
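
As a rough sketch of the two rules above, the following plain-Python example chains two toy transformers `A` and `B`. All class names here are hypothetical and chosen for illustration; this is not how sktime implements its pipelines, and `end` is assumed to be `transform` in this case.

```python
class Center:
    """Toy transformer: subtracts the mean seen in fit."""

    def fit(self, X):
        self.mean_ = sum(X) / len(X)
        return self

    def transform(self, X):
        return [x - self.mean_ for x in X]


class Scale:
    """Toy transformer: divides by the largest absolute value seen in fit."""

    def fit(self, X):
        self.scale_ = max(abs(x) for x in X)
        return self

    def transform(self, X):
        return [x / self.scale_ for x in X]


class ToyPipeline:
    """Hypothetical two-step pipeline, for illustration only."""

    def __init__(self, a, b):
        self.a, self.b = a, b

    def fit(self, X):
        # pipe.fit = A.fit * A.transform * B.fit
        Xt = self.a.fit(X).transform(X)
        self.b.fit(Xt)
        return self

    def transform(self, X):
        # pipe.end = A.transform * B.end, with "end" assumed to be transform
        return self.b.transform(self.a.transform(X))


pipe = ToyPipeline(Center(), Scale()).fit([1.0, 3.0, 5.0])
print(pipe.transform([1.0, 3.0, 5.0]))  # [-1.0, 0.0, 1.0]
```

Note that `B` is fitted on the output of `A`, not on the raw input, so the scale learned by `Scale` is 2.0 rather than 5.0.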

### 1.1 Transformer Pipeline

In [None]:
from sktime.datasets import load_longley

y, X = load_longley()

In [None]:
from sktime.transformations.series.exponent import ExponentTransformer
from sktime.transformations.series.boxcox import BoxCoxTransformer

trafo_pipe = ExponentTransformer() * BoxCoxTransformer()

In [None]:
trafo_pipe.fit(X)
trafo_pipe.transform(X)

### 1.2 Classifier Pipeline

In [None]:
from sktime.datasets import load_arrow_head

X, y = load_arrow_head(split="train", return_X_y=True)

In [None]:
from sktime.transformations.series.exponent import ExponentTransformer
from sktime.classification.distance_based import KNeighborsTimeSeriesClassifier

clf_pipe = ExponentTransformer() * KNeighborsTimeSeriesClassifier()

In [None]:
clf_pipe.fit(X, y)
clf_pipe.predict(X)

## 2 Forecasting pipelines

Forecasters take two arguments in `fit` to which transformers can be applied: `y` and `X`.

Further, if a transformer is applied to `y`, it needs to be inverted after prediction.

### 2.1 Endogenous transform pipeline

* pipe.fit(y, X) = fcst.fit(y = [trafo.fit * trafo.transform] (y), X)
* pipe.predict = fcst.predict * trafo.inverse_transform
* pipe.predict_other = fcst.predict_other * trafo.inverse_transform
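
The fit and predict rules above can be sketched with toy components in plain Python. The class names are hypothetical, and the forecasting horizon `fh` is simplified to an integer number of steps, which is an assumption of this sketch and not the sktime interface.

```python
class ToyDifferencer:
    """Toy first-difference transformer that remembers the last value."""

    def fit(self, y):
        self.last_ = y[-1]
        return self

    def transform(self, y):
        return [b - a for a, b in zip(y, y[1:])]

    def inverse_transform(self, dy):
        # undo differencing by cumulative summation from the last seen value
        out, level = [], self.last_
        for d in dy:
            level += d
            out.append(level)
        return out


class ToyMeanForecaster:
    """Toy forecaster: predicts the training mean at every horizon step."""

    def fit(self, y, X=None, fh=1):
        self.mean_ = sum(y) / len(y)
        self.fh_ = fh  # simplified: number of steps ahead
        return self

    def predict(self, X=None):
        return [self.mean_] * self.fh_


class ToyEndogenousPipeline:
    """Hypothetical pipeline that transforms y only, for illustration."""

    def __init__(self, trafo, fcst):
        self.trafo, self.fcst = trafo, fcst

    def fit(self, y, X=None, fh=1):
        # pipe.fit(y, X) = fcst.fit(y = [trafo.fit * trafo.transform](y), X)
        yt = self.trafo.fit(y).transform(y)
        self.fcst.fit(y=yt, X=X, fh=fh)
        return self

    def predict(self, X=None):
        # pipe.predict = fcst.predict * trafo.inverse_transform
        return self.trafo.inverse_transform(self.fcst.predict(X=X))


pipe = ToyEndogenousPipeline(ToyDifferencer(), ToyMeanForecaster())
pipe.fit([1.0, 3.0, 5.0, 7.0], fh=2)
print(pipe.predict())  # [9.0, 11.0]
```

The forecaster only ever sees differences (all equal to 2.0 here); the inverse transform maps its predictions back to the original scale.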

In [None]:
from sktime.datasets import load_longley
from sktime.forecasting.model_selection import temporal_train_test_split

y, X = load_longley()
y_train, y_test, X_train, X_test = temporal_train_test_split(y, X)

In [None]:
from sktime.forecasting.sarimax import SARIMAX
from sktime.transformations.series.difference import Differencer

pipe = Differencer() * SARIMAX()

In [None]:
pipe.fit(y=y_train, X=X_train, fh=[1, 2, 3, 4])
pipe.predict(X=X_test)

In [None]:
pipe.predict_interval(X=X_test)

### 2.2 Exogenous transform pipeline

* pipe.fit(y, X) = fcst.fit(y = y, X = [trafo.fit * trafo.transform] (X))
* pipe.predict = fcst.predict(X = trafo.transform(X))
* pipe.predict_other = fcst.predict_other(X = trafo.transform(X))
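
A minimal plain-Python sketch of these rules follows. The toy classes are hypothetical and only illustrate the data flow: the transformer is applied to `X` in both `fit` and `predict`, `y` is left untouched, and therefore no inverse transform is needed.

```python
class ToySquare:
    """Toy stateless transformer: squares every value."""

    def fit(self, X):
        return self

    def transform(self, X):
        return [x * x for x in X]


class ToyLinearForecaster:
    """Toy forecaster: least-squares fit of y = coef * X through the origin."""

    def fit(self, y, X):
        self.coef_ = sum(x * t for x, t in zip(X, y)) / sum(x * x for x in X)
        return self

    def predict(self, X):
        return [self.coef_ * x for x in X]


class ToyExogenousPipeline:
    """Hypothetical pipeline that transforms X only, for illustration."""

    def __init__(self, trafo, fcst):
        self.trafo, self.fcst = trafo, fcst

    def fit(self, y, X):
        # pipe.fit(y, X) = fcst.fit(y = y, X = [trafo.fit * trafo.transform](X))
        Xt = self.trafo.fit(X).transform(X)
        self.fcst.fit(y=y, X=Xt)
        return self

    def predict(self, X):
        # pipe.predict = fcst.predict(X = trafo.transform(X))
        return self.fcst.predict(X=self.trafo.transform(X))


pipe = ToyExogenousPipeline(ToySquare(), ToyLinearForecaster())
pipe.fit(y=[2.0, 8.0, 18.0], X=[1.0, 2.0, 3.0])
print(pipe.predict(X=[4.0, 5.0]))  # [32.0, 50.0]
```

Here `y` equals twice the square of `X`, so the forecaster recovers a coefficient of 2.0 on the transformed feature.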

In [None]:
from sktime.datasets import load_longley
from sktime.forecasting.model_selection import temporal_train_test_split

y, X = load_longley()
y_train, y_test, X_train, X_test = temporal_train_test_split(y, X)

In [None]:
from sktime.forecasting.sarimax import SARIMAX
from sktime.transformations.series.exponent import ExponentTransformer

pipe = ExponentTransformer() ** SARIMAX()

In [None]:
pipe.fit(y=y_train, X=X_train, fh=[1, 2, 3, 4])
pipe.predict(X=X_test)

In [None]:
pipe.predict_interval(X=X_test)

### 2.3 Combined pipeline

In [None]:
from sktime.datasets import load_longley
from sktime.forecasting.model_selection import temporal_train_test_split

y, X = load_longley()
y_train, y_test, X_train, X_test = temporal_train_test_split(y, X)

In [None]:
from sktime.forecasting.sarimax import SARIMAX
from sktime.transformations.series.difference import Differencer
from sktime.transformations.series.exponent import ExponentTransformer

pipe = Differencer() * ExponentTransformer() ** SARIMAX()

In [None]:
pipe.fit(y=y_train, X=X_train, fh=[1, 2, 3, 4])
pipe.predict(X=X_test)

## 3 Feature union, subsetting

Feature union creates features from multiple transformers:

In [None]:
from sktime.transformations.series.difference import Differencer
from sktime.transformations.series.lag import Lag

pipe = Differencer() + Lag()

# this constructs a FeatureUnion, which is also a transformer
pipe

In [None]:
from sktime.utils._testing.hierarchical import _bottom_hier_datagen

X = _bottom_hier_datagen(no_levels=1, no_bottom_nodes=2)

# applies both Differencer and Lag, returns the transformed series in separate columns
pipe.fit_transform(X)

In [None]:
from sktime.transformations.compose import Id
from sktime.transformations.series.difference import Differencer
from sktime.transformations.series.lag import Lag

pipe = Id() + Differencer() + Lag([1, 2], index_out="original")

pipe.fit_transform(X)

Since the feature union is itself a transformer, it can be pipelined with forecasters and other estimators.

Column subsetting with `[...]` applies a transformer only to the named columns:

In [None]:
from sktime.utils._testing.hierarchical import _make_hierarchical

X = _make_hierarchical(
    hierarchy_levels=(2, 2), n_columns=2, min_timepoints=3, max_timepoints=3
)

In [None]:
from sktime.transformations.compose import Id
from sktime.transformations.series.difference import Differencer
from sktime.transformations.series.lag import Lag

pipe = Id() + Differencer()["c0"] + Lag([1, 2], index_out="original")["c1"]

pipe.fit_transform(X)
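
The combination of feature union and column subsetting can be sketched in plain Python, with a table represented as a dict of columns. The helpers `union` and `select`, the `prefix__column` naming, and the zero-padded difference are all assumptions of this sketch and are not taken from the sktime implementation.

```python
def identity(X):
    """Return a copy of every column unchanged."""
    return {name: list(col) for name, col in X.items()}


def difference(X):
    """First differences per column, padded with a leading 0.0."""
    return {
        name: [0.0] + [b - a for a, b in zip(col, col[1:])]
        for name, col in X.items()
    }


def select(columns, func):
    """Hypothetical subsetting: apply func only to the named input columns."""

    def subsetted(X):
        return func({name: X[name] for name in columns})

    return subsetted


def union(**named_funcs):
    """Hypothetical feature union: apply each function to X, join columns."""

    def unioned(X):
        out = {}
        for prefix, func in named_funcs.items():
            for name, col in func(X).items():
                out[f"{prefix}__{name}"] = col
        return out

    return unioned


X_toy = {"c0": [1.0, 3.0, 6.0], "c1": [10.0, 10.0, 12.0]}

# identity on all columns, differences on column "c0" only
toy_pipe = union(id=identity, diff=select(["c0"], difference))
result = toy_pipe(X_toy)
print(sorted(result))      # ['diff__c0', 'id__c0', 'id__c1']
print(result["diff__c0"])  # [0.0, 2.0, 3.0]
```

Each branch receives the same input, and the outputs are joined column-wise; subsetting narrows the input of one branch without affecting the others.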