### Overview of this notebook

* why transformers? transformers in `sktime`

    * transformers = modular data processing steps
    * simple pipeline example & transformer explained

* overview of transformer features

    * types of transformers - input types, output types
    * broadcasting/vectorization to panel, hierarchical, multivariate
    * searching for transformers using `all_estimators`

In [None]:
# for easy local use of this notebook
import sys
sys.path.append("..")

In [None]:
import warnings
warnings.filterwarnings('ignore')

---

# 2. Transformers in `sktime`

## 2.1 Wherefore transformers?

or: why sktime transformers will improve your life!

(disclaimer: not the same product as deep learning transformers)

suppose we want to forecast this well-known dataset
(monthly totals of international airline passengers, 1949 to 1960)

In [None]:
from sktime.datasets import load_airline
from sktime.utils.plotting import plot_series

y = load_airline()
plot_series(y)

observations:

* there is seasonal periodicity, 12 month period
* seasonal periodicity looks multiplicative (not additive) to trend

idea: forecast might be easier

* with seasonality removed
* on logarithmic value scale (multiplication becomes addition)
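
a minimal sketch of why the log scale helps, on synthetic data rather than the airline series (the trend and seasonal factor below are made-up assumptions): if a series is trend times seasonal factor, then its logarithm is the sum of the two logarithms.

```python
import numpy as np

# synthetic example, not the airline data: 3 years of monthly values
t = np.arange(36)
trend_toy = 100.0 + 2.0 * t  # assumed linear trend
seasonal_toy = 1.0 + 0.2 * np.sin(2 * np.pi * t / 12)  # assumed seasonal factor
y_mult = trend_toy * seasonal_toy  # multiplicative seasonality

# on the log scale, the multiplicative structure becomes additive
log_y = np.log(y_mult)
log_sum = np.log(trend_toy) + np.log(seasonal_toy)

print(np.allclose(log_y, log_sum))
```

so an additive decomposition (trend + seasonal) on the log scale corresponds to a multiplicative one on the original scale.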

### Naive approach - don't do this at home!

Maybe doing this manually step by step is a good idea?

In [None]:
import numpy as np

# compute the logarithm
logy = np.log(y)

plot_series(logy)

this looks additive now!

ok, what next - deseasonalization

In [None]:
from statsmodels.tsa.seasonal import seasonal_decompose

# apply this to y
# wait no, to logy

seasonal_result = seasonal_decompose(logy, period=12)

trend = seasonal_result.trend
resid = seasonal_result.resid
seasonal = seasonal_result.seasonal

In [None]:
plot_series(trend)

In [None]:
plot_series(seasonal, resid, labels=["seasonal component", "residual component"])

ok, now the forecast!

... of what ??

ah yes, residual plus trend, because seasonal just repeats itself

In [None]:
# forecast this:
plot_series(trend + resid)

In [None]:
# this has nans??
trend
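
why the nans? a likely explanation, assuming `seasonal_decompose` estimates the trend with a centered moving average (its default, as far as we know): near both ends of the series the averaging window does not fit inside the data. The sketch below rebuilds such an average with plain `numpy` on a toy series; the toy data and the "2 x 12" weights are assumptions for illustration, not a copy of the `statsmodels` code.

```python
import numpy as np

period = 12
y_toy = np.arange(48, dtype=float)  # toy series of 48 monthly values (assumption)

# centered "2 x 12" moving average: half weight on the two outermost points
weights = np.array([0.5] + [1.0] * (period - 1) + [0.5]) / period

# mode="valid" only returns values where the full window fits inside the data
trend_valid = np.convolve(y_toy, weights, mode="valid")

# pad with nan back to the original length:
# period // 2 values are missing at each end
pad = np.full(period // 2, np.nan)
trend_ma = np.concatenate([pad, trend_valid, pad])

print(len(trend_ma), int(np.isnan(trend_ma).sum()))
```

the seasonal component does not have this problem, since it is a repeating pattern defined at every time point.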

In [None]:
# ok, forecast this instead then:
y_to_forecast = logy - seasonal

# phew, no nans!
y_to_forecast

In [None]:
from sktime.forecasting.trend import PolynomialTrendForecaster

f = PolynomialTrendForecaster(degree=2)
f.fit(y_to_forecast, fh=list(range(1, 13)))
y_fcst = f.predict()

plot_series(y_to_forecast, y_fcst)

looks reasonable!

Now to turn this into a forecast of the original y ...

* add seasonal
* invert the logarithm

In [None]:
y_fcst

In [None]:
y_fcst_orig = y_fcst + seasonal[0:12]
y_fcst_orig_orig = np.exp(y_fcst_orig)

y_fcst_orig_orig

ok, that did not work. Something something pandas indices??
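
what is going on: `pandas` adds two `Series` by matching index labels, not by position. The toy series below (stand-ins with made-up values, not the actual forecast and seasonal component) share no index labels, so every entry of the sum is nan. Adding the underlying `numpy` array adds by position instead.

```python
import numpy as np
import pandas as pd

# toy stand-ins (assumptions): "seasonal" lives in the past, "forecast" in the future
idx_past = pd.period_range("1949-01", periods=3, freq="M")
idx_future = pd.period_range("1961-01", periods=3, freq="M")

seasonal_past = pd.Series([0.1, 0.2, 0.3], index=idx_past)
fcst_future = pd.Series([1.0, 2.0, 3.0], index=idx_future)

# pandas aligns on index labels: no label is shared, so every sum is nan
aligned_sum = fcst_future + seasonal_past

# a numpy array has no index: values are added by position,
# and the result keeps the index of the forecast
positional_sum = fcst_future + seasonal_past.values

print(aligned_sum.isna().all())
print(positional_sum)
```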

In [None]:
y_fcst_orig = y_fcst + seasonal[0:12].values
y_fcst_orig_orig = np.exp(y_fcst_orig)

plot_series(y, y_fcst_orig_orig)

ok, done! and it only took us 10 years.

Maybe there is a better way?

### Slightly less naive approach - use `sktime` transformers (badly)

Ok, surely there is a way where I don't have to fiddle with wildly varying interfaces of every step.

Solution: use transformers!

Same interface at every step!

In [None]:
from sktime.forecasting.trend import PolynomialTrendForecaster
from sktime.transformations.series.boxcox import LogTransformer
from sktime.transformations.series.detrend import Deseasonalizer


y = load_airline()

t_log = LogTransformer()
ylog = t_log.fit_transform(y)

t_deseason = Deseasonalizer(sp=12)
y_deseason = t_deseason.fit_transform(ylog)

f = PolynomialTrendForecaster(degree=2)
f.fit(y_deseason, fh=list(range(1,13)))
y_fcst = f.predict()

hm, but now we need to invert the transformations...

fortunately transformers have an inverse transform, standard interface point

In [None]:
y_fcst_orig = t_deseason.inverse_transform(y_fcst)
# the deseasonalizer remembered the seasonality component! nice!

y_fcst_orig_orig = t_log.inverse_transform(y_fcst_orig)

plot_series(y, y_fcst_orig_orig)

### Expert approach - use `sktime` transformers with pipelines!

Bragging rights included.

In [None]:
from sktime.forecasting.trend import PolynomialTrendForecaster
from sktime.transformations.series.boxcox import LogTransformer
from sktime.transformations.series.detrend import Deseasonalizer

y = load_airline()

f = LogTransformer() * Deseasonalizer(sp=12) * PolynomialTrendForecaster(degree=2)

f.fit(y, fh=list(range(1,13)))
y_fcst = f.predict()

plot_series(y, y_fcst)

what happened here?

The "chain" operator `*` creates a "forecasting pipeline"

Has the same interface as all other forecasters! No additional data fiddling!

Transformers "slot in" as standardized components.

In [None]:
f

Let's look at this in more detail:

* `sktime` transformers interface
* `sktime` pipeline building

### 1.1 What are transformers? <a class="anchor" id="section_1_1"></a>

Transformers = modular data processing steps commonly used in machine learning

("transformer" used in the sense of `scikit-learn`)

Transformers are estimators that:

* are fitted to a batch of data via `fit(data)`, changing its state
* are applied to another batch of data via `transform(X)`, producing transformed data
* may have an `inverse_transform(X)`

In `sktime`, input `X` to `fit` and `transform` is typically a time series or a panel (collection of time series).
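
The three interface points can be illustrated with a toy class. This is a minimal sketch in plain `numpy`; `ToyStandardizer` is a hypothetical name and not an `sktime` class, and real `sktime` transformers inherit from `BaseTransformer` and do considerably more (input checks, tags, vectorization).

```python
import numpy as np


class ToyStandardizer:
    """Toy transformer (hypothetical, not part of sktime): z-score scaling."""

    def fit(self, X):
        # fitting changes the state: mean and standard deviation are stored
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean()
        self.std_ = X.std()
        return self

    def transform(self, X):
        # apply the fitted state to (possibly different) data
        return (np.asarray(X, dtype=float) - self.mean_) / self.std_

    def inverse_transform(self, X):
        # undo transform, using the same fitted state
        return np.asarray(X, dtype=float) * self.std_ + self.mean_

    def fit_transform(self, X):
        # shorthand for fit followed by transform on the same data
        return self.fit(X).transform(X)


X_fit = np.array([2.0, 4.0, 6.0])
X_new = np.array([4.0, 10.0])

toy = ToyStandardizer().fit(X_fit)   # state is learned from X_fit
Xt = toy.transform(X_new)            # and applied to other data X_new
X_back = toy.inverse_transform(Xt)   # round trip returns X_new
```

Note that `transform` uses the mean and standard deviation of `X_fit`, not of `X_new`.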

Basic use of an `sktime` time series transformer is as follows:

In [None]:
# 1. prepare the data
from sktime.utils._testing.series import _make_series

X = _make_series()
X_train = X[:30]
X_test = X[30:]
# X_train and X_test are both pandas.Series

X_train, X_test

In [None]:
# 2. construct the transformer
from sktime.transformations.series.boxcox import BoxCoxTransformer

# trafo is an sktime estimator inheriting from BaseTransformer
# Box-Cox transform with lambda parameter fitted via mle
trafo = BoxCoxTransformer(method="mle")

In [None]:
# 3. fit the transformer to training data
trafo.fit(X_train)

# 4. apply the transformer to transform test data
# Box-Cox transform with lambda fitted on X_train
X_transformed = trafo.transform(X_test)

X_transformed

If the training and test sets are the same, steps 3 and 4 can be carried out more concisely (and sometimes more efficiently) by using `fit_transform`:

In [None]:
# 3+4. apply the transformer to fit and transform on the same data, X
X_transformed = trafo.fit_transform(X)

### 1.2 Different types of transformers <a class="anchor" id="section_1_2"></a>

`sktime` distinguishes different types of transformer, depending on the input type of `fit` and `transform`, and the output type of `transform`.

Transformers differ by:

* making use of an additional `y` argument in `fit` or `transform`
* whether the input to `fit` and `transform` is a single time series, a collection of time series, or scalar values (data frame row)
* whether the output of `transform` is a single time series, a collection of time series, or scalar values (data frame row)
* whether the input to `fit` and `transform` are one object or two. Two objects as input and a scalar output means the transformer is a distance or kernel function.

More detail on this is given in [Section 2](#chapter2).

To illustrate the difference, we compare two transformers with different output:

* the Box-Cox transformer `BoxCoxTransformer`, which transforms a time series to a time series
* the summary transformer `SummaryTransformer`, which transforms a time series to scalars such as the mean


In [None]:
# imports
from sktime.transformations.series.boxcox import BoxCoxTransformer
from sktime.transformations.series.summarize import SummaryTransformer
from sktime.utils._testing.series import _make_series

# getting some data
# this is one pandas.Series
X = _make_series(n_timepoints=10)

# constructing the transformers
boxcox_trafo = BoxCoxTransformer(method="mle")
summary_trafo = SummaryTransformer()

In [None]:
# this produces a pandas Series
boxcox_trafo.fit_transform(X)

In [None]:
# this produces a pandas.DataFrame row
summary_trafo.fit_transform(X)

For time series transformers, the metadata tags describe the expected output of `transform`:

In [None]:
boxcox_trafo.get_tag("scitype:transform-output")

In [None]:
summary_trafo.get_tag("scitype:transform-output")

To find transformers, use `all_estimators` and filter by tags:

* `"scitype:transform-output"` - the output scitype. `Series` for time series, `Primitives` for primitive features (float, categories), `Panel` for collections of time series.
* `"scitype:transform-input"` - the input scitype. `Series` for time series.
* `"scitype:instancewise"` - If `True`, vectorized operation per series. If `False`, uses multiple time series non-trivially.

Example: find all transformers that output time series

In [None]:
from sktime.registry import all_estimators

# subset to transformers that output time series
all_estimators(
    "transformer",
    as_dataframe=True,
    filter_tags={"scitype:transform-output": "Series"},
)

A more complete overview of transformer types and tags is given in the `sktime` transformers tutorial.


### 1.3 Broadcasting aka vectorization of transformers <a class="anchor" id="section_1_3"></a>

`sktime` transformers may be natively univariate, or apply only to a single time series.

Even if this is the case, they broadcast across variables and instances of time series, where applicable (also known as vectorization in `numpy` parlance).

This ensures that all `sktime` transformers can be applied to multivariate and multi-instance (panel, hierarchical) time series data.
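
As a rough analogy in plain `pandas` (a sketch on toy data, not how `sktime` implements broadcasting internally): an operation defined for a single univariate series is applied separately to every instance and every variable, and the results are assembled in the original format. The names `X_panel` and `center` are made up for this illustration.

```python
import pandas as pd

# toy panel (assumption): 2 instances x 3 time points, 2 variables
index = pd.MultiIndex.from_product(
    [["a", "b"], [0, 1, 2]], names=["instance", "time"]
)
X_panel = pd.DataFrame(
    {
        "c0": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
        "c1": [5.0, 5.0, 8.0, 0.0, 1.0, 2.0],
    },
    index=index,
)


def center(x):
    """Single-series operation: subtract the mean of the series."""
    return x - x.mean()


# broadcasting: each instance and each variable is centered with its own mean
X_centered = X_panel.groupby(level="instance").transform(center)

print(X_centered)
```

The output has the same index and columns as the input, and each (instance, variable) pair has been processed on its own.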

Example 1: broadcasting/vectorization of time series to time series transformer

The `BoxCoxTransformer` from previous sections applies to single instances of univariate time series. When multiple instances or variables are seen, it broadcasts across both:

In [None]:
from sktime.transformations.series.boxcox import BoxCoxTransformer
from sktime.utils._testing.hierarchical import _make_hierarchical

# hierarchical data with 2 variables and 2 levels
X = _make_hierarchical(n_columns=2)

X

In [None]:
# constructing the transformers
boxcox_trafo = BoxCoxTransformer(method="mle")

# applying to X results in hierarchical data
boxcox_trafo.fit_transform(X)

Fitted model components of vectorized transformers can be found in the `transformers_` attribute, or accessed via the universal `get_fitted_params` interface:

In [None]:
boxcox_trafo.transformers_
# this is a pandas.DataFrame that contains the fitted transformers
# one per time series instance and variable

In [None]:
boxcox_trafo.get_fitted_params()
# this returns a dictionary
# the transformers DataFrame is available at the key "transformers"
# individual transformers are available at dataframe-like keys
# it also contains all fitted lambdas as keyed parameters

Example 2: broadcasting/vectorization of time series to scalar features transformer

The `SummaryTransformer` behaves similarly.
Multiple time series instances are transformed to different rows of the resulting data frame; multiple variables result in additional columns.

In [None]:
from sktime.transformations.series.summarize import SummaryTransformer

summary_trafo = SummaryTransformer()

# this produces a pandas DataFrame with more rows and columns
# rows correspond to different instances in X
# columns are multiplied and names prefixed by [variablename]__
# there is one column per variable and transformed feature
summary_trafo.fit_transform(X)

### 1.4 Transformers as pipeline components <a class="anchor" id="section_1_4"></a>

`sktime` transformers can be pipelined with any other `sktime` estimator type, including forecasters, classifiers, and other transformers.

Pipelines = estimators of the same type, same interface as specialized class

pipeline build operation: `make_pipeline` or via `*` dunder

Pipelining `pipe = trafo * est` produces `pipe` of same type as `est`.

In `pipe.fit`, first `trafo.fit_transform`, then `est.fit` is executed on the result.

In `pipe.predict`, first `trafo.transform`, then `est.predict` is executed.

(the arguments that are piped differ by type and can be looked up in the docstrings of pipeline classes, or specialized tutorials)
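
The two rules above can be sketched with toy classes. All class names below are hypothetical and not part of `sktime`; the sketch only mirrors the order of calls described above and ignores everything else a real pipeline class does.

```python
import numpy as np


class ToyCenterer:
    """Toy transformer (hypothetical): subtracts the mean seen in fit."""

    def fit(self, X):
        self.mean_ = float(np.mean(X))
        return self

    def transform(self, X):
        return np.asarray(X, dtype=float) - self.mean_

    def fit_transform(self, X):
        return self.fit(X).transform(X)


class ToySignClassifier:
    """Toy estimator (hypothetical): predicts 1 for positive inputs, else 0."""

    def fit(self, X, y=None):
        return self

    def predict(self, X):
        return (np.asarray(X) > 0).astype(int)


class ToyPipeline:
    """Toy stand-in for `trafo * est`."""

    def __init__(self, trafo, est):
        self.trafo = trafo
        self.est = est

    def fit(self, X, y=None):
        Xt = self.trafo.fit_transform(X)  # first trafo.fit_transform ...
        self.est.fit(Xt, y)               # ... then est.fit on the result
        return self

    def predict(self, X):
        Xt = self.trafo.transform(X)      # first trafo.transform ...
        return self.est.predict(Xt)       # ... then est.predict


toy_pipe = ToyPipeline(ToyCenterer(), ToySignClassifier())
toy_pipe.fit(np.array([10.0, 20.0, 30.0]))
toy_pred = toy_pipe.predict(np.array([5.0, 25.0]))
```

Here `toy_pipe` has `fit` and `predict` just like the wrapped estimator, and the mean used in `predict` is the one learned in `fit`.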

#### Example 1: forecaster pipeline

we have seen this example above

In [None]:
from sktime.forecasting.trend import PolynomialTrendForecaster
from sktime.transformations.series.boxcox import LogTransformer
from sktime.transformations.series.detrend import Deseasonalizer

y = load_airline()

pipe = LogTransformer() * Deseasonalizer(sp=12) * PolynomialTrendForecaster(degree=2)

pipe

In [None]:
# this is a forecaster with the same interface as PolynomialTrendForecaster
pipe.fit(y, fh=[1, 2, 3])
y_pred = pipe.predict()

plot_series(y, y_pred)

#### Example 2: classifier pipeline

works the same with classifiers or other estimator types!

In [None]:
from sktime.classification.distance_based import KNeighborsTimeSeriesClassifier
from sktime.transformations.series.exponent import ExponentTransformer

pipe = ExponentTransformer() * KNeighborsTimeSeriesClassifier()

# this constructs a ClassifierPipeline, which is also a classifier
pipe

In [None]:
from sktime.datasets import load_unit_test

X_train, y_train = load_unit_test(split="TRAIN")
X_test, _ = load_unit_test(split="TEST")

# this is a classifier with the same interface as knn-classifier
# first applies exponent transform, then knn-classifier
pipe.fit(X_train, y_train)
pipe.predict(X_test)

## 2.2 Combining transformers, feature engineering

transformers are natural pipeline components

* data processing steps
* feature engineering steps
* post processing steps

they can be combined in a number of other ways:

* pipelining = sequential chaining
* feature union = parallel, addition of features
* feature subsetting = selecting columns
* inversion = switch transform and inverse
* multiplexing = switching between transformers
* passthrough = switch on/ off
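
the difference between the first two, as a `pandas` analogy on a toy series (a sketch only; the functions `diff` and `lag1` are made-up stand-ins, and `sktime` transformers such as `Lag` may treat indices differently):

```python
import pandas as pd

x = pd.Series([1.0, 4.0, 9.0, 16.0], name="x")


# two simple series-to-series steps, written as plain functions
def diff(s):
    return s.diff()


def lag1(s):
    return s.shift(1)


# pipelining = sequential chaining: the output of one step feeds the next
chained = lag1(diff(x))

# feature union = parallel application: each step sees the original input,
# and the results are collected as separate columns
union = pd.concat({"diff": diff(x), "lag1": lag1(x)}, axis=1)

print(chained)
print(union)
```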

### Chaining transformers via `*`

In [None]:
from sktime.transformations.series.difference import Differencer
from sktime.transformations.series.summarize import SummaryTransformer

pipe = Differencer() * SummaryTransformer()

# this constructs a TransformerPipeline, which is also a transformer
pipe

In [None]:
from sktime.utils._testing.hierarchical import _bottom_hier_datagen

X = _bottom_hier_datagen(no_levels=1, no_bottom_nodes=2)

# this is a transformer with the same interface
# first applies differencer, then summary transform
pipe.fit_transform(X)

compatible with sklearn transformers!

default applies sklearn transformer per individual time series as a data frame table

In [None]:
from sklearn.preprocessing import StandardScaler

pipe = Differencer() * StandardScaler()

pipe

In [None]:
pipe.fit_transform(X)

pipeline-adaptor chains can be constructed manually:

* `sktime.transformations.compose.TransformerPipeline`
* `sktime.transformations.series.adapt.TabularToSeriesAdaptor` for `sklearn`

composites are compatible with `get_params` / `set_params` parameter interface:

In [None]:
pipe.get_params()

### Feature union via `+`

In [None]:
from sktime.transformations.series.difference import Differencer
from sktime.transformations.series.lag import Lag

pipe = Differencer() + Lag()

# this constructs a FeatureUnion, which is also a transformer
pipe

In [None]:
from sktime.utils._testing.hierarchical import _bottom_hier_datagen

X = _bottom_hier_datagen(no_levels=1, no_bottom_nodes=2)

# applies both Differencer and Lag, returns transformed in different columns
pipe.fit_transform(X)

to retain the original columns, use the `Id` transformer:

In [None]:
from sktime.transformations.compose import Id
from sktime.transformations.series.difference import Differencer
from sktime.transformations.series.lag import Lag

pipe = Id() + Differencer() + Lag([1, 2], index_out="original")

pipe.fit_transform(X)

In [None]:
# parameter inspection
pipe.get_params()

### Subset input columns via `[colname]`

let's say we want to apply `Differencer` to column 0, and `Lag` to column 1

also we keep the original columns for illustration

In [None]:
from sktime.utils._testing.hierarchical import _make_hierarchical

X = _make_hierarchical(
    hierarchy_levels=(2, 2), n_columns=2, min_timepoints=3, max_timepoints=3
)

X

In [None]:
from sktime.transformations.compose import Id
from sktime.transformations.series.difference import Differencer
from sktime.transformations.series.lag import Lag

pipe = Id() + Differencer()["c0"] + Lag([1, 2], index_out="original")["c1"]

pipe.fit_transform(X)

auto-generated names can be replaced by using `FeatureUnion` explicitly:

In [None]:
from sktime.transformations.compose import FeatureUnion

pipe = FeatureUnion(
    [
        ("original", Id()),
        ("diff", Differencer()["c0"]),
        ("lag", Lag([1, 2], index_out="original")),
    ]
)

pipe.fit_transform(X)

### turning log transform into exp transform via invert `~`

In [None]:
import numpy as np

from sktime.transformations.series.boxcox import LogTransformer

log = LogTransformer()

exp = ~log

# this behaves like an "e to the power of" transformer now
exp.fit_transform(np.array([1, 2, 3]))

### autoML structure compositors: multiplexer switch `|` and on/off switch `-`

expose decisions as parameter

* do we want differencer *or* lag? for tuning later
* do we want [differencer and lag] or [original features and lag] ? for tuning later

In [None]:
# differencer or lag

from sktime.transformations.series.difference import Differencer
from sktime.transformations.series.lag import Lag

pipe = Differencer() | Lag()

pipe.get_params()

the `selected_transformer` parameter exposes the choice:

does this behave as `Lag` or `Differencer`?

In [None]:
# switch = Lag -> this is a Lag transformer now!
pipe.set_params(selected_transformer="Lag")

In [None]:
# switch = Differencer -> this is a Differencer now!
pipe.set_params(selected_transformer="Differencer")

similarly, on/off switch with `-`

same as multiplexer between wrapped transformer and `Id`
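
the idea in a few lines of plain Python (a toy sketch; `ToySwitch` and its parameter names are hypothetical and do not mirror the `sktime` classes): the choice of transformation is an ordinary parameter, so a tuner can treat it like any other hyperparameter.

```python
import numpy as np


class ToySwitch:
    """Toy multiplexer: a parameter selects which candidate is applied."""

    def __init__(self, candidates, selected):
        self.candidates = candidates
        self.selected = selected

    def set_params(self, selected):
        self.selected = selected
        return self

    def transform(self, x):
        return self.candidates[self.selected](np.asarray(x, dtype=float))


# on/off switch = multiplexer between a transformation and the identity
toy_switch = ToySwitch(
    candidates={"diff": np.diff, "id": lambda x: x},
    selected="diff",
)

x_toy = [1.0, 3.0, 6.0]
out_on = toy_switch.transform(x_toy)                           # differences
out_off = toy_switch.set_params(selected="id").transform(x_toy)  # unchanged
```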

In [None]:
optional_differencer = - Differencer()

# this behaves as Differencer now
optional_differencer

In [None]:
# this is now just the identity transformer
optional_differencer.set_params(passthrough=True)

see more later in part 3 on how to use this with tuning for full structural AutoML!

### dunders glossary

| Type | Dunder | Meaning | `sktime` class |
| --- | --- | --- | --- |
| compose | `*` | chaining/pipeline - also works with other estimator types | type dependent |
| compose | `**` | chaining to secondary input of another estimator | type dependent |
| compose | `+` | feature union | `FeatureUnion` |
| interface | `~` | invert | `InvertTransform` |
| structural | `\|` | multiplexing ("switch") | type dependent |
| structural | `-` | optional passthrough ("on/off") | `OptionalPassthrough` |

### selected useful transformers, compositors, adapters

* delay fitting to `transform` via `sktime.transformations.compose.FitInTransform`
* any `pandas` method via `sktime.transformations.series.adapt.PandasTransformAdaptor`
* date/time features via `sktime.transformations.series.date.DateTimeFeatures`
* lags via `sktime.transformations.series.lag.Lag`
* differences, first and n-th, via `sktime.transformations.series.difference.Differencer`
* scaled logit via `sktime.transformations.series.scaledlogit.ScaledLogitTransformer`

### Transformer type glossary

Common types of transformation in `sktime`:

| from | to | base class | examples (sci) | examples (`sktime`) |
| --- | --- | --- | --- | --- |
| time series | scalar features | `BaseTransformer` (`Primitives` output) | `tsfresh`, or 7-number-summary | `Catch22`, `SummaryTransformer` |
| time series | time series | `BaseTransformer` (`Series`, `instancewise`)  | detrending, smoothing, filtering, lagging | `Detrender`, `Differencer`, `Lag`, `Filter` |
| time series panel | also a panel | `BaseTransformer` (`Series` output)  | principal component projection | `PCATransformer`, `PaddingTransformer` |
| two feature vectors | a scalar | `BasePairwiseTransformer` | Euclidean distance, L1 distance | `ScipyDist`, `AggrDist`, `FlatDist` |
| two time series | a scalar | `BasePairwiseTransformerPanel` | DTW distance, alignment kernel | `DtwDist`, `EditDist` |

first three = "time series transformers", or, simply, "transformers"

all "transformers" follow the same base interface.

"pairwise transformers" have separate base interface (due to two inputs)

include distances and kernels between time series or feature vectors
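
what "two inputs, scalar output" means in practice, as a toy `numpy` sketch for feature vectors (the function `toy_euclidean_dist` is hypothetical and not an `sktime` class): for two collections, the result is a matrix with one scalar per pair.

```python
import numpy as np


def toy_euclidean_dist(X, X2):
    """Toy pairwise distance: entry (i, j) is the distance of X[i] to X2[j]."""
    X = np.asarray(X, dtype=float)
    X2 = np.asarray(X2, dtype=float)
    # differences between every row of X and every row of X2 (numpy broadcasting)
    diffs = X[:, None, :] - X2[None, :, :]
    return np.sqrt((diffs**2).sum(axis=-1))


X_a = [[0.0, 0.0], [3.0, 4.0]]
X_b = [[0.0, 0.0], [0.0, 4.0], [6.0, 8.0]]

# 2 vectors in X_a, 3 vectors in X_b -> distance matrix of shape (2, 3)
dist_mat = toy_euclidean_dist(X_a, X_b)

print(dist_mat)
```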

all inherit `BaseObject` and follow unified `skbase` interface with `get_params`, `get_fitted_params`, etc

## 2.3 Summary<a class="anchor" id="chapter5"></a>

* `sktime` comes with transformation algorithms (or transformers), all of which share a common interface. The interface is fully interoperable with the `scikit-learn` interface.

* Transformers exist in several categories: series being transformed to series; series being transformed to primitive features (floats, categories); pairwise transformers where pairs of series or vectors are transformed to a float output, such as distance functions and kernel functions.

* Transformers are typically used as components of other algorithms across learning tasks, for instance as feature extraction steps in pipelines, or as distances in a distance-based classification algorithm. Composition using `sktime` transformers is fully modular.

* `sktime` provides easy-to-use extension templates for all the above.