# Advanced extension patterns in `sktime`

### Overview of this notebook

* using the advanced extension templates

    * example: forecaster with update and proba functionality
    * a closer look at tags, internal data formats
    * hierarchical data, automated vectorization
    * example: min-max scaler, but across multiple series
    * composite estimators
    * example: MA-of-transformed-data


* automated testing

    * using `check_estimator` as part of a test suite
    * using `sktime` test classes

In [None]:
import warnings

warnings.filterwarnings('ignore')

### `sktime` tags explained

All estimators have tags.

There are three types of tags:

* capability tags, e.g., "has `predict_proba`"
* property and type tags, e.g., "is a tree-based method", or "outputs series"
* behavioural tags, e.g., "instruction: convert to `numpy` for inner `_fit` method"

Tag-related methods, via `BaseObject`:
* `get_tags`, `get_tag` - retrieve tag values
* `set_tags`, `clone_tags` - set tags, *developer use only* for implementing estimators

Tag values may depend on estimator parameter values!

In [None]:
# example: tags of ARIMA forecaster
from sktime.forecasting.arima import ARIMA

ARIMA().get_tags()

**Forecaster tags:**

Capability tags:
* `scitype:y`: which y are fine? univariate/multivariate/both
* `handles-missing-data`: can the estimator handle missing data? boolean, True or False
* `capability:pred_int`: does the forecaster implement probabilistic forecasts (prediction intervals)? boolean, True or False.

Property and type tags:
* `ignores-exogeneous-X`: does the estimator ignore the exogenous X? boolean, True or False
* `requires-fh-in-fit`: is the forecasting horizon already required in fit? boolean, True or False.  
* `X-y-must-have-same-index`: must X and y have the same index? boolean, True or False.

Behavioural tags:
* `y_inner_mtype`: `sktime` data format (mtype) used in internal methods `_fit`, `_predict`. Example: `"pd.Series"`
* `X_inner_mtype`: `sktime` data format (mtype) used in internal methods `_fit`, `_predict`.  Example: `"pd.DataFrame"`
* `enforce_index_type`: index type that needs to be enforced in X/y. None if index type is not enforced.
* `fit_is_empty`: is fit empty and can be skipped? boolean, True or False

**Transformer tags:**

Capability tags:
* `capability:inverse_transform`: can the transformer inverse transform? boolean, True or False
* `univariate-only`: is the transformer restricted to univariate X (i.e., it cannot handle multivariate X)? boolean, True or False
* `capability:unequal_length`: can the transformer handle unequal length time series (if passed Panel)? boolean, True or False
* `capability:unequal_length:removes`: is transform result always guaranteed to be equal length (and series)? not relevant for transformers that return Primitives in transform-output. boolean, True or False
* `handles-missing-data`: can estimator handle missing data? boolean, True or False
* `capability:missing_values:removes`: is transform result always guaranteed to contain no missing values? boolean, True or False

Property and type tags:
* `scitype:transform-input`: what is the scitype of X: Series, or Panel
* `scitype:transform-output`: what scitype is returned: Primitives, Series, Panel
* `scitype:transform-labels`: what is the scitype of y: None (not needed), Primitives, Series, Panel
* `scitype:instancewise`: is this an instance-wise transform? boolean, True or False  
    for example the [LogTransformer](https://github.com/alan-turing-institute/sktime/blob/4bf649b9a55861f8e7f61f017384d3e035a7d689/sktime/transformations/series/boxcox.py#L211) is applied on each time point individually.
* `requires_y`: does y need to be passed in fit? boolean, True or False
* `X-y-must-have-same-index`: must X and y have the same index? boolean, True or False
* `transform-returns-same-time-index`: does the transform output have the same time index as the input X? boolean, True or False

Behavioural tags:
* `X_inner_mtype`: `sktime` data format (mtype) used in internal methods `_fit`, `_transform`.  Example: `"pd.DataFrame"`
* `y_inner_mtype`: `sktime` data format (mtype) used in internal methods `_fit`, `_transform`. Should be `"None"` if `y` is not used.
* `enforce_index_type`: index type that needs to be enforced in X/y. None if no index type is enforced
* `fit_is_empty`: is fit empty and can be skipped? boolean, True or False
* `skip-inverse-transform`: is inverse-transform skipped when called? boolean, True or False

## Sktime scitypes and mtypes

Sktime distinguishes time series data containers at two levels:
* **scitypes:** Short for scientific types. These are collections of multiple ways to represent the same information.
* **mtypes:** Short for machine types. These are specific, machine-readable representations of a scitype. Each scitype will typically have multiple mtypes.

The currently supported scitypes are:
* **Series:** uni- or multivariate time series
* **Panel:** panel of uni- or multivariate time series
* **Hierarchical:** hierarchical panel of time series with 3 or more levels
* **Alignment:** series or sequence alignment
* **Table:** data table with primitive column types
* **Proba:** probability distribution or distribution statistics, return types

For example the Panel scitype can be represented by the following mtypes:
* **nested_univ:** `pd.DataFrame` with one column per variable, `pd.Series` in cells
* **numpy3D:** 3D `np.array` of format (n_instances, n_columns, n_timepoints)
* **pd-multiindex:** `pd.DataFrame` with multi-index (instances, timepoints)
* **pd-wide:** `pd.DataFrame` in wide format, cols = (instance * timepoints)
* **pd-long:** `pd.DataFrame` in long format, cols = (index, time_index, column)
* **df-list:** `list` of `pd.DataFrame`
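
To make the layouts concrete, the same small panel can be written by hand in three of these formats. The sketch uses only `numpy` and `pandas`; the index names `instances`/`timepoints` and the column name `var_0` are assumptions made for this illustration.

```python
import numpy as np
import pandas as pd

n_instances, n_columns, n_timepoints = 2, 1, 3

# numpy3D layout: axes are (instance, column, timepoint)
X_np = np.arange(6, dtype=float).reshape(n_instances, n_columns, n_timepoints)

# pd-multiindex layout: one row per (instance, timepoint), one column per variable
index = pd.MultiIndex.from_product(
    [range(n_instances), range(n_timepoints)], names=["instances", "timepoints"]
)
X_mi = pd.DataFrame(
    # move the time axis before the column axis, then stack instances row-wise
    X_np.transpose(0, 2, 1).reshape(n_instances * n_timepoints, n_columns),
    index=index,
    columns=["var_0"],
)

# df-list layout: one DataFrame per instance, rows are time points
X_list = [X_mi.xs(i, level="instances") for i in range(n_instances)]

print(X_np.shape)
print(X_mi)
print(X_list[1])
```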

You can convert a panel from one mtype to the other using the [convert](https://www.sktime.org/en/latest/api_reference/auto_generated/sktime.datatypes.convert.html) function.


In [None]:
from sktime.datatypes import convert
from sktime.datatypes import get_examples

print("nested_univ")
example_panel = get_examples(mtype="nested_univ")[0]
display(example_panel)
print("")

print("numpy3D")
example_panel = convert(obj=example_panel, from_type="nested_univ", to_type="numpy3D")
display(example_panel)
print("")

print("pd-multiindex")
example_panel = convert(obj=example_panel, from_type="numpy3D", to_type="pd-multiindex")
display(example_panel)
print("")

print("nested_univ")
example_panel = convert(obj=example_panel, from_type="pd-multiindex", to_type="nested_univ")
display(example_panel)
print("")