### Overview of this notebook

* motivating example with modular building blocks
    * connecting distances, aligners, classifiers
* pairwise transformers - the "type" of time series distances and kernels
* time series alignment and alignment distances, e.g., time warping
* composition patterns for distances, kernels, aligners

In [3]:
import warnings

warnings.filterwarnings("ignore")

## 3.1 Motivating example

In [4]:
from sktime.alignment.dtw_python import AlignerDTWfromDist
from sktime.classification.distance_based import KNeighborsTimeSeriesClassifier
from sktime.dists_kernels.compose_from_align import DistFromAligner
from sktime.dists_kernels.scipy_dist import ScipyDist

# Mahalanobis distance on R^n
mahalanobis_dist = ScipyDist(metric="mahalanobis")  # uses scipy distances

# pairwise multivariate aligner from dtw-python with Mahalanobis distance
mw_aligner = AlignerDTWfromDist(mahalanobis_dist)  # uses dtw-python

# turning this into alignment distance on time series
dtw_dist = DistFromAligner(mw_aligner)  # interface mutation to distance

# and using this distance in a k-nn classifier
clf = KNeighborsTimeSeriesClassifier(distance=dtw_dist)  # uses sklearn knn
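
The chain above goes from a pointwise distance to an aligner, then to an alignment distance, then to a k-nn classifier. Below is a rough plain-numpy illustration of what that chain computes. It is a sketch, not the sktime or dtw-python implementation. It makes these assumptions: the Euclidean distance replaces Mahalanobis, the DTW recursion is the basic symmetric one rather than the `"symmetric2"` step pattern listed in the parameters, and the names `euclidean`, `dtw_distance`, `series_dist` and `knn1_predict` are hypothetical.

```python
import numpy as np


def euclidean(a, b):
    # pointwise distance between two time points (vectors of length d)
    return float(np.sqrt(np.sum((a - b) ** 2)))


def dtw_distance(x, y, pointwise):
    # x: array (n_x, d), y: array (n_y, d); rows are time points
    # basic symmetric DTW recursion (assumption): match, insertion, deletion all weighted 1
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = pointwise(x[i - 1], y[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    # accumulated cost of the optimal alignment, used as the alignment distance
    return float(cost[n, m])


def series_dist(a, b):
    # "interface mutation": aligner with pointwise distance -> series distance
    return dtw_distance(a, b, euclidean)


def knn1_predict(train_X, train_y, test_X, dist):
    # 1-nearest-neighbour classification with an arbitrary series distance
    preds = []
    for x in test_X:
        dists = [dist(x, t) for t in train_X]
        preds.append(train_y[int(np.argmin(dists))])
    return preds


train_X = [np.array([[0.0], [1.0], [2.0]]), np.array([[5.0], [5.0], [5.0]])]
train_y = ["rising", "flat"]
test_X = [np.array([[0.0], [0.0], [1.0], [2.0]])]

print(knn1_predict(train_X, train_y, test_X, series_dist))  # ['rising']
```

The test series has a repeated first value. Warping absorbs that repetition, so its alignment distance to the "rising" series is 0 even though the lengths differ.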

In [5]:
clf.get_params()

{'algorithm': 'brute',
 'distance': DistFromAligner(aligner=AlignerDTWfromDist(dist_trafo=ScipyDist(metric='mahalanobis'))),
 'distance_mtype': None,
 'distance_params': None,
 'leaf_size': 30,
 'n_jobs': None,
 'n_neighbors': 1,
 'pass_train_distances': False,
 'weights': 'uniform',
 'distance__aligner': AlignerDTWfromDist(dist_trafo=ScipyDist(metric='mahalanobis')),
 'distance__aligner__dist_trafo': ScipyDist(metric='mahalanobis'),
 'distance__aligner__open_begin': False,
 'distance__aligner__open_end': False,
 'distance__aligner__step_pattern': 'symmetric2',
 'distance__aligner__window_type': 'none',
 'distance__aligner__dist_trafo__colalign': 'intersect',
 'distance__aligner__dist_trafo__metric': 'mahalanobis',
 'distance__aligner__dist_trafo__metric_kwargs': None,
 'distance__aligner__dist_trafo__p': 2,
 'distance__aligner__dist_trafo__var_weights': None}

### 3.4.3 General transformer signature - pairwise series transformers<a class="anchor" id="section_3_4_3"></a>

Pairwise series transformers model mathematical objects of signature `(Series, Series) -> float`, or, in mathematical notation, $$\texttt{series} \times\texttt{series}\rightarrow\mathbb{R}$$
Common examples are distances between series, or (positive definite) kernels on series.
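
Here is a hedged illustration outside sktime. Treat equal-length univariate series as vectors. The Euclidean distance $d$ then gives a positive definite Gaussian kernel $k(x, y) = \exp(-\gamma\, d(x, y)^2)$. The sketch uses hypothetical function names and an arbitrary `gamma=0.5`. It evaluates both functions on all pairs of three series and checks the properties that separate a distance from a kernel.

```python
import numpy as np


def euclidean_series_dist(x, y):
    # distance: equal-length univariate series compared as vectors
    return float(np.linalg.norm(np.asarray(x, dtype=float) - np.asarray(y, dtype=float)))


def gaussian_series_kernel(x, y, gamma=0.5):
    # kernel: Gaussian (RBF) similarity derived from the distance above
    return float(np.exp(-gamma * euclidean_series_dist(x, y) ** 2))


series = [np.array([1.0, 2.0, 3.0]), np.array([1.0, 2.5, 2.0]), np.array([0.0, 0.0, 1.0])]

# both functions have signature series x series -> float;
# evaluating them on all pairs gives 3 x 3 matrices
D = np.array([[euclidean_series_dist(a, b) for b in series] for a in series])
K = np.array([[gaussian_series_kernel(a, b) for b in series] for a in series])

print(np.diag(D))  # a distance is 0 on the diagonal
print(np.diag(K))  # this kernel is 1 on the diagonal
# positive definiteness: all eigenvalues of the symmetric Gram matrix are positive
print(np.linalg.eigvalsh(K).min() > 0)  # True
```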

Pairwise transformers have a parametric constructor, like any other `sktime` object. The transformation is achieved by the method `transform`, or, equivalently, for brevity, by a call to the constructed object.

The method `transform` always returns a 2D `numpy.ndarray`, and can be called in multiple ways:
* with two `Series` arguments `X`, `X2`, in which case a 1 x 1 array is returned. Denote this function by `t(X, X2)`.
* with two `Panel` arguments `X`, `X2`, in which case an `m x n` array is returned, where `m` is the number of instances in `X` and `n` is the number of instances in `X2`. The `(i,j)`-th entry corresponds to `t(Xi, X2j)`, where `Xi` is the `i`-th `Series` in the `Panel` `X`, and `X2j` is the `j`-th `Series` in the `Panel` `X2`.
* with one `Series` and one `Panel` argument, in which case the `Series` is interpreted as a 1-element `Panel`, with return as above.
* with one single argument, `Series` or `Panel`, in which case `X` and `X2` are assumed to be the same as the one argument, with behaviour as above.
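
The four calling conventions can be mimicked in plain Python. Below is a minimal sketch. It assumes a single series is a 1D numpy array and a panel is a list of such arrays. `pairwise_transform` and the toy distance `abs_mean_diff` are hypothetical, not part of the sktime API.

```python
import numpy as np


def pairwise_transform(t, X, X2=None):
    # t: function (series, series) -> float
    # X, X2: a single series (1D numpy array) or a panel (list of 1D arrays)
    def as_panel(Z):
        # a single series is interpreted as a 1-element panel
        return [Z] if isinstance(Z, np.ndarray) else list(Z)

    if X2 is None:
        X2 = X  # single argument: X2 is taken to be X
    P, P2 = as_panel(X), as_panel(X2)
    # (i, j)-th entry is t(i-th series of X, j-th series of X2)
    return np.array([[t(xi, x2j) for x2j in P2] for xi in P])


def abs_mean_diff(a, b):
    # toy series distance: absolute difference of the series means
    return float(abs(a.mean() - b.mean()))


s = np.array([1.0, 2.0, 3.0])
panel = [np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0]), np.array([0.0, 0.0])]

print(pairwise_transform(abs_mean_diff, s, s).shape)          # (1, 1)
print(pairwise_transform(abs_mean_diff, panel, panel).shape)  # (3, 3)
print(pairwise_transform(abs_mean_diff, s, panel).shape)      # (1, 3)
print(pairwise_transform(abs_mean_diff, panel).shape)         # (3, 3)
```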

We show these in a few examples below.

In [None]:
from sktime.datatypes import get_examples

# univariate series used in the examples
X_series = get_examples("pd.Series", "Series")[0]
X2_series = get_examples("pd.Series", "Series")[1]
# panel used in the examples
X_panel = get_examples("pd-multiindex", "Panel")[0]

First, we construct the pairwise transformer with parameters. In this case, the pairwise transformer is a distance: the mean of the Euclidean distances between all pairs of time points of the two series.

In [None]:
# constructing the transformer
from sktime.dists_kernels import AggrDist, ScipyDist

# mean of Euclidean distances between all pairs of time points
my_series_dist = AggrDist(ScipyDist(metric="euclidean"))
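
Below is a plain-numpy sketch of one reading of `AggrDist(ScipyDist(metric="euclidean"))`. It is an assumption, not the sktime implementation. It computes the Euclidean distance between every time point of one series and every time point of the other, then averages the resulting matrix. `aggr_dist` is a hypothetical name.

```python
import numpy as np


def aggr_dist(x, y, aggfunc=np.mean):
    # x: array (n_x, d), y: array (n_y, d); rows are time points
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # (n_x, n_y) matrix of Euclidean distances between all pairs of time points
    pointwise = np.sqrt(((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1))
    # aggregate the matrix to a single float (mean by default)
    return float(aggfunc(pointwise))


x = np.array([[0.0], [1.0]])
y = np.array([[0.0], [2.0], [4.0]])
print(aggr_dist(x, y))  # (0 + 2 + 4 + 1 + 1 + 3) / 6
```

Under this reading the two series may have different lengths, since no alignment is needed. The self-distance of a non-constant series is also not zero: `aggr_dist(x, x)` is 0.5 for `x` above.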

We can then evaluate the distance by `transform` or direct call:

In [None]:
# evaluate the metric on two series, via transform
my_series_dist.transform(X_series, X2_series)

In [None]:
# evaluate the metric on two series, by direct call - this is the same
my_series_dist(X_series, X2_series)

In [None]:
# evaluate the metric on two identical panels of three series
my_series_dist(X_panel, X_panel)

In [None]:
# this is the same as providing only one argument
my_series_dist(X_panel)

In [None]:
# one series, one panel
# we subset X_panel to univariate, since the distance in question
#     cannot compare series with different numbers of variables
my_series_dist(X_series, X_panel[["var_1"]])

Pairwise transformers are composable, and use the familiar `get_params` interface, just like any other `sktime` object and `scikit-learn` estimator:

In [None]:
my_series_dist.get_params()
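
The nested keys in `get_params` output, such as `distance__aligner__dist_trafo__metric` in the motivating example, follow the scikit-learn convention: each level of nesting adds a `<name>__` prefix. Below is a minimal sketch of that flattening. It uses a hypothetical `Node` class with made-up parameter names and is not sktime's base class.

```python
class Node:
    # hypothetical minimal estimator with scikit-learn style get_params

    def __init__(self, **params):
        self._params = params

    def get_params(self, deep=True):
        out = dict(self._params)
        if deep:
            for name, value in self._params.items():
                if hasattr(value, "get_params"):
                    # parameters of a nested component get the prefix "<name>__"
                    for sub_name, sub_value in value.get_params(deep=True).items():
                        out[f"{name}__{sub_name}"] = sub_value
        return out


inner = Node(metric="euclidean", p=2)
outer = Node(pointwise=inner, aggregation="mean")
print(sorted(outer.get_params()))
# ['aggregation', 'pointwise', 'pointwise__metric', 'pointwise__p']

# one more level of nesting adds one more prefix
clf = Node(distance=outer)
print(clf.get_params()["distance__pointwise__metric"])  # euclidean
```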

### 3.4.4 General transformer signature - pairwise transformers<a class="anchor" id="section_3_4_4"></a>

`sktime` also provides functionality for pairwise transformers on tabular data, i.e., mathematical objects of signature `(DataFrame-row, DataFrame-row) -> float`, or, in mathematical notation, $$\mathbb{R}^n \times\mathbb{R}^n\rightarrow\mathbb{R}$$.
Common examples are distances between vectors, or (positive definite) kernels on vectors.

The behaviour is as for pairwise series transformers: evaluation is done by `transform(X, X2)` or, equivalently, by a direct call.

Inputs to `transform` of a pairwise (tabular) transformer must always be `pandas.DataFrame`.
The output is an `m x n` matrix, a 2D `np.ndarray`, with `m = len(X)` and `n = len(X2)`.
The `(i,j)`-th entry corresponds to `t(Xi, X2j)`, where `Xi` is the `i`-th row of `X`, and `X2j` is the `j`-th row of `X2`.
If `X2` is not passed, it defaults to `X`.

Example:

In [None]:
from sktime.datatypes import get_examples

# we retrieve some DataFrame examples
X_tabular = get_examples("pd.DataFrame", "Series")[1]
X2_tabular = get_examples("pd.DataFrame", "Series")[1][0:3]

In [None]:
# constructing the transformer
from sktime.dists_kernels import ScipyDist

# Euclidean distance between rows
my_tabular_dist = ScipyDist(metric="euclidean")

In [None]:
# obtain matrix of distances between each pair of rows in X_tabular, X2_tabular
my_tabular_dist(X_tabular, X2_tabular)
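
The `m x n` convention for tabular pairwise transformers can be reproduced in plain numpy by broadcasting. This is a sketch that assumes Euclidean distance between rows. It is not `ScipyDist`'s code, and unlike `ScipyDist` it takes plain arrays (a `pandas.DataFrame` also works through `np.asarray`). `euclidean_row_dist` is a hypothetical name.

```python
import numpy as np


def euclidean_row_dist(X, X2=None):
    # X: (m, d), X2: (n, d); rows are the objects being compared
    X = np.asarray(X, dtype=float)
    X2 = X if X2 is None else np.asarray(X2, dtype=float)  # X2 defaults to X
    diff = X[:, None, :] - X2[None, :, :]  # shape (m, n, d)
    return np.sqrt((diff ** 2).sum(axis=-1))  # shape (m, n)


X = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]])
X2 = X[0:2]  # first two rows, analogous to X2_tabular above

print(euclidean_row_dist(X, X2).shape)  # (3, 2)
print(euclidean_row_dist(X).shape)      # (3, 3)
print(euclidean_row_dist(X, X2)[1, 0])  # 5.0, distance between rows [3, 4] and [0, 0]
```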

### 3.4.5 Searching for transformers<a class="anchor" id="section_3_4_5"></a>

As with all `sktime` objects, we can use the `registry.all_estimators` utility to display all transformers in `sktime`.

The relevant scitypes are:
* `"transformer"` for all transformers (as in Section 2.2)
* `"transformer-pairwise"` for all pairwise transformers on tabular data (as in Section 3.4.4)
* `"transformer-pairwise-panel"` for all pairwise transformers on panel data (as in Section 3.4.3)

To filter transformers (`"transformer"` scitype) further by input and output, use tags, most importantly:
* `"scitype:transform-output"` - the output scitype that the transform produces. `Series` for time series, `Primitives` for primitive features (float, categories).
* `"scitype:instancewise"` - whether the transform uses all samples or acts by instance. If `True`, this is simply a vectorized operation per series. If `False`, then fitting on a single series does not, in general, give the same result as fitting on multiple series.
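
The instancewise distinction can be illustrated with standardization. This is a sketch with hypothetical functions, not sktime transformers. The instancewise version uses only each series' own mean and standard deviation. The non-instancewise version fits one mean and standard deviation on the whole panel, so its output for a given series changes when other series are added.

```python
import numpy as np


def standardize_instancewise(panel):
    # each series is standardized using only its own mean and std
    return [(s - s.mean()) / s.std() for s in panel]


def standardize_global(panel):
    # mean and std are "fitted" on all series of the panel together
    pooled = np.concatenate(panel)
    mu, sigma = pooled.mean(), pooled.std()
    return [(s - mu) / sigma for s in panel]


a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])

# instancewise: the result for `a` does not depend on whether `b` is present
same_inst = np.allclose(standardize_instancewise([a])[0], standardize_instancewise([a, b])[0])
# not instancewise: adding `b` changes the result for `a`
same_glob = np.allclose(standardize_global([a])[0], standardize_global([a, b])[0])
print(same_inst, same_glob)  # True False
```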

These and further tags will be explained in more detail in Section 2.

In [None]:
from sktime.registry import all_estimators

In [None]:
# listing all transformers (scitype "transformer")
all_estimators("transformer", as_dataframe=True)

In [None]:
# now subset to transformers that extract scalar features
all_estimators(
    "transformer",
    as_dataframe=True,
    filter_tags={"scitype:transform-output": "Primitives"},
)
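
The idea behind `filter_tags` can be sketched with a toy registry. Everything here is hypothetical: the registry entries, `find_estimators`, and the assumed matching rule, under which an entry matches when it has the requested scitype and every requested tag equals the given value. The real `all_estimators` utility returns a `DataFrame` when `as_dataframe=True` and may apply richer matching rules.

```python
# hypothetical mini-registry: (name, scitype, tags)
REGISTRY = [
    ("SeriesSmoother", "transformer", {"scitype:transform-output": "Series"}),
    ("SummaryStats", "transformer", {"scitype:transform-output": "Primitives"}),
    ("RowDistance", "transformer-pairwise", {}),
    ("SeriesDistance", "transformer-pairwise-panel", {}),
]


def find_estimators(scitype, filter_tags=None):
    # keep entries of the given scitype whose tags match all requested values
    filter_tags = filter_tags or {}
    return [
        name
        for name, sci, tags in REGISTRY
        if sci == scitype and all(tags.get(k) == v for k, v in filter_tags.items())
    ]


print(find_estimators("transformer"))  # ['SeriesSmoother', 'SummaryStats']
print(find_estimators("transformer", {"scitype:transform-output": "Primitives"}))  # ['SummaryStats']
print(find_estimators("transformer-pairwise-panel"))  # ['SeriesDistance']
```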

In [None]:
# listing all pairwise (tabular) transformers - distances, kernels on vectors/df-rows
all_estimators("transformer-pairwise", as_dataframe=True)

In [None]:
# listing all pairwise panel transformers - distances, kernels on time series
all_estimators("transformer-pairwise-panel", as_dataframe=True)

## 3.6 Summary

---

### Credits: notebook 3 - distances, kernels, alignment

notebook creation: 

---

## Join sktime!

* openly governed - users, developers, early career data scientists
* world-wide contributor and user footprint

**EVERYONE CAN JOIN! EVERYONE CAN BECOME A COMMUNITY LEADER!**

* join our Discord (developers and community)!
    * regular **community collaboration sessions** and stand-ups on Fridays
    * next **onboarding session**: June 2023
    * next **developer sprint**: July 2023

Opportunities:

* sktime **mentoring programme**: github.com/sktime/mentoring