### Overview of this notebook

* motivating example with modular building blocks
    * connecting distances, aligners, classifiers
* pairwise transformers - the "type" of time series distances and kernels
* time series alignment and alignment distances, e.g., time warping
* composition patterns for distances, kernels, aligners
* outlook, roadmap, opportunities

In [None]:
import warnings

warnings.filterwarnings("ignore")

## 3.1 Motivating example

Rich component relationships between object types!

* many classifiers, regressors, clusterers use distances or kernels
* distances and kernels are often composite, e.g., sum-of-distance, independent distance
* TS distances are often based on scalar multivariate distances (e.g., Euclidean)
* TS distances are often based on alignment; TS aligners are an estimator type of their own!
* aligners internally typically use scalar uni/multivariate distances

example:

* 1-nn using `sklearn` nearest neighbors
* with multivariate dynamic time warping distance, from `dtw-python` library 
* on multivariate `"mahalanobis"` distance from `scipy`
* in `sktime` compatible interface, constructed from custom components

so, conceptually:

* we build a sequence alignment algorithm (`dtw-python`) using the `scipy` Mahalanobis distance
* we get the distance matrix computation from alignment algorithm
* we use that distance matrix in `sklearn` knn
* together this is a time series classifier!
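
The same chain can be sketched without any of the libraries, to show what each step contributes. This is a minimal illustration in plain Python, not how `dtw-python`, `scipy` or `sklearn` implement these steps: the helper names (`local_dist`, `dtw_distance`, `distance_matrix`, `predict_1nn`) are hypothetical, Euclidean distance stands in for Mahalanobis, and a series is assumed to be a list of time points, each a tuple of variables.

```python
import math


def local_dist(a, b):
    """Distance between two time points (vectors); Euclidean as a stand-in."""
    return math.dist(a, b)


def dtw_distance(x, y, dist=local_dist):
    """Dynamic-programming time warping cost between two series of time points."""
    n, m = len(x), len(y)
    # cost[i][j] = minimal accumulated cost of aligning x[:i] with y[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
            cost[i][j] = dist(x[i - 1], y[j - 1]) + step
    return cost[n][m]


def distance_matrix(X1, X2):
    """One alignment distance per pair (series in X1, series in X2)."""
    return [[dtw_distance(x, y) for y in X2] for x in X1]


def predict_1nn(X_train, y_train, X_test):
    """Label each test series with the label of its nearest training series."""
    dm = distance_matrix(X_test, X_train)
    return [y_train[row.index(min(row))] for row in dm]


# toy panel: bivariate series of unequal length
X_train = [
    [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],
    [(5.0, 5.0), (5.0, 6.0), (6.0, 6.0), (7.0, 7.0)],
]
y_train = ["low", "high"]
X_test = [[(0.0, 0.0), (0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]]

y_pred = predict_1nn(X_train, y_train, X_test)
```

The test series is a time-stretched copy of the first training series, so warping aligns them at zero cost and the 1-nn step picks that series' label.
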

In [None]:
from sktime.alignment.dtw_python import AlignerDTWfromDist
from sktime.classification.distance_based import KNeighborsTimeSeriesClassifier
from sktime.dists_kernels.compose_from_align import DistFromAligner
from sktime.dists_kernels.scipy_dist import ScipyDist

# Mahalanobis distance on R^n
mahalanobis_dist = ScipyDist(metric="mahalanobis")  # uses scipy distances

# pairwise multivariate aligner from dtw-python with Mahalanobis distance
mw_aligner = AlignerDTWfromDist(mahalanobis_dist)  # uses dtw-python

# turning this into alignment distance on time series
dtw_dist = DistFromAligner(mw_aligner)  # interface mutation to distance

# and using this distance in a k-nn classifier
clf = KNeighborsTimeSeriesClassifier(distance=dtw_dist)  # uses sklearn knn

In [None]:
clf.get_params()

what are all the objects in this chain?

* `ScipyDist` - pairwise distance between *scalars* - `transformer-pairwise` type
* `AlignerDTWfromDist` - time series alignment algorithm - `aligner` type
* `DistFromAligner` - pairwise distance between *time series* - `transformer-pairwise-panel` type
* `KNeighborsTimeSeriesClassifier` - time series classifier

In [None]:
from sktime.registry import scitype

scitype(mw_aligner)  # prints the type of estimator (as a string)
# same for other components

let's go through these - we've already seen classifiers.

## 3.2 Time series distances and kernels - pairwise panel transformers

### 3.2.1 Distances, kernels - general interface

pairwise panel transformers produce one distance per pair of series in the panel:

In [None]:
from sktime.datasets import load_osuleaf

# load an example time series panel in numpy mtype
X, _ = load_osuleaf(return_type="numpy3D")

X1 = X[:3]
X2 = X[5:10]

In [None]:
# constructing the transformer
from sktime.dists_kernels import AggrDist

# mean of paired Euclidean distances, over time points
mean_euc_dist = AggrDist.create_test_instance()

In [None]:
X1.shape

In [None]:
X2.shape

`X1` is a panel with 3 series, and `X2` is a panel with 5 series,

so a matrix of pairwise distances from X1 to X2 should have shape (3, 5)
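
The shape logic can be checked on a toy example. The sketch below is plain Python and only illustrates the idea of a "mean of pairwise Euclidean distances" between two series; the function names and toy panels are hypothetical, and panels are assumed to be nested lists (series, then time points, then variables) rather than the `numpy3D` layout used above.

```python
import math
from statistics import mean


def mean_pairwise_dist(x, y):
    """Mean Euclidean distance over all pairs of time points of two series."""
    return mean(math.dist(a, b) for a in x for b in y)


def pairwise_panel_dist(X1, X2=None):
    """Distance matrix: one row per series in X1, one column per series in X2."""
    if X2 is None:
        # single argument: distances within the panel itself
        X2 = X1
    return [[mean_pairwise_dist(x, y) for y in X2] for x in X1]


# toy univariate panels with 4 time points per series
X1_toy = [[(float(i + t),) for t in range(4)] for i in range(3)]  # 3 series
X2_toy = [[(float(i * t),) for t in range(4)] for i in range(5)]  # 5 series

distmat_toy = pairwise_panel_dist(X1_toy, X2_toy)
shape = (len(distmat_toy), len(distmat_toy[0]))
```
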

In [None]:
distmat = mean_euc_dist(X1, X2)

# alternatively, via the transform method
distmat = mean_euc_dist.transform(X1, X2)
distmat

In [None]:
distmat.shape

calling the transformer, or its `transform` method, with a single argument is the same as passing that argument twice:

In [None]:
distmat_symm = mean_euc_dist.transform(X1)
distmat_symm

pairwise panel transformers are `scikit-learn` / `scikit-base` interface compatible like everything else:

In [None]:
mean_euc_dist.get_params()

### 3.2.2 Time series distances, kernels - composition

pairwise transformers can be composed in a number of ways:

* arithmetic, e.g., addition, multiplication - use dunders `+`, `*` etc., or `CombinedDistance`
* subset to one or multiple columns - use `my_dist[colnames]` dunder
* sum or aggregate over univariate distance in multivariate panel, using `IndepDist` (also known as "independent distance")
* compose with series-to-series transformers - use `*` dunder or `make_pipeline`
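
To illustrate how dunder-based composition can work in principle, here is a small sketch in plain Python. `ToyDist` and `sum_pointwise` are hypothetical names, and the sketch makes no claim about how `sktime` implements its dunders; it only shows arithmetic and column subsetting producing new distance objects. A series is assumed to be a list of time points, each a tuple of variables.

```python
import math


class ToyDist:
    """Hypothetical wrapper around a function d(x, y) on two series."""

    def __init__(self, func):
        self.func = func

    def __call__(self, x, y):
        return self.func(x, y)

    def __add__(self, other):
        # sum of two distances is again a distance object
        return ToyDist(lambda x, y: self(x, y) + other(x, y))

    def __mul__(self, other):
        # product of two distances is again a distance object
        return ToyDist(lambda x, y: self(x, y) * other(x, y))

    def __getitem__(self, col):
        # restrict both series to one variable, then apply the distance
        def select(series):
            return [(point[col],) for point in series]

        return ToyDist(lambda x, y: self(select(x), select(y)))


def sum_pointwise(x, y):
    """Sum of Euclidean distances between time points at the same index."""
    return sum(math.dist(a, b) for a, b in zip(x, y))


base = ToyDist(sum_pointwise)

x = [(0.0, 0.0), (1.0, 2.0)]
y = [(1.0, 0.0), (1.0, 5.0)]

d0 = base[0](x, y)  # distance on variable 0 only
d1 = base[1](x, y)  # distance on variable 1 only
prod = (base[0] * base[1])(x, y)
total = (base[0] + base[1])(x, y)
```

Because every operation returns another `ToyDist`, the results can be combined further, which is the property that makes this style of composition convenient.
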

In [None]:
from sktime.datasets import load_basic_motions

# load an example time series panel in numpy mtype
X, _ = load_basic_motions(return_type="numpy3D")
X = X[:3]
X.shape

In [None]:
# example 1: variable subsetting and arithmetic combinations

# first, mean euclidean distance as before
from sktime.dists_kernels import AggrDist

mean_euc_dist = AggrDist.create_test_instance()

# product of the distances on variables 2 and 5
prod_med_25 = mean_euc_dist[2] * mean_euc_dist[5]
prod_med_25

In [None]:
prod_med_25(X)

In [None]:
# example 2: independent dynamic time warping distance
from sktime.alignment.dtw_python import AlignerDTW
from sktime.dists_kernels.compose_from_align import DistFromAligner
from sktime.dists_kernels.indep import IndepDist

# dynamic time warping distance - this is multivariate
dtw_dist = DistFromAligner(AlignerDTW())

# independent distance - by default IndepDist sums over univariate distances
indep_dtw_dist = IndepDist(dtw_dist)

In [None]:
indep_dtw_dist(X)
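
The difference between a multivariate distance and its "independent" version can be seen on a two-point toy example. The sketch below is plain Python with hypothetical helper names (`pointwise_dist`, `indep_dist`) and a simple pointwise distance standing in for dynamic time warping; it applies the distance to each variable separately and sums the results.

```python
import math


def pointwise_dist(x, y):
    """Toy multivariate distance: sum of Euclidean distances at matching time points."""
    return sum(math.dist(a, b) for a, b in zip(x, y))


def indep_dist(dist, x, y):
    """Apply `dist` to each variable separately and sum the univariate results."""
    n_vars = len(x[0])
    total = 0.0
    for v in range(n_vars):
        x_v = [(point[v],) for point in x]  # univariate slice of x
        y_v = [(point[v],) for point in y]  # univariate slice of y
        total += dist(x_v, y_v)
    return total


x = [(0.0, 0.0), (1.0, 1.0)]
y = [(3.0, 4.0), (1.0, 1.0)]

d_multivariate = pointwise_dist(x, y)
d_independent = indep_dist(pointwise_dist, x, y)
```

Here the multivariate distance measures the first pair of time points jointly (the 3-4-5 triangle), while the independent version adds the per-variable gaps of 3 and 4.
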

In [None]:
# example 3: dynamic time warping distance on first differences
from sktime.transformations.series.difference import Differencer

diff_dtw_distance = Differencer() * dtw_dist

In [None]:
diff_dtw_distance(X)

some combinations may be available as efficient `numba` based distances.

E.g., difference-then-dtw is available as the "fixed" `sktime` native implementation
`DtwDist(derivative=True)` in `sktime.dists_kernels.dtw`.

## 3.3 Pairwise tabular transformers

### 3.3.1 Pairwise tabular transformers - general interface

pairwise tabular transformers take ordinary tabular data, e.g., plain `pd.DataFrame`, and produce one distance per pair of rows.
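
As a minimal sketch of "one distance per pair of rows", assuming tables are plain lists of row tuples (the helper `pairwise_rows` and the toy tables are hypothetical):

```python
import math


def pairwise_rows(X, X2=None):
    """Euclidean distance matrix: entry [i][j] is the distance from row i of X to row j of X2."""
    if X2 is None:
        X2 = X
    return [[math.dist(row, row2) for row2 in X2] for row in X]


X_toy = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]  # 3 rows, 2 columns
X2_toy = [(0.0, 0.0), (3.0, 4.0)]  # 2 rows, 2 columns

D = pairwise_rows(X_toy, X2_toy)  # 3 x 2 matrix
```
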

In [None]:
from sktime.datatypes import get_examples

# we retrieve some DataFrame examples
X_tabular = get_examples("pd.DataFrame", "Series")[1]
X2_tabular = get_examples("pd.DataFrame", "Series")[1][0:3]

In [None]:
# just an ordinary DataFrame, no time series
X_tabular

In [None]:
X2_tabular

example: pairwise Euclidean distance between rows

In [None]:
# constructing the transformer
from sktime.dists_kernels import ScipyDist

# Euclidean distance between rows, computed via scipy
my_tabular_dist = ScipyDist(metric="euclidean")

In [None]:
# obtain matrix of distances between each pair of rows in X_tabular, X2_tabular
my_tabular_dist(X_tabular, X2_tabular)

### 3.3.2 Constructing pairwise time series transformers from tabular ones

"simple" time series distances can be obtained directly from tabular transformers:

* aggregating the tabular distance matrix, from two individual time series - `AggrDist`
* flattening the time series to tabular, and then computing the distance - `FlatDist`

these are important "baseline" distances!

Both can be used on `sktime` pairwise transformers and `sklearn` pairwise transformers.

the classes are called "dist", but both also apply to kernels.
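
The flattening approach can be sketched in a few lines of plain Python. The function names are hypothetical, series are assumed to be equal-length lists of time points, and the Gaussian RBF kernel is written out by hand rather than taken from `sklearn`.

```python
import math


def flatten(series):
    """Concatenate all time points of a series into one long vector."""
    return [value for point in series for value in point]


def rbf_kernel(u, v, length_scale=1.0):
    """Gaussian RBF kernel exp(-||u - v||^2 / (2 * length_scale^2)) on vectors."""
    return math.exp(-math.dist(u, v) ** 2 / (2 * length_scale**2))


def flat_kernel(x, y, length_scale=1.0):
    """Flatten both series, then apply the tabular kernel once."""
    return rbf_kernel(flatten(x), flatten(y), length_scale)


x = [(0.0, 0.0), (1.0, 1.0)]
y = [(0.0, 0.0), (1.0, 3.0)]

k_xx = flat_kernel(x, x)
k_xy = flat_kernel(x, y, length_scale=2.0)
```

Flattening treats each (time point, variable) entry as an ordinary feature, so it only makes sense when all series have the same length; aggregation over the time-point distance matrix does not have this restriction.
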

In [None]:
from sktime.datasets import load_basic_motions

# load an example time series panel in numpy mtype
X, _ = load_basic_motions(return_type="numpy3D")
X = X[:3]
X.shape

In [None]:
# example 1: flat Gaussian RBF kernel between time series
from sklearn.gaussian_process.kernels import RBF
from sktime.dists_kernels import FlatDist

flat_gaussian_tskernel = FlatDist(RBF(length_scale=10))
flat_gaussian_tskernel.get_params()

In [None]:
flat_gaussian_tskernel(X)

In [None]:
# example 2: mean pairwise cosine distance - we've already seen AggrDist a couple times
from sktime.dists_kernels import AggrDist, ScipyDist

mean_cos_tsdist = AggrDist(ScipyDist(metric="cosine"))
mean_cos_tsdist.get_params()

In [None]:
mean_cos_tsdist(X)

## 3.4 Searching for pairwise transformers

As with all `sktime` objects, we can use the `registry.all_estimators` utility to list the pairwise transformers in `sktime`.

The relevant scitypes are:

* `"transformer"` for all transformers
* `"transformer-pairwise"` for all pairwise transformers on tabular data
* `"transformer-pairwise-panel"` for all pairwise transformers on panel data

In [None]:
from sktime.registry import all_estimators

In [None]:
# listing all pairwise (tabular) transformers - distances, kernels on vectors/df-rows
all_estimators("transformer-pairwise", as_dataframe=True)

In [None]:
# listing all pairwise panel transformers - distances, kernels on time series
all_estimators("transformer-pairwise-panel", as_dataframe=True)

WIP FROM HERE

## 3.5 Outlook, roadmap

## 3.6 Summary

---

### Credits: notebook 3 - distances, kernels, alignment

notebook creation: 

---

## Join sktime!

* openly governed - users, developers, early career data scientists
* world-wide contributor and user footprint

**EVERYONE CAN JOIN! EVERYONE CAN BECOME A COMMUNITY LEADER!**

* join our discord (developers and community)!
    * regular **community collaboration sessions** and stand-ups on Fridays
    * next **onboarding session**: June 2023
    * next **developer sprint**: July 2023

Opportunities:

* sktime **mentoring programme**: github.com/sktime/mentoring