Agenda:

1. General introduction to `sktime`

2. Panel tasks - classification, regression, clustering

3. Distances, kernels, alignment - and their use for panel tasks

## 1 - Introduction to ``sktime``

### 1.1 What is ``sktime``?

- `sktime` is a python library for time series learning tasks!
  - check [our website](https://www.sktime.net/en/latest/index.html)! 
  - integrative framework layer in the time series space

- `sklearn` / `sktime` interface:
  - unified interface for objects/estimators
  - modular design, strategy pattern
  - composable, composites are interface homogeneous
  - simple specification language and parameter interface
  - visually informative pretty printing

- `sktime` is a vibrant, welcoming community with mentoring opportunities!
  - We *love* new contributors, even if you are new to open source software development!
  - Check out the `sktime` [new contributors guide](https://www.sktime.net/en/latest/get_involved/contributing.html)
  - Join our [discord](https://discord.com/invite/54ACzaFsn7) and/or one of our regular meetups!
  - follow us on [LinkedIn](https://www.linkedin.com/company/scikit-time/)!

### 1.2 sklearn unified interface - the strategy pattern

`sklearn` provides a unified interface to multiple learning tasks, including classification and regression.

Any (supervised) estimator has the following interface points:

1. **Instantiate** your model of choice, with parameter settings
2. **Fit** the instance of your model
3. Use that fitted instance to **predict** new data!

![](./img/estimator-conceptual-model.jpg)

In [None]:
# get data to use the model on
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)

In [None]:
from sklearn.svm import SVC

# 1. Instantiate SVC with parameters gamma, C
clf = SVC(gamma=0.001, C=100.)

# 2. Fit clf to training data
clf.fit(X_train, y_train)

# 3. Predict labels on test data
y_test_pred = clf.predict(X_test)

y_test_pred

IMPORTANT: to use another classifier, only the specification line (step 1) changes!

`SVC` could have been `RandomForestClassifier`; steps 2 and 3 remain the same. This is the unified interface:

In [None]:
from sklearn.ensemble import RandomForestClassifier

# 1. Instantiate RandomForestClassifier with parameter n_estimators
clf = RandomForestClassifier(n_estimators=100)

# 2. Fit clf to training data
clf.fit(X_train, y_train)

# 3. Predict labels on test data
y_test_pred = clf.predict(X_test)

y_test_pred

In object-oriented design terminology, this is called the **"strategy pattern"**

= different estimators can be switched out without any change to the interface

= like a power plug adapter, it's plug and play if it conforms to the interface

Pictorial summary:
![](./img/sklearn-unified-interface.jpg)
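
The plug-and-play idea can be sketched by running the very same fit/predict code over several classifiers. This is a minimal illustration, not part of the original notebook: `KNeighborsClassifier`, the `random_state` values and the variable names `X_tr`, `X_te`, `y_tr`, `y_te` are additions chosen here for the example.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X_iris, y_iris = load_iris(return_X_y=True, as_frame=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_iris, y_iris, random_state=42)

# step 1 is the only place where the estimators differ
estimators = [
    SVC(gamma=0.001, C=100.0),
    RandomForestClassifier(n_estimators=100, random_state=42),
    KNeighborsClassifier(n_neighbors=3),  # assumption: extra estimator for illustration
]

predictions = {}
for est in estimators:
    # steps 2 and 3 are identical for every estimator
    est.fit(X_tr, y_tr)
    predictions[type(est).__name__] = est.predict(X_te)

for name, y_hat in predictions.items():
    print(name, y_hat[:5])
```

The loop body never inspects which estimator it holds; it relies only on `fit` and `predict` being present.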

Parameters can be accessed and set via `get_params` and `set_params`:

In [None]:
clf.get_params()
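
For the setting side, here is a small sketch of `set_params`, including how the same parameter interface carries over to a composite. The pipeline, the `StandardScaler` step and the names `svc` and `pipe` are assumptions added for illustration; `make_pipeline` names each step after its lowercased class name, which is where the `svc__C` key comes from.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# plain estimator: change a parameter after construction
svc = SVC(gamma=0.001, C=100.0)
svc.set_params(C=10.0)
print(svc.get_params()["C"])

# composite estimator: nested parameters use the <step name>__<parameter> convention
pipe = make_pipeline(StandardScaler(), SVC())
pipe.set_params(svc__C=10.0)
print(pipe.get_params()["svc__C"])
```

The composite exposes `get_params` and `set_params` just like its components do, which is what makes composites interface homogeneous.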

### 1.3 `sktime` is devoted to time-series data analysis

Time series data comes with a richer space of learning tasks than "tabular" data:

- **Forecasting** - predict energy consumption tomorrow, based on past weeks
- **Classification** - classify electrocardiograms to healthy/sick, based on prior examples
- **Regression** - predict compound purity in bioreactor based on temperature/pressure profile
- **Clustering** - sort outlines of tree leaves into a small number of similar classes
- **Annotation** - identify jumps, anomalies, events in a data stream

`sktime` aims to provide `sklearn`-like, modular, composable interfaces for these!

Example - forecasting

In [None]:
from sktime.datasets import load_airline
from sktime.forecasting.naive import NaiveForecaster
import numpy as np

# step 1: data specification
y = load_airline()

# step 2: specifying forecasting horizon
fh = np.arange(1, 37)

# step 3: specifying the forecasting algorithm
forecaster = NaiveForecaster(strategy="last", sp=12)

# step 4: fitting the forecaster
forecaster.fit(y)

# step 5: querying predictions
y_pred = forecaster.predict(fh)
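
To make concrete what a "last value" strategy with seasonal periodicity `sp` predicts, here is a minimal NumPy sketch on toy data. It is not `sktime`'s implementation: `seasonal_naive` is a hypothetical helper, and the sketch assumes the forecast simply repeats the last observed season.

```python
import numpy as np


def seasonal_naive(y, fh, sp):
    """Hypothetical helper: forecast by repeating the last observed season."""
    y = np.asarray(y)
    fh = np.asarray(fh)
    last_season = y[-sp:]
    # a forecast h steps ahead reuses position (h - 1) mod sp of the last season
    return last_season[(fh - 1) % sp]


# toy series with two seasons of length 4
y_toy = np.array([1, 2, 3, 4, 11, 12, 13, 14])
y_toy_pred = seasonal_naive(y_toy, fh=np.arange(1, 7), sp=4)
print(y_toy_pred)  # [11 12 13 14 11 12]
```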

Example - classification

In [None]:
from sktime.datasets import load_osuleaf
from sktime.classification.distance_based import KNeighborsTimeSeriesClassifier
from sktime.dists_kernels.compose_tab_to_panel import AggrDist
from sktime.dists_kernels import ScipyDist

# step 1 - specify training data
X_train, y_train = load_osuleaf(split="train", return_type="numpy3D")

# step 2 - specify data to predict labels for
X_new, _ = load_osuleaf(split="test", return_type="numpy3D")
X_new = X_new[:2]

# step 3 - specify the classifier
mean_eucl_dist = AggrDist(ScipyDist())
clf = KNeighborsTimeSeriesClassifier(n_neighbors=3, distance=mean_eucl_dist)

# step 4 - fitting the classifier
clf.fit(X_train, y_train)

# step 5 - predict labels on new data
y_pred = clf.predict(X_new)
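
The distance passed to the classifier can be sketched in plain NumPy. The sketch assumes the defaults are in effect, i.e. that `ScipyDist()` computes Euclidean distances between individual time points and that `AggrDist` averages the resulting distance matrix; `mean_pairwise_euclidean` is a hypothetical helper and not the library's code.

```python
import numpy as np


def mean_pairwise_euclidean(a, b):
    """Hypothetical helper; a, b have shape (n_variables, n_timepoints)."""
    # treat every time point as one row vector of variables
    pts_a, pts_b = a.T, b.T
    # Euclidean distance between each time point of a and each time point of b
    diffs = pts_a[:, None, :] - pts_b[None, :, :]
    dist_matrix = np.sqrt((diffs ** 2).sum(axis=-1))
    # aggregate the time point distance matrix into a single number
    return dist_matrix.mean()


# two univariate toy series of unequal length
series_a = np.array([[0.0, 1.0, 2.0]])
series_b = np.array([[0.0, 2.0]])
print(mean_pairwise_euclidean(series_a, series_b))  # 1.0
```

Because the aggregation runs over all pairs of time points, this kind of distance ignores temporal order and also works for series of unequal length.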

### 1.4 Summary/What is next!

- `sklearn` interface: unified interface (strategy pattern), modular, stable under composition, easy specification language
- `sktime` evolves the interface for time series learning tasks, with ecosystem integration
- Today we will look at learning tasks related to collections of time series: classification, regression, clustering
- We will also look at modular construction of estimators using time series distances, kernels and aligners, which are objects in their own right