<img src="./img/sktime-logo-text-horizontal.jpg" alt="sktime logo" style="width: 100%; max-width: 600px;">

<span style="font-size: 5em;"> `skchange` </span>

### Agenda for today

1. **general introduction** to `skchange` & `sktime`

2. **detection**

    * change detection and segmentation
    * segment anomaly detection
    * detector API
    * algorithm framework
    * costs and scores

3. **use cases**

    * failure detection - HVAC
    * health monitoring - type 1 diabetes meal detection
    * open challenge - contribute!

### Running the notebooks

all notebooks available on [github.com/sktime](https://github.com/sktime) 

repository: [github.com/sktime/sktime-tutorial-pydata-global-2024](https://github.com/sktime/sktime-tutorial-pydata-global-2024)

* README instructions to run notebooks locally
* binder to run notebooks in the cloud (if wifi allows)

## 1 - Introduction to `sktime` and `skchange`

### 1.1 What is `sktime`? What is `skchange`?

- `sktime` is a Python library for time series learning tasks!
  - check [our website](https://www.sktime.net/en/latest/index.html)!
  - integrative framework layer in the time series space via `scikit-base`
  - **easy to integrate with 2nd party libraries!**

- `skchange` is one such library: a unified API for composable detection algorithms
  - check [the website](https://skchange.readthedocs.io/)!

- `sklearn` / `sktime` interface:
  - unified interface for objects/estimators
  - modular design, strategy pattern
  - composable, composites are interface homogeneous
  - simple specification language and parameter interface
  - visually informative pretty printing

- `sktime` is a vibrant, welcoming community with mentoring opportunities!
  - We *love* new contributors. Even if you are new to open source software development!
  - Check out the ``sktime`` [new contributors guide](https://www.sktime.net/en/latest/get_involved/contributing.html)
  - join our [discord](https://discord.com/invite/54ACzaFsn7) and/or one of our regular meetups!
  - follow us on [LinkedIn](https://www.linkedin.com/company/scikit-time/)!

### 1.2 `sklearn` unified interface - the strategy pattern

`sklearn` provides a unified interface to multiple learning tasks, including classification and regression.

any (supervised) estimator has the following interface points:

1. **Instantiate** your model of choice, with parameter settings
2. **Fit** the instance of your model
3. Use that fitted instance to **predict** new data!

<img src="./img/estimator-conceptual-model.jpg" alt="Estimator conceptual model" style="width: 100%; max-width: 1200px;">

In [1]:
import warnings

warnings.filterwarnings("ignore")

In [2]:
# get data to use the model on
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)

In [3]:
X_train.head()

|  | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|---|---|---|---|---|
| 70 | 5.9 | 3.2 | 4.8 | 1.8 |
| 140 | 6.7 | 3.1 | 5.6 | 2.4 |
| 73 | 6.1 | 2.8 | 4.7 | 1.2 |
| 31 | 5.4 | 3.4 | 1.5 | 0.4 |
| 56 | 6.3 | 3.3 | 4.7 | 1.6 |


In [4]:
y_train.head()

70     1
140    2
73     1
31     0
56     1
Name: target, dtype: int64

In [5]:
X_test.head()

|  | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|---|---|---|---|---|
| 76 | 6.8 | 2.8 | 4.8 | 1.4 |
| 47 | 4.6 | 3.2 | 1.4 | 0.2 |
| 78 | 6.0 | 2.9 | 4.5 | 1.5 |
| 66 | 5.6 | 3.0 | 4.5 | 1.5 |
| 106 | 4.9 | 2.5 | 4.5 | 1.7 |


In [6]:
from sklearn.svm import SVC

# 1. Instantiate SVC with parameters gamma, C
clf = SVC(gamma=0.001, C=100.0)

# 2. Fit clf to training data
clf.fit(X_train, y_train)

# 3. Predict labels on test data
y_test_pred = clf.predict(X_test)

y_test_pred

array([1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 2, 0, 2, 1, 0, 0, 1, 1, 0,
       0, 1, 0, 0, 1, 2, 2, 0, 0, 1, 2, 2, 0, 1, 1, 1])

IMPORTANT: to use another classifier, only the specification line (step 1) changes!

`SVC` could have been `RandomForestClassifier`; steps 2 and 3 remain the same - unified interface:

In [7]:
from sklearn.ensemble import RandomForestClassifier

# 1. Instantiate RandomForestClassifier with parameter n_estimators
clf = RandomForestClassifier(n_estimators=100)

# 2. Fit clf to training data
clf.fit(X_train, y_train)

# 3. Predict labels on test data
y_test_pred = clf.predict(X_test)

y_test_pred

array([1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 2, 0, 2, 1, 0, 0, 1, 1, 0,
       0, 2, 0, 0, 1, 2, 2, 0, 0, 1, 1, 2, 0, 1, 1, 1])

in object-oriented design terminology, this is called the **"strategy pattern"**

= different estimators can be switched out without change to the interface

= like a power plug adapter, it's plug&play if it conforms with the interface
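The plug&play idea can be sketched in plain Python. The two toy classifiers below are hypothetical (they are not part of `sklearn`); they only illustrate that client code written against `fit`/`predict` works unchanged with either of them.

```python
from collections import Counter


class MajorityClassifier:
    """Hypothetical toy estimator: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class ThresholdClassifier:
    """Hypothetical toy estimator: predicts 1 if the first feature exceeds a threshold."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold

    def fit(self, X, y):
        # Nothing to learn in this toy example; fit exists to honour the interface.
        return self

    def predict(self, X):
        return [int(row[0] > self.threshold) for row in X]


def fit_and_predict(estimator, X_fit, y_fit, X_new):
    """Client code: relies only on the shared fit/predict interface."""
    return estimator.fit(X_fit, y_fit).predict(X_new)


X_toy, y_toy = [[0.1], [0.9], [0.8]], [0, 1, 1]
X_toy_new = [[0.2], [0.7]]

majority_pred = fit_and_predict(MajorityClassifier(), X_toy, y_toy, X_toy_new)
threshold_pred = fit_and_predict(ThresholdClassifier(0.5), X_toy, y_toy, X_toy_new)
print(majority_pred, threshold_pred)
```

Swapping the estimator changes the predictions, but `fit_and_predict` itself never has to change.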

Pictorial summary:

<img src="./img/sklearn-unified-interface.jpg" alt="Unified estimator interface" style="width: 100%; max-width: 1200px;">

parameters can be accessed and set via `get_params`, `set_params`:

In [8]:
clf.get_params()

{'bootstrap': True,
 'ccp_alpha': 0.0,
 'class_weight': None,
 'criterion': 'gini',
 'max_depth': None,
 'max_features': 'sqrt',
 'max_leaf_nodes': None,
 'max_samples': None,
 'min_impurity_decrease': 0.0,
 'min_samples_leaf': 1,
 'min_samples_split': 2,
 'min_weight_fraction_leaf': 0.0,
 'monotonic_cst': None,
 'n_estimators': 100,
 'n_jobs': None,
 'oob_score': False,
 'random_state': None,
 'verbose': 0,
 'warm_start': False}
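A short sketch of the setter side, assuming `sklearn` is installed. The variable names `forest` and `pipe` and the step names `"scale"` and `"forest"` are chosen here for illustration only. Parameters of components inside a composite are addressed with the `<step>__<parameter>` convention.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

forest = RandomForestClassifier(n_estimators=100)

# set_params updates hyper-parameters in place and returns the estimator itself
forest.set_params(n_estimators=50, max_depth=3)
print(forest.get_params()["n_estimators"], forest.get_params()["max_depth"])

# In a composite, nested parameters are addressed as <step name>__<parameter name>
pipe = Pipeline([("scale", StandardScaler()), ("forest", RandomForestClassifier())])
pipe.set_params(forest__n_estimators=20)
print(pipe.get_params()["forest__n_estimators"])
```

The same `get_params`/`set_params` interface is what tuning utilities such as grid search rely on to modify estimators generically.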

### 1.3 `sktime` is devoted to time-series data analysis

Richer space of time series tasks, compared to "tabular":

- **Forecasting** - predict energy consumption tomorrow, based on past weeks
- **Classification** - classify electrocardiograms to healthy/sick, based on prior examples
- **Regression** - predict compound purity in bioreactor based on temperature/pressure profile
- **Clustering** - sort outlines of tree leaves into a small number of similar classes
- **Anomaly & changepoint detection, segmentation** - identify jumps, anomalies, events in a data stream

`sktime` aims to provide `sklearn`-like, modular, composable, interfaces for these!

| Task | Status | Links |
|---|---|---|
| **Forecasting** | stable | [Tutorial](https://www.sktime.net/en/latest/examples/01_forecasting.html) · [API Reference](https://www.sktime.net/en/latest/api_reference/forecasting.html) · [Extension Template](https://github.com/sktime/sktime/blob/main/extension_templates/forecasting.py)  |
| **Time Series Classification** | stable | [Tutorial](https://github.com/sktime/sktime/blob/main/examples/02_classification.ipynb) · [API Reference](https://www.sktime.net/en/latest/api_reference/classification.html) · [Extension Template](https://github.com/sktime/sktime/blob/main/extension_templates/classification.py) |
| **Time Series Regression** | stable | [Tutorial](https://github.com/sktime/sktime/blob/main/examples/02_classification.ipynb) · [API Reference](https://www.sktime.net/en/latest/api_reference/regression.html) |
| **Transformations** | stable | [Tutorial](https://github.com/sktime/sktime/blob/main/examples/03_transformers.ipynb) · [API Reference](https://www.sktime.net/en/latest/api_reference/transformations.html) · [Extension Template](https://github.com/sktime/sktime/blob/main/extension_templates/transformer.py)  |
| **Performance metrics for forecasts** | stable | [API Reference](https://www.sktime.net/en/latest/api_reference/performance_metrics.html) |
| **Time series splitting/resampling** | stable | [API Reference](https://www.sktime.net/en/latest/api_reference/split.html) |
| **Parameter fitting** | maturing | [API Reference](https://www.sktime.net/en/latest/api_reference/param_est.html) · [Extension Template](https://github.com/sktime/sktime/blob/main/extension_templates/transformer.py)  |
| **Time Series Alignment** | maturing | [API Reference](https://www.sktime.net/en/latest/api_reference/alignment.html) ·  [Extension Template](https://github.com/sktime/sktime/blob/main/extension_templates/alignment.py) |
| **Time Series Clustering** | maturing | [API Reference](https://www.sktime.net/en/latest/api_reference/clustering.html) ·  [Extension Template](https://github.com/sktime/sktime/blob/main/extension_templates/clustering.py) |
| **Time Series Distances/Kernels** | maturing | [Tutorial](https://github.com/sktime/sktime/blob/main/examples/03_transformers.ipynb) · [API Reference](https://www.sktime.net/en/latest/api_reference/dists_kernels.html) · [Extension Template](https://github.com/sktime/sktime/blob/main/extension_templates/dist_kern_panel.py) |
| **Anomalies, changepoints** (with [`skchange`](https://github.com/NorskRegnesentral/skchange)) | experimental | [Extension Template](https://github.com/sktime/sktime/blob/main/extension_templates/annotation.py) |


In the [skpro](https://github.com/sktime/skpro) companion package:

| Module | Status | Links |
|---|---|---|
| **Probabilistic tabular regression** | maturing | [Tutorial](https://github.com/sktime/skpro/blob/main/examples/01_skpro_intro.ipynb) · [API Reference](https://skpro.readthedocs.io/en/latest/api_reference/regression.html) · [Extension Template](https://github.com/sktime/skpro/blob/main/extension_templates/regression.py) |
| **Time-to-event (survival) prediction** | maturing |  [Tutorial](https://github.com/sktime/skpro/blob/main/examples/02_skpro_survival.ipynb) · [API Reference](https://skpro.readthedocs.io/en/latest/api_reference/survival.html) · [Extension Template](https://github.com/sktime/skpro/blob/main/extension_templates/survival.py) |
| **Performance metrics for proba predictions** | maturing | [API Reference](https://skpro.readthedocs.io/en/latest/api_reference/metrics.html) |
| **Probability distributions** | maturing | [Tutorial](https://github.com/sktime/skpro/blob/main/examples/03_skpro_distributions.ipynb) · [API Reference](https://skpro.readthedocs.io/en/latest/api_reference/distributions.html) · [Extension Template](https://github.com/sktime/skpro/blob/main/extension_templates/distributions.py) |

#### Example of change detection
**Change detection** - identify points in a time series where properties of the data change.

"Change detection" and "change point detection" are used interchangeably.

In [13]:
import skchange

skchange.__version__

'0.12.0'

In [17]:
from skchange.datasets import generate_alternating_data
from utils import plot_multivariate_time_series, add_changepoint_vlines

from skchange.change_detectors import MovingWindow

df = generate_alternating_data(n_segments=10, segment_length=50, mean=5, random_state=1)

detector = MovingWindow(bandwidth=15)
cpts = detector.fit_predict(df)

cpt_fig = plot_multivariate_time_series(df)
cpt_fig = add_changepoint_vlines(cpt_fig, cpts)
cpt_fig.update_layout(
    showlegend=False, xaxis_title=None
)
cpt_fig.show()
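To give some intuition for what a moving-window detector looks for, here is a simplified plain-Python sketch. It is an assumption-laden illustration and not `skchange`'s `MovingWindow` implementation: the helper `moving_window_scores` is hypothetical, it only compares the means of the windows to the left and right of each candidate point, and it picks the single highest score instead of applying a threshold.

```python
import random
from statistics import fmean


def moving_window_scores(x, bandwidth):
    """Absolute difference between the means of the windows left and right of each split."""
    scores = [0.0] * len(x)
    for t in range(bandwidth, len(x) - bandwidth + 1):
        left = fmean(x[t - bandwidth : t])
        right = fmean(x[t : t + bandwidth])
        scores[t] = abs(right - left)
    return scores


# Toy series: mean 0 for 50 samples, then mean 5 for 50 samples
rng = random.Random(1)
toy_series = [rng.gauss(0, 1) for _ in range(50)] + [rng.gauss(5, 1) for _ in range(50)]

toy_scores = moving_window_scores(toy_series, bandwidth=15)

# The score should peak near the true change at index 50
estimated_cpt = max(range(len(toy_scores)), key=toy_scores.__getitem__)
print(estimated_cpt)
```

A larger `bandwidth` averages over more samples, which reduces noise in the scores but requires changes to be further apart to be separated.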

### 1.4 `sktime` integrates the time series modelling ecosystem!

the package space for time series is highly fragmented:

* lots of great implementations and methods out there!
* but many different interfaces, not composable like `sklearn`

<img src="./img/ts-fragmentation.png" alt="The fragmented time series ecosystem" style="width: 100%; max-width: 1200px;">


`sktime` integrates the ecosystem - in friendly collaboration with all the packages out there!

<img src="./img/sktime-interoperable.png" alt="sktime integration" style="width: 100%; max-width: 1200px;">

<img src="./img/sktime-composable.png" alt="sktime composition" style="width: 100%; max-width: 1200px;">

easy search for plug&play components across the ecosystem!

Try the [`sktime` estimator search](https://www.sktime.net/en/latest/estimator_overview.html)

<img src="./img/estimator-search.png" alt="sktime estimator search" style="width: 100%; max-width: 1200px;">

### 1.5 `skchange` - fast change and anomaly detection in time series

A 2nd party extension to `sktime`'s maturing detection module.

* **Fast**: Numba is used for performance.
* **Easy to use**: Follows the conventions of sktime and scikit-learn.
* **Easy to extend**:
  - Make your own detectors by inheriting from the base class templates.
  - Create custom detection scores and cost functions.
* **Segment anomaly detection**: Detect intervals of anomalous behaviour in time series data.
* **Subset anomaly detection**: Detect intervals of anomalous behaviour in time series data, and infer the subset of variables that cause the anomaly.
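To make "detection scores and cost functions" concrete, here is a minimal plain-Python sketch. The function names `l2_cost` and `change_score` are hypothetical and do not reflect `skchange`'s actual classes or signatures. The idea shown is a common one: a cost measures how badly a single model (here, a constant mean) fits a segment, and a change score measures how much cost is saved by splitting the segment in two.

```python
from statistics import fmean


def l2_cost(segment):
    """Sum of squared deviations from the segment mean."""
    mean = fmean(segment)
    return sum((value - mean) ** 2 for value in segment)


def change_score(x, split):
    """Cost saved by fitting separate means to x[:split] and x[split:]."""
    return l2_cost(x) - l2_cost(x[:split]) - l2_cost(x[split:])


x = [0.0, 0.0, 0.0, 4.0, 4.0, 4.0]

# Evaluate every admissible split and keep the one with the largest saving
split_scores = {split: change_score(x, split) for split in range(1, len(x))}
best_split = max(split_scores, key=split_scores.get)
print(best_split, split_scores[best_split])  # 3 24.0
```

Replacing `l2_cost` with a different cost (for example one sensitive to variance) changes what kind of change is detected while the surrounding search logic stays the same.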

### 1.6 Summary/What is next!

- `sklearn` interface: unified interface (strategy pattern), modular, stable under composition, easy specification language
- `sktime` evolves the interface for time series learning tasks
- `sktime` integrates a fragmented ecosystem through a unified interface, composability, and dependency management
- `skchange` extends `sktime` with fast and up-to-date change and anomaly detection methods

- Next:
    * general detection intro (50 min)
    * advanced detection patterns (20 min)
    * use case/competition (10 min)

---
### Credits: notebook 1 - `skchange` and `sktime` intro

notebook creation: fkiraly, tveten

some vignettes based on existing `sktime` tutorials, credit: fkiraly, miraep8, marrov

slides (png/jpg):

* from fkiraly's postgraduate course at UCL, Principles and Patterns in Data Scientific Software Engineering
* ecosystem slide: fkiraly, mloning
* learning tasks: fkiraly, mloning

General credit also to `sklearn`, `sktime` and `skchange` contributors