# Interval-based time series classification in sktime

Interval-based approaches look at phase-dependent intervals of the full series and calculate summary statistics from the selected subseries, which are then used as features for classification.
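To make "phase-dependent" concrete, here is a minimal NumPy sketch (not sktime code). It assumes the series are stored as a 2D array of shape `(n_cases, series_length)`. The same `[start, end)` window is cut out of every series, so each feature describes the same stretch of time in every case. The helper name `interval_summary` and the choice of mean, standard deviation, minimum and maximum are purely illustrative.

```python
import numpy as np


def interval_summary(X, start, end):
    """Summarise the subseries X[:, start:end] of every series in X.

    Hypothetical helper for illustration. X has shape (n_cases, series_length).
    Because the same interval is applied to all cases, the features are
    phase dependent: they describe the same time window in each series.
    """
    sub = X[:, start:end]
    # One row per case, one column per summary statistic
    return np.column_stack(
        [sub.mean(axis=1), sub.std(axis=1), sub.min(axis=1), sub.max(axis=1)]
    )


# Two toy series of length 6: one rising, one falling
X = np.array([[0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
              [5.0, 4.0, 3.0, 2.0, 1.0, 0.0]])
print(interval_summary(X, 1, 4))
```

Both rows get the same standard deviation but different means, minima and maxima. Features like these, computed over many intervals, are what an interval-based classifier learns from.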

Currently, two interval-based approaches are implemented in sktime: Time Series Forest (TSF) [1] and the Canonical Interval Forest (CIF) [2].

In this notebook, we will demonstrate how to use TSF and CIF on the ItalyPowerDemand dataset. Both algorithms are currently only compatible with univariate time series datasets.

#### References:

\[1\] Deng, H., Runger, G., Tuv, E., & Vladimir, M. (2013). A time series forest for classification and feature extraction. Information Sciences, 239, 142-153.

\[2\] Middlehurst, M., Large, J., & Bagnall, A. (2020). The Canonical Interval Forest (CIF) Classifier for Time Series Classification. arXiv preprint arXiv:2008.09172.

\[3\] Lubba, C. H., Sethi, S. S., Knaute, P., Schultz, S. R., Fulcher, B. D., & Jones, N. S. (2019). catch22: CAnonical Time-series CHaracteristics. Data Mining and Knowledge Discovery, 33(6), 1821-1852.

## 1. Imports

In [1]:
from sktime.classification.interval_based import CanonicalIntervalForest
from sktime.classification.interval_based import TimeSeriesForest

from sklearn import metrics
from sktime.datasets import load_italy_power_demand



## 2. Load data

In [2]:
# Load the predefined train and test splits as (X, y) pairs
X_train, y_train = load_italy_power_demand(split='train', return_X_y=True)
X_test, y_test = load_italy_power_demand(split='test', return_X_y=True)
# Use only the first 50 test cases
X_test = X_test[:50]
y_test = y_test[:50]

print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

(67, 1) (67,) (50, 1) (50,)


## 3. Time Series Forest (TSF)

TSF [1] is an ensemble of tree classifiers built on summary statistics of randomly selected intervals.
For each tree, sqrt(series_length) intervals are selected at random.
From each of these intervals, the mean, standard deviation and slope of every time series are extracted and concatenated into a feature vector.
These features are then used to build a tree, which is added to the ensemble.
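The per-tree feature extraction described above can be sketched in NumPy as follows. This is a simplified illustration, not sktime's `TimeSeriesForest` implementation. The helper names are hypothetical, and the minimum interval length of 3 is an assumption made to keep the slope well defined. The slope is the ordinary least-squares slope of each subseries against time.

```python
import numpy as np


def slope(sub):
    """Least-squares slope of each row of sub against time 0..k-1 (hypothetical helper)."""
    t = np.arange(sub.shape[1])
    t_centered = t - t.mean()
    y_centered = sub - sub.mean(axis=1, keepdims=True)
    # Closed-form OLS slope: sum((t - t_mean) * (y - y_mean)) / sum((t - t_mean)^2)
    return (y_centered @ t_centered) / (t_centered @ t_centered)


def random_intervals(series_length, rng, min_length=3):
    """Draw floor(sqrt(series_length)) random [start, end) intervals.

    min_length=3 is an assumption for this sketch.
    """
    n_intervals = int(np.sqrt(series_length))
    intervals = []
    for _ in range(n_intervals):
        start = rng.integers(0, series_length - min_length + 1)
        end = rng.integers(start + min_length, series_length + 1)
        intervals.append((start, end))
    return intervals


def tsf_features(X, intervals):
    """Concatenate mean, std and slope of every interval into one feature vector per case."""
    feats = []
    for start, end in intervals:
        sub = X[:, start:end]
        feats.extend([sub.mean(axis=1), sub.std(axis=1), slope(sub)])
    return np.column_stack(feats)


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 24))  # 5 toy series of length 24 (ItalyPowerDemand series have length 24)
intervals = random_intervals(X.shape[1], rng)
features = tsf_features(X, intervals)
print(len(intervals), features.shape)  # 4 (5, 12)
```

For a series of length 24, int(sqrt(24)) = 4 intervals give 4 × 3 = 12 features per case. One tree (for example scikit-learn's `DecisionTreeClassifier`) would be fitted on `features`. The tree's intervals must be kept so the same features can be computed for new cases at prediction time. Repeating this with fresh intervals for each tree yields the ensemble.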

In [3]:
tsf = TimeSeriesForest(n_estimators=200)
tsf.fit(X_train, y_train)

tsf_preds = tsf.predict(X_test)
print("TSF Accuracy: " + str(metrics.accuracy_score(y_test, tsf_preds)))

TSF Accuracy: 1.0


## 4. Canonical Interval Forest (CIF)

CIF [2] extends the TSF algorithm. In addition to the three summary statistics used by TSF, CIF uses the 22 features of the `Catch22` transform [3].
To increase the diversity of the ensemble, each tree uses only a random subset of the combined TSF and catch22 attributes; the size of this subset is set by `att_subsample_size`.
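The per-tree attribute subsampling from the CIF description above can be sketched as follows. This is an illustration, not sktime's `CanonicalIntervalForest` code. It assumes a pool of 25 attributes (the 22 catch22 features plus mean, standard deviation and slope) and draws 8 of them without replacement, the same value the notebook passes as `att_subsample_size`. The `catch22_i` names are hypothetical placeholders, not the real catch22 feature names.

```python
import numpy as np

# Assumption: the attribute pool is the 22 catch22 features plus the
# 3 TSF statistics. The catch22_i names are placeholders for illustration.
N_CATCH22 = 22
TSF_STATS = ["mean", "std", "slope"]
ATTRIBUTE_POOL = [f"catch22_{i}" for i in range(N_CATCH22)] + TSF_STATS


def subsample_attributes(rng, att_subsample_size=8):
    """Pick the attributes one tree will use, sampled without replacement (hypothetical helper)."""
    idx = rng.choice(len(ATTRIBUTE_POOL), size=att_subsample_size, replace=False)
    return [ATTRIBUTE_POOL[i] for i in idx]


rng = np.random.default_rng(42)
# Each tree draws its own subset, so different trees see different features
for tree in range(3):
    print(tree, subsample_attributes(rng))
```

Each tree then computes only its selected attributes on its random intervals. Two trees therefore rarely share the same feature space, which is the source of the extra diversity described above.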

In [4]:
cif = CanonicalIntervalForest(n_estimators=500, att_subsample_size=8)
cif.fit(X_train, y_train)

cif_preds = cif.predict(X_test)
print("CIF Accuracy: " + str(metrics.accuracy_score(y_test, cif_preds)))
