# Early time series classification with sktime

Early time series classification (eTSC) is the problem of classifying a time series after as few measurements as possible while keeping accuracy as high as possible. The central question for any eTSC method is when enough of a series has been seen to make a decision. Waiting for more data points usually makes the classification easier but delays the decision; classifying earlier means working with less input, which often lowers accuracy.
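
The trade-off described above is commonly quantified with an earliness score reported alongside accuracy. The following is a minimal sketch, assuming earliness is defined as the average fraction of each series observed before a decision is made. The function name `earliness` and the decision lengths are hypothetical and only serve as an illustration.

```python
def earliness(decision_lengths, series_length):
    """Mean fraction of a series observed before a decision (assumed definition)."""
    if not decision_lengths:
        raise ValueError("decision_lengths must not be empty")
    return sum(d / series_length for d in decision_lengths) / len(decision_lengths)


# Hypothetical example: four series of length 100,
# classified after 20, 40, 100 and 40 observations respectively.
score = earliness([20, 40, 100, 40], series_length=100)
print(f"Earliness: {score:.2f}")  # prints "Earliness: 0.50"
```

Lower values mean earlier decisions; a value of 1.0 means every series was read to the end before it was classified.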

This notebook is a quick guide to get you started.

# Data sets and problem types
The UCR/UEA [time series classification archive](https://timeseriesclassification.com/) contains a large number of example TSC problems that have been used thousands of times in the literature to assess TSC algorithms. These datasets have certain characteristics that influence which data structure we use to store them in memory.
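
As an illustration of one such data structure, a collection of univariate, equal-length series can be held in a 2D NumPy array with one row per series. The sketch below uses synthetic data, and all names and sizes in it are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical collection: 5 univariate series, each with 12 time points.
n_instances, series_length = 5, 12
X = rng.normal(size=(n_instances, series_length))  # shape (n_instances, series_length)
y = np.array(["a", "b", "a", "c", "b"])            # one class label per series

first_series = X[0]      # a single series, shape (series_length,)
prefixes = X[:, :6]      # the first 6 time points of every series

print(X.shape, first_series.shape, prefixes.shape)
```

Slicing along the second axis, as in `prefixes`, yields the truncated series that an early classifier has to work with.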

In [86]:
# Imports used in this notebook
import matplotlib

matplotlib.rcParams["pdf.fonttype"] = 42
matplotlib.rcParams["ps.fonttype"] = 42
%matplotlib inline
%config InlineBackend.figure_formats = {'png', 'retina'}

import warnings

from sktime.classification.early_classification._teaser import TEASER
from sktime.classification.interval_based import TimeSeriesForestClassifier
from sktime.datasets import load_arrow_head

warnings.simplefilter("ignore")

In [93]:
# Load the complete ArrowHead dataset (train and test combined)
arrow_X, arrow_y = load_arrow_head(return_type="numpy2d")

# Load default train/test splits from sktime/datasets/data
arrow_train_X, arrow_train_y = load_arrow_head(split="train", return_type="numpy2d")
arrow_test_X, arrow_test_y = load_arrow_head(split="test", return_type="numpy2d")

# Building the TEASER Classifier

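TEASER is a two-tier model that uses a slave and a master classifier. The first tier (the slave) is a time series classifier, such as WEASEL, that outputs class probabilities. The second tier (the master) is an anomaly detector, such as a one-class SVM, that decides whether those probabilities are reliable enough to commit to a prediction. In the cell below, a `TimeSeriesForestClassifier` serves as the first tier.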
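
To make the two-tier idea concrete, here is a simplified sketch built with scikit-learn rather than sktime. It assumes synthetic data, a single fixed prefix length, a `LogisticRegression` standing in for the time series classifier, and a `OneClassSVM` trained only on the class probabilities of correctly classified training series. It illustrates the general idea and is not TEASER's actual implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Synthetic two-class data (hypothetical): 40 series of length 20.
n_instances, series_length, prefix_length = 40, 20, 8
y = np.repeat([0, 1], n_instances // 2)
X = rng.normal(size=(n_instances, series_length)) + y[:, None] * 1.5

# Tier 1: a probabilistic classifier trained on a prefix of each series.
X_prefix = X[:, :prefix_length]
tier1 = LogisticRegression().fit(X_prefix, y)
proba = tier1.predict_proba(X_prefix)

# Tier 2: a one-class SVM that learns what the probability vectors of
# correctly classified training series look like.
correct = tier1.predict(X_prefix) == y
tier2 = OneClassSVM(nu=0.1, gamma="scale").fit(proba[correct])

# At prediction time, tier 2 returns +1 for probability vectors it
# considers typical of correct predictions and -1 otherwise.
new_proba = tier1.predict_proba(X_prefix[:5])
accept = tier2.predict(new_proba) == 1
print(accept)
```

In this sketch a label would only be emitted for series where `accept` is true; for the others, the model would wait for a longer prefix and try again.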

In [None]:
teaser = TEASER(
    random_state=0,
    classification_points=[6, 10, 16, 24],
    estimator=TimeSeriesForestClassifier(n_estimators=10, random_state=0),
    return_safety_decisions=False,
)
teaser.fit(arrow_train_X, arrow_train_y)

# Determine the accuracy and earliness on the train data

In [94]:
print("Earliness on Train Data %2.2f" % teaser._train_earliness)
print("Accuracy on Train Data %2.2f" % teaser._train_accuracy)

Earliness on Train Data 0.74
Accuracy on Train Data 0.78


# Determine the accuracy and earliness on the test data

In [95]:
hm, acc, earl = teaser.score(arrow_test_X, arrow_test_y)

In [96]:
print("Earliness on Test Data %2.2f" % earl)
print("Accuracy on Test Data %2.2f" % acc)
print("Harmonic Mean on Test Data %2.2f" % hm)

Earliness on Test Data 0.40
Accuracy on Test Data 0.51
Harmonic Mean on Test Data 0.55


That is, by observing on average only 40% of each test series, TEASER reaches an accuracy of 51%.
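
The harmonic mean reported above can be reproduced by hand. This sketch assumes it combines accuracy with 1 - earliness (so that higher is better for both terms), as in the original TEASER paper; the rounded figures printed above are consistent with that definition. The helper name `harmonic_mean` is hypothetical and not part of sktime.

```python
def harmonic_mean(accuracy: float, earliness: float) -> float:
    """Harmonic mean of accuracy and (1 - earliness); assumed definition."""
    timeliness = 1.0 - earliness
    if accuracy + timeliness == 0:
        return 0.0
    return 2 * accuracy * timeliness / (accuracy + timeliness)


# Rounded test-set values from the output above.
hm = harmonic_mean(accuracy=0.51, earliness=0.40)
print(f"Harmonic Mean {hm:.2f}")  # prints "Harmonic Mean 0.55"
```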

# Comparison to Classification on full Test Data

In [97]:
accuracy = (
    TimeSeriesForestClassifier(n_estimators=10, random_state=0)
    .fit(arrow_train_X, arrow_train_y)
    .score(arrow_test_X, arrow_test_y)
)
print("Accuracy on the full Test Data %2.2f" % accuracy)

Accuracy on the full Test Data 0.68


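Given the full-length test series, the same classifier reaches 68% accuracy. Early classification here therefore trades 17 percentage points of accuracy for observing, on average, 60% less of each series.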