# Time series classification with Mr-SEQL

Mr-SEQL\[1\] is a univariate time series classifier that trains linear classification models (logistic regression) on features extracted from multiple symbolic representations of time series (SAX, SFA). The features are extracted using SEQL\[2\].

\[1\] T. L. Nguyen, S. Gsponer, I. Ilie, M. O'Reilly and G. Ifrim, “Interpretable Time Series Classification using Linear Models and Multi-resolution Multi-domain Symbolic Representations”, Data Mining and Knowledge Discovery (DAMI), May 2019, https://link.springer.com/article/10.1007/s10618-019-00633-3

\[2\] G. Ifrim and C. Wiuf, “Bounded Coordinate-Descent for Biological Sequence Classification in High Dimensional Predictor Space”, KDD 2011

In this notebook, we demonstrate how to use Mr-SEQL for time series classification on the univariate ArrowHead dataset, the multivariate BasicMotions dataset, and several datasets from the UEA/UCR time series repository. Further information on the UEA/UCR repository can be found [here](http://www.timeseriesclassification.com/).
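To make the idea of a symbolic representation more concrete, the following is a minimal sketch of a SAX-style transformation: z-normalise the series, average it over equal-width segments (PAA), and map each segment mean to a letter using breakpoints that cut the standard normal distribution into equiprobable regions. The function name `sax_word` and its parameters are illustrative assumptions, not part of sktime. Mr-SEQL builds on richer variants of this idea (the paper title refers to multiple resolutions and domains), so this is only a simplified illustration of the basic step and not the library's implementation.

```python
import bisect
from statistics import NormalDist, mean, pstdev


def sax_word(series, n_segments=4, alphabet="abcd"):
    """Turn a numeric series into a short word (simplified SAX sketch).

    Hypothetical helper for illustration only; assumes
    len(series) >= n_segments.
    """
    # 1. z-normalise so that the Gaussian breakpoints are meaningful
    mu, sigma = mean(series), pstdev(series)
    z = [0.0 if sigma == 0 else (x - mu) / sigma for x in series]

    # 2. Piecewise Aggregate Approximation: mean of each segment
    n = len(z)
    bounds = [i * n // n_segments for i in range(n_segments + 1)]
    paa = [mean(z[bounds[i]:bounds[i + 1]]) for i in range(n_segments)]

    # 3. Breakpoints splitting N(0, 1) into len(alphabet) equiprobable bins
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]

    # 4. Map each segment mean to the letter of the bin it falls into
    return "".join(alphabet[bisect.bisect_left(breakpoints, v)] for v in paa)


# A steadily increasing series maps to increasing letters
print(sax_word(list(range(8))))  # prints "abcd"
```

Once series are turned into words like this, sequence-learning methods such as SEQL can search for discriminative subsequences in them.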

## Imports

In [None]:
from sklearn import metrics
from sklearn.model_selection import train_test_split

from sktime.classification.shapelet_based import MrSEQLClassifier
from sktime.datasets import load_arrow_head, load_basic_motions, load_UCR_UEA_dataset

## Load data: Univariate time series
For more details on the data set, see the [univariate time series classification notebook](https://github.com/alan-turing-institute/sktime/blob/main/examples/02_classification_univariate.ipynb).

In [19]:
X, y = load_arrow_head(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

(158, 1) (158,) (53, 1) (53,)
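The shapes printed above follow from the default behaviour of `train_test_split`, which holds out 25% of the samples when neither `test_size` nor `train_size` is given. As a small sketch of that arithmetic (the helper name `default_split_sizes` is hypothetical, and the rounding rule is stated here as an assumption about scikit-learn's behaviour: the test count is rounded up and the remainder goes to training):

```python
import math


def default_split_sizes(n_samples, test_size=0.25):
    """Return (n_train, n_test) for a fractional test size.

    Hypothetical helper; assumes the test count is rounded up and the
    remaining samples are used for training.
    """
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


# 158 + 53 = 211 samples in total, as implied by the shapes printed above
print(default_split_sizes(211))  # prints (158, 53)
```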


## Train and Test

Mr-SEQL can be configured to run in different modes and with different symbolic representations.

seql_mode can be either 'clf' (SEQL as the classifier) or 'fs' (SEQL for feature selection). If 'fs' mode is chosen, a logistic regression classifier is trained on the features extracted by SEQL.
'fs' mode is generally more accurate.

symrep can include 'sax', 'sfa', or both. Using both usually produces a better result.

In [20]:
# Create mrseql object
# use sax by default
ms = MrSEQLClassifier(seql_mode="clf")
# use sfa representations
# ms = MrSEQLClassifier(seql_mode='fs', symrep=['sfa'])
# use sax and sfa representations
# ms = MrSEQLClassifier(seql_mode='fs', symrep=['sax', 'sfa'])

# fit training data
ms.fit(X_train, y_train)

# prediction
predicted = ms.predict(X_test)

# Classification accuracy
print("Accuracy with mr-seql: %2.3f" % metrics.accuracy_score(y_test, predicted))

Accuracy with mr-seql: 0.887
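For reference, the accuracy reported here is the fraction of test samples whose predicted label matches the true label. A minimal plain-Python sketch of that computation is given below; the function name `accuracy` and the toy labels are illustrative assumptions, and in the notebook itself `metrics.accuracy_score` does this work.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Toy labels (made up for illustration): 3 of 4 predictions are correct
y_true = ["0", "1", "2", "1"]
y_pred = ["0", "1", "1", "1"]
print("Accuracy: %2.3f" % accuracy(y_true, y_pred))  # prints "Accuracy: 0.750"
```

With a test set of 53 samples, each additional misclassification lowers the score by about 0.019, so small differences between configurations should be read with care.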


## Load data: Multivariate time series
Mr-SEQL also supports multivariate time series. It extracts features from each dimension of the data independently.
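The idea of treating each dimension independently can be sketched as follows: a feature extractor is applied to every dimension separately, and the resulting feature lists are concatenated into one vector for the linear model. Both `extract_per_dimension` and `toy_extractor` are hypothetical names used only for illustration; the toy extractor (minimum and maximum) merely stands in for the SEQL-based features, and combining by concatenation is an assumption of this sketch.

```python
def toy_extractor(series):
    """Stand-in feature extractor (hypothetical): min and max of one series."""
    return [min(series), max(series)]


def extract_per_dimension(sample, extractor):
    """Apply `extractor` to each dimension independently and concatenate.

    `sample` is a list of dimensions, each a list of numeric values.
    """
    features = []
    for dim_series in sample:
        # No information is shared between dimensions at this stage
        features.extend(extractor(dim_series))
    return features


# One sample with two dimensions of length three
sample = [[1, 2, 3], [4, 0, 6]]
print(extract_per_dimension(sample, toy_extractor))  # prints [1, 3, 0, 6]
```

Under this scheme the length of the feature vector grows with the number of dimensions.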

In [21]:
X, y = load_basic_motions(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

(60, 6) (60,) (20, 6) (20,)


## Train and Test

In [12]:
ms = MrSEQLClassifier()

# fit training data
ms.fit(X_train, y_train)

predicted = ms.predict(X_test)

# Classification accuracy
print("Accuracy with mr-seql: %2.3f" % metrics.accuracy_score(y_test, predicted))

Accuracy with mr-seql: 0.934


## Load data: ItalyPowerDemand
The dataset consists of twelve monthly electrical power demand time series from Italy. The classification task is to distinguish days from October to March (inclusive) from days from April to September. [Details here](http://www.timeseriesclassification.com/description.php?Dataset=ItalyPowerDemand)

In [10]:
X, y = load_UCR_UEA_dataset("ItalyPowerDemand", return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

(822, 1) (822,) (274, 1) (274,)


## Train and Test

In [26]:
# Create mrseql object
# use sax by default
ms = MrSEQLClassifier(seql_mode="clf")

# fit training data
ms.fit(X_train, y_train)

# prediction
predicted = ms.predict(X_test)

# Classification accuracy
print("Accuracy with mr-seql: %2.3f" % metrics.accuracy_score(y_test, predicted))

Accuracy with mr-seql: 0.942


## Load data: GunPoint
This dataset involves one female actor and one male actor making a motion with their hand. The two classes are Gun-Draw and Point. [Details here](http://www.timeseriesclassification.com/description.php?Dataset=GunPoint)

In [28]:
X, y = load_UCR_UEA_dataset("GunPoint", return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

(150, 1) (150,) (50, 1) (50,)


## Train and Test

In [29]:
# Create mrseql object
# use sax by default
ms = MrSEQLClassifier(seql_mode="clf")

# fit training data
ms.fit(X_train, y_train)

# prediction
predicted = ms.predict(X_test)

# Classification accuracy
print("Accuracy with mr-seql: %2.3f" % metrics.accuracy_score(y_test, predicted))

Accuracy with mr-seql: 1.000


## Load data: Earthquakes
The earthquake classification problem involves predicting whether a major event is about to occur based on the most recent readings in the surrounding area. [Details here](http://www.timeseriesclassification.com/description.php?Dataset=Earthquakes)

In [4]:
X, y = load_UCR_UEA_dataset("Earthquakes", return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

(345, 1) (345,) (116, 1) (116,)


## Train and Test

In [5]:
# Create mrseql object
# use sax by default
ms = MrSEQLClassifier(seql_mode="clf")

# fit training data
ms.fit(X_train, y_train)

# prediction
predicted = ms.predict(X_test)

# Classification accuracy
print("Accuracy with mr-seql: %2.3f" % metrics.accuracy_score(y_test, predicted))

Accuracy with mr-seql: 0.767


## Load data: JapaneseVowels
The dataset includes 9 male Japanese speakers recorded saying the vowels 'a' and 'e'. The classification task is to predict the speaker. [Details here](http://www.timeseriesclassification.com/description.php?Dataset=JapaneseVowels)

In [6]:
X, y = load_UCR_UEA_dataset("JapaneseVowels", return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

(405, 12) (405,) (135, 12) (135,)


## Train and Test

In [7]:
# Create mrseql object
# use sax by default
ms = MrSEQLClassifier(seql_mode="clf")
# use sfa representations
# ms = MrSEQLClassifier(seql_mode='fs', symrep=['sfa'])
# use sax and sfa representations
# ms = MrSEQLClassifier(seql_mode='fs', symrep=['sax', 'sfa'])

# fit training data
ms.fit(X_train, y_train)

# prediction
predicted = ms.predict(X_test)

# Classification accuracy
print("Accuracy with mr-seql: %2.3f" % metrics.accuracy_score(y_test, predicted))

Accuracy with mr-seql: 0.956


## Load data: Yoga
The dataset involves two actors transitioning between yoga poses in front of a green screen. The learning task is to distinguish one actor (male) from the other (female). [Details here](http://www.timeseriesclassification.com/description.php?Dataset=Yoga)

In [8]:
X, y = load_UCR_UEA_dataset("Yoga", return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

(2475, 1) (2475,) (825, 1) (825,)


## Train and Test

In [13]:
# Create mrseql object
# use sax by default
ms = MrSEQLClassifier(seql_mode="clf")

# fit training data
ms.fit(X_train, y_train)

# prediction
predicted = ms.predict(X_test)

# Classification accuracy
print("Accuracy with mr-seql: %2.3f" % metrics.accuracy_score(y_test, predicted))

Accuracy with mr-seql: 0.942
