# Univariate time series classification with sktime

In this notebook, we will use `sktime` for univariate time series classification. You can find an example notebook for multivariate time series classification [here](https://github.com/alan-turing-institute/sktime/blob/main/examples/multivariate_time_series_classification.ipynb).

## Preliminaries

Import classes and functions from `sktime`, which extend the `sklearn` interface to the time series classification setting.

In [1]:
from sktime.highlevel.tasks import TSCTask
from sktime.highlevel.strategies import TSCStrategy

from sktime.transformations.compose import RowwiseTransformer
from sktime.transformations.compose import ColumnTransformer
from sktime.transformations.compose import Tabulariser
from sktime.transformations.segment import RandomIntervalSegmenter

from sktime.pipeline import Pipeline
from sktime.pipeline import FeatureUnion
from sktime.classifiers.distance_based import ProximityForest 

from sktime.classifiers.compose import TimeSeriesForestClassifier
from sktime.classifiers.distance_based import KNeighborsTimeSeriesClassifier

from sktime.datasets import load_gunpoint
from sktime.utils.time_series import time_series_slope

from statsmodels.tsa.stattools import acf
from statsmodels.tsa.ar_model import AR

from sklearn.preprocessing import FunctionTransformer
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

import numpy as np
import pandas as pd

## Load data
You can find more information on the dataset in the docstring of the loading function, `load_gunpoint`.

In [2]:
X_train, y_train = load_gunpoint(split='TRAIN', return_X_y=True)
X_test, y_test = load_gunpoint(split='TEST', return_X_y=True)

Throughout `sktime`, data is expected as a pandas DataFrame whose columns can contain not only primitives, such as the classification labels, but also pandas Series or numpy arrays, such as the time series observations.
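As a minimal sketch of this nested format, the toy DataFrame below stores one whole pandas Series per cell of a column named `dim_0`, mirroring the output shown next. The values and labels are made up for illustration and are not taken from the GunPoint data.

```python
import numpy as np
import pandas as pd

# Made-up data: three univariate series of length 5 and their class labels.
rng = np.random.default_rng(0)
cells = np.empty(3, dtype=object)
for i in range(3):
    # Assigning by integer index stores the Series itself as one object.
    cells[i] = pd.Series(rng.normal(size=5))

# Each cell of the 'dim_0' column holds a whole pandas Series (one per instance).
X = pd.DataFrame({"dim_0": cells})
y = np.array(["1", "2", "1"])  # primitive labels, one per row of X

print(X.shape)                               # (3, 1): rows are instances, columns are dimensions
print(isinstance(X.iloc[0, 0], pd.Series))   # True
```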

In [3]:
# univariate time series input data
X_train.head()

|   | dim_0 |
|---|---|
| 0 | 0 -0.64789 1 -0.64199 2 -0.63819 3... |
| 1 | 0 -0.64443 1 -0.64540 2 -0.64706 3... |
| 2 | 0 -0.77835 1 -0.77828 2 -0.77715 3... |
| 3 | 0 -0.75006 1 -0.74810 2 -0.74616 3... |
| 4 | 0 -0.59954 1 -0.59742 2 -0.59927 3... |


In [4]:
# binary target variable
np.unique(y_train)

array(['1', '2'], dtype='<U1')

## Low level interface

### K-nearest-neighbours classifier for time series
For time series, the most popular k-nearest-neighbours algorithm is based on the dynamic time warping (DTW) distance measure.

In [5]:
knn = KNeighborsTimeSeriesClassifier(metric='dtw')
knn.fit(X_train, y_train)
knn.score(X_test, y_test)

0.9066666666666666
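To make the `metric='dtw'` setting above more concrete, here is a minimal sketch of one common DTW formulation (squared pointwise cost, no warping window) together with a 1-nearest-neighbour prediction. The function names are hypothetical, and `sktime`'s own DTW implementation may differ in details such as the pointwise cost or window constraints.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic-programming DTW with squared pointwise cost and no window."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # cost[i, j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j],      # a[i-1] also matches b[j-2]
                                 cost[i, j - 1],      # b[j-1] also matches a[i-2]
                                 cost[i - 1, j - 1])  # one-to-one match
    return cost[n, m]

def predict_1nn(X_train, y_train, x):
    """Label of the training series with the smallest DTW distance to x."""
    distances = [dtw_distance(x, xt) for xt in X_train]
    return y_train[int(np.argmin(distances))]

# Warping lets a series match a time-stretched copy of itself at zero cost,
# even though the two series have different lengths.
print(dtw_distance([0, 1, 2], [0, 0, 1, 2]))  # 0.0
```

Because DTW can align points that are shifted in time, it tolerates phase differences that a pointwise Euclidean distance would penalise.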

### Fully modular time-series forest classifier (TSF)

We can specify the time-series tree classifier as a fully modular pipeline using series-to-primitive feature extraction transformers and a final decision tree classifier.

In [6]:
steps = [
    ('segment', RandomIntervalSegmenter(n_intervals='sqrt')),
    ('_transform', FeatureUnion([
        ('mean', RowwiseTransformer(FunctionTransformer(func=np.mean, validate=False))),
        ('std', RowwiseTransformer(FunctionTransformer(func=np.std, validate=False))),
        ('slope', RowwiseTransformer(FunctionTransformer(func=time_series_slope, validate=False)))
    ])),
    ('clf', DecisionTreeClassifier())
]
base_estimator = Pipeline(steps, random_state=1)
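Conceptually, each tree built by the pipeline above sees a handful of random intervals summarised by their mean, standard deviation and slope. The numpy sketch below illustrates that feature extraction under stated assumptions: the number of intervals is taken as the square root of the series length (our reading of `n_intervals='sqrt'`), the interval sampling scheme and the helper names are hypothetical, and the slope is a least-squares fit that may not match `time_series_slope` exactly.

```python
import numpy as np

def random_intervals(series_length, rng, min_length=2):
    """Draw about sqrt(series_length) random [start, end) intervals (assumed scheme)."""
    n_intervals = int(np.sqrt(series_length))
    intervals = []
    for _ in range(n_intervals):
        start = rng.integers(0, series_length - min_length + 1)
        end = rng.integers(start + min_length, series_length + 1)
        intervals.append((start, end))
    return intervals

def slope(y):
    """Least-squares slope of y against the time index 0, 1, ..., len(y) - 1."""
    t = np.arange(len(y))
    return np.polyfit(t, y, deg=1)[0]

def interval_features(x, intervals):
    """Mean, standard deviation and slope of x on every interval, concatenated."""
    x = np.asarray(x, dtype=float)
    feats = []
    for start, end in intervals:
        seg = x[start:end]
        feats.extend([seg.mean(), seg.std(), slope(seg)])
    return np.array(feats)

rng = np.random.default_rng(1)
x = np.arange(16, dtype=float)           # a straight line with slope 1
intervals = random_intervals(len(x), rng)
features = interval_features(x, intervals)
print(len(intervals), features.shape)    # 4 intervals -> 12 features
```

The resulting fixed-length feature vector is what the final `DecisionTreeClassifier` is trained on.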

We can directly fit and evaluate the single tree, which itself is simply a pipeline.

In [7]:
base_estimator.fit(X_train, y_train)
base_estimator.score(X_test, y_test)

0.8266666666666667

To build a time series forest, we can use the single-tree pipeline as the base estimator of the forest ensemble.

In [8]:
tsf = TimeSeriesForestClassifier(base_estimator=base_estimator, 
                                 n_estimators=100,
                                 criterion='entropy',
                                 bootstrap=True, 
                                 oob_score=True, 
                                 random_state=1)

Fit and obtain the out-of-bag score:

In [9]:
tsf.fit(X_train, y_train)
if tsf.oob_score:
    print(tsf.oob_score_)

1.0
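The out-of-bag (OOB) score is estimated on training instances that a tree did not see: each tree is trained on a bootstrap sample (drawn with replacement), and every instance is scored only by the trees for which it was left out. The sketch below illustrates this with hypothetical helper names and made-up per-tree predictions. It uses a hard majority vote as a simplification; scikit-learn-style forests typically average predicted probabilities instead, so treat it as an illustration rather than the exact computation behind `oob_score_`.

```python
import numpy as np

def bootstrap_oob_mask(n_samples, rng):
    """Boolean mask of the samples left out of one bootstrap draw."""
    in_bag = rng.integers(0, n_samples, size=n_samples)  # sample with replacement
    oob = np.ones(n_samples, dtype=bool)
    oob[in_bag] = False
    return oob

def oob_accuracy(predictions, oob_masks, y):
    """Hard-majority-vote OOB accuracy.

    predictions: (n_trees, n_samples) labels predicted by each tree
    oob_masks:   (n_trees, n_samples) True where the sample was out-of-bag
    """
    correct, counted = 0, 0
    for i in range(len(y)):
        votes = predictions[oob_masks[:, i], i]
        if votes.size == 0:
            continue  # in-bag for every tree: no OOB prediction available
        labels, counts = np.unique(votes, return_counts=True)
        correct += int(labels[np.argmax(counts)] == y[i])
        counted += 1
    return correct / counted if counted else float("nan")

# Roughly 1/e (about 37%) of the samples are out-of-bag for any single tree.
mask = bootstrap_oob_mask(1000, np.random.default_rng(0))

# Made-up predictions of 3 trees for 3 samples.
y = np.array(["1", "2", "2"])
predictions = np.array([["1", "2", "2"],
                        ["1", "1", "1"],
                        ["2", "2", "1"]])
oob_masks = np.array([[True, True, False],
                      [True, False, True],
                      [False, True, True]])
acc = oob_accuracy(predictions, oob_masks, y)
print(acc)  # 2 of 3 samples get the right majority label
```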


In [10]:
tsf.score(X_test, y_test)

0.9533333333333334

### RISE

Another popular variant of time series forest is the so-called Random Interval Spectral Ensemble (RISE), which makes use of several series-to-series feature extraction transformers, including:

* Fitted auto-regressive coefficients,  
* Estimated autocorrelation coefficients,
* Power spectrum coefficients.
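The autocorrelation coefficients in the list above can be sketched in plain numpy. This assumes the usual biased estimator (series demeaned, every lag divided by the same lag-0 sum of squares), which we believe matches the default behaviour of `statsmodels.tsa.stattools.acf` used in the next cell; the helper name `acf_numpy` is hypothetical.

```python
import numpy as np

def acf_numpy(x, nlags):
    """Sample autocorrelation for lags 0..nlags (biased estimator, demeaned series)."""
    x = np.asarray(x, dtype=float).ravel()
    n = len(x)
    d = x - x.mean()
    denom = np.dot(d, d)  # proportional to the lag-0 autocovariance
    return np.array([np.dot(d[: n - k], d[k:]) / denom for k in range(nlags + 1)])

coefs = acf_numpy([1, 2, 3, 4, 5], nlags=2)
print(coefs)  # values 1.0, 0.4 and -0.1
```

For `[1, 2, 3, 4, 5]` the deviations from the mean are `[-2, -1, 0, 1, 2]`, whose sum of squares is 10; the lag-1 products sum to 4 and the lag-2 products to -1, giving 0.4 and -0.1.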

In [11]:
def ar_coefs(x, maxlag=100):
    x = np.asarray(x).ravel()
    nlags = np.minimum(len(x) - 1, maxlag)
    model = AR(endog=x) 
    return model.fit(maxlag=nlags, trend="nc").params.ravel()

def acf_coefs(x, maxlag=100):
    x = np.asarray(x).ravel() 
    nlags = np.minimum(len(x) - 1, maxlag)
    return acf(x, nlags=nlags).ravel()

def powerspectrum(x, **kwargs):
    x = np.asarray(x).ravel()
    fft = np.fft.fft(x)
    ps = fft.real * fft.real + fft.imag * fft.imag
    return ps[:ps.shape[0] // 2].ravel()

The full pipeline of a single tree in RISE is then specified as follows:

In [12]:
steps = [
    ('segment', RandomIntervalSegmenter(n_intervals=1, min_length=5)),
    ('_transform', FeatureUnion([
        ('ar', RowwiseTransformer(FunctionTransformer(func=ar_coefs, validate=False))),
        ('acf', RowwiseTransformer(FunctionTransformer(func=acf_coefs, validate=False))),
        ('ps', RowwiseTransformer(FunctionTransformer(func=powerspectrum, validate=False)))
    ])),
    ('tabularise', Tabulariser()),
    ('clf', DecisionTreeClassifier())
]
base_estimator = Pipeline(steps)
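Unlike the mean, standard deviation and slope features of the TSF pipeline, the AR, ACF and power-spectrum transforms return one array per instance, so (as we understand it) the feature union produces cells holding arrays rather than numbers. The `Tabulariser` step flattens these into the 2D table a `DecisionTreeClassifier` expects. Because a single tree applies the same random interval to every instance, the arrays should all have the same length. The helper `tabularise_column` below is hypothetical and only mimics the flattening of one nested column; the real `Tabulariser` also handles column naming, which is not reproduced here.

```python
import numpy as np

def tabularise_column(cells):
    """Stack equal-length 1D arrays (one per instance) into a 2D array."""
    arrays = [np.asarray(c, dtype=float).ravel() for c in cells]
    if len({len(a) for a in arrays}) != 1:
        raise ValueError("all cells must have the same length to be tabularised")
    return np.vstack(arrays)  # shape: (n_instances, n_values_per_cell)

# Three instances, each with a made-up 4-value feature array.
cells = [np.array([0.1, 0.2, 0.3, 0.4]),
         np.array([1.0, 0.5, 0.0, -0.5]),
         np.array([2.0, 2.0, 2.0, 2.0])]
table = tabularise_column(cells)
print(table.shape)  # (3, 4)
```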

In [13]:
rise = TimeSeriesForestClassifier(base_estimator=base_estimator,
                                  n_estimators=50, 
                                  bootstrap=True,
                                  oob_score=True)

In [14]:
rise.fit(X_train, y_train)
if rise.oob_score:
    print(rise.oob_score_)

0.92


In [15]:
rise.score(X_test, y_test)

0.9733333333333334

### Proximity Forest (PF)
A variant of the Elastic Ensemble (EE) is Proximity Forest (PF), which aims to be more scalable and faster than EE by using trees.

A proximity forest consists of 3 main components:

* A **proximity stump** (PS) is simply a 1-nearest-neighbour classifier that uses *n* exemplar instances picked from the training set (*n* is usually the number of classes in the problem, with one exemplar picked per class). A PS has a distance measure, with accompanying parameters, that it uses to measure the proximity of each instance to the exemplars. The class label of the closest exemplar is used as the prediction.

* A **proximity tree** (PT) is a classic decision tree, except that it uses a PS at each node to split the training instances among the exemplars. A sub-PT is constructed for each exemplar and trained on the instances closest to it. Splitting continues until a node becomes a leaf (by default, when it is pure).

* A **proximity forest** (PF) is an ensemble of PTs, using majority voting to predict class labels.
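A single proximity stump, as described above, can be sketched as follows. This is a simplification under stated assumptions: it picks one random exemplar per class and uses plain squared Euclidean distance, whereas proximity forest draws a distance measure and its parameters at random at each node from a set of elastic measures. The function names are hypothetical.

```python
import numpy as np

def fit_stump(X, y, rng):
    """Pick one random exemplar per class from the training instances."""
    classes = np.unique(y)
    exemplar_idx = [rng.choice(np.flatnonzero(y == c)) for c in classes]
    return X[exemplar_idx], classes

def branch_of(x, exemplars):
    """Index of the exemplar closest to x (squared Euclidean distance)."""
    d = ((exemplars - x) ** 2).sum(axis=1)
    return int(np.argmin(d))

def split(X, exemplars):
    """Route each training instance to the branch of its closest exemplar."""
    return np.array([branch_of(x, exemplars) for x in X])

rng = np.random.default_rng(0)
X = np.array([[0., 0., 0.], [0., 1., 0.], [5., 5., 5.], [5., 6., 5.]])
y = np.array(["1", "1", "2", "2"])
exemplars, classes = fit_stump(X, y, rng)
branches = split(X, exemplars)
print(branches)                                               # [0 0 1 1]
print(classes[branch_of(np.array([4., 5., 5.]), exemplars)])  # 2
```

In a proximity tree, each branch's instances would then be passed to a new stump, recursing until the nodes are pure.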

A proximity forest with 10 trees can be fitted and evaluated as follows:

In [16]:
pf = ProximityForest(n_trees=10)
pf.fit(X_train, y_train)
pf.score(X_test, y_test)

0.9266666666666666

## High level interface 

The high-level interface unifies different but related time series methods, while still closely following the `sklearn` estimator design whenever possible. At the high level, two new classes are introduced:

* A *task*, which encapsulates the information about the learning task, for example the name of the target variable, and any additional necessary instructions on how to run fit and predict.

* A *strategy*, which wraps a low-level estimator and takes a task and the whole DataFrame as input to `fit`.
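To illustrate the division of labour between the two classes, here is a minimal, hypothetical sketch of the pattern. `SimpleTask`, `SimpleStrategy` and `MajorityClassifier` are made-up names, not `sktime` classes; the real `TSCTask` and `TSCStrategy` may do considerably more than this.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class SimpleTask:
    """Hypothetical task: records which column of the data is the target."""
    target: str


class SimpleStrategy:
    """Hypothetical strategy: wraps any estimator with sklearn-style fit/predict."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, task, data):
        self.task_ = task
        X = data.drop(columns=[task.target])  # features: every column except the target
        y = data[task.target]
        self.estimator.fit(X, y)
        return self

    def predict(self, data):
        # At predict time, drop the target column if it is present.
        X = data.drop(columns=[self.task_.target], errors="ignore")
        return self.estimator.predict(X)


class MajorityClassifier:
    """Toy estimator that always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = y.mode().iloc[0]
        return self

    def predict(self, X):
        return [self.label_] * len(X)


train = pd.DataFrame({"feature": [0.1, 0.2, 0.3], "class_val": ["1", "1", "2"]})
strategy = SimpleStrategy(MajorityClassifier()).fit(SimpleTask(target="class_val"), train)
print(strategy.predict(train))  # ['1', '1', '1']
```

The point of the design is that the estimator never needs to know the target's column name: the task carries that information, and the strategy uses it to split the full DataFrame.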

In [17]:
train = load_gunpoint(split='TRAIN')
test = load_gunpoint(split='TEST')

In [18]:
task = TSCTask(target='class_val', metadata=train)

In [19]:
clf = TimeSeriesForestClassifier(n_estimators=50)
strategy = TSCStrategy(clf)

* Fit using task and training data 
* Predict and evaluate fitted strategy on test data

In [20]:
strategy.fit(task, train)

y_pred = strategy.predict(test)
y_test = test[task.target]
accuracy_score(y_test, y_pred)

0.94