# Feature extraction with tsfresh transformer

In this tutorial, we show how to use sktime with [tsfresh](https://tsfresh.readthedocs.io) to extract features from time series. The extracted features form an ordinary tabular data set, so any scikit-learn estimator can then be trained on them.
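The underlying idea can be sketched without tsfresh: map each series to a fixed-length vector of summary statistics, stack those vectors into a table, and fit any scikit-learn estimator on it. This is a minimal illustration under our own assumptions: `summary_features` is a hypothetical helper that computes only four hand-picked statistics, and the data are synthetic. tsfresh computes a much larger, configurable feature set.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def summary_features(series_list):
    """Hypothetical helper: map each 1-D series to [mean, std, min, max].

    Series may differ in length; every output row still has 4 columns,
    which is what lets a tabular scikit-learn estimator consume the result.
    """
    rows = []
    for s in series_list:
        s = np.asarray(s, dtype=float)
        rows.append([s.mean(), s.std(), s.min(), s.max()])
    return np.array(rows)


# Synthetic toy data: 20 series of varying length; odd-indexed series
# (class 1) have twice the spread of even-indexed ones (class 0)
rng = np.random.default_rng(0)
series = [rng.normal(scale=1 + (i % 2), size=50 + i) for i in range(20)]
labels = np.array([i % 2 for i in range(20)])

X_tab = summary_features(series)  # one row of features per series
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tab, labels)
print(X_tab.shape)  # (20, 4)
```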

## Preliminaries
You need to install tsfresh if you haven't already. To install it, uncomment the line in the cell below and run it:

In [1]:
# !pip install --upgrade tsfresh
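To check whether tsfresh is already importable and which version is installed, a minimal sketch using only the Python standard library may help. `tsfresh_version` is a hypothetical helper name, not part of tsfresh or sktime.

```python
import importlib.util
from importlib.metadata import PackageNotFoundError, version


def tsfresh_version():
    """Return the installed tsfresh version string, or None if it is missing."""
    if importlib.util.find_spec("tsfresh") is None:
        return None
    try:
        return version("tsfresh")
    except PackageNotFoundError:
        # importable (e.g. from a source checkout) but without package metadata
        return "unknown"


v = tsfresh_version()
print("tsfresh not installed" if v is None else f"tsfresh {v}")
```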

In [2]:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sktime.datasets import load_basic_motions
from sktime.datasets import load_arrow_head
from sktime.transformers.series_as_features.summarize import (
    TSFreshFeatureExtractor,
)
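The import path above matches the sktime release this notebook was written for. In later sktime releases the transformer lives in `sktime.transformations.panel.tsfresh`, so if the import fails in your environment, a fallback along these lines may help. This is a sketch assuming one of these two module paths exists in your installed version; check the sktime API reference for your release.

```python
try:
    # module path used in this notebook (older sktime releases)
    from sktime.transformers.series_as_features.summarize import (
        TSFreshFeatureExtractor,
    )
except ImportError:
    try:
        # module path in later sktime releases
        from sktime.transformations.panel.tsfresh import TSFreshFeatureExtractor
    except ImportError:
        # neither path is available (sktime itself may be missing)
        TSFreshFeatureExtractor = None

print("TSFreshFeatureExtractor available:", TSFreshFeatureExtractor is not None)
```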

## Univariate time series classification data

For more details on the data set, see the [univariate time series classification notebook](https://github.com/alan-turing-institute/sktime/blob/master/examples/02_classification_univariate.ipynb).

In [3]:
X, y = load_arrow_head(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

(158, 1) (158,) (53, 1) (53,)
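The split sizes follow from `train_test_split`'s default `test_size=0.25`: the data set has 158 + 53 = 211 instances, and scikit-learn rounds the test share up, so the test set gets ceil(0.25 × 211) = ceil(52.75) = 53 instances and the training set the remaining 158. Because no `random_state` is passed, the split (and every score computed from it) changes between runs. The sketch below uses synthetic arrays of the same size with new variable names, so it does not overwrite the notebook's data. `random_state=42` and `stratify` are our own choices, not part of the original notebook.

```python
import math

import numpy as np
from sklearn.model_selection import train_test_split

n_samples = 211
# scikit-learn rounds a fractional test share up to a whole instance
assert math.ceil(0.25 * n_samples) == 53

# Synthetic stand-ins with the same number of instances and three classes
rng = np.random.default_rng(0)
X_syn = rng.normal(size=(n_samples, 10))
y_syn = np.array(["0"] * 70 + ["1"] * 70 + ["2"] * 71)

# fixed seed for reproducibility; stratify keeps class proportions similar
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X_syn, y_syn, test_size=0.25, random_state=42, stratify=y_syn
)
print(Xs_train.shape, ys_train.shape, Xs_test.shape, ys_test.shape)
# (158, 10) (158,) (53, 10) (53,)
```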


In [4]:
X_train.head()

| | dim_0 |
|---|---|
| 17 | 0 -2.1788 1 -2.1751 2 -2.1550 3 ... |
| 97 | 0 -2.1468 1 -2.1483 2 -2.1332 3 ... |
| 135 | 0 -1.7096 1 -1.7071 2 -1.6632 3 ... |
| 132 | 0 -1.8902 1 -1.9055 2 -1.8857 3 ... |
| 36 | 0 -1.9298 1 -1.9371 2 -1.8988 3 ... |
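Each cell of `X_train` holds an entire time series as a `pandas.Series` (sktime's nested DataFrame format), which is why the printout shows index/value pairs such as `0 -2.1788 1 -2.1751 ...`. When a plain NumPy array is more convenient, equal-length series can be stacked into an array of shape (instances, dimensions, time points). The sketch below assumes all series share one length. `nested_to_3d` is a hypothetical helper for illustration; sktime ships its own conversion utilities, whose names depend on the version. The toy frame reuses the first three values shown for instances 17 and 97.

```python
import numpy as np
import pandas as pd


def nested_to_3d(df):
    """Hypothetical helper: stack a nested DataFrame whose cells are
    equal-length pd.Series into an array of shape
    (instances, dimensions, time points)."""
    return np.stack(
        [np.stack([cell.to_numpy() for cell in row]) for row in df.itertuples(index=False)]
    )


# Build a toy nested frame: 2 instances, 1 dimension, 3 time points each.
# An object array keeps each pd.Series intact as a single cell.
cells = np.empty(2, dtype=object)
cells[0] = pd.Series([-2.1788, -2.1751, -2.1550])
cells[1] = pd.Series([-2.1468, -2.1483, -2.1332])
nested = pd.DataFrame({"dim_0": cells}, index=[17, 97])

arr = nested_to_3d(nested)
print(arr.shape)  # (2, 1, 3)
```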


In [5]:
# three-class (multiclass) classification task
np.unique(y_train)

array(['0', '1', '2'], dtype=object)
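`np.unique` lists the three labels but not how often each occurs. Passing `return_counts=True` also returns the per-class counts, a quick check for class imbalance before interpreting accuracy scores. The labels below are a small made-up array, not the ArrowHead data.

```python
import numpy as np

# Illustrative labels only (string labels with object dtype, as in the output above)
y_example = np.array(["0", "1", "2", "0", "2", "0"], dtype=object)
labels, counts = np.unique(y_example, return_counts=True)
print(dict(zip(labels.tolist(), counts.tolist())))  # {'0': 3, '1': 1, '2': 2}
```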

## Using tsfresh to extract features

In [6]:
# extract the "efficient" tsfresh feature set from each training series
t = TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False)
Xt = t.fit_transform(X_train)
Xt.head()

  warn("Found non-unique index, replaced with unique index.")
Feature Extraction: 100%|██████████| 5/5 [00:10<00:00,  2.10s/it]


| | dim_0__variance_larger_than_standard_deviation | dim_0__has_duplicate_max | dim_0__has_duplicate_min | dim_0__has_duplicate | dim_0__sum_values | dim_0__abs_energy | dim_0__mean_abs_change | dim_0__mean_change | dim_0__mean_second_derivative_central | dim_0__median | ... | dim_0__fourier_entropy__bins_2 | dim_0__fourier_entropy__bins_3 | dim_0__fourier_entropy__bins_5 | dim_0__fourier_entropy__bins_10 | dim_0__fourier_entropy__bins_100 | dim_0__permutation_entropy__dimension_3__tau_1 | dim_0__permutation_entropy__dimension_4__tau_1 | dim_0__permutation_entropy__dimension_5__tau_1 | dim_0__permutation_entropy__dimension_6__tau_1 | dim_0__permutation_entropy__dimension_7__tau_1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 0.0 | 1.0 | -5e-05 | 250.00035 | 0.351749 | 0.004856 | -0.000231 | -0.05782 | ... | 0.08151 | 0.08151 | 0.138673 | 0.250609 | 1.340724 | 1.568692 | 2.482612 | 3.225589 | 3.78913 | 4.198932 |
| 1 | 0.0 | 0.0 | 0.0 | 1.0 | 0.000313 | 249.999331 | 0.325861 | 0.007624 | -0.000147 | 0.46379 | ... | 0.046288 | 0.046288 | 0.186791 | 0.289658 | 1.390145 | 1.558835 | 2.483325 | 3.384662 | 3.98668 | 4.434445 |
| 2 | 0.0 | 0.0 | 0.0 | 1.0 | 0.000442 | 249.999798 | 0.309698 | 0.005661 | -9e-05 | 0.24615 | ... | 0.046288 | 0.092513 | 0.092513 | 0.204643 | 1.075257 | 1.49018 | 2.326191 | 3.018421 | 3.560983 | 3.965161 |
| 3 | 0.0 | 0.0 | 0.0 | 1.0 | 0.000449 | 250.0001 | 0.330346 | 0.00603 | -0.000182 | 0.27091 | ... | 0.08151 | 0.092513 | 0.138673 | 0.219798 | 1.212223 | 1.531335 | 2.395672 | 3.163465 | 3.705489 | 4.086064 |
| 4 | 0.0 | 0.0 | 0.0 | 1.0 | -3.2e-05 | 250.000016 | 0.343001 | 0.005626 | -3.6e-05 | -0.15239 | ... | 0.08151 | 0.08151 | 0.127671 | 0.173767 | 1.207704 | 1.47394 | 2.257032 | 2.958696 | 3.540908 | 3.996657 |
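The generated column names follow tsfresh's naming pattern `<kind>__<feature>__<parameter>_<value>`, where the kind is the sktime dimension (`dim_0`) and each further parameter/value pair is appended after another double underscore. A small parser makes it easy to group or filter features by calculator. `parse_feature_name` is a hypothetical helper written for this tutorial, and it assumes parameter values contain no underscores.

```python
def parse_feature_name(name):
    """Hypothetical helper: split a tsfresh-style column name into
    (kind, feature, params).

    Assumes the pattern kind__feature[__param_value ...] and that parameter
    values themselves contain no underscores.
    """
    kind, feature, *param_parts = name.split("__")
    params = {}
    for part in param_parts:
        key, value = part.rsplit("_", 1)  # split at the last underscore
        params[key] = value
    return kind, feature, params


print(parse_feature_name("dim_0__sum_values"))
# ('dim_0', 'sum_values', {})
print(parse_feature_name("dim_0__permutation_entropy__dimension_3__tau_1"))
# ('dim_0', 'permutation_entropy', {'dimension': '3', 'tau': '1'})
```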


## Using tsfresh features in a scikit-learn pipeline

In [7]:
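# chain tsfresh feature extraction and a random forest into one estimator
classifier = make_pipeline(
    TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False),
    RandomForestClassifier()
)
classifier.fit(X_train, y_train)
# mean accuracy on the held-out test set
classifier.score(X_test, y_test)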

  warn("Found non-unique index, replaced with unique index.")
Feature Extraction: 100%|██████████| 5/5 [00:10<00:00,  2.09s/it]
  warn("Found non-unique index, replaced with unique index.")
Feature Extraction: 100%|██████████| 5/5 [00:03<00:00,  1.44it/s]


0.8301886792452831
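The pipeline works because the feature extractor behaves like any scikit-learn transformer: it turns each series into one row of features, and the random forest then sees an ordinary tabular matrix. The accuracy printed above comes from a single random split and an unseeded forest, so it will differ between runs. The sketch below shows the same pattern without tsfresh, under assumptions we introduce: a hypothetical `summary_features` function wrapped in scikit-learn's `FunctionTransformer` stands in for `TSFreshFeatureExtractor`, `VarianceThreshold` drops features that are constant across the training data, the data are synthetic, and 5-fold cross-validation gives a less split-dependent estimate.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import VarianceThreshold
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer


def summary_features(series_list):
    """Hypothetical stand-in for TSFreshFeatureExtractor: map each series
    (here, each row of a 2-D array) to [mean, std, min, max]."""
    rows = []
    for s in series_list:
        s = np.asarray(s, dtype=float)
        rows.append([s.mean(), s.std(), s.min(), s.max()])
    return np.array(rows)


# Synthetic two-class data: 60 series of 100 points; class 1 is noisier
rng = np.random.default_rng(0)
y_syn = np.repeat([0, 1], 30)
X_syn = rng.normal(scale=1.0 + y_syn[:, None], size=(60, 100))

sketch_pipeline = make_pipeline(
    FunctionTransformer(summary_features),  # series -> tabular features
    VarianceThreshold(),  # drop features that are constant in the training fold
    RandomForestClassifier(n_estimators=100, random_state=0),
)
# 5-fold cross-validation instead of a single train/test split
scores = cross_val_score(sketch_pipeline, X_syn, y_syn, cv=5)
print(scores.mean())
```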

## Multivariate time series classification data

In [8]:
X, y = load_basic_motions(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

(60, 6) (60,) (20, 6) (20,)


In [9]:
# multivariate input data
X_train.head()

| | dim_0 | dim_1 | dim_2 | dim_3 | dim_4 | dim_5 |
|---|---|---|---|---|---|---|
| 14 | 0 -0.947424 1 -0.947424 2 14.53912... | 0 0.572681 1 0.572681 2 -10.32130... | 0 -0.529822 1 -0.529822 2 -4.144042 3... | 0 -0.098545 1 -0.098545 2 2.138688 3... | 0 0.596595 1 0.596595 2 -1.259775 3... | 0 0.772378 1 0.772378 2 7.21774... |
| 36 | 0 1.686827 1 1.686827 2 0.88247... | 0 -3.375054 1 -3.375054 2 1.149305 3... | 0 -1.295042 1 -1.295042 2 -0.97372... | 0 -0.711121 1 -0.711121 2 -1.861697 3... | 0 0.013317 1 0.013317 2 0.20508... | 0 -0.207743 1 -0.207743 2 0.114525 3... |
| 4 | 0 -0.123238 1 -0.123238 2 -0.249547 3... | 0 0.379341 1 0.379341 2 0.541501 3... | 0 -0.286006 1 -0.286006 2 0.208420 3... | 0 -0.098545 1 -0.098545 2 -0.023970 3... | 0 0.058594 1 0.058594 2 0.175783 3... | 0 -0.074574 1 -0.074574 2 0.114525 3... |
| 31 | 0 0.036607 1 0.036607 2 0.265778 3... | 0 0.341686 1 0.341686 2 -0.164943 3... | 0 -0.694948 1 -0.694948 2 -0.635560 3... | 0 -0.253020 1 -0.253020 2 -0.354229 3... | 0 -0.082565 1 -0.082565 2 -0.516694 3... | 0 -0.090555 1 -0.090555 2 1.470182 3... |
| 9 | 0 0.126160 1 0.126160 2 1.771871 3... | 0 0.102733 1 0.102733 2 -3.798484 3... | 0 0.308964 1 0.308964 2 0.141369 3... | 0 0.002663 1 0.002663 2 -1.427568 3... | 0 0.000000 1 0.000000 2 -0.167792 3... | 0 -0.007990 1 -0.007990 2 -1.643301 3... |
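For multivariate data, features are computed for each dimension separately and every column is prefixed with its dimension name, so the output width is the number of dimensions times the number of features per dimension. The sketch below reproduces that layout with a hypothetical two-feature extractor on a synthetic 3D array of shape (instances, dimensions, time points). The `dim_<d>__<feature>` naming mirrors the output of the extractor; the array layout, the choice of features and the series length are assumptions of this sketch.

```python
import numpy as np
import pandas as pd


def per_dimension_features(X):
    """Hypothetical helper. X has shape (instances, dimensions, time points).

    Returns a DataFrame with one block of columns per dimension, named
    dim_<d>__<feature>, mirroring the layout of the tsfresh output.
    """
    n_instances, n_dims, _ = X.shape
    columns = {}
    for d in range(n_dims):
        channel = X[:, d, :]  # all series of dimension d, shape (instances, time points)
        columns[f"dim_{d}__mean"] = channel.mean(axis=1)
        columns[f"dim_{d}__abs_energy"] = (channel ** 2).sum(axis=1)  # sum of squares
    return pd.DataFrame(columns, index=range(n_instances))


# Synthetic panel with 60 instances and 6 dimensions, like the training split
# above; 100 time points is an arbitrary choice for this sketch
rng = np.random.default_rng(0)
X_panel = rng.normal(size=(60, 6, 100))
panel_features = per_dimension_features(X_panel)
print(panel_features.shape)  # (60, 12): 6 dimensions x 2 features each
```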


In [10]:
t = TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False)
Xt = t.fit_transform(X_train)
Xt.head()

  warn("Found non-unique index, replaced with unique index.")
Feature Extraction: 100%|██████████| 5/5 [00:18<00:00,  3.74s/it]


| | dim_0__variance_larger_than_standard_deviation | dim_0__has_duplicate_max | dim_0__has_duplicate_min | dim_0__has_duplicate | dim_0__sum_values | dim_0__abs_energy | dim_0__mean_abs_change | dim_0__mean_change | dim_0__mean_second_derivative_central | dim_0__median | ... | dim_5__fourier_entropy__bins_2 | dim_5__fourier_entropy__bins_3 | dim_5__fourier_entropy__bins_5 | dim_5__fourier_entropy__bins_10 | dim_5__fourier_entropy__bins_100 | dim_5__permutation_entropy__dimension_3__tau_1 | dim_5__permutation_entropy__dimension_4__tau_1 | dim_5__permutation_entropy__dimension_5__tau_1 | dim_5__permutation_entropy__dimension_6__tau_1 | dim_5__permutation_entropy__dimension_7__tau_1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 0.0 | 0.0 | 1.0 | 640.744882 | 13000.226236 | 6.736276 | 0.110064 | 0.0 | 10.510441 | ... | 0.096509 | 0.096509 | 0.192626 | 0.288342 | 1.408967 | 1.620571 | 2.642871 | 3.39133 | 3.96916 | 4.281449 |
| 1 | 1.0 | 1.0 | 0.0 | 1.0 | 515.268898 | 8211.351444 | 4.744957 | 0.144781 | 0.051054 | 2.82904 | ... | 0.320753 | 0.613085 | 1.096637 | 1.736308 | 3.383226 | 1.720558 | 2.921829 | 3.861208 | 4.364173 | 4.513799 |
| 2 | 0.0 | 1.0 | 0.0 | 1.0 | -26.674802 | 8.912128 | 0.11776 | -0.00223 | -0.00093 | -0.272428 | ... | 0.165443 | 0.165443 | 0.288342 | 0.288342 | 1.485889 | 1.62259 | 2.639934 | 3.563248 | 4.150507 | 4.434494 |
| 3 | 1.0 | 0.0 | 0.0 | 1.0 | 409.281059 | 5923.622075 | 3.554568 | 0.046896 | 0.002163 | 2.145581 | ... | 0.274921 | 0.443757 | 0.80654 | 1.302333 | 3.102853 | 1.712262 | 2.887057 | 3.921602 | 4.417036 | 4.528547 |
| 4 | 0.0 | 0.0 | 0.0 | 1.0 | -19.802918 | 9.735453 | 0.15356 | -0.003656 | -0.000147 | -0.248964 | ... | 0.567657 | 0.866986 | 1.222843 | 1.690099 | 3.119424 | 1.58541 | 2.516382 | 3.382925 | 3.975397 | 4.316511 |
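Because every column name starts with its dimension, pandas string methods can select or count features per dimension, for example to inspect a single sensor channel. The frame below is a tiny stand-in for `Xt` with made-up values, used only to show the column operations.

```python
import pandas as pd

# Tiny made-up stand-in for Xt: two dimensions, two features each
Xt_example = pd.DataFrame(
    {
        "dim_0__sum_values": [1.0, 2.0],
        "dim_0__median": [0.5, 0.1],
        "dim_5__sum_values": [3.0, -1.0],
        "dim_5__median": [0.2, 0.0],
    }
)

# Keep only the features computed from dimension 5
dim5 = Xt_example.filter(regex=r"^dim_5__")
print(list(dim5.columns))  # ['dim_5__sum_values', 'dim_5__median']

# Count features per dimension using the prefix before the first "__"
counts = Xt_example.columns.to_series().str.split("__").str[0].value_counts().sort_index()
print(counts)
```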
