# Feature extraction with tsfresh transformer

This tutorial shows how to use sktime with [tsfresh](https://tsfresh.readthedocs.io) to extract features from time series first, so that any scikit-learn estimator can then be trained on the resulting feature table.

## Preliminaries
tsfresh must be installed. If you have not installed it yet, uncomment and run the cell below:

In [1]:
# !pip install --upgrade tsfresh

In [2]:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sktime.datasets import load_basic_motions
from sktime.datasets import load_arrow_head
from sktime.transformers.series_as_features.summarize import (
    TSFreshFeatureExtractor,
)

## Univariate time series classification data

For more details on the data set, see the [univariate time series classification notebook](https://github.com/alan-turing-institute/sktime/blob/master/examples/02_classification_univariate.ipynb).

In [3]:
X, y = load_arrow_head(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

(158, 1) (158,) (53, 1) (53,)
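
The shapes above follow from `train_test_split`'s default of holding out 25% of the instances, with the test count rounded up. The arithmetic can be checked with a small sketch; `default_split_sizes` is a hypothetical helper written for this illustration, not part of scikit-learn.

```python
import math


def default_split_sizes(n_samples, test_fraction=0.25):
    """Return (n_train, n_test) when the held-out count is rounded up."""
    n_test = math.ceil(test_fraction * n_samples)
    return n_samples - n_test, n_test


# 211 instances in total: 158 + 53, as printed above
print(default_split_sizes(211))
```

Because no `random_state` is passed to `train_test_split`, the rows that land in each split (and therefore the index values and scores shown in this notebook) will differ from run to run.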


In [4]:
X_train.head()

index,dim_0
17,0 -2.1788 1 -2.1751 2 -2.1550 3 ...
57,0 -1.8031 1 -1.8010 2 -1.7880 3 ...
62,0 -1.9471 1 -1.9405 2 -1.9224 3 ...
91,0 -2.0374 1 -2.0400 2 -2.0374 3 ...
37,0 -2.0220 1 -2.0166 2 -2.0074 3 ...


In [5]:
# multi-class classification task (three classes)
np.unique(y_train)

array(['0', '1', '2'], dtype=object)

## Using tsfresh to extract features

In [6]:
# extract tsfresh's "efficient" feature set from every series
t = TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False)
Xt = t.fit_transform(X_train)
Xt.head()

Feature Extraction: 100%|██████████| 5/5 [00:10<00:00,  2.08s/it]


index,dim_0__variance_larger_than_standard_deviation,dim_0__has_duplicate_max,dim_0__has_duplicate_min,dim_0__has_duplicate,dim_0__sum_values,dim_0__abs_energy,dim_0__mean_abs_change,dim_0__mean_change,dim_0__mean_second_derivative_central,dim_0__median,...,dim_0__fourier_entropy__bins_2,dim_0__fourier_entropy__bins_3,dim_0__fourier_entropy__bins_5,dim_0__fourier_entropy__bins_10,dim_0__fourier_entropy__bins_100,dim_0__permutation_entropy__dimension_3__tau_1,dim_0__permutation_entropy__dimension_4__tau_1,dim_0__permutation_entropy__dimension_5__tau_1,dim_0__permutation_entropy__dimension_6__tau_1,dim_0__permutation_entropy__dimension_7__tau_1
0,0.0,0.0,0.0,1.0,-5e-05,250.00035,0.351749,0.004856,-0.000231,-0.05782,...,0.08151,0.08151,0.138673,0.250609,1.340724,1.568692,2.482612,3.225589,3.78913,4.198932
1,0.0,0.0,0.0,1.0,-0.000191,249.999931,0.331637,0.005117,-0.000176,0.096789,...,0.08151,0.08151,0.092513,0.173767,1.228095,1.488603,2.299505,3.039764,3.599489,4.009416
2,0.0,0.0,0.0,1.0,0.000305,249.999405,0.340806,0.005083,-1.7e-05,-0.20715,...,0.08151,0.08151,0.127671,0.208796,1.236015,1.44211,2.19867,2.839174,3.320156,3.675849
3,0.0,0.0,0.0,1.0,2.6e-05,249.99903,0.373889,0.005229,-9.5e-05,-0.14189,...,0.046288,0.092513,0.127671,0.138673,0.81049,1.558589,2.46968,3.259965,3.867521,4.284652
4,0.0,0.0,0.0,1.0,-4.3e-05,250.000079,0.355106,0.003685,-5.5e-05,-0.054243,...,0.08151,0.08151,0.092513,0.173767,1.245888,1.478945,2.291585,2.962568,3.538545,4.000477
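
Each column in the table above is a single number summarising one whole series. To make that concrete, the sketch below recomputes a few of the simpler features by hand with NumPy. It follows the usual definitions (sum of values, sum of squares, mean absolute and mean signed step between consecutive points, median); `hand_features` and the toy `series` are hypothetical names used only for illustration, and tsfresh's own implementation may differ in details.

```python
import numpy as np


def hand_features(x):
    """Compute a few tsfresh-style summary features for one series."""
    x = np.asarray(x, dtype=float)
    steps = np.diff(x)  # consecutive differences x[t+1] - x[t]
    return {
        "sum_values": x.sum(),
        "abs_energy": np.sum(x ** 2),
        "mean_abs_change": np.mean(np.abs(steps)),
        "mean_change": np.mean(steps),
        "median": np.median(x),
    }


series = [1.0, 3.0, 2.0, 5.0]  # toy data, not taken from ArrowHead
for name, value in hand_features(series).items():
    print(f"{name}: {value:.4f}")
```

Read this way, the `sum_values` entries near 0 and the `abs_energy` entries near 250 in the table are consistent with series that were standardised to zero mean and unit variance before extraction.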


## Using tsfresh with sktime

In [7]:
classifier = make_pipeline(
    TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False),
    RandomForestClassifier()
)
classifier.fit(X_train, y_train)
classifier.score(X_test, y_test)

Feature Extraction: 100%|██████████| 5/5 [00:10<00:00,  2.07s/it]
Feature Extraction: 100%|██████████| 5/5 [00:03<00:00,  1.47it/s]


0.8867924528301887
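
The pipeline above follows a two-step pattern: turn each series into a fixed-length feature vector, then fit an ordinary tabular learner on those vectors. The sketch below reproduces that pattern with NumPy only, so that the mechanics are visible. It is not what sktime or tsfresh do internally: the data are synthetic sine waves, the two features are computed by hand, and a nearest-centroid rule stands in for the random forest. All names (`make_series`, `extract`, `predict`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 100)


def make_series(label):
    """Noisy sine wave: class 0 is slow and small, class 1 is fast and large."""
    cycles, amplitude = (2, 1.0) if label == 0 else (8, 1.5)
    noise = 0.1 * rng.standard_normal(t.size)
    return amplitude * np.sin(2 * np.pi * cycles * t) + noise


def extract(series):
    """Step 1: map one series to a fixed-length feature vector."""
    return np.array([np.mean(np.abs(np.diff(series))), np.sum(series ** 2)])


y = np.array([0, 1] * 30)
X = np.array([extract(make_series(label)) for label in y])
X_train, y_train, X_test, y_test = X[:40], y[:40], X[40:], y[40:]

# Put both features on a comparable scale, using training statistics only.
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
X_train, X_test = (X_train - mu) / sigma, (X_test - mu) / sigma

# Step 2: fit a tabular learner; here, one centroid per class.
classes = np.unique(y_train)
centroids = np.array([X_train[y_train == c].mean(axis=0) for c in classes])


def predict(features):
    """Assign each row to the class with the closest centroid."""
    diffs = features[:, None, :] - centroids[None, :, :]
    return classes[np.linalg.norm(diffs, axis=2).argmin(axis=1)]


accuracy = np.mean(predict(X_test) == y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Wrapping both steps in a single scikit-learn pipeline, as done above, has the practical advantage that feature extraction is re-run automatically on the test data when `score` or `predict` is called.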

## Multivariate time series classification data

In [8]:
X, y = load_basic_motions(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

(60, 6) (60,) (20, 6) (20,)


In [9]:
# multivariate input data
X_train.head()

index,dim_0,dim_1,dim_2,dim_3,dim_4,dim_5
3,0 0.289855 1 0.289855 2 -0.669185 3...,0 0.284130 1 0.284130 2 -0.210466 3...,0 0.213680 1 0.213680 2 0.252267 3...,0 -0.314278 1 -0.314278 2 0.018644 3...,0 0.074574 1 0.074574 2 0.007990 3...,0 -0.079901 1 -0.079901 2 0.237040 3...
16,0 1.370472 1 1.370472 2 8.98811...,0 -1.054298 1 -1.054298 2 7.71701...,0 -0.451409 1 -0.451409 2 -6.073897 3...,0 -0.306288 1 -0.306288 2 0.458100 3...,0 -0.423476 1 -0.423476 2 0.761725 3...,0 0.292971 1 0.292971 2 2.159995 3...
25,0 -0.044205 1 -0.044205 2 -0.878387 3...,0 -0.496912 1 -0.496912 2 -1.725143 3...,0 -0.428723 1 -0.428723 2 1.558894 3...,0 0.620566 1 0.620566 2 0.082565 3...,0 0.229050 1 0.229050 2 0.098545 3...,0 0.649863 1 0.649863 2 -0.191763 3...
30,0 -0.623875 1 -0.623875 2 -1.081529 3...,0 -2.123436 1 -2.123436 2 -0.121519 3...,0 -0.513654 1 -0.513654 2 0.809464 3...,0 -0.143822 1 -0.143822 2 -1.081329 3...,0 0.058594 1 0.058594 2 -0.127842 3...,0 1.086656 1 1.086656 2 0.066584 3...
10,0 0.300413 1 0.300413 2 -1.96499...,0 0.727580 1 0.727580 2 -0.30055...,0 0.878731 1 0.878731 2 -1.226914 3...,0 -0.082565 1 -0.082565 2 -0.631219 3...,0 -0.055931 1 -0.055931 2 0.039951 3...,0 0.668507 1 0.668507 2 0.130505 3...


In [10]:
t = TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False)
Xt = t.fit_transform(X_train)
Xt.head()

Feature Extraction: 100%|██████████| 5/5 [00:17<00:00,  3.58s/it]


index,dim_0__variance_larger_than_standard_deviation,dim_0__has_duplicate_max,dim_0__has_duplicate_min,dim_0__has_duplicate,dim_0__sum_values,dim_0__abs_energy,dim_0__mean_abs_change,dim_0__mean_change,dim_0__mean_second_derivative_central,dim_0__median,...,dim_5__fourier_entropy__bins_2,dim_5__fourier_entropy__bins_3,dim_5__fourier_entropy__bins_5,dim_5__fourier_entropy__bins_10,dim_5__fourier_entropy__bins_100,dim_5__permutation_entropy__dimension_3__tau_1,dim_5__permutation_entropy__dimension_4__tau_1,dim_5__permutation_entropy__dimension_5__tau_1,dim_5__permutation_entropy__dimension_6__tau_1,dim_5__permutation_entropy__dimension_7__tau_1
0,0.0,0.0,0.0,1.0,-13.702903,6.150112,0.197931,-0.005907,-0.001008,-0.139846,...,0.223718,0.329286,0.481199,0.810494,2.345604,1.687964,2.744059,3.583255,4.040172,4.296197
1,1.0,0.0,0.0,1.0,525.281957,16841.431717,9.634983,0.122669,0.091556,10.755665,...,0.096509,0.096509,0.26116,0.26116,1.629072,1.569105,2.571916,3.406333,4.023954,4.346007
2,1.0,0.0,0.0,1.0,63.89783,193.996354,0.993834,0.007989,-0.001804,0.348779,...,0.165443,0.192626,0.288342,0.288342,1.250512,1.642304,2.651924,3.509557,4.070391,4.3692
3,1.0,0.0,0.0,1.0,328.868616,4402.264342,3.723417,0.0222,0.005257,1.670312,...,0.494918,0.835471,1.36354,1.959055,3.383226,1.750982,2.856852,3.800627,4.312957,4.463989
4,1.0,0.0,1.0,1.0,419.211878,15733.291175,8.393145,-0.141187,-0.03583,10.172421,...,0.165443,0.165443,0.192626,0.288342,1.609035,1.608532,2.688711,3.497213,3.988532,4.240084
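
The column names above run from `dim_0__...` to `dim_5__...`, which suggests that features are computed separately for each of the six dimensions and then placed side by side. The sketch below mimics that layout with NumPy on random data. It assumes the panel is stored as a 3D array of shape `(n_instances, n_dimensions, n_timepoints)` rather than sktime's nested DataFrame, and `FEATURES` and `extract_per_dimension` are hypothetical names, not sktime or tsfresh API.

```python
import numpy as np

# A small, hand-picked feature set; tsfresh's "efficient" set is far larger.
FEATURES = {
    "sum_values": lambda x: x.sum(),
    "abs_energy": lambda x: np.sum(x ** 2),
    "median": lambda x: np.median(x),
}


def extract_per_dimension(panel):
    """Apply every feature to every dimension of every instance.

    panel has shape (n_instances, n_dimensions, n_timepoints).
    Returns the column names and an (n_instances, n_columns) array.
    """
    n_instances, n_dims, _ = panel.shape
    names = [f"dim_{d}__{name}" for d in range(n_dims) for name in FEATURES]
    table = np.array([
        [func(panel[i, d]) for d in range(n_dims) for func in FEATURES.values()]
        for i in range(n_instances)
    ])
    return names, table


panel = np.random.default_rng(0).standard_normal((4, 6, 100))
names, table = extract_per_dimension(panel)
print(table.shape)  # one row per instance, n_dimensions * n_features columns
print(names[:4])
```

The number of columns grows linearly with the number of dimensions, which is one reason extraction takes longer on the multivariate data than on the univariate data above.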
