# Feature extraction with tsfresh transformer

In this tutorial, we show how to use sktime together with [tsfresh](https://tsfresh.readthedocs.io) to extract features from time series, so that the resulting tabular data can be passed to any scikit-learn estimator.

## Preliminaries
You need to install tsfresh if you haven't already. To install it, uncomment the line in the cell below and run it:

In [1]:
# !pip install --upgrade tsfresh

In [2]:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sktime.datasets import load_basic_motions
from sktime.datasets import load_arrow_head
from sktime.transformers.series_as_features.summarize import \
    TSFreshFeatureExtractor

## Univariate time series classification data

For more details on the data set, see the [univariate time series classification notebook](https://github.com/alan-turing-institute/sktime/blob/master/examples/02_classification_univariate.ipynb).

In [3]:
X, y = load_arrow_head(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

(158, 1) (158,) (53, 1) (53,)


In [4]:
X_train.head()

Unnamed: 0,dim_0
6,0 -1.8538 1 -1.8492 2 -1.8341 3 ...
79,0 -2.0399 1 -2.0382 2 -2.0384 3 ...
68,0 -1.9245 1 -1.9210 2 -1.9066 3 ...
98,0 -2.0262 1 -2.0080 2 -1.9726 3 ...
15,0 -2.1645 1 -2.1785 2 -2.0660 3 ...


In [5]:
# three-class classification task
np.unique(y_train)

array(['0', '1', '2'], dtype=object)

## Using tsfresh to extract features

In [6]:
# extract the "efficient" tsfresh feature set: one row of features per series
t = TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False)
Xt = t.fit_transform(X_train)
Xt.head()

  warn("Found non-unique index, replaced with unique index.")
Feature Extraction: 100%|██████████| 5/5 [00:10<00:00,  2.18s/it]


Unnamed: 0,dim_0__variance_larger_than_standard_deviation,dim_0__has_duplicate_max,dim_0__has_duplicate_min,dim_0__has_duplicate,dim_0__sum_values,dim_0__abs_energy,dim_0__mean_abs_change,dim_0__mean_change,dim_0__mean_second_derivative_central,dim_0__median,...,dim_0__fourier_entropy__bins_2,dim_0__fourier_entropy__bins_3,dim_0__fourier_entropy__bins_5,dim_0__fourier_entropy__bins_10,dim_0__fourier_entropy__bins_100,dim_0__permutation_entropy__dimension_3__tau_1,dim_0__permutation_entropy__dimension_4__tau_1,dim_0__permutation_entropy__dimension_5__tau_1,dim_0__permutation_entropy__dimension_6__tau_1,dim_0__permutation_entropy__dimension_7__tau_1
0,0.0,0.0,0.0,1.0,-0.000287,250.000456,0.337567,0.004818,-0.000133,-0.082664,...,0.08151,0.08151,0.08151,0.204643,1.26752,1.508455,2.337061,3.055206,3.629753,4.083496
1,0.0,0.0,0.0,0.0,-0.000137,250.000586,0.388643,0.005909,-5.2e-05,0.005708,...,0.08151,0.092513,0.138673,0.219798,1.164315,1.648669,2.733544,3.672073,4.261213,4.613355
2,0.0,1.0,0.0,1.0,0.00022,249.99969,0.347404,0.004419,-4.9e-05,-0.19532,...,0.08151,0.08151,0.138673,0.219798,1.251029,1.522701,2.303313,2.951307,3.455873,3.889778
3,0.0,0.0,0.0,1.0,-0.000471,250.000159,0.331807,0.006407,-0.000155,0.12214,...,0.08151,0.092513,0.092513,0.204643,1.212645,1.539104,2.470422,3.332919,3.910968,4.279043
4,0.0,0.0,0.0,1.0,0.000113,250.000147,0.363012,0.005342,0.000134,0.21519,...,0.08151,0.092513,0.092513,0.138673,0.920937,1.549348,2.467173,3.293513,3.905186,4.35784
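Each column in the table above is named `<dimension>__<feature>`, so `dim_0__abs_energy` is the `abs_energy` feature of the series stored in `dim_0`. To make a few of these columns concrete, the sketch below recomputes them in plain Python for a short made-up series. It assumes the definitions given in the tsfresh documentation (for example, `abs_energy` as the sum of squared values and `mean_change` as the mean of consecutive differences). `basic_features` is a hypothetical helper written for this illustration and is not part of tsfresh or sktime.

```python
from statistics import median


def basic_features(values, prefix="dim_0"):
    """Compute a handful of tsfresh-style summary features for one series.

    Hypothetical helper for illustration only; the formulas are assumed to
    match the tsfresh feature calculators of the same name.
    """
    n = len(values)
    # consecutive differences x[i + 1] - x[i]
    diffs = [b - a for a, b in zip(values, values[1:])]
    return {
        f"{prefix}__sum_values": sum(values),
        f"{prefix}__abs_energy": sum(v * v for v in values),
        f"{prefix}__mean_abs_change": sum(abs(d) for d in diffs) / len(diffs),
        # the mean of the differences telescopes to (last - first) / (n - 1)
        f"{prefix}__mean_change": (values[-1] - values[0]) / (n - 1),
        f"{prefix}__median": median(values),
        # 1.0 if the maximum value occurs more than once, else 0.0
        f"{prefix}__has_duplicate_max": float(values.count(max(values)) >= 2),
    }


series = [1.0, 3.0, 2.0, 3.0, 0.0]  # made-up values
features = basic_features(series)
for name, value in features.items():
    print(name, value)
```

The real extractor computes many more features than this, which is why feature extraction takes several seconds even on a small data set.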


## Using tsfresh with sktime

In [7]:
classifier = make_pipeline(
    TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False),
    RandomForestClassifier()
)
classifier.fit(X_train, y_train)
classifier.score(X_test, y_test)

  warn("Found non-unique index, replaced with unique index.")
Feature Extraction: 100%|██████████| 5/5 [00:10<00:00,  2.10s/it]
  warn("Found non-unique index, replaced with unique index.")
Feature Extraction: 100%|██████████| 5/5 [00:03<00:00,  1.47it/s]


0.8490566037735849
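The pipeline above works because the feature extractor turns every series into a fixed-length row of numbers, which is the tabular input that scikit-learn estimators expect. As a minimal sketch of the same pattern without sktime or tsfresh, the example below plugs a hand-written feature function into a scikit-learn pipeline. The `summarize` function and the toy data are hypothetical stand-ins chosen for this illustration; they only mimic three of the tsfresh features and operate on a plain NumPy array rather than on sktime's nested data frame.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer


def summarize(X):
    """Map an array of shape (n_series, n_timepoints) to 3 features per series.

    Hypothetical stand-in for a feature extractor, for illustration only.
    """
    X = np.asarray(X, dtype=float)
    return np.column_stack([
        X.sum(axis=1),                            # analogous to sum_values
        (X ** 2).sum(axis=1),                     # analogous to abs_energy
        np.abs(np.diff(X, axis=1)).mean(axis=1),  # analogous to mean_abs_change
    ])


# Made-up toy data: 40 lightly perturbed and 40 heavily perturbed sine waves.
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 50)
smooth = np.sin(t) + 0.05 * rng.standard_normal((40, 50))
noisy = np.sin(t) + 1.0 * rng.standard_normal((40, 50))
X = np.vstack([smooth, noisy])
y = np.array(["smooth"] * 40 + ["noisy"] * 40)

# Simple deterministic split: even rows for training, odd rows for testing.
X_train, X_test = X[::2], X[1::2]
y_train, y_test = y[::2], y[1::2]

# Features are extracted inside the pipeline, for both fit and score.
pipeline = make_pipeline(
    FunctionTransformer(summarize),
    RandomForestClassifier(random_state=0),
)
pipeline.fit(X_train, y_train)
accuracy = pipeline.score(X_test, y_test)
print(accuracy)
```

Keeping the extraction step inside the pipeline means the same transformation is applied to training and test data, which is also why the progress bar in the output above appears twice: once for `fit` and once for `score`.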

## Multivariate time series classification data

In [8]:
X, y = load_basic_motions(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

(60, 6) (60,) (20, 6) (20,)


In [9]:
# multivariate input data
X_train.head()

Unnamed: 0,dim_0,dim_1,dim_2,dim_3,dim_4,dim_5
6,0 1.236069 1 1.236069 2 0.118106 3...,0 -0.569532 1 -0.569532 2 0.264725 3...,0 1.536733 1 1.536733 2 0.712643 3...,0 0.143822 1 0.143822 2 -2.018837 3...,0 0.061258 1 0.061258 2 -0.111862 3...,0 0.905547 1 0.905547 2 -0.990775 3...
0,0 0.079106 1 0.079106 2 -0.903497 3...,0 0.394032 1 0.394032 2 -3.666397 3...,0 0.551444 1 0.551444 2 -0.282844 3...,0 0.351565 1 0.351565 2 -0.095881 3...,0 0.023970 1 0.023970 2 -0.319605 3...,0 0.633883 1 0.633883 2 0.972131 3...
32,0 -0.592124 1 -0.592124 2 0.33036...,0 -0.392740 1 -0.392740 2 0.14477...,0 -1.411327 1 -1.411327 2 -0.98216...,0 -0.306288 1 -0.306288 2 -0.133169 3...,0 0.354229 1 0.354229 2 0.221060 3...,0 -0.143822 1 -0.143822 2 0.213070 3...
38,0 -2.178746 1 -2.178746 2 -0.448056 3...,0 -0.385371 1 -0.385371 2 -2.08943...,0 -0.805837 1 -0.805837 2 1.04617...,0 -0.039951 1 -0.039951 2 1.946925 3...,0 0.484734 1 0.484734 2 -0.524684 3...,0 1.054696 1 1.054696 2 2.436986 3...
29,0 0.118553 1 0.118553 2 -0.545332 3...,0 0.419456 1 0.419456 2 0.371223 3...,0 -0.283447 1 -0.283447 2 0.707172 3...,0 0.135832 1 0.135832 2 0.159802 3...,0 -0.079901 1 -0.079901 2 -0.090555 3...,0 0.050604 1 0.050604 2 0.474080 3...


In [10]:
t = TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False)
Xt = t.fit_transform(X_train)
Xt.head()

  warn("Found non-unique index, replaced with unique index.")
Feature Extraction: 100%|██████████| 5/5 [00:19<00:00,  3.97s/it]


Unnamed: 0,dim_0__variance_larger_than_standard_deviation,dim_0__has_duplicate_max,dim_0__has_duplicate_min,dim_0__has_duplicate,dim_0__sum_values,dim_0__abs_energy,dim_0__mean_abs_change,dim_0__mean_change,dim_0__mean_second_derivative_central,dim_0__median,...,dim_5__fourier_entropy__bins_2,dim_5__fourier_entropy__bins_3,dim_5__fourier_entropy__bins_5,dim_5__fourier_entropy__bins_10,dim_5__fourier_entropy__bins_100,dim_5__permutation_entropy__dimension_3__tau_1,dim_5__permutation_entropy__dimension_4__tau_1,dim_5__permutation_entropy__dimension_5__tau_1,dim_5__permutation_entropy__dimension_6__tau_1,dim_5__permutation_entropy__dimension_7__tau_1
0,0.0,1.0,0.0,1.0,2.755709,11.439311,0.21202,-0.015356,0.000324,-0.100108,...,0.223718,0.26116,0.26116,0.26116,1.544731,1.692198,2.744708,3.654497,4.190708,4.419746
1,0.0,0.0,0.0,1.0,-8.618429,10.629914,0.229193,-0.002871,-6.1e-05,-0.164268,...,0.320753,0.647776,1.124025,1.459587,3.130035,1.599592,2.614086,3.516918,4.094068,4.316511
2,1.0,1.0,0.0,1.0,335.029358,5296.984407,3.557896,0.001664,0.0,0.858254,...,0.165443,0.329286,0.620218,1.236372,3.034594,1.687628,2.778014,3.711684,4.170608,4.369936
3,1.0,0.0,0.0,1.0,250.59917,4083.098033,3.585311,0.062136,0.0,0.733762,...,0.165443,0.26116,0.509247,0.717834,2.532652,1.656582,2.730943,3.719318,4.280195,4.528547
4,1.0,0.0,0.0,1.0,75.777011,232.319298,1.388404,0.013368,0.0,0.648085,...,0.096509,0.192626,0.192626,0.288342,1.156045,1.592283,2.550087,3.310667,3.823423,4.16513
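In the multivariate case the column names run from `dim_0__...` to `dim_5__...`, which suggests that the features are computed separately for each dimension and then placed side by side in a single row per instance. The sketch below illustrates that flattening step in plain Python. `flatten_instance`, the two feature functions, and the sample values are hypothetical and only demonstrate the naming scheme; they are not how sktime or tsfresh implement it.

```python
def flatten_instance(instance, feature_funcs):
    """Return one flat feature row for a multivariate instance.

    instance: dict mapping a dimension name to its list of values.
    feature_funcs: dict mapping a feature name to a function of one series.
    Hypothetical helper for illustration only.
    """
    return {
        f"{dim}__{name}": func(values)
        for dim, values in instance.items()
        for name, func in feature_funcs.items()
    }


feature_funcs = {
    "sum_values": sum,
    "abs_energy": lambda values: sum(v * v for v in values),
}

# One made-up instance with two dimensions of three time points each.
instance = {"dim_0": [1.0, 2.0, 3.0], "dim_1": [0.5, -0.5, 0.0]}
row = flatten_instance(instance, feature_funcs)
for name, value in row.items():
    print(name, value)
```

Under this scheme the number of columns is the number of dimensions times the number of features per series, so the feature table grows quickly for multivariate data.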
