# Feature extraction with tsfresh transformer

This tutorial shows how to use sktime with [tsfresh](https://tsfresh.readthedocs.io) to extract features from time series, so that the resulting tabular features can be passed to any scikit-learn estimator.

## Preliminaries
You need tsfresh installed. If it isn't already, uncomment the line in the cell below and run it:

In [1]:
# !pip install --upgrade tsfresh

In [2]:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sktime.datasets import load_basic_motions
from sktime.datasets import load_arrow_head
from sktime.transformers.series_as_features.summarize import \
    TSFreshFeatureExtractor

## Univariate time series classification data

For more details on the data set, see the [univariate time series classification notebook](https://github.com/alan-turing-institute/sktime/blob/master/examples/02_classification_univariate.ipynb).

In [3]:
X, y = load_arrow_head(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

(158, 1) (158,) (53, 1) (53,)


In [4]:
X_train.head()

Unnamed: 0,dim_0
32,0 -1.8955 1 -1.8963 2 -1.8802 3 ...
69,0 -1.7998 1 -1.7987 2 -1.7942 3 ...
14,0 -1.9134 1 -1.9116 2 -1.8902 3 ...
76,0 -1.8888 1 -1.8850 2 -1.8562 3 ...
127,0 -1.8034 1 -1.7962 2 -1.7728 3 ...


In [5]:
# multiclass classification task: the labels take three distinct values
np.unique(y_train)

array(['0', '1', '2'], dtype=object)

## Using tsfresh to extract features

In [6]:
# "efficient" selects tsfresh's feature set that skips the most
# computationally expensive feature calculators
t = TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False)
Xt = t.fit_transform(X_train)
Xt.head()

Feature Extraction: 100%|██████████| 5/5 [00:11<00:00,  2.26s/it]


Unnamed: 0,dim_0__variance_larger_than_standard_deviation,dim_0__has_duplicate_max,dim_0__has_duplicate_min,dim_0__has_duplicate,dim_0__sum_values,dim_0__abs_energy,dim_0__mean_abs_change,dim_0__mean_change,dim_0__mean_second_derivative_central,dim_0__median,...,dim_0__fourier_entropy__bins_2,dim_0__fourier_entropy__bins_3,dim_0__fourier_entropy__bins_5,dim_0__fourier_entropy__bins_10,dim_0__fourier_entropy__bins_100,dim_0__permutation_entropy__dimension_3__tau_1,dim_0__permutation_entropy__dimension_4__tau_1,dim_0__permutation_entropy__dimension_5__tau_1,dim_0__permutation_entropy__dimension_6__tau_1,dim_0__permutation_entropy__dimension_7__tau_1
0,0.0,0.0,0.0,1.0,-8.2e-05,249.998916,0.353078,0.004516,-4.8e-05,-0.010786,...,0.08151,0.08151,0.08151,0.173767,1.329162,1.551664,2.362918,3.090174,3.636004,4.061991
1,0.0,0.0,0.0,1.0,-8e-05,249.998516,0.334229,0.004226,-0.0002,-0.024066,...,0.08151,0.08151,0.127671,0.138673,1.175797,1.574929,2.472788,3.211234,3.750249,4.129999
2,0.0,1.0,0.0,1.0,0.000866,250.000664,0.337399,0.005047,-0.000102,0.044839,...,0.08151,0.08151,0.127671,0.208796,1.306718,1.502383,2.317553,3.061353,3.656811,4.101524
3,0.0,1.0,0.0,1.0,8.9e-05,249.999746,0.334852,0.005419,-0.000184,-0.043601,...,0.08151,0.08151,0.092513,0.173767,1.100223,1.511217,2.341969,3.056166,3.629556,4.061251
4,0.0,1.0,0.0,1.0,0.000654,250.000867,0.312775,0.007564,-0.000128,0.21895,...,0.046288,0.092513,0.092513,0.204643,0.955431,1.472832,2.326372,3.071825,3.66905,4.116237
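
To make a few of the column names above concrete, here is a minimal sketch that computes five of them by hand for a toy series, using only the standard library. The helper `toy_features` and the toy values are hypothetical and for illustration only. The formulas follow the usual definitions of these tsfresh features (sum, sum of squares, mean of absolute successive differences, mean of successive differences, median), but this is not tsfresh's own implementation.

```python
from statistics import median


def toy_features(series):
    """Compute a handful of summary features for one series of numbers.

    Hypothetical helper for illustration; not part of tsfresh or sktime.
    """
    # successive differences x[t+1] - x[t]
    diffs = [b - a for a, b in zip(series, series[1:])]
    return {
        "sum_values": sum(series),
        "abs_energy": sum(x * x for x in series),
        "mean_abs_change": sum(abs(d) for d in diffs) / len(diffs),
        "mean_change": sum(diffs) / len(diffs),
        "median": median(series),
    }


series = [1.0, 3.0, 2.0, 5.0]
features = toy_features(series)
print(features)
```

Each series, whatever its length, is reduced to one fixed-length row of numbers, which is what lets a tabular estimator consume the result.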


## Using tsfresh with sktime

In [7]:
classifier = make_pipeline(
    TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False),
    RandomForestClassifier()
)
classifier.fit(X_train, y_train)
classifier.score(X_test, y_test)

Feature Extraction: 100%|██████████| 5/5 [00:11<00:00,  2.20s/it]
Feature Extraction: 100%|██████████| 5/5 [00:03<00:00,  1.34it/s]


0.8867924528301887
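
The two progress bars above presumably correspond to two feature-extraction passes: one on the training data during `fit`, and one on the test data during `score`. The sketch below mimics that control flow with standard-library stand-ins. `MeanFeature`, `NearestCentroid1D` and `TinyPipeline` are hypothetical classes written for this illustration, not scikit-learn or sktime APIs, and the toy data is made up.

```python
class MeanFeature:
    """Hypothetical transformer: maps each series to a one-element feature row."""

    def __init__(self):
        self.n_transform_calls = 0  # counts feature-extraction passes

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        self.n_transform_calls += 1
        return [[sum(s) / len(s)] for s in X]


class NearestCentroid1D:
    """Hypothetical classifier: predicts the class whose mean feature is closest."""

    def fit(self, X, y):
        grouped = {}
        for (value,), label in zip(X, y):
            grouped.setdefault(label, []).append(value)
        self.centroids_ = {k: sum(v) / len(v) for k, v in grouped.items()}
        return self

    def predict(self, X):
        return [
            min(self.centroids_, key=lambda c: abs(self.centroids_[c] - v))
            for (v,) in X
        ]


class TinyPipeline:
    """Hypothetical two-step pipeline: transformer followed by estimator."""

    def __init__(self, transformer, estimator):
        self.transformer = transformer
        self.estimator = estimator

    def fit(self, X, y):
        Xt = self.transformer.fit(X, y).transform(X)  # pass 1: training data
        self.estimator.fit(Xt, y)
        return self

    def score(self, X, y):
        Xt = self.transformer.transform(X)  # pass 2: test data
        preds = self.estimator.predict(Xt)
        return sum(p == t for p, t in zip(preds, y)) / len(y)


X_tr = [[0, 1, 0], [1, 0, 1], [5, 6, 5], [6, 5, 6]]
y_tr = ["low", "low", "high", "high"]
X_te = [[0, 0, 1], [6, 6, 5]]
y_te = ["low", "high"]

pipe = TinyPipeline(MeanFeature(), NearestCentroid1D()).fit(X_tr, y_tr)
accuracy = pipe.score(X_te, y_te)
print(accuracy, pipe.transformer.n_transform_calls)
```

Bundling the extractor and the classifier into one object means features for unseen data are always computed with the same settings as for the training data.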

## Multivariate time series classification data

In [8]:
X, y = load_basic_motions(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

(60, 6) (60,) (20, 6) (20,)


In [9]:
# multivariate input data
X_train.head()

Unnamed: 0,dim_0,dim_1,dim_2,dim_3,dim_4,dim_5
1,0 0.377751 1 0.377751 2 2.952965 3...,0 -0.610850 1 -0.610850 2 0.970717 3...,0 -0.147376 1 -0.147376 2 -5.962515 3...,0 -0.103872 1 -0.103872 2 -7.593275 3...,0 -0.109198 1 -0.109198 2 -0.697804 3...,0 -0.037287 1 -0.037287 2 -2.865789 3...
38,0 -2.178746 1 -2.178746 2 -0.448056 3...,0 -0.385371 1 -0.385371 2 -2.08943...,0 -0.805837 1 -0.805837 2 1.04617...,0 -0.039951 1 -0.039951 2 1.946925 3...,0 0.484734 1 0.484734 2 -0.524684 3...,0 1.054696 1 1.054696 2 2.436986 3...
30,0 -0.623875 1 -0.623875 2 -1.081529 3...,0 -2.123436 1 -2.123436 2 -0.121519 3...,0 -0.513654 1 -0.513654 2 0.809464 3...,0 -0.143822 1 -0.143822 2 -1.081329 3...,0 0.058594 1 0.058594 2 -0.127842 3...,0 1.086656 1 1.086656 2 0.066584 3...
32,0 -0.179131 1 -0.179131 2 0.461767 3...,0 -1.108077 1 -1.108077 2 -1.187180 3...,0 0.012600 1 0.012600 2 2.360390 3...,0 0.066584 1 0.066584 2 -0.463427 3...,0 -0.095881 1 -0.095881 2 0.639209 3...,0 0.396843 1 0.396843 2 -0.383526 3...
36,0 -1.801504 1 -1.801504 2 -0.480725 3...,0 2.344990 1 2.344990 2 -0.994385 3...,0 0.281253 1 0.281253 2 0.378807 3...,0 0.716447 1 0.716447 2 -0.870923 3...,0 0.162466 1 0.162466 2 0.095881 3...,0 0.921527 1 0.921527 2 -0.474080 3...


In [None]:
t = TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False)
Xt = t.fit_transform(X_train)
Xt.head()
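
The cell above has no recorded output. As a rough sketch of what to expect, and assuming the multivariate result follows the same `dim_<k>__<feature>` naming seen in the univariate table, features would be computed separately for each dimension and placed side by side in one row per instance. The helper `per_dimension_features` and the toy instance are hypothetical; only two features are computed to keep the example short.

```python
def per_dimension_features(instance):
    """Build one feature row from a multivariate instance.

    instance: dict mapping a dimension name to a list of numbers.
    Hypothetical helper for illustration; not part of tsfresh or sktime.
    """
    row = {}
    for dim, series in instance.items():
        # prefix each feature with its dimension, e.g. "dim_0__sum_values"
        row[f"{dim}__sum_values"] = sum(series)
        row[f"{dim}__abs_energy"] = sum(x * x for x in series)
    return row


instance = {"dim_0": [1.0, 2.0, 3.0], "dim_1": [0.0, -1.0, 1.0]}
row = per_dimension_features(instance)
print(row)
```

Under this assumption the number of output columns grows linearly with the number of dimensions, so the six-dimensional data here would yield roughly six times as many features as a univariate series.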