# Feature extraction with tsfresh transformer

In this tutorial, we show how to use sktime with [tsfresh](https://tsfresh.readthedocs.io) to extract features from time series, so that any scikit-learn estimator can then be applied to the resulting tabular data.

## Preliminaries
tsfresh must be installed. If you have not installed it yet, uncomment and run the cell below:

In [1]:
# !pip install --upgrade tsfresh

In [2]:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sktime.datasets import load_basic_motions
from sktime.datasets import load_arrow_head
from sktime.transformers.series_as_features.summarize import (
    TSFreshFeatureExtractor,
)

## Univariate time series classification data

For more details on the data set, see the [univariate time series classification notebook](https://github.com/alan-turing-institute/sktime/blob/master/examples/02_classification_univariate.ipynb).

In [3]:
X, y = load_arrow_head(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

(158, 1) (158,) (53, 1) (53,)


In [4]:
X_train.head()

| | dim_0 |
|---|---|
| 104 | 0 -1.9698 1 -1.9818 2 -1.9434 3 ... |
| 0 | 0 -1.9078 1 -1.9049 2 -1.8886 3 ... |
| 73 | 0 -1.8132 1 -1.8255 2 -1.8166 3 ... |
| 13 | 0 -2.1395 1 -2.1189 2 -2.1044 3 ... |
| 27 | 0 -2.5471 1 -2.5494 2 -2.4694 3 ... |


In [5]:
# multi-class classification task with three class labels
np.unique(y_train)

array(['0', '1', '2'], dtype=object)

## Using tsfresh to extract features

In [6]:
# extract tsfresh's "efficient" feature set from each training series
t = TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False)
Xt = t.fit_transform(X_train)
Xt.head()

Feature Extraction: 100%|██████████| 5/5 [00:20<00:00,  4.06s/it]


| | dim_0__variance_larger_than_standard_deviation | dim_0__has_duplicate_max | dim_0__has_duplicate_min | dim_0__has_duplicate | dim_0__sum_values | dim_0__abs_energy | dim_0__mean_abs_change | dim_0__mean_change | dim_0__mean_second_derivative_central | dim_0__median | ... | dim_0__fourier_entropy__bins_2 | dim_0__fourier_entropy__bins_3 | dim_0__fourier_entropy__bins_5 | dim_0__fourier_entropy__bins_10 | dim_0__fourier_entropy__bins_100 | dim_0__permutation_entropy__dimension_3__tau_1 | dim_0__permutation_entropy__dimension_4__tau_1 | dim_0__permutation_entropy__dimension_5__tau_1 | dim_0__permutation_entropy__dimension_6__tau_1 | dim_0__permutation_entropy__dimension_7__tau_1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 0.0 | 1.0 | 7.9e-05 | 250.00203 | 0.337407 | 0.007011 | -0.000196 | 0.042356 | ... | 0.08151 | 0.092513 | 0.092513 | 0.204643 | 1.197663 | 1.509541 | 2.380924 | 3.141125 | 3.695552 | 4.128537 |
| 1 | 0.0 | 0.0 | 0.0 | 1.0 | -0.000602 | 249.998915 | 0.341486 | 0.004775 | -5.3e-05 | -0.1087 | ... | 0.08151 | 0.08151 | 0.127671 | 0.208796 | 1.26222 | 1.471214 | 2.266157 | 2.958169 | 3.492229 | 3.922921 |
| 2 | 0.0 | 0.0 | 0.0 | 1.0 | 1.5e-05 | 249.999269 | 0.352382 | 0.0057 | -1.3e-05 | -0.020305 | ... | 0.08151 | 0.092513 | 0.092513 | 0.204643 | 1.224637 | 1.488539 | 2.241789 | 2.913085 | 3.391484 | 3.778937 |
| 3 | 0.0 | 1.0 | 0.0 | 1.0 | -8e-06 | 249.999228 | 0.331796 | 0.004104 | -4.3e-05 | 0.15336 | ... | 0.08151 | 0.127671 | 0.173767 | 0.265764 | 1.250099 | 1.554067 | 2.472764 | 3.264834 | 3.89171 | 4.357888 |
| 4 | 0.0 | 0.0 | 0.0 | 1.0 | -0.000113 | 250.000974 | 0.34324 | 0.004114 | -0.00018 | 0.44982 | ... | 0.112516 | 0.158612 | 0.239606 | 0.366101 | 2.033926 | 1.582302 | 2.572386 | 3.49479 | 4.177374 | 4.620139 |
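
To make some of the column names above more concrete, here is a minimal pure-Python sketch of how a few of the simpler features (`sum_values`, `abs_energy`, `mean_abs_change`, `mean_change`, `median`) can be computed for a single series. It follows the usual definitions of these quantities and is only an illustration, not tsfresh's actual implementation; the helper name `simple_features` is hypothetical.

```python
from statistics import median


def simple_features(series):
    """Compute a handful of summary features for one univariate series.

    Hypothetical helper for illustration only; tsfresh computes these
    (and many more) with its own code. Assumes at least two observations.
    """
    n = len(series)
    # Differences between consecutive observations.
    diffs = [b - a for a, b in zip(series, series[1:])]
    return {
        "sum_values": sum(series),
        # Sum of squared values.
        "abs_energy": sum(x * x for x in series),
        # Average absolute step between consecutive values.
        "mean_abs_change": sum(abs(d) for d in diffs) / len(diffs),
        # Average signed step, which telescopes to (last - first) / (n - 1).
        "mean_change": (series[-1] - series[0]) / (n - 1),
        "median": median(series),
    }


features = simple_features([1.0, 3.0, 2.0, 6.0])
print(features)
```

Each series is reduced to a fixed-length row of numbers like this, which is what allows a tabular scikit-learn estimator to be applied afterwards.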


## Using tsfresh with sktime

In [7]:
classifier = make_pipeline(
    TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False),
    RandomForestClassifier()
)
classifier.fit(X_train, y_train)
classifier.score(X_test, y_test)

Feature Extraction: 100%|██████████| 5/5 [00:19<00:00,  3.93s/it]
Feature Extraction: 100%|██████████| 5/5 [00:07<00:00,  1.59s/it]


0.7735849056603774

## Multivariate time series classification data

In [8]:
X, y = load_basic_motions(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

(60, 6) (60,) (20, 6) (20,)


In [9]:
# multivariate input data
X_train.head()

| | dim_0 | dim_1 | dim_2 | dim_3 | dim_4 | dim_5 |
|---|---|---|---|---|---|---|
| 9 | 0 0.126160 1 0.126160 2 1.771871 3... | 0 0.102733 1 0.102733 2 -3.798484 3... | 0 0.308964 1 0.308964 2 0.141369 3... | 0 0.002663 1 0.002663 2 -1.427568 3... | 0 0.000000 1 0.000000 2 -0.167792 3... | 0 -0.007990 1 -0.007990 2 -1.643301 3... |
| 12 | 0 2.221946 1 2.221946 2 -7.70417... | 0 -0.783638 1 -0.783638 2 -4.56992... | 0 0.142401 1 0.142401 2 2.447367 3... | 0 0.055931 1 0.055931 2 -0.442120 3... | 0 0.071911 1 0.071911 2 0.010653 3... | 0 0.226387 1 0.226387 2 -1.978886 3... |
| 12 | 0 0.841063 1 0.841063 2 -7.98006... | 0 -0.711477 1 -0.711477 2 -10.17192... | 0 -0.070385 1 -0.070385 2 3.868311 3... | 0 0.314278 1 0.314278 2 1.483499 3... | 0 0.055931 1 0.055931 2 -0.559308 3... | 0 0.926854 1 0.926854 2 -1.297062 3... |
| 21 | 0 0.648833 1 0.648833 2 0.076985 3... | 0 -0.996722 1 -0.996722 2 -0.897264 3... | 0 -0.644136 1 -0.644136 2 0.970515 3... | 0 -0.101208 1 -0.101208 2 -0.407496 3... | 0 0.055931 1 0.055931 2 -0.157139 3... | 0 -0.031960 1 -0.031960 2 -0.343575 3... |
| 35 | 0 1.102297 1 1.102297 2 0.73238... | 0 -1.790773 1 -1.790773 2 0.661191 3... | 0 0.001413 1 0.001413 2 -1.57956... | 0 0.258347 1 0.258347 2 -0.127842 3... | 0 -0.165129 1 -0.165129 2 -0.16779... | 0 0.516694 1 0.516694 2 -0.58860... |


In [10]:
t = TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False)
Xt = t.fit_transform(X_train)
Xt.head()

Feature Extraction: 100%|██████████| 5/5 [00:32<00:00,  6.57s/it]


| | dim_0__variance_larger_than_standard_deviation | dim_0__has_duplicate_max | dim_0__has_duplicate_min | dim_0__has_duplicate | dim_0__sum_values | dim_0__abs_energy | dim_0__mean_abs_change | dim_0__mean_change | dim_0__mean_second_derivative_central | dim_0__median | ... | dim_5__fourier_entropy__bins_2 | dim_5__fourier_entropy__bins_3 | dim_5__fourier_entropy__bins_5 | dim_5__fourier_entropy__bins_10 | dim_5__fourier_entropy__bins_100 | dim_5__permutation_entropy__dimension_3__tau_1 | dim_5__permutation_entropy__dimension_4__tau_1 | dim_5__permutation_entropy__dimension_5__tau_1 | dim_5__permutation_entropy__dimension_6__tau_1 | dim_5__permutation_entropy__dimension_7__tau_1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 0.0 | 1.0 | -19.802918 | 9.735453 | 0.15356 | -0.003656 | -0.000147 | -0.248964 | ... | 0.567657 | 0.866986 | 1.222843 | 1.690099 | 3.119424 | 1.58541 | 2.516382 | 3.382925 | 3.975397 | 4.316511 |
| 1 | 1.0 | 0.0 | 0.0 | 1.0 | 325.639063 | 10701.446629 | 7.666626 | 0.050743 | -0.010312 | 7.955648 | ... | 0.096509 | 0.096509 | 0.192626 | 0.288342 | 1.745525 | 1.586272 | 2.622613 | 3.452515 | 3.979871 | 4.30733 |
| 2 | 1.0 | 0.0 | 0.0 | 1.0 | 405.510867 | 12978.929156 | 8.200125 | -0.142016 | -0.091939 | 8.420331 | ... | 0.165443 | 0.165443 | 0.192626 | 0.288342 | 0.853617 | 1.555516 | 2.517397 | 3.302638 | 3.867741 | 4.202144 |
| 3 | 1.0 | 1.0 | 0.0 | 1.0 | 57.045746 | 172.027276 | 0.807892 | 0.001584 | 0.003131 | 0.4221 | ... | 0.165443 | 0.165443 | 0.165443 | 0.165443 | 1.241657 | 1.494736 | 2.333086 | 3.047524 | 3.577109 | 3.928619 |
| 4 | 1.0 | 1.0 | 0.0 | 1.0 | 517.444434 | 8664.94077 | 3.280944 | 0.14933 | 0.06651 | 2.17525 | ... | 0.320753 | 0.493681 | 0.862575 | 1.428808 | 3.190711 | 1.725114 | 2.887194 | 3.883615 | 4.34765 | 4.463989 |
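
The column names above run from `dim_0__...` to `dim_5__...`, which suggests that features are extracted separately for each dimension and then placed side by side in one row per instance. The following pure-Python sketch mimics that naming scheme under this assumption; `extract_per_dimension`, `feature_funcs` and `instance` are hypothetical names for illustration and are not part of sktime or tsfresh.

```python
def extract_per_dimension(instance, feature_funcs):
    """Flatten one multivariate instance into a single feature row.

    `instance` maps a dimension name (e.g. "dim_0") to its series;
    `feature_funcs` maps a feature name to a function of one series.
    Hypothetical helper that only mimics the column naming shown above.
    """
    row = {}
    for dim_name, series in instance.items():
        for feat_name, func in feature_funcs.items():
            # Column name follows the "<dimension>__<feature>" pattern.
            row[f"{dim_name}__{feat_name}"] = func(series)
    return row


feature_funcs = {
    "sum_values": sum,
    "abs_energy": lambda s: sum(x * x for x in s),
}
# One instance with two dimensions, each a short toy series.
instance = {"dim_0": [1.0, 2.0, 3.0], "dim_1": [0.5, -0.5, 1.0]}
row = extract_per_dimension(instance, feature_funcs)
print(row)
```

With this layout, the number of columns grows with the number of dimensions times the number of features per dimension, so a six-dimensional data set yields a much wider feature table than a univariate one.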
