# Feature extraction with tsfresh transformer

In this tutorial, we show how to use sktime with [tsfresh](https://tsfresh.readthedocs.io) to extract features from time series, so that any scikit-learn estimator can then be applied to the resulting tabular data.

## Preliminaries
Install tsfresh if you haven't already. To do so, uncomment and run the cell below:

In [1]:
# !pip install --upgrade tsfresh

In [2]:
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

from sktime.datasets import load_arrow_head, load_basic_motions
from sktime.transformers.panel.tsfresh import TSFreshFeatureExtractor

## Univariate time series classification data

For more details on the data set, see the [univariate time series classification notebook](https://github.com/alan-turing-institute/sktime/blob/master/examples/02_classification_univariate.ipynb).

In [3]:
X, y = load_arrow_head(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

(158, 1) (158,) (53, 1) (53,)
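
The shapes printed above are consistent with scikit-learn's default behaviour of holding out 25% of the instances when `train_test_split` is called without `test_size` or `train_size`. The following is a minimal sketch of that arithmetic; `default_split_sizes` is a hypothetical helper written for illustration and is not part of scikit-learn or sktime.

```python
import math


def default_split_sizes(n_samples, test_size=0.25):
    """Return (n_train, n_test) for a fractional test size.

    The test set is rounded up and the training set takes the rest.
    """
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


# 158 training + 53 test instances = 211 instances in total
arrow_head_sizes = default_split_sizes(211)
print(arrow_head_sizes)
```

Because no `random_state` is passed in the cell above, the instances that end up in each part (and therefore the scores reported later) will differ from run to run.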


In [4]:
X_train.head()

Unnamed: 0,dim_0
13,0 -2.1395 1 -2.1189 2 -2.1044 3 ...
148,0 -1.7022 1 -1.6888 2 -1.6789 3 ...
24,0 -2.1448 1 -2.1654 2 -2.1562 3 ...
146,0 -1.5894 1 -1.6010 2 -1.5789 3 ...
6,0 -1.8538 1 -1.8492 2 -1.8341 3 ...


In [5]:
# multi-class classification task with three class labels
np.unique(y_train)

array(['0', '1', '2'], dtype=object)

## Using tsfresh to extract features

In [6]:
# extract tsfresh's "efficient" feature set from every series
t = TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False)
Xt = t.fit_transform(X_train)
Xt.head()

  "tsfresh requires a unique index, but found "
Feature Extraction: 100%|██████████| 5/5 [00:12<00:00,  2.43s/it]


Unnamed: 0,dim_0__variance_larger_than_standard_deviation,dim_0__has_duplicate_max,dim_0__has_duplicate_min,dim_0__has_duplicate,dim_0__sum_values,dim_0__abs_energy,dim_0__mean_abs_change,dim_0__mean_change,dim_0__mean_second_derivative_central,dim_0__median,...,dim_0__fourier_entropy__bins_2,dim_0__fourier_entropy__bins_3,dim_0__fourier_entropy__bins_5,dim_0__fourier_entropy__bins_10,dim_0__fourier_entropy__bins_100,dim_0__permutation_entropy__dimension_3__tau_1,dim_0__permutation_entropy__dimension_4__tau_1,dim_0__permutation_entropy__dimension_5__tau_1,dim_0__permutation_entropy__dimension_6__tau_1,dim_0__permutation_entropy__dimension_7__tau_1
0,0.0,1.0,0.0,1.0,-8e-06,249.999228,0.331796,0.004104,-4.3e-05,0.15336,...,0.08151,0.127671,0.173767,0.265764,1.250099,1.554067,2.472764,3.264834,3.89171,4.357888
1,0.0,1.0,0.0,1.0,-0.000207,249.99917,0.315776,0.005196,-9.1e-05,0.17458,...,0.08151,0.092513,0.138673,0.219798,1.19182,1.422543,2.147471,2.732982,3.166641,3.509018
2,0.0,0.0,0.0,1.0,-0.000367,249.999548,0.36715,0.005578,0.00023,0.06397,...,0.08151,0.08151,0.092513,0.173767,1.176942,1.4866,2.270286,3.003073,3.581199,4.017706
3,0.0,0.0,0.0,1.0,0.000135,250.000423,0.305118,0.006031,-0.000189,0.32476,...,0.046288,0.046288,0.092513,0.173767,0.966537,1.503879,2.320062,2.957126,3.471799,3.869972
4,0.0,0.0,0.0,1.0,-0.000287,250.000456,0.337567,0.004818,-0.000133,-0.082664,...,0.08151,0.08151,0.08151,0.204643,1.26752,1.508455,2.337061,3.055206,3.629753,4.083496
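
Each output column is named `<dimension>__<feature>`, and each row holds the feature values for one time series. To give a feel for what a few of these columns contain, here is a minimal pure-Python sketch of four simple features, written from their usual definitions. These are hand-rolled illustrations under that assumption, not tsfresh's own implementation, and the toy series is made up rather than taken from the data set.

```python
def sum_values(x):
    # Sum of all observations.
    return sum(x)


def abs_energy(x):
    # Sum of squared observations.
    return sum(v * v for v in x)


def mean_abs_change(x):
    # Average absolute difference between consecutive observations.
    return sum(abs(b - a) for a, b in zip(x, x[1:])) / (len(x) - 1)


def mean_change(x):
    # Average signed difference between consecutive observations,
    # which telescopes to (last - first) / (n - 1).
    return (x[-1] - x[0]) / (len(x) - 1)


def extract_features(series, prefix="dim_0"):
    """Return a dict keyed like the columns above: <dimension>__<feature>."""
    calculators = [sum_values, abs_energy, mean_abs_change, mean_change]
    return {f"{prefix}__{f.__name__}": f(series) for f in calculators}


toy_series = [1.0, 3.0, 2.0, 5.0]  # made-up values for illustration
features = extract_features(toy_series)
for name, value in features.items():
    print(name, value)
```

The real transformer computes many more features per series; the set it computes is controlled by the `default_fc_parameters` argument used above.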


## Using tsfresh with sktime

In [7]:
classifier = make_pipeline(
    TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False),
    RandomForestClassifier(),
)
classifier.fit(X_train, y_train)
classifier.score(X_test, y_test)

  "tsfresh requires a unique index, but found "
Feature Extraction: 100%|██████████| 5/5 [00:10<00:00,  2.18s/it]
  "tsfresh requires a unique index, but found "
Feature Extraction: 100%|██████████| 5/5 [00:03<00:00,  1.40it/s]


0.9056603773584906

## Multivariate time series classification data

In [8]:
X, y = load_basic_motions(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

(60, 6) (60,) (20, 6) (20,)


In [9]:
# multivariate input data: one column per dimension
X_train.head()

Unnamed: 0,dim_0,dim_1,dim_2,dim_3,dim_4,dim_5
14,0 -0.947424 1 -0.947424 2 14.53912...,0 0.572681 1 0.572681 2 -10.32130...,0 -0.529822 1 -0.529822 2 -4.144042 3...,0 -0.098545 1 -0.098545 2 2.138688 3...,0 0.596595 1 0.596595 2 -1.259775 3...,0 0.772378 1 0.772378 2 7.21774...
7,0 -0.352746 1 -0.352746 2 -1.354561 3...,0 0.316845 1 0.316845 2 0.490525 3...,0 -0.473779 1 -0.473779 2 1.454261 3...,0 -0.327595 1 -0.327595 2 -0.269001 3...,0 0.106535 1 0.106535 2 0.021307 3...,0 0.197090 1 0.197090 2 0.460763 3...
26,0 -0.098166 1 -0.098166 2 -0.665304 3...,0 -0.117578 1 -0.117578 2 -1.194660 3...,0 -0.401143 1 -0.401143 2 1.228442 3...,0 -0.061258 1 -0.061258 2 -0.567298 3...,0 0.090555 1 0.090555 2 0.029297 3...,0 0.018644 1 0.018644 2 -0.005327 3...
20,0 -0.294498 1 -0.294498 2 -0.050044 3...,0 0.540218 1 0.540218 2 -0.515245 3...,0 0.218114 1 0.218114 2 -0.301108 3...,0 -0.045277 1 -0.045277 2 0.103872 3...,0 -0.002663 1 -0.002663 2 -0.183773 3...,0 0.031960 1 0.031960 2 0.037287 3...
12,0 2.221946 1 2.221946 2 -7.70417...,0 -0.783638 1 -0.783638 2 -4.56992...,0 0.142401 1 0.142401 2 2.447367 3...,0 0.055931 1 0.055931 2 -0.442120 3...,0 0.071911 1 0.071911 2 0.010653 3...,0 0.226387 1 0.226387 2 -1.978886 3...


In [10]:
t = TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False)
Xt = t.fit_transform(X_train)
Xt.head()

  "tsfresh requires a unique index, but found "
Feature Extraction: 100%|██████████| 5/5 [00:23<00:00,  4.77s/it]


Unnamed: 0,dim_0__variance_larger_than_standard_deviation,dim_0__has_duplicate_max,dim_0__has_duplicate_min,dim_0__has_duplicate,dim_0__sum_values,dim_0__abs_energy,dim_0__mean_abs_change,dim_0__mean_change,dim_0__mean_second_derivative_central,dim_0__median,...,dim_5__fourier_entropy__bins_2,dim_5__fourier_entropy__bins_3,dim_5__fourier_entropy__bins_5,dim_5__fourier_entropy__bins_10,dim_5__fourier_entropy__bins_100,dim_5__permutation_entropy__dimension_3__tau_1,dim_5__permutation_entropy__dimension_4__tau_1,dim_5__permutation_entropy__dimension_5__tau_1,dim_5__permutation_entropy__dimension_6__tau_1,dim_5__permutation_entropy__dimension_7__tau_1
0,1.0,0.0,0.0,1.0,640.744882,13000.226236,6.736276,0.110064,0.0,10.510441,...,0.096509,0.096509,0.192626,0.288342,1.408967,1.620571,2.642871,3.39133,3.96916,4.281449
1,0.0,0.0,0.0,1.0,-17.42876,7.940863,0.177152,0.002326,-0.000244,-0.152038,...,0.223718,0.26116,0.26116,0.424177,1.889808,1.556425,2.42499,3.29674,3.888758,4.230903
2,1.0,0.0,0.0,1.0,46.178983,175.315925,1.085591,0.00809,0.003293,0.367424,...,0.096509,0.096509,0.192626,0.288342,1.225333,1.429212,2.268585,2.989103,3.56596,3.952548
3,0.0,0.0,0.0,1.0,33.334188,110.735119,0.86722,0.000639,0.001751,0.164096,...,0.165443,0.192626,0.192626,0.288342,1.14048,1.490159,2.315444,3.006914,3.540782,3.984732
4,1.0,0.0,0.0,1.0,325.639063,10701.446629,7.666626,0.050743,-0.010312,7.955648,...,0.096509,0.096509,0.192626,0.288342,1.745525,1.586272,2.622613,3.452515,3.979871,4.30733
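
The column names in this output run from `dim_0__...` to `dim_5__...`, which suggests that features are computed for each dimension separately and then placed side by side in one row per instance. A minimal sketch of that layout follows; `extract_multivariate` and the toy instance are hypothetical, and the four calculators are simple stand-ins rather than tsfresh's feature set.

```python
def extract_multivariate(instance, calculators):
    """Compute every feature for every dimension of one instance.

    instance: dict mapping a dimension name to a list of floats.
    calculators: dict mapping a feature name to a function of a list.
    Returns one flat row keyed by "<dimension>__<feature>".
    """
    row = {}
    for dim_name, series in instance.items():
        for feat_name, func in calculators.items():
            row[f"{dim_name}__{feat_name}"] = func(series)
    return row


# Stand-in feature calculators (illustrative only).
calculators = {"sum_values": sum, "maximum": max, "minimum": min, "length": len}

# A made-up instance with two dimensions of three observations each.
toy_instance = {"dim_0": [0.5, -1.0, 2.0], "dim_1": [3.0, 3.0, 4.0]}

row = extract_multivariate(toy_instance, calculators)
print(len(row), "columns:", list(row))
```

Under this layout the number of output columns grows as (number of dimensions) × (number of features per series), so multivariate data with a large feature set can produce a very wide table.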


## Using tsfresh for forecasting
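You can also use tsfresh for univariate forecasting. Here the forecasting task is reduced to time series regression: the series is cut into sliding windows of 12 observations, tsfresh features are extracted from each window, and a regressor learns to predict the value that follows the window. To find out more about forecasting, check out our forecasting tutorial notebook.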

In [11]:
from sklearn.ensemble import RandomForestRegressor

from sktime.datasets import load_airline
from sktime.forecasting.base import ForecastingHorizon
from sktime.forecasting.compose import ReducedTimeSeriesRegressionForecaster
from sktime.forecasting.model_selection import temporal_train_test_split

y = load_airline()
y_train, y_test = temporal_train_test_split(y)

regressor = make_pipeline(
    TSFreshFeatureExtractor(show_warnings=False, disable_progressbar=True),
    RandomForestRegressor(),
)
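# reduce forecasting to regression on sliding windows of 12 observations
forecaster = ReducedTimeSeriesRegressionForecaster(regressor, window_length=12)
forecaster.fit(y_train)

# forecast the absolute time points covered by the test set
fh = ForecastingHorizon(y_test.index, is_relative=False)
y_pred = forecaster.predict(fh)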
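
To make the reduction idea more concrete, the sketch below shows how a series can be cut into fixed-length windows with the next value as the regression target, and how multi-step forecasts can then be produced recursively by feeding predictions back in. This is an illustration under stated assumptions, not sktime's implementation: `sliding_windows`, `recursive_forecast` and `window_mean` are hypothetical helpers, the window mean stands in for the fitted tsfresh plus random forest pipeline, and the toy series is made up.

```python
def sliding_windows(y, window_length):
    """Split a series into (window, next value) training pairs."""
    windows, targets = [], []
    for start in range(len(y) - window_length):
        windows.append(y[start:start + window_length])
        targets.append(y[start + window_length])
    return windows, targets


def recursive_forecast(y, window_length, predict_one, steps):
    """Forecast `steps` values, appending each prediction to the history."""
    history = list(y)
    preds = []
    for _ in range(steps):
        window = history[-window_length:]
        next_value = predict_one(window)
        preds.append(next_value)
        history.append(next_value)
    return preds


def window_mean(window):
    # Stand-in for a fitted regressor: predict the mean of the window.
    return sum(window) / len(window)


y_toy = [10, 12, 15, 11, 13, 17, 16]  # made-up series
windows, targets = sliding_windows(y_toy, window_length=3)
preds = recursive_forecast(y_toy, window_length=3, predict_one=window_mean, steps=2)

print(windows)
print(targets)
print(preds)
```

In the notebook cell above, each window would be passed through `TSFreshFeatureExtractor` before reaching the regressor, so the regressor sees tsfresh features of the window rather than the raw lagged values.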