# Feature extraction with tsfresh transformer

In this tutorial, we show how to use sktime with [tsfresh](https://tsfresh.readthedocs.io) to extract features from time series, so that any scikit-learn estimator can then be trained on the resulting feature table.

## Preliminaries
This tutorial requires tsfresh. If it is not installed yet, uncomment and run the cell below:

In [1]:
# !pip install --upgrade tsfresh

In [2]:
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

from sktime.datasets import load_arrow_head, load_basic_motions
from sktime.transformations.panel.tsfresh import TSFreshFeatureExtractor

## Univariate time series classification data

For more details on the data set, see the [univariate time series classification notebook](https://github.com/alan-turing-institute/sktime/blob/master/examples/02_classification_univariate.ipynb).

In [3]:
X, y = load_arrow_head(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

(158, 1) (158,) (53, 1) (53,)


In [4]:
X_train.head()

| | dim_0 |
|---|---|
| 60 | 0 -1.9674 1 -1.9672 2 -1.9512 3 ... |
| 94 | 0 -1.8737 1 -1.8547 2 -1.8149 3 ... |
| 154 | 0 -1.6106 1 -1.6097 2 -1.5854 3 ... |
| 86 | 0 -1.8289 1 -1.8160 2 -1.8127 3 ... |
| 40 | 0 -2.0324 1 -2.0386 2 -2.0182 3 ... |


In [5]:
# multiclass classification task with three class labels
np.unique(y_train)

array(['0', '1', '2'], dtype=object)

## Using tsfresh to extract features

In [6]:
# extract the "efficient" tsfresh feature set from each training series
t = TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False)
Xt = t.fit_transform(X_train)
Xt.head()

  "tsfresh requires a unique index, but found "
Feature Extraction: 100%|██████████| 5/5 [00:10<00:00,  2.03s/it]


| | `dim_0__variance_larger_than_standard_deviation` | `dim_0__has_duplicate_max` | `dim_0__has_duplicate_min` | `dim_0__has_duplicate` | `dim_0__sum_values` | `dim_0__abs_energy` | `dim_0__mean_abs_change` | `dim_0__mean_change` | `dim_0__mean_second_derivative_central` | `dim_0__median` | ... | `dim_0__fourier_entropy__bins_2` | `dim_0__fourier_entropy__bins_3` | `dim_0__fourier_entropy__bins_5` | `dim_0__fourier_entropy__bins_10` | `dim_0__fourier_entropy__bins_100` | `dim_0__permutation_entropy__dimension_3__tau_1` | `dim_0__permutation_entropy__dimension_4__tau_1` | `dim_0__permutation_entropy__dimension_5__tau_1` | `dim_0__permutation_entropy__dimension_6__tau_1` | `dim_0__permutation_entropy__dimension_7__tau_1` |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.000308 | 249.998592 | 0.350753 | 0.005208 | -0.000144 | -0.012606 | ... | 0.08151 | 0.08151 | 0.08151 | 0.162765 | 1.272323 | 1.569282 | 2.442663 | 3.225317 | 3.860587 | 4.337462 |
| 1 | 0.0 | 0.0 | 0.0 | 1.0 | -0.000241 | 250.000327 | 0.316645 | 0.005752 | -0.000149 | 0.027237 | ... | 0.08151 | 0.092513 | 0.092513 | 0.173767 | 1.079108 | 1.529463 | 2.374854 | 3.135582 | 3.67616 | 4.032813 |
| 2 | 0.0 | 0.0 | 0.0 | 1.0 | -0.000391 | 250.000382 | 0.306603 | 0.005735 | -0.000162 | 0.2969 | ... | 0.046288 | 0.092513 | 0.092513 | 0.173767 | 1.082391 | 1.5131 | 2.353699 | 3.027695 | 3.479268 | 3.876186 |
| 3 | 0.0 | 0.0 | 0.0 | 1.0 | 5.2e-05 | 249.99962 | 0.342999 | 0.005383 | -0.00027 | 0.013756 | ... | 0.08151 | 0.092513 | 0.092513 | 0.138673 | 1.044844 | 1.470631 | 2.242107 | 2.932506 | 3.485394 | 3.892775 |
| 4 | 0.0 | 0.0 | 0.0 | 1.0 | -0.000114 | 249.999518 | 0.340265 | 0.004327 | -0.000129 | 0.010656 | ... | 0.08151 | 0.08151 | 0.127671 | 0.138673 | 1.269559 | 1.504913 | 2.326319 | 2.975176 | 3.475012 | 3.882785 |
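
To make the column names in the feature table more concrete, here is a minimal pure-Python sketch of how we understand a handful of these features to be defined (`sum_values`, `abs_energy`, `mean_abs_change`, `mean_change` and `median`). The helper `simple_features` and the toy series are hypothetical and only for illustration; they are not part of tsfresh or sktime, and tsfresh computes several hundred further features.

```python
from statistics import median


def simple_features(series, prefix="dim_0"):
    """Compute a few summary features for one series (illustrative only).

    The keys follow the `<column>__<feature>` naming pattern seen in the
    feature table. Requires at least two observations.
    """
    # first differences between consecutive observations
    diffs = [b - a for a, b in zip(series, series[1:])]
    return {
        f"{prefix}__sum_values": sum(series),
        f"{prefix}__abs_energy": sum(v * v for v in series),
        f"{prefix}__mean_abs_change": sum(abs(d) for d in diffs) / len(diffs),
        f"{prefix}__mean_change": sum(diffs) / len(diffs),
        f"{prefix}__median": median(series),
    }


toy_series = [1.0, 3.0, 2.0, 5.0]  # made-up values, not from ArrowHead
features = simple_features(toy_series)
for name, value in features.items():
    print(name, value)
```

Each time series is reduced to one row of such numbers, which is why the transformed data can be passed to any tabular scikit-learn estimator.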


## Using tsfresh with sktime

In [7]:
classifier = make_pipeline(
    TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False),
    RandomForestClassifier(),
)
classifier.fit(X_train, y_train)
classifier.score(X_test, y_test)

  "tsfresh requires a unique index, but found "
Feature Extraction: 100%|██████████| 5/5 [00:10<00:00,  2.03s/it]
  "tsfresh requires a unique index, but found "
Feature Extraction: 100%|██████████| 5/5 [00:03<00:00,  1.45it/s]


0.9622641509433962

## Multivariate time series classification data

In [8]:
X, y = load_basic_motions(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

(60, 6) (60,) (20, 6) (20,)


In [9]:
#  multivariate input data
X_train.head()

| | dim_0 | dim_1 | dim_2 | dim_3 | dim_4 | dim_5 |
|---|---|---|---|---|---|---|
| 32 | 0 -0.179131 1 -0.179131 2 0.461767 3... | 0 -1.108077 1 -1.108077 2 -1.187180 3... | 0 0.012600 1 0.012600 2 2.360390 3... | 0 0.066584 1 0.066584 2 -0.463427 3... | 0 -0.095881 1 -0.095881 2 0.639209 3... | 0 0.396843 1 0.396843 2 -0.383526 3... |
| 18 | 0 0.951708 1 0.951708 2 6.22747... | 0 -1.304853 1 -1.304853 2 -1.22245... | 0 -0.944935 1 -0.944935 2 0.682350 3... | 0 -0.386189 1 -0.386189 2 -0.346238 3... | 0 0.308951 1 0.308951 2 0.298298 3... | 0 0.098545 1 0.098545 2 -1.408924 3... |
| 22 | 0 -0.697643 1 -0.697643 2 -0.199924 3... | 0 -0.561693 1 -0.561693 2 -0.820724 3... | 0 -0.950458 1 -0.950458 2 1.146612 3... | 0 -1.158567 1 -1.158567 2 -0.479407 3... | 0 0.727101 1 0.727101 2 -0.410159 3... | 0 -1.376964 1 -1.376964 2 0.130505 3... |
| 25 | 0 -0.185181 1 -0.185181 2 -1.319727 3... | 0 0.059288 1 0.059288 2 -1.194247 3... | 0 0.250270 1 0.250270 2 0.418052 3... | 0 0.154476 1 0.154476 2 0.047941 3... | 0 0.167792 1 0.167792 2 -0.215733 3... | 0 0.732428 1 0.732428 2 -0.050604 3... |
| 2 | 0 -0.663284 1 -0.663284 2 5.393924 3... | 0 0.273010 1 0.273010 2 -3.079673 3... | 0 -0.160963 1 -0.160963 2 -3.175911 3... | 0 -0.245030 1 -0.245030 2 -6.408074 3... | 0 -0.077238 1 -0.077238 2 0.471417 3... | 0 -0.018644 1 -0.018644 2 -3.592890 3... |


In [10]:
t = TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False)
Xt = t.fit_transform(X_train)
Xt.head()

  "tsfresh requires a unique index, but found "
Feature Extraction: 100%|██████████| 5/5 [00:18<00:00,  3.73s/it]


| | `dim_0__variance_larger_than_standard_deviation` | `dim_0__has_duplicate_max` | `dim_0__has_duplicate_min` | `dim_0__has_duplicate` | `dim_0__sum_values` | `dim_0__abs_energy` | `dim_0__mean_abs_change` | `dim_0__mean_change` | `dim_0__mean_second_derivative_central` | `dim_0__median` | ... | `dim_5__fourier_entropy__bins_2` | `dim_5__fourier_entropy__bins_3` | `dim_5__fourier_entropy__bins_5` | `dim_5__fourier_entropy__bins_10` | `dim_5__fourier_entropy__bins_100` | `dim_5__permutation_entropy__dimension_3__tau_1` | `dim_5__permutation_entropy__dimension_4__tau_1` | `dim_5__permutation_entropy__dimension_5__tau_1` | `dim_5__permutation_entropy__dimension_6__tau_1` | `dim_5__permutation_entropy__dimension_7__tau_1` |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 0.0 | 0.0 | 1.0 | 354.190525 | 5467.09735 | 3.747415 | 0.016783 | 0.0 | 1.083522 | ... | 0.274921 | 0.319026 | 0.776909 | 1.333894 | 2.983513 | 1.695073 | 2.810517 | 3.745986 | 4.228978 | 4.442939 |
| 1 | 1.0 | 0.0 | 0.0 | 1.0 | 292.068012 | 11792.713884 | 8.246383 | -0.139636 | 0.018494 | 6.285126 | ... | 0.096509 | 0.096509 | 0.26116 | 0.26116 | 0.985953 | 1.623656 | 2.64476 | 3.475038 | 4.10673 | 4.395817 |
| 2 | 1.0 | 1.0 | 0.0 | 1.0 | 52.882361 | 185.780037 | 0.974373 | 0.01135 | -0.00048 | 0.280859 | ... | 0.096509 | 0.192626 | 0.192626 | 0.288342 | 0.97249 | 1.567849 | 2.563935 | 3.375331 | 3.896197 | 4.239776 |
| 3 | 1.0 | 0.0 | 0.0 | 1.0 | 54.45523 | 182.497205 | 0.921471 | 0.011501 | 0.006315 | 0.515937 | ... | 0.096509 | 0.192626 | 0.26116 | 0.288342 | 1.148247 | 1.490467 | 2.391433 | 3.187255 | 3.750715 | 4.16307 |
| 4 | 0.0 | 0.0 | 0.0 | 1.0 | 0.819737 | 50.035273 | 0.43011 | 0.00443 | -0.000655 | -0.09129 | ... | 0.223718 | 0.223718 | 0.223718 | 0.615767 | 2.553334 | 1.677763 | 2.710091 | 3.546291 | 4.015786 | 4.355188 |
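
In the multivariate output, the feature columns run from `dim_0__...` through `dim_5__...`, which suggests that features are computed separately for each dimension and then placed side by side in one row per instance. A minimal pure-Python sketch of that idea follows, assuming only two features per dimension; the helper `extract_row` and the toy panel are hypothetical and not part of tsfresh or sktime.

```python
def extract_row(instance):
    """Build one feature row from a multivariate instance (illustrative only).

    `instance` maps a dimension name to its list of observations. Every
    dimension contributes its own `<dimension>__<feature>` columns.
    """
    row = {}
    for dim_name, series in instance.items():
        row[f"{dim_name}__sum_values"] = sum(series)
        row[f"{dim_name}__abs_energy"] = sum(v * v for v in series)
    return row


# two made-up instances with two dimensions each
toy_panel = [
    {"dim_0": [1.0, 2.0, 3.0], "dim_1": [0.0, -1.0, 1.0]},
    {"dim_0": [2.0, 2.0, 2.0], "dim_1": [4.0, 0.0, 0.0]},
]
table = [extract_row(instance) for instance in toy_panel]
for row in table:
    print(row)
```

The number of feature columns therefore grows with the number of dimensions, which is consistent with the longer extraction time reported for the six-dimensional data.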


## Using tsfresh for forecasting
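You can also use tsfresh for univariate forecasting: a reduction forecaster turns the forecasting task into a time series regression task, and the regressor extracts tsfresh features from each window of past values. To find out more about forecasting, check out our forecasting tutorial notebook.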

In [11]:
from sklearn.ensemble import RandomForestRegressor

from sktime.datasets import load_airline
from sktime.forecasting.base import ForecastingHorizon
from sktime.forecasting.compose import ReducedTimeSeriesRegressionForecaster
from sktime.forecasting.model_selection import temporal_train_test_split

y = load_airline()
y_train, y_test = temporal_train_test_split(y)

regressor = make_pipeline(
    TSFreshFeatureExtractor(show_warnings=False, disable_progressbar=True),
    RandomForestRegressor(),
)
forecaster = ReducedTimeSeriesRegressionForecaster(regressor, window_length=12)
forecaster.fit(y_train)

fh = ForecastingHorizon(y_test.index, is_relative=False)
y_pred = forecaster.predict(fh)
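
As we understand it, the reduction step with `window_length=12` slides a window over the training series, so that each window of 12 past values becomes one input series and the value that follows it becomes the regression target. Below is a minimal pure-Python sketch of that sliding-window idea on made-up data with a window of 3. The helper `sliding_windows` is hypothetical and is not sktime's implementation.

```python
def sliding_windows(values, window_length):
    """Split a series into (window, next value) pairs (illustrative only)."""
    windows, targets = [], []
    for start in range(len(values) - window_length):
        # the window of past observations is the regression input
        windows.append(values[start:start + window_length])
        # the observation right after the window is the regression target
        targets.append(values[start + window_length])
    return windows, targets


toy_values = [10, 20, 30, 40, 50, 60]  # made-up series
X_windows, y_targets = sliding_windows(toy_values, window_length=3)
for window, target in zip(X_windows, y_targets):
    print(window, "->", target)
```

In the pipeline above, each such window would be passed through `TSFreshFeatureExtractor` before reaching `RandomForestRegressor`.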