# Feature extraction with tsfresh transformer

In this tutorial, we show how you can use sktime with [tsfresh](https://tsfresh.readthedocs.io) to extract features from time series. Feature extraction turns each series into a fixed-length feature vector, so that we can then use any scikit-learn estimator on the result.

## Preliminaries
If you haven't installed tsfresh yet, uncomment and run the cell below:

In [1]:
# !pip install --upgrade tsfresh

In [2]:
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

from sktime.datasets import load_arrow_head, load_basic_motions
from sktime.transformers.panel.tsfresh import TSFreshFeatureExtractor

## Univariate time series classification data

For more details on the data set, see the [univariate time series classification notebook](https://github.com/alan-turing-institute/sktime/blob/master/examples/02_classification_univariate.ipynb).

In [3]:
X, y = load_arrow_head(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

(158, 1) (158,) (53, 1) (53,)
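The shapes above follow from the defaults of `train_test_split`: when neither `test_size` nor `train_size` is given, scikit-learn holds out 25% of the instances, rounding the test count up and giving the rest to training. The sketch below reproduces that arithmetic; `split_sizes` is a hypothetical helper, not a scikit-learn function. Because no `random_state` is passed, which rows land in each split (and therefore the accuracy reported later) changes from run to run.

```python
import math


def split_sizes(n_samples, test_size=0.25):
    """Return (n_train, n_test) for a fractional test_size, following
    scikit-learn's rule: round the test count up, give the rest to training."""
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


print(split_sizes(211))  # ArrowHead, 211 instances -> (158, 53)
print(split_sizes(80))   # BasicMotions, 80 instances -> (60, 20)
```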


In [4]:
X_train.head()

| | dim_0 |
|---|---|
| 140 | 0 -1.6633 1 -1.6207 2 -1.5787 3 ... |
| 37 | 0 -2.0220 1 -2.0166 2 -2.0074 3 ... |
| 62 | 0 -1.9471 1 -1.9405 2 -1.9224 3 ... |
| 8 | 0 -1.7170 1 -1.7281 2 -1.6833 3 ... |
| 60 | 0 -1.9674 1 -1.9672 2 -1.9512 3 ... |
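In this nested layout, each row is one instance, each column one dimension, and each cell holds a whole `pd.Series`; the table shows only the first few entries of each series. A minimal sketch of the same layout, built by hand from made-up data (the index labels merely echo the table and carry no meaning):

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): three univariate series of length 5.
rng = np.random.default_rng(0)

# An object array lets each element be an entire pd.Series.
cells = np.empty(3, dtype=object)
for i in range(3):
    cells[i] = pd.Series(rng.standard_normal(5))

# Nested layout: one row per instance, one column per dimension.
X_nested = pd.DataFrame({"dim_0": cells}, index=[140, 37, 62])

print(X_nested.shape)                      # (3, 1)
print(type(X_nested.iloc[0, 0]).__name__)  # Series
print(len(X_nested.loc[37, "dim_0"]))      # 5
```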


In [5]:
# multiclass classification task (three classes)
np.unique(y_train)

array(['0', '1', '2'], dtype=object)

## Using tsfresh to extract features

In [6]:
# extract tsfresh's "efficient" feature set from each training series
t = TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False)
Xt = t.fit_transform(X_train)
Xt.head()

  "tsfresh requires a unique index, but found "
Feature Extraction: 100%|██████████| 5/5 [00:10<00:00,  2.05s/it]


| | dim_0__variance_larger_than_standard_deviation | dim_0__has_duplicate_max | dim_0__has_duplicate_min | dim_0__has_duplicate | dim_0__sum_values | dim_0__abs_energy | dim_0__mean_abs_change | dim_0__mean_change | dim_0__mean_second_derivative_central | dim_0__median | ... | dim_0__fourier_entropy__bins_2 | dim_0__fourier_entropy__bins_3 | dim_0__fourier_entropy__bins_5 | dim_0__fourier_entropy__bins_10 | dim_0__fourier_entropy__bins_100 | dim_0__permutation_entropy__dimension_3__tau_1 | dim_0__permutation_entropy__dimension_4__tau_1 | dim_0__permutation_entropy__dimension_5__tau_1 | dim_0__permutation_entropy__dimension_6__tau_1 | dim_0__permutation_entropy__dimension_7__tau_1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 0.0 | 1.0 | -1.1e-05 | 249.998751 | 0.302434 | 0.005708 | -0.000204 | 0.18794 | ... | 0.046288 | 0.092513 | 0.092513 | 0.138673 | 0.91581 | 1.550064 | 2.430002 | 3.216936 | 3.762036 | 4.147996 |
| 1 | 0.0 | 0.0 | 0.0 | 1.0 | -4.3e-05 | 250.000079 | 0.355106 | 0.003685 | -5.5e-05 | -0.054243 | ... | 0.08151 | 0.08151 | 0.092513 | 0.173767 | 1.245888 | 1.478945 | 2.291585 | 2.962568 | 3.538545 | 4.000477 |
| 2 | 0.0 | 0.0 | 0.0 | 1.0 | 0.000305 | 249.999405 | 0.340806 | 0.005083 | -1.7e-05 | -0.20715 | ... | 0.08151 | 0.08151 | 0.127671 | 0.208796 | 1.236015 | 1.44211 | 2.19867 | 2.839174 | 3.320156 | 3.675849 |
| 3 | 0.0 | 0.0 | 0.0 | 1.0 | -4.9e-05 | 250.000242 | 0.305713 | 0.004894 | -0.000134 | 0.0971 | ... | 0.08151 | 0.092513 | 0.092513 | 0.173767 | 1.159577 | 1.450579 | 2.154047 | 2.717186 | 3.115631 | 3.454651 |
| 4 | 0.0 | 1.0 | 0.0 | 1.0 | 0.000308 | 249.998592 | 0.350753 | 0.005208 | -0.000144 | -0.012606 | ... | 0.08151 | 0.08151 | 0.08151 | 0.162765 | 1.272323 | 1.569282 | 2.442663 | 3.225317 | 3.860587 | 4.337462 |
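Each column of `Xt` is one feature computed on one series, named `<column>__<feature>`. As a rough guide to what the first few columns mean, here are hand-written NumPy versions of some of them, following our reading of the tsfresh feature-calculator definitions. tsfresh stores boolean features as 0.0/1.0, which the sketch mimics. The toy series is illustrative only; these helpers are not tsfresh's own code.

```python
import numpy as np


def sum_values(x):
    return float(np.sum(x))


def abs_energy(x):
    # sum of squared values
    return float(np.dot(x, x))


def mean_abs_change(x):
    # mean absolute difference between consecutive values
    return float(np.mean(np.abs(np.diff(x))))


def mean_change(x):
    # mean (signed) difference between consecutive values
    return float(np.mean(np.diff(x)))


def has_duplicate_max(x):
    # 1.0 if the maximum occurs at least twice
    return float(np.sum(x == np.max(x)) >= 2)


def variance_larger_than_standard_deviation(x):
    # var > sqrt(var) holds exactly when var > 1
    var = np.var(x)
    return float(var > np.sqrt(var))


x = np.array([1.0, 3.0, 2.0, 3.0, 0.0])
funcs = (sum_values, abs_energy, mean_abs_change, mean_change,
         has_duplicate_max, variance_larger_than_standard_deviation)
features = {f"dim_0__{f.__name__}": f(x) for f in funcs}
for name, value in features.items():
    print(name, value)
```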


## Using tsfresh with sktime

In [7]:
classifier = make_pipeline(
    TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False),
    RandomForestClassifier(),
)
classifier.fit(X_train, y_train)
classifier.score(X_test, y_test)

  "tsfresh requires a unique index, but found "
Feature Extraction: 100%|██████████| 5/5 [00:10<00:00,  2.04s/it]
  "tsfresh requires a unique index, but found "
Feature Extraction: 100%|██████████| 5/5 [00:03<00:00,  1.48it/s]


0.8301886792452831
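Inside the pipeline, `fit` extracts features from the training series and then fits the random forest, while `score` extracts features from the test series and classifies them; presumably this is why two progress bars appear above. The sketch below shows the same two-step structure with plain scikit-learn only. It swaps in a hypothetical three-feature `summary_features` function (wrapped in `FunctionTransformer`) for tsfresh and uses synthetic data stored as a 2D array, one series per row, so it runs without sktime or tsfresh.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer


def summary_features(X):
    """Hypothetical stand-in for TSFreshFeatureExtractor: map each series
    (one row of a 2D array) to mean, standard deviation, mean abs change."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([
        X.mean(axis=1),
        X.std(axis=1),
        np.abs(np.diff(X, axis=1)).mean(axis=1),
    ])


# Synthetic data: class 1 series are noisier than class 0 series.
rng = np.random.default_rng(42)
n, length = 200, 50
y = rng.integers(0, 2, size=n)
X = rng.standard_normal((n, length)) * np.where(y == 1, 2.0, 0.5)[:, None]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = make_pipeline(
    FunctionTransformer(summary_features),
    RandomForestClassifier(random_state=0),
)
clf.fit(X_train, y_train)  # features from training series, then fit forest
score = clf.score(X_test, y_test)  # same features from test series, then predict
print(score)
```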

## Multivariate time series classification data

In [8]:
X, y = load_basic_motions(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

(60, 6) (60,) (20, 6) (20,)


In [9]:
# multivariate input data: one column per dimension (six here)
X_train.head()

| | dim_0 | dim_1 | dim_2 | dim_3 | dim_4 | dim_5 |
|---|---|---|---|---|---|---|
| 3 | 0 0.289855 1 0.289855 2 -0.669185 3... | 0 0.284130 1 0.284130 2 -0.210466 3... | 0 0.213680 1 0.213680 2 0.252267 3... | 0 -0.314278 1 -0.314278 2 0.018644 3... | 0 0.074574 1 0.074574 2 0.007990 3... | 0 -0.079901 1 -0.079901 2 0.237040 3... |
| 13 | 0 2.580342 1 2.580342 2 -7.26891... | 0 -0.850954 1 -0.850954 2 -6.06223... | 0 -0.150030 1 -0.150030 2 0.96421... | 0 -0.005327 1 -0.005327 2 0.002663 3... | 0 0.050604 1 0.050604 2 -0.364882 3... | 0 0.311615 1 0.311615 2 -0.772378 3... |
| 4 | 0 0.354481 1 0.354481 2 0.449142 3... | 0 -0.567671 1 -0.567671 2 -1.899854 3... | 0 -0.084270 1 -0.084270 2 0.913056 3... | 0 -0.223723 1 -0.223723 2 0.692477 3... | 0 -0.247694 1 -0.247694 2 0.149149 3... | 0 0.050604 1 0.050604 2 0.849616 3... |
| 6 | 0 1.236069 1 1.236069 2 0.118106 3... | 0 -0.569532 1 -0.569532 2 0.264725 3... | 0 1.536733 1 1.536733 2 0.712643 3... | 0 0.143822 1 0.143822 2 -2.018837 3... | 0 0.061258 1 0.061258 2 -0.111862 3... | 0 0.905547 1 0.905547 2 -0.990775 3... |
| 9 | 0 -0.407421 1 -0.407421 2 2.355158 3... | 0 1.413374 1 1.413374 2 -3.928032 3... | 0 0.092782 1 0.092782 2 -0.211622 3... | 0 -0.066584 1 -0.066584 2 -3.630177 3... | 0 0.223723 1 0.223723 2 -0.026634 3... | 0 0.135832 1 0.135832 2 -1.946925 3... |
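The multivariate data uses the same nested layout, with one column per dimension. Many tools expect a 3D array of shape `(n_instances, n_dimensions, n_timepoints)` instead. sktime ships its own converters for this; the hand-rolled sketch below only illustrates the idea and assumes all series have equal length (`nested_to_3d` and the toy data are hypothetical).

```python
import numpy as np
import pandas as pd


def nested_to_3d(X_nested):
    """Stack a nested DataFrame whose cells are equal-length pd.Series into
    an array of shape (n_instances, n_dimensions, n_timepoints)."""
    return np.stack([
        np.stack([np.asarray(cell, dtype=float) for cell in row])
        for row in X_nested.itertuples(index=False)
    ])


# Toy nested frame (hypothetical): 2 instances, 3 dimensions, 4 time points.
rng = np.random.default_rng(1)
data = {}
for d in range(3):
    col = np.empty(2, dtype=object)
    for i in range(2):
        col[i] = pd.Series(rng.standard_normal(4))
    data[f"dim_{d}"] = col
X_nested = pd.DataFrame(data)

X_3d = nested_to_3d(X_nested)
print(X_3d.shape)  # (2, 3, 4)
```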


In [10]:
t = TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False)
Xt = t.fit_transform(X_train)
Xt.head()

  "tsfresh requires a unique index, but found "
Feature Extraction: 100%|██████████| 5/5 [00:17<00:00,  3.56s/it]


| | dim_0__variance_larger_than_standard_deviation | dim_0__has_duplicate_max | dim_0__has_duplicate_min | dim_0__has_duplicate | dim_0__sum_values | dim_0__abs_energy | dim_0__mean_abs_change | dim_0__mean_change | dim_0__mean_second_derivative_central | dim_0__median | ... | dim_5__fourier_entropy__bins_2 | dim_5__fourier_entropy__bins_3 | dim_5__fourier_entropy__bins_5 | dim_5__fourier_entropy__bins_10 | dim_5__fourier_entropy__bins_100 | dim_5__permutation_entropy__dimension_3__tau_1 | dim_5__permutation_entropy__dimension_4__tau_1 | dim_5__permutation_entropy__dimension_5__tau_1 | dim_5__permutation_entropy__dimension_6__tau_1 | dim_5__permutation_entropy__dimension_7__tau_1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 0.0 | 1.0 | -13.702903 | 6.150112 | 0.197931 | -0.005907 | -0.001008 | -0.139846 | ... | 0.223718 | 0.329286 | 0.481199 | 0.810494 | 2.345604 | 1.687964 | 2.744059 | 3.583255 | 4.040172 | 4.296197 |
| 1 | 1.0 | 0.0 | 0.0 | 1.0 | 321.402722 | 10764.169856 | 6.780527 | 0.136657 | 0.091813 | 7.22729 | ... | 0.165443 | 0.165443 | 0.192626 | 0.192626 | 1.339437 | 1.556814 | 2.559083 | 3.37957 | 3.95365 | 4.375502 |
| 2 | 0.0 | 0.0 | 1.0 | 1.0 | -10.372383 | 8.82363 | 0.24117 | -0.004874 | 0.000593 | -0.160754 | ... | 0.096509 | 0.26116 | 0.288342 | 0.288342 | 1.646945 | 1.552473 | 2.522931 | 3.325329 | 3.896197 | 4.285064 |
| 3 | 0.0 | 1.0 | 0.0 | 1.0 | 2.755709 | 11.439311 | 0.21202 | -0.015356 | 0.000324 | -0.100108 | ... | 0.223718 | 0.26116 | 0.26116 | 0.26116 | 1.544731 | 1.692198 | 2.744708 | 3.654497 | 4.190708 | 4.419746 |
| 4 | 0.0 | 0.0 | 0.0 | 1.0 | -9.960058 | 12.860153 | 0.227847 | 0.001722 | 8e-06 | -0.172619 | ... | 0.434431 | 0.700274 | 1.038413 | 1.405987 | 2.477663 | 1.63652 | 2.641136 | 3.39252 | 3.925077 | 4.281449 |
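For multivariate input, the extractor computes the same features on each dimension separately and concatenates the results, which is why the columns run from `dim_0__...` to `dim_5__...`; with the same settings, the table has six times as many columns as in the univariate case. A sketch of that scheme with three stand-in features (`per_dimension_features` is a hypothetical helper, not part of sktime or tsfresh):

```python
import numpy as np
import pandas as pd


def per_dimension_features(X_3d, feature_funcs):
    """Compute every feature on every dimension and name the columns
    '<dimension>__<feature>', mirroring the tsfresh column names above."""
    n_instances, n_dims, _ = X_3d.shape
    columns = {}
    for d in range(n_dims):
        for name, func in feature_funcs.items():
            columns[f"dim_{d}__{name}"] = [
                float(func(X_3d[i, d])) for i in range(n_instances)
            ]
    return pd.DataFrame(columns)


funcs = {
    "sum_values": np.sum,
    "abs_energy": lambda x: np.dot(x, x),
    "median": np.median,
}
# Toy input (hypothetical): 2 instances, 6 dimensions, 5 time points.
X_3d = np.arange(2 * 6 * 5, dtype=float).reshape(2, 6, 5)
Xt = per_dimension_features(X_3d, funcs)
print(Xt.shape)             # (2, 18): 6 dimensions x 3 features
print(list(Xt.columns[:3]))
```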


## Using tsfresh for forecasting
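You can also use tsfresh features for univariate forecasting by reducing the forecasting task to a time series regression task: the forecaster slides a window over the training series, and the pipeline learns to predict the next value from the tsfresh features of each window. To find out more about forecasting, check out our forecasting tutorial notebook.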

In [11]:
from sklearn.ensemble import RandomForestRegressor

from sktime.datasets import load_airline
from sktime.forecasting.base import ForecastingHorizon
from sktime.forecasting.compose import ReducedTimeSeriesRegressionForecaster
from sktime.forecasting.model_selection import temporal_train_test_split

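# monthly airline passenger numbers; hold out the last part of the series
y = load_airline()
y_train, y_test = temporal_train_test_split(y)

# time series regressor: tsfresh features -> random forest
regressor = make_pipeline(
    TSFreshFeatureExtractor(show_warnings=False, disable_progressbar=True),
    RandomForestRegressor(),
)
# reduce forecasting to regression on windows of the last 12 observations
forecaster = ReducedTimeSeriesRegressionForecaster(regressor, window_length=12)
forecaster.fit(y_train)

# forecast the time points of the test set (absolute horizon)
fh = ForecastingHorizon(y_test.index, is_relative=False)
y_pred = forecaster.predict(fh)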
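The reduction step turns the single airline series into a regression data set: each training instance is a window of 12 consecutive monthly values, and the target is the value that follows it. A minimal sketch of that windowing on a toy series (`make_windows` is hypothetical; the actual forecaster also decides how multi-step horizons are produced, which this sketch leaves out):

```python
import numpy as np


def make_windows(y, window_length):
    """Turn a 1D series into a supervised data set: each row of X is a window
    of `window_length` consecutive values, and the target is the value that
    immediately follows that window."""
    y = np.asarray(y, dtype=float)
    n_windows = len(y) - window_length
    X = np.stack([y[i:i + window_length] for i in range(n_windows)])
    target = y[window_length:]
    return X, target


y = np.arange(1.0, 11.0)  # toy series 1, 2, ..., 10
X, target = make_windows(y, window_length=3)
print(X.shape)          # (7, 3)
print(X[0], target[0])  # [1. 2. 3.] 4.0
```

In the notebook, each such window would be passed to the pipeline as one time series instance, so the tsfresh features are computed per window before the random forest sees them.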