# Feature extraction with tsfresh transformer

In this tutorial, we show how to use sktime with [tsfresh](https://tsfresh.readthedocs.io) to extract features from time series, so that the resulting feature table can be passed to any scikit-learn estimator.

## Preliminaries
If you have not installed tsfresh yet, uncomment and run the cell below:

In [1]:
# !pip install --upgrade tsfresh
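If you are unsure whether tsfresh is already available, a quick check with the standard library's `importlib.util.find_spec` avoids an unnecessary install. This is a small convenience sketch; the helper name `is_installed` is our own and not part of sktime or tsfresh.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if `package` can be imported in the current environment."""
    # find_spec returns None when no top-level module with this name is found
    return importlib.util.find_spec(package) is not None


if not is_installed("tsfresh"):
    print("tsfresh not found: run `pip install --upgrade tsfresh` first")
```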

In [2]:
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

from sktime.datasets import load_arrow_head, load_basic_motions
from sktime.transformations.panel.tsfresh import TSFreshFeatureExtractor

## Univariate time series classification data

For more details on the data set, see the [univariate time series classification notebook](https://github.com/alan-turing-institute/sktime/blob/master/examples/02_classification_univariate.ipynb).

In [3]:
X, y = load_arrow_head(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

(158, 1) (158,) (53, 1) (53,)
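The shapes follow from scikit-learn's defaults: when neither `test_size` nor `train_size` is given, `train_test_split` holds out 25% of the samples and rounds the test count up. The arithmetic below reproduces the split sizes printed in this notebook (211 ArrowHead instances here, 80 BasicMotions instances further down). The helper `split_sizes` is our own illustration, not a scikit-learn function. Because the split is shuffled, scores will vary between runs unless you pass `random_state`.

```python
import math
from typing import Tuple


def split_sizes(n_samples: int, test_size: float = 0.25) -> Tuple[int, int]:
    """Mimic how scikit-learn sizes a split when only test_size is set."""
    n_test = math.ceil(test_size * n_samples)  # test count is rounded up
    n_train = n_samples - n_test               # the rest goes to training
    return n_train, n_test


print(split_sizes(158 + 53))  # ArrowHead: (158, 53)
print(split_sizes(60 + 20))   # BasicMotions: (60, 20)
```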


In [4]:
X_train.head()

|     | dim_0 |
|-----|-------|
| 2   | 0 -1.8660 1 -1.8420 2 -1.8350 3 ... |
| 105 | 0 -1.6758 1 -1.6742 2 -1.6674 3 ... |
| 143 | 0 -1.7677 1 -1.7506 2 -1.7444 3 ... |
| 8   | 0 -2.0484 1 -2.0432 2 -1.9759 3 ... |
| 159 | 0 -1.8235 1 -1.8376 2 -1.8274 3 ... |
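Each cell of `X_train` holds a whole time series (a pandas Series), which is sktime's nested data format. tsfresh's `extract_features`, by contrast, accepts (among other layouts) a long table with one row per observation, described by an id, a sort (time) column, a kind column for the dimension and a value column. The pure-Python sketch below illustrates that reshaping on the first three values of two rows shown above; the function name `nested_to_long` and the tuple layout are our own assumptions, not sktime's internal conversion code.

```python
from typing import Dict, List, Tuple

# Toy nested panel: instance id -> {dimension name -> list of observations}
nested: Dict[int, Dict[str, List[float]]] = {
    2: {"dim_0": [-1.8660, -1.8420, -1.8350]},
    105: {"dim_0": [-1.6758, -1.6742, -1.6674]},
}


def nested_to_long(
    panel: Dict[int, Dict[str, List[float]]]
) -> List[Tuple[int, int, str, float]]:
    """Flatten a nested panel into (id, time, kind, value) rows."""
    rows = []
    for instance_id, dims in panel.items():
        for kind, values in dims.items():
            for time, value in enumerate(values):
                rows.append((instance_id, time, kind, value))
    return rows


long_rows = nested_to_long(nested)
print(len(long_rows))  # 2 instances x 1 dimension x 3 time points = 6
print(long_rows[0])    # (2, 0, 'dim_0', -1.866)
```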


In [5]:
# three-class classification task: the labels are '0', '1' and '2'
np.unique(y_train)

array(['0', '1', '2'], dtype=object)

## Using tsfresh to extract features

In [6]:
# extract the "efficient" tsfresh feature set: one row of features per series
t = TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False)
Xt = t.fit_transform(X_train)
Xt.head()

  "tsfresh requires a unique index, but found "
Feature Extraction: 100%|██████████| 5/5 [00:10<00:00,  2.18s/it]


|   | dim_0__variance_larger_than_standard_deviation | dim_0__has_duplicate_max | dim_0__has_duplicate_min | dim_0__has_duplicate | dim_0__sum_values | dim_0__abs_energy | dim_0__mean_abs_change | dim_0__mean_change | dim_0__mean_second_derivative_central | dim_0__median | ... | dim_0__fourier_entropy__bins_2 | dim_0__fourier_entropy__bins_3 | dim_0__fourier_entropy__bins_5 | dim_0__fourier_entropy__bins_10 | dim_0__fourier_entropy__bins_100 | dim_0__permutation_entropy__dimension_3__tau_1 | dim_0__permutation_entropy__dimension_4__tau_1 | dim_0__permutation_entropy__dimension_5__tau_1 | dim_0__permutation_entropy__dimension_6__tau_1 | dim_0__permutation_entropy__dimension_7__tau_1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.000169 | 250.000681 | 0.313347 | 0.005425 | -9.1e-05 | 0.072201 | ... | 0.08151 | 0.08151 | 0.138673 | 0.250609 | 1.237113 | 1.499378 | 2.233276 | 2.801751 | 3.294982 | 3.70476 |
| 1 | 0.0 | 0.0 | 0.0 | 1.0 | -7.7e-05 | 250.000532 | 0.313268 | 0.005533 | -1.4e-05 | 0.010407 | ... | 0.08151 | 0.092513 | 0.138673 | 0.138673 | 0.926719 | 1.454743 | 2.223152 | 2.831917 | 3.346462 | 3.758022 |
| 2 | 0.0 | 0.0 | 0.0 | 1.0 | 0.000139 | 250.000649 | 0.324567 | 0.004842 | -0.000171 | 0.063572 | ... | 0.08151 | 0.08151 | 0.193641 | 0.285506 | 1.383081 | 1.485566 | 2.227998 | 2.849109 | 3.286938 | 3.636564 |
| 3 | 0.0 | 0.0 | 0.0 | 1.0 | 0.000245 | 250.000404 | 0.415808 | 0.004581 | 0.000274 | 0.24673 | ... | 0.046288 | 0.127671 | 0.127671 | 0.173767 | 1.248591 | 1.540039 | 2.343871 | 3.053631 | 3.634831 | 4.121959 |
| 4 | 0.0 | 0.0 | 0.0 | 1.0 | -0.000304 | 249.999314 | 0.329316 | 0.006287 | -0.000101 | 0.19846 | ... | 0.08151 | 0.08151 | 0.138673 | 0.250609 | 1.332141 | 1.47421 | 2.301035 | 3.006794 | 3.545796 | 3.986441 |
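Each column is one tsfresh feature computed on the whole series, prefixed with the dimension it came from. For intuition, the sketch below re-implements a handful of the simpler columns in plain Python, following the definitions in the tsfresh documentation as we understand them; it is for reading only and is not a replacement for tsfresh. Note that `variance_larger_than_standard_deviation` is simply a test of whether the variance exceeds 1, since for a non-negative variance v, v > sqrt(v) exactly when v > 1.

```python
import statistics
from typing import Dict, List


def simple_features(x: List[float]) -> Dict[str, float]:
    """Plain-Python versions of a few tsfresh features (assumes len(x) >= 2)."""
    n = len(x)
    diffs = [b - a for a, b in zip(x, x[1:])]   # consecutive differences
    variance = statistics.pvariance(x)            # population variance, like np.var
    return {
        "variance_larger_than_standard_deviation": float(variance > variance ** 0.5),
        "has_duplicate_max": float(x.count(max(x)) >= 2),
        "has_duplicate_min": float(x.count(min(x)) >= 2),
        "has_duplicate": float(len(set(x)) < n),
        "sum_values": sum(x),
        "abs_energy": sum(v * v for v in x),      # sum of squared values
        "mean_abs_change": sum(abs(d) for d in diffs) / len(diffs),
        "mean_change": sum(diffs) / len(diffs),   # equals (x[-1] - x[0]) / (n - 1)
        "median": statistics.median(x),
    }


features = simple_features([1.0, 3.0, 2.0, 3.0, 0.0])
for name, value in features.items():
    print(f"dim_0__{name}: {value}")
```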


## Using tsfresh in a scikit-learn pipeline

In [7]:
classifier = make_pipeline(
    TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False),
    RandomForestClassifier(),
)
classifier.fit(X_train, y_train)
classifier.score(X_test, y_test)

  "tsfresh requires a unique index, but found "
Feature Extraction: 100%|██████████| 5/5 [00:10<00:00,  2.17s/it]
  "tsfresh requires a unique index, but found "
Feature Extraction: 100%|██████████| 5/5 [00:03<00:00,  1.38it/s]


0.7735849056603774
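`score` on a pipeline whose last step is a classifier returns the mean accuracy on the test set. With 53 test instances, the printed value corresponds to 41 correct predictions, as the quick check below shows. Since neither `train_test_split` nor `RandomForestClassifier` is given a `random_state` here, expect a somewhat different number on each run.

```python
from fractions import Fraction

n_test = 53
reported = 0.7735849056603774

# Accuracy is (#correct / #test); recover #correct from the reported score
n_correct = round(reported * n_test)
print(n_correct, Fraction(n_correct, n_test))  # 41 41/53
```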

## Multivariate time series classification data

In [8]:
X, y = load_basic_motions(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

(60, 6) (60,) (20, 6) (20,)


In [9]:
# multivariate input data: six dimensions (dim_0 to dim_5) per instance
X_train.head()

|    | dim_0 | dim_1 | dim_2 | dim_3 | dim_4 | dim_5 |
|---|---|---|---|---|---|---|
| 6  | 0 1.275129 1 1.275129 2 -0.273185 3... | 0 -1.024406 1 -1.024406 2 0.095152 3... | 0 -0.545722 1 -0.545722 2 0.023203 3... | 0 -0.463427 1 -0.463427 2 0.042614 3... | 0 -0.367545 1 -0.367545 2 -0.109198 3... | 0 -0.159802 1 -0.159802 2 0.183773 3... |
| 17 | 0 0.324449 1 0.324449 2 9.29442... | 0 -0.977516 1 -0.977516 2 -6.96322... | 0 -1.260218 1 -1.260218 2 -2.498493 3... | 0 -0.788358 1 -0.788358 2 2.434323 3... | 0 0.316941 1 0.316941 2 -0.079901 3... | 0 0.588605 1 0.588605 2 6.535916 3... |
| 1  | 0 0.377751 1 0.377751 2 2.952965 3... | 0 -0.610850 1 -0.610850 2 0.970717 3... | 0 -0.147376 1 -0.147376 2 -5.962515 3... | 0 -0.103872 1 -0.103872 2 -7.593275 3... | 0 -0.109198 1 -0.109198 2 -0.697804 3... | 0 -0.037287 1 -0.037287 2 -2.865789 3... |
| 22 | 0 0.175924 1 0.175924 2 0.194403 3... | 0 0.548757 1 0.548757 2 -3.699192 3... | 0 -1.191314 1 -1.191314 2 -0.554051 3... | 0 0.039951 1 0.039951 2 0.042614 3... | 0 0.263674 1 0.263674 2 -0.178446 3... | 0 0.937507 1 0.937507 2 0.071911 3... |
| 26 | 0 -0.098166 1 -0.098166 2 -0.665304 3... | 0 -0.117578 1 -0.117578 2 -1.194660 3... | 0 -0.401143 1 -0.401143 2 1.228442 3... | 0 -0.061258 1 -0.061258 2 -0.567298 3... | 0 0.090555 1 0.090555 2 0.029297 3... | 0 0.018644 1 0.018644 2 -0.005327 3... |


In [10]:
t = TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False)
Xt = t.fit_transform(X_train)
Xt.head()

  "tsfresh requires a unique index, but found "
Feature Extraction: 100%|██████████| 5/5 [00:23<00:00,  4.72s/it]


|   | dim_0__variance_larger_than_standard_deviation | dim_0__has_duplicate_max | dim_0__has_duplicate_min | dim_0__has_duplicate | dim_0__sum_values | dim_0__abs_energy | dim_0__mean_abs_change | dim_0__mean_change | dim_0__mean_second_derivative_central | dim_0__median | ... | dim_5__fourier_entropy__bins_2 | dim_5__fourier_entropy__bins_3 | dim_5__fourier_entropy__bins_5 | dim_5__fourier_entropy__bins_10 | dim_5__fourier_entropy__bins_100 | dim_5__permutation_entropy__dimension_3__tau_1 | dim_5__permutation_entropy__dimension_4__tau_1 | dim_5__permutation_entropy__dimension_5__tau_1 | dim_5__permutation_entropy__dimension_6__tau_1 | dim_5__permutation_entropy__dimension_7__tau_1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 1.0 | 1.0 | 1.0 | -25.268568 | 14.11858 | 0.167822 | -0.015561 | -5.2e-05 | -0.281991 | ... | 0.096509 | 0.319026 | 0.575647 | 0.982356 | 2.784117 | 1.740979 | 2.966896 | 3.97289 | 4.382343 | 4.499051 |
| 1 | 1.0 | 0.0 | 0.0 | 1.0 | 505.902373 | 13876.020277 | 7.436936 | -0.174782 | -0.087916 | 9.463268 | ... | 0.096509 | 0.192626 | 0.192626 | 0.288342 | 0.61267 | 1.533172 | 2.40487 | 3.130376 | 3.719884 | 4.120886 |
| 2 | 0.0 | 0.0 | 1.0 | 1.0 | -14.06187 | 48.609672 | 0.384872 | -0.007186 | 0.0 | -0.287704 | ... | 0.494918 | 0.651609 | 1.005666 | 1.32444 | 2.804199 | 1.657453 | 2.679145 | 3.584624 | 4.055798 | 4.325692 |
| 3 | 1.0 | 0.0 | 0.0 | 1.0 | 116.036704 | 383.560959 | 1.283012 | -0.009261 | -0.001879 | 1.073343 | ... | 0.165443 | 0.165443 | 0.165443 | 0.165443 | 1.364309 | 1.581237 | 2.526022 | 3.27454 | 3.793132 | 4.160042 |
| 4 | 1.0 | 0.0 | 0.0 | 1.0 | 46.178983 | 175.315925 | 1.085591 | 0.00809 | 0.003293 | 0.367424 | ... | 0.096509 | 0.096509 | 0.192626 | 0.288342 | 1.225333 | 1.429212 | 2.268585 | 2.989103 | 3.56596 | 3.952548 |
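For multivariate input, tsfresh computes the same feature set on every dimension separately and prefixes each column with the dimension name, so the output width should be the per-dimension feature count times the number of dimensions (six for BasicMotions). The sketch below mimics that naming scheme with a toy feature function; the helper names `multivariate_features` and `toy_features` are hypothetical, and the `__` separator is taken from the column names shown above.

```python
from typing import Callable, Dict, List


def multivariate_features(
    instance: Dict[str, List[float]],
    feature_fn: Callable[[List[float]], Dict[str, float]],
) -> Dict[str, float]:
    """Apply feature_fn to each dimension and name columns '<dim>__<feature>'."""
    row = {}
    for dim_name, series in instance.items():
        for feature_name, value in feature_fn(series).items():
            row[f"{dim_name}__{feature_name}"] = value
    return row


def toy_features(x: List[float]) -> Dict[str, float]:
    # Two tsfresh-style features: total and sum of squares
    return {"sum_values": sum(x), "abs_energy": sum(v * v for v in x)}


# Hypothetical instance with six dimensions, like a BasicMotions recording
instance = {f"dim_{k}": [float(k), float(k) + 1.0] for k in range(6)}
row = multivariate_features(instance, toy_features)
print(len(row))       # 6 dimensions x 2 features = 12
print(list(row)[:2])  # ['dim_0__sum_values', 'dim_0__abs_energy']
```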


## Using tsfresh for forecasting
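You can also use tsfresh features for univariate forecasting. In the example below, forecasting is reduced to time series regression: the series is cut into windows of 12 past observations, and the tsfresh + random forest pipeline learns to predict future values from each window. To find out more about forecasting, check out our forecasting tutorial notebook.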

In [11]:
from sklearn.ensemble import RandomForestRegressor

from sktime.datasets import load_airline
from sktime.forecasting.base import ForecastingHorizon
from sktime.forecasting.compose import ReducedTimeSeriesRegressionForecaster
from sktime.forecasting.model_selection import temporal_train_test_split

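y = load_airline()
# hold out the last part of the series for testing (order is preserved, no shuffling)
y_train, y_test = temporal_train_test_split(y)

# tsfresh features + random forest, applied to windows of past values
regressor = make_pipeline(
    TSFreshFeatureExtractor(show_warnings=False, disable_progressbar=True),
    RandomForestRegressor(),
)
# reduce forecasting to regression on windows of the 12 most recent observations
forecaster = ReducedTimeSeriesRegressionForecaster(regressor, window_length=12)
forecaster.fit(y_train)

# forecast every time point of the test period (absolute horizon)
fh = ForecastingHorizon(y_test.index, is_relative=False)
y_pred = forecaster.predict(fh)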
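With `window_length=12`, the reduction turns the training series into many short series of 12 consecutive observations (one year of the monthly airline data), each paired with a value that follows it; as we understand it, the tsfresh pipeline then treats every window as one panel instance. The sketch below shows only this windowing step in plain Python, for a one-step-ahead target, under our own naming (`make_windows`). It is not sktime's implementation and leaves out how forecasts for several future steps are produced.

```python
from typing import List, Tuple


def make_windows(y: List[float], window_length: int) -> Tuple[List[List[float]], List[float]]:
    """Pair each run of `window_length` past values with the next value."""
    X, targets = [], []
    for end in range(window_length, len(y)):
        X.append(y[end - window_length:end])  # the past window
        targets.append(y[end])                # the value to predict
    return X, targets


# Toy series of 15 observations; window_length=12 leaves 3 training pairs
series = [float(v) for v in range(100, 115)]
X, targets = make_windows(series, window_length=12)
print(len(X), targets)  # 3 [112.0, 113.0, 114.0]
```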