# Feature extraction with tsfresh transformer

In this tutorial, we show how to use sktime with [tsfresh](https://tsfresh.readthedocs.io) to extract features from time series, which can then be passed to any scikit-learn estimator.

## Preliminaries
You need tsfresh installed to run this tutorial. If you haven't installed it yet, uncomment and run the cell below:

In [1]:
# !pip install --upgrade tsfresh

In [2]:
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

from sktime.datasets import load_arrow_head, load_basic_motions
from sktime.transformations.panel.tsfresh import TSFreshFeatureExtractor

## Univariate time series classification data

For more details on the data set, see the [univariate time series classification notebook](https://github.com/alan-turing-institute/sktime/blob/master/examples/02_classification_univariate.ipynb).

In [3]:
X, y = load_arrow_head(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

(158, 1) (158,) (53, 1) (53,)


In [4]:
X_train.head()

index,dim_0
42,0 -1.9921 1 -2.0144 2 -1.9611 3 ...
84,0 -1.7624 1 -1.7583 2 -1.7420 3 ...
171,0 -1.6578 1 -1.6647 2 -1.6326 3 ...
21,0 -1.8127 1 -1.8257 2 -1.7844 3 ...
148,0 -1.7022 1 -1.6888 2 -1.6789 3 ...
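
Each cell of `X_train` holds a whole time series as a `pandas.Series`, which is why the shape printed above has a single column. Below is a minimal sketch of that nested layout, built from made-up values. The name `X_toy` and the random numbers are illustrative assumptions, not part of the ArrowHead data.

```python
import numpy as np
import pandas as pd

# Three toy series of length 5; the values are invented for illustration.
rng = np.random.default_rng(0)
series = [pd.Series(rng.normal(size=5)) for _ in range(3)]

# Nested layout: one row per instance, one column per dimension,
# and an entire pd.Series stored in each cell.
X_toy = pd.DataFrame({"dim_0": series})

first_cell = X_toy.iloc[0, 0]
print(X_toy.shape)       # number of instances x number of dimensions
print(type(first_cell))  # the cell itself is a pandas Series
print(len(first_cell))   # number of time points in the first instance
```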


In [5]:
# multiclass classification task with three classes
np.unique(y_train)

array(['0', '1', '2'], dtype=object)

## Using tsfresh to extract features

In [6]:
# extract tsfresh's "efficient" feature set from each series
t = TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False)
Xt = t.fit_transform(X_train)
Xt.head()

  "tsfresh requires a unique index, but found "
Feature Extraction: 100%|██████████| 5/5 [00:17<00:00,  3.41s/it]


index,dim_0__variance_larger_than_standard_deviation,dim_0__has_duplicate_max,dim_0__has_duplicate_min,dim_0__has_duplicate,dim_0__sum_values,dim_0__abs_energy,dim_0__mean_abs_change,dim_0__mean_change,dim_0__mean_second_derivative_central,dim_0__median,...,dim_0__fourier_entropy__bins_2,dim_0__fourier_entropy__bins_3,dim_0__fourier_entropy__bins_5,dim_0__fourier_entropy__bins_10,dim_0__fourier_entropy__bins_100,dim_0__permutation_entropy__dimension_3__tau_1,dim_0__permutation_entropy__dimension_4__tau_1,dim_0__permutation_entropy__dimension_5__tau_1,dim_0__permutation_entropy__dimension_6__tau_1,dim_0__permutation_entropy__dimension_7__tau_1
0,0.0,0.0,0.0,1.0,-0.000408,249.999669,0.368617,0.004858,-1.4e-05,-0.18541,...,0.08151,0.08151,0.138673,0.184769,1.268258,1.52631,2.327442,3.066192,3.616213,3.993559
1,0.0,0.0,0.0,1.0,-4.4e-05,249.999347,0.315263,0.005005,-0.000184,0.049636,...,0.08151,0.092513,0.092513,0.173767,1.14771,1.453198,2.220161,2.881779,3.357966,3.730387
2,0.0,0.0,0.0,1.0,0.000274,250.001429,0.300611,0.004589,-0.000124,0.23518,...,0.046288,0.092513,0.092513,0.173767,0.98617,1.451213,2.268776,2.960641,3.459308,3.893599
3,0.0,0.0,0.0,0.0,-8.1e-05,250.000555,0.329773,0.003916,0.000161,-0.006648,...,0.08151,0.092513,0.092513,0.138673,0.935344,1.464792,2.237984,2.861479,3.33278,3.734658
4,0.0,1.0,0.0,1.0,-0.000207,249.99917,0.315776,0.005196,-9.1e-05,0.17458,...,0.08151,0.092513,0.138673,0.219798,1.19182,1.422543,2.147471,2.732982,3.166641,3.509018
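
The column names follow the pattern `<dimension>__<feature>`, and every row summarises one input series. To make a few of these columns concrete, here is a minimal NumPy sketch of four of the simpler features. The function `simple_features` and the toy series are hypothetical. The formulas follow our reading of the tsfresh feature definitions, and this is not tsfresh's own implementation.

```python
import numpy as np


def simple_features(x):
    """Compute four tsfresh-style summary features for a single series."""
    x = np.asarray(x, dtype=float)
    diffs = np.diff(x)  # consecutive differences x[t+1] - x[t]
    return {
        "sum_values": x.sum(),
        "abs_energy": np.sum(x**2),  # sum of squared values
        "mean_abs_change": np.abs(diffs).mean(),
        "mean_change": diffs.mean(),  # same as (x[-1] - x[0]) / (len(x) - 1)
    }


# Toy series, invented for illustration.
features = simple_features([1.0, 3.0, 2.0, 5.0])
print(features)
```

Each key corresponds to one output column after the `dim_0__` prefix is added.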


## Using tsfresh with sktime

In [7]:
classifier = make_pipeline(
    TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False),
    RandomForestClassifier(),
)
classifier.fit(X_train, y_train)
classifier.score(X_test, y_test)

  "tsfresh requires a unique index, but found "
Feature Extraction: 100%|██████████| 5/5 [00:18<00:00,  3.63s/it]
  "tsfresh requires a unique index, but found "
Feature Extraction: 100%|██████████| 5/5 [00:04<00:00,  1.19it/s]


0.7735849056603774

## Multivariate time series classification data

In [8]:
X, y = load_basic_motions(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

(60, 6) (60,) (20, 6) (20,)


In [9]:
# multivariate input data: one column per dimension
X_train.head()

index,dim_0,dim_1,dim_2,dim_3,dim_4,dim_5
30,0 -0.771623 1 -0.771623 2 -2.32382...,0 0.372042 1 0.372042 2 -0.29603...,0 -0.145753 1 -0.145753 2 1.71501...,0 -0.031960 1 -0.031960 2 0.383526 3...,0 0.167792 1 0.167792 2 0.229050 3...,0 -0.362219 1 -0.362219 2 -0.23970...
21,0 0.648833 1 0.648833 2 0.076985 3...,0 -0.996722 1 -0.996722 2 -0.897264 3...,0 -0.644136 1 -0.644136 2 0.970515 3...,0 -0.101208 1 -0.101208 2 -0.407496 3...,0 0.055931 1 0.055931 2 -0.157139 3...,0 -0.031960 1 -0.031960 2 -0.343575 3...
31,0 0.130669 1 0.130669 2 0.06882...,0 -0.119724 1 -0.119724 2 -4.08360...,0 -1.019916 1 -1.019916 2 5.39025...,0 0.684487 1 0.684487 2 0.394179 3...,0 0.290308 1 0.290308 2 0.617902 3...,0 0.679160 1 0.679160 2 1.595360 3...
27,0 -0.188742 1 -0.188742 2 -1.077880 3...,0 -0.317179 1 -0.317179 2 0.424980 3...,0 -0.332557 1 -0.332557 2 -0.283946 3...,0 -0.122515 1 -0.122515 2 -0.364882 3...,0 -0.106535 1 -0.106535 2 0.426140 3...,0 -0.093218 1 -0.093218 2 0.002663 3...
24,0 0.383922 1 0.383922 2 -0.272575 3...,0 0.302612 1 0.302612 2 -1.381236 3...,0 -0.398075 1 -0.398075 2 -0.681258 3...,0 0.071911 1 0.071911 2 -0.761725 3...,0 0.175783 1 0.175783 2 -0.114525 3...,0 -0.087891 1 -0.087891 2 -0.503377 3...


In [10]:
t = TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False)
Xt = t.fit_transform(X_train)
Xt.head()

  "tsfresh requires a unique index, but found "
Feature Extraction: 100%|██████████| 5/5 [00:21<00:00,  4.32s/it]


index,dim_0__variance_larger_than_standard_deviation,dim_0__has_duplicate_max,dim_0__has_duplicate_min,dim_0__has_duplicate,dim_0__sum_values,dim_0__abs_energy,dim_0__mean_abs_change,dim_0__mean_change,dim_0__mean_second_derivative_central,dim_0__median,...,dim_5__fourier_entropy__bins_2,dim_5__fourier_entropy__bins_3,dim_5__fourier_entropy__bins_5,dim_5__fourier_entropy__bins_10,dim_5__fourier_entropy__bins_100,dim_5__permutation_entropy__dimension_3__tau_1,dim_5__permutation_entropy__dimension_4__tau_1,dim_5__permutation_entropy__dimension_5__tau_1,dim_5__permutation_entropy__dimension_6__tau_1,dim_5__permutation_entropy__dimension_7__tau_1
0,1.0,0.0,0.0,1.0,680.848161,12647.878199,5.481374,0.08044,-0.052293,3.983688,...,0.223718,0.437095,0.80654,1.424715,3.226796,1.65941,2.827616,3.720341,4.294787,4.499051
1,1.0,1.0,0.0,1.0,57.045746,172.027276,0.807892,0.001584,0.003131,0.4221,...,0.165443,0.165443,0.165443,0.165443,1.241657,1.494736,2.333086,3.047524,3.577109,3.928619
2,1.0,0.0,0.0,1.0,486.267207,7638.280878,4.995886,0.14787,0.055557,2.151706,...,0.465999,0.695363,1.102984,1.567163,3.253978,1.749566,2.997992,3.826876,4.344073,4.528547
3,1.0,0.0,0.0,1.0,72.353623,265.345783,1.34185,0.023994,0.0,0.551162,...,0.165443,0.192626,0.192626,0.288342,1.379875,1.582407,2.561448,3.377921,3.93701,4.355188
4,1.0,0.0,0.0,1.0,109.991851,354.117244,1.1521,-0.008727,-0.007213,0.945874,...,0.096509,0.096509,0.192626,0.288342,1.427455,1.504452,2.441925,3.173603,3.736839,4.150552
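
For multivariate input, the output columns run from `dim_0__...` to `dim_5__...`: the features are computed separately for each dimension and placed side by side. The sketch below shows that naming scheme on a toy instance. The two feature functions and the random data are illustrative assumptions, not the tsfresh feature set.

```python
import numpy as np

# Toy multivariate instance: 3 dimensions, 8 time points each (made-up values).
rng = np.random.default_rng(42)
instance = {f"dim_{k}": rng.normal(size=8) for k in range(3)}

# Hypothetical per-series feature functions, keyed by feature name.
feature_funcs = {"sum_values": np.sum, "median": np.median}

# Features are computed per dimension and the results are concatenated,
# so the number of columns grows linearly with the number of dimensions.
row = {
    f"{dim}__{name}": float(func(values))
    for dim, values in instance.items()
    for name, func in feature_funcs.items()
}
print(list(row))
```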


## Using tsfresh for forecasting
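You can also use tsfresh for univariate forecasting. The forecaster below reduces forecasting to time series regression over sliding windows of length `window_length`, with tsfresh extracting features from each window. To find out more about forecasting, check out our forecasting tutorial notebook.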

In [11]:
from sklearn.ensemble import RandomForestRegressor

from sktime.datasets import load_airline
from sktime.forecasting.base import ForecastingHorizon
from sktime.forecasting.compose import ReducedTimeSeriesRegressionForecaster
from sktime.forecasting.model_selection import temporal_train_test_split

y = load_airline()
y_train, y_test = temporal_train_test_split(y)

regressor = make_pipeline(
    TSFreshFeatureExtractor(show_warnings=False, disable_progressbar=True),
    RandomForestRegressor(),
)
forecaster = ReducedTimeSeriesRegressionForecaster(regressor, window_length=12)
forecaster.fit(y_train)

fh = ForecastingHorizon(y_test.index, is_relative=False)
y_pred = forecaster.predict(fh)
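
The windowing step behind this kind of reduction can be sketched with NumPy alone. The helper `sliding_windows` is hypothetical and assumes a plain one-step-ahead target. It illustrates the idea only and is not sktime's internal implementation.

```python
import numpy as np


def sliding_windows(y, window_length):
    """Split a 1D series into (window, next value) training pairs."""
    y = np.asarray(y, dtype=float)
    n_windows = len(y) - window_length
    # Row i holds y[i], ..., y[i + window_length - 1].
    X = np.stack([y[i : i + window_length] for i in range(n_windows)])
    # The target for row i is the value that immediately follows its window.
    target = y[window_length:]
    return X, target


y_toy = np.arange(20.0)  # made-up series 0, 1, ..., 19
X_windows, target = sliding_windows(y_toy, window_length=12)
print(X_windows.shape, target.shape)  # (8, 12) (8,)
```

Under this assumption, each row of `X_windows` plays the role of one short time series passed to the feature extractor, and `target` is what the regressor learns to predict.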