# Feature extraction with tsfresh transformer

In this tutorial, we show how to use sktime with [tsfresh](https://tsfresh.readthedocs.io) to extract features from time series, so that any scikit-learn estimator can then be applied to the resulting tabular features.

## Preliminaries
tsfresh must be installed to run this tutorial. If you haven't installed it yet, uncomment and run the cell below:

In [2]:
# !pip install --upgrade tsfresh



In [3]:
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

from sktime.datasets import load_arrow_head, load_basic_motions
from sktime.transformations.panel.tsfresh import TSFreshFeatureExtractor

## Univariate time series classification data

For more details on the data set, see the [univariate time series classification notebook](https://github.com/sktime/sktime/blob/main/examples/02_classification_univariate.ipynb).

In [4]:
X, y = load_arrow_head(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

(158, 1) (158,) (53, 1) (53,)


In [5]:
X_train.head()

Unnamed: 0,dim_0
61,0 -1.854100 1 -1.833880 2 -1.81870...
2,0 -1.866021 1 -1.841991 2 -1.83502...
22,0 -1.754535 1 -1.777870 2 -1.75141...
99,0 -2.034054 1 -2.029912 2 -1.98799...
164,0 -1.672881 1 -1.683678 2 -1.66429...


In [6]:
# multiclass classification task (three classes)
np.unique(y_train)

array(['0', '1', '2'], dtype='<U1')

## Using tsfresh to extract features

In [7]:
# "efficient" skips the most computationally expensive tsfresh feature calculators
t = TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False)
Xt = t.fit_transform(X_train)
Xt.head()

Feature Extraction: 100%|██████████| 158/158 [00:03<00:00, 51.24it/s]


Unnamed: 0,dim_0__variance_larger_than_standard_deviation,dim_0__has_duplicate_max,dim_0__has_duplicate_min,dim_0__has_duplicate,dim_0__sum_values,dim_0__abs_energy,dim_0__mean_abs_change,dim_0__mean_change,dim_0__mean_second_derivative_central,dim_0__median,...,dim_0__fourier_entropy__bins_5,dim_0__fourier_entropy__bins_10,dim_0__fourier_entropy__bins_100,dim_0__permutation_entropy__dimension_3__tau_1,dim_0__permutation_entropy__dimension_4__tau_1,dim_0__permutation_entropy__dimension_5__tau_1,dim_0__permutation_entropy__dimension_6__tau_1,dim_0__permutation_entropy__dimension_7__tau_1,dim_0__query_similarity_count__query_None__threshold_0.0,dim_0__mean_n_absolute_max__number_of_maxima_7
0,0.0,0.0,0.0,1.0,-1.94e-07,250.0,0.052711,-6e-05,-6.9e-05,0.097725,...,0.092513,0.092513,0.254761,1.214914,1.688635,2.052006,2.340419,2.597944,0.0,1.846627
1,0.0,0.0,0.0,1.0,-1.9759e-07,249.999999,0.047546,7.8e-05,-1.4e-05,0.072201,...,0.092513,0.092513,0.138673,1.184494,1.503192,1.779889,2.025692,2.221817,0.0,1.855891
2,0.0,0.0,0.0,0.0,-3.4e-07,249.999999,0.050945,-5e-06,9e-05,-0.008452,...,0.092513,0.092513,0.250609,1.339717,1.816194,2.196909,2.494238,2.756343,0.0,1.756628
3,0.0,0.0,0.0,1.0,4.61e-07,250.000001,0.057137,7.7e-05,2.7e-05,-0.157094,...,0.092513,0.138673,0.296508,1.154028,1.613771,2.014342,2.354278,2.670915,0.0,2.022369
4,0.0,1.0,0.0,1.0,-4.67e-07,249.999999,0.046877,2.1e-05,4.9e-05,0.287964,...,0.127671,0.138673,0.184769,1.338461,1.865376,2.291622,2.559976,2.749413,0.0,1.670713
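
To make a few of the column names above more concrete, here is a minimal pure-Python sketch of four of the simpler features, written from their definitions in the tsfresh documentation. The helper functions and the toy series are illustrative assumptions; this is not how tsfresh computes them internally.

```python
import math
import statistics


def sum_values(x):
    """Sum of all observations."""
    return sum(x)


def abs_energy(x):
    """Sum of squared observations."""
    return sum(v * v for v in x)


def mean_abs_change(x):
    """Mean absolute difference between consecutive observations."""
    return sum(abs(b - a) for a, b in zip(x, x[1:])) / (len(x) - 1)


def mean_change(x):
    """Mean signed difference between consecutive observations."""
    return (x[-1] - x[0]) / (len(x) - 1)


# Toy series of length 251 (an assumption for illustration), z-normalised
# with the sample standard deviation (n - 1 in the denominator).
raw = [math.sin(0.1 * i) + 0.01 * i for i in range(251)]
mu = statistics.mean(raw)
sd = statistics.stdev(raw)
z = [(v - mu) / sd for v in raw]

features = {
    "sum_values": sum_values(z),
    "abs_energy": abs_energy(z),
    "mean_abs_change": mean_abs_change(z),
    "mean_change": mean_change(z),
}
print(features)
```

For a series z-normalised in this way, `sum_values` is zero up to rounding and `abs_energy` equals the series length minus one. That would explain the values near 0 and 250 in the table above, assuming the ArrowHead series are z-normalised and have length 251.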


## Using tsfresh with sktime

In [8]:
classifier = make_pipeline(
    TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False),
    RandomForestClassifier(),
)
classifier.fit(X_train, y_train)
classifier.score(X_test, y_test)

Feature Extraction: 100%|██████████| 158/158 [00:03<00:00, 51.61it/s]
Feature Extraction: 100%|██████████| 53/53 [00:01<00:00, 51.68it/s]


0.9056603773584906
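
The two progress bars above reflect how a pipeline works: features are extracted from the 158 training series during `fit`, and again from the 53 test series during `score`. The exact score varies between runs, because neither `train_test_split` nor `RandomForestClassifier` is given a fixed `random_state` here. The following sketch mimics that flow in plain Python. The classes `MeanStdExtractor`, `NearestCentroid` and `TwoStepPipeline` are hypothetical stand-ins invented for illustration, not sktime or scikit-learn classes.

```python
import statistics


class MeanStdExtractor:
    """Toy stand-in for a feature extractor: one (mean, std) row per series."""

    def fit(self, X, y=None):
        return self  # nothing to learn in this toy transformer

    def transform(self, X):
        return [(statistics.mean(s), statistics.pstdev(s)) for s in X]


class NearestCentroid:
    """Toy stand-in for a classifier that works on tabular feature rows."""

    def fit(self, rows, y):
        self.centroids_ = {}
        for label in sorted(set(y)):
            members = [r for r, lab in zip(rows, y) if lab == label]
            self.centroids_[label] = tuple(
                statistics.mean(col) for col in zip(*members)
            )
        return self

    def predict(self, rows):
        def closest(row):
            return min(
                self.centroids_,
                key=lambda lab: sum(
                    (a - b) ** 2 for a, b in zip(row, self.centroids_[lab])
                ),
            )

        return [closest(r) for r in rows]


class TwoStepPipeline:
    """Chain one transformer and one final estimator."""

    def __init__(self, transformer, estimator):
        self.transformer = transformer
        self.estimator = estimator

    def fit(self, X, y):
        # Extract features from the training series, then fit the estimator.
        rows = self.transformer.fit(X, y).transform(X)
        self.estimator.fit(rows, y)
        return self

    def score(self, X, y):
        # Extract features from the test series, then compute accuracy.
        rows = self.transformer.transform(X)
        pred = self.estimator.predict(rows)
        return sum(p == t for p, t in zip(pred, y)) / len(y)


X_toy_train = [[0, 0, 1, 0], [0, 1, 0, 0], [5, 6, 5, 6], [6, 5, 6, 5]]
y_toy_train = ["low", "low", "high", "high"]
X_toy_test = [[0, 1, 1, 0], [6, 6, 5, 5]]
y_toy_test = ["low", "high"]

toy = TwoStepPipeline(MeanStdExtractor(), NearestCentroid())
toy.fit(X_toy_train, y_toy_train)
toy_score = toy.score(X_toy_test, y_toy_test)
print(toy_score)
```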

## Multivariate time series classification data

In [9]:
X, y = load_basic_motions(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

(60, 6) (60,) (20, 6) (20,)


In [10]:
#  multivariate input data
X_train.head()

Unnamed: 0,dim_0,dim_1,dim_2,dim_3,dim_4,dim_5
2,0 -0.813905 1 -0.813905 2 -0.424628 3...,0 0.825666 1 0.825666 2 -1.305033 3...,0 0.032712 1 0.032712 2 0.826170 3...,0 0.021307 1 0.021307 2 -0.372872 3...,0 0.122515 1 0.122515 2 -0.045277 3...,0 0.775041 1 0.775041 2 0.383526 3...
43,0 -1.088052 1 -1.088052 2 -0.683620 3...,0 0.183832 1 0.183832 2 -2.909047 3...,0 -0.260871 1 -0.260871 2 1.507042 3...,0 -0.284981 1 -0.284981 2 0.415486 3...,0 0.487397 1 0.487397 2 0.013317 3...,0 1.081329 1 1.081329 2 0.820319 3...
0,0 0.079106 1 0.079106 2 -0.903497 3...,0 0.394032 1 0.394032 2 -3.666397 3...,0 0.551444 1 0.551444 2 -0.282844 3...,0 0.351565 1 0.351565 2 -0.095881 3...,0 0.023970 1 0.023970 2 -0.319605 3...,0 0.633883 1 0.633883 2 0.972131 3...
34,0 0.052231 1 0.052231 2 -0.54804...,0 -0.730486 1 -0.730486 2 0.70700...,0 -0.518104 1 -0.518104 2 -1.179430 3...,0 -0.159802 1 -0.159802 2 -0.239704 3...,0 -0.045277 1 -0.045277 2 0.023970 3...,0 -0.029297 1 -0.029297 2 0.29829...
60,0 -0.294498 1 -0.294498 2 -0.050044 3...,0 0.540218 1 0.540218 2 -0.515245 3...,0 0.218114 1 0.218114 2 -0.301108 3...,0 -0.045277 1 -0.045277 2 0.103872 3...,0 -0.002663 1 -0.002663 2 -0.183773 3...,0 0.031960 1 0.031960 2 0.037287 3...


In [11]:
t = TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False)
Xt = t.fit_transform(X_train)
Xt.head()

Feature Extraction: 100%|██████████| 360/360 [00:05<00:00, 67.11it/s]


Unnamed: 0,dim_0__variance_larger_than_standard_deviation,dim_0__has_duplicate_max,dim_0__has_duplicate_min,dim_0__has_duplicate,dim_0__sum_values,dim_0__abs_energy,dim_0__mean_abs_change,dim_0__mean_change,dim_0__mean_second_derivative_central,dim_0__median,...,dim_5__fourier_entropy__bins_5,dim_5__fourier_entropy__bins_10,dim_5__fourier_entropy__bins_100,dim_5__permutation_entropy__dimension_3__tau_1,dim_5__permutation_entropy__dimension_4__tau_1,dim_5__permutation_entropy__dimension_5__tau_1,dim_5__permutation_entropy__dimension_6__tau_1,dim_5__permutation_entropy__dimension_7__tau_1,dim_5__query_similarity_count__query_None__threshold_0.0,dim_5__mean_n_absolute_max__number_of_maxima_7
0,0.0,0.0,0.0,1.0,-7.982351,10.309371,0.239052,0.002724,0.0,-0.056579,...,0.165443,0.165443,0.810494,1.464695,2.222689,2.936018,3.542399,4.012085,0.0,0.846192
1,0.0,0.0,1.0,1.0,-15.850238,9.885558,0.163062,0.008776,0.0,-0.173755,...,0.26116,0.288342,0.853617,1.509548,2.323756,3.100984,3.720801,4.141201,0.0,1.19243
2,0.0,0.0,0.0,1.0,-8.618429,10.629914,0.16445,-0.002871,-6.1e-05,-0.164268,...,0.585488,0.745016,1.866089,1.520317,2.407729,3.222908,3.878028,4.281449,0.0,1.337393
3,1.0,0.0,0.0,1.0,307.637735,5948.915061,4.407277,0.17775,0.084208,0.722776,...,1.197552,1.78495,3.313297,1.724168,2.931721,3.927239,4.382343,4.499051,0.0,11.957401
4,0.0,0.0,0.0,1.0,33.334188,110.735119,0.822452,0.000639,0.001751,0.164096,...,0.165443,0.192626,0.545824,1.279774,1.910772,2.565051,3.096812,3.567632,0.0,1.613623
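
In the multivariate case, the column names carry the dimension as a prefix (`dim_0__...` through `dim_5__...`). The progress bar total of 360 equals 60 training instances times 6 dimensions, which suggests one unit of work per (instance, dimension) pair. The sketch below illustrates that naming scheme in plain Python. `toy_features` and `extract_per_dimension` are hypothetical helpers, and the two features stand in for the much larger tsfresh feature set.

```python
def toy_features(series):
    """Two illustrative features; tsfresh computes many more."""
    return {
        "sum_values": sum(series),
        "abs_energy": sum(v * v for v in series),
    }


def extract_per_dimension(instance):
    """Compute features per dimension and prefix each name with the dimension."""
    row = {}
    for dim_name, series in instance.items():
        for feat_name, value in toy_features(series).items():
            row[f"{dim_name}__{feat_name}"] = value
    return row


# One toy instance with two dimensions (the values are made up).
instance = {"dim_0": [1.0, 2.0, 3.0], "dim_1": [0.0, -1.0, 1.0]}
row = extract_per_dimension(instance)
print(row)

# 60 instances x 6 dimensions gives 360 (instance, dimension) pairs.
n_pairs = 60 * 6
print(n_pairs)
```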


## Using tsfresh for forecasting
You can also use tsfresh for univariate forecasting, by reducing the forecasting task to time series regression over sliding windows of the series. To find out more about forecasting, check out our forecasting tutorial notebook.

In [12]:
from sklearn.ensemble import RandomForestRegressor

from sktime.datasets import load_airline
from sktime.forecasting.base import ForecastingHorizon
from sktime.forecasting.compose import make_reduction
from sktime.forecasting.model_selection import temporal_train_test_split

y = load_airline()
y_train, y_test = temporal_train_test_split(y)

regressor = make_pipeline(
    TSFreshFeatureExtractor(show_warnings=False, disable_progressbar=True),
    RandomForestRegressor(),
)
forecaster = make_reduction(
    regressor, scitype="time-series-regressor", window_length=12
)
forecaster.fit(y_train)

fh = ForecastingHorizon(y_test.index, is_relative=False)
y_pred = forecaster.predict(fh)
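
The reduction used above can be pictured as follows: the training series is cut into sliding windows of length 12, each window is paired with the value that follows it, and a regressor is trained on those pairs. With the default recursive strategy of `make_reduction`, multi-step forecasts are produced by feeding each prediction back into the most recent window. The sketch below shows this idea in plain Python. `sliding_windows`, `recursive_forecast` and `predict_one` are hypothetical helpers for illustration, not sktime functions, and `predict_one` is a simple stand-in for the fitted tsfresh and random forest pipeline.

```python
def sliding_windows(y, window_length):
    """Split a series into (window, next value) training pairs."""
    windows, targets = [], []
    for start in range(len(y) - window_length):
        windows.append(y[start : start + window_length])
        targets.append(y[start + window_length])
    return windows, targets


def recursive_forecast(y, window_length, predict_one, n_steps):
    """Predict n_steps ahead, feeding each prediction back into the window."""
    history = list(y)
    preds = []
    for _ in range(n_steps):
        window = history[-window_length:]
        next_value = predict_one(window)
        preds.append(next_value)
        history.append(next_value)
    return preds


series = [float(v) for v in range(1, 21)]  # toy series 1, 2, ..., 20
windows, targets = sliding_windows(series, window_length=12)
print(len(windows), windows[0], targets[0])


# Hypothetical stand-in for a fitted regressor: last value plus the mean step.
def predict_one(window):
    step = (window[-1] - window[0]) / (len(window) - 1)
    return window[-1] + step


preds = recursive_forecast(series, 12, predict_one, n_steps=3)
print(preds)
```

In the notebook's forecaster, each window is a short time series of 12 values, so `TSFreshFeatureExtractor` turns every window into a feature row before `RandomForestRegressor` sees it.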