# Feature extraction with tsfresh transformer

In this tutorial, we show how to use sktime with [tsfresh](https://tsfresh.readthedocs.io) to extract features from time series, so that any scikit-learn estimator can then be trained on the resulting feature table.

## Preliminaries
You need tsfresh installed. If you haven't installed it already, uncomment and run the cell below:

In [1]:
# !pip install --upgrade tsfresh

In [2]:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sktime.datasets import load_basic_motions
from sktime.datasets import load_arrow_head
from sktime.transformers.series_as_features.summarize import \
    TSFreshFeatureExtractor

## Univariate time series classification data

For more details on the data set, see the [univariate time series classification notebook](https://github.com/alan-turing-institute/sktime/blob/master/examples/02_classification_univariate.ipynb).

In [3]:
X, y = load_arrow_head(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

(158, 1) (158,) (53, 1) (53,)


In [4]:
X_train.head()

index,dim_0
169,0 -1.7767 1 -1.7786 2 -1.7501 3 ...
26,0 -2.2551 1 -2.2337 2 -2.2292 3 ...
4,0 -1.9591 1 -1.9749 2 -1.9714 3 ...
116,0 -2.2003 1 -2.1845 2 -2.1533 3 ...
21,0 -1.9888 1 -2.0247 2 -1.9262 3 ...


In [5]:
# multiclass classification task (three class labels)
np.unique(y_train)

array(['0', '1', '2'], dtype=object)

## Using tsfresh to extract features

In [6]:
# extract the "efficient" tsfresh feature set from each series
t = TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False)
Xt = t.fit_transform(X_train)
Xt.head()

  warn("Found non-unique index, replaced with unique index.")



variable,dim_0__abs_energy,dim_0__absolute_sum_of_changes,"dim_0__agg_autocorrelation__f_agg_""mean""__maxlag_40","dim_0__agg_autocorrelation__f_agg_""median""__maxlag_40","dim_0__agg_autocorrelation__f_agg_""var""__maxlag_40","dim_0__agg_linear_trend__attr_""intercept""__chunk_len_10__f_agg_""max""","dim_0__agg_linear_trend__attr_""intercept""__chunk_len_10__f_agg_""mean""","dim_0__agg_linear_trend__attr_""intercept""__chunk_len_10__f_agg_""min""","dim_0__agg_linear_trend__attr_""intercept""__chunk_len_10__f_agg_""var""","dim_0__agg_linear_trend__attr_""intercept""__chunk_len_50__f_agg_""max""",...,dim_0__symmetry_looking__r_0.9500000000000001,dim_0__time_reversal_asymmetry_statistic__lag_1,dim_0__time_reversal_asymmetry_statistic__lag_2,dim_0__time_reversal_asymmetry_statistic__lag_3,dim_0__value_count__value_-1,dim_0__value_count__value_0,dim_0__value_count__value_1,dim_0__variance,dim_0__variance_larger_than_standard_deviation,dim_0__variation_coefficient
0,250.000845,82.01538,0.26879,0.287141,0.083965,-0.285361,-0.523476,-1.19342,0.100949,0.647236,...,1.0,0.049885,0.008217,-0.016709,0.0,0.0,0.0,0.996019,0.0,19269220.0
1,250.001216,95.884174,0.097718,0.038851,0.080948,-0.064405,-0.425036,-1.225422,0.150297,0.733005,...,1.0,0.059773,-0.007259,-0.078893,0.0,0.0,0.0,0.996021,0.0,-405340.0
2,250.000892,91.195748,0.099214,0.114457,0.12504,0.237061,-0.172878,-1.220586,0.215094,1.169208,...,1.0,0.072462,0.028787,0.00527,0.0,0.0,0.0,0.996019,0.0,1346774.0
3,249.999631,80.635708,0.29114,0.297845,0.060626,-0.280899,-0.603555,-0.977783,0.073818,0.483202,...,1.0,0.054177,-0.004676,-0.045644,0.0,0.0,0.0,0.996014,0.0,5009986.0
4,250.000612,85.35988,0.198903,0.212676,0.099692,0.084713,-0.273434,-1.06501,0.171719,1.103542,...,1.0,0.080865,0.025874,-0.0042,0.0,0.0,0.0,0.996018,0.0,1972439.0
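To make the columns above less opaque, here is a minimal sketch of what three of the simpler features compute, written in plain Python rather than through tsfresh. The helper names and the toy data are illustrative assumptions, not part of sktime or tsfresh; the formulas follow the usual definitions (sum of squares, sum of absolute first differences, population variance).

```python
def abs_energy(x):
    """Sum of squared values of the series."""
    return sum(v * v for v in x)


def absolute_sum_of_changes(x):
    """Sum of absolute differences between consecutive values."""
    return sum(abs(b - a) for a, b in zip(x, x[1:]))


def variance(x):
    """Population variance (divides by n, not n - 1)."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / len(x)


def extract_features(panel, prefix="dim_0"):
    """Map each series in the panel to one row of named features.

    The "<dimension>__<feature>" naming mirrors the column names shown
    above; this helper is hypothetical and only illustrates the idea.
    """
    return [
        {
            f"{prefix}__abs_energy": abs_energy(series),
            f"{prefix}__absolute_sum_of_changes": absolute_sum_of_changes(series),
            f"{prefix}__variance": variance(series),
        }
        for series in panel
    ]


# Toy panel of two short series (made-up values, not the ArrowHead data).
toy_panel = [[1.0, 2.0, 3.0, 4.0], [0.0, -1.0, 1.0, 0.0]]
features = extract_features(toy_panel)
for row in features:
    print(row)
```

Each series becomes one row and each feature one column, which is why the transformed data can be passed to any tabular scikit-learn estimator.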


## Using tsfresh with sktime

In [7]:
classifier = make_pipeline(
    TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False),
    RandomForestClassifier()
)
classifier.fit(X_train, y_train)
classifier.score(X_test, y_test)

  warn("Found non-unique index, replaced with unique index.")



  warn("Found non-unique index, replaced with unique index.")



0.8301886792452831

## Multivariate time series classification data

In [8]:
X, y = load_basic_motions(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

(60, 6) (60,) (20, 6) (20,)


In [9]:
# multivariate input data
X_train.head()

index,dim_0,dim_1,dim_2,dim_3,dim_4,dim_5
35,0 -0.040961 1 -0.040961 2 0.338414 3...,0 -0.971100 1 -0.971100 2 -3.420216 3...,0 0.203560 1 0.203560 2 -2.053446 3...,0 0.061258 1 0.061258 2 0.250357 3...,0 -0.047941 1 -0.047941 2 -0.639209 3...,0 0.961478 1 0.961478 2 -0.298298 3...
14,0 -0.947424 1 -0.947424 2 14.53912...,0 0.572681 1 0.572681 2 -10.32130...,0 -0.529822 1 -0.529822 2 -4.144042 3...,0 -0.098545 1 -0.098545 2 2.138688 3...,0 0.596595 1 0.596595 2 -1.259775 3...,0 0.772378 1 0.772378 2 7.21774...
9,0 0.126160 1 0.126160 2 1.771871 3...,0 0.102733 1 0.102733 2 -3.798484 3...,0 0.308964 1 0.308964 2 0.141369 3...,0 0.002663 1 0.002663 2 -1.427568 3...,0 0.000000 1 0.000000 2 -0.167792 3...,0 -0.007990 1 -0.007990 2 -1.643301 3...
17,0 3.789469 1 3.789469 2 1.78594...,0 -1.353556 1 -1.353556 2 -10.69460...,0 -0.685072 1 -0.685072 2 -4.465480 3...,0 -0.021307 1 -0.021307 2 2.753927 3...,0 -0.159802 1 -0.159802 2 -0.820319 3...,0 0.133169 1 0.133169 2 2.974987 3...
21,0 -0.171905 1 -0.171905 2 -0.397472 3...,0 0.206276 1 0.206276 2 -3.217950 3...,0 -0.308410 1 -0.308410 2 -0.035401 3...,0 -0.189099 1 -0.189099 2 0.857606 3...,0 0.079901 1 0.079901 2 0.135832 3...,0 0.055931 1 0.055931 2 0.391516 3...


In [10]:
t = TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False)
Xt = t.fit_transform(X_train)
Xt.head()

  warn("Found non-unique index, replaced with unique index.")



variable,dim_0__abs_energy,dim_0__absolute_sum_of_changes,"dim_0__agg_autocorrelation__f_agg_""mean""__maxlag_40","dim_0__agg_autocorrelation__f_agg_""median""__maxlag_40","dim_0__agg_autocorrelation__f_agg_""var""__maxlag_40","dim_0__agg_linear_trend__attr_""intercept""__chunk_len_10__f_agg_""max""","dim_0__agg_linear_trend__attr_""intercept""__chunk_len_10__f_agg_""mean""","dim_0__agg_linear_trend__attr_""intercept""__chunk_len_10__f_agg_""min""","dim_0__agg_linear_trend__attr_""intercept""__chunk_len_10__f_agg_""var""","dim_0__agg_linear_trend__attr_""intercept""__chunk_len_50__f_agg_""max""",...,dim_5__symmetry_looking__r_0.9500000000000001,dim_5__time_reversal_asymmetry_statistic__lag_1,dim_5__time_reversal_asymmetry_statistic__lag_2,dim_5__time_reversal_asymmetry_statistic__lag_3,dim_5__value_count__value_-1,dim_5__value_count__value_0,dim_5__value_count__value_1,dim_5__variance,dim_5__variance_larger_than_standard_deviation,dim_5__variation_coefficient
0,6681.979256,441.093167,-0.003917,-0.024874,0.021016,18.693258,6.340561,0.353609,41.466413,29.321178,...,1.0,-28.438658,3.340718,15.100302,0.0,0.0,0.0,12.710682,1.0,31.065326
1,13000.226236,666.891287,-0.003739,-0.119334,0.176492,16.052055,6.681353,-11.36958,90.618584,18.69674,...,1.0,6.731435,41.51033,58.555077,0.0,0.0,0.0,29.820847,1.0,-30.090259
2,9.735453,15.20245,-0.011942,-0.006515,0.005243,0.542816,-0.133406,-0.447754,0.122703,1.771871,...,1.0,-0.007607,-0.01442,-0.010061,0.0,0.0,0.0,0.123327,0.0,57.578839
3,14668.442452,852.384132,-0.019733,-0.067975,0.140029,18.527246,4.206907,-17.096802,154.651004,20.67378,...,1.0,15.053465,48.867108,70.182013,0.0,0.0,0.0,26.184878,1.0,149.517043
4,535.495127,135.580907,-0.027563,-0.045144,0.078813,4.071817,1.703098,-0.768076,2.699804,5.046151,...,1.0,0.196546,-0.351586,0.113937,0.0,0.0,0.0,3.208048,1.0,59.407542
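With multivariate input, every feature is computed once per dimension, so the column names carry the dimension as a prefix. The sketch below shows one way to group such names by dimension. It assumes the `<dimension>__<feature>__<parameter>` pattern visible in the output above and that dimension names contain no double underscore; the helper functions are hypothetical and not part of sktime or tsfresh.

```python
from collections import defaultdict


def parse_feature_name(column):
    """Split a column name into (dimension, feature, parameter list)."""
    dimension, feature, *params = column.split("__")
    return dimension, feature, params


def group_by_dimension(columns):
    """Collect (feature, params) pairs under the dimension they belong to."""
    groups = defaultdict(list)
    for column in columns:
        dimension, feature, params = parse_feature_name(column)
        groups[dimension].append((feature, params))
    return dict(groups)


# A few column names copied from the output above.
columns = [
    "dim_0__abs_energy",
    'dim_0__agg_autocorrelation__f_agg_"mean"__maxlag_40',
    "dim_5__variance",
    "dim_5__time_reversal_asymmetry_statistic__lag_1",
]
groups = group_by_dimension(columns)
for dimension, feats in groups.items():
    print(dimension, feats)
```

The same idea can be applied to `Xt.columns` to inspect or select the features that belong to a single dimension.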
