tsflex is a toolkit for flexible time series processing & feature extraction that is efficient and makes few assumptions about sequence data.
| | command |
|---|---|
| pip | pip install tsflex |
| conda | conda install -c conda-forge tsflex |
tsflex is built to be intuitive, so we encourage you to copy-paste this code and toy with some parameters!
import pandas as pd
import numpy as np
import scipy.stats as ss
from tsflex.features import MultipleFeatureDescriptors, FeatureCollection
from tsflex.utils.data import load_empatica_data
# 1. Load sequence-indexed data (in this case a time-index)
df_tmp, df_acc, df_ibi = load_empatica_data(['tmp', 'acc', 'ibi'])
# 2. Construct your feature extraction configuration
fc = FeatureCollection(
    MultipleFeatureDescriptors(
        functions=[np.min, np.mean, np.std, ss.skew, ss.kurtosis],
        series_names=["TMP", "ACC_x", "ACC_y", "IBI"],
        windows=["15min", "30min"],
        strides="15min",
    )
)
# 3. Extract features
fc.calculate(data=[df_tmp, df_acc, df_ibi], approve_sparsity=True)
Note that the feature extraction is performed on multivariate data with varying sample rates.
| signal | columns | sample rate |
|---|---|---|
| df_tmp | ["TMP"] | 4Hz |
| df_acc | ["ACC_x", "ACC_y", "ACC_z"] | 32Hz |
| df_ibi | ["IBI"] | irregularly sampled |
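To make the idea of window and stride arguments on differently sampled signals concrete, here is a minimal NumPy sketch of time-based strided windowing. It is not tsflex's implementation: the helper `strided_window_feature`, its parameters (times in seconds, explicit `t_start` and `t_end`), and the synthetic signals are all assumptions made for illustration.

```python
import numpy as np


def strided_window_feature(times, values, func, window, stride, t_start, t_end):
    """Apply `func` to each half-open time window [start, start + window).

    Hypothetical helper: `times` must be sorted ascending (in seconds) and may
    be irregularly spaced. Window starts advance by `stride` seconds, and only
    windows that fit entirely inside [t_start, t_end) are produced.
    Returns (window_starts, feature_values).
    """
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    n_windows = int(np.floor((t_end - t_start - window) / stride)) + 1
    starts = t_start + stride * np.arange(max(n_windows, 0))
    # Binary search for the sample range that falls inside each window.
    lo = np.searchsorted(times, starts, side="left")
    hi = np.searchsorted(times, starts + window, side="left")
    # Empty windows (no samples) yield NaN instead of raising.
    feats = np.array(
        [func(values[a:b]) if b > a else np.nan for a, b in zip(lo, hi)]
    )
    return starts, feats


# Synthetic signals covering the same 60 s with different sampling behaviour.
t_fast = np.arange(0, 60, 1 / 32)          # 32 Hz, regular
t_slow = np.arange(0, 60, 1 / 4)           # 4 Hz, regular
rng = np.random.default_rng(0)
t_irr = np.cumsum(rng.uniform(0.6, 1.2, size=80))  # irregular event times
x_irr = np.diff(t_irr, prepend=0.0)        # interval between events

kwargs = dict(func=np.mean, window=30, stride=15, t_start=0, t_end=60)
starts_fast, mean_fast = strided_window_feature(t_fast, t_fast, **kwargs)
starts_slow, mean_slow = strided_window_feature(t_slow, t_slow, **kwargs)
starts_irr, mean_irr = strided_window_feature(t_irr, x_irr, **kwargs)
```

Because the windows are defined in time rather than in number of samples, all three signals produce features on the same window starts, regardless of their sample rate.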
Flexible:
- handles multivariate/multimodal time series
- versatile function support
  => integrates with many packages for:
  - processing (e.g., scipy.signal, statsmodels.tsa)
  - feature extraction (e.g., numpy, scipy.stats, antropy, nolds, seglearn¹, tsfresh¹, tsfel¹)
- feature extraction handles multiple strides & window sizes
Efficient:
- view-based operations for processing & feature extraction => extremely low memory peak & fast execution time
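Why view-based windowing keeps the memory peak low can be illustrated with plain NumPy. The sketch below shows the general technique under stated assumptions (a regularly sampled 1-D array, window and stride expressed in samples); it is not taken from tsflex's internals.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

signal = np.arange(100_000, dtype=np.float64)
window, stride = 1_000, 500  # both in number of samples (assumed values)

# All length-`window` segments, exposed as a read-only view on `signal`.
all_windows = sliding_window_view(signal, window)
# Keep every `stride`-th segment; basic slicing still returns a view.
windows = all_windows[::stride]

# The windowed array reuses the memory of `signal` instead of copying it.
shares = np.shares_memory(windows, signal)

# One feature value per window, computed directly on the view.
means = windows.mean(axis=1)
```

Materializing the same windows as copies would need roughly `windows.size * 8` bytes here, about twice the size of `signal` itself, and that ratio grows as the stride shrinks relative to the window.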
Intuitive:
- maintains the sequence-index of the data
- feature extraction constructs interpretable output column names
- intuitive API
Few assumptions about the sequence data:
- no assumptions about sampling rate
- able to deal with multivariate asynchronous data, i.e., data with small time-offsets between the modalities
Advanced functionalities:
- apply FeatureCollection.reduce after feature selection for faster inference
- use function execution time logging to discover processing and feature extraction bottlenecks
- embedded SeriesPipeline & FeatureCollection serialization
- time series chunking
¹ These integrations are shown in integration-example notebooks.
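One common form of the time series chunking listed under the advanced functionalities is splitting a recording wherever the data contains a gap. The following is a rough NumPy sketch of that idea only; `split_at_gaps` and `max_gap` are hypothetical names, and tsflex's own chunking utilities may behave differently.

```python
import numpy as np


def split_at_gaps(times, max_gap):
    """Return (start, stop) index pairs of runs without large gaps.

    Hypothetical helper: `times` must be sorted ascending. A new chunk starts
    whenever two consecutive samples are more than `max_gap` apart.
    """
    times = np.asarray(times, dtype=float)
    if times.size == 0:
        return []
    # Index of the first sample after each gap.
    breaks = np.flatnonzero(np.diff(times) > max_gap) + 1
    bounds = np.concatenate(([0], breaks, [times.size]))
    return [(int(a), int(b)) for a, b in zip(bounds[:-1], bounds[1:])]


# 4 Hz samples with two interruptions (timestamps in seconds).
t = np.array([0.0, 0.25, 0.5, 0.75, 10.0, 10.25, 10.5, 30.0])
chunks = split_at_gaps(t, max_gap=1.0)
segments = [t[a:b] for a, b in chunks]
```

Each resulting segment can then be processed independently, which keeps windows from spanning a gap.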
- scikit-learn integration for both processing and feature extraction
  note: this is under active development on the sklearn integration branch.
- Support for time series segmentation (exposing the under-the-hood strided-rolling functionality); see this issue
- Support for multi-indexed dataframes
=> Also see the enhancement issues
We are thrilled to see your contributions to further enhance tsflex.
See this guide for more instructions on how to contribute.
If you use tsflex in a scientific publication, we would highly appreciate it if you cite us as:
@article{vanderdonckt2021tsflex,
author = {Van Der Donckt, Jonas and Van Der Donckt, Jeroen and Deprost, Emiel and Van Hoecke, Sofie},
title = {tsflex: flexible time series processing \& feature extraction},
journal = {SoftwareX},
year = {2021},
url = {https://github.com/predict-idlab/tsflex},
publisher={Elsevier}
}
Link to the paper: https://www.sciencedirect.com/science/article/pii/S2352711021001904
👤 Jonas Van Der Donckt, Jeroen Van Der Donckt, Emiel Deprost
