🐛 fix bug with bound_method + ✨ new integrations #62

jvdd · 2022-04-18T20:20:00Z

This PR handles

(1) 🐛 a bug in the bound_method + sequence-based strided rolling

check whether .loc induces a memory peak
agree on which behavior is preferred for segmentation indexing
Agreed behavior:
- make window_idx="begin" the default instead of "end"
- a sequence should be split into n segments if exactly n segments fit (e.g., window=2, stride=2 => 5 segments on a sequence of length 10)
- we keep the current behavior for the end index of the segments.
extend tests
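
The agreed segmentation rule can be sketched in plain Python. This is only an illustration of the counting rule and of the two `window_idx` labelings, not the tsflex implementation; the helper names `segment_bounds` and `segment_index` are hypothetical, and treating the end position as exclusive is an assumption made for this sketch.

```python
from typing import List, Tuple


def segment_bounds(n_samples: int, window: int, stride: int) -> List[Tuple[int, int]]:
    """Return (begin, end) positions of every full window (hypothetical helper).

    The end position is exclusive here; that convention is an assumption.
    """
    if n_samples < window:
        return []
    # Every window that fits completely is emitted, including the last one.
    n_segments = (n_samples - window) // stride + 1
    return [(i * stride, i * stride + window) for i in range(n_segments)]


def segment_index(bounds: List[Tuple[int, int]], window_idx: str = "begin") -> List[int]:
    """Label each segment by its begin or end position (hypothetical helper)."""
    if window_idx == "begin":
        return [begin for begin, _ in bounds]
    if window_idx == "end":
        return [end for _, end in bounds]
    raise ValueError(f"unsupported window_idx: {window_idx!r}")


bounds = segment_bounds(10, window=2, stride=2)
print(len(bounds))                     # 5
print(segment_index(bounds, "begin"))  # [0, 2, 4, 6, 8]
print(segment_index(bounds, "end"))    # [2, 4, 6, 8, 10]
```

With window=2 and stride=2 on a sequence of length 10, exactly five windows fit, which matches the example in the agreed behavior.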

(2) ✨ extends integration with other feature extraction packages

add and test catch22 integration wrapper
❌ add and test scikit-learn transformer wrapper
Decided not to do this (for now): the implementation is ambiguous at the moment -> more details in [Feature Request] Integration with pyts #56
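
A rough sketch of what such an integration wrapper has to do, assuming the wrapped function returns a dict with `"names"` and `"values"` keys (which is how I understand `catch22_all` to behave). The stand-in `fake_catch22_all` and the helper `wrap_names_values` are hypothetical and do not reproduce the wrapper added in this PR.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def fake_catch22_all(data: Sequence[float]) -> Dict[str, list]:
    """Hypothetical stand-in that mimics a names/values return layout."""
    mean = sum(data) / len(data)
    return {"names": ["mean", "range"], "values": [mean, max(data) - min(data)]}


def wrap_names_values(func: Callable) -> Tuple[Callable, List[str]]:
    """Split a names/values function into (values-only callable, output names)."""
    # Probe once with dummy data to discover the output names up front.
    output_names = list(func([0.0, 1.0])["names"])

    def wrapped(data: Sequence[float]) -> list:
        # Only the values are returned; the names are fixed at wrap time.
        return func(data)["values"]

    return wrapped, output_names


feature_func, names = wrap_names_values(fake_catch22_all)
print(names)                            # ['mean', 'range']
print(feature_func([1.0, 2.0, 3.0]))    # [2.0, 2.0]
```

Fixing the output names at wrap time lets a feature-extraction framework register one column per name before any data is processed.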

(3): ⚡ faster irregular data check

(4): 🔥 add kaggle TPSAPR2022 notebook to ML examples

jvdd · 2022-04-18T20:21:35Z

tsflex/features/feature_collection.py

@@ -367,7 +367,7 @@ def calculate(
        # determing the bounds of the series dict items and slice on them
        start, end = _determine_bounds(bound_method, list(series_dict.values()))
        series_dict = {
-            n: s[s.index.dtype.type(start) : s.index.dtype.type(end)]
+            n: s.loc[s.index.dtype.type(start) : s.index.dtype.type(end)]  # TODO: check memory efficiency of ths


Check memory efficiency of this

Runtime is the same as previous implementation ✔️

Memory profiling indicates no memory peak ✔️
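
The thread does not spell out the cause of the bug, so the following is only an illustration of why `.loc` can matter for sequence (integer) indexes: `.loc` slices by label with both bounds inclusive, whereas positional slicing interprets the same numbers as offsets. As far as I know, plain `s[start:end]` with integer bounds on an integer index is also positional, but that is not asserted here. The series and bounds are made up for the example.

```python
import pandas as pd

# Integer index that does not start at 0, so labels and positions differ.
s = pd.Series(range(10), index=range(100, 110))
start, end = 102, 105

by_label = s.loc[start:end]      # label-based, both bounds inclusive
by_position = s.iloc[start:end]  # position-based, end exclusive

print(list(by_label.index))  # [102, 103, 104, 105]
print(len(by_position))      # 0
```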

jvdd · 2022-04-18T20:22:27Z

1 test will fail (as we still need to agree on which indexing is preferred).
Until then, this PR stays a draft.

tsflex/features/segmenter/strided_rolling.py

tests/test_features_feature_collection.py

codecov-commenter · 2022-04-24T18:58:06Z

Codecov Report

Merging #62 (d19ae40) into main (2df1d47) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main      #62   +/-   ##
=======================================
  Coverage   97.73%   97.73%           
=======================================
  Files          23       23           
  Lines        1103     1106    +3     
=======================================
+ Hits         1078     1081    +3     
  Misses         25       25

| Impacted Files | Coverage Δ | |
|---|---|---|
| tsflex/features/feature_collection.py | `100.00% <ø> (ø)` | |
| tsflex/__init__.py | `100.00% <100.00%> (ø)` | |
| tsflex/features/integrations.py | `100.00% <100.00%> (ø)` | |
| tsflex/features/segmenter/strided_rolling.py | `95.50% <100.00%> (-0.08%)` | ⬇️ |
| tsflex/features/utils.py | `100.00% <100.00%> (ø)` | |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

jonasvdd · 2022-05-09T14:49:08Z

@jvdd, it might also be interesting to look into:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.resample.html
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rolling.html?highlight=rolling#pandas.DataFrame.rolling
Maybe we can use
the same naming convention
or even reuse their underlying logic (i.e., don't reinvent the wheel)

Thanks to @jellevhb
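
For reference, a small sketch of the pandas convention mentioned above: `resample` exposes a `label` argument that decides whether each bin is indexed by its left or right edge, which is loosely analogous to `window_idx="begin"` versus `"end"`. The data is made up, and the analogy is mine rather than something stated in the thread.

```python
import pandas as pd

idx = pd.date_range("2022-01-01", periods=10, freq="min")
s = pd.Series(range(10), index=idx)

# Same 2-minute bins, labeled by their left or right edge.
left = s.resample("2min", label="left").sum()
right = s.resample("2min", label="right").sum()

print(left.index[0])   # 2022-01-01 00:00:00
print(right.index[0])  # 2022-01-01 00:02:00
print(left.tolist())   # [1, 5, 9, 13, 17]
```

Only the index differs between the two results; the aggregated values are identical.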

jonasvdd · 2022-06-06T17:38:07Z

examples/README.md

@@ -21,6 +21,7 @@ tsflex is a domain independent package for time series processing & feature extr
 | Climate modelling | [Ozone level detection](https://archive.ics.uci.edu/ml/datasets/Ozone%20Level%20Detection) | [example_ozone_level_detection.ipynb](https://github.com/predict-idlab/tsflex/blob/main/examples/example_ozone_level_detection.ipynb) |  
 | Household data | [Electric power consumption](https://archive.ics.uci.edu/ml/datasets/Individual+household+electric+power+consumption) | [example_power_consumption_estimation.ipynb](example_power_consumption_estimation.ipynb) |
 | Clinical data | [Sleep-EDF Database Expanded](https://physionet.org/content/sleep-edfx/1.0.0/) | [example_sleep_staging.ipynb](example_sleep_staging.ipynb) |
+| kaggle competition | [Tabular Playground Series - Apr 2022](https://www.kaggle.com/competitions/tabular-playground-series-apr-2022)| https://www.kaggle.com/code/jeroenvdd/tpsapr22-best-non-dl-model-tsflex-powershap | 


TODO: maybe state here that the data was already segmented, so that readers can find an example of how to use tsflex on already segmented data?

Hmm, not very proud of how we did it (treating the long table as one large series and using a stride equal to the sample size)

Okay, will create an issue or extend an existing one on this topic
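
The workaround described above can be sketched without tsflex: when equally long, already segmented samples are stacked into one long series, a window and stride both equal to the sample length recover exactly one window per sample. The helper `split_presegmented` and the toy data are hypothetical.

```python
from typing import List, Sequence


def split_presegmented(values: Sequence[float], segment_len: int) -> List[Sequence[float]]:
    """Cut a stacked series into its original samples (hypothetical helper).

    Equivalent to strided rolling with window == stride == segment_len.
    """
    if len(values) % segment_len != 0:
        raise ValueError("series length must be a multiple of segment_len")
    return [values[i:i + segment_len] for i in range(0, len(values), segment_len)]


# Three samples of length 3, stacked into one long series.
long_table = [1, 2, 3, 10, 20, 30, 100, 200, 300]
segments = split_presegmented(long_table, segment_len=3)
means = [sum(seg) / len(seg) for seg in segments]
print(means)  # [2.0, 20.0, 200.0]
```

This only works when every sample has the same length and the rows are stored contiguously, which is why it is a workaround rather than a general approach.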

tsflex/features/segmenter/strided_rolling.py

🐛 fix bug with bound_method & sequence based strided rolling

abcf5a0

jvdd requested a review from jonasvdd April 18, 2022 20:21

jvdd added 3 commits April 19, 2022 18:05

✨ integrate with catch22

503c011

🧹

18f1a37

🖊️ document

9866d10

jvdd changed the title ~~🐛 fix bug with bound_method & sequence based strided rolling~~ 🐛 fix bug with bound_method + ✨ new integrations Apr 20, 2022

⚡ faster irregular data check

c656c75

♻️ update default window_idx + add & update tests

aab71f8

🧹

a377792

jvdd marked this pull request as ready for review April 24, 2022 18:59

jvdd mentioned this pull request Jun 1, 2022

[MRG] Remove deprecated closed argument in pd.daterange #64

Merged

jvdd added 2 commits June 1, 2022 13:58

Merge branch 'main' into bound_bug

9a7484c

🔥 add kaggle TPSAPR22 notebook

f5aef3f

jonasvdd reviewed Jun 6, 2022

jonasvdd and others added 2 commits June 6, 2022 19:52

📝 review

b50b978

🙈 formatting

d19ae40

jonasvdd merged commit 468f2b3 into main Jun 7, 2022

jvdd mentioned this pull request Jun 7, 2022

Alignment of feature windows when equal stride #59

Open

jvdd mentioned this pull request Jun 16, 2022

Question: Feature extraction on time series batch #67

Open

jvdd mentioned this pull request Jun 24, 2022

♻️ refactor indexing + ✂️ decouple stride & window + ✨ support segment idxs #71

Merged

31 tasks
