Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

πŸ› fix bug with bound_method + ✨ new integrations #62

Merged
merged 11 commits into from Jun 7, 2022

Conversation

jvdd
Copy link
Member

@jvdd jvdd commented Apr 18, 2022

This PR handles

(1) πŸ› a bug in the bound_method + sequence based strided rolling

  • check if .loc induces memory peak
  • agree on what behavior is preferred for segmentation indexing
  • Agreed behavior:
    • make window_idx="begin" default instead of "end"
    • sequences should be segmented into n segments if there are exactly n segments possible (e.g., window=2, stride=2 => 5 segments on sequence of length 10)
    • we remain the current behavior of the end-index of the segments.
  • extend tests

(2) ✨ extends integration with other feature extraction packages

(3): ⚑ faster irregular data check

(4): πŸ”₯ add kaggle TPSAPR2022 notebook to ML examples

@jvdd jvdd requested a review from jonasvdd April 18, 2022 20:21
@@ -367,7 +367,7 @@ def calculate(
# determing the bounds of the series dict items and slice on them
start, end = _determine_bounds(bound_method, list(series_dict.values()))
series_dict = {
n: s[s.index.dtype.type(start) : s.index.dtype.type(end)]
n: s.loc[s.index.dtype.type(start) : s.index.dtype.type(end)] # TODO: check memory efficiency of ths
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check memory efficiency of this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Runtime is the same as previous implementation βœ”οΈ
image

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Memory profiling indicates no memory peak βœ”οΈ
image

@jvdd
Copy link
Member Author

jvdd commented Apr 18, 2022

1 tests will fail (as we need to agree what indexing is preferred).
Until then, this PR stays a draft

@jvdd jvdd changed the title πŸ› fix bug with bound_method & sequence based strided rolling πŸ› fix bug with bound_method + ✨ new integrations Apr 20, 2022
@codecov-commenter
Copy link

codecov-commenter commented Apr 24, 2022

Codecov Report

Merging #62 (d19ae40) into main (2df1d47) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main      #62   +/-   ##
=======================================
  Coverage   97.73%   97.73%           
=======================================
  Files          23       23           
  Lines        1103     1106    +3     
=======================================
+ Hits         1078     1081    +3     
  Misses         25       25           
Impacted Files Coverage Ξ”
tsflex/features/feature_collection.py 100.00% <ΓΈ> (ΓΈ)
tsflex/__init__.py 100.00% <100.00%> (ΓΈ)
tsflex/features/integrations.py 100.00% <100.00%> (ΓΈ)
tsflex/features/segmenter/strided_rolling.py 95.50% <100.00%> (-0.08%) ⬇️
tsflex/features/utils.py 100.00% <100.00%> (ΓΈ)

Continue to review full report at Codecov.

Legend - Click here to learn more
Ξ” = absolute <relative> (impact), ΓΈ = not affected, ? = missing data
Powered by Codecov. Last update 2df1d47...d19ae40. Read the comment docs.

@jvdd jvdd marked this pull request as ready for review April 24, 2022 18:59
@jonasvdd
Copy link
Member

jonasvdd commented May 9, 2022

@jvdd, might also be interesting too look into:

Thanks to @jellevhb

@@ -21,6 +21,7 @@ tsflex is a domain independent package for time series processing & feature extr
| Climate modelling | [Ozone level detection](https://archive.ics.uci.edu/ml/datasets/Ozone%20Level%20Detection) | [example_ozone_level_detection.ipynb](https://github.com/predict-idlab/tsflex/blob/main/examples/example_ozone_level_detection.ipynb) |
| Household data | [Electric power consumption](https://archive.ics.uci.edu/ml/datasets/Individual+household+electric+power+consumption) | [example_power_consumption_estimation.ipynb](example_power_consumption_estimation.ipynb) |
| Clinical data | [Sleep-EDF Database Expanded](https://physionet.org/content/sleep-edfx/1.0.0/) | [example_sleep_staging.ipynb](example_sleep_staging.ipynb) |
| kaggle competition | [Tabular Playground Series - Apr 2022](https://www.kaggle.com/competitions/tabular-playground-series-apr-2022)| https://www.kaggle.com/code/jeroenvdd/tpsapr22-best-non-dl-model-tsflex-powershap |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: maybe state here that the data was already segmented -> so here you can find an example on how to use tsflex on already segmented data?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm not very proud about how we did it (considering the long table as one large series and having a stride that is equal to your sample size)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, will create an issue or extend an existing one with this topic

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants