Fix start of train split in TimeGapSplit and added n_split parameter #324
Hi @rpauli, thanks for the PR!
I like the change you're proposing, I think it's a nice addition to the current functionality. I do have a few questions though:
- Would it make sense to rename `n_splits` to `max_splits`? In the current implementation it seems possible to end up with fewer than `n_splits` splits, but never more.
- I've added some small questions on specific parts of the code.
- Could you add some of the checks you do in the notebook as automated `pytest` tests? Let me know if you need help with that.
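To make the `n_splits` versus `max_splits` question concrete, here is a minimal pure-Python sketch of a rolling train/gap/validation splitter. It is not scikit-lego's `TimeGapSplit` implementation: the function name `time_gap_windows`, its parameters, and the choice to spread the folds evenly over the available time range are all assumptions made for illustration. The point it shows is that a splitter can either return exactly `n_splits` folds or raise, instead of silently returning fewer.

```python
from datetime import datetime, timedelta


def time_gap_windows(start, end, train, valid, gap, n_splits=None):
    """Return (train_start, train_end, valid_start, valid_end) tuples.

    Hypothetical helper, not the scikit-lego API. Intervals are half-open
    and every fold lies inside [start, end].
    """
    window = train + gap + valid
    slack = (end - start) - window  # room left to slide the window
    if slack < timedelta(0):
        raise ValueError("train + gap + valid does not fit in the data")

    if n_splits is None:
        # Default: slide by one validation length, so validation folds
        # sit back to back and the fold count follows from the data.
        step = valid
        count = slack // step + 1
    elif n_splits < 1:
        raise ValueError("n_splits must be at least 1")
    elif n_splits == 1:
        step, count = timedelta(0), 1
    else:
        # Spread exactly n_splits folds evenly over the available slack.
        step = slack / (n_splits - 1)
        count = n_splits
        if step <= timedelta(0):
            # Explicit check: fail loudly rather than return fewer folds.
            raise ValueError(f"cannot fit {n_splits} distinct splits")

    folds = []
    for i in range(count):
        train_start = start + i * step
        train_end = train_start + train
        valid_start = train_end + gap
        folds.append((train_start, train_end, valid_start, valid_start + valid))
    return folds


folds = time_gap_windows(
    start=datetime(2020, 1, 1),
    end=datetime(2020, 1, 31),
    train=timedelta(days=10),
    valid=timedelta(days=5),
    gap=timedelta(days=2),
    n_splits=4,
)
for fold in folds:
    print(fold)
```

With a check like this in place the parameter behaves as an exact count, so the name `n_splits` stays accurate; without it, `max_splits` would describe the behaviour better.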
Fewer shouldn't be possible; I added this check for it:
I'll come up with a suggestion later today, thanks! To make this explicit:
@rpauli I like these changes. Two quick things though.
There is no rush, but just to check: @rpauli, are you waiting for feedback from us?
I added some more explicit assert statements and added the strict tag to the tests that are supposed to fail. I noticed you handled that differently in test_timegapsplit_too_big_gap; should I also change it to catch the exception? I ran the ipynb before pushing; is this what you mean by rendering it?
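The two testing styles mentioned here can be sketched side by side. The helper `check_gap` and both test names below are hypothetical and are not taken from the scikit-lego test suite; only `pytest.mark.xfail` and `pytest.raises` are real pytest features. A strict `xfail` reports the test as an expected failure and is generally intended for known bugs, while `pytest.raises` asserts that raising the exception is the intended behaviour.

```python
import pytest


def check_gap(n_rows, train, gap, valid):
    """Hypothetical validation helper: raise if a single fold cannot fit."""
    if train + gap + valid > n_rows:
        raise ValueError("gap too big for the data")
    return True


# Style 1: mark the test as an expected failure. With strict=True the
# suite fails if the test unexpectedly passes.
@pytest.mark.xfail(strict=True, raises=ValueError)
def test_too_big_gap_xfail():
    check_gap(n_rows=10, train=5, gap=10, valid=2)


# Style 2: assert that the error is raised. The test passes only when
# a ValueError with a matching message occurs inside the block.
def test_too_big_gap_raises():
    with pytest.raises(ValueError, match="gap too big"):
        check_gap(n_rows=10, train=5, gap=10, valid=2)
```

Running `pytest` on a file with these tests reports the first as `xfailed` and the second as `passed`.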
Yes 👍
I've only got one comment about the tests, but it is starting to look green to me. @MBrouns?
I reread some pytest docs and it seems I misunderstood how xfail is supposed to be used, changed it to use
It is merged. I will now also make a release with this feature in it so that you can use it right away. @rpauli just to check, have we met in real life at a PyData by any chance? I'm curious to hear how you discovered this package.
Also, @rpauli if you have a Twitter handle I can mention you when I announce the update.
And it's live with a version bump: https://pypi.org/project/scikit-lego/0.4.2/#history
I saw one of your PyData talks on Gaussian processes and outlier detection and found this package, which has things I had also implemented at work (although less structured), so I decided to contribute a bit.
Great to hear, first open source contribution I wasn't paid for!
Addresses changes in #192 and #232
I am currently working on a time series problem with vibration data where I needed functionality like that suggested in #232, so I decided to add it here.
I tried to explain the changes in functionality in the docstring:
The changes are also added to the docs notebook for visualization.