
Group time series split #915

Merged
merged 32 commits into from
May 27, 2022

Conversation

labdmitriy
Contributor

@labdmitriy labdmitriy commented Apr 23, 2022

Code of Conduct

Description

Add group time series cross-validator implementation.
Add tests with 100% coverage using pytest.
I decided to create the pull request before writing the documentation and changelog modification, so that we can discuss the current implementation and the further steps to implement.
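For readers unfamiliar with the idea, the core behavior of a group-aware time series split can be sketched in plain Python. This is a simplified illustration with hypothetical parameter names (`train_size`, `test_size`), not the PR's actual implementation: samples are bucketed by group in order of first appearance, and each fold trains on a window of earlier groups and tests on the groups that immediately follow.

```python
# Illustrative sketch (not mlxtend's code): group-wise time series folds.
def group_time_series_folds(groups, train_size, test_size):
    # Bucket sample indices by group, preserving order of first appearance.
    order, buckets = [], {}
    for idx, g in enumerate(groups):
        if g not in buckets:
            order.append(g)
            buckets[g] = []
        buckets[g].append(idx)

    folds = []
    start = 0
    while start + train_size + test_size <= len(order):
        train_groups = order[start:start + train_size]
        test_groups = order[start + train_size:start + train_size + test_size]
        # Expand group windows back into sample indices.
        train_idx = [i for g in train_groups for i in buckets[g]]
        test_idx = [i for g in test_groups for i in buckets[g]]
        folds.append((train_idx, test_idx))
        start += test_size
    return folds

groups = [0, 0, 1, 1, 1, 2, 2, 3, 3, 4]
for train_idx, test_idx in group_time_series_folds(groups, train_size=2, test_size=1):
    print(train_idx, "->", test_idx)
```

The key invariant is that test samples always come from groups that appear strictly after the training groups, so no future information leaks into training.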

Related issues or pull requests

Fixes #910

Pull Request Checklist

  • Added a note about the modification or contribution to the ./docs/sources/CHANGELOG.md file (if applicable)
  • Added appropriate unit test functions in the ./mlxtend/*/tests directories (if applicable)
  • Modify documentation in the corresponding Jupyter Notebook under mlxtend/docs/sources/ (if applicable)
  • Ran PYTHONPATH='.' pytest ./mlxtend -sv and make sure that all unit tests pass (for small modifications, it might be sufficient to only run the specific test file, e.g., PYTHONPATH='.' pytest ./mlxtend/classifier/tests/test_stacking_cv_classifier.py -sv)
  • Checked for style issues by running flake8 ./mlxtend

@labdmitriy labdmitriy changed the title Group time series Group time series split Apr 23, 2022
@labdmitriy
Contributor Author

Sorry, I mistakenly added 2 files (settings.json and conda_requirements.txt); I removed them in additional commits.

Owner

@rasbt rasbt left a comment


Thanks for the PR! Wow, this looks really good and super professional! While looking over the code, I had these two thoughts here:

@@ -40,4 +41,5 @@
"RandomHoldoutSplit", "PredefinedHoldoutSplit",
"ftest", "combined_ftest_5x2cv",
"proportion_difference", "bias_variance_decomp",
"accuracy_score", "create_counterfactual"]
"accuracy_score", "create_counterfactual",
"time_series"]
Owner

I think "time_series" should be "GroupTimeSeriesSplit" here

Contributor Author

Yes, I was wrong; I corrected and committed the changes.

import numpy as np
import pytest
from mlxtend.evaluate import GroupTimeSeriesSplit

Owner

The whole unit test suite here looks pretty comprehensive to me. I wonder if we could add a computationally cheap scikit-learn-related test though. E.g., plugging it into cross_val_score or GridSearchCV as cv argument?

Contributor Author

Added a test that checks usage with cross_val_score, based on DummyClassifier with the "most_frequent" strategy.
With this split we get 3 folds and the following true/predicted test targets and accuracies:

y y_pred accuracy
[1 1] [0 0] 0
[0 1] [1 1] 0.5
[1 0 0 0] [1 1 1 1] 0.25
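As a sanity check on the numbers above, the per-fold accuracies and their mean can be recomputed directly in plain Python, independent of scikit-learn (a "most_frequent" dummy classifier predicts the training fold's majority class for every test sample, so each fold's accuracy is just the fraction of matches):

```python
# Per-fold (y_true, y_pred) pairs as described in the comment above.
folds = [
    ([1, 1], [0, 0]),              # fold 1: accuracy 0
    ([0, 1], [1, 1]),              # fold 2: accuracy 0.5
    ([1, 0, 0, 0], [1, 1, 1, 1]),  # fold 3: accuracy 0.25
]

accuracies = [
    sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    for y_true, y_pred in folds
]
print(accuracies)                         # [0.0, 0.5, 0.25]
print(sum(accuracies) / len(accuracies))  # mean accuracy: 0.25
```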

Owner

@rasbt rasbt May 11, 2022

This is nice, thanks! The main reason I suggested it is to check for API compatibility. I am pretty sure it is compatible with GridSearchCV and cross_val_score, but you never know. It also makes sure it still works in case we modify it in the future, or in case the scikit-learn API changes.

What I had in mind was something like:

from mlxtend.evaluate import GroupTimeSeriesSplit
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

lr = LogisticRegression()

cv = GroupTimeSeriesSplit(...)
cross_val_score(lr, X, y, cv=cv)

@labdmitriy
Contributor Author

labdmitriy commented Apr 28, 2022

Hi Sebastian,

While implementing the requested changes, I came up with some questions:

  1. When I change something in my feature branch and upstream/master changes during feature development, should I merge the master branch into my feature branch even if I don't have any conflicting changes? Sorry if this is a basic question, but I tried to search for best practices and didn't find a common solution. I tried to rebase, but as I understand it, that is not a good idea once I have already pushed my feature branch publicly, and I also ended up with diverged branches after that. So for now I decided to just change my code and commit/push it without any merging or rebasing.

  2. sklearn has the following statement about the __init__() method in custom estimators:
    "There should be no logic, not even input validation, and the parameters should not be changed."
    In the current implementation of GroupTimeSeriesSplit, in the __init__() method I check only parameters that do not require any calculations, and the rest of the logic is implemented in the split() method, which invokes self._calculate_split_params() after these checks. I think that it looks pretty weird.
    What do you think would be the best option here:

  • Move all checks, including self._calculate_split_params(), to the __init__() method
  • Move all checks to the split() method
  • Keep everything "as is"
    Update: I remembered that groups are used only in the split() method, not __init__(), which is why I implemented several checks in split().
  3. Where is the best place to discuss this feature - here in the pull request or in the related issue?

  4. When you have time, could you take a look at the questions in the related issue? I probably asked too many questions, and I am ready to group them somehow if required.
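The scikit-learn convention quoted in point 2 can be sketched as follows: a minimal, hypothetical splitter (the class name and checks are illustrative, not the PR's actual code) where __init__() only stores parameters verbatim, and split() performs all validation, since the data-dependent `groups` argument is only available there:

```python
# Illustrative sketch of the sklearn-style estimator convention.
class SketchSplitter:
    def __init__(self, test_size, n_splits=None):
        # No logic, no validation: just record the constructor arguments.
        self.test_size = test_size
        self.n_splits = n_splits

    def split(self, X, y=None, groups=None):
        # Validate lazily here, because `groups` is only known at split time.
        if groups is None:
            raise ValueError("The 'groups' parameter is required.")
        if self.test_size < 1:
            raise ValueError("'test_size' must be a positive integer.")
        # ... fold computation would follow here ...
        yield from ()

splitter = SketchSplitter(test_size=0)  # no error yet: __init__ has no checks
try:
    next(splitter.split(X=None, groups=[0, 0, 1]))
except ValueError as e:
    print(e)  # 'test_size' must be a positive integer.
```

Deferring validation this way keeps __init__ compatible with tools like `clone()` and grid search, which read and rewrite constructor parameters.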

Thank you.

@labdmitriy labdmitriy requested a review from rasbt April 28, 2022 13:43
@pep8speaks

pep8speaks commented May 1, 2022

Hello @labdmitriy! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2022-05-25 06:50:32 UTC

@labdmitriy
Contributor Author

I've implemented additional changes:

  • Loosened the restrictions on the group sequences. They are no longer required to be sorted in increasing order, only to be consecutive. The reason is that the splitter can now accept not only group numbers but also group names, where sorting can be confusing.
  • Added several tests for group names.
  • I saw that you reformatted the code with black, and I decided to do the same for the PR code. It seems that a max line length of 79 chars is now required (according to the pep8speaks bot messages), so I changed black's line length from the default (88) to 79 and reformatted again.
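The relaxed constraint described above can be sketched as a small check (an illustrative helper, not the PR's code): group labels may appear in any order, but each group must occupy one contiguous block, i.e., once a new group starts, an earlier group may not reappear.

```python
# Illustrative sketch: "consecutive" groups = each label forms one contiguous run.
def groups_are_consecutive(groups):
    seen = set()
    previous = object()  # sentinel that never equals a real label
    for g in groups:
        if g != previous:
            if g in seen:
                return False  # group reappeared after being interrupted
            seen.add(g)
            previous = g
    return True

print(groups_are_consecutive([0, 5, 5, 2, 2, 2, 1, 1]))  # True: unsorted but contiguous
print(groups_are_consecutive([0, 1, 0]))                 # False: group 0 reappears
```

This is why string group names work: no ordering of the labels themselves is ever needed, only contiguity of the runs.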

@labdmitriy
Contributor Author

labdmitriy commented May 1, 2022

There are several updates to __init__.py because imports are automatically sorted by VS Code (using the built-in isort integration), whereas the imports in the file are not sorted; so, to avoid changing your import order, I edited the file separately in a plain text editor.
Should I resolve the conflicts in this file myself?

@rasbt
Owner

rasbt commented May 3, 2022

Thanks a lot for the PR! Sorry, I recently got really swamped with tasks due to some paper reviews and following up on other PRs! I will hopefully have more time again soon!

  1. When I change something in my feature branch and upstream/master changes during feature development, should I merge the master branch into my feature branch even if I don't have any conflicting changes? Sorry if this is a basic question, but I tried to search for best practices and didn't find a common solution. I tried to rebase, but as I understand it, that is not a good idea once I have already pushed my feature branch publicly, and I also ended up with diverged branches after that. So for now I decided to just change my code and commit/push it without any merging or rebasing.

This is a good point. Unless it affects your code, there is nothing to worry about. GitHub will automatically flag conflicts if they appear, and we can deal with them in the web interface then.

Btw I resolved one of the issues, and you may have to pull on your end.

  1. sklearn has the following statement about the __init__() method in custom estimators:
    "There should be no logic, not even input validation, and the parameters should not be changed."
    In the current implementation of GroupTimeSeriesSplit, in the __init__() method I check only parameters that do not require any calculations, and the rest of the logic is implemented in the split() method, which invokes self._calculate_split_params() after these checks. I think that it looks pretty weird.
    What do you think would be the best option here

That's a tricky one. Personally, I feel like it's fine to check, e.g., input arguments. It's not strictly scikit-learn consistent, but I don't see any harm in it, to be honest. For example, I also have an __init__ check in BootstrapOutOfBag, and I don't recall it ever causing any issues (https://github.com/rasbt/mlxtend/blob/master/mlxtend/evaluate/bootstrap_outofbag.py#L44).

  3. Where is the best place to discuss this feature - here in the pull request or in the related issue?

I'd say it's best to discuss it here so that we don't have to jump back and forth too much.

  4. When you have time, could you take a look at the questions in the related issue? I probably asked too many questions, and I am ready to group them somehow if required.

No worries, it's all good! :). I try to keep on top of things, but I am a little bit time constrained this week.

@rasbt
Owner

rasbt commented May 3, 2022

Good point regarding the pep8/black discrepancy. I was thinking about this a bit, but maybe it's just time to adjust to more modern times and use the 88 character limit rather than the 79.

Then, if users are strict about 79 chars, it should still be okay when they submit. On the other hand, if they are at 88 characters, they don't get a complaint either. I think this will also make PRs a bit more frictionless while still maintaining the recommended style. I will adjust the pep8 checker via #920

@labdmitriy
Contributor Author

labdmitriy commented May 3, 2022

Hi Sebastian,

Thanks a lot for the PR! Sorry, I recently got really swamped with tasks due to some paper reviews and following up on other PRs! I will hopefully have more time again soon!

No problem, I will also have more free time over the next 2 weeks, so I can answer more quickly too.

Btw I resolved one of the issues, and you may have to pull on your end.

I guess you mean black reformatting and resolving conflict in __init__.py - I will pull it in my local feature branch.

That's a tricky one. Personally, I feel like it's fine to check, e.g., input arguments. It's not strictly scikit-learn consistent, but I don't see any harm in it, to be honest. For example, I also have an __init__ check in BootstrapOutOfBag, and I don't recall it ever causing any issues (https://github.com/rasbt/mlxtend/blob/master/mlxtend/evaluate/bootstrap_outofbag.py#L44).

Great! Then I will keep it 'as is'.

I'd say it's best to discuss it here so that we don't have to jump back and forth too much.

Great, ok!

No worries, it's all good! :). I try to keep on top of things, but I am a little bit time constrained this week.

Ok, thank you!

  • I also noticed that the lint check was not successful after your last merge to origin/group-times-series. Because you changed the pep8 configuration to 88, I reconfigured black/flake8 in my setup and reformatted the files again (including trailing commas), and now the check passes.
  • Are you considering the option of automatically sorting the imports using isort?
  • For my projects I use pre-commit hooks for autoformatting/linting; it seems very convenient and probably reduces the number of review iterations. Maybe such a predefined configuration would be useful for contributors?
  • Do I need to prepare something for the next implementation steps until you have more free time?
  • I noticed that there are some typos in Quick Contributor Checklist after updating to black:
  1. Check the autimated tests passed.
  2. The atuomatic PEP8/black integrations may prompt you to modify the code stylistically. It would be nice if you could apply the suggested changes.

Thank you.

@rasbt
Owner

rasbt commented May 13, 2022

  1. Ohhh, I was maybe looking at the old file. You are totally right! The new test function you added with cross_val_score is totally sufficient!

To be honest, the code looks really good to me now. The next thing on my list is to check out the Jupyter Nb then.

Yeah, I personally also always (often) separated standard lib imports and 3rd party imports (when I don't forget). So, I like the idea of adding the known_first_party parameter. I can add that.

I understand that you have too many projects, so it is not a problem at all. If you want, we can take a break and continue with this pull request later, or even stop it altogether. The goal was to collaborate with you and gain experience in contributing, and of course this is not the highest-priority thing :)

Oh, I definitely don't want to drop this. It's a really nice and useful PR. I also learned a lot regarding black & isort. Very useful!

I will try to review the Nb either later tonight or tomorrow early morning to give some more feedback.

@labdmitriy
Contributor Author

To be honest, the code looks really good to me now. The next thing on my list is to check out the Jupyter Nb then.

Great!
As I mentioned before, the Jupyter notebook is just a draft, and your feedback will be very useful because I don't know what is expected in the final version.
Thank you!

Yeah, I personally also always (often) separated standard lib imports and 3rd party imports (when I don't forget). So, I like the idea of adding the known_first_party parameter. I can add that.

I made a comment in the relevant pull request about the updated configuration; probably mlxtend was implied, not biopandas.

Oh, I definitely don't want to drop this. It's a really nice and useful PR. I also learned a lot regarding black & isort. Very useful!

I will try to review the Nb either later tonight or tomorrow early morning to give some more feedback.

Excellent!

This sounds super cool to be honest. Haha, but given that I currently already have too many things on which I am far behind, I should probably say "no." It's not that I am not interested, but I really need to finish things before I start new things!

I've been thinking about it and would like to suggest returning to this task when you have time, if it is still of interest to you.
Perhaps by that time I will already have several articles on this topic, and we could update the process in your library.
I think it might be useful for both of us.

@rasbt
Owner

rasbt commented May 14, 2022

The documentation is a great start. It looks very comprehensive, and I love the plots. What's nice about them is that they are automatically generated, so that this allows us and the users to create the plots for all kinds of scenarios.

However, regarding the documentation of use cases, I don't think it needs to be exhaustive and show all possible ways you can use it. This would be very overwhelming.

My suggestions are to focus on a few, but give the users the tools to explore the other ones if they are interested. (I.e., if we have a few well explained examples, users can copy and modify them).

So, concretely, here are a few suggestions:

  • Move the helper function into a utility module in the library so that it can be imported without taking up too much attention/space in the documentation. This will make the documentation more readable. You can move it into mlxtend.evaluate.time_series. These should be private methods, though, that shouldn't come up in the API documentation. I think that happens automatically as long as you don't add them to __init__.py.

  • It would be nice if each headline could tell the reader what the demonstration or use case is that they are reading about. I think there only need to be a few sections in my opinion. For example,

Your first example could be:

  1. A time series cross-validation iterator with multiple training groups

Then, the second one could be

  2. Defining the gap size between training and test folds

  3. Expanding the window size

and that's it.

  • For each example, I would have the visualization followed by a "Usage in CV" example. This way, readers don't have to read the whole docs to figure that out. Each example will be self-contained this way.

  • I would get rid of the failure cases in the docs. They are more like unit tests, I would say. The errors are descriptive enough for users to figure these issues out if they encounter them.

  • Once the general structure is there, it would be nice to add some more descriptive text

@labdmitriy
Contributor Author

Hi Sebastian,

Thanks a lot for your feedback, I will prepare the notebook based on your requirements and will push all the changes.

@labdmitriy
Contributor Author

labdmitriy commented May 15, 2022

Hi Sebastian,

I've made changes in Jupyter notebook and the code based on your comments, and have a few questions/notes:

Notes

  • Based on the fact that there are 2 combinations of required parameters (test_size + train_size and test_size + n_splits), I decided to include 4 examples in the notebook.
  • I decided to include plot, split, and cv usage information for each example.
  • I added a last cell to the notebook which includes the API description (similar to the other notebooks).
  • I have one problem with highlighting when I print the split/cv info: the word with the colon at the end is highlighted in red.
    I found a similar issue on GitHub, but don't understand how to solve it in this case; could you please help with it? You can see it after you render the documentation.

Questions

  • Should I still add *_files directory for GroupTimeSeriesSplit?

  • I noticed that there are some differences in the isort/black configuration between the pre-commit hook and CI:

    isort
    Pre-commit hook: args: ["--profile", "black"]
    CI: isort --check --diff --line-length 88 --multi-line 3 --py 39 --profile black mlxtend/*

    black
    Pre-commit hook: language_version: python3.9
    CI: black --check --diff mlxtend/*

    I checked default isort configuration for black profile:

    multi_line_output: 3
    include_trailing_comma: True
    force_grid_wrap: 0
    use_parentheses: True
    ensure_newline_before_comments: True
    line_length: 88
    

    Probably the line length and multiline settings are not required to be specified manually.
    But perhaps the Python versions can be inconsistent, because isort and black have the following default Python version parameters (considering that Python version 3.8 is specified in CI):

    isort

    --py {all,2,27,3,310,35,36,37,38,39,auto}, --python-version {all,2,27,3,310,35,36,37,38,39,auto}
                            Tells isort to set the known standard library based on the specified Python version. Default is to assume any
                            Python 3 version could be the target, and use a union of all stdlib modules across versions. If auto is specified,
                            the version of the interpreter used to run isort (currently: 38) will be used.
    

    black

    -t, --target-version [py33|py34|py35|py36|py37|py38|py39|py310]
                                      Python versions that should be supported by
                                      Black's output. [default: per-file auto-
                                      detection]
    

    Also, to start using the isort/black/flake8 rules for development, a contributor has to specify all the configurations from the hook or CI manually (and also install these libraries) to get format-on-save in an IDE.

    Maybe configuration files like .isort.cfg and a black configuration file would be a solution for having consistent rules everywhere? There are more modern approaches to configuration like pyproject.toml, but unfortunately it is still not supported by flake8.
    Here is a good description of configuration using different files.

    Also, I saw that for flake8 you have the configuration file .flake8 that can be used by an IDE, the default configuration is used for the pre-commit hook (as I understand from here, we need to specify arguments for hooks explicitly), and CI has yet other custom settings.

    And in the pre-commit configuration file you have the order black-flake8-isort; is that the correct order (linting would be executed before all formatting steps are completed)? Maybe flake8 should be the last step?

Again I am asking too many questions 😄, but I think (and hope 🤞) they can be useful.

Thank you.

@labdmitriy
Contributor Author

Hi @rasbt,

Could you please tell me whether I need to improve anything else for this PR?

Thank you.

@rasbt
Owner

rasbt commented May 20, 2022

I really like this restructured version! A few points that I think can be improved

1

[Screenshot: Screen Shot 2022-05-20 at 3 26 20 PM]

It's nice to start off with a general problem introduction (just as you did). However, considering that there is https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.TimeSeriesSplit.html, people might also be curious about the relationship to TimeSeriesSplit and the features GroupTimeSeriesSplit adds.

2

A few words about the example data would be helpful.
[Screenshot: Screen Shot 2022-05-20 at 3 29 05 PM]

First, I would put the features and targets first, and then start with something like "For the following examples, we are creating an example dataset consisting of 16 training data points ...". And then you can explain that we create 6 different groups so that the first training example belongs to group 0, the next 4 to group 1, and so forth.

Btw, a side question about the implementation: do the groups have to be in consecutive order? Or could it be

groups = np.array([0, 5, 5, 5, 5, 2, 2, 2, 3, 3, 4, 4, 1, 1, 1, 1])

3

For each example here, it would also be nice to start with a few words describing what we are looking at:
[Screenshot: Screen Shot 2022-05-20 at 3 35 02 PM]

Otherwise it is pretty good and much more accessible than before! Thanks for the update!

@rasbt
Owner

rasbt commented May 20, 2022

I have one problem with highlighting when I print the split/cv info: the word with the colon at the end is highlighted in red.
I found a similar issue (mkdocs/mkdocs#902 (comment)) on GitHub, but don't understand how to solve it in this case; could you please help with it? You can see it after you render the documentation.

Sure, let's revisit this when we have the final version. I can test this in my local mkdocs version then.

Should I still add *_files directory for GroupTimeSeriesSplit?

That would not be necessary. I will also remove the other folders in the future to save space on GitHub


Good points regarding the CI/workflow setups. Regarding line length, I just wanted to be explicit about that as a visual aid (so that it is clear that this is 88 without knowing the defaults). I think some people are still new to black and perhaps expect it to be 79.

I think I had some issues with multiline, which is why I added it, but to be honest I don't remember. Btw, do you know if you can run it with --py 39 in older Python versions without issues?

The inconsistency you mentioned refers to the missing --py 39 in .pre-commit-config.yaml?

@labdmitriy
Contributor Author

Hi @rasbt,

It's nice to start off with a general problem introduction (just as you did). However, considering that there is https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.TimeSeriesSplit.html, people might also be curious about the relationship to TimeSeriesSplit and the features GroupTimeSeriesSplit adds.

I described the advantages of this implementation over scikit-learn's TimeSeriesSplit

First, I would put the features and targets first, and then start with something like "For the following examples, we are creating an example dataset consisting of 16 training data points ...". And then you can explain that we create 6 different groups so that the first training example belongs to group 0, the next 4 to group 1, and so forth.

I created the dataset with features and specified the months as the index to make the usage examples clearer; that is why I initially defined the groups and months before the features and target.
Now I have reordered these sections per your recommendations and just set the index at the end of the code block.

Btw, a side question about the implementation: do the groups have to be in consecutive order?

They could be, as in your example; I have corresponding tests and a description in the first section of the notebook, and I assumed that this order (not ascending, but with contiguous group values) is what "consecutive" means.
But probably that description was not clear enough, so I added additional correct/incorrect examples of group ordering.
Do you think there is a better word than 'consecutive' to describe it?

For each example here, it would also be nice to start with a few words describing what we are looking at here:

Done

Sure, let's revisit this when we have the final version. I can test this in my local mkdocs version then.

Thank you!

That would not be necessary. I will also remove the other folders in the future to save space on GitHub

Great!

I think I had some issues with multiline, which is why I added it, but to be honest I don't remember. Btw, do you know if you can run it with --py 39 in older Python versions without issues?

Unfortunately I don't have such experience yet, but once I make progress on my articles about development, I can probably say more precisely.

The inconsistency you mentioned refers to the missing --py 39 in .pre-commit-config.yaml?

Here I just tried to share my observation that CI and the pre-commit hooks probably have slightly different configurations for isort and black, and the decision about what to add or delete depends on the desired target configuration.

To be more succinct, my other notes were about the following:

  • The specified order in the pre-commit hook is black-flake8-isort, but perhaps black-isort-flake8 would be more correct?
  • flake8 has 3 somewhat different configurations (.flake8, the pre-commit hook, and CI).

Thank you.

@labdmitriy
Contributor Author

labdmitriy commented May 21, 2022

@rasbt
I also fixed the list rendering for the Overview section.
Now it seems that I've made all the changes you mentioned.

@labdmitriy
Contributor Author

Hi @rasbt,

Could you please tell me whether you plan to finish this pull request over the next few weeks?
If not, I will switch to other tasks and come back to this PR in the summer.

Thank you.

@rasbt rasbt mentioned this pull request May 24, 2022
@rasbt
Owner

rasbt commented May 24, 2022

Thanks for updating the docs, and thanks for your patience. I currently have too many ongoing projects, so I can't check in every day. So, if you want to revisit this in summer, I can totally understand. On the other hand, I think this PR is very close. It's just a bit of polishing the docs.
I took a crack at it, did some minor rewording, and added a few sentences here and there. Maybe have a look and see if that looks good to you. And if you don't have any further feedback, I'd say it's good to merge after adding the Changelog entry.

Btw when going over the docs, there was mainly only one thing that I found a bit confusing. I.e., in example 4, I wasn't sure what exactly the expanding window size was?

E.g., here an image from Example 3:

[Screenshot: Screen Shot 2022-05-24 at 9 41 47 AM]

And an image from Example 4:
[Screenshot: Screen Shot 2022-05-24 at 9 42 07 AM]

What's exactly expanded here? Do you mean that there are now 4 instead of 3 training groups (but this is specified) or is there something else I am missing?

My best guess was that you mean that the splits depend on the training and test sizes, so I moved this illustration up to example 1 where we talk about the train and test group sizes. I think this way it is a more natural reading order for the user. What do you think?

@labdmitriy
Contributor Author

Hi @rasbt,

Sorry if you feel that I am pressuring you; that was not my intention. I thought that the last several times I had been disturbing you, so I decided to switch to other tasks until you have more free time.
I will definitely answer all the questions tomorrow and will stop with my endless questions.
Sorry again.

Thank you.

@rasbt
Owner

rasbt commented May 25, 2022

No worries! It's all good :). I am still very excited about this PR and sometimes just wish the day has more hours 😅

@labdmitriy
Contributor Author

labdmitriy commented May 25, 2022

Hi @rasbt,

Thank you for the update; your description is much clearer and cleaner than mine, so of course it is fine with me. I just fixed one typo of mine.

Btw when going over the docs, there was mainly only one thing that I found a bit confusing. I.e., in example 4, I wasn't sure what exactly the expanding window size was?
What's exactly expanded here? Do you mean that there are now 4 instead of 3 training groups (but this is specified) or is there something else I am missing?
My best guess was that you mean that the splits depend on the training and test sizes, so I moved this illustration up to example 1 where we talk about the train and test group sizes. I think this way it is a more natural reading order for the user. What do you think?

Probably I didn't understand your recommendation from here correctly:

  1. Expanding the window size

I thought it was about expanding the size of the training or test dataset, but maybe you meant another type of window (an expanding window).
Anyway, Example 1 seems useful, especially after you regrouped the order.
Now it is even better than I expected.

And if you don't have any further feedback, I'd say it's good to merge after adding the Changelog entry.

I added a new entry to the Changelog for GroupTimeSeriesSplit.
I just noticed that the New Features and Enhancements section appears twice for the 0.20.0 version, so I added the description to the end of the list.

Thank you.

@rasbt
Owner

rasbt commented May 27, 2022

Thanks, I think this PR is ready to merge now. Thanks so much for the hard and dedicated work on this! And sorry for not being as responsive; it's a really busy time for me right now. However, I am super glad about this PR. I might even try to make a new version release tonight so that it can be used.

Regarding the

Expanding the window size

comment. We can always fine-tune the documentation later; it's not coupled to the main code. But what I meant was that I found example 4 to be basically a natural extension of example 1, so I merged them. From a reader's perspective, I thought this would be more intuitive.
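The sliding-versus-expanding distinction discussed above can be sketched in plain Python. This is a hypothetical helper, not part of mlxtend: it enumerates which group indices each fold would train and test on, either keeping a constant number of training groups (sliding) or training on all groups seen so far (expanding).

```python
# Illustrative sketch (not mlxtend's code): fixed vs expanding training window.
def fold_group_ranges(n_groups, train_size, test_size, expanding=False):
    folds = []
    start = 0
    while start + train_size + test_size <= n_groups:
        # Expanding window always starts at group 0; sliding window moves along.
        train_begin = 0 if expanding else start
        train = list(range(train_begin, start + train_size))
        test = list(range(start + train_size, start + train_size + test_size))
        folds.append((train, test))
        start += test_size
    return folds

print(fold_group_ranges(5, 2, 1))
# sliding:   [([0, 1], [2]), ([1, 2], [3]), ([2, 3], [4])]
print(fold_group_ranges(5, 2, 1, expanding=True))
# expanding: [([0, 1], [2]), ([0, 1, 2], [3]), ([0, 1, 2, 3], [4])]
```

With the expanding window, the training set grows by one test-window's worth of groups per fold, which is why it reads as an extension of the basic sliding-window example.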

@rasbt rasbt merged commit 9e08841 into rasbt:master May 27, 2022
@labdmitriy
Contributor Author

labdmitriy commented May 27, 2022

Thanks, I think this PR is ready to merge now. Thanks so much for the hard and dedicated work on this! And sorry for not being as responsive; it's a really busy time for me right now. However, I am super glad about this PR. I might even try to make a new version release tonight so that it can be used.

Thank you for your patience in answering all my questions; it was a very useful and interesting experience!

But what I meant was that I found example 4 to be basically a natural extension of example 1, so I merged them. From a reader's perspective, I thought this would be more intuitive.

You are totally correct, it is much better now.

If you don't mind, I will write an article about this experience, because I know a lot of people (like me until recently) think that contributing is too difficult to even try.
I will always be happy to discuss any questions and possible collaboration with you if you ever have free time and interest.

@rasbt
Owner

rasbt commented May 27, 2022

Thank you for your patience in answering all my questions; it was a very useful and interesting experience!

Actually, this was really great. During this process, we added lots of useful things like pre-commit hooks, black, isort, etc. :)!

If you don't mind, I will write an article about this experience, because I know a lot of people (like me until recently) think that contributing is too difficult to even try.

Sure, I think this is worthwhile and will be interesting for many people!

I will always be happy to discuss with you any questions and collaboration if you ever have free time and interest.

Cool! I will keep that in mind!

@rasbt rasbt mentioned this pull request Jun 8, 2022
Successfully merging this pull request may close these issues.

Add group time series validation
3 participants