Add ability to fix RNG state so that folds from IterativeStratification are reproducible #248
Is this package not maintained anymore?
Hello @x0wllaar! I tried to do
Whoops, this broke the PR somehow 🙈
I pushed the required changes to my branch. I didn't quite understand what I need to change for the first part of the review about the constructor though :(
No worries, that was just a note to myself haha. I don't have the ability to test it, but I'll just trust it works
Currently, there is no way to pass an RNG seed to IterativeStratification, so its results cannot be reproduced. This PR exposes the shuffle parameter of the base class in the IterativeStratification constructor and makes the changes needed for the CV results to be reproducible.
Notably, it changes every np.random.choice call inside IterativeStratification to use the RNG seeded in the constructor (or the global NumPy RNG if the seed is None). It also changes the _fold_tie_break function so that it can use the RNG state.
These changes make the folds produced by IterativeStratification reproducible if one passes random_state to the constructor.
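For readers who want to see the pattern in isolation, here is a minimal sketch of routing random tie-breaks through a seeded RNG instead of calling np.random.choice directly. It is not the code from this PR: make_rng, pick_fold and assign are hypothetical names invented for illustration, and the tie-break rule (pick the fold with the largest remaining demand) is a simplified assumption rather than the actual _fold_tie_break logic.

```python
import numpy as np


def make_rng(random_state=None):
    """Return an object with a .choice() method (hypothetical helper).

    With random_state=None this falls back to the global NumPy RNG module,
    otherwise it returns a dedicated RandomState seeded with random_state.
    """
    if random_state is None:
        return np.random  # global RNG: results vary unless seeded elsewhere
    return np.random.RandomState(random_state)


def pick_fold(desired_per_fold, rng):
    """Simplified tie-break (an assumption, not the library's rule).

    Picks the fold with the largest remaining demand and breaks ties at
    random through the supplied rng rather than through np.random.choice.
    """
    desired = np.asarray(desired_per_fold)
    candidates = np.flatnonzero(desired == desired.max())
    return int(rng.choice(candidates))


def assign(n_samples, n_folds, random_state=None):
    """Assign n_samples to n_folds one at a time, using a single rng."""
    rng = make_rng(random_state)
    desired = np.full(n_folds, n_samples / n_folds)
    folds = []
    for _ in range(n_samples):
        fold = pick_fold(desired, rng)
        desired[fold] -= 1  # this fold now needs one sample fewer
        folds.append(fold)
    return folds


# Same seed, same assignment; no seed, the global RNG decides.
first = assign(12, 3, random_state=42)
second = assign(12, 3, random_state=42)
print(first == second)
```

The point of the sketch is that every random decision draws from one rng object created from the constructor argument, so a fixed seed fixes the whole sequence of tie-breaks.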
This should fix #144. I should also mention that credit for investigating the causes of non-reproducibility goes to @VaelK and @blackcat84 (see #144).