Add ability to fix RNG state so that folds from IterativeStratification are reproducible #248

Merged
5 commits merged into scikit-multilearn:master on Mar 14, 2023

@x0wllaar x0wllaar commented Oct 12, 2022

Currently, there is no way to pass an RNG seed to IterativeStratification, so its results cannot be reproduced. This PR exposes the shuffle parameter of the base class in the IterativeStratification constructor and makes the changes needed for the CV results to be reproducible.

Notably, it changes all the np.random.choice calls within IterativeStratification to use the RNG seeded in the constructor (or the global NumPy RNG if the seed is None). It also changes the _fold_tie_break function so that it can use the RNG state.

These changes make the folds produced by IterativeStratification reproducible if one passes random_state to the constructor.
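
The pattern described above can be sketched roughly as follows. This is a minimal illustration using only NumPy, not the code from this PR: the helper names `resolve_rng`, `break_tie`, and `assign_ties` are hypothetical, and the tie-breaking logic is a simplified stand-in for what _fold_tie_break does.

```python
import numpy as np


def resolve_rng(random_state=None):
    """Hypothetical helper: return a seeded RandomState, or the global
    NumPy RNG module when no seed is given."""
    if random_state is None:
        return np.random  # global RNG: output depends on global state
    if isinstance(random_state, np.random.RandomState):
        return random_state
    return np.random.RandomState(random_state)


def break_tie(candidate_folds, rng):
    """Hypothetical stand-in for a tie break: pick one of the tied folds
    with the supplied RNG instead of calling np.random.choice directly."""
    return int(rng.choice(candidate_folds))


def assign_ties(n_draws, candidate_folds, random_state=None):
    # Resolve the RNG once, then thread it through every random draw.
    rng = resolve_rng(random_state)
    return [break_tie(candidate_folds, rng) for _ in range(n_draws)]


first = assign_ties(20, [0, 1, 2], random_state=42)
second = assign_ties(20, [0, 1, 2], random_state=42)
print(first == second)  # True: same seed, same sequence of choices
```

The point of the sketch is that every random draw goes through one RNG object resolved from the constructor argument, so a fixed seed fixes the whole sequence of draws.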

This should fix #144. Credit for investigating the causes of the non-reproducibility should go to @VaelK and @blackcat84 (see #144).

@nsarang

nsarang commented Oct 23, 2022

Is this package not maintained anymore?

@dyhan316

dyhan316 commented Feb 21, 2023

Hello @x0wllaar ! I tried to do IterativeStratification(n_splits= SPLITS, order=10, random_state = np.random.seed(0)) but when I call it multiple times, each time the result is different. Could you please provide an example of how to properly use your version of the code?
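
A note on the call quoted above: `np.random.seed(0)` seeds the global generator in place and returns `None`, so the constructor ends up receiving `random_state=None`. The sketch below shows the difference using only NumPy. The commented-out constructor call is an assumption based on the PR description (shuffle exposed, integer seed passed as random_state) and has not been verified here.

```python
import numpy as np

# np.random.seed mutates the global RNG and returns nothing.
seed_result = np.random.seed(0)
print(seed_result is None)  # True: this is what random_state would receive

# An integer seed (or a RandomState instance) carries the state explicitly.
rng_a = np.random.RandomState(0)
rng_b = np.random.RandomState(0)
same = np.array_equal(rng_a.permutation(10), rng_b.permutation(10))
print(same)  # True: two generators built from the same seed agree

# Assumed usage with this PR applied (not run here):
# IterativeStratification(n_splits=SPLITS, order=10, shuffle=True, random_state=0)
```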

@ChristianSch
Member

Woops, this broke the PR somehow 🙈

@x0wllaar
Author

I pushed the required changes to my branch. I didn't quite understand what I need to change for the first part of the review about the constructor though :(.

@ChristianSch
Member

No worries, that was just a note to myself haha. I don't have the ability to test it, but I'll just trust it works

@ChristianSch ChristianSch merged commit 21e98ad into scikit-multilearn:master Mar 14, 2023
Successfully merging this pull request may close these issues.

Random_state does not ensure reproducible splits in stratified kfold
4 participants