DISCUSSION: Should default_rng() have a seed argument and what is our policy? #16493

Closed
seberg opened this issue Jun 3, 2020 · 4 comments

Comments

@seberg
Member

seberg commented Jun 3, 2020

We may have discussed this before, but it came up on the other issue.

@bashtage should we try to change the documentation of default_rng() to state clearly that we make no guarantees about which bit generator is behind default_rng()? I wonder if there is a point in default_rng() if we cannot fix it up without a warning.
Putting a warning on it would seem extremely annoying, even if it is only emitted when a seed is given. I admit, it is too bad that this means you have to do more than add a short seed when you do want a fairly future-proof RNG stream (one that is stable across multiple NumPy versions).

So I am curious if we should even consider deprecating the seed argument to default_rng()? We could possibly give an error pointing out the (current) alternative:

TypeError: default_rng() does not support the seed argument, to achieve stable streams over different NumPy versions, please replace this call with the current default of: np.random.Generator(np.random.PCG64(seed))

Or we add to the documentation that the reproducibility is only guaranteed for the same NumPy (minor) version, but that seems like a trap for users.
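
For illustration, here is a minimal sketch of the explicit spelling mentioned in the error message above. The seed value and variable names are made up for the example; only np.random.Generator and np.random.PCG64 come from the proposal itself. It shows that naming the bit generator ties the stream to PCG64 rather than to whatever default_rng() picks.

```python
import numpy as np

seed = 12345  # arbitrary example seed, not taken from the discussion

# Name the bit generator explicitly so a future change of the
# default_rng() default cannot silently switch the algorithm.
pinned_a = np.random.Generator(np.random.PCG64(seed))
pinned_b = np.random.Generator(np.random.PCG64(seed))

# Two generators built the same way produce the same draws.
draws_a = pinned_a.random(3)
draws_b = pinned_b.random(3)
same_stream = np.array_equal(draws_a, draws_b)
print(same_stream)
```

Note that this pins the bit generator only; per NEP 19, the distribution methods on Generator may still change between NumPy versions.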

@rkern
Member

rkern commented Jun 3, 2020

> So I am curious if we should even consider deprecating the seed argument to default_rng()?

Definitely not. This is integral to supporting the pattern developed in sklearn's check_random_state(). Its value is not to preserve the bitstream for a given seed across versions of numpy (something we explicitly disclaim in NEP 19), but to make building libraries easier and safer.
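
A rough sketch of what that pattern can look like with default_rng(), assuming a hypothetical library function sample_noise() that is not part of NumPy or sklearn. It relies on the documented behaviour that default_rng() accepts None, an integer seed, a SeedSequence, a BitGenerator, or an existing Generator, and returns an existing Generator unaltered.

```python
import numpy as np

def sample_noise(n, rng=None):
    """Hypothetical library function taking a flexible `rng` argument."""
    # Normalize whatever the caller passed into a Generator.
    # An existing Generator is returned as-is, so callers can share one stream.
    rng = np.random.default_rng(rng)
    return rng.standard_normal(n)

# Callers may pass nothing, a seed, or their own Generator.
unseeded = sample_noise(3)
seeded = sample_noise(3, 7)

shared = np.random.default_rng(2020)
first = sample_noise(3, shared)
second = sample_noise(3, shared)  # continues the shared stream
```

Without a seed argument, the library would need its own dispatch on the type of `rng`, which is the part this pattern avoids.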

@seberg
Member Author

seberg commented Jun 3, 2020

Right, I did not think about that. So I suppose we should just document clearly that default_rng() can change occasionally? And maybe, before and after a change, document it again with a version history of what the default was in the Notes section?
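
Until such a version history exists, one way to see what the default currently is would be to inspect it at runtime. This is only a sketch; describe_default_rng() is a made-up helper name, and it uses nothing beyond the bit_generator attribute of Generator.

```python
import numpy as np

def describe_default_rng():
    """Return (NumPy version, name of the bit generator behind default_rng())."""
    rng = np.random.default_rng()
    return np.__version__, type(rng.bit_generator).__name__

version, bitgen_name = describe_default_rng()
print(f"NumPy {version}: default_rng() uses {bitgen_name}")
```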

@charris
Member

charris commented Jun 3, 2020

We should also document how to keep repeatability. There are various possibilities depending on the degree of repeatability one wants. From high to low I'd make it

  1. Same numpy wheel running on same platform with same seed (sequence)
  2. Same numpy version, bit generator, and seed (sequence)
  3. Same bit generator and seed (sequence), non-uniform distributions may vary by version.
  4. Don't care :)

The second is probably sufficient for most purposes.
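
As a sketch of what the second level could look like in practice, the snippet below pins the bit generator and records the NumPy version, bit generator name, and seed alongside the generator. The helper make_pinned_rng() and the layout of the provenance dict are assumptions made for this example, not an existing NumPy API.

```python
import numpy as np

def make_pinned_rng(seed):
    """Hypothetical helper: build a PCG64-backed Generator plus a provenance record."""
    seed_seq = np.random.SeedSequence(seed)
    rng = np.random.Generator(np.random.PCG64(seed_seq))
    provenance = {
        "numpy_version": np.__version__,                    # needed for level 2
        "bit_generator": type(rng.bit_generator).__name__,  # needed for levels 2 and 3
        "entropy": seed_seq.entropy,                        # the seed (sequence)
    }
    return rng, provenance

rng_a, info_a = make_pinned_rng(2020)
rng_b, info_b = make_pinned_rng(2020)

# Same NumPy version, bit generator, and seed: the draws match.
ints_a = rng_a.integers(0, 100, size=5)
ints_b = rng_b.integers(0, 100, size=5)
print(info_a)
```

Storing the record next to the results makes it possible to tell later which of the levels above a rerun can actually satisfy.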

@seberg
Member Author

seberg commented Jun 4, 2020

Let me close this, I guess. @rkern, unless the resolution of that discussion is high up on the todo list, should we add a milestoned issue for the 1.20 release to update either PCG64 or at least default_rng(), if that was the takeaway? Should we already backport a documentation update for 1.19 to note that this will happen?

@seberg seberg closed this as completed Jun 4, 2020

3 participants