Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: np.random.default_gen() #13840

Merged
merged 10 commits into from Jun 28, 2019
Merged

ENH: np.random.default_gen() #13840

merged 10 commits into from Jun 28, 2019

Conversation

rkern
Copy link
Member

@rkern rkern commented Jun 26, 2019

We achieved some consensus in the default BitGenerator thread that we want numpy to express its opinion about the default BitGenerator through a function. Here's that function.

Along the way, I renamed the seed_seq arguments to seed and cleaned up the descriptions of that parameter. I think it's clearer this way. It puts the more familiar concepts ("a seed") up front before the newer concepts ("a SeedSequence").

I removed the argumentless Generator() construction in favor of default_gen(). I did notice that np.random.Generator().<method>() was being referenced a lot in the docstrings, as a text-substitution for np.random.<method>(). I changed many of these to explicitly create rng = np.random.default_gen() and then use the rng.<method>().

But then I noticed that we do have a default global Generator instance in np.random.generator._random_generator with function-style aliases, e.g. np.random.generator.normal(). These don't appear to be documented anywhere, and I don't recall discussing if it's a thing that we actually want in the 1.17 release (as it will commit us to maintaining them and the underlying global, and may encourage bad habits like re-seeding the global instance in the middle of a library). It was a reasonable choice for randomgen, and it might be for numpy 1.18, but I question if that's the right choice for numpy 1.17.

That said, the check_random_state() idiom is a good one, and it relies on having a global default instance somewhere. It might be best to provide that in numpy.random rather than spread out in all of the different libraries. This also raises the question what should default_gen() and default_gen(None) do? Currently, I implemented it to just create a fresh PCG64 instance with OS entropy. But maybe default_gen() should be more like check_random_state():

default_gen()  # -> global Generator
default_gen(None)  # -> global Generator
default_gen(seed)  # -> Generator(PCG64(seed))
default_gen(bit_generator_instance)  # -> Generator(bit_gen_instance)
default_gen(generator_instance)  # -> generator_instance

What do you think? I'm neutral on maintaining and documenting the global instance, and -1 on the method aliases.

The most common use will certainly be to pass in an integer seed, so I want to
make sure that we talk in terms of "seeds" first and foremost. I don't want to
overload people with new concepts and terminology right out of the gate.
@mattip
Copy link
Member

mattip commented Jun 27, 2019

But then I noticed that we do have a default global Generator instance in np.random.generator._random_generator with function-style aliases, e.g. np.random.generator.normal()

I think this was left by mistake and should be removed.

That said, the check_random_state() idiom is a good one, and it relies on having a global default instance somewhere.

What is the motivation for returning a singleton instance? To save the cost of sampling system entropy in creating a new Bitgenerator? I don't think we should add a singleton, since it can be abused in multi-threaded contexts. We can always add it later.

@rkern
Copy link
Member Author

rkern commented Jun 27, 2019

In the cases in which you would get the global, it's fine to reuse it in different threads. It's only when you want to reproduce the results that sharing an instance across threads would be a problem. Since we don't allow reseeding in-place, this would not come up.

But we can go ahead and delete it for now. We'll see who howls.

Going through those examples suggests that default_gen() should also take a BitGenerator or Generator () so that it can be used as a general "take this value and give me the Generator for it". This will let it be be used similarly to check_random_state() without the global.

@@ -167,7 +167,7 @@

from . import mtrand
from .mtrand import *
from .generator import Generator
from .generator import Generator, default_gen
Copy link
Member

@mattip mattip Jun 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line 9 Generator().function -> default_gen().function

@mattip
Copy link
Member

mattip commented Jun 27, 2019

other than a couple of documentation nits, LGTM

@mattip
Copy link
Member

mattip commented Jun 28, 2019

needs a rebase

@mattip mattip merged commit 0ec7f12 into numpy:master Jun 28, 2019
@mattip
Copy link
Member

mattip commented Jun 28, 2019

Thanks @rkern

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants