Description
🚀 Feature
It would be nice to replicate numpy's RandomState API, which creates an object containing the state of a random number generator; the object has methods for generating random numbers, e.g.:
rng = torch.RandomState(seed=1234)
data = rng.randn(3, 4) # (create a 3x4 random matrix from a standard normal distribution)
Currently, you can actually do this, though the feature is undocumented and not quite as nice as numpy's way:
rng = torch.Generator()
rng.manual_seed(1234)
data = torch.randn(3, 4, generator=rng)
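The current workaround above is already fully reproducible; a minimal runnable sketch, checking that two identically seeded generators produce identical samples:

```python
import torch

# Two generators seeded the same way yield the same stream,
# independent of the global RNG.
rng_a = torch.Generator()
rng_a.manual_seed(1234)
rng_b = torch.Generator()
rng_b.manual_seed(1234)

a = torch.randn(3, 4, generator=rng_a)
b = torch.randn(3, 4, generator=rng_b)
assert torch.equal(a, b)
```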
Motivation
PyTorch is currently a bit awkward when it comes to random seeds, because the only officially supported option is to set random seeds globally. For reproducibility and model comparison, it sometimes makes sense to have separate random number generators for, e.g., parameter initialization and stochastic inference.
e.g.
model = create_mlp_with_dropout(
    params=initialize_mlp_params(
        layer_sizes=[784, 500, 500, 10],
        rng=torch.RandomState(param_seed),
    ),
    rng=torch.RandomState(inference_seed),  # random seed for inference
)
for x, y in sample_minibatches(data, rng=torch.RandomState(data_seed)):
    model.train(x, y)
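The same separation can already be sketched with the existing (undocumented) Generator API. Note that `create_mlp_with_dropout` and friends above are hypothetical, so this only demonstrates the per-purpose-generator idea directly, with illustrative sizes and seeds:

```python
import torch

# One generator per concern: reseeding one never perturbs the other's stream.
param_seed, inference_seed = 42, 7

param_rng = torch.Generator()
param_rng.manual_seed(param_seed)
inference_rng = torch.Generator()
inference_rng.manual_seed(inference_seed)

# Draw initial parameters from the parameter generator...
w = torch.randn(500, 784, generator=param_rng) * 0.01

# ...and dropout masks from the inference generator.
x = torch.ones(8, 784)
mask = torch.bernoulli(torch.full((8, 500), 0.5), generator=inference_rng)
h = torch.relu(x @ w.t()) * mask
```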
Of course, you could do this by manually reseeding the global random number generator, but passing a random number generator object is nicer because it makes the dependence on the seed explicit.
Alternatives
You could also just document the current solution, though numpy's API seems more elegant.
Another alternative would be to add a context manager, with torch.use_random_state(generator=rng): ..., which sets the global generator to rng on entrance and reverts to the previous generator on exit. However, it still seems more Pythonic to pass the generator as an optional argument than to mess around with global state.
Additional context
This arose from a discussion here.