Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove rng seeder #5787

Merged
merged 6 commits into from
May 24, 2022
Merged

Remove rng seeder #5787

merged 6 commits into from
May 24, 2022

Conversation

ricardoV94
Copy link
Member

@ricardoV94 ricardoV94 commented May 19, 2022

This PR removes the Model(rng_seeder=seed) API as discussed in #5785 and instead makes use of the random_seed kwarg in all sampling functions. One big breaking change done on purpose (but not strictly needed) is that these functions will not care/respect user defined external global seeding.

import pymc as pm
import numpy as np

with pm.Model() as m:
  x = pm.Normal("x")
  np.random.seed(0)  # DOES NOT MATTER!
  always_different = pm.sample()
  always_the_same = pm.sample(random_seed=0)

This is done so that default seeding quality can be "as good as possible", as suggested by NumPy best practice: https://numpy.org/doc/stable/reference/random/bit_generators/generated/numpy.random.SeedSequence.html#numpy.random.SeedSequence

# Global seeding is ignored by numpy default_rng, 
# and that's why our PyMC sampling functions now do the same:
import numpy as np

np.random.seed(0)
print(np.random.default_rng(None).random())  # 0.7166779009481196
np.random.seed(0)
print(np.random.default_rng(None).random())  # 0.6145552524375484

Is everyone okay with this change?

Also allowed passing RandomState or Generators to all sampling functions for user convenience.

rng = np.random.default_rng()
with pm.Model() as m:
  x = pm.Normal("x")
  pm.sample(random_seed=rng)

Closes #5785
Closes #5733
Closes #5784
Closes #4301

@codecov
Copy link

codecov bot commented May 19, 2022

Codecov Report

Merging #5787 (7096547) into main (d0af6b1) will increase coverage by 0.01%.
The diff coverage is 97.67%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #5787      +/-   ##
==========================================
+ Coverage   89.36%   89.38%   +0.01%     
==========================================
  Files          74       74              
  Lines       13804    13764      -40     
==========================================
- Hits        12336    12303      -33     
+ Misses       1468     1461       -7     
Impacted Files Coverage Δ
pymc/tuning/starting.py 92.56% <ø> (-0.13%) ⬇️
pymc/parallel_sampling.py 87.45% <66.66%> (+0.74%) ⬆️
pymc/sampling.py 88.70% <97.67%> (+0.27%) ⬆️
pymc/aesaraf.py 91.89% <100.00%> (+0.27%) ⬆️
pymc/distributions/censored.py 93.02% <100.00%> (-1.72%) ⬇️
pymc/distributions/distribution.py 90.83% <100.00%> (+0.62%) ⬆️
pymc/distributions/mixture.py 96.75% <100.00%> (-0.31%) ⬇️
pymc/distributions/timeseries.py 77.46% <100.00%> (-0.09%) ⬇️
pymc/initial_point.py 99.09% <100.00%> (-0.91%) ⬇️
pymc/model.py 86.52% <100.00%> (-0.30%) ⬇️
... and 7 more

@michaelosthege
Copy link
Member

@ricardoV94 I can't invest the time for a thorough review right now---the diff is a little big and seeding is a sensitive and tricky topic..

@ricardoV94 ricardoV94 force-pushed the remove_rng_seeder branch 3 times, most recently from ea90c4f to c8a2464 Compare May 23, 2022 10:21
@ricardoV94 ricardoV94 force-pushed the remove_rng_seeder branch 4 times, most recently from 3ba1070 to 9d65f7a Compare May 23, 2022 12:08
@ricardoV94 ricardoV94 requested a review from ferrine May 23, 2022 12:09
@ricardoV94 ricardoV94 force-pushed the remove_rng_seeder branch 2 times, most recently from 0bc7acf to 392b00d Compare May 23, 2022 12:16
Copy link
Contributor

@lucianopaz lucianopaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a bunch of questions @ricardoV94. The aesaraf functions look good but I have some questions regarding sampling. How does the random seed go into the step methods at the moment? The idea of passing different "seeds" (the entropy parameter) to SeedSequence isn't the best way to ensure independent streams. The best way is to spawn as many seeders as necessary. These will all get the same entropy value, but they'll have different spawn_keys. We should aim to use this instead of different entropies somehow.

pymc/aesaraf.py Show resolved Hide resolved
pymc/aesaraf.py Show resolved Hide resolved
pymc/aesaraf.py Outdated Show resolved Hide resolved
pymc/distributions/censored.py Show resolved Hide resolved
pymc/sampling.py Outdated Show resolved Hide resolved
@ricardoV94
Copy link
Member Author

ricardoV94 commented May 23, 2022

I have a bunch of questions @ricardoV94. The aesaraf functions look good but I have some questions regarding sampling. How does the random seed go into the step methods at the moment? The idea of passing different "seeds" (the entropy parameter) to SeedSequence isn't the best way to ensure independent streams. The best way is to spawn as many seeders as necessary. These will all get the same entropy value, but they'll have different spawn_keys. We should aim to use this instead of different entropies somehow.

In forward sampling, we always go via a single compile_pymc and we do a single SeedSequence(seed).spawn(len(rngs)). This should be optimal.

In MCMC sampling, all our step methods rely on global seeding. We have to create different integer seeds for each chain. Ideally we would be passing Generators around instead, and we would probably spawn them (or their SeedSequences) before branching off. This however will require a larger refactoring. See #5093

@ricardoV94 ricardoV94 marked this pull request as ready for review May 23, 2022 13:51
pymc/aesaraf.py Outdated Show resolved Hide resolved
pymc/distributions/censored.py Show resolved Hide resolved
pymc/aesaraf.py Outdated Show resolved Hide resolved
pymc/aesaraf.py Outdated Show resolved Hide resolved
pymc/sampling.py Show resolved Hide resolved
pymc/sampling.py Outdated Show resolved Hide resolved
…bal seeding.

Together, `test_sample_does_not_set_seed` and `test_parallel_sample_does_not_reuse_seed` covered two unspoken behaviors of `sample`:
1. When no seed is specified, PyMC shall not set global seed state of numpy in the main process.
2. When no seed is specified, sampling will depend on numpy global seeding state for reproducible behavior.

Point 1 is due to PyMC legacy dependency on global seeding for step samplers. It tries to minimize "damage" by only setting global seeds when it absolutely needs to, in order to ensure deterministic sampling. Ideally calls to `numpy.seed` would never be made.

Point 2 goes against NumPy current best practices of using None when defining new Generators / SeedSequences (https://numpy.org/doc/stable/reference/random/bit_generators/generated/numpy.random.SeedSequence.html#numpy.random.SeedSequence)

The refactored tests cover point 1 more directly, and assert the opposite of point 2.
…s in compiled function

Sampling functions now also accept RandomState or Generators as input to random_seed, similarly to how random_state behaves in scipy distributions. For backwards compatibility this argument was not renamed.
Copy link
Contributor

@lucianopaz lucianopaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very good @ricardoV94 !

@lucianopaz
Copy link
Contributor

@ricardoV94, I’ll let you merge since I’m not sure if you want to squash merge or rebase and preserve the history

@ricardoV94
Copy link
Member Author

@ricardoV94, I’ll let you merge since I’m not sure if you want to squash merge or rebase and preserve the history

Always rebase with me :)

@ricardoV94 ricardoV94 merged commit 62e1e9d into pymc-devs:main May 24, 2022
@ricardoV94
Copy link
Member Author

Thanks for reviewing!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants