Consider removing `rng_seeder` user-interface from Model #5785

ricardoV94 · 2022-05-19T11:20:38Z

Right now draw, sample_prior_predictive and sample_posterior_predictive can be seeded only via the rng_seeder passed when creating a new pm.Model. This is convenient in that users do not need to pass seeds to these functions individually to get seeded results, but it is less convenient in that the results have an "invisible" dependency on the order and number of methods that are called.

Concretely, if you call pm.draw before pm.sample_prior_predictive, you will get different results than if you call them in the opposite order.

import pymc as pm
with pm.Model(rng_seeder=3) as m:
    x = pm.Normal("x")
    print("draw: ", pm.draw(x))
    print("prior: ", pm.sample_prior_predictive(samples=1, return_inferencedata=False)["x"][0])
    
with pm.Model(rng_seeder=3) as m:
    x = pm.Normal("x")
    print("prior: ", pm.sample_prior_predictive(samples=1, return_inferencedata=False)["x"][0])
    print("draw: ", pm.draw(x))

draw:  1.1557538103058285
prior:  -1.2931919932524942
prior:  1.1557538103058285
draw:  -1.2931919932524942

This happens because shared RandomState/Generator variables for the model RVs are only set once when they are created (based on the rng_seeder) and never re-seeded. There is currently no easy mechanism to re-seed all variables although it seems like there was one at some point during the V4 refactor, probably when we were using a RandomStream under the hood.
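To make the mechanism concrete, here is a minimal sketch of the same order dependence outside PyMC. It uses the standard library's `random.Random` as a stand-in for the shared RandomState/Generator variables, and the two functions are hypothetical placeholders for draw and sample_prior_predictive, not the actual implementation.

```python
import random


def toy_draw(rng):
    # Hypothetical stand-in for pm.draw: consumes one value from the shared RNG.
    return rng.normalvariate(0.0, 1.0)


def toy_prior(rng):
    # Hypothetical stand-in for pm.sample_prior_predictive: same shared RNG.
    return rng.normalvariate(0.0, 1.0)


# The RNG is seeded once at "model creation" and never re-seeded.
rng_a = random.Random(3)
draw_first = toy_draw(rng_a)
prior_second = toy_prior(rng_a)

rng_b = random.Random(3)
prior_first = toy_prior(rng_b)
draw_second = toy_draw(rng_b)

# Whichever function runs first gets the first value of the stream.
print(draw_first == prior_first)    # True
print(prior_second == draw_second)  # True
print(draw_first == draw_second)    # False
```

The value each call returns depends only on its position in the stream, which is exactly the "invisible" dependency described above.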

If there were such a mechanism (one will probably be reintroduced when #5733 gets fixed), users could in theory call Model.seed(seed) between calls to control/reset the seeding. It would look something like this:

import pymc as pm
with pm.Model(rng_seeder=3) as m:
    x = pm.Normal("x")
    print("draw1: ", pm.draw(x))

    # Does not exist currently
    m.seed(3)
    print("prior1: ", pm.sample_prior_predictive(samples=1, return_inferencedata=False)["x"][0])
    
with pm.Model(rng_seeder=3) as m:
    x = pm.Normal("x")
    print("prior2: ", pm.sample_prior_predictive(samples=1, return_inferencedata=False)["x"][0])

    # Does not exist currently
    m.seed(3)
    print("draw2: ", pm.draw(x))

draw1:  1.1557538103058285
prior1:  1.1557538103058285
prior2:  1.1557538103058285
draw2:  1.1557538103058285

But if users want to do this, it seems more natural to let them pass a seed directly to these methods and have the re-seeding done internally. It would look something like this:

import pymc as pm
# rng_seeder would be removed to avoid confusion over which "way" is best
with pm.Model() as m:
    x = pm.Normal("x")
    # Neither of these methods accepts seeds currently
    print("draw1: ", pm.draw(x, seed=3))
    print("prior1: ", pm.sample_prior_predictive(samples=1, seed=3, return_inferencedata=False)["x"][0])
    
with pm.Model() as m:
    x = pm.Normal("x")
    # Neither of these methods accepts seeds currently
    print("prior2: ", pm.sample_prior_predictive(samples=1, seed=3, return_inferencedata=False)["x"][0])
    print("draw2: ", pm.draw(x, seed=3))

With the same results as in the previous example.
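As a rough illustration of what "re-seeding internally" could mean, here is a toy sketch. The `ToyModel` class and its methods are hypothetical and only mimic the proposed `seed=` keyword; `random.Random` again stands in for the model's shared RNG variables.

```python
import random


class ToyModel:
    """Hypothetical stand-in for a model; not the PyMC implementation."""

    def __init__(self):
        # No seed at construction time: seeding happens per call.
        self.rng = random.Random()

    def _reseed(self, seed):
        # Reset the shared RNG only when the caller passes a seed.
        if seed is not None:
            self.rng.seed(seed)

    def draw(self, seed=None):
        self._reseed(seed)
        return self.rng.normalvariate(0.0, 1.0)

    def sample_prior(self, seed=None):
        self._reseed(seed)
        return self.rng.normalvariate(0.0, 1.0)


m1 = ToyModel()
draw1 = m1.draw(seed=3)
prior1 = m1.sample_prior(seed=3)

m2 = ToyModel()
prior2 = m2.sample_prior(seed=3)
draw2 = m2.draw(seed=3)

# The call order no longer matters: every seeded call starts from the same state.
print(draw1 == prior1 == prior2 == draw2)  # True
```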

Right now only sample accepts a seed explicitly, and when none is passed it falls back to rng_seeder. IMHO it is confusing to allow two ways to seed things, and this has come up a couple of times in pymc-examples (cc @OriolAbril).

There is one tiny technical difference in that users can directly specify a seed per chain only via sample, but I think this is not enough reason to justify keeping two interfaces for that function. Also, it's probably safer if users pass a single seed and we spawn multiple ones "smartly", to avoid the type of issues described in #5733.
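A sketch of what deriving per-chain seeds from a single user seed could look like. The helper name `spawn_chain_seeds` is hypothetical, and the hashing scheme is only a standard-library stand-in; in NumPy-based code, `numpy.random.SeedSequence.spawn` would be the more usual tool for this.

```python
import hashlib
import random


def spawn_chain_seeds(seed, chains):
    """Derive one seed per chain from a single seed (hypothetical helper)."""
    seeds = []
    for chain in range(chains):
        # Hash the (seed, chain index) pair so each chain gets its own seed.
        digest = hashlib.sha256(f"{seed}-{chain}".encode()).digest()
        seeds.append(int.from_bytes(digest[:8], "big"))
    return seeds


chain_seeds = spawn_chain_seeds(seed=3, chains=4)

# Each chain gets its own RNG, seeded from the derived value.
chain_rngs = [random.Random(s) for s in chain_seeds]
first_draws = [rng.random() for rng in chain_rngs]

# Re-running with the same user seed reproduces the same per-chain seeds.
print(chain_seeds == spawn_chain_seeds(seed=3, chains=4))  # True
```

The user only ever handles one seed, and the per-chain seeds stay reproducible without being chosen by hand.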

Creating this as an issue instead of a discussion, because I think it should be addressed quickly, both to avoid confusion as we release V4 and to avoid too much work re-running pymc-examples.

@pymc-devs/dev-core

twiecki · 2022-05-19T11:59:50Z

I like the second API better.

michaelosthege · 2022-05-19T14:52:26Z

Yes, I would also prefer passing seed to the functions instead of a RNG-stateful Model.

lucianopaz · 2022-05-19T17:14:33Z

What do you think this should look like internally? Should we add an entry in the model that points to the shared random generators that feed into the random variables, and swap them with new ones in a copy or a function graph? Or should we set their values somehow?
I'm asking with the near future in mind, where we want to make it easy to pickle a compiled function and unpickle it, swapping the random seeds in the copies.

ricardoV94 · 2022-05-19T18:08:11Z

@lucianopaz I opened a draft PR in #5787.

Internally it just swaps RNGs in compile_pymc, as we were already collecting the updates there anyway.

There are two accessible helper functions to find all RNG variables in a graph and update them, which could be used when copying a compiled function.

ricardoV94 added request discussion v4 labels May 19, 2022

ricardoV94 added this to the v4.0.0 milestone May 19, 2022

ricardoV94 mentioned this issue May 19, 2022

Remove rng seeder #5787

Merged

ricardoV94 removed the request discussion label May 21, 2022

ricardoV94 closed this as completed in #5787 May 24, 2022
