Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parallel jobs generate identical chains #1479

Closed
aloctavodia opened this issue Oct 25, 2016 · 16 comments
Closed

Parallel jobs generate identical chains #1479

aloctavodia opened this issue Oct 25, 2016 · 16 comments

Comments

@aloctavodia
Copy link
Member

Doing pm.sample(1000, njobs=4) used to return 4 chains, each one starting from a different point. Now it returns 4 identical chains, unless something like random_seed=None or random_seed=[1,2,3,4] is specified.

@ColCarroll
Copy link
Member

Wow! I bet this is related to this conversation, and clears up a lot.

What should behavior be in this situation? It looks like in _make_parallel, the random seed is replicated over the number of jobs. Apparently numpy thought about this, and stack overflow has a discussion.

We have three situations to think about:

  • -1 (default): respect the current random state,
  • None: reseed the rng,
  • Specified by the user: pass this value along to any rngs

In the first and third case, I would expect the behavior you pointed out to occur. I doubt this is ever the intended outcome, and would probably change _make_parallel to something like:

def _make_parallel(rstate, njobs):
    if rstate is None:
        return [None for _ in range(njobs)]
    rsize= shape(rstate)
    if rsize and rsize[0] == njobs:
        return rstate
    raise SomeCustomException('Informative error message')

@springcoil
Copy link
Contributor

Wow, that's interesting. I think what needs done now is change _make_parallel to something like you suggested and maybe add in a regression test for what @aloctavodia observed.

@springcoil
Copy link
Contributor

@ColCarroll I think that change is a good idea.
Maybe a regression test for what @aloctavodia observed.

A very interesting observation.

@ColCarroll
Copy link
Member

This would be a definite breaking change (though arguably it would only break on problems like what @aloctavodia observed, which is A Good Thing)

I can put together a PR/test tonight.

@springcoil
Copy link
Contributor

@fonnesbeck how do you feel about a breaking change like this? I guess all we'd need to do is to announce it in the documentation and on twitter.

@springcoil
Copy link
Contributor

Great work btw @aloctavodia I love the things you discover :)

@fonnesbeck
Copy link
Member

fonnesbeck commented Oct 25, 2016

I could have sworn we've seen this before and had it fixed, but I can't remember when. Anyhow, definitely needs fixing. If you break something that gave you the wrong answer, I don't consider that a breaking change. If you are passing along your own random seeds, which is the only way to get a right answer right now, this should not break it. I don't think it requires any announcement or warning.

@fonnesbeck
Copy link
Member

Seems odd that this should happen, though because if random_seed is left to -1, then numpy.random.seed does not get called, so the chains should be initialized with arbitrary values, not the same value (unless this is being done elsewhere that I am forgetting about).

At any rate, I think just setting the default value to None will fix it. _make_parallel would return a list of Nones that would result in different unseeded chains.

@ColCarroll
Copy link
Member

@fonnesbeck yeah -- passing your own random seeds or None gives you the right answer. In the past, None was the default, which was changed because it would randomly reseed the generator.

@ColCarroll
Copy link
Member

In particular, the goal was that

np.random.seed(42)
pm.sample(100)
print(np.random.random())

be deterministic. Using None as a default makes it non-deterministic.

@fonnesbeck
Copy link
Member

Ah, right. I knew it used to work. I think its appropriate not to have any expectation of chain seeding (other than expecting them not to be seeded identically) if you do not manually specify your seed to sample.

@ColCarroll
Copy link
Member

That's reasonable -- maybe a warning message that pymc is re-seeding the rngs, and how to ask it not to?

@fonnesbeck
Copy link
Member

That's not a bad idea.

@aloctavodia
Copy link
Member Author

@springcoil I am glad to help. Thank you all guys for the quick fix!

@ColCarroll
Copy link
Member

#1481

@hvasbath
Copy link
Contributor

hvasbath commented Nov 1, 2016

I am late to the party here, but for the future ...
This is where we were talking about it before @fonnesbeck

#879 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants