
Distributed draws evenly among chains by default #2743

Closed
wants to merge 3 commits into from

Conversation

fonnesbeck
Copy link
Member

This addresses the issue in #2739 where the number of samples drawn does not reflect what was requested. Draws are now evenly distributed among chains. Now, the following occurs, as expected:

In [1]: import pymc3 as pm

In [2]: with pm.Model() as m: x = pm.Normal('x')

In [3]: with m: trace = pm.sample(1000)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
100%|█████████████████████████████████████| 1000/1000 [00:00<00:00, 1273.81it/s]

In [4]: trace['x'].shape
Out[4]: (1000,)

I also removed the sampler warning for chains with <500 samples. If you ask for fewer than 500 samples, you should receive them without generating a warning.

@ColCarroll
Copy link
Member

I like this -- right now, asking for 1000 samples with njobs=4 gives you a progress bar with 1500 iterations, and then the actual trace has 4000 samples.

One corner case is that int(draws / chains) * chains might not be equal to draws. Perhaps that is the place to give a warning?
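The corner case above can be sketched with a small hypothetical helper (`split_draws` is an illustrative name, not PyMC3 API): when `draws` is not a multiple of `chains`, integer division silently drops the remainder, which is exactly where a warning could go.

```python
import warnings

def split_draws(draws, chains):
    """Distribute `draws` evenly among `chains`, warning if a remainder is dropped.

    Illustrative sketch only; not the actual PyMC3 implementation.
    """
    per_chain = draws // chains
    if per_chain * chains != draws:
        warnings.warn(
            f"Requested {draws} draws cannot be split evenly among "
            f"{chains} chains; you will get {per_chain * chains} samples."
        )
    return per_chain

print(split_draws(1000, 4))  # 250 per chain, 1000 total, no warning
print(split_draws(1000, 3))  # 333 per chain, only 999 total, warns
```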

@junpenglao
Copy link
Member

Is this change really necessary? The current behaviour is similar to Stan's, which might help minimize the mental switch for people working with both libraries or moving from Stan to PyMC3.

@aseyboldt
Copy link
Member

I guess whether we want to distribute the draws or not depends a bit on how you interpret a call like sample(draws=1000, chains=4). Does that mean: "give me 1000 samples, but generate those in 4 chains", or does it mean "give me 4 chains of 1000 samples each".
I usually think about it the second way, but I can't think of a good reason why one of them would be "right".

A couple of arguments either way:

  • Most of the time people don't specify chains explicitly. So that would favour the first interpretation.
  • Maybe we want people to think about chains, because they are important for convergence checking.
  • The first option suggests that multiple chains are a means to get more samples faster. I don't think this works well or should be encouraged, as a huge amount of work goes into initialisation, and for larger models even single chains are often already parallel.
  • Traces don't distinguish between chains unless you ask them to, so that also favours the first option.
  • If the number of chains is relatively high, you easily end up running mostly tuning steps. For example, if you ask for 1000 samples in 4 chains, you'd end up doing 4 * 500 + 4 * 250 steps. So two thirds of the steps are discarded, and in many models the tuning steps are even slower than the final steps. I don't think rhat or n_effective would be particularly accurate with only 250 samples per chain. So we'd have to increase the default number of draws and also teach people not to use draws=500 or even draws=1000.
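The step-budget arithmetic in the last point can be checked directly (assuming 500 tuning steps per chain, the default at the time, and draws split evenly as this PR proposes; `step_budget` is an illustrative name):

```python
def step_budget(draws=1000, chains=4, tune=500):
    """Compute kept vs. total sampler steps when draws are split among chains.

    Illustrative arithmetic for the tuning-overhead argument above.
    """
    per_chain = draws // chains
    tuning_steps = chains * tune       # 4 * 500 = 2000 discarded steps
    kept_steps = chains * per_chain    # 4 * 250 = 1000 kept samples
    total = tuning_steps + kept_steps
    return kept_steps, total, tuning_steps / total

kept, total, tuning_frac = step_budget()
print(kept, total, round(tuning_frac, 2))  # 1000 3000 0.67
```

So with these defaults, two thirds of all sampler steps would be tuning.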

Overall, I like the current behaviour more...

@fonnesbeck
Copy link
Member Author

fonnesbeck commented Dec 4, 2017

I think the critical distinction is that when you ask for n draws (particularly with no other arguments), you get n samples and not 2n, etc. Getting 2000 samples when you ask for 1000 doesn't seem like the right behavior.

It’s true you end up with a lot of tuning with multiple chains, but I don’t think it’s a huge issue given how few tuning steps are needed for most models. I would imagine 2 or 4 chains is what most folks end up using most of the time.

At the end of the day, I think most users just want a certain number of samples, and aren't necessarily concerned with how they got there. All we are using parallel chains for is to calculate diagnostics and to improve the efficiency of sampling.

@fonnesbeck
Copy link
Member Author

How do we want to proceed here? One option is to revert to sampling from one core by default, so that asking for n draws without other arguments gives you n samples. Otherwise, we would need something like this PR to divide them evenly among the cores.

@junpenglao
Copy link
Member

If we are going forward with this change, we should increase the default number of draws.

@twiecki
Copy link
Member

twiecki commented Dec 11, 2017

I'm in favor of merging. If I think about how many samples I want I don't think about the number of chains. We also had the occasional question why multiprocessing wasn't faster, so this would be less confusing in this regard as well.

@fonnesbeck
Copy link
Member Author

This PR is obsolete now; closing.
