
NUTS sampler gets 'stuck' for very long periods #776

Closed · FedericoV opened this issue Jun 22, 2015 · 15 comments

@FedericoV
Hi!

Sorry to keep opening issues, but I just noticed this:

I'm working on a hierarchical model, again very similar to the standard model described by Thomas Wiecki:

        with pm.Model() as hierarchical_model:
            mu_b = pm.Normal('mu_beta', mu=0., sd=100**2)
            sigma_b = pm.Uniform('sigma_beta', lower=0, upper=100)

            b = pm.Normal('beta', mu=mu_b, sd=sigma_b, shape=3)

            # Model error prior
            eps = pm.Uniform('eps', lower=0, upper=2)

            # Linear model
            enrich_est = b[gs_code] * del_idx
            # Check to make sure this is right.

            # Data likelihood
            enrich_like = pm.Normal('enrich_like', mu=enrich_est, sd=eps, observed=tip_fluorescence)

        with hierarchical_model:
            start = pm.find_MAP()
            step = pm.NUTS(scaling=start)
            hierarchical_trace = pm.sample(2000, step, start=start, progressbar=True)

After a few loops that mostly completed with timings like this:

[-----------------100%-----------------] 2001 of 2000 complete in 63.5 sec

It's currently stuck like so:

 [-                 3%                  ] 62 of 2000 complete in 11487.3 sec

I suspect this is largely because of this line:

        b[gs_code] * del_idx

del_idx is a boolean array of length n, while gs_code is an integer array with 3 possible values (0, 1, 2), and the groups are quite imbalanced:

        print ((gs_code == 0) * del_idx).sum()  # 104
        print ((gs_code == 1) * del_idx).sum()  # 16
        print ((gs_code == 2) * del_idx).sum()  # 38
        print len(gs_code)                      # 4925

I also had to simplify the model quite a bit: when I was fitting an alpha term independently (as in the original Wiecki notebook), the iteration times were incredibly long. So, as a pre-processing step, I did some independent mean centering for each condition instead, which is of course far less precise (a sketch of the dropped alpha term follows the loop below):

        for gs in [0, 1, 2]:
            # Select cells that are wild type and in this growth stage
            wt_and_gs_code = np.logical_and(wt_idx, gs_code == gs)
            # Mean of the wild-type population in this growth stage
            m = tip_fluorescence[wt_and_gs_code].mean()
            # Subtract that mean from all cells in this growth stage
            tip_fluorescence[gs_code == gs] -= m
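
For reference, here is a hypothetical reconstruction of what the dropped alpha (intercept) term might have looked like, following the varying-intercept pattern of the Wiecki notebook; the names mu_alpha, sigma_alpha, and alpha are guesses, not the original code:

        with pm.Model() as full_model:
            # Hypothetical per-group intercept, mirroring the beta hyperprior
            mu_a = pm.Normal('mu_alpha', mu=0., sd=100**2)
            sigma_a = pm.Uniform('sigma_alpha', lower=0, upper=100)
            a = pm.Normal('alpha', mu=mu_a, sd=sigma_a, shape=3)

            mu_b = pm.Normal('mu_beta', mu=0., sd=100**2)
            sigma_b = pm.Uniform('sigma_beta', lower=0, upper=100)
            b = pm.Normal('beta', mu=mu_b, sd=sigma_b, shape=3)

            eps = pm.Uniform('eps', lower=0, upper=2)

            # Per-group intercept instead of the mean-centering above
            enrich_est = a[gs_code] + b[gs_code] * del_idx
            enrich_like = pm.Normal('enrich_like', mu=enrich_est, sd=eps,
                                    observed=tip_fluorescence)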
@jsalvatier (Member)

This is actually quite useful for us. Any chance you could post the full model and data somewhere? I would love to dig into this and optimize the slow parts. I suspect that if NUTS goes from fast to slow, it has begun to build very large trees, which we can probably fix some way.
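
(For readers hitting this thread later: newer PyMC3 releases let you cap the trajectory doubling directly. A minimal sketch, assuming the max_treedepth argument, which may not exist in the 2015 API:)

        with hierarchical_model:
            # Cap trajectory doubling so a single NUTS step cannot grow
            # unboundedly expensive; max_treedepth is an assumption here,
            # present in later PyMC3 releases
            step = pm.NUTS(max_treedepth=8)
            trace = pm.sample(2000, step)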

@FedericoV (Author)

I'd be happy to. I'm currently running the other model (the one that ran out of memory) in a loop overnight on my workstation to see where exactly the leak is.

I do not know how reproducible that 'frozen' state actually is though.

@jsalvatier (Member)

Great, let me know when you're able to post the model/data, and I'll take a look.

@FedericoV (Author)

I sent you an e-mail with the subset of the data.

@jsalvatier (Member)

Thanks :)

@jwjohnson314

I'm having the same issue, also on a model similar to that described above. My data are pretty well balanced.

@twiecki (Member) commented Nov 9, 2015

Any data/NB to reproduce the problem?

@FedericoV (Author)

I had e-mailed the data to John as soon as he asked for it, but didn't hear back. In the meantime, I had reformulated the problem to use emcee, and it worked fine there.

I also ran into a limitation of the pymc3 API that I could not work around, which is another reason emcee worked well for me. Should I open a different ticket for that?


@twiecki (Member) commented Nov 9, 2015

@FedericoV sure. Can you send me the data as well? firstname.lastname@gmail.com

@twiecki (Member) commented Nov 9, 2015

I think this is related to the model being close to underdetermined. You place a hyperprior on only three beta coefficients, which is not a lot of information from which to infer their mu and sd. Here is an example that illustrates the problem:

        import pymc3 as pm

        with pm.Model():
            mu = pm.Normal('mu', 0, 1)
            sd = pm.HalfCauchy('sd', 1)
            obs = pm.Normal('obs', mu=mu, sd=sd, observed=[0.1, -0.1])

            start = pm.find_MAP()
            step = pm.NUTS(scaling=start)
            trace = pm.sample(2000, step, start=start)

[image]

Now, I'm not sure whether this is a bug or a property of a posterior that is just extremely flat.

In any case, if I remove the hyperprior in your model, it converges just fine. If I keep the hyperprior but sample with Metropolis first and then use the last sample as the starting point for NUTS, it also works well (although there are some convergence instabilities).
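
A minimal sketch of that two-stage strategy, assuming the hierarchical_model from the first comment (the draw counts are arbitrary):

        with hierarchical_model:
            # Stage 1: a cheap Metropolis run to move into the typical set
            metropolis_trace = pm.sample(5000, step=pm.Metropolis())

            # Stage 2: restart NUTS from the final Metropolis sample
            start = metropolis_trace[-1]
            step = pm.NUTS(scaling=start)
            trace = pm.sample(2000, step, start=start)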

@jwjohnson314 How many coefficients are in your hierarchical model?

@jwjohnson314

68 - it's a hierarchical model similar to the radon model, with varying intercept and slope terms over 34 different sets of observations.


@twiecki (Member) commented Nov 9, 2015

Have you tried sampling with Metropolis or Slice?
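
(A minimal sketch of both alternatives, again assuming the hierarchical_model from the first comment:)

        with hierarchical_model:
            # Both step methods are gradient-free, so neither can stall
            # inside a long NUTS trajectory
            trace_metropolis = pm.sample(5000, step=pm.Metropolis())
            trace_slice = pm.sample(5000, step=pm.Slice())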

@jwjohnson314

No. I'll give it a go and let you know how it does.

@springcoil (Contributor)

Was this error resolved? Maybe we could add it to some sort of user guide, or just add a warning somewhere.

@twiecki (Member) commented Nov 18, 2016

With the new init (#1523) and some model tweaks, this works pretty well:
[image]
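
A minimal sketch of the newer workflow, assuming the init argument introduced by #1523; it replaces the find_MAP/scaling steps from the first comment:

        with hierarchical_model:
            # ADVI-based initialization tunes the step size and mass matrix
            # automatically before NUTS starts
            trace = pm.sample(2000, init='advi')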

twiecki closed this as completed on Nov 18, 2016