
NUTS sampler gets 'stuck' for very long periods #776

Closed · FedericoV opened this issue Jun 22, 2015 · 15 comments

@FedericoV
Hi!

Sorry to keep opening issues, but I just noticed this:

I'm working on a hierarchical model, again very similar to the standard model described by Thomas Wiecki:

        with pm.Model() as hierarchical_model:
            mu_b = pm.Normal('mu_beta', mu=0., sd=100**2)
            sigma_b = pm.Uniform('sigma_beta', lower=0, upper=100)

            b = pm.Normal('beta', mu=mu_b, sd=sigma_b, shape=3)

            # Model error prior
            eps = pm.Uniform('eps', lower=0, upper=2)

            # Linear model
            enrich_est = b[gs_code] * del_idx
            # Check to make sure this is right.

            # Data likelihood
            enrich_like = pm.Normal('enrich_like', mu=enrich_est, sd=eps, observed=tip_fluorescence)

        with hierarchical_model:
            start = pm.find_MAP()
            step = pm.NUTS(scaling=start)
            hierarchical_trace = pm.sample(2000, step, start=start, progressbar=True)

After a few loops that mostly completed with timings like this:

[-----------------100%-----------------] 2001 of 2000 complete in 63.5 sec

It's currently stuck like so:

 [-                 3%                  ] 62 of 2000 complete in 11487.3 sec

I suspect this is largely because of this line:

        b[gs_code] * del_idx

del_idx is a boolean array of length n, while gs_code is an integer array with 3 possible values (0, 1, 2), and the groups are quite imbalanced:

        print ((gs_code == 0) * del_idx).sum()  # 104
        print ((gs_code == 1) * del_idx).sum()  # 16
        print ((gs_code == 2) * del_idx).sum()  # 38
        print len(gs_code)                      # 4925

I also had to simplify the model quite a bit: when I was fitting an alpha term independently (as in the original Wiecki notebook), the iteration times were incredibly long. So, as a pre-processing step, I did some independent mean centering for each condition instead, which is of course far less precise (a sketch of the dropped alpha term follows the loop below):

        for gs in [0, 1, 2]:
            # Select cells that are wild type and in this growth stage
            wt_and_gs_code = np.logical_and(wt_idx, gs_code == gs)
            # Mean of the wild-type population in this growth stage
            m = tip_fluorescence[wt_and_gs_code].mean()
            # Subtract that mean from all cells in this growth stage
            tip_fluorescence[gs_code == gs] -= m
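
For reference, here is a hypothetical reconstruction of what the dropped alpha (intercept) term might have looked like, following the varying-intercept pattern of the Wiecki notebook; the names mu_alpha, sigma_alpha, and alpha are guesses, not the original code:

        with pm.Model() as full_model:
            # Hypothetical per-group intercept, mirroring the beta hyperprior
            mu_a = pm.Normal('mu_alpha', mu=0., sd=100**2)
            sigma_a = pm.Uniform('sigma_alpha', lower=0, upper=100)
            a = pm.Normal('alpha', mu=mu_a, sd=sigma_a, shape=3)

            mu_b = pm.Normal('mu_beta', mu=0., sd=100**2)
            sigma_b = pm.Uniform('sigma_beta', lower=0, upper=100)
            b = pm.Normal('beta', mu=mu_b, sd=sigma_b, shape=3)

            eps = pm.Uniform('eps', lower=0, upper=2)

            # Per-group intercept instead of the mean-centering above
            enrich_est = a[gs_code] + b[gs_code] * del_idx
            enrich_like = pm.Normal('enrich_like', mu=enrich_est, sd=eps,
                                    observed=tip_fluorescence)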
@jsalvatier (Member)

This is actually quite useful for us. Any chance you could post the full model and data somewhere? I would love to dig into this and optimize the slow parts. I suspect that if NUTS goes from fast to slow, it has begun to build very large trees, which we can probably fix some way.
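
(For readers hitting this thread later: newer PyMC3 releases let you cap the trajectory doubling directly. A minimal sketch, assuming the max_treedepth argument, which may not exist in the 2015 API:)

        with hierarchical_model:
            # Cap trajectory doubling so a single NUTS step cannot grow
            # unboundedly expensive; max_treedepth is an assumption here,
            # present in later PyMC3 releases
            step = pm.NUTS(max_treedepth=8)
            trace = pm.sample(2000, step)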

@FedericoV (Author)

I'd be happy to. I'm currently running the other model (the one that ran out of memory) in a loop overnight on my workstation to see where exactly the leak is.

I do not know how reproducible that 'frozen' state actually is though.

@jsalvatier (Member)

Great, let me know when you're able to post the model/data, and I'll take a look.

@FedericoV (Author)

I sent you an e-mail with the subset of the data.

@jsalvatier (Member)

Thanks :)

@jwjohnson314

I'm having the same issue, also on a model similar to that described above. My data are pretty well balanced.

@twiecki (Member) commented Nov 9, 2015

Any data/NB to reproduce the problem?

@FedericoV (Author)

I had e-mailed the data to John as soon as he asked for it, but didn't hear back. In the meantime, I had reformulated the problem to use emcee, and it worked fine there.

I also ran into a limitation of the pymc3 API that I could not work around, which is another reason emcee worked well for me. Should I open a different ticket for that?


@twiecki (Member) commented Nov 9, 2015

@FedericoV sure. Can you send me the data as well? firstname.lastname@gmail.com

@twiecki (Member) commented Nov 9, 2015

I think this is related to the model being close to underdetermined. You place a hyperprior on only three beta coefficients, which is not a lot of information from which to infer their mu and sd. Here is an example that illustrates the problem:

        import pymc3 as pm

        with pm.Model():
            mu = pm.Normal('mu', 0, 1)
            sd = pm.HalfCauchy('sd', 1)
            obs = pm.Normal('obs', mu=mu, sd=sd, observed=[0.1, -0.1])

            start = pm.find_MAP()
            step = pm.NUTS(scaling=start)
            trace = pm.sample(2000, step, start=start)

[image]

Now, I'm not sure whether this is a bug or a property of a posterior that is just extremely flat.

In any case, if I remove the hyperprior in your model, it converges just fine. If I keep the hyperprior but sample with Metropolis first and then use the last sample as the starting point for NUTS, it also works well (although there are some convergence instabilities).
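
A minimal sketch of that two-stage strategy, assuming the hierarchical_model from the first comment (the draw counts are arbitrary):

        with hierarchical_model:
            # Stage 1: a cheap Metropolis run to move into the typical set
            metropolis_trace = pm.sample(5000, step=pm.Metropolis())

            # Stage 2: restart NUTS from the final Metropolis sample
            start = metropolis_trace[-1]
            step = pm.NUTS(scaling=start)
            trace = pm.sample(2000, step, start=start)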

@jwjohnson314 How many coefficients are in your hierarchical model?

@jwjohnson314

68 - it's a hierarchical model similar to the radon model, with varying intercept and slope terms over 34 different sets of observations.


@twiecki (Member) commented Nov 9, 2015

Have you tried sampling with Metropolis or Slice?
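
(A minimal sketch of both alternatives, again assuming the hierarchical_model from the first comment:)

        with hierarchical_model:
            # Both step methods are gradient-free, so neither can stall
            # inside a long NUTS trajectory
            trace_metropolis = pm.sample(5000, step=pm.Metropolis())
            trace_slice = pm.sample(5000, step=pm.Slice())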

@jwjohnson314

No. I'll give it a go and let you know how it does.

@springcoil (Contributor)

Was this error resolved? Maybe we could add it to some sort of user guide, or just add a warning somewhere.

@twiecki (Member) commented Nov 18, 2016

With the new init (#1523) and some model tweaks, this works pretty well:
[image]
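
A minimal sketch of the newer workflow, assuming the init argument introduced by #1523; it replaces the find_MAP/scaling steps from the first comment:

        with hierarchical_model:
            # ADVI-based initialization tunes the step size and mass matrix
            # automatically before NUTS starts
            trace = pm.sample(2000, init='advi')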

twiecki closed this as completed on Nov 18, 2016