Incorrect inference with NUTS in pymc3 #982

Closed · deepers opened this issue Feb 16, 2016 · 25 comments

@deepers commented Feb 16, 2016

I'm getting what appears to be an incorrect posterior in pymc3 using NUTS for a very simple continuous toy model. The posterior disagrees with both the analytic calculation and the Metropolis posterior.

In the code that follows, I've generated synthetic data with a fixed random seed (so the results are reproducible). I then define the same generative model in pymc3, observing only the final data. Finally I compare the marginal distribution of one of the latent variables with the true analytic posterior and the Metropolis posterior. The results do not agree.

#!/usr/bin/env python2
from __future__ import division

import numpy as np
import pymc3 as mc
import scipy as sci
import scipy.optimize  # needed for sci.optimize below
import scipy.stats     # needed for sci.stats below
import theano.tensor as th

np.random.seed(13)

n = 10
tau_scale = 2
tau0 = sci.stats.expon.rvs() * tau_scale
mu0 = np.random.randn(n) / np.sqrt(tau0)
x0 = mu0 + np.random.randn(n)

with mc.Model() as model1:
    tau = mc.Exponential('tau', lam=1 / tau_scale)
    mu = mc.Normal('mu', tau=tau, shape=(n,))
    mc.Normal('x', mu=mu, observed=x0)

with mc.Model() as model2:
    tau = mc.Exponential('tau', lam=1 / tau_scale)
    mu_z = mc.Normal('mu_z', shape=(n,))
    mu = mc.Deterministic('mu', mu_z / th.sqrt(tau))
    mc.Normal('x', mu=mu, observed=x0)


def infer(model):
    with model:
        # Initialize at the MAP estimate, run a short tuning chain, then
        # draw the main NUTS sample starting from the tuned state.
        map_ = mc.find_MAP(fmin=sci.optimize.fmin_l_bfgs_b)
        step = mc.NUTS(scaling=map_)
        trace = mc.sample(100, step=step, start=map_, progressbar=False)
        step = mc.NUTS(scaling=trace[-1])
        return mc.sample(11000, step=step, start=trace[-1], progressbar=False)

trace1 = infer(model1)
trace2 = infer(model2)

with model2:
    trace3 = mc.sample(100000, step=mc.Metropolis(), progressbar=False,
                       start=mc.find_MAP(fmin=sci.optimize.fmin_l_bfgs_b))

samples_tau1 = trace1['tau'][1000:]
samples_tau2 = trace2['tau'][1000:]
samples_tau3 = trace3['tau'][10000:]

print
print 'pymc3 version: ' + mc.__version__
print
print 'Model 1 NUTS tau'
print 'Mean: {0:3.1f}'.format(samples_tau1.mean())
print 'Standard Deviation: {0:3.1f}'.format(samples_tau1.std())
print 'Median {0:3.1f}'.format(np.percentile(samples_tau1, 50))
print
print 'Model 2 NUTS tau'
print 'Mean: {0:3.1f}'.format(samples_tau2.mean())
print 'Standard Deviation: {0:3.1f}'.format(samples_tau2.std())
print 'Median {0:3.1f}'.format(np.percentile(samples_tau2, 50))
print
print 'Model 2 Metropolis tau'
print 'Mean: {0:3.1f}'.format(samples_tau3.mean())
print 'Standard Deviation: {0:3.1f}'.format(samples_tau3.std())
print 'Median {0:3.1f}'.format(np.percentile(samples_tau3, 50))

I've actually defined the same generative model in two slightly different ways. The output of the above program is as follows.

deepee@entropy:~$ ./test_inference.py 
Applied log-transform to tau and added transformed tau_log to model.
Applied log-transform to tau and added transformed tau_log to model.

pymc3 version: 3.0

Model 1 NUTS tau
Mean: 2.5
Standard Deviation: 1.6
Median 2.1

Model 2 NUTS tau
Mean: 4.0
Standard Deviation: 2.5
Median 3.4

Model 2 Metropolis tau
Mean: 3.5
Standard Deviation: 2.3
Median 2.9

The true posterior of tau has mean 3.5, standard deviation 2.3, and median 3.0, which agrees with Metropolis. These values are also matched much more closely by Stan. I'm using a relatively recent commit (ca40cd3) of pymc3.
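For reference, here is a minimal sketch of how that true posterior can be computed numerically: with mu integrated out, each x_i | tau ~ N(0, 1 + 1/tau), so the posterior of tau is just the Exponential prior times that marginal likelihood, evaluated on a grid. The helper name and grid bounds are arbitrary choices, and x0 / tau_scale refer to the script above.

# Grid-based reference posterior for tau (same generative model as model 1/2):
# with mu marginalized out, each x_i | tau ~ N(0, 1 + 1/tau).
import numpy as np
import scipy.stats

def reference_posterior(x0, tau_scale, grid=None):
    if grid is None:
        grid = np.linspace(1e-4, 60, 200001)  # wide, fairly fine grid
    log_post = (scipy.stats.expon.logpdf(grid, scale=tau_scale) +
                scipy.stats.norm.logpdf(x0[:, None], 0,
                                        np.sqrt(1 + 1 / grid)).sum(axis=0))
    post = np.exp(log_post - log_post.max())
    post /= np.trapz(post, grid)  # normalize the density on the grid
    mean = np.trapz(grid * post, grid)
    sd = np.sqrt(np.trapz((grid - mean) ** 2 * post, grid))
    median = grid[np.searchsorted(np.cumsum(post) * (grid[1] - grid[0]), 0.5)]
    return mean, sd, median

print(reference_posterior(x0, tau_scale))  # should be roughly (3.5, 2.3, 3.0)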

@fonnesbeck (Member)

I think you may be parameterizing our Exponential incorrectly. PyMC3 uses the scale, not the mean, as its parameter:

    def logp(self, value):
        lam = self.lam
        return bound(T.log(lam) - lam * value, value > 0, lam > 0)

@deepers (Author) commented Feb 16, 2016

Hi Chris,

From your code snippet, it looks like lambda (lam) is the inverse mean of the exponential distribution, as defined here. In my example, I want tau to come from an exponential distribution with mean tau_scale, so I think it's correct to set lam=1 / tau_scale.
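For what it's worth, here is a quick numerical check using scipy (which parameterizes the exponential by scale = 1/lam; nothing here is pymc3-specific):

# lam is the rate (inverse mean), so lam = 1 / tau_scale gives prior mean tau_scale.
# scipy's expon uses scale = 1 / lam, hence scale=tau_scale below.
import scipy.stats
tau_scale = 2
print(scipy.stats.expon(scale=tau_scale).mean())  # 2.0, i.e. tau_scale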

Thanks,
Deepee

@deepers (Author) commented Feb 25, 2016

Let me know if there's any way in which I can help with debugging or investigating this issue.

Thanks,
Deepee

@fonnesbeck (Member)

Sorry this slipped off my radar. Will try and get to it tomorrow.

@deepers (Author) commented Feb 26, 2016

Thanks Chris,
Deepee

@fonnesbeck (Member)

@deepers I'm trying to implement this in Stan, just to compare. Here's what I came up with:

data {
  int<lower=0> n; 
  real<lower=0> tau_scale; 
  real x0[n];
} 
parameters {
  real<lower=0> tau; 
  vector[n] mu;
} 
model {
  tau ~ exponential(tau_scale);
  mu ~ normal(0, 1/tau);
  for (i in 1:n) x0[i] ~ normal(mu[i], 1); 
}

Does that look good to you?

@fonnesbeck (Member)

Interestingly, if I just run sample without doing MAP estimation beforehand, that is:

trace = sample(5000)

it works fine for model 2:

Model 1 NUTS tau
Mean: 2.6
Standard Deviation: 2.0
Median 2.1

Model 2 NUTS tau
Mean: 3.6
Standard Deviation: 2.4
Median 3.0

Model 1 Metropolis tau
Mean: 3.5
Standard Deviation: 2.3
Median 3.0

Model 2 Metropolis tau
Mean: 3.6
Standard Deviation: 2.4
Median 3.0

Model 1 ADVI tau
Mean: 2.2
Standard Deviation: 1.6

Odd that it does not work for the Model 1 parameterization, though, and I'm surprised how poorly ADVI does! cc @twiecki

@deepers (Author) commented Feb 26, 2016

Hi @fonnesbeck,

Here's the Stan code I used...same as yours I think, but vectorized for what it's worth.

Model 1

data {
    int<lower=1> n;
    real log_tau_scale;
    vector[n] x;
}
parameters {
    real log_tau;
    vector[n] mu;
}
model {
    log_tau ~ normal(log_tau_scale, 1);
    mu ~ normal(0, 1 / exp(log_tau / 2));
    x ~ normal(mu, 1);
}

Model 2

data {
    int<lower=1> n;
    real log_tau_scale;
    vector[n] x;
}
parameters {
    real log_tau;
    vector[n] mu_z;
}
transformed parameters {
    vector[n] mu;
    mu <- mu_z / exp(log_tau / 2);
}
model {
    log_tau ~ normal(log_tau_scale, 1);
    mu_z ~ normal(0, 1);
    x ~ normal(mu, 1);
}

For variational inference, I found that I got reasonable results with Model 2. (I rolled my own VI in Python with the help of autograd.)

I took out the MAP estimate from my original pymc3 code above, but it didn't give me a different result for either model 1 or model 2. :-(

Thanks,
Deepee

@twiecki (Member) commented Feb 27, 2016

I'm getting similar results to @fonnesbeck: when not initializing with the MAP estimate and using no burn-in, it seems to work much better:
[image]

Also, it works fine for me when using HMC.

@fonnesbeck (Member)

It appears that the NUTS algo needs some tuning.

@deepers (Author) commented Feb 29, 2016

For the record, here are my results using no MAP initialization or burn-in with NUTS, HMC, and Metropolis. HMC and Metropolis do okay, but NUTS is still slightly off for Model 2 and way off for Model 1.

deepee@entropy:~/repo/magnitude/bin/deepee$ ./test_inference2.py 
Applied log-transform to tau and added transformed tau_log to model.
Applied log-transform to tau and added transformed tau_log to model.

Model 1 NUTS tau
Mean: 2.6
Standard Deviation: 2.0
Median: 2.1

Model 2 NUTS tau
Mean: 3.7
Standard Deviation: 2.5
Median: 3.0

Model 1 HamiltonianMC tau
Mean: 3.5
Standard Deviation: 2.3
Median: 2.9

Model 2 HamiltonianMC tau
Mean: 3.5
Standard Deviation: 2.3
Median: 2.9

Model 1 Metropolis tau
Mean: 3.6
Standard Deviation: 2.4
Median: 3.1

Model 2 Metropolis tau
Mean: 3.5
Standard Deviation: 2.3
Median: 3.0

Here is the code I ran.

#!/usr/bin/env python2
from __future__ import division

import numpy as np
import pymc3 as mc
import scipy as sci
import scipy.stats  # needed for sci.stats below
import theano.tensor as th

np.random.seed(13)

n = 10
tau_scale = 2
tau0 = sci.stats.expon.rvs() * tau_scale
mu0 = np.random.randn(n) / np.sqrt(tau0)
x0 = mu0 + np.random.randn(n)

with mc.Model() as model1:
    tau = mc.Exponential('tau', lam=1 / tau_scale)
    mu = mc.Normal('mu', tau=tau, shape=(n,))
    mc.Normal('x', mu=mu, observed=x0)

with mc.Model() as model2:
    tau = mc.Exponential('tau', lam=1 / tau_scale)
    mu_z = mc.Normal('mu_z', shape=(n,))
    mu = mc.Deterministic('mu', mu_z / th.sqrt(tau))
    mc.Normal('x', mu=mu, observed=x0)


# Sample each model with each step method; discard the first 3000 draws of tau.
output = []
for step_type in mc.NUTS, mc.HamiltonianMC, mc.Metropolis:
    for model, num in (model1, 1), (model2, 2):
        with model:
            trace = mc.sample(30000, step=step_type(),
                              progressbar=False)['tau'][3000:]
        output.extend([
            '',
            'Model {} {} tau'.format(num, step_type.__name__),
            'Mean: {:3.1f}'.format(trace.mean()),
            'Standard Deviation: {:3.1f}'.format(trace.std()),
            'Median: {:3.1f}'.format(np.percentile(trace, 50))])
print '\n'.join(output)

@deepers (Author) commented Feb 29, 2016

In the meantime I'll use HMC... Do you guys have any advice for tuning HMC parameters?
Thanks,
Deepee

@fonnesbeck (Member)

There are a few knobs one can adjust for NUTS: the target acceptance rate, the speed of adaptation, the scaling, and the leapfrog step size. The last of these gets adapted automatically, but the others do not.
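For example, a rough sketch of how those might be set; the keyword names below follow the NUTS step method around this release and are assumptions worth checking against your installed version (model2 refers to the script above):

with model2:
    step = mc.NUTS(target_accept=0.9,  # target acceptance rate (default 0.8)
                   step_scale=0.25,    # initial leapfrog step size, adapted while sampling
                   gamma=0.05, k=0.75, t0=10)  # dual-averaging adaptation speed
    trace_tuned = mc.sample(5000, step=step, progressbar=False)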

@twiecki (Member) commented Mar 3, 2016

You shouldn't get a different posterior though, just worse convergence.

@fonnesbeck (Member)

It's possible it could be different if HMC has been tuned poorly.

@twiecki (Member) commented Mar 14, 2016

When reverting to commit 79971ca, I get the correct results:

Model 2 NUTS tau
Mean: 3.5
Standard Deviation: 2.3
Median 3.0
Model 2 Metropolis tau
Mean: 3.5
Standard Deviation: 2.3
Median 3.0

@twiecki (Member) commented Mar 14, 2016

CC @jsalvatier

@deepers (Author) commented Mar 14, 2016

When I revert to the commit before 79971ca (viz. 14dbb45), everything seems okay. Here's the output of my second test script. (Compare with the output from two weeks ago.) Even Model 1 is happy!

deepee@entropy:~/repo/magnitude/bin/deepee$ ./test_inference2.py 

Model 1 NUTS tau
Mean: 3.5
Standard Deviation: 2.3
Median: 3.0

Model 2 NUTS tau
Mean: 3.5
Standard Deviation: 2.3
Median: 3.0

Model 1 HamiltonianMC tau
Mean: 3.5
Standard Deviation: 2.3
Median: 3.0

Model 2 HamiltonianMC tau
Mean: 3.5
Standard Deviation: 2.3
Median: 3.0

Model 1 Metropolis tau
Mean: 3.5
Standard Deviation: 2.3
Median: 2.9

Model 2 Metropolis tau
Mean: 3.5
Standard Deviation: 2.3
Median: 3.0

@springcoil (Contributor)

This is fascinating and weird but glad it is resolved.


@deepers (Author) commented Mar 15, 2016

Unfortunately, reverting to this previous commit re-introduces other bugs that have since been fixed. For example:

#!/usr/bin/env python2
from __future__ import division

import numpy as np
import pymc3 as mc
import theano.tensor as th

with mc.Model() as model:
    mc.Lognormal('x1', mu=0, tau=1)
    mc.Lognormal('x2', mu=0, tau=1, shape=(2,))
    y1_log = mc.Normal('y1_log', mu=0, tau=1)
    y2_log = mc.Normal('y2_log', mu=0, tau=1, shape=(2,))
    mc.Deterministic('y1', th.exp(y1_log))
    mc.Deterministic('y2', th.exp(y2_log))

# Evaluate the elementwise log-probability of each auto-transformed variable at
# the same point; the scalar and vectorized versions should agree.
pt = dict(x1_log=np.array(5.), x2_log=np.array([5., 5]),
          y1_log=np.array(5.), y2_log=np.array([5., 5]))

for var in 'x1_log', 'x2_log', 'y1_log', 'y2_log':
    print '{}: {}'.format(var, model[var].logp_elemwise(pt))

yields

deepee@entropy:~$ ./test_lognormal.py
x1_log: -13.4189385332
x2_log: [-8.41893853 -8.41893853]
y1_log: -13.4189385332
y2_log: [-13.41893853 -13.41893853]

Here you see that the vectorized Lognormal gives the wrong log probability: x2_log should match x1_log but does not. This bug is fixed in the current master commit.
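For reference, a quick hand check of the expected value: if x ~ Lognormal(mu=0, tau=1), then log x is standard normal, so after the log transform (Jacobian included) the logp at x_log = 5 is just the standard normal log density.

# Standard normal log density at x_log = 5; the scalar and vectorized Lognormal
# variables (and the hand-rolled y variables) should all report this value.
import numpy as np
x_log = 5.0
print(-0.5 * x_log ** 2 - 0.5 * np.log(2 * np.pi))  # -13.4189385332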

@fonnesbeck (Member)

Reverting is definitely not a fix. We need to fix the commit, not roll it back.

@twiecki (Member) commented Mar 15, 2016

Agreed, the new version is also much faster.

@twiecki (Member) commented Mar 16, 2016

This should be fixed with 141f5f7. Thanks @jsalvatier for debugging this with me.

@fonnesbeck (Member)

Nice work, guys!

@deepers (Author) commented Mar 21, 2016

Thanks Thomas and John!
Deepee

@twiecki closed this as completed Mar 21, 2016