Incorrect inference with NUTS in pymc3 #982
I think you may be parameterizing our Exponential incorrectly. PyMC3 uses the rate, not the mean, as its parameter:

```python
def logp(self, value):
    lam = self.lam
    return bound(T.log(lam) - lam * value, value > 0, lam > 0)
```
|
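(An illustrative aside, not part of the original thread.) The `logp` above shows that `lam` enters as a *rate*: the density is `lam * exp(-lam * value)`, so the distribution's mean is `1/lam`. A quick scipy sanity check:

```python
import numpy as np
from scipy import stats

lam = 0.5    # rate
value = 3.0

# The quoted PyMC3 formula: log(lam) - lam * value
logp_rate = np.log(lam) - lam * value

# scipy parameterizes the exponential by its scale (= mean = 1/lam)
logp_scipy = stats.expon.logpdf(value, scale=1 / lam)

# They agree, confirming lam is the rate: Exponential(lam=1/tau_scale)
# therefore has mean tau_scale.
assert np.isclose(logp_rate, logp_scipy)
```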
Hi Chris, from your code snippet, it looks like lambda (`lam`) is the rate parameter, so `Exponential('tau', lam=1 / tau_scale)` has mean `tau_scale`, which is what I intended. Thanks, |
Let me know if there's any way in which I can help with debugging or investigating this issue. Thanks, |
Sorry this slipped off my radar. Will try and get to it tomorrow. |
Thanks Chris, |
@deepers I'm trying to implement this in Stan, just to compare. Here's what I came up with:

```stan
data {
  int<lower=0> n;
  real<lower=0> tau_scale;
  real x0[n];
}
parameters {
  real<lower=0> tau;
  vector[n] mu;
}
model {
  tau ~ exponential(tau_scale);
  mu ~ normal(0, 1/tau);
  for (i in 1:n) x0[i] ~ normal(mu[i], 1);
}
```

Does that look good to you? |
Interestingly, if I just run
it works fine for model 2:
Odd that it does not work for the Model 1 parameterization, though, and I'm surprised how poorly ADVI does! cc @twiecki |
Hi @fonnesbeck, here's the Stan code I used... same as yours I think, but vectorized, for what it's worth.

Model 1

```stan
data {
  int<lower=1> n;
  real log_tau_scale;
  vector[n] x;
}
parameters {
  real log_tau;
  vector[n] mu;
}
model {
  log_tau ~ normal(log_tau_scale, 1);
  mu ~ normal(0, 1 / exp(log_tau / 2));
  x ~ normal(mu, 1);
}
```

Model 2

```stan
data {
  int<lower=1> n;
  real log_tau_scale;
  vector[n] x;
}
parameters {
  real log_tau;
  vector[n] mu_z;
}
transformed parameters {
  vector[n] mu;
  mu <- mu_z / exp(log_tau / 2);
}
model {
  log_tau ~ normal(log_tau_scale, 1);
  mu_z ~ normal(0, 1);
  x ~ normal(mu, 1);
}
```

For variational inference, I found that I got reasonable results with Model 2. (I rolled my own VI in Python with the help of autograd.) I took out the MAP estimate from my original pymc3 code above, but it didn't give me a different result for either model 1 or model 2. :-( Thanks, |
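For readers following along: Model 2 above is the non-centered ("whitened") reparameterization. A small numpy sketch (illustrative, not from the thread) confirming that the two parameterizations induce the same prior on `mu`, while the non-centered form lets the sampler work with a variable that is independent of `tau`:

```python
import numpy as np

rng = np.random.default_rng(0)
tau = 2.5          # an arbitrary fixed precision for the check
n = 200_000

# Centered (Model 1 style): draw mu directly with sd = 1/sqrt(tau)
mu_centered = rng.normal(0.0, 1.0 / np.sqrt(tau), size=n)

# Non-centered (Model 2 style): draw a standard normal, then rescale
mu_z = rng.normal(0.0, 1.0, size=n)
mu_noncentered = mu_z / np.sqrt(tau)

# Same marginal distribution; but in Model 2 the sampler explores
# (mu_z, tau) space, where the funnel-shaped coupling is removed.
assert abs(mu_centered.mean() - mu_noncentered.mean()) < 0.01
assert abs(mu_centered.std() - mu_noncentered.std()) < 0.01
```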
I'm getting similar results to @fonnesbeck's: when not initializing with MAP() and using no burn-in, it seems to work much better. Also, it works fine for me when using HMC. |
It appears that the NUTS algo needs some tuning. |
For the record, here are my results using no MAP or burn-in with NUTS, HMC, and Metropolis. HMC and Metropolis do okay, but NUTS is still slightly off for model 2 and way off for model 1.

```
deepee@entropy:~/repo/magnitude/bin/deepee$ ./test_inference2.py
Applied log-transform to tau and added transformed tau_log to model.
Applied log-transform to tau and added transformed tau_log to model.

Model 1 NUTS tau
Mean: 2.6
Standard Deviation: 2.0
Median: 2.1

Model 2 NUTS tau
Mean: 3.7
Standard Deviation: 2.5
Median: 3.0

Model 1 HamiltonianMC tau
Mean: 3.5
Standard Deviation: 2.3
Median: 2.9

Model 2 HamiltonianMC tau
Mean: 3.5
Standard Deviation: 2.3
Median: 2.9

Model 1 Metropolis tau
Mean: 3.6
Standard Deviation: 2.4
Median: 3.1

Model 2 Metropolis tau
Mean: 3.5
Standard Deviation: 2.3
Median: 3.0
```

Here is the code I ran.

```python
#!/usr/bin/env python2
from __future__ import division

import numpy as np
import pymc3 as mc
import scipy as sci
import scipy.stats  # register the submodule so sci.stats works
import theano.tensor as th

np.random.seed(13)
n = 10
tau_scale = 2
tau0 = sci.stats.expon.rvs() * tau_scale
mu0 = np.random.randn(n) / np.sqrt(tau0)
x0 = mu0 + np.random.randn(n)

with mc.Model() as model1:
    tau = mc.Exponential('tau', lam=1 / tau_scale)
    mu = mc.Normal('mu', tau=tau, shape=(n,))
    mc.Normal('x', mu=mu, observed=x0)

with mc.Model() as model2:
    tau = mc.Exponential('tau', lam=1 / tau_scale)
    mu_z = mc.Normal('mu_z', shape=(n,))
    mu = mc.Deterministic('mu', mu_z / th.sqrt(tau))
    mc.Normal('x', mu=mu, observed=x0)

output = []
for step_type in mc.NUTS, mc.HamiltonianMC, mc.Metropolis:
    for model, num in (model1, 1), (model2, 2):
        with model:
            trace = mc.sample(30000, step=step_type(),
                              progressbar=False)['tau'][3000:]
        output.extend([
            '',
            'Model {} {} tau'.format(num, step_type.__name__),
            'Mean: {:3.1f}'.format(trace.mean()),
            'Standard Deviation: {:3.1f}'.format(trace.std()),
            'Median: {:3.1f}'.format(np.percentile(trace, 50))])
print '\n'.join(output)
```
|
In the meantime I'll use HMC... Do you guys have any advice for tuning HMC parameters? |
There are a few knobs one can adjust for NUTS: the target acceptance rate, speed of adaptation, scaling, and leapfrog step size. The last of these gets adapted automatically, but the others do not. |
You shouldn't get a different posterior though, just worse convergence. |
It's possible it could be different if HMC has been tuned poorly. |
When reverting to commit 79971ca, I get the correct results: |
CC @jsalvatier |
When I revert to the commit before 79971ca (viz. 14dbb45), then everything seems okay. Here's the output of my second test script. (Compare with the output from two weeks ago.) Even Model 1 is happy! |
This is fascinating and weird but glad it is resolved. On Mon, Mar 14, 2016 at 4:40 PM, deepers notifications@github.com wrote:
Peadar Coyle |
Unfortunately, reverting to this previous commit re-introduces other bugs that have since been fixed. For example:

```python
#!/usr/bin/env python2
from __future__ import division

import numpy as np
import pymc3 as mc
import theano.tensor as th

with mc.Model() as model:
    mc.Lognormal('x1', mu=0, tau=1)
    mc.Lognormal('x2', mu=0, tau=1, shape=(2,))
    y1_log = mc.Normal('y1_log', mu=0, tau=1)
    y2_log = mc.Normal('y2_log', mu=0, tau=1, shape=(2,))
    mc.Deterministic('y1', th.exp(y1_log))
    mc.Deterministic('y2', th.exp(y2_log))

pt = dict(x1_log=np.array(5.), x2_log=np.array([5., 5]),
          y1_log=np.array(5.), y2_log=np.array([5., 5]))
for var in 'x1_log', 'x2_log', 'y1_log', 'y2_log':
    print '{}: {}'.format(var, model[var].logp_elemwise(pt))
```

yields

Here you see that the vectorized log normal distribution gives the wrong log probability. This bug is fixed in the current master commit. |
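As a reference point (my addition, not from the thread), the correct elementwise log-density is easy to pin down with scipy: a vectorized Lognormal `logp` should agree with the scalar density applied element by element.

```python
import numpy as np
from scipy import stats

# A Lognormal with mu=0, tau=1 corresponds to scipy's lognorm(s=1, scale=1):
#   log p(x) = -log(x) - 0.5*log(2*pi) - 0.5*log(x)**2
x = np.exp(np.array([5.0, 5.0]))  # the test point x_log = 5 from the script above

vector_logp = stats.lognorm.logpdf(x, s=1.0, scale=1.0)
scalar_logp = np.array([stats.lognorm.logpdf(xi, s=1.0, scale=1.0) for xi in x])
closed_form = -np.log(x) - 0.5 * np.log(2 * np.pi) - 0.5 * np.log(x) ** 2

# A correct vectorized implementation must reproduce the per-element values.
assert np.allclose(vector_logp, scalar_logp)
assert np.allclose(vector_logp, closed_form)
```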
Reverting is definitely not a fix. We need to fix the commit, not roll it back. |
Agreed, the new version is also much faster. |
This should be fixed with 141f5f7. Thanks @jsalvatier for debugging this with me. |
Nice work, guys! |
Thanks Thomas and John! |
I'm getting what appears to be an incorrect posterior in pymc3 using NUTS for a very simple continuous toy model. The posterior disagrees with both analytic calculation and with the Metropolis posterior.
In the code that follows, I've generated synthetic data with a fixed random seed (so the results are reproducible). I then define the same generative model in pymc3, observing only the final data. Finally I compare the marginal distribution of one of the latent variables with the true analytic posterior and the Metropolis posterior. The results do not agree.
I've actually defined the same generative model in two slightly different ways. The output of the above program is as follows.
The true posterior of tau has mean 3.5, standard deviation 2.3, and median 3.0, which agrees with Metropolis. These values are also much more closely matched using Stan. I'm using a relatively recent commit (ca40cd3) of pymc3.
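The analytic posterior mentioned above can be checked by one-dimensional quadrature, since `mu` integrates out in closed form (`x_i | tau ~ N(0, 1 + 1/tau)`). A sketch of such a check (my own addition, reusing the thread's seed and data-generating code):

```python
import numpy as np
from scipy import stats

# Regenerate the synthetic data exactly as in the test script.
np.random.seed(13)
n, tau_scale = 10, 2
tau0 = stats.expon.rvs() * tau_scale
mu0 = np.random.randn(n) / np.sqrt(tau0)
x0 = mu0 + np.random.randn(n)

# Marginalizing mu: x_i | tau ~ N(0, 1 + 1/tau), so the posterior over
# tau is one-dimensional and can be evaluated on a dense grid.
taus = np.linspace(1e-4, 80, 100_000)
log_post = (stats.expon.logpdf(taus, scale=tau_scale)
            + stats.norm.logpdf(x0[:, None], 0.0,
                                np.sqrt(1 + 1 / taus)).sum(axis=0))
w = np.exp(log_post - log_post.max())
w /= w.sum()  # normalize the grid weights

mean = (taus * w).sum()
sd = np.sqrt(((taus - mean) ** 2 * w).sum())
median = taus[np.searchsorted(np.cumsum(w), 0.5)]
print(mean, sd, median)  # the thread reports roughly 3.5, 2.3, 3.0
```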