Do not recalculate gradient in NUTS #1730
Conversation
How did you test the speedup? It works for me, if I try a large model. Fitting this (probably not particularly useful) model took 2:50 without, and 1:35 with this patch:

```python
import pymc3
import numpy as np
import theano.tensor as tt

model = pymc3.Model()
data = np.random.randn(200, 1000)
with model:
    mu = pymc3.Normal("mu", mu=0, sd=1, shape=1000)
    mu_ = tt.dot(data, mu)
    pymc3.Normal("measure", mu=mu_, sd=1, observed=np.ones(200))

with model:
    step = pymc3.NUTS(profile=True)
    trace = pymc3.sample(1000, step=step, init=None,
                         tune=500, progressbar=True)
```

I had to work around the test_value mismatch for
Wow, thanks for testing it out so quickly! I've got a way-too-simplistic benchmark I use, of just sampling from a normal distribution 100k times. On master it is ~4.1k samples/s, on this branch, ~3.9k samples/s. The speed-up you're reporting sounds worth it -- let me refactor this a bit to make it maintainable!
I guess that would work well for measuring the overhead for each sample, but for real world models I'm usually happy if it reports samples/s instead of s/sample. :-)
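The throughput measure the two comments are debating (samples/s vs. s/sample) is easy to make concrete. This is a minimal, hypothetical sketch, not pymc3's benchmark: `draw` stands in for one sampler step, here just a plain NumPy normal draw.

```python
import time
import numpy as np

def samples_per_second(draw, n_samples=10_000):
    """Report throughput as samples/s for a draw() callable.

    A slow-per-step sampler looks bad in s/sample; for real-world
    models the samples/s number is usually the one people care about.
    """
    start = time.perf_counter()
    for _ in range(n_samples):
        draw()
    elapsed = time.perf_counter() - start
    return n_samples / elapsed

# Baseline: drawing a 10-dimensional standard normal each "step".
rate = samples_per_second(lambda: np.random.randn(10))
print(f"{rate:.0f} samples/s")
```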
Force-pushed from 259c188 to b0a3f4a
Should be ready for review
pymc3/step_methods/hmc/nuts.py (outdated)

```diff
@@ -69,39 +74,38 @@ def astep(self, q0):
         else:
             step_size = np.exp(self.log_step_size_bar)

-        u = floatX(nr.uniform())
+        u = np.typeDict[theano.config.floatX](nr.uniform())
```
Why this change?
hmm... could have sworn there was a reasonable speedup from this, but can't see it any more. I'll revert these two (there's another in the buildtree).
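For readers unfamiliar with the cast being debated above: both versions convert a Python-float uniform draw to Theano's configured float width (`theano.config.floatX`, a dtype name such as `"float32"` or `"float64"`). A minimal sketch using only NumPy, since Theano itself isn't needed to show the cast (the `floatX` name here is a stand-in for the config value):

```python
import numpy as np

# Stand-in for theano.config.floatX, which is a dtype name string.
floatX = "float64"

u = np.random.uniform()  # plain Python float, i.e. float64

# The PR used np.typeDict[floatX](u); np.typeDict mapped a dtype name
# to its NumPy scalar type and has since been removed from NumPy.
# np.dtype(floatX).type does the same lookup and still works.
u_cast = np.dtype(floatX).type(u)
assert isinstance(u_cast, np.floating)
```

With `floatX = "float32"` the same line would downcast the draw, which is the whole point of routing the cast through the config value.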
```diff
         leaf_size = int(np.log(u) + energy_change <= 0)
         is_valid_sample = (np.log(u) + energy_change < Emax)
-        return q_edge, p_edge, q_edge, leaf_size, is_valid_sample, min(1, np.exp(-energy_change)), 1
+        p_accept = min(1, np.exp(-energy_change))
+        return BinaryTree(q, p, q_grad, q, leaf_size, is_valid_sample, p_accept, 1)
```
This use of namedtuple definitely cleans things up a bit 👍.
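A sketch of what the `BinaryTree` namedtuple in the diff might look like. The field names are guessed from the positional arguments in the return statement above; the actual definition in the PR may differ.

```python
from collections import namedtuple
import numpy as np

# Hypothetical field names, inferred from the diff's return statement.
BinaryTree = namedtuple(
    "BinaryTree",
    ["q", "p", "q_grad", "proposal",
     "leaf_size", "is_valid_sample", "p_accept", "n_steps"],
)

q = np.zeros(3)
p = np.ones(3)
tree = BinaryTree(q, p, -q, q, 1, True, 0.9, 1)

# Named access is what "cleans things up": tree.p_accept reads far
# better than tree[6] did with a bare 8-element tuple.
print(tree.p_accept, tree.leaf_size)
```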
Very nice. It's a bit unfortunate for the code that we have to carry the dlogp around all the time, but the speed-up is definitely worth it. Also another example of where the regression tests come in really handy.
This is an attempt at addressing #1693. Local benchmarks put it marginally slower than NUTS on current master. My two explanations are:

- The grad calculation is already memoized
- I did a bad job factoring out the dlogp function

Wanted to make this PR in case @aseyboldt or someone else wanted to take a look, but will close again if there is no progress soon.