
Do not recalculate gradient in NUTS #1730

Merged 4 commits into pymc-devs:master on Feb 2, 2017

Conversation

ColCarroll
Member

This is an attempt at addressing #1693. Local benchmarks put it marginally slower than NUTS on current master. My two explanations are:
- The grad calculation is already memoized
- I did a bad job factoring out the dlogp function

Wanted to make this PR in case @aseyboldt or someone else wants to take a look, but I will close it again if there is no progress soon.
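
The idea being benchmarked here, in rough terms: have the compiled function return logp and dlogp together and pass the gradient along through the tree, so the leapfrog step never recomputes it. A minimal illustrative sketch follows (the function and variable names are assumptions, not the PR's actual code):

def leapfrog(q, p, q_grad, step_size, logp_dlogp):
    # One leapfrog step that reuses the gradient already computed for q.
    # `logp_dlogp(q)` is assumed to return (logp, dlogp) from a single evaluation.
    p_half = p + 0.5 * step_size * q_grad
    q_new = q + step_size * p_half
    logp_new, q_grad_new = logp_dlogp(q_new)  # one evaluation per new point
    p_new = p_half + 0.5 * step_size * q_grad_new
    # q_grad_new is returned so the caller can store it in the tree and hand it
    # to the next leapfrog call instead of recomputing it.
    return q_new, p_new, q_grad_new, logp_new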

@ColCarroll ColCarroll added the WIP label Jan 30, 2017
@aseyboldt
Member

How did you test the speedup? It works for me if I try a large model. Fitting this (probably not particularly useful) model took 2:50 without the patch and 1:35 with it:

import pymc3
import numpy as np
import theano.tensor as tt

model = pymc3.Model()
data = np.random.randn(200, 1000)
with model:
    mu = pymc3.Normal("mu", mu=0, sd=1, shape=1000)
    mu_ = tt.dot(data, mu)
    pymc3.Normal("measure", mu=mu_, sd=1, observed=np.ones(200))

with model:
    step = pymc3.NUTS(profile=True)
    trace = pymc3.sample(1000, step=step, init=None,
                         tune=500, progressbar=True)

I had to work around the test_value mismatch for q_grad.

@ColCarroll
Member Author

Wow, thanks for testing it out so quickly! I've got a way-too-simplistic benchmark that I use: just sampling from a normal distribution 100k times. On master it is ~4.1k samples/s; on this branch, ~3.9k samples/s.

The speedup you're reporting sounds worth it. Let me refactor this a bit to make it maintainable!
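
For reference, such a micro-benchmark might look roughly like the following (the exact script is not in the thread, so the model and the sampling call are assumptions):

import time
import pymc3

with pymc3.Model():
    pymc3.Normal("x", mu=0, sd=1)
    start = time.time()
    # draw 100k samples with NUTS and report the overall sampling rate
    trace = pymc3.sample(100000, step=pymc3.NUTS(), progressbar=False)
    print("samples/s:", 100000 / (time.time() - start))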

@aseyboldt
Member

I guess that would work well for measuring the overhead for each sample, but for real world models I'm usually happy if it reports samples/s instead of s/sample. :-)

@ColCarroll ColCarroll removed the WIP label Feb 1, 2017
@ColCarroll
Member Author

Should be ready for review.

@@ -69,39 +74,38 @@ def astep(self, q0):
         else:
             step_size = np.exp(self.log_step_size_bar)

-        u = floatX(nr.uniform())
+        u = np.typeDict[theano.config.floatX](nr.uniform())
Member

Why this change?

Member Author

Hmm... I could have sworn there was a reasonable speedup from this, but I can't see it any more. I'll revert these two (there's another in buildtree).
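
For context on what the two variants do (this is an assumption about the floatX helper, not code from this PR): pymc3's floatX casts via np.asarray and so yields a 0-d array, whereas indexing np.typeDict with theano.config.floatX gives the numpy scalar type directly:

import numpy as np
import theano

u = 0.42
as_array = np.asarray(u, dtype=theano.config.floatX)   # roughly what floatX does
as_scalar = np.typeDict[theano.config.floatX](u)       # the variant tried in this diff
print(type(as_array), type(as_scalar))                 # 0-d ndarray vs. numpy scalar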

         leaf_size = int(np.log(u) + energy_change <= 0)
         is_valid_sample = (np.log(u) + energy_change < Emax)
-        return q_edge, p_edge, q_edge, leaf_size, is_valid_sample, min(1, np.exp(-energy_change)), 1
+        p_accept = min(1, np.exp(-energy_change))
+        return BinaryTree(q, p, q_grad, q, leaf_size, is_valid_sample, p_accept, 1)
Member

This use of namedtuple definitely cleans things up a bit 👍.
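
As a minimal sketch of the kind of namedtuple being referred to (field names guessed from the return statement above, not taken from the PR):

from collections import namedtuple

# Hypothetical field names; the actual definition in the PR may differ.
BinaryTree = namedtuple(
    "BinaryTree",
    ["q", "p", "q_grad", "proposal", "leaf_size",
     "is_valid_sample", "p_accept", "n_proposals"],
)

tree = BinaryTree(q=None, p=None, q_grad=None, proposal=None,
                  leaf_size=1, is_valid_sample=True, p_accept=0.8, n_proposals=1)
print(tree.p_accept)  # fields are accessed by name, which reads better than tuple indexing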

@twiecki
Member

twiecki commented Feb 2, 2017

Very nice. A bit unfortunate for the code that we have to carry the dlogp around all the time, but the speed-up is definitely worth it. Also another example of where the regression tests come in really handy.

@twiecki twiecki merged commit c014311 into pymc-devs:master Feb 2, 2017