
Performance of hamiltonian slice sampling compared to random walk #146

Closed
fbartolic opened this issue Aug 22, 2019 · 3 comments


@fbartolic

commented Aug 22, 2019

Hi Josh, thanks for releasing this fantastic package!

I'm using dynesty to fit a model defined in PyMC3 and comparing its performance to HMC. I find that Dynamic Nested Sampling with the rwalk sampler works pretty well on an 8-dimensional model with a multi-modal posterior. However, when I pass the gradients to dynesty and use the hslice sampler, the model doesn't really converge, and the sampling efficiency drops from ~3% with rwalk to ~0.3% with hslice.

Is this behaviour expected? Naively, I'd expect that using the gradient information would improve sampling performance rather than degrade it. The exact same model converges easily with NUTS in PyMC3 (albeit to only one of the modes).

Sorry I can't provide a minimal example; the code is a bit hacky with too many dependencies.

@joshspeagle

Owner

commented Aug 23, 2019

This seems about right to me. Nested Sampling has a really hard time using gradient information: by construction, it can't use the curvature of the space to plan subsequent proposals (which is what makes HMC so amazing), since that would break the "uniform sampling" condition. Proposals therefore essentially involve moving in a straight line for a while before "bouncing" off the edge of the constrained region (the only step that uses the gradient). This is incredibly inefficient, especially since the bounce point needs to be close to the edge to minimize terrible proposals, which requires quite small timesteps -- I think the default in dynesty is something like 20/1 moves/bounce.
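Schematically, the proposal loop looks something like this -- a toy numpy sketch of the bounce idea on a 2-D Gaussian, not dynesty's actual hslice implementation:

```python
import numpy as np

def loglike(x):
    return -0.5 * np.sum(x**2)

def grad_loglike(x):
    return -x

rng = np.random.default_rng(0)
logl_worst = loglike(np.array([2.0, 0.0]))  # constraint: logL > logl_worst

x = np.array([0.5, 0.5])            # live point inside the constraint
v = rng.standard_normal(2)
v /= np.linalg.norm(v)              # random initial direction
step = 0.1
ncall = 0                           # every trial step costs a likelihood call

for _ in range(200):
    x_new = x + step * v
    ncall += 1
    if loglike(x_new) > logl_worst:
        x = x_new                   # still inside: keep moving straight
    else:
        # "bounce": reflect the velocity off the gradient at the boundary
        n = grad_loglike(x_new)
        n = n / np.linalg.norm(n)
        v = v - 2.0 * np.dot(v, n) * n
```

The point is that the gradient only enters at the reflection step; every straight-line move in between still burns a likelihood call, which is where the constant-factor cost comes from.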

While the scaling with dimensionality is much better even with this really dumb scheme, the constant pre-factor out front from these inefficient proposals essentially kills you on most reasonably-sized problems.

If convergence means "it runs too slowly", then that sounds like the expected behavior. If convergence means "even letting it run for a similar number of iterations as rwalk I find weird behavior", that's more concerning. I didn't vet my implementation nearly as thoroughly as the other approaches, since it was by construction such an edge case, so if it is having real performance issues I can look into officially deprecating it.

@fbartolic

Author

commented Aug 23, 2019

Thanks for clarifying! I'm still trying to wrap my head around the differences between Nested Sampling and HMC.

I mean it runs too slowly. Here's the output of both runs:

'rwalk':

```
iter: 37635 | batch: 8 | bound: 479 | nc: 25 | ncall: 1115285 | eff(%): 3.374 | loglstar: 1028.106 < 1036.613 < 1035.031 | logz: 1000.262 +/- 0.354 | stop: 0.877
```

'hslice':

```
iter: 31883 | batch: 6 | bound: 311 | nc: 685 | ncall: 19756374 | eff(%): 0.161 | loglstar: 1028.503 < 1037.027 < 1035.168 | logz: 999.567 +/- 0.358 | stop: 0.970
```

~1M likelihood calls for rwalk vs ~20M for hslice.
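(Side note: the eff(%) column looks like it's just iterations divided by likelihood calls -- at least, that's consistent with the numbers above:)

```python
# eff(%) appears to be 100 * iter / ncall (my reading of the output,
# not something I checked against the dynesty source):
rwalk_eff  = 100 * 37635 / 1115285      # matches the reported 3.374
hslice_eff = 100 * 31883 / 19756374     # matches the reported 0.161
```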

@joshspeagle

Owner

commented Aug 23, 2019

> I'm still trying to wrap my head around the differences between Nested Sampling and HMC.

The fundamental difference for sampling is what the gradient can be used for. HMC, because it just needs to satisfy detailed balance (as an MCMC algorithm), uses the gradient to propose new positions along a given smooth trajectory. This allows it to take advantage of the curvature of the space. Nested Sampling, however, samples under a stronger constraint: generating samples uniformly with L > L_worst. Any attempt to use the gradient to propose positions based on the shape of the posterior fundamentally violates this assumption, making it hard to incorporate. I'm sure there's a clever implementation out there that somehow gets around this problem, but it makes gradients only useful for "bouncing" within this uniform hyper-volume.
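To make the contrast concrete, here's the brute-force version of that constraint -- a toy sketch of rejection-sampling uniformly from the prior until you land inside {logL > logL*} (roughly the idea behind uniform proposals, not dynesty's actual code):

```python
import numpy as np

def loglike(x):
    return -0.5 * np.sum(x**2)

rng = np.random.default_rng(1)
logl_star = loglike(np.array([0.5, 0.0]))   # a tight late-stage threshold

ncall = 0
while True:
    x = rng.uniform(-5, 5, size=2)          # uniform draw over the prior box
    ncall += 1
    if loglike(x) > logl_star:
        break                               # x is uniform within the constraint

# ncall blows up as the constrained region shrinks -- the cost that smarter
# proposals (rwalk, slice, hslice) try to beat without biasing uniformity.
```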

> I mean it runs too slowly.

Ah, okay. Please let me know if it appears to give answers that are wildly off. Otherwise, I would recommend avoiding the method unless you're in >40 dimensions. ;)

@fbartolic fbartolic closed this Aug 24, 2019
