Performance of hamiltonian slice sampling compared to random walk #146
Hi Josh, thanks for releasing this fantastic package!
I'm using dynesty to fit a model defined in PyMC3 and comparing the performance to HMC. I find that Dynamic Nested Sampling with
Is this behaviour expected? Naively I'd expect that using the information about the gradients would improve the sampling performance rather than the other way around. The exact same model converges easily using NUTS in PyMC3 (albeit to only one of the modes).
Sorry I can't provide a minimal example; the code is a bit hacky with too many dependencies.
This seems about right to me. Nested Sampling has a really hard time using gradient information because, by construction, it can't use the curvature of the space to plan subsequent proposals (which is what makes HMC so amazing), since that would break the "uniform sampling" condition. This means proposals essentially involve moving in a straight line for a while before "bouncing" off the edge (the only step that uses the gradient). This is incredibly inefficient, especially since you need the bounce point to be close to the edge to minimize terrible proposals, which requires quite small timesteps -- I think the default in
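To make the "bounce" concrete: when a straight-line trajectory crosses the likelihood boundary, the gradient at the crossing point acts as a surface normal and the direction is reflected specularly, d' = d - 2(d·n)n. This is a minimal sketch of that single reflection step (function names and the toy Gaussian likelihood are illustrative, not dynesty's actual implementation):

```python
import numpy as np

def reflect(direction, grad):
    """Reflect a trajectory direction off a likelihood contour.

    The gradient of the log-likelihood at the crossing point serves as
    the surface normal; the update is specular reflection:
    d' = d - 2 (d . n) n, with n the unit normal.
    """
    n = grad / np.linalg.norm(grad)
    return direction - 2.0 * np.dot(direction, n) * n

# Example: bouncing off a spherical contour of a standard Gaussian
# log-likelihood, whose gradient at x is simply -x.
x = np.array([1.0, 0.0])                # point on the contour
d = np.array([1.0, 1.0]) / np.sqrt(2)   # outgoing unit direction
d_new = reflect(d, -x)                  # x-component of d flips sign
```

Note that this is the *only* place the gradient enters; between bounces the trajectory is just a straight line, which is why the per-proposal cost is so high compared to HMC's curved trajectories.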
While the scaling with dimensionality is much better even with this really dumb scheme, the constant pre-factor related to these inefficient proposals essentially kills you when running on most reasonably-sized problems.
If convergence means "it runs too slowly", then that sounds like the expected behavior. If convergence means "even letting it run for a similar number of iterations as
Thanks for clarifying! I'm still trying to wrap my head around the differences between Nested Sampling and HMC.
I mean it runs too slowly. Here's the output of both runs:
~1M likelihood calls for
The fundamental difference for sampling is what the gradient can be used for. HMC, because it just needs to satisfy detailed balance (as an MCMC algorithm), uses the gradient to propose new positions along a given smooth trajectory. This allows it to take advantage of the curvature of the space. Nested Sampling, however, samples with a stronger constraint: generating samples uniformly with L > L_worst. Any attempt to use the gradient to propose positions using the shape of the posterior fundamentally violates this assumption, making it hard to incorporate. I'm sure there's a clever implementation out there that somehow gets around this problem, but it makes gradients only useful for "bouncing" within this uniform hyper-volume.
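The constraint above can be illustrated with a deliberately naive rejection scheme: draw uniformly from the prior volume and keep only points inside the current likelihood contour. (All names here are hypothetical and the Gaussian likelihood is a toy; real nested samplers replace the blind rejection loop with smarter uniform proposals, which is exactly where the gradient is hard to exploit.)

```python
import numpy as np

rng = np.random.default_rng(0)

def loglike(x):
    # Toy standard-Gaussian log-likelihood (illustrative only).
    return -0.5 * np.sum(x**2)

def sample_within_contour(logl_worst, ndim=2, lo=-5.0, hi=5.0):
    """Draw a point uniformly from a box prior, accepted only if it
    lies inside the current likelihood contour (L > L_worst).

    Gradient-guided proposals that concentrate draws near the posterior
    mode would break the uniformity of the accepted samples, which is
    the property nested sampling's evidence estimate relies on.
    """
    while True:
        x = rng.uniform(lo, hi, size=ndim)
        if loglike(x) > logl_worst:
            return x

pt = sample_within_contour(logl_worst=-2.0)
```

Every accepted point satisfies the hard constraint, but nothing about the accepted distribution is allowed to depend on *where* inside the contour the likelihood is highest — which is precisely the information the gradient carries.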
Ah, okay. Please let me know if it appears to give answers that are wildly off. Otherwise, I would recommend avoiding the method unless you're in >40 dimensions. ;)