I left a comment on Gelman's blog about this, but I realized this is the best place to keep track of feature requests. Here's a copy of that request:
Speaking of mixing, Geyer’s parallel tempering seems like a pretty useful technique. Any chance of including it in Stan, so that Stan runs NUTS on 2 or 3 additional tempered posterior densities, with the tempering coefficients given as an array or something? If you attempt the switching with lowish probability, the chains should mostly be able to run in parallel threads on multiple cores without a lot of synchronization overhead, and this could be a fantastic way to improve exploration of the space at basically no wall-clock-time cost.
Specifically, I imagine something like the following method:
You set up your temperatures, maybe something like
stan_parallel_temps <- [1.25,1.5,3]; ## you always need 1 to be first, so perhaps it's better to just declare the additional temps and let Stan put 1 at the front of this list on its own.
During warmup, Stan internally estimates the average length between U-turns in the untempered distribution. It then draws a random number N, exponentially distributed (or maybe gamma distributed, or with a distribution the user specifies in the model file), with mean equal to some constant times this average inter-U-turn length, rounded up to the nearest integer.
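To make the draw of N concrete, here is a minimal Python sketch of the exponential case. The function name, the `multiplier` argument (the "some constant" above) and its default of 3 are all made up for illustration; nothing here is an existing Stan option.

```python
import math
import random


def draw_steps_between_exchanges(mean_uturn_length, multiplier=3.0, rng=random):
    """Draw N, the number of steps to run before the next exchange attempt.

    mean_uturn_length: warmup estimate of the average length between U-turns.
    multiplier: hypothetical tuning constant (the "some constant" in the text).
    """
    mean = multiplier * mean_uturn_length
    # random.expovariate takes the rate, i.e. 1 / mean.
    draw = rng.expovariate(1.0 / mean)
    # Round up to an integer and always run at least one step.
    return max(1, math.ceil(draw))
```

Rounding up pushes the realized mean slightly above `multiplier * mean_uturn_length`, which probably doesn't matter for this purpose.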
All the chains then run N steps and synchronize on a thread semaphore, so that once all the higher-temperature threads are done with their N steps the temperature-1 thread can proceed. It proceeds by choosing a random adjacent pair of temperatures and attempting an exchange between their states, then generating a new N and setting all the threads back to work for another N HMC timesteps.
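For the exchange itself, something like the following Python sketch is what I have in mind. It assumes a temperature T means the chain targets the posterior density raised to the power 1/T (so T = 1 is the untempered posterior), and it uses the usual Metropolis swap rule for parallel tempering. The function and argument names are hypothetical.

```python
import math
import random


def attempt_exchange(states, log_densities, temps, rng=random):
    """Try to swap the states of one randomly chosen adjacent pair of temperatures.

    states[k] is the current draw of the chain at temps[k], and
    log_densities[k] is the untempered log posterior at that draw.
    Both lists are modified in place. Returns (pair index, accepted).
    """
    i = rng.randrange(len(temps) - 1)  # pick the adjacent pair (i, i + 1)
    j = i + 1
    # Log acceptance ratio, assuming chain k targets posterior ** (1 / temps[k]).
    log_ratio = (1.0 / temps[i] - 1.0 / temps[j]) * (
        log_densities[j] - log_densities[i]
    )
    accepted = log_ratio >= 0 or rng.random() < math.exp(log_ratio)
    if accepted:
        states[i], states[j] = states[j], states[i]
        log_densities[i], log_densities[j] = log_densities[j], log_densities[i]
    return i, accepted
```

Note that only the untempered log density at each state is needed, so an exchange attempt costs no extra gradient evaluations.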
In this scheme you usually complete several HMC trajectories before trying to exchange, so you don't disrupt the NUTS sampler too much. Meanwhile, several tempered distributions run in parallel and feed your untempered simulation new regions of the space, so you may be able to explore it more readily. And as I said, with several cores your wall-clock overhead is due only to the synchronization, which shouldn't be too bad.
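Here is a toy Python sketch of the whole loop, just to show the synchronization pattern. It makes several simplifying assumptions: a random-walk Metropolis step stands in for NUTS, the target is a made-up bimodal density, a `threading.Barrier` replaces the semaphore, and the exchange runs in whichever thread the barrier picks rather than specifically the temperature-1 thread. Python threads also won't give real multicore speedup here, so this only illustrates the control flow.

```python
import math
import random
import threading


def log_density(x):
    # Toy bimodal target: unnormalized mixture of unit normals at -4 and +4.
    a, b = -0.5 * (x + 4.0) ** 2, -0.5 * (x - 4.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def run_parallel_tempering(temps, rounds, mean_steps, seed=0):
    rng = random.Random(seed)  # used only inside the barrier action
    states = [0.0] * len(temps)
    shared = {"n_steps": 1, "swaps": 0}
    samples = []  # temperature-1 state, recorded once per round

    def exchange_and_redraw():
        # Runs in exactly one thread, after every chain has finished its N steps
        # and before any chain is released for the next round.
        samples.append(states[0])
        i = rng.randrange(len(temps) - 1)
        j = i + 1
        log_ratio = (1.0 / temps[i] - 1.0 / temps[j]) * (
            log_density(states[j]) - log_density(states[i])
        )
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            states[i], states[j] = states[j], states[i]
            shared["swaps"] += 1
        # Draw the next N (exponential, rounded up, at least 1).
        shared["n_steps"] = max(1, math.ceil(rng.expovariate(1.0 / mean_steps)))

    barrier = threading.Barrier(len(temps), action=exchange_and_redraw)

    def worker(k):
        local_rng = random.Random(seed + 1 + k)
        for _ in range(rounds):
            x = states[k]
            for _ in range(shared["n_steps"]):
                # Random-walk Metropolis step on the tempered density
                # (a stand-in for NUTS, not what Stan would actually do).
                prop = x + local_rng.gauss(0.0, 1.0)
                log_accept = (log_density(prop) - log_density(x)) / temps[k]
                if log_accept >= 0 or local_rng.random() < math.exp(log_accept):
                    x = prop
            states[k] = x
            barrier.wait()  # synchronize; one thread performs the exchange

    threads = [threading.Thread(target=worker, args=(k,)) for k in range(len(temps))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return samples, shared["swaps"]
```

For example, `run_parallel_tempering([1.0, 1.25, 1.5, 3.0], rounds=200, mean_steps=20)` returns the temperature-1 state at each exchange point along with the number of accepted swaps. The only shared-state traffic is at the barrier, which is the point of attempting exchanges rarely.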