# HMCTuning

Author: <b>Ignacio Peis</b>

*Note*: All the gifs can be generated with the scripts in `../examples/`.


This notebook includes some illustrative examples that help explain why tuning the hyperparameters with HMCTuning is effective. Although some background in Hamiltonian Monte Carlo (HMC) might be helpful, everything you need to know is explained in the Introduction section.


## Introduction


When sampling with HMC, you have to set some hyperparameters:

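* Step sizes $\boldsymbol{\epsilon}$: a matrix with dimensions $(T, D)$, where $T$ is the number of states in each chain and $D$ the dimension of the space, so that a different step size can be learned for each state of the chain and each dimension.
* Momentum variances $M$: also a matrix with dimensions $(T, D)$, i.e. one variance per state and dimension.
* An inflation/scale parameter $\mathbf{s}$ applied to the initial proposal, which can be a scalar or a vector with $D$ entries so that a different inflation is applied to each dimension.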
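To make the role of each hyperparameter concrete, here is a minimal pure-Python sketch of a single HMC chain with $T$ states: state $t$ uses its own row of step sizes `eps[t]` and momentum variances `m[t]`, and the zero-centered initial proposal is inflated per dimension by `s`. It assumes a standard leapfrog integrator with a Metropolis-Hastings correction at every state, a placeholder standard Gaussian target, and hypothetical function names (`hmc_chain`, `leapfrog`, `hamiltonian`). It is not the HMCTuning implementation, and it only samples with fixed hyperparameters rather than tuning them.

```python
import math
import random


def log_target(x):
    """Placeholder log-density (standard 2-D Gaussian), not one of the densities in the gifs."""
    return -0.5 * sum(xi * xi for xi in x)


def grad_log_target(x):
    """Gradient of log_target with respect to x."""
    return [-xi for xi in x]


def leapfrog(x, p, eps, m, n_steps, grad_log_p):
    """Leapfrog integrator with a per-dimension step size eps and diagonal momentum variance m."""
    x, p = list(x), list(p)
    g = grad_log_p(x)
    for _ in range(n_steps):
        p = [pi + 0.5 * ei * gi for pi, ei, gi in zip(p, eps, g)]        # half step in momentum
        x = [xi + ei * pi / mi for xi, ei, pi, mi in zip(x, eps, p, m)]  # full step in position
        g = grad_log_p(x)
        p = [pi + 0.5 * ei * gi for pi, ei, gi in zip(p, eps, g)]        # second half step in momentum
    return x, p


def hamiltonian(x, p, m, log_p):
    """Potential energy -log p(x) plus Gaussian kinetic energy with diagonal variance m."""
    return -log_p(x) + 0.5 * sum(pi * pi / mi for pi, mi in zip(p, m))


def hmc_chain(eps, m, s, sigma0, log_p, grad_log_p, n_leapfrog=5, rng=random):
    """Run one chain with T = len(eps) HMC states (hypothetical helper).

    eps, m : T x D nested lists, step sizes and momentum variances per state and dimension
    s      : length-D inflation applied to the zero-centered initial proposal
    sigma0 : length-D standard deviations of the un-inflated initial proposal
    Returns the states x^(0), ..., x^(T).
    """
    # Initial proposal: zero-centered Gaussian, inflated per dimension by s
    x = [sd * sc * rng.gauss(0.0, 1.0) for sd, sc in zip(sigma0, s)]
    states = [x]
    for eps_t, m_t in zip(eps, m):
        # Fresh momentum for this state, drawn from N(0, diag(m_t))
        p = [rng.gauss(0.0, math.sqrt(mi)) for mi in m_t]
        x_new, p_new = leapfrog(x, p, eps_t, m_t, n_leapfrog, grad_log_p)
        # Metropolis-Hastings correction: accept with probability min(1, exp(H_old - H_new))
        log_ratio = hamiltonian(x, p, m_t, log_p) - hamiltonian(x_new, p_new, m_t, log_p)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = x_new
        states.append(x)
    return states


# Example: T = 5 states in D = 2 dimensions (made-up hyperparameter values)
T, D = 5, 2
eps = [[0.8, 0.8]] + [[0.1, 0.1]] * (T - 1)  # one large first step, then small refinement steps
m = [[1.0, 1.0]] * T
states = hmc_chain(eps, m, [1.0, 1.0], [1.0, 1.0], log_target, grad_log_target, rng=random.Random(0))
print(states[-1])  # the final sample x^(T)
```

The decreasing rows of `eps` imitate the pattern seen in the Gaussian-mixture example (a large first step followed by smaller refinements), but the values are illustrative only.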


You can observe in these gifs how the initial proposal affects the sampling procedure. In the examples, the proposal is changed cyclically to demonstrate visually that, when the proposal is tight, the chains get stuck in local high-density regions or modes and therefore do not explore the whole density, as we would want.
<p float="center">
    <img src="../figs/cycle/gaussian_mixture/samples.gif" width="300" align="center">     &emsp;
    <img src="../figs/cycle/wave/samples.gif" width="300" align="center">
</p>




For the first distribution (a mixture of Gaussians), a zero-centered Gaussian is a suitable initial proposal: the chains reach all the modes. Note that the first state update is a large step, after which smaller steps refine the final sample $x^{(T)}$ ($T=5$ in this example).
<p float="center">
    <img src="../figs/chains/gaussian_mixture/samples.gif" width="300" align="center">
</p>

For the second density (named "wave"), choosing the same centered, low-variance proposal makes the chains get stuck in a small region:
<p float="center">
    <img src="../figs/chains/wave/samples_stuck.gif" width="300" align="center">
</p>

But if we define a wider horizontal variance (for instance, variances [0.1, 20.0]), the proposal covers the wave density much better.
<p float="center">
    <img src="../figs/chains/wave/samples_wide.gif" width="300" align="center">
</p>
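Inflating the proposal per dimension is a simple rescaling: if $\mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathrm{diag}(\boldsymbol{\sigma}^2))$ and $\mathbf{x}^{(0)} = \mathbf{s} \odot \mathbf{z}$, then dimension $d$ has variance $s_d^2 \sigma_d^2$. The sketch that follows assumes a hypothetical tight base proposal with variance 0.1 in both dimensions, reads [0.1, 20.0] as per-dimension variances, computes the scale `s` that maps one onto the other, and checks it empirically. The multiplicative parameterization and the helper names (`sample_initial_proposal`, `inflation_for`) are assumptions for illustration; HMCTuning's own parameterization of $\mathbf{s}$ may differ.

```python
import math
import random


def sample_initial_proposal(base_var, s, rng):
    """Draw x^(0) = s * z element-wise, with z ~ N(0, diag(base_var)) (hypothetical helper)."""
    return [sc * rng.gauss(0.0, math.sqrt(v)) for sc, v in zip(s, base_var)]


def inflation_for(base_var, target_var):
    """Per-dimension scale mapping base variances to target variances, since Var[s z] = s^2 Var[z]."""
    return [math.sqrt(t / b) for b, t in zip(base_var, target_var)]


base_var = [0.1, 0.1]     # hypothetical tight, zero-centered proposal
target_var = [0.1, 20.0]  # the wider variances quoted in the text
s = inflation_for(base_var, target_var)  # first entry 1.0, second entry sqrt(200)

rng = random.Random(0)
samples = [sample_initial_proposal(base_var, s, rng) for _ in range(20000)]
# The proposal is zero-centered, so the mean of squares estimates the variance of each dimension
emp_var = [sum(x[d] ** 2 for x in samples) / len(samples) for d in range(2)]
print(s, emp_var)
```

Only one entry of `s` departs from 1, which matches the idea that a single direction has to be inflated for the proposal to cover the wave density.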



Several questions come up here: 
1. Could we automatically tune that initial proposal? 
2. Rather than considering a scalar step size, could we automatically tune the step sizes applied in each dimension, for each state of the chain? 
3. Rather than considering a single momentum distribution, could we tune the variances per dimension and state to find a better distribution?

The answer to all three is yes. Let's fit the hyperparameters of the previous HMC.


## Training the HMC
The following figure shows the optimization of the hyperparameters for the wave density. Starting from a tight initial proposal, only the scale factor applied to the horizontal dimension increases, automatically inflating the proposal along that direction.

<p float="center">
    <img src="../figs/training/wave/samples.gif" width="600" align="center">
</p>

In the following figure, you can observe that for the Gaussian mixture, starting from the zero-centered proposal, the step sizes of the last steps of the chains shrink during training, while the step size of the first step converges to the largest value.

## How does it work?

Let's go through the maths. You can find more details in [our paper](https://arxiv.org/pdf/2202.04599.pdf), but here is a short recap. The HMC objective has the form:
$$
\mathcal{L}(\mathbf{x})
$$

