### References

[Beginning Bayesian Statistics]

[Hamiltonian Monte Carlo in Python](https://colindcarroll.com/2019/04/11/hamiltonian-monte-carlo-from-scratch/)

[Betancourt HMC - Best introduction to HMC](https://www.youtube.com/watch?v=VnNdhsm0rJQ)

[NUTS paper](http://arxiv.org/abs/1111.4246)

[HMC Tuning by Colin Carroll](https://colcarroll.github.io/hmc_tuning_talk/)


### Building blocks

#### Proposal distribution
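An easy-to-sample distribution $q(x)$, such as a Gaussian, used to propose the next value given the current one:

$x_{i+1} | x_{i} \sim N(\mu, \sigma)$

In a random-walk proposal, the mean $\mu$ is typically the current value $x_{i}$.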

#### Foundation of Bayesian Inference

1. Obtain the data and inspect it for a high-level understanding of its distribution and any outliers
2. Define a reasonable prior for the model parameters based on (1) and your understanding of the problem
3. Choose a likelihood function for the data and evaluate the likelihood of the observed data under it
4. Obtain the posterior distribution from (2) and (3) by applying Bayes' theorem
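
The four steps above can be sketched with a simple grid approximation. The event counts, the Gamma(2, 1) prior, and the grid range below are made-up illustrations, not values from these notes; this is one possible way to carry out the steps, not the only one.

```python
import numpy as np
from math import lgamma

# Step 1: hypothetical observed event counts (illustrative data only).
data = np.array([3, 4, 2, 5, 3])

# Candidate values of the Poisson rate mu, laid out on a grid.
mu_grid = np.linspace(0.01, 15.0, 3000)

# Step 2: unnormalised log prior, assumed here to be Gamma(shape=2, rate=1).
log_prior = (2 - 1) * np.log(mu_grid) - 1.0 * mu_grid

# Step 3: Poisson log likelihood of all observations at each grid value.
log_fact = sum(lgamma(x + 1) for x in data)
log_lik = data.sum() * np.log(mu_grid) - len(data) * mu_grid - log_fact

# Step 4: Bayes' theorem, posterior ∝ likelihood × prior, then normalise
# so that the density integrates to one over the grid.
log_post = log_prior + log_lik
post = np.exp(log_post - log_post.max())  # subtract max for numerical stability
dx = mu_grid[1] - mu_grid[0]
post /= post.sum() * dx

posterior_mean = (mu_grid * post).sum() * dx
print(f"posterior mean of mu: {posterior_mean:.3f}")
```

A grid works here only because there is a single parameter; the sampling methods below are what scale to more parameters.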

### Metropolis-Hastings

We start by modeling a count of discrete events with the Poisson distribution shown below.

$f(x) = e^{-\mu} \mu^x / x!$

The mean rate is represented by $\mu$, and $x$ is a non-negative integer giving the number of events that occur. Recall from the discussion of the binomial distribution that it models the number of successes in $n$ trials. The Poisson distribution is a limiting case of the binomial distribution, obtained as $n \to \infty$ and the success probability $p \to 0$ with $np = \mu$ held fixed, so it applies when the number of trials far exceeds the number of successes.

If our observed data has a Poisson likelihood, using a Gamma prior for $\mu$ results in a Gamma posterior distribution; that is, the Gamma distribution is the conjugate prior for the Poisson rate.
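
As a small illustration of this conjugacy: with a Gamma($\alpha$, $\beta$) prior in the shape-rate parameterisation and $n$ observed counts $x_1, \dots, x_n$, the posterior is Gamma($\alpha + \sum x_i$, $\beta + n$). The function name, prior values, and counts below are hypothetical choices for the sketch.

```python
import numpy as np

def gamma_poisson_update(alpha, beta, counts):
    """Posterior (shape, rate) for a Gamma(alpha, beta) prior on a Poisson rate.

    Assumes the shape-rate parameterisation of the Gamma distribution.
    """
    counts = np.asarray(counts)
    return alpha + counts.sum(), beta + counts.size

# Hypothetical prior and data, for illustration only.
counts = [3, 4, 2, 5, 3]
alpha_post, beta_post = gamma_poisson_update(2.0, 1.0, counts)

# The mean of a Gamma(shape, rate) distribution is shape / rate.
posterior_mean = alpha_post / beta_post
print(alpha_post, beta_post, posterior_mean)
```

Having a closed-form answer like this is useful as a check on a sampler: the histogram of Metropolis draws for the same model should match this Gamma posterior.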

#### Outline 



#### Traceplot 

A traceplot shows the sampled value at each draw, i.e. the sequence of accepted values. If a proposed value was not accepted, the previous value is repeated. A flat horizontal stretch therefore indicates that several consecutive proposals were rejected, which is a sign that something is askew with the proposal distribution or the sampling process.
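
The flat stretches described above can also be measured directly rather than judged by eye. The two helper functions and the hand-made trace below are hypothetical; the sketch assumes a continuous proposal, so that a repeated value can only come from a rejection.

```python
import numpy as np

def rejection_fraction(trace):
    """Fraction of draws that repeat the previous value (rejected proposals)."""
    trace = np.asarray(trace, dtype=float)
    return float(np.mean(np.diff(trace) == 0.0))

def longest_flat_run(trace):
    """Length of the longest stretch of identical consecutive values."""
    longest = current = 1
    for prev, cur in zip(trace[:-1], trace[1:]):
        current = current + 1 if cur == prev else 1
        longest = max(longest, current)
    return longest

# Hand-made trace for illustration: 2.5 is repeated while proposals are rejected.
trace = [2.0, 2.0, 2.5, 2.5, 2.5, 2.5, 3.1, 2.9]
print(rejection_fraction(trace), longest_flat_run(trace))
```

For this trace, four of the seven transitions are repeats and the longest flat run has length four.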


#### Building the Posterior distribution

Collect the current value obtained at each step and build a frequency distribution (histogram) from these values. The histogram approximates the posterior distribution.
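
A minimal sketch of this step, assuming the draws are already available as an array. Here random Gamma draws stand in for a sampler's chain (an assumption made only to keep the example self-contained), and the bin count is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the chain of current values; a real chain would come
# from the sampler itself.
samples = rng.gamma(shape=19.0, scale=1.0 / 6.0, size=20_000)

# density=True rescales the counts so the histogram integrates to one,
# which makes it an estimate of the posterior density.
density, edges = np.histogram(samples, bins=50, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
widths = np.diff(edges)

area = float(np.sum(density * widths))            # total area under the histogram
hist_mean = float(np.sum(centers * density * widths))  # mean implied by the histogram
print(f"area = {area:.3f}, histogram mean = {hist_mean:.3f}")
```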

#### Notes about the Metropolis algorithm

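* The proposal distribution has to be symmetric, i.e. $q(x' | x) = q(x | x')$. A normal distribution centered on the current value is commonly used. Metropolis-Hastings relaxes this requirement by adding a correction term to the acceptance ratio.

* Tuning - The standard deviation of the proposal distribution is a hyperparameter, referred to as the tuning parameter. It needs to be tuned so that the acceptance probability reaches a target value: a step that is too small accepts nearly every proposal but explores slowly, while a step that is too large rejects most proposals. Commonly cited targets for random-walk Metropolis lie roughly between 0.2 and 0.5, depending on the number of dimensions.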
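
The notes above can be put together in a minimal random-walk Metropolis sketch for the Poisson-Gamma model. The data, the Gamma(2, 1) prior, the step size, and the burn-in length are illustrative assumptions, and the function names are hypothetical.

```python
import numpy as np

def log_posterior(mu, counts, alpha=2.0, beta=1.0):
    """Unnormalised log posterior: Poisson likelihood times Gamma(alpha, beta) prior."""
    if mu <= 0:
        return -np.inf  # the rate must be positive
    n, total = len(counts), sum(counts)
    return (alpha - 1 + total) * np.log(mu) - (beta + n) * mu

def metropolis(counts, n_draws=20_000, step_sd=1.0, start=1.0, seed=0):
    """Random-walk Metropolis with a symmetric Gaussian proposal."""
    rng = np.random.default_rng(seed)
    trace = np.empty(n_draws)
    current, current_lp = start, log_posterior(start, counts)
    accepted = 0
    for i in range(n_draws):
        # Symmetric proposal centred on the current value; step_sd is the tuning parameter.
        proposal = current + rng.normal(0.0, step_sd)
        proposal_lp = log_posterior(proposal, counts)
        # Accept with probability min(1, p(proposal) / p(current)), done in log space.
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
            accepted += 1
        trace[i] = current  # on rejection the old value is repeated
    return trace, accepted / n_draws

counts = [3, 4, 2, 5, 3]  # hypothetical observed counts
trace, acceptance_rate = metropolis(counts)
posterior_mean = trace[2_000:].mean()  # discard an assumed burn-in of 2000 draws
print(f"acceptance rate = {acceptance_rate:.2f}, posterior mean = {posterior_mean:.3f}")
```

Re-running with a much smaller or much larger `step_sd` shows the trade-off described in the tuning note: the acceptance rate moves toward 1 or toward 0, and the trace mixes more slowly in both cases.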


### Hamiltonian Monte Carlo (also called Hybrid Monte Carlo)

HMC is based on the solution of differential equations known as Hamilton's equations. These differential equations depend on the probability distribution we are trying to learn. We navigate the distribution by moving along trajectories, using steps that are defined by a position and a momentum at that position. Simulating these trajectories can be very expensive, and the goal is to minimize this computational cost.

HMC relies on the notion of conservation of energy. When the sampler trajectory is far from the center of the probability mass, it has high potential energy but low kinetic energy; when it is closer to the center of the probability mass, it has high kinetic energy but low potential energy.
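
The energy trade-off can be seen in a minimal HMC sketch. It assumes a one-dimensional standard normal target, a leapfrog integrator for Hamilton's equations, unit mass, and arbitrary step size and trajectory length; all function names are hypothetical and this is not a tuned implementation.

```python
import numpy as np

def potential(q):
    """Potential energy U(q) = -log p(q) for a standard normal target (up to a constant)."""
    return 0.5 * q**2

def grad_potential(q):
    return q

def hamiltonian(q, p):
    """Total energy: potential plus kinetic energy (unit mass assumed)."""
    return potential(q) + 0.5 * p**2

def leapfrog(q, p, step_size, n_steps):
    """Approximately solve dq/dt = p, dp/dt = -dU/dq with the leapfrog scheme."""
    p = p - 0.5 * step_size * grad_potential(q)   # initial half step for momentum
    for _ in range(n_steps - 1):
        q = q + step_size * p                      # full step for position
        p = p - step_size * grad_potential(q)      # full step for momentum
    q = q + step_size * p
    p = p - 0.5 * step_size * grad_potential(q)   # final half step for momentum
    return q, p

def hmc(n_draws=5_000, step_size=0.1, n_steps=16, seed=0):
    rng = np.random.default_rng(seed)
    q = 0.0
    draws = np.empty(n_draws)
    for i in range(n_draws):
        p = rng.normal()                           # resample momentum each iteration
        q_new, p_new = leapfrog(q, p, step_size, n_steps)
        # Metropolis correction for the integrator's small energy error.
        if np.log(rng.uniform()) < hamiltonian(q, p) - hamiltonian(q_new, p_new):
            q = q_new
        draws[i] = q
    return draws

# Start far from the centre with no momentum: all energy is potential.
q0, p0 = 3.0, 0.0
q1, p1 = leapfrog(q0, p0, step_size=0.1, n_steps=16)
energy_drift = abs(hamiltonian(q1, p1) - hamiltonian(q0, p0))
print(f"potential: {potential(q0):.2f} -> {potential(q1):.3f}")
print(f"kinetic:   {0.5 * p0**2:.2f} -> {0.5 * p1**2:.3f}")
print(f"energy drift: {energy_drift:.4f}")

draws = hmc()
```

The trajectory starts with all of its energy as potential energy and, by the time it reaches the center of the distribution, has converted almost all of it into kinetic energy, while the total stays nearly constant. The accept/reject step only has to correct for the small drift introduced by the discrete steps.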