# Introduction to Markov Chain Monte Carlo

## Takeaways and objectives from this notebook
1. Basic ideas behind MCMC and convergence conditions in a (simplified) discrete universe.
2. The detailed balance condition.

## Markov Chain Monte Carlo

Markov Chain Monte Carlo (MCMC) is a method of generating **dependent** samples, where each sample is drawn conditionally on the previous one, in contrast to the **independent** samples generated by ordinary Monte Carlo.

There are a few intuitive ways we can think about MCMC:
- as a method "learning" from past samples and moving toward better samples more often than not, or preferentially staying in high-density regions of the space
- as a method of rejection sampling where our proposal distribution is adaptive (typically centered on the last sample) instead of fixed as in Monte Carlo rejection sampling
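
To make the two intuitions above concrete, here is a minimal sketch of a random-walk Metropolis sampler, one common MCMC algorithm and not necessarily the one developed later in this notebook. The function names, the standard normal target, and the step size are illustrative assumptions. The proposal is centered on the last sample, and the resulting chain is visibly correlated, unlike independent draws.

```python
import numpy as np

def random_walk_metropolis(log_density, x0, n_samples, step=1.0, seed=0):
    """Draw dependent samples from exp(log_density) (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    samples = np.empty(n_samples)
    x = x0
    log_p = log_density(x)
    for i in range(n_samples):
        # Proposal distribution is centered on the last sample.
        proposal = x + step * rng.normal()
        log_p_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        if np.log(rng.uniform()) < log_p_new - log_p:
            x, log_p = proposal, log_p_new
        # On rejection the chain stays put and repeats the old sample.
        samples[i] = x
    return samples

def log_std_normal(x):
    # Log-density of a standard normal, up to an additive constant.
    return -0.5 * x**2

def lag1_autocorr(v):
    # Correlation between consecutive samples.
    v = v - v.mean()
    return float(np.dot(v[:-1], v[1:]) / np.dot(v, v))

chain = random_walk_metropolis(log_std_normal, x0=0.0, n_samples=20000)
iid = np.random.default_rng(1).normal(size=20000)

print("MCMC lag-1 autocorrelation:", lag1_autocorr(chain))
print("i.i.d. lag-1 autocorrelation:", lag1_autocorr(iid))
```

The MCMC chain should show a clearly positive lag-1 autocorrelation, while the independent draws should show one close to zero, even though both target the same distribution.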

For a more mathematical overview of MCMC theory mixed with practical tips, consult [Geyer](http://www.mcmchandbook.net/HandbookChapter1.pdf) [1].  For another, rather refreshing take that emphasizes graph theory, read the blog of [Jeremy Kun](https://jeremykun.com/2015/04/06/markov-chain-monte-carlo-without-all-the-bullshit/) [2].  In this notebook, we will follow the accessible introduction for machine learners by [Andrieu et al.](http://www.cs.ubc.ca/~arnaud/andrieu_defreitas_doucet_jordan_intromontecarlomachinelearning.pdf) [3].

### A note about MCMC
If you remember the theory of Markov chains from your studies, chances are that your university courses addressed a different problem than the one we encounter here.

A typical question about Markov chains would be: given a transition probability matrix $P$, does there exist a stationary distribution (independent of a starting position) of the Markov Chain? If so, what is it?
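
As an illustration of this "textbook" direction, the sketch below starts from a small transition matrix and recovers its stationary distribution. The 3-state matrix `P` and the helper name `stationary_distribution` are made up for the example. The stationary distribution is the left eigenvector of $P$ with eigenvalue 1, normalized to sum to one.

```python
import numpy as np

# Hypothetical 3-state transition matrix; each row sums to 1.
P = np.array([
    [0.9, 0.1, 0.0],
    [0.2, 0.7, 0.1],
    [0.0, 0.3, 0.7],
])

def stationary_distribution(P):
    """Return pi with pi @ P == pi, assuming it exists and is unique."""
    # Left eigenvectors of P are right eigenvectors of P.T.
    eigvals, eigvecs = np.linalg.eig(P.T)
    k = np.argmin(np.abs(eigvals - 1.0))
    v = np.real(eigvecs[:, k])
    return v / v.sum()

pi = stationary_distribution(P)
print(pi)  # approximately [0.6, 0.3, 0.1]

# Independence from the starting position: two different starting
# distributions end up in the same place after many steps.
P_many = np.linalg.matrix_power(P, 500)
start_a = np.array([1.0, 0.0, 0.0])
start_b = np.array([0.0, 0.0, 1.0])
print(start_a @ P_many, start_b @ P_many)
```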

The entire theory of MCMC approaches the problem from the **opposite direction**.  We know what we want the stationary distribution of the Markov Chain to be: it should be our posterior distribution from whatever model we just built! The objective of MCMC theory is then to **design an algorithm (called a sampler)** that is associated with a transition probability matrix such that its stationary distribution is the model posterior.
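
The opposite direction can be sketched in the same discrete setting: fix a target distribution first, then construct a transition matrix whose stationary distribution is that target. The example below uses a Metropolis-style rule on states arranged in a ring. The function name, the ring proposal, and the 4-state target are illustrative assumptions, and this is only one of many possible samplers.

```python
import numpy as np

def metropolis_transition_matrix(target):
    """Transition matrix of a Metropolis sampler on a ring of states.

    Proposal (an assumption of this sketch): step to the left or right
    neighbor with probability 1/2 each.
    """
    n = len(target)
    P = np.zeros((n, n))
    for i in range(n):
        stay = 1.0
        for j in ((i - 1) % n, (i + 1) % n):
            # Accept the move with probability min(1, target[j] / target[i]).
            move = 0.5 * min(1.0, target[j] / target[i])
            P[i, j] += move
            stay -= move
        # Rejected proposals keep the chain in its current state.
        P[i, i] += stay
    return P

# Stand-in for a model posterior over 4 discrete states.
target = np.array([0.1, 0.2, 0.3, 0.4])
P = metropolis_transition_matrix(target)

# The target is stationary under P ...
print(np.allclose(target @ P, target))

# ... because detailed balance holds: target[i] * P[i, j] == target[j] * P[j, i].
flow = target[:, None] * P
print(np.allclose(flow, flow.T))

# Every row of a high power of P approaches the target.
print(np.linalg.matrix_power(P, 2000)[0])
```

Note that the construction only uses ratios `target[j] / target[i]`, so the target needs to be known only up to a normalizing constant.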

## References

1. Geyer, C. [Chapter 1: Introduction to Markov Chain Monte Carlo](http://www.mcmchandbook.net/HandbookChapter1.pdf) in Handbook of Markov Chain Monte Carlo, Chapman and Hall/CRC, 2011.
2. Kun, J. [Markov Chain Monte Carlo without all the bullshit](https://jeremykun.com/2015/04/06/markov-chain-monte-carlo-without-all-the-bullshit/), 2015.
3. Andrieu, C., de Freitas, N., Doucet, A., and Jordan, M. I. [An introduction to MCMC for machine learning](http://www.cs.ubc.ca/~arnaud/andrieu_defreitas_doucet_jordan_intromontecarlomachinelearning.pdf), Machine Learning 50, 2003.