# Markov Chain Monte Carlo

Generating samples of a random vector $\mathbf{X}$ is not always easy, especially when its components are dependent or when its density function is hard to integrate.

If we can design a Markov chain whose stationary distribution is the distribution of $\mathbf{X}$, then we can generate samples of $\mathbf{X}$ by running the chain.

## Metropolis–Hastings Algorithm

**Video lecture: https://youtu.be/-o8drmhudjs**

The Metropolis–Hastings algorithm can generate samples from any probability distribution $\pi(x)$, provided that $\pi(x) \propto f(x)$ for a known function $f(x)$ whose values can be calculated. It is useful when integrating $f(x)$ (finding the normalization factor) is computationally difficult.

The Metropolis–Hastings algorithm constructs a Markov chain that asymptotically reaches a unique stationary distribution equal to $\pi$.

To guarantee that a stationary distribution exists, the Metropolis–Hastings algorithm constructs the Markov chain $p(y|x)$ (the transition probability from $x$ to $y$) so that it is time-reversible with respect to $\pi$, i.e.

$$
\begin{equation}
\pi(x)p(y|x) = \pi(y)p(x|y) \tag{1}
\end{equation}
$$ 
Integrating both sides of (1) over $y$ gives $\int{\pi(x)p(y|x)\,dy} = \pi(x) = \int{\pi(y)p(x|y)\,dy}$, which shows that $\pi$ is a stationary distribution.

To guarantee the uniqueness of the stationary distribution, the Markov chain must also be ergodic (irreducible, aperiodic and positive recurrent).


Our goal is to construct a Markov chain $p(\cdot|\cdot)$ such that (1) holds. The idea of the Metropolis–Hastings algorithm is to start from another known Markov chain $q(\cdot|\cdot)$ that we know how to sample from, and to accept each proposed move with probability $\alpha(\cdot|\cdot)$, chosen so that the resulting chain asymptotically reaches the stationary distribution $\pi$. For $y \ne x$, the probability of this two-step procedure (propose, then accept) defines $p(y|x)$:

$$
p(y|x) = q(y|x)\alpha(y|x)
$$

Substituting this into (1) gives:

$$
\begin{equation}
\pi(x)q(y|x)\alpha(y|x) = \pi(y)q(x|y)\alpha(x|y)
\end{equation}
$$

$$
\begin{align}
\alpha(y|x) = \alpha(x|y)\frac{\pi(y)q(x|y)}{\pi(x)q(y|x)} = \alpha(x|y)\frac{f(y)q(x|y)}{f(x)q(y|x)}
\end{align}
$$

If we set $\alpha(y|x)$ to:

$$
\begin{align}
\alpha(y|x) = \min\Big( \frac{f(y)q(x|y)}{f(x)q(y|x)}, 1  \Big)     \tag{2}
\end{align}
$$

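Then (2) makes equation (1) hold in all cases. Why? If $\frac{\pi(y)q(x|y)}{\pi(x)q(y|x)} < 1$, then $\alpha(y|x)$ equals this ratio and $\alpha(x|y) = 1$, so both sides of (1) equal $\pi(y)q(x|y)$. The case where the ratio is at least 1 follows by swapping the roles of $x$ and $y$.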
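The claim can be checked numerically on a small finite state space. The sketch below is illustrative only: the target weights `f`, the proposal matrix `q`, and the helper `mh_transition_matrix` are made-up names and values, not part of the notes. It assumes that the probability mass of rejected proposals stays on the current state, so each row of $p$ sums to 1.

```python
def mh_transition_matrix(f, q):
    """Build the Metropolis-Hastings transition matrix on states 0..k-1.

    f -- list of unnormalized target weights, f[x] > 0
    q -- proposal matrix, q[x][y] = probability of proposing y from x
    """
    k = len(f)
    p = [[0.0] * k for _ in range(k)]
    for x in range(k):
        for y in range(k):
            if y != x and q[x][y] > 0:
                # Acceptance probability from equation (2).
                alpha = min(f[y] * q[y][x] / (f[x] * q[x][y]), 1.0)
                p[x][y] = q[x][y] * alpha
        # Assumption: rejected proposals leave the chain at x.
        p[x][x] = 1.0 - sum(p[x])
    return p

# Hypothetical example: 4 states, target pi proportional to f.
f = [1.0, 2.0, 3.0, 4.0]
q = [[0.10, 0.50, 0.20, 0.20],
     [0.30, 0.10, 0.40, 0.20],
     [0.25, 0.25, 0.25, 0.25],
     [0.60, 0.20, 0.10, 0.10]]
p = mh_transition_matrix(f, q)
pi = [w / sum(f) for w in f]
k = len(f)

# Largest violation of detailed balance (1): pi(x) p(y|x) - pi(y) p(x|y).
max_db_gap = max(abs(pi[x] * p[x][y] - pi[y] * p[y][x])
                 for x in range(k) for y in range(k))
# Largest violation of stationarity: sum_x pi(x) p(y|x) - pi(y).
max_stat_gap = max(abs(sum(pi[x] * p[x][y] for x in range(k)) - pi[y])
                   for y in range(k))
print(max_db_gap, max_stat_gap)
```

Both gaps should be zero up to floating-point rounding, even though `q` itself is not symmetric and only the unnormalized `f` enters the acceptance probability.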

### Algorithm:

1. Initialization: Let $t = 1$, pick an initial state $x_1$ and set $X_1 = x_1$. Choose a large number $N$ at which to stop the Markov chain.
2. While $t \le N$ do:
    1. Propose a new sample candidate $Y$ according to $q(Y|X_t)$.
    2. Calculate the acceptance probability $\alpha(Y|X_t) = \min\Big( \frac{f(Y)q(X_t|Y)}{f(X_t)q(Y|X_t)}, 1  \Big)$.
    3. Generate $U \sim U(0,1)$.
    4. If $U \le \alpha(Y|X_t)$, accept the candidate by setting $X_{t+1} = Y$.
    5. Otherwise, reject the candidate and keep the current state by setting $X_{t+1} = X_t$.
    6. Set $t = t+1$.
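The loop above can be sketched in Python using only the standard library. This is a minimal illustration, not a reference implementation: the function name `metropolis_hastings`, the standard normal target, the Gaussian random-walk proposal, and the step size `STEP` are all assumptions chosen for the example. It follows the standard convention that a rejected proposal leaves the chain at its current state, which is then recorded again.

```python
import math
import random

def metropolis_hastings(f, propose, q_density, x0, n_samples, rng):
    """Run Metropolis-Hastings and return the list of visited states.

    f         -- unnormalized target density, f(x) >= 0
    propose   -- propose(x, rng) draws a candidate Y from q(. | x)
    q_density -- q_density(y, x) evaluates q(y | x)
    """
    x = x0
    samples = [x]
    while len(samples) < n_samples:
        y = propose(x, rng)
        # Acceptance probability from equation (2).
        ratio = (f(y) * q_density(x, y)) / (f(x) * q_density(y, x))
        alpha = min(ratio, 1.0)
        if rng.random() <= alpha:
            x = y          # accept the candidate
        samples.append(x)  # on rejection the current state is repeated
    return samples

# Assumed target for illustration: unnormalized standard normal density.
def f(x):
    return math.exp(-0.5 * x * x)

STEP = 1.0  # hypothetical proposal standard deviation

def propose(x, rng):
    return rng.gauss(x, STEP)

def q_density(y, x):
    z = (y - x) / STEP
    return math.exp(-0.5 * z * z) / (STEP * math.sqrt(2.0 * math.pi))

rng = random.Random(0)
chain = metropolis_hastings(f, propose, q_density, x0=0.0, n_samples=20000, rng=rng)
kept = chain[2000:]  # discard an initial burn-in segment
mean = sum(kept) / len(kept)
var = sum((v - mean) ** 2 for v in kept) / len(kept)
print(f"sample mean = {mean:.3f}, sample variance = {var:.3f}")
```

Because this proposal is symmetric, `q_density(x, y)` and `q_density(y, x)` cancel in the ratio. They are kept so the code matches the general formula and still works if an asymmetric proposal is swapped in.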


## Gibbs Sampler

**Video lecture: https://youtu.be/wlpDrBEZXpg**

The Gibbs sampler makes generating high-dimensional samples easier than the general Metropolis–Hastings algorithm. The idea is to pick one component of the vector and propose a new sample by drawing that component from its conditional distribution while the other components are held fixed.

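The Gibbs sampler is a special case of the Metropolis–Hastings algorithm in which a component $i$ is chosen uniformly from the $n$ components, $\mathbf{y}$ agrees with $\mathbf{x}$ in every component except the $i$-th, and the proposal distribution $q$ is: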

$$
q(\mathbf{y}|\mathbf{x}) = \frac{1}{n}f(y_i|x_1,...,x_{i-1},x_{i+1},...,x_{n}) = \frac{f(\mathbf{y})}{ n\int{ f(\mathbf{x}) dx_i } }
$$

Note that for fixed $i$, $\int{ f(\mathbf{x}) dx_i } = \int{ f(\mathbf{y}) dy_i }$ because $\mathbf{x}$ and $\mathbf{y}$ differ only in the $i$-th component, so:

$$
\frac{f(\mathbf{y})q(\mathbf{x}|\mathbf{y})}{f(\mathbf{x})q(\mathbf{y}|\mathbf{x})} = \frac{f(\mathbf{y})f(\mathbf{x})}{f(\mathbf{x})f(\mathbf{y})} = 1
$$

The acceptance probability (2) now becomes the constant 1: $\alpha(\mathbf{y}|\mathbf{x}) = \min\Big( 1, 1  \Big) = 1$, so every proposal is accepted.
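As a sanity check, the ratio can be evaluated numerically for a small discrete target, with sums replacing the integrals. The grid of weights `f` and the helper `gibbs_q` below are hypothetical and exist only for this illustration.

```python
# Hypothetical unnormalized target on a 2 x 3 grid: f[a][b] = f(x) for x = (a, b).
f = [[1.0, 4.0, 2.0],
     [3.0, 0.5, 6.0]]
n = 2  # number of components of x

def gibbs_q(y, x, i):
    """q(y | x) when y differs from x only in component i."""
    if i == 0:
        marginal = sum(f[a][x[1]] for a in range(2))  # sum over component 0
    else:
        marginal = sum(f[x[0]][b] for b in range(3))  # sum over component 1
    return f[y[0]][y[1]] / (n * marginal)

ratios = []
for a in range(2):
    for b in range(3):
        x = (a, b)
        # All states reachable from x by redrawing a single component.
        moves = [((a2, b), 0) for a2 in range(2)] + [((a, b2), 1) for b2 in range(3)]
        for y, i in moves:
            num = f[y[0]][y[1]] * gibbs_q(x, y, i)
            den = f[a][b] * gibbs_q(y, x, i)
            ratios.append(num / den)

print(min(ratios), max(ratios))
```

Every ratio should equal 1 up to floating-point rounding, whichever component is redrawn.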

### Algorithm:

1. Initialization: Pick an initial state $\mathbf{x}_1$ and set $\mathbf{X}_1 = \mathbf{x}_1$. Choose a large number $N$ at which to stop the Markov chain.
2. For $j = 1$ to $N$ do:
    1. For $i = 1$ to $n$ do:
        1. Draw $Y_i$ from the conditional pdf $f(y_i|Y_1,\dots,Y_{i-1},x_{j,i+1},\dots,x_{j,n})$, i.e. given the components already updated in this sweep and the not-yet-updated components of $\mathbf{X}_j$.
    2. Set $\mathbf{X}_{j+1} = \mathbf{Y} = (Y_1,\dots,Y_n)$.
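A minimal Python sketch of a Gibbs sweep, using a target that is assumed here purely for illustration: a zero-mean, unit-variance bivariate normal with correlation `rho`, whose conditionals are known in closed form ($X_1 \mid X_2 = x_2 \sim N(\rho x_2, 1-\rho^2)$, and symmetrically for $X_2$). The function name `gibbs_bivariate_normal` is hypothetical. Each component is drawn conditional on the most recent values of the others, which is the usual systematic-scan form.

```python
import math
import random

def gibbs_bivariate_normal(rho, n_samples, rng, x0=(0.0, 0.0)):
    """Systematic-scan Gibbs sampler for a zero-mean, unit-variance
    bivariate normal with correlation rho (assumed example target)."""
    cond_sd = math.sqrt(1.0 - rho * rho)  # std. dev. of each conditional
    x1, x2 = x0
    samples = []
    for _ in range(n_samples):
        # Draw component 1 given the current value of component 2.
        x1 = rng.gauss(rho * x2, cond_sd)
        # Draw component 2 given the freshly updated component 1.
        x2 = rng.gauss(rho * x1, cond_sd)
        samples.append((x1, x2))
    return samples

rng = random.Random(1)
draws = gibbs_bivariate_normal(rho=0.8, n_samples=20000, rng=rng)[1000:]

m1 = sum(a for a, _ in draws) / len(draws)
m2 = sum(b for _, b in draws) / len(draws)
v1 = sum((a - m1) ** 2 for a, _ in draws) / len(draws)
v2 = sum((b - m2) ** 2 for _, b in draws) / len(draws)
cov = sum((a - m1) * (b - m2) for a, b in draws) / len(draws)
corr = cov / math.sqrt(v1 * v2)
print(f"means = ({m1:.3f}, {m2:.3f}), correlation = {corr:.3f}")
```

No accept or reject step appears because the acceptance probability is always 1. The price is that each conditional distribution must be available to sample from directly.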