# Posterior Distribution Simulation

## 1. Random Number Generation
In this section we discuss how to generate random numbers from the uniform distribution $U(0,1)$.

### 1.1 Linear Congruential Generator
A linear congruential generator (LCG) is an algorithm that produces a pseudo-random sequence from a discontinuous piecewise linear recurrence. The algorithm is defined as follows:
- Given a seed $X_{0}$, the iteration is defined as $X_{n+1} = (aX_n + c) \bmod m$, where $m, a, c, X_{0}$ are nonnegative integers: $m>0$ is the modulus, $a$ is the multiplier ($0<a<m$), $c$ is the increment ($0\leq c<m$), and $X_{0}$ is the seed or start value ($0\leq X_{0}<m$).
- The pseudo-random number is defined as $R_{n} = X_{n}/m$.
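
The recurrence above can be sketched in a few lines of Python. This is a minimal illustration, not a production generator: the function name `lcg` is hypothetical, and the default constants ($a=1664525$, $c=1013904223$, $m=2^{32}$) are one commonly quoted parameter set, chosen here only as an assumption for the example.

```python
def lcg(seed, a=1664525, c=1013904223, m=2**32):
    """Yield R_n = X_n / m for the recurrence X_{n+1} = (a * X_n + c) mod m.

    The default a, c, m are an illustrative choice, not a recommendation.
    """
    x = seed
    while True:
        x = (a * x + c) % m   # next integer state X_{n+1}
        yield x / m           # pseudo-random number in [0, 1)


gen = lcg(seed=42)
sample = [next(gen) for _ in range(5)]
print(sample)
```

Because the state is reduced modulo $m$ before dividing, every output lies in $[0, 1)$, and the same seed always reproduces the same sequence.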

If we define the period of the sequence as $T$ (necessarily $T\leq m$, since $X_{n}$ takes at most $m$ distinct values), then we have the following theorem:
> **Theorem**: The period of the sequence $T=m$ if and only if the following 3 conditions are satisfied: 
> - $c$ and $m$ are relatively prime.
> - $a-1$ is divisible by all prime factors of $m$.
> - $a-1$ is divisible by 4 if $m$ is divisible by 4.

In practice, given how computers store numbers, we usually choose the modulus $m$ to be a power of 2.
> **Theorem**: If $m=2^{b}$, then the period of the sequence $T=m$ if and only if the following 2 conditions are satisfied:
> - $c$ is odd.
> - $a-1$ is divisible by 4.
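
For a small modulus, the power-of-2 theorem can be checked by brute force. The sketch below is only an illustration under stated assumptions: the helper names `lcg_period` and `satisfies_pow2_conditions` are hypothetical, and the period is measured from the seed $X_{0}=0$.

```python
def lcg_period(a, c, m, seed=0):
    """Count steps until the state first returns to the seed.

    Returns None if the seed is not revisited within m steps,
    which means the sequence cannot have full period m.
    """
    x = (a * seed + c) % m
    steps = 1
    while x != seed:
        x = (a * x + c) % m
        steps += 1
        if steps > m:
            return None
    return steps


def satisfies_pow2_conditions(a, c):
    """Conditions of the theorem for m = 2^b: c odd and a - 1 divisible by 4."""
    return c % 2 == 1 and (a - 1) % 4 == 0


m = 16
for a, c in [(5, 3), (3, 3), (5, 2)]:
    print(a, c, lcg_period(a, c, m), satisfies_pow2_conditions(a, c))
```

With $m=16$, the pair $(a,c)=(5,3)$ meets both conditions and visits all 16 states, while $(3,3)$ and $(5,2)$ each violate one condition and fall short of the full period.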

## 2. Sampling 

### 2.1 The Inverse Transformation Method
Using the generator from the previous section, we can now produce random numbers that approximately follow the uniform distribution $U[0, 1]$. When the target distribution is not too complex, we can construct a transformation that maps a $U[0, 1]$ random variable to a random variable with the target distribution. This is the inverse transform method discussed in this section.

To implement the inverse transform method, we first provide the definition of the generalized inverse function of a distribution function.
> **Definition**: Let $F$ be a distribution function. Then $$ F^{-1}(y) = \inf\left \{x:F(x)\geq y\right \},\quad y\in [0,1]$$
> is the generalized inverse function of $F$.
When $F$ is continuous and strictly increasing, $F^{-1}(y)$ is the ordinary inverse function of $F$.

Now we can state the following theorem:
> **Theorem**: Let $F(x),x\in \mathbb{R}$ be a cumulative distribution function (cdf) with generalized inverse function $F^{-1}(y)$, and let $U$ be a random variable following the uniform distribution $U[0, 1]$. Then $X = F^{-1}(U)$ is a random variable with distribution function $F$, namely, $P(X\leq x)=F(x)$.
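
As a worked illustration of the theorem, consider the exponential distribution with rate $\lambda$ (an example chosen here, not taken from the text above). Its cdf is $F(x) = 1 - e^{-\lambda x}$ for $x\geq 0$, so $F^{-1}(u) = -\ln(1-u)/\lambda$. The sketch below uses Python's standard `random` module as the uniform source; the function name `sample_exponential` is hypothetical.

```python
import math
import random


def sample_exponential(rate, rng):
    """Draw one Exp(rate) sample via X = F^{-1}(U) = -ln(1 - U) / rate."""
    u = rng.random()                  # U in [0, 1), so 1 - u is in (0, 1]
    return -math.log(1.0 - u) / rate


rng = random.Random(0)
draws = [sample_exponential(2.0, rng) for _ in range(100_000)]
print(sum(draws) / len(draws))        # should be close to 1 / rate = 0.5
```

Using $1-U$ rather than $U$ inside the logarithm avoids evaluating $\ln 0$, since `random()` can return 0 but never 1.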

### 2.2 Rejection Sampling
**Rejection sampling** is a fundamental method for generating samples from a specific distribution. It is also known as "the acceptance-rejection method" or "the accept-reject algorithm." Let $p(x)$ be the probability density function of the target distribution, and let $\mathcal{X}_{p}$ be its support set, such that $p(x)>0$ for any $x\in \mathcal{X}_{p}$ and $p(x)=0$ for any $x \notin \mathcal{X}_{p}$. In rejection sampling, we introduce a proposal distribution with density $q(x)$ from which we can sample directly. We require the support set $\mathcal{X}_{p}$ of $p(x)$ to be contained in the support set $\mathcal{X}_{q}$ of $q(x)$, meaning that $q(x)>0$ for any $x\in \mathcal{X}_{p}$, and that there exists a positive constant $M$ such that $p(x)\leq Mq(x)$ for any $x\in \mathcal{X}_{p}$.<br>
The idea behind rejection sampling is to first sample from the proposal distribution $q(x)$ and then accept each sampled value with probability $p(x)/(Mq(x))$, which serves as the parameter of a Bernoulli distribution. The accepted samples follow the target distribution.

The steps of rejection sampling are as follows:
- Notation: Let $x_{k}$ denote the $k$-th candidate drawn from the proposal distribution, $X$ the sample finally accepted by rejection sampling, and $k$ the iteration counter, initialized as $k=1$.
- Step 1: Sample $x_{k}$ from the proposal distribution $q(x)$.
- Step 2: Sample $u_{k}$ from the uniform distribution $U[0, 1]$.
- Step 3: If $u_{k}\leq \alpha_{k} = p(x_{k})/(Mq(x_{k}))$, accept $x_{k}$ and set $X=x_{k}$; otherwise, reject $x_{k}$ and set $k=k+1$.
- Step 4: Repeat steps 1-3 until a candidate is accepted.

It is worth noting that sometimes we do not know the exact probability density function of the target distribution, only that $p(x)$ is proportional to a known function, $p(x)=c^{-1}f(x)$, with an unknown normalizing constant $c$. In this case, the constant $M$ should satisfy $f(x)=cp(x)\leq Mq(x)$, and we simply let $\alpha_{k} = f(x_{k})/(Mq(x_{k}))$.
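
The steps above, in the unnormalized form, can be sketched as follows. The example is an assumption made for illustration: the target is $f(x) = x(1-x)$ on $[0,1]$ (proportional to the Beta(2, 2) density), the proposal is $U[0,1]$ with $q(x)=1$, and $M = 1/4$ is the maximum of $f$. The names `f`, `q_pdf` and `rejection_sample` are hypothetical.

```python
import random


def f(x):
    """Unnormalized target, proportional to the Beta(2, 2) density on [0, 1]."""
    return x * (1.0 - x)


def q_pdf(x):
    """Proposal density of U[0, 1]."""
    return 1.0


M = 0.25  # max of f on [0, 1], so f(x) <= M * q(x) everywhere


def rejection_sample(rng):
    """Return one accepted sample X."""
    while True:
        x = rng.random()                 # Step 1: candidate x_k ~ q
        u = rng.random()                 # Step 2: u_k ~ U[0, 1]
        if u <= f(x) / (M * q_pdf(x)):   # Step 3: accept with ratio alpha_k
            return x                     # otherwise loop again (Step 4)


rng = random.Random(0)
samples = [rejection_sample(rng) for _ in range(50_000)]
print(sum(samples) / len(samples))       # Beta(2, 2) has mean 0.5
```

A tighter $M$ wastes fewer candidates; any larger valid $M$ still gives correct samples, only with more rejections.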

### 2.3 Importance Sampling
Unlike the inverse transform method and rejection sampling method introduced earlier, the samples obtained from **importance sampling** **do not follow the target distribution**. Importance sampling assigns a weight to each sample, so that the expectation of the weighted empirical distribution function matches the true distribution function, and ensures that as the number of samples increases, the corresponding weighted empirical distribution function converges to the true distribution function.

> **Definition**: Let $X_{1}, \dots, X_{n}$ be independent, identically distributed real random variables with the common cumulative distribution function $F(x)$, and let $x_{1}, \dots, x_{n}$ denote their observed values. Then the empirical distribution function is defined as
> $$ \hat{F}_{n}(x) = \frac{1}{n} \sum^{n}_{i=1} \mathbb{1}(x_{i}\leq x) $$
> where $\mathbb{1}(x_{i}\leq x)$ is the indicator function, which is equal to 1 if $x_{i}\leq x$ and 0 otherwise. Since each indicator has expectation $F(x)$, $\hat{F}_{n}(x)$ is an **unbiased estimator for $F(x)$**, namely, $\mathbb{E}\hat{F}_{n}(x) = F(x)$.

With the empirical distribution function, we can estimate the expectation and variance of a function $f(x)$ with respect to the distribution function $F(x)$ as follows:
$$
\begin{aligned}
Ef(X) &\approx \hat{E}f(X) \equiv \int f(x)d\hat{F}_{n}(x) = \int f(x)\frac{1}{n} \sum^{n}_{i=1} \delta (x-x_{i})dx = \frac{1}{n} \sum^{n}_{i=1}f(x_{i})\\
Varf(X) &\approx \int (f(x)-\hat{E}f(X))^{2}d\hat{F}_{n}(x) = \frac{1}{n} \sum^{n}_{i=1}(f(x_{i})-\hat{E}f(X))^{2}
\end{aligned}
$$
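
These two plug-in estimates amount to a sample mean and a sample variance (with divisor $n$). A minimal sketch, assuming for illustration that $X\sim U[0,1]$ and $f(x)=x^{2}$, for which $Ef(X)=1/3$ and $Varf(X)=4/45$; the helper name `empirical_mean_var` is hypothetical.

```python
import random


def empirical_mean_var(f, xs):
    """Plug-in estimates of E f(X) and Var f(X) from samples xs (divisor n)."""
    n = len(xs)
    mean = sum(f(x) for x in xs) / n
    var = sum((f(x) - mean) ** 2 for x in xs) / n
    return mean, var


rng = random.Random(0)
xs = [rng.random() for _ in range(100_000)]      # samples from U[0, 1)
mean_hat, var_hat = empirical_mean_var(lambda x: x * x, xs)
print(mean_hat, var_hat)                          # compare with 1/3 and 4/45
```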

In importance sampling, we assign a weight $W_{i}$ to each sample $x_{i}$. Suppose the samples are drawn from a proposal distribution $G$ whose pdf is again denoted $q(x)$; then the weighted empirical distribution function is defined as
$$
\tilde{F}_{n}(x) = \frac{1}{n} \sum^{n}_{i=1} W_{i}\mathbb{1}(x_{i}\leq x)
$$
where $W_{i} = p(x_{i})/q(x_{i})$.<br>
The expectation and variance of $\tilde{F}_{n}(x)$ are:
$$
\mathbb{E}\tilde{F}_{n}(x) = \mathbb{E}_{X \sim G}\left ( \frac{p(X)}{q(X)}\mathbb{1}(X\leq x)\right ) = \int\frac{p(y)}{q(y)}\mathbb{1}(y\leq x) q(y)dy = F(x)
$$
$$
Var\tilde{F}_{n}(x) = \frac{1}{n}Var_{X\sim G}\left ( \frac{p(X)}{q(X)}\mathbb{1}(X\leq x) \right ) = \frac{1}{n} \left [\int \frac{p^{2}(y)}{q(y)}\mathbb{1}(y\leq x)dy - F^{2}(x)\right ] 
$$
From the first formula, we can see that $\tilde{F}_{n}(x)$ is still an unbiased estimator for $F(x)$.<br>
Now we can estimate the expectation of a function $f(x)$ with respect to the distribution function $F(x)$, and compute the mean and variance of this estimator, as follows:
$$
Ef(X) \approx \tilde{E}f(X) \equiv \int f(x)d\tilde{F}_{n}(x) = \int f(x)\frac{1}{n} \sum^{n}_{i=1} W_{i}\delta (x-x_{i})dx = \frac{1}{n} \sum^{n}_{i=1}W_{i}f(x_{i})
$$
$$
E(\tilde{E}f(X)) = \frac{1}{n} \sum^{n}_{i=1}E_{X_{i}\sim G}\left ( \frac{p(X_{i})}{q(X_{i})}f(X_{i})\right ) = \int \frac{p(y)}{q(y)}f(y)q(y)dy = E_{X\sim F}f(X) \equiv I
$$
$$
Var(\tilde{E}f(X)) = \frac{1}{n}Var_{X\sim G}\left ( \frac{p(X)}{q(X)}f(X)\right) = \frac{1}{n} \left [\int \frac{p^{2}(y)}{q(y)}f^{2}(y)dy - I^{2}\right ]
$$

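According to the Cauchy-Schwarz inequality, and since $\int q(y)dy = 1$, we have
$$
\begin{aligned}
\int \frac{p^{2}(y)}{q(y)}f^{2}(y)dy &= \int \frac{f^{2}(y)p^{2}(y)}{q(y)}dy\int q(y)dy \geq \left ( \int |f(y)|p(y)dy \right )^{2} \geq \left ( \int f(y)p(y)dy \right )^{2} = I^{2}\\
\therefore\ Var(\tilde{E}f(X)) &\geq 0
\end{aligned}
$$
Equality in the Cauchy-Schwarz step holds if and only if $q(y) \propto |f(y)|p(y)$, so the variance-minimizing proposal distribution is $q_{opt}(y) = \frac{|f(y)|p(y)}{\int |f(y)|p(y)dy}$. When $f\geq 0$, this reduces to $q_{opt}(y) = \frac{f(y)p(y)}{\int f(y)p(y)dy}$ and the variance is zero.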
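
The estimator $\tilde{E}f(X) = \frac{1}{n}\sum_{i} W_{i}f(x_{i})$ can be sketched as follows. The setting is an assumption chosen for illustration: the target $p$ is the standard normal density, the proposal $G$ is $N(0, 2^{2})$, and $f(x)=x^{2}$, so the true value is $I=1$. The names `normal_pdf` and `importance_estimate` are hypothetical.

```python
import math
import random


def normal_pdf(x, sigma):
    """Density of N(0, sigma^2) at x."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def importance_estimate(f, n, rng, q_sigma=2.0):
    """Estimate E_p f(X) for p = N(0, 1) using n draws from q = N(0, q_sigma^2)."""
    total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, q_sigma)                       # x_i ~ G
        w = normal_pdf(x, 1.0) / normal_pdf(x, q_sigma)   # W_i = p(x_i) / q(x_i)
        total += w * f(x)
    return total / n


estimate = importance_estimate(lambda x: x * x, 100_000, random.Random(0))
print(estimate)   # should be close to I = 1
```

A proposal with heavier tails than the target keeps the weights $W_{i}$ bounded here, which keeps the variance of the estimator finite.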

Sometimes we only know $p(x) \propto \gamma(x)$, where $\gamma(x)$ is an unnormalized density rather than a proper pdf. In this case
$$
\begin{aligned}
F(x) & = E(\mathbb{1}(X\leq x)) = \int \mathbb{1}(y\leq x)p(y)dy = \frac{\int \gamma(y) \mathbb{1}(y\leq x)dy}{\int \gamma(y)dy}\\
& = \frac{\int \frac{\gamma(y)}{q(y)} \mathbb{1}(y\leq x)q(y)dy}{\int \frac{\gamma(y)}{q(y)}q(y)dy} = \frac{E_{X\sim G}\frac{\gamma(X)}{q(X)} \mathbb{1}(X\leq x)}{E_{X\sim G} \frac{\gamma(X)}{q(X)}}
\end{aligned}
$$
Suppose $\left \{ x_{i}\right \}_{i=1}^{n}$ is a set of i.i.d. samples drawn from $G$; then we can estimate $F(x)$ as follows:
$$
\tilde{F}_{n}(x) = \frac{\frac{1}{n}\sum^{n}_{i=1}\frac{\gamma(x_{i})}{q(x_{i})}\mathbb{1}(x_{i}\leq x)}{\frac{1}{n}\sum^{n}_{i=1}\frac{\gamma(x_{i})}{q(x_{i})}} = \frac{\sum^{n}_{i=1}W_{i}\mathbb{1}(x_{i}\leq x)}{\sum^{n}_{i=1}W_{i}} = \sum^{n}_{i=1}w_{i}\mathbb{1}(x_{i}\leq x)
$$
where $W_{i} = \frac{\gamma(x_{i})}{q(x_{i})}$, $w_{i} = \frac{W_{i}}{\sum^{n}_{i=1}W_{i}}$.
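
The self-normalized estimate of $F(x)$ can be sketched as follows. As an assumption for illustration, take $\gamma(x) = e^{-x^{2}/2}$ (the standard normal density without its normalizing constant) and the proposal $G = N(0, 2^{2})$; the names `gamma_fn`, `q_pdf` and `self_normalized_cdf` are hypothetical.

```python
import math
import random


def gamma_fn(x):
    """Unnormalized target: standard normal density without 1 / sqrt(2 * pi)."""
    return math.exp(-0.5 * x * x)


def q_pdf(x, sigma=2.0):
    """Proposal density of N(0, sigma^2)."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def self_normalized_cdf(t, xs):
    """Estimate F(t) as sum_i w_i * 1(x_i <= t), with w_i = W_i / sum_j W_j."""
    weights = [gamma_fn(x) / q_pdf(x) for x in xs]    # W_i = gamma(x_i) / q(x_i)
    total = sum(weights)
    return sum(w for w, x in zip(weights, xs) if x <= t) / total


rng = random.Random(0)
xs = [rng.gauss(0.0, 2.0) for _ in range(100_000)]    # i.i.d. draws from G
print(self_normalized_cdf(0.0, xs))                   # standard normal F(0) = 0.5
```

The unknown normalizing constant of $\gamma$ cancels between numerator and denominator, which is why only the ratio $\gamma/q$ is needed.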