## Forward and Reverse process
Diffusion models gradually turn data into Gaussian noise (a latent vector) and then restore the data from that noise. The former is called the forward process, and the latter is called the reverse process.

### Forward process
In the forward process, we add Gaussian noise to the data step by step (usually hundreds of steps). A single step is defined by the following conditional distribution.
$$
\begin{aligned}
q(x_{t}|x_{t-1}) &= N(x_{t}; \sqrt{1-\beta_{t}}x_{t-1}, \beta_{t}\mathbf{I})
\end{aligned}
$$
where $\beta_{t} \in (0, 1)$ is the variance of the noise added at step $t$.
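
A single forward step can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `forward_step`, the toy vector `x0`, and the value `beta_t=0.02` are hypothetical choices, not part of the original text, and the data is represented as a simple list of floats.

```python
import math
import random

def forward_step(x_prev, beta_t, rng):
    """Sample x_t ~ N(sqrt(1 - beta_t) * x_prev, beta_t * I) for a list of floats."""
    if not 0.0 < beta_t < 1.0:
        raise ValueError("beta_t must lie in (0, 1)")
    scale = math.sqrt(1.0 - beta_t)  # factor that shrinks the mean
    sigma = math.sqrt(beta_t)        # standard deviation of the added noise
    # Each coordinate receives its own independent standard normal draw.
    return [scale * v + sigma * rng.gauss(0.0, 1.0) for v in x_prev]

rng = random.Random(0)   # fixed seed, only for reproducibility
x0 = [1.0, -0.5, 0.25]   # toy "data" vector (hypothetical)
x1 = forward_step(x0, beta_t=0.02, rng=rng)
print(x1)
```

With a small `beta_t`, `x1` stays close to `x0`; the data is destroyed only after many such steps.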

$x_{t}$ is sampled from a Gaussian distribution with mean $\sqrt{1-\beta_{t}}x_{t-1}$ and covariance $\beta_{t}\mathbf{I}$. Because the mean is scaled down by the factor $\sqrt{1-\beta_{t}}$, the added noise does not inflate the overall variance, which stays close to 1 as long as the data has roughly unit variance.
$$
\begin{aligned}
\mathrm{Var}(x_{t}) &= \mathrm{Var}(\sqrt{1-\beta_{t}}x_{t-1} + \sqrt{\beta_{t}} \epsilon) \quad (\epsilon \sim N(0, \mathbf{I})) \\
&= \mathrm{Var}(\sqrt{1-\beta_{t}}x_{t-1}) + \mathrm{Var}(\sqrt{\beta_{t}} \epsilon) \quad (\epsilon \text{ is independent of } x_{t-1}) \\
&= (1-\beta_{t}) \mathrm{Var}(x_{t-1}) + \beta_{t} \\
&\approx 1 - \beta_{t} + \beta_{t} \quad (\text{assuming } \mathrm{Var}(x_{t-1}) \approx 1) \\
&= 1
\end{aligned}
$$
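
The variance argument above can be checked numerically. The following is a minimal Monte Carlo sketch, assuming scalar inputs that already have unit variance; the sample size `n` and the value of `beta_t` are arbitrary illustrative choices.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed, only for reproducibility
beta_t = 0.1            # illustrative value in (0, 1)
n = 50_000              # number of independent scalar samples

# x_{t-1}: samples with (approximately) unit variance.
x_prev = [rng.gauss(0.0, 1.0) for _ in range(n)]

# One forward step applied to every sample with fresh, independent noise.
x_t = [
    math.sqrt(1.0 - beta_t) * v + math.sqrt(beta_t) * rng.gauss(0.0, 1.0)
    for v in x_prev
]

var_prev = statistics.pvariance(x_prev)
var_t = statistics.pvariance(x_t)
print(var_prev, var_t)  # both should be close to 1
```

If the input variance is not 1, the same recursion $(1-\beta_{t})\mathrm{Var}(x_{t-1}) + \beta_{t}$ pulls it toward 1 over many steps rather than keeping it fixed.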

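Because each step depends only on the previous one (a Markov chain), the joint distribution of the whole trajectory from $x_{1}$ to $x_{T}$ given $x_{0}$ is the product of the individual steps.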
$$
\begin{aligned}
q(x_{1:T}|x_{0}) &= \prod_{t=1}^{T} q(x_{t}|x_{t-1})
\end{aligned}
$$
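
Sampling a whole trajectory amounts to applying the single step $T$ times. The sketch below assumes a linear $\beta$ schedule from `1e-4` to `0.02` over `T = 1000` steps; that schedule is a common choice but is an assumption here, since the text does not fix one. The helper names `linear_betas` and `sample_trajectory` are hypothetical.

```python
import math
import random

def linear_betas(T, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced beta_1..beta_T (an assumed schedule, not given in the text)."""
    if T == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (T - 1)
    return [beta_start + step * i for i in range(T)]

def sample_trajectory(x0, betas, rng):
    """Return [x_0, x_1, ..., x_T], sampling each x_t from q(x_t | x_{t-1})."""
    xs = [list(x0)]
    for beta_t in betas:
        prev = xs[-1]
        xs.append([
            math.sqrt(1.0 - beta_t) * v + math.sqrt(beta_t) * rng.gauss(0.0, 1.0)
            for v in prev
        ])
    return xs

rng = random.Random(0)   # fixed seed, only for reproducibility
T = 1000
betas = linear_betas(T)
x0 = [1.0, -0.5, 0.25]   # toy "data" vector (hypothetical)
traj = sample_trajectory(x0, betas, rng)
print(traj[0], traj[-1])  # the data and its fully noised version
```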

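The reparameterization trick allows $x_{t}$ to be expressed directly in terms of $x_{0}$ and the $\beta$ values, without sampling the intermediate steps. The key fact is that the sum of two independent zero-mean Gaussians is again Gaussian, with the variances added.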
$$
\begin{aligned}
x_{t} &= \sqrt{1-\beta_{t}}x_{t-1} + \sqrt{\beta_{t}} \epsilon_{t-1} \quad (x_{t} \sim q(x_{t}|x_{t-1}),\ \epsilon_{t-1} \sim N(0, \mathbf{I})) \\
&= \sqrt{\alpha_{t}}x_{t-1} + \sqrt{1-\alpha_{t}} \epsilon_{t-1} \quad (\alpha_{t}=1-\beta_{t}) \\
&= \sqrt{\alpha_{t}}(\sqrt{\alpha_{t-1}}x_{t-2} + \sqrt{1-\alpha_{t-1}} \epsilon_{t-2}) + \sqrt{1-\alpha_{t}} \epsilon_{t-1} \\
&= \sqrt{\alpha_{t}\alpha_{t-1}}x_{t-2} + \sqrt{\alpha_{t}(1-\alpha_{t-1})} \epsilon_{t-2} + \sqrt{1-\alpha_{t}} \epsilon_{t-1} \\
&= \sqrt{\alpha_{t}\alpha_{t-1}}x_{t-2} + \sqrt{\alpha_{t}-\alpha_{t}\alpha_{t-1}+1-\alpha_{t}} \bar{\epsilon} \quad (\bar{\epsilon} \sim N(0, \mathbf{I})) \\
&= \sqrt{\alpha_{t}\alpha_{t-1}}x_{t-2} + \sqrt{1-\alpha_{t}\alpha_{t-1}} \bar{\epsilon} \\
&= \sqrt{\alpha_{t}\alpha_{t-1}\alpha_{t-2}}x_{t-3} + \sqrt{1-\alpha_{t}\alpha_{t-1}\alpha_{t-2}} \bar{\epsilon} \\
&= \dots \\
&= \sqrt{\alpha_{t}\alpha_{t-1}\dots\alpha_{1}}x_{0} + \sqrt{1-\alpha_{t}\alpha_{t-1}\dots\alpha_{1}} \bar{\epsilon} \\
&= \sqrt{\bar{\alpha}_{t}} x_{0} + \sqrt{1-\bar{\alpha}_{t}} \bar{\epsilon} \quad (\bar{\alpha}_{t}=\alpha_{t}\alpha_{t-1}\dots\alpha_{1}) \\
\therefore q(x_{t}|x_{0}) &= N(x_{t}; \sqrt{\bar{\alpha}_{t}} x_{0}, (1-\bar{\alpha}_{t})\mathbf{I})
\end{aligned}
$$
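
The closed form means $x_{t}$ can be drawn from $x_{0}$ in one shot, using the cumulative product of $\alpha_{s} = 1 - \beta_{s}$ up to step $t$. The sketch below compares one-shot sampling with step-by-step sampling on a scalar toy example. The three $\beta$ values, the starting point `x0 = 2.0`, and the helper names `alpha_bars`, `q_sample`, and `step_by_step` are all hypothetical choices made for illustration.

```python
import math
import random
import statistics

def alpha_bars(betas):
    """Cumulative products of alpha_s = 1 - beta_s, one entry per step t = 1..T."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out

def q_sample(x0, t, abar, rng):
    """Draw x_t ~ q(x_t | x_0) directly; t is 1-indexed, x0 is a scalar."""
    a = abar[t - 1]
    return math.sqrt(a) * x0 + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)

def step_by_step(x0, betas, rng):
    """Draw x_T by applying q(x_t | x_{t-1}) once per entry of betas."""
    x = x0
    for beta in betas:
        x = math.sqrt(1.0 - beta) * x + math.sqrt(beta) * rng.gauss(0.0, 1.0)
    return x

rng = random.Random(0)     # fixed seed, only for reproducibility
betas = [0.1, 0.2, 0.3]    # toy schedule (hypothetical)
abar = alpha_bars(betas)   # approximately [0.9, 0.72, 0.504]
x0, n = 2.0, 20_000

direct = [q_sample(x0, 3, abar, rng) for _ in range(n)]
stepped = [step_by_step(x0, betas, rng) for _ in range(n)]

target_mean = math.sqrt(abar[-1]) * x0  # mean predicted by the closed form
target_var = 1.0 - abar[-1]             # variance predicted by the closed form
print(statistics.mean(direct), statistics.mean(stepped), target_mean)
print(statistics.pvariance(direct), statistics.pvariance(stepped), target_var)
```

Both sampling routes should give matching empirical means and variances up to Monte Carlo error, which is why training code can usually noise a sample to an arbitrary step $t$ without simulating the chain.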