## Forward and Reverse process
Diffusion models gradually turn data into Gaussian noise (a latent vector) and then restore the data from that noise. The former is called the forward process, and the latter the reverse process.

### Forward process
In the forward process, we add Gaussian noise to the data step by step (usually hundreds of steps). A single step is defined as follows.
$$
\begin{aligned}
q(x_{t}|x_{t-1}) &= N(x_{t}; \sqrt{1-\beta_{t}}x_{t-1}, \beta_{t}\mathbf{I})
\end{aligned}
$$
where $\beta_{t}$ is a coefficient in $(0, 1)$.

$x_{t}$ is sampled from a Gaussian distribution with mean $\sqrt{1-\beta_{t}}x_{t-1}$ and covariance $\beta_{t}\mathrm{I}$. Because the mean scales $x_{t-1}$ down by the factor $\sqrt{1-\beta_{t}}$, the overall variance stays nearly constant, provided the variance of $x_{t-1}$ is close to 1 and the added noise is independent of $x_{t-1}$.
$$
\begin{aligned}
Var(x_{t}) &= Var(\sqrt{1-\beta_{t}}x_{t-1} + \sqrt{\beta_{t}} \epsilon) \quad (\epsilon \sim N(0, \mathbf{I})) \\
&= Var(\sqrt{1-\beta_{t}}x_{t-1}) + Var(\sqrt{\beta_{t}} \epsilon) \\
&= (1-\beta_{t}) Var(x_{t-1}) + \beta_{t} \\
&\approx 1 - \beta_{t} + \beta_{t} \quad (\text{assuming } Var(x_{t-1}) \approx 1) \\
&= 1
\end{aligned}
$$
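
The variance argument above can be checked numerically. The following is a minimal sketch, not a reference implementation: it works on scalars with the Python standard library only, and the linear schedule from `1e-4` to `0.02`, the toy uniform data, and the name `forward_step` are all assumptions made for illustration.

```python
import math
import random
import statistics


def forward_step(x_prev, beta_t, rng):
    """Draw x_t ~ N(sqrt(1 - beta_t) * x_prev, beta_t) for a scalar x_prev."""
    eps = rng.gauss(0.0, 1.0)  # standard Gaussian noise
    return math.sqrt(1.0 - beta_t) * x_prev + math.sqrt(beta_t) * eps


rng = random.Random(0)  # fixed seed so the run is repeatable
num_samples, num_steps = 5000, 100

# Assumed linear schedule from 1e-4 to 0.02 (illustrative values only).
betas = [1e-4 + (0.02 - 1e-4) * t / (num_steps - 1) for t in range(num_steps)]

# Toy "data": uniform on [-sqrt(3), sqrt(3)], which has mean 0 and variance 1.
half_width = math.sqrt(3.0)
xs = [rng.uniform(-half_width, half_width) for _ in range(num_samples)]
var_start = statistics.pvariance(xs)

# Apply the forward step repeatedly to every sample.
for beta_t in betas:
    xs = [forward_step(x, beta_t, rng) for x in xs]
var_end = statistics.pvariance(xs)

print(f"variance before: {var_start:.3f}, after {num_steps} steps: {var_end:.3f}")
```

Both printed variances should be close to 1, up to sampling error, even though the data started out non-Gaussian.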

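Because each step depends only on the previous one, the joint distribution of the whole trajectory from $x_{1}$ to $x_{T}$ given $x_{0}$ is the product of the individual steps.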
$$
\begin{aligned}
q(x_{1:T}|x_{0}) &= \prod_{t=1}^{T} q(x_{t}|x_{t-1})
\end{aligned}
$$

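The reparameterization trick allows $x_{t}$ to be expressed directly in terms of $x_{0}$ and $\beta_{1}, \dots, \beta_{t}$, without sampling the intermediate steps. The derivation uses the fact that the sum of two independent zero-mean Gaussians is again Gaussian, with the variances added.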
$$
\begin{aligned}
x_{t} &= \sqrt{1-\beta_{t}}x_{t-1} + \sqrt{\beta_{t}} \epsilon_{t-1} \quad (\epsilon_{t-1} \sim N(0, \mathbf{I})) \\
&= \sqrt{\alpha_{t}}x_{t-1} + \sqrt{1-\alpha_{t}} \epsilon_{t-1} \quad (\alpha_{t}=1-\beta_{t}) \\
&= \sqrt{\alpha_{t}}(\sqrt{\alpha_{t-1}}x_{t-2} + \sqrt{1-\alpha_{t-1}} \epsilon_{t-2}) + \sqrt{1-\alpha_{t}} \epsilon_{t-1} \\
&= \sqrt{\alpha_{t}\alpha_{t-1}}x_{t-2} + \sqrt{\alpha_{t}}\sqrt{1-\alpha_{t-1}} \epsilon_{t-2} + \sqrt{1-\alpha_{t}} \epsilon_{t-1} \\
&= \sqrt{\alpha_{t}\alpha_{t-1}}x_{t-2} + \sqrt{\alpha_{t}-\alpha_{t}\alpha_{t-1}+1-\alpha_{t}} \bar{\epsilon} \quad (\bar{\epsilon} \sim N(0, \mathbf{I})) \\
&= \sqrt{\alpha_{t}\alpha_{t-1}}x_{t-2} + \sqrt{1-\alpha_{t}\alpha_{t-1}} \bar{\epsilon} \\
&= \sqrt{\alpha_{t}\alpha_{t-1}\alpha_{t-2}}x_{t-3} + \sqrt{1-\alpha_{t}\alpha_{t-1}\alpha_{t-2}} \bar{\epsilon} \\
&= \dots \\
&= \sqrt{\alpha_{t}\alpha_{t-1}\dots\alpha_{1}}x_{0} + \sqrt{1-\alpha_{t}\alpha_{t-1}\dots\alpha_{1}} \bar{\epsilon} \\
&= \sqrt{\bar{\alpha}_{t}} x_{0} + \sqrt{1-\bar{\alpha}_{t}} \epsilon \quad (\bar{\alpha}_{t}=\alpha_{t}\alpha_{t-1}\dots\alpha_{1},\ \epsilon \sim N(0, \mathbf{I})) \\
\therefore q(x_{t}|x_{0}) &= N(x_{t}; \sqrt{\bar{\alpha}_{t}} x_{0}, (1-\bar{\alpha}_{t})\mathbf{I})
\end{aligned}
$$
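
As a sanity check of the closed form, the sketch below composes the per-step updates and compares the result with $\sqrt{\bar{\alpha}_{t}}$ and $1-\bar{\alpha}_{t}$. It is scalar-only and uses the standard library; the helper names (`linear_beta_schedule`, `alpha_bars`, `q_sample`) and the schedule values are hypothetical choices for illustration, not something the derivation prescribes.

```python
import math


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Assumed linear schedule; index 0 holds beta_1, index t-1 holds beta_t."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + step * i for i in range(num_steps)]


def alpha_bars(betas):
    """Cumulative products alpha_bar_t = alpha_1 * ... * alpha_t, alpha = 1 - beta."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out


def q_sample(x0, alpha_bar_t, eps):
    """x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, 1)."""
    return math.sqrt(alpha_bar_t) * x0 + math.sqrt(1.0 - alpha_bar_t) * eps


betas = linear_beta_schedule(1000)
abar = alpha_bars(betas)

# Compose the steps one by one: after t steps, x_t = a * x_0 + noise of variance v.
a, v = 1.0, 0.0
for beta in betas:
    a = math.sqrt(1.0 - beta) * a   # the mean coefficient shrinks each step
    v = (1.0 - beta) * v + beta     # old noise shrinks, new noise is added

print(a, math.sqrt(abar[-1]))  # step-by-step vs. closed-form mean coefficient
print(v, 1.0 - abar[-1])       # step-by-step vs. closed-form variance
```

Each printed pair should agree up to floating-point rounding, which is what lets `q_sample` jump from $x_{0}$ to any $x_{t}$ in one draw.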

### Reverse process
In the reverse process, we restore an image from Gaussian noise (a latent vector). If $\beta_{t}$ is small enough, the reverse conditional $q(x_{t-1}|x_{t})$ is also approximately Gaussian. It is noteworthy that this reverse conditional becomes tractable once it is also conditioned on $x_{0}$, by applying Bayes' rule.

$$
\begin{aligned}
q(x_{t-1}|x_{t}, x_{0}) 
&= q(x_{t}|x_{t-1}, x_{0}) \frac{q(x_{t-1}|x_{0})}{q(x_{t}|x_{0})} \\
&= N(x_{t}; \sqrt{1-\beta_{t}} x_{t-1}, \beta_{t}\mathbf{I}) \frac{N(x_{t-1}; \sqrt{\bar{\alpha}_{t-1}} x_{0}, (1-\bar{\alpha}_{t-1})\mathbf{I})}{N(x_{t}; \sqrt{\bar{\alpha}_{t}} x_{0}, (1-\bar{\alpha}_{t})\mathbf{I})} \\
&\propto \mathrm{exp} \left \{ -\frac{1}{2} \left [ \frac{(x_{t}-\sqrt{\alpha_{t}}x_{t-1})^{2}}{\beta_{t}} + \frac{(x_{t-1}-\sqrt{\bar{\alpha}_{t-1}}x_{0})^{2}}{1-\bar{\alpha}_{t-1}} - \frac{(x_{t}-\sqrt{\bar{\alpha}_{t}}x_{0})^{2}}{1-\bar{\alpha}_{t}} \right ] \right \} \\
&= \mathrm{exp} \left \{ -\frac{1}{2} \left [ \frac{x_{t}^{2}-2\sqrt{\alpha_{t}}x_{t}x_{t-1}+\alpha_{t} x_{t-1}^{2}}{\beta_{t}} + \frac{x_{t-1}^{2}-2\sqrt{\bar{\alpha}_{t-1}}x_{t-1}x_{0} + \bar{\alpha}_{t-1}x_{0}^{2}}{1-\bar{\alpha}_{t-1}} - \frac{(x_{t}-\sqrt{\bar{\alpha}_{t}}x_{0})^{2}}{1-\bar{\alpha}_{t}} \right ] \right \} \\
&= \mathrm{exp} \left \{ -\frac{1}{2} \left [ \left ( \frac{\alpha_{t}}{\beta_{t}} + \frac{1}{1-\bar{\alpha}_{t-1}} \right ) x_{t-1}^{2} - \left ( \frac{2\sqrt{\alpha_{t}}}{\beta_{t}}x_{t} + \frac{2\sqrt{\bar{\alpha}_{t-1}}}{1-\bar{\alpha}_{t-1}} x_{0} \right ) x_{t-1} + C(x_{t}, x_{0}) \right ] \right \} \\
\therefore q(x_{t-1}|x_{t}, x_{0}) &= N(x_{t-1}; \tilde{\mu}(x_{t}, x_{0}), \tilde{\beta}_{t} \mathbf{I})
\end{aligned}
$$
where $C(x_{t}, x_{0})$ collects the terms that do not depend on $x_{t-1}$, so its details are omitted.
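
The step from the quadratic exponent to a Gaussian in $x_{t-1}$ is completing the square. As a short worked step, written per coordinate and using the shorthand $a$ and $b$ (introduced here only for illustration) for the coefficients of a Gaussian exponent of the form $-\frac{1}{2}(a x_{t-1}^{2} - 2b x_{t-1} + C)$:

$$
\begin{aligned}
a &= \frac{\alpha_{t}}{\beta_{t}} + \frac{1}{1-\bar{\alpha}_{t-1}}, \qquad
b = \frac{\sqrt{\alpha_{t}}}{\beta_{t}}x_{t} + \frac{\sqrt{\bar{\alpha}_{t-1}}}{1-\bar{\alpha}_{t-1}} x_{0} \\
\mathrm{exp} \left \{ -\frac{1}{2} \left ( a x_{t-1}^{2} - 2b x_{t-1} + C \right ) \right \}
&= \mathrm{exp} \left \{ -\frac{a}{2} \left ( x_{t-1} - \frac{b}{a} \right )^{2} \right \} \mathrm{exp} \left \{ \frac{b^{2}}{2a} - \frac{C}{2} \right \} \\
&\propto \mathrm{exp} \left \{ -\frac{(x_{t-1} - b/a)^{2}}{2 \cdot (1/a)} \right \}
\end{aligned}
$$

Matching this against a Gaussian density gives the mean $b/a$ and the variance $1/a$.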

The mean $\tilde{\mu}(x_{t}, x_{0})$ and the variance $\tilde{\beta}_{t}$ can be read off from the coefficients of $x_{t-1}$ and $x_{t-1}^{2}$ in the exponent, as follows.

1. $\bf{mean}$, $\tilde{\mu}(x_{t}, x_{0})$
   $$
   \begin{aligned}
   \tilde{\mu}(x_{t}, x_{0}) &= \left ( \frac{\sqrt{\alpha_{t}}}{\beta_{t}}x_{t} + \frac{\sqrt{\bar{\alpha}_{t-1}}}{1-\bar{\alpha}_{t-1}} x_{0} \right ) / \left ( \frac{\alpha_{t}}{\beta_{t}} + \frac{1}{1-\bar{\alpha}_{t-1}} \right ) \\
   \frac{\alpha_{t}}{\beta_{t}} + \frac{1}{1-\bar{\alpha}_{t-1}} &= \frac{\alpha_{t}(1-\bar{\alpha}_{t-1})+\beta_{t}}{\beta_{t}(1-\bar{\alpha}_{t-1})} \\
   &= \frac{\alpha_{t}-\alpha_{t}\bar{\alpha}_{t-1}+1-\alpha_{t}}{(1-\bar{\alpha}_{t-1})\beta_{t}} \\
   &= \frac{1-\bar{\alpha}_{t}}{(1-\bar{\alpha}_{t-1})\beta_{t}} \\
   \tilde{\mu}(x_{t}, x_{0}) &= \left ( \frac{\sqrt{\alpha_{t}}}{\beta_{t}}x_{t} + \frac{\sqrt{\bar{\alpha}_{t-1}}}{1-\bar{\alpha}_{t-1}} x_{0} \right )\frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_{t}}\beta_{t} \\
   &= \frac{\sqrt{\alpha_{t}}}{\beta_{t}} \frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_{t}}\beta_{t} x_{t} + \frac{\sqrt{\bar{\alpha}_{t-1}}}{1-\bar{\alpha}_{t-1}} \frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_{t}}\beta_{t} x_{0} \\
   &= \frac{\sqrt{\alpha_{t}}(1-\bar{\alpha}_{t-1})}{1-\bar{\alpha}_{t}} x_{t} + \frac{\sqrt{\bar{\alpha}_{t-1}}\beta_{t}}{1-\bar{\alpha}_{t}} x_{0} \\
   \tilde{\mu}(x_{t}, t) &= \frac{\sqrt{\alpha_{t}}(1-\bar{\alpha}_{t-1})}{1-\bar{\alpha}_{t}} x_{t} + \frac{\sqrt{\bar{\alpha}_{t-1}}\beta_{t}}{1-\bar{\alpha}_{t}} \frac{1}{\sqrt{\bar{\alpha}_{t}}} (x_{t}-\sqrt{1-\bar{\alpha}_{t}}\epsilon_{t}) \quad \because x_{t}=\sqrt{\bar{\alpha}_{t}}x_{0} + \sqrt{1-\bar{\alpha}_{t}}\epsilon_{t} \\
   &= \frac{\sqrt{\bar{\alpha}_{t}}\sqrt{\alpha_{t}}(1-\bar{\alpha}_{t-1}) + \sqrt{\bar{\alpha}_{t-1}}\beta_{t}}{\sqrt{\bar{\alpha}_{t}}(1-\bar{\alpha}_{t})} x_{t} - \frac{\sqrt{\bar{\alpha}_{t-1}}\beta_{t} \sqrt{1-\bar{\alpha}_{t}}}{\sqrt{\bar{\alpha}_{t}} (1-\bar{\alpha}_{t})} \epsilon_{t} \\
   &= \frac{\sqrt{\bar{\alpha}_{t-1}}(\alpha_{t}-\bar{\alpha}_{t}+1-\alpha_{t})}{\sqrt{\bar{\alpha}_{t}}(1-\bar{\alpha}_{t})} x_{t} - \frac{1-\alpha_{t}}{\sqrt{\alpha_{t}}\sqrt{1-\bar{\alpha}_{t}}} \epsilon_{t} \\
   &= \frac{\sqrt{\bar{\alpha}_{t-1}}(1-\bar{\alpha}_{t})}{\sqrt{\bar{\alpha}_{t}}(1-\bar{\alpha}_{t})} x_{t} - \frac{1-\alpha_{t}}{\sqrt{\alpha_{t}}\sqrt{1-\bar{\alpha}_{t}}} \epsilon_{t} \\
   &= \frac{1}{\sqrt{\alpha_{t}}} x_{t} - \frac{1-\alpha_{t}}{\sqrt{\alpha_{t}}\sqrt{1-\bar{\alpha}_{t}}} \epsilon_{t} \\
   &= \frac{1}{\sqrt{\alpha_{t}}} \left ( x_{t} - \frac{1-\alpha_{t}}{\sqrt{1-\bar{\alpha}_{t}}} \epsilon_{t} \right ) \\
   \therefore \tilde{\mu}(x_{t}, t) &= \frac{1}{\sqrt{\alpha_{t}}} \left ( x_{t} - \frac{1-\alpha_{t}}{\sqrt{1-\bar{\alpha}_{t}}} \epsilon_{t} \right )
   \end{aligned}
   $$

2. $\bf{variance}$, $\tilde{\beta}_{t}$
   $$
   \begin{aligned}
   \tilde{\beta}_{t} &= 1/ \left ( \frac{\alpha_{t}}{\beta_{t}} + \frac{1}{1-\bar{\alpha}_{t-1}} \right ) \\
   &= 1/ \left ( \frac{\alpha_{t} - \alpha_{t}\bar{\alpha}_{t-1} + \beta_{t}}{\beta_{t}(1-\bar{\alpha}_{t-1})} \right ) \\
   &= 1/ \left ( \frac{1-\beta_{t} - \bar{\alpha}_{t} + \beta_{t}}{\beta_{t}(1-\bar{\alpha}_{t-1})} \right ) \quad (\alpha_{t}=1-\beta_{t}, \alpha_{t}\bar{\alpha}_{t-1}=\bar{\alpha}_{t}) \\
   &= \frac{1 - \bar{\alpha}_{t-1}}{1-\bar{\alpha}_{t}} \beta_{t} \\
   \therefore \tilde{\beta}_{t} &= \frac{1 - \bar{\alpha}_{t-1}}{1-\bar{\alpha}_{t}} \beta_{t}
   \end{aligned}
   $$
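
The two forms of the posterior mean and the simplified variance can be cross-checked numerically. This is a minimal scalar sketch under stated assumptions: the function names and the test values (`alpha_t = 0.98`, `abar_prev = 0.5`, and so on) are arbitrary illustrations, and `eps` stands for the noise $\epsilon_{t}$ that produced $x_{t}$ from $x_{0}$.

```python
import math


def posterior_mean_from_x0(x_t, x0, alpha_t, abar_t, abar_prev):
    """Mean of q(x_{t-1} | x_t, x_0) written in terms of x_t and x_0."""
    beta_t = 1.0 - alpha_t
    coef_xt = math.sqrt(alpha_t) * (1.0 - abar_prev) / (1.0 - abar_t)
    coef_x0 = math.sqrt(abar_prev) * beta_t / (1.0 - abar_t)
    return coef_xt * x_t + coef_x0 * x0


def posterior_mean_from_eps(x_t, eps, alpha_t, abar_t):
    """Same mean, after substituting x_0 by its expression in x_t and eps."""
    return (x_t - (1.0 - alpha_t) / math.sqrt(1.0 - abar_t) * eps) / math.sqrt(alpha_t)


def posterior_variance(alpha_t, abar_t, abar_prev):
    """Simplified variance: (1 - abar_{t-1}) / (1 - abar_t) * beta_t."""
    return (1.0 - abar_prev) / (1.0 - abar_t) * (1.0 - alpha_t)


# Arbitrary illustrative values; abar_t must equal alpha_t * abar_prev.
alpha_t, abar_prev = 0.98, 0.5
abar_t = alpha_t * abar_prev
x0, eps = 0.7, -1.3
x_t = math.sqrt(abar_t) * x0 + math.sqrt(1.0 - abar_t) * eps  # closed-form forward sample

mean_a = posterior_mean_from_x0(x_t, x0, alpha_t, abar_t, abar_prev)
mean_b = posterior_mean_from_eps(x_t, eps, alpha_t, abar_t)
var_simplified = posterior_variance(alpha_t, abar_t, abar_prev)

# Unsimplified variance: reciprocal of the x_{t-1}^2 coefficient in the exponent.
var_direct = 1.0 / (alpha_t / (1.0 - alpha_t) + 1.0 / (1.0 - abar_prev))

print(mean_a, mean_b)
print(var_simplified, var_direct)
```

The two means and the two variances should each agree up to floating-point rounding. The $\epsilon_{t}$ form is the one that matters when a network predicts the noise, since $x_{0}$ is not available at sampling time.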