### Reparameterization trick
Sampling from a probability distribution is a stochastic operation, so gradients cannot be backpropagated through the sampling step to the distribution's parameters. The reparameterization trick is commonly used to make this step trainable.

Let us assume that $\mathbf{z}$ is sampled from a Gaussian distribution with mean $\mathbf{\mu}$ and variance $\mathbf{\sigma}^{2}$, $\mathcal{N}(\mathbf{\mu}, \mathbf{\sigma}^{2} \mathbf{I})$. Then $\mathbf{z}$ can be expressed as follows:

$$
\mathbf{z} = \mathbf{\mu} + \mathbf{\sigma} \odot \boldsymbol{\epsilon}
$$

where $\boldsymbol{\epsilon}$ is an auxiliary random variable sampled from a standard Gaussian distribution, $\mathcal{N}(0,\mathbf{I})$, and $\odot$ denotes element-wise multiplication. With this reparameterization, all randomness is isolated in $\boldsymbol{\epsilon}$, which does not depend on the parameters. For a given $\boldsymbol{\epsilon}$, $\mathbf{z}$ is a deterministic, differentiable function of $\mathbf{\mu}$ and $\mathbf{\sigma}$, so gradients can be backpropagated through $\mathbf{z}$ to both.
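To make the gradient flow concrete, here is a minimal sketch in plain Python (standard library only). It is not taken from any particular framework: the function names `reparameterized_sample` and `pathwise_gradients`, the toy objective $E[\sum_i z_i^2]$, and the parameter values are all assumptions chosen for illustration. Because $\partial z_i / \partial \mu_i = 1$ and $\partial z_i / \partial \sigma_i = \epsilon_i$, the chain rule gives per-sample gradients $2 z_i$ and $2 z_i \epsilon_i$, whose averages should approach the analytic values $2\mu_i$ and $2\sigma_i$.

```python
import random


def reparameterized_sample(mu, sigma, rng):
    """Draw z = mu + sigma * eps element-wise, with eps ~ N(0, I)."""
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    z = [m + s * e for m, s, e in zip(mu, sigma, eps)]
    return z, eps


def pathwise_gradients(mu, sigma, n_samples=20000, seed=0):
    """Monte Carlo estimate of the gradients of E[sum(z**2)].

    Since z = mu + sigma * eps, we have dz/dmu = 1 and dz/dsigma = eps,
    so each sample contributes 2*z to d/dmu and 2*z*eps to d/dsigma.
    """
    rng = random.Random(seed)
    grad_mu = [0.0] * len(mu)
    grad_sigma = [0.0] * len(mu)
    for _ in range(n_samples):
        z, eps = reparameterized_sample(mu, sigma, rng)
        for i in range(len(mu)):
            grad_mu[i] += 2.0 * z[i] / n_samples
            grad_sigma[i] += 2.0 * z[i] * eps[i] / n_samples
    return grad_mu, grad_sigma


# Hypothetical parameter values, chosen only for this illustration.
grad_mu, grad_sigma = pathwise_gradients(mu=[1.0, -0.5], sigma=[0.5, 2.0])
print(grad_mu)     # should be close to 2 * mu    = [2.0, -1.0]
print(grad_sigma)  # should be close to 2 * sigma = [1.0, 4.0]
```

In practice an automatic differentiation library would compute these derivatives; the hand-written chain rule here only shows why the gradient with respect to $\mathbf{\mu}$ and $\mathbf{\sigma}$ is well defined once $\boldsymbol{\epsilon}$ is treated as an input.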

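Treating $\mathbf{\mu}$ and $\mathbf{\sigma}$ as fixed parameters, the mean and variance of the reparameterized $\mathbf{z}$ are equal to $\mathbf{\mu}$ and $\mathbf{\sigma}^{2}$, respectively, so it follows the same distribution as the original sample.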

#### Expectation
$$
\begin{aligned}
E[\mathbf{z}] &= E[\mathbf{\mu} + \mathbf{\sigma} \odot \boldsymbol{\epsilon}] \\
&= E[\mathbf{\mu}] + \mathbf{\sigma} \odot E[\boldsymbol{\epsilon}] \\
&= \mathbf{\mu}
\end{aligned}
$$
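The second line follows from the linearity of expectation, with $\mathbf{\mu}$ and $\mathbf{\sigma}$ treated as constants. The expectation of $\boldsymbol{\epsilon}$ is 0 because it follows a standard Gaussian distribution, whose mean is 0.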

#### Variance
$$
\begin{aligned}
Var[\mathbf{z}] &= Var[\mathbf{\mu} + \mathbf{\sigma} \odot \boldsymbol{\epsilon}] \\
&= Var[\mathbf{\sigma} \odot \boldsymbol{\epsilon}] \\
&= \mathbf{\sigma}^{2} \odot Var[\boldsymbol{\epsilon}] \\
&= \mathbf{\sigma}^{2}
\end{aligned}
$$
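Adding the constant $\mathbf{\mu}$ does not change the variance, and scaling by the constant $\mathbf{\sigma}$ multiplies it element-wise by $\mathbf{\sigma}^{2}$. Each component of $\boldsymbol{\epsilon}$ has variance 1, since it follows a standard Gaussian distribution.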
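The two derivations above can also be checked numerically. The following is a rough sketch for the scalar case, using only Python's standard library; the values of `mu` and `sigma`, the sample count, and the seed are arbitrary assumptions. The sample mean and sample variance of $z = \mu + \sigma \epsilon$ should land near $\mu$ and $\sigma^{2}$, up to Monte Carlo noise.

```python
import random
import statistics

rng = random.Random(0)   # fixed seed so the run is repeatable
mu, sigma = 1.5, 0.7     # hypothetical scalar parameters

# Reparameterized draws: z = mu + sigma * eps, with eps ~ N(0, 1).
samples = [mu + sigma * rng.gauss(0.0, 1.0) for _ in range(50000)]

sample_mean = statistics.fmean(samples)
sample_var = statistics.pvariance(samples)

print(sample_mean)  # should be close to mu = 1.5
print(sample_var)   # should be close to sigma**2 = 0.49
```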