## Variational Autoencoder (VAE)
A tutorial with code for a VAE as described in [Kingma and Welling, 2013](http://arxiv.org/abs/1312.6114). A talk with more details was given at the [DataLab Brown Bag Seminar](https://home.zhaw.ch/~dueo/bbs/files/vae.pdf).

Much of the code was taken from https://jmetzen.github.io/2015-11-27/vae.html. However, I tried to focus more on the mathematical understanding than on the design of the algorithm.

### Some theoretical considerations 

#### Outline
Situation: $x$ is from a high-dimensional space and $z$ is from a low-dimensional (latent) space, from which we would like to reconstruct $p(x)$. 

We consider a parameterized model $p_\theta(x|z)$ (with parameters $\theta$) to construct $x$ for a given value of $z$. We build this model: 

* $p_\theta(x | z)$ with a neural network determining the parameters $\mu, \Sigma$ of a Gaussian (or, as done here, of a Bernoulli density). 
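
As a rough illustration of the Bernoulli variant, the sketch below maps a latent $z$ to per-pixel probabilities with a single linear layer followed by a sigmoid and then evaluates $\log p_\theta(x|z)$ for a binary $x$. The weights `W`, the bias `b`, the toy dimensions, and the function names are hypothetical stand-ins for a real network; this is not the code used later in the tutorial.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def decode(z, W, b):
    """Hypothetical one-layer decoder: returns the Bernoulli mean of each pixel."""
    return [sigmoid(sum(w * zj for w, zj in zip(row, z)) + bi)
            for row, bi in zip(W, b)]

def bernoulli_log_likelihood(x, probs, eps=1e-10):
    """log p(x|z) = sum_k x_k log(p_k) + (1 - x_k) log(1 - p_k)."""
    return sum(xk * math.log(pk + eps) + (1 - xk) * math.log(1 - pk + eps)
               for xk, pk in zip(x, probs))

# Toy setting (assumed): 2 latent dimensions, 3 binary pixels.
W = [[1.0, -1.0], [0.5, 0.5], [-2.0, 0.0]]
b = [0.0, 0.1, -0.1]
z = [0.3, -0.7]
x = [1, 0, 0]

probs = decode(z, W, b)                       # parameters of p(x|z)
log_px_given_z = bernoulli_log_likelihood(x, probs)
```

The small `eps` only guards against `log(0)`; it is a common numerical convenience and not part of the model.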

#### Inverting $p_\theta(x | z)$

The exact inversion is not tractable. We therefore approximate $p(z|x)$ by $q_\phi(z|x)$, again a neural network determining the parameters of a Gaussian:

* $q_\phi(z | x)$ with a neural network + Gaussian 
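
A minimal sketch of such an approximation, assuming a diagonal covariance and a single linear layer per output (both are simplifications chosen for illustration; all names and numbers are hypothetical). The encoder returns a mean and a log-variance for each latent dimension, and a sample $z$ is drawn from the resulting Gaussian.

```python
import math
import random

def encode(x, W_mu, b_mu, W_logvar, b_logvar):
    """Hypothetical one-layer encoder: mean and log-variance per latent dimension."""
    def affine(W, b):
        return [sum(w * xk for w, xk in zip(row, x)) + bi for row, bi in zip(W, b)]
    return affine(W_mu, b_mu), affine(W_logvar, b_logvar)

def sample_z(mu, log_var, rng):
    """Draw z ~ N(mu, diag(exp(log_var)))."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

# Toy setting (assumed): 3 binary pixels, 2 latent dimensions.
x = [1, 0, 0]
W_mu = [[0.5, -0.2, 0.1], [0.0, 0.3, -0.4]]
b_mu = [0.0, 0.1]
W_logvar = [[-0.1, 0.0, 0.2], [0.2, -0.3, 0.0]]
b_logvar = [-1.0, -1.0]

mu, log_var = encode(x, W_mu, b_mu, W_logvar, b_logvar)
z = sample_z(mu, log_var, random.Random(0))   # one draw from q(z|x)
```

Predicting the log-variance instead of the variance keeps the variance positive without any constraint on the network output.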

#### Training

We train the network treating it as an autoencoder. 

#### Lower bound of the Log-Likelihood
The likelihood cannot be determined analytically. Therefore, in a first step we derive a lower (variational) bound $L^{\tt{v}}$ of the log-likelihood for a given image. Technically we assume a discrete latent space; for the continuous case, simply replace the sums by the appropriate integrals over the respective densities. We replace the inaccessible conditional probability $p(z|x)$ with an approximation $q(z|x)$, for which we later use a neural network topped by a Gaussian.

\begin{align}
L & = \log\left(p(x)\right) &\\
  & = \sum_z q(z|x) \; \log\left(p(x)\right) &\text{since } \textstyle\sum_z q(z|x) = 1\\
  & = \sum_z q(z|x) \; \log\left(\frac{p(z,x)}{p(z|x)}\right) &\text{using } p(z,x) = p(z|x)\,p(x)\\
  & = \sum_z q(z|x) \; \log\left(\frac{p(z,x)}{q(z|x)} \frac{q(z|x)}{p(z|x)}\right) &\\
  & = \sum_z q(z|x) \; \log\left(\frac{p(z,x)}{q(z|x)}\right) + \sum_z q(z|x) \; \log\left(\frac{q(z|x)}{p(z|x)}\right) &\\
  & = L^{\tt{v}} + D_{\tt{KL}} \left( q(z|x) || p(z|x) \right) &\\
  & \ge L^{\tt{v}} \\
\end{align}

The KL divergence $D_{\tt{KL}}$ is always non-negative, which gives the inequality in the last step. The smaller it is, the better $q(z|x)$ approximates $p(z|x)$ and the tighter the bound.
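
The decomposition can be checked numerically on a tiny discrete example. The sketch below uses made-up probabilities for a latent variable with three states and one fixed observation $x$; it is only a sanity check of the algebra, with all numbers assumed.

```python
import math

# Toy model (assumed numbers): z takes 3 values, x is one fixed observation.
p_z = [0.5, 0.3, 0.2]              # prior p(z)
p_x_given_z = [0.10, 0.40, 0.70]   # likelihood p(x|z) for the fixed x
q = [0.2, 0.3, 0.5]                # some approximation q(z|x), sums to 1

p_zx = [pz * px for pz, px in zip(p_z, p_x_given_z)]   # joint p(z, x)
p_x = sum(p_zx)                                        # evidence p(x)
p_z_given_x = [j / p_x for j in p_zx]                  # true posterior p(z|x)

log_px = math.log(p_x)
lower_bound = sum(qz * math.log(j / qz) for qz, j in zip(q, p_zx))   # L^v
kl = sum(qz * math.log(qz / pz) for qz, pz in zip(q, p_z_given_x))   # D_KL(q || p(z|x))

print(log_px, lower_bound + kl)   # the two values agree up to rounding
```

Setting `q` equal to `p_z_given_x` drives `kl` to zero, so that `lower_bound` coincides with `log_px`.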


### Rewriting $L^{\tt{v}}$
We split $L^{\tt{v}}$ into two parts.

\begin{align}
L^{\tt{v}} & = \sum_z q(z|x) \; \log\left(\frac{p(z,x)}{q(z|x)}\right)  & \text{with} \;\;p(z,x) = p(x|z) \,p(z)\\
  & =  \sum_z q(z|x) \; \log\left(\frac{p(x|z) p(z)}{q(z|x)}\right)  &\\
  & =  \sum_z q(z|x) \; \log\left(\frac{p(z)}{q(z|x)}\right)  + \sum_z q(z|x) \; \log\left(p(x|z)\right) &\\
  & =  -D_{\tt{KL}} \left( q(z|x) || p(z) \right)  +  \mathbb{E}_{q(z|x)}\left( \log\left(p(x|z)\right)\right) &\text{putting in } x^{(i)} \text{ for } x\\
  & =  -D_{\tt{KL}} \left( q(z|x^{(i)}) || p(z) \right)  +  \mathbb{E}_{q(z|x^{(i)})}\left( \log\left(p(x^{(i)}|z)\right)\right) &\\
\end{align}
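
As a quick numerical check of this split, the following sketch evaluates $L^{\tt{v}}$ once in its original form and once as the sum of the negative KL term and the expected log-likelihood. It uses a toy model with a discrete latent; all probabilities are assumed values chosen for illustration.

```python
import math

# Toy model with a discrete latent (assumed numbers).
p_z = [0.5, 0.3, 0.2]              # prior p(z)
p_x_given_z = [0.10, 0.40, 0.70]   # p(x^(i)|z) for one fixed x^(i)
q = [0.2, 0.3, 0.5]                # q(z|x^(i))

# Direct form: sum_z q(z|x) log( p(z,x) / q(z|x) )
direct = sum(qz * math.log(pz * px / qz)
             for qz, pz, px in zip(q, p_z, p_x_given_z))

# Split form: -D_KL(q || p(z)) + E_q[ log p(x|z) ]
kl_to_prior = sum(qz * math.log(qz / pz) for qz, pz in zip(q, p_z))
expected_log_lik = sum(qz * math.log(px) for qz, px in zip(q, p_x_given_z))
split = -kl_to_prior + expected_log_lik

print(direct, split)   # identical up to rounding
```

Read as a training objective, the first term keeps $q(z|x^{(i)})$ close to the prior $p(z)$, while the second rewards latent values from which $x^{(i)}$ can be reconstructed well.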

We approximate the expectation $\mathbb{E}_{q(z|x^{(i)})}$ by sampling from the distribution $q(z|x^{(i)})$.

#### Sampling 
With $z^{(i,l)}$, $l = 1,2,\ldots,L$, sampled from $q(z|x^{(i)})$, i.e. $z^{(i,l)} \thicksim q(z|x^{(i)})$:
\begin{align}
L^{\tt{v}} & = -D_{\tt{KL}} \left( q(z|x^{(i)}) || p(z) \right)  
+  \mathbb{E}_{q(z|x^{(i)})}\left( \log\left(p(x^{(i)}|z)\right)\right) &\\
L^{\tt{v}} & \approx -D_{\tt{KL}} \left( q(z|x^{(i)}) || p(z) \right)  
+  \frac{1}{L} \sum_{l=1}^L \log\left(p(x^{(i)}|z^{(i,l)})\right) &\\
\end{align}
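
The sampling approximation of the expectation can be illustrated with a small discrete example, where the exact value is available for comparison. The probabilities, the seed, and the number of samples are assumptions made for this sketch.

```python
import math
import random

# Toy model with a discrete latent z in {0, 1, 2} (assumed numbers).
p_x_given_z = [0.10, 0.40, 0.70]   # p(x^(i)|z)
q = [0.2, 0.3, 0.5]                # q(z|x^(i))

# Exact expectation E_q[ log p(x|z) ]
exact = sum(qz * math.log(px) for qz, px in zip(q, p_x_given_z))

# Monte Carlo estimate: draw z^(i,l) ~ q(z|x^(i)) and average log p(x|z^(i,l))
rng = random.Random(0)
L = 10000
samples = rng.choices(range(3), weights=q, k=L)
estimate = sum(math.log(p_x_given_z[z]) for z in samples) / L

print(exact, estimate)   # the estimate approaches the exact value as L grows
```

The estimate is unbiased for any $L$, and its standard error shrinks like $1/\sqrt{L}$.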

#### Calculation of $D_{\tt{KL}} \left( q(z|x^{(i)}) || p(z) \right)$
TODO
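
For one common special case the divergence has a well-known closed form, given in Kingma and Welling (2013). Assume $q(z|x^{(i)})$ is a Gaussian with mean $\mu$ and diagonal covariance $\mathrm{diag}(\sigma^2)$, and assume the prior is a standard normal, $p(z) = N(0, I)$. The prior is not fixed in the text above, so this is an assumption. Then $-D_{\tt{KL}} = \frac{1}{2} \sum_j \left(1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2\right)$. A minimal sketch, with a hypothetical function name:

```python
import math

def kl_diag_gaussian_to_std_normal(mu, log_var):
    """D_KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions.

    Assumes a diagonal Gaussian q and a standard normal prior.
    """
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))

# q equal to the prior: the divergence vanishes.
kl_same = kl_diag_gaussian_to_std_normal([0.0, 0.0], [0.0, 0.0])

# Unit variance, mean shifted by 1 in one dimension: D_KL = 0.5 * mu^2 = 0.5.
kl_shifted = kl_diag_gaussian_to_std_normal([1.0], [0.0])
```

Because this term is available analytically under these assumptions, only the expected log-likelihood needs to be estimated by sampling.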