## 3.1 Variational autoencoder

[Auto-Encoding Variational Bayes (Kingma, Welling, 2013)](https://arxiv.org/abs/1312.6114)

The solution is based on [Oliver Dürr's VAE tutorial](https://github.com/oduerr/dl_tutorial/tree/master/tensorflow/vae).

```
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
```

Import the MNIST dataset in the same way as for the perceptron.

```
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
```

## Encoder

Create a placeholder called `x` for data minibatches of size 64. Then create two fully connected layers in the same way as for the classifier. In the last layer, instead of applying a softmax, map the output of the second hidden layer with linear transformations into the parameters of the Gaussian posterior.

$$h_2(x)=\sigma\circ A_2\circ \sigma \circ A_1\circ x$$

$$\mu=A_{3m}\circ h_2(x)$$

$$\sigma_{pre}=A_{3v}\circ h_2(x)$$

For now, these should be two `n_z`-dimensional vector variables named `z_mean` and `z_sigma_pre`. Try `n_z = 2` for the latent dimension and `n_h = 500` for the hidden layer widths.
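As a shape check for the encoder described above, here is a minimal NumPy sketch of the forward pass. It is not the TensorFlow solution: the random weights stand in for trainable variables, the fake minibatch stands in for MNIST images, and the logistic sigmoid is an assumed choice for the nonlinearity $\sigma$. The helper names `sigmoid` and `affine_params` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

batch_size, n_x, n_h, n_z = 64, 784, 500, 2  # n_x = 28 * 28 MNIST pixels


def sigmoid(a):
    # Elementwise logistic nonlinearity (assumed choice for sigma).
    return 1.0 / (1.0 + np.exp(-a))


def affine_params(n_in, n_out):
    # Small random weights and zero biases; a stand-in for trainable variables.
    return rng.normal(0.0, 0.01, size=(n_in, n_out)), np.zeros(n_out)


W1, b1 = affine_params(n_x, n_h)    # A_1
W2, b2 = affine_params(n_h, n_h)    # A_2
W3m, b3m = affine_params(n_h, n_z)  # A_3m
W3v, b3v = affine_params(n_h, n_z)  # A_3v

x = rng.random((batch_size, n_x))   # fake minibatch in place of MNIST images

h1 = sigmoid(x @ W1 + b1)
h2 = sigmoid(h1 @ W2 + b2)
z_mean = h2 @ W3m + b3m             # linear output, no nonlinearity
z_sigma_pre = h2 @ W3v + b3v        # linear output, no nonlinearity

print(z_mean.shape, z_sigma_pre.shape)  # (64, 2) (64, 2)
```

Both outputs have one row per example in the minibatch and one column per latent dimension, which is what the TensorFlow tensors should look like as well.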

Now define sampling from the variational posterior; we need one $z$ sample for each element of the minibatch. Draw $\epsilon$ with `tf.random_normal` (zero mean, unit variance), then:

$$z=\mu+\sqrt{e^{\sigma_{pre}}}\cdot \epsilon$$
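The formula above implies that $\sigma_{pre}$ acts as the log-variance of the posterior, since $\sqrt{e^{\sigma_{pre}}}=e^{\sigma_{pre}/2}$ is the standard deviation. The following NumPy sketch illustrates the sampling step outside of TensorFlow; the function name `sample_z` and the test values are assumptions made for illustration.

```python
import numpy as np


def sample_z(z_mean, z_sigma_pre, rng):
    # epsilon ~ N(0, I): one independent draw per example and latent coordinate.
    eps = rng.standard_normal(z_mean.shape)
    # sqrt(exp(s)) == exp(s / 2), so z_sigma_pre is treated as the log-variance.
    return z_mean + np.exp(0.5 * z_sigma_pre) * eps


rng = np.random.default_rng(0)
n_samples, n_z = 200000, 2
z_mean = np.full((n_samples, n_z), 1.0)
z_sigma_pre = np.full((n_samples, n_z), np.log(4.0))  # variance 4, std 2

z = sample_z(z_mean, z_sigma_pre, rng)
print(z.shape, z.mean(), z.std())  # mean close to 1, std close to 2
```

Because the randomness enters only through $\epsilon$, the sample is a differentiable function of `z_mean` and `z_sigma_pre`, which is what lets gradients flow back into the encoder.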

## Decoder

The generative model or decoder takes the latent $z$ values as inputs and generates datapoints from them. Define a two-layer fully connected network again to use as a generative model of digits. The last layer should map linearly into a vector of mean pixel values named `x_reconstr_mean`.
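A minimal NumPy sketch of such a decoder follows, assuming it mirrors the encoder: two hidden layers of width `n_h` with a logistic sigmoid, then a linear output layer. The random weights are placeholders for trainable variables, and the helper names `sigmoid`, `affine_params`, and `decode` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, n_z, n_h, n_x = 64, 2, 500, 784


def sigmoid(a):
    # Elementwise logistic nonlinearity (assumed choice).
    return 1.0 / (1.0 + np.exp(-a))


def affine_params(n_in, n_out):
    # Small random weights and zero biases; a stand-in for trainable variables.
    return rng.normal(0.0, 0.01, size=(n_in, n_out)), np.zeros(n_out)


def decode(z, params):
    (W1, b1), (W2, b2), (W3, b3) = params
    h1 = sigmoid(z @ W1 + b1)
    h2 = sigmoid(h1 @ W2 + b2)
    return h2 @ W3 + b3  # linear map into mean pixel values


params = [affine_params(n_z, n_h), affine_params(n_h, n_h), affine_params(n_h, n_x)]

z = rng.standard_normal((batch_size, n_z))  # latent inputs for one minibatch
x_reconstr_mean = decode(z, params)
print(x_reconstr_mean.shape)  # (64, 784)
```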

## Loss function

The ELBO loss consists of a reconstruction term and a regularization term. The first is the expected value of the log pdf of the data conditioned on the inferred $z$ value; the second is the KL divergence between the inferred posterior and the prior. The KL term can be calculated analytically when both the prior and the posterior are Gaussians.
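To make the two terms concrete, here is a NumPy sketch under two assumptions that the text does not fix: the prior is a standard normal $\mathcal{N}(0, I)$, and the likelihood is a Gaussian with unit variance around `x_reconstr_mean`, so that the reconstruction term reduces to a squared error up to a constant. With variance $e^{\sigma_{pre}}$, the analytic KL term is then

$$\mathrm{KL}=\frac{1}{2}\sum_{j=1}^{n_z}\left(e^{\sigma_{pre,j}}+\mu_j^2-1-\sigma_{pre,j}\right)$$

The function names below are hypothetical.

```python
import numpy as np


def kl_to_standard_normal(z_mean, z_sigma_pre):
    # KL( N(mu, diag(exp(s))) || N(0, I) ), summed over latent dimensions.
    return 0.5 * np.sum(np.exp(z_sigma_pre) + z_mean ** 2 - 1.0 - z_sigma_pre, axis=1)


def reconstruction_loss(x, x_reconstr_mean):
    # Negative log-likelihood of a unit-variance Gaussian (assumed likelihood),
    # with the additive constant dropped.
    return 0.5 * np.sum((x - x_reconstr_mean) ** 2, axis=1)


def negative_elbo(x, x_reconstr_mean, z_mean, z_sigma_pre):
    # Per-example loss, averaged over the minibatch.
    per_example = (reconstruction_loss(x, x_reconstr_mean)
                   + kl_to_standard_normal(z_mean, z_sigma_pre))
    return per_example.mean()


# Tiny hand-checkable example: two "images" of four pixels, n_z = 2.
x = np.zeros((2, 4))
x_reconstr_mean = np.ones((2, 4))
z_mean = np.array([[0.0, 0.0], [1.0, 1.0]])
z_sigma_pre = np.zeros((2, 2))

kl = kl_to_standard_normal(z_mean, z_sigma_pre)
cost = negative_elbo(x, x_reconstr_mean, z_mean, z_sigma_pre)
print(kl, cost)
```

The KL term vanishes exactly when the posterior equals the prior (first row) and grows as the posterior mean moves away from zero (second row). If you model pixels as Bernoulli variables instead, the reconstruction term becomes a cross-entropy and the decoder output needs a sigmoid.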

## Training

Training takes a bit longer here, so I've provided a training code skeleton with a progress bar.

```
runs = 10
batch_size = 64  # minibatch size, matching the placeholder `x`
n_minibatches = mnist.train.num_examples // batch_size

sess = tf.InteractiveSession()
init = tf.global_variables_initializer()
sess.run(init)

for epoch in range(runs):
    pbar = tf.contrib.keras.utils.Progbar(n_minibatches)
    for i in range(n_minibatches):

        # YOUR CODE HERE - update and calculate "cost_" on a minibatch

        pbar.add(1, [("cost", cost_)])
```

If you want to leave the training running for longer and save the resulting model for later analysis, create a TensorFlow saver with `saver = tf.train.Saver()` and save the current state with `saver.save(sess, "./model.ckpt")`. You can later restore the saved parameters with `saver.restore(sess, check_point_file)`.

## Load and analyse model

Plot some reconstructions. You can use the following code for a plot grid:
    
```
n_rec = 10
plt.figure(figsize=(n_rec+2,4))
for i in range(n_rec):
    plt.subplot(2, n_rec, i+1)
    
    #YOUR CODE HERE - plot command for samples
    
    plt.subplot(2, n_rec, n_rec+i+1)
    
    #YOUR CODE HERE - plot command for reconstructions
    
plt.tight_layout()
```

Plot the z space embedding of the data.

Visualise the latent space by plotting images conditioned on a grid in $z$.
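One way to organise this is to build a regular grid of $z$ values, decode each grid point, and tile the resulting images into a single canvas. The NumPy sketch below shows only the grid and tiling logic, assuming `n_z = 2`. The `decode` function is a hypothetical stand-in (a random linear map) that you would replace with a call that evaluates your trained decoder, and the grid range of $[-3, 3]$ is an assumption based on the standard normal prior.

```python
import numpy as np

n_grid, side = 15, 28  # 15 x 15 grid of 28 x 28 pixel images

z1 = np.linspace(-3.0, 3.0, n_grid)
z2 = np.linspace(-3.0, 3.0, n_grid)
zz1, zz2 = np.meshgrid(z1, z2)                        # zz1[r, c] = z1[c], zz2[r, c] = z2[r]
z_grid = np.column_stack([zz1.ravel(), zz2.ravel()])  # shape (n_grid**2, 2)

rng = np.random.default_rng(0)
W = rng.normal(size=(2, side * side))


def decode(z):
    # Hypothetical stand-in for the trained decoder: returns one flattened
    # image of mean pixel values per row of z.
    return z @ W


images = decode(z_grid).reshape(n_grid, n_grid, side, side)

# Tile so that canvas[r*side:(r+1)*side, c*side:(c+1)*side] == images[r, c].
canvas = images.transpose(0, 2, 1, 3).reshape(n_grid * side, n_grid * side)
print(canvas.shape)  # (420, 420)
```

The canvas can then be shown with `plt.imshow(canvas, cmap="gray")`; rows correspond to the second latent coordinate and columns to the first.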