# Introduction
The Variational Autoencoder (VAE), introduced in the paper [Auto-Encoding Variational Bayes](https://arxiv.org/abs/1312.6114), extends the autoencoder neural network architecture to generative modeling. VAEs are widely used to generate many kinds of content: text, images, audio, and more. In this notebook, we will learn how a VAE works by building and training one.

## Autoencoder vs Variational Autoencoder
An autoencoder learns to reconstruct the input image by forcing the model to learn its dense and compact representation. This creates a bottleneck in the information flow and forces the model to extract the features that matter most for reconstructing the image. Although this works, it does not give us a continuous latent space. It just clusters the input data points in the latent space.

### Latent space
To generate samples, the latent space needs to be continuous: we should be able to pick any point in the space and decode it into a realistic sample. A continuous space also lets us find different *variations* of a given sample. For example, given a sample face, we could produce a new sample of the same face with additional features, such as a hat or sunglasses. Different directions in the latent space carry different meanings, and we can follow a vector from a particular point to add that meaning to it. Moving through the latent space in this way is commonly referred to as latent space interpolation.
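
To make the idea of moving through a latent space concrete, here is a minimal NumPy sketch. It is an illustration only: the latent size of 8, the random latent vectors, and the helper names `interpolate` and `add_attribute` are assumptions made for this example, and no trained decoder is involved. In a real VAE, each resulting vector would be passed through the decoder to produce an image.

```python
import numpy as np

def interpolate(z_start, z_end, num_steps):
    """Return num_steps points evenly spaced on the line from z_start to z_end."""
    alphas = np.linspace(0.0, 1.0, num_steps).reshape(-1, 1)
    return (1.0 - alphas) * z_start + alphas * z_end

def add_attribute(z, direction, strength=1.0):
    """Shift a latent vector along a (hypothetical) attribute direction."""
    return z + strength * direction

rng = np.random.default_rng(0)
latent_dim = 8  # assumed latent size, for illustration only

# Stand-ins for the latent codes of two encoded samples.
z_a = rng.normal(size=latent_dim)
z_b = rng.normal(size=latent_dim)

# Walk from z_a to z_b in five evenly spaced steps.
path = interpolate(z_a, z_b, num_steps=5)
print(path.shape)  # (5, 8)

# A hypothetical attribute direction, estimated as the difference of two codes.
attribute_direction = z_b - z_a
z_shifted = add_attribute(z_a, attribute_direction, strength=0.5)
```

Moving half-way along the attribute direction lands on the midpoint of the interpolation path, which shows that the two operations are the same arithmetic viewed in two ways.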

#### Making the latent space continuous
To turn the discrete latent space learned by an autoencoder into a continuous one, we add stochasticity to it. Instead of learning fixed points in the space, we learn probability distributions from which we can sample points. Concretely, instead of learning one latent vector per input, we learn two vectors, which represent the mean and the spread (standard deviation, parameterized in practice as the log of the variance) of the distribution over the latent space. This gives us smooth regions around the training data.

We still have two problems to address in this setup, which we will discuss below as we build and train the network.

# Data Preparation
We will use the [Fashion-MNIST](https://arxiv.org/abs/1708.07747) dataset provided by Zalando Research to train our VAE. It is inspired by the MNIST dataset and contains grayscale images of fashion articles. It has the same number of samples, classes, and image size as MNIST, but is more complex. We will import the dataset from the Keras Datasets API, add the channels dimension (1 in our case, since the images are grayscale), and normalize the pixel values. We will ignore the labels, as we do not need them to train our VAE.

In [6]:
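from keras.datasets import fashion_mnist

# Load the images only; the labels are not needed to train the VAE.
(x_train, _), (x_test, _) = fashion_mnist.load_data()
# Add a trailing channel dimension and scale pixel values from [0, 255] to [0, 1].
x_train = x_train.reshape(x_train.shape + (1,)) / 255.
x_test = x_test.reshape(x_test.shape + (1,)) / 255.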
print(x_train.shape)
print(x_test.shape)

(60000, 28, 28, 1)
(10000, 28, 28, 1)


# Building the encoder model
As discussed above, we will force the model to learn a probability distribution over the compact latent space so that the space becomes continuous. We start with the image, extract features from it using convolutions, and learn two vectors that we will use to sample a point from the latent space. We also store the shape of the output of the last convolution layer, which we will need when we reconstruct the image from the latent representation.

In [0]:
from keras import backend as K
from keras import Input
from keras.layers import Conv2D, Flatten, Dense
from keras.models import Model
import numpy as np

K.clear_session()

# Hyperparameters
batch_size = 64
hidden_state_size = 8

x_input = Input(shape=(28, 28, 1), name='input_image')
# Convolutional feature extractor; the strided layer halves the spatial size.
x = Conv2D(32, 3, padding='same', activation='relu')(x_input)
x = Conv2D(64, 3, padding='same', activation='relu', strides=(2, 2))(x)
x = Conv2D(128, 3, padding='same', activation='relu')(x)
x = Conv2D(128, 3, padding='same', activation='relu')(x)
# Remember the feature-map shape (without the batch axis) for the decoder.
shape = x.shape.as_list()[1:]
x = Flatten()(x)
x = Dense(128, activation='relu')(x)
# Two heads: the mean and the log-variance of the latent distribution.
z_mean = Dense(hidden_state_size, name='z_mean')(x)
z_log_variance = Dense(hidden_state_size, name='z_log_variance')(x)
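
As a sanity check on the encoder above, the output shapes and parameter counts of its convolution layers can be worked out by hand. The sketch below uses only the standard library and assumes the usual rules for a `Conv2D` layer with `padding='same'`: each spatial dimension becomes `ceil(input / stride)`, and the layer has `kernel * kernel * in_channels * filters` weights plus one bias per filter. The helper name `conv2d_same` is made up for this example.

```python
import math

def conv2d_same(in_shape, filters, kernel=3, stride=1):
    """Output shape and parameter count of a Conv2D layer with padding='same'."""
    height, width, channels = in_shape
    out_shape = (math.ceil(height / stride), math.ceil(width / stride), filters)
    params = kernel * kernel * channels * filters + filters  # weights + biases
    return out_shape, params

current_shape = (28, 28, 1)  # input image: height, width, channels
layer_specs = [(32, 1), (64, 2), (128, 1), (128, 1)]  # (filters, stride) per layer

results = []
for filters, stride in layer_specs:
    current_shape, params = conv2d_same(current_shape, filters, stride=stride)
    results.append((current_shape, params))
    print(current_shape, params)
# (28, 28, 32) 320
# (14, 14, 64) 18496
# (14, 14, 128) 73856
# (14, 14, 128) 147584

# Number of features after flattening the last feature map.
flattened_size = math.prod(current_shape)
print(flattened_size)  # 25088
```

The final shape, `(14, 14, 128)`, is the value that the `shape` variable holds, and 25088 is the size of the flattened vector fed into the first `Dense` layer.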

## Sampling
By adding stochasticity to the latent space, we force it to become continuous. However, we can only backpropagate errors through deterministic nodes. To solve this, we apply a technique called the "reparameterization trick". Instead of making the latent vector itself a stochastic node, we introduce a separate stochastic node called "epsilon". The epsilon node simply samples values from a standard normal distribution. We multiply epsilon by the standard deviation and add the result to the mean to get our latent representation. The epsilon node is stochastic, but it comes from a fixed, predefined distribution, so it has no parameters to learn. Gradients therefore do not need to flow through it, and the latent vector becomes a deterministic function of the mean, the standard deviation, and epsilon.
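
The trick can be illustrated outside Keras with a small NumPy sketch. This is only a demonstration of the arithmetic, not the notebook's model: the function name `reparameterize`, the chosen mean of 2.0, and the variance of 0.25 are assumptions made for the example. It relies on the fact that when a network outputs the log of the variance, the standard deviation is `exp(0.5 * log_variance)`.

```python
import numpy as np

def reparameterize(z_mean, z_log_variance, rng):
    """Draw z = mean + std * epsilon, with epsilon ~ N(0, 1)."""
    epsilon = rng.standard_normal(size=z_mean.shape)
    std = np.exp(0.5 * z_log_variance)  # log-variance -> standard deviation
    return z_mean + std * epsilon

rng = np.random.default_rng(42)
num_samples, latent_dim = 100_000, 8  # sizes assumed for illustration

# Pretend the encoder predicted mean 2.0 and variance 0.25 (std 0.5) everywhere.
z_mean = np.full((num_samples, latent_dim), 2.0)
z_log_variance = np.full((num_samples, latent_dim), np.log(0.25))

z = reparameterize(z_mean, z_log_variance, rng)
print(z.shape)            # (100000, 8)
print(z.mean(), z.std())  # close to 2.0 and 0.5
```

All the randomness lives in `epsilon`; the mean and log-variance enter only through addition and multiplication, which is what lets gradients reach them during training.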

In [8]:
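from keras.layers import Lambda

def sampling(args):
    """Sample z = mean + std * epsilon (the reparameterization trick)."""
    z_mean, z_log_variance = args
    epsilon = K.random_normal(shape=(K.shape(z_mean)[0], hidden_state_size),
                              mean=0., stddev=1.)
    # The layer outputs log(variance), so std = exp(0.5 * log_variance).
    return z_mean + K.exp(0.5 * z_log_variance) * epsilon

z = Lambda(sampling, name='z')([z_mean, z_log_variance])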
encoder = Model(x_input, z)
encoder.summary()

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_image (InputLayer)        (None, 28, 28, 1)    0                                            
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               (None, 28, 28, 32)   320         input_image[0][0]                
__________________________________________________________________________________________________
conv2d_2 (Conv2D)               (None, 14, 14, 64)   18496       conv2d_1[0][0]                   
__________________________________________________________________________________________________
conv2d_3 (Conv2D)               (None, 14, 14, 128)  73856       conv2d_2[0][0]                   
__________________________________________________________________________________________________
conv2d_4 (