### VAE as Generative Model

In this part we compare Autoencoders (AEs) with Variational Autoencoders (VAEs). We will see that a plain AE is good at learning a single latent representation of each input, but poor at generation when we decode samples drawn from a probability distribution. A VAE, on the other hand, maps input data to a continuous probability distribution, which for VAEs is a normal distribution. Once this mapping is learnt, we can generate new images simply by sampling from the learnt distribution.

### Imports

In [None]:
import os
import random
import numpy as np
from glob import glob
from scipy.stats import norm
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import ImageGrid

import tensorflow as tf
from keras.preprocessing.image import ImageDataGenerator
from keras.layers import Input, Conv2D, Flatten, Dense, Conv2DTranspose, Reshape
from keras.layers import Lambda, Activation, BatchNormalization, LeakyReLU, Dropout
from keras.models import Model
from keras import backend as K
from keras.optimizers import Adam
from keras.callbacks import ModelCheckpoint 
from keras.utils import plot_model

# just a little hack so that tensorflow can accept our custom loss function
from tensorflow.python.framework.ops import disable_eager_execution
disable_eager_execution()

### Loading, Unzipping and Displaying the Dataset

Mounting your google drive.

In [None]:
from google.colab import drive
drive.mount('/drive')

Unzipping the data file to load it locally in the Colab runtime. You can see your unzipped files by clicking the folder icon on the left side of your Colab window.

In [None]:
# replace this with the Google Drive path of the dataset zip file provided with this homework
!unzip -o -q "/drive/MyDrive/CS5317_DeepLearning_SP21/Assign04/ffhq-dataset.zip" -d "/content/data/"

In [None]:
DATA_FOLDER = '/content/data/'

filenames = np.array(glob(os.path.join(DATA_FOLDER, '*/*.png')))
NUM_IMAGES = len(filenames)
print("Total number of images : " + str(NUM_IMAGES))

The dataset is quite large (70000 images), which makes it impractical to load all of it into memory at the same time. We will use Keras' <i>ImageDataGenerator</i> object and call its member function <i>flow_from_directory</i> to stream the data directly from disk rather than loading the entire dataset into memory. You can also apply various transformations (for augmentation) to the images while loading the data (e.g. normalizing, rescaling, rotating).


You can read more about this in official Keras [documentation](https://keras.io/preprocessing/image/#flow_from_directory).

Below we have set up everything you need for this dataset. You will need to pass the <i>data_flow</i> object to the <i>fit</i> function later when training your model.

In [None]:
INPUT_DIM = (128,128,3) # Image dimension
BATCH_SIZE = 512        # batch of images returned by ImageDataGenerator
Z_DIM = None            # Dimension of the latent vector (z) [Specify your latent_vector dimension here]

data_flow = ImageDataGenerator(rescale=1./255).flow_from_directory(DATA_FOLDER, target_size = INPUT_DIM[:2], 
                                                                   batch_size = BATCH_SIZE, shuffle = True, 
                                                                   class_mode = 'input', subset = 'training')

Utility function to display grid of images.

In [None]:
def display_image_grid(images, num_rows, num_cols, title_text):

    fig = plt.figure(figsize=(num_cols*3., num_rows*3.), )
    grid = ImageGrid(fig, 111, nrows_ncols=(num_rows, num_cols), axes_pad=0.15)

    for ax, im in zip(grid, images):
        ax.imshow(im)
        ax.axis("off")
    
    plt.suptitle(title_text, fontsize=20)
    plt.show()

Displaying some sample images from the dataset.

In [None]:
# a batch of 512 images returned by data generator
sample_images = next(data_flow)[0] 

# only taking 10 of those to display
sample_images = sample_images[:10]

# displaying the images
display_image_grid(sample_images, 2, 5, "Some sample Images from ffhq-dataset")

## AUTOENCODER

#### Encoder

Below you will create the model for your encoder, just like the one in the Image Completion task (though it does not have to be identical). The architecture of the Encoder consists of a stack of convolutional layers followed by a dense (fully connected) layer which outputs a vector of size <i>Z_DIM</i>. The whole image of size 128x128x3 is encoded into this latent space vector of size <i>Z_DIM</i>.


NOTE: You can experiment with the number of feature maps, kernel size, strides and number of conv layers.
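
To get a feel for how strided convolutions shrink the image before the dense layer, the spatial sizes can be worked out by hand. The sketch below is plain Python, not Keras. It assumes `padding='same'`, where the output size is `ceil(input / stride)`. The helper `conv_same_output_size`, the four stride-2 layers and the 64 final feature maps are illustrative assumptions, not requirements of the assignment.

```python
import math

def conv_same_output_size(input_size, stride):
    """Spatial output size of a convolution with padding='same'."""
    return math.ceil(input_size / stride)

size = 128                      # height/width of the input image
sizes = [size]
for stride in (2, 2, 2, 2):     # assumed: four conv layers, each with stride 2
    size = conv_same_output_size(size, stride)
    sizes.append(size)

final_filters = 64              # assumed number of feature maps in the last conv layer
flattened_units = size * size * final_filters

print(sizes)            # spatial size after each layer
print(flattened_units)  # length of the flattened vector fed to the dense layer
```

Changing the strides or the number of layers changes the size of the flattened vector, and therefore the number of weights in the dense layer that produces the latent vector.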

In [None]:
ae_encoder = None
ae_encoder_input = None
ae_encoder_output = None

######################## WRITE YOUR CODE BELOW ########################


########################### END OF YOUR CODE ##########################

ae_encoder.summary()

#### Decoder

Just like the encoder you will create the model for the decoder. This model can be the exact mirror of encoder model, but that is not mandatory.

Since the function of the Decoder is to reconstruct the image from the latent vector, it must gradually increase the size of the activations through the network. This can be achieved with the [Conv2DTranspose](https://keras.io/layers/convolutional/#conv2dtranspose) layer. With a stride of 2 and `padding='same'`, this layer produces an output tensor double the size of the input tensor in both height and width.

Again, you can experiment with the number of feature maps, kernel size, strides and number of conv layers.
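
The upsampling arithmetic of a transposed convolution can be checked by hand before building the decoder. The sketch below is plain Python and assumes the layer is used without an explicit `output_padding`. The helper name `conv_transpose_output_size` and the starting size of 8 are illustrative assumptions.

```python
def conv_transpose_output_size(input_size, kernel_size, stride, padding):
    """Spatial output size of a transposed convolution (no explicit output padding)."""
    if padding == "same":
        return input_size * stride
    if padding == "valid":
        return input_size * stride + max(kernel_size - stride, 0)
    raise ValueError("padding must be 'same' or 'valid'")

size = 8                        # assumed spatial size after reshaping the latent vector
sizes = [size]
for _ in range(4):              # assumed: four stride-2 transposed conv layers
    size = conv_transpose_output_size(size, kernel_size=3, stride=2, padding="same")
    sizes.append(size)

print(sizes)  # each stride-2 'same' layer doubles height and width
```

With `padding='valid'` the output is slightly larger than double (for example 17 instead of 16 for a 3x3 kernel), which is why `'same'` is the convenient choice when the decoder has to land exactly on 128x128.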


In [None]:
######################## WRITE YOUR CODE BELOW ########################


########################### END OF YOUR CODE ##########################

ae_decoder.summary()

#### Attaching the Decoder to the Encoder

Finally, here we connect the encoder to the decoder.

In [None]:
autoencoder_model = None

######################## WRITE YOUR CODE BELOW ########################

# The input of the autoencoder will be the same as of encoder

# The output of the autoencoder will be the output of decoder, when passed encoder input


# Input to the combined model will be the input to the encoder.
# Output of the combined model will be the output of the decoder.


########################### END OF YOUR CODE ##########################

autoencoder_model.summary()

### Training the AE

The hyperparameters are the same as given in the Image Completion task.

Also for training, you can use <i>Adam</i> optimizer with the learning rate given below (or you can try out your own).

The number of epochs given is 10, but experiment with that number to find where you achieve the best results.

In [None]:
LEARNING_RATE = 0.0005
N_EPOCHS = 10

######################## WRITE YOUR CODE BELOW ########################

# compile your model here

########################### END OF YOUR CODE ##########################

Now simply call the <i>fit</i> function of the model with the appropriate parameters.

In [None]:
######################## WRITE YOUR CODE BELOW ########################


########################### END OF YOUR CODE ##########################

### Reconstruction

Now we will get a batch of images from the ImageDataGenerator object and try to reconstruct them by passing them through our autoencoder.

The first image grid shows the original images and the second grid shows the reconstructed images after passing it through the AE.

In [None]:
test_batch = next(data_flow)[0]
test_images = test_batch[:10]

reconst_images = autoencoder_model.predict(test_images)

display_image_grid(test_images, 2, 5, "Original Images")
display_image_grid(reconst_images, 2, 5, "Reconstructed Images with Autoencoder")

<i>NOTE:</i> The reconstructed images look blurry because MSE averages out the differences between individual pixel values. GANs (which you will see in the next assignment), on the other hand, produce much sharper results.
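
A toy example of this averaging effect, as a minimal NumPy sketch. It considers a single made-up pixel that is black in half of the plausible reconstructions and white in the other half, and searches for the prediction with the lowest MSE. The numbers are illustrative and not taken from the dataset.

```python
import numpy as np

# A pixel that is black (0.0) in half of the plausible images and white (1.0) in the rest.
targets = np.array([0.0, 1.0])

# Candidate predicted intensities for that pixel.
candidates = np.linspace(0.0, 1.0, 11)

# MSE of each candidate against both plausible targets.
mse = np.array([np.mean((targets - c) ** 2) for c in candidates])

best_prediction = candidates[np.argmin(mse)]
print(best_prediction)  # the MSE-optimal value is the average of the targets
```

The lowest MSE is reached at the mid-grey value, which is neither of the two sharp alternatives. Repeated over all pixels, this is what shows up as blur.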

Adding noise before decoding the images

In [None]:
num_of_images = 10

# encoding our images
encodings = ae_encoder.predict(test_images)

# adding random normal noise to the encoded latent vectors
encodings += np.random.normal(0.0, 1.0, size = (num_of_images, Z_DIM))

# reconstruct from noisy latent vector
reconst_images_noisy = ae_decoder.predict(encodings)

display_image_grid(reconst_images_noisy, 2, 5, "Reconstructed Images from Noisy Latent Vector")

It can be observed that the images start to become distorted once a bit of noise is added to their encodings. One possible reason is that the model never ensured that the space around the encoded values (the latent space) is continuous. We will see later how to overcome this with the help of a Variational Autoencoder.

### Generation 
Generate images from latent vectors sampled from a standard normal distribution

In [None]:
reconst_images = ae_decoder.predict(np.random.normal(0,1,size=(num_of_images, Z_DIM)))

display_image_grid(reconst_images, 2, 5, "Images generated by sampling from normal distribution")

It is evident that the latent vector sampled from a standard normal distribution can not be used to generate new faces. This shows that the latent vectors generated by the model are not centered/symmetrical around the origin. This also strengthens our inference that the latent space is not continuous.
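
One way to check this claim numerically is to look at the per-dimension mean and standard deviation of a batch of encodings and compare them with 0 and 1. The sketch below uses made-up vectors as a stand-in for the output of `ae_encoder.predict(...)`. The helper `latent_stats` and the stand-in data are assumptions for illustration only.

```python
import numpy as np

def latent_stats(encodings):
    """Per-dimension mean and standard deviation of a batch of latent vectors."""
    return encodings.mean(axis=0), encodings.std(axis=0)

rng = np.random.default_rng(0)

# Stand-in for real encodings: 1000 made-up latent vectors of dimension 4,
# deliberately centred away from the origin.
fake_encodings = rng.normal(loc=3.0, scale=0.5, size=(1000, 4))

mean, std = latent_stats(fake_encodings)
print(mean)  # far from 0 means the codes are not centred on the origin
print(std)   # far from 1 means the spread differs from a standard normal
```

If the statistics of the real encodings are far from 0 and 1, vectors drawn from a standard normal land in regions the decoder never saw during training.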

Since we do not have a definite distribution to sample latent vectors from, it is unclear how we can generate new faces. We observed that adding a bit of noise to the latent vector does not produce new faces. We can encode and decode images, but that does not meet our objective of learning the joint distribution of the data.

Building on this thought, wouldn't it be great if we could generate new faces from latent vectors sampled from a standard normal distribution? This is essentially what a Variational Autoencoder does.

## VARIATIONAL AUTOENCODER



Variational Autoencoders tackle most of the problems discussed above. They are trained to generate new faces from latent vectors sampled from a standard normal distribution. While a Simple Autoencoder learns to map each image to a fixed point in the latent space, the Encoder of a Variational Autoencoder (VAE) maps each image to a <i>Z_DIM</i>-dimensional normal distribution, described by a mean vector and a variance vector, which training pushes towards the standard normal distribution.



Here is a high level overview of what a Variational Autoencoder does.

<br>
<img src="https://blog.bayeslabs.co/assets/img/vae.jpg"> 
<br><i>Source : blog.bayeslabs.co/2019/06/04/All-you-need-to-know-about-Vae</i>

#### Encoder

The encoder for the Variational AE is a little trickier than that of the simple autoencoder. Instead of a single latent vector, it outputs two vectors, $\mu$ and $\sigma$, that together describe a normal distribution for each image. It will look something like this:

<center>

![picture](https://drive.google.com/uc?export=view&id=1e3iGHK0s83O-RjpbRkO4OWKU52B5kS2I)

</center>

You will need Keras' functional API to make this type of model as this is not a simple feed forward network. You can read and learn more about the Functional API [here](https://keras.io/guides/functional_api/).

The input to the Decoder, as shown in the image above, is a vector sampled from the normal distribution represented by the outputs of the Encoder, $\mu$ and $\sigma$. This sampling can be done as follows:

<center>
$Z = \mu + \sigma\varepsilon$
</center>

where $\varepsilon$ is sampled from a multivariate standard normal distribution.
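
The sampling step can be sketched in NumPy, outside of Keras. As in the sampling function used in this notebook, the sketch assumes the encoder's second output is the log-variance, so that $\sigma = \exp(\text{log\_var}/2)$. The helper `sample_latent` and the example values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_latent(mu, log_var, rng):
    """Draw z = mu + sigma * epsilon with sigma = exp(log_var / 2)."""
    sigma = np.exp(log_var / 2.0)
    epsilon = rng.standard_normal(size=mu.shape)   # epsilon ~ N(0, I)
    return mu + sigma * epsilon

mu = np.array([1.0, -2.0])
log_var = np.log(np.array([1.0, 4.0]))   # variances 1 and 4, so sigma = [1, 2]

# Many draws from the same (mu, log_var) should recover mu and sigma.
samples = np.stack([sample_latent(mu, log_var, rng) for _ in range(20000)])
print(samples.mean(axis=0))  # close to mu
print(samples.std(axis=0))   # close to sigma
```

Because the randomness lives entirely in $\varepsilon$, the sample is a deterministic function of $\mu$ and $\sigma$, which is what allows gradients to flow back into the encoder.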


In [None]:
mean_mu = None
var = None
vae_encoder_input = None
vae_encoder = None
vae_encoder_output = None

######################## WRITE YOUR CODE BELOW ########################

# define your convolution layers here


# define the dense layers that output the mean (mean_mu) and log-variance (var) vectors

########################### END OF YOUR CODE ##########################

# at this point your model outputs mu and var separately;
# now we take samples from the distribution these parameters describe,
# and these samples will be our latent vector, which
# can be fed to our decoder


# Defining a function for sampling:
# it takes the mean and var vectors (var is treated as the log-variance,
# so exp(var/2) is the standard deviation) and returns mean + std * epsilon
def sample_from_distribution(args):
  mean_mu, var = args
  epsilon = K.random_normal(shape=K.shape(mean_mu), mean=0., stddev=1.) 
  return mean_mu + K.exp(var/2)*epsilon   
  
# Using a Keras Lambda Layer to include the sampling function as a layer in the model
vae_encoder_output = Lambda(sample_from_distribution, name='encoder_output')([mean_mu, var])

vae_encoder = Model(vae_encoder_input, vae_encoder_output)
vae_encoder.summary()
# tf.keras.utils.plot_model(vae_encoder, "vae_encoder.png", show_shapes=True)

#### Decoder
Since the Decoder remains the same, you can reuse the decoder architecture of the Autoencoder.

In [None]:
vae_decoder_input = None
vae_decoder_output = None
vae_decoder = None

######################## WRITE YOUR CODE BELOW ########################


########################### END OF YOUR CODE ##########################


vae_decoder.summary()

#### Attaching the Decoder to the Encoder

Just like in the case of the autoencoder, you will connect your encoder to the decoder to make the final model.


In [None]:
######################## WRITE YOUR CODE BELOW ########################


# The input to the model will be the image fed to the encoder.


# Output will be the output of the decoder. The term - decoder(encoder_output) 
# combines the model by passing the encoder output to the input of the decoder.


# Input to the combined model will be the input to the encoder.
# Output of the combined model will be the output of the decoder.


########################### END OF YOUR CODE ##########################

vae_model.summary()

### Loss Function

The loss function is the sum of an MSE term and a KL divergence term. The MSE term controls the quality of the reconstructed images (as already seen in the simple autoencoder), while the KL divergence term forces the encodings to stay close to a multivariate standard normal distribution. Since a multivariate standard normal distribution has a zero mean, it is centered around the origin. Mapping each image to a distribution close to the standard normal, as opposed to a fixed point, ensures that the latent space is continuous and the latent vectors are centered around the origin. Here is the equation of the KL loss, where <b>$\mu$</b> and <b>$\sigma$</b> are the mean and standard deviation vectors described by the encoder outputs.

<br>
<center>
$D_{KL}[N(\mu,\sigma) \ || \ N(0,1)] = -\frac{1}{2}\sum_{i=1}^{z}(1+\log(\sigma_i^2) - \mu_i^2 - \sigma_i^2)$
</center>
</br>
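
The closed-form KL divergence between a diagonal Gaussian and the standard normal can be checked numerically. This NumPy sketch assumes, as in the sampling function above, that the encoder outputs the log-variance $\log(\sigma_i^2)$. The helper `kl_to_standard_normal` is an illustrative name. Since a KL divergence is never negative, the sum carries a leading factor of $-\frac{1}{2}$.

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """KL[N(mu, sigma^2) || N(0, 1)], summed over the latent dimensions."""
    return -0.5 * np.sum(1.0 + log_var - np.square(mu) - np.exp(log_var), axis=-1)

# Already a standard normal: the divergence is zero.
kl_zero = kl_to_standard_normal(np.zeros(2), np.zeros(2))

# Mean shifted by 1 in one dimension, unit variance: the divergence is 0.5.
kl_shifted = kl_to_standard_normal(np.array([1.0, 0.0]), np.zeros(2))

print(kl_zero, kl_shifted)
```

Inside the Keras loss the same expression would be written with backend operations on the `mean_mu` and `var` tensors rather than with NumPy.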

A weight (the loss factor) is assigned to the MSE loss. This makes the model pay more attention to reconstruction error than to the KL divergence, so that the images produced are of good quality. If we make the loss factor too small, the images will be of poor quality. If we make it too large, the KL term is effectively ignored and our model will behave like a simple AE.

Hence, the loss factor (loss_factor) is also a hyperparameter that you need to take care of.
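
How the loss factor shifts the balance between the two terms can be seen with a few made-up numbers. In this minimal sketch the loss magnitudes are invented for illustration, and `weighted_total_loss` is a hypothetical helper that mirrors the weighted sum described above.

```python
def weighted_total_loss(mse_loss, kl_loss, loss_factor):
    """Weighted sum of the two terms: loss_factor * MSE + KL."""
    return loss_factor * mse_loss + kl_loss

mse_loss, kl_loss = 0.01, 50.0   # made-up magnitudes, for illustration only

kl_shares = {}
for loss_factor in (1, 1000, 100000):
    total = weighted_total_loss(mse_loss, kl_loss, loss_factor)
    kl_shares[loss_factor] = kl_loss / total   # fraction of the loss due to the KL term
    print(loss_factor, round(total, 2), round(kl_shares[loss_factor], 3))
```

With a small loss factor the optimizer mostly minimizes the KL term. With a very large one the KL term barely registers, which is the regime where the model behaves like a plain autoencoder.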

Here is the [link](https://keras.io/api/losses/) to Keras documentation on how to create custom loss functions. 

In [None]:
def total_loss(y_true, y_pred):

    mean_vector = mean_mu   # mean vector output by the encoder
    var_vector = var        # log-variance vector output by the encoder
    mse_loss = 0
    kl_loss = 0 

    ######################## WRITE YOUR CODE BELOW ########################

    # calculate mse loss here

    # calculate kl loss here

    ########################### END OF YOUR CODE ##########################

    return LOSS_FACTOR * mse_loss + kl_loss

As described above, the loss is a weighted sum: the loss factor multiplies the MSE term and the KL divergence is added to it. If the loss factor is too high, the drawbacks of a Simple Autoencoder start to appear. If it is too low, the quality of the reconstructed images will be poor. Hence the loss factor is a hyperparameter that needs to be tuned, together with the learning rate and the number of epochs set below.

In [None]:
LEARNING_RATE = None
N_EPOCHS = None
LOSS_FACTOR = None

### Training the VAE


Compile your model below.

In [None]:
######################## WRITE YOUR CODE BELOW ########################


########################### END OF YOUR CODE ##########################

Now call the <i>fit</i> function on your model with the appropriate parameters.

In [None]:
######################## WRITE YOUR CODE BELOW ########################


########################### END OF YOUR CODE ##########################

### Reconstruction
The reconstruction process is the same as that of the Simple Autoencoder.

In [None]:
test_batch = next(data_flow)[0]
test_images = test_batch[:10]

reconst_images = vae_model.predict(test_images)

display_image_grid(test_images, 2, 5, "Original Images")

display_image_grid(reconst_images, 2, 5, "Reconstructed Images from Variational Autoencoder")

### Generation
Generating new faces from random vectors sampled from a standard normal distribution. 

In [None]:
reconst_images = vae_decoder.predict(np.random.normal(0,1,size=(10, Z_DIM)))

display_image_grid(reconst_images, 2, 5, "Images generated by sampling from normal distribution")

The VAE is evidently capable of producing new faces from vectors sampled from a standard normal distribution. The fact that a neural network can generate new faces from random noise shows how powerful it is at learning extremely complex mappings!

## REPORT

Report your results for different values of <b>Z_DIM</b>, <b>learning rate</b>, <b>optimizers</b>, <b>encoder and decoder model and Loss Factor</b>, and tell us for which configuration you achieved the best results (the best model should be the last one run in this notebook, with its results shown in the cell above).

Answer: