# An Introduction to Variational Autoencoders

Variational autoencoders (VAEs) are one of the most interesting applications of deep learning. Although they see relatively little practical use, they still offer insight into the power of deep generative modeling.  
  
<img src="https://www.jeremyjordan.me/content/images/2018/03/Screen-Shot-2018-03-06-at-3.17.13-PM.png" width=700>  
  
* The basic idea behind an autoencoder is to generate a low-dimensional (latent) representation of a high-dimensional input.
* We achieve this by asking the model to simply recreate the input it is given. However, we impose an *information bottleneck* upon the model, so that it is forced to lose a massive amount of information from the original input in the process.
* The model is therefore encouraged to encode and retain as much useful information as possible as the input passes through the bottleneck. This results in the development of two submodels: an **encoder** and a **decoder**.  
* The **encoder** is the part of the model *before the bottleneck*: devoted to encoding the input into an information-rich low-dimensional form. We call this low-dimensional form a **latent representation**.
* The **decoder** is the part of the model *after the bottleneck*: devoted to recreating the original image from the latent representation.  
  
A *variational* autoencoder works in the same way, but instead of directly learning the latent representation, we learn parameters μ (mean) and σ (standard deviation) for a probability distribution from which we sample the latent representation.  
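
The sampling step described above is commonly written with the reparameterization trick: draw noise ε from a standard normal and compute z = μ + σ·ε. Here is a minimal NumPy sketch of that idea, separate from the Keras model built later in this notebook. The helper name `sample_latent` and the example values are illustrative assumptions, not part of the original code.

```python
import numpy as np

def sample_latent(mu, sigma, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I) (hypothetical helper)."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    # Noise is sampled independently of mu and sigma
    eps = rng.standard_normal(size=mu.shape)
    return mu + sigma * eps

rng = np.random.default_rng(0)
mu = np.array([0.5, -1.0, 2.0])     # illustrative means
sigma = np.array([0.1, 0.2, 0.3])   # illustrative standard deviations
z = sample_latent(mu, sigma, rng)   # one sample of a 3-D latent vector
```

Because the randomness lives entirely in ε, the output is a simple function of μ and σ, which is what lets gradients flow back to the encoder during training.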
  
<img src="https://i.stack.imgur.com/49HNA.png" width=700>  
  
This notebook aims to provide an intuitive understanding of variational autoencoders by demonstrating their capabilities in a visual manner.  
  
#### **NOTE:** *In order to use the widgets in this notebook, you must make a copy and run the code yourself.*  
  
*Images courtesy of www.jeremyjordan.me*

# Imports

In [None]:
# For working with and visualizing the data
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# For training the VAE
import tensorflow as tf

# For creating interactive widgets
import ipywidgets as widgets
from IPython.display import display

In [None]:
# Load the data from a .csv file
pixel_data = pd.read_csv('../input/age-gender-and-ethnicity-face-data-csv/age_gender.csv')['pixels']

In [None]:
# Shuffle the data
pixel_data = pixel_data.sample(frac=1.0, random_state=1)
# Convert the data into a NumPy array
# (np.int was removed in NumPy 1.24, so use the built-in int)
pixel_data = pixel_data.apply(lambda x: np.array(x.split(" "), dtype=int))
pixel_data = np.stack(np.array(pixel_data), axis=0)
# Rescale pixel values to be between 0 and 1
pixel_data = pixel_data * (1./255)

In [None]:
# The data is now a NumPy array of 23705 images (each represented as a 1-D vector of 2304 pixels)
# (2304 is 48^2, so we are working with 48x48x1 images)
pixel_data.shape
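
As a quick check of the 48x48 claim, a flat vector of 2304 values can be reshaped into an image grid. The sketch below uses a synthetic space-separated pixel string rather than the real CSV, and the helper name `pixels_to_image` is hypothetical.

```python
import numpy as np

def pixels_to_image(pixel_string, side=48):
    """Parse a space-separated pixel string into a (side, side) array in [0, 1]."""
    values = np.array(pixel_string.split(" "), dtype=int)
    if values.size != side * side:
        raise ValueError(f"expected {side * side} pixels, got {values.size}")
    # Rescale to [0, 1] and restore the 2-D layout (row-major order)
    return (values / 255.0).reshape(side, side)

# Synthetic row standing in for one entry of the 'pixels' column
fake_row = " ".join(str(i % 256) for i in range(48 * 48))
image = pixels_to_image(fake_row)
```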

# Building the VAE  
  
We need to create a custom Sampling layer to sample the latent variables from a normal distribution whose mean and log-variance are produced by the encoder.

In [None]:
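class Sampling(tf.keras.layers.Layer):
    """Samples z from N(z_mean, exp(z_log_var)) using the reparameterization trick."""
    def call(self, inputs):
        z_mean, z_log_var = inputs
        batch = tf.shape(z_mean)[0]
        dim = tf.shape(z_mean)[1]
        # Standard normal noise: one value per latent variable per example
        epsilon = tf.keras.backend.random_normal(shape=(batch, dim))
        # exp(0.5 * log_var) is the standard deviation
        return epsilon * tf.exp(z_log_var * 0.5) + z_mean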

In [None]:
def build_vae(num_pixels, num_latent_vars=3):
    
    # Encoder
    encoder_inputs = tf.keras.Input(shape=(num_pixels,))
    x = tf.keras.layers.Dense(512, activation='relu')(encoder_inputs)
    x = tf.keras.layers.Dense(128, activation='relu')(x)
    x = tf.keras.layers.Dense(32, activation='relu')(x)
    # Both the mean and the log-variance are computed from the same hidden layer
    z_mean = tf.keras.layers.Dense(num_latent_vars)(x)
    z_log_var = tf.keras.layers.Dense(num_latent_vars)(x)
    z = Sampling()([z_mean, z_log_var])
    
    encoder = tf.keras.Model(inputs=encoder_inputs, outputs=z)
    
    # Decoder
    decoder_inputs = tf.keras.Input(shape=(num_latent_vars,))
    x = tf.keras.layers.Dense(32, activation='relu')(decoder_inputs)
    x = tf.keras.layers.Dense(128, activation='relu')(x)
    x = tf.keras.layers.Dense(512, activation='relu')(x)
    reconstruction = tf.keras.layers.Dense(num_pixels, activation='linear')(x)
    
    decoder = tf.keras.Model(inputs=decoder_inputs, outputs=reconstruction)
    
    # Full model
    model_inputs = encoder.input
    model_outputs = decoder(encoder.output)
    
    model = tf.keras.Model(inputs=model_inputs, outputs=model_outputs)
    
    # Compile model for training
    # NOTE: 'mse' is a reconstruction-only loss; no KL-divergence term is added here
    model.compile(
        optimizer='adam',
        loss='mse'
    )
    
    # Return all three models
    return encoder, decoder, model

In [None]:
face_encoder, face_decoder, face_model = build_vae(num_pixels=2304, num_latent_vars=3)

In [None]:
# summary() prints the table itself and returns None, so no print() is needed
face_encoder.summary()

In [None]:
# summary() prints the table itself and returns None, so no print() is needed
face_decoder.summary()

# Train the VAE  
  
We will use *pixel_data* as both the input to the model and the target to compare the output to.

In [None]:
history = face_model.fit(
    pixel_data,
    pixel_data,
    validation_split=0.2,
    batch_size=32,
    epochs=100,
    callbacks=[
        tf.keras.callbacks.EarlyStopping(
            monitor='val_loss',
            patience=3,
            restore_best_weights=True
        )
    ]
)
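
Note that the model above is compiled with mean squared error only, so training optimizes reconstruction alone. The usual VAE objective also includes a KL-divergence term that pulls each latent distribution toward a standard normal. Below is a NumPy sketch of that term, assuming the encoder's `z_mean` and `z_log_var` values are available as arrays. The function name `kl_divergence` is illustrative, and this sketch is not wired into the training call above.

```python
import numpy as np

def kl_divergence(z_mean, z_log_var):
    """KL(N(z_mean, exp(z_log_var)) || N(0, I)), summed over latent dimensions."""
    z_mean = np.asarray(z_mean, dtype=float)
    z_log_var = np.asarray(z_log_var, dtype=float)
    # Closed form for diagonal Gaussians: one value per example
    return -0.5 * np.sum(1.0 + z_log_var - z_mean**2 - np.exp(z_log_var), axis=-1)

# A standard normal has zero divergence from itself
kl_zero = kl_divergence(np.zeros((2, 3)), np.zeros((2, 3)))

# Shifting every mean by 1 (unit variance) costs 0.5 per latent dimension
kl_shifted = kl_divergence(np.ones((1, 3)), np.zeros((1, 3)))
```

In a full VAE this value would typically be averaged over the batch and added to the reconstruction loss, often with a weighting factor.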

# Image Reconstruction  
  
Let's see how the model does at reconstructing an image that it has already seen.

In [None]:
# Index of the image to reconstruct
i = 6

sample = pixel_data[i].reshape(48, 48, 1)

# Predict on a single-image batch instead of running the whole dataset through the model
reconstruction = face_model.predict(pixel_data[i:i + 1])[0]
reconstruction = reconstruction.reshape(48, 48, 1)

plt.figure(figsize=(10, 5))

plt.subplot(1, 2, 1)
plt.imshow(sample, cmap='gray')
plt.axis('off')
plt.title("Original Image")

plt.subplot(1, 2, 2)
plt.imshow(reconstruction, cmap='gray')
plt.axis('off')
plt.title("Reconstructed Image")

plt.show()
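
To put a number on what the side-by-side plot shows, the per-image mean squared error between originals and reconstructions can be computed directly. This is a small NumPy sketch on toy arrays; the helper name `reconstruction_mse` is hypothetical.

```python
import numpy as np

def reconstruction_mse(originals, reconstructions):
    """Mean squared error per image for arrays of shape (n_images, n_pixels)."""
    originals = np.asarray(originals, dtype=float)
    reconstructions = np.asarray(reconstructions, dtype=float)
    return np.mean((originals - reconstructions) ** 2, axis=-1)

# Toy data: the first "image" has one wrong pixel, the second is reproduced exactly
originals = np.array([[0.0, 0.5, 1.0], [0.2, 0.2, 0.2]])
reconstructions = np.array([[0.0, 0.5, 0.0], [0.2, 0.2, 0.2]])
errors = reconstruction_mse(originals, reconstructions)
```

With the objects in this notebook, something like `reconstruction_mse(pixel_data[:10], face_model.predict(pixel_data[:10]))` should give one error value per face.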

# Specify our own latent variable values  
  
Now let's see how we can use our own values to generate never-before-seen images.

In [None]:
# A function to allow us to specify our own latent variable values and plot the constructed image
def generate_face_image(latent1, latent2, latent3):
    latent_vars = np.array([[latent1, latent2, latent3]])
    reconstruction = np.array(face_decoder(latent_vars))
    reconstruction = reconstruction.reshape(48, 48, 1)
    plt.figure()
    plt.imshow(reconstruction, cmap='gray')
    plt.axis('off')
    plt.show()

In [None]:
# Let's get the min and max for each slider on the interactive widget
# Encode the dataset once and reuse the result (every encoder call draws new random samples)
latents = face_encoder(pixel_data).numpy()

# Convert to plain Python floats so the slider bounds are ordinary numbers
latent1_min, latent2_min, latent3_min = (float(v) for v in latents.min(axis=0))
latent1_max, latent2_max, latent3_max = (float(v) for v in latents.max(axis=0))
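
The same min/max ranges can also drive a non-interactive sweep: hold two latent variables fixed and step the third across its range. The sketch below only builds the latent vectors with NumPy; the helper name `latent_sweep` and the example range are assumptions for illustration.

```python
import numpy as np

def latent_sweep(base, dim, low, high, steps=5):
    """Return `steps` copies of `base` with latent variable `dim` varied from low to high."""
    base = np.asarray(base, dtype=float)
    grid = np.tile(base, (steps, 1))
    # Overwrite one column with evenly spaced values
    grid[:, dim] = np.linspace(low, high, steps)
    return grid

# Vary the second latent variable while the other two stay at zero
sweep = latent_sweep(base=[0.0, 0.0, 0.0], dim=1, low=-2.0, high=2.0, steps=5)
```

Passing an array like `sweep` to `face_decoder` should yield one generated image per row, which makes it easier to see what a single latent variable controls.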

In [None]:
# Using ipywidgets, we can create a cool interactive widget for visualizing the results
# NOTE: You must edit the notebook to be able to use the widget
face_image_generator = widgets.interact(
    generate_face_image,
    latent1=(latent1_min, latent1_max),
    latent2=(latent2_min, latent2_max),
    latent3=(latent3_min, latent3_max),
)

# interact() already renders the widget, so no explicit display() call is needed

# YouTube Tutorial Included!  
  
***
  
This notebook was made to accompany a YouTube video that I made for my channel.  
  
If you want an in-depth explanation of the steps taken, you can check out the video here:  
https://youtu.be/ZfxNcO6BqDo