# VAE (Graded)

Welcome to your Variational Autoencoder (VAE) programming assignment (required)! You will build a **face generation** model using a VAE, trained on the [CelebA](https://www.kaggle.com/datasets/jessicali9530/celeba-dataset) dataset, which contains around 30k images of male and female faces.

Your goal is to build a robust face image generator using a Variational Autoencoder.

**Instructions:**
* Do not modify any of the provided code.
* Only write code where prompted. For example, in some sections you will find markers like the following:
  ```
  # YOUR CODE GOES HERE
  # YOUR CODE STARTS HERE
  # TODO
  ```
Only modify those sections of the code.

**You will learn to:**
* Understand VAEs, including the encoder, decoder, latent space, and reparameterization trick.
* Preprocess images for training:
  * Practice data preprocessing steps like resizing, normalization, and batching.
* Implement the VAE loss function, which consists of a reconstruction loss and a KL divergence loss.
* Train the model and generate images:
  * Train the VAE model on the CelebA dataset.
  * Generate new images by sampling random points from the latent space and using the decoder to create the corresponding images.
* Use convolutional layers:
  * Use convolutional and transposed convolutional layers for effective image processing within the encoder and decoder networks.


  <img src='https://user-images.githubusercontent.com/17472642/69832523-e1720c00-11fc-11ea-96a3-df39b8c73a4c.png' width=500>

# Data Loading and Preprocessing

**Instructions:**
1. Set up the training directory path
2. Load the dataset using `tf.keras.preprocessing.image_dataset_from_directory`
    * Set `image_size` and `batch_size`
3. Create a normalization function to scale pixel values to [0, 1]
4. Apply normalization and set up dataset prefetching
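
As a rough illustration of step 3, the sketch below scales a batch of 8-bit pixel values to [0, 1] using NumPy on synthetic data. The array shape and the name `scale_to_unit_range` are assumptions made for this example; in the notebook, the same arithmetic would be applied to tensors through `Dataset.map`.

```python
import numpy as np

def scale_to_unit_range(images):
    """Scale 8-bit pixel values (0..255) to floats in [0, 1]."""
    return images.astype(np.float32) / 255.0

# Synthetic batch standing in for real images: 4 images of 64x64 RGB.
rng = np.random.default_rng(0)
batch = rng.integers(0, 256, size=(4, 64, 64, 3), dtype=np.uint8)

scaled = scale_to_unit_range(batch)
print(scaled.dtype, scaled.min() >= 0.0, scaled.max() <= 1.0)
```

Dividing by 255 keeps the relative brightness of every pixel and only changes the scale, which is why it is a common first preprocessing step.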


In [None]:
# TODO
import tensorflow as tf
from tests import *
from helpers import * 

validator = VAEValidator()

# (TODO)1. Setup the training directory path
train_dir = 'celeba_hq/train'

# (TODO)2. Load the training dataset
train_dataset = tf.keras.preprocessing.image_dataset_from_directory(
    train_dir,
    labels=None,  # Set labels=None since we only care about the images
    image_size=   # Resize images (For eg: (64, 64))
    batch_size= # Set the batch size
    shuffle=True
)

# (TODO)3. Normalize the images to [0, 1] range
def normalize(image):
    # YOUR CODE GOES HERE
    return image

# (TODO)4. Apply the normalization
train_dataset = 

# 4. Prefetch to optimize dataset loading
train_dataset = train_dataset.prefetch(tf.data.AUTOTUNE)

validator.validate_dataset(train_dataset)

### But why do we prefetch the data?

Prefetching Optimization for Data Loading

* **Overlapping Computation and Data Loading:** Prefetching aims to overlap the preprocessing and model execution of a training step. While your model is busy training on a batch of data, the input pipeline in the background is already preparing the next batch. This helps reduce idle time for the GPU or CPU, leading to faster training.

* **Improved Efficiency:** By keeping a buffer of prefetched data, prefetching ensures that your model doesn't have to wait for data to be loaded from disk or processed before it can start training on the next batch. This can significantly improve the overall training efficiency.

* **`tf.data.AUTOTUNE`:** When you pass `tf.data.AUTOTUNE` as the argument to `prefetch`, TensorFlow tunes the prefetch buffer size dynamically at runtime instead of requiring you to pick a fixed value. This further optimizes the data loading process.

* **In simpler terms:** Imagine a restaurant kitchen. Prefetching is like having the chefs prep ingredients for the next order while the current dishes are still cooking. When the next order comes in, they can grab the prepped ingredients and start cooking without delay. It keeps the kitchen running smoothly and efficiently.

[Learn more](https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/guide/data_performance.ipynb)
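
To make the idea of a prefetch buffer concrete, here is a toy sketch in plain Python. It is not how `tf.data` is implemented; the `prefetch` generator and the `load_batches` loader are hypothetical names used only for illustration. A background thread fills a small bounded queue while the consumer works on the current item.

```python
import queue
import threading

def prefetch(iterable, buffer_size=2):
    """Yield items from `iterable`, preparing up to `buffer_size` items ahead
    in a background thread (a toy stand-in for Dataset.prefetch)."""
    sentinel = object()
    buffer = queue.Queue(maxsize=buffer_size)

    def producer():
        for item in iterable:
            buffer.put(item)      # blocks while the buffer is full
        buffer.put(sentinel)      # signal that the input is exhausted

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buffer.get()       # blocks until the next item is ready
        if item is sentinel:
            return
        yield item

def load_batches(n):
    """Hypothetical loader: each batch is a small list of numbers."""
    for i in range(n):
        yield [i] * 3

consumed = [batch for batch in prefetch(load_batches(5), buffer_size=2)]
print(consumed)
```

The consumer sees the batches in their original order; the only difference is that the next ones are usually already waiting in the buffer when it asks for them.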


# Model Building





## Encoder Network

**Instructions**

1. Define the latent dimension for the VAE
2. Suppose your `image_size` is 64. Create an encoder that:
   - Takes `64x64x3` images as input
   - Uses `Conv2D` layers with increasing filters (32, 64, 128)
   - Outputs `z_mean` and `z_log_var` for the latent space
3. The architecture should have at least three `Conv2D` layers, followed by a `Flatten` layer and a final `Dense` layer:
```
Conv2D -> Conv2D -> Conv2D -> Flatten -> Dense
```
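
When designing the layer stack above, it can help to track the feature-map shapes by hand. The sketch below assumes a stride of 2 and `'same'` padding for every `Conv2D` layer (both are assumptions; the assignment does not fix them), in which case each layer halves the spatial size.

```python
import math

def conv_same_output_size(size, stride):
    """Spatial output size of a convolution with 'same' padding."""
    return math.ceil(size / stride)

size = 64                           # assumed input height/width
filters_per_layer = [32, 64, 128]   # filter counts from the instructions
shapes = []
for filters in filters_per_layer:
    size = conv_same_output_size(size, stride=2)   # assumed stride of 2
    shapes.append((size, size, filters))

# Number of features that Flatten would hand to the Dense layer.
flattened = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
print(shapes)      # [(32, 32, 32), (16, 16, 64), (8, 8, 128)]
print(flattened)   # 8192
```

Under these assumptions the decoder can mirror the encoder by starting from an `8x8x128` tensor and doubling the spatial size three times to get back to `64x64`.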


In [None]:
# TODO

# (TODO)Import necessary libraries
import tensorflow as tf
from tensorflow.keras import layers, models
import tensorflow_datasets as tfds
import numpy as np
import matplotlib.pyplot as plt

# (TODO)Define the encoder
def build_encoder(input_shape, latent_dim):
    # YOUR CODE GOES HERE
    inputs = 

    z_mean = 
    z_log_var = 
    # YOUR CODE ENDS HERE
    
    return models.Model(inputs, [z_mean, z_log_var], name="encoder")


## Sampling Layer

In a VAE, each latent variable $z$ is modeled as a Gaussian distribution with mean $\mu$ and standard deviation $\sigma$. The encoder outputs the log variance $\log(\sigma^2)$ rather than $\sigma$ directly, for numerical stability. The formula for sampling is:

$$
z = \mu + \sigma \odot \epsilon
$$

where:

* $\mu$ and $\log(\sigma^2)$ (log variance) are outputs from the encoder network.
* $\sigma = \exp\left(0.5 \cdot \log(\sigma^2)\right)$ is the standard deviation.
* $\epsilon$ is random noise sampled from a standard normal distribution, $\epsilon \sim \mathcal{N}(0, I)$, where $I$ is the identity matrix.

**Instructions:**
1. Create a custom layer that implements the reparameterization trick
2. The layer should take `z_mean` and `z_log_var` as inputs
3. Return sampled points from the latent space
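
The sampling formula above can be sketched in NumPy as follows. This is an illustration of the arithmetic only, not the Keras layer you are asked to write; the function name `sample_latent`, the batch size, and the latent dimension are assumptions for the example.

```python
import numpy as np

def sample_latent(z_mean, z_log_var, rng):
    """Reparameterization trick: z = mu + sigma * epsilon."""
    epsilon = rng.standard_normal(size=z_mean.shape)   # epsilon ~ N(0, I)
    sigma = np.exp(0.5 * z_log_var)                    # sigma from log variance
    return z_mean + sigma * epsilon

rng = np.random.default_rng(0)
z_mean = np.zeros((1000, 2))                  # assumed batch of 1000, latent_dim of 2
z_log_var = np.full((1000, 2), np.log(4.0))   # variance 4, so sigma = 2

z = sample_latent(z_mean, z_log_var, rng)
print(z.shape, z.std())   # the standard deviation should be close to 2
```

Because the randomness lives entirely in `epsilon`, the output is a deterministic, differentiable function of `z_mean` and `z_log_var`, which is what lets gradients flow back into the encoder.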


In [None]:
# TODO

# (TODO) Complete the following code
class Sampling(layers.Layer):
    def call(self, inputs):
        # YOUR CODE GOES HERE
        z_mean, z_log_var = 
        batch = tf.shape(z_mean)[0]
        dim = tf.shape(z_mean)[1]
        epsilon =  # Sample epsilon
        return  # Reparameterization trick

## Decoder Network

**Instructions:**
1. Create a decoder that:
   - Takes latent vectors as input
   - Uses `Conv2DTranspose` layers to upsample
   - Outputs reconstructed images of size `64x64x3`
2. The architecture should mirror the encoder in reverse




In [None]:
# TODO

def build_decoder():
    # YOUR CODE GOES HERE
    inputs = 
    outputs = 
    return models.Model(inputs, outputs, name="decoder")


# VAE Loss function

The loss function in a Variational Autoencoder (VAE) combines two components:

**Reconstruction Loss:** This measures how well the VAE can reconstruct the input data from the latent space, ensuring that the generated output is similar to the input.

$$
\text{Reconstruction Loss} = \text{MSE}(x, \hat{x})
$$

where:
* $x$ is the input data
* $\hat{x}$ is the reconstructed data

**KL Divergence Loss:** This regularizes the latent space to follow a standard normal distribution, encouraging the model to generate meaningful samples.

$$
\text{KL Divergence Loss} = -\frac{1}{2} \sum_{i=1}^{d} \left( 1 + \log(\sigma_i^2) - \mu_i^2 - \sigma_i^2 \right)
$$

where:

* $\mu_i$ and $\sigma_i^2$ (or equivalently, $\log(\sigma_i^2)$) are the mean and variance of the $i$-th latent dimension for a given input, as produced by the encoder.
* $d$ is the dimensionality of the latent space.

**Instructions:**
1. Implement both **reconstruction loss (MSE) and KL divergence loss**
2. Combine both losses to create the final VAE loss
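
As a sanity check on the two formulas, here is a small NumPy sketch with hand-picked values. The reductions (mean over pixels for MSE, sum over latent dimensions and mean over the batch for KL) are assumptions; other choices are possible and change the relative weight of the two terms.

```python
import numpy as np

def reconstruction_loss(x, x_hat):
    """Mean squared error over all pixels and all images in the batch."""
    return np.mean((x - x_hat) ** 2)

def kl_divergence_loss(z_mean, z_log_var):
    """KL term summed over latent dimensions, then averaged over the batch."""
    per_example = -0.5 * np.sum(
        1.0 + z_log_var - z_mean**2 - np.exp(z_log_var), axis=1
    )
    return np.mean(per_example)

x = np.zeros((2, 4, 4, 3))              # two tiny all-black "images"
x_hat = np.full((2, 4, 4, 3), 0.5)      # reconstructions that are off by 0.5
z_mean = np.array([[0.0, 0.0], [1.0, -1.0]])
z_log_var = np.zeros((2, 2))            # variance 1 in every dimension

print(reconstruction_loss(x, x_hat))            # 0.25
print(kl_divergence_loss(z_mean, z_log_var))    # 0.5
```

The first latent vector already matches a standard normal, so it contributes zero KL; the second has unit variance but a shifted mean, so it contributes 1, giving a batch average of 0.5.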


In [None]:
# TODO

# VAE Loss Function
def vae_loss(inputs, outputs, z_mean, z_log_var):
    reconstruction_loss = 
    
    kl_loss = 
    return reconstruction_loss + kl_loss

## Build VAE model

**Instructions:**
1. Combine the encoder, sampling layer, and decoder
2. Implement the custom training step in the `VAEModel` class. Perform the following steps inside `train_step`:
  - **`tf.GradientTape`:** Record operations for automatic differentiation to calculate gradients.
  - **Forward pass:** Pass the input data through the encoder, sampling layer, and decoder to get the model's output.
  - **Loss calculation:** Compute the VAE loss (reconstruction loss + KL divergence).
  - **Gradient calculation:** Use `tape.gradient` to obtain gradients of the loss with respect to the model's trainable variables.
  - **Optimizer update:** Apply the gradients to update the model's weights using the chosen optimizer.
  - **Return loss:** A dictionary containing the **loss** value for monitoring.
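
The same sequence of steps can be seen on a toy problem that has nothing to do with VAEs. The sketch below fits a single weight with `tf.GradientTape`; the data, the variable `w`, and the learning rate are made up for illustration, and a plain gradient-descent update stands in for the optimizer.

```python
import tensorflow as tf

# Toy problem: fit w so that w * x matches y = 3 * x.
x = tf.constant([[1.0], [2.0], [3.0]])
y = 3.0 * x
w = tf.Variable([[0.0]])
learning_rate = 0.05

def train_step(x, y):
    with tf.GradientTape() as tape:
        prediction = tf.matmul(x, w)                       # forward pass
        loss = tf.reduce_mean(tf.square(y - prediction))   # loss calculation
    gradients = tape.gradient(loss, [w])                   # gradient calculation
    w.assign_sub(learning_rate * gradients[0])             # manual update step
    return {"loss": float(loss)}                           # value for monitoring

losses = [train_step(x, y)["loss"] for _ in range(50)]
print(losses[0], losses[-1])
```

In the assignment, the forward pass and loss are the VAE's own, and `self.optimizer.apply_gradients` replaces the manual `assign_sub` update.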


In [None]:
# TODO

# (TODO)Define the VAE hyperparameters
input_shape = 
latent_dim =  # Start with 50 and increase it gradually

# (TODO)Build Encoder
encoder = 
validator.validate_encoder(encoder, input_shape, latent_dim)

# (TODO)Build Decoder
decoder = 
validator.validate_decoder(decoder, latent_dim, input_shape)

inputs = layers.Input(shape=input_shape)
z_mean, z_log_var = encoder(inputs)
z = Sampling()([z_mean, z_log_var])
outputs = decoder(z)

# (TODO)Custom model training
class VAEModel(tf.keras.Model):
    def train_step(self, data):
        # (TODO) Perform all the operations mentioned above
        with tf.GradientTape() as tape:
            # YOUR CODE GOES HERE
        self.optimizer.apply_gradients(zip(gradients, self.trainable_variables))
        return {"loss": }

# Model Training

**Instructions:**
1. Create the VAE model instance
2. Compile the model
3. Train the model for as many epochs as you wish
4. Monitor the loss during training


In [None]:
# TODO

num_epochs = 

# (TODO)1. Create the VAE model instance


# (TODO)2. Compile the model with appropriate optimizer


# (TODO)Train the VAE model on CelebA
history = 

**Reflection**

Write your observations here

In [None]:
plot_faces(latent_dim, decoder, n=10)

# Improvement Strategies

Here are some strategies you can consider to improve the model.

1. **Enhanced VAE Architecture**
  * **Deeper and Wider Network Layers**:

    * Increase the number of convolutional layers in the encoder and decoder.
    * Use more filters per layer to capture more facial detail.
    * Example: Start with 32 filters in the first layer and double the number of filters with each additional layer (e.g., 32 → 64 → 128 → 256).

  * **Residual Connections:**

    * Use residual blocks (skip connections) in both encoder and decoder to help the model capture finer details.
    * This prevents loss of information as data passes through deeper layers, aiding better reconstruction.

  * **Leverage Upsampling Layers in Decoder:**

    * Instead of `Conv2DTranspose`, use a combination of `UpSampling2D` and `Conv2D` layers. This can often improve the quality of generated images, especially for faces, because it tends to avoid the checkerboard artifacts that transposed convolutions can produce.
2. **Optimize the Latent Space**
  * **Increase Latent Dimension:**
    * Increase the latent dimension size from 50 to a higher value like 100 or 128, giving the model more capacity to encode complex face details.

  * **Beta-VAE:**
    * Scale the KL-divergence term with a coefficient (beta) in the VAE loss function to control the trade-off between reconstruction quality and latent space regularization. A larger beta (e.g., 4 to 10) enforces a more structured latent space but may reduce reconstruction quality. Find a balance through experimentation.
3. **Regularization Techniques**
  * **Dropout:**
  Add dropout layers in both the encoder and decoder to prevent overfitting. Dropout rates between 0.3 and 0.5 can help the model generalize better, especially for large datasets like CelebA.

  * **Batch Normalization:**
  Add batch normalization after each convolutional layer, particularly in the encoder. This can stabilize training and help the model learn faster.

4. **Improving Loss Functions**
  * **Perceptual Loss (Content Loss):**

    * Instead of relying solely on pixel-wise mean squared error for reconstruction, use a perceptual loss. Compute this loss by passing the reconstructed and original images through a pre-trained model (like VGG) and comparing their higher-level feature maps. This helps the model capture high-level features, like facial structures and expressions.

5. **Training Techniques**
  * **Learning Rate Scheduler:**

    * Use a learning rate scheduler that decays the learning rate as training progresses, such as cosine decay or exponential decay.
  * **Data Augmentation:**

    * Add slight data augmentations (like horizontal flipping, random cropping, or brightness adjustments) to introduce diversity and help the model generalize better.
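
Two of the strategies above reduce to short formulas, sketched here in plain Python. The function names and default values are assumptions for illustration, and the cosine schedule shown is one common form (decaying to zero with no warmup), not the only one.

```python
import math

def beta_vae_loss(reconstruction_loss, kl_loss, beta=4.0):
    """Total loss with the KL term scaled by beta (beta = 1 gives the plain VAE loss)."""
    return reconstruction_loss + beta * kl_loss

def cosine_decay(initial_lr, step, decay_steps):
    """Learning rate that follows half a cosine wave from initial_lr down to 0."""
    progress = min(step, decay_steps) / decay_steps
    return initial_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(beta_vae_loss(0.25, 0.5, beta=4.0))   # 2.25
print(cosine_decay(1e-3, 0, 1000))          # 0.001
print(cosine_decay(1e-3, 500, 1000))        # about 0.0005
print(cosine_decay(1e-3, 1000, 1000))       # 0.0
```

With `beta` above 1, the same KL value costs more, which pushes the encoder toward the standard normal prior at the expense of reconstruction accuracy.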
