# Autoencoders and Variational Autoencoders (VAEs)

## Introduction

Autoencoders are a class of unsupervised learning models used for representation learning, dimensionality reduction, and feature extraction. They work by compressing the input data into a lower-dimensional latent space and then reconstructing the original data from this representation. Variational Autoencoders (VAEs) [[1]](#ref1) extend this concept with a probabilistic latent-variable model: the encoder outputs a distribution over latent codes rather than a single point, which makes it possible to sample new data and to interpolate between examples in the latent space.

In this tutorial, we'll explore the architecture of autoencoders and VAEs, delve into their mathematical foundations, and implement them using TensorFlow and Keras. We'll also discuss some of the latest developments in this field.

## Table of Contents

1. [Understanding Autoencoders](#1)
   - [Architecture](#1.1)
   - [Mathematical Foundations](#1.2)
2. [Implementing an Autoencoder](#2)
   - [Dataset Preparation](#2.1)
   - [Building the Autoencoder](#2.2)
   - [Training the Autoencoder](#2.3)
   - [Visualizing the Results](#2.4)
3. [Variational Autoencoders (VAEs)](#3)
   - [VAE Architecture](#3.1)
   - [Mathematical Foundations](#3.2)
4. [Implementing a Variational Autoencoder](#4)
   - [Building the VAE](#4.1)
   - [Training the VAE](#4.2)
   - [Generating New Data](#4.3)
5. [Latest Developments](#5)
   - [Beta-VAE](#5.1)
   - [Conditional VAEs](#5.2)
   - [VQ-VAE (Vector Quantized VAE)](#5.3)
6. [Conclusion](#6)
7. [References](#7)


<a id="1"></a>
## 1. Understanding Autoencoders

Autoencoders consist of two main components:

- **Encoder**: Compresses the input data into a latent-space representation.
- **Decoder**: Reconstructs the input data from the latent representation.

<a id="1.1"></a>
### Architecture

![Autoencoder Architecture](https://miro.medium.com/max/1400/1*BRBEthVexgTzI6OdSuR3VQ.png)

*Image Source: [Medium](https://medium.com/)*

- **Input Layer**: Receives the original data.
- **Hidden Layers**: Map the input to a latent representation.
- **Latent Space**: The compressed representation of the input.
- **Output Layer**: Attempts to reconstruct the input data.

<a id="1.2"></a>
### Mathematical Foundations

Given input data $\mathbf{x}$, the autoencoder aims to learn functions $f$ and $g$ such that:

$$
\mathbf{z} = f(\mathbf{x}) \quad \text{(Encoding)}
$$

$$
\hat{\mathbf{x}} = g(\mathbf{z}) \quad \text{(Decoding)}
$$

The objective is to minimize the reconstruction loss:

$$
\mathcal{L}(\mathbf{x}, \hat{\mathbf{x}}) = \| \mathbf{x} - \hat{\mathbf{x}} \|^2
$$
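As a quick numeric illustration of the reconstruction loss, here is a minimal NumPy sketch. The function name `reconstruction_loss` and the toy vectors are assumptions made only for this example.

```python
import numpy as np

def reconstruction_loss(x, x_hat):
    """Squared L2 distance between an input and its reconstruction."""
    x = np.asarray(x, dtype=np.float64)
    x_hat = np.asarray(x_hat, dtype=np.float64)
    return float(np.sum((x - x_hat) ** 2))

# Toy example: a 4-dimensional input and an imperfect reconstruction
x = np.array([0.0, 0.5, 1.0, 0.25])
x_hat = np.array([0.1, 0.5, 0.8, 0.25])
loss = reconstruction_loss(x, x_hat)
print(loss)
```

A perfect reconstruction gives a loss of zero, and the loss grows with the squared size of each per-element error.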

<a id="2"></a>
## 2. Implementing an Autoencoder

We'll implement a simple autoencoder using the MNIST dataset.

<a id="2.1"></a>
### Dataset Preparation

In [None]:
# Import necessary libraries
import tensorflow as tf
from tensorflow.keras import layers, models
import numpy as np
import matplotlib.pyplot as plt

# Load the MNIST dataset
(x_train, _), (x_test, _) = tf.keras.datasets.mnist.load_data()

# Normalize and reshape the data
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.

x_train = np.reshape(x_train, (len(x_train), 28, 28, 1))
x_test = np.reshape(x_test, (len(x_test), 28, 28, 1))

print('Training data shape:', x_train.shape)
print('Test data shape:', x_test.shape)

<a id="2.2"></a>
### Building the Autoencoder

We'll use a convolutional autoencoder for better performance on image data.

In [None]:
# Encoder
def build_encoder():
    input_img = layers.Input(shape=(28, 28, 1))
    x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
    x = layers.MaxPooling2D((2, 2), padding='same')(x)
    x = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(x)
    x = layers.MaxPooling2D((2, 2), padding='same')(x)
    return models.Model(input_img, x)

# Decoder
def build_decoder():
    encoded_input = layers.Input(shape=(7, 7, 16))
    x = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(encoded_input)
    x = layers.UpSampling2D((2, 2))(x)
    x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(x)
    x = layers.UpSampling2D((2, 2))(x)
    x = layers.Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
    return models.Model(encoded_input, x)

# Build the autoencoder
encoder = build_encoder()
decoder = build_decoder()
autoencoder_input = encoder.input
autoencoder_output = decoder(encoder.output)
autoencoder = models.Model(autoencoder_input, autoencoder_output)

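# Compile the autoencoder
# Binary cross-entropy is a common choice for pixel values in [0, 1];
# loss='mse' would correspond to the squared-error loss from the math section.
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')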

# Model summary
autoencoder.summary()

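The encoder maps each 28 × 28 × 1 image to a feature map of shape (7, 7, 16), and the decoder reconstructs the image from it. Note that 7 × 7 × 16 = 784 values, the same count as the 28 × 28 = 784 input pixels, so this bottleneck reduces spatial resolution rather than the total number of values. Using fewer filters in the last encoder layer (for example 8), and adjusting the decoder input shape accordingly, would give a representation that is actually smaller than the input.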
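The amount of compression can be checked by counting values on each side of the bottleneck. The sketch below does this for the shapes used here and for a narrower variant; the helper name `compression_ratio` and the 8-filter variant are illustrative assumptions, not part of the model above.

```python
import numpy as np

def compression_ratio(input_shape, latent_shape):
    """Number of input values divided by number of latent values."""
    return int(np.prod(input_shape)) / int(np.prod(latent_shape))

# Shapes taken from the encoder above
ratio_current = compression_ratio((28, 28, 1), (7, 7, 16))
# Hypothetical variant with 8 filters in the last encoder layer
ratio_narrow = compression_ratio((28, 28, 1), (7, 7, 8))
print(ratio_current, ratio_narrow)
```

A ratio above 1 means the latent representation holds fewer values than the input.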

<a id="2.3"></a>
### Training the Autoencoder

In [None]:
# Train the autoencoder
autoencoder.fit(x_train, x_train,
                epochs=10,
                batch_size=256,
                shuffle=True,
                validation_data=(x_test, x_test))

We train the autoencoder to minimize the reconstruction loss between the input and the output.

<a id="2.4"></a>
### Visualizing the Results

In [None]:
# Encode and decode some images
encoded_imgs = encoder.predict(x_test)
decoded_imgs = decoder.predict(encoded_imgs)

# Display original and reconstructed images
n = 10  # Number of images to display
plt.figure(figsize=(20, 4))
for i in range(n):
    # Original
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(x_test[i].reshape(28, 28), cmap='gray')
    plt.title('Original')
    plt.axis('off')

    # Reconstructed
    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(decoded_imgs[i].reshape(28, 28), cmap='gray')
    plt.title('Reconstructed')
    plt.axis('off')
plt.show()

The reconstructed images should resemble the original inputs, demonstrating that the latent representation retains the information needed to rebuild the digits.
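Visual inspection can be complemented by a number. The following sketch computes a mean squared error per image; it assumes arrays shaped like `x_test` and `decoded_imgs`, and uses random stand-in arrays (`fake_originals`, `fake_recons`) so that it runs on its own. The function name `per_image_mse` is hypothetical.

```python
import numpy as np

def per_image_mse(originals, reconstructions):
    """Mean squared error per image for arrays of shape (batch, H, W, C)."""
    originals = np.asarray(originals, dtype=np.float64)
    reconstructions = np.asarray(reconstructions, dtype=np.float64)
    # Average over height, width and channel; keep the batch axis
    return np.mean((originals - reconstructions) ** 2, axis=(1, 2, 3))

# Stand-in data: random "images" and slightly noisy "reconstructions"
rng = np.random.default_rng(0)
fake_originals = rng.random((5, 28, 28, 1))
noise = 0.05 * rng.standard_normal(fake_originals.shape)
fake_recons = np.clip(fake_originals + noise, 0.0, 1.0)

errors = per_image_mse(fake_originals, fake_recons)
print(errors.shape)
```

With the trained model, one would call `per_image_mse(x_test, decoded_imgs)` instead of using the stand-in arrays.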

<a id="3"></a>
## 3. Variational Autoencoders (VAEs)

VAEs are generative models that learn a probabilistic mapping from a latent space to the data space. They are capable of generating new data samples.

<a id="3.1"></a>
### VAE Architecture

![VAE Architecture](https://miro.medium.com/max/1400/1*VZEHKY1RyqdW5bkj9IyHng.png)

*Image Source: [Medium](https://medium.com/)*

- **Encoder**: Maps input to a distribution over the latent space.
- **Sampling Layer**: Samples a point from the latent distribution.
- **Decoder**: Generates data from the sampled latent point.

<a id="3.2"></a>
### Mathematical Foundations

VAEs aim to maximize the evidence lower bound (ELBO) on the data log-likelihood:

$$
\mathcal{L}(\theta, \phi; \mathbf{x}) = -\text{KL}(q_{\phi}(\mathbf{z}|\mathbf{x}) \| p_{\theta}(\mathbf{z})) + \mathbb{E}_{q_{\phi}(\mathbf{z}|\mathbf{x})}[\log p_{\theta}(\mathbf{x}|\mathbf{z})]
$$

- **Encoder Distribution**: $q_{\phi}(\mathbf{z}|\mathbf{x})$
- **Prior Distribution**: $p_{\theta}(\mathbf{z})$
- **Decoder Distribution**: $p_{\theta}(\mathbf{x}|\mathbf{z})$
- **KL Divergence**: Measures how far the encoder's distribution is from the prior.
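The KL term can be computed analytically in a common special case. Assuming a diagonal Gaussian encoder $q_{\phi}(\mathbf{z}|\mathbf{x}) = \mathcal{N}(\mu, \text{diag}(\sigma^2))$ and a standard normal prior $p_{\theta}(\mathbf{z}) = \mathcal{N}(0, I)$ over a $d$-dimensional latent space (both are assumptions here; the definitions above do not fix the form of either distribution), it reduces to:

$$
\text{KL}(q_{\phi}(\mathbf{z}|\mathbf{x}) \| p_{\theta}(\mathbf{z})) = -\frac{1}{2} \sum_{j=1}^{d} \left( 1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2 \right)
$$

The expression is zero exactly when $\mu_j = 0$ and $\sigma_j = 1$ for every $j$, that is, when the encoder's distribution matches the prior.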

**Reparameterization Trick**: Allows backpropagation through stochastic nodes by expressing $\mathbf{z}$ as:

$$
\mathbf{z} = \mu + \sigma \odot \epsilon, \quad \epsilon \sim \mathcal{N}(0, I)
$$
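The trick can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the encoder outputs a mean and a log-variance per latent dimension; the function name `reparameterize` and the toy values are hypothetical.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I)."""
    mu = np.asarray(mu, dtype=np.float64)
    log_var = np.asarray(log_var, dtype=np.float64)
    sigma = np.exp(0.5 * log_var)          # log-variance -> standard deviation
    eps = rng.standard_normal(mu.shape)    # the only source of randomness
    return mu + sigma * eps

rng = np.random.default_rng(42)
mu = np.full((100_000, 2), 3.0)
log_var = np.full((100_000, 2), np.log(0.25))  # corresponds to sigma = 0.5
z = reparameterize(mu, log_var, rng)
print(z.mean(axis=0), z.std(axis=0))
```

Because the randomness is isolated in `eps`, the output is a deterministic, differentiable function of `mu` and `log_var`, which is what lets gradients flow back to the encoder.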

<a id="4"></a>
## 4. Implementing a Variational Autoencoder

We'll implement a VAE using the MNIST dataset.

<a id="4.1"></a>
### Building the VAE

In [None]:
# VAE Parameters
latent_dim = 2  # Dimensionality of the latent space

# Encoder
def build_vae_encoder():
    encoder_inputs = layers.Input(shape=(28, 28, 1))
    x = layers.Conv2D(32, 3, activation='relu', strides=2, padding='same')(encoder_inputs)
    x = layers.Conv2D(64, 3, activation='relu', strides=2, padding='same')(x)
    x = layers.Flatten()(x)
    x = layers.Dense(16, activation='relu')(x)
    z_mean = layers.Dense(latent_dim, name='z_mean')(x)
    z_log_var = layers.Dense(latent_dim, name='z_log_var')(x)
    return models.Model(encoder_inputs, [z_mean, z_log_var], name='encoder')

# Sampling Layer
class Sampling(layers.Layer):
    def call(self, inputs):
        z_mean, z_log_var = inputs
        # Draw epsilon ~ N(0, I) with the same shape as z_mean
        epsilon = tf.random.normal(shape=tf.shape(z_mean))
        return z_mean + tf.exp(0.5 * z_log_var) * epsilon

# Decoder
def build_vae_decoder():
    latent_inputs = layers.Input(shape=(latent_dim,))
    x = layers.Dense(7 * 7 * 64, activation='relu')(latent_inputs)
    x = layers.Reshape((7, 7, 64))(x)
    x = layers.Conv2DTranspose(64, 3, strides=2, padding='same', activation='relu')(x)
    x = layers.Conv2DTranspose(32, 3, strides=2, padding='same', activation='relu')(x)
    decoder_outputs = layers.Conv2DTranspose(1, 3, padding='same', activation='sigmoid')(x)
    return models.Model(latent_inputs, decoder_outputs, name='decoder')

# Build the VAE
encoder = build_vae_encoder()
z_mean, z_log_var = encoder.output
z = Sampling()([z_mean, z_log_var])
decoder = build_vae_decoder()
vae_outputs = decoder(z)
vae = models.Model(encoder.input, vae_outputs, name='vae')

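# Define the VAE loss
# binary_crossentropy averages over the last (channel) axis, giving shape (batch, 28, 28);
# summing over the two spatial axes yields one reconstruction term per image.
reconstruction_loss = tf.keras.losses.binary_crossentropy(encoder.input, vae_outputs)
reconstruction_loss = tf.reduce_sum(reconstruction_loss, axis=(1, 2))
# Closed-form KL divergence between N(z_mean, exp(z_log_var)) and N(0, I), one value per image
kl_loss = 1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var)
kl_loss = -0.5 * tf.reduce_sum(kl_loss, axis=-1)
vae_loss = tf.reduce_mean(reconstruction_loss + kl_loss)
# Note: add_loss on symbolic tensors assumes the Keras 2 functional API;
# it may need adapting under Keras 3.
vae.add_loss(vae_loss)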

# Compile the VAE
vae.compile(optimizer='adam')

# Model summary
vae.summary()

The encoder maps the input to the parameters of a Gaussian distribution in the latent space. The decoder reconstructs the data from a sample drawn from this distribution.

<a id="4.2"></a>
### Training the VAE

In [None]:
# Train the VAE
vae.fit(x_train, epochs=10, batch_size=128, validation_data=(x_test, None))

The VAE is trained to minimize the reconstruction loss and the KL divergence between the learned latent distribution and the prior.
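The two terms of the training objective can be written out explicitly. The NumPy sketch below computes a binary cross-entropy reconstruction term and the Gaussian KL term per image, then averages their sum over the batch. The function name `vae_loss`, the clipping constant `eps`, and the toy inputs are assumptions for illustration.

```python
import numpy as np

def vae_loss(x, x_hat, z_mean, z_log_var, eps=1e-7):
    """Return per-image reconstruction and KL terms plus their batch mean."""
    x = np.asarray(x, dtype=np.float64)
    # Clip predictions away from 0 and 1 so the logarithms stay finite
    x_hat = np.clip(np.asarray(x_hat, dtype=np.float64), eps, 1.0 - eps)
    z_mean = np.asarray(z_mean, dtype=np.float64)
    z_log_var = np.asarray(z_log_var, dtype=np.float64)

    # Binary cross-entropy per pixel, summed over all non-batch axes
    bce = -(x * np.log(x_hat) + (1.0 - x) * np.log(1.0 - x_hat))
    recon = bce.reshape(len(x), -1).sum(axis=1)

    # KL(N(z_mean, exp(z_log_var)) || N(0, I)), summed over latent dimensions
    kl = -0.5 * np.sum(1.0 + z_log_var - z_mean ** 2 - np.exp(z_log_var), axis=1)
    return recon, kl, float(np.mean(recon + kl))

# Toy batch: two uniform gray "images" and a posterior equal to the prior
x = np.full((2, 28, 28, 1), 0.5)
x_hat = np.full((2, 28, 28, 1), 0.5)
z_mean = np.zeros((2, 2))
z_log_var = np.zeros((2, 2))
recon, kl, total = vae_loss(x, x_hat, z_mean, z_log_var)
print(recon, kl, total)
```

Both terms have shape `(batch,)` before they are added, which avoids accidental broadcasting between per-pixel and per-image quantities.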

<a id="4.3"></a>
### Generating New Data

In [None]:
# Display a 2D manifold of the digits
import numpy as np

n = 15  # Figure with 15x15 digits
digit_size = 28
figure = np.zeros((digit_size * n, digit_size * n))

# Linearly spaced coordinates corresponding to the 2D latent space
grid_x = np.linspace(-4, 4, n)
grid_y = np.linspace(-4, 4, n)[::-1]

for i, yi in enumerate(grid_y):
    for j, xi in enumerate(grid_x):
        z_sample = np.array([[xi, yi]])
        x_decoded = decoder.predict(z_sample, verbose=0)  # silence the per-call progress bar
        digit = x_decoded[0].reshape(digit_size, digit_size)
        figure[i * digit_size: (i + 1) * digit_size,
               j * digit_size: (j + 1) * digit_size] = digit

plt.figure(figsize=(10, 10))
plt.imshow(figure, cmap='Greys_r')
plt.axis('off')
plt.show()

The resulting image shows a manifold of digits generated from the latent space, demonstrating the generative capabilities of the VAE.
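Besides scanning a grid, one can walk along a straight line between two latent points and decode each step. The sketch below only builds the latent path; the function name `interpolate_latents` and the endpoint values are hypothetical, and decoding the path with the trained `decoder` is left as a comment.

```python
import numpy as np

def interpolate_latents(z_start, z_end, steps):
    """Linearly interpolate between two latent vectors, endpoints included."""
    z_start = np.asarray(z_start, dtype=np.float64)
    z_end = np.asarray(z_end, dtype=np.float64)
    t = np.linspace(0.0, 1.0, steps)[:, None]   # shape (steps, 1)
    return (1.0 - t) * z_start + t * z_end      # shape (steps, latent_dim)

path = interpolate_latents([-2.0, 0.0], [2.0, 1.0], steps=5)
print(path)
# Each row is a latent point; with the trained model one could call
# decoder.predict(path) to obtain the corresponding images.
```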

<a id="5"></a>
## 5. Latest Developments

Several extensions and variants of autoencoders and VAEs have been proposed to improve their performance and applicability.

<a id="5.1"></a>
### Beta-VAE

**Beta-VAE** [[2]](#ref2) introduces a weighting factor $\beta$ on the KL divergence term in the VAE objective:

$$
\mathcal{L}_{\beta\text{-VAE}} = \mathbb{E}_{q_{\phi}(\mathbf{z}|\mathbf{x})}[\log p_{\theta}(\mathbf{x}|\mathbf{z})] - \beta \, \text{KL}(q_{\phi}(\mathbf{z}|\mathbf{x}) \| p_{\theta}(\mathbf{z}))
$$

- Encourages disentangled latent representations.
- Higher $\beta$ values place more emphasis on the KL divergence.
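In code, the change from a standard VAE is a single multiplication. The sketch below expresses the objective as a loss to minimize (the negative of the expression above); the function name `beta_vae_loss`, the default `beta=4.0`, and the toy numbers are assumptions chosen for illustration.

```python
import numpy as np

def beta_vae_loss(recon_nll, kl, beta=4.0):
    """Batch mean of reconstruction loss plus beta-weighted KL divergence."""
    recon_nll = np.asarray(recon_nll, dtype=np.float64)  # per-image -log p(x|z)
    kl = np.asarray(kl, dtype=np.float64)                # per-image KL term
    return float(np.mean(recon_nll + beta * kl))

# Toy per-image values for a batch of two
recon_nll = np.array([100.0, 120.0])
kl = np.array([5.0, 3.0])
loss_vae = beta_vae_loss(recon_nll, kl, beta=1.0)   # beta = 1 recovers the standard VAE
loss_beta = beta_vae_loss(recon_nll, kl, beta=4.0)
print(loss_vae, loss_beta)
```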

<a id="5.2"></a>
### Conditional VAEs

**Conditional VAEs (CVAEs)** [[3]](#ref3) incorporate conditional information (e.g., class labels) into the VAE framework.

- The encoder and decoder are conditioned on additional variables $\mathbf{y}$.
- Useful for tasks like controlled data generation and semi-supervised learning.
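One common way to condition the decoder is to concatenate a one-hot label to the latent vector. This is an assumption for illustration, since the description above does not prescribe a mechanism; the helper names `one_hot` and `condition_latents` are hypothetical.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Convert integer class labels to one-hot rows."""
    labels = np.asarray(labels, dtype=np.int64)
    return np.eye(num_classes)[labels]

def condition_latents(z, labels, num_classes):
    """Append a one-hot label to each latent vector."""
    z = np.asarray(z, dtype=np.float64)
    return np.concatenate([z, one_hot(labels, num_classes)], axis=1)

# Three 2-dimensional latent points, each paired with a requested digit class
z = np.zeros((3, 2))
decoder_input = condition_latents(z, labels=[0, 7, 9], num_classes=10)
print(decoder_input.shape)
```

A decoder trained on such inputs would take `latent_dim + num_classes` features, so the requested class can be chosen at generation time.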

<a id="5.3"></a>
### VQ-VAE (Vector Quantized VAE)

**VQ-VAE** [[4]](#ref4) introduces discrete latent variables using vector quantization.

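- Each encoder output is replaced by its nearest vector in a learned codebook, so the latent code is a discrete index rather than a continuous vector.
- A separate prior can then be trained over the discrete codes to generate new samples.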
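The quantization step itself is a nearest-neighbor lookup. The sketch below shows only that lookup, with a tiny hand-written codebook; it omits the straight-through gradient and the codebook and commitment losses used during training. The function name `quantize` and all values are illustrative assumptions.

```python
import numpy as np

def quantize(z_e, codebook):
    """Map each encoder output to the index and value of its nearest codebook vector."""
    z_e = np.asarray(z_e, dtype=np.float64)            # shape (n, d)
    codebook = np.asarray(codebook, dtype=np.float64)  # shape (k, d)
    # Squared Euclidean distance from every encoder output to every code: (n, k)
    dists = np.sum((z_e[:, None, :] - codebook[None, :, :]) ** 2, axis=-1)
    indices = np.argmin(dists, axis=1)
    return indices, codebook[indices]

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]])
z_e = np.array([[0.9, 1.2], [-0.1, 0.2], [-0.8, 0.7]])
indices, z_q = quantize(z_e, codebook)
print(indices)
print(z_q)
```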

<a id="6"></a>
## 6. Conclusion

Autoencoders and Variational Autoencoders are powerful tools for unsupervised learning, enabling feature extraction, dimensionality reduction, and generative modeling. By understanding their architectures and mathematical foundations, you can apply them to a wide range of tasks, from data compression to anomaly detection and synthetic data generation. Ongoing research continues to enhance their capabilities and extend their applications.

<a id="7"></a>
## 7. References

1. <a id="ref1"></a>Kingma, D. P., & Welling, M. (2013). *Auto-Encoding Variational Bayes*. [arXiv:1312.6114](https://arxiv.org/abs/1312.6114)
2. <a id="ref2"></a>Higgins, I., et al. (2017). *beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework*. [ICLR 2017](https://openreview.net/forum?id=Sy2fzU9gl)
3. <a id="ref3"></a>Sohn, K., Lee, H., & Yan, X. (2015). *Learning Structured Output Representation using Deep Conditional Generative Models*. [NeurIPS 2015](https://papers.nips.cc/paper/5775-learning-structured-output-representation-using-deep-conditional-generative-models)
4. <a id="ref4"></a>van den Oord, A., Vinyals, O., & Kavukcuoglu, K. (2017). *Neural Discrete Representation Learning*. [NeurIPS 2017](https://arxiv.org/abs/1711.00937)

---

This notebook provides an in-depth exploration of autoencoders and variational autoencoders. You can run the code cells to see how they are implemented and experiment with different architectures and datasets.