<a href="https://colab.research.google.com/github/gnoejh/ict1022/blob/main/Architectures/autoencoder.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Autoencoders

Autoencoders are a type of neural network architecture designed to learn efficient data encodings in an unsupervised manner. They compress (encode) input data into a lower-dimensional latent representation and then reconstruct (decode) the original input from this representation.

## Architecture Overview

An autoencoder consists of two main components:

1. **Encoder**: Compresses the input into a latent-space representation
2. **Decoder**: Reconstructs the input from the latent-space representation

![Autoencoder Architecture](https://miro.medium.com/v2/resize:fit:1400/1*44eDEuZBEsmG_TCAKRI3Kw@2x.png)

The central idea is to create a "bottleneck" in the architecture: because the latent representation is smaller than the input, the network cannot pass everything through and must retain only the features most useful for reconstruction.

## Mathematical Formulation

Given input data $x$, an autoencoder aims to learn:

1. An encoder function $f(x)$ that maps the input to a hidden representation $h = f(x)$
2. A decoder function $g(h)$ that reconstructs the input $\hat{x} = g(h) = g(f(x))$

The network is trained to minimize a reconstruction error or loss function:

$$L(x, \hat{x}) = L(x, g(f(x)))$$

Common loss functions include Mean Squared Error (MSE) for continuous data and Binary Cross-Entropy for binary data or data scaled to the range [0, 1], such as normalized pixel intensities.
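
As a rough illustration of the two losses named above, the following NumPy sketch computes each one for a toy input and its reconstruction. The helper names `mse_loss` and `bce_loss` and the clipping constant `eps` are assumptions made for this example; they are not the Keras implementations.

```python
import numpy as np

def mse_loss(x, x_hat):
    """Mean squared error, averaged over all elements."""
    return float(np.mean((x - x_hat) ** 2))

def bce_loss(x, x_hat, eps=1e-7):
    """Binary cross-entropy, averaged over all elements.

    x_hat is clipped to (eps, 1 - eps) so that log() never sees 0.
    """
    x_hat = np.clip(x_hat, eps, 1.0 - eps)
    return float(np.mean(-(x * np.log(x_hat) + (1.0 - x) * np.log(1.0 - x_hat))))

# Toy binary input and an imperfect reconstruction
x = np.array([[0.0, 1.0, 1.0, 0.0]])
x_hat = np.array([[0.1, 0.9, 0.8, 0.2]])

print("MSE:", mse_loss(x, x_hat))
print("BCE:", bce_loss(x, x_hat))
```

Both losses are zero only for a perfect reconstruction; cross-entropy penalizes confident wrong predictions more heavily than MSE does.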

## Types of Autoencoders

### 1. Vanilla Autoencoder
The simplest form: an encoder and a decoder built from fully connected layers, with a hidden layer smaller than the input to force compression.

### 2. Undercomplete Autoencoder
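Any autoencoder whose latent layer has fewer units than the input, so that the network cannot copy the input and must learn a compressed representation. The vanilla autoencoder above is one example.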

### 3. Sparse Autoencoder
Adds a sparsity penalty to the loss function, encouraging the network to activate only a small number of neurons.
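
One way to picture the sparsity penalty is as an L1 term on the hidden activations added to the reconstruction loss. The sketch below assumes MSE for reconstruction and a hypothetical weight `l1_weight`; other penalties (for example a KL-divergence term on the mean activation) are also used in practice.

```python
import numpy as np

def sparse_loss(x, x_hat, h, l1_weight=1e-3):
    """Reconstruction MSE plus an L1 penalty on the hidden activations h."""
    reconstruction = np.mean((x - x_hat) ** 2)
    # Sum |h| over hidden units, then average over the batch
    sparsity = l1_weight * np.mean(np.sum(np.abs(h), axis=1))
    return float(reconstruction + sparsity)

x = np.zeros((2, 4))
x_hat = np.zeros((2, 4))            # perfect reconstruction, so only the penalty remains

h_dense = np.ones((2, 4))           # every hidden unit active
h_sparse = np.array([[1.0, 0.0, 0.0, 0.0],
                     [0.0, 0.0, 1.0, 0.0]])  # one active unit per sample

print(sparse_loss(x, x_hat, h_dense, l1_weight=0.1))
print(sparse_loss(x, x_hat, h_sparse, l1_weight=0.1))
```

With identical reconstructions, the code with fewer active units gets the lower loss. In Keras, a comparable penalty can be attached to a `Dense` layer through its `activity_regularizer` argument.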

### 4. Denoising Autoencoder (DAE)
Trained to reconstruct clean data from corrupted inputs, improving robustness and generalization.
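
The key detail is that the corrupted version is the network input while the clean version remains the training target. The following sketch shows two common corruption schemes; the function names, noise levels, and the random stand-in data are assumptions for illustration.

```python
import numpy as np

def add_gaussian_noise(x, noise_std=0.3, rng=None):
    """Add zero-mean Gaussian noise and clip back to the [0, 1] pixel range."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = x + noise_std * rng.standard_normal(x.shape)
    return np.clip(noisy, 0.0, 1.0)

def add_masking_noise(x, drop_prob=0.3, rng=None):
    """Set a random fraction of the input entries to zero."""
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(x.shape) >= drop_prob
    return x * keep

rng = np.random.default_rng(0)
x_clean = rng.random((4, 784))      # stand-in for flattened, normalized images

x_noisy = add_gaussian_noise(x_clean, rng=rng)
x_masked = add_masking_noise(x_clean, rng=rng)

# Training pairs for a denoising autoencoder: (corrupted input, clean target)
print(x_noisy.shape, x_masked.shape)
```

With a Keras model, training would then take the form `model.fit(x_noisy, x_clean, ...)` rather than using the same array for input and target.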

### 5. Contractive Autoencoder (CAE)
Adds a regularization term that penalizes the sensitivity of the latent representation to small changes in the input, making the learned features more robust.
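
A common choice for this regularization term is the squared Frobenius norm of the encoder's Jacobian with respect to the input. The sketch below assumes a single sigmoid encoder layer $h = \sigma(Wx + b)$, for which the penalty has a closed form; the function names and the random weights are illustrative only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def contractive_penalty(x, W, b):
    """Squared Frobenius norm of dh/dx for h = sigmoid(W @ x + b), one sample.

    For a sigmoid layer the Jacobian is J[j, i] = h[j] * (1 - h[j]) * W[j, i],
    so the norm factorizes into a per-unit term and a per-row weight norm.
    """
    h = sigmoid(W @ x + b)
    dh = h * (1.0 - h)
    return float(np.sum(dh ** 2 * np.sum(W ** 2, axis=1)))

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 5))     # 5 inputs, 3 hidden units
b = rng.standard_normal(3)
x = rng.standard_normal(5)

# Cross-check against the explicitly constructed Jacobian
h = sigmoid(W @ x + b)
jacobian = (h * (1.0 - h))[:, None] * W
print(contractive_penalty(x, W, b), float(np.sum(jacobian ** 2)))
```

The two printed values agree. During training, this penalty would be scaled by a hyperparameter and added to the reconstruction loss.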

### 6. Convolutional Autoencoder
Uses convolutional layers instead of fully connected layers, making them suitable for image data.

### 7. Variational Autoencoder (VAE)
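A probabilistic variant in which the encoder outputs the parameters of a probability distribution over the latent space rather than a single point. Sampling from that distribution and decoding the sample gives the model generative capabilities.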
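
Two ingredients are characteristic of a VAE with a diagonal Gaussian latent distribution: the reparameterization trick for sampling, and a KL-divergence term that pulls the latent distribution toward a standard normal prior. The following is a minimal NumPy sketch of both; the function names and the toy values of `mu` and `log_var` are assumptions, and no encoder or decoder network is included.

```python
import numpy as np

def sample_latent(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ) per sample, in closed form."""
    return -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=1)

rng = np.random.default_rng(0)
mu = np.array([[0.0, 0.0, 0.0],
               [1.0, 1.0, 1.0]])    # two samples, 3-D latent space
log_var = np.zeros_like(mu)         # unit variance in every dimension

z = sample_latent(mu, log_var, rng)
kl = kl_to_standard_normal(mu, log_var)
print(z.shape, kl)
```

The first sample already matches the prior, so its KL term is 0; the second has its mean shifted by 1 in each of 3 dimensions, giving 1.5. The full VAE loss adds this term to the reconstruction loss.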

## Implementation Example: Basic Autoencoder

Here's a simple implementation of an autoencoder using TensorFlow/Keras:

In [None]:
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

# Input dimension
input_dim = 784  # For MNIST dataset (28x28 images flattened)

# Encoder
input_layer = Input(shape=(input_dim,))
encoded = Dense(128, activation='relu')(input_layer)
encoded = Dense(64, activation='relu')(encoded)
latent = Dense(32, activation='relu')(encoded)  # Bottleneck layer

# Decoder
decoded = Dense(64, activation='relu')(latent)
decoded = Dense(128, activation='relu')(decoded)
output_layer = Dense(input_dim, activation='sigmoid')(decoded)

# Autoencoder model (full)
autoencoder = Model(input_layer, output_layer)

# Separate encoder model
encoder = Model(input_layer, latent)

# Compile the model
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')

# Model summary
autoencoder.summary()
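
As a sanity check on the summary, the parameter count of the dense model can be worked out by hand, since each `Dense` layer has `n_in * n_out` weights plus `n_out` biases. The helper `dense_params` below is a hypothetical name used only for this arithmetic.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

# Layer widths of the dense autoencoder above: 784 -> 128 -> 64 -> 32 -> 64 -> 128 -> 784
layer_sizes = [784, 128, 64, 32, 64, 128, 784]

per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total_params = sum(per_layer)
encoder_params = sum(per_layer[:3])
compression_factor = layer_sizes[0] / min(layer_sizes)

print(per_layer)
print(total_params)        # 222384
print(encoder_params)      # 110816
print(compression_factor)  # 24.5
```

The 32-unit bottleneck therefore represents each 784-pixel image with 24.5 times fewer values.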

## Implementation Example: Convolutional Autoencoder

In [None]:
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from tensorflow.keras.models import Model

# Input
input_img = Input(shape=(28, 28, 1))  # MNIST images: 28x28x1

# Encoder
x = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)  # 7x7x16

# Decoder
x = Conv2D(16, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)

# Autoencoder model
conv_autoencoder = Model(input_img, decoded)
conv_autoencoder.compile(optimizer='adam', loss='binary_crossentropy')

# Model summary
conv_autoencoder.summary()
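
The spatial sizes in the convolutional model can be traced by hand. With `padding='same'`, the stride-1 convolutions keep the height and width unchanged, each 2x2 pooling layer produces `ceil(size / 2)`, and each 2x2 upsampling layer doubles the size. The helper names below are assumptions for this sketch.

```python
import math

def pool_same(size, pool=2):
    """Spatial size after MaxPooling2D((pool, pool), padding='same')."""
    return math.ceil(size / pool)

def upsample(size, factor=2):
    """Spatial size after UpSampling2D((factor, factor))."""
    return size * factor

size = 28
encoder_sizes = [size]
for _ in range(2):                  # two pooling layers in the encoder
    size = pool_same(size)
    encoder_sizes.append(size)

decoder_sizes = [size]
for _ in range(2):                  # two upsampling layers in the decoder
    size = upsample(size)
    decoder_sizes.append(size)

latent_values = encoder_sizes[-1] ** 2 * 16   # 7 x 7 spatial grid, 16 channels
print(encoder_sizes, decoder_sizes, latent_values)
```

Note that the 7x7x16 encoding holds 784 values, the same count as the 28x28 input, so this particular model is not undercomplete in the strict sense; fewer filters in the last encoder layer would be needed for a smaller code.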

## Training Example

In [None]:
from tensorflow.keras.datasets import mnist
import numpy as np
import matplotlib.pyplot as plt

# Load and preprocess data
(x_train, _), (x_test, _) = mnist.load_data()

# Normalize and reshape data
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# Reshape for the dense autoencoder
x_train_flatten = x_train.reshape((len(x_train), np.prod(x_train.shape[1:])))
x_test_flatten = x_test.reshape((len(x_test), np.prod(x_test.shape[1:])))

# Reshape for the convolutional autoencoder
x_train_conv = x_train.reshape((len(x_train), 28, 28, 1))
x_test_conv = x_test.reshape((len(x_test), 28, 28, 1))

# Train the dense autoencoder
history = autoencoder.fit(
    x_train_flatten, x_train_flatten,
    epochs=20,
    batch_size=256,
    shuffle=True,
    validation_data=(x_test_flatten, x_test_flatten)
)

# Visualize training progress
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper right')
plt.show()

## Applications of Autoencoders

1. **Dimensionality Reduction**: Alternative to PCA, especially for non-linear relationships
2. **Image Denoising**: Removing noise from images (with denoising autoencoders)
3. **Feature Extraction**: Learning meaningful representations for downstream tasks
4. **Anomaly Detection**: Flagging inputs with unusually high reconstruction error, since a model trained on normal data reconstructs atypical inputs poorly
5. **Image Generation**: Particularly with VAEs and other generative variants
6. **Data Compression**: Creating compact representations of data
7. **Recommender Systems**: Learning latent representations of user preferences
8. **Drug Discovery**: Learning molecular fingerprints for drug candidates
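
The anomaly detection use case in the list above typically comes down to thresholding the per-sample reconstruction error. The sketch below illustrates that workflow on synthetic data. To stay self-contained it uses a projection onto the top principal direction as a stand-in for a trained autoencoder, and the 99th-percentile threshold is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Normal" data: 500 points close to a 1-D line embedded in 3-D
direction = np.array([1.0, 2.0, -1.0]) / np.sqrt(6.0)
t = rng.standard_normal((500, 1))
x_normal = t * direction + 0.05 * rng.standard_normal((500, 3))

# Stand-in for a trained autoencoder: linear encoder/decoder obtained from PCA
mean = x_normal.mean(axis=0)
_, _, vt = np.linalg.svd(x_normal - mean, full_matrices=False)
components = vt[:1]                 # top principal direction, shape (1, 3)

def encode(x):
    return (x - mean) @ components.T

def decode(h):
    return h @ components + mean

def reconstruction_error(x):
    """Per-sample mean squared reconstruction error."""
    return np.mean((x - decode(encode(x))) ** 2, axis=1)

# Threshold chosen from the error distribution on normal data
threshold = np.percentile(reconstruction_error(x_normal), 99)

x_new = np.array([[0.5, 1.0, -0.5],    # lies on the line
                  [2.0, -2.0, 2.0]])   # far from the line
is_anomaly = reconstruction_error(x_new) > threshold
print(is_anomaly)
```

With a trained Keras model, `decode(encode(x))` would be replaced by the model's `predict` output and the rest of the procedure would stay the same.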

## Visualizing Autoencoder Results

In [None]:
# Encode and decode some test images
decoded_imgs = autoencoder.predict(x_test_flatten)

# Plot original and reconstructed images
n = 10  # Number of images to display
plt.figure(figsize=(20, 4))
for i in range(n):
    # Original images
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(x_test[i].reshape(28, 28), cmap='gray')
    plt.title("Original")
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    # Reconstructed images
    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(decoded_imgs[i].reshape(28, 28), cmap='gray')
    plt.title("Reconstructed")
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()

## Visualizing Latent Space

If we reduce the latent space to 2 dimensions, we can visualize the encoded data:

In [None]:
# Same dense architecture as before, but with a 2D latent space for visualization
input_layer = Input(shape=(784,))
encoded = Dense(128, activation='relu')(input_layer)
encoded = Dense(64, activation='relu')(encoded)
latent = Dense(2, activation='linear')(encoded)  # 2D bottleneck for visualization

decoded = Dense(64, activation='relu')(latent)
decoded = Dense(128, activation='relu')(decoded)
output_layer = Dense(784, activation='sigmoid')(decoded)

vis_autoencoder = Model(input_layer, output_layer)
vis_encoder = Model(input_layer, latent)
vis_autoencoder.compile(optimizer='adam', loss='binary_crossentropy')

# Train
vis_autoencoder.fit(
    x_train_flatten, x_train_flatten,
    epochs=15,
    batch_size=256,
    validation_data=(x_test_flatten, x_test_flatten)
)

# Reload MNIST to recover the training labels, which the training cell discarded;
# x_train_flatten from that cell is reused unchanged
(_, y_train), _ = mnist.load_data()

# Encode the data
encoded_data = vis_encoder.predict(x_train_flatten)

# Plot the 2D latent space with colors corresponding to digits
plt.figure(figsize=(10, 8))
plt.scatter(encoded_data[:, 0], encoded_data[:, 1], c=y_train, cmap='viridis')
plt.colorbar()
plt.title('2D Latent Space of MNIST Digits')
plt.xlabel('Latent Dimension 1')
plt.ylabel('Latent Dimension 2')
plt.show()

## Advantages and Limitations

### Advantages
- Unsupervised learning capability
- Effective dimensionality reduction for complex data
- Can capture non-linear relationships
- Versatile architecture that can be adapted to different data types

### Limitations
- Without a bottleneck or other regularization, an autoencoder with enough capacity may learn to simply copy the input without extracting useful features
- Can be challenging to interpret the learned features
- May require careful tuning of the architecture
- Standard autoencoders are not true generative models (unlike VAEs)
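
The first limitation can be made concrete with a small linear example. When the hidden layer is at least as wide as the input, even a random encoder can be inverted exactly, so a perfect reconstruction says nothing about whether the code is useful. The sizes and the use of a pseudo-inverse as the decoder are assumptions chosen for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((5, 4))              # 5 samples with 4 features each

# Overcomplete linear "autoencoder": 4 inputs mapped to 6 hidden units
W_enc = rng.standard_normal((4, 6)) # random, untrained encoder weights
W_dec = np.linalg.pinv(W_enc)       # shape (6, 4); right inverse of W_enc

h = x @ W_enc                       # hidden code, shape (5, 6)
x_hat = h @ W_dec                   # reconstruction, shape (5, 4)

print(np.allclose(x, x_hat))        # True
```

The reconstruction error is zero even though the encoder was never trained, which is why practical autoencoders rely on a bottleneck, noise, or a penalty term to force a meaningful representation.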

## Historical Context

The concept of autoencoders has been around since the 1980s, with early work by Hinton and the PDP group. They gained renewed interest with the deep learning revolution and have evolved from simple architectures to sophisticated variants like VAEs and denoising autoencoders.

Key milestones include:
- 1986: First descriptions of autoencoders for dimensionality reduction
- 2006: Deep belief networks and layerwise pretraining
- 2008: Denoising autoencoders introduced by Vincent et al.
- 2013: Variational autoencoders introduced by Kingma and Welling

## References

- Hinton, G. E., & Zemel, R. S. (1994). [Autoencoders, Minimum Description Length, and Helmholtz Free Energy](https://proceedings.neurips.cc/paper/1993/file/9e3cfc48eccf81a0d57663e129aef3cb-Paper.pdf). NIPS.
- Vincent, P., et al. (2008). [Extracting and Composing Robust Features with Denoising Autoencoders](https://www.cs.toronto.edu/~larocheh/publications/icml-2008-denoising-autoencoders.pdf). ICML.
- Kingma, D. P., & Welling, M. (2013). [Auto-Encoding Variational Bayes](https://arxiv.org/abs/1312.6114). arXiv preprint; presented at ICLR 2014.
- Goodfellow, I., Bengio, Y., & Courville, A. (2016). [Deep Learning](https://www.deeplearningbook.org/). MIT Press. Chapter 14.
- Bank, D., Koenigstein, N., & Giryes, R. (2020). [Autoencoders](https://arxiv.org/abs/2003.05991). arXiv preprint.