# Autoencoders

The `15_autoencoders` notebook explores the architecture and training of autoencoders, which are neural networks designed to learn compressed representations of data in an unsupervised manner. This notebook covers preparing the dataset, building the Encoder and Decoder models, and combining them into an Autoencoder. 

It also focuses on training the model, visualizing reconstructed outputs, exploring the learned latent space, and experimenting with the size of the latent dimension to understand its impact on reconstruction quality.

## Table of contents

1. [Understanding Autoencoders](#understanding-autoencoders)
2. [Setting up the environment](#setting-up-the-environment)
3. [Preparing the dataset](#preparing-the-dataset)
4. [Building the Encoder model](#building-the-encoder-model)
5. [Building the Decoder model](#building-the-decoder-model)
6. [Combining Encoder and Decoder into an Autoencoder](#combining-encoder-and-decoder-into-an-autoencoder)
7. [Training the Autoencoder](#training-the-autoencoder)
8. [Visualizing reconstructed outputs](#visualizing-reconstructed-outputs)
9. [Exploring the learned latent space](#exploring-the-learned-latent-space)
10. [Experimenting with latent dimension size](#experimenting-with-latent-dimension-size)

## Understanding Autoencoders

Autoencoders are a type of neural network used for unsupervised learning, primarily for tasks such as dimensionality reduction, data compression, and feature extraction. The core idea is to learn a compressed representation of the input data that can be used to reconstruct the original input as accurately as possible. They are composed of two main components: the **encoder** and the **decoder**.

The encoder compresses the input data into a lower-dimensional representation, known as the **latent space** or **bottleneck**. The decoder then attempts to reconstruct the original data from this compressed representation. The goal is for the output to be as close as possible to the original input, allowing the network to learn meaningful features.

### **Why autoencoders?**

Autoencoders are valuable due to their ability to learn compact, efficient representations of data. This makes them useful for several applications, such as:
- **Dimensionality reduction**: Autoencoders reduce the number of features while preserving the most important information. This is similar to PCA (Principal Component Analysis) but with the flexibility of learning non-linear representations.
- **Denoising**: They can be trained to remove noise from data by learning to map noisy inputs to clean outputs.
- **Anomaly detection**: Autoencoders can identify unusual data by learning the normal patterns in a dataset. When an input doesn't fit the learned patterns, the reconstruction error will be high, signaling a potential anomaly.

### **Key components of autoencoders**

Autoencoders consist of two parts:

#### **Encoder**
The encoder compresses the input data into a lower-dimensional latent space. This is typically done through a series of neural network layers that progressively reduce the dimensionality of the input. The encoder learns to represent the data in a more compact form while retaining important information. 

#### **Decoder**
The decoder reconstructs the input data from the compressed latent representation. It uses the information encoded in the latent space to generate an approximation of the original input. The decoder is typically a mirror image of the encoder, with layers that progressively increase the dimensionality back to the original input size.

The primary objective of an autoencoder is to minimize the difference between the input and the reconstructed output, forcing the network to learn a meaningful and efficient representation.

### **Training process**

Autoencoders are trained in an unsupervised manner. The model learns by comparing the original input to its reconstructed output and minimizing the reconstruction error. Both the encoder and decoder are updated during training to improve their ability to compress and reconstruct the data. Over time, the encoder becomes better at capturing essential features, while the decoder improves in reconstructing the data from these features.

### **Types of autoencoders**

Several variants of autoencoders have been developed to handle different tasks and challenges:

#### **Undercomplete autoencoder**
In an undercomplete autoencoder, the latent space has fewer dimensions than the input data. This forces the model to learn compressed, informative representations, making it useful for dimensionality reduction.

#### **Overcomplete autoencoder**
Overcomplete autoencoders have a latent space larger than the input space. These autoencoders risk learning trivial solutions, such as copying the input to the output. To prevent this, regularization techniques are used to ensure the model learns meaningful representations rather than simply reproducing the input.

#### **Denoising autoencoder**
Denoising autoencoders are designed to remove noise from data. The input is intentionally corrupted with noise, and the autoencoder is tasked with reconstructing the clean version of the data. This helps the model learn robust features that capture the essential structure of the data.

#### **Sparse autoencoder**
Sparse autoencoders introduce a sparsity constraint on the latent representation, encouraging the model to activate only a few neurons at a time. This promotes the learning of more interpretable and meaningful representations. Sparse autoencoders are particularly useful for feature extraction.

#### **Variational autoencoder (VAE)**
Variational autoencoders are a probabilistic extension of autoencoders. Instead of mapping the input to a fixed vector, the encoder learns a distribution, allowing the model to generate new data by sampling from this distribution. VAEs are widely used for generative tasks, as they can create new data that resembles the input data.

### **Bottleneck and latent space**

The bottleneck, or latent space, is the compressed representation learned by the encoder. The size of the bottleneck determines how much information can be captured and compressed. A smaller bottleneck forces the model to focus on the most important features, while a larger bottleneck allows more information to be retained, but risks overfitting or learning trivial mappings.

The latent space can be used for various downstream tasks, such as data visualization, clustering, or generating new samples in the case of VAEs.

### **Applications of autoencoders**

Autoencoders are widely used in a variety of applications, including:

- **Dimensionality reduction**: By compressing data into a smaller latent space, autoencoders can reduce the number of features while preserving essential information. This is helpful for simplifying high-dimensional datasets.
- **Denoising**: Denoising autoencoders are used to clean data, such as images or signals, by learning to remove noise while preserving the underlying structure.
- **Anomaly detection**: Autoencoders can be trained on normal data to detect anomalies. Data that deviates from normal patterns will result in high reconstruction errors, signaling that it may be anomalous.
- **Generative modeling**: Variational autoencoders (VAEs) can be used to generate new data by sampling from the latent space, making them useful for tasks like image or text generation.

### **Maths**

#### **Encoder and decoder mappings**

As described above, the encoder compresses the input into a lower-dimensional latent space and the decoder reconstructs the input from it. Formally, the two mappings are:

- **Encoder**: The encoder maps the input $ x $ to a latent representation $ z $ through an encoding function $ f $. For a single fully connected layer, this is:

  $$
  z = f(x) = \sigma(W_{\text{enc}} x + b_{\text{enc}})
  $$

  Here:
  - $ W_{\text{enc}} $ and $ b_{\text{enc}} $ are the weight matrix and bias of the encoder, respectively.
  - $ \sigma $ is an activation function, typically a non-linear function like ReLU or sigmoid.

- **Decoder**: The decoder maps the latent representation $ z $ back to a reconstruction $ \hat{x} $ through a decoding function $ g $. For a single fully connected layer, this is:

  $$
  \hat{x} = g(z) = \sigma(W_{\text{dec}} z + b_{\text{dec}})
  $$

  Here:
  - $ W_{\text{dec}} $ and $ b_{\text{dec}} $ are the weight matrix and bias of the decoder.
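
The two mappings above can be sketched without any framework. This is a minimal illustration using only the Python standard library, assuming a single layer on each side and a sigmoid for $ \sigma $. The helper names (`affine`, `encode`, `decode`) and the dimensions are hypothetical, and the weights are random and untrained, so `x_hat` is not expected to resemble `x`.

```python
import math
import random


def sigmoid(v):
    """Logistic activation, used here as the sigma in both mappings."""
    return 1.0 / (1.0 + math.exp(-v))


def affine(W, b, x):
    """Compute W x + b, with W given as a list of rows."""
    return [sum(w * x_j for w, x_j in zip(row, x)) + b_i for row, b_i in zip(W, b)]


def encode(x, W_enc, b_enc):
    """z = sigma(W_enc x + b_enc)"""
    return [sigmoid(v) for v in affine(W_enc, b_enc, x)]


def decode(z, W_dec, b_dec):
    """x_hat = sigma(W_dec z + b_dec)"""
    return [sigmoid(v) for v in affine(W_dec, b_dec, z)]


# Illustrative sizes (assumptions): 6 input features squeezed into 2 latent units.
rng = random.Random(0)
input_dim, latent_dim = 6, 2
W_enc = [[rng.uniform(-0.5, 0.5) for _ in range(input_dim)] for _ in range(latent_dim)]
b_enc = [0.0] * latent_dim
W_dec = [[rng.uniform(-0.5, 0.5) for _ in range(latent_dim)] for _ in range(input_dim)]
b_dec = [0.0] * input_dim

x = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
z = encode(x, W_enc, b_enc)        # latent representation, length latent_dim
x_hat = decode(z, W_dec, b_dec)    # reconstruction, length input_dim
print(len(x), len(z), len(x_hat))
```

Note that `W_enc` has shape `latent_dim x input_dim` and `W_dec` has the transposed shape, which is what makes the decoder a mirror of the encoder.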

#### **Loss function and reconstruction error**

The main objective of an autoencoder is to minimize the reconstruction error, which is the difference between the original input $ x $ and its reconstruction $ \hat{x} $. A common loss function for this purpose is the **mean squared error (MSE)**, which measures the average squared difference between the input and the reconstructed output:

$$
L(x, \hat{x}) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{x}_i)^2
$$

where:
- $ x_i $ and $ \hat{x}_i $ are the individual elements of the input and reconstructed vectors, respectively.
- $ n $ is the number of elements in the input.

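For binary data, or inputs scaled to $ [0, 1] $, the **binary cross-entropy loss** is often used. It requires each $ \hat{x}_i $ to lie in $ (0, 1) $, for example by ending the decoder with a sigmoid: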

$$
L(x, \hat{x}) = -\sum_{i=1}^{n} \left[ x_i \log(\hat{x}_i) + (1 - x_i) \log(1 - \hat{x}_i) \right]
$$

The goal is to minimize this loss function during training, such that the reconstructed output becomes as close to the input as possible.
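
Both losses can be written directly from their formulas. The following is a small sketch in plain Python, not the notebook's implementation. The function names are hypothetical, and the `eps` clamp in the cross-entropy is an added assumption to avoid `log(0)`.

```python
import math


def mse_loss(x, x_hat):
    """Mean squared error: average of (x_i - x_hat_i)^2 over n elements."""
    n = len(x)
    return sum((x_i - xh_i) ** 2 for x_i, xh_i in zip(x, x_hat)) / n


def bce_loss(x, x_hat, eps=1e-12):
    """Binary cross-entropy, summed over elements as in the formula above.

    eps (an assumption of this sketch) keeps x_hat_i strictly inside (0, 1).
    """
    total = 0.0
    for x_i, xh_i in zip(x, x_hat):
        xh_i = min(max(xh_i, eps), 1.0 - eps)
        total += x_i * math.log(xh_i) + (1.0 - x_i) * math.log(1.0 - xh_i)
    return -total


x = [1.0, 0.0, 1.0, 0.0]
x_hat = [0.9, 0.1, 0.8, 0.2]

print(mse_loss(x, x_hat))
print(bce_loss(x, x_hat))
```

Both values shrink toward zero as `x_hat` approaches `x`. Cross-entropy penalizes confident wrong predictions much more heavily than MSE does.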

#### **Training via backpropagation**

Autoencoders are trained with backpropagation and gradient descent. Backpropagation computes the gradients of the reconstruction loss with respect to the parameters (weights and biases) of both the encoder and the decoder, and gradient descent then updates each weight matrix as follows:

$$
W_{\text{new}} = W_{\text{old}} - \eta \frac{\partial L}{\partial W}
$$

where:
- $ \eta $ is the learning rate.
- $ \frac{\partial L}{\partial W} $ is the gradient of the loss function with respect to the weight matrix $ W $.
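
To make the update rule concrete, here is a toy sketch under strong simplifying assumptions: a linear autoencoder with one scalar encoder weight and one scalar decoder weight, a single training value, and gradients derived by hand rather than by automatic differentiation. All names are hypothetical.

```python
def reconstruct(x, w_enc, w_dec):
    """Linear toy autoencoder: x_hat = w_dec * (w_enc * x)."""
    return w_dec * (w_enc * x)


def loss(x, w_enc, w_dec):
    """Squared reconstruction error for a single scalar input."""
    return (x - reconstruct(x, w_enc, w_dec)) ** 2


def gradients(x, w_enc, w_dec):
    """Hand-derived dL/dw_enc and dL/dw_dec for the loss above."""
    residual = x - reconstruct(x, w_enc, w_dec)
    grad_enc = -2.0 * residual * w_dec * x
    grad_dec = -2.0 * residual * w_enc * x
    return grad_enc, grad_dec


x = 1.0      # single training value (illustrative)
eta = 0.1    # learning rate
w_enc, w_dec = 0.5, 0.5

history = [loss(x, w_enc, w_dec)]
for _ in range(50):
    g_enc, g_dec = gradients(x, w_enc, w_dec)
    # W_new = W_old - eta * dL/dW, applied to encoder and decoder alike
    w_enc -= eta * g_enc
    w_dec -= eta * g_dec
    history.append(loss(x, w_enc, w_dec))

print(history[0], history[-1])
```

The loss falls from its initial value toward zero as the product `w_dec * w_enc` approaches 1, which is the identity mapping for this toy case.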

#### **Dimensionality reduction and bottleneck**

In an undercomplete autoencoder, the dimensionality of the latent space $ z $ is smaller than the input space. This bottleneck forces the network to compress the input data and capture the most important features. The encoder learns a lower-dimensional representation that retains the most relevant information, while the decoder learns to reconstruct the input from this compressed representation.

The key challenge is that the autoencoder must learn to extract meaningful features while preserving enough information for accurate reconstruction, which is where the size of the bottleneck plays a critical role.

#### **Regularization techniques**

Autoencoders often incorporate regularization techniques to prevent overfitting and encourage learning of meaningful features.

- **Sparse autoencoders** introduce a sparsity constraint on the activations in the latent space. This is typically done by adding a regularization term to the loss function, penalizing the network if too many neurons in the latent space are active at once. A common regularization technique is the **L1 penalty**, which encourages the network to keep most activations close to zero:

  $$
  L_{\text{regularized}} = L(x, \hat{x}) + \lambda \sum_{i=1}^{k} |z_i|
  $$

  where $ z_i $ is the $ i $-th activation in the latent space, $ k $ is the latent dimension, and $ \lambda $ is a regularization parameter that controls the strength of the sparsity constraint.

- **Denoising autoencoders** add noise to the input during training but require the network to reconstruct the clean, original input. The corruption process can be Gaussian noise, salt-and-pepper noise, or dropout. The denoising process forces the autoencoder to learn robust features that are less sensitive to small perturbations in the input.
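
Both techniques above are small additions to the basic setup. The sketch below uses plain Python with hypothetical helper names and made-up example values. It shows the L1 term added to a reconstruction loss, and a Gaussian corruption step that produces the noisy input for a denoising autoencoder. Clipping the noisy values to $ [0, 1] $ is an assumption that suits image data scaled to that range.

```python
import random


def l1_penalty(z):
    """Sum of absolute latent activations, the sparsity term."""
    return sum(abs(z_i) for z_i in z)


def sparse_loss(reconstruction_loss, z, lam):
    """L_regularized = L(x, x_hat) + lambda * sum_i |z_i|"""
    return reconstruction_loss + lam * l1_penalty(z)


def add_gaussian_noise(x, std, rng):
    """Corrupt x with zero-mean Gaussian noise, clipped to [0, 1]."""
    return [min(1.0, max(0.0, x_i + rng.gauss(0.0, std))) for x_i in x]


# Two latent codes with the same reconstruction loss (values are illustrative).
z_dense = [0.5, -0.4, 0.3, -0.6]
z_sparse = [0.0, 0.0, 0.0, 1.2]
print(sparse_loss(0.025, z_dense, lam=0.01))
print(sparse_loss(0.025, z_sparse, lam=0.01))

# Denoising setup: the model sees x_noisy, but the loss compares against x_clean.
rng = random.Random(0)
x_clean = [0.0, 0.25, 0.5, 0.75, 1.0]
x_noisy = add_gaussian_noise(x_clean, std=0.1, rng=rng)
print(x_noisy)
```

In a training loop, the only change for the denoising variant is that the forward pass receives `x_noisy` while the loss target stays `x_clean`.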

#### **Variational autoencoders (VAE)**

Variational autoencoders (VAEs) introduce a probabilistic element to autoencoders by learning a distribution over the latent space rather than a fixed vector. The encoder maps the input to a distribution, typically a Gaussian, characterized by a mean $ \mu $ and a variance $ \sigma^2 $ (here $ \sigma $ denotes a standard deviation, not the activation function used earlier). During training, the encoder learns to predict these parameters for each input, and the decoder reconstructs the input from a latent vector sampled from that distribution.

The VAE introduces two loss components:
1. **Reconstruction loss**, which measures how well the decoder reconstructs the input from the sampled latent vector.
2. **KL-divergence**, a regularization term that measures the difference between the learned latent distribution and a prior distribution (usually a standard Gaussian):

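$$
L_{\text{VAE}} = L_{\text{reconstruction}} + D_{\text{KL}}\left( q(z \mid x) \,\|\, p(z) \right)
$$

where $ q(z \mid x) $ is the latent distribution produced by the encoder and $ p(z) $ is the prior.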

The KL-divergence term ensures that the latent space follows a smooth, continuous distribution, allowing the model to generate new data by sampling from the latent space.
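
When the encoder outputs a diagonal Gaussian and the prior is a standard Gaussian, the KL term has a well-known closed form: $ -\frac{1}{2} \sum_j (1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2) $. The sketch below assumes that setting and, as is common practice, works with the log-variance rather than the variance. The function names are hypothetical.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dimensions."""
    return -0.5 * sum(
        1.0 + lv - m ** 2 - math.exp(lv) for m, lv in zip(mu, log_var)
    )


def vae_loss(reconstruction_loss, mu, log_var):
    """Total VAE loss: reconstruction term plus KL regularizer."""
    return reconstruction_loss + kl_to_standard_normal(mu, log_var)


# The KL term vanishes when the encoder output already matches the prior.
print(kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))

# Moving the mean away from zero is penalized.
print(kl_to_standard_normal([1.0], [0.0]))
```

The penalty grows as the predicted mean moves away from zero or the variance moves away from one, which is what keeps the latent codes clustered around the prior.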

#### **Gradient flow in VAEs**

In a VAE, backpropagation computes gradients of both the reconstruction loss and the KL-divergence term. The KL-divergence depends only on the encoder's outputs, so its gradients update the encoder alone and shape the learned latent distribution. The reconstruction loss depends on the decoder and, through the sampled latent vector, on the encoder, so its gradients update both. Because sampling is not differentiable, the VAE uses the **reparameterization trick**: it writes the sample as $ z = \mu + \sigma \odot \epsilon $ with $ \epsilon \sim \mathcal{N}(0, I) $, so that gradients can flow through the sampling step and the encoder can be trained via backpropagation.
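
The reparameterization trick moves the randomness into a separate noise variable, so the sample becomes a deterministic function of the encoder outputs. The following is a framework-free sketch with hypothetical names. It only shows the sampling step, not the gradient computation, and assumes the encoder outputs a mean and a log-variance per latent dimension.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), one draw per dimension."""
    z = []
    for m, lv in zip(mu, log_var):
        eps = rng.gauss(0.0, 1.0)     # noise, independent of the parameters
        sigma = math.exp(0.5 * lv)    # standard deviation from log-variance
        z.append(m + sigma * eps)
    return z


rng = random.Random(0)

# With unit variance the samples scatter around the mean.
samples = [reparameterize([2.0], [0.0], rng)[0] for _ in range(5000)]
sample_mean = sum(samples) / len(samples)
print(sample_mean)

# With a vanishing variance the sample collapses onto the mean.
z_tight = reparameterize([2.0], [-100.0], rng)
print(z_tight)
```

Because `eps` does not depend on `mu` or `log_var`, the derivative of `z` with respect to both is well defined, which is what lets an autodiff framework propagate the reconstruction gradient back into the encoder.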

## Setting up the environment


##### **Q1: How do you install the necessary libraries for building and training autoencoders in PyTorch?**


##### **Q2: How do you import the required modules for model building, training, and data handling in PyTorch?**


##### **Q3: How do you set up your environment to use GPU, and how do you fallback to CPU in PyTorch?**


##### **Q4: How do you set a random seed in PyTorch to ensure reproducibility during autoencoder training?**

## Preparing the dataset


##### **Q5: How do you load a dataset like MNIST using PyTorch's `torchvision.datasets`?**


##### **Q6: How do you apply transformations such as normalization to prepare the dataset for training the autoencoder?**


##### **Q7: How do you create a DataLoader in PyTorch to load batches of data for training the autoencoder?**


##### **Q8: How do you split the dataset into training and validation sets using PyTorch?**

## Building the Encoder model


##### **Q9: How do you define the architecture of the Encoder model using PyTorch’s `nn.Module`?**


##### **Q10: How do you implement the forward pass of the Encoder to map input data into a latent representation?**


##### **Q11: How do you specify the latent dimension size when building the Encoder, and what does it represent?**

## Building the Decoder model


##### **Q12: How do you define the architecture of the Decoder model using PyTorch’s `nn.Module`?**


##### **Q13: How do you implement the forward pass of the Decoder to reconstruct the original data from the latent representation?**


##### **Q14: How do you apply an activation function in the Decoder to ensure the output values are within the same range as the input data?**

## Combining Encoder and Decoder into an Autoencoder


##### **Q15: How do you combine the Encoder and Decoder models into a single autoencoder architecture?**


##### **Q16: How do you implement the forward pass of the full autoencoder by chaining the Encoder and Decoder together?**


##### **Q17: How do you verify the dimensions of the input and output to ensure the autoencoder is reconstructing the data correctly?**

## Training the Autoencoder


##### **Q18: How do you define the loss function (e.g., Mean Squared Error) to measure the reconstruction error in PyTorch?**


##### **Q19: How do you configure an optimizer (e.g., Adam) to update the model parameters during training?**


##### **Q20: How do you implement a training loop that performs forward pass, loss calculation, and backpropagation for the autoencoder?**


##### **Q21: How do you monitor and log the training loss over epochs to ensure the autoencoder is learning correctly?**

## Visualizing reconstructed outputs


##### **Q22: How do you visualize the original input images alongside the reconstructed outputs generated by the autoencoder?**


##### **Q23: How do you save and display the reconstructed images from the validation set after each training epoch?**

## Exploring the learned latent space


##### **Q24: How do you extract the latent representations of input data from the Encoder?**


##### **Q25: How do you visualize the latent space using techniques such as t-SNE or PCA to explore the structure of the encoded data?**

## Experimenting with latent dimension size


##### **Q26: How do you modify the latent dimension size and observe its impact on the quality of the reconstructed images?**


##### **Q27: How do you evaluate how different latent dimensions affect the autoencoder’s ability to capture the most important features of the data?**

## Conclusion