# **AutoEncoders**

## Pre-requisites

Before starting this lesson, you should be familiar with the following concepts:
* Latent Variable Models
* Basics of Deep Learning:
    * Supervised learning
    * Neural Networks and Activation Functions
    * Gradients and Optimization

## Learning Objectives
By the end of this lesson, the students will be able to:
- Recall the components of an autoencoder and visualize the latent space with a real-life example.
- Understand properties and applications of autoencoders.
- Differentiate the types of autoencoders based on their structure and functionality.



## Autoencoder and its Components

**Autoencoders** are unsupervised neural network architectures designed to learn efficient representations of input data by training the network to reconstruct the input from a compressed form. They work by first encoding the input into a lower-dimensional latent representation, and then decoding it back to a form that closely resembles the original input. Since the input and output are the same, autoencoders do not require labeled data and are trained in an unsupervised manner.


<center>
<figure>

<p><img src="https://i.postimg.cc/HxRW0n7x/Autoencoder.png" height="480" width="620"></p>
<figcaption align="center">Figure 1: Autoencoder with single hidden layer in encoder and decoder</figcaption>
</figure>
</center>

1. **Encoder**: The encoder is a fully connected layer (or a stack of them) that compresses the given input ($\mathbf{x}$) into a latent-space representation ($\mathbf{h}$). It can be represented by an encoding function:
$$ \phi: \mathbf{x} \rightarrow \mathbf{h} $$
This function maps the input to the latent-space representation held in the bottleneck.


2. **Latent Space Representation (Compressed Representation)**:
   This is the **bottleneck layer** of the autoencoder, where input data is compressed into a **lower-dimensional vector**, often called the **code** or **latent vector**. It captures the most important features of the input and allows the model to group similar inputs close together in the latent space.

    <center>
    <img src="https://i.postimg.cc/kXHbCBmt/Latent-Space.png" width=60%>  
    <figcaption>Figure 2: 2D visualization of latent space showing digit clusters</figcaption>

    </center>

    The **latent space** is a learned feature space where high-dimensional data is represented compactly, revealing hidden patterns and relationships. The scatter plot above illustrates how the autoencoder organizes data in latent space:

    * Each **color** represents a different digit class.
    * Similar digits are **clustered closely**, reflecting shared features.
    * Dissimilar digits are **clearly separated**, indicating distinct latent representations.
    * For instance, digits **0** and **1** appear far apart, while digits **0** and **6** are closer, mirroring their visual similarity.


3. **Decoder**: The decoder is a fully connected layer (or a stack of them) that reconstructs the data from the latent-space representation so that it is as close as possible to the original input. It can be represented by a decoding function:
$$ \theta: \mathbf{h} \rightarrow \mathbf{x'} $$
where $\mathbf{x'}$ is the reconstructed output of the network. Each subsequent layer in the decoder usually has more nodes than the previous one.
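
The three components above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the toy data, the layer sizes, the `tanh` activation, the learning rate, and the helper names (`encode`, `decode`, `mse`) are all assumptions chosen for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumed for illustration): 200 points in 8-D that lie close to a
# 2-D plane, so a 2-unit bottleneck can capture most of the structure.
n_samples, input_dim, code_dim = 200, 8, 2
factors = rng.normal(size=(n_samples, code_dim))
mixing = 0.5 * rng.normal(size=(code_dim, input_dim))
X = factors @ mixing + 0.05 * rng.normal(size=(n_samples, input_dim))

# One dense layer each for the encoder (phi) and the decoder (theta).
W_enc = 0.1 * rng.normal(size=(input_dim, code_dim))
b_enc = np.zeros(code_dim)
W_dec = 0.1 * rng.normal(size=(code_dim, input_dim))
b_dec = np.zeros(input_dim)


def encode(x):
    """phi: x -> h, the latent code in the bottleneck."""
    return np.tanh(x @ W_enc + b_enc)


def decode(h):
    """theta: h -> x', the reconstruction."""
    return h @ W_dec + b_dec


def mse(x, x_rec):
    return np.mean((x - x_rec) ** 2)


initial_loss = mse(X, decode(encode(X)))

learning_rate = 0.1
for _ in range(2000):
    H = encode(X)
    X_rec = decode(H)
    # The training target is the input itself, so no labels are needed.
    grad_out = 2.0 * (X_rec - X) / X.size
    grad_pre = (grad_out @ W_dec.T) * (1.0 - H ** 2)  # tanh derivative
    # In-place updates keep encode/decode pointing at the same arrays.
    W_dec -= learning_rate * (H.T @ grad_out)
    b_dec -= learning_rate * grad_out.sum(axis=0)
    W_enc -= learning_rate * (X.T @ grad_pre)
    b_enc -= learning_rate * grad_pre.sum(axis=0)

final_loss = mse(X, decode(encode(X)))
print(f"reconstruction MSE: {initial_loss:.4f} -> {final_loss:.4f}")
print("code shape:", encode(X).shape)  # (200, 2)
```

Plain gradient descent on the mean squared error is used here only to keep the sketch dependency-free; in practice a deep learning framework with automatic differentiation would usually handle the backward pass.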


## Properties of Autoencoders

1. Autoencoders are **unsupervised** in nature, as they train on unlabeled data. They are also described as **self-supervised** networks, because the input itself serves as the training target.
2. They are **data-specific**: they compress and reconstruct well only data similar to what they were trained on.
3. They are **lossy**: the output is a degraded version of the input, so the original input cannot be reconstructed exactly.
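
The lossy and data-specific properties can be illustrated numerically. The sketch below does not train a neural network: it uses the top principal directions from an SVD as a stand-in for a trained linear autoencoder (a linear autoencoder trained with squared error is known to recover the same subspace as PCA). The two synthetic data families and all names (`make_data`, `basis_a`, `basis_b`, `reconstruction_error`) are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(1)


def make_data(basis, n=300, noise=0.1):
    """Points near the plane spanned by the rows of `basis`, plus noise."""
    coords = rng.normal(size=(n, basis.shape[0]))
    return coords @ basis + noise * rng.normal(size=(n, basis.shape[1]))


input_dim, code_dim = 10, 2
basis_a = rng.normal(size=(code_dim, input_dim))  # the "training" data family
basis_b = rng.normal(size=(code_dim, input_dim))  # a different data family

X_train = make_data(basis_a)
X_same = make_data(basis_a)    # unseen samples of the same kind of data
X_other = make_data(basis_b)   # data unlike anything seen during fitting

# Stand-in for training: keep the top principal directions of X_train.
mean = X_train.mean(axis=0)
_, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
components = Vt[:code_dim]     # shape (code_dim, input_dim)


def encode(x):
    return (x - mean) @ components.T


def decode(h):
    return h @ components + mean


def reconstruction_error(x):
    return np.mean((x - decode(encode(x))) ** 2)


err_same = reconstruction_error(X_same)
err_other = reconstruction_error(X_other)
print(f"same kind of data : {err_same:.4f}")   # small but not zero (lossy)
print(f"different data    : {err_other:.4f}")  # much larger (data-specific)
```

The error on `X_same` stays above zero because the 2-D code cannot carry the noise in the remaining dimensions, and the error on `X_other` is larger because the learned code describes only the plane the model was fitted on.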


## Types of Autoencoder

### Based on Structure
> Based on the dimension of the latent representation ($h$) relative to the input dimension ($x$)

- ## **Undercomplete Autoencoder**

    An autoencoder whose compressed representation (code) has a smaller dimension than the input ($h < x$).

    <center>
    <figure>

    <p><img src="https://i.postimg.cc/sxWPZSdm/Undercomplete-Autoencoder.png" height="200" width="350"></p>
    <figcaption align="center">Figure 2: Undercomplete Autoencoder Block Diagram </figcaption>
    </figure>
    </center>

 This model minimizes the reconstruction error (e.g., mean squared error). Because the code is smaller than the input, it cannot simply copy the input to the output and must learn a compact representation instead.
 The bottleneck (code) acts as implicit regularization, but explicit regularization (e.g., weight decay, or input noise as in denoising autoencoders) is often added to improve generalization.

 **Applications:** Dimensionality Reduction, Anomaly Detection, and Feature Extraction.

- ## **Overcomplete Autoencoder**

  An autoencoder whose compressed representation (code) has a higher dimension than the input ($h > x$).

    <center>
    <figure>

    <p><img src="https://i.postimg.cc/Vv6Dh5tP/Overcomplete-Autoencoder.png" height="220" width="350"></p>
    <figcaption align="center">Figure 3: Overcomplete Autoencoder Block Diagram </figcaption>
    </figure>
    </center>

  Without regularization, it may simply copy the input to the output and learn only trivial features. With regularization (e.g., sparsity constraints, dropout), it can learn rich, meaningful representations, so regularization is essential here.
  
  **Applications:** Sparse Coding and Feature Learning for complex data.
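
The trivial solution mentioned for the overcomplete case can be written down by hand. In the sketch below the weights are set manually, not learned, to show that a linear overcomplete autoencoder can reach zero reconstruction error without capturing any structure. The L1 penalty on the code and the coefficient `sparsity_weight` are assumptions, shown as one common way to express a sparsity constraint.

```python
import numpy as np

rng = np.random.default_rng(2)

input_dim, code_dim = 4, 6            # overcomplete: h > x
X = rng.normal(size=(5, input_dim))

# Trivial "copy" solution: the encoder pastes x into the first input_dim
# code units (the extra units stay at zero) and the decoder reads it back.
W_enc = np.zeros((input_dim, code_dim))
W_enc[:, :input_dim] = np.eye(input_dim)
W_dec = W_enc.T

H = X @ W_enc          # code, shape (5, 6)
X_rec = H @ W_dec      # reconstruction, shape (5, 4)

reconstruction_loss = np.mean((X - X_rec) ** 2)

# Assumed regularizer: an L1 penalty on the code, weighted by a
# hypothetical coefficient `sparsity_weight`.
sparsity_weight = 0.1
sparsity_penalty = np.mean(np.abs(H))
regularized_loss = reconstruction_loss + sparsity_weight * sparsity_penalty

print("reconstruction loss:", reconstruction_loss)
print("regularized loss   :", regularized_loss)
```

With the reconstruction term alone, this copy network already looks perfect. Once a penalty on the code is added, every active unit carries a cost, so the objective also rewards codes that use few units instead of rewarding copying alone.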





### Based on Functionality

> Based on how the hidden layers are built or constrained (e.g., sparsity, convolutional layers, a probabilistic latent space).

- ## **Sparse Autoencoders**

    These autoencoders generally have more hidden units than inputs, but only a small number of hidden units are allowed to be active for any given input while learning features from the data.

    <center>
    <figure>

    <p><img src="https://i.postimg.cc/L6422gGK/Sparse-Autoencoder.png" height="375" width="550"></p>
    <figcaption align='center'>Figure 4: Sparse autoencoder</figcaption>
    </figure>
    </center>

    The sparsity constraint on the hidden layer acts as a regularizer that helps prevent overfitting. It stops the network from simply copying the input layer to the output layer, even though the hidden layer is larger than the input.


- ## **Convolutional autoencoders**

    Instead of fully connected layers, these autoencoders use convolutional layers to extract essential features from the input, and a mirrored structure (e.g., upsampling or transposed convolutions) to reconstruct it.
  
    <center>
    <figure>

    <p><img src="https://i.postimg.cc/bNTyq4yT/Convolutional-Autoencoder.png" height="350" width="700"></p>
    <figcaption align='center'>Figure 5: Convolutional autoencoder</figcaption>
    </figure>
    </center>

  The main benefits of using convolutional layers are:
    - Because convolutions share weights across spatial positions, the model scales well to realistic-sized, high-dimensional images.
    - The model can reconstruct missing parts of an image as well as remove noise from it.



- ## **Variational autoencoders**
    Variational autoencoders (VAEs) are generative models, unlike the classical autoencoders (sparse, denoising, etc.). A VAE puts a probabilistic spin on autoencoders: the encoder outputs a distribution over the latent space, and new data is generated by sampling from it.

    <center>
    <figure>

    <p><img src="https://i.postimg.cc/mkSyXTQJ/Variational-Autoencoder.png" height="200" width="450"></p>
    <figcaption align='center'>Figure 6: Variational autoencoder</figcaption>
    </figure>
    </center>

  Hence, a VAE can act as a generative model, like a **Generative Adversarial Network (GAN)**, while giving significant control over how the latent distribution is modeled.
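
The sampling step in a VAE can be sketched as follows. This is a hedged illustration of the commonly used reparameterization trick and of the closed-form KL term for a diagonal Gaussian posterior against a standard normal prior; the values of `mu` and `log_var` are made-up stand-ins for encoder outputs, and the function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)


def sample_latent(mu, log_var, rng):
    """Reparameterization: z = mu + sigma * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, I)), summed over the latent dimensions."""
    return -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=-1)


# Made-up encoder outputs for a batch of 3 inputs and a 2-D latent space.
mu = np.array([[0.0, 0.0], [1.0, -1.0], [0.5, 2.0]])
log_var = np.array([[0.0, 0.0], [0.0, 0.0], [-1.0, 0.5]])

z = sample_latent(mu, log_var, rng)       # what the decoder would receive
kl = kl_to_standard_normal(mu, log_var)   # one value per input

print("z shape:", z.shape)  # (3, 2)
print("KL per input:", kl)

# Generating new data: draw codes from the prior and pass them to a decoder.
z_new = rng.standard_normal((4, 2))
```

The first row has `mu = 0` and `log_var = 0`, which matches the prior exactly, so its KL term is zero. During training this term is typically added to the reconstruction loss, which pulls the latent distribution toward the prior and makes sampling `z_new` from that prior meaningful.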


For a detailed explanation of these autoencoders, refer to this [link](https://iq.opengenus.org/types-of-autoencoder/).



## Applications of Autoencoders

1. Autoencoders are primarily used for **dimensionality reduction** and **feature extraction** tasks.
2. They can be used to **generate images** that closely resemble the original input images.
3. They can **remove noise** from images and even compress them.
4. They can fill in missing or masked regions of an image, for example to remove watermarks; this is called **neural inpainting**.



## Key Takeaways

* Autoencoders are unsupervised neural network architectures that are trained to replicate their input at the output.

* The components of an autoencoder are the encoder, the compressed representation, and the decoder.
* The latent space is the vector space of compressed representations of higher-dimensional input vectors.
* They help in extracting useful relationships from the input data that we may be unaware of.
* Autoencoders are data-specific, unsupervised and lossy in nature.
* An undercomplete autoencoder has a code dimension smaller than the input dimension; an overcomplete autoencoder has a code dimension larger than the input dimension.
* In a sparse autoencoder, a sparsity constraint keeps only a small number of hidden units active for any given input.
* In a convolutional autoencoder, convolutional layers are used in both the encoder and the decoder.
* Variational autoencoders are generative models in which a latent vector is sampled from a learned distribution between the encoder and the decoder.


## References

- Papers
  - Pierre Baldi (2012), [Autoencoders, Unsupervised Learning, and Deep Architectures](http://proceedings.mlr.press/v27/baldi12a/baldi12a.pdf)
    - Refer to this paper to understand linear and non-linear autoencoders mathematically.

- Books
  - Ian Goodfellow, Yoshua Bengio, and Aaron Courville, [Deep Learning (Adaptive Computation and Machine Learning series)](https://www.amazon.com/Deep-Learning-Adaptive-Computation-Machine/dp/0262035618/ref=sr_1_1?ie=UTF8&qid=1472485235&sr=8-1&keywords=deep+learning+book)
    - Refer to [this chapter](https://www.deeplearningbook.org/contents/autoencoders.html) to read more about the mathematics of autoencoders.

- University Lectures
  - Jean-Pierre Briot, [Deep Learning Techniques for Music Generation Autoencoder](http://www-desir.lip6.fr/~briot/cours/unirio2/Slides/dlmg-4-autoencoder.pdf)
    - Check these slides to understand how autoencoders can be applied to real-world problems.


