# Denoising Autoencoders in PyTorch

The `16_denoising_autoencoders` notebook introduces denoising autoencoders, a variant of the autoencoder trained to remove noise from input data.

In this notebook, the focus is on preparing datasets with added noise, building the Encoder and Decoder models, and combining them to create a Denoising Autoencoder. It also covers training the model, visualizing the denoised outputs, and experimenting with different noise levels to observe how the model handles varying degrees of noise.

## Table of contents

1. [Understanding denoising autoencoders](#understanding-denoising-autoencoders)
2. [Setting up the environment](#setting-up-the-environment)
3. [Preparing the dataset and adding noise](#preparing-the-dataset-and-adding-noise)
4. [Building the Encoder model](#building-the-encoder-model)
5. [Building the Decoder model](#building-the-decoder-model)
6. [Combining Encoder and Decoder into a Denoising Autoencoder](#combining-encoder-and-decoder-into-a-denoising-autoencoder)
7. [Training the Denoising Autoencoder](#training-the-denoising-autoencoder)
8. [Visualizing denoised outputs](#visualizing-denoised-outputs)
9. [Experimenting with noise levels](#experimenting-with-noise-levels)
10. [Conclusion](#conclusion)

## Understanding denoising autoencoders

Denoising autoencoders (DAEs) are a variant of traditional autoencoders designed to reconstruct clean data from noisy inputs. In a standard autoencoder, the input also serves as the reconstruction target. A denoising autoencoder instead corrupts the input during training while keeping the original, noise-free data as the target. This forces the model to learn robust features rather than simply copying its input, which makes DAEs useful in real-world applications where data is often imperfect or corrupted.

### **Why denoising autoencoders?**

Denoising autoencoders offer several advantages over traditional autoencoders, especially when dealing with noisy or incomplete data. By learning to reconstruct clean data from noisy input, DAEs are able to:
- **Improve robustness**: The network becomes more resilient to small perturbations or noise in the input data, learning features that generalize well to unseen data.
- **Learn meaningful features**: In the process of denoising, the autoencoder learns more meaningful and robust representations of the data, focusing on the underlying structure rather than memorizing exact details.
- **Enhance data quality**: DAEs can be used to clean or preprocess data in fields such as image processing, speech recognition, and sensor data analysis, where noise is a common issue.

### **Key components of denoising autoencoders**

Denoising autoencoders have the same core structure as traditional autoencoders, consisting of two main components: the encoder and the decoder. However, the key difference lies in the corruption of the input during training.

#### **Encoder**
The encoder in a denoising autoencoder takes the noisy input and compresses it into a lower-dimensional latent representation. The encoder's task is to learn a compressed version of the data that still retains enough information for the decoder to reconstruct the clean version of the input.

#### **Decoder**
The decoder receives the latent representation from the encoder and attempts to reconstruct the original clean data. In a traditional autoencoder, the decoder tries to reproduce the input as it is. In a DAE, the decoder never sees the input directly: it must recover the original, noise-free version from a latent representation that was computed from the noisy input.

### **Noise corruption**

The core idea behind denoising autoencoders is the introduction of noise to the input during training. The noise corruption process can take various forms, such as:
- **Gaussian noise**: Randomly adding small amounts of noise to the input, altering pixel values or features in a subtle way.
- **Salt-and-pepper noise**: Randomly replacing some pixels or features with maximum or minimum values, simulating extreme corruption.
- **Dropout noise**: Randomly dropping or zeroing out some input features, similar to dropout regularization used in neural networks.

By corrupting the input data, the model is forced to learn representations that are invariant to noise. The model must effectively ignore the added noise and focus on the essential structure of the data to successfully reconstruct the original clean input.
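
The three corruption types listed above can be sketched in a few lines of PyTorch. This is a minimal illustration, assuming inputs are tensors scaled to $[0, 1]$; the function names and the default `std` and `prob` values are hypothetical choices, not taken from the notebook.

```python
import torch

def add_gaussian_noise(x, std=0.1):
    """Add zero-mean Gaussian noise, then clamp back into [0, 1]."""
    return (x + std * torch.randn_like(x)).clamp(0.0, 1.0)

def add_salt_and_pepper(x, prob=0.1):
    """Set roughly `prob` of the entries to 0 (pepper) or 1 (salt)."""
    u = torch.rand_like(x)
    noisy = x.clone()
    noisy[u < prob / 2] = 0.0                     # pepper
    noisy[(u >= prob / 2) & (u < prob)] = 1.0     # salt
    return noisy

def add_dropout_noise(x, prob=0.1):
    """Zero out roughly `prob` of the entries (masking noise)."""
    keep = (torch.rand_like(x) >= prob).to(x.dtype)
    return x * keep

torch.manual_seed(0)
x = torch.rand(4, 1, 28, 28)          # stand-in for a batch of clean images
x_gauss = add_gaussian_noise(x, std=0.2)
x_sp = add_salt_and_pepper(x, prob=0.1)
x_drop = add_dropout_noise(x, prob=0.3)
```

Clamping after adding Gaussian noise keeps the corrupted input in the same range as the clean target, which matters when the decoder ends in a sigmoid.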

### **Training process**

During training, the denoising autoencoder is presented with pairs of noisy inputs and clean targets. The noisy input is passed through the encoder, and the decoder reconstructs an output from the resulting latent representation. The network is trained to minimize the difference between this reconstruction and the original clean input, even though it only ever sees the noisy version.

This forces the model to learn how to filter out irrelevant noise and capture the true underlying features of the data. As a result, the network becomes more robust, both in terms of its learned representations and its ability to generalize to new, unseen data.

### **Advantages of denoising autoencoders**

Denoising autoencoders offer several key benefits:

- **Noise resilience**: DAEs are more resilient to noise and small perturbations in the input data. This is particularly useful in domains where data is prone to noise, such as image or audio processing.
- **Feature learning**: By learning to denoise, DAEs focus on capturing the underlying structure of the data, making them better at extracting meaningful and robust features.
- **Preprocessing tool**: DAEs can serve as an effective tool for cleaning data, improving the quality of input for downstream tasks such as classification, segmentation, or clustering.
- **Improved generalization**: Since DAEs learn to filter out noise and focus on key features, they tend to generalize better to new data, especially when the new data is slightly noisy or corrupted.

### **Applications of denoising autoencoders**

Denoising autoencoders have a wide range of applications in fields where noisy data is common. Some of the key applications include:

- **Image denoising**: DAEs are widely used in image processing tasks to remove noise from images, enhancing their quality for tasks like object recognition or segmentation.
- **Speech enhancement**: In speech recognition, DAEs can be used to filter out background noise from audio recordings, improving the clarity and accuracy of speech-to-text systems.
- **Data preprocessing**: DAEs can be employed as a preprocessing step to clean noisy datasets before feeding them into more complex models for tasks like classification or regression.
- **Medical image analysis**: In medical imaging, where data quality can be affected by artifacts or sensor noise, DAEs can be used to denoise images for more accurate diagnoses and analysis.

### **Limitations of denoising autoencoders**

While denoising autoencoders are powerful, they have some limitations:
- **Training complexity**: DAEs can require careful tuning of the noise level during training. Too much noise can make the task too difficult, while too little noise may not encourage the model to learn robust features.
- **Data dependency**: DAEs are trained on specific types of noise, so they may not generalize well to different types of noise that they haven't encountered during training.
- **Overfitting**: If the network is too complex or the training data is limited, DAEs can overfit, particularly if the noise pattern in the training data is not representative of real-world noise.

### **Maths**

#### **Encoder and decoder mappings**

A denoising autoencoder, like a standard autoencoder, consists of an encoder and a decoder. The encoder takes $ \tilde{x} $, a corrupted version of the clean input $ x $, and maps it to a latent space representation $ z $. The decoder then produces a reconstruction $ \hat{x} $ of the clean input from that latent representation. The encoding and decoding functions can be represented as:

- **Encoder**: The encoder maps the noisy input $ \tilde{x} $ to the latent representation $ z $:

  $$
  z = f(\tilde{x}) = \sigma(W_{\text{enc}} \tilde{x} + b_{\text{enc}})
  $$

  Where $ W_{\text{enc}} $ is the weight matrix, $ b_{\text{enc}} $ is the bias term, and $ \sigma $ is a non-linear activation function (such as ReLU or sigmoid).

- **Decoder**: The decoder takes the latent representation $ z $ and produces the reconstruction $ \hat{x} $, an estimate of the clean input $ x $:

  $$
  \hat{x} = g(z) = \sigma(W_{\text{dec}} z + b_{\text{dec}})
  $$

  Here, $ W_{\text{dec}} $ and $ b_{\text{dec}} $ are the weight matrix and bias of the decoder, and $ \hat{x} $ is the reconstruction of the original clean input $ x $.
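
The two mappings translate almost directly into PyTorch. The sketch below assumes a single linear layer on each side with sigmoid activations, matching the formulas above; the class name `TinyDAE` and the sizes (784 features, 32 latent units) are illustrative assumptions.

```python
import torch
from torch import nn

class TinyDAE(nn.Module):
    """One-layer encoder and decoder, mirroring z = f(x_tilde) and x_hat = g(z)."""

    def __init__(self, n_features=784, n_latent=32):
        super().__init__()
        self.enc = nn.Linear(n_features, n_latent)   # holds W_enc and b_enc
        self.dec = nn.Linear(n_latent, n_features)   # holds W_dec and b_dec

    def encode(self, x_tilde):
        return torch.sigmoid(self.enc(x_tilde))

    def decode(self, z):
        return torch.sigmoid(self.dec(z))

    def forward(self, x_tilde):
        return self.decode(self.encode(x_tilde))

torch.manual_seed(0)
model = TinyDAE()
x_tilde = torch.rand(8, 784)     # stand-in for a batch of noisy inputs
z = model.encode(x_tilde)        # latent representation, shape (8, 32)
x_hat = model(x_tilde)           # reconstruction, shape (8, 784)
```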

#### **Corruption process**

In a denoising autoencoder, the input data is corrupted before being passed to the encoder. The corruption process adds noise to the input $ x $, producing $ \tilde{x} $, the noisy version of the original input. The corruption process can take various forms:
- **Gaussian noise**: Adding random Gaussian noise to each feature of the input.
- **Salt-and-pepper noise**: Replacing random pixels or features with maximum or minimum values.
- **Dropout noise**: Randomly setting some features to zero.

The corrupted input $ \tilde{x} $ is used as the input to the encoder, while the clean input $ x $ serves as the target output during training.

#### **Loss function and reconstruction error**

The primary objective of a denoising autoencoder is to minimize the reconstruction error between the clean input $ x $ and the reconstructed output $ \hat{x} $. A common loss function used for this is **mean squared error (MSE)**, which calculates the average squared difference between the clean input and its reconstruction:

$$
L(x, \hat{x}) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{x}_i)^2
$$

Where:
- $ x_i $ is the $ i $-th feature of the original clean input,
- $ \hat{x}_i $ is the corresponding feature of the reconstructed output,
- $ n $ is the number of features.

For binary input data, or image data with pixel values normalized to the range $[0, 1]$, the **binary cross-entropy loss** can also be used. It requires each $ \hat{x}_i $ to lie between 0 and 1, as produced by a sigmoid output layer:

$$
L(x, \hat{x}) = -\sum_{i=1}^{n} \left[ x_i \log(\hat{x}_i) + (1 - x_i) \log(1 - \hat{x}_i) \right]
$$

The goal of the network is to minimize this loss during training, encouraging the model to learn how to map noisy inputs $ \tilde{x} $ to clean outputs $ \hat{x} $.
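
As a quick check, both losses can be computed by hand and compared with PyTorch's built-in functions. This sketch uses random tensors in place of real data; `reduction="sum"` is chosen so that the built-in cross-entropy matches the summed form written above.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.rand(784)                                # clean target in [0, 1]
x_hat = torch.rand(784).clamp(1e-6, 1 - 1e-6)      # reconstruction, kept away from 0 and 1

# Mean squared error: average of the squared differences.
mse_manual = ((x - x_hat) ** 2).mean()
mse_torch = F.mse_loss(x_hat, x)

# Binary cross-entropy, summed over features.
bce_manual = -(x * torch.log(x_hat) + (1 - x) * torch.log(1 - x_hat)).sum()
bce_torch = F.binary_cross_entropy(x_hat, x, reduction="sum")
```

Note the argument order: both functions take the prediction first and the target second.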

#### **Gradient updates and backpropagation**

The denoising autoencoder is trained using backpropagation to minimize the reconstruction error. The gradients of the loss function $ L(x, \hat{x}) $ with respect to the encoder and decoder parameters are computed, and the weights are updated accordingly using gradient descent:

$$
W_{\text{new}} = W_{\text{old}} - \eta \frac{\partial L}{\partial W}
$$

Where:
- $ W_{\text{old}} $ is the current weight matrix (either for the encoder or the decoder),
- $ \eta $ is the learning rate,
- $ \frac{\partial L}{\partial W} $ is the gradient of the loss function with respect to the weight matrix $ W $.

Both the encoder and decoder weights are updated during training to minimize the reconstruction loss and improve the model's ability to recover clean data from noisy inputs.
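
The update rule can be applied by hand to see what an optimizer does internally. The sketch below performs one plain gradient descent step on a small model; the layer sizes, noise level, and learning rate `eta` are arbitrary assumptions, and in practice `torch.optim` would handle this step.

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(20, 5), nn.Sigmoid(), nn.Linear(5, 20), nn.Sigmoid())
x = torch.rand(16, 20)                                      # clean targets
x_tilde = (x + 0.1 * torch.randn_like(x)).clamp(0.0, 1.0)   # noisy inputs
eta = 0.1                                                   # learning rate

loss_before = F.mse_loss(model(x_tilde), x)
loss_before.backward()                                      # fills p.grad with dL/dp

with torch.no_grad():
    for p in model.parameters():
        p -= eta * p.grad        # W_new = W_old - eta * dL/dW (same rule for biases)
        p.grad = None            # clear the gradient before the next step

loss_after = F.mse_loss(model(x_tilde), x)
```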

#### **Regularization in denoising autoencoders**

Denoising autoencoders inherently apply a form of regularization by adding noise to the input during training. This noise corruption forces the network to learn robust features that are invariant to minor perturbations in the data. However, additional regularization techniques, such as **L2 regularization** or **dropout**, can be applied to further enhance the model’s generalization abilities.

- **L2 regularization** (also known as weight decay) adds a penalty proportional to the sum of the squares of the weights to the loss function:

  $$
  L_{\text{regularized}} = L(x, \hat{x}) + \lambda \sum_{j} W_j^2
  $$

  Where $ \lambda $ is a regularization parameter that controls the strength of the weight penalty, and the sum runs over all individual weights $ W_j $ of the encoder and decoder.

- **Dropout** involves randomly deactivating neurons during training, which prevents the network from becoming overly reliant on any particular neuron and encourages the learning of more distributed representations.
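
Both techniques are straightforward to add in PyTorch. This is a sketch under assumed layer sizes and an assumed penalty strength `lam`; the helper `regularized_loss` is hypothetical and simply adds the explicit penalty to the reconstruction loss.

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Sequential(
    nn.Linear(784, 64), nn.ReLU(),
    nn.Dropout(p=0.2),                 # randomly zeroes hidden units in train mode
    nn.Linear(64, 784), nn.Sigmoid(),
)
lam = 1e-4

def regularized_loss(model, x_tilde, x, lam):
    """Return (reconstruction loss + lam * sum of squared weights, reconstruction loss)."""
    recon = F.mse_loss(model(x_tilde), x)
    l2 = sum((p ** 2).sum() for name, p in model.named_parameters()
             if name.endswith("weight"))
    return recon + lam * l2, recon

x = torch.rand(8, 784)
x_tilde = (x + 0.2 * torch.randn_like(x)).clamp(0.0, 1.0)
total, recon = regularized_loss(model, x_tilde, x, lam)

# Alternative: let the optimizer apply the penalty. SGD's weight_decay adds
# weight_decay * p to each gradient, i.e. the gradient of (weight_decay / 2) * sum(p^2),
# and it is applied to biases as well as weights.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=2 * lam)
```

Dropout is only active in training mode; calling `model.eval()` disables it for validation and inference.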

#### **Latent space and feature learning**

The encoder compresses the noisy input $ \tilde{x} $ into a lower-dimensional latent representation $ z $. This latent space is designed to capture the most important features of the input, even in the presence of noise. The decoder then attempts to reconstruct the original input $ x $ from this latent representation.

The size of the latent space, or the number of neurons in the bottleneck layer, controls the degree of compression. A smaller latent space forces the model to focus on the most important features, while a larger latent space allows the network to capture more detail but may risk overfitting.

#### **Applications of reconstruction error**

In denoising autoencoders, the reconstruction error $ L(x, \hat{x}) $ can be used as a measure of how well the model is performing. A low reconstruction error indicates that the model is effectively denoising the input, while a high reconstruction error may indicate that the input data contains patterns or features that the model hasn’t learned to recover accurately.

Reconstruction error can also be applied in anomaly detection tasks. When the model is trained on normal data, it learns to reconstruct the typical patterns of that data. When presented with anomalous data, the model struggles to reconstruct it, leading to higher reconstruction error, which signals an anomaly.
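
One possible way to turn this idea into code is to compute a per-sample reconstruction error and flag samples above a threshold. The sketch below assumes a threshold set at the 95th percentile of errors on normal data, which is an illustrative choice; the untrained model and random tensors only demonstrate the mechanics.

```python
import torch
from torch import nn

@torch.no_grad()
def reconstruction_errors(model, x):
    """Mean squared reconstruction error for each sample in the batch."""
    model.eval()
    x_hat = model(x)
    return ((x - x_hat) ** 2).flatten(start_dim=1).mean(dim=1)

torch.manual_seed(0)
# Placeholder model; in practice this would be a trained denoising autoencoder.
model = nn.Sequential(nn.Linear(784, 32), nn.ReLU(), nn.Linear(32, 784), nn.Sigmoid())
normal_batch = torch.rand(100, 784)    # stand-in for held-out normal data
new_batch = torch.rand(10, 784)        # stand-in for data to be screened

threshold = torch.quantile(reconstruction_errors(model, normal_batch), 0.95)
is_anomaly = reconstruction_errors(model, new_batch) > threshold
```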

## Setting up the environment


##### **Q1: How do you install the necessary libraries for building and training a denoising autoencoder in PyTorch?**


##### **Q2: How do you import the required modules for data loading, model building, and noise addition in PyTorch?**


##### **Q3: How do you set up the environment to use a GPU for training the denoising autoencoder, and how do you fall back to CPU if necessary?**


##### **Q4: How do you set random seeds in PyTorch to ensure reproducibility in denoising autoencoder training?**
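
A common pattern for device selection and seeding is sketched below. The helper name `set_seed` and the seed value are assumptions for illustration; if NumPy is used for noise generation, it would need its own seed as well.

```python
import random
import torch

def set_seed(seed=42):
    """Seed Python's and PyTorch's random number generators."""
    random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

set_seed(42)
a = torch.rand(3)
set_seed(42)
b = torch.rand(3)    # identical to `a` because the seed was reset
```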

## Preparing the dataset and adding noise


##### **Q5: How do you load a dataset like MNIST or CIFAR-10 using `torchvision.datasets` in PyTorch?**


##### **Q6: How do you apply transformations such as normalization to the dataset to prepare it for training?**


##### **Q7: How do you add Gaussian noise to the dataset, and how do you ensure that it doesn't exceed a certain noise level?**


##### **Q8: How do you create DataLoaders in PyTorch to handle both the noisy input data and the clean target data?**
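
One way to serve noisy inputs together with clean targets is to wrap an existing dataset so that each item becomes a `(noisy, clean)` pair. The sketch below is an assumption about how this could be done: `NoisyDataset` is a hypothetical class, and a random `TensorDataset` stands in for MNIST so the example runs without downloading anything.

```python
import torch
from torch.utils.data import DataLoader, Dataset, TensorDataset

class NoisyDataset(Dataset):
    """Wraps a dataset of (image, label) items and yields (noisy, clean) pairs."""

    def __init__(self, base, std=0.3):
        self.base = base
        self.std = std

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        clean = self.base[idx][0]                     # drop the label
        noisy = clean + self.std * torch.randn_like(clean)
        return noisy.clamp(0.0, 1.0), clean           # keep values in [0, 1]

torch.manual_seed(0)
# Stand-in for a torchvision dataset with images already scaled to [0, 1].
base = TensorDataset(torch.rand(64, 1, 28, 28), torch.zeros(64, dtype=torch.long))
loader = DataLoader(NoisyDataset(base, std=0.3), batch_size=16, shuffle=True)

noisy_batch, clean_batch = next(iter(loader))
```

Because the noise is drawn inside `__getitem__`, each epoch sees a fresh corruption of the same clean image.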

## Building the Encoder model


##### **Q9: How do you define the architecture of the Encoder model using PyTorch’s `nn.Module`?**


##### **Q10: How do you implement the forward pass of the Encoder to map noisy inputs into a latent representation?**


##### **Q11: How do you experiment with different numbers of hidden layers in the Encoder and observe the effects on model performance?**

## Building the Decoder model


##### **Q12: How do you define the architecture of the Decoder model using PyTorch’s `nn.Module`?**


##### **Q13: How do you implement the forward pass of the Decoder to map the latent representation back to a denoised version of the input?**


##### **Q14: How do you apply the appropriate activation function in the Decoder to ensure the output data is in the correct range?**

## Combining Encoder and Decoder into a Denoising Autoencoder


##### **Q15: How do you combine the Encoder and Decoder models into a single denoising autoencoder architecture in PyTorch?**


##### **Q16: How do you implement the forward pass of the denoising autoencoder to take noisy input and produce a clean, denoised output?**


##### **Q17: How do you verify that the input and output dimensions match to ensure the denoising autoencoder reconstructs the data correctly?**
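
The sketch below shows one possible convolutional design for 28×28 grayscale images, along with a shape check on a dummy batch. The class names, channel counts, and kernel settings are illustrative assumptions rather than the notebook's architecture.

```python
import torch
from torch import nn

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),   # 28 -> 14
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),  # 14 -> 7
        )

    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1),
            nn.ReLU(),                                                         # 7 -> 14
            nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1, output_padding=1),
            nn.Sigmoid(),                                                      # 14 -> 28
        )

    def forward(self, z):
        return self.net(z)

class DenoisingAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = Encoder()
        self.decoder = Decoder()

    def forward(self, x_noisy):
        return self.decoder(self.encoder(x_noisy))

torch.manual_seed(0)
model = DenoisingAutoencoder()
x_noisy = torch.rand(4, 1, 28, 28)
latent = model.encoder(x_noisy)
output = model(x_noisy)
assert output.shape == x_noisy.shape, "reconstruction must match the input shape"
```

The final sigmoid keeps outputs in $(0, 1)$, which suits targets scaled to $[0, 1]$.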

## Training the Denoising Autoencoder


##### **Q18: How do you define the loss function (e.g., Mean Squared Error) to measure the difference between the clean and denoised outputs in PyTorch?**


##### **Q19: How do you configure an optimizer (e.g., Adam) to update the denoising autoencoder's parameters during training?**


##### **Q20: How do you implement a training loop that performs forward pass, loss calculation, and backpropagation for the denoising autoencoder?**


##### **Q21: How do you monitor and log the training loss over epochs to ensure the model is learning to denoise effectively?**
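
A minimal training loop covering loss, optimizer, backpropagation, and per-epoch logging might look like the following. It is a sketch on synthetic data with a small fully connected model; the learning rate, noise level, and number of epochs are assumptions chosen only so the example runs quickly.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Synthetic stand-in for real data: 256 samples with low-dimensional structure.
clean = torch.sigmoid(torch.randn(256, 4) @ torch.randn(4, 64))
loader = DataLoader(TensorDataset(clean), batch_size=32, shuffle=True)

model = nn.Sequential(
    nn.Linear(64, 16), nn.ReLU(),
    nn.Linear(16, 64), nn.Sigmoid(),
).to(device)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

history = []                                   # average training loss per epoch
for epoch in range(20):
    model.train()
    running = 0.0
    for (x,) in loader:
        x = x.to(device)
        x_tilde = (x + 0.2 * torch.randn_like(x)).clamp(0.0, 1.0)   # corrupt the input
        optimizer.zero_grad()
        loss = criterion(model(x_tilde), x)    # compare reconstruction with the CLEAN target
        loss.backward()
        optimizer.step()
        running += loss.item() * x.size(0)
    history.append(running / len(loader.dataset))
    print(f"epoch {epoch + 1:2d}  loss {history[-1]:.4f}")
```

The key detail is that the loss compares the model output with the clean `x`, not with the noisy `x_tilde` that was fed in.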

## Visualizing denoised outputs


##### **Q22: How do you visualize the noisy input, clean target, and denoised output side by side to observe the model’s performance?**


##### **Q23: How do you save and display the denoised images from the validation set after each training epoch?**

## Experimenting with noise levels


##### **Q24: How do you experiment with different levels of Gaussian noise and observe how the autoencoder's performance changes?**


##### **Q25: How do you modify the noise type (e.g., from Gaussian to salt-and-pepper noise) and evaluate how well the model performs on different noise types?**


##### **Q26: How do you measure and compare the denoising autoencoder’s performance when trained on light versus heavy noise?**
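
To compare noise levels on a common scale, one option is the peak signal-to-noise ratio (PSNR), defined as $10 \log_{10}(\text{MAX}^2 / \text{MSE})$. The sketch below only measures how much each noise level degrades the input; the `psnr` helper and the chosen `std` values are assumptions, and with a trained model the same helper could be applied to its outputs.

```python
import math
import torch

def psnr(clean, estimate, max_val=1.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the clean data."""
    mse = torch.mean((clean - estimate) ** 2).item()
    return float("inf") if mse == 0 else 10.0 * math.log10(max_val ** 2 / mse)

torch.manual_seed(0)
clean = torch.rand(16, 1, 28, 28)      # stand-in for a batch of clean images
results = {}
for std in (0.1, 0.3, 0.5):
    noisy = (clean + std * torch.randn_like(clean)).clamp(0.0, 1.0)
    results[std] = psnr(clean, noisy)
    # With a trained model, psnr(clean, model(noisy)) would be recorded here as well.
    print(f"std={std:.1f}  input PSNR={results[std]:.2f} dB")
```

Comparing the PSNR of the noisy input with the PSNR of the denoised output at each level gives a simple measure of how much the model improves the signal.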

## Conclusion