# Autoencoders

Autoencoders have a backbone similar to that of a CNN classifier (called the _encoder_ in this context) that produces a feature vector (called the _embedding_ in this context). However, they replace the fully-connected layers (the head) with a decoder stage whose purpose is to reconstruct the input image starting from the embedding.

Uses of autoencoders:
- Compress data
- Denoise data
- Find outliers (do anomaly detection) in a dataset
- Do inpainting (i.e., reconstruct missing areas of an image or a vector)
- With some modifications, we can use autoencoders as generative models - models capable of generating new images

Autoencoders are a form of **unsupervised learning**: they learn without labels, because the target output is the input itself.

A common loss function for autoencoders is the mean squared error (MSE), computed pixel by pixel between the input image and the output (reconstructed) image.
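As a quick check of what this loss computes, the sketch below averages the squared pixel differences by hand and compares the result with PyTorch's `nn.MSELoss`, which averages over all elements by default (`reduction="mean"`). The random tensors are stand-ins for a batch of images and their reconstructions.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical batch of 4 grayscale 28x28 images and their reconstructions
images = torch.rand(4, 1, 28, 28)
reconstructions = torch.rand(4, 1, 28, 28)

# Pixel-wise squared error, averaged over every pixel of every image
manual_mse = ((reconstructions - images) ** 2).mean()

# nn.MSELoss uses reduction="mean" by default, i.e. the same average
criterion = nn.MSELoss()
library_mse = criterion(reconstructions, images)

print(manual_mse.item(), library_mse.item())
```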

## Linear autoencoders

In a linear autoencoder, the encoder and decoder are simple Multi-Layer Perceptrons (stacks of fully-connected layers). The units of the layer between the encoder and the decoder form the _compressed representation_ (also called _embedding_).

Since the images are normalized between 0 and 1, you will need to use a **sigmoid activation on the output layer** to get values that match this input value range.

Example of very simple autoencoder with two linear layers:
```python
from torch import nn


class Autoencoder(nn.Module):

    def __init__(self, encoding_dim):
        super(Autoencoder, self).__init__()
        ## encoder: 784 pixels -> encoding_dim-dimensional embedding ##
        self.encoder = nn.Sequential(
            nn.Linear(28*28, encoding_dim),
            nn.ReLU(),
            nn.BatchNorm1d(encoding_dim)
        )

        ## decoder: embedding -> 784 pixel values in [0, 1] ##
        self.decoder = nn.Sequential(
            nn.Linear(encoding_dim, 28*28),
            nn.Sigmoid()
        )

        # Flatten (batch_size, 1, 28, 28) images to (batch_size, 784) first
        self.auto_encoder = nn.Sequential(
            nn.Flatten(),
            self.encoder,
            self.decoder
        )

    def forward(self, x):
        # Encode and decode; the sigmoid in the decoder scales the output to [0, 1]
        decoded = self.auto_encoder(x)

        # Reshape the output as an image
        # remember that the shape should be (batch_size, channel_count, height, width)
        return decoded.reshape((x.shape[0], 1, 28, 28))
```

The main parts of autoencoders:
- the encoder: takes the input image and encodes it into a 1D vector (the embedding)
- the decoder: takes the embedding and generates (reconstructs) an image from it
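A single training step shows how the two parts fit together and why no labels are needed: the loss compares the reconstruction with the input itself. This is a minimal sketch; the layers mirror the linear autoencoder above without the `BatchNorm1d` layer, and the Adam optimizer, learning rate, and random batch are assumptions for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)

encoding_dim = 32  # assumed embedding size (784 pixels -> 32 values)

# Encoder: flatten the image, then compress it into the embedding
encoder = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, encoding_dim), nn.ReLU())
# Decoder: expand the embedding back to 784 pixel values in [0, 1]
decoder = nn.Sequential(nn.Linear(encoding_dim, 28 * 28), nn.Sigmoid())

params = list(encoder.parameters()) + list(decoder.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)
criterion = nn.MSELoss()

# Hypothetical batch of 16 images normalized to [0, 1]; no labels involved
images = torch.rand(16, 1, 28, 28)

embedding = encoder(images)                                # (16, 32)
reconstruction = decoder(embedding).reshape(images.shape)  # (16, 1, 28, 28)

# The target is the input itself: this is what makes the training unsupervised
loss = criterion(reconstruction, images)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```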

## Anomaly detection

An anomaly is a data element that is an outlier with respect to the rest of the dataset.

How autoencoders work for anomaly detection:

*Autoencoders compress the visual information contained in images into a compact, latent representation (the embedding) that has a much lower dimensionality than the input image. By asking the decoder to reconstruct the input from this compact representation, we force the network to learn an embedding that stores meaningful information about the content of the image. For example, in the solution I compressed 28 x 28 images (so 784 pixels) into a vector of only 32 elements, but I was still able to reconstruct most of the images very well.*

*When applying it to a test set that the network has never seen, most images were reconstructed well, but some of them were not. This means that the compression that the network has learned on the training dataset works well for the vast majority of the examples in this new set, but not for these anomalous ones. These anomalies have characteristics that the network is not well equipped to reconstruct, and therefore the decoder cannot recreate them faithfully during decoding.*

*By scoring each example with its reconstruction loss, we can identify anomalies simply by taking the examples with the highest loss.*
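The scoring step can be sketched as follows: compute one reconstruction loss per image (using `reduction="none"` so the errors are not averaged across the batch), then take the top-k scores. The helper `reconstruction_scores` is hypothetical, and so is the "model": a `nn.Hardtanh` that clips pixels above 0.5 stands in for a trained autoencoder, so that dark images are reconstructed perfectly and bright ones behave as anomalies.

```python
import torch
from torch import nn

torch.manual_seed(0)


def reconstruction_scores(model, images):
    """Return one reconstruction loss (MSE) per image."""
    model.eval()
    with torch.no_grad():
        reconstructions = model(images)
    # Keep per-pixel errors, then average them separately for each image
    per_pixel = nn.functional.mse_loss(reconstructions, images, reduction="none")
    return per_pixel.flatten(start_dim=1).mean(dim=1)


# Stand-in for a trained autoencoder (hypothetical): reproduces pixel values
# up to 0.5 exactly and clips anything brighter
model = nn.Hardtanh(min_val=0.0, max_val=0.5)

images = torch.rand(10, 1, 28, 28) * 0.5  # "normal" images in [0, 0.5)
images[3] = 1.0                           # two bright, anomalous images
images[7] = 0.9

scores = reconstruction_scores(model, images)
top_scores, anomaly_idx = torch.topk(scores, k=2)
print(sorted(anomaly_idx.tolist()))  # → [3, 7]
```

With a real autoencoder, `model` would be the trained network and `images` a batch from the test set.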


## Upsampling
= Resizing an image to increase its size

Instead of using linear layers to decode the image, we could also upsample a compact representation (embedding) into a full resolution image. For example, we could use a Transposed Convolutional Layer, which can learn how to best upsample an image.

Upsampling techniques:
- Transposed convolution: A layer that intelligently upsamples an image by using a learnable convolutional kernel
- Nearest Neighbors upsampling: An upsampling technique that copies the value from the nearest pixel
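Nearest-neighbor upsampling has no learnable weights, which makes it easy to see on a tiny example. The sketch below uses PyTorch's `nn.Upsample` with `mode="nearest"` on a hypothetical 2 x 2 single-channel image: each pixel is copied into a 2 x 2 block.

```python
import torch
from torch import nn

# A tiny 1-channel 2x2 "image" with shape (batch, channels, height, width)
x = torch.tensor([[[[1.0, 2.0],
                    [3.0, 4.0]]]])

# Nearest-neighbor upsampling: each output pixel copies the nearest input pixel
nearest = nn.Upsample(scale_factor=2, mode="nearest")
upsampled = nearest(x)  # shape (1, 1, 4, 4)

print(upsampled.squeeze().tolist())
# → [[1.0, 1.0, 2.0, 2.0], [1.0, 1.0, 2.0, 2.0], [3.0, 3.0, 4.0, 4.0], [3.0, 3.0, 4.0, 4.0]]
```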

### Transposed Convolutional Layer

Transposed Convolution performs an upsampling of the input with learned weights. It can be seen as the counterpart of Max Pooling: whereas a Max Pooling operation with a 2 x 2 window and a stride of 2 halves the height and width of the input, a Transposed Convolution with a 2 x 2 filter and a stride of 2 doubles them.
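Both claims follow from the output-size formulas in the PyTorch documentation for `ConvTranspose2d` and `MaxPool2d`, written here for the height (the width is identical), with kernel size $k$, stride $s$, padding $p$, dilation $d$ and output padding $p_{\text{out}}$. For the transposed convolution:

$$
H_{\text{out}} = (H_{\text{in}} - 1)\,s - 2p + d\,(k - 1) + p_{\text{out}} + 1
$$

With $k = 2$, $s = 2$, $p = 0$, $d = 1$, $p_{\text{out}} = 0$:

$$
H_{\text{out}} = 2\,(H_{\text{in}} - 1) + 1 + 1 = 2\,H_{\text{in}}
$$

For Max Pooling with the same $k = 2$, $s = 2$, $p = 0$, $d = 1$:

$$
H_{\text{out}} = \left\lfloor \frac{H_{\text{in}} + 2p - d\,(k - 1) - 1}{s} + 1 \right\rfloor = \left\lfloor \frac{H_{\text{in}} - 2}{2} + 1 \right\rfloor = \left\lfloor \frac{H_{\text{in}}}{2} \right\rfloor
$$

So a 28 x 28 feature map is pooled to 14 x 14 and upsampled back to 28 x 28.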

Example code in PyTorch:
```python
# input_ch / output_ch: number of input / output channels
unpool = nn.ConvTranspose2d(input_ch, output_ch, kernel_size=2, stride=2)  # doubles height and width
```
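To check the halving and doubling empirically, the sketch below passes a hypothetical 3-channel 28 x 28 feature map through a 2 x 2 Max Pooling and then a 2 x 2, stride-2 transposed convolution. It also counts the layer's parameters to show that, unlike pooling or nearest-neighbor upsampling, the transposed convolution has learnable weights.

```python
import torch
from torch import nn

# Hypothetical feature map: batch of 1, 3 channels, 28x28
x = torch.rand(1, 3, 28, 28)

pool = nn.MaxPool2d(kernel_size=2, stride=2)
unpool = nn.ConvTranspose2d(in_channels=3, out_channels=1, kernel_size=2, stride=2)

down = pool(x)     # (1, 3, 14, 14): height and width halved
up = unpool(down)  # (1, 1, 28, 28): height and width doubled

# Learnable parameters: weight of shape (3, 1, 2, 2) = 12 values, plus 1 bias
n_params = sum(p.numel() for p in unpool.parameters())

print(down.shape, up.shape, n_params)
```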

## Convolutional Autoencoder

Example code in PyTorch:
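```python
from torch import nn


class Autoencoder(nn.Module):
    def __init__(self, encoding_dim):
        super(Autoencoder, self).__init__()
        # Note: encoding_dim is not used here; the embedding size is fixed by
        # the layers below (3 channels x 14 x 14 for a 28 x 28 input)

        ## encoder: (batch, 1, 28, 28) -> (batch, 3, 14, 14) ##
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 3, 3, padding=1),  # 3 feature maps, size unchanged
            nn.ReLU(),
            nn.MaxPool2d(2, 2)              # halves height and width
        )

        ## decoder: (batch, 3, 14, 14) -> (batch, 1, 28, 28) ##
        self.decoder = nn.Sequential(
            # Upsample back to the input size, reversing the Max Pooling downsampling
            nn.ConvTranspose2d(3, 1, 2, stride=2),
            nn.Sigmoid()                    # scale outputs to [0, 1]
        )

        self.auto_encoder = nn.Sequential(
            self.encoder,
            self.decoder
        )

    def forward(self, x):
        # No flattening: the layers work directly on (batch, channels, height, width)
        return self.auto_encoder(x)
```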

Note that:
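- we don't have to flatten the image (as we did with the linear autoencoder), because convolutional layers operate directly on the 2D image and keep its spatial information
- the `ConvTranspose2d` layer reverses the size reduction of `MaxPool2d` (14 x 14 back to 28 x 28); it does not recover the exact values that pooling discarded, but learns how to upsample the feature maps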
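A shape trace makes both notes concrete. The sketch rebuilds the same encoder and decoder layers as the convolutional autoencoder above (so it runs on its own) and feeds it a hypothetical batch of 28 x 28 images. The embedding stays a spatial feature map, and with this toy configuration the compression is modest: 588 values per image instead of 784.

```python
import torch
from torch import nn

# Same layers as the convolutional autoencoder example
encoder = nn.Sequential(nn.Conv2d(1, 3, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2))
decoder = nn.Sequential(nn.ConvTranspose2d(3, 1, 2, stride=2), nn.Sigmoid())

# Hypothetical batch of 8 grayscale 28x28 images, fed in without flattening
images = torch.rand(8, 1, 28, 28)

embedding = encoder(images)          # (8, 3, 14, 14): still a spatial feature map
reconstruction = decoder(embedding)  # (8, 1, 28, 28): same shape as the input

# Values per image: 3 * 14 * 14 in the embedding vs 28 * 28 in the input
print(embedding[0].numel(), images[0].numel())  # → 588 784
```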

## Denoising

= the task of removing noise from an image by reconstructing a denoised image.

A denoising autoencoder is a normal autoencoder, but trained in a specific way.

Steps for training a denoising autoencoder:
- loop over each batch in the training data loader
- add noise to the images in the batch
- compute the prediction from the network, i.e. the reconstructed images
- compare the reconstructed images with the input (uncorrupted) images using a loss like `nn.MSELoss()`
- perform backpropagation
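The steps above can be sketched as a short training loop. Several details are assumptions for illustration, not part of the original recipe: random tensors stand in for a real dataset, the noise is Gaussian with a hypothetical `noise_factor` of 0.5 and is clamped back to [0, 1], and the model is a small convolutional autoencoder with the same layers as the example above, trained with Adam.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Stand-in data: 64 random "images" in [0, 1]; replace with a real dataset
clean_images = torch.rand(64, 1, 28, 28)
loader = DataLoader(TensorDataset(clean_images), batch_size=16, shuffle=True)

# Small convolutional autoencoder (same layers as the example above)
model = nn.Sequential(
    nn.Conv2d(1, 3, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
    nn.ConvTranspose2d(3, 1, 2, stride=2), nn.Sigmoid(),
)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
noise_factor = 0.5  # assumed noise strength

model.train()
for epoch in range(2):
    epoch_loss = 0.0
    for (images,) in loader:
        # Corrupt the batch with Gaussian noise, keeping pixels in [0, 1]
        noisy = torch.clamp(images + noise_factor * torch.randn_like(images), 0.0, 1.0)

        optimizer.zero_grad()
        reconstructed = model(noisy)             # predict from the *noisy* images
        loss = criterion(reconstructed, images)  # compare with the *clean* images
        loss.backward()
        optimizer.step()

        epoch_loss += loss.item() * images.size(0)
    print(f"epoch {epoch}: loss {epoch_loss / len(loader.dataset):.4f}")
```

The key design choice is in the two commented lines: the network only ever sees the corrupted input, but is scored against the uncorrupted one, so it has to learn to remove the noise.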


## Glossary

Autoencoder: A neural network architecture consisting of an encoder part, which takes an input and compresses it into a low-dimensional embedding vector, and a decoder part, which takes the embedding and tries to reconstruct the input image from it.

Transposed Convolution: A special type of convolution that can be used to intelligently upsample an image or a feature map.

Denoising: The task of taking an image corrupted by noise and generating a version of the image where the noise has been removed.

Variational autoencoder (VAE): An extension of the idea of autoencoders that transforms them into proper generative models.