# MLP autoencoders

An autoencoder is a network which consists of two main parts:

- an _encoder_, which maps the data to a compact representation in a latent space.
- a _decoder_, which reconstructs the input data from the latent representation.

In mathematical terms, a data point $\mathbf{x} \in \mathbb{R}^N$ is transformed into a latent representation $\mathbf{z} \in \mathbb{R}^L$, where $L \ll N$. Then, the latent representation is passed to the decoder, which produces an approximation $\hat{\mathbf{x}} \in \mathbb{R}^N$ of the input data, i.e., such that $\hat{\mathbf{x}} \approx \mathbf{x}$.
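As a compact way to write this down, the encoder and decoder can be seen as two functions. The symbols $f$ and $g$ are introduced here for illustration only and are not used elsewhere in this lab:

$$
\mathbf{z} = f(\mathbf{x}), \qquad \hat{\mathbf{x}} = g(\mathbf{z}) = g(f(\mathbf{x})), \qquad f: \mathbb{R}^N \to \mathbb{R}^L, \quad g: \mathbb{R}^L \to \mathbb{R}^N.
$$

For the MNIST images used below, $N = 28 \times 28 = 784$, and the encoder described later produces a latent representation of size $L = 32$.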

Autoencoders are used in many applications. For instance, in image processing, they are used for denoising, compression, and generative modeling (image synthesis and transformation). They can also be used for transfer learning: an autoencoder is first trained to learn a latent representation of the data, and this representation is then used for other classification or regression tasks.

<center><a href="https://emkademy.medium.com/1-first-step-to-generative-deep-learning-with-autoencoders-22bd41e56d18">
    <img src="https://miro.medium.com/max/772/1*ztZn098tDQsnD5J6v1eNuQ.png" width="600"></a></center>


In [1]:
import torch
import torch.nn as nn
import torchvision
import torchvision.datasets as datasets
from torch.utils.data import Dataset, DataLoader, random_split, Subset
import numpy as np
import matplotlib.pyplot as plt
import copy

## Dataset


In [None]:
# Define the data repository
data_dir = "data/"

# Load the MNIST dataset
data_transforms = torchvision.transforms.Compose(
    [
        torchvision.transforms.ToTensor(),
        torchvision.transforms.Normalize((0.1307,), (0.3081,)),
    ]
)
train_data = torchvision.datasets.MNIST(
    data_dir, train=True, download=True, transform=data_transforms
)
test_data = torchvision.datasets.MNIST(
    data_dir, train=False, download=True, transform=data_transforms
)
num_classes = len(train_data.classes)

# Take a subset of the training data (the first 400 images)
train_data = Subset(train_data, torch.arange(400))

# We define the train and validation sets and dataloaders as in the previous script
n_train_examples = int(len(train_data) * 0.8)
n_valid_examples = len(train_data) - n_train_examples
train_data, valid_data = random_split(train_data, [n_train_examples, n_valid_examples])

# Create the dataloaders
batch_size = 8
train_dataloader = DataLoader(train_data, shuffle=True, batch_size=batch_size)
valid_dataloader = DataLoader(valid_data, batch_size=batch_size)


In [3]:
# Get a batch of images from the train dataloader
batch_example = next(iter(train_dataloader))
image_batch_example = batch_example[0]

## The encoder

First, let us write the encoder. We consider a 3-layer encoder, where each layer consists of a Linear part and a ReLU activation function. The first layer goes from size `input_size` to 128, the second layer from 128 to 64, and the third layer from 64 to 32.
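The description above could be sketched as follows. This is one possible layout, not the reference solution: the class name `MLPEncoderSketch` and the random tensor `fake_batch` (a stand-in for a batch of MNIST images) are hypothetical names chosen for illustration.

```python
import torch
import torch.nn as nn


class MLPEncoderSketch(nn.Module):
    """Three {Linear + ReLU} layers: input_size -> 128 -> 64 -> 32."""

    def __init__(self, input_size):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(input_size, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 32),
            nn.ReLU(),
        )

    def forward(self, x):
        return self.layers(x)


# Hypothetical stand-in for a batch of 8 MNIST images (1 channel, 28 x 28 pixels)
fake_batch = torch.randn(8, 1, 28, 28)

# Vectorize each image: (8, 1, 28, 28) -> (8, 784)
fake_batch_vec = fake_batch.reshape(fake_batch.shape[0], -1)

# The input size is read from the vectorized data rather than hard-coded
encoder_sketch = MLPEncoderSketch(input_size=fake_batch_vec.shape[1])
z_sketch = encoder_sketch(fake_batch_vec)
print(fake_batch_vec.shape, z_sketch.shape)
```

Grouping the layers in an `nn.Sequential` keeps `forward` to a single line; declaring each layer as a separate attribute would work equally well.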


In [None]:
# TO DO: write the encoder class ('__init__' and 'forward' methods)


class MLPencoder(nn.Module):
    def __init__(self, input_size):
        super(MLPencoder, self).__init__()

    def forward(self, x):
        return z

In [None]:
# TO DO:
# - instantiate an encoder (get the proper input size)
# - vectorize image_batch_example into image_batch_example_vec
# - apply the encoder to image_batch_example_vec to produce the latent representation 'z'
# - print the size of z, and the size of the image_batch_example_vec


You can see that the latent representation has a significantly smaller dimension than the original data (vectorized image).

## The decoder

The decoder has a structure similar to that of the encoder (3 {Linear + activation} layers), but the sizes are flipped: the first layer goes from 32 to 64, the second layer from 64 to 128, and the last layer from 128 back to the input size. The first and second layers use a ReLU activation, but the last layer uses a Sigmoid: this forces the output to be in the range $[0,1]$, which is the range of the pixel values returned by `ToTensor`. Note that the `Normalize` transform applied when loading the dataset shifts and rescales these values, so the targets are not strictly confined to $[0,1]$.
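A minimal sketch of such a decoder is given below, under the same caveat: it is one possible layout rather than the reference solution. The class name `MLPDecoderSketch` and the random latent batch `fake_latent` are hypothetical. As in the stub that follows, `input_size` denotes the size of the original (vectorized) data, that is, the size of the decoder output.

```python
import torch
import torch.nn as nn


class MLPDecoderSketch(nn.Module):
    """Three layers: 32 -> 64 -> 128 -> input_size, with a final Sigmoid."""

    def __init__(self, input_size):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(32, 64),
            nn.ReLU(),
            nn.Linear(64, 128),
            nn.ReLU(),
            nn.Linear(128, input_size),
            nn.Sigmoid(),  # squashes every output value into [0, 1]
        )

    def forward(self, z):
        return self.layers(z)


# Hypothetical latent batch: 8 latent vectors of size 32
fake_latent = torch.randn(8, 32)

decoder_sketch = MLPDecoderSketch(input_size=28 * 28)
y_sketch = decoder_sketch(fake_latent)
print(y_sketch.shape, y_sketch.min().item(), y_sketch.max().item())
```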


In [None]:
# TO DO: write the decoder class ('__init__' and 'forward' methods)


class MLPdecoder(nn.Module):
    def __init__(self, input_size):
        super(MLPdecoder, self).__init__()

    def forward(self, z):
        return y

In [None]:
# TO DO:
# - instantiate a decoder
# - apply it to the latent representation z computed before
# - print the size of the output 'y' of the decoder: it should be the same as the input data 'image_batch_example_vec'


## The autoencoder main module

Finally, we can write the autoencoder module: it consists of the encoder and the decoder applied sequentially.
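The composition can be sketched as below. To keep the example self-contained, the encoder and decoder are written inline as `nn.Sequential` stand-ins instead of reusing the classes from the previous sections; the names `MLPAutoencoderSketch` and `model_sketch` are hypothetical. The last lines show one common way to count trainable parameters.

```python
import torch
import torch.nn as nn


class MLPAutoencoderSketch(nn.Module):
    """Encoder (input_size -> 32) followed by decoder (32 -> input_size)."""

    def __init__(self, input_size):
        super().__init__()
        # Stand-in for the encoder class: input_size -> 128 -> 64 -> 32
        self.encoder = nn.Sequential(
            nn.Linear(input_size, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
        )
        # Stand-in for the decoder class: 32 -> 64 -> 128 -> input_size
        self.decoder = nn.Sequential(
            nn.Linear(32, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, input_size), nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)     # compress to the latent representation
        return self.decoder(z)  # reconstruct an approximation of x


model_sketch = MLPAutoencoderSketch(input_size=28 * 28)

# Output has the same shape as the (vectorized) input
x_sketch = torch.randn(8, 28 * 28)
y_out_sketch = model_sketch(x_sketch)

# Count the trainable parameters (weights and biases of all Linear layers)
num_params_sketch = sum(
    p.numel() for p in model_sketch.parameters() if p.requires_grad
)
print("Output shape:", y_out_sketch.shape)
print("Trainable parameters:", num_params_sketch)
```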


In [None]:
# TO DO: write the MLP autoencoder class using the previously written encoder and decoder classes
class MLPAutoencoder(nn.Module):
    def __init__(self, input_size):
        super(MLPAutoencoder, self).__init__()

    def forward(self, x):
        return y

In [None]:
# TO DO: Instantiate an MLP autoencoder and print the number of parameters.


In [None]:
# initialization (ensure reproducibility: everybody should have the same results)
def init_weights(m):
    if isinstance(m, nn.Linear):
        torch.nn.init.xavier_uniform_(m.weight.data)
        if m.bias is not None:
            m.bias.data.fill_(0.01)
    return


torch.manual_seed(0)
model.apply(init_weights)

<span style="color:red">**Q3**</span> How many parameters are in the autoencoder?

## Training

Now we can write the training function (with validation!). It is very similar to the training function for the MLP classifier from the previous script, with two main differences:

- since we are not predicting labels, we do not need to use them when iterating over the dataloader.
- the loss function is no longer Cross Entropy (which is for classification), but [MSE](https://pytorch.org/docs/stable/generated/torch.nn.MSELoss.html?highlight=mse#torch.nn.MSELoss) (_cf._ lab 2).

Regarding the loss, remember that the autoencoder compresses the input data (through the encoder) and then produces an approximation of this input data (through the decoder). This means that the input and output of the autoencoder should have the same dimension, and the loss is computed between these two quantities: $\mathcal{L}(\hat{\mathbf{x}}, \mathbf{x})$.
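For a single data point, the MSE between the reconstruction and the input can be written as follows, where the component notation $\hat{x}_n$, $x_n$ is introduced here for illustration:

$$
\mathcal{L}(\hat{\mathbf{x}}, \mathbf{x}) = \frac{1}{N} \sum_{n=1}^{N} \left( \hat{x}_n - x_n \right)^2.
$$

With its default setting (`reduction="mean"`), `nn.MSELoss` also averages over the data points of the batch, so it returns a single scalar per batch.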

Finally, unlike in lab 4.1, here we evaluate the model on the validation set by computing the loss (and not the accuracy). Be careful: a better model now corresponds to a lower validation loss, so this quantity should decrease over epochs (whereas the accuracy was expected to increase).
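A possible shape for such a training loop is sketched below; it is not the reference solution. Several choices are assumptions: the Adam optimizer (the text does not name one), keeping the weights from the epoch with the lowest validation loss, and the function names `train_autoencoder` and `evaluate_autoencoder`. The data is a small random stand-in for MNIST and the model is a deliberately tiny autoencoder, so the example runs in a few seconds.

```python
import copy

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate_autoencoder(model, dataloader, loss_fn):
    """Average reconstruction loss (per batch) over a dataloader."""
    model.eval()
    total_loss = 0.0
    with torch.no_grad():
        for images, _ in dataloader:  # the labels are ignored
            images_vec = images.reshape(images.shape[0], -1)
            total_loss += loss_fn(model(images_vec), images_vec).item()
    return total_loss / len(dataloader)


def train_autoencoder(model, train_loader, valid_loader, num_epochs, learning_rate):
    """Train a copy of `model`; keep the weights with the lowest validation loss."""
    model_tr = copy.deepcopy(model)
    loss_fn = nn.MSELoss()
    # Assumption: Adam optimizer (the lab text does not specify one)
    optimizer = torch.optim.Adam(model_tr.parameters(), lr=learning_rate)

    train_losses, valid_losses = [], []
    best_valid_loss = float("inf")
    best_state = copy.deepcopy(model_tr.state_dict())

    for epoch in range(num_epochs):
        model_tr.train()
        running_loss = 0.0
        for images, _ in train_loader:
            images_vec = images.reshape(images.shape[0], -1)
            # The target is the input itself, not a label
            loss = loss_fn(model_tr(images_vec), images_vec)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
        train_losses.append(running_loss / len(train_loader))

        valid_loss = evaluate_autoencoder(model_tr, valid_loader, loss_fn)
        valid_losses.append(valid_loss)
        if valid_loss < best_valid_loss:  # lower is better for a loss
            best_valid_loss = valid_loss
            best_state = copy.deepcopy(model_tr.state_dict())

    model_tr.load_state_dict(best_state)
    return model_tr, train_losses, valid_losses


# Tiny synthetic stand-in for MNIST: 32 random "images" with dummy labels
torch.manual_seed(0)
fake_images = torch.rand(32, 1, 28, 28)
fake_labels = torch.zeros(32, dtype=torch.long)
fake_train = DataLoader(
    TensorDataset(fake_images[:24], fake_labels[:24]), batch_size=8, shuffle=True
)
fake_valid = DataLoader(
    TensorDataset(fake_images[24:], fake_labels[24:]), batch_size=8
)

# Deliberately small autoencoder, used only to exercise the functions
toy_model = nn.Sequential(
    nn.Linear(784, 32), nn.ReLU(), nn.Linear(32, 784), nn.Sigmoid()
)
toy_trained, toy_train_losses, toy_valid_losses = train_autoencoder(
    toy_model, fake_train, fake_valid, num_epochs=3, learning_rate=0.001
)
print(toy_train_losses, toy_valid_losses)
```

The comparison `valid_loss < best_valid_loss` is the point raised above: with an accuracy the test would be reversed, since a higher accuracy is better.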


In [None]:
# TO DO: write the autoencoder training function with validation (also write the evaluation function)


In [None]:
# TO DO:
# - Train the autoencoder: 50 epochs, learning rate = 0.001, and MSE loss function
# - After training, save the model parameters and plot the loss over epochs


## Test and visualization


In [None]:
# Apply the autoencoder to the image_batch_example
predicted_batch_example = model_tr(image_batch_example_vec).detach()

# Reshape it as a black-and-white image (3D tensor)
predicted_batch_example = predicted_batch_example.reshape(batch_size, 1, 28, 28)

# Plot the original and predicted images
for ib in range(batch_size):
    plt.figure()
    plt.subplot(1, 2, 1)
    plt.imshow(image_batch_example[ib, :].squeeze(), cmap="gray_r")
    plt.xticks([]), plt.yticks([])
    plt.title("Original image")
    plt.subplot(1, 2, 2)
    plt.imshow(predicted_batch_example[ib, :].squeeze(), cmap="gray_r")
    plt.xticks([]), plt.yticks([])
    plt.title("Predicted image")
    plt.show()

<span style="color:red">**Q4**</span> In your report, put the plot with the training/validation losses, and one of the plots above (one original image and its corresponding estimation).
