Written by [Samuel Adekunle](mailto:sja119@ic.ac.uk)

For [AI Core](http://www.theaicore.com)

# Introduction to Autoencoders

## Uses of Autoencoders

### Image/Audio Denoising

Autoencoders are very good at removing noise from images and producing a much clearer picture than the noisy input. Later we will see how this can be implemented.

![image](img/denoising_example.png)

### Image Generation

An alternative to GANs is a variant of autoencoders known as [Variational Autoencoders](https://en.wikipedia.org/wiki/Autoencoder#Variational_autoencoder_(VAE)). There's a lot of complicated math involved, but in summary, the input is an image, and the variational autoencoder learns its distribution and can generate similar images.

![faces generated with a vae](img/faces.png)

*Faces generated with a Variational Autoencoder Model (source: [Wojciech Mormul on Github](https://github.com/WojciechMormul/vae))*

### Image Inpainting and Photo Restoration

![context encoders](img/inpainting.jpg)

*Image inpainting with Context Encoders (source: [Context Encoders: Feature Learning by Inpainting](https://people.eecs.berkeley.edu/~pathak/context_encoder/))*

### Other Uses:
 - Anomaly Detection and Facial Recognition
 - Feature Extraction and Data Compression
 - Language Translation


## Autoencoder Basic Architecture

An [Autoencoder](https://en.wikipedia.org/wiki/Autoencoder) is a neural network architecture that learns efficient data encodings in an unsupervised manner. This means autoencoders learn to recognise the most important features of the data they are fed and reject the less important ones (i.e. noise). In doing so, they reduce the number of features needed to represent the same data. They do this in two steps:

 - Data Encoding: The input data is forced through a bottleneck and transformed into a feature space, which is typically much smaller than the input space. The encoder is trained so that this feature space represents the most important features in the input space that are needed to reconstruct the data. Note: If the feature space is not smaller than the input space, then the encoder might just learn the identity function.
 
 - Data Decoding: After the input data has been reduced to some feature space, the autoencoder tries to reconstruct the original data from the reduced feature space. The output of the network is compared against the original input data, and that comparison is used to train it. No labels are needed, which is why an autoencoder is often said to undergo **unsupervised training**. Typically the network is trained to minimize a reconstruction loss, such as the Mean Squared Error between the input and the output.

![image](img/transitions.png)

*Mathematical Definition of an Autoencoder (source: [Wikipedia](https://en.wikipedia.org/wiki/Autoencoder))*
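
For readers who cannot see the figure, the usual formulation can be written out as follows. This is a sketch that assumes the notation of the Wikipedia article (an encoder $\phi$, a decoder $\psi$, input space $\mathcal{X}$ and feature space $\mathcal{F}$); the image itself may use slightly different symbols.

$$
\phi: \mathcal{X} \to \mathcal{F}, \qquad
\psi: \mathcal{F} \to \mathcal{X}, \qquad
\phi, \psi = \underset{\phi,\, \psi}{\arg\min}\; \lVert X - (\psi \circ \phi)(X) \rVert^2
$$

Here $\phi$ corresponds to the Data Encoding step, $\psi$ to the Data Decoding step, and the squared norm is the reconstruction loss being minimized.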

# Feed-Forward Autoencoder

This basic architecture will take the input and try to reproduce it at the output.

![feed_foward_autoencoder](img/encoder_decoder.png)

*Basic Reconstruction Autoencoder Architecture (source: [Jeremy Jordan](https://www.jeremyjordan.me/autoencoders/))*

In [6]:
# %pip install -r requirements.txt
# Run only once to install requirements for notebook then restart kernel

In [None]:
# All requirements for this notebook
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import matplotlib.pyplot as plt
import numpy as np


SEED = 5000
torch.manual_seed(SEED)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

In [None]:
# We will be using the popular MNIST dataset
train_data = torchvision.datasets.MNIST(root='MNIST-data',
                                        transform=torchvision.transforms.ToTensor(),
                                        train=True,
                                        download=True
                                        )
test_data = torchvision.datasets.MNIST(root='MNIST-data',
                                       transform=torchvision.transforms.ToTensor(),
                                       train=False
                                       )

In [None]:
print(f"Shape of MNIST Training Dataset: {train_data.data.shape}")
print(f"Shape of MNIST Testing Dataset: {test_data.data.shape}")

In [None]:
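def show_image_helper(image):
    # Reshape a flattened (784,) or (28, 28) tensor into a 28x28 image and display it
    image = image.view(28, 28)
    plt.imshow(image.cpu().detach())
    plt.show()
    # Report the range of the image passed in (not the global rdm_img)
    print("Max Element: ", image.max().item())
    print("Min Element: ", image.min().item())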
    
def show_losses_helper(losses):
    plt.plot(losses[1:])
    plt.ylabel("Losses")
    plt.xlabel("Epochs")
    plt.title("Autoencoder Losses")
    plt.show()

In [None]:
# What are we working with and what will we be doing
rdm_img = train_data.data[np.random.randint(
    0, 100)] / 255.0  # get a random example
show_image_helper(rdm_img)

In [None]:
# FURTHER SPLIT THE TRAINING INTO TRAINING AND VALIDATION
train_data, val_data = torch.utils.data.random_split(train_data, [
                                                     50000, 10000])

BATCH_SIZE = 128

# MAKE TRAINING DATALOADER
train_loader = torch.utils.data.DataLoader(  # create a data loader
    train_data,  # what dataset should it sample from?
    shuffle=True,  # should it shuffle the examples?
    batch_size=BATCH_SIZE  # how large should the batches that it samples be?
)

# MAKE VALIDATION DATALOADER
val_loader = torch.utils.data.DataLoader(
    val_data,
    shuffle=True,
    batch_size=BATCH_SIZE
)

# MAKE TEST DATALOADER
test_loader = torch.utils.data.DataLoader(
    test_data,
    shuffle=True,
    batch_size=BATCH_SIZE
)

In [None]:
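class AutoEncoder(nn.Module):
    def __init__(self, input_size, hidden_size, code_size):
        super().__init__()

        # One possible encoder (an assumption, other designs work too):
        # compress input -> hidden -> code, where the code is the bottleneck
        self.encoder = nn.Sequential(
            nn.Linear(input_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, code_size),
            nn.ReLU(),
        )

        # Mirror-image decoder: expand code -> hidden -> input.
        # The final Sigmoid keeps outputs in [0, 1], as BCELoss requires.
        self.decoder = nn.Sequential(
            nn.Linear(code_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, input_size),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))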

In [None]:
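def train(model, num_epochs=10, learning_rate=0.01):
    global EPOCHS
    model.train()
    losses = []

    # One possible choice of optimiser and criterion (an assumption, not the only option).
    # BCELoss matches the criterion used in validate() and test().
    optimiser = torch.optim.Adam(model.parameters(), lr=learning_rate)
    criterion = torch.nn.BCELoss()

    for _ in range(num_epochs):
        EPOCHS += 1
        total_loss = 0.0
        num_batches = 0
        for org_img, _ in train_loader:
            optimiser.zero_grad()  # reset gradients

            # Flatten each 28x28 image to a 784-vector in double precision.
            # ToTensor() has already scaled the pixels to [0, 1].
            org_img = org_img.double().view(-1, 784).to(device)

            gen_img = model(org_img).double()

            loss = criterion(gen_img, org_img)
            total_loss += loss.item()  # accumulate a float, not the autograd graph
            num_batches += 1

            loss.backward()  # backpropagate
            optimiser.step()

        # calculate average loss over the epoch
        average_loss = total_loss / num_batches
        losses.append(average_loss)
        print(f"Epoch {EPOCHS}:\tScore: {1/average_loss}")

    return losses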

In [None]:
EPOCHS = 0
INPUT_SIZE = 28*28
HIDDEN_SIZE = 128
CODE_SIZE = 32
LEARNING_RATE = 0.01

autoencoder = AutoEncoder(
    INPUT_SIZE, HIDDEN_SIZE, CODE_SIZE).double().to(device)

In [None]:
num_epochs = 25
losses = train(autoencoder, num_epochs, LEARNING_RATE)
show_losses_helper(losses)

In [None]:
def validate(model):

    model.eval()
    criterion = torch.nn.BCELoss()
#     criterion = torch.nn.MSELoss()
    total_loss = 0.0
    num_batches = 0
    with torch.no_grad():  # no gradients are needed for evaluation
        for val_img, _ in val_loader:
            # ToTensor() already scales pixels to [0, 1], so no division by 255 is needed
            val_img = val_img.double().view(-1, 784).to(device)
            gen_img = model(val_img).double()
            loss = criterion(gen_img, val_img)
            total_loss += loss.item()
            num_batches += 1
    average_loss = total_loss / num_batches
    return 1 / average_loss

In [None]:
score = validate(autoencoder)
print("Score: ", score)

In [None]:
def test(model):
    model.eval()
    criterion = torch.nn.BCELoss()
#   criterion = torch.nn.MSELoss()
    total_loss = 0.0
    num_batches = 0
    stored_images = []
    with torch.no_grad():  # no gradients are needed for evaluation
        for test_img, _ in test_loader:
            # ToTensor() already scales pixels to [0, 1], so no division by 255 is needed
            test_img = test_img.double().view(-1, 784).to(device)
            gen_img = model(test_img)
            loss = criterion(gen_img.double(), test_img).item()
            total_loss += loss
            num_batches += 1
            # keep the first image of roughly 20% of the batches for display
            if np.random.random() > 0.80:
                stored_images.append(
                    (test_img[0].clone().detach(), gen_img[0].clone().detach()))

    score = average_loss = total_loss / num_batches
    print(f"Score: {1/score}\n")

    for original, generated in stored_images:
        print("Original: ")
        show_image_helper(original)
        print("Generated: ")
        show_image_helper(generated)

In [None]:
test(autoencoder)

## Comparing MSE to BCE

Generally, when dealing with autoencoders or similar problems, we train using a loss like MSE, which compares the generated image and the original one pixel by pixel in order to calculate the error.

This is fine most of the time, but it was not optimal in our case. Our pixel values lie between 0 and 1 and most of them are zero, so the mean squared error is small even for poor reconstructions, which gives the model a weak training signal.

![mean_square_error_loss](img/mse_losses.png)

The alternative we used was the Binary Cross Entropy (BCE) loss. It is typically used for classification problems, but here each pixel can be treated as a choice between high (1.0) and low (0.0), so a cross entropy loss still applies. Because each pixel is a single value between 0 and 1, we use the binary form.

![binary_cross_entropy_loss](img/bce.png)
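
To make the difference concrete, here is a small arithmetic sketch in plain Python. The 100-pixel "image" and the nearly blank prediction are made-up illustrative values, not MNIST data, and `mse` and `bce` are hypothetical helpers written for this example rather than the PyTorch losses.

```python
import math

def mse(pred, target):
    # Mean of squared per-pixel differences
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

def bce(pred, target):
    # Mean binary cross entropy; pred values must lie strictly between 0 and 1
    return -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for p, t in zip(pred, target)) / len(target)

# Toy target: 10 bright pixels and 90 dark ones, mostly zeros like a digit image
target = [1.0] * 10 + [0.0] * 90
# A poor reconstruction that predicts an almost blank image everywhere
blank_pred = [0.01] * 100

mse_value = mse(blank_pred, target)
bce_value = bce(blank_pred, target)
print(f"MSE: {mse_value:.4f}")  # MSE: 0.0981
print(f"BCE: {bce_value:.4f}")  # BCE: 0.4696
```

Even though the prediction misses every bright pixel, the MSE stays below 0.1, while the BCE is several times larger because it penalizes confident mistakes on the bright pixels much more heavily.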

# Application - Denoising an Image

A denoising autoencoder adds some noise to the input before passing it to the network but uses the original image as the ground truth. This trains the network to reject the noise and learn the encodings that represent the data beneath it. The only difference from the previous section is in the training loop.

![denoising_autoencoder_architecture](img/denoising.png)

*Denoising Autoencoder Architecture (source: [Jeremy Jordan](https://www.jeremyjordan.me/autoencoders/))*
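
The key idea, noisy input with a clean target, can be sketched as a single training step. This is a minimal illustration under stated assumptions: the random batch stands in for flattened MNIST images, the two-layer model and the `noise_factor` value are hypothetical, and clamping the noisy input to [0, 1] is an optional choice rather than part of the method.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for a batch of 8 flattened images with pixels in [0, 1]
clean = torch.rand(8, 784)

# Corrupt only the input with Gaussian noise; the target stays clean
noise_factor = 0.3  # illustrative value
noisy = (clean + noise_factor * torch.randn_like(clean)).clamp(0.0, 1.0)

# Tiny illustrative autoencoder: 784 -> 32 -> 784, Sigmoid output for BCELoss
model = nn.Sequential(
    nn.Linear(784, 32), nn.ReLU(),
    nn.Linear(32, 784), nn.Sigmoid(),
)
criterion = nn.BCELoss()
optimiser = torch.optim.Adam(model.parameters(), lr=0.01)

optimiser.zero_grad()
reconstruction = model(noisy)            # forward pass on the noisy input
loss = criterion(reconstruction, clean)  # compare against the clean image
loss.backward()
optimiser.step()

print(reconstruction.shape, loss.item())
```

Because the loss is computed against `clean` rather than `noisy`, the network is never rewarded for reproducing the noise.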


In [None]:
def add_noise(clean_image, noise_factor=0.0):
    random_noise = torch.randn_like(clean_image)
    # Divide by the largest magnitude so the noise lies between -1 and 1
    random_noise /= random_noise.abs().max()
    noisy_image = clean_image + (noise_factor * random_noise)
    return noisy_image

In [None]:
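def train_noise(model, num_epochs=10, learning_rate=0.01, noise_factor=0.0):
    global EPOCHS
    model.train()
    losses = []

    # One possible choice of optimiser and criterion (an assumption, not the only option).
    # BCELoss matches the criterion used in validate_noise() and test_noise().
    optimiser = torch.optim.Adam(model.parameters(), lr=learning_rate)
    criterion = torch.nn.BCELoss()

    for _ in range(num_epochs):
        EPOCHS += 1
        total_loss = 0.0
        num_batches = 0
        for org_img, _ in train_loader:
            optimiser.zero_grad()

            # Flatten to 784-vectors; ToTensor() has already scaled pixels to [0, 1]
            org_img = org_img.double().view(-1, 784).to(device)
            # Only the network input is corrupted; the target stays clean
            noisy_img = add_noise(org_img, noise_factor)

            gen_img = model(noisy_img).double()

            loss = criterion(gen_img, org_img)
            total_loss += loss.item()  # accumulate a float, not the autograd graph
            num_batches += 1

            loss.backward()  # backpropagate
            optimiser.step()

        average_loss = total_loss / num_batches
        losses.append(average_loss)
        print(f"Epoch {EPOCHS}:\tScore: {1/average_loss}")
    return losses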

In [None]:
EPOCHS = 0
INPUT_SIZE = 28*28
HIDDEN_SIZE = 128
CODE_SIZE = 32
LEARNING_RATE = 0.01
NOISE_FACTOR = 0.001

denoise_autoencoder = AutoEncoder(
    INPUT_SIZE, HIDDEN_SIZE, CODE_SIZE).double().to(device)

In [None]:
num_epochs = 25
losses = train_noise(denoise_autoencoder, num_epochs, LEARNING_RATE, NOISE_FACTOR)
show_losses_helper(losses)

In [None]:
def validate_noise(model, noise_factor=NOISE_FACTOR):
    model.eval()
    criterion = torch.nn.BCELoss()
#     criterion = torch.nn.MSELoss()
    total_loss = 0.0
    num_batches = 0
    with torch.no_grad():  # no gradients are needed for evaluation
        for val_img, _ in val_loader:
            # ToTensor() already scales pixels to [0, 1], so no division by 255 is needed
            val_img = val_img.double().view(-1, 784).to(device)
            gen_img = model(add_noise(val_img, noise_factor)).double()

            loss = criterion(gen_img, val_img)
            total_loss += loss.item()
            num_batches += 1
    average_loss = total_loss / num_batches
    return 1 / average_loss

In [None]:
score = validate_noise(denoise_autoencoder)
print("Score: ", score)

In [None]:
def test_noise(model, noise_factor=NOISE_FACTOR):
    model.eval()
    criterion = torch.nn.BCELoss()
#   criterion = torch.nn.MSELoss()
    total_loss = 0.0
    num_batches = 0
    stored_images = []
    with torch.no_grad():  # no gradients are needed for evaluation
        for test_img, _ in test_loader:
            # ToTensor() already scales pixels to [0, 1], so no division by 255 is needed
            test_img = test_img.double().view(-1, 784).to(device)
            noisy_img = add_noise(test_img, noise_factor)
            gen_img = model(noisy_img).double()

            loss = criterion(gen_img, test_img)
            total_loss += loss.item()
            num_batches += 1
            # keep the first image of roughly 20% of the batches for display
            if np.random.random() > 0.80:
                stored_images.append((test_img[0].clone().detach(
                ), noisy_img[0].clone().detach(), gen_img[0].clone().detach()))

    score = average_loss = total_loss / num_batches
    print(f"Score: {1/score}\n")

    for original, noisy, generated in stored_images:
        print("Original: ")
        show_image_helper(original)
        print("Noisy: ")
        show_image_helper(noisy)
        print("Generated: ")
        show_image_helper(generated)

In [None]:
test_noise(denoise_autoencoder)