<div><img style="float: right; width: 120px; vertical-align:middle" src="https://www.upm.es/sfs/Rectorado/Gabinete%20del%20Rector/Logos/EU_Informatica/ETSI%20SIST_INFORM_COLOR.png" alt="ETSISI logo" />


# Compressing Data with Vanilla Autoencoders<a id="top"></a>

<i><small>Authors: Alberto Díaz Álvarez<br>Last update: 2023-04-24</small></i></div>

***

## Introduction

An autoencoder is a type of neural network that consists of two parts: an encoder and a decoder. The encoder takes in the input data and compresses it into a lower-dimensional representation, while the decoder takes this representation and reconstructs the original input data.


| <img src="https://upload.wikimedia.org/wikipedia/commons/3/37/Autoencoder_schema.png" alt="Autoencoder Schema" width="50%"> | 
|:--:| 
| *Autoencoder schema. Source: Autoencoder_schema.png, <https://commons.wikimedia.org/wiki/File:Autoencoder_schema.png> (last visited April 24).* |

The encoder and decoder are typically symmetric in structure, and the network is trained to minimize the reconstruction error between the original input and the output of the decoder.
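
In symbols, and only as an illustrative sketch (the names $f$ for the encoder, $g$ for the decoder and $\mathcal{L}$ for the reconstruction loss are introduced here, not taken from any library), training looks for the encoder and decoder that minimize the average reconstruction error over the $N$ training examples $x_1, \dots, x_N$:

$$
\hat{x}_i = g(f(x_i)), \qquad \min_{f,\,g} \; \frac{1}{N} \sum_{i=1}^{N} \mathcal{L}\left(x_i, \hat{x}_i\right)
$$

No labels appear in this objective: each input serves as its own target.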

## Goals

In this notebook, we will explore how to use a type of neural network called an autoencoder for compressing data. We will use the data provided in the `mnist` dataset.

## Libraries and configuration

Next we will import the libraries that will be used throughout the notebook.

In [None]:
import math
import random

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import tensorflow as tf

We will also configure some parameters to adapt the graphical presentation.

In [None]:
%matplotlib inline
plt.style.use('ggplot')
plt.rcParams["axes.grid"] = False
plt.rcParams.update({'figure.figsize': (20, 6),'figure.dpi': 64})

***

## Dataset

As stated above, we will use the `mnist` dataset as the data to compress.

In [None]:
(x_train, _), (x_test, _) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255, x_test / 255

print(f'Training shape: {x_train.shape} input')
print(f'Test shape:     {x_test.shape} input')
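
To make the effect of dividing by 255 concrete, here is a minimal sketch on synthetic data. The `fake_images` array is a hypothetical stand-in for a batch of MNIST images, so nothing needs to be downloaded. It shows that the division promotes the `uint8` pixels to `float64` values in $[0, 1]$, and that an optional cast to `float32` halves the memory footprint.

```python
import numpy as np

# Hypothetical stand-in for a batch of MNIST images (uint8 pixels in 0..255)
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

scaled = fake_images / 255              # True division promotes uint8 to float64
scaled32 = scaled.astype(np.float32)    # Optional cast: same values, half the memory

print(scaled.dtype, scaled.min(), scaled.max())
print(scaled32.dtype, scaled32.nbytes, 'bytes instead of', scaled.nbytes)
```

Keeping the pixel values in $[0, 1]$ matches the range of the sigmoid activation used later in the decoder output.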

Now we can start working with _autoencoders_ to learn how to use them as a compression tool for small grayscale images.

## Vanilla autoencoder

In this example, we are going to create a vanilla autoencoder. The encoder part will be the _input-to-latent-space_ component, whereas the decoder part will be just the opposite, the _latent-space-to-output_ component.

Since we are working with $28 \times 28$ images, we have to flatten the input before it enters the dense layers and reshape the output back to its original shape afterwards, but there is no mystery to that.
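
The flatten and reshape round trip can be sketched with plain NumPy. This is only an illustration of the idea on a made-up batch, not the Keras layers themselves: each $28 \times 28$ image becomes a vector of 784 values and is then restored without losing anything.

```python
import numpy as np

# A made-up batch of 3 "images" with distinct values so we can check the round trip
batch = np.arange(3 * 28 * 28, dtype=np.float32).reshape(3, 28, 28)

flat = batch.reshape(len(batch), -1)        # (3, 784): one row per image
restored = flat.reshape(len(batch), 28, 28)  # Back to (3, 28, 28)

print(flat.shape, restored.shape, np.array_equal(batch, restored))
```

Flattening only rearranges the values, so any information loss in the autoencoder comes from the dense layers in between.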

We will create the autoencoder as a `Model` subclass. So far we have built models by specifying inputs and outputs defined outside their _scope_; subclassing lets us package a model as a class that is easier to reuse.

In [None]:
class Autoencoder(tf.keras.models.Model):
    """Represents a vanilla autoencoder."""

    def __init__(self, input_dim, latent_dim, name=None, *args, **kwargs):
        super().__init__(*args, name=name, **kwargs)

        # We calculate sizes and shapes of inputs, outputs and latent spaces
        flatten_dim = None
        if isinstance(input_dim, (list, tuple)):
            flatten_dim = math.prod(input_dim)
        elif isinstance(input_dim, int):
            flatten_dim = input_dim
            input_dim = (input_dim,)
        else:
            raise ValueError('Argument input_dim must be a tuple or an int')
        
        # Encoder definition: Connection between input and latent space
        self.encoder = tf.keras.Sequential([
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(latent_dim, activation=tf.keras.layers.LeakyReLU()),
        ], name='Encoder')

        # Decoder definition: Connection between latent space and output
        self.decoder = tf.keras.Sequential([
            tf.keras.layers.Dense(flatten_dim, activation='sigmoid'),
            tf.keras.layers.Reshape(input_dim)
        ], name='Decoder')
        
        # Weights construction (just to have summary model's method working)
        self.build((None, *input_dim))

    def call(self, x):
        return self.decoder(self.encoder(x))

Now this model can be used for the creation of a vanilla autoencoder.

In [None]:
autoencoder = Autoencoder(input_dim=(28, 28), latent_dim=64, name='Vanilla')
autoencoder.summary()
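
The parameter counts reported by `summary()` can be cross-checked by hand. The sketch below assumes plain fully connected layers with a bias term, which is the default for `Dense`. The helper `dense_params` is a hypothetical name used only for this illustration.

```python
import math


def dense_params(n_inputs, n_units):
    """Number of weights plus biases in a fully connected layer."""
    return n_inputs * n_units + n_units


input_shape = (28, 28)
latent_dim = 64
flatten_dim = math.prod(input_shape)  # 784 values per image

encoder_params = dense_params(flatten_dim, latent_dim)
decoder_params = dense_params(latent_dim, flatten_dim)
total_params = encoder_params + decoder_params

print(f'Encoder: {encoder_params}, decoder: {decoder_params}, total: {total_params}')
```

The flatten and reshape steps contribute no trainable parameters, so the two dense layers account for the whole model.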

Let's train our autoencoder with the `mnist` training set. Since it is a model, we have to compile it beforehand by specifying which loss function and which optimizer we are going to use. After that we can invoke the usual model methods on it, such as `fit`.

In [None]:
autoencoder.compile(loss='binary_crossentropy', optimizer='adam')
history = autoencoder.fit(
    x_train,
    x_train,  # Watch out! The inputs are also used as the targets
    epochs=50,
)
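
To get a feeling for the loss being minimized, here is a minimal NumPy sketch of binary cross-entropy averaged over all pixels. It is an illustration rather than the Keras implementation, and the clipping constant `eps` is an assumption made here to avoid taking the logarithm of zero.

```python
import numpy as np


def binary_crossentropy(target, prediction, eps=1e-7):
    """Mean binary cross-entropy between targets and predictions in [0, 1]."""
    prediction = np.clip(prediction, eps, 1 - eps)  # Avoid log(0)
    losses = target * np.log(prediction) + (1 - target) * np.log(1 - prediction)
    return float(-np.mean(losses))


target = np.array([0.0, 1.0, 0.5, 0.9])
close = np.array([0.05, 0.95, 0.5, 0.85])  # Reconstruction close to the target
far = np.array([0.9, 0.1, 0.1, 0.2])       # Reconstruction far from the target

print(binary_crossentropy(target, close))
print(binary_crossentropy(target, far))
```

With grayscale targets that are not exactly 0 or 1, this loss does not reach zero even for a perfect reconstruction (a pixel of 0.5 predicted as 0.5 still costs $\ln 2$), so the training curve is expected to flatten at a positive value.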

Let's see how training has evolved:

In [None]:
pd.DataFrame(history.history).plot()
plt.xlabel('Epoch num.')
plt.show()

The training looks pretty good. Let's see how some examples of the training set are encoded and decoded.

In [None]:
n = 4
images = np.array(random.sample(list(x_train), n))

encoded = autoencoder.encoder(images).numpy()
decoded = autoencoder.decoder(encoded).numpy()
for i in range(n):
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(images[i])
    plt.title('Original')

    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(decoded[i])
    plt.title('Reconstructed')

And now, what happens with data it has never seen during training?

In [None]:
images = np.array(random.sample(list(x_test), n))
encoded = autoencoder.encoder(images).numpy()
decoded = autoencoder.decoder(encoded).numpy()
for i in range(n):
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(images[i])
    plt.title('Original')

    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(decoded[i])
    plt.title('Reconstructed')

Almost perfect. We were able to compress each image from $28 \times 28 = 784$ values to a latent vector of 64 values, a reduction of roughly 92% in the number of values (the saving in bytes depends on the precision used to store the latent vector).
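
The arithmetic behind such a figure is easy to reproduce for any latent size. The sketch below uses a hypothetical helper, `size_reduction`, and also shows a byte-level estimate. That estimate assumes the original pixels are stored as 1-byte integers and the latent values as 4-byte `float32` numbers.

```python
def size_reduction(original_size, compressed_size):
    """Fraction of the original size that is saved."""
    return 1 - compressed_size / original_size


input_values = 28 * 28  # 784 values per image

for latent_values in (32, 64, 128):
    reduction = size_reduction(input_values, latent_values)
    print(f'{input_values} -> {latent_values} values: {reduction:.1%} fewer values')

# Byte-level view, assuming uint8 pixels (1 byte) and float32 latents (4 bytes)
byte_reduction = size_reduction(input_values * 1, 64 * 4)
print(f'In bytes with a 64-value float32 latent: {byte_reduction:.1%} smaller')
```

The gap between the two figures shows why it matters whether the comparison counts values or bytes.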

## Use case: Noise reduction

Even though this is not a denoising autoencoder (we will see those later), one general use of autoencoders is noise reduction. Once the model has learned the fundamental features needed to reconstruct the images, much of the noise is left out of the encoding because it does not fit those features.

Let's take again some images from the test set, because they are the ones it has never seen before, and let's add some noise to see how it is able to extract the original image with almost no noise.

In [None]:
images = np.array(random.sample(list(x_test), n))
noise_factor = 0.3
noisy_images = images + noise_factor * tf.random.normal(shape=images.shape)
noisy_images = tf.clip_by_value(noisy_images, clip_value_min=0, clip_value_max=1)
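
The same corruption step can be sketched in NumPy alone, with a seeded generator so that the result is reproducible. The `clean` array is a synthetic stand-in for the test images, and the seed value is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)      # Seeded so the noise is reproducible
clean = rng.random((4, 28, 28))      # Synthetic stand-in for images in [0, 1]

noise_factor = 0.3
noise = noise_factor * rng.standard_normal(clean.shape)  # Gaussian noise, std 0.3
noisy = np.clip(clean + noise, 0.0, 1.0)                 # Keep valid pixel range

print(noisy.min(), noisy.max())
print('Mean squared difference:', np.mean((noisy - clean) ** 2))
```

Clipping keeps the noisy images inside the range the model was trained on, at the cost of slightly distorting the noise near pure black and pure white pixels.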

Now let's see how it regenerates images with noise.

In [None]:
simple_decoded = autoencoder.decoder(autoencoder.encoder(noisy_images)).numpy()

plt.figure(figsize=(9, 9))
rows = 3  # Original, noisy and reconstructed images
for i in range(n):
    ax = plt.subplot(rows, n, i + 1)
    plt.imshow(images[i])
    plt.title('Original')

    ax = plt.subplot(rows, n, i + 1 + n)
    plt.imshow(noisy_images[i])
    plt.title('Noisy')

    ax = plt.subplot(rows, n, i + 1 + 2 * n)
    plt.imshow(simple_decoded[i])
    plt.title('Simple AE')

Quite good, considering that we are compressing the images in a rough way, without taking advantage of their two-dimensional structure.
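
A visual check can be complemented with a number. One common choice is the peak signal-to-noise ratio (PSNR), sketched below with NumPy. The `psnr` helper is written for this illustration, and the arrays are synthetic: `heavy` plays the role of the noisy input and `light` is only a stand-in for a reconstruction, not actual model output.

```python
import numpy as np


def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    mse = np.mean((np.asarray(reference) - np.asarray(estimate)) ** 2)
    if mse == 0:
        return float('inf')
    return float(10 * np.log10(max_value ** 2 / mse))


rng = np.random.default_rng(0)
clean = rng.random((4, 28, 28))
heavy = np.clip(clean + 0.3 * rng.standard_normal(clean.shape), 0, 1)
light = np.clip(clean + 0.05 * rng.standard_normal(clean.shape), 0, 1)

print(f'Noisy input:             {psnr(clean, heavy):.1f} dB')
print(f'Stand-in reconstruction: {psnr(clean, light):.1f} dB')
```

In the notebook, the same function could be applied to the original, noisy and decoded images once they are NumPy arrays. A reconstruction that scores higher than the noisy input would indicate that noise was actually removed.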

## Conclusions

Autoencoders are a very useful and versatile tool in the field of machine learning and artificial intelligence. They are capable of learning complex patterns in data, compressing them into a lower dimensional space and then reconstructing them with high accuracy. In addition, they are highly adaptable and can be tuned and trained for a wide variety of applications.

However, it is important to note that autoencoders have some limitations. One of the main challenges is the selection of the right size of the encoding space, as too small a space can result in a loss of information, while too large a space can be inefficient. In addition, in some cases, autoencoders can be susceptible to overfitting and must be carefully adjusted.
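
The trade-off in latent size can be illustrated without training a network. The sketch below swaps in a truncated SVD, which gives the best linear encoder and decoder pair in the mean squared error sense. It is a linear stand-in on synthetic data, not the neural autoencoder built in this notebook, and all sizes are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 500 samples of 64 features with 16 underlying factors plus noise
basis = rng.standard_normal((16, 64))
codes = rng.standard_normal((500, 16))
data = codes @ basis + 0.05 * rng.standard_normal((500, 64))
data -= data.mean(axis=0)  # Center the features

# Rows of vt are the principal directions, ordered by importance
_, _, vt = np.linalg.svd(data, full_matrices=False)

errors = {}
for latent_dim in (2, 8, 16, 32):
    components = vt[:latent_dim]       # Shape: (latent_dim, 64)
    encoded = data @ components.T      # Linear "encoder"
    decoded = encoded @ components     # Linear "decoder"
    errors[latent_dim] = float(np.mean((data - decoded) ** 2))
    print(f'latent_dim={latent_dim:2d}  reconstruction MSE={errors[latent_dim]:.5f}')
```

The error drops sharply until the latent size reaches the number of underlying factors and improves only marginally after that, which is the kind of elbow one looks for when choosing the size of the encoding space.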

Overall, they are a valuable tool to keep in our machine learning toolbox.

***

<div><img style="float: right; width: 120px; vertical-align:top" src="https://mirrors.creativecommons.org/presskit/buttons/88x31/png/by-nc-sa.png" alt="Creative Commons by-nc-sa logo" />

[Back to top](#top)

</div>