<div><img style="float: right; width: 120px; vertical-align:middle" src="https://www.upm.es/sfs/Rectorado/Gabinete%20del%20Rector/Logos/EU_Informatica/ETSI%20SIST_INFORM_COLOR.png" alt="ETSISI logo" />


# Stacked autoencoders and the Fashion MNIST reconstruction<a id="top"></a>

<i><small>Authors: Alberto Díaz Álvarez<br>Last update: 2023-04-26</small></i></div>

***

## Introduction

In the literature, the term **stacked autoencoder** is used for several different architectures. In this notebook we will develop an example where it refers to an autoencoder whose encoder and decoder are each composed of several hidden layers before connecting to the latent space.

## Goals

We will be using the Fashion MNIST dataset, which contains images of various clothing items, to train our stacked autoencoder model. The goal is to train the model to reconstruct the images with as little loss of information as possible.

To make it more interesting, we will make use of the technique called "tying weights" to speed up the training of our model. Let's get started!

## Libraries and configuration

Next we will import the libraries that will be used throughout the notebook.

In [None]:
import itertools
import math
import random

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import tensorflow as tf

We will also configure some parameters to adapt the graphical presentation.

In [None]:
%matplotlib inline
plt.style.use('ggplot')
plt.rcParams["axes.grid"] = False
plt.rcParams.update({'figure.figsize': (20, 6),'figure.dpi': 64})

***

## Getting the dataset

This time we will use `fashion mnist` instead of the basic `mnist`. This dataset is a collection of 70,000 grayscale images of clothing items, with 60,000 images used for training and 10,000 for testing. Each image is 28x28 pixels in size and belongs to one of 10 classes.

It is a more challenging replacement for the original MNIST dataset. It was designed to keep similar properties, such as being well balanced and easily accessible, while offering more complexity and variety in the images.

In [None]:
(x_train, _), (x_test, _) = tf.keras.datasets.fashion_mnist.load_data()
x_train, x_test = x_train / 255, x_test / 255

print(f'Training shape: {x_train.shape} input')
print(f'Test shape:     {x_test.shape} input')

## The tying weights technique

Tying weights is a technique used to reduce the number of parameters in the model. In a traditional autoencoder, there are two sets of weights: the encoder weights and the decoder weights. When tying weights, the **decoder weights are constrained to be equal to the transpose of the encoder weights**. This means that the weights used to reconstruct the original input are the same weights used to compress it, which roughly halves the number of kernel parameters in the model (and can therefore help to reduce overfitting and improve generalization).
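To make the idea concrete, here is a minimal NumPy sketch of a single tied encoder/decoder pair. It is not the Keras implementation used in this notebook; the sizes, the random data and the helper names (`leaky_relu`, `sigmoid`) are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes: 784 inputs (a flattened 28x28 image) and 64 hidden units
n_inputs, n_hidden = 784, 64

# A single kernel shared by the encoder and the decoder
W = rng.normal(scale=0.05, size=(n_inputs, n_hidden))
b_enc = np.zeros(n_hidden)  # encoder biases
b_dec = np.zeros(n_inputs)  # decoder biases (biases are not tied)


def leaky_relu(z, alpha=0.2):
    return np.where(z > 0, z, alpha * z)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


x = rng.random((5, n_inputs))      # batch of 5 fake "images" in [0, 1)
h = leaky_relu(x @ W + b_enc)      # encode with W
x_hat = sigmoid(h @ W.T + b_dec)   # decode with the transpose of W

print(h.shape, x_hat.shape)  # (5, 64) (5, 784)
```

Note that only `W` is shared: each direction keeps its own bias vector, because the two biases have different lengths.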

We are using `Dense` layers for our encoder and decoder components, but Keras does not provide a transposed `Dense` layer. Therefore, we will create a custom layer that reproduces this behavior in our models.

In [None]:
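class TransposedDense(tf.keras.layers.Layer):
    """Dense layer whose kernel is the transpose of another Dense layer's kernel."""

    def __init__(self, from_layer, activation=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.from_layer = from_layer
        # Default to the activation of the tied layer when none is given
        activation = activation or self.from_layer.activation
        self.activation = tf.keras.activations.get(activation)

    def build(self, batch_input_shape):
        # Only the biases are created here: one per output unit, that is,
        # one per input unit of the tied layer (which must already be built)
        self.biases = self.add_weight(
            shape=(self.from_layer.input_shape[-1],),
            initializer='zeros',
        )
        return super().build(batch_input_shape)

    def call(self, x):
        # Reuse the kernel of the tied layer, transposed
        z = tf.matmul(x, self.from_layer.weights[0], transpose_b=True)
        return self.activation(z + self.biases)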

Bear in mind that it is not always necessary to tie the weights, and in some cases, it may not be beneficial. It depends on the specific architecture and problem at hand.
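The parameter saving can be estimated with simple arithmetic. The sketch below assumes a hypothetical 784-64-36 encoder with a mirrored decoder; the helper names `dense_params` and `count_params` are made up for this example and are not part of Keras.

```python
def dense_params(n_in, n_out):
    """Parameters of a dense layer: one kernel plus one bias per output unit."""
    return n_in * n_out + n_out


def count_params(input_size, layers, tied):
    sizes = [input_size, *layers]
    pairs = list(zip(sizes[:-1], sizes[1:]))
    encoder = sum(dense_params(n_in, n_out) for n_in, n_out in pairs)
    if tied:
        # The decoder reuses the encoder kernels and only adds its own biases
        decoder = sum(n_in for n_in, _ in pairs)
    else:
        # The decoder has its own kernels, mirrored with respect to the encoder
        decoder = sum(dense_params(n_out, n_in) for n_in, n_out in pairs)
    return encoder + decoder


untied = count_params(784, [64, 36], tied=False)
tied = count_params(784, [64, 36], tied=True)
print(untied, tied)  # 105908 53428
```

Under these assumptions, tying the weights removes almost half of the trainable parameters, since only the decoder biases remain specific to the decoder.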

## Building our stacked autoencoder

Now that we understand how autoencoders work and the idea behind the tying weights technique, let's move on to building a stacked autoencoder.

In [None]:
class StackedAutoencoder(tf.keras.models.Model):
    """Represents a stacked autoencoder."""
    
    def __init__(self, input_dim, layers, output_activation, name=None, *args, **kwargs):
        super().__init__(*args, name=name, **kwargs)

        # We calculate sizes and shapes of inputs, outputs and latent spaces
        flatten_dim = None
        if isinstance(input_dim, (list, tuple)):
            flatten_dim = math.prod(input_dim)
        elif isinstance(input_dim, int):
            flatten_dim = input_dim
            input_dim = (input_dim,)
        else:
            raise ValueError('Argument input_dim must be a tuple or an int')
        
        # Encoder definition: Connection between input and latent space
        encoder_layers = [
            tf.keras.layers.Dense(units, activation=tf.keras.layers.LeakyReLU(0.2))
            for units in layers
        ]
        self.encoder = tf.keras.Sequential([tf.keras.layers.Flatten()], name='Encoder')
        for dense in encoder_layers:
            self.encoder.add(dense)
        # Decoder definition: Connection between latent space and output
        self.decoder = tf.keras.Sequential([
            TransposedDense(dense) for dense in reversed(encoder_layers[1:])
        ], name='Decoder')
        self.decoder.add(TransposedDense(encoder_layers[0], output_activation))
        self.decoder.add(tf.keras.layers.Reshape(input_dim))
        
        # Weights construction (just to have summary model's method working)
        self.build((None, *input_dim))

    def call(self, x):
        return self.decoder(self.encoder(x))

As we can see, we have created a "mirrored" encoder as the decoder. However, the last layer uses a different activation function because it corresponds to the output of our network and, therefore, the values must be in the interval [0, 1].

In [None]:
sae = StackedAutoencoder(
    input_dim=(28, 28),
    layers=[64, 36,],
    output_activation='sigmoid',
    name='Stacked')
sae.encoder.summary()
sae.decoder.summary()

Let's now train our stacked autoencoder with the training set. Since it is a Keras model, we have to compile it beforehand, specifying which loss function and which optimizer we are going to use. After that, we can invoke model methods on it, such as `fit`.

In [None]:
sae.compile(loss='binary_crossentropy', optimizer='adam')
history = sae.fit(x_train, x_train, epochs=100)
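Because the pixel values were scaled to [0, 1], the binary cross-entropy treats every pixel as a soft target and penalizes reconstructions that move away from it. The following NumPy sketch shows the idea; it is an illustrative re-implementation, not the Keras code, and the clipping value `eps` and the toy arrays are assumptions.

```python
import numpy as np


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over all pixels (eps is an assumed clipping value)."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    losses = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    return float(losses.mean())


target = np.array([[0.0, 0.5, 1.0]])   # three "pixels" of a toy image
good = np.array([[0.05, 0.5, 0.95]])   # close reconstruction
bad = np.array([[0.9, 0.1, 0.2]])      # poor reconstruction

print(binary_crossentropy(target, good) < binary_crossentropy(target, bad))  # True
```

Note that the loss does not reach zero for gray pixels (targets strictly between 0 and 1) even when the reconstruction is exact, so the absolute loss value is mainly useful for comparing epochs or models.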

Let's see how training has evolved:

In [None]:
pd.DataFrame(history.history).plot()
plt.xlabel('Epoch num.')
plt.show()

The training looks pretty good. Let's see how some examples of the training set are encoded and decoded.

In [None]:
n = 4
images = np.array(random.sample(list(x_train), n))

encoded = sae.encoder(images).numpy()
decoded = sae.decoder(encoded).numpy()
for i in range(n):
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(images[i])
    plt.title('Original')

    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(decoded[i])
    plt.title('Reconstructed')

And now, what will happen to data it has theoretically never seen?

In [None]:
images = np.array(random.sample(list(x_test), n))
encoded = sae.encoder(images).numpy()
decoded = sae.decoder(encoded).numpy()
for i in range(n):
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(images[i])
    plt.title('Original')

    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(decoded[i])
    plt.title('Reconstructed')

Almost perfect. We were able to compress each image from 28x28 = 784 values (they are monochrome) to the 36 values of the latent space, a reduction of roughly 95% in the number of values needed to represent an image.
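As a rough check of the figures, the arithmetic below assumes the architecture instantiated earlier in this notebook (`layers=[64, 36]`, so a 36-unit latent space). The byte estimate additionally assumes that each original pixel is stored as one byte and each latent value as a 32-bit float, which is an assumption about storage rather than something the model enforces.

```python
input_values = 28 * 28   # pixels per image
latent_values = 36       # units in the last encoder layer

ratio = input_values / latent_values
reduction = 1 - latent_values / input_values
print(f'{input_values} -> {latent_values} values')
print(f'compression ratio: {ratio:.1f}:1')   # 21.8:1
print(f'reduction: {reduction:.1%}')         # 95.4%

# Assumed storage: 1 byte per pixel (uint8), 4 bytes per latent value (float32)
input_bytes = input_values * 1
latent_bytes = latent_values * 4
byte_reduction = 1 - latent_bytes / input_bytes
print(f'{input_bytes} -> {latent_bytes} bytes ({byte_reduction:.1%} smaller)')  # 81.6%
```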

## Conclusions

We have explored the concept of the stacked autoencoder, understood here as an autoencoder that uses multiple layers in its encoder and decoder, and applied it to the Fashion MNIST dataset to reconstruct images of clothing items.

We have also visualized the reconstruction performance of the stacked autoencoder on several examples from the training and test datasets and observed how the model learned to represent the images in a lower-dimensional space. In addition, tying the weights between the encoder and decoder layers reduces the number of parameters in the model and may improve its generalization. By using a stacked autoencoder with two dense layers on each side of the bottleneck, we were able to achieve good reconstruction performance on the Fashion MNIST dataset.

Overall, we found that stacked autoencoders can be a powerful tool for dimensionality reduction and image reconstruction, and that they can be particularly useful for applications where data compression and feature extraction are important, such as in image recognition and computer vision tasks.

***

<div><img style="float: right; width: 120px; vertical-align:top" src="https://mirrors.creativecommons.org/presskit/buttons/88x31/png/by-nc-sa.png" alt="Creative Commons by-nc-sa logo" />

[Back to top](#top)

</div>