# Autoencoder

## Preparations
### Load libraries

In [None]:
import numpy as np 
import pandas as pd

import matplotlib.pyplot as plt 
import seaborn as sns
%matplotlib inline

from sklearn.model_selection import train_test_split

In [None]:
import tensorflow as tf

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Reshape, Dense, Activation, Dropout
from tensorflow.keras.optimizers import Adam, RMSprop
from tensorflow.keras.utils import to_categorical, plot_model
from tensorflow.keras.callbacks import EarlyStopping, TensorBoard
from tensorflow.keras import regularizers

In [None]:
import pickle

In [None]:
tf.random.set_seed(123)
np.random.seed(123)

### Prepare data

In [None]:
# Load data:
mnist = tf.keras.datasets.mnist
(train_val_images, train_val_labels), (test_images, test_labels) = mnist.load_data()

# Scale image data:
train_val_images = train_val_images / 255.0
test_images = test_images / 255.0

# Split into training / validation
train_images, val_images, train_labels, val_labels = train_test_split(train_val_images, train_val_labels,
                                                                      test_size=0.20, random_state=42)

In [None]:
train_val_label_df = pd.DataFrame(train_val_labels)
train_val_label_df.columns = ['label']
train_val_label_df['label'].value_counts()

In [None]:
val_label_df = pd.DataFrame(val_labels)
val_label_df.columns = ['label']
val_label_df['label'].value_counts(sort=False, ascending=True)

In [None]:
train_from_scratch = True

## The Autoencoder
Every autoencoder consists of two parts: an encoder and a decoder.

* The **encoder** receives the original data (in our case, grayscale images of handwritten digits) as input and generates a lower-dimensional code from it.
* The **decoder** receives the code and reconstructs data (e.g. the images) in the same format as the encoder's inputs.
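
To make the data flow through the two parts concrete, here is a minimal NumPy sketch. It is not the Keras model used in this notebook: the weights `W_enc` and `W_dec` are random, untrained placeholders, and the code size of 100 is only an assumption chosen for illustration. The point is just the shapes: image in, short code in the middle, image out.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical, untrained weights (placeholders for illustration only)
W_enc = rng.normal(scale=0.01, size=(28 * 28, 100))
W_dec = rng.normal(scale=0.01, size=(100, 28 * 28))

def encode(images):
    """Flatten each image and map it to a 100-dimensional code (ReLU)."""
    flat = images.reshape(len(images), -1)
    return np.maximum(0.0, flat @ W_enc)

def decode(codes):
    """Map codes back to 28 x 28 images with values in (0, 1) (sigmoid)."""
    flat = 1.0 / (1.0 + np.exp(-(codes @ W_dec)))
    return flat.reshape(-1, 28, 28)

batch = rng.random((5, 28, 28))      # stand-in for 5 scaled images
codes = encode(batch)
reconstructions = decode(codes)

print(batch.shape, codes.shape, reconstructions.shape)
```

The reconstructions have the same shape as the inputs, while the code in between is much smaller than the 784 input pixels.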

### Model definition

The two parts put together form the autoencoder:

**Comments on activation functions:**

***Why ReLU?***

* Simplicity and Efficiency: ReLU (Rectified Linear Unit) is computationally efficient because it involves simple thresholding at zero. This makes it faster to compute compared to other activation functions.
* Sparse Activation: ReLU promotes sparsity in the network by setting negative values to zero, which can help in learning more robust features.
* Gradient Propagation: ReLU helps mitigate the vanishing gradient problem, allowing gradients to propagate more effectively during backpropagation.

***Why Not SELU?***

SELU (Scaled Exponential Linear Unit) is another excellent activation function, especially for self-normalizing neural networks. However, SELU requires a specific weight initialization (LeCun normal) and network architecture (e.g., no Batch Normalization) to maintain its self-normalizing properties. For a straightforward autoencoder like the one below, ReLU is often preferred due to its simplicity and effectiveness.

***Sigmoid*** is used in the last layer so that the outputs lie between 0 and 1, matching the range of the scaled pixel values.
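
The three activation functions discussed above can be written out in a few lines of NumPy. This is only an illustrative sketch, not the Keras implementation; the SELU constants are the commonly quoted values, rounded to eight decimals.

```python
import numpy as np

# Commonly quoted SELU constants (rounded)
SELU_ALPHA = 1.67326324
SELU_SCALE = 1.05070098

def relu(x):
    """Thresholds at zero: negative inputs become exactly 0."""
    return np.maximum(0.0, x)

def selu(x):
    """Scaled exponential linear unit: negative inputs saturate smoothly."""
    return SELU_SCALE * np.where(x > 0, x, SELU_ALPHA * (np.exp(x) - 1.0))

def sigmoid(x):
    """Squashes any real input into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
relu_out = relu(x)
selu_out = selu(x)
sigmoid_out = sigmoid(x)

print("relu:   ", relu_out)
print("selu:   ", selu_out)
print("sigmoid:", sigmoid_out)
```

Note how ReLU produces exact zeros for negative inputs (the sparsity mentioned above), whereas SELU keeps small negative values and sigmoid never leaves the range between 0 and 1.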

In [None]:
mnist_encoder = tf.keras.Sequential([
    tf.keras.layers.Input(shape = (28, 28, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(300, activation='relu'),
    tf.keras.layers.Dense(100, activation='relu')
])

mnist_decoder = tf.keras.Sequential([
    tf.keras.layers.Input(shape = (100,)),
    tf.keras.layers.Dense(300, activation='relu'),
    tf.keras.layers.Dense(28*28, activation = 'sigmoid'),
    tf.keras.layers.Reshape([28, 28, 1])
])

mnist_ae = tf.keras.Sequential([mnist_encoder, mnist_decoder])

### Training
The autoencoder uses unsupervised learning, i.e. we do not pass any predefined labels. Instead, the input images themselves serve as targets: the output should be as similar as possible to the input. We use the mean squared error (MSE) between input and output as a measure of the quality of the reconstruction.
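
As a small worked example of this reconstruction loss, the following NumPy sketch computes the mean squared error between images and their reconstructions. The helper `mse` and the toy arrays are made up for illustration; Keras computes the loss internally during training.

```python
import numpy as np

def mse(original, reconstruction):
    """Mean squared error, averaged over all pixels of all images."""
    original = np.asarray(original, dtype=float)
    reconstruction = np.asarray(reconstruction, dtype=float)
    return float(np.mean((original - reconstruction) ** 2))

# Two-pixel toy example: squared errors are 0.25 and 0.25, so the mean is 0.25
toy_loss = mse([0.0, 1.0], [0.5, 0.5])

rng = np.random.default_rng(0)
images = rng.random((4, 28, 28))       # stand-in for scaled images in [0, 1]
perfect_loss = mse(images, images)     # a perfect reconstruction has zero loss
washed_out_loss = mse(images, 0.5 * images + 0.25)  # hypothetical poor reconstruction

print(toy_loss, perfect_loss, washed_out_loss)
```

A loss of zero means the output reproduces the input exactly; any deviation increases the loss quadratically.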

For a start, we allow a maximum of 50 epochs (`nEpochs`) and stop early once the validation loss has not improved for 5 consecutive epochs (`nPatience`). As with deep neural networks in general, you may need significantly more epochs for real applications.

Training takes a while. The following code saves the learned weights after training and can reload them later: set `train_from_scratch` to `False` to skip training and load the stored weights instead.
Please note that only the weights are saved and reloaded, not the models themselves. The model definition lives in the code.

Loading the weights only works if the model definition is exactly the same as when the weights were saved.
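
The training cell below passes an `EarlyStopping` callback with `patience` and `restore_best_weights=True`. The idea behind the patience rule can be sketched in plain Python. This is a simplified illustration under the assumption that "improvement" means a strictly lower validation loss; it is not the Keras implementation, and the function name and loss values are made up.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 0-based.

    Training stops as soon as the validation loss has failed to improve
    for `patience` consecutive epochs. `best_epoch` is the epoch whose
    weights would be restored.
    """
    best_loss = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Never triggered: training ran through all epochs
    return len(val_losses) - 1, best_epoch

# Hypothetical validation losses: best value at epoch 2, no improvement afterwards
losses = [0.050, 0.040, 0.035, 0.036, 0.037, 0.0355, 0.036, 0.038, 0.030]
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=5)
print(stop_epoch, best_epoch)
```

In this toy run, training stops before the last (lower) loss value is ever reached. That is the trade-off of a small patience: it saves time but can end training too early.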

In [None]:
# compile and fit the model
tf.random.set_seed(42) 
mnist_ae.compile(loss="mse", optimizer="nadam")

In [None]:
nEpochs = 50
nPatience = 5

# define paths:
encoder_weights_path = './mnist_encoder.weights.h5'
decoder_weights_path = './mnist_decoder.weights.h5'
history_path = './mnist_ae.history.h5'


# `train_from_scratch` is defined once near the top of the notebook
if train_from_scratch:
    history_ae = mnist_ae.fit(train_images, train_images, epochs=nEpochs, validation_data=(val_images, val_images),
                       callbacks = [ EarlyStopping(monitor='val_loss', patience=nPatience, 
                                                   verbose=False, restore_best_weights=True)])
    
    # Save the weights:
    mnist_encoder.save_weights(encoder_weights_path)
    mnist_decoder.save_weights(decoder_weights_path)

    # Save training history:
    with open(history_path, 'wb') as f:
        pickle.dump(history_ae, f)

else:
    # load previously computed weights
    mnist_encoder.load_weights(encoder_weights_path)
    mnist_decoder.load_weights(decoder_weights_path)

    # load history:
    with open(history_path, 'rb') as f:
        history_ae = pickle.load(f)

In [None]:
def plot_history(history):
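    """
    Plot training and validation loss over the epochs.
    Args:
    - history: dict of per-epoch metrics with keys 'loss' and 'val_loss'
      (e.g. the `history` attribute of a Keras History object).

    Returns:
    None
    """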
    plt.plot(history['loss'], label='Training')
    plt.plot(history['val_loss'], label='Validation')
    plt.legend()
    plt.xlabel('Epoch')
    plt.ylabel('Loss')

    plt.show()

In [None]:
plot_history(history_ae.history)

### Model Summary
We can now get a summary of the model:

In [None]:
mnist_ae.summary()

In [None]:
mnist_encoder.summary()

In [None]:
mnist_decoder.summary()

### Looking at the reconstructions
Let's look at some of the reconstructed images:

In [None]:
def plot_image(image):
    plt.imshow(image, cmap="binary")
    plt.axis("off")
    
def show_reconstructions(model, images=test_images, n_images=5):
    reconstructions = model.predict(images[:n_images])
    fig = plt.figure(figsize=(n_images * 1.5, 3))
    for image_index in range(n_images):
        plt.subplot(2, n_images, 1 + image_index)
        plot_image(images[image_index])
        plt.subplot(2, n_images, 1 + n_images + image_index)
        plot_image(reconstructions[image_index])
    plt.show()

In [None]:
show_reconstructions(mnist_ae)

## Sparse Autoencoders

### Encoding Dimension 30

In [None]:
mnist_encoder_sparse30 = tf.keras.Sequential([
    tf.keras.layers.Input(shape = (28, 28, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(300, activation='relu'),
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.Dense(30, activation='relu')
])

mnist_decoder_sparse30 = tf.keras.Sequential([
    tf.keras.layers.Input(shape = (30,)),
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.Dense(300, activation='relu'),
    tf.keras.layers.Dense(28*28, activation = 'sigmoid'),
    tf.keras.layers.Reshape([28, 28, 1])
])

mnist_ae_sparse30 = tf.keras.Sequential([mnist_encoder_sparse30, mnist_decoder_sparse30])
mnist_ae_sparse30.compile(loss="mse", optimizer="nadam")

In [None]:
# define paths:
mnist_encoder_sparse30_path = './mnist_encoder_sparse30.weights.h5'
mnist_decoder_sparse30_path = './mnist_decoder_sparse30.weights.h5'
history_path = './mnist_decoder_sparse30.history.h5'

if train_from_scratch:
    history30 = mnist_ae_sparse30.fit(train_images, train_images, epochs=nEpochs, validation_data=(val_images, val_images),
                                    callbacks = [ EarlyStopping(monitor='val_loss', patience=nPatience,
                                                                verbose=False, restore_best_weights=True)])
    
    # Save the weights:
    mnist_encoder_sparse30.save_weights(mnist_encoder_sparse30_path)
    mnist_decoder_sparse30.save_weights(mnist_decoder_sparse30_path)

    # Save training history:
    with open(history_path, 'wb') as f:
        pickle.dump(history30, f)

else:
    # load previously computed weights
    mnist_encoder_sparse30.load_weights(mnist_encoder_sparse30_path)
    mnist_decoder_sparse30.load_weights(mnist_decoder_sparse30_path)

    # load history:
    with open(history_path, 'rb') as f:
        history30 = pickle.load(f)

In [None]:
plot_history(history30.history)

In [None]:
show_reconstructions(mnist_ae_sparse30)

### Encoding Dimension 5

In [None]:
mnist_encoder_sparse5 = tf.keras.Sequential([
    tf.keras.layers.Input(shape = (28, 28, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(300, activation='relu'),
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.Dense(30, activation='relu'),
    tf.keras.layers.Dense(5, activation='relu')
])

mnist_decoder_sparse5 = tf.keras.Sequential([
    tf.keras.layers.Input(shape = (5,)),
    tf.keras.layers.Dense(30, activation='relu'),
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.Dense(300, activation='relu'),
    tf.keras.layers.Dense(28*28, activation = 'sigmoid'),
    tf.keras.layers.Reshape([28, 28, 1])
])

mnist_ae_sparse5 = tf.keras.Sequential([mnist_encoder_sparse5, mnist_decoder_sparse5])
mnist_ae_sparse5.compile(loss="mse", optimizer="nadam")

In [None]:
# define paths:
mnist_encoder_sparse5_path = './mnist_encoder_sparse5.weights.h5'
mnist_decoder_sparse5_path = './mnist_decoder_sparse5.weights.h5'
history_path = './mnist_decoder_sparse5.history.h5'

if train_from_scratch:
    history5 = mnist_ae_sparse5.fit(train_images, train_images, epochs=nEpochs, validation_data=(val_images, val_images),
                                    callbacks = [ EarlyStopping(monitor='val_loss', patience=nPatience,
                                                                verbose=False, restore_best_weights=True)])
    
    # Save the weights:
    mnist_encoder_sparse5.save_weights(mnist_encoder_sparse5_path)
    mnist_decoder_sparse5.save_weights(mnist_decoder_sparse5_path)

    # Save training history:
    with open(history_path, 'wb') as f:
        pickle.dump(history5, f)

else:
    # load previously computed weights
    mnist_encoder_sparse5.load_weights(mnist_encoder_sparse5_path)
    mnist_decoder_sparse5.load_weights(mnist_decoder_sparse5_path)

    # load history:
    with open(history_path, 'rb') as f:
        history5 = pickle.load(f)

In [None]:
plot_history(history5.history)

In [None]:
show_reconstructions(mnist_ae_sparse5)

### Encoding Dimension 2

In [None]:
# Model definition
mnist_encoder_sparse2 = tf.keras.Sequential([
    tf.keras.layers.Input(shape = (28, 28, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(300, activation='relu'),
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dense(2, activation='relu')
])

mnist_decoder_sparse2 = tf.keras.Sequential([
    tf.keras.layers.Input(shape = (2,)),
    tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.Dense(300, activation='relu'),
    tf.keras.layers.Dense(28*28, activation = 'sigmoid'),
    tf.keras.layers.Reshape([28, 28, 1])
])

mnist_ae_sparse2 = tf.keras.Sequential([mnist_encoder_sparse2, mnist_decoder_sparse2])

tf.random.set_seed(42) 
mnist_ae_sparse2.compile(loss="mse", optimizer="nadam")

In [None]:
# define paths:
mnist_encoder_sparse2_path = './mnist_encoder_sparse2.weights.h5'
mnist_decoder_sparse2_path = './mnist_decoder_sparse2.weights.h5'
history_path = './mnist_decoder_sparse2.history.h5'

# Train if needed:
if train_from_scratch:
    history2 = mnist_ae_sparse2.fit(train_images, train_images, epochs=nEpochs, validation_data=(val_images, val_images),
                                    callbacks = [ EarlyStopping(monitor='val_loss', patience=nPatience,
                                                                verbose=False, restore_best_weights=True)])
    
    # Save the weights:
    mnist_encoder_sparse2.save_weights(mnist_encoder_sparse2_path)
    mnist_decoder_sparse2.save_weights(mnist_decoder_sparse2_path)

    # Save training history:
    with open(history_path, 'wb') as f:
        pickle.dump(history2, f)

else:
    # load previously computed weights
    mnist_encoder_sparse2.load_weights(mnist_encoder_sparse2_path)
    mnist_decoder_sparse2.load_weights(mnist_decoder_sparse2_path)

    # load history:
    with open(history_path, 'rb') as f:
        history2 = pickle.load(f)

In [None]:
plot_history(history2.history)

In [None]:
show_reconstructions(mnist_ae_sparse2)

### Visualizing the Digits in 2D
Since we only have 2 dimensions as "code", we can nicely visualize the representation of the individual digits. To do so, we just pass them through the encoder to get the encodings:

In [None]:
val_encodings_2D = mnist_encoder_sparse2(val_images)
val_encodings_2D.shape

In [None]:
val_encodings_2D[:10,:]

For nicer visualisation, we create a data frame, consisting of both dimensions and the label. We then use the label (i.e., the digit represented by the image) to color the individual representations in 2D:

In [None]:
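# convert the encoder output (a tensor) to a NumPy array before building the data frame
sparseAE_embedding = pd.DataFrame(np.asarray(val_encodings_2D))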
sparseAE_embedding.columns = ['Dimension 1', 'Dimension 2']
sparseAE_embedding['label'] = val_labels

In [None]:
sparseAE_embedding.shape

In [None]:
sns.scatterplot(data=sparseAE_embedding, x='Dimension 1', y='Dimension 2', hue='label', legend='full', palette='deep')
plt.show()

We also plot the encodings on a logarithmic scale so that we can better see the digits that are more concentrated in the encoding space:

In [None]:
sns.scatterplot(data=sparseAE_embedding, x='Dimension 1', y='Dimension 2', hue='label', legend='full', palette='deep')
plt.xscale('log')
plt.yscale('log')
plt.grid()
plt.show()
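
One caveat with the logarithmic plot above: the code layer uses ReLU, so a coordinate can be exactly 0, and a value of 0 has no position on a logarithmic axis. The sketch below counts how many points are affected. The `codes` array is a made-up stand-in for the encoder output, not real data.

```python
import numpy as np

# Hypothetical 2D codes standing in for ReLU encoder outputs (all >= 0)
codes = np.array([
    [0.0, 3.2],
    [1.5, 0.0],
    [4.0, 7.5],
    [0.0, 0.0],
    [12.0, 0.4],
])

# A point can be drawn on a log-log plot only if both coordinates are > 0
drawable = (codes > 0).all(axis=1)
n_drawable = int(drawable.sum())
n_missing = int((~drawable).sum())

print(f"{n_drawable} points can be drawn, {n_missing} cannot")
```

Running the same check on the real encodings tells you how much of the validation set is missing from the logarithmic view.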

### Generating outputs
So far we have used the encoder to encode a digit image in 2D. Now we will use the decoder to decode a code (i.e., a list of 2 elements) into a full image.

**Exercise:** Using the visualisations above, can you predict which inputs will be decoded to images representing which digits? Read off the x- and y-coordinates of a point, and use the color code to determine the expected digit it will represent.
Use the code below to check your finding!

In [None]:
code = np.zeros(shape=[1, 2])
# modify the values of the first and second dimension
code[0, 0] = 10
code[0, 1] = 2
coding = np.squeeze(mnist_decoder_sparse2(code))
plt.imshow(coding, vmin=0, vmax=1)
plt.axis('off')
plt.title(code)
plt.show()

### Visualizing Outputs for Several Inputs

In [None]:
fig, axs = plt.subplots(2, 2, figsize=(6, 6))

for ID1 in range(2):
    for ID2 in range(2):
        # use the grid position as the code: [0, 0], [0, 1], [1, 0], [1, 1]
        code = np.array([[ID1, ID2]], dtype=float)
        coding = np.squeeze(mnist_decoder_sparse2(code))
        im = axs[ID1, ID2].imshow(coding, vmin=0, vmax=1)
        axs[ID1, ID2].axis('off')
        axs[ID1, ID2].set_title(code)

fig.subplots_adjust(right=0.9)
cbar_ax = fig.add_axes([0.95, 0.15, 0.025, 0.7])
fig.colorbar(im, cax=cbar_ax)

plt.show()

In [None]:
fig, axs = plt.subplots(8, 8, figsize=(12, 12))

for val1 in range(8):
    for val2 in range(8):
        # codes per dimension: 0, 1, 2, 4, ..., 64
        code = np.zeros(shape=[1, 2])
        if val1 > 0:
            code[0, 0] = 2**(val1-1)
        if val2 > 0:
            code[0, 1] = 2**(val2-1)
        coding = np.squeeze(mnist_decoder_sparse2(code))
        im = axs[val1, val2].imshow(coding, vmin=0, vmax=1)
        axs[val1, val2].axis('off')
        axs[val1, val2].set_title(code)

fig.subplots_adjust(right=0.9)
cbar_ax = fig.add_axes([0.95, 0.15, 0.025, 0.7])
fig.colorbar(im, cax=cbar_ax)

plt.show()

## Denoising Autoencoders

### Generate Noisy Data

In [None]:
def add_noise(image_array, noise_factor=0.4):
    """Adds random noise to each image in the supplied array."""
    noisy_array = (1-noise_factor) * image_array + noise_factor * np.random.random(size=image_array.shape)
    return noisy_array
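
The noise formula above is a weighted average of the image and uniform random values in [0, 1), so the noisy pixels stay within the original range and no clipping is needed. The following sketch checks this on a tiny made-up array; the function is repeated here only to keep the example self-contained.

```python
import numpy as np

def add_noise(image_array, noise_factor=0.4):
    """Blend each pixel with uniform random noise (same formula as above)."""
    return (1 - noise_factor) * image_array + noise_factor * np.random.random(size=image_array.shape)

np.random.seed(0)
clean = np.array([[0.0, 1.0],
                  [0.5, 0.25]])   # toy "image" with values in [0, 1]
noisy = add_noise(clean)

# A black pixel (0.0) ends up in [0, 0.4), a white pixel (1.0) in [0.6, 1.0)
print(noisy)
print(noisy.min() >= 0.0, noisy.max() <= 1.0)
```

With `noise_factor=0.4`, the contrast of the original image is reduced to 60 percent and the remaining 40 percent is noise.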

In [None]:
from sklearn.decomposition import PCA

# Add noise:
# - to training images
train_images_noisy = add_noise(train_images)

# - to validation images
val_images_noisy = add_noise(val_images)

# - to test images
test_images_noisy = add_noise(test_images)

# flatten images:
train_images_noisy_flat = train_images_noisy.reshape(train_images_noisy.shape[0], -1)
val_images_noisy_flat = val_images_noisy.reshape(val_images_noisy.shape[0], -1)
test_images_noisy_flat = test_images_noisy.reshape(test_images_noisy.shape[0], -1)

### Denoising with PCA

In [None]:
# Fit a PCA with 32 components on the noisy training images,
# so that the test images stay unseen (as for the autoencoder below)
pca = PCA(n_components=32, random_state=42)
pca.fit(train_images_noisy_flat)

In [None]:
reconstructed_pca = pca.inverse_transform(pca.transform(test_images_noisy_flat))
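
Why does projecting onto a few principal components remove noise? The signal is concentrated in a few directions, while the noise is spread over all of them, so dropping the remaining directions discards mostly noise. The NumPy sketch below illustrates this on synthetic data. It is an assumption-laden toy (the helper `pca_denoise`, the rank-3 data, and the Gaussian noise are all made up) and does not use the MNIST images or scikit-learn.

```python
import numpy as np

def pca_denoise(X_fit, X_apply, n_components):
    """Project X_apply onto the top principal components of X_fit and back."""
    mean = X_fit.mean(axis=0)
    # Rows of Vt are the principal directions, sorted by explained variance
    _, _, Vt = np.linalg.svd(X_fit - mean, full_matrices=False)
    components = Vt[:n_components]
    codes = (X_apply - mean) @ components.T    # analogous to pca.transform
    return codes @ components + mean           # analogous to pca.inverse_transform

rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))             # 3 underlying factors
mixing = rng.normal(size=(3, 50))
clean = latent @ mixing                        # 50-dimensional, rank-3 signal
noisy = clean + 0.1 * rng.normal(size=clean.shape)

denoised = pca_denoise(noisy, noisy, n_components=3)

err_noisy = float(np.mean((noisy - clean) ** 2))
err_denoised = float(np.mean((denoised - clean) ** 2))
print(err_noisy, err_denoised)
```

On real images the signal is not exactly low-rank, so the choice of the number of components trades remaining noise against lost detail.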

In [None]:
def plot_digits(X, title):
    """Small helper function to plot the first 40 digits in X."""
    fig, axs = plt.subplots(nrows=4, ncols=10, figsize=(8, 4))
    for img, ax in zip(X, axs.ravel()):
        ax.imshow(img.reshape((28, 28)), cmap="binary")
        ax.axis("off")
    fig.suptitle(title, fontsize=24)
    plt.show()

In [None]:
# Visualize the PCA results
plot_digits(test_images, "Test images")
plot_digits(test_images_noisy, "Noisy test images")
plot_digits(reconstructed_pca, "Denoised by PCA")


### Denoising with AE

In [None]:
mnist_encoder_denoise = tf.keras.Sequential([
    tf.keras.layers.Input(shape = (28, 28, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(300, activation='relu'),
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.Dense(30, activation='relu')
])

mnist_decoder_denoise = tf.keras.Sequential([
    tf.keras.layers.Input(shape = (30,)),
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.Dense(300, activation='relu'),
    tf.keras.layers.Dense(28*28, activation = 'sigmoid'),
    tf.keras.layers.Reshape([28, 28, 1])
])

mnist_ae_denoise = tf.keras.Sequential([mnist_encoder_denoise, mnist_decoder_denoise])
mnist_ae_denoise.compile(loss="mse", optimizer="nadam")

In [None]:
# define paths:
mnist_encoder_denoise_path = './mnist_encoder_denoise.weights.h5'
mnist_decoder_denoise_path = './mnist_decoder_denoise.weights.h5'
history_path = './mnist_decoder_denoise.history.h5'

if train_from_scratch:
    history_denoise  = mnist_ae_denoise.fit(train_images_noisy, train_images, epochs=nEpochs, validation_data=(val_images_noisy, val_images),
                                           callbacks = [ EarlyStopping(monitor='val_loss', patience=nPatience,
                                                                       verbose=False, restore_best_weights=True)])
                                       
    # Save the weights:
    mnist_encoder_denoise.save_weights(mnist_encoder_denoise_path)
    mnist_decoder_denoise.save_weights(mnist_decoder_denoise_path)

    # Save training history:
    with open(history_path, 'wb') as f:
        pickle.dump(history_denoise, f)

else:
    # load previously computed weights
    mnist_encoder_denoise.load_weights(mnist_encoder_denoise_path)
    mnist_decoder_denoise.load_weights(mnist_decoder_denoise_path)

    # load history:
    with open(history_path, 'rb') as f:
        history_denoise = pickle.load(f)

In [None]:
plot_history(history_denoise.history)

In [None]:
reconstructed_ae = mnist_ae_denoise(test_images_noisy)

In [None]:
# Visualize the AE results
plot_digits(test_images, "Test images")
plot_digits(test_images_noisy, "Noisy test images")
plot_digits(reconstructed_ae.numpy(), "Denoised by Autoencoder")

## Autoencoder based on Convolutions
Below is the definition of a very simple convolutional autoencoder:

In [None]:
CNNmnist_simple_encoder = tf.keras.Sequential([
    tf.keras.layers.Input(shape = (28, 28, 1)),
    tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu"),
    tf.keras.layers.MaxPool2D(pool_size=2),  # output: 14 x 14 x 16
    tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu"),
])

CNNmnist_simple_decoder = tf.keras.Sequential([
    tf.keras.layers.Input(shape = (14, 14, 32)),
    tf.keras.layers.Conv2D(16, kernel_size = (3,3), activation = 'relu', padding = 'same'),
    tf.keras.layers.UpSampling2D((2,2)),
    tf.keras.layers.Conv2D(1, kernel_size = (3,3), activation = 'sigmoid', padding = 'same'),
])

CNNmnist_simple_ae = tf.keras.Sequential([CNNmnist_simple_encoder, CNNmnist_simple_decoder])

**Exercise:** Taking inspiration from the commands above, compile and train this autoencoder. Start with 10 epochs; you can always extend the training later. It's probably easiest if you save this notebook under a new name, and then work through it again using CNN layers.

Compare the number of parameters and the results with the autoencoder based on the densely connected layers (i.e., without convolutions). What do you observe?
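
To cross-check the numbers reported by `summary()`, you can count the parameters by hand. The sketch below uses the standard formulas for dense and 2D convolutional layers (weights plus one bias per output unit or filter). The helper functions are made up for this illustration; the layer sizes are taken from the first dense autoencoder and from the convolutional autoencoder above.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

def conv2d_params(kernel_size, channels_in, channels_out):
    """One kernel_size x kernel_size x channels_in filter per output channel, plus biases."""
    return kernel_size * kernel_size * channels_in * channels_out + channels_out

# Dense autoencoder: 784 -> 300 -> 100 -> 300 -> 784
dense_ae_params = (dense_params(784, 300) + dense_params(300, 100)
                   + dense_params(100, 300) + dense_params(300, 784))

# Convolutional autoencoder: 1 -> 16 -> 32 -> 16 -> 1 channels, 3 x 3 kernels
conv_ae_params = (conv2d_params(3, 1, 16) + conv2d_params(3, 16, 32)
                  + conv2d_params(3, 32, 16) + conv2d_params(3, 16, 1))

print("dense autoencoder:        ", dense_ae_params)
print("convolutional autoencoder:", conv_ae_params)
```

Flatten, reshape, pooling and upsampling layers have no trainable parameters, so they do not appear in the count.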