# LB01b.0 Stacked Autoencoder (60%)

The idea behind stacked autoencoders is to stack multiple simple autoencoders so that the output of each hidden layer is connected to the input of the successive autoencoder.

<img src="resources/LB01b_stack1_stack2.png"/>

In the image above you can see Stack 1 and Stack 2. These are two simple autoencoders as we know them from LB01a. As you can see, the latent space of Stack 1 is used as the input for Stack 2.


<img src="resources/LB01b_stacked_autoencoder.png"/>

Stacking the two autoencoders "Stack 1" and "Stack 2" results in the architecture depicted in the image above.

To train this model you will have to use two separate sequential models. The encoded output of the first stack serves as both the input and the training target of the second stack.
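
The data flow between the two stacks can be sketched with plain NumPy, independent of Keras. This is only an illustration under assumptions: the weights are random rather than trained, the layer sizes 784, 128 and 32 are taken from the exercise further down, and the helper names `sigmoid`, `dense` and `init` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dense(x, w, b):
    """One fully connected layer with sigmoid activation (hypothetical helper)."""
    return sigmoid(x @ w + b)

def init(n_in, n_out):
    """Random, untrained weights and zero biases (illustration only)."""
    return rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out)

x = rng.random((5, 784))            # five fake flattened 28x28 images

w1_enc, b1_enc = init(784, 128)     # Stack 1 encoder
w1_dec, b1_dec = init(128, 784)     # Stack 1 decoder
w2_enc, b2_enc = init(128, 32)      # Stack 2 encoder
w2_dec, b2_dec = init(32, 128)      # Stack 2 decoder

# Stack 1 on its own: x -> h1 -> reconstruction of x
h1 = dense(x, w1_enc, b1_enc)
x_rec = dense(h1, w1_dec, b1_dec)

# Stack 2 on its own: h1 is both its input and its target
h2 = dense(h1, w2_enc, b2_enc)
h1_rec = dense(h2, w2_dec, b2_dec)

# Stacked autoencoder: encoder 1 -> encoder 2 -> decoder 2 -> decoder 1
y = dense(h1_rec, w1_dec, b1_dec)

shapes = (x.shape, h1.shape, h2.shape, h1_rec.shape, y.shape)
print(shapes)
```

The order of the decoders mirrors the order of the encoders: the decoder of Stack 2 maps the 32-dimensional latent space back to 128 dimensions before the decoder of Stack 1 maps it back to 784.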

In [None]:
# Importing the packages needed for this lecture
import sys
import os
import os.path
import numpy as np

import tensorflow as tf
from keras.layers import Input, InputLayer, Dense
from keras.models import Model, Sequential, load_model
from keras.callbacks import Callback, EarlyStopping, TensorBoard
from keras.optimizers import Adam
import keras as K

from keras.datasets import mnist

import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
import time

from sklearn.svm import LinearSVC, SVC
from sklearn.metrics import accuracy_score, mean_squared_error
from datetime import datetime

In [None]:
# Defining the log folder for TensorBoard (helps to visualize training curves) and the folder for saved models
logdir = "logs/"
modeldir = "models/"

if not os.path.exists(logdir):
    os.makedirs(logdir)
    
if not os.path.exists(modeldir):
    os.makedirs(modeldir)

In [None]:
# Function for plotting a specified number of images: original vs. encoded vs. decoded
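def plot_encoded_img(imgs, encoded_img, rnd_idx, aspect_ratio=0.1, decoded_img=None, title=None):
    # number of columns = number of selected images (avoids relying on a global variable)
    num_images = len(rnd_idx)
    plt.figure(figsize=(18, 8))
    if title is not None:
        plt.suptitle(title, fontsize=16)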

    for i, image_idx in enumerate(rnd_idx):
        # plot original image (input, x)
        ax = plt.subplot(3, num_images, i + 1)
        plt.imshow(imgs[image_idx].reshape(28, 28))
        plt.gray()
        ax.get_xaxis().set_visible(False)
        ax.get_yaxis().set_visible(False)

        # plot encoded image (latent space, h)
        ax = plt.subplot(3, num_images, num_images + i + 1)
        plt.imshow(encoded_img[image_idx].reshape(-1, 1), aspect=aspect_ratio)
        plt.gray()
        ax.get_xaxis().set_visible(False)
        ax.get_yaxis().set_visible(False)

        if decoded_img is not None:
            # plot reconstructed image (output, y)
            ax = plt.subplot(3, num_images, 2 * num_images + i + 1)
            plt.imshow(decoded_img[image_idx].reshape(28, 28))
            plt.gray()
            ax.get_xaxis().set_visible(False)
            ax.get_yaxis().set_visible(False)

In [None]:
# This function is needed later when evaluating the classifier's results.
import itertools

def plot_confusion_matrix(cm, classes,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    fig = plt.figure()

    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        cm = np.around(cm, decimals=2, out=None)  
    
    
    thresh = cm.max() / 2.
    
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)

    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, cm[i, j],
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

    fig.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')

## LB01b.1 Data preparation

* Load the [MNIST](http://yann.lecun.com/exdb/mnist/) data using `mnist.load_data()`

* Prepare the input images:
    * Convert images to float32-datatype (`.astype()`)
    * Scale images to the interval [0, 1]
    * Images have a 28x28 pixel resolution, transform them to a 1-dimensional vector using `.reshape()`
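
The preparation steps listed above can be tried out on a small synthetic array before applying them to the real data. This is a minimal sketch, assuming the images arrive as `uint8` arrays of shape `(n, 28, 28)` with values in [0, 255]; the array `fake_images` is a hypothetical stand-in and not the MNIST data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the image array: 10 images, 28x28 pixels, uint8
fake_images = rng.integers(0, 256, size=(10, 28, 28), dtype=np.uint8)

# Convert to float32 and scale from [0, 255] to [0, 1]
x = fake_images.astype("float32") / 255.0

# Flatten each 28x28 image into a vector of length 784
x = x.reshape(len(x), -1)

print(x.shape, x.dtype)
```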


In [None]:
# TODO: load the mnist image data (28x28px) and its labels (can be used later on for SVM classification)
(x_train, y_train), (x_test, y_test) = ...

# TODO: normalize images within the interval [0,1]
x_train = ...
x_test = ...

# TODO: flatten the 28x28 images into a 1d vector
x_train = ...
x_test = ...

# TODO: print new flattened shapes of x_train and x_test
print(x_train.shape)
print(x_test.shape)

## LB01b.2 Stacked Autoencoder definition

Using the knowledge gathered in exercise LB01a, implement a stacked autoencoder with two hidden layers of dimensionality $d_1 = 128$ and $d_2 = 32$.

* Each stack has to be trained separately. Hence, two separate sequential models are needed.
* Hint: Use the sigmoid activation function for all layers.
* The encoded output of the first layer serves as input of the second layer. 
* The encoded output of the second layer represents the latent space. 
* Extract the weights (`get_weights()`) of all layers in order to create the final model at the end.
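
As a quick sanity check on the architecture, the dimensions above can be worked through by hand. The sketch below assumes that the compression factor is defined as the ratio of input dimensionality to latent dimensionality, and that every layer is a standard fully connected layer with one weight per input/output pair plus one bias per output unit; `dense_params` is a hypothetical helper.

```python
input_dim = 28 * 28      # 784 pixels per flattened image
d1, d2 = 128, 32         # hidden sizes given in the exercise

# Assumed definition: input dimensionality divided by latent dimensionality
compression_factor = input_dim / d2   # 784 / 32 = 24.5

def dense_params(n_in, n_out):
    """Number of trainable parameters of a fully connected layer (weights + biases)."""
    return n_in * n_out + n_out

# Layers of the final stacked model: encoder 1, encoder 2, decoder 2, decoder 1
layer_sizes = [(input_dim, d1), (d1, d2), (d2, d1), (d1, input_dim)]
params = [dense_params(n_in, n_out) for n_in, n_out in layer_sizes]

print(compression_factor, params, sum(params))
```

These per-layer counts should match the numbers reported in the model summaries if the layers are defined as assumed here.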

In [None]:
# TODO: get dimensions for the AE's input layer
input_dim = ...

# TODO: define size of encoded representations for the 2 stacks
encoding_dim1 = ...
encoding_dim2 = ...

In [None]:
# TODO: compute compression factor, i.e. dimensionality reduction
compression_factor = ...
print('Compression factor: %.1f' % compression_factor)

In [None]:
# TODO: define autoencoder stack 1
autoencoder1= ...
autoencoder1.add(...)
autoencoder1.add(...)

# TODO: define autoencoder stack 2
autoencoder2= ...
autoencoder2.add(...)
autoencoder2.add(...)

In [None]:
# print the architectures of the two stacks of the autoencoder
# summary() prints the table itself and returns None, so no print() is needed
autoencoder1.summary()
autoencoder2.summary()

In [None]:
# TODO: compile your first autoencoder
autoencoder1.compile(...)

In [None]:
# TODO: set the number of epochs to 50
max_epochs= ...

In [None]:
tensorboard_callback = TensorBoard(log_dir=logdir + "AE_Stacked_Stack1_" + datetime.now().strftime("%Y.%m.%d-%H:%M:%S"))

# TODO: train the autoencoder using input and target accordingly. Think about what we want an AE
# TODO: to do. Also shuffle training data and provide a validation split. 
# TODO: Apply the tensorboard callback to the fit command
# TODO: Set the number of epochs to max_epochs.
# input = target
autoencoder1.fit(...)

# TODO: save the entire model graph with weights to the following model path
model_path = modeldir + 'stack1.h5'
...

del(autoencoder1)

To see the training curves, you can now start TensorBoard in your Docker container with the following command after navigating to the working directory (e.g. `/notebooks/<your-working-directory>/`):

`tensorboard --logdir logs --host 0.0.0.0`

Please note that the `--logdir` parameter has to be the same as your `logdir` variable. 

Afterwards navigate to [http://localhost:6006](http://localhost:6006) in your internet browser.


In [None]:
# load the entire model (no compilation necessary)
autoencoder1 = load_model(modeldir + "stack1.h5")

# TODO: extract the encoder part of the first autoencoder
stack1_encoder = ...

# TODO: Create a new sequential model with only the encoder part of 
# TODO: the autoencoder. You will use this model to generate input for the 
# TODO: successive autoencoder (Stack 2).
encoder1= ...

# printing the summary and deleting the first autoencoder from memory
encoder1.summary()
del(autoencoder1)

In [None]:
# TODO: now use the encoder part of the first autoencoder 
# TODO: to generate latent representations
encoded1_imgs = ...

In [None]:
# TODO: now go on with the second autoencoder, compile the model you created
# TODO: use a suitable optimizer and loss function
autoencoder2.compile(...)

In [None]:
tensorboard_callback = TensorBoard(log_dir=logdir + "AE_Stacked_Stack2_" + datetime.now().strftime("%Y.%m.%d-%H:%M:%S"))

# TODO: Train the second autoencoder using input and target accordingly. 
# TODO: Remember to use the output of the first encoder as input for this one.
# TODO: Also shuffle training data and provide a validation split. 
# TODO: Apply the tensorboard callback to the fit command
# TODO: Set the number of epochs to max_epochs.
# input = target
autoencoder2.fit(...)

# TODO: save the entire model graph with weights to the following model path
model_path = modeldir + 'stack2.h5'
...

del(autoencoder2)

In [None]:
autoencoder2 = load_model(modeldir + "stack2.h5")

# TODO: Extract just the second encoder to visualize the encoded representation
# TODO: (latent space); it can also serve as a feature extractor (dimensionality reduction)
stack2_encoder = ...

# TODO: Create a new sequential model with only the encoder part of 
# TODO: the second autoencoder. You will use this model to generate 
# TODO: the encoded representations.
encoder2 = ...

# printing the summary 
encoder2.summary()

In [None]:
# TODO: Encode the images encoded by autoencoder stack 1 with autoencoder stack 2.
# TODO: The result is the representation of the images in latent space.
encoded2_imgs = ...

In [None]:
##### Now let's get ready to build the whole model #####
# TODO: Load the stack 1 and stack 2 from your hard drive.
autoencoder1 = ...
autoencoder2 = ...

# TODO: Create a new sequential model 
stacked_autoencoder = ...
# TODO: Add the layers of the stack 1 and stack 2 accordingly to your new model
...

# TODO: Compile the newly created model
stacked_autoencoder.compile(...)

# printing the summary
stacked_autoencoder.summary()

In [None]:
# Once the new stacked autoencoder is created, we will have to 
# reuse the weights of the stack 1 and stack 2 and set them accordingly to the
# layers of the new model.

# TODO: Extract the weights of the encoder and decoder parts of both stacks
stack1_enc_weights = ...
stack1_dec_weights = ...
stack2_enc_weights = ...
stack2_dec_weights = ...

# TODO: set the extracted weights to the matching layers of the stacked autoencoder
...

In [None]:
# TODO: Create a new sequential model, which will only contain the encoder
# TODO: part of the stacked autoencoder.
stacked_encoder = Sequential(name= "Stacked_Encoder")
stacked_encoder.add(...)
stacked_encoder.add(...)

# TODO: compile your new encoder model
stacked_encoder.compile(...)

# printing the summary
stacked_encoder.summary()

In [None]:
# Once the encoder is created, we will have to 
# reuse the weights of the autoencoder and set them accordingly to the
# layers of the new encoder.

# TODO: Reuse the extracted weights of the encoder parts of the 
# TODO: stacked autoencoder and set them to the new encoder.
...

## LB01b.3 Evaluation

* Use the encoder part of the autoencoder in order to create the latent space predictions
* Use your stacked autoencoder in order to predict the test images
* Compute the average mean squared error of all images in the test set
* Use a simple MLP to classify the data based on the generated features
* Use a support vector machine to classify the data
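
The average mean squared error mentioned above can be sketched in NumPy as follows. This is only an illustration on synthetic data: `originals` and `reconstructions` are hypothetical stand-ins for the flattened test images and the autoencoder output, and the noise level is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: 4 flattened images and slightly noisy "reconstructions"
originals = rng.random((4, 784))
noise = rng.normal(scale=0.05, size=originals.shape)
reconstructions = np.clip(originals + noise, 0.0, 1.0)

# MSE per image: average the squared pixel differences along the pixel axis
per_image_mse = np.mean((originals - reconstructions) ** 2, axis=1)

# Average MSE over all images
avg_mse = per_image_mse.mean()

print('Average MSE: %.4f' % avg_mse)
```

Because every image has the same number of pixels, averaging the per-image errors gives the same value as averaging over all pixels at once.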

In [None]:
# TODO: Use the encoder part of the stacked autoencoder in order to generate
# TODO: representations of the latent space.
encoded_imgs = ...

# TODO: Encode/decode the test images with the full stacked autoencoder model.
decoded_imgs = ...

In [None]:
# TODO: compute the average MSE over all test images (original/decoded)
...

avg_mse= ...
print('Average MSE for all original/decoded images: %.4f' % avg_mse)

In [None]:
# plot a random selection of images: original vs. encoded vs. decoded
# (an example of how to generate random indices for selecting original/encoded/decoded images
# from the test set)
num_images = 20
np.random.seed(42)
random_images = np.random.randint(x_test.shape[0], size=num_images)
plot_encoded_img(x_test, encoded_imgs, random_images, aspect_ratio=0.1, decoded_img= decoded_imgs, title='Deep Autoencoder - Reconstructed')

In [None]:
#### Now let's use the generated representations and try to classify #####
#### the input data based on the representations.                    #####

# TODO: Create a sequential model
classification_mlp = ...
# TODO: Add a dense layer with 16 nodes and use relu as activation function
classification_mlp.add(...)
# TODO: Add the classification layer. Recall the knowledge from last semester:
# TODO:     - How many nodes do you need?
# TODO:     - Which activation function do you need?
classification_mlp.add(...)

# TODO: Compile the model with a suitable optimizer and loss function
classification_mlp.compile(...)

In [None]:
from keras.utils import to_categorical

# TODO: Use the appropriate function to generate categorical labels for your dataset
y_train_categorical = ...
y_test_categorical = ...
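
What "categorical labels" means, and how they are later reverted with `argmax`, can be illustrated with plain NumPy. This sketch deliberately avoids Keras: the one-hot encoding via `np.eye` is a stand-in for illustration, and the label values are made up.

```python
import numpy as np

labels = np.array([3, 0, 9, 1])   # made-up digit labels
num_classes = 10

# One-hot encoding: row i has a single 1 at column labels[i]
one_hot = np.eye(num_classes, dtype="float32")[labels]

# Reverting the encoding: index of the largest entry in each row
recovered = np.argmax(one_hot, axis=1)

print(one_hot.shape, recovered)
```

The same `argmax` step also applies to the class probabilities predicted by a softmax output layer, where it selects the most probable class per sample.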

In [None]:
# TODO: Use the encoder part of the stacked autoencoder to generate the representations 
# TODO: for the train and test data
x_train_features = stacked_encoder.predict(...)
x_test_features = stacked_encoder.predict(...)

In [None]:
tensorboard_callback = TensorBoard(log_dir=logdir + "MLP_" + datetime.now().strftime("%Y.%m.%d-%H:%M:%S"))

# TODO: Fit your newly created MLP, use 100 epochs and batch size of 256.
# TODO: Also shuffle your data and use a validation split of 70/30.
classification_mlp.fit(...)

In [None]:
# TODO: Use the fitted classifier to predict the test data.
y_pred = classification_mlp.predict(...)

In [None]:
from sklearn.metrics import confusion_matrix
from numpy import argmax

# TODO: You will have to revert the categorical labels to the numerical 
# TODO: labels in order to use the `confusion_matrix` function. Hint: `argmax`
y_pred = ...
cm = confusion_matrix(...)

# TODO: Print the accuracy score
acc_score = ...
print('Accuracy: %.4f' % acc_score)

In [None]:
# TODO: Plot the confusion matrix
plot_confusion_matrix(...)
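
The row normalization that `plot_confusion_matrix` applies when `normalize=True` can be checked on a small example. The 3x3 matrix below is made up for illustration; rows are true labels and columns are predicted labels, as in the helper defined at the top of this notebook.

```python
import numpy as np

# Made-up confusion matrix for three classes (rows: true, columns: predicted)
cm = np.array([[8, 2, 0],
               [1, 7, 2],
               [0, 1, 9]])

# Divide every row by its sum, so each row shows per-class fractions
cm_norm = cm.astype("float") / cm.sum(axis=1)[:, np.newaxis]
cm_norm = np.around(cm_norm, decimals=2)

# Overall accuracy: correctly classified samples (diagonal) over all samples
accuracy = np.trace(cm) / cm.sum()

print(cm_norm)
print('Accuracy: %.4f' % accuracy)
```

After normalization, the diagonal entries are the per-class recall values, which makes classes of different sizes comparable.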

In [None]:
#### Now let's try another classifier. #####

# TODO: Create a support vector machine classifier using sklearn's SVC
# TODO: (Hint: `sklearn.svm.SVC`)
clf= ...

# TODO: Fit your newly created classifier
clf.fit(...)

# TODO: Predict the test data.
y_pred= clf.predict(...)

# TODO: Print the accuracy score
acc_score = ...
print('Accuracy: %.4f' % acc_score)

In [None]:
# TODO: Compute the confusion matrix.
cm = confusion_matrix(...)

In [None]:
# TODO: Plot the confusion matrix.
plot_confusion_matrix(...)