# LB02.0 Convolutional Neural Networks (CNNs)

Convolutional Neural Networks (CNNs) are a type of neural network typically used to process grid-structured, multidimensional data. They are often a good choice for images and time series. The name comes from the convolution operation used inside the network: a kernel is moved across the grid-like data structure that is fed into the network.

### Convolutions
In general, a convolution is an operation on two functions $f$ and $g$ that produces a third function $f * g$. To picture it, consider a spaceship that moves through space while its position is tracked by a laser beam. The laser's output can be described by the function $x(t)$, where both $x$ and $t$ are real-valued. These measurements let you track the spaceship directly, but you can also use a convolution to approximate its position. Such an approximation can be more convenient than querying the laser's measurement $x(t)$ again every millisecond.

If a weighting function $w(a)$, where $a$ is the age of a measurement, is applied at every point in time, the result is a function $s$ that approximates a smoothed average of the spaceship's position.

\begin{equation}
s(t) = \int x(a)\cdot w(t - a) da
\end{equation}

In the literature, this operation is called a convolution and is typically denoted by an asterisk:

\begin{equation}
s(t) = (x * w) (t).
\end{equation}

When the values are discrete, as is typical for time values $t$ that are recorded periodically, a discrete convolution can be used:

\begin{equation}\label{discrete_convolution}
s(t) = (x * w)(t) = \sum_{a=-\infty}^{\infty} x(a)\cdot w(t - a).
\end{equation}
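The discrete convolution above can be tried out with NumPy. The following is a minimal sketch, assuming a short, made-up position signal `x` and a hypothetical three-tap averaging kernel `w` (neither comes from the text). It evaluates the sum directly, treating all values outside the stored range as zero, and compares the result with `np.convolve`.

```python
import numpy as np

def discrete_convolution(x, w):
    """Evaluate s(t) = sum_a x(a) * w(t - a) for finite sequences.

    Values outside the stored range of x and w are treated as zero,
    so the infinite sum reduces to a finite one.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    s = np.zeros(len(x) + len(w) - 1)
    for t in range(len(s)):
        for a in range(len(x)):
            if 0 <= t - a < len(w):
                s[t] += x[a] * w[t - a]
    return s

# Made-up position measurements (illustrative values only)
x = np.array([0.0, 1.1, 1.9, 3.2, 3.9, 5.1])
# Hypothetical weighting function: equal weight on three neighbouring samples
w = np.ones(3) / 3.0

s = discrete_convolution(x, w)
print(s)
# Compare with NumPy's built-in full convolution
print(np.allclose(s, np.convolve(x, w)))
```

The explicit double loop mirrors the formula term by term; for real workloads, `np.convolve` is the more practical choice.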

In CNNs, the first argument, $x$, is usually called the **input**, whereas the second argument, $w$, is called the **kernel**. As mentioned in the introduction, the input $x$ is most commonly a multidimensional array of data, while the kernel $w$ is a multidimensional array of parameters that are optimized during training.

#### Convolutions in image processing
In image processing, $\boldsymbol{I}$ usually denotes the input image of a convolution, $\boldsymbol{K}$ denotes the kernel, and $\boldsymbol{S}$ denotes the output image that results from the convolution:

\begin{equation}
\boldsymbol{S\,}(i, j) = (\boldsymbol{I} * \boldsymbol{K\,}) (i, j) = \sum_{m} \sum_{n} \boldsymbol{I\,}(m,n)\cdot \boldsymbol{K\,}(i - m, j - n).
\end{equation}
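The two-dimensional formula can be sketched in NumPy as well. The example below is a minimal illustration, not the way Keras implements its layers: the function name `conv2d_full`, the 3x3 test image and the 2x2 kernel are all made up for this sketch. Entries outside the stored arrays are treated as zero, so the output is larger than the input.

```python
import numpy as np

def conv2d_full(image, kernel):
    """Evaluate S(i, j) = sum_m sum_n I(m, n) * K(i - m, j - n).

    Entries outside the stored arrays count as zero, which yields an
    output of shape (H + kh - 1, W + kw - 1).
    """
    image = np.asarray(image, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H + kh - 1, W + kw - 1))
    for m in range(H):
        for n in range(W):
            # Each input pixel I(m, n) adds a scaled copy of the kernel,
            # placed at rows m..m+kh-1 and columns n..n+kw-1.
            out[m:m + kh, n:n + kw] += image[m, n] * kernel
    return out

# Made-up 3x3 image and 2x2 kernel (illustrative values only)
image = np.arange(9, dtype=float).reshape(3, 3)
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])

S = conv2d_full(image, kernel)
print(S.shape)
print(S)
```

Note that convolutional layers in Keras and most other deep learning libraries compute a cross-correlation, that is, they skip the kernel flip implied by the indices $i - m$ and $j - n$. Because the kernel values are learned, this difference does not matter for training.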

In [None]:
# Suppress FutureWarning messages raised while importing the libraries below.
# (A filter set inside `warnings.catch_warnings()` is reset when the block ends,
# so it is set at module level here.)
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)

import sys
import os

import numpy as np
import tensorflow as tf

from keras import __version__
from keras.models import Sequential, model_from_json
from keras.layers import Convolution2D, MaxPooling2D, Dropout, Flatten, Dense, Activation, BatchNormalization
from keras import optimizers
from keras.utils import np_utils
from keras import datasets
from keras import callbacks
from keras.callbacks import Callback, EarlyStopping
from keras.models import load_model

import pandas as pd
from matplotlib import pyplot as plt
from matplotlib.lines import Line2D

from datetime import datetime
import time

from sklearn.metrics import accuracy_score

In [None]:
# Defining the log folder for tensorboard (helps by visualizing training curves)
logdir = "cnn_logs/"

if not os.path.exists(logdir):
    os.makedirs(logdir)

In [None]:
# This function is needed later when evaluating the classifier's results.
import itertools

def plot_confusion_matrix(cm, classes,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
    """
    Plot the confusion matrix `cm` with one row/column per entry in `classes`.
    Normalization can be applied by setting `normalize=True`.
    """
    fig = plt.figure()

    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        cm = np.around(cm, decimals=2, out=None)  
    
    
    thresh = cm.max() / 2.
    
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)

    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, cm[i, j],
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

    fig.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')

In [None]:
print('NumPy Version: %s' % np.__version__)
print('Tensorflow Version: %s' % tf.__version__)
print('Keras Version: %s\n' % __version__)

## LB02.1 Data preparation

* Load the [MNIST](http://yann.lecun.com/exdb/mnist/) data using `datasets.mnist.load_data()`

* Prepare the input images:
    * Convert the images to the float32 datatype (`.astype()`)
    * Scale the images to the interval [0, 1]
    * Reshape the data to add a fourth dimension. Keras expects a 4D tensor as input to 2D convolutional layers
    * **Important**: Apply one-hot encoding to the labels
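The preparation steps listed above can be sketched on stand-in data. The example below is a minimal illustration in plain NumPy, assuming random arrays that merely have the MNIST layout (28x28 grayscale images with 8-bit pixel values and integer labels from 0 to 9). It uses `np.eye` for the one-hot encoding; Keras also offers `keras.utils.to_categorical` for this step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data with the MNIST layout: 8-bit grayscale images of 28x28 pixels
# and integer labels 0..9 (random values, for illustration only)
X = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)
y = np.array([3, 0, 9, 1, 3])

# Convert to float32 and scale pixel values from [0, 255] to [0, 1]
X = X.astype("float32") / 255.0

# Add a trailing channel axis: (samples, height, width) -> (samples, height, width, 1)
X = X.reshape(X.shape[0], 28, 28, 1)

# One-hot encode the labels: row i has a single 1 in column y[i]
num_classes = 10
y_onehot = np.eye(num_classes, dtype="float32")[y]

print(X.shape, X.dtype, X.min(), X.max())
print(y_onehot.shape)
```

The trailing axis of size 1 is the channel dimension; this "channels last" layout is the Keras default.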

In [None]:
# TODO: load the MNIST dataset; X_test is used for the final performance evaluation only
(X_train, y_train), (X_test, y_test) = ...

# TODO: print the current dimensions of the dataset
print("Training matrix shape", ...)
print("Testing matrix shape", ...)
print("Training labels shape", ...)
print("Testing labels shape", ...)

In [None]:
# TODO: convert the training and test data to float32
X_train = ...
X_test = ...

# TODO: scale the images to the interval [0, 1]
X_train /= ...
X_test /= ...

In [None]:
# TODO: reshape the data to add the additional dimension needed for a 4D tensor
X_train = ...
X_test = ...

# TODO: print the new dimensions of the dataset
print("Training matrix shape - dimension added", ...)
print("Testing matrix shape - dimension added", ...)

In [None]:
# TODO: convert the training labels to one-hot encoded vectors
y_train = ...

In [None]:
# Optional: uncomment to select just a subset (faster computation to check that everything runs)
# X_train= X_train[0:1000,:]
# X_test= X_test[0:200,:]

# y_train= y_train[0:1000]
# y_test= y_test[0:200]

print(X_test.shape)
print(y_test.shape)

## LB02.2 Architecture Definition 

Implement a CNN that classifies the MNIST dataset according to the architecture shown in the picture below.

<img src="resources/LB02_cnn_architecture.png"/>

* Train the CNN for 25 epochs and use Adam with its default values as the optimizer
* Use rectified linear units (ReLU) for all activations except the final output layer
* Make sure to use a suitable activation in the output layer
* Use a batch size of 256
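For a multi-class problem with one-hot encoded labels, softmax is a common choice for the output activation, because it turns the raw scores of the last layer into class probabilities that sum to 1. The following is a minimal NumPy sketch of that function; the scores are made up for the ten digit classes and do not come from a trained model.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row maximum avoids overflow in exp."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Made-up raw scores for one sample and ten classes (illustration only)
logits = np.array([[2.0, 1.0, 0.1, -1.0, 0.0, 0.5, 3.0, -2.0, 0.3, 1.5]])

probs = softmax(logits)
print(probs.sum(axis=-1))     # each row sums to 1
print(probs.argmax(axis=-1))  # index of the most probable class
```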

In [None]:
# TODO: define the parameters for the training
batch_size = ...
max_epochs = ...
patience = ...

In [None]:
# defining a callback in order to enable tensorboard visualizations
tb = callbacks.TensorBoard(log_dir=logdir + "AE_Stacked_Stack2_" + datetime.now().strftime("%Y.%m.%d-%H:%M:%S"))

In [None]:
# defining an early stopping callback
early_stop = callbacks.EarlyStopping(monitor='val_accuracy', min_delta=1e-04, patience=patience, mode='auto')

In [None]:
# TODO: define the cnn model architecture shown in the picture above
model = Sequential()
...

In [None]:
# TODO: define the optimizer 
adam = ...

# TODO: compile the created model
model.compile(...)

# print the model structure (summary() prints directly and returns None)
model.summary()

In [None]:
print('\nStarting training process...\n')

# TODO: train the created model using your training data
# TODO: make sure to use the defined number of epochs and the defined batch size
# TODO: also use a 70/30 train/validation split and the callbacks created above
model.fit(...)

To see the training curves, start TensorBoard in your Docker container by running the following command from your working directory (e.g. `/notebooks/<your-working-directory>/`):

`tensorboard --logdir cnn_logs --host 0.0.0.0`

Note that the `--logdir` argument must match your `logdir` variable.

Afterwards, navigate to [http://localhost:6006](http://localhost:6006) in your web browser.

## LB02.3 Evaluation
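The evaluation in this section consists of three steps: converting the predicted class probabilities to labels, computing the accuracy, and building a confusion matrix. The following is a minimal NumPy sketch of these steps on made-up predictions for three classes; it is an illustration only and uses neither the trained model nor the scikit-learn helpers imported in this notebook.

```python
import numpy as np

# Made-up class probabilities for 6 samples and 3 classes (illustration only)
prediction = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
    [0.6, 0.3, 0.1],
    [0.3, 0.4, 0.3],
    [0.1, 0.1, 0.8],
])
y_true = np.array([0, 1, 2, 1, 1, 2])

# Convert the per-class probabilities to numerical labels
y_pred = prediction.argmax(axis=1)

# Accuracy: fraction of samples whose predicted label matches the true label
acc = np.mean(y_pred == y_true)

# Confusion matrix: rows are true labels, columns are predicted labels
num_classes = prediction.shape[1]
cm = np.zeros((num_classes, num_classes), dtype=int)
np.add.at(cm, (y_true, y_pred), 1)

print("Accuracy: %.4f" % acc)
print(cm)
```

The row/column convention (true labels in rows, predicted labels in columns) matches the axis labels used by `plot_confusion_matrix` above.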

In [None]:
# TODO: use your trained model to predict the test data
prediction = ...

# TODO: you will have to convert the categorical predictions back to numerical
# TODO: labels in order to use the `confusion_matrix` function. Hint: `argmax`
y_pred = ...

In [None]:
# TODO: print the accuracy score
acc = ...
print("Accuracy: %.4f" % ...)

In [None]:
from sklearn.metrics import confusion_matrix

# TODO: calculate the confusion matrix
cm = ...

In [None]:
# TODO: plot the confusion matrix
...

## LB02.4 Experimenting with CNNs
In this step you will change the architecture of your network slightly and examine how these changes affect the outcome.

Describe and document the outcome you observe. To do this, you may take screenshots of TensorBoard's training curves and paste them directly into this notebook with a fitting description.

### LB02.4a Dropout

Use the same architecture as in LB02.2, but add a dropout layer after each max-pooling layer and after the first fully connected layer. Use a dropout rate of 25%.
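To get a feeling for what a dropout layer does during training, the following minimal NumPy sketch applies "inverted" dropout to an array of ones. The helper name `dropout` and the array size are made up for this illustration. Each element is zeroed with probability `rate`, and the remaining elements are scaled by `1 / (1 - rate)`, which is also how the Keras `Dropout` layer behaves in training mode; at inference time the layer passes its input through unchanged.

```python
import numpy as np

def dropout(x, rate, rng):
    """Inverted dropout for the training phase.

    Each element is zeroed with probability `rate`; the survivors are
    scaled by 1 / (1 - rate) so the expected sum stays unchanged.
    """
    keep_mask = rng.random(x.shape) >= rate
    return x * keep_mask / (1.0 - rate)

rng = np.random.default_rng(42)
activations = np.ones((1000, 100))  # made-up activations, all equal to 1
dropped = dropout(activations, rate=0.25, rng=rng)

print("fraction zeroed:", np.mean(dropped == 0.0))
print("mean activation:", dropped.mean())
```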

### LB02.4b Different number of feature maps
Use 16 feature maps for Conv1 and 32 feature maps for Conv2.
How does this affect the number of trainable parameters?
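The number of trainable parameters of a 2D convolutional layer can be estimated by hand: each filter has one weight per kernel position and input channel, plus one bias. The sketch below applies this rule under two assumptions that are not given in the text: a hypothetical 3x3 kernel for both layers (take the real size from the architecture picture) and a single-channel grayscale input. The helper `conv2d_params` is made up for this illustration.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters, use_bias=True):
    """Trainable parameters of a 2D convolutional layer."""
    weights = kernel_h * kernel_w * in_channels * filters
    biases = filters if use_bias else 0
    return weights + biases

# Assumption: 3x3 kernels and a single-channel (grayscale) input
conv1 = conv2d_params(3, 3, in_channels=1, filters=16)
conv2 = conv2d_params(3, 3, in_channels=16, filters=32)
print("16/32 feature maps:", conv1, conv2)

# Same assumptions with the feature map counts from LB02.4d
conv1_large = conv2d_params(3, 3, in_channels=1, filters=64)
conv2_large = conv2d_params(3, 3, in_channels=64, filters=128)
print("64/128 feature maps:", conv1_large, conv2_large)
```

You can compare such hand calculations with the parameter counts reported by `model.summary()`.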

### LB02.4c Stride
Use the same architecture as in LB02.2, but use a stride of 2 for Conv1. Observe the size of the feature maps and compare the results to the architecture without stride.
Describe, in your own words, how the stride affects the feature maps' dimensions in your CNN.
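The size of a feature map along one axis follows from the input size $n$, the kernel size $k$, the padding $p$ and the stride $s$ as $\lfloor (n + 2p - k) / s \rfloor + 1$. The sketch below evaluates this for a 28-pixel-wide MNIST image, assuming a hypothetical 3x3 kernel and no padding (neither is stated in the text; take the real values from your architecture). The helper `conv_output_size` is made up for this illustration.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output length along one axis: floor((n + 2p - k) / s) + 1."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

# Assumption: 28-pixel input axis, 3x3 kernel, no padding
size_stride1 = conv_output_size(28, 3, stride=1)
size_stride2 = conv_output_size(28, 3, stride=2)

print("stride 1:", size_stride1)
print("stride 2:", size_stride2)
```

Under these assumptions, doubling the stride roughly halves each spatial dimension, so the feature map holds about a quarter as many values.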

### LB02.4d Increased number of trainable parameters
Use 64 feature maps for Conv1 and 128 feature maps for Conv2 with 30% dropout.
Increase the number of neurons in the first fully connected layer to 256.
Observe the training progress and the model performance. Is the performance better than in the previous experiments?