# Moderne Methoden der Datenanalyse SS2021
# Practical Exercise 10: Deep Learning

The classification of handwritten digits is a standard problem in the field of image classification. In this exercise, we will process labeled images of handwritten digits from the MNIST dataset in order to train and test (deep) neural networks on this task. The goal of this exercise is for you to dive into a state-of-the-art software package for large-scale deep learning, to experiment with your own neural-network designs, and to compare your results with modern setups.

TensorFlow is one of the most popular and powerful tools in the machine learning community and can be used to build, train, and execute large-scale machine-learning models. The core concept of TensorFlow is the representation of the information flow as tensors in a graph. In this exercise, the wrapper Keras is used, which hides this concept to a large extent and makes the library much easier to use.

The exercise ships with one notebook and a script for downloading the relevant data. All required software, such as TensorFlow (www.tensorflow.org) and Keras (www.keras.io), is already installed on the Jupyter Machine.

<center><img src="DataAna_MNIST_example.png" width ="500" alt="MNIST_example"></center>
<center>Fig.1: Example images from the MNIST dataset</center>
<br>


The MNIST dataset (https://yann.lecun.com/exdb/mnist) contains a total of 70000 images of handwritten digits. The images are in greyscale with 28 x 28 pixels each (see Fig.1 for some examples). Execute the script `download_dataset.py` to download the dataset and extract it in binary format. The script also converts some example images from the binary dataset to `png` files (these are greyscale-inverted with respect to the images in Fig.1). Have a look at the example images and at the code.
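
To get a feeling for what "binary" means here, the following sketch parses an image file in the IDX layout used by the original MNIST distribution: a big-endian header (magic number 2051, image count, rows, columns) followed by one unsigned byte per pixel. Whether `train_images.bin` keeps exactly this layout is an assumption; check `download_dataset.py` for the format actually used. The function name `parse_idx_images` is hypothetical, and the buffer is synthetic so that the example runs without the dataset.

```python
import struct

def parse_idx_images(buf):
    """Parse an IDX image buffer into a list of images (each a list of rows)."""
    # Header: four big-endian unsigned 32-bit integers
    magic, count, rows, cols = struct.unpack('>IIII', buf[:16])
    if magic != 2051:
        raise ValueError('unexpected magic number {}'.format(magic))
    pixels = buf[16:]
    if len(pixels) != count * rows * cols:
        raise ValueError('payload size does not match header')
    images = []
    for i in range(count):
        start = i * rows * cols
        image = [list(pixels[start + r * cols:start + (r + 1) * cols])
                 for r in range(rows)]
        images.append(image)
    return images

# Synthetic buffer with two 28 x 28 images: one all black, one all white
header = struct.pack('>IIII', 2051, 2, 28, 28)
payload = bytes([0] * 784) + bytes([255] * 784)
images = parse_idx_images(header + payload)
print(len(images), len(images[0]), len(images[0][0]))
```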

In preparation for the training ([Exercise 10.1](#exercise101_TNT)), read the code in the function [`train()`](#train_me) in the Jupyter notebook and identify the part where the machine learning model is defined. The model contains many components of modern architectures, e.g., convolutional, dense, and maximum pooling layers. Also specified in the code are the loss function, the optimizer, and the validation metrics used for the training. The full documentation is available on the Keras webpage (www.keras.io).
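
As an illustration of the loss function and the validation metric named in `train()`, here is a minimal pure-Python sketch of one-hot encoding, softmax, categorical cross-entropy, and categorical accuracy. It is meant to show the arithmetic only and is not the Keras implementation; the helper names and the `eps` guard against `log(0)` are assumptions of this sketch.

```python
import math

def one_hot(label, num_classes=10):
    """Encode an integer label as a one-hot vector."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec

def softmax(scores):
    """Turn raw scores into probabilities that sum to one."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(target, predicted, eps=1e-7):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, predicted))

def categorical_accuracy(targets, predictions):
    """Fraction of samples whose most probable class equals the true class."""
    hits = sum(1 for t, p in zip(targets, predictions)
               if t.index(max(t)) == p.index(max(p)))
    return hits / len(targets)

# An untrained network that outputs equal scores has a loss of ln(10)
uniform = softmax([0.0] * 10)
loss_uniform = categorical_crossentropy(one_hot(3), uniform)

# A prediction peaked at the correct class gives a much smaller loss
peaked = softmax([0, 0, 0, 5, 0, 0, 0, 0, 0, 0])
loss_peaked = categorical_crossentropy(one_hot(3), peaked)

accuracy = categorical_accuracy([one_hot(3), one_hot(7)], [peaked, uniform])
print(loss_uniform, loss_peaked, accuracy)
```

A loss close to ln(10) ≈ 2.30 during training therefore indicates that the model is not doing better than guessing among the ten digits.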

<a name=exercise101_TNT></a>
## Exercise 10.1: Training and Testing (obligatory)
Now we set up a model, train it, and test its performance.

- Explain the meaning and function of the various parts of the example model and understand how the total number of trainable parameters, 1765, comes about. 
    *Hint*: Look at the output of the code line `model.summary()` and keep in mind that there are bias terms. 

    What would be the number of parameters for a fully connected (dense) neural network with one hidden layer of n nodes and 10 outputs?

- Now, modify the code below ([`train()`](#train_me)) and try to achieve the best global accuracy. *Hint*: You will need to increase both the model capacity (e.g., a larger number of convolution filters or additional dense layers) and the number of epochs. With increasing model capacity, you will quickly understand why GPUs play such a big role in machine learning.

    What would be a good method to evaluate the optimal number of training epochs? Plot the accuracy of the model on training and validation sets as a function of training epochs.
    
    Can you explain why the training accuracy is worse than the validation accuracy?
    
- For your trained model, produce an estimate of your achieved accuracy by running [`apply()`](#apply_and_test) manually on about twenty images, i.e., `example_input_*.png`. 

    How does your estimate compare with the result on the test dataset computed with the function [`test()`](#apply_and_test)?
    Do the results match?
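
The parameter count asked about in the first item can be cross-checked by hand. The sketch below assumes the example model as defined in `train()`: a 2 x 2 convolution without padding, 2 x 2 max pooling with stride 2, and one bias term per filter or node. The helper names are hypothetical; compare the numbers with the output of `model.summary()`.

```python
def conv2d_params(filters, kernel, channels_in):
    """Weights per filter (kernel area times input channels) plus one bias per filter."""
    kernel_height, kernel_width = kernel
    return filters * kernel_height * kernel_width * channels_in + filters

def dense_params(n_in, n_out):
    """One weight per input-output pair plus one bias per output node."""
    return n_in * n_out + n_out

# Shapes of the example model
conv_out = 28 - 2 + 1               # 27: a 2 x 2 kernel without padding
pool_out = (conv_out - 2) // 2 + 1  # 13: 2 x 2 pooling with stride 2
flat = pool_out * pool_out * 2      # 338 inputs to the first dense layer

total = (conv2d_params(2, (2, 2), 1)  # 10
         + dense_params(flat, 5)      # 1695
         + dense_params(5, 10))       # 60
print(total)  # 1765

def dense_network_params(n):
    """A purely dense network: 784 inputs, n hidden nodes, 10 outputs."""
    return dense_params(28 * 28, n) + dense_params(n, 10)

print(dense_network_params(5))
```

Activation, pooling, flatten, and dropout layers carry no trainable parameters, which is why they do not appear in the sum.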
    
<a name=train_me></a>

In [None]:
# Importing usual python packages
import sys, glob, png
import numpy as np
import struct
import matplotlib
#matplotlib.use('Agg') # use this line to change the matplotlib backend
from matplotlib import pyplot as plt

In [None]:
#%%capture output
# Importing ML related packages. 
import tensorflow
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Dense, Dropout, Activation, Flatten
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras.optimizers import Adam
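# Alias keeps np_utils.to_categorical() working while importing only from tensorflow.keras
from tensorflow.keras import utils as np_utils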

from download_dataset import load_data, binary_to_png

In [None]:
# unpack some images from binary format
binary_to_png({'images': 'train_images.bin', 'labels': 'train_labels.bin'}, 20)

In [None]:
# check the tensorflow version we are using. Should be 2.8.0
print(tensorflow.__file__)
print(tensorflow.__version__)

In [None]:
def train():
    # Load training data
    images, labels = load_data('train_images.bin', 'train_labels.bin')

    # Convert labels from integers to one-hot vectors
    labels = np_utils.to_categorical(labels, 10)

    # Set up model
    model = Sequential()
    model.add(Conv2D(2, (2, 2), kernel_initializer='glorot_normal', input_shape=(28,28,1)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2)))
    model.add(Flatten())
    model.add(Dense(5, kernel_initializer='glorot_normal'))
    model.add(Activation('relu'))
    model.add(Dropout(0.5))
    model.add(Dense(10, kernel_initializer='glorot_uniform'))
    model.add(Activation('softmax'))
    
    # Define loss function, optimizer algorithm and validation metrics
    model.compile(
        loss='categorical_crossentropy',
        optimizer=Adam(),
        metrics=['categorical_accuracy'])

    # Print summary of the model
    model.summary()

    # Train model
    history = model.fit(images, labels, batch_size=128, epochs=10, validation_split=0.25)
    #history = model.fit(images, labels, batch_size=128, epochs=20, validation_split=0.25)

    # Get training and validation loss/accuracy values from history
    loss_training = history.history['loss']
    loss_validation = history.history['val_loss']
    accuracy_training = history.history['categorical_accuracy']
    accuracy_validation = history.history['val_categorical_accuracy']

    # TODO: Plot the training and validation loss/accuracy vs the number of epochs
    #plt.plot(...)
    #plt.savefig('loss_vs_epochs.png')   
    
    # Save model to file
    model.save('model.hd5')
    
    return
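
The TODO in `train()` and the question about the optimal number of epochs can be approached along the following lines. This is a sketch and not the reference solution: `best_epoch`, `plot_history`, and the `patience` parameter are hypothetical names, and the validation-loss values are made-up numbers for illustration, not results of a training run. The idea is to stop once the validation loss has not improved for a few epochs, since from then on the model tends to overfit the training set.

```python
def best_epoch(val_loss, patience=3):
    """Return the 1-based epoch with the lowest validation loss.

    The scan stops once `patience` consecutive epochs show no improvement.
    """
    best, best_index, waited = float('inf'), 0, 0
    for index, loss in enumerate(val_loss):
        if loss < best:
            best, best_index, waited = loss, index, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_index + 1

def plot_history(train_values, val_values, ylabel, filename):
    """Plot a training and a validation curve against the epoch number."""
    # Imported here so that best_epoch() works without matplotlib installed
    from matplotlib import pyplot as plt
    epochs = range(1, len(train_values) + 1)
    fig, ax = plt.subplots()
    ax.plot(epochs, train_values, label='training')
    ax.plot(epochs, val_values, label='validation')
    ax.set_xlabel('epoch')
    ax.set_ylabel(ylabel)
    ax.legend()
    fig.savefig(filename)
    plt.close(fig)

# Made-up validation losses: improvement stops after the fourth epoch
val_loss = [0.90, 0.60, 0.45, 0.40, 0.42, 0.43, 0.47, 0.50]
print(best_epoch(val_loss))  # 4
```

Inside `train()`, a call such as `plot_history(accuracy_training, accuracy_validation, 'categorical accuracy', 'accuracy_vs_epochs.png')` would produce the requested plot. Keras also provides an `EarlyStopping` callback that implements the same idea during the fit.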

In [None]:
train()

In [None]:
# check if the output model exists
!date
!ls -haltr ./*.hd5

The following code blocks are provided to apply the model manually to a list of files, or to test it on the test dataset.
<a name=apply_and_test></a>

In [None]:
def apply(png_list):
    if len(png_list) < 1:
        raise Exception('Please specify at least one PNG image as argument.')

    # Load trained keras model
    model = load_model('model.hd5')

    # Collect the image file names (every entry of png_list is used)
    print('Load images:')
    filename_images = []
    for arg in png_list:
        print('    {}'.format(arg))
        filename_images.append(arg)

    # Load images from files
    images = np.zeros((len(filename_images), 28, 28, 1))
    for i_file, file_ in enumerate(filename_images):
        # Read the rows inside the context manager so that the file is closed afterwards
        with open(file_, 'rb') as png_file:
            pngdata = png.Reader(file=png_file).asDirect()
            for i_row, row in enumerate(pngdata[2]):
                images[i_file, i_row, :, 0] = row

    # Predict labels for images
    labels = model.predict(images)
    numbers = np.argmax(labels, axis=1)
    print('Predict labels for images:')
    for file_, number in zip(filename_images, numbers):
        print('    {} : {}'.format(file_, number))

In [None]:
example_inputs = glob.glob('example_input_*.png')
apply(example_inputs)

In [None]:
def test():
    # Load trained keras model
    model = load_model('model.hd5')

    # Load test data
    images, labels = load_data('test_images.bin', 'test_labels.bin')

    # Predict written numbers in images
    labels_predicted = model.predict(images)

    # Decode the one-hot vectors to labels
    labels_decoded = np.argmax(labels_predicted, axis=1)

    # Calculate accuracy of prediction
    num_correct = np.sum(labels_decoded == labels)
    print('Accuracy on test dataset: {}'.format(float(num_correct)/len(labels)))

In [None]:
test()
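
To judge whether an accuracy estimated from about twenty images agrees with the accuracy on the test dataset, it helps to attach a statistical uncertainty to both numbers. The sketch below uses the binomial standard error, sqrt(p(1-p)/N). The counts are hypothetical placeholders, not results; the test-set size of 10000 follows from the MNIST split into 60000 training and 10000 test images. The function names and the two-sigma criterion are assumptions of this sketch.

```python
import math

def accuracy_with_error(num_correct, num_total):
    """Return the accuracy and its binomial standard error."""
    p = num_correct / num_total
    return p, math.sqrt(p * (1.0 - p) / num_total)

def compatible(p1, err1, p2, err2, n_sigma=2.0):
    """Check whether two estimates agree within n_sigma combined uncertainties."""
    return abs(p1 - p2) <= n_sigma * math.sqrt(err1 ** 2 + err2 ** 2)

# Hypothetical counts: 17 of 20 by hand, 8600 of 10000 on the test dataset
small = accuracy_with_error(17, 20)
large = accuracy_with_error(8600, 10000)
print('manual: {:.3f} +- {:.3f}'.format(*small))
print('test:   {:.3f} +- {:.3f}'.format(*large))
print(compatible(small[0], small[1], large[0], large[1]))
```

With only twenty images the uncertainty is several percentage points, so a visible difference between the two numbers is not necessarily a contradiction. Note that this approximation becomes unreliable when the accuracy is very close to 0 or 1 for such a small sample.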

## Exercise 10.2 (voluntary)

Use GIMP (or any other graphics editor) to create images of your own handwritten digits and evaluate whether the performance on your own example images matches the performance achieved during training on the MNIST dataset. You need to create greyscale png images with 28 x 28 pixels. The background of the image has to be black and the digits have to be written in white. The file `your_own_digit.xcf` can be used as a template. Does the model classify your images correctly? If not, what are possible reasons?
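
Before passing your own images to `apply()`, it can be useful to check that they meet the format requirements stated above. The following sketch works on pixel rows as a PNG reader delivers them (one sequence of greyscale values per row). The function name `prepare_digit` and the brightness threshold of 127 used to detect a white background are assumptions of this sketch, not part of the exercise code.

```python
def prepare_digit(rows):
    """Validate a 28 x 28 greyscale image and return it with a black background."""
    image = [list(row) for row in rows]
    # An RGB image would show up here as rows with 84 instead of 28 values
    if len(image) != 28 or any(len(row) != 28 for row in image):
        raise ValueError('expected a greyscale image with 28 x 28 pixels')
    if any(not 0 <= value <= 255 for row in image for value in row):
        raise ValueError('expected 8-bit pixel values between 0 and 255')
    mean = sum(sum(row) for row in image) / (28 * 28)
    # Heuristic: a mostly bright image is a dark digit on a white background
    if mean > 127:
        image = [[255 - value for value in row] for row in image]
    return image

# Synthetic example: a black vertical stroke on a white background
drawing = [[255] * 28 for _ in range(28)]
for i_row in range(5, 23):
    drawing[i_row][14] = 0

fixed = prepare_digit(drawing)
print(fixed[0][0], fixed[10][14])  # 0 255
```

An image that already has a black background passes through unchanged, so the function can be applied to all of your files.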

Have a look at the website http://yann.lecun.com/exdb/mnist, which holds a leaderboard for the test dataset with the performance of modern (deep) machine learning models. You might want to compare your model with popular computer vision models such as `LeNet-5`, `VGG-16`, `AlexNet` or `Inception-v4` to get an idea of the complexity of modern neural network architectures.