# Classification using FCN and CNN
In this notebook we use two different approaches to classify 2D image data. We also plot the accuracy and test our networks on our own handwritten digits.
1. We do classification on a fully connected network **FCN**
2. We do classification on a convolutional neural network **CNN**

In [None]:
# TensorFlow and tf.keras
%load_ext autoreload
%autoreload 2
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense, Input
from tensorflow.keras.models import Model

# Commonly used modules
import numpy as np
import os
import sys

# Images, plots, display, and visualization
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import cv2
import IPython
from six.moves import urllib
%reload_ext tensorboard
print(tf.__version__)

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Currently, memory growth needs to be the same across GPUs
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Memory growth must be set before GPUs have been initialized
        print(e)

# Creating the dataset

The MNIST dataset contains 70,000 grayscale images of handwritten digits at a resolution of 28 by 28 pixels. The task is to take one of these images as input and predict the most likely digit contained in the image (along with a relative confidence in this prediction):


<table><tr>
<td> <img src="Pictures/mnist_numbers.png" width="500" height="400px" align="left"> </td>
<td> <img src="Pictures/nn.png" width="500" align="right"> </td>
</tr></table>

Now, we load the dataset. The images are 28x28 NumPy arrays, with pixel values ranging between 0 and 255. The *labels* are an array of integers, ranging from 0 to 9.

## Preprocessing/Normalization
We normalize the image values to a range of 0 to 1 before feeding them to the neural network model. For this, we divide the values by 255. It's important that the *training set* and the *testing set* are preprocessed in the same way.
The main purpose of normalization is to make computation efficient and to speed up convergence by scaling the values to the range 0 to 1. As a result, the network learns faster, is less likely to get stuck in local optima, and **could** reach a higher accuracy (that is not always the case).

* **Allows higher learning rates**: 
Gradient descent often requires small learning rates for the network to converge. As networks get deeper, gradients get smaller during backpropagation (the vanishing gradient problem) and so require even more iterations to converge. Normalization allows much higher learning rates, which increases the speed at which networks train.

* **Makes weights easier to initialise**: The choice of initial weights is crucial and can also influence training time. Weight initialisation can be difficult, especially when creating deeper networks. Normalisation helps reduce the sensitivity to the initial starting weights.

* **Makes more activation functions viable**: Some activation functions don’t work well in certain situations. For example, sigmoid and tanh saturate for large input values; normalized inputs keep them in a usable range.
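
The scaling step described above can be sketched with plain NumPy. The array `raw_images` below is a hypothetical random stand-in for a batch of 8-bit grayscale images, not the real MNIST data; the point is only to show how dividing by 255 changes the value range and the data type.

```python
import numpy as np

# Hypothetical stand-in for a batch of 8-bit grayscale images (values 0..255)
rng = np.random.default_rng(seed=0)
raw_images = rng.integers(0, 256, size=(4, 28, 28, 1), dtype=np.uint8)

# Dividing by a float promotes the array to float64 and maps 0..255 onto 0..1
norm_images = raw_images / 255.0

print(raw_images.dtype, raw_images.min(), raw_images.max())
print(norm_images.dtype, norm_images.min(), norm_images.max())
```

The shape of the array is unchanged; only the value range and the dtype differ, which is why the same step has to be applied to the test data as well.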


In [None]:
def preprocess_images(imgs): # should work for both a single image and multiple images
    # A single color image, e.g. from cv2.imread (which returns BGR), has shape (H, W, 3)
    if imgs.ndim == 3 and imgs.shape[-1] == 3:
        imgs = cv2.cvtColor(imgs, cv2.COLOR_BGR2GRAY)  # convert to a single channel
        imgs = cv2.resize(imgs,(28,28))                # resize to the MNIST resolution
        imgs = cv2.bitwise_not(imgs)                   # invert: dark digit on light background -> light digit on dark background
    sample_img = imgs if len(imgs.shape) == 2 else imgs[0]
    assert sample_img.shape in [(28, 28, 1), (28, 28)], sample_img.shape # make sure images are 28x28 and single-channel (grayscale)
    return imgs / 255.0

In [None]:
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()
print(f"train_images: {train_images.shape}")
print(f"test_images: {test_images.shape}")

# reshape images to specify that it's a single channel image
train_images = train_images.reshape(train_images.shape[0], 28, 28, 1)
test_images = test_images.reshape(test_images.shape[0], 28, 28, 1)

print(f"train_images: {train_images.shape}")
print(f"test_images: {test_images.shape}")

<img src="https://raw.githubusercontent.com/tensorchiefs/dl_book/master/imgs/paper-pen.png" width="60" align="left" />  

**Exercise**: Display the first 5 images from the *training set* and display the class name below each image. Verify that the data is in the correct format and we're ready to build and train the network.

**Hint**: Some helpful function templates
```python 
            # loop function
            for i in iterable:
                pass
            
            # subplots
            plt.subplot(nrows, ncols, index)
                        
            # plot function
            plt.imshow(image, cmap=plt.cm.binary)
```

<details>
    <summary> <b>Click here for one possible solution</b></summary>
    
```python
plt.figure(figsize=(10,2))
for i,x in enumerate(train_images[0:5]):
    plt.subplot(1,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(x, cmap=plt.cm.binary)
    plt.xlabel(train_labels[i])
```
</details>

In [None]:
############ YOUR CODE HERE ############

# Part 1: Classification of MNIST using Fully Connected Neural Network
The “dense” or the “fully-connected” neural network (NN) is the simplest form of neural net where a neuron in a given layer is connected to all the neurons in the previous and the next layers as shown in the below diagram.

<img src="Pictures/mnist_2layers.png" width="500px">

The dense NN can only take one-dimensional (1D) input and hence the 2D inputs like images have to be “flattened” as shown in the diagram before feeding them to the dense NN. 

In [None]:
image_vector_size  = train_images.shape[1]*train_images.shape[2]
print(f"Image Vector Size: {image_vector_size}")

# Reshape the training_images
X_train = train_images.reshape((-1, image_vector_size)) # Flatten the 2D input to 1D 
                                                        # One shape dimension can be -1. In this case, the value 
                                                        # is inferred from the length of the array and remaining 
                                                        # dimensions.
print(f"Training 1D input shape: {X_train.shape}")

<img src="https://raw.githubusercontent.com/tensorchiefs/dl_book/master/imgs/paper-pen.png" width="60" align="left" />  

**Exercise**: Create a fully connected (Dense layers) neural network model for classifying the MNIST digits dataset.
The last layer activation **must** be **softmax**. 
1. Create a number of hidden layers (Dense)
2. Specify the activation function for each hidden layer
3. Specify the last layer output for the multi classification problem
4. Compile your model and choose a suitable loss function, optimizer, and metrics (check whether your `train_labels` are one-hot encoded or sparse)
    * *Loss function* - measures how accurate the model is during training, we want to minimize this with the optimizer.
        * "sparse_categorical_crossentropy" if the labels are not one-hot encoded
        * "categorical_crossentropy" if the labels are one-hot encoded
    * *Optimizer* - how the model is updated based on the data it sees and its loss function.
    * *Metrics* - used to monitor the training and testing steps. "accuracy" is the fraction of images that are correctly classified.

5. Train your model for a number of epochs or using a callback_function (e.g. EarlyStopping)

**Hint**: One-hot encoded `train_labels` are zero-based and look like this: `3 = [0,0,0,1]`
```python
labels_cat = keras.utils.to_categorical(labels,num_classes)
# Example
a = tf.keras.utils.to_categorical([0, 1, 2, 3], num_classes=4)
```
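
To illustrate why the two loss functions listed above belong to the two label formats, here is a minimal NumPy sketch. The probabilities in `probs` and the labels are made-up illustration values, and the formulas are the textbook cross-entropy written out by hand, not Keras' internal implementation.

```python
import numpy as np

# Hypothetical softmax outputs for 3 samples and 4 classes (each row sums to 1)
probs = np.array([[0.7, 0.1, 0.1, 0.1],
                  [0.2, 0.5, 0.2, 0.1],
                  [0.1, 0.1, 0.2, 0.6]])
sparse_labels = np.array([0, 1, 3])          # integer class indices
one_hot_labels = np.eye(4)[sparse_labels]    # the same labels, one-hot encoded

# Sparse form: pick the predicted probability of the true class by index
loss_sparse = -np.log(probs[np.arange(len(sparse_labels)), sparse_labels])

# Categorical form: the one-hot mask selects the same probability
loss_categorical = -np.sum(one_hot_labels * np.log(probs), axis=1)

print(loss_sparse.mean(), loss_categorical.mean())
```

Both variants give the same loss values; they only differ in the label format they expect.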

<details>
<summary><b>Click here for one possible solution</b></summary>
    
```python
# Create a Sequential model
model = keras.Sequential()
# Add an Input layer to the model using the flattened shape of the dataset
# (shape must be a tuple, hence the trailing comma)
model.add(Input(shape=(X_train.shape[1],), name='input'))
# First hidden dense layer with 256 units
model.add(Dense(256, activation='relu'))
# Second hidden dense layer with 128 units
model.add(Dense(128, activation='relu'))
# Third hidden dense layer with 64 units
model.add(Dense(64, activation='relu'))
# Output layer: softmax turns the 10 outputs into class probabilities
model.add(Dense(10, activation='softmax'))

# Compile the model
model.compile(optimizer='adam', 
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
```
</details>
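
The exercise requires a softmax activation in the last layer. As a rough illustration of what that activation does, the following NumPy sketch converts raw scores into values that are positive and sum to 1. The function `softmax` and the values in `scores` are written here only for illustration; inside the model, Keras applies its own implementation.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax along the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

# Hypothetical raw scores for one image and 10 digit classes
scores = np.array([[1.0, 2.0, 0.5, 0.1, 3.0, 0.0, -1.0, 0.3, 0.2, 1.5]])
probs = softmax(scores)
print(probs.round(3), probs.sum())
```

The class with the largest raw score also gets the largest probability, so the ranking of the classes is preserved.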


## Creating the Model
Now we want to create the fully connected network for this classification task. Because we want to evaluate the effect of data normalization, we create two different models:
1. `model_norm`: Model will be trained with normalized data
2. `model_unnorm`: Model will be trained without normalized input data

In [None]:
def get_model(model_name:str):
    # Resets all state generated by Keras
    keras.backend.clear_session()
    ############ YOUR CODE HERE ############
    
    # Create a Sequential model
    m = keras.Sequential(name=model_name)
    # Add an Input layer to the model using the flattened shape of the dataset
    # (shape must be a tuple, hence the trailing comma)
    m.add(Input(shape=(X_train.shape[1],), name='input'))
    # First hidden dense layer with 256 units
    m.add(Dense(256, activation='relu'))
    # Second hidden dense layer with 128 units
    m.add(Dense(128, activation='relu'))
    # Third hidden dense layer with 64 units
    m.add(Dense(64, activation='relu'))
    # Output layer: softmax turns the 10 outputs into class probabilities
    m.add(Dense(10, activation='softmax'))

    # Compile the model
    m.compile(optimizer='adam', 
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return m

# Create the models 
model_norm = get_model("model_norm")
model_unnorm = get_model("model_unnorm")

# Print the Model summary
model_norm.summary()
model_unnorm.summary()

## Train the Model
Next we want to train both models, one on the `normalized` and one on the `not_normalized` dataset, with the same hyperparameters.
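
The exercise mentions `EarlyStopping` as an alternative to a fixed number of epochs. The idea behind it can be sketched in plain Python: stop once the monitored value has not improved for `patience` epochs in a row. The helper `early_stopping_epoch` and the loss values below are hypothetical and only illustrate the principle; this is a simplified sketch, not Keras' implementation.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return the index of the epoch at which training would stop.

    Simplified illustration of patience-based stopping, not Keras' implementation.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            # Improvement: remember the new best value and reset the counter
            best = loss
            wait = 0
        else:
            # No improvement: count the epoch and stop once patience is used up
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses) - 1

# Hypothetical validation losses per epoch
val_losses = [0.40, 0.30, 0.25, 0.26, 0.27, 0.24]
stop_epoch = early_stopping_epoch(val_losses, patience=2)
print(stop_epoch)
```

In Keras, the built-in counterpart is `tf.keras.callbacks.EarlyStopping` (with arguments such as `monitor`, `patience` and `restore_best_weights`), which can be added to the `callbacks` list passed to `fit()`.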

### Training model with normalized training data

In [None]:
import datetime
# We create a tensorboard callback for model evaluation
logdir = os.path.join("logs_FCN", datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
tensorboard_callback = tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1)

# Create some additional callbacks e.g. EarlyStopping and add it to the callbacks list 
# Press Shift + Tab for retrieving function details

# Create normalized training data
X_train = preprocess_images(train_images)
X_train = X_train.reshape((-1, image_vector_size))

# Start Training
history = model_norm.fit(x=X_train, y=train_labels,  # Training images and labels
                    validation_split=0.1,       # Fraction of the training data to be used as validation data
                    epochs=5,                   # Number of epochs to train the model.
                    batch_size=128,             # Number of samples per gradient update.
                    callbacks=[                 # List of `keras.callbacks.Callback` instances to apply during training
                        tensorboard_callback
                    ],
                    verbose=1                   #  0 = silent, 1 = progress bar, 2 = one line per epoch
                   )

### Training model with unnormalized training data

In [None]:
# Create not normalized training data
X_train_unnorm = train_images.reshape((-1, image_vector_size))

# Start Training
history = model_unnorm.fit(x=X_train_unnorm, y=train_labels,  # Training images and labels
                    validation_split=0.1,       # Fraction of the training data to be used as validation data
                    epochs=5,                   # Number of epochs to train the model.
                    batch_size=128,             # Number of samples per gradient update.
                    callbacks=[                 # List of `keras.callbacks.Callback` instances to apply during training
                        tensorboard_callback
                    ],
                    verbose=1                   #  0 = silent, 1 = progress bar, 2 = one line per epoch
                   )

## Visualize the model with Tensorboard
Now we want to visualize our model using tensorboard. Just execute the next cell and see what happens.

**Hint**: If tensorboard did not start, interrupt the kernel and run the cell again

In [None]:
#%tensorboard --logdir logs_FCN --port 6006 --bind_all

## Evaluate the model
Now we want to evaluate our model against the test data (test_images). Therefore we use our trained model and call `model.evaluate()`. The `evaluate()` function returns the loss value & metrics values for the model in test mode.

**Hint:** 
1. Prepare the test dataset for both the normalized and the unnormalized model
2. If you used one-hot encoded `train_labels`, you also have to one-hot encode the `test_labels` before passing them to the `evaluate()` function.

In [None]:
print("-"*30)
print(f"Evaluate: {model_norm.name}")
print(f"Testing input shape: {test_images.shape[1:]}")
print(f"Needed Model input shape:{model_norm.input_shape[1:]}")

############ YOUR CODE HERE ############
# Prepare the test_images so that they suit the model input shape
X_test = 

test_loss, test_acc = model_norm.evaluate(X_test, test_labels)

print('Test loss:', test_loss)
print('Test accuracy:', test_acc)
print("-"*30)

# --------------------------------------------------------------

print("-"*30)
print(f"Evaluate: {model_unnorm.name}")
print(f"Testing input shape: {test_images.shape[1:]}")
print(f"Needed Model input shape:{model_unnorm.input_shape[1:]}")

############ YOUR CODE HERE ############
# Prepare the test_images so that they suit the model input shape
X_test_unnorm = 
test_loss, test_acc = model_unnorm.evaluate(X_test_unnorm, test_labels)

print('Test loss:', test_loss)
print('Test accuracy:', test_acc)
print("-"*30)

<details>
    <summary> <b>Click here for one possible solution</b></summary>
    
``` python
# Normalized Data
X_test = preprocess_images(test_images)
X_test = X_test.reshape((-1, image_vector_size))

# Unnormalized Data
X_test_unnorm = test_images.reshape((-1, image_vector_size))
```
</details>

As mentioned above, the normalized model `model_norm` reaches a higher test accuracy than `model_unnorm`. This is because its loss converges faster, so it needs fewer training epochs.

## Model Prediction
Last but not least we want to check if our model really does what we expect. Therefore we can do some predictions on our model using the `test_images` already prepared in `X_test`.

<img src="https://raw.githubusercontent.com/tensorchiefs/dl_book/master/imgs/paper-pen.png" width="60" align="left" />  

**Exercise**: 

1. Do a prediction for one random test_image (don't forget the batch dimension :-)) and print the output
2. Plot the test_image, the test_label and the network prediction with the highest confidence (y_pred_label). Keep in mind that you get 10 outputs; see the **Hint**
3. Find images in the test_images dataset that are not predicted correctly and plot some
4. Question: What could we do to decrease the misclassification?

**Hint**: We have a multi-class prediction with 10 classes.
The maximum value, the value with the highest confidence within the predictions, can be extracted using numpy `argmax` function. `argmax` returns the indices of the maximum values along an axis.
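
A small NumPy sketch of the hint: `argmax` along the class axis returns one predicted label per image. The array `y_pred` is a hand-written stand-in for model outputs, and `single_image` is a hypothetical flattened image used only to show the batch dimension.

```python
import numpy as np

# Hypothetical model outputs for a batch of 2 images and 10 classes
y_pred = np.array([
    [0.01, 0.02, 0.05, 0.80, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02],
    [0.60, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.04, 0.03, 0.03],
])

# axis=1 takes the maximum over the class dimension, one result per image
pred_labels = np.argmax(y_pred, axis=1)
confidences = np.max(y_pred, axis=1)

# A single flattened image needs a leading batch dimension before predict()
single_image = np.zeros(784)
batched = np.expand_dims(single_image, axis=0)
print(pred_labels, confidences, batched.shape)
```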

<details>
    <summary> <b>Click here for one possible solution</b></summary>
    
``` python
model = model_norm
    
# 1. Do a prediction for one random test_image
idx = np.random.choice(X_test.shape[0],size=1)
y_pred = model.predict(X_test[idx]) 
#y_pred = model.predict(np.expand_dims(X_test[0],0)) # shape is (1,784)
#y_pred = model.predict(X_test[[0]])                 # shape is (1,784) 
y_true_label = test_labels[idx]
y_pred_label = np.argmax(y_pred,1)
print(f"Pred: {y_pred}")
print(f"True_label: {y_true_label} Pred_label: {y_pred_label}")    
 

# 2. Plot the test_image and the network prediction with highest confidence
plt.figure(figsize=(10,2))
for i,x in enumerate(idx):
    plt.subplot(1,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    
    # Predict the random image
    y_pred = model.predict(X_test[[x]])     # use [x] or np.expand_dims(x,0)
    y_pred_label = np.argmax(y_pred,axis=1)[0] # Get the class with the highest confidence from the prediction
    y_true_label = test_labels[x]
    
    # Show the image
    plt.imshow(X_test[x].reshape(28, 28), cmap=plt.cm.binary)
    plt.title(f"True: {y_true_label}")
    plt.xlabel(f"Pred:{y_pred_label}")
    
# 3. Find images in the test_images dataset that are not predicted correctly and plot some
# Predict all images
y_pred_label = np.argmax(a=model.predict(X_test),axis=1)

# Get the indices of the incorrectly classified predictions
y_pred_idx_incorr = np.nonzero(y_pred_label != test_labels)[0]  

# Get the misclassified samples of one specific class (here: true label 4)
y_pred_idx_equals = [a for a in y_pred_idx_incorr if test_labels[a] == 4]

plt.figure(figsize=(10,2))
for i,x in enumerate(y_pred_idx_incorr[3:8]):
    plt.subplot(1,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    
    # Predict the misclassified image
    y_pred = model.predict(X_test[[x]])     # use [x] or np.expand_dims(x,0)
    y_pred_label = np.argmax(y_pred,axis=1)[0] # Get the class with the highest confidence from the prediction
    y_true_label = test_labels[x]
    
    # Show the image
    plt.imshow(X_test[x].reshape(28, 28), cmap=plt.cm.binary)
    plt.title(f"True: {y_true_label}")
    plt.xlabel(f"Pred:{y_pred_label}")
```
</details>


In [None]:
# Set the model to the normalized_model
model = model_norm

############ YOUR CODE HERE ############



### Evaluation with Confusion matrix
A confusion matrix or error matrix is a table that is often used to describe the performance of a classification model 
(or "classifier") on a set of test data for which the true values are known. The confusion matrix itself is relatively 
simple to understand, but the related terminology can be confusing.

<img src="Pictures/confusion_matrix2.png" width="500px">

What can we learn from this matrix?

* There are two possible predicted classes: `yes` and `no`. If we were predicting the presence of `Covid-19`, for example, `yes` would mean they have the disease, and `no` would mean they don't have the disease.
* The classifier made a total of 165 predictions (e.g., 165 patients were being tested for the presence of Covid-19).
* Out of those 165 cases, the classifier predicted "yes" 110 times, and "no" 55 times.
* In reality, 105 patients in the sample have the disease, and 60 patients do not.

Let's now define the most basic terms, which are whole numbers (not rates):

* true positives (TP): These are cases in which we predicted yes (they have the disease), and they do have the disease.
* true negatives (TN): We predicted no, and they don't have the disease.
* false positives (FP): We predicted yes, but they don't actually have the disease. (Also known as a "Type I error.")
* false negatives (FN): We predicted no, but they actually do have the disease. (Also known as a "Type II error.")
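
The four terms can be counted directly from label arrays. The sketch below uses ten made-up binary labels (not the 165 cases from the figure) to show how the counts and the accuracy follow from the definitions above.

```python
import numpy as np

# Hypothetical binary labels: 1 = "yes" (has the disease), 0 = "no"
y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 1, 0, 1, 0, 1, 1])

tp = int(np.sum((y_pred == 1) & (y_true == 1)))  # predicted yes, actually yes
tn = int(np.sum((y_pred == 0) & (y_true == 0)))  # predicted no, actually no
fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # predicted yes, actually no
fn = int(np.sum((y_pred == 0) & (y_true == 1)))  # predicted no, actually yes

accuracy = (tp + tn) / len(y_true)
print(tp, tn, fp, fn, accuracy)
```

For the ten-class MNIST problem the same idea applies per class: row `i`, column `j` of the matrix counts the images of true digit `i` that were predicted as digit `j`.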

In [None]:
def plot_confusion_matrix(y_true, y_pred, num_classes):
    from sklearn.metrics import confusion_matrix, accuracy_score
    import itertools
    plt.figure(figsize=(10,6))
    # Build the plot
    cm=confusion_matrix(y_true,y_pred)
    plt.imshow(cm, interpolation='nearest', cmap=plt.cm.Blues)
    tick_marks = np.arange(len(num_classes))
    plt.xticks(tick_marks, tick_marks, rotation=45)
    plt.yticks(tick_marks, tick_marks)

    accuracy = accuracy_score(y_true, y_pred)
    misclass = 1. - accuracy
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, format(cm[i, j], 'd'),
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

    plt.ylabel('True label')
    plt.xlabel('Predicted label')
    plt.title(f"Confusion Matrix \naccuracy={accuracy:0.4f}; misclass={misclass:0.4f}")
    plt.tight_layout()

In [None]:
# Predict all Test images
model = model_norm
y_pred = model.predict(X_test)
y_true_labels = test_labels
y_pred_labels =  np.argmax(y_pred,1)# YOUR_TURN
print("y_true_labels shape:",y_true_labels.shape)
print("y_pred shape:",y_pred.shape)
print("y_pred_labels shape:",y_pred_labels.shape)

plot_confusion_matrix(y_true_labels,
                      y_pred_labels,
                      np.unique(y_true_labels))

# Part 2: Classification of MNIST with Convolutional Neural Networks

Next, let's build a convolutional neural network (CNN) to classify images of handwritten digits in the MNIST dataset, with a twist: we test our classifier on high-resolution handwritten digits from outside the dataset.

### Build the model
<img src="https://raw.githubusercontent.com/tensorchiefs/dl_book/master/imgs/paper-pen.png" width="60" align="left" />  

**Exercise:** Now build a very simple Convolutional neural network that looks like this: 
<img src="Pictures/Class_CNN.png" width="700px" > 

1. Try to figure out the `filter` and the `pooling sizes` yourself using the picture above. All the activations inside the layers should be 'relu' and the last dense layer again 'softmax'
2. Compile your model and choose a suitable loss function, optimizer, and metrics
    * *Loss function* - measures how accurate the model is during training, we want to minimize this with the optimizer.
        * "sparse_categorical_crossentropy" if labels not one hot encoded
        * "categorical_crossentropy" if labels one hot encoded    
    * *Optimizer* - how the model is updated based on the data it sees and its loss function.
    * *Metrics* - used to monitor the training and testing steps. "accuracy" is the fraction of images that are correctly classified.
3. Train your model for a number of epochs or using a callback_function (e.g. EarlyStopping)

**Hint**: To encode the labels as "one hot" (categorical) use 
```python
keras.utils.to_categorical(labels,num_classes)
```
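
When working out filter and pooling sizes, it helps to track how the spatial size of the feature maps shrinks from layer to layer. The helpers below are a hypothetical sketch of the usual output-size arithmetic for convolutions without padding and for non-overlapping pooling. The concrete values (3x3 kernels, 2x2 pooling, 64 filters) are assumptions chosen for illustration and need not match the picture.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution (padding=0 corresponds to 'valid')."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_output_size(size, pool):
    """Spatial output size of non-overlapping max pooling (stride equals pool size)."""
    return size // pool

# Assumed example values: 28x28 input, 3x3 kernels, 2x2 pooling
size = 28
size = conv_output_size(size, kernel=3)   # after the first convolution
size = pool_output_size(size, pool=2)     # after the first pooling
size = conv_output_size(size, kernel=3)   # after the second convolution
size = pool_output_size(size, pool=2)     # after the second pooling
flattened = size * size * 64              # assuming 64 filters in the last conv layer
print(size, flattened)
```

Comparing such numbers with the output shapes printed by `model.summary()` is a quick way to check a model against the intended architecture.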

<details>
<summary><b>Click here for one possible solution</b></summary>
    
```python
model = keras.Sequential()
# 32 convolution filters used each of size 3x3
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)))
# choose the best features via pooling
model.add(MaxPooling2D(pool_size=(2, 2)))
# 64 convolution filters used each of size 3x3
model.add(Conv2D(64, (3, 3), activation='relu'))
# choose the best features via pooling
model.add(MaxPooling2D(pool_size=(2, 2)))
# flatten since too many dimensions, we only want a classification output
model.add(Flatten())
# fully connected to get all relevant data
model.add(Dense(128, activation='relu'))
# optionally, a Dropout layer could be added here for regularization
# output a softmax to squash the matrix into output probabilities
model.add(Dense(10, activation='softmax'))

model.compile(optimizer='adam', 
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
```
</details>


In [None]:
# Resets all state generated by Keras
keras.backend.clear_session()

############ YOUR CODE HERE ############

# Print model summary
model.summary()

## Train the model

Training the neural network model requires the following steps:

1. Feed the training data to the model—in this example, the `train_images` and `train_labels` arrays.
2. The model learns to associate images and labels.
3. We ask the model to make predictions about a test set—in this example, the `test_images` array. We verify that the predictions match the labels from the `test_labels` array. 

To start training,  call the `model.fit` method—the model is "fit" to the training data:

In [None]:
import datetime
# We create a tensorboard callback for model evaluation
logdir = os.path.join("logs_CNN", datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
tensorboard_callback = tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1)

# Create some additional callbacks e.g. EarlyStopping and add it to the callbacks list 
# Press Shift + Tab for retrieving function details

# YOUR_TURN: Create normalized training data
X_train = preprocess_images(train_images)
X_test = preprocess_images(test_images)

# Start training
history = model.fit(x=X_train, y=train_labels,  # Training images and labels
                    validation_split=0.1,       # Fraction of the training data to be used as validation data
                    epochs=5,                   # Number of epochs to train the model.
                    batch_size=128,             # Number of samples per gradient update.
                    callbacks=[                 # List of `keras.callbacks.Callback` instances to apply during training
                        tensorboard_callback
                    ],
                    verbose=1                   #  0 = silent, 1 = progress bar, 2 = one line per epoch
                   )


## Visualize the model with Tensorboard
Now we want to visualize our model using tensorboard. Just execute the next cell and see what happens.

**Hint**: If tensorboard did not start, interrupt the kernel and run the cell again

In [None]:
#%tensorboard --logdir logs_CNN --port 6007 --bind_all

As the model trains, the loss and accuracy metrics are displayed. This model reaches an accuracy of about 98.68% on the training data.

## Evaluate the model
Now we want to evaluate our model against the test data (test_images). Therefore we use our trained model and call `model.evaluate()`. The `evaluate()` function returns the loss value & metrics values for the model in test mode.

**Hint:** If you used one-hot encoded `train_labels`, you also have to one-hot encode the `test_labels` before passing them to the `evaluate()` function.

In [None]:
print(f"Testing input shape: {test_images.shape[1:]}")
print(f"Needed Model input shape:{model.input_shape[1:]}")

# YOUR_TURN: Prepare the test_images so that they suit the model input shape
#X_test = 

print(f"Testing input shape: {X_test.shape}")
print("-"*30)

test_loss, test_acc = model.evaluate(X_test, test_labels)

print('Test loss:', test_loss)
print('Test accuracy:', test_acc)

### Evaluation with Confusion matrix
Evaluate the CNN with the confusion matrix. Do you see any differences to the FCN?

In [None]:
# Predict all Test images
y_pred = model.predict(X_test)
y_true_labels = test_labels
y_pred_labels =  np.argmax(y_pred,1)# YOUR_TURN
print("y_true_labels shape:",y_true_labels.shape)
print("y_pred shape:",y_pred.shape)
print("y_pred_labels shape:",y_pred_labels.shape)

# YOUR_TURN: Plot the Confusion matrix
plot_confusion_matrix(y_true_labels,
                      y_pred_labels,
                      np.unique(y_true_labels))

Often, the accuracy on the test dataset is a little lower than the accuracy on the training dataset. Such a gap between training accuracy and test accuracy is a sign of *overfitting*. In our case, the test accuracy is actually higher at 99.19%! This is, in part, due to regularization with Dropout layers (if your model includes them), which are only active during training.

## Make predictions

With the model trained, we can use it to make predictions about some images. 

In [None]:
def plot_image_with_bar(preds, img, with_bar=True):    
    from mpl_toolkits.axes_grid1.axes_divider import make_axes_locatable
    
    INCLUDED_LABELS = np.unique(test_labels)
    
    def autolabel(rects,ax):
        """Attach a text label above each bar in *rects*, displaying its height."""
        for rect in rects:
            height = rect.get_height()
            perc = height*100
            ax.annotate('{:2.2f}%'.format(perc),
                        xy=(rect.get_x() + rect.get_width() / 2, height),
                        xytext=(0, 3),  # 3 points vertical offset
                        textcoords="offset points",
                        ha='center', va='bottom')    
    # Get the current axis
    f = plt.gcf()
    f.set_size_inches(10,10)
    ax = plt.gca()

    font_size=16

    '''Plot the current image'''
    ax.imshow(img, cmap=plt.cm.binary)
    color = 'red'
    
    predicted_label = np.argmax(preds)
    ax.set_xlabel("Pred: {} {:2.2f}% ".format(INCLUDED_LABELS[predicted_label],
                                100*np.max(preds)),
                                color=color, fontsize=font_size)
    
    ax.grid(False)
    ax.set_xticks([])
    ax.set_yticks([])
    
    ''' Create an divider and add the bar to the right'''   
    if with_bar:
        ax_divider = make_axes_locatable(ax)
        cax = ax_divider.append_axes("right", size="170%", pad="2%")
        ax.grid(False)
        
        ''' Plot the distribution'''
        pred_plot = cax.bar(INCLUDED_LABELS, preds, color="#777777")
        
        
        ''' Here we check the if the indices are correct'''        
        pred_plot[predicted_label].set_color('red')       
        
        cax.set_xticks(np.arange(len(INCLUDED_LABELS)))
        cax.set_xticklabels(INCLUDED_LABELS,rotation=45,fontsize=font_size)
        cax.set_yticks([])
        cax.set_ylim(0.0,1.1)

        # put the y-values on top of the bar
        autolabel(pred_plot,cax) 

# Try it with your own Image
<img src="https://raw.githubusercontent.com/tensorchiefs/dl_book/master/imgs/paper-pen.png" width="60" align="left" />  

**Exercise:** You can now try to predict on your own images. Open `template.png` in the `Test_Pictures` folder, for example with Paint, and draw a digit between 0 and 9. You can then read it into the notebook by putting it in your workshop folder. 
Keep in mind that we trained our neural network on images with the shape (28,28,1), so you might have to resize your own image (or use the `preprocess_images` function further up in the notebook ;) ).

1. Save your image as "your_name".png to the `Test_Pictures` folder and set `image_filename` accordingly.
2. What do the confidences tell you?
3. What happens if you switch the colors (black background, white digit)?
4. What happens if you load a blank (white) image?
5. Share your images with your colleagues. What are their predictions?
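
For the color question it helps to know that MNIST stores digits as bright strokes on a dark background, which is why `preprocess_images` inverts a drawing with `cv2.bitwise_not`. The NumPy sketch below shows the same inversion on a hypothetical 28x28 drawing; `np.bitwise_not` is used here as a stand-in for the OpenCV call on 8-bit data.

```python
import numpy as np

# Hypothetical 8-bit drawing: black digit strokes (0) on a white background (255)
drawing = np.full((28, 28), 255, dtype=np.uint8)
drawing[10:18, 13:15] = 0   # a short vertical stroke

# Bitwise NOT on uint8 flips every pixel: 255 -> 0 and 0 -> 255
inverted = np.bitwise_not(drawing)

print(drawing[0, 0], inverted[0, 0], inverted[12, 13])
```

An image that is already white-on-black would be flipped to black-on-white by the same step, which is worth keeping in mind when interpreting the prediction.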

In [None]:
# Add your image name 
image_filename = "kai.png"

# Read and preprocess the image
test = cv2.imread("Test_Pictures/"+image_filename)
test = preprocess_images(test)
test = np.atleast_3d(test)  # Add channel dimension

# Predict your image
preds = model.predict(np.expand_dims(test,axis=0))[0] # Add Batch Dimension

# Plot the image with all confidences
plot_image_with_bar(preds,test)
# #print(f"Your picture shows the Number {np.argmax(preds)} with {preds[0,np.argmax(preds)]} %")

# Where is the network looking?

What parts of the images are interesting for the network class prediction? 
The inside of the network is one of the most interesting parts. Now we want to do a small visualization of the intermediate layer outputs by calculating the gradients of the model output with respect to the layer output.
We call this the `class_activation_map (CAM)` of the network. This gradient-based variant is commonly known as Grad-CAM.

<div class="row">
  <div class="column">
    <img src="Pictures/gradcam1.png" width="500" height="400" align="center" style="width:60%">
  </div>
</div>
<div class="row">
  <div class="column">
    <img src="Pictures/gradcam_chest.png" width="500" height="400" align="left" style="width:50%">
  </div>
   <div class="column">
    <img src="Pictures/gradcam_gp.png" width="500" height="400" align="center" style="width:50%">
   </div>
</div>



## Class Activation Map (CAM)
Class activation maps are a simple technique to get the discriminative image regions used by a CNN to identify a specific class in the image. In other words, a class activation map (CAM) lets us see which regions in the image were relevant to this class.
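
The core arithmetic of a gradient-weighted class activation map can be sketched with NumPy alone. The arrays `feature_maps` and `gradients` below are random, hypothetical stand-ins for the outputs of a convolutional layer and the gradients of the class score with respect to them; the shape (11, 11, 64) is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
# Hypothetical stand-ins: feature maps of one conv layer and the gradients of
# the class score with respect to them, both of shape (height, width, channels)
feature_maps = rng.random((11, 11, 64))
gradients = rng.standard_normal((11, 11, 64))

# One weight per channel: the spatial average of the gradients
channel_weights = gradients.mean(axis=(0, 1))

# Weighted combination of the feature maps, then ReLU to keep positive evidence
cam = np.maximum((feature_maps * channel_weights).sum(axis=-1), 0)

# Scale to 0..1 for display, guarding against an all-zero map
if cam.max() > 0:
    cam = cam / cam.max()
print(cam.shape, cam.min(), cam.max())
```

This sketch sums over the channels, while the notebook code averages over them; after the final scaling to 0..1 both give the same map.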

In [None]:
tf.keras.backend.clear_session()

# Choose the image to explain: your own image from above or a test-set sample
# idx = np.random.choice(X_test.shape[0], 1)  # e.g. index 4132 is a 6
idx = 4132
image = test  # alternatively: X_test[idx]
if image.ndim < 4:
    image = keras.preprocessing.image.img_to_array(image)
    image = np.expand_dims(image, axis=0)

upsample_shape = (300,300)


# We are creating a new model that is able to predict intermediate layer outputs as well as the model output
conv_layer = model.get_layer("conv2d_1")
cam_model = tf.keras.models.Model([model.inputs], [conv_layer.output, model.output])


# Now we calculate the gradient of the top predicted class score (called `loss` here) w.r.t. the layer output
with tf.GradientTape() as tape:
    inputs = tf.cast(image, tf.float32)
    (convOuts, preds) = cam_model(inputs)  # preds after softmax
    loss = preds[:, np.argmax(preds[0])]

# This is the gradient of the top predicted class with regard to
# the output feature map of the last conv layer
grads = tape.gradient(loss, convOuts)

# This is a vector where each entry is the mean intensity of the gradient
# over a specific feature map channel
pooled_grads = tf.reduce_mean(grads, axis=(0, 1, 2))

# Remove the batch dimension
convOuts = convOuts[0]

# Now we can compute the CAM by multiplying the pooled_grads with the Convolution Outputs
cam = tf.reduce_mean(tf.multiply(pooled_grads, convOuts), axis=-1)

# grads = grads[0]
# # Normalize the gradients by their root mean square
# #norm_grads = tf.divide(grads, tf.reduce_mean(tf.square(grads)) + tf.constant(1e-5))
# norm_grads = tf.divide(grads, (tf.sqrt(tf.reduce_mean(tf.square(grads))) + 1e-5))
# # Apply the global average pooling technique
# pooled_grads = tf.reduce_mean(norm_grads, axis=(0, 1))
# # Now we can compute the CAM by multiplying the pooled_grads with the Convolution Outputs
# cam = tf.reduce_sum(tf.multiply(pooled_grads, convOuts), axis=-1)


# Apply ReLU on cam data
cam = np.maximum(cam, 0)
if np.max(cam) != 0:
    cam = cam / np.max(cam)
#cam = cam.transpose((1,2,0))

# Resize the CAM and convert it to a 3-channel array
cam = cv2.resize(cam, upsample_shape, interpolation=cv2.INTER_LINEAR)
cam = np.expand_dims(cam, axis=2)
cam = np.tile(cam, [1,1,3])

# Create a heatmap out of the CAM data and give a new colormap
heatmap = np.uint8(255 * cam)
heatmap = cv2.applyColorMap(heatmap, cv2.COLORMAP_JET)

# Now we overlay the heatmap on our image

# Resize the image (need to squeeze/reshape because of the dimensions)
image_resized = cv2.resize(image.squeeze(), upsample_shape, interpolation=cv2.INTER_LINEAR) #(300,300)
image_resized = np.expand_dims(image_resized, axis=2)
#image_resized = np.tile(image_resized, [1,1,3])
image_resized = np.uint8(255*image_resized)
#image_resized = cv2.cvtColor(image_resized, cv2.COLOR_GRAY2BGR)

# Create the superimposed image 
superimposed = heatmap*0.5 + image_resized*0.3
superimposed = np.uint8((255*superimposed)/superimposed.max())
# applyColorMap returns BGR, matplotlib expects RGB, so swap the channel order
superimposed = cv2.cvtColor(superimposed, cv2.COLOR_BGR2RGB)

# Plot the images
plt.figure(figsize=(10,3))
plt.subplot(1,2,1)
plt.imshow(image_resized, cmap=plt.cm.binary)
plt.colorbar()
plt.axis("off")
plt.title("Image")

plt.subplot(1,2,2)
plt.imshow(superimposed,cmap=plt.cm.RdBu_r)
plt.colorbar()
plt.axis("off")
plt.title("CAM")
plt.show()

## Guided Backpropagation (GBP)
Idea: neurons act like detectors of particular image features
* We are only interested in what image features the neuron detects, not in what it doesn’t detect
* So when propagating the gradient, we set all the negative gradients to 0 (ReLU)
* We don’t care if a pixel “suppresses” a neuron somewhere along the path to our neuron

Thus we want to calculate a GBP to visualize the `saliency` of the network, i.e. which input pixels the network responds to most strongly
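
The gradient rule from the list above can be sketched for a single ReLU in NumPy. It follows the same masking idea as the commented-out `guidedRelu` in the cell below; the function names and the toy values here are illustrative assumptions.

```python
import numpy as np

def relu_backward(x, upstream):
    """Plain ReLU backward pass: gradient flows where the forward input was positive."""
    return upstream * (x > 0)

def guided_relu_backward(x, upstream):
    """Guided backprop: additionally drop negative incoming gradients."""
    return upstream * (x > 0) * (upstream > 0)

x = np.array([-1.0, 2.0, 3.0, -4.0])        # forward inputs to the ReLU
upstream = np.array([0.5, -0.7, 0.9, 1.1])  # gradients arriving from the layer above

plain = relu_backward(x, upstream)
guided = guided_relu_backward(x, upstream)
print("plain :", plain)
print("guided:", guided)
```

Only the third entry survives the guided rule: it is the only position where both the forward input and the incoming gradient are positive.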

In [None]:
def deprocess_image(x):
    """Same normalization as in:
    https://github.com/fchollet/keras/blob/master/examples/conv_filter_visualization.py
    """
    # normalize tensor: center on 0., ensure std is 0.25
    x = x.copy()
    x -= x.mean()
    x /= (x.std() + tf.keras.backend.epsilon())
    x *= 0.25

    # clip to [0, 1]
    x += 0.5
    x = np.clip(x, 0, 1)

    # convert to RGB array
    x *= 255

    x = np.clip(x, 0, 255).astype('uint8')
    return x

In [None]:
tf.keras.backend.clear_session()
tf.compat.v1.reset_default_graph()
gbp_model = None
gbp_model = tf.keras.models.Model([model.inputs], [conv_layer.output])


# @tf.custom_gradient
# def guidedRelu(x):
#     def grad(dy):
#         return tf.cast(dy > 0, "float32") * tf.cast(x > 0, "float32") * dy

#     return tf.nn.relu(x), grad

# layer_dict = [layer for layer in gbp_model.layers[1:] if hasattr(layer, "activation")]
# for layer in layer_dict:
#     if layer.activation == tf.keras.activations.relu:
#         layer.activation = guidedRelu

# Get the gradient of the conv output w.r.t. the input image
with tf.GradientTape() as tape:
    inputs = tf.cast(image, tf.float32)
    tape.watch(inputs)
    convOuts = gbp_model(inputs)
    
grads_gb = tape.gradient(convOuts, inputs)[0]

# Zero out negative values (same as ReLU)
grads_gb = np.maximum(grads_gb, 0)
# grads_gb = grads_gb / np.max(grads_gb)


# Resize the gradients to match the heatmap
saliency_resized = cv2.resize(np.asarray(grads_gb).squeeze(), upsample_shape)
saliency_resized = np.expand_dims(saliency_resized, 2)
#saliency_resized = np.tile(saliency_resized, [1,1,3]) # Not necessary but now all have the same shapes

# Now we create the guided backprop by multiplying the saliency (gradients) with the heatmap
guided_backprop = saliency_resized * heatmap
#guided_backprop = deprocess_image(guided_backprop)
# Scale to [0, 255]; the small floor avoids a division by zero when all gradients vanish
guided_backprop = np.uint8((255*guided_backprop)/max(guided_backprop.max(), 1e-8))
#guided_backprop = cv2.normalize(guided_backprop, guided_backprop, alpha=0, beta=255, norm_type=cv2.NORM_MINMAX, dtype=cv2.CV_8U)
guided_backprop = cv2.cvtColor(guided_backprop, cv2.COLOR_BGR2RGB)


# Plot the images
plt.figure(figsize=(15,3))
plt.subplot(1,3,1)
plt.imshow(image_resized, cmap=plt.cm.binary)
plt.colorbar()
plt.axis("off")
plt.title("Image")

plt.subplot(1,3,2)
plt.imshow(superimposed,cmap=plt.cm.RdBu_r)
plt.colorbar()
plt.axis("off")
plt.title("CAM")

plt.subplot(1,3,3)
plt.imshow(guided_backprop,cmap=plt.cm.RdBu_r)
plt.colorbar()
plt.axis("off")
plt.title("GBP")
plt.show()