<a href="https://colab.research.google.com/github/schedldave/cv2022/blob/main/08_CNN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Tutorial 08 - CNN

## Dr. David C. Schedl

Note: this tutorial is geared towards students **experienced in programming** and aims to introduce you to **TensorFlow and CNNs**.

This notebook has **not been tested** on a **local** Python installation.
It is therefore highly recommended to run it **on Google Colab**, where TensorFlow and the related libraries it relies on are preinstalled.

For training, it is recommended to use a **GPU**. In Google Colab go to the menu and select **Edit** -> **Notebook settings** -> **Hardware accelerator** -> switch to **GPU**.

Useful links:
* [TensorFlow API documentation](https://www.tensorflow.org/api_docs/python/tf)


#### Acknowledgements
The code of this tutorial is based on a notebook from [datahacker.rs](https://datahacker.rs/lenet-5-implementation-tensorflow-2-0/).


# Initialization

As always, let's first import some useful libraries.
We will work with TensorFlow and MNIST today.


In [None]:
%%capture
# %%capture suppresses any output of this cell

# make sure to use tensorflow 2.x
%tensorflow_version 2.x

import datetime

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

from tensorflow.keras import Model
from tensorflow.keras.models import Sequential
from tensorflow.keras.losses import categorical_crossentropy
from tensorflow.keras.layers import Dense, Flatten, Conv2D, AveragePooling2D, MaxPooling2D

from tensorflow.keras import datasets
from tensorflow.keras.utils import to_categorical


In [None]:
# Load the TensorBoard notebook extension to display training results later
%load_ext tensorboard

# LeNet-5 and MNIST

## Preparing the Dataset

In [None]:
(x_train, y_train), (x_test, y_test) = datasets.mnist.load_data() # it would be pretty easy to change the dataset here! :)

In [None]:
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')
print(x_train[0].shape, 'image shape')

# show example image(s)
plt.subplot(121), plt.imshow(x_train[0], cmap=plt.cm.gray_r), plt.title(f'train: {y_train[0]}')
plt.subplot(122), plt.imshow(x_test[0], cmap=plt.cm.gray_r), plt.title(f'test: {y_test[0]}')
plt.show()

In [None]:
# Add a new axis to reflect that we have grayscale data
x_train = x_train[:, :, :, np.newaxis]
x_test = x_test[:, :, :, np.newaxis]

print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')
print(x_train[0].shape, 'image shape')

In [None]:
# Convert class vectors to binary class matrices.
# For example a test label of 7 is converted to [0, ... 0, 1, 0, 0]
num_classes = 10
y_train = to_categorical(y_train, num_classes)
y_test = to_categorical(y_test, num_classes)
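
The one-hot conversion above can be sketched in plain Python to make the encoding visible. This is only an illustration of the idea, not how `to_categorical` is implemented; the helper name `one_hot` is hypothetical.

```python
def one_hot(label, num_classes=10):
    """Return a list of zeros with a single 1.0 at position `label`."""
    if not 0 <= label < num_classes:
        raise ValueError(f"label {label} is outside 0..{num_classes - 1}")
    vector = [0.0] * num_classes
    vector[label] = 1.0
    return vector


encoded = one_hot(7)
print(encoded)

# The original label is recovered as the position of the single 1.0
decoded = encoded.index(1.0)
print(decoded)
```

Decoding by position is the same thing `np.argmax(y_test[i])` does later in this notebook.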

In [None]:
# Data normalization 
# Make sure we use floating point images and a range of 0 to 1
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

## The Model: LeNet-5

We will implement the LeNet-5 architecture as proposed by LeCun et al.
The layer options in `[]` are the legacy ones, as originally proposed in the paper. The values outside the brackets belong to a slightly modified variant that we can optionally use instead.


| Layer                               | Output Size  | Weight Size     |
| ----------------------------------- | ------------ | --------------- |
| Input                               | 1 x 28 x 28  |                 |
| Conv (Cout=20 [6], K=5, P=2, S=1)   | 20 x 28 x 28 | 20 x 1 x 5 x 5  |
| ReLU [Sigmoid]                      | 20 x 28 x 28 |                 |
| MaxPool (K=2, S=2)                  | 20 x 14 x 14 |                 |
| Conv (Cout=50 [16], K=5, P=2, S=1)  | 50 x 14 x 14 | 50 x 20 x 5 x 5 |
| ReLU [Sigmoid]                      | 50 x 14 x 14 |                 |
| MaxPool (K=2, S=2)                  | 50 x 7 x 7   |                 |
| Flatten                             | 2450         |                 |
| Linear (2450 -> 500)                | 500          | 2450 x 500      |
| ReLU [Sigmoid]                      | 500          |                 |
| Linear (500 -> 10)                  | 10           | 500 x 10        |
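
The output sizes in the table follow from the usual size formula for convolutions and pooling, `floor((size + 2P - K) / S) + 1`. The sketch below recomputes them for the modern variant; the helper names are hypothetical and only serve this illustration.

```python
def conv_output_size(size, kernel, padding, stride):
    """Spatial output size of a convolution: floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_output_size(size, kernel, stride):
    """Spatial output size of an unpadded pooling layer."""
    return (size - kernel) // stride + 1


size = 28  # MNIST images are 28 x 28
conv1 = conv_output_size(size, kernel=5, padding=2, stride=1)
pool1 = pool_output_size(conv1, kernel=2, stride=2)
conv2 = conv_output_size(pool1, kernel=5, padding=2, stride=1)
pool2 = pool_output_size(conv2, kernel=2, stride=2)
flattened = 50 * pool2 * pool2  # 50 channels in the modern variant

print(conv1, pool1, conv2, pool2, flattened)  # 28 14 14 7 2450
```

With K=5 and S=1, a padding of P=2 keeps the spatial size unchanged, which is why only the pooling layers shrink the feature maps.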

In [None]:
# LeNet-5 model
class LeNet(Sequential):
  def __init__(self, input_shape, nb_classes, legacy=True):
    super().__init__()

    activation = 'sigmoid' if legacy else 'relu'

    self.add(Conv2D(6 if legacy else 20, kernel_size=(5, 5), strides=(1, 1), activation=activation, input_shape=input_shape, padding="same"))
    self.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='valid'))
    self.add(Conv2D(16 if legacy else 50, kernel_size=(5, 5), strides=(1, 1), activation=activation, padding='same'))
    self.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='valid'))
    self.add(Flatten())
    self.add(Dense(500, activation=activation))
    self.add(Dense(nb_classes, activation='softmax'))

    self.compile(optimizer='adam',
                loss=categorical_crossentropy,
                metrics=['accuracy'])

In [None]:
# create a model
model = LeNet(x_train[0].shape, num_classes, legacy = False)

# print a summary of its structure and parameters
model.summary()
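
As a cross-check for the summary, the number of trainable parameters can be estimated by hand. The sketch below assumes every convolutional and dense layer has one bias per output channel or unit (the Keras default) and that the feature maps are 7 x 7 before flattening; the helper names are hypothetical.

```python
def conv_params(c_in, c_out, kernel):
    """Weights plus one bias per output channel of a square convolution."""
    return c_out * (c_in * kernel * kernel + 1)


def dense_params(n_in, n_out):
    """Weights plus one bias per output unit of a fully connected layer."""
    return n_out * (n_in + 1)


def lenet_params(legacy):
    """Total parameter count of the LeNet variant defined in this notebook."""
    c1, c2 = (6, 16) if legacy else (20, 50)
    flattened = c2 * 7 * 7  # 7 x 7 feature maps after the second pooling layer
    return (conv_params(1, c1, 5) + conv_params(c1, c2, 5)
            + dense_params(flattened, 500) + dense_params(500, 10))


modern_total = lenet_params(legacy=False)
legacy_total = lenet_params(legacy=True)
print(modern_total)  # 1256080
print(legacy_total)  # 400082
```

In both variants most parameters sit in the first dense layer, not in the convolutions.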

## Training

For training we will use the `tf.keras.Model.fit()` method.
Furthermore, we can monitor the training progress with TensorBoard, a tool that visualizes the training progress based on the log files created during training.

In [None]:
# Training Preparation

# Place the logs in a timestamped subdirectory.
# The timestamp keeps runs from overwriting each other
# and makes it easy to select a specific training run later.
log_dir = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
# Specify the callback object
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)

# tf.keras.callbacks.TensorBoard ensures that logs are created and stored.
# We need to pass the callback object to the fit method.
# This is done by passing a list of callback objects, which in our case contains just one.
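
To see what the timestamped directory names look like, here is a small standalone sketch. The helper `make_log_dir` and its `now` parameter are hypothetical; they only make the naming scheme easy to try with a fixed date.

```python
import datetime


def make_log_dir(root="logs/fit", now=None):
    """Build a log directory name such as logs/fit/20220517-093005."""
    now = now or datetime.datetime.now()
    return f"{root}/" + now.strftime("%Y%m%d-%H%M%S")


# Two runs started at different times end up in different subdirectories
first_run = make_log_dir(now=datetime.datetime(2022, 5, 17, 9, 30, 5))
second_run = make_log_dir(now=datetime.datetime(2022, 5, 17, 9, 45, 0))
print(first_run)
print(second_run)
```

Because the names sort chronologically, TensorBoard lists the runs under `logs/fit` in the order they were started.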

In [None]:
model.fit(x_train, y=y_train, 
          epochs=20, 
          validation_data=(x_test, y_test), 
          callbacks=[tensorboard_callback],
          verbose=True)

In [None]:
%tensorboard --logdir logs/fit
# open the TensorBoard here!

## Inference

For inference we will use the `tf.keras.Model.predict()` method. 
It returns the predicted values for all 10,000 test images. 

In [None]:
# Inference for the 10,000 test images:
prediction_values = model.predict(x_test)

### Displaying Results

First, let's display the first 50 images of the MNIST test set and verify what our model predicted.
For each image we get 10 probabilities that express how sure the network is that it sees a certain digit.
If training worked well, our model should correctly classify the first 50 digits and give the highest score to the correct labels.
Each image is annotated with the predicted label followed by the true label in brackets, in green if they match and in red otherwise.
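
The last layer of the model uses a softmax activation, so the 10 outputs per image are non-negative and sum to 1. The sketch below shows, in plain Python, how such probabilities arise from raw scores and how the predicted label is picked; the scores are made up for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for the ten digits 0..9
scores = [0.1, 0.3, 0.2, 0.0, 0.1, 0.4, 0.2, 5.0, 0.3, 0.6]
probs = softmax(scores)

# The predicted label is the index of the largest probability (argmax)
predicted = max(range(len(probs)), key=probs.__getitem__)
print(predicted, round(probs[predicted], 3))
```

This index lookup is what `np.argmax(prediction_values[i])` does for each test image.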

In [None]:
# set up a figure
fig = plt.figure(figsize=(15, 7))
fig.subplots_adjust(left=0, right=1, bottom=0, top=1, hspace=0.05, wspace=0.05)

# plot the images: each image is 28x28 pixels
for i in range(50):
  ax = fig.add_subplot(5, 10, i + 1, xticks=[], yticks=[])
  ax.imshow(x_test[i,:].reshape((28,28)),cmap=plt.cm.gray_r, interpolation='nearest')

  prediction_label = np.argmax(prediction_values[i])

  img_text = f'{prediction_label} [{np.argmax(y_test[i])}]'
  
  if prediction_label == np.argmax(y_test[i]):
    # label the image with green text
    ax.text(0.1, 0.1, img_text, color='green', transform=ax.transAxes)
    ax.tick_params(color='green', labelcolor='green')
    for spine in ax.spines.values():
        spine.set_edgecolor('green')
  else:
    # label the image with the red text
    ax.text(0.1, 0.1, img_text, color='red', transform=ax.transAxes)
    ax.tick_params(color='red', labelcolor='red')
    for spine in ax.spines.values():
        spine.set_edgecolor('red')

### What did not work?

Let's also look at what did not work!
We compute the Euclidean distance between the ground truth (`y_test`) and the predictions (`prediction_values`) and look at the 50 pairs with the largest distance.
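
To get a feeling for this measure, the sketch below computes it for three made-up predictions of a digit 3. The vectors are hypothetical and the helper `l2_distance` mirrors what `np.linalg.norm(..., axis=1)` computes per row.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


truth = [0.0] * 10
truth[3] = 1.0  # one-hot ground truth for the digit 3

confident_right = list(truth)  # all probability on the correct digit
confident_wrong = [0.0] * 10
confident_wrong[8] = 1.0       # all probability on the wrong digit 8
unsure = [0.1] * 10            # probability spread evenly over all digits

d_right = l2_distance(confident_right, truth)
d_wrong = l2_distance(confident_wrong, truth)
d_unsure = l2_distance(unsure, truth)
print(d_right, d_unsure, d_wrong)
```

A confidently wrong prediction reaches the maximum distance of `sqrt(2)`, while an unsure prediction lands in between. This is why the grid of largest distances can also contain correctly classified digits with a low confidence.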

In [None]:
diff = np.linalg.norm( prediction_values - y_test, axis=1 )

# set up the figure
fig = plt.figure(figsize=(15, 7))
fig.subplots_adjust(left=0, right=1, bottom=0, top=1, hspace=0.05, wspace=0.05)

count = 0

for id in np.flip(np.argsort(diff)):
  #print(np.argmax(y_test[id]))
  #print(np.argmax(prediction_values[id]))

  ax = fig.add_subplot(5, 10, count + 1, xticks=[], yticks=[])
  ax.imshow(x_test[id,:].reshape((28,28)),cmap=plt.cm.gray_r, interpolation='nearest')

  prediction_label = np.argmax(prediction_values[id])

  img_text = f'{prediction_label} [{np.argmax(y_test[id])}]'
  
  if prediction_label == np.argmax(y_test[id]):
    # label the image with green text
    ax.text(0.1, 0.1, img_text, color='green', transform=ax.transAxes)
    ax.tick_params(color='green', labelcolor='green')
    for spine in ax.spines.values():
        spine.set_edgecolor('green')
  else:
    # label the image with the red text
    ax.text(0.1, 0.1, img_text, color='red', transform=ax.transAxes)
    ax.tick_params(color='red', labelcolor='red')
    for spine in ax.spines.values():
        spine.set_edgecolor('red')

  count += 1
  if count >= 50:
    break

plt.show()

### ⌨️ Try it yourself: Compute the Numbers!

How many digits have been classified incorrectly and how many correctly?
Compute both numbers (also as percentages); this is a good way to evaluate the quality of the network.

In [None]:
# Todo: compute the numbers

## ⌨️ Try it yourself: Compare the legacy LeNet-5 to our modern variant!

We ran the notebook with the modernized LeNet-5 variant. How does the legacy LeNet-5 implementation perform in comparison?
How many digits are misclassified in both versions?



In [None]:
# Todo: run the legacy LeNet-5