<a href="https://colab.research.google.com/github/mtwenzel/image-video-understanding/blob/master/Session_1_CNN_Classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### This is a Jupyter notebook.  

## General functions (of Jupyter environments)
The most important keyboard shortcuts (cf. the "Help" menu) are
* **cursor keys** to select cells
* **Enter** to go from command mode to edit mode (for changing cell contents)
  * (**Esc** would go back to command mode.)
* **Shift+Enter** to *execute and advance* a cell
  * While experimenting with different values in the same cell, **Ctrl+Enter** is also handy, which executes but does not advance the cursor.
  * **Alt+Enter** will execute and advance to a **new** inserted cell.
* There is an edit mode with a green bar to the left, and an execution/command mode with a blue bar.
* In command mode, some keys have a function:
    * `l`: toggle line numbers
    * `a`: new cell above 
    * `b`: new cell below
    * `h`: help / see more keyboard shortcuts
    
## Google Colab extra functions are provided:

* Cells can **hide the code**. This is the case for the "Imports" cell below. Double-clicking still gets you to the code directly.
* Cells can provide convenient **parameter interfaces**, like drop-down lists, sliders, and input fields. You will see this in the "Initialize random data" cell below. Again, double-clicking brings up the code.
* Cells can be run automatically. For this, they have a special first line: ```#@title Cell title {run:"auto"}```. The "Initialize Random Data" cell is such a cell.



# Image and Video Understanding -- Session 1 (Classification)


## 1. First Experiments with Random Data
* Start by importing some required Python modules that implement the layers we will use to build the network.
* We also need a "container" to connect the layers: the "Model"

In [None]:
#@title Imports
#@markdown To edit the imports, double-click on the cell

from tensorflow.keras.layers import InputLayer, Conv2D, MaxPool2D, Flatten, Dense, UpSampling2D, LocallyConnected2D, Dropout
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras import optimizers

import numpy as np

### Create random data

In these examples, we'll use artificial data first, and then switch to real data.

Run the code in the following box, which will create a pair of input data `x_train` and corresponding output data `y_train` for training.

In [None]:
#@title Initialize random data {run:"auto"}
#@markdown This cell needs to be executed only once. When you change parameters later, it auto-executes.
#@markdown
#@markdown ### Create random data sampled from uniform distribution.
#@markdown
#@markdown Set the desired number of instances
NUM_INSTANCES = 100 #@param {type:'slider', min:0, max:10000, step:100}
#@markdown Set the desired number of features (random from uniform distribution)
NUM_FEATURES = 1000 #@param {type:'slider', min:0, max:10000, step:100}
x_train = np.random.random((NUM_INSTANCES, NUM_FEATURES)) 
y_train = np.zeros((NUM_INSTANCES,)) # Label vector (initialized with 0s)
y_train[:int(NUM_INSTANCES/2)] = 1 # set first half of vector to 1

#@markdown The cell produces variable `x_train` and `y_train`. `x_train` contains the set number of instances, with the set number of features. `y_train` contains labels, with the first half `1`, and the second `0`.

### Define the model
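
Before building the model, it can help to estimate how many parameters to expect, so that the output of `model.summary()` is easier to check. The following is a minimal sketch, assuming the default slider value of 1000 features and the layer sizes used in the cell below; the helper `dense_params` is a hypothetical name introduced only for this illustration.

```python
def dense_params(n_inputs, n_units):
    """Parameters of a fully connected layer: one weight per input/unit pair plus one bias per unit."""
    return n_inputs * n_units + n_units

num_features = 1000                        # assumed: default NUM_FEATURES slider value
hidden = dense_params(num_features, 256)   # hidden layer with 256 units
output = dense_params(256, 1)              # single sigmoid output unit
total = hidden + output
print(f"hidden: {hidden:,}  output: {output:,}  total: {total:,}")
```

If you change the number of features or units, the count changes accordingly, which makes this a quick sanity check for your own variants.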

In [None]:
model = Sequential() # We choose a simple sequential model without branching
model.add(InputLayer(input_shape = (NUM_FEATURES,)))
#@markdown Play with the number of units (==neurons)
model.add(Dense(units=256, name="Hidden")) 

#@markdown Optionally increase the number of layers.
#model.add(Dense(units=128))
#model.add(Dense(units=64))
model.add(Dense(units=1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adadelta')
model.summary()
# @markdown If only interested in the number of parameters, use this:
# @markdown `print(f"Model parameters: {model.count_params()}")`

### Training the network

In [None]:
history = model.fit(x_train, y_train, batch_size=10, epochs=100)
#@markdown Clicking once to the left of the output toggles the display between a scrollable field and the full output. Double-clicking collapses it, so it is less dominant.
#@markdown In Google Colab, you can safely `x` the output with a click in the top left corner. This removes the printout, but not the cell results.

#### Investigate the "history" object you created
* Try out the following commands and inspect the variables.
* Make use of tab completion, e.g. by typing `hidden_layer.<tab>`

In [None]:
loss_history = history.history['loss']
weights = history.model.get_weights()
hidden_layer = history.model.get_layer("Hidden")
for w in weights:
    print(w.shape)

Next, we plot the loss to see how well the training went.
* `matplotlib` is a Python package well suited to plotting data and displaying images. Let's import it.
* Then, plot the loss curve "inline" in this notebook.

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
fig,ax = plt.subplots(figsize=(18, 4), dpi= 80, facecolor='w', edgecolor='k')
ax.plot(history.history['loss'])
ax.set_xlabel('Epoch')
ax.set_ylabel('Loss')
ax.grid()
plt.show()

#### Optional task: use a validation set

In addition to the training loss, we can also compute the loss on a validation set. For this purpose, we can define a designated validation set or use a fraction of the training set. Check the docs to find out how to define validation data: https://keras.io/api/models/model_training_apis/#fit-method

After retraining your model, try to find out how to read the validation loss from the history and plot it together with the training loss.
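
One thing to keep in mind when holding out a fraction of the training set: Keras takes that fraction from the *end* of the arrays, before any shuffling. With labels laid out as in this section (ones first, zeros last), the held-out part may then contain only one class. The NumPy sketch below illustrates this under those assumptions; the variable names are chosen for illustration and are not part of the notebook.

```python
import numpy as np

num_instances = 100
y = np.zeros(num_instances)
y[: num_instances // 2] = 1          # same layout as y_train: ones first, zeros last

n_val = int(num_instances * 0.2)     # a 20 % hold-out taken from the end
y_val_unshuffled = y[-n_val:]
print("classes in hold-out without shuffling:", np.unique(y_val_unshuffled))

rng = np.random.default_rng(0)
order = rng.permutation(num_instances)   # one permutation, to be applied to x and y alike
y_val_shuffled = y[order][-n_val:]
print("classes in hold-out after shuffling:  ", np.unique(y_val_shuffled))
```

Applying the same permutation to `x_train` and `y_train` before training is one way to obtain a hold-out set that contains both classes.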

In [None]:
# your code goes here...

In [None]:
#@title Solution
#@markdown Expand code and set "run_solution = True" to see and run the solution.

run_solution = False # set to True to run solution

if run_solution:
  # we added the validation_split
  history = model.fit(x_train, y_train, batch_size=10, epochs=100, validation_split=0.2)
                      
  # alternative, when you have a separate set x_val and y_val
  #history = model.fit(x_train, y_train, batch_size=10, epochs=100, validation_data=(x_val,y_val))

  fig,ax = plt.subplots(figsize=(18, 4), dpi= 80, facecolor='w', edgecolor='k')
  ax.plot(history.history['loss'],label='Training')
  # we access the validation loss
  ax.plot(history.history['val_loss'],label='Validation')
  ax.set_xlabel('Epoch')
  ax.set_ylabel('Loss')
  ax.grid()
  ax.legend()
  plt.show()
else:
  print('Solution is hidden, try for yourself first...')

del run_solution


### A remark on optimization
* Optimizers like Adam, Adagrad, Adadelta etc. are variants of stochastic gradient descent (SGD).
* SGD estimates the gradient for the parameters from a batch of examples.
    * The larger the batch, the better the estimated gradient approximates the gradient for the whole dataset.
* In this example, it takes about 300 epochs to converge when creating 1000 instances.
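
The relation between batch size and gradient quality can be checked numerically. The sketch below is only an illustration on a toy problem (a linear model with mean squared error, not the network from this notebook); all names and sizes are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)
w = np.zeros(d)                       # point at which the gradient is evaluated

def mse_gradient(Xb, yb, w):
    """Gradient of the mean squared error of a linear model on one batch."""
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(yb)

full_gradient = mse_gradient(X, y, w)

mean_errors = {}
for batch_size in (10, 100, 1000):
    errors = []
    for _ in range(200):              # average over many random batches
        idx = rng.choice(n, size=batch_size, replace=False)
        batch_gradient = mse_gradient(X[idx], y[idx], w)
        errors.append(np.linalg.norm(batch_gradient - full_gradient))
    mean_errors[batch_size] = float(np.mean(errors))
    print(f"batch size {batch_size:4d}: mean deviation from full gradient "
          f"{mean_errors[batch_size]:.4f}")
```

A batch that covers the whole dataset reproduces the full gradient up to rounding, while small batches give a noisier estimate.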

### Quiz: Interpreting the result
* What can you observe regarding the loss?
* Why is that possible?
* Change the number of training instances to 1000. Ensure that the classes are equally frequent again. What can you observe?
* Remember that you have to re-create the model to reset the weights. To do this, re-run the whole cell with the model definition (including the `model.compile()` call), which builds a new model with freshly initialized weights.

## 2. Image Classification: _MNIST handwritten digits_

### Read the data

* We want to work on images: MNIST, which we import and load next.
* You can load it from Keras with one line, because it is one of the standard datasets used for machine learning.

In [None]:
#@title Import MNIST data
#@markdown If you execute this cell, you will overwrite the data above. In addition, it gives you test data in `x_test` and `y_test`.
from tensorflow.keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

# reduce data by factor 10 / 20 for fast execution during course
x_train = x_train[::10]
y_train = y_train[::10]
x_test = x_test[::20]
y_test = y_test[::20]

# verify resulting array shapes
x_train.shape, y_train.shape, x_test.shape, y_test.shape

### Inspecting the data

Look at the shape of the `x_train` variable to understand how the data is organised.
* The full MNIST training set has 60,000 examples; after the subsampling above, `x_train` holds 6,000 of them, each of shape 28x28.
* These are images... of size 28x28 pixels.
* The corresponding output is just a long vector of corresponding labels in the range [0...9].
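
Because the loading cell keeps only every 10th training image and every 20th test image, the resulting shapes can be predicted in advance. The sketch below uses zero-filled stand-in arrays with the shapes of the full MNIST arrays (the contents do not matter here); the names `full_train` and `full_test` are hypothetical.

```python
import numpy as np

# Stand-ins with the shapes of the full MNIST image arrays.
full_train = np.zeros((60000, 28, 28), dtype=np.uint8)
full_test = np.zeros((10000, 28, 28), dtype=np.uint8)

sub_train = full_train[::10]   # every 10th training image
sub_test = full_test[::20]     # every 20th test image
print(sub_train.shape, sub_test.shape)
```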

In [None]:
# Inspect the shape of x_train
print(x_train.shape, y_train.shape, y_train.min(), y_train.max())

As we are dealing with *images* now, we want to display them.
* `matplotlib` can also display images. We don't need to import it again.
* Now, load and display one of the images "inline" in this notebook.

In [None]:
# Look at an image
plt.imshow(x_train[600], cmap='gray')

Also, we could be interested in the distribution of labels in our data:

In [None]:
n, bins, patches = plt.hist(y_train, facecolor='#2ab0ff', edgecolor='#e0e0e0', linewidth=0.5, alpha=0.7)
n = n.astype('int') # it MUST be integer
n_range=max(n)-min(n)
# Good old loop. Choose colormap of your taste
for i in range(len(patches)):
    patches[i].set_facecolor(plt.cm.autumn_r((n[i]-min(n))/n_range))
plt.grid(False, axis="x")  # pass the flag positionally; newer Matplotlib versions renamed `b=` to `visible=`
plt.show()

### Preparing the labels for a classification network
We want to convert the numeric labels to so-called *"one-hot vectors"*.
* One-hot means that the network does not directly output a number between 0 and 9 representing the digit.
* Rather, we want a vector with 10 dimensions, in which only one entry is 1, all others 0, e.g. `[0, 0, 1, 0, 0, 0, 0, 0, 0, 0]` to label a "2".
* *Rationale:* The digits represent different categorical classes, and we want to penalize all confused digits the same; it is not "better" or "closer" if the network outputs 4.2 given an image depicting a "6" than if the output is 1.
* In general, one-hot encoding helps with classification problems and lets the neuron with maximal activation "win".
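
To make the rationale concrete, here is a small sketch on three hand-picked labels (an assumption for illustration, not the MNIST data). It builds one-hot vectors by indexing an identity matrix, shows that any two different classes are equally far apart, and recovers the integer labels again.

```python
import numpy as np

labels = np.array([2, 0, 9])
num_labels = 10

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(num_labels, dtype=np.float32)[labels]
print(one_hot[0])                      # one-hot vector for the label 2

# All pairs of different classes have the same distance, unlike the raw digits.
dist_2_vs_0 = float(np.linalg.norm(one_hot[0] - one_hot[1]))
dist_2_vs_9 = float(np.linalg.norm(one_hot[0] - one_hot[2]))
print(dist_2_vs_0, dist_2_vs_9)

recovered = one_hot.argmax(axis=-1)    # back to integer labels
print(recovered)
```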

In [None]:
num_labels = 10
# Code to convert labels
y_train_one_hot = (np.arange(num_labels) == y_train[:,np.newaxis]).astype(np.float32)

In [None]:
# Keras offers a convenience function to achieve the same:
from tensorflow.keras.utils import to_categorical
y_train_one_hot = to_categorical(y_train, num_classes=num_labels)
# Same for the testing data
y_test_one_hot  = to_categorical(y_test, num_classes=num_labels)

### Image classification with a simple neural network
We now want to train the above network on this data. We have to adapt it to use inputs of a different shape, and to produce vector outputs. We have prepared this below:
* Modified the parameter `input_shape=(...)` to adapt to the new data
* Modified the number of dense units in the output layer to reflect the number of classes; 10 in the digits example
* Modified the loss function to deal with multiple classes
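
For intuition about the multi-class loss, the following NumPy sketch computes a softmax followed by a categorical cross-entropy by hand. It is meant as an approximation of what the Keras loss computes, not as its implementation; the function names and the clipping constant `eps` are assumptions of this sketch.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 along the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)   # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def categorical_crossentropy(y_true_one_hot, y_prob, eps=1e-7):
    """Mean negative log-probability assigned to the true class."""
    y_prob = np.clip(y_prob, eps, 1.0)                      # avoid log(0)
    return float(-(y_true_one_hot * np.log(y_prob)).sum(axis=-1).mean())

logits = np.zeros((1, 10))       # a network that has no preference yet
target = np.eye(10)[[3]]         # one-hot label for the digit 3
loss = categorical_crossentropy(target, softmax(logits))
print(loss, np.log(10))
```

A classifier that spreads its probability evenly over 10 classes has a loss of ln(10), so a training loss that stays near that value suggests the network has not learned much yet.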

In [None]:
model = Sequential()
model.add(InputLayer(input_shape=(28,28)))
model.add(Flatten()) # Layer reshaping the 28x28 arrays into vectors of length 28*28=784
model.add(Dense(units=256)) # Try higher or lower numbers of hidden units!
# Try adding more layers!
model.add(Dense(units=128))
model.add(Dropout(0.5))
model.add(Dense(units=10, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adadelta')
model.summary()

In [None]:
# This experiment takes about 1 sec per epoch on an older MacBook Pro.
history = model.fit(x_train, y_train_one_hot, batch_size=500, epochs=100) # In this example, you'll no longer want batches of size 10...

### Evaluate the model on independent test data
* The following cell executes the model on the test data
* The result is a list of 10-vectors (recall the one-hot encoding), only this time the entries are values between 0 and 1.
* How can we compare these with the true labels in `y_test_one_hot`? There are many possible ways to evaluate classifiers; in general, you want to define some kind of error, usually based on differences.
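
One simple option among many is the accuracy, i.e. the fraction of examples whose most probable class matches the true class. The sketch below uses a small hand-made prediction array with three classes instead of the real `pred`; `pred_demo` and `true_demo` are hypothetical names for this illustration.

```python
import numpy as np

pred_demo = np.array([[0.1, 0.7, 0.2],
                      [0.8, 0.1, 0.1],
                      [0.3, 0.3, 0.4],
                      [0.2, 0.5, 0.3]])
true_demo = np.array([[0, 1, 0],
                      [1, 0, 0],
                      [0, 1, 0],
                      [0, 1, 0]])

predicted_classes = pred_demo.argmax(axis=-1)   # index of the highest score per row
true_classes = true_demo.argmax(axis=-1)        # index of the 1 in each one-hot row
accuracy = float((predicted_classes == true_classes).mean())
print(f"accuracy: {accuracy:.2f}")
```

The same two lines should work on `pred` and `y_test_one_hot` once the model has been evaluated.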

In [None]:
pred = model.predict(x_test)
print(x_test.shape, pred.shape)
print(pred[0])

The `argmax()` function may come in handy, which converts from the one-hot representation back to integer indices of the maximally activated classes:

In [None]:
pred.argmax(axis = -1)

**Exercise**: Let's have a look at some images together with their true label and predicted label. We can adapt the code from above...

*Optional task*: Write a for loop to iterate over several images and plot only the images for a certain digit and/or with wrong classification - do you notice anything? (Maybe try to retrain your network for more epochs (e.g. 1000) before doing this.)

In [None]:
k = 0 # choose an image

plt.imshow(x_test[k], cmap='gray')

# your turn
true_label = ... # To do
predicted_label = ... # To do

plt.title(f'True label: {true_label}\nPredicted label: {predicted_label}')
plt.show()

In [None]:
#@title Solution
#@markdown Expand code and set "run_solution = True" to see and run the solution.

run_solution = False # set to True to run solution

if run_solution:

  chosen_digit = 5 # choose a digit you want to check
  max_examples = 3 # maximum number of examples to plot

  counter = 0
  for k in range(len(x_test)):  # x_test was subsampled above, so do not hard-code the count
    true_label = y_test[k]
    predicted_label = pred[k].argmax(axis = -1)

    # plot only wrong classifications
    if true_label == chosen_digit and true_label != predicted_label:
      counter += 1
      plt.imshow(x_test[k], cmap='gray')
      plt.title(f'True label: {true_label}\nPredicted label: {predicted_label}')
      plt.show()
    if counter == max_examples:
      break # max number of examples reached

else:
  print('Solution is hidden, try for yourself first...')

del run_solution


## 3. Image classification with a simple convolutional neural network (CNN)
For an introduction to convolutional layers, see the course notes.

### Weight sharing
Exploring convolutions with and without weight sharing.
* First, your input data now needs to have a "channel" dimension, as the convolutional filter result will be a multi-channel image.
* Next, you will need to remove the 2D nature again to feed into dense layers. 
  * `Flatten()` does this for you.
  * Train the network as before.
  * What do you observe?

Later, we'll explore how a convolutional layer without weight sharing affects the network.
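
The two reshaping steps mentioned above can be tried out in plain NumPy. This is a minimal sketch on a zero-filled stand-in batch (an assumption; any array of shape `(N, 28, 28)` behaves the same): it adds the channel dimension in three equivalent ways and then flattens each image into a vector, which is what `Flatten()` does per example.

```python
import numpy as np

images = np.zeros((6, 28, 28))                    # stand-in for a batch of 6 images

a = images[..., np.newaxis]                       # indexing with np.newaxis
b = np.expand_dims(images, axis=-1)               # explicit helper function
c = np.reshape(images, images.shape + (1,))       # reshape with an appended axis
print(a.shape, b.shape, c.shape)

flat = a.reshape(len(a), -1)                      # one vector per image
print(flat.shape)
```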

#### With weight sharing

In [None]:
convnet = Sequential()
convnet.add(InputLayer(input_shape=(28,28,1)))
convnet.add(Conv2D(32, kernel_size=(3,3), padding='same'))
convnet.add(Conv2D(32, kernel_size=(3,3), padding='same'))
convnet.add(MaxPool2D())
convnet.add(Conv2D(32, kernel_size=(3,3), padding='same'))
convnet.add(Conv2D(32, kernel_size=(3,3), padding='same'))
convnet.add(MaxPool2D())
convnet.add(Flatten())
convnet.add(Dense(units=128))
convnet.add(Dropout(0.5))
convnet.add(Dense(units=10, activation='softmax'))
convnet.compile(loss='categorical_crossentropy', optimizer='adadelta')
print("convnet parameters: {0:,}".format(convnet.count_params()))
convnet.summary()

In [None]:
convnet_history = convnet.fit(x_train[...,np.newaxis], y_train_one_hot, batch_size=500, epochs=60)

**Exercise**: Plot the loss (history object, see above):

*Optional task*: Go back to the first loss plot and define a function, which you can simply reuse here. If you plotted the validation loss before, do the same thing here.

In [None]:
#plt.plot(...)

In [None]:
#@title Solution
#@markdown Expand code and set "run_solution = True" to see and run the solution.

run_solution = False # set to True to run solution

if run_solution:

  # define the function
  def plot_log(history):
    fig,ax = plt.subplots(figsize=(18, 4), dpi= 80, facecolor='w', edgecolor='k')
    ax.plot(history.history['loss'],label='Training')
    # we access the validation loss if available
    if 'val_loss' in history.history:
      ax.plot(history.history['val_loss'],label='Validation')
      ax.legend()
    ax.set_xlabel('Epoch')
    ax.set_ylabel('Loss')
    ax.grid()    
    plt.show()

  # apply the function
  plot_log(convnet_history)

else:
  print('Solution is hidden, try for yourself first...')

del run_solution


scikit-learn (if it is not installed, try `conda install scikit-learn`) offers utility functions for computing evaluation metrics such as a [confusion matrix](https://en.wikipedia.org/wiki/Confusion_matrix):

In [None]:
import sklearn.metrics
pred = convnet.predict(x_test[...,np.newaxis])
# confusion_matrix expects (y_true, y_pred): rows are true labels, columns are predictions
cm = sklearn.metrics.confusion_matrix(y_test, pred.argmax(axis = -1))
cm

It's more intuitive to look at it as a heat map.

In [None]:
plt.matshow(cm)
plt.show()

Side note: NumPy and Matplotlib are two central libraries for numeric computing with Python. In addition, there are more advanced libraries such as Seaborn, which build upon the things introduced above and offer dedicated functions for complex graphics, such as a combined version of the above matrix + heatmap.

In [None]:
import seaborn as sns
ax = sns.heatmap(cm, annot=True)

#### Without weight sharing

Now, let's try convolution without weight sharing.
* Use `LocallyConnected2D` for this. 
* What do you observe? Try training the network.
* Consider the number of parameters. Change the network, if necessary.
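
To get a feeling for the difference in size, the parameter counts of the first layer can be estimated by hand. The sketch below assumes a 3x3 kernel, 'valid' padding with stride 1 for the locally connected layer, and one bias per output position and filter; both helper functions are hypothetical and only do the arithmetic.

```python
def conv2d_params(kernel, in_channels, filters):
    """Shared weights: one kernel (plus one bias) per filter, reused at every position."""
    kh, kw = kernel
    return kh * kw * in_channels * filters + filters

def locally_connected2d_params(input_hw, kernel, in_channels, filters):
    """No sharing: a separate kernel and bias for every output position (assumed 'valid', stride 1)."""
    kh, kw = kernel
    out_h = input_hw[0] - kh + 1
    out_w = input_hw[1] - kw + 1
    return out_h * out_w * (kh * kw * in_channels * filters + filters)

shared = conv2d_params((3, 3), in_channels=1, filters=32)
unshared = locally_connected2d_params((28, 28), (3, 3), in_channels=1, filters=32)
print(f"with weight sharing:    {shared:,}")
print(f"without weight sharing: {unshared:,}")
```

Compare these numbers with the output of `without_ws.summary()` to check whether the assumptions hold for your Keras version.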

In [None]:
without_ws = Sequential()
without_ws.add(InputLayer(input_shape=(28,28,1)))
without_ws.add(LocallyConnected2D(32, kernel_size=(3,3)))
#without_ws.add(LocallyConnected2D(32, kernel_size=(3,3)))
without_ws.add(MaxPool2D())
without_ws.add(LocallyConnected2D(32, kernel_size=(3,3)))
#without_ws.add(LocallyConnected2D(32, kernel_size=(3,3)))
without_ws.add(MaxPool2D())
without_ws.add(Flatten())
without_ws.add(Dense(units=128))
without_ws.add(Dropout(0.5))
without_ws.add(Dense(units=10, activation='softmax'))
without_ws.compile(loss='categorical_crossentropy', optimizer='adadelta')
print("without_ws parameters: {0:,}".format(without_ws.count_params()))
without_ws.summary()

In [None]:
wws_history = without_ws.fit(np.reshape(x_train, x_train.shape+(1,)), y_train_one_hot, batch_size=10, epochs=10)

In [None]:
plt.plot(wws_history.history['loss'])
plt.show()
import sklearn.metrics
pred = without_ws.predict(x_test[...,np.newaxis])
# confusion_matrix expects (y_true, y_pred): rows are true labels, columns are predictions
cm = sklearn.metrics.confusion_matrix(y_test, pred.argmax(axis = -1))
cm

### From CNN to FCN
* In the following, explore how a network with dense layers is equivalent to a properly configured fully-convolutional network
* Flattening is replaced by a convolutional layer whose kernel spans the full size of the previous output. (Hint: use the model's `summary` function again to find the correct shape.)
    * Replace the `Flatten` layer and subsequent `Dense` layers by `Conv2D` layers with appropriate kernel size.
    * Note that a `Flatten` layer is still required to convert into the output vector representation.
* Convince yourself that the number of trainable parameters is indeed unchanged.
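
The equivalence can be verified numerically without Keras. The sketch below uses small, arbitrarily chosen sizes (a 3x3 feature map with 4 channels and 5 output units, all assumptions for illustration): it applies a dense layer to the flattened feature map and then reuses the same weights as a convolution kernel that spans the whole map.

```python
import numpy as np

rng = np.random.default_rng(0)
h, w, c, units = 3, 3, 4, 5
feature_map = rng.normal(size=(h, w, c))          # output of the last pooling layer
dense_kernel = rng.normal(size=(h * w * c, units))
bias = rng.normal(size=units)

# Dense layer applied to the flattened feature map.
dense_out = feature_map.reshape(-1) @ dense_kernel + bias

# Same weights viewed as a convolution kernel of size (h, w) with 'valid' padding:
# the kernel fits exactly once, so the output has a single spatial position.
conv_kernel = dense_kernel.reshape(h, w, c, units)
conv_out = np.einsum('hwc,hwcu->u', feature_map, conv_kernel) + bias

print(np.allclose(dense_out, conv_out))
print(dense_kernel.size + bias.size, conv_kernel.size + bias.size)
```

Both variants use the same number of weights, which is why the parameter count should not change when you swap the layers.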

In [None]:
# Disable eager execution
import tensorflow as tf

tf.compat.v1.disable_eager_execution()

In [None]:
# We call it FCN already, although it will be your task later to make it a true FCN by replacing the Dense layers. 

fcn = Sequential()
h,w = x_train.shape[1:]
fcn.add(InputLayer(input_shape=(h,w,1)))
fcn.add(Conv2D(32, kernel_size=(3,3), padding='same'))
fcn.add(Conv2D(32, kernel_size=(3,3), padding='same'))
fcn.add(MaxPool2D())
fcn.add(Conv2D(32, kernel_size=(3,3), padding='same'))
fcn.add(Conv2D(32, kernel_size=(3,3), padding='same'))
fcn.add(MaxPool2D())
fcn.add(Flatten())
fcn.add(Dense(128))
fcn.add(Dropout(0.5))
fcn.add(Dense(10, activation='softmax'))
fcn.compile(loss='categorical_crossentropy', optimizer='adadelta')
print(f"fcn parameters: {fcn.count_params()}")

In [None]:
#@title Solution
#@markdown Expand code and set "run_solution = True" to see and run the solution.

run_solution = False # set to True to run solution

if run_solution:
  fcn = Sequential()
  h,w = x_train.shape[1:]
  fcn.add(InputLayer(input_shape=(h,w,1)))
  fcn.add(Conv2D(32, kernel_size=(3,3), padding='same'))
  fcn.add(Conv2D(32, kernel_size=(3,3), padding='same'))
  fcn.add(MaxPool2D())
  fcn.add(Conv2D(32, kernel_size=(3,3), padding='same'))
  fcn.add(Conv2D(32, kernel_size=(3,3), padding='same'))
  fcn.add(MaxPool2D())

  # modified part
  fcn.add(Conv2D(128, kernel_size=(7,7), padding='valid')) 
  fcn.add(Dropout(0.5))
  fcn.add(Conv2D(10, kernel_size=(1,1)))

  fcn.add(Flatten())
  fcn.compile(loss='categorical_crossentropy', optimizer='adadelta')
  print("fcn parameters: {0:,}".format(fcn.count_params()))

  fcn.summary()
else:
  print('Solution is hidden, try for yourself first...')

del run_solution


In [None]:
fcn_history = fcn.fit(x_train[...,np.newaxis],
                      y_train_one_hot,
                      batch_size=500, epochs=10)  # raise epochs (e.g. to 100) for a longer run

Plot the loss once again (or reuse your function, if you implemented it earlier on.)

In [None]:
plt.plot(fcn_history.history['loss'])
plt.show()
pred = fcn.predict(x_test[...,np.newaxis])
# confusion_matrix expects (y_true, y_pred): rows are true labels, columns are predictions
cm = sklearn.metrics.confusion_matrix(y_test, pred.argmax(axis = -1))
cm

Exercise 1: 

Visualize this confusion matrix, compare 

In [None]:
# Room for code...

Exercise 2: 

With a fully convolutional classifier, you can easily create a number detector. 
* How?
* Can you show a proof of principle? 

In [None]:
# Room for code...