In [None]:
# please ignore this cell for now

# these settings perform better for CPU training at CIP
# for the particular model we have here
import os
nthreads = 4
os.environ["OMP_NUM_THREADS"] = str(nthreads)
os.environ["MKL_NUM_THREADS"] = str(nthreads)
import tensorflow as tf
import keras.backend as K
config = tf.ConfigProto(
    intra_op_parallelism_threads=nthreads,
    inter_op_parallelism_threads=nthreads,
)
session = tf.Session(config=config)
K.set_session(session)

# Convolutional neural networks

## The convolution operation

Convolutional neural networks make use of the convolution operation. They are mostly used for processing image data with 2D discrete convolutions:

![convolution](figures/convolution.png)

## Let's try it out
Can you guess what the output image for the convolutional kernel in the picture above will look like?

It's the [Sobel operator](https://en.wikipedia.org/wiki/Sobel_operator), which can be used for edge detection.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

In [None]:
sobel_x = np.array([
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1]
])

In [None]:
img = np.load("figures/grumpy.npy")

In [None]:
plt.imshow(img, cmap="Greys_r")

We could use `scipy.signal` to perform the convolution, but to understand how it works, let's quickly implement it by manually scanning over the image:

In [None]:
def convolve(input_img, kernel):
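    # we will do a "valid" convolution: the kernel is only placed where it fits
    # entirely inside the image, so for a 3x3 kernel the output will be
    # 2 pixels smaller in both directions than the input
    output_img = np.empty(shape=(input_img.shape[0]-kernel.shape[0]+1, input_img.shape[1]-kernel.shape[1]+1))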
    for j in range(output_img.shape[1]):
        for i in range(output_img.shape[0]):
            output_img[i][j] = np.sum(kernel * input_img[i:i+kernel.shape[0], j:j+kernel.shape[1]])
    return output_img
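
As a cross-check, the scan above can also be written without explicit Python loops. The following is a minimal sketch, assuming NumPy 1.20 or newer (for `sliding_window_view`); the names `convolve_loop`, `convolve_windows`, `demo_img` and `demo_kernel` are illustrative and not part of the notebook. Note that, like the loop above, it does not flip the kernel, so strictly speaking it computes a cross-correlation, which is what deep learning libraries usually call a convolution.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def convolve_loop(input_img, kernel):
    # reference implementation: scan the kernel over all "valid" positions
    kh, kw = kernel.shape
    out = np.empty((input_img.shape[0] - kh + 1, input_img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(kernel * input_img[i:i + kh, j:j + kw])
    return out


def convolve_windows(input_img, kernel):
    # windows has shape (out_rows, out_cols, kh, kw): one kernel-sized patch per output pixel
    windows = sliding_window_view(input_img, kernel.shape)
    # multiply each patch with the kernel and sum over the two patch axes
    return np.einsum("ijkl,kl->ij", windows, kernel)


rng = np.random.default_rng(0)
demo_img = rng.random((6, 7))  # small random stand-in for a real image
demo_kernel = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
same = np.allclose(convolve_loop(demo_img, demo_kernel), convolve_windows(demo_img, demo_kernel))
print(same)
```

With SciPy available, `scipy.signal.correlate2d(img, kernel, mode="valid")` should give the same result, while `scipy.signal.convolve2d` flips the kernel first.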

In [None]:
plt.imshow(convolve(img, sobel_x), cmap="Greys_r")

Vertical edges are highlighted! The same effect in a very simple example:

In [None]:
test_img = np.array([
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
])

In [None]:
convolve(test_img, sobel_x)

The transposed filter will highlight horizontal edges:

In [None]:
sobel_y = sobel_x.T
sobel_y

In [None]:
convolve(test_img, sobel_y)

In [None]:
plt.imshow(convolve(img, sobel_y), cmap="Greys_r")

If we add the two pictures above in quadrature, we get a nice highlighting of all edges:

In [None]:
plt.imshow(
    np.sqrt(
        convolve(img, sobel_x)**2
        + convolve(img, sobel_y)**2
    ),
    cmap="Greys_r"
)

## Application within a neural network

You might imagine that such filters help a lot when processing images, e.g. when trying to learn what they show. Instead of using hand-designed filters, a convolutional neural network (CNN) treats the filter entries as trainable parameters. We can then build layers with arbitrarily many input features (each of them an image) and arbitrarily many output features (each of them an image as well) by essentially sliding a small neural network over the image:

![cnn_layer](figures/cnn_layer.png)

The top row corresponds to the input features (before the first layer these are typically the 3 colors red, green and blue) and the bottom row corresponds to the output features. For each output feature, the neural network will learn one convolutional kernel for each input feature. The black lines in the graphic above therefore correspond to the trainable weights.

[animated version](https://homepages.physik.uni-muenchen.de/~Nikolai.Hartmann/cnn_anim.svg)

In addition to applying the filter, one can (and typically will) also apply an activation function.
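
To make the bookkeeping concrete, here is a minimal NumPy sketch of one such layer: every output feature sums the filtered versions of all input features, adds a bias and applies a ReLU activation. The function name `conv_layer`, the weight layout `(kernel_rows, kernel_cols, input_features, output_features)` and the random inputs are assumptions made for this illustration.

```python
import numpy as np


def conv_layer(x, weights, bias):
    """x: (H, W, C_in), weights: (kh, kw, C_in, C_out), bias: (C_out,)."""
    kh, kw, c_in, c_out = weights.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((out_h, out_w, c_out))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i:i + kh, j:j + kw, :]  # shape (kh, kw, C_in)
            # contract kernel rows, kernel columns and input features at once
            out[i, j, :] = np.tensordot(patch, weights, axes=3) + bias
    # ReLU activation: negative responses are set to zero
    return np.maximum(out, 0.0)


rng = np.random.default_rng(0)
x = rng.random((32, 32, 3))                 # one RGB-like picture
weights = rng.normal(size=(3, 3, 3, 32))    # 3x3 kernels, 3 inputs, 32 outputs
bias = np.zeros(32)
out = conv_layer(x, weights, bias)
print(out.shape, weights.size + bias.size)  # output shape and number of trainable parameters
```

For 3 input features, 32 output features and 3x3 kernels this gives 3·3·3·32 weights plus 32 biases, i.e. 896 trainable parameters.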

Below is another nice animated visualisation from the [CS231n course](http://cs231n.github.io/), here with a convolution of "stride 2", meaning the filters move over the image in steps of 2 pixels. The blue boxes are the inputs (padded with zeros), the red boxes are two filters and the two green boxes correspond to the output of each of the two filters. (Note that the 3 channels are summed over for the output.)

In [None]:
from IPython.display import IFrame
IFrame('http://cs231n.github.io/assets/conv-demo/index.html', width=800, height=700)
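
Stride and zero padding determine the size of the output image. The standard relation is `(n + 2*padding - kernel) // stride + 1` per direction; the helper name `conv_output_size` and the example numbers below are illustrative choices.

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    # number of positions at which the kernel fits into the (padded) input
    return (n + 2 * padding - kernel) // stride + 1


# 5 pixels wide, padded by 1 on each side, 3x3 kernel moved in steps of 2
strided = conv_output_size(5, kernel=3, stride=2, padding=1)
# "valid" convolution of a 32 pixel wide image with a 3x3 kernel
valid = conv_output_size(32, kernel=3)
print(strided, valid)  # 3 30
```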

## Pooling layers

In addition to convolutional layers, CNNs typically perform some kind of downsampling (also called pooling or subsampling) in between. There are several reasons for this:

- The region of the original image that the neural network can "see" increases. This helps to make use of correlations between more distant areas within an image.
- The amount of computation decreases (smaller images further down in the network), so the depth and/or width of the network can be increased.
- Especially for classification problems, the total NN output should be just a few numbers, e.g. indicating which category an image falls into. Successively downsampling the image within the network helps to keep the number of parameters in the last layers small.
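
The first point can be quantified with the receptive field, i.e. the width of the input region that influences a single output pixel. The following sketch uses the usual recursion (each layer adds `(kernel - 1)` times the current step size, and the step size is multiplied by the stride); the function name and the two example layer stacks are assumptions for illustration.

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride) tuples, ordered from input to output."""
    size, jump = 1, 1
    for kernel, stride in layers:
        size += (kernel - 1) * jump  # the kernel extends the visible region
        jump *= stride               # one output step now covers `jump` input pixels
    return size


# four 3x3 convolutions without any pooling
conv_only = receptive_field([(3, 1)] * 4)
# the same four convolutions with 2x2 pooling (stride 2) after each pair
with_pooling = receptive_field([(3, 1), (3, 1), (2, 2), (3, 1), (3, 1), (2, 2)])
print(conv_only, with_pooling)  # 9 16
```

With the pooling steps, the same number of convolutions covers a region of 16 instead of 9 pixels per direction.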

Pooling typically takes the maximum, average or sum over a fixed sliding window. An example of "Max" pooling with a 2x2 window:

![max_pooling](figures/Max_pooling.png)
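
A minimal NumPy sketch of 2x2 max pooling with stride 2, assuming a single 2D feature map; the function name `max_pool_2x2` and the example values are illustrative.

```python
import numpy as np


def max_pool_2x2(x):
    # crop to even dimensions so the image splits into whole 2x2 blocks
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    # axes: (block row, row within block, block column, column within block)
    blocks = x[:h, :w].reshape(h // 2, 2, w // 2, 2)
    # take the maximum within each block
    return blocks.max(axis=(1, 3))


feature_map = np.array([
    [1, 1, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
])
pooled = max_pool_2x2(feature_map)
print(pooled)
# [[6 8]
#  [3 4]]
```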


## Full CNN Architecture for image classification

A typical CNN architecture for image classification consists of several convolutional layers with pooling layers in between and a simple fully-connected network as a last step:

![typical_cnn](figures/Typical_cnn.png)

The fully connected network either takes all output pixels of the last convolutional/pooling layer as input ("flatten") or uses the global average of each output feature of the last convolutional/pooling layer.
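
The difference between the two options is easiest to see from the shapes. This sketch assumes a hypothetical last layer output of 5x5 pixels with 64 features, filled with random numbers.

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.random((5, 5, 64))  # (rows, columns, features), hypothetical sizes

# "flatten": every pixel of every feature becomes one input of the dense network
flattened = features.reshape(-1)

# global average pooling: one number per feature, averaged over all pixels
global_avg = features.mean(axis=(0, 1))

print(flattened.shape, global_avg.shape)  # (1600,) (64,)
```

A following dense layer with 512 neurons would need 1600·512 weights in the first case, but only 64·512 in the second.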

## Let's try it out - CIFAR10

We will use the [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html) dataset for trying out CNNs. The dataset consists of 60k 32x32 images, each labelled with one of 10 categories. Our goal is to predict the category from the image.

In [None]:
from keras.datasets import cifar10

In [None]:
# copy from /large_tmp/ if at CIP (otherwise keras will just download it from the web)
import os
from shutil import copyfile
path_cifar10_cip = "/large_tmp/LMU_DA_ML_19Adv/cifar-10-batches-py.tar.gz"
path_cifar10_user = os.path.expanduser("~/.keras/cifar-10-batches-py.tar.gz")
if not os.path.exists(path_cifar10_user) and os.path.exists(path_cifar10_cip):
    if not os.path.exists(os.path.expanduser("~/.keras")):
        os.mkdir(os.path.expanduser("~/.keras"))
    copyfile(path_cifar10_cip, path_cifar10_user)

`data` will contain 2 tuples of X, y for training (50k) and testing (10k) data:

In [None]:
data = cifar10.load_data()
x_train, y_train = data[0]
x_test, y_test = data[1]

Pictures are arranged as arrays with indices (row, column, color), i.e. (y, x, color):

In [None]:
x_train[0].shape

The target vector consists of label indices:

In [None]:
y_train[:10]

The labels are as follows (in that order):

In [None]:
labels = ["airplane", "automobile", "bird", "cat", "deer", "dog", "frog", "horse", "ship", "truck"]

Let's look at a few random pictures of cats:

In [None]:
def show_random_pictures(x):
    pictures = x
    rnd_idx = np.random.permutation(len(pictures))
    fig, axs = plt.subplots(nrows=3, ncols=10, figsize=(20,6))
    for i, ax in enumerate(axs.reshape(-1)):
        ax.imshow(pictures[rnd_idx[i]])
        ax.set_axis_off()

In [None]:
show_random_pictures(x_train[y_train.reshape(-1) == labels.index('cat')])

### Define the NN
Keras has all the components we need

In [None]:
import keras
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Activation, Dropout

Let's build a model similar to [keras/examples/cifar10_cnn.py](https://github.com/keras-team/keras/blob/master/examples/cifar10_cnn.py):

In [None]:
model = keras.models.Sequential([
    # let's start with 2 convolutional layers with kernel size 3, 32 output features each
    Conv2D(32, 3, activation="relu", input_shape=(32, 32, 3)),
    Conv2D(32, 3, activation="relu"),
    # Max pooling (default window size is 2x2)
    MaxPooling2D(),
    # Add a 25% Dropout (randomly drops 25% of inputs during training)
    Dropout(0.25),
    # another block of 2 CNN layers with 64 output features each, followed by MaxPooling and Dropout
    Conv2D(64, 3, activation="relu"),
    Conv2D(64, 3, activation="relu"),
    MaxPooling2D(),
    Dropout(0.25),
    # Flatten (reshape) all output pixels of all features into 1D array
    Flatten(),
    # add a fully connected final hidden layer with 512 neurons, followed by 50% dropout
    Dense(512, activation="relu"),
    Dropout(0.5),
    # 10 output neurons that are supposed to represent the 10 categories;
    # the softmax activation turns them into probabilities that sum to 1
    Dense(10, activation="softmax")
])

In [None]:
model.summary()
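
The image sizes and parameter counts in the summary can be reproduced by hand. The sketch below assumes the Keras defaults used above ("valid" padding, stride 1, one bias per output feature, 2x2 pooling); the helper names are illustrative, and its total should agree with the number reported by `model.summary()`.

```python
def conv_params(kernel, c_in, c_out):
    # one kernel per (input feature, output feature) pair plus one bias per output feature
    return kernel * kernel * c_in * c_out + c_out


def dense_params(n_in, n_out):
    # one weight per connection plus one bias per neuron
    return n_in * n_out + n_out


size, channels, total = 32, 3, 0
for block_features in (32, 64):
    for _ in range(2):
        total += conv_params(3, channels, block_features)
        size -= 2          # a "valid" 3x3 convolution removes 2 pixels per direction
        channels = block_features
    size //= 2             # 2x2 max pooling halves the image size

flat = size * size * channels  # number of inputs after Flatten
total += dense_params(flat, 512) + dense_params(512, 10)
print(size, flat, total)  # 5 1600 890410
```

Most of the parameters (819712 of them) sit in the first dense layer, which illustrates why downsampling before the fully connected part matters.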

For multi-class classification problems the figure of merit to minimize is the categorical cross entropy. We will use the *Adam* optimizer (a state-of-the-art (2019) adaptive learning rate optimizer that we used for the [NN for the Higgs challenge](HiggsChallenge-NN.ipynb)) and tell keras to monitor the *accuracy* (fraction of correctly classified examples) during the training:

In [None]:
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])
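
For intuition, here is what the softmax output, the categorical cross entropy and the accuracy compute, written as a small NumPy sketch. The function names, the clipping constant `eps` and the toy numbers are assumptions for illustration and not the Keras implementation.

```python
import numpy as np


def softmax(logits):
    # subtract the row maximum for numerical stability (does not change the result)
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def categorical_crossentropy(y_onehot, probs, eps=1e-7):
    # mean over examples of -log(probability assigned to the true class)
    return -np.mean(np.sum(y_onehot * np.log(np.clip(probs, eps, 1.0)), axis=1))


logits = np.array([[2.0, 1.0, 0.1], [0.0, 0.0, 0.0]])  # 2 examples, 3 classes
targets = np.array([[1, 0, 0], [0, 0, 1]])             # one-hot encoded true classes
probs = softmax(logits)
loss = categorical_crossentropy(targets, probs)
accuracy = np.mean(probs.argmax(axis=1) == targets.argmax(axis=1))
print(probs.sum(axis=1), loss, accuracy)
```

The second example has uniform probabilities and therefore contributes `log(3)` to the loss, the value expected from pure guessing among 3 classes.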

For preprocessing, we will simply divide by 255 (r, g, b values are between 0 and 255):

In [None]:
def preprocess(x):
    return x / 255.

Since the NN will output 10 values, we "one-hot-encode" our target vector:

In [None]:
y_train_onehot = keras.utils.to_categorical(y_train)

In [None]:
y_train[:10]

In [None]:
y_train_onehot[:10]
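
One-hot encoding simply replaces each label index by a vector that is 1 at that index and 0 elsewhere. A minimal NumPy equivalent, with an illustrative function name and made-up labels in the same column-vector shape as `y_train`:

```python
import numpy as np


def one_hot(label_indices, num_classes):
    # flatten the (n, 1) label array and pick the matching rows of the identity matrix
    flat = np.asarray(label_indices).reshape(-1)
    return np.eye(num_classes, dtype="float32")[flat]


demo_labels = np.array([[6], [9], [9], [4]])  # made-up label indices
demo_onehot = one_hot(demo_labels, 10)
print(demo_onehot.shape)           # (4, 10)
print(demo_onehot.argmax(axis=1))  # [6 9 9 4]
```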

Now we are ready to start the training!

In [None]:
history = model.fit(
    # input
    preprocess(x_train),
    # target
    y_train_onehot,
    # number of training examples in each batch
    batch_size=64,
    # shuffle training data after each epoch
    shuffle=True,
    # number of iterations over training dataset
    epochs=5,
    # fraction of training data to split off for validation after each epoch
    validation_split=0.1,
)

In [None]:
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])

In [None]:
np.min(history.history['val_loss'])

In [None]:
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])

In [None]:
np.max(history.history['val_acc'])

We got around 70% accuracy on the validation sample! This is not perfect yet, but already quite impressive, given the relatively simple model and fast training. From the plots above we can see that the model is maybe not fully converged yet, so a few percent might be gained by continuing the training (you can try just executing the notebook cell above again).

[Current state-of-the-art neural networks](http://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html) reach accuracies of > 95% on CIFAR-10, so there is still a lot of room for optimization.

But now, let's validate our score on the completely independent test sample:

In [None]:
scores = model.predict(preprocess(x_test))

These scores are now the predicted probabilities for each label:

In [None]:
scores

When we take the index of the highest probability, we get the "best-guess" predicted labels:

In [None]:
predicted_labels = np.argmax(scores, axis=1)
predicted_labels

In [None]:
acc_test = (predicted_labels == y_test.reshape(-1)).mean()
acc_test

For such multi-class problems it is useful to plot a confusion matrix, which tells us how often each label is confused with each of the other labels:

In [None]:
from sklearn.metrics import confusion_matrix

In [None]:
# rows of the confusion matrix are true labels, columns are predicted labels
plt.imshow(confusion_matrix(y_test.reshape(-1), predicted_labels), cmap=plt.cm.Blues)
plt.colorbar()
plt.xticks(range(10), labels, rotation=90)
plt.xlabel("Predicted label")
plt.yticks(range(10), labels)
plt.ylabel("True label")
plt.show()

Our test sample has 10k pictures with 1k for each category, so the perfect confusion matrix would contain the value 1000 all over the diagonal. Overall we see that animals seem to be more difficult to distinguish than vehicles, and vehicles tend to be confused with other vehicles and animals with other animals.
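
The entries of a confusion matrix are plain counts, so per-category accuracies follow directly from its diagonal. The sketch below builds such a matrix with NumPy, with rows as true labels and columns as predicted labels (the convention `sklearn.metrics.confusion_matrix` also uses); the function name and the toy labels are assumptions for illustration.

```python
import numpy as np


def confusion_counts(true_labels, predicted, num_classes):
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    # add 1 at position (true label, predicted label) for every example
    np.add.at(matrix, (true_labels, predicted), 1)
    return matrix


true_demo = np.array([0, 0, 1, 1, 2, 2, 2])
pred_demo = np.array([0, 1, 1, 1, 2, 0, 2])
cm = confusion_counts(true_demo, pred_demo, 3)

# fraction of each true category that was classified correctly
per_class = cm.diagonal() / cm.sum(axis=1)
# overall accuracy: correctly classified examples over all examples
overall = cm.trace() / cm.sum()
print(cm)
print(per_class, overall)
```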

Let's look at a random sample of pictures that our network classifies as "cats":

In [None]:
show_random_pictures(x_test[predicted_labels == labels.index('cat')])

Frogs seem to work rather well, but let's look in particular at images that are incorrectly classified as frogs:

In [None]:
show_random_pictures(x_test[(predicted_labels == labels.index('frog')) & (y_test.reshape(-1) != labels.index('frog'))])

For comparison, some actual frogs:

In [None]:
show_random_pictures(x_test[y_test.reshape(-1) == labels.index('frog')])

What's happening with birds that are confused with airplanes?

In [None]:
show_random_pictures(x_test[(predicted_labels == labels.index('airplane')) & (y_test.reshape(-1) == labels.index('bird'))])

Let's have a look at the actual predicted probabilities for a few examples:

In [None]:
def plot_probabilities(x, y, scores, index):
    fig, ax = plt.subplots(figsize=(4, 2), nrows=1, ncols=2)
    ax[1].imshow(x[index])
    ax[1].set_title(labels[y[index][0]])
    ax[0].barh(labels, scores[index])
    ax[0].set_xlabel("pred. probability")

In [None]:
for index in np.random.randint(0, len(x_test), 5):
    plot_probabilities(x_test, y_test, scores, index)
    

In [None]:
import keras.backend as K

In [None]:
model.get_layer("conv2d_2").output

In [None]:
model.input

In [None]:
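def plot_layer_output(img, layer_name):
    # backend function that maps the model input to the output of the requested layer
    f_layer = K.function([model.input], [model.get_layer(layer_name).output])
    layer_output = f_layer([img])
    # the last axis indexes the output features of the layer
    nfeat = layer_output[0].shape[-1]
    # one subplot per feature: 4x8 grid for 32 features, 8x8 grid otherwise (64 here)
    if nfeat == 32:
        fig, axs = plt.subplots(nrows=4, ncols=8, figsize=(20, 10))
    else:
        fig, axs = plt.subplots(nrows=8, ncols=8, figsize=(20, 20))
    for i, ax in enumerate(axs.ravel()):
        # img holds a single picture, so drop the batch axis and show feature i as a 2D image
        ax.imshow(layer_output[0][:,:,:,i].reshape(*layer_output[0].shape[1:3]))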

In [None]:
second_layer = K.function([model.input], [model.get_layer("conv2d_1").output])

In [None]:
cat_index = np.random.randint(1000)
cat_picture = x_test[y_test.ravel() == labels.index('cat')][cat_index:cat_index+1]

In [None]:
plt.imshow(cat_picture[0])

In [None]:
plot_layer_output(cat_picture, "conv2d_1")

In [None]:
plot_layer_output(cat_picture, "conv2d_2")

In [None]:
plot_layer_output(cat_picture, "conv2d_3")

In [None]:
plot_layer_output(cat_picture, "conv2d_4")