# Deep learning

The goal of this interactive demo is to show you how a deep learning model can be set up, in this case using [Google's TensorFlow](https://tensorflow.org/) package. More precisely, we will build a convolutional neural network that can differentiate between 10 different object classes. Keep in mind, however, that the code in this notebook was simplified for the demo and should not be used as a plug-and-play example for real machine learning projects.

## The dataset

For the purpose of this demo we will use the [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html) dataset. This dataset contains 60,000 color images (of size 32 x 32 pixels) from 10 distinct target classes:

1. airplane
2. automobile
3. bird
4. cat
5. deer
6. dog
7. frog
8. horse
9. ship
10. truck

So let's go ahead and download and prepare the dataset:

In [None]:
# Import relevant tensorflow package
from tensorflow import keras

# Load dataset, already pre-split into train and test set
(X_tr, y_tr), (X_te, y_te) = keras.datasets.cifar10.load_data()

# Scale images to a range of roughly -0.5 to +0.5
X_tr = (X_tr.astype("float32") - 128) / 255.0
X_te = (X_te.astype("float32") - 128) / 255.0

# Reduce footprint for demo to manage memory restrictions
X_tr, y_tr = X_tr[::4], y_tr[::4]
X_te, y_te = X_te[::8], y_te[::8]

# Report shape of dataset
print("X_tr shape:", X_tr.shape)
print("X_te shape:", X_te.shape)

print("\nData is ready.")
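
As a quick sanity check on the preprocessing, the sketch below applies the same scaling and strided subsampling to a synthetic stand-in array instead of the real download. The array `fake_images` and its size are made up for illustration; only the arithmetic mirrors the cell above.

```python
import numpy as np

# Synthetic stand-in for a stack of uint8 images (hypothetical data, not CIFAR-10)
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(40, 32, 32, 3), dtype=np.uint8)
fake_images[0, 0, 0, 0] = 0      # make sure both extreme pixel values occur
fake_images[0, 0, 0, 1] = 255

# Same scaling as above: subtract 128, then divide by 255
scaled = (fake_images.astype("float32") - 128) / 255.0
print("min:", scaled.min(), "max:", scaled.max())

# Keeping every 4th sample shrinks the first axis by a factor of 4
subset = scaled[::4]
print("shape before:", scaled.shape, "after:", subset.shape)
```

The extremes come out as -128/255 and 127/255, so the range is close to, but not exactly, -0.5 to +0.5. Applied to the real dataset, with its 50,000 training and 10,000 test images, the same slicing leaves 12,500 and 1,250 images.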

Let's have a look at the first few hundred images of this dataset.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Create image collage
img_collage = np.concatenate(
    [np.concatenate([X_tr[idx + jdx * 25] for idx in range(25)], axis=1)
     for jdx in range(10)])

# Rescale image for visualization purpose
img_collage = (255 * (img_collage + 0.5)).astype("uint8")

# Plot image collage
plt.figure(figsize=(15, 10))
plt.imshow(img_collage)
plt.axis("off");
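
The nested `np.concatenate` calls are easier to follow on a toy input. This sketch uses blank placeholder tiles (not real images) to show how 10 rows of 25 tiles end up as one large array.

```python
import numpy as np

# 250 blank 32x32 RGB tiles standing in for images (placeholder data)
tiles = np.zeros((250, 32, 32, 3), dtype="float32")

# Inner call: join 25 tiles side by side (axis=1 is the width axis of one tile)
# Outer call: stack the 10 resulting rows on top of each other (default axis=0)
rows = [np.concatenate([tiles[col + row * 25] for col in range(25)], axis=1)
        for row in range(10)]
collage = np.concatenate(rows)

print("one row:", rows[0].shape)
print("collage:", collage.shape)
```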

## The neural network model

Now that the data is ready, let's go ahead and create the convolutional neural network, or `ConvNet` for short. The architecture of a `ConvNet` consists of two parts:

1. The **convolutional layers** that help to extract meaningful features from the dataset.
2. The **fully connected dense layers** that will combine the extracted features in non-linear ways to perform the model prediction.

The following model is one way to implement such an architecture:

In [None]:
from tensorflow.keras import layers

# Create input layer
input_layer = layers.Input(shape=(32, 32, 3))

# Create first convolutional layer (with some extra flavors)
x = layers.Conv2D(64, kernel_size=5, strides=2, padding='same')(input_layer)
x = layers.BatchNormalization()(x)
x = layers.Activation("relu")(x)
x = layers.MaxPooling2D(pool_size=(2, 2))(x)

# Create second convolutional layer (with some extra flavors)
x = layers.Conv2D(64, kernel_size=3, strides=2, padding='same')(x)
x = layers.BatchNormalization()(x)
x = layers.Activation("relu")(x)

# Flatten everything to allow the transition to the fully connected dense layers
x = layers.Flatten()(x)

# Create fully connected dense layer (with some extra flavors)
x = layers.Dropout(0.5)(x)
x = layers.Dense(64)(x)
x = layers.BatchNormalization()(x)
x = layers.Activation("relu")(x)

# Create output layer
x = layers.Dense(10)(x)
output_layer = layers.Activation("softmax")(x)

# Create model based on input and output layer
model = keras.Model(inputs=input_layer, outputs=output_layer, name="ConvNet")

print("\nModel is ready.")
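
To see how the strides and the pooling layer shrink the image on its way through the network, the spatial sizes can be traced by hand. This is a minimal sketch of the standard output-size formulas for `padding='same'` convolutions and for pooling with the Keras default `padding='valid'`; the helper names are made up for this example.

```python
import math

def conv_same(size, stride):
    """Spatial output size of a convolution with padding='same'."""
    return math.ceil(size / stride)

def pool_valid(size, pool, stride=None):
    """Spatial output size of a pooling layer with padding='valid'."""
    stride = stride or pool  # Keras uses the pool size as stride by default
    return (size - pool) // stride + 1

size = 32
size = conv_same(size, stride=2)   # first Conv2D
after_conv_1 = size
size = pool_valid(size, pool=2)    # MaxPooling2D
after_pool = size
size = conv_same(size, stride=2)   # second Conv2D
flat = size * size * 64            # Flatten: height * width * channels

print("after first convolution:", after_conv_1)
print("after max pooling:", after_pool)
print("final feature map:", (size, size, 64))
print("flattened length:", flat)
```

Each 32 x 32 image is therefore reduced to a 4 x 4 grid with 64 channels before it reaches the dense layers.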

Once the model is created, we can use the `summary()` method to get an overview of the network's architecture and the number of parameters in the model.

In [None]:
# Plot model summary
model.summary()
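
The parameter counts that `summary()` reports can be cross-checked by hand. The sketch below assumes the usual counting rules (one bias per convolution filter or dense unit, and four values per channel in a batch normalization layer, two of which are not trainable). The dictionary keys are labels chosen here for readability, not the layer names Keras assigns.

```python
def conv2d_params(kernel, in_ch, out_ch):
    # One kernel x kernel x in_ch weight block plus one bias per output channel
    return kernel * kernel * in_ch * out_ch + out_ch

def dense_params(in_units, out_units):
    # Full weight matrix plus one bias per output unit
    return in_units * out_units + out_units

def batchnorm_params(channels):
    # gamma and beta are trainable; moving mean and moving variance are not
    return 4 * channels

counts = {
    "conv_1": conv2d_params(5, 3, 64),
    "bn_1": batchnorm_params(64),
    "conv_2": conv2d_params(3, 64, 64),
    "bn_2": batchnorm_params(64),
    "dense_1": dense_params(4 * 4 * 64, 64),  # input is the flattened 4x4x64 map
    "bn_3": batchnorm_params(64),
    "dense_out": dense_params(64, 10),
}
total = sum(counts.values())
non_trainable = 3 * 2 * 64  # moving statistics of the three batch norm layers

for name, n in counts.items():
    print(f"{name:10s}{n:>8,d}")
print(f"total: {total:,d} (trainable: {total - non_trainable:,d})")
```

Most of the parameters sit in the first dense layer, which is typical for small networks that flatten a feature map into a fully connected layer.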

## Model training

Now that the data and the model are ready, we can go ahead and train the model.

In [None]:
# Specify some additional model training parameters
batch_size = 32
epochs = 10

# Compile model with appropriate metrics and optimizers
from tensorflow.keras.optimizers import Adam
optimizer = Adam(learning_rate=0.001)
model.compile(
    loss="sparse_categorical_crossentropy", optimizer=optimizer, metrics=["accuracy"])

# Train model
history = model.fit(
    X_tr, y_tr, batch_size=batch_size, epochs=epochs, validation_split=0.2)

print("\nModel finished training.")
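
To make the loss values in the training log easier to read, here is a minimal pure-Python sketch of what softmax followed by sparse categorical cross-entropy computes for a single sample. It is not how Keras implements the loss (Keras works on whole batches and averages the result); the logits below are made up.

```python
import math

def softmax(logits):
    # Subtract the maximum first for numerical stability
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_categorical_crossentropy(probs, true_class):
    # "Sparse" means the label is an integer index, not a one-hot vector
    return -math.log(probs[true_class])

# A model that is equally unsure about all 10 classes
uniform = softmax([0.0] * 10)
chance_loss = sparse_categorical_crossentropy(uniform, true_class=3)

# A model that strongly favours the correct class
confident = softmax([0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
good_loss = sparse_categorical_crossentropy(confident, true_class=3)

print(f"loss at chance level: {chance_loss:.4f}")
print(f"loss when confident and right: {good_loss:.4f}")
```

With 10 classes, a uniform guess gives a loss of ln(10), about 2.30, so a loss near that value means the model is still guessing.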

## Model investigation

Once the model has finished training, we can investigate a few interesting things. First, how did the model performance improve over training time?

In [None]:
from utils import plot_convnet_history
plot_convnet_history(history.history)

OK, it seems that the scores on the training and the validation set have improved over time. That's great! And how good is our model at the end?

In [None]:
# Compute performance accuracy on training and test set
_, acc_tr = model.evaluate(X_tr, y_tr, verbose=1)
_, acc_te = model.evaluate(X_te, y_te, verbose=1)

# Report scores
print(f"Train accuracy: {acc_tr*100:.2f}%")
print(f"Test accuracy:  {acc_te*100:.2f}%")
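
Accuracy is simply the fraction of samples whose most probable class matches the label. The sketch below recomputes it with NumPy on made-up probabilities and also shows a pitfall that arises because the CIFAR-10 labels are stored as a column vector of shape `(N, 1)`.

```python
import numpy as np

# Made-up class probabilities for 4 samples and 3 classes (illustration only)
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.6, 0.3, 0.1]])

# Labels stored as a column vector, the same layout as y_tr and y_te
labels = np.array([[0], [1], [2], [1]])

predicted = probs.argmax(axis=1)  # shape (4,)

# Pitfall: comparing (4,) with (4, 1) broadcasts to a (4, 4) matrix
wrong_shape = (predicted == labels).shape

# Flatten the labels first so the comparison is element by element
accuracy = (predicted == labels.ravel()).mean()

print("broadcast shape:", wrong_shape)
print("accuracy:", accuracy)
```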

A single performance metric is often difficult to interpret, so let's go a step further and look at the confusion matrix. Each row of this matrix corresponds to a true class and each column to a predicted class, so the off-diagonal cells show which classes the model confuses with one another.

In [None]:
# Compute model predictions
y_pred = model.predict(X_te, verbose=0)

# Transform class probabilities to prediction labels
predictions = np.argmax(y_pred, 1)

# Create confusion matrix
import pandas as pd
class_labels = ["airplane", "automobile", "bird", "cat", "deer",
                "dog", "frog", "horse", "ship", "truck"]

from tensorflow.math import confusion_matrix
cm = confusion_matrix(y_te, predictions)
cm = pd.DataFrame(cm.numpy(), columns=class_labels, index=class_labels)

# Visualize confusion matrix
import seaborn as sns
plt.figure(figsize=(6, 6))
sns.heatmap(cm, square=True, annot=True, fmt="d", cbar=False, cmap="Spectral_r")
plt.title("Confusion matrix")
plt.show()

As we can see, our model has trouble distinguishing cats from dogs, but no issues with airplanes and automobiles or ships and trucks.
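
For readers who want to see what goes into such a matrix, the following sketch builds one with plain NumPy and derives the per-class recall from it. The function `confusion_matrix_np` and the labels are made up for illustration; it follows the same convention as above, with true classes in rows and predicted classes in columns.

```python
import numpy as np

def confusion_matrix_np(y_true, y_pred, num_classes):
    """Count how often true class i (row) was predicted as class j (column)."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    # np.add.at accumulates correctly even when index pairs repeat
    np.add.at(matrix, (np.ravel(y_true), np.ravel(y_pred)), 1)
    return matrix

# Made-up labels for 3 classes (illustration only)
true_labels = np.array([0, 0, 1, 1, 1, 2, 2, 2, 2])
pred_labels = np.array([0, 1, 1, 1, 2, 2, 2, 2, 1])

toy_cm = confusion_matrix_np(true_labels, pred_labels, num_classes=3)

# Per-class recall: correct predictions (diagonal) divided by the row totals
recall = toy_cm.diagonal() / toy_cm.sum(axis=1)

print(toy_cm)
print("recall per class:", recall.round(2))
```

A low recall for one class, together with a large off-diagonal count in its row, points to the class it is most often mistaken for.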

## Model visualization

The following investigation is not always done and, when it is, it is usually done in a much more efficient way. But to better highlight the feature extraction capability of deep learning models, let's have a closer look at the convolutional filters that our model has learned.

In [None]:
# Plot first convolutional layer of our model
from utils import plot_convnet_weights
plot_convnet_weights(model)

Each image that we pass through our model is first filtered by each of these 64 convolutional filters. Each filter is therefore one way in which our model "sees the world". So let's take an image from our dataset and visualize all 64 different ways these filters interpret the input.

In [None]:
# Select an example image from the test set (copy so X_te is not modified in place)
img = X_te[6].copy()

# Rescale image to the range [0, 1] and plot it
img -= img.min()
img /= img.max()
plt.imshow(img)
plt.axis("off")
plt.show()

In [None]:
from utils import plot_convnet_activation_map
plot_convnet_activation_map(img, model)
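
To give a feeling for what a single activation map is, the sketch below slides one hand-made filter over a synthetic image with NumPy. It is a simplified illustration, not the code behind `plot_convnet_activation_map`: the image and the kernel are invented, batch normalization is skipped, and the symmetric zero padding used here differs slightly from how Keras aligns `padding='same'` when strides are used.

```python
import numpy as np

def conv2d_single(image, kernel, stride=2):
    """Slide one kernel over an (H, W, C) image with zero padding; return a 2-D map."""
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)))
    h, w = image.shape[:2]
    out = np.zeros((-(-h // stride), -(-w // stride)), dtype="float32")
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = padded[i * stride:i * stride + k, j * stride:j * stride + k, :]
            out[i, j] = np.sum(patch * kernel)
    return np.maximum(out, 0)  # ReLU keeps only the positive responses

# Synthetic 32x32 RGB image: dark left half, bright right half (not a CIFAR-10 image)
image = np.zeros((32, 32, 3), dtype="float32")
image[:, 16:, :] = 1.0

# Hand-made 5x5x3 vertical-edge kernel (hypothetical, not a learned filter)
kernel = np.zeros((5, 5, 3), dtype="float32")
kernel[:, :2, :] = -1.0
kernel[:, 3:, :] = 1.0

activation = conv2d_single(image, kernel)
strongest_column = int(activation.max(axis=0).argmax())

print("activation map shape:", activation.shape)
print("strongest response in column:", strongest_column)
```

The map is only active around the dark-to-bright edge in the middle of the image and zero elsewhere. A learned filter works the same way, except that its weights come from training rather than being written by hand.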