# Convolutional Neural Networks

In this tutorial, we continue our journey in deep learning by looking at convolutional neural networks (CNNs), also called ConvNets.

Run the notebook in Google Colab:
https://colab.research.google.com/github/heprom/cvml/blob/main/tutorials/CNN.ipynb

You should use GPU acceleration to train your networks. In Google Colab, before you start working, go to Runtime -> Change runtime type and select GPU as the hardware accelerator.

In [None]:
import numpy as np
import matplotlib
%matplotlib inline
from matplotlib import pyplot as plt, cm

## Understanding convolutions

In this first section, we experiment with convolutions on images using simple `numpy` operations. We first work with a single-channel image from MNIST and then with a 3-channel RGB image of a cat.

In [None]:
from sklearn.datasets import load_digits
mnist = load_digits()

Create a variable `image` that holds the first $8\times8$ image of the data set.

In [None]:
image = ...
print(image)
plt.imshow(..., cmap=cm.gray_r)
plt.axis('off')
plt.show()

Create the following $3\times 3$ kernel as a NumPy array: $\left[\begin{array}{ccc}-1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1\end{array}\right]$

In [None]:
kernel = np.array(...)
print(kernel)
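The kernel above is the horizontal Sobel kernel. It approximates the derivative along the x axis, so it responds strongly to vertical edges and gives zero on flat regions. As a quick illustration outside the exercise, the sketch below applies it to two made-up $3\times 3$ patches using the multiply-and-sum operation that the convolution performs at each pixel.

```python
import numpy as np

# horizontal Sobel kernel (same values as the exercise kernel)
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

# made-up 3x3 patches: a dark-to-bright vertical edge and a flat region
edge_patch = np.array([[0, 0, 1],
                       [0, 0, 1],
                       [0, 0, 1]], dtype=float)
flat_patch = np.full((3, 3), 0.5)

# multiply element-wise, then sum: this gives one output pixel
edge_response = np.sum(edge_patch * sobel_x)
flat_response = np.sum(flat_patch * sobel_x)
print(edge_response, flat_response)  # 4.0 0.0
```

The edge patch gives $1 + 2 + 1 = 4$. The flat patch gives 0 because the kernel coefficients sum to zero.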

Pad the image with zeros so that the output of the convolution has the same size as the input.

In [None]:
kernel_size = ...
pad = ...
im = np.pad(image, ((pad, pad), (pad, pad)), mode='constant')
print('padded image size is now {}'.format(im.shape))
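For an odd kernel size $k$, padding by $\lfloor k/2 \rfloor$ pixels on each side gives every pixel a full $k\times k$ neighborhood, so the output keeps the input size. The sketch below uses a made-up $4\times 4$ array instead of the MNIST digit to show what `np.pad` with `mode='constant'` produces.

```python
import numpy as np

k = 3                      # kernel size (odd)
p = k // 2                 # pixels of padding on each side
small = np.arange(1, 17, dtype=float).reshape(4, 4)  # made-up 4x4 image

padded = np.pad(small, ((p, p), (p, p)), mode='constant')  # pads with zeros by default

print(padded.shape)        # (6, 6): each dimension grows by 2 * p
print(padded[0])           # the first row contains only zeros
# the original values are untouched in the center
print(np.array_equal(padded[p:-p, p:-p], small))  # True
```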

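Convolve the kernel with the image: write an algorithm using `for` loops that stores the result in a new variable `conv`. Strictly speaking, sliding the kernel over the image without flipping it computes a cross-correlation. This is also the operation that deep learning libraries call convolution.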

In [None]:
conv = np.empty_like(image)

for i in range(pad, im.shape[0] - pad):
    for j in range(pad, im.shape[1] - pad):
        subset = im[i-pad:i+pad+1, j-pad:j+pad+1]
        conv[i-pad, j-pad] = ...  # element-wise multiplication with the kernel, then sum

print('output size of the convolution is {}'.format(conv.shape))
plt.imshow(conv, cmap=cm.gray)
plt.axis('off')
plt.show()
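The double loop above can also be written without Python loops. The sketch below assumes NumPy 1.20 or later, which provides `np.lib.stride_tricks.sliding_window_view`. It extracts every $k\times k$ window of the padded image at once and applies the multiply-and-sum with `np.einsum`.

The sketch also checks, on the Sobel kernel, that a true convolution (which flips the kernel on both axes first) only changes the sign of the result. The helper `correlate_same` and the random test image are hypothetical names used only for this illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def correlate_same(image, kernel):
    """Zero-padded 2D cross-correlation; the output has the same shape as `image`."""
    pad = kernel.shape[0] // 2
    padded = np.pad(image, ((pad, pad), (pad, pad)), mode='constant')
    # windows[i, j] is the k x k neighborhood centered on output pixel (i, j)
    windows = sliding_window_view(padded, kernel.shape)
    # multiply every window by the kernel and sum over the two window axes
    return np.einsum('ijkl,kl->ij', windows, kernel)


sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
rng = np.random.default_rng(0)
img = rng.random((8, 8))          # made-up single-channel image

corr = correlate_same(img, sobel_x)
# an interior pixel equals the multiply-and-sum over its 3x3 neighborhood
print(np.isclose(corr[3, 4], np.sum(img[2:5, 3:6] * sobel_x)))  # True
# a true convolution flips the kernel on both axes before sliding it;
# the flipped Sobel kernel equals -sobel_x, so the result is simply negated
conv = correlate_same(img, np.flip(sobel_x))
print(np.allclose(conv, -corr))   # True
```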

Now that this works, write a function called `convolve` that takes an image and a kernel as input and returns the result of the convolution. Assume the image is a single-channel 2D array of shape (n, m) with float values in the [0, 1] range.

In [None]:
def convolve(image, kernel):
    kernel_size = ...
    pad = ...
    im = np.pad(image, ((pad, pad), (pad, pad)), mode='constant')
    conv = np.empty_like(image)

    for i in range(pad, im.shape[0] - pad):
        for j in range(pad, im.shape[1] - pad):
            # get the (i, j) subset of size (2 x pad + 1)
            subset = im[i - pad:i + pad + 1, j - pad:j + pad + 1]
            # perform the convolution
            conv[i - pad, j - pad] = ...
    return conv

In [None]:
plt.imshow(convolve(image, kernel), cmap=cm.gray)
plt.axis('off')
plt.show()

Now let's work with a 3-channel RGB image. Load it, convert it to a float representation and then to grayscale.

In [None]:
from skimage import data
cat = data.chelsea().astype(float)
cat = ...  # convert to gray scale
cat /= cat.max()  # normalize: float images are expected in the [0, 1] range
print(cat.shape)
print(cat.dtype)
print(cat.max())
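One option for the grayscale step is `skimage.color.rgb2gray`. Its documentation gives the luminance weights $Y = 0.2125R + 0.7154G + 0.0721B$. The NumPy sketch below applies that weighted sum to a made-up $2\times 2$ RGB array so the arithmetic is visible. It is an illustration of the formula, not a replacement for the library function.

```python
import numpy as np

# luminance weights as documented for skimage.color.rgb2gray
weights = np.array([0.2125, 0.7154, 0.0721])

# made-up 2x2 RGB image with float values in [0, 255]: white, black, red, green
rgb = np.array([[[255, 255, 255], [0, 0, 0]],
                [[255, 0, 0], [0, 255, 0]]], dtype=float)

gray = rgb @ weights        # weighted sum over the last (channel) axis
gray /= gray.max()          # rescale to [0, 1] as in the cell above

print(gray.shape)           # (2, 2): the channel axis is gone
print(gray.round(4))
```

Green contributes far more to perceived brightness than blue, which is why the weights are not simply $1/3$ each.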

In [None]:
plt.imshow(cat, cmap=cm.gray)
plt.axis('off')
plt.show()

Create all the following kernels and try them out:

 - Blur kernel: $\frac{1}{9}\left[\begin{array}{ccc}1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1\end{array}\right]$
 - Laplacian kernel: $\left[\begin{array}{ccc}0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0\end{array}\right]$
 - Emboss kernel: $\left[\begin{array}{ccc}-2 & -1 & 0 \\ -1 & 1 & 1 \\ 0 & 1 & 2\end{array}\right]$
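One way to read these kernels is through the sum of their coefficients:

- A kernel summing to 1 keeps the brightness of flat regions.
- A kernel summing to 0 (such as the Laplacian or the Sobel kernel) outputs 0 on flat regions and only responds to intensity changes.

This is why a $k\times k$ box blur is usually divided by $k^2$. The sketch below checks these sums. The helper `box_blur` is a hypothetical name, not something defined elsewhere in the notebook.

```python
import numpy as np


def box_blur(k):
    """k x k averaging kernel whose coefficients sum to 1."""
    return np.ones((k, k)) / k**2


laplacian = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)
emboss = np.array([[-2, -1, 0], [-1, 1, 1], [0, 1, 2]], dtype=float)

for name, kern in [('blur3', box_blur(3)), ('blur5', box_blur(5)),
                   ('blur7', box_blur(7)), ('laplacian', laplacian),
                   ('emboss', emboss)]:
    # coefficient sum: 1 preserves flat-region brightness, 0 removes it
    print(name, kern.shape, round(float(kern.sum()), 6))
```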

In [None]:
# blur filters
blur3 = ...
blur5 = ...
blur7 = ...

# sharpen
sharpen = ...

# Laplacian kernel
laplacian = ...

# construct an emboss kernel
emboss = ...

kernels = [blur3, sharpen, laplacian, emboss]
kernel_labels = ['blur3', 'sharpen', 'laplacian', 'emboss']

In [None]:
fig = plt.figure(figsize=(10, 7))
for i in range(len(kernels)):
    ax = plt.subplot(2, 2, i + 1)
    convolution = ...
    plt.imshow(convolution, cmap=cm.gray, vmin=0.6, vmax=0.9)
    plt.title(kernel_labels[i])
    plt.axis('off')
plt.show()

## Our First ConvNet: ShallowNet architecture

This model only contains a few layers, so it is perfect for getting started with CNNs. The architecture can be summarized as:

```INPUT => CONV => RELU => FC```
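To see what this architecture implies in terms of sizes, the sketch below computes output shapes and trainable parameter counts by hand. It assumes $32\times 32\times 3$ input images and 3 classes; check the shape printed from `data['X']` for the actual image size.

The parameter counts follow two rules:

- A `Conv2D` layer with $F$ filters of size $k\times k$ on $C$ input channels has $k\cdot k\cdot C\cdot F + F$ parameters (weights plus one bias per filter). With `padding='same'` and stride 1, it keeps the spatial size.
- ReLU has no parameters.

The function name `shallownet_summary` is hypothetical.

```python
def shallownet_summary(height=32, width=32, channels=3, filters=32, k=3, n_classes=3):
    """Hand-computed shapes and parameter counts for INPUT => CONV => RELU => FC."""
    conv_params = k * k * channels * filters + filters   # weights + biases
    conv_shape = (height, width, filters)                 # 'same' padding, stride 1
    flat = height * width * filters                       # length after Flatten
    dense_params = flat * n_classes + n_classes           # weights + biases
    return conv_shape, conv_params, flat, dense_params


conv_shape, conv_params, flat, dense_params = shallownet_summary()
print(conv_shape, conv_params)   # (32, 32, 32) 896
print(flat, dense_params)        # 32768 98307
```

Under this assumption, almost all the parameters sit in the final dense layer. For the same input shape, `model.summary()` should report matching counts.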

### As usual, start by loading the data set

Here we work with JPG pictures of animals from 3 classes: cats, dogs and pandas. Note that the images have been preprocessed (cropped and resized) to a fixed size.

In [None]:
labels = ['cats', 'dogs', 'panda']
data = np.load('animals.npz')
print(data['X'].shape)
print(data['y'].shape)

Plot a random image to visualize the data.

In [None]:
plt.imshow(...)
plt.axis('off')
plt.show()

Partition the data into training and testing splits, using 75% of the data for training and the remaining 25% for testing.

In [None]:
from sklearn.model_selection import train_test_split

(X_train, X_test, y_train, y_test) = ...

Convert the labels from integers to one-hot vectors.

In [None]:
from sklearn.preprocessing import LabelBinarizer

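# fit the binarizer on the training labels only, then reuse it for the test
# labels so that both sets share the same class-to-column mapping
lb = LabelBinarizer()
y_train = lb.fit_transform(y_train)
y_test = lb.transform(y_test)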
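For integer labels $0, \dots, K-1$ with $K > 2$ classes, the one-hot vectors produced by `LabelBinarizer` are rows of the $K\times K$ identity matrix. Taking `argmax` recovers the integer label, which is what the evaluation cells do with `predictions.argmax(axis=1)`. Here is a NumPy sketch with made-up labels:

```python
import numpy as np

y = np.array([0, 2, 1, 2, 0])        # made-up integer labels, 3 classes
n_classes = 3

one_hot = np.eye(n_classes, dtype=int)[y]   # row y[i] of the identity matrix
print(one_hot)
# [[1 0 0]
#  [0 0 1]
#  [0 1 0]
#  [0 0 1]
#  [1 0 0]]

recovered = one_hot.argmax(axis=1)   # back to integer labels
print(np.array_equal(recovered, y))  # True
```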

In [None]:
# some sanity checks
print(y_train.dtype)
print(X_test[0].shape)
print(y_train[0])
print(y_test[1])

### Build the model with Keras

Start by importing the Keras classes we need.

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import Activation
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dense
from tensorflow.keras import backend as K

Build the model.

In [None]:
model = Sequential()

# define the first (and only) CONV => RELU layer
model.add(Conv2D(32, (3, 3), padding='same', input_shape=...))
model.add(Activation('relu'))

# softmax classifier after a FC layer
model.add(Flatten())
model.add(Dense(...))
model.add(Activation('softmax'))

In [None]:
from tensorflow.keras.optimizers import SGD
print('compiling model')
opt = SGD(learning_rate=0.005)
model.compile(
    loss='categorical_crossentropy', 
    optimizer=opt, 
    metrics=['accuracy'])
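The final `Activation('softmax')` turns the outputs of the dense layer into class probabilities. `categorical_crossentropy` then compares them with the one-hot label: $L = -\sum_c y_c \log p_c$, i.e. minus the log-probability assigned to the true class. Keras averages this loss over the mini-batch.

The NumPy sketch below shows both steps on made-up scores. The helper names are hypothetical.

```python
import numpy as np


def softmax(scores):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)


def categorical_crossentropy(y_true, probs):
    """Mean over samples of -sum_c y_c * log(p_c)."""
    return np.mean(-np.sum(y_true * np.log(probs), axis=1))


scores = np.array([[2.0, 0.5, -1.0],    # made-up scores: confident and correct
                   [0.0, 0.0, 0.0]])    # no preference at all
y_true = np.array([[1, 0, 0],
                   [0, 1, 0]])

probs = softmax(scores)
print(probs.sum(axis=1))                                  # each row sums to 1
print(categorical_crossentropy(y_true[:1], probs[:1]))    # small loss: correct and confident
print(categorical_crossentropy(y_true[1:], probs[1:]))    # log(3), about 1.0986, for a uniform guess
```

A uniform guess over 3 classes costs $\log 3$. This gives a useful reference value when reading the first epochs of the training log.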

Train the network using 100 epochs and a mini batch size of 32.

In [None]:
print('training network')
H = model.fit(
    X_train, 
    y_train, 
    validation_data=(X_test, y_test), 
    batch_size=..., 
    epochs=..., 
    verbose=1)

Save the model to disk so that it can be reused later; this is called **serialization**. With an `.hdf5` file name, Keras saves both the model architecture and the trained weights to a single HDF5 file.

In [None]:
model.save('my_shallow_cnn.hdf5')

Now evaluate the network on the test set using the `predict` method.

In [None]:
predictions = ...

In [None]:
index = 4
plt.imshow(X_test[index])
plt.title('predicted as %s' % labels[predictions[index].argmax()])
plt.axis('off')
plt.show()

In [None]:
from sklearn.metrics import classification_report

print(classification_report(y_test.argmax(axis=1), predictions.argmax(axis=1), target_names=labels))
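`classification_report` lists two scores for each class:

- **Precision**: the fraction of images predicted as that class that really belong to it.
- **Recall**: the fraction of images of that class that the model found.

The sketch below computes both by hand on made-up integer labels. These are the same kind of arrays as those produced by `y_test.argmax(axis=1)` and `predictions.argmax(axis=1)`.

```python
import numpy as np

y_true = np.array([0, 0, 1, 1, 2, 2])   # made-up ground-truth labels
y_pred = np.array([0, 1, 1, 1, 2, 0])   # made-up predicted labels

precision, recall = [], []
for c in range(3):
    tp = np.sum((y_pred == c) & (y_true == c))   # true positives for class c
    precision.append(tp / np.sum(y_pred == c))   # share of "predicted c" that is right
    recall.append(tp / np.sum(y_true == c))      # share of true class c that was found
accuracy = np.mean(y_pred == y_true)

print(np.round(precision, 3))
print(np.round(recall, 3))
print(accuracy)
```

In this toy example, class 1 has precision $2/3$ and recall 1: every true class-1 image is found, but one class-0 image is also labeled 1.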

In [None]:
# plot the training loss and accuracy
plt.figure()
plt.plot(H.history["loss"], label="train_loss")
plt.plot(H.history["val_loss"], label="val_loss")
plt.plot(H.history["accuracy"], label="train_acc")
plt.plot(H.history["val_accuracy"], label="val_acc")
plt.title('Training Loss and Accuracy')
plt.xlabel('Epoch #')
plt.ylabel('Loss/Accuracy')
plt.legend()
plt.show()

## A Deeper ConvNet for CIFAR-10

Finally, we try a deeper CNN on the more challenging CIFAR-10 data set. We will see that it can reach more than 80% accuracy, which is much better than our previous attempts.

As a side note, with much deeper networks (outside the scope of this tutorial, since they need much more GPU time to train), it is *relatively easy* to achieve more than 90% (and even 95%) accuracy on this data set.

In [None]:
from tensorflow.keras.datasets import cifar10
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
X_train = X_train.astype('float') / 255.0
X_test = X_test.astype('float') / 255.0
labels = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'boat', 'truck']

In [None]:
print(X_train[0].shape)

Convert the labels from integers to one-hot vectors.

In [None]:
lb = LabelBinarizer()
y_train = lb.fit_transform(y_train)
y_test = lb.transform(y_test)

Import all the bells and whistles we need from Keras.

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.layers import Activation
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Dense
from tensorflow.keras import backend as K

Initialize the model: the first set of CONV layers has 32 filters and the second set has 64. All kernels are $3\times 3$.

In [None]:
model = Sequential()

# first CONV (32 filters) => RELU => CONV => RELU => POOL layer set
model.add(Conv2D(..., ..., padding='same', input_shape=...))
model.add(...)  # activation
model.add(...)  # BN
model.add(Conv2D(..., ..., padding='same'))
model.add(...)  # activation
model.add(BatchNormalization(axis=-1))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

# second CONV (64 filters) => RELU => CONV => RELU => POOL layer set
model.add(Conv2D(..., ..., padding='same'))
model.add(...)  # activation
model.add(...)  # BN
model.add(Conv2D(..., ..., padding='same'))
model.add(...)  # activation
model.add(BatchNormalization(axis=-1))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

# first (and only) set of FC (512 neurons) => RELU layers
model.add(Flatten())
model.add(...)
model.add(...)  # activation
model.add(...)  # BN
model.add(...)  # 50% dropout

# softmax classifier for the 10 classes
model.add(...)
model.add(Activation('softmax'))
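With `padding='same'`, the convolutions keep the spatial size. Each `MaxPooling2D(pool_size=(2, 2))` halves it, because its stride defaults to the pool size.

For $32\times 32\times 3$ CIFAR-10 images, the sketch below traces the spatial size through the two pooling stages and computes the length of the vector entering the 512-unit dense layer. It also counts the weights and biases of each layer, leaving out batch-normalization parameters. The function name `minivgg_trace` is hypothetical.

```python
def minivgg_trace(size=32, in_channels=3, blocks=(32, 64), k=3, fc_units=512, n_classes=10):
    """Hand-computed sizes for the (CONV => CONV => POOL) x 2 => FC => FC network above."""
    conv_params = []
    channels = in_channels
    for filters in blocks:
        # two 3x3 'same' convolutions per block (weights + biases)
        conv_params.append(k * k * channels * filters + filters)
        conv_params.append(k * k * filters * filters + filters)
        channels = filters
        size //= 2                    # 2x2 max pooling with stride 2
    flat = size * size * channels     # length of the Flatten output
    fc_params = flat * fc_units + fc_units
    out_params = fc_units * n_classes + n_classes
    return size, flat, conv_params, fc_params, out_params


size, flat, conv_params, fc_params, out_params = minivgg_trace()
print(size, flat)             # 8 4096
print(conv_params)            # [896, 9248, 18496, 36928]
print(fc_params, out_params)  # 2097664 5130
```

As in the shallow network, the first dense layer holds most of the parameters. The two dropout layers after the pooling stages and the one after the dense layer help keep it from overfitting.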


Compile the model with SGD with momentum and a categorical cross-entropy loss, using accuracy as the metric. Use the following settings:

- a learning rate of 0.01;
- a time-based learning rate decay set for 40 epochs (for instance a decay rate of $0.01/40$);
- the usual momentum value (0.9);
- Nesterov acceleration enabled.

In [None]:
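from tensorflow.keras.optimizers.schedules import InverseTimeDecay

# recent TensorFlow/Keras optimizers no longer accept `decay=`: express the
# time-based decay lr0 / (1 + decay * step) as a learning rate schedule instead
lr_schedule = InverseTimeDecay(initial_learning_rate=..., decay_steps=1, decay_rate=...)
opt = SGD(learning_rate=lr_schedule, momentum=..., nesterov=...)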
model.compile(
    loss=...,
    optimizer=opt,
    metrics=["accuracy"])
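To make the optimizer settings concrete, the sketch below replays the update rule given in the Keras `SGD` documentation for `momentum` with `nesterov=True`. It runs on a toy one-dimensional problem, $f(w) = w^2/2$, whose gradient is $w$. It also applies the time-based decay $\eta_t = \eta_0 / (1 + d\,t)$, where $t$ counts update steps (mini-batches), not epochs.

The values $\eta_0 = 0.01$, $d = 0.01/40$ and momentum $0.9$ are illustrative assumptions. The function name `sgd_nesterov` is hypothetical.

```python
def sgd_nesterov(w0=5.0, lr0=0.01, decay=0.01 / 40, momentum=0.9, steps=1000):
    """Minimize f(w) = w**2 / 2 with SGD + Nesterov momentum and time-based decay."""
    w, velocity = w0, 0.0
    for t in range(steps):
        lr = lr0 / (1.0 + decay * t)             # time-based learning rate decay
        grad = w                                 # derivative of w**2 / 2
        velocity = momentum * velocity - lr * grad
        w = w + momentum * velocity - lr * grad  # Nesterov look-ahead update
    return w, lr


w_final, lr_final = sgd_nesterov()
print(abs(w_final) < 1e-6)  # True: w has converged to the minimum at 0
print(lr_final)             # slightly above 0.008 after 1000 steps
```

Momentum accumulates past gradients into `velocity`, which speeds up progress along consistent directions. The decay slowly shrinks the step size as training goes on.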

Train the network using 40 epochs and a mini batch size of 64.

In [None]:
H = model.fit(
    ..., 
    ..., 
    validation_data=(X_test, y_test), 
    batch_size=64, 
    epochs=...,  # should go to 40 if possible
    verbose=1)

Save our trained model to the disk.

In [None]:
model.save('miniVGGnet_cifar10.hdf5')

Now evaluate the model.

In [None]:
predictions = model.predict(X_test, batch_size=64)

In [None]:
from sklearn.metrics import classification_report

print(classification_report(y_test.argmax(axis=1), predictions.argmax(axis=1), target_names=labels))

In [None]:
# plot the training loss and accuracy
plt.figure()
plt.plot(H.history["loss"], label="train_loss")
plt.plot(H.history["val_loss"], label="val_loss")
plt.plot(H.history["accuracy"], label="train_acc")
plt.plot(H.history["val_accuracy"], label="val_acc")
plt.title('Training Loss and Accuracy on CIFAR-10')
plt.xlabel('Epoch #')
plt.ylabel('Loss/Accuracy')
plt.legend()
plt.show()

Finally, label a few test images with their predicted class.

In [None]:
N = 8
M = 4
indices = np.random.randint(0, y_test.shape[0], size=N*M)

In [None]:
plt.figure(figsize=(15, 8))
for i in range(N * M):
    ax = plt.subplot(M, N, i + 1)
    plt.imshow(X_test[indices[i]])
    plt.axis('off')
    plt.title('%s' % labels[predictions[indices[i]].argmax()])
plt.show()