<a href="https://colab.research.google.com/github/marshka/ml-20-21/blob/main/05_deep_learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Machine Learning SP 2020/2021

Prof. Cesare Alippi   
Andrea Cini ([`andrea.cini@usi.ch`](mailto:andrea.cini@usi.ch))   
Ivan Marisca ([`ivan.marisca@usi.ch`](mailto:ivan.marisca@usi.ch))   
Nelson Brochado ([`nelson.brochado@usi.ch`](mailto:nelson.brochado@usi.ch))

---

# Lab 05: Deep Learning

In this lab, we are going to focus on some practical aspects of building deep neural networks, in particular CNNs. 

We will focus on two main tasks: 

1. Building a classifier for images of numerical digits;
2. Building a classifier for hand gestures (using the data we collected).

Let's get started...

## MNIST

The **Modified National Institute of Standards and Technology database** is a large collection of handwritten digits that is widely used in machine learning as a benchmark for computer vision algorithms.   
The dataset consists of 70000 images of handwritten digits. All images are 28 pixels by 28 pixels, in 8-bit grayscale (i.e., each pixel is represented by an integer value in the 0-255 range), and are roughly evenly divided into 10 classes.

MNIST is usually considered as a multi-class classification problem, where the goal is to map each image to its corresponding class. 

Although nowadays MNIST is regarded as solved, machine learning practitioners like to joke that while it's true that if something works on MNIST, it may not work in the real world, it is also true that if it **doesn't** work on MNIST, it will surely not work in the real world.


First, let's import our libraries and define a helper function to plot images.




In [None]:
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image

def plot_sample(imgs, labels, nrows, ncols, resize=None, tograyscale=False):
    # create a grid of images
    fig, axs = plt.subplots(nrows, ncols, figsize=(4*ncols, 4*nrows))
    # take a random sample of images
    indices = np.random.choice(len(imgs), size=nrows*ncols, replace=False)
    for ax, idx in zip(axs.reshape(-1), indices):
        ax.axis('off')
        # sample an image
        ax.set_title(labels[idx])
        im = imgs[idx]
        if isinstance(im, np.ndarray):
            im = Image.fromarray(im)  
        if resize is not None:
            im = im.resize(resize)
        if tograyscale:
            im = im.convert('L')
        ax.imshow(im, cmap='gray')

Let's load the MNIST dataset and visualize a sample of digits.

In [None]:
from tensorflow.keras.datasets import mnist

# Load the data
(x_train, y_train), (x_test, y_test) = mnist.load_data()

plot_sample(x_train, y_train, 4, 4)

## Dense classifier
Now let's build a neural network with the tools that we have seen so far, i.e., using only dense layers.

We start by pre-processing the images and reshaping them as vectors.

In [None]:
# Reshape to vectors
x_train = x_train.reshape(-1, 28 * 28)  # shape: (60000, 784)
x_test = x_test.reshape(-1, 28 * 28)    # shape: (10000, 784)

# Normalize to 0-1 range
x_train = x_train / 255.
x_test = x_test / 255.

We also have to pre-process our targets in order to perform multi-class classification. We will use **one-hot encoding** to represent our numerical labels (0-9) as sparse binary vectors. For instance, the one-hot encoding of label 3 will be $[0, 0, 0, 1, 0, 0, 0, 0, 0, 0]$.

In [None]:
# Pre-process targets
from tensorflow.keras import utils
n_classes = 10
y_train = utils.to_categorical(y_train, n_classes)
y_test = utils.to_categorical(y_test, n_classes)
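
To make the encoding concrete, here is a minimal NumPy sketch of what one-hot encoding does. The helper name `one_hot` is ours, introduced only for illustration; it is not part of Keras.

```python
import numpy as np

def one_hot(labels, n_classes):
    """Hypothetical helper: map integer labels to one-hot rows."""
    labels = np.asarray(labels)
    # Row i of the identity matrix is the one-hot vector for class i
    return np.eye(n_classes, dtype='float32')[labels]

encoded = one_hot([3, 0], 10)
print(encoded[0])  # the row for label 3 has a single 1 at index 3
```

Taking `np.argmax` along the last axis of the encoded array recovers the original integer labels.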

Now we build a neural classifier using the same tools that we saw in the previous lab. Remember that we reshaped our inputs to be vectors, so we are in the same familiar setting as always.

However, this time we will be dealing with multi-class classification, which means that our output layer will have 10 possible outputs instead of a single one.
Moreover, the sigmoid activation function that we used in our previous binary classifiers will be replaced by the normalized **softmax** function, which will give us a **probability distribution** over the possible labels:

$$
\sigma(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}
$$

where $K$ is the number of classes that we have. 
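
As a rough illustration of the formula, softmax can be sketched in a few lines of NumPy. The logits below are made-up values, and subtracting the maximum is a common numerical-stability trick that leaves the result unchanged.

```python
import numpy as np

def softmax(z):
    """Softmax over the last axis of z."""
    z = np.asarray(z, dtype='float64')
    # Subtracting the max does not change the result but avoids overflow in exp
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([2.0, 1.0, 0.1])  # made-up scores for K = 3 classes
probs = softmax(logits)
print(probs, probs.sum())
```

The outputs are all positive and sum to one, so they can be read as a probability distribution over the classes.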

In [None]:
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras import optimizers

# Build model
model = Sequential()
model.add(Dense(128, activation='tanh', input_shape=x_train.shape[1:]))
model.add(Dense(128, activation='tanh'))
model.add(Dense(n_classes, activation='softmax'))

# Store number of parameters of the model
fcnn_params = model.count_params()

# Compile model
model.compile(optimizer=optimizers.Adam(learning_rate=0.001),
              loss='categorical_crossentropy', 
              metrics=['accuracy'])

model.summary()

We can now train and evaluate the model using Keras' `fit` method. 

In [None]:
# Train model
batch_size = 32
epochs = 10
model.fit(x_train, 
          y_train, 
          shuffle=True,  # True by default
          batch_size=batch_size, 
          epochs=epochs, 
          validation_split=0.1)

# Evaluate model
scores = model.evaluate(x_test, y_test)
print('Test loss: {} - Accuracy: {}'.format(*scores))

Pretty good, huh? Let's look closer at some of the issues of this model:

1. A lot of parameters for a pretty shallow net.
2. It sees the image as a vector...
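
To see where the parameters of the dense network come from, we can count them by hand. This is a small sketch in plain Python; `dense_params` is a name we introduce here for illustration.

```python
def dense_params(n_in, n_out):
    # One weight per input-output pair, plus one bias per output unit
    return n_in * n_out + n_out

layer_sizes = [28 * 28, 128, 128, 10]  # input, two hidden layers, output
total = sum(dense_params(n_in, n_out)
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))
print(total)  # 118282
```

Most of these parameters sit in the first layer, which connects every one of the 784 pixels to every hidden unit.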



In [None]:
# Take the test data and shift its contents to the right by p pixels
p = 3
x_test_roll = np.roll(x_test.reshape(-1, 28, 28), p, axis=-1)
plt.subplot(121)
plt.imshow(x_test[0].reshape(28, 28), cmap='gray')
plt.subplot(122)
plt.imshow(x_test_roll[0], cmap='gray')

# Evaluate the model on the shifted data
x_test_roll = x_test_roll.reshape(-1, 28 * 28)
scores = model.evaluate(x_test_roll, y_test)
print('Test loss: {} - Accuracy: {}'.format(*scores))
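
A note on `np.roll`: it shifts circularly, so whatever leaves one side of the array re-enters from the other. A tiny example on a made-up array shows the behavior:

```python
import numpy as np

row = np.array([[1, 2, 3, 4, 5]])
shifted = np.roll(row, 2, axis=-1)
print(shifted)  # [[4 5 1 2 3]]
```

For MNIST the image borders are mostly black, so a shift of a few pixels behaves almost like a plain translation of the digit.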

## Convolutional neural networks

CNNs were first introduced by Kunihiko Fukushima in 1980, and were later popularized by Y. LeCun, when he successfully applied backpropagation to train CNNs on MNIST.

In CNNs, we use our **prior knowledge** about the problem (i.e., the data are images) to **regularize** the network, imposing that neurons **share** some weights (i.e., the convolutional kernels).
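
Weight sharing can be illustrated with a naive convolution written in NumPy. This is only a sketch for intuition (the function name and the toy inputs are ours), not how Keras implements its layers.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 'valid' 2D cross-correlation with stride 1."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # The same kernel weights are reused at every position
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.zeros((6, 6))
image[2, 1] = 1.0                      # a single bright pixel
kernel = np.arange(9.0).reshape(3, 3)  # arbitrary 3x3 kernel

out = conv2d_valid(image, kernel)
out_shifted = conv2d_valid(np.roll(image, 2, axis=1), kernel)

# Shifting the input by 2 columns shifts the feature map by 2 columns
print(np.allclose(out_shifted[:, 2:], out[:, :-2]))
```

Because the kernel is shared across positions, a translated input produces a translated feature map, and the layer needs only 9 weights regardless of the image size.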

![alt text](https://upload.wikimedia.org/wikipedia/commons/6/63/Typical_cnn.png)

Let's re-build our classifier from scratch, using convolutional layers instead of fully connected ones. 

In [None]:
# Load the data
(x_train, y_train), (x_test, y_test) = mnist.load_data()

We still normalize the data to the 0-1 range, but this time we **do not reshape** the images into vectors. 
Instead, we add a **new dimension** which explicitly represents the different channels of our images. In the case of MNIST, we only have one 8-bit channel, so we only need to add a "fake" dimension at the end of our data in order to have a 4D tensor of shape `(None, 28, 28, 1)` (`None` stands for batch size). 

If we had RGB images, they would be represented as tensors of shape `(None, 28, 28, 3)`, where the last dimension represents the three color channels (red, green, blue).
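
As a quick check of the shapes involved, here is a sketch on dummy arrays (the array names are made up for the example):

```python
import numpy as np

gray_batch = np.zeros((5, 28, 28))    # 5 dummy grayscale images
with_channel = gray_batch[..., None]  # same as np.expand_dims(gray_batch, -1)
print(with_channel.shape)             # (5, 28, 28, 1)

rgb_batch = np.zeros((5, 28, 28, 3))  # RGB images already carry a channel axis
print(rgb_batch[..., 0].shape)        # red channel only: (5, 28, 28)
```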

We also one-hot encode the labels as we did before.


In [None]:
# Normalize to 0-1 range
x_train = x_train / 255.
x_test = x_test / 255.

# Add channels dimension
x_train = x_train[..., None]
x_test = x_test[..., None]

# Pre-process targets
n_classes = 10
y_train = utils.to_categorical(y_train, n_classes)
y_test = utils.to_categorical(y_test, n_classes)

Let's see how to build a ConvNet with Keras. 

In [None]:
from tensorflow.keras.layers import Conv2D, MaxPooling2D, AveragePooling2D, Flatten, Dropout

# Build model
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(MaxPooling2D(pool_size=(4, 4)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(32, activation='relu'))
model.add(Dense(10, activation='softmax'))

# Store number of parameters of the model
cnn_params = model.count_params()

# Compile the model
model.compile(optimizer=optimizers.Adam(learning_rate=0.001),
              loss='categorical_crossentropy',  
              metrics=['accuracy'])
model.summary()

print("\nFCNN params: {:,} - CNN params: {:,}".format(fcnn_params, cnn_params))
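
The numbers in the summary can be reproduced by hand. The sketch below assumes the Keras defaults used above: 'valid' padding, stride 1 for convolutions, and a pooling stride equal to the pool size. The helper names are ours.

```python
def conv_out(size, kernel):
    # 'valid' convolution with stride 1 shrinks each side by kernel - 1
    return size - kernel + 1

def conv_params(k, c_in, c_out):
    # k*k*c_in weights per filter, plus one bias per filter
    return k * k * c_in * c_out + c_out

def dense_params(n_in, n_out):
    # One weight per input-output pair, plus one bias per output unit
    return n_in * n_out + n_out

s = conv_out(28, 3)      # 26 after the first 3x3 convolution
s = s // 4               # 6 after 4x4 max pooling
s = conv_out(s, 3)       # 4 after the second 3x3 convolution
s = s // 2               # 2 after 2x2 max pooling
flat = s * s * 64        # 256 features after Flatten

cnn_total = (conv_params(3, 1, 32) + conv_params(3, 32, 64)
             + dense_params(flat, 32) + dense_params(32, 10))
print(flat, cnn_total)   # 256 27370
```

The two convolutional layers together account for fewer than 19,000 parameters, compared with over 100,000 in the first dense layer of the previous model.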

To train and evaluate the model, we do exactly as we did before.

In [None]:
# Train model
batch_size = 32
epochs = 10
model.fit(x_train, 
          y_train, 
          batch_size=batch_size, 
          epochs=epochs, 
          validation_split=0.1)

# Evaluate model
scores = model.evaluate(x_test, y_test)
print('Test loss: {} - Accuracy: {}'.format(*scores))

Now let's see if the CNN is actually more robust to translations than the dense net. We can run the same test as before, by shifting images to the right and evaluating the performance on the shifted test set.

In [None]:
# Take the test data and shift its contents to the right by p pixels
p = 3
x_test_roll = np.roll(x_test, p, axis=2)
plt.subplot(121)
plt.imshow(x_test[0, ..., 0], cmap='gray')
plt.subplot(122)
plt.imshow(x_test_roll[0, ..., 0], cmap='gray');

# Evaluate the model on the shifted data
scores = model.evaluate(x_test_roll, y_test)
print('Test loss: {} - Accuracy: {}'.format(*scores))

## Rock, Paper, Scissors

(Inspired by the [lecture](https://github.com/alessandro-giusti/rock-paper-scissors) of Alessandro Giusti)

In this part, we will try to solve a much more complex problem. We will use CNNs to classify hand gestures used to play rock, paper, scissors.

We will use the data collected through the bot. Since these data are much more heterogeneous than MNIST, we will need some preprocessing before feeding them to a model.

Let's download the data into Colab.

In [None]:
!wget https://drive.switch.ch/index.php/s/UuLqhcUVXcaq9L3/download -O data.zip # bigger dataset from https://github.com/alessandro-giusti/rock-paper-scissors
!unzip data.zip -d data > /dev/null 2>&1
!apt-get install tree > /dev/null 2>&1
!tree -d data

OK, now we have the data. Let's define a function to load them and then see what they look like.

In [None]:
import os

def load_imgs(path, folders):
    imgs = []
    labels = []
    n_imgs = 0
    for c in folders:
        # iterate over all the files in the folder
        for f in os.listdir(os.path.join(path, c)):
            if not f.endswith('.jpg'):
                continue
            # load the image (here you might want to resize the img to save memory)
            im = Image.open(os.path.join(path, c, f)).copy()
            imgs.append(im)
            labels.append(c)
        print('Loaded {} images of class {}'.format(len(imgs) - n_imgs, c))
        n_imgs = len(imgs)
    print('Loaded {} images total.'.format(n_imgs))
    return imgs, labels

In [None]:
imgs, labels = load_imgs('data', ['rock', 'paper', 'scissors'])
# imgs, labels = load_imgs('data_big', ['rock', 'paper', 'scissors'])

We can use the function that we defined in the previous part to visualize the pictures.

In [None]:
plot_sample(imgs, labels, 5, 5, resize=(64, 64), tograyscale=False)

Before starting to work on them, we want to select the image size that we will use for training.

Let's go back to the previous cell and change the image size until we find a good compromise between input size and sharpness of the image.

Once done, we can build our dataset.

In [None]:
# map class -> idx
label_to_idx = {
    'rock':0,
    'paper':1,
    'scissors':2
}

idx_to_label = {
    0:'rock',
    1:'paper',
    2:'scissors'
}

def make_dataset(imgs, labels, label_map, img_size, rgb=True, keepdim=True):
    x = []
    y = []
    n_classes = len(label_map)
    for im, l in zip(imgs, labels):
        # preprocess img
        x_i = im.resize(img_size)
        if not rgb:
            x_i = x_i.convert('L')
        x_i = np.asarray(x_i)
        if not keepdim:
            x_i = x_i.reshape(-1)
        
        # encode label
        y_i = np.zeros(n_classes)
        y_i[label_map[l]] = 1.
        
        x.append(x_i)
        y.append(y_i)
    return np.array(x).astype('float32'), np.array(y)

In [None]:
x, y = make_dataset(imgs, labels, label_to_idx, (64,64), rgb=True, keepdim=True)
print('x shape: {}, y shape:{}'.format(x.shape, y.shape))

In [None]:
from sklearn.model_selection import train_test_split

np.random.seed(5674)

train_idx, test_idx = train_test_split(np.arange(x.shape[0]), test_size=0.25, shuffle=True, stratify=y)
train_idx, val_idx = train_test_split(train_idx, test_size=0.2, shuffle=True, stratify=y[train_idx])

x_train, y_train = x[train_idx]/255., y[train_idx]
x_val, y_val = x[val_idx]/255., y[val_idx]
x_test, y_test = x[test_idx]/255., y[test_idx]

print('Training, validation, test samples: {}, {}, {}'.format(len(x_train), len(x_val), len(x_test)))
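
The two nested splits above determine the overall proportions of the three sets. A quick back-of-the-envelope sketch (the variable names are ours):

```python
test_frac = 0.25
val_frac = (1 - test_frac) * 0.2       # 20% of the remaining 75%
train_frac = 1 - test_frac - val_frac  # what is left for training
print(train_frac, val_frac, test_frac)
```

So roughly 60% of the images are used for training, 15% for validation and 25% for testing, up to rounding on the actual sample counts.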

In [None]:
# Define the network
classifier = Sequential()
classifier.add(Conv2D(16, (3,3), activation='relu', padding='same', input_shape=x_train.shape[1:]))
classifier.add(MaxPooling2D((2,2)))
classifier.add(Conv2D(16, (3,3), activation='relu', padding='same'))
classifier.add(MaxPooling2D((2,2)))
classifier.add(Conv2D(32, (3,3), activation='relu', padding='same'))
classifier.add(AveragePooling2D((4, 4)))
classifier.add(Flatten())
classifier.add(Dense(128, activation='relu'))
classifier.add(Dropout(0.25))
classifier.add(Dense(3, activation='softmax'))

classifier.compile(optimizer=optimizers.Adam(learning_rate=0.001),
                   loss='categorical_crossentropy',                   
                   metrics=['acc'],
                  )
classifier.summary()

In [None]:
from tensorflow.keras.callbacks import EarlyStopping

batch_size = 16

es = EarlyStopping(monitor='val_loss', patience=50, restore_best_weights=True)

history = classifier.fit(x_train, 
                         y_train, 
                         batch_size=batch_size, 
                         epochs=500, 
                         validation_data=(x_val, y_val),
                         verbose=1,
                         callbacks=[es])

scores = classifier.evaluate(x_test, y_test)

print('Test loss: {} - Accuracy: {}'.format(*scores))
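
The idea behind `EarlyStopping` with a `patience` value can be sketched in plain Python. This is a simplified illustration on made-up validation losses, not the Keras implementation (it ignores options such as `min_delta`).

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch) for a list of per-epoch validation losses."""
    best_loss = float('inf')
    best_epoch = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop here
            return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

losses = [0.9, 0.7, 0.6, 0.65, 0.62, 0.64, 0.7]  # made-up values
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=3)
print(stop_epoch, best_epoch)  # 5 2
```

With `restore_best_weights=True`, the model is rolled back to the weights of the best epoch (epoch 2 in this toy example) once training stops.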

In [None]:
def plot_history(history):
    plt.figure(figsize=(15, 5))
    plt.subplot(121)
    best_epoch = np.argmin(history.history['val_loss'])

    # Plot training & validation loss values
    plt.plot(history.history['loss'], label='train_loss')
    plt.plot(history.history['val_loss'], label='val_loss')
    plt.axvline(best_epoch, label='best_epoch', c='k', ls='--', alpha=0.25)
    plt.ylabel('loss')
    plt.xlabel('epoch')
    plt.legend()

    plt.subplot(122)
    # Plot training & validation accuracy values
    plt.plot(history.history['acc'], label='train_accuracy')
    plt.plot(history.history['val_acc'], label='val_accuracy')
    plt.axvline(best_epoch, label='best_epoch', c='k', ls='--', alpha=0.25)
    plt.ylabel('accuracy')
    plt.xlabel('epoch')
    plt.legend()

plot_history(history)

### Data augmentation

We can manipulate the images to create synthetic data:

Pros:
- Train the network to be resilient to noise and perturbation in the image
- More data...

Cons:
- ... not really
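
To get a feeling for what a single augmentation does, here is a minimal NumPy sketch of a random translation. The function `random_shift` and the dummy image are ours and serve only as an illustration; they are not what the Keras generator runs internally.

```python
import numpy as np

def random_shift(img, max_shift, rng):
    """Shift an (H, W, C) image by a random offset, filling with zeros."""
    h, w = img.shape[:2]
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    out = np.zeros_like(img)
    # Copy the part of the image that stays inside the frame
    src = img[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    out[max(0, dy):max(0, dy) + src.shape[0],
        max(0, dx):max(0, dx) + src.shape[1]] = src
    return out

rng = np.random.default_rng(0)
img = np.ones((8, 8, 3), dtype='float32')  # dummy image
augmented = [random_shift(img, 2, rng) for _ in range(4)]
print([a.shape for a in augmented])
```

Each call produces a slightly different version of the same picture, which is why the augmented samples are correlated and do not count as genuinely new data.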

In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_gen = ImageDataGenerator(width_shift_range=0.15,    # horizontal translation
                               height_shift_range=0.15,   # vertical translation
                               channel_shift_range=0.3,   # random channel shifts
                               rotation_range=360,        # rotation
                               zoom_range=0.3,            # zoom in/out randomly
                               shear_range=15,            # deformation
                              )

val_gen = ImageDataGenerator()

Let's visualize the transformed data.

In [None]:
def plot_gen_sample(gen, n_cols=5, n_rows=4):
    fig, axs = plt.subplots(n_rows, n_cols, figsize=(4*n_cols, 4*n_rows))
    batch = next(gen)[0]
    for ax, im in zip(axs.reshape(-1), batch):
        ax.axis('off')
        ax.imshow(im)

In [None]:
nr, nc = 5, 5

train_loader = train_gen.flow(x_train, y_train, batch_size=nr*nc)
plot_gen_sample(train_loader, nr, nc)

Now let's train the model again using the generators.

In [None]:
# Define the network
classifier = Sequential()
classifier.add(Conv2D(16, (3,3), activation='relu', padding='same', input_shape=x_train.shape[1:]))
classifier.add(MaxPooling2D((2,2)))
classifier.add(Conv2D(16, (3,3), activation='relu', padding='same'))
classifier.add(MaxPooling2D((2,2)))
classifier.add(Conv2D(32, (3,3), activation='relu', padding='same'))
classifier.add(AveragePooling2D((4,4)))
classifier.add(Flatten())
classifier.add(Dense(128, activation='relu'))
classifier.add(Dropout(0.25))
classifier.add(Dense(3, activation='softmax'))

classifier.compile(optimizer=optimizers.Adam(learning_rate=0.001), 
                   loss='categorical_crossentropy',
                   metrics=['acc'],
                  )
classifier.summary()

In [None]:
batch_size = 16

es = EarlyStopping(monitor='val_loss', patience=200, restore_best_weights=True)

train_loader = train_gen.flow(x_train, y_train, batch_size=batch_size)
val_loader = val_gen.flow(x_val, y_val, batch_size=x_val.shape[0])

# `fit` accepts generators directly; `fit_generator` is deprecated in TF 2.x
history = classifier.fit(train_loader,
                         steps_per_epoch=x_train.shape[0]//batch_size,
                         epochs=2000,
                         validation_data=val_loader,
                         validation_steps=1,
                         callbacks=[es])

scores = classifier.evaluate(x_test, y_test)

print('Test loss: {} - Accuracy: {}'.format(*scores))

In [None]:
plot_history(history)

Let's check the misclassified examples.

In [None]:
def plot_prediction(x, y, y_pred, class_map):
    idxs = list(range(y.shape[1]))
    for i in range(x.shape[0]):
        plt.subplot(121)
        plt.axis('off')
        plt.imshow(x[i])
        plt.subplot(122)
        plt.bar(idxs, y_pred[i], color=['g' if t else 'r' for t in y[i]])
        plt.ylim(0., 1.)
        plt.xticks(idxs, [class_map[c] for c in idxs])
        plt.show()

In [None]:
n_samples = 10

# Choose from the whole test set
# idxs = np.arange(len(x_test))

# Choose among the misclassified test samples
err = np.argmax(classifier.predict(x_test), axis=1) != np.argmax(y_test, axis=1)
idxs = np.argwhere(err).ravel()

# Sample at most n_samples, in case there are fewer candidates
idx = np.random.choice(idxs, min(n_samples, len(idxs)), replace=False)

y_pred = classifier.predict(x_test[idx])

plot_prediction(x_test[idx], y_test[idx], y_pred, idx_to_label)
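
Besides looking at individual errors, it can help to summarize them in a confusion matrix. Below is a minimal NumPy sketch on made-up labels; the function and variable names are ours.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Made-up labels for illustration (0: rock, 1: paper, 2: scissors)
true_cls = np.array([0, 0, 1, 1, 2, 2])
pred_cls = np.array([0, 1, 1, 1, 2, 0])
cm = confusion_matrix(true_cls, pred_cls, 3)
print(cm)
```

On the real data, the inputs would be `np.argmax(y_test, axis=1)` and `np.argmax(classifier.predict(x_test), axis=1)`; off-diagonal entries show which gestures get confused with each other.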