<a href="https://colab.research.google.com/github/jpatra72/Computer-Vision/blob/main/CNN_ResNets.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [6]:
log_root = 'tensorboard_logs'

%matplotlib inline
%tensorflow_version 2.x

import os
import datetime
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import imageio
import cv2
import tensorflow.keras.models as models
import tensorflow.keras.layers as layers
import tensorflow.keras.regularizers as regularizers
import tensorflow.keras.optimizers as optimizers
import tensorflow.keras.callbacks as callbacks
import tensorflow.keras.initializers as initializers
import tensorflow.keras.preprocessing.image as kerasimage

device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
  raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))

Found GPU at: /device:GPU:0


In [2]:
# Just an image plotting function
def plot_multiple(images, titles=None, colormap='gray',
                  max_columns=np.inf, imwidth=4, imheight=4, share_axes=False):
    """Plot multiple images as subplots on a grid."""
    if titles is None:
        titles = [''] * len(images)
    assert len(images) == len(titles)
    n_images = len(images)
    n_cols = min(max_columns, n_images)
    n_rows = int(np.ceil(n_images / n_cols))
    fig, axes = plt.subplots(
        n_rows, n_cols, figsize=(n_cols * imwidth, n_rows * imheight),
        squeeze=False, sharex=share_axes, sharey=share_axes)

    axes = axes.flat
    # Hide subplots without content
    for ax in axes[n_images:]:
        ax.axis('off')
        
    if not isinstance(colormap, (list,tuple)):
        colormaps = [colormap]*n_images
    else:
        colormaps = colormap

    for ax, image, title, cmap in zip(axes, images, titles, colormaps):
        ax.imshow(image, cmap=cmap)
        ax.set_title(title)
        ax.get_xaxis().set_visible(False)
        ax.get_yaxis().set_visible(False)
        
    fig.tight_layout()

In [3]:
(im_train, y_train), (im_test, y_test) = tf.keras.datasets.cifar10.load_data()

# Normalize to 0-1 range and subtract mean of training pixels
### BEGIN SOLUTION
im_train = im_train / 255
im_test = im_test / 255

mean_training_pixel = np.mean(im_train, axis=(0,1,2))
x_train = im_train - mean_training_pixel
x_test = im_test - mean_training_pixel
### END SOLUTION

image_shape = x_train[0].shape
labels = ['airplane','automobile','bird','cat','deer','dog','frog','horse','ship','truck']

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz


## Residual Networks

ResNet is a more modern architecture introduced by He et al. in 2015 (published at CVPR 2016: https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/He_Deep_Residual_Learning_CVPR_2016_paper.pdf), and it is still popular today.

It consists of blocks like the following:

![ResNet Block](resnet_block.png)

Each of these so-called *residual blocks* only has to predict a *residual* (in plain words: the "rest" or "leftover") that is added on top of its input.
In other words, the block outputs how much each feature needs to change in order to improve on the representation produced by the previous block.
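As a rough illustration of the residual formulation, the sketch below computes the output of one block as ReLU(x + F(x)) with NumPy, where F is a stand-in for the two convolutions of the residual branch in the diagram above. The helper `residual_block_forward` and the toy branches are hypothetical; they only show that a branch predicting a zero residual passes (the non-negative part of) its input through unchanged, while a non-zero residual nudges each feature.

```python
import numpy as np


def residual_block_forward(x, residual_branch):
    """Hypothetical toy block: output = ReLU(x + F(x))."""
    return np.maximum(x + residual_branch(x), 0.0)


x = np.array([0.5, -1.0, 2.0])

# A branch that predicts a zero residual: the block reduces to ReLU(x)
zero_branch = lambda v: np.zeros_like(v)
y_identity = residual_block_forward(x, zero_branch)

# A branch that predicts a small correction shifts each feature by that amount
shift_branch = lambda v: np.full_like(v, 0.25)
y_shifted = residual_block_forward(x, shift_branch)
```

In the network defined below, the input of every block is itself the output of a ReLU, so it is non-negative and a zero residual yields exactly the identity mapping.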

There are several ways to combine residual blocks into *residual networks* (ResNets). In the following, we consider ResNet-v1, as used for the CIFAR-10 benchmark in the original ResNet paper (it is simpler than the full model used for the much larger ImageNet benchmark).

Section 4.2. of the paper describes this architecture as follows: "*The first layer is 3×3 convolutions. Then we use a stack of 6n layers with 3×3 convolutions on the feature maps of sizes {32, 16, 8} respectively, with 2n layers for each feature map size. The numbers of filters are {16, 32, 64} respectively. The subsampling is performed by convolutions with a stride of 2. The network ends with a global average pooling, a 10-way fully-connected layer, and softmax. [...] When shortcut connections are used, they are connected to the pairs of 3×3 layers (totally 3n shortcuts). On this dataset we use identity shortcuts in all cases.*"
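The quoted description fixes the depth at 6n + 2 weight layers: one initial convolution, three stages of n blocks with two 3×3 convolutions each, and the final fully connected layer. The hypothetical helper below only does this bookkeeping (for example, n = 9 for ResNet-56); it mirrors the depth check performed by the `resnet` function further down but is not part of the original exercise.

```python
def resnet_cifar_stages(num_layers):
    """Return n and the (feature map size, filters, blocks) of each stage for a 6n+2 ResNet."""
    if (num_layers - 2) % 6 != 0:
        raise ValueError('num_layers should be 6n+2 (e.g. 20, 32, 44, 56)')
    n = (num_layers - 2) // 6
    stages = [(32, 16, n), (16, 32, n), (8, 64, n)]
    return n, stages


def count_weight_layers(stages):
    """1 initial conv + 2 convs per residual block + 1 dense classifier."""
    return 1 + sum(2 * blocks for _, _, blocks in stages) + 1


n, stages = resnet_cifar_stages(56)
```

Each residual block wraps one pair of 3×3 convolutions, which is where the "3n shortcuts" in the quote come from.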

Further, they use L2 regularization for training (a standard tool to combat overfitting). This penalizes weights with large magnitude by adding an additional term to the cost function, besides the cross-entropy. The overall function to optimize becomes:

$$
\mathcal{L}_{CE} + \frac{\lambda}{2} \sum_{w\in\text{weights}} w^2,
$$

and in this paper $\lambda=10^{-4}$.
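To make the regularization term concrete, the NumPy sketch below evaluates λ/2 · Σ w² over a list of weight arrays and adds it to a given cross-entropy value. One detail worth keeping in mind when translating the formula to Keras: `regularizers.l2(l)` adds `l * sum(w**2)` to the loss, without the factor 1/2, so the `l2(1e-4)` used in the code below corresponds to λ = 2·10⁻⁴ in the formula above, and matching λ = 10⁻⁴ exactly would take `l2(5e-5)`. The function names here are illustrative only.

```python
import numpy as np


def l2_penalty(weights, lam=1e-4):
    """(lam / 2) * sum of squared entries over all weight arrays."""
    return 0.5 * lam * sum(float(np.sum(np.square(w))) for w in weights)


def keras_l2_coefficient(lam):
    """Keras' l2(l) adds l * sum(w**2), so l = lam / 2 reproduces the formula above."""
    return lam / 2


def total_loss(cross_entropy, weights, lam=1e-4):
    """Cross-entropy plus the L2 term, as in the objective above."""
    return cross_entropy + l2_penalty(weights, lam)


# Toy weights: nine ones and four twos -> sum of squares = 9 + 16 = 25
weights = [np.ones((3, 3)), np.full(4, 2.0)]
penalty = l2_penalty(weights)  # 0.5 * 1e-4 * 25 = 1.25e-3
loss = total_loss(2.3, weights)
```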

In the previous parts of this exercise we have already seen every major component needed to build this network. However, ResNet is not a purely sequential architecture because of its skip connections, so we cannot use `models.Sequential`. Luckily, Keras also offers a functional API, in which each layer is called on the output tensor of a previous layer and a `models.Model` is built from the input and output tensors. Look at the code below to understand how this API works and fill in the missing pieces to make a ResNet.

In [4]:
def resnet(num_layers=56):
    if (num_layers - 2) % 6 != 0:
        raise ValueError('num_layers should be 6n+2 (e.g. 20, 32, 44, 56)')
    n = (num_layers - 2) // 6
        
    inputs = layers.Input(shape=image_shape)
    
    # First layer
    x = layers.Conv2D(16, 3, use_bias=False, 
        kernel_regularizer=regularizers.l2(1e-4),
        padding='same', kernel_initializer='he_normal')(inputs)
    x = layers.BatchNormalization(scale=False)(x)
    x = layers.Activation('relu')(x)
    
    # Call `resnet_block` in loops to stack the ResNet blocks as per the reference above.
    for i_block in range(n):
        x = resnet_block(x, 16, strides=1)
        
    for i_block in range(n):
        x = resnet_block(x, 32, strides=2 if i_block==0 else 1)
        
    for i_block in range(n):
        x = resnet_block(x, 64, strides=2 if i_block==0 else 1)

    # Global pooling and classifier on top
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(10, activation='softmax',
            kernel_regularizer=regularizers.l2(1e-4))(x)
    return models.Model(inputs=inputs, outputs=outputs, name=f'resnet{num_layers}')

def resnet_block(x, n_channels_out, strides=1):
    # First conv
    f = layers.Conv2D(n_channels_out, 3, strides, use_bias=False,
            kernel_regularizer=regularizers.l2(1e-4),
            padding='same', kernel_initializer='he_normal')(x)
    f = layers.BatchNormalization(scale=False)(f)
    f = layers.Activation('relu')(f)

    # Second conv
    f = layers.Conv2D(n_channels_out, 3, use_bias=False,
            kernel_regularizer=regularizers.l2(1e-4),
            padding='same', kernel_initializer='he_normal')(f)
    f = layers.BatchNormalization(scale=False)(f)
    
    ## Shortcut Connection:
    # If feature channel counts differ between input and output,
    # zero padding is used to match the depths.
    # It is implemented by a Conv2D with fixed weights.
    n_channels_in = x.shape[-1]
    if n_channels_in != n_channels_out:
        # Fixed weights, np.eye returns a matrix with 1s along the 
        # main diagonal and zeros elsewhere.
        identity_weights = np.eye(n_channels_in, n_channels_out, dtype=np.float32)
        layer = layers.Conv2D(
            n_channels_out, kernel_size=1, strides=strides, use_bias=False, 
            kernel_initializer=initializers.Constant(value=identity_weights))
        # Weight is not learnt
        layer.trainable = False
        x = layer(x)
       
    # the shortcut connection is added to the residual.
    x = layers.add([x, f])
    return layers.Activation('relu')(x)
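
The fixed-weight 1×1 convolution in the shortcut can be checked without TensorFlow. The NumPy sketch below (the helper name `eye_shortcut` is hypothetical) applies the same `np.eye(n_channels_in, n_channels_out)` kernel as a per-pixel matrix product and subsamples with the stride. The first `n_channels_in` output channels copy the input and the remaining ones are zero, which is the zero-padding shortcut described in the comments. It assumes a single image of shape (height, width, channels) and that a stride-s 1×1 convolution keeps pixels 0, s, 2s, ... along each spatial axis.

```python
import numpy as np


def eye_shortcut(x, n_channels_out, strides=1):
    """Emulate Conv2D(n_channels_out, kernel_size=1, strides=strides, use_bias=False)
    with kernel np.eye(c_in, n_channels_out) on one image of shape (H, W, c_in)."""
    kernel = np.eye(x.shape[-1], n_channels_out, dtype=x.dtype)  # shape (c_in, c_out)
    subsampled = x[::strides, ::strides, :]  # a 1x1 kernel with stride s keeps every s-th pixel
    return subsampled @ kernel  # matrix product over the channel axis == 1x1 convolution


x = np.arange(4 * 4 * 2, dtype=np.float32).reshape(4, 4, 2)  # toy 4x4 map with 2 channels
y = eye_shortcut(x, n_channels_out=4, strides=2)
```

Because the layer is frozen, the shortcut stays parameter-free; making it trainable would turn it into a learned projection shortcut, which the quoted text says was not used on CIFAR-10.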

## Learning Rate Decay and Data Augmentation - Our Final Model

Learning rate decay reduces the learning rate as the training progresses. It can be implemented as a Keras callback as shown below.

If you have a good GPU or a lot of time, train ResNet-56 on the CIFAR-10 dataset for 70 epochs, as in the code below. As a rough idea, this takes about one hour on a good GPU, but on a CPU it could take a day or two. If that is too long, train a smaller ResNet with `num_layers` = 14 or 20, or train for fewer epochs.

To add data augmentation (e.g. random translation or rotation of the input images), look up the documentation for the `ImageDataGenerator` class. The ResNet model presented in the original paper was trained with random translations of $\pm$ 4 px (4 px of padding on each side followed by a random 32×32 crop) and random horizontal flips.
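For comparison with `ImageDataGenerator`, the NumPy sketch below implements the paper's augmentation directly: zero-pad each side by 4 px, take a random 32×32 crop, and flip horizontally with probability 0.5. The function name and the use of `np.random.default_rng` are choices made for this illustration. Because the images are mean-subtracted, padding with zeros amounts to padding with the mean training pixel, which is also what `fill_mode='constant'` (default `cval=0.0`) does in the code below.

```python
import numpy as np


def pad_crop_flip(image, pad=4, rng=None):
    """Randomly translate an (H, W, C) image by up to `pad` px and flip it horizontally half the time."""
    rng = np.random.default_rng() if rng is None else rng
    height, width, _ = image.shape
    # Zero-pad only the two spatial axes
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode='constant')
    # Random top-left corner of the crop: 0 .. 2*pad inclusive
    top = rng.integers(0, 2 * pad + 1)
    left = rng.integers(0, 2 * pad + 1)
    crop = padded[top:top + height, left:left + width, :]
    if rng.random() < 0.5:
        crop = crop[:, ::-1, :]  # horizontal flip
    return crop


rng = np.random.default_rng(0)
image = np.ones((32, 32, 3), dtype=np.float32)
augmented = pad_crop_flip(image, rng=rng)
```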

Note: `model.fit` with generator input seems to work only when the `y` targets are provided as one-hot vectors.
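Switching to one-hot targets does not change the objective: for a probability vector p and true class c, categorical cross-entropy with a one-hot target and sparse categorical cross-entropy with the integer label both reduce to -log p_c. The NumPy sketch below checks this on random predictions. The helper names are illustrative, it assumes strictly positive probabilities such as softmax outputs, and `one_hot` plays the role of `tf.keras.utils.to_categorical` for labels shaped (N, 1), as CIFAR-10 provides them.

```python
import numpy as np


def one_hot(labels, num_classes=10):
    """One-hot encode integer labels of shape (N,) or (N, 1)."""
    return np.eye(num_classes, dtype=np.float32)[np.asarray(labels).reshape(-1)]


def sparse_cross_entropy(labels, probs):
    """Mean of -log p[true class], using integer labels."""
    idx = np.asarray(labels).reshape(-1)
    return -np.mean(np.log(probs[np.arange(len(idx)), idx]))


def categorical_cross_entropy(targets, probs):
    """Mean of -sum_k t_k log p_k, using one-hot targets."""
    return -np.mean(np.sum(targets * np.log(probs), axis=1))


rng = np.random.default_rng(0)
logits = rng.normal(size=(5, 10))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # softmax
labels = np.array([[3], [0], [9], [1], [3]])  # CIFAR-10-style shape (N, 1)

loss_sparse = sparse_cross_entropy(labels, probs)
loss_onehot = categorical_cross_entropy(one_hot(labels), probs)
```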

In [5]:
def learning_rate_schedule(epoch):
    if epoch < 45:
        return 1e-3
    if epoch < 60:
        return 1e-4
    return 1e-5

def train_with_lr_decay(model):
    model.compile(
        loss='sparse_categorical_crossentropy', metrics=['accuracy'],
        optimizer=optimizers.Adam(learning_rate=1e-3))

    # Callback for learning rate adjustment
    lr_scheduler = callbacks.LearningRateScheduler(learning_rate_schedule)

    # TensorBoard callback
    timestamp = datetime.datetime.now().strftime('%Y%m%d-%H%M%S')
    logdir = os.path.join(log_root, f'{model.name}_{timestamp}')
    tensorboard_callback = callbacks.TensorBoard(logdir, histogram_freq=1)
    
    # Fit the model on the full (non-augmented) training set
    model.fit(
        x_train, y_train, batch_size=128,
        validation_data=(x_test, y_test), epochs=70, verbose=1, 
        callbacks=[lr_scheduler, tensorboard_callback])
    
def train_with_lr_decay_and_augmentation(model):
    model.compile(
        loss='categorical_crossentropy', metrics=['accuracy'],
        optimizer=optimizers.Adam(learning_rate=1e-3))

    # Callback for learning rate adjustment
    lr_scheduler = callbacks.LearningRateScheduler(learning_rate_schedule)

    # TensorBoard callback
    timestamp = datetime.datetime.now().strftime('%Y%m%d-%H%M%S')
    logdir = os.path.join(log_root, f'{model.name}_augmented_{timestamp}')
    tensorboard_callback = callbacks.TensorBoard(logdir, histogram_freq=1)

    # Data augmentation: flip and shift horizontally/vertically by max 4 pixels
    datagen = kerasimage.ImageDataGenerator(
        width_shift_range=4, height_shift_range=4,
        horizontal_flip=True, fill_mode='constant')
    
    # y targets as one-hot vectors
    y_train_onehot = tf.keras.utils.to_categorical(y_train, 10)
    y_test_onehot = tf.keras.utils.to_categorical(y_test, 10)
    
    # Fit the model on the batches generated by datagen.flow() using model.fit()
    model.fit(
        datagen.flow(x_train, y_train_onehot, batch_size=128),
        validation_data=(x_test, y_test_onehot),
        steps_per_epoch=int(np.ceil(len(x_train) / 128)), epochs=70, verbose=1,
        callbacks=[lr_scheduler, tensorboard_callback])

resnet56 = resnet(56)
train_with_lr_decay(resnet56)
resnet56 = resnet(56)
train_with_lr_decay_and_augmentation(resnet56)

Epoch 1/70
...
Epoch 70/70
Epoch 1/70
...
Epoch 15/7

Q: Does the augmentation improve the final performance? What do you observe on the training and validation curves compared to no augmentation?

Yes, the accuracy on the validation set improves. The training accuracy grows more slowly with augmentation because the model can no longer overfit the training set as easily.

In [7]:
!zip -r /content/tensorboard_logs_cnn02_01.zip /content/tensorboard_logs
from google.colab import files
files.download("/content/tensorboard_logs_cnn02_01.zip")

  adding: content/tensorboard_logs/ (stored 0%)
  adding: content/tensorboard_logs/resnet56_augmented_20220403-183428/ (stored 0%)
  adding: content/tensorboard_logs/resnet56_augmented_20220403-183428/validation/ (stored 0%)
  adding: content/tensorboard_logs/resnet56_augmented_20220403-183428/validation/events.out.tfevents.1649010956.0b7d6d0ca933.75.3.v2 (deflated 78%)
  adding: content/tensorboard_logs/resnet56_augmented_20220403-183428/train/ (stored 0%)
  adding: content/tensorboard_logs/resnet56_augmented_20220403-183428/train/events.out.tfevents.1649010868.0b7d6d0ca933.75.2.v2 (deflated 66%)
  adding: content/tensorboard_logs/resnet56_20220403-165349/ (stored 0%)
  adding: content/tensorboard_logs/resnet56_20220403-165349/validation/ (stored 0%)
  adding: content/tensorboard_logs/resnet56_20220403-165349/validation/events.out.tfevents.1649004926.0b7d6d0ca933.75.1.v2 (deflated 78%)
  adding: content/tensorboard_logs/resnet56_20220403-165349/train/ (stored 0%)
  adding: content/ten
