# ResNet

This notebook is an implementation of [___Deep Residual Learning for Image Recognition___](https://arxiv.org/pdf/1512.03385.pdf) by He et al. The original model was trained on the ImageNet dataset, but in this notebook we adapt it to the CIFAR-10 dataset, which is much smaller and easier to store on a server.

We first need to install and import all the dependent libraries in the session.

In [1]:
! pip install -r ../requirements.txt

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession
from tensorflow.keras.layers import *
from tensorflow.keras.layers.experimental.preprocessing import Resizing, RandomContrast, RandomFlip, RandomRotation
from tensorflow.keras.regularizers import l2

config = ConfigProto()
config.gpu_options.allow_growth = True
session = InteractiveSession(config=config)



2022-01-16 19:31:25.930831: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2022-01-16 19:31:27.709053: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2022-01-16 19:31:27.870596: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1731] Found device 0 with properties: 
pciBusID: 0004:05:00.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.00GiB deviceMemoryBandwidth: 836.37GiB/s
2022-01-16 19:31:27.870631: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2022-01-16 19:31:27.875106: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2022-01-16 19:31:27.875138: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas

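In this part of the program, we load the CIFAR-10 dataset through the Keras dataset loader and split it into a training set, a validation set, and a test set. The first 80% of the original training images are used for training and the remaining 20% for validation.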

In [2]:
batch_size = 256

def get_data():
    (train_x, train_y), (test_x, test_y) = tf.keras.datasets.cifar10.load_data()
    
    train_x = train_x / 255.0
    test_x = test_x / 255.0
    
    train_size = len(train_y) * 8 // 10

    train = tf.data.Dataset.from_tensor_slices((train_x[:train_size], 
                                                train_y[:train_size])).shuffle(train_size).batch(batch_size)
    val = tf.data.Dataset.from_tensor_slices((train_x[train_size:], 
                                              train_y[train_size:])).batch(batch_size)
    test = tf.data.Dataset.from_tensor_slices((test_x, test_y)).batch(batch_size)
    
    return train, val, test

train, val, test = get_data()

2022-01-16 19:31:29.675174: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1731] Found device 0 with properties: 
pciBusID: 0004:05:00.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.00GiB deviceMemoryBandwidth: 836.37GiB/s
2022-01-16 19:31:29.677635: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1869] Adding visible gpu devices: 0
2022-01-16 19:31:29.679032: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1731] Found device 0 with properties: 
pciBusID: 0004:05:00.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.00GiB deviceMemoryBandwidth: 836.37GiB/s
2022-01-16 19:31:29.681402: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1869] Adding visible gpu devices: 0
2022-01-16 19:31:29.681432: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1256] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-01-16 19:31:29.681439: I tensorflow/core/com
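
As a quick sanity check on the split above, the set sizes and batch counts can be worked out by hand. The sketch below assumes the standard CIFAR-10 sizes (50,000 training and 10,000 test images); the helper name `split_summary` is hypothetical and not part of the notebook.

```python
import math

def split_summary(n_train_full=50_000, n_test=10_000, batch_size=256):
    """Mirror the 80/20 slicing in get_data() using plain arithmetic."""
    train_size = n_train_full * 8 // 10       # same expression as in get_data()
    val_size = n_train_full - train_size      # the remaining images go to validation
    sizes = {"train": train_size, "val": val_size, "test": n_test}
    # Dataset.batch() keeps the final partial batch by default, hence the ceiling.
    batches = {name: math.ceil(n / batch_size) for name, n in sizes.items()}
    return sizes, batches

sizes, batches = split_summary()
print(sizes)
print(batches)
```

Under these assumptions one epoch covers 157 training batches. Note that the validation images are taken from the tail of the training array rather than drawn at random, because the slicing happens before any shuffling.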

The following functions construct a ResNet model. We provide the 18-, 34-, 50-, 101-, and 152-layer variants described in the paper, using the bottleneck structure for models with 50 or more layers. The structure of the model is almost the same as in the original paper, but we add some preprocessing so that the network better fits the CIFAR-10 dataset. We define ```weight_decay``` as the model hyperparameter that controls kernel regularization. Although the original paper did not use dropout in training, we add a few dropout layers because the network still overfits the data. In addition, we apply data augmentation to the original images to reduce overfitting.
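
The depth in each model name can be recovered from the per-stage block counts used in `createResNet`. The sketch below assumes the counting convention of the paper: one stem convolution, one final fully connected layer, and two weighted layers per basic block or three per bottleneck block, with projection shortcuts not counted. `STAGE_BLOCKS` and `depth` are illustrative names, not part of the notebook.

```python
# Per-stage block counts, copied from the params lists in createResNet.
STAGE_BLOCKS = {
    18: [2, 2, 2, 2],
    34: [3, 4, 6, 3],
    50: [3, 4, 6, 3],
    101: [3, 4, 23, 3],
    152: [3, 8, 36, 3],
}

def depth(variant):
    """Count weighted layers: stem conv + convs inside the blocks + final dense layer."""
    convs_per_block = 3 if variant >= 50 else 2   # bottleneck blocks vs. basic blocks
    return 1 + convs_per_block * sum(STAGE_BLOCKS[variant]) + 1

for variant in STAGE_BLOCKS:
    print(variant, depth(variant))
```

ResNet-34 and ResNet-50 share the same block counts; the difference in depth comes only from the block type. The model built in this notebook contains a few extra layers (for example the 1x1 convolution applied before the first bottleneck stage), so the count matches the paper's naming rather than the exact number of Keras layers.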

In [3]:
def bottleneck(input, f1, f3, stride, weight_decay):
    x = Conv2D(kernel_size = 1, filters = f1, padding = "same", strides = stride, kernel_regularizer = l2(weight_decay))(input)
    x = BatchNormalization()(x)
    x = Activation("relu")(x)
    x = Conv2D(kernel_size = 3, filters = f1, padding = "same", kernel_regularizer = l2(weight_decay))(x)
    x = BatchNormalization()(x)
    x = Activation("relu")(x)
    x = Conv2D(kernel_size = 1, filters = f3, padding = "same", kernel_regularizer = l2(weight_decay))(x)
    x = BatchNormalization()(x)
    
    if stride == 2:
        # Projection shortcut: linear 1x1 convolution followed by batch normalization, as in block()
        input = Conv2D(kernel_size = 1, strides = stride, filters = f3, padding = "valid", kernel_regularizer = l2(weight_decay))(input)
        input = BatchNormalization()(input)
    
    x = input + x
    x = BatchNormalization()(x)
    x = Activation(activation = "relu")(x)

    return x

def block(input, f1, stride, weight_decay):
    x = Conv2D(kernel_size = 3, filters = f1, padding = "same", strides = stride, kernel_regularizer = l2(weight_decay))(input)
    x = BatchNormalization()(x)
    x = Activation("relu")(x)
    x = Conv2D(kernel_size = 3, filters = f1, padding = "same", kernel_regularizer = l2(weight_decay))(x)
    x = BatchNormalization()(x)
    
    if stride == 2:
        input = Conv2D(kernel_size = 1, strides = 2, filters = f1, padding = "valid", kernel_regularizer = l2(weight_decay))(input)
        input = BatchNormalization()(input)  # normalize the projected shortcut, not the main path
      
    x = input + x
    x = BatchNormalization()(x)
    x = Activation(activation = "relu")(x)

    return x

def residualBlock(input, f1, f3, layers, weight_decay, dr, bottleNeck = False):
    if bottleNeck:
        x = bottleneck(input, f1, f3, 2 if f1 != 64 else 1, weight_decay)

        for i in range(layers - 1):
            x = bottleneck(x, f1, f3, 1, weight_decay)
    else:
        x = block(input, f1, 2 if f1 != 64 else 1, weight_decay)

        for i in range(layers - 1):
            x = block(x, f1, 1, weight_decay)
            
            if dr > 0:
                x = Dropout(dr)(x)
    
    return x

def createResNet(type, weight_decay, dropout, dropout_rate):
    if type == 18:
        params = [2, 2, 2, 2]
    elif type == 34:
        params = [3, 4, 6, 3]
    elif type == 50:
        params = [3, 4, 6, 3]
    elif type == 101:
        params = [3, 4, 23, 3]
    elif type == 152:
        params = [3, 8, 36, 3]
    else:
        raise Exception("The parameter is not valid!")

    input = Input(shape = (32, 32, 3))
    x = Resizing(96, 96)(input)
    x = RandomRotation(.2)(x)
    x = RandomFlip("horizontal")(x)
    x = RandomContrast(.2)(x)
    x = Conv2D(kernel_size = 7, filters = 64, strides = 2, kernel_regularizer = l2(weight_decay))(x)
    x = BatchNormalization()(x)
    x = Activation("relu")(x)
    x = MaxPooling2D(pool_size = 3, strides = 2)(x)
    
    if type >= 50:
        x = Conv2D(kernel_size = 1, filters = 256, strides = 1, kernel_regularizer = l2(weight_decay))(x)

    x = residualBlock(x, 64, 256, params[0], weight_decay, dropout_rate, bottleNeck = type >= 50)
    x = residualBlock(x, 128, 512, params[1], weight_decay, dropout_rate, bottleNeck = type >= 50)
    x = residualBlock(x, 256, 1024, params[2], weight_decay, dropout_rate, bottleNeck = type >= 50)
    x = residualBlock(x, 512, 2048, params[3], weight_decay, dropout_rate, bottleNeck = type >= 50)

    x = GlobalAveragePooling2D()(x)
    x = Flatten()(x)
    
    if dropout:
        x = Dropout(dropout_rate)(x)
        
    x = Dense(10, activation = "softmax")(x)

    model = tf.keras.Model(inputs = input, outputs = x, name = "ResNet")

    return model
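
To see how the 96x96 resized input shrinks through the network, the spatial sizes can be traced with the usual output-size formulas: `(n - kernel) // stride + 1` for "valid" padding and `ceil(n / stride)` for "same" padding. This is a minimal sketch that assumes the Keras defaults ("valid" padding for the stem convolution and the max pooling, which do not set `padding`); the function names are hypothetical.

```python
import math

def valid_out(n, kernel, stride):
    """Output size of a convolution or pooling layer with "valid" padding."""
    return (n - kernel) // stride + 1

def same_out(n, stride):
    """Output size of a convolution with "same" padding."""
    return math.ceil(n / stride)

def trace_spatial_sizes(resized=96):
    sizes = {"resized": resized}
    n = valid_out(resized, kernel=7, stride=2)   # stem convolution
    sizes["stem_conv"] = n
    n = valid_out(n, kernel=3, stride=2)         # max pooling
    sizes["max_pool"] = n
    # The first stage keeps the resolution; the other three halve it.
    for stage, stride in enumerate([1, 2, 2, 2], start=1):
        main = same_out(n, stride)                                  # first conv of the stage
        shortcut = valid_out(n, 1, stride) if stride == 2 else n   # 1x1 projection shortcut
        assert main == shortcut, "main path and shortcut must agree before the addition"
        n = main
        sizes[f"stage{stage}"] = n
    return sizes

print(trace_spatial_sizes())
```

The internal assertion checks the property that the addition `input + x` relies on: the strided "same" convolution on the main path and the strided "valid" 1x1 convolution on the shortcut produce the same spatial size at every stage.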

This part trains the ResNet model on the CIFAR-10 dataset. We tested several sets of hyperparameters and adopted the one with the best validation loss. During training, a checkpoint stores the weights on the drive whenever the validation loss improves, so that we can continue training even if the session disconnects. We also store the search results and training weights in case the process takes too long or the session crashes. The result of the training process is shown as a graph of the training and validation accuracy for each epoch. The training result can vary between runs because of randomness introduced by data shuffling and the learning rate; we plan to fix this.

In [None]:
# Set a checkpoint to save weights
cp = tf.keras.callbacks.ModelCheckpoint("weights", monitor = "val_loss", verbose = 0, save_best_only = True, mode = "auto")
lr = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor = 0.1, patience = 5, verbose = 0, 
                                          mode = 'auto', min_delta = 0.0001, cooldown = 0, min_lr = 0)
# Early stopping is defined here but is not passed to model.fit() below
es = tf.keras.callbacks.EarlyStopping(monitor = "val_loss", patience = 20, restore_best_weights = True)

weight_decay = 1e-4
learning_rate = 1e-3
dropout = True
dropout_rate = .1

model = createResNet(34, weight_decay, dropout, dropout_rate)
model.compile(optimizer = tf.keras.optimizers.Adam(learning_rate = learning_rate),
              loss = tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics = ["accuracy"])

# We can use the existing data if the training process has started
# model.load_weights("weights") 

history = model.fit(train, epochs = 150, validation_data = val, callbacks = [cp, lr])

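acc_history = history.history['accuracy']
val_acc_history = history.history['val_accuracy']
epochs = range(1, len(acc_history) + 1)

# Plot against 1-based epoch numbers so the curves line up with the annotations
plt.plot(epochs, acc_history)
plt.plot(epochs, val_acc_history)

for epoch, acc, val_acc in zip(epochs, acc_history, val_acc_history):
    if epoch % 10 == 0:
        plt.annotate("{:.2f}".format(acc), xy = (epoch, acc))
        plt.annotate("{:.2f}".format(val_acc), xy = (epoch, val_acc))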

plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'val'], loc = 'upper left')
plt.show()

Epoch 1/150


2022-01-16 19:31:35.458634: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
2022-01-16 19:31:35.537910: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 3783000000 Hz
2022-01-16 19:31:37.091550: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2022-01-16 19:31:37.437071: I tensorflow/stream_executor/cuda/cuda_dnn.cc:359] Loaded cuDNN version 8101
2022-01-16 19:31:37.922490: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2022-01-16 19:31:38.207540: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11




2022-01-16 19:32:04.686310: W tensorflow/python/util/util.cc:348] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.


INFO:tensorflow:Assets written to: weights/assets
Epoch 2/150
INFO:tensorflow:Assets written to: weights/assets
Epoch 3/150
Epoch 4/150
INFO:tensorflow:Assets written to: weights/assets
Epoch 5/150
INFO:tensorflow:Assets written to: weights/assets
Epoch 6/150
Epoch 7/150
Epoch 8/150
INFO:tensorflow:Assets written to: weights/assets
Epoch 9/150
INFO:tensorflow:Assets written to: weights/assets
Epoch 10/150
Epoch 11/150
Epoch 12/150
INFO:tensorflow:Assets written to: weights/assets
Epoch 13/150
Epoch 14/150
INFO:tensorflow:Assets written to: weights/assets
Epoch 15/150
Epoch 16/150
Epoch 17/150
INFO:tensorflow:Assets written to: weights/assets
Epoch 18/150
Epoch 19/150
Epoch 20/150
INFO:tensorflow:Assets written to: weights/assets
Epoch 21/150
INFO:tensorflow:Assets written to: weights/assets
Epoch 22/150
Epoch 23/150
Epoch 24/150
INFO:tensorflow:Assets written to: weights/assets
Epoch 25/150
Epoch 26/150
Epoch 27/150
Epoch 28/150
Epoch 29/150
Epoch 30/150
INFO:tensorflow:Assets written 
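
The `ReduceLROnPlateau` callback configured above multiplies the learning rate by `factor` once the validation loss has failed to improve by more than `min_delta` for `patience` consecutive epochs. The sketch below is a simplified, pure-Python imitation of that rule (no cooldown handling), not the Keras implementation, and the validation losses are made-up numbers chosen to trigger two reductions.

```python
def simulate_reduce_on_plateau(val_losses, lr=1e-3, factor=0.1, patience=5,
                               min_delta=1e-4, min_lr=0.0):
    """Return the learning rate in effect after each epoch."""
    best = float("inf")
    wait = 0            # epochs since the last meaningful improvement
    lrs = []
    for loss in val_losses:
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)   # shrink the rate, never below min_lr
                wait = 0
        lrs.append(lr)
    return lrs

# Hypothetical validation losses: improvement, a plateau, one more gain, another plateau.
fake_val_losses = [1.0, 0.8, 0.7] + [0.7] * 5 + [0.65] + [0.7] * 5
lrs = simulate_reduce_on_plateau(fake_val_losses)
for epoch, lr in enumerate(lrs, start=1):
    print(epoch, lr)
```

With `factor = 0.1`, each plateau cuts the learning rate by a factor of ten, so a few plateaus are enough to make further updates very small. That is one reason the validation curve tends to flatten late in a long run.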

Here we evaluate our model on the test set and show how ResNet classifies sample images from it.

In [None]:
labels = ["airplane", "automobile", "bird", "cat", "deer", "dog", "frog", "horse", "ship", "truck"]

print("Test Accuracy: {:.2%}".format(model.evaluate(test)[1]))

fig = plt.figure(figsize = (10, 40))
for sample_data, sample_label in test.take(1):
    pred = np.argmax(model.predict(sample_data), axis = 1)
    
    for i, (img, label) in enumerate(zip(sample_data[:9], sample_label[:9])):
        ax = fig.add_subplot(911 + i)
        ax.imshow(img)
        ax.set_title("Labelled as " + labels[int(label)] + ", classified as " + labels[int(pred[i])])
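
The prediction step above reduces each row of softmax outputs to a class index with an argmax and then looks the index up in `labels`. The following sketch reproduces that mapping in plain Python on made-up probabilities; `peaked_row` and `argmax_rows` are hypothetical helpers, and the numbers are not results from the trained model.

```python
LABELS = ["airplane", "automobile", "bird", "cat", "deer",
          "dog", "frog", "horse", "ship", "truck"]

def peaked_row(index, peak=0.9, n_classes=10):
    """Build a made-up probability row with most of its mass on one class."""
    rest = (1.0 - peak) / (n_classes - 1)
    return [peak if i == index else rest for i in range(n_classes)]

def argmax_rows(probabilities):
    """Index of the largest entry in each row, like np.argmax(..., axis=1)."""
    return [max(range(len(row)), key=row.__getitem__) for row in probabilities]

# Hypothetical softmax outputs for three images and their true class indices.
probabilities = [peaked_row(3), peaked_row(8), peaked_row(5)]
true_labels = [3, 8, 3]

predicted = argmax_rows(probabilities)
accuracy = sum(p == t for p, t in zip(predicted, true_labels)) / len(true_labels)

for p, t in zip(predicted, true_labels):
    print("Labelled as " + LABELS[t] + ", classified as " + LABELS[p])
print("Accuracy: {:.2%}".format(accuracy))
```

In the notebook the true labels arrive as arrays of shape `(n, 1)`, which is why the plotting code converts each one with `int(label)` before indexing into `labels`.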