# <center>CNN Architectures</center>


# <center>Part 5: ResNet implementation using Keras</center>

In this notebook we will build a ResNet model from scratch using Keras.

Link to the ResNet Paper: https://arxiv.org/abs/1512.03385

**ResNet was developed by a team at Microsoft Research in 2015.**

The ResNet authors built deep networks with 50, 101, and 152 weight layers; even the 152-layer model has lower complexity than the much shallower VGG nets.

The key highlight in ResNet is the residual module architecture with the skip connections.

Key ideas of the ResNet architecture:

1. Skip connections: to address the vanishing gradient problem in deeper networks.

2. No pooling layers inside the residual block: bottleneck residual blocks use 1 x 1 convolutional layers to change the number of channels, and strided convolutions to downsample.


What is the **vanishing gradient problem**?

As the network backpropagates the gradient of the error from the final layer back to the first layer, the gradient is multiplied by the weight matrix (and the derivative of the activation) at each step, so it can shrink exponentially quickly toward zero. This vanishing gradient prevents the earlier layers from learning, and as a result the network's performance saturates or even starts to degrade rapidly.

To address this, the authors introduced skip connections.
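A toy calculation makes the difference concrete. The sketch below treats each layer's local derivative as a single scalar and assumes made-up values that alternate between +0.1 and -0.1 (they are not measured from a real network). A plain stack multiplies these factors together, while a residual block computing x + F(x) contributes a factor of 1 + F'(x), so the identity path keeps each factor close to 1.

```python
import numpy as np

# Toy assumption: 30 layers whose local derivatives alternate between
# +0.1 and -0.1 (made-up values, not measured from a real network).
num_layers = 30
local_grads = np.array([0.1 if i % 2 == 0 else -0.1 for i in range(num_layers)])

# Plain stack: the chain rule multiplies the local derivatives together.
plain_grad = np.prod(local_grads)

# Residual stack: each block outputs x + F(x), so its derivative is
# 1 + F'(x); the identity term keeps every factor close to 1.
residual_grad = np.prod(1.0 + local_grads)

print(f"plain network gradient:    {plain_grad:.3e}")
print(f"residual network gradient: {residual_grad:.3f}")
```

With these numbers the plain product shrinks to the order of 1e-30, while the residual product stays close to 1 (about 0.86).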

#### Residual blocks

A residual module consists of two branches:

1. **Shortcut path**: carries the input directly to the end of the block, where it is added to the output of the main path.

2. **Main path**: a series of convolutions and activations. The main path consists of three convolutional layers with ReLU activations. We also add batch normalization to each convolutional layer to speed up training and help reduce overfitting.

> The main path architecture is as follows:

> [CONV --> BN --> ReLU] x 3

> Note: The shortcut is added to the output of the last convolutional layer (after its batch normalization), right before the final ReLU activation.

> There are no pooling layers in the residual block. Instead, the authors use ***1 x 1 convolutional layers to reduce and then restore the number of channels***, so that the expensive 3 x 3 convolution runs on a thinner feature map. When a block has to shrink the spatial size, its first convolution uses a stride of 2. This configuration is called a bottleneck residual block.
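To see why the bottleneck design is cheaper, the short calculation below compares weight counts (biases ignored) for a hypothetical pair of plain 3 x 3 convolutions at 256 channels against a 1 x 1 / 3 x 3 / 1 x 1 bottleneck that squeezes to 64 channels. The 256 and 64 widths follow the first bottleneck stage of ResNet50; the helper name `conv_weights` is our own.

```python
def conv_weights(kernel_size, in_channels, out_channels):
    """Number of weights in a Conv2D layer, ignoring the bias terms."""
    return kernel_size * kernel_size * in_channels * out_channels


# Two plain 3 x 3 convolutions that keep 256 channels throughout
plain_pair = 2 * conv_weights(3, 256, 256)

# Bottleneck: 1 x 1 reduces 256 -> 64, 3 x 3 works on 64 channels,
# 1 x 1 restores 64 -> 256
bottleneck = (conv_weights(1, 256, 64)
              + conv_weights(3, 64, 64)
              + conv_weights(1, 64, 256))

print("two 3 x 3 convs at 256 channels:", plain_pair)
print("1 x 1 / 3 x 3 / 1 x 1 bottleneck:", bottleneck)
```

The bottleneck needs 69,632 weights versus 1,179,648 for the plain pair, roughly 17 times fewer, which is why the paper switches to bottleneck blocks for its deeper models.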


Types of shortcut path:

1. Regular shortcut: adds the input unchanged to the output of the main path, which requires both to have the same dimensions.

![residual](https://raw.githubusercontent.com/gkadusumilli/CNN-architectures/main/Residual_blocks.png)

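2. Reduce shortcut: adds a 1 x 1 convolutional layer to the shortcut path (with a stride of 2 when the block downsamples) so that the shortcut output matches the main path's dimensions before the two are added.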

![residual_shortcut_reduce](https://raw.githubusercontent.com/gkadusumilli/CNN-architectures/main/Residual_blocks_reduced_shortcut%20(1).png)
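The two shortcut types can be sketched in Keras as one function with a `reduce_shortcut` flag. This is our own post-activation sketch following the [CONV --> BN --> ReLU] x 3 layout of the main path, not the notebook's `resnet_v2` (which uses the pre-activation BN-ReLU-Conv order). The function name `bottleneck_block`, the 4x channel expansion, and the batch normalization on the reduce shortcut are assumptions.

```python
import keras
from keras import layers


def bottleneck_block(x, filters, strides=1, reduce_shortcut=False):
    """Post-activation bottleneck residual block (sketch).

    filters: width of the 1 x 1 and 3 x 3 layers; the block outputs 4 * filters channels.
    reduce_shortcut: if True, put a 1 x 1 convolution on the shortcut path
        (needed whenever strides != 1 or the channel count changes).
    """
    shortcut = x

    # Main path: [CONV -> BN -> ReLU] for the first two layers
    y = layers.Conv2D(filters, 1, strides=strides, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    # Third layer: CONV -> BN; its ReLU is applied after the addition
    y = layers.Conv2D(4 * filters, 1, padding="same")(y)
    y = layers.BatchNormalization()(y)

    if reduce_shortcut:
        # Reduce shortcut: 1 x 1 conv (plus BN) so the shapes match the main path
        shortcut = layers.Conv2D(4 * filters, 1, strides=strides, padding="same")(x)
        shortcut = layers.BatchNormalization()(shortcut)

    y = layers.Add()([shortcut, y])
    return layers.Activation("relu")(y)


inputs = keras.Input(shape=(32, 32, 64))
# Regular shortcut: the input already has 4 * 16 = 64 channels and stride is 1
regular_out = bottleneck_block(inputs, filters=16)
# Reduce shortcut: halve the spatial size and widen to 4 * 32 = 128 channels
reduce_out = bottleneck_block(regular_out, filters=32, strides=2, reduce_shortcut=True)
model = keras.Model(inputs, reduce_out)
```

The regular block keeps the 32 x 32 x 64 shape, so the input can be added directly; the reduce block outputs 16 x 16 x 128, so the shortcut needs its own strided 1 x 1 convolution. Calling `model.summary()` lists both blocks.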





#### Summary of the residual blocks

* Residual blocks contain two paths: the shortcut path and the main path.

* The main path consists of three convolutional layers, each followed by a batch normalization layer:
    * 1 x 1 convolutional layer
    * 3 x 3 convolutional layer
    * 1 x 1 convolutional layer

* There are two ways to implement the shortcut path:
    * Regular shortcut
    * Reduce shortcut


#### ResNet50 network

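Stage 1: 7 x 7 convolutional layer (64 filters, stride 2), followed by 3 x 3 max pooling (stride 2)

Stage 2: 3 residual blocks, each containing [1 x 1 conv layer + 3 x 3 conv layer + 1 x 1 conv layer] = 9 conv layers

Stage 3: 4 residual blocks = 12 conv layers

Stage 4: 6 residual blocks = 18 conv layers

Stage 5: 3 residual blocks = 9 conv layers

Global average pooling, followed by a fully connected softmax layer.

Counting the first convolution and the fully connected layer gives 1 + 9 + 12 + 18 + 9 + 1 = 50 weight layers.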
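The same stage layout scales to the deeper models mentioned earlier. The sketch below counts weight layers from the number of residual blocks per stage; the block counts for ResNet101 ([3, 4, 23, 3]) and ResNet152 ([3, 8, 36, 3]) come from Table 1 of the ResNet paper, and, following the paper's convention, the 1 x 1 convolutions on reduce shortcuts are not counted. The names `BLOCKS_PER_STAGE` and `count_weight_layers` are our own.

```python
# Residual blocks per stage (stages 2-5) for the bottleneck ResNets,
# as listed in Table 1 of the ResNet paper.
BLOCKS_PER_STAGE = {
    "ResNet50": [3, 4, 6, 3],
    "ResNet101": [3, 4, 23, 3],
    "ResNet152": [3, 8, 36, 3],
}

CONVS_PER_BLOCK = 3  # 1 x 1, 3 x 3, 1 x 1


def count_weight_layers(blocks_per_stage):
    """First 7 x 7 conv + 3 convs per residual block + final fully connected layer."""
    return 1 + CONVS_PER_BLOCK * sum(blocks_per_stage) + 1


for name, blocks in BLOCKS_PER_STAGE.items():
    print(name, count_weight_layers(blocks))
```

Only the number of blocks in stages 3 and 4 changes between the three models; the per-block structure stays the same.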


In [10]:
from __future__ import print_function
import keras
from keras.layers import Dense, Conv2D, BatchNormalization, Activation
from keras.layers import AveragePooling2D, Input, Flatten
from keras.optimizers import Adam
from keras.callbacks import ModelCheckpoint, LearningRateScheduler
from keras.callbacks import ReduceLROnPlateau
from keras.preprocessing.image import ImageDataGenerator
from keras.regularizers import l2
from keras import backend as K
from keras.models import Model
from keras.datasets import cifar10
import numpy as np
import os

In [11]:
#training parameters
batch_size = 32
epochs = 200
data_augmentation = True
num_classes = 10

# subtracting the per-pixel mean improves accuracy
subtract_pixel_mean = True

In [15]:
# this notebook builds a CIFAR-10 ResNet v2 (3 stages) rather than ResNet50
n = 6  # bottleneck residual blocks per stage
# depth = 9n + 2: 3 stages x n blocks x 3 conv layers,
# plus the first conv layer and the final dense layer (56 for n = 6)
depth = n * 9 + 2

# load the CIFAR-10 data
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# input image dimensions
input_shape = x_train.shape[1:]
# normalize the data to [0, 1]
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255

# subtract the per-pixel mean of the training set from both splits
if subtract_pixel_mean:
    x_train_mean = np.mean(x_train, axis=0)
    x_train -= x_train_mean
    x_test -= x_train_mean

print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')
print('y_train shape:', y_train.shape)

# convert class vectors to binary class matrices (one-hot encoding)
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

def lr_schedule(epoch):
    """Learning rate schedule.

    The learning rate starts at 1e-3 and is reduced after 80, 120, 160,
    and 180 epochs. Called automatically every epoch as part of the
    callbacks during training.
    """
    lr = 1e-3
    if epoch > 180:
        lr *= 0.5e-3
    elif epoch > 160:
        lr *= 1e-3
    elif epoch > 120:
        lr *= 1e-2
    elif epoch > 80:
        lr *= 1e-1
    print('Learning rate: ', lr)
    return lr


x_train shape: (50000, 32, 32, 3)
50000 train samples
10000 test samples
y_train shape: (50000, 1)
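To check the step schedule, the sketch below evaluates a copy of `lr_schedule` (with the `print` call removed so the output stays short) at the epochs around each boundary.

```python
def lr_schedule(epoch):
    """Copy of the notebook's step schedule, without the print call."""
    lr = 1e-3
    if epoch > 180:
        lr *= 0.5e-3
    elif epoch > 160:
        lr *= 1e-3
    elif epoch > 120:
        lr *= 1e-2
    elif epoch > 80:
        lr *= 1e-1
    return lr


for epoch in [0, 80, 81, 121, 161, 181]:
    print(f"epoch {epoch:3d}: lr = {lr_schedule(epoch):.1e}")
```

Because Keras passes a 0-based epoch index to the scheduler, index 80 still uses 1e-3 and the first reduction (to 1e-4) takes effect at index 81, which the training log displays as `Epoch 82/200`. The last step drops the rate to 5e-7.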


In [19]:
def resnet_layer(inputs,
                 num_filters=16,
                 kernel_size=3,
                 strides=1,
                 activation='relu',
                 batch_normalization=True,
                 conv_first=True):
    """2D Convolution-Batch Normalization-Activation stack builder

    # Arguments
        inputs (tensor): input tensor from input image or previous layer
        num_filters (int): Conv2D number of filters
        kernel_size (int): Conv2D square kernel dimensions
        strides (int): Conv2D square stride dimensions
        activation (string): activation name
        batch_normalization (bool): whether to include batch normalization
        conv_first (bool): conv-bn-activation (True) or
            bn-activation-conv (False)

    # Returns
        x (tensor): tensor as input to the next layer
    """
    conv = Conv2D(num_filters,
                  kernel_size=kernel_size,
                  strides=strides,
                  padding='same',
                  kernel_initializer='he_normal',
                  kernel_regularizer=l2(1e-4))

    x = inputs
    if conv_first:
        x = conv(x)
        if batch_normalization:
            x = BatchNormalization()(x)
        if activation is not None:
            x = Activation(activation)(x)
    else:
        if batch_normalization:
            x = BatchNormalization()(x)
        if activation is not None:
            x = Activation(activation)(x)
        x = conv(x)
    return x

In [22]:
def resnet_v2(input_shape, depth, num_classes=10):
    """ResNet Version 2 Model builder [b]

    Stacks of (1 x 1)-(3 x 3)-(1 x 1) BN-ReLU-Conv2D or also known as
    bottleneck layer
    First shortcut connection per layer is 1 x 1 Conv2D.
    Second and onwards shortcut connection is identity.
    At the beginning of each stage, the feature map size is halved (downsampled)
    by a convolutional layer with strides=2, while the number of filter maps is
    doubled. Within each stage, the layers have the same number filters and the
    same filter map sizes.
    Features maps sizes:
    conv1  : 32x32,  16
    stage 0: 32x32,  64
    stage 1: 16x16, 128
    stage 2:  8x8,  256

    # Arguments
        input_shape (tensor): shape of input image tensor
        depth (int): number of core convolutional layers
        num_classes (int): number of classes (CIFAR10 has 10)

    # Returns
        model (Model): Keras model instance
    """
    if (depth - 2) % 9 != 0:
        raise ValueError('depth should be 9n+2 (eg 56 or 110 in [b])')
    # Start model definition.
    num_filters_in = 16
    num_res_blocks = int((depth - 2) / 9)

    inputs = Input(shape=input_shape)
    # v2 performs Conv2D with BN-ReLU on input before splitting into 2 paths
    x = resnet_layer(inputs=inputs,
                     num_filters=num_filters_in,
                     conv_first=True)

    # Instantiate the stack of residual units
    for stage in range(3):
        for res_block in range(num_res_blocks):
            activation = 'relu'
            batch_normalization = True
            strides = 1
            if stage == 0:
                num_filters_out = num_filters_in * 4
                if res_block == 0:  # first layer and first stage
                    activation = None
                    batch_normalization = False
            else:
                num_filters_out = num_filters_in * 2
                if res_block == 0:  # first layer but not first stage
                    strides = 2    # downsample

            # bottleneck residual unit
            y = resnet_layer(inputs=x,
                             num_filters=num_filters_in,
                             kernel_size=1,
                             strides=strides,
                             activation=activation,
                             batch_normalization=batch_normalization,
                             conv_first=False)
            y = resnet_layer(inputs=y,
                             num_filters=num_filters_in,
                             conv_first=False)
            y = resnet_layer(inputs=y,
                             num_filters=num_filters_out,
                             kernel_size=1,
                             conv_first=False)
            if res_block == 0:
                # linear projection residual shortcut connection to match
                # changed dims
                x = resnet_layer(inputs=x,
                                 num_filters=num_filters_out,
                                 kernel_size=1,
                                 strides=strides,
                                 activation=None,
                                 batch_normalization=False)
            x = keras.layers.add([x, y])

        num_filters_in = num_filters_out

    # Add classifier on top.
    # v2 has BN-ReLU before Pooling
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = AveragePooling2D(pool_size=8)(x)
    y = Flatten()(x)
    outputs = Dense(num_classes,
                    activation='softmax',
                    kernel_initializer='he_normal')(y)

    # Instantiate model.
    model = Model(inputs=inputs, outputs=outputs)
    return model



model = resnet_v2(input_shape=input_shape, depth=depth)


model.compile(loss='categorical_crossentropy',
              optimizer=Adam(learning_rate=lr_schedule(0)),
              metrics=['accuracy'])
model.summary()


Learning rate:  0.001
Model: "model_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_7 (InputLayer)            [(None, 32, 32, 3)]  0                                            
__________________________________________________________________________________________________
conv2d_74 (Conv2D)              (None, 32, 32, 16)   448         input_7[0][0]                    
__________________________________________________________________________________________________
batch_normalization_65 (BatchNo (None, 32, 32, 16)   64          conv2d_74[0][0]                  
__________________________________________________________________________________________________
activation_65 (Activation)      (None, 32, 32, 16)   0           batch_normalization_65[0][0]     
______________________________________________________________________
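The feature-map sizes listed in the `resnet_v2` docstring can be checked without building the model. The pure-Python sketch below mirrors only the filter and stride bookkeeping of the stage loop, assuming 32 x 32 CIFAR-10 inputs and `depth = 56`; the helper `trace_resnet_v2` is a hypothetical name of our own.

```python
def trace_resnet_v2(input_size=32, depth=56):
    """Return (stage, block, strides, bottleneck_width, out_channels, size) per block.

    Mirrors the filter/stride bookkeeping of the notebook's resnet_v2 loop;
    no layers are built.
    """
    if (depth - 2) % 9 != 0:
        raise ValueError("depth should be 9n+2")
    num_res_blocks = (depth - 2) // 9
    num_filters_in = 16  # width of the first conv layer
    size = input_size
    rows = []
    for stage in range(3):
        for res_block in range(num_res_blocks):
            strides = 1
            if stage == 0:
                num_filters_out = num_filters_in * 4
            else:
                num_filters_out = num_filters_in * 2
                if res_block == 0:
                    strides = 2  # downsample at the start of stages 1 and 2
            # padding='same' gives ceil(size / strides)
            size = -(-size // strides)
            rows.append((stage, res_block, strides, num_filters_in,
                         num_filters_out, size))
        num_filters_in = num_filters_out
    return rows


rows = trace_resnet_v2()
for stage, block, strides, width, out_ch, size in rows:
    if block == 0:
        print(f"stage {stage}: bottleneck width {width}, "
              f"output {size}x{size}x{out_ch}, first stride {strides}")
print("residual blocks:", len(rows))
```

The trace gives bottleneck widths 16, 64 and 128 with outputs 32x32x64, 16x16x128 and 8x8x256, matching the docstring, so the final `AveragePooling2D(pool_size=8)` pools the 8 x 8 map down to 1 x 1 before the classifier. Note that in stages 1 and 2 the output is only twice the bottleneck width, unlike the 4x expansion used in ResNet50.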

In [30]:
# prepare model saving directory

save_dir = os.path.join(os.getcwd(), 'saved_models')
# label substituted for %s in the checkpoint filename
model_type = 'ResNet%dv2' % depth
model_name = 'cifar10_%s_model.{epoch:03d}.h5' % model_type
os.makedirs(save_dir, exist_ok=True)
filepath = os.path.join(save_dir, model_name)
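The `%s` in `model_name` is a placeholder that has to be filled with `%` before the name is used; if it is skipped, the files end up literally named `cifar10_%s_model.001.h5`, as in the training log below. The sketch fills it with a label of our choosing, `ResNet56v2`, and shows how the `{epoch:03d}` field expands; `ModelCheckpoint` fills that field with the 1-based epoch number, which is why the first saved file ends in `001.h5`.

```python
depth = 56
# Hypothetical label for the %s slot (our own naming choice)
model_type = "ResNet%dv2" % depth
model_name = "cifar10_%s_model.{epoch:03d}.h5" % model_type

print(model_name)                  # template passed to ModelCheckpoint
print(model_name.format(epoch=5))  # name of the file saved after epoch 5
```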


In [31]:
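# prepare callbacks for model saving and for learning rate adjustment

# save the model whenever validation accuracy improves
checkpoint = ModelCheckpoint(filepath=filepath,
                             monitor='val_accuracy',
                             verbose=1,
                             save_best_only=True)

# set the learning rate from lr_schedule at the start of every epoch
lr_scheduler = LearningRateScheduler(lr_schedule)

# reduce the learning rate by a factor of sqrt(0.1) when val_loss stops improving;
# note that lr_scheduler resets the rate at the start of the next epoch,
# so reductions made here do not persist
lr_reducer = ReduceLROnPlateau(factor=np.sqrt(0.1),
                               cooldown=0,
                               patience=5,
                               min_lr=0.5e-6)

callbacks = [checkpoint, lr_reducer, lr_scheduler]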

In [32]:
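# real-time data augmentation
datagen = ImageDataGenerator(
    rotation_range=0,        # no random rotation
    width_shift_range=0.1,   # random horizontal shift of up to 10% of the width
    shear_range=0.,          # no shear
    zoom_range=0.,           # no zoom
    fill_mode='nearest',     # fill uncovered pixels with the nearest edge value
    horizontal_flip=True     # randomly mirror images left-right
)
# fit() only computes statistics for featurewise normalization or ZCA
# whitening; neither is enabled here, so this call is not strictly required
datagen.fit(x_train)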
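To see what the two active options do to a single image, here is a rough NumPy approximation. The `augment` helper is our own and is not ImageDataGenerator's implementation: it shifts by a whole number of pixels and fills the gap by repeating the edge columns (mimicking `fill_mode='nearest'`), whereas ImageDataGenerator samples a continuous shift and applies an interpolated affine transform.

```python
import numpy as np

rng = np.random.default_rng(0)


def augment(image, width_shift_range=0.1, horizontal_flip=True):
    """Rough approximation of the active ImageDataGenerator options.

    image: array of shape (height, width, channels).
    """
    width = image.shape[1]
    # Random horizontal shift of up to width_shift_range * width pixels
    max_shift = int(round(width_shift_range * width))
    shift = int(rng.integers(-max_shift, max_shift + 1))
    # fill_mode='nearest': pad by repeating the edge columns, then crop back
    padded = np.pad(image, ((0, 0), (max_shift, max_shift), (0, 0)), mode="edge")
    start = max_shift - shift
    out = padded[:, start:start + width, :]
    # Mirror left-right with probability 0.5
    if horizontal_flip and rng.random() < 0.5:
        out = out[:, ::-1, :]
    return out


image = np.arange(32 * 32 * 3, dtype="float32").reshape(32, 32, 3)
augmented = augment(image)
print(augmented.shape)
```

For a 32 x 32 CIFAR-10 image, `width_shift_range=0.1` allows shifts of up to about 3 pixels in either direction, and the output always keeps the input shape.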

In [33]:
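# train with real-time data augmentation
# (model.fit accepts generators; fit_generator is deprecated in TensorFlow 2.x)
model.fit(datagen.flow(x_train, y_train, batch_size=batch_size),
          validation_data=(x_test, y_test),
          epochs=epochs, verbose=1, workers=4,
          steps_per_epoch=x_train.shape[0] // batch_size,
          callbacks=callbacks)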



Epoch 1/200
Learning rate:  0.001
   1/1562 [..............................] - ETA: 3:08 - loss: 1.2022 - accuracy: 0.7188




Epoch 00001: val_accuracy improved from -inf to 0.68020, saving model to /content/saved_models/cifar10_%s_model.001.h5
Epoch 2/200
Learning rate:  0.001

Epoch 00002: val_accuracy improved from 0.68020 to 0.69510, saving model to /content/saved_models/cifar10_%s_model.002.h5
Epoch 3/200
Learning rate:  0.001

Epoch 00003: val_accuracy did not improve from 0.69510
Epoch 4/200
Learning rate:  0.001

Epoch 00004: val_accuracy did not improve from 0.69510
Epoch 5/200
Learning rate:  0.001

Epoch 00005: val_accuracy improved from 0.69510 to 0.79850, saving model to /content/saved_models/cifar10_%s_model.005.h5
Epoch 6/200
Learning rate:  0.001

Epoch 00006: val_accuracy did not improve from 0.79850
Epoch 7/200
Learning rate:  0.001

Epoch 00007: val_accuracy did not improve from 0.79850
Epoch 8/200
Learning rate:  0.001

Epoch 00008: val_accuracy did not improve from 0.79850
Epoch 9/200
Learning rate:  0.001

Epoch 00009: val_accuracy did not improve from 0.79850
Epoch 10/200
Learning rate

<tensorflow.python.keras.callbacks.History at 0x7fd82e47c320>

In [34]:
scores = model.evaluate(x_test, y_test, verbose=1)
print('Test loss:', scores[0])
print('Test accuracy:', scores[1])

Test loss: 0.4836726188659668
Test accuracy: 0.91839998960495
