# ResNet Modeling

## Dataset: CIFAR-10
## Model: ResNet v2
- **ResNet v1** :  [Deep Residual Learning for Image Recognition, 2015](https://arxiv.org/pdf/1512.03385.pdf)
- **ResNet v2** : [Identity Mapping in Deep Residual Networks, 2016](https://arxiv.org/pdf/1603.05027)

## Experiments
- Practice with the **ResNet v2** model and experiment with changing hyperparameters
  * epoch 
  * number of ResNet blocks, n  
  * subtract_pixel_mean = True/False 
- Inspect the inference result on a test image

If necessary, install scipy
```{bash}
!pip3 install scipy
```


Reference: https://github.com/PacktPublishing/Advanced-Deep-Learning-with-Keras/blob/master/chapter2-deep-networks/resnet-cifar10-2.2.1.py

In [None]:
import tensorflow as tf
from tensorflow import keras
from keras.layers import Dense, Conv2D, BatchNormalization, Activation
from keras.layers import AveragePooling2D, Input, Flatten
from keras.optimizers import Adam
from keras.callbacks import ModelCheckpoint, LearningRateScheduler
from keras.callbacks import ReduceLROnPlateau
from keras.preprocessing.image import ImageDataGenerator
from keras.regularizers import l2
from keras.models import Model
from keras.datasets import cifar10

import numpy as np
import os
import math
from datetime import datetime 

In [None]:
# for displaying images and plotting
from PIL import Image
import matplotlib.pyplot as plt

## 0. Configuration & Hyperparameters

In [None]:
num_classes = 10 # cifar10 classes : fixed
data_augmentation = True

# Subtracting pixel mean can improve accuracy
subtract_pixel_mean = True

batch_size = 32  # the original paper trained all networks with batch_size=128
epochs = 120     # alternatives: 100, 120 (~55 min, 91%), 150, 200 (~1 h 30 min), 300 (~2 h 30 min, 91%)

In [None]:
# Model version
version = 2      # fixed

# v1 n : number of residual blocks                        
#     { 3,  5,  7,  9, ...}. cf. training time
# v1 depth: the corresponding number of layers(depth): 
#           {20, 32, 44, 56, ... }
# v2 n : number of residual blocks                        
#     { 2,  3,  4,  5,  6,  7,  9, ...}. cf. training time
# v2 depth: the corresponding number of layers(depth): 
#     {20, 29, 38, 47, 56, 65, 83, ... }

n = 2 # can be changed, but training takes longer as n increases
# Computed depth from supplied model parameter n
# 1 input Conv, last dense layer : 2 extra layers
if version == 1:
    # 3 stages x 2 (conv layers / ResNetBlock)  x n ((number of ResNetBlocks)/stage)
    depth = n * 6 + 2 
elif version == 2:
    # 3 stage x 3 (conv layers/ResNetBlock) x n ((number of ResNetBlocks)/stage)
    depth = n * 9 + 2 
    
print(f'model: ResNet-v{version}-{depth}')
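The depth formulas above can be checked with a small standalone sketch. The helper name `resnet_depth` is hypothetical and not part of the notebook; it only restates the `6n + 2` (v1) and `9n + 2` (v2) rules.

```python
def resnet_depth(n, version):
    """Return the number of weighted layers for n residual blocks per stage."""
    # v1 blocks hold two 3x3 convs; v2 blocks hold a 1x1-3x3-1x1 bottleneck.
    convs_per_block = {1: 2, 2: 3}[version]
    stages = 3
    # +2 accounts for the input convolution and the final dense layer.
    return stages * convs_per_block * n + 2


for n_blocks in (2, 3, 5, 12):
    print(n_blocks, resnet_depth(n_blocks, 1), resnet_depth(n_blocks, 2))
```

With `n = 2` and `version = 2`, this gives depth 20, which matches the configuration used in this notebook.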

## 1. Dataset Preparation
- Load cifar10 dataset
- Normalize input(x) data
- Output encoding to one-hot vector

In [None]:
# Model name, depth and version
model_type = 'ResNet%dv%d' % (depth, version)

# Load the CIFAR10 data.
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Input image dimensions.
input_shape = x_train.shape[1:]

# Normalize data.
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255

# If subtract pixel mean is enabled
if subtract_pixel_mean:
    x_train_mean = np.mean(x_train, axis=0)
    x_train -= x_train_mean
    x_test -= x_train_mean

print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')
print('y_train shape:', y_train.shape)

In [None]:
# Convert class vectors to binary class matrices.
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
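To make the one-hot step concrete, here is a plain-Python sketch of the kind of output `to_categorical` produces for integer labels. The helper `one_hot` is a hypothetical stand-in written for illustration, not the Keras implementation.

```python
def one_hot(labels, num_classes):
    """Plain-Python stand-in for one-hot encoding a list of integer labels."""
    rows = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0  # a single 1 at the class index
        rows.append(row)
    return rows


encoded = one_hot([3, 8, 0], num_classes=10)
# Taking the argmax of each row recovers the integer label.
decoded = [row.index(max(row)) for row in encoded]
print(decoded)
```

The same argmax step is what the evaluation cells later use to turn one-hot labels and softmax outputs back into class indices.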

## 2. Modeling: ResNet
- Using functional APIs: Add

### Residual block F(x)
- H(x) = F(x) + x
- F(x) = f( resnet_layer, x)
 * F(x) : 2 resnet_layers  
- f(x) = 2d convolution - Batch Normalization - Activation(relu) 

In [None]:
def resnet_layer(inputs,
                 num_filters=16,
                 kernel_size=3,
                 strides=1,
                 activation='relu',
                 batch_normalization=True,
                 conv_first=True):
    """2D Convolution-Batch Normalization-Activation stack builder

    # Arguments
        inputs (tensor): input tensor from input image or previous layer
        num_filters (int): Conv2D number of filters
        kernel_size (int): Conv2D square kernel dimensions
        strides (int): Conv2D square stride dimensions
        activation (string): activation name
        batch_normalization (bool): whether to include batch normalization
        conv_first (bool): conv-bn-activation (True: v1) or
            bn-activation-conv (False: v2)

    # Returns
        x (tensor): tensor as input to the next layer
    """
    conv = Conv2D(num_filters,
                  kernel_size=kernel_size,
                  strides=strides,
                  padding='same',
                  kernel_initializer='he_normal',
                  kernel_regularizer=l2(1e-4))

    x = inputs
    if conv_first: # v1
        x = conv(x)
        if batch_normalization:
            x = BatchNormalization()(x)
        if activation is not None:
            x = Activation(activation)(x)
    else: # v2
        if batch_normalization:
            x = BatchNormalization()(x)
        if activation is not None:
            x = Activation(activation)(x)
        x = conv(x) 
    return x

### Residual block F(x)
1) identity shortcut: H(x) = F(x) + x
- F(x) = f( resnet_layer, x)
 * F(x) : 2 resnet_layers  
        
2) projection shortcut: H(x) = F(x) + Conv2d(x, stride=2)
  - subsampling: Conv2d(stride=2)
  - resnet_layer = Conv2d - BN - Activation(relu) 
  - when halving the resolution, use stride=2 (and double the number of filters) 
  - conv2d(3x3), filter size = 64 (at 32 x 32)
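As a rough check of the subsampling described above: with `padding='same'`, a strided convolution produces an output of size `ceil(input / stride)`. The sketch below follows the spatial size through the two stride-2 downsampling steps; the function name `same_padding_output` is an assumption made for this example.

```python
import math


def same_padding_output(size, stride):
    """Spatial output size of a convolution with padding='same'."""
    return math.ceil(size / stride)


size = 32  # CIFAR-10 images are 32x32
sizes = [size]
for stride in (2, 2):  # first block of stage 1 and first block of stage 2
    size = same_padding_output(size, stride)
    sizes.append(size)
print(sizes)
```

Because the projection shortcut uses the same stride as the main path, both branches end up with the same spatial size and can be added.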

## Modeling: ResNet v1
- input block(first conv.) : Conv+BN+ReLU
- middle blocks: 3 stages ResNet blocks (32x32 - 16x16 - 8x8)
  * stage 0(32x32): 16 channels(no. of filters), n ResNet Blocks
  * stage 1(16x16): 16x2 channels, n ResNet Blocks
  * stage 2(8x8): 16x4 channels, n ResNet Blocks
- output block(last classification layer): average pooling 2d (8)- flatten - dense(softmax, 10 classes)
- depth = 2 + 6*n: number of weighted layers
  * n: ResBlocks/stage
  * input block(conv) + last dense layer : 2
  * middle blocks: 6*n = 3 stages x 2 (conv layers/ResNet Block) * n (ResBlocks/stage)

In [None]:
def resnet_v1(input_shape, depth, num_classes=10):
    """ResNet Version 1 Model builder [a]

    Stacks of 2 x (3 x 3) Conv2D-BN-ReLU
    Last ReLU is after the shortcut connection.
    At the beginning of each stage, the feature map size is halved (downsampled)
    by a convolutional layer with strides=2, while the number of filters is
    doubled. Within each stage, the layers have the same number of filters and
    the same feature map sizes.
    Feature map sizes:
    stage 0: 32x32, 16
    stage 1: 16x16, 32
    stage 2:  8x8,  64
    The Number of parameters is approx the same as Table 6 of [a]:
    ResNet20 0.27M
    ResNet32 0.46M
    ResNet44 0.66M
    ResNet56 0.85M
    ResNet110 1.7M

    # Arguments
        input_shape (tensor): shape of input image tensor
        depth (int): number of core convolutional layers
        num_classes (int): number of classes (CIFAR10 has 10)

    # Returns
        model (Model): Keras model instance
    """
    if (depth - 2) % 6 != 0:
        raise ValueError('depth should be 6n+2 (eg 20, 32, 44 in [a])')
    # Start model definition.
    num_filters = 16
    num_res_blocks = int((depth - 2) / 6)

    inputs = Input(shape=input_shape)
    x = resnet_layer(inputs=inputs)
    # Instantiate the stack of residual units
    for stack in range(3):
        for res_block in range(num_res_blocks):
            strides = 1
            if stack > 0 and res_block == 0:  # first layer but not first stack
                strides = 2  # downsample
            y = resnet_layer(inputs=x,
                             num_filters=num_filters,
                             strides=strides)
            y = resnet_layer(inputs=y,
                             num_filters=num_filters,
                             activation=None)
            if stack > 0 and res_block == 0:  # first layer but not first stack
                # linear projection residual shortcut connection to match
                # changed dims
                x = resnet_layer(inputs=x,
                                 num_filters=num_filters,
                                 kernel_size=1,
                                 strides=strides,
                                 activation=None,
                                 batch_normalization=False)
            x = keras.layers.add([x, y])
            x = Activation('relu')(x)
        num_filters *= 2

    # Add classifier on top.
    # v1 does not use BN after last shortcut connection-ReLU
    x = AveragePooling2D(pool_size=8)(x)
    y = Flatten()(x)
    outputs = Dense(num_classes,
                    activation='softmax',
                    kernel_initializer='glorot_uniform')(y) 

    # Instantiate model.
    model = Model(inputs=inputs, outputs=outputs)
    return model
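The parameter counts quoted in the docstring can be approximated by hand. The sketch below mirrors the layer layout of `resnet_v1` above (3x3 convs with bias, a BatchNormalization after every conv except the projection shortcut, and a final dense layer). The helper names are hypothetical, and the total includes the non-trainable BatchNormalization moving statistics.

```python
def conv_params(k, c_in, c_out):
    """Weights plus biases of a k x k convolution."""
    return k * k * c_in * c_out + c_out


def bn_params(channels):
    """Gamma, beta, moving mean and moving variance."""
    return 4 * channels


def resnet_v1_param_count(depth, num_classes=10, in_channels=3):
    n = (depth - 2) // 6  # residual blocks per stage
    filters = 16
    total = conv_params(3, in_channels, filters) + bn_params(filters)
    c_in = filters
    for stage in range(3):
        for block in range(n):
            total += conv_params(3, c_in, filters) + bn_params(filters)
            total += conv_params(3, filters, filters) + bn_params(filters)
            if stage > 0 and block == 0:
                # 1x1 projection shortcut, built without batch normalization
                total += conv_params(1, c_in, filters)
            c_in = filters
        filters *= 2
    total += c_in * num_classes + num_classes  # dense classifier
    return total


print(resnet_v1_param_count(20) / 1e6)
```

For depth 20 this comes out at roughly 0.27M, in line with the ResNet20 figure listed in the docstring.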

### Residual block F(x)
1) identity mapping : H(x) = F(x) + x
 - F(x) = f( resnet_layer, x)
 - resnet_layer =  **BN** - Act(relu) - Conv2d - BN - Act(relu) - Conv2d - BN - Act(relu) - Conv2d 
   * **pre-activation** for identity mapping
   * Activation: relu
 - F(x) : 3 conv's 
 
2) projection shortcut: H(x) = F(x) + Conv2d(x, stride=2)
  - used when the resolution or the number of channels changes
  - subsampling: Conv2d(stride=2)
  - when halving the resolution, use stride=2 (and double the number of filters)
  - conv2d(3x3), filter size = 64(at 32x32) 

### Modeling: ResNet v2
- Uses pre-activation ResNet blocks
- input block: Conv+BN+ReLU
 * a single Conv
- middle blocks: 3 stages of ResNet blocks (32x32 - 16x16 - 8x8)
 * stage 0(32x32): 16 channels(no. of filters), 64 channels for input/output, n ResNet Blocks
   - Note that the first block differs: it omits the leading BN-ReLU, because the input block has already applied them
 * stage 1(16x16): 64 channels, 128 channels for input/output, n ResNet Blocks
 * stage 2(8x8): 128 channels, 256 channels for input/output, n ResNet Blocks
- output block (last classification layer): 
 * BN-Act-average pooling 2d (pool size 8, 256 channels)- flatten - dense(softmax, 10 classes)
- number of layers (depth): 
  depth = 2 + 9*n (n: ResBlocks/stage)
 * number of weighted layers in the middle blocks = 3 convs/resnet block * n resnet block/stage * 3 stages 
 * input block(conv) + last dense layer : 2

In [None]:
def resnet_v2(input_shape, depth, num_classes=10):
    """ResNet Version 2 Model builder [b]

    Stacks of (1 x 1)-(3 x 3)-(1 x 1) BN-ReLU-Conv2D,
    also known as bottleneck layers.
    The first shortcut connection in each stage is a 1 x 1 Conv2D.
    The second and subsequent shortcut connections are identity.
    At the beginning of stages 1 and 2,
    the feature map size is halved (downsampled)
    by a convolutional layer with strides=2,
    while the number of feature maps is
    doubled. Within each stage, the layers have
    the same number of filters and the same feature map sizes.
    Feature map sizes:
    conv1  : 32x32,  16
    stage 0: 32x32,  64
    stage 1: 16x16, 128
    stage 2:  8x8,  256

    Arguments:
        input_shape (tensor): shape of input image tensor
        depth (int): number of core convolutional layers
        num_classes (int): number of classes (CIFAR10 has 10)

    Returns:
        model (Model): Keras model instance
    """
    if (depth - 2) % 9 != 0:
        raise ValueError('depth should be 9n+2 (eg 110 in [b])')
    # start model definition.
    num_filters_in = 16
    num_res_blocks = int((depth - 2) / 9)

    inputs = Input(shape=input_shape)
    # v2 performs Conv2D with BN-ReLU
    # on input before splitting into 2 paths
    x = resnet_layer(inputs=inputs,
                     num_filters=num_filters_in,
                     conv_first=True)

    # instantiate the stack of residual units
    for stage in range(3):
        for res_block in range(num_res_blocks):
            activation = 'relu'
            batch_normalization = True
            strides = 1
            if stage == 0:
                num_filters_out = num_filters_in * 4
                # first layer and first stage
                if res_block == 0:  
                    activation = None
                    batch_normalization = False
            else:
                num_filters_out = num_filters_in * 2
                # first layer but not first stage
                if res_block == 0:
                    # downsample
                    strides = 2 

            # bottleneck residual unit
            y = resnet_layer(inputs=x,
                             num_filters=num_filters_in,
                             kernel_size=1,
                             strides=strides,
                             activation=activation,
                             batch_normalization=batch_normalization,
                             conv_first=False)
            y = resnet_layer(inputs=y,
                             num_filters=num_filters_in,
                             conv_first=False)
            y = resnet_layer(inputs=y,
                             num_filters=num_filters_out,
                             kernel_size=1,
                             conv_first=False)
            if res_block == 0:
                # linear projection residual shortcut connection
                # to match changed dims
                x = resnet_layer(inputs=x,
                                 num_filters=num_filters_out,
                                 kernel_size=1,
                                 strides=strides,
                                 activation=None,
                                 batch_normalization=False)
            x = keras.layers.add([x, y])

        num_filters_in = num_filters_out

    # add classifier on top.
    # v2 has BN-ReLU before Pooling
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = AveragePooling2D(pool_size=8)(x)
    y = Flatten(name='feature')(x) # name the feature-vector layer so it can be looked up later
    outputs = Dense(num_classes,
                    activation='softmax',
                    kernel_initializer='glorot_uniform')(y) 

    # instantiate model.
    model = Model(inputs=inputs, outputs=outputs)
    return model

## 3. Training
### 3.1 **Optimizer**: **Adam** + lr scheduler
#### Learning rate scheduler

In [None]:
def lr_schedule(epoch):
    """Learning Rate Schedule

    Learning rate is scheduled to be reduced after 80, 120, 160, 180 epochs.
    Called automatically every epoch as part of callbacks during training.

    # Arguments
        epoch (int): index of the current epoch (starting at 0)

    # Returns
        lr (float32): learning rate
    """
    lr = 1e-3
    if epoch > 180:
        lr *= 0.5e-3
    elif epoch > 160:
        lr *= 1e-3
    elif epoch > 120:
        lr *= 1e-2
    elif epoch > 80:
        lr *= 1e-1
    
    return lr
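To see which learning rates the schedule actually yields, the sketch below restates the same thresholds in a standalone function and checks how far a run of a given length gets. Both `lr_at` and `last_reduction_reached` are hypothetical helpers for illustration. The second assumes that the epoch index passed to the scheduler starts at 0.

```python
def lr_at(epoch, base_lr=1e-3):
    """Hypothetical restatement of the schedule's thresholds."""
    if epoch > 180:
        factor = 0.5e-3
    elif epoch > 160:
        factor = 1e-3
    elif epoch > 120:
        factor = 1e-2
    elif epoch > 80:
        factor = 1e-1
    else:
        factor = 1.0
    return base_lr * factor


def last_reduction_reached(total_epochs):
    """Largest threshold crossed when epochs are indexed 0 .. total_epochs - 1."""
    crossed = [t for t in (80, 120, 160, 180) if total_epochs - 1 > t]
    return max(crossed) if crossed else None


for e in (0, 81, 121, 161, 181):
    print(e, lr_at(e))
print(last_reduction_reached(120), last_reduction_reached(200))
```

Under that assumption, a 120-epoch run only reaches the first reduction (after epoch 80), while a 200-epoch run goes through all four.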

- **Optimizer**: **Adam** + lr scheduler

In [None]:
if version == 2:
    model = resnet_v2(input_shape=input_shape, depth=depth)
else: 
    model = resnet_v1(input_shape=input_shape, depth=depth)

model.compile(loss='categorical_crossentropy',
              optimizer=Adam(learning_rate=lr_schedule(0)),
              metrics=['accuracy'])
model.summary()
print(model_type)

lr_scheduler = LearningRateScheduler(lr_schedule)

callbacks = [lr_scheduler]

In [None]:
model_arch_png = 'model_cifar10_%s.png' % model_type
keras.utils.plot_model(model, to_file=model_arch_png, show_shapes=True )

### 3.2 Train the designed model
- Data Augmentation
- Training time: about 1 h 30 min for 200 epochs, 91% accuracy on the test dataset

In [None]:
epochs = 2  # quick trial run; this overrides the value set in the configuration cell

In [None]:
# Run training, with or without data augmentation.
if not data_augmentation:
    print('Not using data augmentation.')
    start_time = datetime.now()    
    history = model.fit(x_train, y_train,
              batch_size=batch_size,
              epochs=epochs,
              validation_data=(x_test, y_test),
              shuffle=False, #  True if necessary 
              callbacks=callbacks)
    end_time = datetime.now()

else:
    print('Using real-time data augmentation.')
    # this will do preprocessing and realtime data augmentation:
    datagen = ImageDataGenerator(
        # set input mean to 0 over the dataset
        featurewise_center=False,
        # set each sample mean to 0
        samplewise_center=False,
        # divide inputs by std of dataset
        featurewise_std_normalization=False,
        # divide each input by its std
        samplewise_std_normalization=False,
        # apply ZCA whitening
        zca_whitening=False,
        # randomly rotate images in the range (deg 0 to 180)
        rotation_range=0,
        # randomly shift images horizontally
        width_shift_range=0.1,
        # randomly shift images vertically
        height_shift_range=0.1,
        # randomly flip images horizontally
        horizontal_flip=True,
        # randomly flip images vertically
        vertical_flip=False)

    # compute quantities required for featurewise normalization
    # (std, mean, and principal components if ZCA whitening is applied).
    datagen.fit(x_train)
    
    steps_per_epoch =  math.ceil(len(x_train) / batch_size)
    # fit the model on the batches generated by datagen.flow().
    start_time = datetime.now()    
    history = model.fit(x=datagen.flow(x_train, y_train, batch_size=batch_size),
              verbose=1,
              epochs=epochs,
              validation_data=(x_test, y_test),
              steps_per_epoch=steps_per_epoch,
              callbacks=callbacks)
    end_time = datetime.now()
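The `steps_per_epoch` value used above is plain ceiling division over the training set size. A minimal sketch for the 50,000 CIFAR-10 training images follows; the function names are illustrative only.

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Number of batches needed to see every sample once."""
    return math.ceil(num_samples / batch_size)


def last_batch_size(num_samples, batch_size):
    """Size of the final, possibly smaller, batch."""
    remainder = num_samples % batch_size
    return remainder if remainder else batch_size


print(steps_per_epoch(50000, 32), last_batch_size(50000, 32))
print(steps_per_epoch(50000, 128), last_batch_size(50000, 128))
```

With `batch_size = 32` the final batch of each epoch is only half full, which is why `math.ceil` rather than integer division is used.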

In [None]:
elapsed_time = end_time - start_time
print(f'Training time:{elapsed_time}')

## Plot the train history

In [None]:
# plot history.
def plot_history(h):
    epochs_ = len(h.history['loss'])
    ep = np.arange(epochs_)
    fig = plt.figure(figsize=(17,8))
    ax1 = fig.add_subplot(1,2,1)
    ax2 = fig.add_subplot(1,2,2)

    ax1.plot(ep, h.history['loss'], label='loss')
    ax1.plot(ep, h.history['val_loss'], label='val_loss')
    ax1.set_xlabel('Epoch')
    ax1.set_ylabel('Loss ')
    ax1.legend(bbox_to_anchor=(1,1))
    ax1.grid(True)

    ax2.plot(ep, h.history['accuracy'], label='accuracy')
    ax2.plot(ep, h.history['val_accuracy'], label='val_accuracy')
    ax2.set_xlabel('Epoch')
    ax2.set_ylabel('Accuracy')
    ax2.legend(bbox_to_anchor=(1,1))
    ax2.grid(True)

In [None]:
plot_history(history)

## Quantitative Evaluation
- Accuracy on test dataset

In [None]:
# Score trained model.
scores = model.evaluate(x_test, y_test, verbose=1)
print('Test loss:', scores[0])
print('Test accuracy:', scores[1])

### Generate Confusion Matrix

In [None]:
# Generate predictions for the test dataset.
predictions = model.predict(x_test)

# For each sample image in the test dataset, select the class label with the highest probability.
predicted_labels = [np.argmax(i) for i in predictions]

In [None]:
# Convert one-hot encoded labels to integers.
y_test_integer_labels = tf.argmax(y_test, axis=1)

# Generate a confusion matrix for the test dataset.
cm = tf.math.confusion_matrix(labels=y_test_integer_labels, predictions=predicted_labels)

# Plot the confusion matrix as a heatmap.
plt.figure(figsize=[12, 6])
import seaborn as sn
sn.heatmap(cm, annot=True, fmt='d', annot_kws={"size": 12})
plt.title('Confusion Matrix')
plt.xlabel('Predicted')
plt.ylabel('Truth')
plt.show()
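For intuition about what the heatmap shows, here is a plain-Python sketch of a confusion matrix with true labels as rows and predicted labels as columns, plus the per-class recall that can be read off each row. The toy labels and both helper functions are made up for this example.

```python
def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Count how often each true class (row) is predicted as each class (column)."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix


def per_class_recall(matrix):
    """Diagonal entry divided by the row sum, per class."""
    return [row[i] / sum(row) if sum(row) else 0.0
            for i, row in enumerate(matrix)]


truth = [0, 0, 1, 1, 2, 2]
preds = [0, 1, 1, 1, 2, 0]
cm_toy = confusion_matrix(truth, preds, 3)
recall = per_class_recall(cm_toy)
print(cm_toy)
print(recall)
```

Off-diagonal cells with large counts point to pairs of classes that the model tends to confuse.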

## Inference test

In [None]:
cifar10_categories=['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

In [None]:
## Run inference on new data
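# save an image from the cifar10 test set and load it back
image_id = 0 # index of the cifar10 test image
image_file = f'./cifar10_test{image_id}.png'
# x_test is already scaled to [0, 1] (and mean-subtracted if enabled), so undo
# that before saving; otherwise save_img min-max rescales the pixel values.
raw_img = x_test[image_id] + x_train_mean if subtract_pixel_mean else x_test[image_id]
keras.utils.save_img(image_file, np.round(raw_img * 255), scale=False)
img = keras.utils.load_img(image_file, target_size=(32,32))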

# image display
plt.imshow(img)
img_array = keras.utils.img_to_array(img)

# add a batch dimension: model.predict expects an array of images,
# here a batch containing a single image
img_array = np.expand_dims(img_array, axis=0)

# preprocessing the input image  
img_array = img_array.astype('float32') / 255

# If subtract pixel mean is enabled
if subtract_pixel_mean:
    img_array -= x_train_mean  # x_train_mean = np.mean(x_train, axis=0)

predictions = model.predict(img_array)
#score = predictions[0]

prediction = np.argmax(predictions[0]) # predicted class id : id for max scores

# report the class names and the probability the model assigns to the true class
print(f"ground truth_label = {cifar10_categories[np.argmax(y_test[image_id])]}, predicted_label = {cifar10_categories[prediction]}")
print(f"Probability of the true class = {np.dot(predictions[0],y_test[image_id])}")

## Qualitative Evaluation

In [None]:
def evaluate_model(x_dataset, y_label, model):

    class_names = ['airplane',
                   'automobile',
                   'bird',
                   'cat',
                   'deer',
                   'dog',
                   'frog',
                   'horse',
                   'ship',
                   'truck' ]
    num_rows = 3
    num_cols = 6
    
    # Retrieve a number of images from the dataset.
    data_batch = x_dataset[0:num_rows*num_cols]

    # Get predictions from model.  
    predictions = model.predict(data_batch)

    plt.figure(figsize=(20, 8))
    num_matches = 0
        
    if subtract_pixel_mean:
        # add the mean back to restore the [0, 1] range for display;
        # build a new array so that x_dataset itself is not modified in place
        data_batch = data_batch + x_train_mean

    for idx in range(num_rows*num_cols):
        ax = plt.subplot(num_rows, num_cols, idx + 1)
        plt.axis("off")
        plt.imshow(data_batch[idx])

        # class indices as plain ints, so the comparison below is unambiguous
        pred_idx = int(np.argmax(predictions[idx]))
        truth_idx = int(np.argmax(y_label[idx]))

        title = class_names[truth_idx] + " : " + class_names[pred_idx]
        title_obj = plt.title(title, fontdict={'fontsize':13})

        if pred_idx == truth_idx:
            num_matches += 1
            plt.setp(title_obj, color='g')
        else:
            plt.setp(title_obj, color='r')
                
        acc = num_matches/(idx+1)
    print("Prediction accuracy: ", int(100*acc)/100)
    
    return


In [None]:
evaluate_model(x_test, y_test, model)

In [None]:
# save the model to load the model later
model_file = 'cifar10_%s' % model_type
model.save(model_file) # tf format
#model.save('%s.h5' % model_file) # h5 format

In [None]:
# check if the saved model can be loaded 
reloaded_model = keras.models.load_model(model_file)
reloaded_model.summary()