In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tensorflow import keras
import tensorflow as tf
import h5py
import time

# The Identity Block

The identity block is the standard block used in ResNets, and corresponds to the case where the input activation (say $a^{[l]}$) has the same dimension as the output activation (say $a^{[l+2]}$). To flesh out the different steps of what happens in a ResNet's identity block, here is an alternative diagram showing the individual steps:

<div style="text-align: center;">
    <img src="images/idblock3_kiank.png" style="width:800px;height:200px;" alt="Identity Block Diagram">
</div>

The upper path is the "shortcut path." The lower path is the "main path." In this diagram, notice the CONV2D and ReLU steps in each layer. To speed up training, a BatchNorm step has been added. Don't worry about this being complicated to implement--you'll see that BatchNorm is just one line of code in Keras!
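As a rough numerical sketch of the two paths (not the Keras implementation used in this notebook), the main path can be reduced to two dense steps and the shortcut to a plain addition. The names `toy_identity_block`, `w1`, and `w2` are made up for this illustration, and BatchNorm is left out.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def toy_identity_block(a, w1, w2):
    # Main path: linear -> ReLU -> linear (BatchNorm omitted for brevity)
    z = relu(a @ w1) @ w2
    # Shortcut path: add the unchanged input back, then apply the final ReLU
    return relu(z + a)

a = np.array([[1.0, -2.0, 3.0]])
zero = np.zeros((3, 3))           # main path that contributes nothing
out = toy_identity_block(a, zero, zero)
print(out)
```

With the main-path weights at zero, the block simply returns the ReLU of its input, which hints at why a residual block can fall back to passing its input through.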

In this exercise, you'll actually implement a slightly more powerful version of this identity block, in which the skip connection "skips over" 3 hidden layers rather than 2 layers. It looks like this:

<div style="text-align: center;">
    <img src="images/idblock3_kiank.png" style="width:800px;height:200px;" alt="Extended Identity Block Diagram">
</div>


In [2]:
def identity_block(A0, n_filters_allConvLayers, filterSize_of_Middle):
    """
    Implements an identity block for a ResNet model.

    Arguments:
    A0 -- input tensor of shape (m, n_H, n_W, n_C0)
    n_filters_allConvLayers -- list of 3 integers, specifying the number of filters for each convolutional layer in the block. The last value must equal n_C0 so that the shortcut can be added.
    filterSize_of_Middle -- tuple of integers (f_H, f_W), specifying the height and width of the middle convolutional filter's window.

    Note: this function takes no `training` argument. The BatchNormalization layers follow the mode Keras selects at call time:
                - training mode (e.g. `fit`): the layer normalizes its inputs using the mean and variance of the current batch of inputs.
                - inference mode (e.g. `evaluate`, `predict`): the layer normalizes its inputs using the mean and variance of its moving statistics, learned during training.

    Returns:
    A3 -- output tensor of shape (m, n_H, n_W, n_C3 == n_C0 == n_filters of the Last ConvLayer)
    """

    initializer = keras.initializers.RandomUniform()
    c1, c2, c3 = n_filters_allConvLayers
    f_H, f_W = filterSize_of_Middle

    A0_shortcut = A0  # A0 shape=(m, n_H, n_W, n_C0)

    Z1 = keras.layers.Conv2D(filters=c1, kernel_size=(1,1), strides=(1,1), padding='valid', kernel_initializer=initializer)(A0)
    Z1_normed = keras.layers.BatchNormalization(axis=3)(Z1)
    A1 = keras.layers.ReLU()(Z1_normed)  # A1 shape=(m, n_H, n_W, n_C1)

    Z2 = keras.layers.Conv2D(filters=c2, kernel_size=(f_H, f_W), strides=(1,1), padding='same', kernel_initializer=initializer)(A1)
    Z2_normed = keras.layers.BatchNormalization(axis=3)(Z2)
    A2 = keras.layers.ReLU()(Z2_normed)  # A2 shape=(m, n_H, n_W, n_C2)

    Z3 = keras.layers.Conv2D(filters=c3, kernel_size=(1,1), strides=(1,1), padding='valid', kernel_initializer=initializer)(A2)
    Z3_normed = keras.layers.BatchNormalization(axis=3)(Z3)  # Z3 shape=(m, n_H, n_W, n_C3)

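    if A0_shortcut.shape[-1] != Z3_normed.shape[-1]:
        # Without this check the function would silently return None on a channel mismatch
        raise ValueError(
            f"identity_block needs c3 == n_C0, got c3={c3} and n_C0={A0_shortcut.shape[-1]}"
        )

    Z3_plus_A0 = keras.layers.Add()([Z3_normed, A0_shortcut])
    A3 = keras.layers.ReLU()(Z3_plus_A0)

    return A3  # A3 shape=(m, n_H, n_W, n_C3 == n_C0 == c3)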
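The two BatchNorm modes described in the docstring can be sketched in NumPy. This is a simplified illustration, not the Keras layer: the learned scale and offset are omitted, `batch_norm` is a helper invented here, and the moving statistics are set to the layer's initial values (mean 0, variance 1).

```python
import numpy as np

def batch_norm(x, moving_mean, moving_var, training, eps=1e-3):
    # x has shape (m, n_C); statistics are taken per channel (last axis)
    if training:
        mean, var = x.mean(axis=0), x.var(axis=0)   # current batch statistics
    else:
        mean, var = moving_mean, moving_var         # statistics accumulated so far
    return (x - mean) / np.sqrt(var + eps)

x = np.array([[10.0, 0.0], [14.0, 4.0]])
moving_mean = np.zeros(2)   # initial moving mean
moving_var = np.ones(2)     # initial moving variance

train_out = batch_norm(x, moving_mean, moving_var, training=True)
infer_out = batch_norm(x, moving_mean, moving_var, training=False)
print(train_out)
print(infer_out)
```

In training mode the output is centered around zero; in inference mode with untrained moving statistics the same input stays far from zero, so the two modes can give very different activations.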

# The Convolutional Block

The ResNet "convolutional block" is the second block type. You can use this type of block when the input and output dimensions don't match up. Unlike the identity block, it has a CONV2D layer in the shortcut path:

<div style="text-align: center;">
    <img src="images/convblock_kiank.png" style="width:800px;height:500px;">
</div>

* The CONV2D layer in the shortcut path is used to resize the input $x$ to a different dimension, so that the dimensions match up in the final addition needed to add the shortcut value back to the main path. (This plays a similar role as the matrix $W_s$ discussed in lecture.)
* For example, to reduce the activation's height and width by a factor of 2, you can use a 1x1 convolution with a stride of 2.
* The CONV2D layer on the shortcut path does not use any non-linear activation function. Its main role is to just apply a (learned) linear function that reduces the dimension of the input, so that the dimensions match up for the later addition step.
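The halving mentioned above follows from the usual output-size formula for a convolution without extra padding, floor((n + 2p - f) / s) + 1. A minimal sketch, where `conv_output_size` is a helper name chosen for this example:

```python
def conv_output_size(n, f, s, p=0):
    """Output length along one spatial axis for input n, filter f, stride s, padding p."""
    return (n + 2 * p - f) // s + 1

halved_even = conv_output_size(n=16, f=1, s=2)   # 1x1 conv, stride 2, even input
halved_odd = conv_output_size(n=15, f=1, s=2)    # odd input rounds up
unchanged = conv_output_size(n=15, f=1, s=1)     # stride 1 keeps the size

print(halved_even, halved_odd, unchanged)
```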

In [3]:
def conv_block(A0, n_filters_allConvLayers, filterSize_of_Middle, strides_of_First_Residual):
    """
    Implements a convolutional block for a ResNet model.

    Arguments:
    A0 -- input tensor of shape (m, n_H0, n_W0, n_C0)
    n_filters_allConvLayers -- list of 3 integers, specifying the number of filters for each convolutional layer on the main path. The shortcut convolution uses the last value.
    filterSize_of_Middle -- tuple of integers (f_H, f_W), specifying the height and width of the middle convolutional filter's window.
    strides_of_First_Residual -- tuple of integers (s_H, s_W), specifying the strides for the first convolutional layer and the shortcut convolution.

    Note: this function takes no `training` argument. The BatchNormalization layers follow the mode Keras selects at call time:
                - training mode (e.g. `fit`): the layer normalizes its inputs using the mean and variance of the current batch of inputs.
                - inference mode (e.g. `evaluate`, `predict`): the layer normalizes its inputs using the mean and variance of its moving statistics, learned during training.

    Returns:
    A3 -- output tensor of shape (m, n_H1, n_W1, n_C3 == n_filters of the Last and Residual ConvLayer)
    """

    initializer = keras.initializers.GlorotUniform()
    c1, c2, c3 = n_filters_allConvLayers
    f_H, f_W = filterSize_of_Middle
    s_H, s_W = strides_of_First_Residual

    A0_shortcut = A0  # A0 shape=(m, n_H0, n_W0, n_C0)

    Z1 = keras.layers.Conv2D(filters=c1, kernel_size=(1,1), strides=(s_H, s_W), padding='valid', kernel_initializer=initializer)(A0)
    Z1_normed = keras.layers.BatchNormalization(axis=3)(Z1)
    A1 = keras.layers.ReLU()(Z1_normed)  # A1 shape=(m, n_H1, n_W1, n_C1)

    Z2 = keras.layers.Conv2D(filters=c2, kernel_size=(f_H, f_W), strides=(1,1), padding='same', kernel_initializer=initializer)(A1)
    Z2_normed = keras.layers.BatchNormalization(axis=3)(Z2)
    A2 = keras.layers.ReLU()(Z2_normed)  # A2 shape=(m, n_H1, n_W1, n_C2)

    Z3 = keras.layers.Conv2D(filters=c3, kernel_size=(1,1), strides=(1,1), padding='valid', kernel_initializer=initializer)(A2)
    Z3_normed = keras.layers.BatchNormalization(axis=3)(Z3)  # Z3 shape=(m, n_H1, n_W1, n_C3)

    Z1_shortcut = keras.layers.Conv2D(filters=c3, kernel_size=(1,1), strides=(s_H, s_W), padding='valid', kernel_initializer=initializer)(A0)
    Z1_shortcut_normed = keras.layers.BatchNormalization(axis=3)(Z1_shortcut)  # Z1_shortcut shape=(m, n_H1, n_W1, n_C3)

    Z3_plus_Z1_shortcut = keras.layers.Add()([Z3_normed, Z1_shortcut_normed])  # Now they can be added because they have the same shape
    A3 = keras.layers.ReLU()(Z3_plus_Z1_shortcut)

    return A3  # A3 shape=(m, n_H1, n_W1, n_C3 == c3)
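To see why the shortcut convolution is just a learned linear map, note that a 1x1 convolution with stride 2 keeps every second pixel and mixes its channels with one matrix. The NumPy sketch below uses random stand-in weights `W` and made-up shapes; it is an illustration, not what `Conv2D` does internally.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 8, 8, 4))   # (m, n_H0, n_W0, n_C0)
W = rng.normal(size=(4, 16))        # 1x1 kernel mapping n_C0=4 channels to n_C3=16
b = np.zeros(16)

# Stride 2 with a 1x1 window: keep every second pixel, then mix channels linearly
shortcut = x[:, ::2, ::2, :] @ W + b
print(shortcut.shape)
```

The result has half the height and width and `n_C3` channels, which is the shape the main path produces, so the two can be added.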

# ResNet50

<div style="text-align: center;">
    <img src="images/resnet_kiank.png" alt="ResNet-50 Architecture" style="width:800px;height:200px;">
</div>

#### Input Stage:
- **Zero-Padding**: Pads the input with a pad of `(3, 3)`

#### Stage 1:
- **2D Convolution**:
    - 64 filters of size `(7, 7)`
    - Stride: `(2, 2)`
- **BatchNorm**: Applied along the channels axis
- **ReLU**: Applied after BatchNorm
- **MaxPooling**:
    - Window size: `(3, 3)`
    - Stride: `(2, 2)`

#### Stage 2:
- **Convolutional Block**:
  - Filters of 4 Conv Layers (Last = Residual): `[64, 64, 256]`
  - Filter size of Middle Layer: `(3, 3)`
  - Stride of First and Residual Layers: `(1, 1)`
- **2 Identity Blocks**:
  - Filters of 3 Conv Layers: `[64, 64, 256]`
  - Filter size of Middle Layer: `(3, 3)`

#### Stage 3:
- **Convolutional Block**:
  - Filters of 4 Conv Layers (Last = Residual): `[128, 128, 512]`
  - Filter size of Middle Layer: `(3, 3)`
  - Stride of First and Residual Layers: `(2, 2)`
- **3 Identity Blocks**:
  - Filters of 3 Conv Layers: `[128, 128, 512]`
  - Filter size of Middle Layer: `(3, 3)`

#### Stage 4:
- **Convolutional Block**:
  - Filters of 4 Conv Layers (Last = Residual): `[256, 256, 1024]`
  - Filter size of Middle Layer: `(3, 3)`
  - Stride of First and Residual Layers: `(2, 2)`
- **5 Identity Blocks**:
  - Filters of 3 Conv Layers: `[256, 256, 1024]`
  - Filter size of Middle Layer: `(3, 3)`

#### Stage 5:
- **Convolutional Block**:
  - Filters of 4 Conv Layers (Last = Residual): `[512, 512, 2048]`
  - Filter size of Middle Layer: `(3, 3)`
  - Stride of First and Residual Layers: `(2, 2)`
- **2 Identity Blocks**:
  - Filters of 3 Conv Layers: `[512, 512, 2048]`
  - Filter size of Middle Layer: `(3, 3)`

#### Output Stage:
- **2D Average Pooling**:
    - Pool size: `(2, 2)`
    - Stride: `(2, 2)`
- **Flatten Layer**:
- **Fully Connected (Dense) Layer**:
  - Reduces input to the number of classes
  - Uses `softmax` activation
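The stage list above can be turned into a quick shape trace. This sketch assumes a 64x64 input and `'valid'` padding for the Stage 1 convolution and both pooling layers; `out_size` is a helper introduced for this example.

```python
def out_size(n, f, s, p=0):
    """Output length along one spatial axis (floor convention)."""
    return (n + 2 * p - f) // s + 1

trace = []
n = out_size(64, f=7, s=2, p=3)   # zero-padding of 3, then the 7x7 stride-2 conv
trace.append(("stage 1 conv", n, 64))
n = out_size(n, f=3, s=2)         # 3x3 max pooling, stride 2
trace.append(("stage 1 pool", n, 64))

# Each stage's spatial size is set by the stride of its convolutional block
stages = [(2, 1, 256), (3, 2, 512), (4, 2, 1024), (5, 2, 2048)]
for stage, stride, channels in stages:
    n = out_size(n, f=1, s=stride)
    trace.append((f"stage {stage}", n, channels))

n = out_size(n, f=2, s=2)         # 2x2 average pooling, stride 2
trace.append(("avg pool", n, 2048))
flattened = n * n * 2048          # length of the vector fed to the Dense layer

for name, size, channels in trace:
    print(f"{name:>13}: {size} x {size} x {channels}")
print("flattened:", flattened)
```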

In [4]:
def ResNet50(input_shape, n_classes):
    initializer = keras.initializers.GlorotUniform()

    # Input Stage
    input = keras.Input(input_shape)
    input_padded = keras.layers.ZeroPadding2D(padding=(3,3))(input)

    # Stage 1
    Z1 = keras.layers.Conv2D(filters=64, kernel_size=(7,7), strides=(2,2), kernel_initializer=initializer)(input_padded)
    Z1_normed = keras.layers.BatchNormalization(axis=3)(Z1)
    A1 = keras.layers.ReLU()(Z1_normed)
    P1 = keras.layers.MaxPooling2D(pool_size=(3,3), strides=(2,2))(A1)

    # Stage 2
    filters_Stage_2 = [64, 64, 256]
    A = conv_block(P1, filters_Stage_2, filterSize_of_Middle=(3,3), strides_of_First_Residual=(1,1))
    for _ in range(2):
        A = identity_block(A, filters_Stage_2, filterSize_of_Middle=(3,3))

    # Stage 3
    filters_Stage_3 = [128, 128, 512]
    A = conv_block(A, filters_Stage_3, filterSize_of_Middle=(3,3), strides_of_First_Residual=(2,2))
    for _ in range(3):
        A = identity_block(A, filters_Stage_3, filterSize_of_Middle=(3,3))

    # Stage 4
    filters_Stage_4 = [256, 256, 1024]
    A = conv_block(A, filters_Stage_4, filterSize_of_Middle=(3,3), strides_of_First_Residual=(2,2))
    for _ in range(5):
        A = identity_block(A, filters_Stage_4, filterSize_of_Middle=(3,3))

    # Stage 5
    filters_Stage_5 = [512, 512, 2048]
    A = conv_block(A, filters_Stage_5, filterSize_of_Middle=(3,3), strides_of_First_Residual=(2,2))
    for _ in range(2):
        A = identity_block(A, filters_Stage_5, filterSize_of_Middle=(3,3))

    # Output Stage
    P_output = keras.layers.AveragePooling2D(pool_size=(2,2), strides=(2,2))(A)
    A_flattened = keras.layers.Flatten()(P_output)
    output = keras.layers.Dense(units=n_classes, activation='softmax', kernel_initializer=initializer)(A_flattened)

    # MODEL
    model = keras.models.Model(inputs=input, outputs=output)
    return model
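The "50" in the name can be recovered from the loops above, under the usual counting convention (assumed here) that only the main-path convolutions, the Stage 1 convolution, and the final Dense layer are counted, not the shortcut convolutions.

```python
# Number of identity blocks per stage, as in the loops of ResNet50()
identity_blocks_per_stage = {2: 2, 3: 3, 4: 5, 5: 2}
convs_per_block = 3   # main-path convolutions in each block

# Every stage has one convolutional block plus its identity blocks
n_blocks = sum(1 + n_id for n_id in identity_blocks_per_stage.values())

# Stage 1 convolution + all block convolutions + final Dense layer
n_weight_layers = 1 + n_blocks * convs_per_block + 1
print(n_blocks, n_weight_layers)
```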

In [5]:
input_shape = (64, 64, 3)
n_classes = 6

resnet50 = ResNet50(input_shape, n_classes)
resnet50.summary()
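When reading the summary, the per-layer parameter counts can be checked by hand. The sketch below does this for one Stage 2 identity block (256 input channels, filters `[64, 64, 256]`), assuming each convolution has a bias and each BatchNorm layer holds four values per channel (scale, offset, moving mean, moving variance). The helper names are made up for this example.

```python
def conv2d_params(f_h, f_w, c_in, c_out):
    # One weight per (kernel position, input channel, output channel) plus one bias per filter
    return f_h * f_w * c_in * c_out + c_out

def batchnorm_params(channels):
    # gamma, beta, moving mean, moving variance
    return 4 * channels

c_in = 256
c1, c2, c3 = 64, 64, 256

conv_total = (
    conv2d_params(1, 1, c_in, c1)
    + conv2d_params(3, 3, c1, c2)
    + conv2d_params(1, 1, c2, c3)
)
bn_total = sum(batchnorm_params(c) for c in (c1, c2, c3))
print(conv_total, bn_total, conv_total + bn_total)
```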

In [6]:
#opt = keras.optimizers.Adam(learning_rate=0.00015)
resnet50.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)
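For intuition about the loss values that follow, categorical cross-entropy for one example is the negative log of the probability assigned to the true class. The plain-Python sketch below (not the Keras implementation) computes it for a model that guesses uniformly over the 6 classes.

```python
import math

def categorical_crossentropy(y_true, y_pred):
    # Only the true class (where y_true is 1) contributes to the sum
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)

n_classes = 6
y_true = [0, 0, 1, 0, 0, 0]               # one-hot label
uniform = [1 / n_classes] * n_classes     # a model with no information
chance_loss = categorical_crossentropy(y_true, uniform)
print(round(chance_loss, 4))
```

This chance-level loss is ln 6, about 1.79, which is a useful baseline when judging the first-epoch loss.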

# Now, It's Time for Training

In [7]:
def load_dataset(trainDataset_path, testDataset_path):
    # Open both HDF5 files read-only; the context managers close them once the data is copied to NumPy
    with h5py.File(trainDataset_path, 'r') as train_dataset:
        train_X = np.array(train_dataset['train_set_x'])
        train_Y = np.array(train_dataset['train_set_y'])
        classes = np.array(train_dataset['list_classes'])

    with h5py.File(testDataset_path, 'r') as test_dataset:
        test_X = np.array(test_dataset['test_set_x'])
        test_Y = np.array(test_dataset['test_set_y'])

    return train_X, train_Y, test_X, test_Y, classes

In [8]:
train_X, train_Y, test_X, test_Y, classes = load_dataset('train_signs.h5', 'test_signs.h5')
classes

array([0, 1, 2, 3, 4, 5])

In [9]:
X_train = train_X / 255
X_test =  test_X / 255

Y_train = keras.utils.to_categorical(train_Y, num_classes=6)
Y_test = keras.utils.to_categorical(test_Y, num_classes=6)

print ("X_train shape: " + str(X_train.shape))
print ("Y_train shape: " + str(Y_train.shape))
print ("X_test shape: " + str(X_test.shape))
print ("Y_test shape: " + str(Y_test.shape))

X_train shape: (1080, 64, 64, 3)
Y_train shape: (1080, 6)
X_test shape: (120, 64, 64, 3)
Y_test shape: (120, 6)
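The label shapes above come from one-hot encoding: each integer label becomes a row with a single 1. A NumPy sketch of the same effect for a 1-D label vector, where `one_hot` is a helper written for this example rather than the Keras function:

```python
import numpy as np

def one_hot(labels, num_classes):
    # Row i of the identity matrix is the one-hot vector for class i
    return np.eye(num_classes, dtype="float32")[np.asarray(labels)]

encoded = one_hot([0, 5, 2], num_classes=6)
print(encoded.shape)
print(encoded)
```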


In [10]:
start_time = time.time()
resnet50.fit(X_train, Y_train, epochs=10, batch_size=32)
end_time = time.time()

elapsed_time = end_time - start_time
print('-'*100)
print(f"Time taken: {elapsed_time:.2f} seconds")

Epoch 1/10
34/34 ━━━━━━━━━━━━━━━━━━━━ 83s 744ms/step - accuracy: 0.3342 - loss: 2.3717
Epoch 2/10
34/34 ━━━━━━━━━━━━━━━━━━━━ 2s 48ms/step - accuracy: 0.7946 - loss: 0.5715
Epoch 3/10
34/34 ━━━━━━━━━━━━━━━━━━━━ 2s 47ms/step - accuracy: 0.9045 - loss: 0.2727
Epoch 4/10
34/34 ━━━━━━━━━━━━━━━━━━━━ 2s 48ms/step - accuracy: 0.8974 - loss: 0.2908
Epoch 5/10
34/34 ━━━━━━━━━━━━━━━━━━━━ 2s 54ms/step - accuracy: 0.8709 - loss: 0.4258
Epoch 6/10
34/34 ━━━━━━━━━━━━━━━━━━━━ 2s 50ms/step - accuracy: 0.9589 - loss: 0.1277
Epoch 7/10
34/34 ━━━━━━━━━━━━━━━━━━━━ 2s 47ms/step - accuracy: 0.9535 - loss: 0.1357
Epoch 8/10
34/34 ━━━━━━━━━━━━━━━━━━━━ 2s 48ms/step - accuracy: 0.9400 - loss: 0.1581
Epoch 9/10
34/34 ━━━━━━━━━━━━━━━━

**Expected Output**:

```
Epoch 1/10
34/34 [==============================] - 16s 64ms/step - loss: 1.7770 - accuracy: 0.3111
Epoch 2/10
34/34 [==============================] - 2s 50ms/step - loss: 1.1800 - accuracy: 0.5583
Epoch 3/10
34/34 [==============================] - 2s 51ms/step - loss: 0.7900 - accuracy: 0.6935
Epoch 4/10
34/34 [==============================] - 2s 50ms/step - loss: 0.5295 - accuracy: 0.8065
Epoch 5/10
34/34 [==============================] - 2s 50ms/step - loss: 0.3665 - accuracy: 0.8648
Epoch 6/10
34/34 [==============================] - 2s 50ms/step - loss: 0.3032 - accuracy: 0.8880
Epoch 7/10
34/34 [==============================] - 2s 51ms/step - loss: 0.2456 - accuracy: 0.9194
Epoch 8/10
34/34 [==============================] - 2s 51ms/step - loss: 0.2123 - accuracy: 0.9278
Epoch 9/10
34/34 [==============================] - 2s 50ms/step - loss: 0.2113 - accuracy: 0.9389
Epoch 10/10
34/34 [==============================] - 2s 50ms/step - loss: 0.1469 - accuracy: 0.9491
```
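The `34/34` in the logs is the number of batches per epoch, which follows from the dataset size and the batch size. A small sketch, assuming the last partial batch is kept and that `evaluate` runs with a batch size of 32:

```python
import math

def steps_per_epoch(n_samples, batch_size):
    # A final, smaller batch still counts as one step
    return math.ceil(n_samples / batch_size)

train_steps = steps_per_epoch(1080, 32)
last_batch = 1080 - (train_steps - 1) * 32   # size of the final training batch
eval_steps = steps_per_epoch(120, 32)
print(train_steps, last_batch, eval_steps)
```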

In [11]:
resnet50.evaluate(X_test, Y_test)

4/4 ━━━━━━━━━━━━━━━━━━━━ 6s 665ms/step - accuracy: 0.2113 - loss: 5.0561


[5.138160228729248, 0.20000000298023224]

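<span style="color: red; font-weight: bold; font-style: italic;">
Something is wrong here: the test accuracy (0.20) is far below both the training accuracy (above 0.9 by epoch 8) and the expected value, and I don't know how to fix it yet.
</span>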
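One thing that may be worth checking (a guess, not a confirmed diagnosis of the result above) is whether the BatchNorm moving statistics used by `evaluate` have caught up with the batch statistics used during `fit`. The sketch assumes the Keras default momentum of 0.99 and the update rule `moving = momentum * moving + (1 - momentum) * batch`; `moving_stat` is a helper invented here, and the constant batch value of 1.0 is purely illustrative.

```python
def moving_stat(batch_values, momentum=0.99, initial=0.0):
    # Exponential moving average, updated once per training step
    moving = initial
    for value in batch_values:
        moving = momentum * moving + (1.0 - momentum) * value
    return moving

steps_per_epoch = 34
after_one_epoch = moving_stat([1.0] * steps_per_epoch)
after_ten_epochs = moving_stat([1.0] * (steps_per_epoch * 10))
print(round(after_one_epoch, 3), round(after_ten_epochs, 3))
```

With a fixed batch statistic, the moving value covers roughly 29% of the gap after one epoch and roughly 97% after ten. During real training the batch statistics keep shifting as the weights change, so the lag can be larger. Comparing `resnet50(X_test[:32], training=True)` with `training=False` would be a more direct test of this hypothesis.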



**Expected Output**:

<table>
    <tr>
        <td>
            <b>Test Accuracy</b>
        </td>
        <td>
           >0.70
        </td>
    </tr>

</table>

**What you should remember**:

- Very deep "plain" networks don't work in practice because vanishing gradients make them hard to train.
- Skip connections help address the vanishing gradient problem. They also make it easy for a ResNet block to learn an identity function.
- There are two main types of blocks: The **identity block** and the **convolutional block**.
- Very deep Residual Networks are built by stacking these blocks together.

# Bibliography

This notebook presents the ResNet algorithm from He et al. (2015). The implementation also takes significant inspiration from, and follows the structure of, Francois Chollet's GitHub repository:

- Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun - [Deep Residual Learning for Image Recognition (2015)](https://arxiv.org/abs/1512.03385)
- Francois Chollet's GitHub repository: https://github.com/fchollet/deep-learning-models/blob/master/resnet50.py
