# Lab: Residual Networks (ResNet)

In this assignment, you will build a very deep convolutional network known as a Residual Network (ResNet).

In [None]:
import tensorflow as tf
import numpy as np
import scipy.misc
from tensorflow.keras.applications.resnet_v2 import ResNet50V2
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.resnet_v2 import preprocess_input, decode_predictions
from tensorflow.keras import layers
from tensorflow.keras.layers import (Input, Add, Dense, Activation, ZeroPadding2D,
                                     BatchNormalization, Flatten, Conv2D,
                                     AveragePooling2D, MaxPooling2D,
                                     GlobalMaxPooling2D)

from tensorflow.keras.models import Model, load_model
from resnets_utils import *
from tensorflow.keras.initializers import random_uniform, glorot_uniform, constant, identity
from tensorflow.python.framework.ops import EagerTensor
from matplotlib.pyplot import imshow

%matplotlib inline

##  The Problem of Very Deep Neural Networks

In recent years, neural networks have become much deeper, evolving from having just a few layers (e.g., AlexNet) to over a hundred layers.

* The main benefit of a very deep network is that it can represent very complex functions. It can also learn features at many different levels of abstraction, from edges (at the shallower layers, closer to the input) to very complex features (at the deeper layers, closer to the output). 

* But in practice, they are hard to train.  A huge barrier to training them is vanishing gradients: very deep networks often have a gradient signal that goes to zero quickly, thus making gradient descent very slow.

* More specifically, during gradient descent, as you backpropagate from the final layer back to the first layer, the gradient can decrease exponentially quickly to zero (or, in rare cases, grow exponentially quickly and "explode," from gaining very large values). 

* During training, you might see the magnitude (or norm) of the gradient for the shallower (earlier) layers decrease to zero very rapidly as training proceeds, as shown below:

<img src="images/vanishing_grad_kiank.png" style="width:400px;height:200px;">
<caption><center> <u> <font color='purple'> <b>Figure 1</b> </u><font color='purple'>  : <b>Vanishing gradient</b> <br> The speed of learning decreases very rapidly for the shallower layers as the network trains </center></caption>
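To see why the gradient can shrink exponentially, here is a minimal NumPy sketch (not part of the assignment) of a toy scalar "network" in which each layer computes $a^{[l+1]} = \sigma(w\,a^{[l]})$. By the chain rule, the gradient of the output with respect to the input is the product of the per-layer derivatives $\sigma'(z)\,w$. With the illustrative choice $w = 1$, each factor is at most 0.25, so the product shrinks at least as fast as $0.25^{L}$ for $L$ layers. The function name, the sigmoid activation, and the starting value are assumptions made for the illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def plain_chain_gradient(depth, w=1.0, a0=0.5):
    """d a^[depth] / d a^[0] for the toy chain a^[l+1] = sigmoid(w * a^[l])."""
    a = a0
    grad = 1.0
    for _ in range(depth):
        z = w * a
        a = sigmoid(z)
        # Local derivative: d a^[l+1] / d a^[l] = sigmoid'(z) * w = a * (1 - a) * w
        grad *= a * (1.0 - a) * w
    return grad


for depth in (2, 10, 50):
    print(f"depth={depth:2d}  gradient={plain_chain_gradient(depth):.3e}")
```

The printed gradient drops by many orders of magnitude as the depth grows, which is the behavior Figure 1 illustrates for the shallower layers.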



##  Building a Residual Network

Residual Networks, introduced by [He et al.](https://arxiv.org/pdf/1512.03385.pdf), make it possible to train much deeper networks.

In ResNets, a "shortcut" or a "skip connection" allows the model to skip layers:  

<img src="images/skip_connection_kiank.png" style="width:300px;height:120px;">
<caption><center> <u> <font color='purple'> <b>Figure 2</b> </u><font color='purple'>  : A ResNet block showing a skip-connection <br> </center></caption>

The image on the left shows the "main path" through the network. The image on the right adds a shortcut to the main path. By stacking these ResNet blocks, a very deep network can be formed. 
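A minimal sketch of why the shortcut helps, using the same kind of toy scalar chain as a plain network but with a skip connection: each layer now computes $a^{[l+1]} = a^{[l]} + \sigma(w\,a^{[l]})$. The identity path contributes a constant $1$ to every local derivative, so (for $w \ge 0$) each factor is at least 1 and the product cannot vanish. This is an illustrative assumption-laden toy, not an actual ResNet: in a real block the skipped part is several CONV/BatchNorm layers, but the shortcut adds the same identity term to the gradient.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def residual_chain_gradient(depth, w=1.0, a0=0.5):
    """d a^[depth] / d a^[0] for the toy chain a^[l+1] = a^[l] + sigmoid(w * a^[l])."""
    a = a0
    grad = 1.0
    for _ in range(depth):
        z = w * a
        s = sigmoid(z)
        a = a + s  # skip connection: the input is added back to the layer's output
        # Local derivative: 1 (from the identity shortcut) + sigmoid'(z) * w
        grad *= 1.0 + s * (1.0 - s) * w
    return grad


for depth in (2, 10, 50):
    print(f"depth={depth:2d}  gradient={residual_chain_gradient(depth):.3f}")
```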

Two main types of blocks are used in a ResNet, depending on whether the input and output dimensions are the same or different: the "identity block" and the "convolutional block."

### Identity Block

The identity block corresponds to the case where the input activation (say $a^{[l]}$) has the same dimension as the output activation (say $a^{[l+2]}$) (Fig.3). 

<img src="images/idblock2_kiank.png" style="width:600px;height:120px;">
<caption><center> <u> <font color='purple'> <b>Figure 3</b> </u><font color='purple'>  : <b>Identity block.</b> Skip connection "skips over" 2 layers. </center></caption>

The upper path is the "shortcut path." The lower path is the "main path."  To speed up training, a BatchNorm step has been added. 

In this exercise, you'll implement an identity block in which the skip connection "skips over" 3 hidden layers rather than 2. It is shown in Fig. 4:

<img src="images/idblock3_kiank.png" style="width:600px;height:120px;">
<caption><center> <u> <font color='purple'> <b>Figure 4</b> </u><font color='purple'>  : <b>Identity block.</b> Skip connection "skips over" 3 layers.</center></caption>
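In the notation above, the two-layer identity block of Fig. 3 computes $a^{[l+2]} = g\big(z^{[l+2]} + a^{[l]}\big)$, where $g$ is the ReLU function. As a sketch, the three-layer block of Fig. 4 extends this as follows, writing $\mathrm{BN}$ for BatchNorm and $*$ for convolution, and omitting biases for brevity:

$$
\begin{aligned}
a^{[l+1]} &= g\big(\mathrm{BN}(W^{[l+1]} * a^{[l]})\big) && \text{1x1 conv, } F_1 \text{ filters}\\
a^{[l+2]} &= g\big(\mathrm{BN}(W^{[l+2]} * a^{[l+1]})\big) && f \times f \text{ conv, } F_2 \text{ filters}\\
z^{[l+3]} &= \mathrm{BN}(W^{[l+3]} * a^{[l+2]}) && \text{1x1 conv, } F_3 \text{ filters, no ReLU}\\
a^{[l+3]} &= g\big(z^{[l+3]} + a^{[l]}\big) && \text{shortcut added before the final ReLU}
\end{aligned}
$$

Because $a^{[l]}$ is added element-wise to $z^{[l+3]}$, the two must have the same shape; in particular, $F_3$ must equal the number of input channels $n_{C_{prev}}$.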

### Ex. 1 - ResNet identity_block

- To implement the Conv2D step: [Conv2D](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2D)
- To implement BatchNorm: [BatchNormalization](https://www.tensorflow.org/api_docs/python/tf/keras/layers/BatchNormalization) `BatchNormalization(axis = 3)(X, training = training)`. When `training=False`, the layer runs in inference (prediction) mode: it normalizes with its stored moving mean and variance instead of the statistics of the current batch, and it does not update those moving statistics.
- For the activation, use:  `Activation('relu')(X)`
- To add the value passed forward by the shortcut: [Add](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Add)

The `initializer` argument defaults to `random_uniform`.
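To make the `training` flag concrete, here is a simplified NumPy model of what batch normalization does per channel in each mode. `ToyBatchNorm` is a hypothetical teaching class, not the Keras implementation; only its `momentum=0.99` and `epsilon=1e-3` defaults are chosen to match `tf.keras.layers.BatchNormalization`. In training mode it normalizes with the current batch's statistics and updates the moving averages; in inference mode it uses the stored moving averages and updates nothing.

```python
import numpy as np


class ToyBatchNorm:
    """Per-channel batch normalization over the last axis (like axis=3 for NHWC)."""

    def __init__(self, n_channels, momentum=0.99, epsilon=1e-3):
        self.gamma = np.ones(n_channels)    # learned scale
        self.beta = np.zeros(n_channels)    # learned shift
        self.moving_mean = np.zeros(n_channels)
        self.moving_var = np.ones(n_channels)
        self.momentum = momentum
        self.epsilon = epsilon

    def __call__(self, x, training):
        axes = tuple(range(x.ndim - 1))  # every axis except the channels axis
        if training:
            # Training mode: normalize with the statistics of this batch ...
            mean, var = x.mean(axis=axes), x.var(axis=axes)
            # ... and update the moving averages used later at inference time
            m = self.momentum
            self.moving_mean = m * self.moving_mean + (1 - m) * mean
            self.moving_var = m * self.moving_var + (1 - m) * var
        else:
            # Inference mode: use the stored moving statistics, update nothing
            mean, var = self.moving_mean, self.moving_var
        x_hat = (x - mean) / np.sqrt(var + self.epsilon)
        return self.gamma * x_hat + self.beta


rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=(8, 4, 4, 3))  # (m, n_H, n_W, n_C)
bn = ToyBatchNorm(n_channels=3)

y_train = bn(x, training=True)
print("per-channel mean (training):", y_train.mean(axis=(0, 1, 2)).round(3))

before = bn.moving_mean.copy()
y_infer = bn(x, training=False)
print("moving mean unchanged at inference:", np.allclose(before, bn.moving_mean))
```

After a single training step the moving averages have barely moved away from their initial values, so the inference-mode output is far from zero-mean; this is why a BatchNorm model needs enough training before its prediction-mode outputs become meaningful.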

These are the individual steps:

**First component of main path:**
- CONV2D has $F_1$ filters of shape (1,1), a stride of (1,1), and "valid" padding. Use `kernel_initializer = initializer(seed=0)`.
- BatchNorm normalizes the 'channels' axis.
- Apply the ReLU activation function. It has no hyperparameters.

**Second component of main path:**
- CONV2D has $F_2$ filters of shape $(f,f)$, a stride of (1,1), and "same" padding. Use `kernel_initializer = initializer(seed=0)`.
- BatchNorm normalizes the 'channels' axis.
- Apply the ReLU activation function.

**Third component of main path:**
- CONV2D has $F_3$ filters of shape (1,1), a stride of (1,1), and "valid" padding. Use `kernel_initializer = initializer(seed=0)`.
- BatchNorm normalizes the 'channels' axis.
- There is **no** ReLU activation function in this component.

**Final step:**
- `X_shortcut` and the output `X` of the third component are added together.
- **Hint**: The syntax will look something like `Add()([var1,var2])`
- Apply the ReLU activation function.

The first component of the main path has been implemented. Implement the rest. 

Use `seed=0` for the random uniform initialization so that your results match the expected answers.
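The first and third components use 1x1 convolutions. A 1x1 kernel sees only one pixel at a time, so with stride 1 it is just a matrix product over the channel axis at every (example, row, column) position: height and width stay the same and only the number of channels changes. The NumPy sketch below (the function `conv1x1` and the random weights are illustrative assumptions; Keras stores this kernel with shape `(1, 1, n_C_prev, F)`) checks this against an explicit loop:

```python
import numpy as np


def conv1x1(x, w, b):
    """1x1 convolution, stride 1, for NHWC input.

    x: (m, n_H, n_W, n_C_prev), w: (n_C_prev, n_C), b: (n_C,)
    The output at each (example, row, column) is x[...] @ w + b.
    """
    return np.einsum("mhwc,cd->mhwd", x, w) + b


rng = np.random.default_rng(0)
x = rng.normal(size=(2, 5, 5, 8))   # n_C_prev = 8
w = rng.normal(size=(8, 4))         # F = 4 filters of shape (1, 1)
b = np.zeros(4)

y = conv1x1(x, w, b)
print(y.shape)  # (2, 5, 5, 4): height and width unchanged, channels 8 -> 4

# Same result with an explicit loop over every pixel
y_loop = np.zeros_like(y)
for i in range(x.shape[0]):
    for row in range(x.shape[1]):
        for col in range(x.shape[2]):
            y_loop[i, row, col] = x[i, row, col] @ w + b
print(np.allclose(y, y_loop))  # True
```

This is also why, in the identity block, the third component's $F_3$ must match the input's channel count: it is the only layer that sets the channel dimension that gets added to `X_shortcut`.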


In [None]:
# Identity_block as skip connection

def identity_block(X, f, filters, training=True, initializer=random_uniform):
    """
    Implementation of the identity block as defined in Fig. 4
    
    Arguments:
    X -- input tensor of shape (m, n_H_prev, n_W_prev, n_C_prev)
    f -- integer, the height/width of the filters in the second
         CONV2D component of the main path
    filters -- python list of integers [F1, F2, F3], the number of
               filters in each CONV layer of the main path
    training -- True: behave in training mode
                False: behave in inference mode
    initializer -- initializer for the layer weights;
                   defaults to the random uniform initializer

    Returns:
    X -- output of the identity block, tensor of shape (m, n_H, n_W, n_C)
    """
    """
    
    # Retrieve Filters
    F1, F2, F3 = filters
    
    # Save the input value.
    # You'll need this later to add back to the main path.
    X_shortcut = X

    # First component of main path
    X = Conv2D(filters = F1, kernel_size = 1, strides = (1,1), 
        padding = 'valid',kernel_initializer = initializer(seed=0))(X)
    X = BatchNormalization(axis = 3)(X, training = training) 
    X = Activation('relu')(X)
    
    ## Second component of main path
    X = ?

    ## Third component of main path 
    X = ?
    
    # Final step: add the shortcut to the main path,
    # then pass the result through a ReLU activation
    X = Add()([X_shortcut, X])
    X = Activation('relu')(X)

    return X

### Convolutional Block

The ResNet "convolutional block" is the second block type. You can use this type of block when the input and output dimensions don't match up. The difference from the identity block is that there is a CONV2D layer in the shortcut path:

<img src="images/convblock_kiank.png" style="width:600px;height:120px;">
<caption><center> <u> <font color='purple'> <b>Figure 5</b> </u><font color='purple'>  : <b>Convolutional block</b> </center></caption>

* The CONV2D layer in the shortcut path resizes the input $x$ so that its dimensions match the output of the main path and the two can be added.
* For example, to reduce the activation's height and width by a factor of 2, you can use a 1x1 convolution with a stride of 2.
* The `initializer` argument defaults to `glorot_uniform`.
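With a 1x1 kernel and "valid" padding, a stride of 2 simply keeps every second row and column and then mixes channels, so the output height/width follow the usual formula $\lfloor (n - f)/s \rfloor + 1$ with $f = 1$. A minimal NumPy sketch under those assumptions (the helper names and the 15x15x256 example activation are illustrative, not from the assignment):

```python
import numpy as np


def conv1x1_strided(x, w, s):
    """1x1 convolution with stride s and 'valid' padding on NHWC input.

    With a 1x1 kernel, stride s keeps every s-th row and column,
    then mixes channels with a matrix product.
    """
    return np.einsum("mhwc,cd->mhwd", x[:, ::s, ::s, :], w)


def valid_out_size(n, f, s):
    # Output length of a 'valid' convolution: floor((n - f) / s) + 1
    return (n - f) // s + 1


rng = np.random.default_rng(0)
x = rng.normal(size=(1, 15, 15, 256))   # illustrative activation
w = rng.normal(size=(256, 512))         # F_3 = 512 filters of shape (1, 1)

y = conv1x1_strided(x, w, s=2)
print(y.shape)                          # (1, 8, 8, 512)
print(valid_out_size(15, f=1, s=2))     # 8
```

In the convolutional block, the first main-path CONV and the shortcut CONV both use this 1x1, stride-$s$ setup, while the remaining main-path convolutions keep the height and width fixed, so both paths end with the same shape and can be added.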

### Ex. 2 - ResNet convolutional_block
    
**First component of main path:**
- CONV2D has $F_1$ filters of shape (1,1), a stride of (s,s), and "valid" padding. Use `kernel_initializer = initializer(seed=0)`.
- BatchNorm normalizes the 'channels' axis.
- Apply the ReLU activation function.

**Second component of main path:**
- CONV2D has $F_2$ filters of shape (f,f), a stride of (1,1), and "same" padding. Use `kernel_initializer = initializer(seed=0)`.
- BatchNorm normalizes the 'channels' axis.
- Apply the ReLU activation function.

**Third component of main path:**
- CONV2D has $F_3$ filters of shape (1,1), a stride of (1,1), and "valid" padding. Use `kernel_initializer = initializer(seed=0)`.
- BatchNorm normalizes the 'channels' axis.
- There is **no** ReLU activation function in this component.

**Shortcut path:**
- CONV2D has $F_3$ filters of shape (1,1), a stride of (s,s), and "valid" padding. Use `kernel_initializer = initializer(seed=0)`.
- BatchNorm normalizes the 'channels' axis.

**Final step:**
- The shortcut and the main path values are added together.
- Apply the ReLU activation function.

The first component of the main path is already implemented. Implement the rest.

In [None]:
# Convolutional_block as skip connection

def convolutional_block(X, f, filters, s = 2, training=True, 
                        initializer=glorot_uniform):
    """
    Implementation of the convolutional block as defined in Fig. 5
    
    Arguments:
    X -- input tensor of shape (m, n_H_prev, n_W_prev, n_C_prev)
    f -- integer, the height/width of the filters in the second
         CONV layer of the main path
    filters -- python list of integers [F1, F2, F3], the number of
               filters in each CONV layer of the main path
    s -- integer, the stride used in the first CONV layer of the
         main path and in the shortcut CONV layer
    training -- True: behave in training mode
                False: behave in inference mode
    initializer -- initializer for the layer weights; defaults to the
                   Glorot uniform initializer, also called the Xavier
                   uniform initializer

    Returns:
    X -- output of the convolutional block,
         tensor of shape (m, n_H, n_W, n_C)
    """
    
    # Retrieve Filters
    F1, F2, F3 = filters
    
    # Save the input value
    X_shortcut = X

    ##### MAIN PATH #####
    
    # First component of main path glorot_uniform(seed=0)
    X = Conv2D(filters = F1, kernel_size = 1, strides = (s, s), 
      padding='valid', kernel_initializer = initializer(seed=0))(X)
    X = BatchNormalization(axis = 3)(X, training=training)
    X = Activation('relu')(X)
      
    ## Second component of main path
    X = ?

    ## Third component of main path
    X = ?
    
    ##### SHORTCUT PATH ##### 
    X_shortcut = ?
    
    # Final step: add the shortcut to the main path, then pass the
    # result through a ReLU activation (use the order [X, X_shortcut])
    X = ?

    return X

##  Building ResNet Model (50 layers)

You now have the necessary blocks to build a very deep ResNet. Fig. 6 describes the architecture of this neural network. "ID BLOCK" stands for "Identity block," and "ID BLOCK x3" means to stack 3 identity blocks together.

<img src="images/resnet_kiank.png" style="width:800px;height:120px;">
<caption><center> <u> <font color='purple'> <b>Figure 6</b> </u><font color='purple'>  : <b>ResNet-50 model</b> </center></caption>
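Where does the "50" come from? By the usual counting convention, only the weighted layers along the main path are counted: the 7x7 convolution of Stage 1, three convolutions in every convolutional or identity block, and the final Dense layer. The 1x1 shortcut convolutions, BatchNorm, and pooling layers are not counted. A quick arithmetic check using the block counts listed below:

```python
# Blocks per stage: 1 convolutional block + the identity blocks
blocks_per_stage = {"stage2": 1 + 2, "stage3": 1 + 3, "stage4": 1 + 5, "stage5": 1 + 2}
convs_per_block = 3  # 1x1 -> f x f -> 1x1 on the main path

main_path_convs = convs_per_block * sum(blocks_per_stage.values())  # 3 * 16 = 48
total = 1 + main_path_convs + 1  # + the 7x7 Stage 1 conv + the final Dense layer
print(total)  # 50
```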
    
### Exercise 3 - ResNet50 

The details of this ResNet-50 model are:
- Zero-padding pads the input with a pad of (3,3).

- Stage 1:
    - 2D Convolution has 64 filters of shape (7,7) and a stride of (2,2).
    - BatchNorm is applied to the 'channels' axis of the input.
    - MaxPooling uses a (3,3) window and a (2,2) stride.

- Stage 2:
    - The convolutional block uses filters [64, 64, 256] (the number of filters in each of its three main-path CONV layers), "f" is 3, and "s" is 1.
    - 2 identity blocks use filters [64, 64, 256], and "f" is 3.

- Stage 3:
    - The convolutional block uses filters [128, 128, 512], "f" is 3, and "s" is 2.
    - 3 identity blocks use filters [128, 128, 512], and "f" is 3.

- Stage 4:
    - The convolutional block uses filters [256, 256, 1024], "f" is 3, and "s" is 2.
    - 5 identity blocks use filters [256, 256, 1024], and "f" is 3.

- Stage 5:
    - The convolutional block uses filters [512, 512, 2048], "f" is 3, and "s" is 2.
    - 2 identity blocks use filters [512, 512, 2048], and "f" is 3.
    - 2D Average Pooling uses a window of shape (2,2).
    - The 'flatten' layer doesn't have any hyperparameters.
    - The Fully Connected (Dense) layer reduces its input to the number of classes using a softmax activation.

Implement the 50-layer ResNet described in Fig. 6. Stages 1 and 2 are already implemented; implement the rest. (The syntax for implementing Stages 3-5 is quite similar to that of Stage 2.)
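Before writing Stages 3-5, it can help to trace the activation shapes for the default 64x64x3 input. The sketch below assumes the Stage 1 convolution and max pooling use Keras' default "valid" padding (as in the provided code), that each convolutional block places its stride in the 1x1 convolutions as described in Ex. 2, and that you call `AveragePooling2D((2, 2))`, whose stride defaults to the pool size. The helper name is illustrative.

```python
def valid_out_size(n, f, s):
    """Spatial output size of a 'valid' conv/pool layer: floor((n - f) / s) + 1."""
    return (n - f) // s + 1


n = 64                              # input height and width (64x64x3 images)
n = n + 2 * 3                       # ZeroPadding2D((3, 3))
n = valid_out_size(n, f=7, s=2)     # Stage 1: 7x7 conv, stride 2
n = valid_out_size(n, f=3, s=2)     # Stage 1: 3x3 max pool, stride 2
trace = {"stage1": (n, 64)}

# In each later stage, only the convolutional block's strided 1x1 convs change
# the height/width; the f x f convs use 'same' padding with stride 1, and the
# identity blocks preserve the shape.
for stage, s, channels in [("stage2", 1, 256), ("stage3", 2, 512),
                           ("stage4", 2, 1024), ("stage5", 2, 2048)]:
    n = valid_out_size(n, f=1, s=s)
    trace[stage] = (n, channels)

n = valid_out_size(n, f=2, s=2)     # 2x2 average pooling (stride = pool size)
trace["avgpool"] = (n, 2048)

for name, (size, c) in trace.items():
    print(f"{name:8s} -> ({size}, {size}, {c})")
```

Under these assumptions, the average pooling output is 1x1x2048, so `Flatten()` feeds 2048 features per example into the final Dense layer.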


In [None]:
# ResNet50

def ResNet50(input_shape = (64, 64, 3), classes = 6):
    """
    Stage-wise implementation of the ResNet50 architecture:
    CONV2D -> BATCHNORM -> RELU -> MAXPOOL -> CONVBLOCK -> IDBLOCK*2
    -> CONVBLOCK -> IDBLOCK*3 -> CONVBLOCK -> IDBLOCK*5
    -> CONVBLOCK -> IDBLOCK*2 -> AVGPOOL -> FLATTEN -> DENSE

    Arguments:
    input_shape -- shape of the images of the dataset
    classes -- integer, number of classes

    Returns:
    model -- a Model() instance in Keras
    """
    
    # Define the input as a tensor with shape input_shape
    X_input = Input(input_shape)
    
    # Zero-Padding
    X = ZeroPadding2D((3, 3))(X_input)
    
    # Stage 1
    X = Conv2D(64, (7, 7), strides = (2, 2), 
               kernel_initializer = glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis = 3)(X)
    X = Activation('relu')(X)
    X = MaxPooling2D((3, 3), strides=(2, 2))(X)

    # Stage 2
    X = convolutional_block(X, f = 3, filters = [64, 64, 256], s = 1)
    X = identity_block(X, 3, [64, 64, 256])
    X = identity_block(X, 3, [64, 64, 256])

    ## Stage 3
    X = ?
    
    ## Stage 4 
    X = ?

    ## Stage 5 
    X = ?
    
    ##  Use "X = AveragePooling2D(...)(X)"
    X = ?

    # output layer
    X = Flatten()(X)
    X = Dense(classes, activation='softmax', 
              kernel_initializer = glorot_uniform(seed=0))(X)
    
    # Create model
    model = Model(inputs = X_input, outputs = X)

    return model

In [None]:
# Build the ResNet50 model's graph

model = ?


In [None]:
# Compile the model 
# optimizer 'adam'; loss 'categorical_crossentropy'; metrics 'accuracy'

?

In [None]:
# Print a summary of the model
?

The model is now ready to be trained. Let's load the SIGNS dataset.

<img src="images/signs_data_kiank.png" style="width:400px;height:200px;">
<caption><center> <u> <font color='purple'> <b>Figure 7</b> </u><font color='purple'>  : <b>SIGNS dataset</b> </center></caption>


In [None]:
# SIGNS dataset with 6 classes (0, 1, 2, 3, 4, 5)
X_train_orig, Y_train_orig, X_test_orig, Y_test_orig, classes = load_dataset()

# Normalize the pixel values to the range [0, 1]
X_train = ?
X_test = ?

# What are the shapes of Y_train_orig and Y_test_orig?

?

# Convert the train and test labels (0, 1, 2, 3, 4, 5)
# into one-hot matrices
Y_train = ?
Y_test = ?

print ("number of training examples = ?")

print ("number of test examples = ? ")

# Print the shapes of X_train, Y_train, X_test, and Y_test

In [None]:
# Train (fit) the model for a few epochs with a batch size of 32
# If your computer is relatively slow, choose epochs=2
# If your computer is relatively fast, choose epochs=10

?

**Expected Output**:

```
Epoch 1/10
34/34 [==============================] - 1s 34ms/step - loss: 1.9241 - accuracy: 0.4620
Epoch 2/10
34/34 [==============================] - 2s 57ms/step - loss: 0.6403 - accuracy: 0.7898
Epoch 3/10
34/34 [==============================] - 1s 24ms/step - loss: 0.3744 - accuracy: 0.8731
Epoch 4/10
34/34 [==============================] - 2s 44ms/step - loss: 0.2220 - accuracy: 0.9231
Epoch 5/10
34/34 [==============================] - 2s 57ms/step - loss: 0.1333 - accuracy: 0.9583
Epoch 6/10
34/34 [==============================] - 2s 52ms/step - loss: 0.2243 - accuracy: 0.9444
Epoch 7/10
34/34 [==============================] - 2s 48ms/step - loss: 0.2913 - accuracy: 0.9102
Epoch 8/10
34/34 [==============================] - 1s 30ms/step - loss: 0.2269 - accuracy: 0.9306
Epoch 9/10
34/34 [==============================] - 2s 46ms/step - loss: 0.1113 - accuracy: 0.9630
Epoch 10/10
34/34 [==============================] - 2s 57ms/step - loss: 0.0709 - accuracy: 0.9778
```

The exact values may not match because of random initialization. What matters is the overall trend: the loss should decrease and the accuracy should increase over the epochs.
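The "34/34" in the log is the number of batches (steps) per epoch. When `fit` is given in-memory arrays and a `batch_size`, Keras runs $\lceil n_{train} / \text{batch\_size} \rceil$ steps per epoch (the last batch may be smaller), so the log above is consistent with any training-set size in the range computed below. This is a back-of-the-envelope check derived only from the printed log:

```python
import math

batch_size = 32
steps_reported = 34  # "34/34" in the expected output above

# ceil(n_train / batch_size) == 34  <=>  33 * 32 < n_train <= 34 * 32
low = (steps_reported - 1) * batch_size + 1
high = steps_reported * batch_size
print(low, high)  # 1057 1088
```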

In [None]:
# Let's see how this model (trained for only a few epochs)
# performs on the test set.

?

print ("Test Accuracy = ?")

**Expected Output (for epochs=2)**: Test Accuracy > 0.7

**Expected Output (for epochs=10)**: Test Accuracy > 0.80
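When the model is compiled with `metrics=['accuracy']`, `model.evaluate(X, Y)` returns the loss followed by the accuracy. With one-hot labels, that accuracy is essentially the fraction of examples whose highest-probability class matches the true class. The sketch below reproduces that computation on hypothetical softmax outputs; the same `argmax` idea gives the predicted class for your own image later in the notebook.

```python
import numpy as np


def accuracy(probs, y_one_hot):
    """Fraction of examples whose highest-probability class matches the label."""
    predicted = np.argmax(probs, axis=1)
    actual = np.argmax(y_one_hot, axis=1)
    return np.mean(predicted == actual)


# Hypothetical softmax outputs for 4 examples and 3 classes
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.6, 0.3, 0.1]])
y_true = np.eye(3)[[0, 1, 2, 1]]  # labels 0, 1, 2, 1 as one-hot rows

print(accuracy(probs, y_true))  # 0.75
```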

You can train the ResNet for more epochs if you want. It tends to perform better when trained for ~20 epochs, but this takes more than an hour when training on a CPU.

In the cells below, you can load a ResNet50 model that was pretrained (on a GPU) and run it on the test set. Loading the model may take some time.

In [None]:
pre_trained_model = tf.keras.models.load_model('resnet50.h5')

In [None]:
# Let's see how the pretrained model performs on the test set.
# Compute Test Acc

preds_test = ?




In [None]:
# Let's see how the pretrained model performs on the train set.
# Compute the Train Acc

preds_train = ?



### Test on Your Own Image

You can take a picture of your own hand and see the output of the model. To do this:

1. Add your image to the "images" folder in this Jupyter Notebook's directory.
2. Write your image's name in the following code.
3. Run the code and check whether the algorithm is right!

In [None]:
img_path = 'images/my_image.jpg'
img = image.load_img(img_path, target_size=(64, 64))

x = image.img_to_array(img)

x = np.expand_dims(x, axis=0)

# Normalize the image (between 0-1)
x = ?
print('Input image shape:', x.shape)
imshow(img)

#Apply the pretrained model to predict the class
prediction = ?

# Print the probabilities of the 6 classes 
?

# The predicted class is the one with the highest probability
?
