<a href="https://colab.research.google.com/github/luigiselmi/dl_tensorflow/blob/main/residual_connections.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Residual connections
Deep neural networks suffer from the *vanishing gradient* problem: during backpropagation the gradients become smaller and smaller as they flow back toward the first layers, so the parameters of those layers are barely updated. One way to mitigate this problem is a shortcut, that is, adding the input of a block of one or more layers to the output of that block. Residual connections were introduced in the paper "[Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385)" by He et al.
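To see why the shortcut helps, here is a toy sketch (not part of the original notebook) using `tf.GradientTape`. Each hypothetical "layer" is just f(h) = 0.1 * h, a stand-in for a layer whose local derivative is small. Without shortcuts, the gradient of the output with respect to the input is the product of the local derivatives. With shortcuts, each step computes f(h) + h, so the identity path adds 1 to every local derivative and the product no longer collapses toward zero.

```python
import tensorflow as tf

def chain_gradient(num_layers, scale, residual):
    """Gradient d(output)/d(input) through a chain of toy linear "layers".

    Each toy layer is f(h) = scale * h, a hypothetical stand-in for a layer
    whose local derivative is small. With residual=True each step computes
    f(h) + h instead of f(h).
    """
    x = tf.constant(1.0)
    with tf.GradientTape() as tape:
        tape.watch(x)  # x is a constant, so it must be watched explicitly
        h = x
        for _ in range(num_layers):
            fh = scale * h
            h = fh + h if residual else fh
    return float(tape.gradient(h, x))

plain = chain_gradient(num_layers=10, scale=0.1, residual=False)
with_shortcut = chain_gradient(num_layers=10, scale=0.1, residual=True)
print(f"without shortcut: {plain:.2e}, with shortcut: {with_shortcut:.4f}")
```

Mathematically the two gradients are 0.1^10 = 1e-10 and 1.1^10 ≈ 2.594; the computed values match up to float32 rounding.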
![residual connection](https://github.com/luigiselmi/dl_tensorflow/blob/main/images/residual_connection.jpg?raw=1)

In [1]:
from tensorflow import keras
from tensorflow.keras import layers

The shape of the output of a block of convolutional layers can differ from the shape of its input. For instance, the number of channels grows when the block uses more filters, and the height and width of the feature maps shrink when a MaxPooling layer follows a convolutional layer. To add the input tensor to the output, the two tensors must have the same shape.
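As a quick check of this requirement, the sketch below (an illustration, not part of the original notebook; the helper `can_add` is hypothetical) tries to add two symbolic tensors whose channel dimensions differ (32 and 64). Keras rejects the sum while the model graph is being built. The functional API raises a `ValueError` here, although the exact error message depends on the Keras version.

```python
from tensorflow import keras
from tensorflow.keras import layers

def can_add(shape_a, shape_b):
    """Return True if layers.add accepts two symbolic inputs with these shapes."""
    a = keras.Input(shape=shape_a)
    b = keras.Input(shape=shape_b)
    try:
        layers.add([a, b])
        return True
    except ValueError as err:
        print("Incompatible shapes:", err)
        return False

print(can_add((30, 30, 64), (30, 30, 64)))  # same shape: the sum is allowed
print(can_add((30, 30, 32), (30, 30, 64)))  # different channels: rejected
```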

In [None]:
inputs = keras.Input(shape=(32, 32, 3))
x = layers.Conv2D(filters=32, kernel_size=3, activation="relu")(inputs)
residual = x
residual.shape

TensorShape([None, 30, 30, 32])

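Suppose we add a block to the model, here a single convolutional layer, that doubles the number of filters from 32 to 64. With `padding="same"` the height and width stay at 30.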

In [None]:
x = layers.Conv2D(64, 3, activation="relu", padding="same")(x)
x.shape

TensorShape([None, 30, 30, 64])

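In this case we can apply a 1x1 convolutional layer without an activation function to the residual. It simply projects the 32 channels of each pixel linearly onto 64 channels, so the residual gets the same number of feature maps as the output while its height and width are unchanged.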

In [None]:
residual = layers.Conv2D(64, 1)(residual)
residual.shape

TensorShape([None, 30, 30, 64])
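To make "projection" concrete, this sketch (an illustration with random input data, not from the original notebook) checks that a `Conv2D` layer with a 1x1 kernel and no activation computes, at every pixel, the product of the channel vector with the layer's kernel matrix plus its bias.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Random batch: 2 images, 5x5 pixels, 32 channels
x = tf.random.normal((2, 5, 5, 32), seed=0)
projection = layers.Conv2D(64, 1)  # 1x1 kernel, no activation
y = projection(x).numpy()

kernel, bias = projection.get_weights()  # kernel shape: (1, 1, 32, 64)
# The same projection written as a per-pixel matrix product over the channel axis
y_manual = x.numpy() @ kernel[0, 0] + bias

print(y.shape)                                          # (2, 5, 5, 64)
print(np.allclose(y, y_manual, rtol=1e-3, atol=1e-3))  # True
```

The tolerance only absorbs floating-point rounding differences between the convolution kernel and NumPy's matrix product.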

Now the projected residual can be added element-wise to the output.

In [None]:
x = layers.add([x, residual])
x.shape

TensorShape([None, 30, 30, 64])
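`layers.add` sums its inputs element-wise, position by position and channel by channel. A tiny check on concrete (eager) tensors with made-up values, not taken from the notebook:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

a = tf.reshape(tf.range(8, dtype=tf.float32), (1, 2, 2, 2))  # values 0..7
b = tf.ones((1, 2, 2, 2))
summed = layers.add([a, b]).numpy()

print(np.array_equal(summed, a.numpy() + b.numpy()))  # True
print(summed.ravel())  # [1. 2. 3. 4. 5. 6. 7. 8.]
```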

Now let's look at the case in which the height and width of the feature maps change as well.

In [3]:
inputs = keras.Input(shape=(32, 32, 3))
x = layers.Conv2D(filters=32, kernel_size=3, activation="relu")(inputs)
residual = x
residual.shape

TensorShape([None, 30, 30, 32])

For instance, we double the number of filters as before, and we also halve the height and width of the feature maps with a `MaxPooling2D` layer.

In [4]:
x = layers.Conv2D(64, 3, activation="relu", padding="same")(x)
x = layers.MaxPooling2D(2, padding="same")(x)
x.shape

TensorShape([None, 15, 15, 64])

Again we use a 1x1 convolutional layer without an activation function, this time with `strides=2`, so that the height and width of the residual (30x30) shrink to match those of the output (15x15).

In [5]:
residual = layers.Conv2D(64, 1, strides=2)(residual)
residual.shape

TensorShape([None, 15, 15, 64])
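Why does a 1x1 convolution with `strides=2` produce the same 15x15 size as 2x2 max pooling with `padding="same"`? Here is a sketch of TensorFlow's output-size arithmetic for its two padding modes (the helper `output_size` is hypothetical, written only for this illustration): with `"valid"` padding the size is floor((n - k) / s) + 1, and with `"same"` padding it is ceil(n / s).

```python
import math

def output_size(n, kernel_size, strides=1, padding="valid"):
    """Output length along one spatial axis for a conv or pooling layer.

    Hypothetical helper: uses TensorFlow's formulas for "valid" and "same" padding.
    """
    if padding == "valid":
        return (n - kernel_size) // strides + 1
    if padding == "same":
        return math.ceil(n / strides)
    raise ValueError(f"unknown padding: {padding!r}")

print(output_size(32, 3))                             # 30: 3x3 conv, no padding
print(output_size(30, 2, strides=2, padding="same"))  # 15: MaxPooling2D(2, padding="same")
print(output_size(30, 1, strides=2))                  # 15: Conv2D(64, 1, strides=2)

# The two paths agree for every input size, odd sizes included
print(all(output_size(n, 2, 2, "same") == output_size(n, 1, 2) for n in range(1, 200)))  # True
```

Because the two formulas coincide for any n, the strided 1x1 projection matches the pooled output even when the feature maps have an odd size.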

Now we can add the projected residual to the output.

In [7]:
x = layers.add([x, residual])
x.shape

TensorShape([None, 15, 15, 64])

## An example of a ConvNet with residual blocks
We build a simple convolutional network for binary classification of 32x32 RGB images, using three residual blocks that each contain two convolutional layers.

In [8]:
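def residual_block(x, filters, pooling=False):
    residual = x  # keep the block input for the shortcut
    x = layers.Conv2D(filters, 3, activation="relu", padding="same")(x)
    x = layers.Conv2D(filters, 3, activation="relu", padding="same")(x)
    if pooling:
        # Halve height and width, then project the residual with a strided 1x1 conv
        x = layers.MaxPooling2D(2, padding="same")(x)
        residual = layers.Conv2D(filters, 1, strides=2)(residual)
    elif filters != residual.shape[-1]:
        # Only the number of channels changed: project with a plain 1x1 conv
        residual = layers.Conv2D(filters, 1)(residual)
    # Element-wise sum of the block output and the (possibly projected) input
    x = layers.add([x, residual])
    return x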
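As a sanity check on the arithmetic, this pure-Python sketch (the helper `trace_residual_blocks` is hypothetical and only mirrors the shape logic of `residual_block`) traces, starting from a 32x32x3 input, the output shape of each block used in the model below and the number of parameters in its 1x1 projection shortcut.

```python
import math

def trace_residual_blocks(height, width, channels, block_specs):
    """Trace output shape and 1x1-projection parameter count for each block.

    Hypothetical helper mirroring residual_block: pooling halves height and
    width (MaxPooling2D with padding="same"), and a 1x1 projection is used
    whenever pooling is on or the number of channels changes.
    """
    trace = []
    for filters, pooling in block_specs:
        needs_projection = pooling or filters != channels
        # A 1x1 Conv2D has channels * filters weights plus one bias per filter
        projection_params = channels * filters + filters if needs_projection else 0
        if pooling:
            height, width = math.ceil(height / 2), math.ceil(width / 2)
        channels = filters
        trace.append(((height, width, channels), projection_params))
    return trace

blocks = [(32, True), (64, True), (128, False)]  # (filters, pooling) as in the model below
for shape, params in trace_residual_blocks(32, 32, 3, blocks):
    print(shape, params)
# (16, 16, 32) 128
# (8, 8, 64) 2112
# (8, 8, 128) 8320
```

In `model.summary()` these projections should show up as the 1x1 `Conv2D` layers with 128, 2,112 and 8,320 parameters.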

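From one block to the next we double the number of filters (32, 64, 128), which applies to both convolutional layers inside the block, and in the first two blocks we halve the height and width of the feature maps by setting `pooling=True`.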

In [9]:
inputs = keras.Input(shape=(32, 32, 3))
x = layers.Rescaling(1./255)(inputs)
x = residual_block(x, filters=32, pooling=True)
x = residual_block(x, filters=64, pooling=True)
x = residual_block(x, filters=128, pooling=False)
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs=inputs, outputs=outputs)
model.summary()

Model: "model"
__________________________________________________________________________________________________
 Layer (type)                Output Shape                 Param #   Connected to                  
 input_3 (InputLayer)        [(None, 32, 32, 3)]          0         []                            
                                                                                                  
 rescaling (Rescaling)       (None, 32, 32, 3)            0         ['input_3[0][0]']             
                                                                                                  
 conv2d_4 (Conv2D)           (None, 32, 32, 32)           896       ['rescaling[0][0]']           
                                                                                                  
 conv2d_5 (Conv2D)           (None, 32, 32, 32)           9248      ['conv2d_4[0][0]']            
                                                                                              