# ResNet

ResNet architecture never dies: https://twitter.com/wightmanr/status/1444852719773122565

In [1]:
import torch
from torch import nn
from torch.nn import functional as F

The architectures we have seen so far have two issues:
* Intuitively, one might think that deeper neural networks are always better. This holds *in theory*, but not in practice. If the *optimal* neural network has 8 layers and we design one with 9, then *in theory* the extra layer only has to learn the identity function. *In practice*, it is extremely hard to learn this function precisely.
<center>
    <img src='images/functionclasses.svg' width=55% style="margin-left:auto; margin-right:auto"/>
    <p style="font-size:14px;">Source: <a href='http://d2l.ai/'>D2L</a></p>
</center>
* Even though **BatchNorm** greatly mitigates the vanishing gradient problem, the issue returns when the network becomes very deep.

To mitigate these issues, **ResNet** introduces residual connections between blocks:

<center>
    <img src='images/resnet-block.svg' width=55% style="margin-left:auto; margin-right:auto"/>
    <p style="font-size:14px;">Source: <a href='http://d2l.ai/'>D2L</a></p>
</center>
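
To make the identity argument concrete, here is a minimal sketch (not the block used later in this notebook). It assumes a hypothetical residual branch `f` made of a single 3x3 convolution whose parameters are all set to zero, and a non-negative input such as one produced by a preceding ReLU. With a residual connection, the block then reproduces its input exactly, so "doing nothing" only requires driving the branch weights to zero.

```python
import torch
from torch import nn
from torch.nn import functional as F

torch.manual_seed(0)

# Hypothetical residual branch f: one 3x3 convolution with all parameters zeroed.
f = nn.Conv2d(4, 4, kernel_size=3, padding=1)
nn.init.zeros_(f.weight)
nn.init.zeros_(f.bias)

# Non-negative input, as it would be after a preceding ReLU.
x = torch.rand(2, 4, 8, 8)

with torch.no_grad():
    y = F.relu(x + f(x))  # residual connection: output = relu(x + f(x))

# The branch contributes nothing, so the block acts as the identity on x.
is_identity = torch.equal(y, x)
print(is_identity)
```

Without the `x +` term, the same zeroed branch would output all zeros, and the layer would have to learn a full identity mapping through its weights instead.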

The **ResNet** architecture is very similar to **VGG**'s: it uses 3x3 convolutions and adds **BatchNorm** layers.  
The residual connection adds the block input to the block output, so both must have the same shape. When the number of channels or the stride changes inside a block, a 1x1 convolution is applied to the input to make its shape compatible.
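
The shape constraint can be checked in isolation. The sketch below is an illustration with made-up sizes (3 input channels, 6 output channels, a 6x6 feature map), not part of the model itself. It shows that a 1x1 convolution with the same stride as the main branch yields a tensor that can be added to the main branch output.

```python
import torch
from torch import nn

x = torch.rand(4, 3, 6, 6)  # (batch, channels, height, width), illustrative sizes

# Main branch: a 3x3 convolution that changes channels 3 -> 6 and halves H and W.
main = nn.Conv2d(3, 6, kernel_size=3, padding=1, stride=2)
# Shortcut: a 1x1 convolution with the same stride and output channels.
shortcut = nn.Conv2d(3, 6, kernel_size=1, stride=2)

with torch.no_grad():
    main_out = main(x)
    shortcut_out = shortcut(x)
    total = main_out + shortcut_out  # shapes match, so the addition is valid

print(tuple(main_out.shape), tuple(shortcut_out.shape))
```

Adding `x` directly to `main_out` would fail here, since `x` has 3 channels and a 6x6 map while `main_out` has 6 channels and a 3x3 map.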

In [2]:
class Residual(nn.Module):
    def __init__(self, input_channels, num_channels, use_1x1conv=False, strides=1):
        super().__init__()
        self.conv1 = nn.Conv2d(input_channels, num_channels, kernel_size=3,
                               padding=1, stride=strides)
        self.conv2 = nn.Conv2d(num_channels, num_channels, kernel_size=3,
                               padding=1)
        if use_1x1conv:
            self.conv3 = nn.Conv2d(input_channels, num_channels,
                                   kernel_size=1, stride=strides)
        else:
            self.conv3 = None
        self.bn1 = nn.BatchNorm2d(num_channels)
        self.bn2 = nn.BatchNorm2d(num_channels)

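    def forward(self, X):
        # Main branch: conv -> BN -> ReLU -> conv -> BN
        Y = F.relu(self.bn1(self.conv1(X)))
        Y = self.bn2(self.conv2(Y))
        # Project the shortcut when the channels or stride differ
        if self.conv3 is not None:
            X = self.conv3(X)
        Y += X
        return F.relu(Y)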

As in **GoogLeNet**, the first block is a 7x7 convolution with 64 output channels and a stride of 2, followed by a 3x3 max-pooling layer with a stride of 2. Here a **BatchNorm** layer is added after the convolution.

In [3]:
b1 = nn.Sequential(nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),
                   nn.BatchNorm2d(64), nn.ReLU(),
                   nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
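
As a quick sanity check on the strides and paddings above, the spatial size after each layer can be computed by hand. The helper `conv_out` below is a hypothetical name introduced for this sketch. It applies the usual output-size formula for convolution and pooling layers (floor rounding, dilation 1) and assumes a 224x224 input image.

```python
def conv_out(size, kernel_size, stride, padding):
    """Spatial output size of a conv or pooling layer (floor rounding, dilation 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1

# Assumed input resolution: 224x224.
after_conv = conv_out(224, kernel_size=7, stride=2, padding=3)        # 7x7 conv, stride 2
after_pool = conv_out(after_conv, kernel_size=3, stride=2, padding=1)  # 3x3 max-pool, stride 2

print(after_conv, after_pool)  # 112 56
```

The first block therefore reduces the resolution by a factor of 4 before any residual block is applied.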

The number of convolutional layers depends on the chosen architecture (e.g. **ResNet-18**, **ResNet-34**, **ResNet-50**, **ResNet-101**, **ResNet-110**, **ResNet-152**, etc.)

<center>
    <img src='images/resnet18.svg' width=15% height=15% style="margin-left:auto; margin-right:auto"/>
    <p style="font-size:14px;">Source: <a href='http://d2l.ai/'>D2L</a></p>
</center>

In [4]:
def resnet_block(input_channels, num_channels, num_residuals):
    blk = []
    for i in range(num_residuals):
        # First unit of a block that changes the channel count: halve H and W
        # and project the shortcut with a 1x1 convolution (once per block).
        if i == 0 and input_channels != num_channels:
            blk.append(
                Residual(input_channels, num_channels, use_1x1conv=True,
                         strides=2))
        else:
            blk.append(Residual(num_channels, num_channels))
    return blk

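**ResNet-18**: 1 convolutional layer in the initial block, 16 in the ResNet blocks (4 blocks of 2 residual units with 2 convolutions each; the 1x1 shortcut convolutions are not counted), and 1 final linear layer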
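
The layer count can be written out as simple arithmetic. The variable names below are introduced for this sketch only, and the count follows the convention of ignoring the 1x1 shortcut convolutions.

```python
stem_convs = 1            # the 7x7 convolution in the initial block
num_blocks = 4            # the four ResNet blocks
residuals_per_block = 2   # residual units in each block
convs_per_residual = 2    # two 3x3 convolutions per unit (1x1 shortcuts excluded)
final_linear = 1          # the classification layer

block_convs = num_blocks * residuals_per_block * convs_per_residual
depth = stem_convs + block_convs + final_linear

print(block_convs, depth)  # 16 18
```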

In [5]:
b2 = nn.Sequential(*resnet_block(64, 64, 2))
b3 = nn.Sequential(*resnet_block(64, 128, 2))
b4 = nn.Sequential(*resnet_block(128, 256, 2))
b5 = nn.Sequential(*resnet_block(256, 512, 2))

In [6]:
net = nn.Sequential(b1, b2, b3, b4, b5, nn.AdaptiveAvgPool2d((1, 1)),
                    nn.Flatten(), nn.Linear(512, 10))

In [7]:
from torchinfo import summary
summary(net, input_size=(32, 3, 224, 224))

Layer (type:depth-idx)                   Output Shape              Param #
Sequential                               --                        --
├─Sequential: 1-1                        [32, 64, 56, 56]          --
│    └─Conv2d: 2-1                       [32, 64, 112, 112]        9,472
│    └─BatchNorm2d: 2-2                  [32, 64, 112, 112]        128
│    └─ReLU: 2-3                         [32, 64, 112, 112]        --
│    └─MaxPool2d: 2-4                    [32, 64, 56, 56]          --
├─Sequential: 1-2                        [32, 64, 56, 56]          --
│    └─Residual: 2-5                     [32, 64, 56, 56]          --
│    │    └─Conv2d: 3-1                  [32, 64, 56, 56]          36,928
│    │    └─BatchNorm2d: 3-2             [32, 64, 56, 56]          128
│    │    └─Conv2d: 3-3                  [32, 64, 56, 56]          36,928
│    │    └─BatchNorm2d: 3-4             [32, 64, 56, 56]          128
│    └─Residual: 2-6                     [32, 64, 56, 56]          --
│