### Here we take a closer look at the ResNet model.

The ResNet paper ("Deep Residual Learning for Image Recognition", He et al.) is available here if you are interested:
https://arxiv.org/pdf/1512.03385.pdf

Reference: https://towardsdatascience.com/an-overview-of-resnet-and-its-variants-5281e2f56035

# Residual Network

Why residual networks?

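In theory, given enough capacity, a feedforward network with a single hidden layer can represent any function (the universal approximation theorem). In practice, however, such a network may need to be impractically wide and is prone to overfitting the data. So instead we make our models deeper.

Deeper networks, however, bring the problem of vanishing gradients: as the gradient is backpropagated to earlier layers, repeated multiplication can make it extremely small, so the early layers barely learn and performance degrades. The ResNet authors also observed a related degradation problem: even when training converges, deeper "plain" networks can reach a higher training error than shallower ones, so the drop in accuracy is not simply overfitting.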
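To see the repeated multiplication at work, here is a minimal toy sketch (an illustration, not an experiment from the paper) that chains sigmoid functions on a single scalar and lets autograd compute the gradient. The derivative of the sigmoid is at most 0.25, so by the chain rule the gradient through 30 of them can be no larger than 0.25^30. The function name `plain_chain_grad` is hypothetical.

```python
import torch

def plain_chain_grad(depth: int) -> float:
    """Gradient of sigmoid(sigmoid(...sigmoid(x))) with respect to x, evaluated at x = 0."""
    x = torch.tensor(0.0, dtype=torch.float64, requires_grad=True)
    y = x
    for _ in range(depth):
        y = torch.sigmoid(y)  # each step multiplies the gradient by sigmoid'(.) <= 0.25
    y.backward()
    return x.grad.item()

g = plain_chain_grad(30)
print(g, 0.25 ** 30)
```

Real networks use ReLU, BatchNorm and learned weights rather than a bare sigmoid chain, so the numbers differ, but the mechanism (a product of many per-layer factors) is the same.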

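The core idea of ResNet is to introduce shortcut (skip) connections that skip one or more layers. Instead of asking a stack of layers to learn a mapping H(x) directly, a residual block learns the residual F(x) = H(x) - x and outputs F(x) + x.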
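Because a residual block outputs F(x) + x, its local derivative is F'(x) + 1, so the shortcut gives the gradient a path that is not multiplied down by every layer. The toy sketch below (a scalar example, not the paper's setup; `chain_grad` is a hypothetical name) compares a chain of 30 sigmoid "layers" with the same chain where every layer gets a shortcut, y <- y + sigmoid(y).

```python
import torch

def chain_grad(depth: int, residual: bool) -> float:
    """Gradient of a depth-long chain of sigmoid layers with respect to the input x = 0."""
    x = torch.tensor(0.0, dtype=torch.float64, requires_grad=True)
    y = x
    for _ in range(depth):
        if residual:
            y = y + torch.sigmoid(y)  # shortcut: d/dy = 1 + sigmoid'(y) >= 1
        else:
            y = torch.sigmoid(y)      # plain: d/dy = sigmoid'(y) <= 0.25
    y.backward()
    return x.grad.item()

plain = chain_grad(30, residual=False)
resid = chain_grad(30, residual=True)
print(f"plain: {plain:.3e}  residual: {resid:.3e}")
```

The plain chain's gradient collapses toward zero, while every factor in the residual chain is at least 1, so the gradient reaching the input cannot shrink.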

![skip_connections](https://miro.medium.com/max/510/1*ByrVJspW-TefwlH7OLxNkg.png)

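The authors argued that stacking extra layers should not degrade network performance, because a deeper model could in principle match a shallower one by making the added layers identity mappings (layers that simply pass their input through). Plain stacks of layers find such identity mappings hard to learn; with a shortcut, a block only has to push its residual F(x) toward zero.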
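The identity argument can be checked directly. In the sketch below, `ResidualWrapper` is a hypothetical helper (not part of PyTorch or the paper) that returns x + F(x), where F is a small two-layer fully connected branch. If the last layer of F is zero-initialized, F(x) = 0 and the block is exactly the identity, so stacking ten of them leaves the input unchanged.

```python
import torch
import torch.nn as nn

class ResidualWrapper(nn.Module):
    """Hypothetical helper: computes x + F(x) for a small fully connected branch F."""

    def __init__(self, dim: int):
        super().__init__()
        self.branch = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        # Zero the last layer so F(x) == 0 and the block starts out as an identity mapping
        nn.init.zeros_(self.branch[2].weight)
        nn.init.zeros_(self.branch[2].bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.branch(x)

torch.manual_seed(0)
deep = nn.Sequential(*[ResidualWrapper(16) for _ in range(10)])
x = torch.randn(4, 16)
out = deep(x)
print(torch.equal(out, x))
```

Training can then move each block away from the identity only as far as it helps, which is the intuition behind the claim that extra residual layers should not hurt.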

In the paper, the authors state:


>We present comprehensive experiments on ImageNet
[36] to show the degradation problem and evaluate our
method. We show that: 1) Our extremely deep residual nets
are easy to optimize, but the counterpart “plain” nets (that
simply stack layers) exhibit higher training error when the
depth increases; 2) Our deep residual nets can easily enjoy
accuracy gains from greatly increased depth, producing results substantially better than previous networks.



![](https://miro.medium.com/max/1144/1*2ns4ota94je5gSVjrpFq3A.png)

The unusual connections here mean that a layer receives input not only from the layer directly before it but also from a layer further back. The block we are going to implement has three layers (a 1 x 1, a 3 x 3 and another 1 x 1 convolution) inside each shortcut.
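These three layers form a bottleneck: the first 1 x 1 convolution reduces the number of channels, the 3 x 3 convolution works on the reduced channels, and the last 1 x 1 convolution restores them. As a rough sketch of why this is cheaper, the snippet below counts the convolution weights of a 256-channel bottleneck with 64 intermediate channels (the setting of the later blocks in ResNet-50's conv2_x stage) and, for comparison, of two plain 3 x 3 convolutions that keep all 256 channels. Biases are omitted, as is usual before batch normalization; `num_params` is a hypothetical helper.

```python
import torch.nn as nn

def num_params(module: nn.Module) -> int:
    """Total number of learnable parameters in a module."""
    return sum(p.numel() for p in module.parameters())

# 1x1 reduce -> 3x3 -> 1x1 expand (BatchNorm/ReLU left out; we only count conv weights)
bottleneck = nn.Sequential(
    nn.Conv2d(256, 64, kernel_size=1, bias=False),
    nn.Conv2d(64, 64, kernel_size=3, padding=1, bias=False),
    nn.Conv2d(64, 256, kernel_size=1, bias=False),
)
# Two 3x3 convolutions that keep all 256 channels
two_full_3x3 = nn.Sequential(
    nn.Conv2d(256, 256, kernel_size=3, padding=1, bias=False),
    nn.Conv2d(256, 256, kernel_size=3, padding=1, bias=False),
)

print(num_params(bottleneck))    # 256*64 + 64*64*9 + 64*256 = 69632
print(num_params(two_full_3x3))  # 2 * 256*256*9 = 1179648
```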

```python
import torch
import torch.nn as nn
```

![](https://www.researchgate.net/publication/334288428/figure/tbl1/AS:778196211466240@1562547843868/Architectures-for-ResNet34-ResNet50-and-ResNet101-in-this-paper-Building-blocks-are.png)

Every ResNet variant in this table begins with a single convolution, conv1, with a 7 x 7 kernel, 64 output channels and a stride of 2.

The table does not mention padding, but from the output size (112 x 112 for a 224 x 224 input) we can infer that the padding must be 3. Next comes a 3 x 3 max pool with stride 2, followed by four stages of residual blocks (conv2_x to conv5_x).
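The padding follows from the standard output-size formula for a convolution or pooling layer, out = floor((in + 2 * padding - kernel) / stride) + 1. A small sketch, assuming a 224 x 224 input, that evaluates the formula and cross-checks it against PyTorch (`out_size` is a hypothetical helper):

```python
import torch
import torch.nn as nn

def out_size(in_size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a convolution or pooling layer (dilation 1, floor mode)."""
    return (in_size + 2 * padding - kernel) // stride + 1

after_conv1 = out_size(224, kernel=7, stride=2, padding=3)         # (224 + 6 - 7) // 2 + 1 = 112
after_pool = out_size(after_conv1, kernel=3, stride=2, padding=1)  # (112 + 2 - 3) // 2 + 1 = 56

x = torch.randn(1, 3, 224, 224)
y = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)(x)
z = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)(y)
print(after_conv1, after_pool, tuple(y.shape), tuple(z.shape))
```

With padding 2 the formula gives 111 and with padding 4 it gives 113, so 3 is the only value that produces 112.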

Look at ResNet-50: the first stage repeats a block of 3 layers 3 times, for a total of 9 layers. The second stage has 12, the third has 18 and the fourth has 9. Adding conv1 and the final fully connected layer gives 9 + 12 + 18 + 9 + 2 = 50. (The max pool is not counted, because it has no weights.)
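The same count works for the deeper variants. Each bottleneck block has three weight layers, and conv1 plus the final fully connected layer add two more. A quick check using the per-stage block counts [3, 4, 6, 3], [3, 4, 23, 3] and [3, 8, 36, 3] (`resnet_depth` is a hypothetical helper):

```python
def resnet_depth(blocks_per_stage, layers_per_block=3):
    """Weight layers = conv1 + (layers per bottleneck block x number of blocks) + final fc."""
    return 1 + layers_per_block * sum(blocks_per_stage) + 1

print(resnet_depth([3, 4, 6, 3]))   # 50
print(resnet_depth([3, 4, 23, 3]))  # 101
print(resnet_depth([3, 8, 36, 3]))  # 152
```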

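We can see that every stage after the first halves the spatial size of its input (56 → 28 → 14 → 7), while the first stage keeps the 56 x 56 resolution produced by the max pool. Also, the last 1 x 1 convolution in every block outputs 4 times as many channels as the block's intermediate layers. Our code handles this with the `self.expansion` variable.
For example, in ResNet-50's conv2_x stage the blocks work with 64 channels internally, and the final 1 x 1 convolution expands them to 256.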
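To make the halving and the 4x expansion concrete, here is a short sketch that traces the feature-map shape through the four stages of ResNet-50 for a 224 x 224 input. It assumes, as in the table, that the first stage uses stride 1 and each later stage uses stride 2 on its 3 x 3 convolution; `trace_stages` is a hypothetical helper.

```python
EXPANSION = 4  # the last 1x1 conv in each bottleneck outputs 4x the intermediate channels

def trace_stages(size=56):
    """Return (stage, channels, height/width) after each stage, starting from the 56x56 max-pool output."""
    stages = [("conv2_x", 64, 1), ("conv3_x", 128, 2), ("conv4_x", 256, 2), ("conv5_x", 512, 2)]
    shapes = []
    for name, intermediate, stride in stages:
        size = (size + 2 * 1 - 3) // stride + 1  # the stride sits on the 3x3 conv (padding 1)
        shapes.append((name, intermediate * EXPANSION, size))
    return shapes

for name, channels, size in trace_stages():
    print(f"{name}: {channels} x {size} x {size}")
```

This prints 256 x 56 x 56, 512 x 28 x 28, 1024 x 14 x 14 and 2048 x 7 x 7; the final 2048 channels are why the fully connected layer takes 512 * 4 inputs.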


Since we reuse this block many times in the ResNet architecture, let us define the block first.

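```python
class block(nn.Module):
    """Bottleneck residual block: 1x1 (reduce) -> 3x3 -> 1x1 (expand x4), plus a shortcut."""

    def __init__(
        self, in_channels, intermediate_channels, identity_downsample=None, stride=1
    ):
        super(block, self).__init__()
        self.expansion = 4  # the last 1x1 conv outputs 4x the intermediate channels
        # 1x1 conv reduces the channel count (bias is redundant because BatchNorm follows)
        self.conv1 = nn.Conv2d(
            in_channels, intermediate_channels, kernel_size=1, stride=1, padding=0, bias=False
        )
        self.bn1 = nn.BatchNorm2d(intermediate_channels)
        # 3x3 conv; with stride=2 this is where the block halves the spatial size
        self.conv2 = nn.Conv2d(
            intermediate_channels,
            intermediate_channels,
            kernel_size=3,
            stride=stride,
            padding=1,
            bias=False,
        )
        self.bn2 = nn.BatchNorm2d(intermediate_channels)
        # 1x1 conv expands the channel count by self.expansion
        self.conv3 = nn.Conv2d(
            intermediate_channels,
            intermediate_channels * self.expansion,
            kernel_size=1,
            stride=1,
            padding=0,
            bias=False,
        )
        self.bn3 = nn.BatchNorm2d(intermediate_channels * self.expansion)
        self.relu = nn.ReLU()
        # Optional projection (1x1 conv + BatchNorm) that reshapes the shortcut to match the output
        self.identity_downsample = identity_downsample
        self.stride = stride

    def forward(self, x):
        identity = x  # keep the block input for the shortcut connection

        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.relu(x)
        x = self.conv3(x)
        x = self.bn3(x)  # no ReLU here: it is applied after the addition

        if self.identity_downsample is not None:
            identity = self.identity_downsample(identity)

        x += identity  # the residual (skip) connection
        x = self.relu(x)
        return x
```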

```python
class ResNet(nn.Module):
    def __init__(self, block, layers, img_channels, num_classes):
        super(ResNet, self).__init__()
        self.in_channels = 64
        # Stem: 7x7 conv, stride 2, padding 3 (e.g. 224 -> 112), then 3x3 max pool, stride 2 (112 -> 56)
        self.conv1 = nn.Conv2d(img_channels, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU()
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        # ResNet stages (conv2_x to conv5_x); every stage after the first halves the spatial size
        self.layer1 = self._make_layer(block, layers[0], intermediate_channels=64, stride=1)
        self.layer2 = self._make_layer(block, layers[1], intermediate_channels=128, stride=2)
        self.layer3 = self._make_layer(block, layers[2], intermediate_channels=256, stride=2)
        self.layer4 = self._make_layer(block, layers[3], intermediate_channels=512, stride=2)

        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * 4, num_classes)  # last stage outputs 512 * expansion = 2048 channels

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = x.reshape(x.shape[0], -1)  # flatten to (batch, 2048)
        x = self.fc(x)

        return x

    def _make_layer(self, block, num_residual_blocks, intermediate_channels, stride):
        identity_downsample = None
        layers = []

        # The first block of a stage needs a projection shortcut when it changes the spatial
        # size (stride != 1) or the channel count (in_channels != intermediate_channels * 4)
        if stride != 1 or self.in_channels != intermediate_channels * 4:
            identity_downsample = nn.Sequential(
                nn.Conv2d(self.in_channels, intermediate_channels * 4, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(intermediate_channels * 4),
            )

        layers.append(block(self.in_channels, intermediate_channels, identity_downsample, stride))
        self.in_channels = intermediate_channels * 4

        # The remaining blocks keep both shape and channel count, so they use plain identity shortcuts
        for _ in range(num_residual_blocks - 1):
            layers.append(block(self.in_channels, intermediate_channels))

        return nn.Sequential(*layers)
```
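The `identity_downsample` branch exists because `x += identity` only works when both tensors have the same shape. Below is a minimal, self-contained sketch of the first block of ResNet-50's conv3_x stage (input 256 x 56 x 56, stride 2, 128 intermediate channels). The residual branch is a simplified stand-in with BatchNorm and ReLU left out, not the full `block` class; it outputs 512 x 28 x 28, so the shortcut needs a 1 x 1 convolution with the same stride to match. The same reasoning explains why the condition in `_make_layer` also fires for `layer1`: its stride is 1, but the channel count grows from 64 to 256.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 256, 56, 56)  # shape of the conv2_x output in ResNet-50

# Simplified residual branch of a strided bottleneck (BatchNorm/ReLU omitted for brevity)
branch = nn.Sequential(
    nn.Conv2d(256, 128, kernel_size=1, bias=False),
    nn.Conv2d(128, 128, kernel_size=3, stride=2, padding=1, bias=False),
    nn.Conv2d(128, 512, kernel_size=1, bias=False),
)
# Projection shortcut: 1x1 conv with the same stride, expanding 256 -> 512 channels
shortcut = nn.Sequential(
    nn.Conv2d(256, 512, kernel_size=1, stride=2, bias=False),
    nn.BatchNorm2d(512),
)

f = branch(x)
print(tuple(f.shape), tuple(x.shape))  # shapes differ, so f + x would raise an error
out = torch.relu(f + shortcut(x))
print(tuple(out.shape))
```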
        

```python
# The list gives the number of bottleneck blocks in each of the four stages
def ResNet50(img_channels=3, num_classes=1000):
    return ResNet(block, [3, 4, 6, 3], img_channels, num_classes)


def ResNet101(img_channels=3, num_classes=1000):
    return ResNet(block, [3, 4, 23, 3], img_channels, num_classes)


def ResNet152(img_channels=3, num_classes=1000):
    return ResNet(block, [3, 8, 36, 3], img_channels, num_classes)
```

```python
def test():
    net = ResNet50()
    x = torch.randn(2, 3, 224, 224)  # a batch of two 224 x 224 RGB images
    y = net(x)
    print(y.shape)

test()
```

```
torch.Size([2, 1000])
```
