In this kernel, I am going to show you the different components of the ResNet architecture and how to implement each of them in PyTorch.
Read the accompanying blog post for a detailed explanation [here](https://www.jarvislabs.ai/blogs/resnet).
Hang in there!


- [Understanding and Building ResNet from scratch](#building-resnet-from-scratch)
  * [Why is it important to understand ResNets?](#understand)
  * [Residual Block](#residual)
  * [Implementing `resnet34` using PyTorch](#implement)
    + [Convolution block](#conv)
    + [Residual block](#residual)
    + [Resnet Layers](#reslayers)
    + [Classifier block](#classifier)
    + [ResNet class](#resnet-class)
    + [Creating Resnet34 model](#create-res34)
  * [Conclusion](#conclusion)
  <a id='understand'></a>

## Why is it important to understand ResNets?

ResNets are the backbone of many modern computer vision architectures. For a lot of common computer vision problems, the go-to architecture is `resnet34`. Many later CNN architectures, such as ResNeXt and DenseNet, build on ideas from the original `resnet` architecture.

Understanding how the `resnet` model works helps when building custom architectures for problems like image classification, segmentation, and object detection. For example, when using a `resnet` model as the backbone (encoder) of a U-Net for image segmentation, we create skip connections between different blocks of the encoder and the decoder. Since we already know how ResNet is built, these architectures become much easier to understand and build later on. In particular, knowing the architecture lets us predict the output shape of each `resnet` block, which is then combined with the corresponding decoder block.

ResNets were introduced in the [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) paper by Kaiming He et al.


### Imports
Let's grab some imports.

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models
import pdb

## ResNet architecture from torchvision
First things first, let's understand what we are trying to build here.
Pull out the resnet34 architecture from torchvision's models.

In [None]:
models.resnet34()

We are going to build this from scratch in the next sections.

<a id='implement' aria-hidden='true'></a>
## Building Resnet

### Input

Create a random input to understand how feature maps are generated by a convolution layer.

PyTorch's `nn.Conv2d` layer expects inputs of shape (batch_size, channels, height, width).

We use a batch size of 2, 3 RGB color channels, and a height and width of 224 each, which gives a rank 4 tensor.

In [None]:
inp = torch.randn([2,3,224,224])

In [None]:
inp.shape # batch size, RGB channels, height, width

 <a id='conv' aria-hidden='true'></a>

### Convolution block

First, we are going to construct a small sequential network for the initial convolution block of ResNet.
It's the very first layer in our ResNet architecture.

It consists of 4 operations:

1. Convolution
2. Batch Normalization
3. ReLU activation function
4. Maxpooling

In [None]:
conv_block = nn.Sequential(nn.Conv2d(3,64,kernel_size=7, stride=2, padding=3, bias=False), #112,112
                       nn.BatchNorm2d(64),
                       nn.ReLU(inplace=True),
                       nn.MaxPool2d(kernel_size=3, stride=2, padding=1)) # 56,56
conv_block

The Conv2d layer takes 3 input channels and applies 64 filters, generating 64 output channels, i.e. feature maps. We can choose how many feature maps we want the convolution to generate.

Here, the kernel size is 7 × 7, the stride is 2, and the padding is 3. Padding adds a border of zeros around the input.

With a stride of 2, the convolution window moves 2 pixels at a time, skipping every other position, so the Conv2d layer downsamples its input.

So after the Conv2d layer, the output will be a (2, 64, 112, 112) tensor of activations.
Basically, the height and width of the input grid are halved.

In [None]:
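# Uncomment to check the shape after the Conv2d layer alone. The result goes into a new
# variable so that `inp` keeps its 3 channels for the conv_block cell below.
#conv_out = nn.Conv2d(3,64,kernel_size=7, stride=2, padding=3, bias=False)(inp)
#conv_out.shape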

The max pooling operation, also with a stride of 2, halves the height and width of the Conv2d -> BatchNorm -> ReLU output again, giving a 56 × 56 grid.
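
Both halvings follow from the standard output-size formula for convolution and pooling layers, out = floor((in + 2 × padding − kernel_size) / stride) + 1, assuming the PyTorch defaults of dilation 1 and `ceil_mode=False`. The helper `conv_out_size` below is a hypothetical name used only to make the arithmetic explicit; the second half of the sketch cross-checks it against real PyTorch layers.

```python
import torch
import torch.nn as nn

def conv_out_size(size, kernel_size, stride=1, padding=0):
    # Output size of nn.Conv2d / nn.MaxPool2d along one spatial dimension
    # (assumes dilation=1 and ceil_mode=False, the PyTorch defaults)
    return (size + 2 * padding - kernel_size) // stride + 1

after_conv = conv_out_size(224, kernel_size=7, stride=2, padding=3)        # (224 + 6 - 7) // 2 + 1 = 112
after_pool = conv_out_size(after_conv, kernel_size=3, stride=2, padding=1)  # (112 + 2 - 3) // 2 + 1 = 56
print(after_conv, after_pool)  # 112 56

# Cross-check the arithmetic against the actual layers of the conv block
x = torch.randn(2, 3, 224, 224)
y = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)(x)
z = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)(y)
print(y.shape, z.shape)  # torch.Size([2, 64, 112, 112]) torch.Size([2, 64, 56, 56])
```

The same formula explains the 1 × 1, stride-2 shortcut convolution used later: `conv_out_size(56, 1, 2, 0)` is 28.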

Let's check the shape of the output after the conv_block operations.

In [None]:
out=conv_block(inp)
out.shape

In [None]:
list(models.resnet34().children())[:4]

<a id='residual' aria-hidden='true'></a>
### Residual block

![res_block](https://raw.githubusercontent.com/jarvislabsai/blog/master/build_resnet34_pytorch/images/res_block1.png)
After two convolution operations, the input to those two convolutions is added to their output.
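
In the notation of the ResNet paper, the two stacked convolutions learn a residual function $\mathcal{F}(x, \{W_i\})$ and the shortcut adds the input $x$ back; a ReLU is then applied to the sum. For the basic block below, $\mathcal{F}$ is conv -> BN -> ReLU -> conv -> BN. When the shapes of $\mathcal{F}(x)$ and $x$ differ, the paper replaces the identity with a linear projection $W_s$, which in this notebook's code corresponds to the `downsample` module (a 1 × 1 convolution followed by BatchNorm):

$$
y = \mathrm{ReLU}\big(\mathcal{F}(x, \{W_i\}) + x\big)
$$

$$
y = \mathrm{ReLU}\big(\mathcal{F}(x, \{W_i\}) + W_s x\big)
$$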

### Basic Block
Each basic block consists of 2 convolution operations.

The first convolution is followed by a batch normalization layer and a ReLU activation function.

The second convolution is followed by batch normalization only. Its output is then added to the block's input (after downsampling the input, if downsampling has to be applied), and ReLU is applied to the sum.

In [None]:
class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super().__init__()
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=3, stride=stride,
                     padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=1,
                     padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)

        return out

In [None]:
BasicBlock(64,128)

In [None]:
t = torch.randn((2,64,56,56))
t.shape

In [None]:
BasicBlock(64,64)(t).shape

<a id='reslayers' aria-hidden='true'></a>

### make_layer()



The next 4 layers after the initial conv_block are called layer1, layer2, layer3 and layer4.
Each layer consists of several residual blocks, each built from Convolution -> BatchNorm -> ReLU operations.
To make the network deeper, the ResNet authors stack more blocks within each layer instead of adding more layers, so every variant keeps the same structure of 4 layers (stages); only the number of blocks per layer changes.

The PyTorch (torchvision) implementation distinguishes between blocks that contain 2 convolutions (`BasicBlock`) and blocks that contain 3 convolutions (`Bottleneck`).
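
The number in each model's name counts the layers with learnable weights along the main path: the initial 7 × 7 convolution, every convolution inside the residual blocks, and the final fully connected layer (the 1 × 1 shortcut convolutions are not counted). The sketch below checks this arithmetic using the blocks-per-layer settings of the standard variants from the ResNet paper; `configs` and `weighted_depth` are illustrative names, not a library API.

```python
# Blocks per layer (layer1..layer4) for the standard ResNet variants,
# paired with the number of convolutions in each block type
configs = {
    "resnet18":  ([2, 2, 2, 2], 2),   # BasicBlock: 2 convolutions
    "resnet34":  ([3, 4, 6, 3], 2),   # BasicBlock: 2 convolutions
    "resnet50":  ([3, 4, 6, 3], 3),   # Bottleneck: 3 convolutions
    "resnet101": ([3, 4, 23, 3], 3),
    "resnet152": ([3, 8, 36, 3], 3),
}

def weighted_depth(blocks_per_layer, convs_per_block):
    # 1 initial convolution + convolutions in all residual blocks + 1 fully connected layer
    return 1 + convs_per_block * sum(blocks_per_layer) + 1

for name, (blocks, convs) in configs.items():
    print(name, weighted_depth(blocks, convs))
```

For resnet34 this is 1 + 2 × (3 + 4 + 6 + 3) + 1 = 34.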

The `_make_layer()` function takes the type of block to use, the number of input and output filters, and the number of blocks to stack together.
Every layer except layer1 downsamples its input at the start by using a stride of 2, i.e. in the first convolutional layer of the first block of the layer.
All the other convolution layers use a stride of 1.
Also, if the input has to be downsampled (because the stride is not 1 or the number of channels changes), the shortcut applies a 1 × 1 convolution with the same stride, followed by BatchNorm.
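
Applying these rules to resnet34 for a 224 × 224 input gives the per-layer plan below. `stage_plan` is a hypothetical helper for illustration only; it mirrors the downsample condition in `_make_layer` (`stride != 1 or inplanes != planes`), which is sufficient for `BasicBlock` because its expansion is 1.

```python
def stage_plan(stages, in_channels=64, size=56):
    # stages: list of (planes, stride) for layer1..layer4
    # Starts from the conv_block output: 64 channels on a 56 x 56 grid
    plan = []
    for i, (planes, stride) in enumerate(stages, start=1):
        # Same condition as _make_layer: the first block needs a projection shortcut
        # when it changes the spatial size or the number of channels
        needs_downsample = stride != 1 or in_channels != planes
        # A 3x3 conv with padding=1 and stride s maps size -> (size - 1) // s + 1
        size = (size - 1) // stride + 1
        plan.append((f"layer{i}", planes, size, needs_downsample))
        in_channels = planes
    return plan

resnet34_stages = [(64, 1), (128, 2), (256, 2), (512, 2)]
for row in stage_plan(resnet34_stages):
    print(row)  # (name, output channels, output height/width, needs downsample)
```

Only layer1 keeps an identity shortcut in its first block; layer2 to layer4 all halve the grid and double the channels, so their first block needs the downsample branch.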

In [None]:
def _make_layer(block, inplanes,planes, blocks, stride=1):
    downsample = None  
    if stride != 1 or inplanes != planes:
        downsample = nn.Sequential(            
            nn.Conv2d(inplanes, planes, 1, stride, bias=False),
            nn.BatchNorm2d(planes),
        )
    layers = []
    layers.append(block(inplanes, planes, stride, downsample))
    inplanes = planes
    for _ in range(1, blocks):
        layers.append(block(inplanes, planes))
    return nn.Sequential(*layers)

In [None]:
layers=[3, 4, 6, 3]

In [None]:
layer1 =_make_layer(BasicBlock, inplanes=64,planes=64, blocks=layers[0])
layer1

In [None]:
list(models.resnet34().children())[4]

In [None]:
layer2 = _make_layer(BasicBlock, 64, 128, layers[1], stride=2)
layer2

In [None]:
list(models.resnet34().children())[5]

### Why we need to downsample the input

In [None]:
t = torch.rand((2,64,56,56))
t.shape # batch, channels/filters, height, width

In [None]:
o = nn.Conv2d(64,128,3,2,1)(t)
o.shape

In [None]:
#o+t

Uncomment the line above to see the RuntimeError you get.
PyTorch says:
`RuntimeError: The size of tensor a (28) must match the size of tensor b (56) at non-singleton dimension 3`
The height and width of the two tensors do not match (28 × 28 vs 56 × 56), and neither does their number of channels (128 vs 64).

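To add two tensors element-wise, they must have the same shape: the same number of channels and the same height and width. That is why the input is downsampled here, with a 1 × 1 convolution of stride 2 that also maps 64 channels to 128.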

In [None]:
t_d =nn.Conv2d(64,128,1,2,0)(t)
o.shape,t_d.shape

In [None]:
(o+t_d).shape

Ah! it works now!

<a id='classifier' aria-hidden='true'></a>
### Classifier block


This is the final block in our ResNet architecture.
It consists of 3 operations:

1. Average pooling layer: averages each feature map over its whole grid. Its output grid size is 1 × 1, which gives a rank 3 tensor of shape (512, 1, 1) per example.

2. Flatten: the linear layer expects a vector of features per example instead of a rank 3 tensor.
   So in forward() we call `torch.flatten` to flatten all dimensions after the batch dimension, turning each example into a vector of length 512.
   ```python 
   x = torch.flatten(x, 1)
   ```

3. Linear layer: a fully connected layer, applied just before softmax, which takes in 512 features and outputs 1000 class scores (logits); softmax turns these scores into class probabilities.
   (In the case of ImageNet, we have 1000 categories.)
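
The three steps can be sketched with a random tensor standing in for the layer4 output (an assumption for illustration; the shape matches resnet34 on a 224 × 224 input). The sketch also shows that pooling to a 1 × 1 grid is simply the per-channel mean, and that softmax is what turns the linear layer's logits into probabilities. In training, `nn.CrossEntropyLoss` applies log-softmax internally, which is why the model itself returns logits.

```python
import torch
import torch.nn as nn

features = torch.randn(2, 512, 7, 7)  # stand-in for the layer4 output (batch of 2)

pooled = nn.AdaptiveAvgPool2d((1, 1))(features)  # (2, 512, 1, 1)
flat = torch.flatten(pooled, 1)                   # (2, 512): flatten every dim after the batch dim
logits = nn.Linear(512, 1000)(flat)               # (2, 1000): raw class scores
probs = torch.softmax(logits, dim=1)              # (2, 1000): each row sums to 1

print(pooled.shape, flat.shape, logits.shape)
# Adaptive average pooling to 1 x 1 equals the mean over height and width
print(torch.allclose(flat, features.mean(dim=(2, 3)), atol=1e-6))
print(probs.sum(dim=1))
```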

In [None]:
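num_classes=1000
# nn.Flatten() is needed so that nn.Linear receives (bs, 512) instead of (bs, 512, 1, 1)
nn.Sequential(nn.AdaptiveAvgPool2d((1, 1)),
              nn.Flatten(),
              nn.Linear(512 , num_classes))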

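Let's confirm that our classifier block matches torchvision's ResNet implementation. torchvision calls `torch.flatten` inside `forward()`, so the flatten step does not show up as a module.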

In [None]:
list(models.resnet34().children())[8:]

<a id='resnet-class' aria-hidden='true'></a>

## Resnet class

In [None]:
class ResNet(nn.Module):

    def __init__(self, block, layers, num_classes=1000):
        super().__init__()
        
        self.inplanes = 64

        self.conv1 = nn.Conv2d(3, self.inplanes, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = nn.BatchNorm2d(self.inplanes)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 , num_classes)


    def _make_layer(self, block, planes, blocks, stride=1):
        downsample = None  
   
        if stride != 1 or self.inplanes != planes:
            downsample = nn.Sequential(
                nn.Conv2d(self.inplanes, planes, 1, stride, bias=False),
                nn.BatchNorm2d(planes),
            )

        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample))
        
        self.inplanes = planes
        
        for _ in range(1, blocks):
            layers.append(block(self.inplanes, planes))

        return nn.Sequential(*layers)
    
    
    def forward(self, x):
        x = self.conv1(x)           # 112x112 (for a 224x224 input)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)         # 56x56

        x = self.layer1(x)          # 56x56
        x = self.layer2(x)          # 28x28
        x = self.layer3(x)          # 14x14
        x = self.layer4(x)          # 7x7

        x = self.avgpool(x)         # 1x1
        x = torch.flatten(x, 1)     # (bs, 512, 1, 1) -> (bs, 512)
        x = self.fc(x)

        return x

In the comments, I have noted the spatial size of the output after each step for a 224 × 224 input.

<a id='create-res34' aria-hidden='true'></a>
### Creating the ResNet34 model

In [None]:
def resnet34():
    layers=[3, 4, 6, 3]
    model = ResNet(BasicBlock, layers)
    return model

In [None]:
model=resnet34()

In [None]:
model

<a id='conclusion' aria-hidden='true'></a>
## Conclusion

The most important concept in ResNet is the residual block. Residual blocks make it possible to train networks that are hundreds or even over a thousand layers deep. Skip connections add little overhead to the network while preserving information from the initial layers all the way to the last.

We have learned how to build the resnet34 architecture from scratch. We can extend it to deeper models like resnet50, 101, and 152 by using the Bottleneck block, as PyTorch does.
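
As a starting point, here is a minimal sketch of a bottleneck block modeled on torchvision's `Bottleneck`, simplified by leaving out its `groups`, `base_width` and `dilation` options. Like torchvision, it puts the stride on the 3 × 3 convolution. Plugging it into our `ResNet` class would also require `_make_layer` to account for `block.expansion` (downsample when `self.inplanes != planes * block.expansion`, project the shortcut to `planes * block.expansion` channels, and set `self.inplanes = planes * block.expansion`), and the final linear layer would take `512 * block.expansion` features.

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    # The block outputs planes * expansion channels (e.g. 64 -> 256 in layer1 of resnet50)
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super().__init__()
        # 1x1 conv: reduce the number of channels to `planes`
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        # 3x3 conv: spatial processing, and the stride-2 downsampling when stride=2
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        # 1x1 conv: expand the number of channels back to planes * expansion
        self.conv3 = nn.Conv2d(planes, planes * self.expansion, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(planes * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample

    def forward(self, x):
        identity = x
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        if self.downsample is not None:
            identity = self.downsample(x)
        out += identity  # residual addition, then the final ReLU
        return self.relu(out)

# First block of resnet50's layer1: 64 -> 256 channels, so the shortcut needs a 1x1 projection
downsample = nn.Sequential(nn.Conv2d(64, 256, kernel_size=1, bias=False), nn.BatchNorm2d(256))
block = Bottleneck(64, 64, downsample=downsample)
out = block(torch.randn(2, 64, 56, 56))
print(out.shape)
```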

Please consider upvoting the kernel if you found something new to learn from it. Thank you for staying with me this long :)