## ResNet

### About ResNet
- Published as a conference paper at CVPR 2016 by researchers at Microsoft Research (MSR)
- The original paper "Deep Residual Learning for Image Recognition" can be found at <https://arxiv.org/abs/1512.03385>
- One of the most cited papers in the modern deep learning era
- Secured first place in the classification track of ILSVRC 2015 and also swept the ImageNet detection, ImageNet localization, COCO detection and COCO segmentation tracks of the ILSVRC 2015 / COCO 2015 competitions

### Motivation / Key ideas:
- In 2014 and 2015, researchers were pushing to see how deep neural networks could go
- Motivated by the success of VGGNet / InceptionNet, they tried extending these architectures to be even deeper, but found that the deeper networks struggled to learn (or converge)
- In principle, **deeper networks should perform at least as well as their shallower counterparts**
- In other words, the extra layers could simply learn the identity mapping if that is the best thing to do!
- Thus, they added the skip connections, so that the weights in the layers only have to learn the residual $\mathcal{F}(x)$, i.e. the difference between the desired output and the input $x$
![](https://miro.medium.com/v2/resize:fit:1400/1*jNyv5wv-LyXfRb3Ye1q0jA.png "Residual Learning")
- These skip connections act as **gradient superhighways** during backprop: gradients can flow through the identity path without much attenuation, which alleviates the vanishing gradients problem!
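
The last two points can be written out for a single residual block. This is a simplified sketch: $\mathcal{H}(x)$ denotes the mapping the block should ideally represent (the paper's notation), $\mathcal{L}$ is an assumed symbol for the training loss, and the ReLU applied after the addition is ignored.

$$
y = \mathcal{F}(x) + x, \qquad \mathcal{F}(x) = \mathcal{H}(x) - x
$$

$$
\frac{\partial \mathcal{L}}{\partial x} = \frac{\partial \mathcal{L}}{\partial y}\left(I + \frac{\partial \mathcal{F}}{\partial x}\right)
$$

If the identity is the best mapping, the block only has to push $\mathcal{F}(x)$ toward zero. In the backward pass, the identity term $I$ hands the upstream gradient to $x$ unchanged, however small $\partial \mathcal{F} / \partial x$ becomes.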

### Importing the necessary libraries

In [2]:
import torch
import torch.nn as nn

### Implementing the ResNet Architecture
- The authors propose five different variants of the ResNet architectures with 18, 34, 50, 101 and 152 layers
- The complete specification of these architectures is as follows:
![](https://raw.githubusercontent.com/SingularityKChen/PicUpload/master/img/20210403172705.png "ResNet architectures")
- We will focus on the deeper architectures, from 50 layers onward (note from the above figure that the only difference between the 50, 101 and 152 layer versions is the number of times certain blocks are repeated!)
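
As a quick check of where the names 50, 101 and 152 come from, the sketch below counts the weighted layers implied by the block repetitions. The helper `count_weighted_layers` is a hypothetical name introduced here; it assumes three conv layers per block plus the initial 7x7 conv and the final fully connected layer, and it does not count the 1x1 convs on the shortcut path.

```python
# Number of blocks in each of the four stages (conv2_x .. conv5_x)
BLOCKS_PER_STAGE = {
    "ResNet-50": [3, 4, 6, 3],
    "ResNet-101": [3, 4, 23, 3],
    "ResNet-152": [3, 8, 36, 3],
}


def count_weighted_layers(blocks_per_stage, convs_per_block=3):
    """Count conv/fc layers: 3 convs per block, plus the stem conv and the final fc layer."""
    return convs_per_block * sum(blocks_per_stage) + 2


for name, blocks in BLOCKS_PER_STAGE.items():
    print(name, count_weighted_layers(blocks))
# ResNet-50 50
# ResNet-101 101
# ResNet-152 152
```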



 

### Implementation guidelines:
- We will first implement the **block** class which creates a block containing three conv layers: 1x1, 3x3 and 1x1
    - One thing to note here is that, in each of these blocks, the number of output channels is **4** times the number of channels used by the first 1x1 and the 3x3 conv layers (this factor is called the expansion)
- Then, we will implement the **ResNet** class which defines the full architecture following the above table 
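
To see the channel pattern of one block in isolation, here is a minimal sketch that stacks only the three conv layers and records the tensor shape after each one. It leaves out batch norm, ReLU and the skip connection, and the sizes (256 input channels, N = 64, a 56x56 feature map) are illustrative assumptions.

```python
import torch
import torch.nn as nn

width = 64  # N: channels used by the first 1x1 and the 3x3 conv

# Only the conv path of a block: 1x1 (reduce) -> 3x3 -> 1x1 (expand by 4)
conv_path = nn.Sequential(
    nn.Conv2d(256, width, kernel_size=1),
    nn.Conv2d(width, width, kernel_size=3, padding=1),
    nn.Conv2d(width, width * 4, kernel_size=1),
)

x = torch.rand(1, 256, 56, 56)
shapes = []
for layer in conv_path:
    x = layer(x)
    shapes.append(tuple(x.shape))

print(shapes)
# [(1, 64, 56, 56), (1, 64, 56, 56), (1, 256, 56, 56)]
```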

In [18]:
class block(nn.Module): 
  def __init__(self, in_channels, out_channels, id_downsample=None, stride=1): 
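    ''' 
    Creates a single 3-layer bottleneck block of the ResNet architecture (1x1 N filters --> 3x3 N filters --> 1x1 4N filters)
    Args:
    in_channels : number of input channels to the first 1x1 conv layer
    out_channels : number of output channels of the first 1x1 conv layer (N); the block itself outputs 4N channels
    id_downsample : a module (1x1 conv + batch norm) applied to the identity so that its shape matches the block output for the elementwise addition; None if the shapes already match
    stride : stride of the 3x3 conv layer (2 when the block halves the spatial dimensions)
    '''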
    super(block, self).__init__()
    self.expansion = 4

    self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=1, padding=0)
    self.bn1 = nn.BatchNorm2d(out_channels)
    self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=stride, padding=1)
    self.bn2 = nn.BatchNorm2d(out_channels)
    self.conv3 = nn.Conv2d(out_channels, out_channels*self.expansion, kernel_size=1, stride=1, padding=0)
    self.bn3 = nn.BatchNorm2d(out_channels*self.expansion)
    self.relu = nn.ReLU()
    self.id_downsample = id_downsample
    
    # Log the block configuration; the shortcut is either None or a Sequential whose first layer is a conv
    shortcut_type = type(id_downsample[0]) if id_downsample is not None else type(id_downsample)
    print(f'Creating block with in_ch: {in_channels}, out_ch: {out_channels} and downsampling: {shortcut_type}')

  def forward(self, x):
    identity = x

    x = self.relu(self.bn1(self.conv1(x)))
    x = self.relu(self.bn2(self.conv2(x)))
    x = self.bn3(self.conv3(x))
    #Now add the skip connection...

    if self.id_downsample is not None:
      identity = self.id_downsample(identity)

    x += identity
    x = self.relu(x)
    return x
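
The role of `id_downsample` can be shown in isolation. The sketch below is a standalone illustration, not part of the class above: `residual` is a random stand-in for what the conv path of the first block in a stride-2 stage would produce, and all sizes are assumptions chosen to match the second stage.

```python
import torch
import torch.nn as nn

# Input to the first block of a stage that halves the resolution
x = torch.rand(2, 256, 56, 56)

# Stand-in for the conv path output: 128 * 4 = 512 channels at half the resolution
residual = torch.rand(2, 512, 28, 28)

# Adding the raw input fails: neither the channels nor the spatial size match
try:
    residual + x
    shapes_match = True
except RuntimeError:
    shapes_match = False

# Projection shortcut: 1x1 conv with the same stride as the block, then batch norm
projection = nn.Sequential(
    nn.Conv2d(256, 512, kernel_size=1, stride=2),
    nn.BatchNorm2d(512),
)
shortcut = projection(x)
out = torch.relu(residual + shortcut)

print(shapes_match)    # False
print(shortcut.shape)  # torch.Size([2, 512, 28, 28])
print(out.shape)       # torch.Size([2, 512, 28, 28])
```

In the remaining blocks of a stage the input already has the output's shape, so no projection is needed and the identity is added as-is.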

In [19]:
class ResNet(nn.Module):
  def __init__(self, block, layers, image_channels, num_classes): 
    '''
    Args:
    layers : a list with the number of times the block is repeated in each of the four ResNet layers (say [3, 4, 6, 3] for 50 layers)
    image_channels : 1 for Grayscale images and 3 for RGB images
    num_classes : number of output classes
    '''
    super(ResNet, self).__init__()
    self.in_channels = 64
    #Initial layers...
    self.conv1 = nn.Conv2d(image_channels, 64, kernel_size=7, stride=2, padding=3)
    self.bn1 = nn.BatchNorm2d(64)
    self.relu = nn.ReLU()
    self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

    #ResNet layers begin...
    self.layer1 = self._make_layer(block, layers[0], out_channels=64, stride=1) # at the end we will have 64 * 4 = 256 channels
    self.layer2 = self._make_layer(block, layers[1], out_channels=128, stride=2) # 128 * 4 = 512 channels
    self.layer3 = self._make_layer(block, layers[2], out_channels=256, stride=2) # 1024 channels
    self.layer4 = self._make_layer(block, layers[3], out_channels=512, stride=2) # 2048 channels

    self.avgpool = nn.AdaptiveAvgPool2d((1,1))
    self.fc = nn.Linear(512*4, num_classes)


  def forward(self,x):
    x = self.relu(self.bn1(self.conv1(x)))
    x = self.maxpool(x)

    x = self.layer1(x)
    x = self.layer2(x)
    x = self.layer3(x)
    x = self.layer4(x)

    x = self.avgpool(x)
    x = x.reshape(x.shape[0], -1)
    x = self.fc(x)

    return x


  def _make_layer(self, block, num_residual_blocks, out_channels, stride):
    id_downsample = None
    layers = []

    # We will do the necessary transformation using id_downsample if (i) we change the spatial dimensions or 
    # (ii) we change the no of channels
    if stride!=1 or self.in_channels != out_channels*4:
      id_downsample = nn.Sequential(
          nn.Conv2d(self.in_channels, out_channels*4, kernel_size=1, stride = stride ),
          nn.BatchNorm2d(out_channels*4)
      )
    
    #Creating the first residual block that changes the no of channels...
    layers.append(block(self.in_channels, out_channels, id_downsample, stride))
    self.in_channels = out_channels * 4

    #Creating the remaining blocks...
    for i in range(num_residual_blocks-1):
      layers.append(block(self.in_channels, out_channels))

    return nn.Sequential(*layers)
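
To follow how the tensor shape evolves through the network, the pure-Python sketch below applies the standard conv/pool output-size formula to a 224x224 input. The helpers `conv_out_size` and `trace_resnet` are hypothetical names introduced for this illustration; the kernel sizes, strides and paddings mirror the values used in the class above.

```python
def conv_out_size(size, kernel, stride, padding):
    """Output size of a conv or pooling layer along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_resnet(input_size=224, widths=(64, 128, 256, 512),
                 strides=(1, 2, 2, 2), expansion=4):
    """Return (name, channels, spatial size) after the stem and after each layer."""
    size = conv_out_size(input_size, kernel=7, stride=2, padding=3)  # conv1
    size = conv_out_size(size, kernel=3, stride=2, padding=1)        # maxpool
    trace = [("stem", 64, size)]
    for i, (width, stride) in enumerate(zip(widths, strides), start=1):
        # Only the 3x3 conv of the first block in each layer carries the stride
        size = conv_out_size(size, kernel=3, stride=stride, padding=1)
        trace.append((f"layer{i}", width * expansion, size))
    return trace


for name, channels, size in trace_resnet():
    print(f"{name}: {channels} channels, {size}x{size}")
# stem: 64 channels, 56x56
# layer1: 256 channels, 56x56
# layer2: 512 channels, 28x28
# layer3: 1024 channels, 14x14
# layer4: 2048 channels, 7x7
```

The final 2048 channels are what the adaptive average pooling reduces to a 2048-dimensional vector, which matches the `512*4` input size of the fully connected layer.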


In [20]:
def ResNet50(img_channels=3, num_classes=1000):
  return ResNet(block, [3, 4, 6, 3], img_channels, num_classes)

def ResNet101(img_channels=3, num_classes=1000):
  return ResNet(block, [3, 4, 23, 3], img_channels, num_classes)

def ResNet152(img_channels=3, num_classes=1000):
  return ResNet(block, [3, 8, 36, 3], img_channels, num_classes)


### Sanity check...

In [21]:
def sanity_check():
  net = ResNet50()
  x = torch.rand(2, 3, 224, 224)
  y = net(x)
  print(y.shape)

In [22]:
sanity_check()

Creating block with in_ch: 64, out_ch: 64 and downsampling: <class 'torch.nn.modules.conv.Conv2d'>
Creating block with in_ch: 256, out_ch: 64 and downsampling: <class 'NoneType'>
Creating block with in_ch: 256, out_ch: 64 and downsampling: <class 'NoneType'>
Creating block with in_ch: 256, out_ch: 128 and downsampling: <class 'torch.nn.modules.conv.Conv2d'>
Creating block with in_ch: 512, out_ch: 128 and downsampling: <class 'NoneType'>
Creating block with in_ch: 512, out_ch: 128 and downsampling: <class 'NoneType'>
Creating block with in_ch: 512, out_ch: 128 and downsampling: <class 'NoneType'>
Creating block with in_ch: 512, out_ch: 256 and downsampling: <class 'torch.nn.modules.conv.Conv2d'>
Creating block with in_ch: 1024, out_ch: 256 and downsampling: <class 'NoneType'>
Creating block with in_ch: 1024, out_ch: 256 and downsampling: <class 'NoneType'>
Creating block with in_ch: 1024, out_ch: 256 and downsampling: <class 'NoneType'>
Creating block with in_ch: 1024, out_ch: 256 and d