<h1> **RESNET IMPLEMENTATION** </h1>


---


Importing important libraries and dependencies

In [0]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim 

import torchvision
import torchvision.transforms as transforms

import matplotlib.pyplot as plt
import matplotlib.image as img

torch.set_grad_enabled(True)

<h1> Basic architectural knowledge about ResNet </h1>

ResNet stands for "Residual Network". The name fits: the input of a block is kept as a **residue** and added back to the output of that block, so it is passed on to later layers of the network. The basic aim is to be able to build very deep networks without suffering from problems like -

1.   Vanishing gradient
2.   Degradation (a deeper plain network ending up with higher error than a shallower one)

Reusing the **residue** further along the network means that earlier features are reused and important information is carried forward, which helps the later layers build meaningful features.

The plot below shows the effect on plain networks: the deeper network shows more error than the shallower one. We want to go deeper to learn richer features, but in a plain network the trade-off is too high. This is where ResNet comes to the rescue.

![alt text](https://miro.medium.com/max/935/1*McwAbGJjA1lV_xBdg1w5XA.png)
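
To make the vanishing-gradient effect concrete, here is a minimal sketch (not part of the model built below) that compares how much gradient reaches the input of a deep plain stack with the same stack using skip connections. The depth of 50, the width of 16, the tanh activation and the helper name `input_grad_norm` are all assumptions chosen for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def input_grad_norm(use_skip, depth=50, width=16):
    # hypothetical helper: build `depth` small layers and measure how much
    # gradient flows back to the input
    layers = [nn.Linear(width, width) for _ in range(depth)]
    x = torch.randn(8, width, requires_grad=True)
    out = x
    for layer in layers:
        h = torch.tanh(layer(out))
        # with a skip connection, the layer's input is added back to its output
        out = out + h if use_skip else h
    out.sum().backward()
    return x.grad.norm().item()

plain = input_grad_norm(use_skip=False)
residual = input_grad_norm(use_skip=True)
print('plain stack    :', plain)
print('residual stack :', residual)
```

With these settings the gradient of the plain stack should come out many orders of magnitude smaller than that of the residual stack, which is the behaviour the skip connection is meant to avoid.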












<h1> Overview of ResNet Architecture </h1>

There are two types of blocks (combination of standard layers) we will be making -


1.   Base Block - This is the simplest block of the following form 

  ![alt text](https://miro.medium.com/max/2678/1*BCbJZXwGDtEdytj9ag_YWw.png)



*   The 1st weight layer changes the number of channels
*   Then we activate it
*   The 2nd weight layer changes the height and width of the feature map
*   Then we add the residue **X** to this output, before the final activation
*   Finally we apply relu(output + x)
---




2.   BottleNeck Block - Architectures like these are refined through a very iterative process, and a different kind of block is preferred for deeper ResNet versions such as ResNet50 and above.

  ![alt text](https://miro.medium.com/max/2621/1*sb_4xKI_bRoX6jmZcNTRWw.png)

*   The 1st weight layer changes the number of channels
*   Then we activate it
*   The 2nd weight layer changes the height and width of the feature map
*   Then we activate it
*   Then we pass it to a layer which expands the number of channels (here by a factor of 4)
*   Then we add the residue **X** to this output, before the final activation. Sometimes the dimensions of the **residue** are not the same as those of the block's output, so we apply a convolution to the residue to bring it to the size of the output.
*   Finally we apply relu(output + x)
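
The case where the **residue** does not match the block's output can be sketched as follows. This is a minimal illustration, assuming a 1x1 convolution is used for the adjustment (as in the code that follows); the tensor sizes and the variable names `residue` and `project` are made up for the example.

```python
import torch
import torch.nn as nn

# assumed example: a batch of 2 feature maps with 64 channels, 32x32 each
residue = torch.randn(2, 64, 32, 32)

# 1x1 convolution: changes the channels (64 -> 256) and, through its
# stride of 2, halves the height and width
project = nn.Sequential(
    nn.Conv2d(64, 256, kernel_size=1, stride=2),
    nn.BatchNorm2d(256),
)

adjusted = project(residue)
print(adjusted.shape)  # torch.Size([2, 256, 16, 16])
```

After this projection the residue has the same shape as a block output of 16x16x256 and can be added to it element-wise.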



In [0]:
class baseblock(nn.Module):
  expansion=1
  def __init__(self,input_planes,planes,stride=1,kernel_size=1,dim_change=None):
    super().__init__()
    #layer to change the number of channels
    self.conv1 = nn.Conv2d(input_planes,planes,kernel_size=1)
    self.bn1 = nn.BatchNorm2d(planes)
    #layer to change height and width of the feature map
    self.conv2 = nn.Conv2d(planes,planes,kernel_size=3,stride=stride,padding=1)
    self.bn2 = nn.BatchNorm2d(planes)
    #identity shortcut by default; projection when the shapes differ
    self.dim_change = nn.Sequential()
    if stride!=1 or input_planes!=planes*self.expansion:
      self.dim_change = nn.Sequential(nn.Conv2d(input_planes,planes*self.expansion,kernel_size=1,stride=stride),
                                      nn.BatchNorm2d(planes*self.expansion))

  def forward(self,x):
    res = x
    #layer 1
    out = self.conv1(x)
    out = self.bn1(out)
    out = F.relu(out)
    #layer 2
    out = self.conv2(out)
    out = self.bn2(out)

    out+=self.dim_change(res)
    out=F.relu(out)
    return out


class bottleneck(nn.Module):
  expansion=4
  def __init__(self,input_planes,planes,stride=1,kernel_size=1,dim_change=None):
    super().__init__()
    #layer to change the dimension
    self.conv1 = nn.Conv2d(input_planes,planes,kernel_size=1)
    self.bn1 = nn.BatchNorm2d(planes)
    #layer to change Height width of image
    self.conv2 = nn.Conv2d(planes,planes,kernel_size=3,stride=stride,padding=1)
    self.bn2 = nn.BatchNorm2d(planes)
    #layer for channel expansion and nothing else
    self.conv3 = nn.Conv2d(planes,planes*self.expansion,kernel_size=1)
    self.bn3 = nn.BatchNorm2d(planes*self.expansion)
    self.dim_change = nn.Sequential()
    if stride != 1 or input_planes != planes*self.expansion:
      self.dim_change = nn.Sequential(nn.Conv2d(input_planes,planes*self.expansion,kernel_size=1,stride=stride),
                                      nn.BatchNorm2d(planes*self.expansion))

  def forward(self,x):
    res=x

    t=self.conv1(x)
    t=self.bn1(t)
    t=F.relu(t)
    #t = F.relu(self.bn1(self.conv1(x)))

    t=self.conv2(t)
    t=self.bn2(t)
    t=F.relu(t)
    #t = F.relu(self.bn2(self.conv2(t)))

    t=self.conv3(t)
    t=self.bn3(t)
    #t = self.bn3(self.conv3(t))

    t+=self.dim_change(res)
    t=F.relu(t)

    return t
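
One way to see why the bottleneck form is attractive for deep networks is to count weights. The sketch below is an illustration only: the channel sizes (256 wide, 64 inside the bottleneck) and the helper `count_weights` are assumptions, and biases and BatchNorm are left out to keep the comparison simple.

```python
import torch.nn as nn

def count_weights(module):
    # total number of learnable parameters in a module
    return sum(p.numel() for p in module.parameters())

# two 3x3 convolutions working directly on 256 channels
plain = nn.Sequential(
    nn.Conv2d(256, 256, kernel_size=3, padding=1, bias=False),
    nn.Conv2d(256, 256, kernel_size=3, padding=1, bias=False),
)

# bottleneck: squeeze to 64 channels, 3x3 there, expand back by 4
squeezed = nn.Sequential(
    nn.Conv2d(256, 64, kernel_size=1, bias=False),
    nn.Conv2d(64, 64, kernel_size=3, padding=1, bias=False),
    nn.Conv2d(64, 256, kernel_size=1, bias=False),
)

print(count_weights(plain))     # 1179648
print(count_weights(squeezed))  # 69632
```

Under these assumptions the bottleneck needs roughly 17 times fewer weights for the same input and output width, which leaves room for stacking many more blocks.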

<h1> What does a ResNet network look like? </h1>

A ResNet architecture can be built in many ways from the blocks we studied above. Below are some standard ResNet configurations.
![alt text](https://miro.medium.com/max/1849/1*aq0q7gCvuNUqnMHh4cpnIw.png)

Below are the two types of block we have implemented before.

![alt text](https://miro.medium.com/max/1103/1*zS2ChIMwAqC5DQbL5yD9iQ.png)

In this article we are going to implement the ResNet50 architecture from scratch using these blocks and train it on the CIFAR-10 dataset. We are basically going to stack the blocks in the way the research paper suggests and train our model accordingly.

![alt text](https://cv-tricks.com/wp-content/uploads/2019/07/ResNet50_architecture-1.png)

Dotted lines are used in the network diagram wherever the dimensions of the residue differ from those of the output. At those places, we adjust the residue to match the output using a convolution operation.
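
The names ResNet-18 and ResNet-50 come from counting weight layers. As a small sanity check, assuming the usual convention of counting the first convolution, every convolution inside the blocks and the final fully-connected layer (shortcut convolutions are not counted), the hypothetical helper `count_layers` below reproduces both numbers:

```python
def count_layers(blocks_per_stage, convs_per_block):
    # 1 for the first convolution + convolutions in all blocks + 1 for fc
    return 1 + sum(blocks_per_stage) * convs_per_block + 1

print(count_layers([2, 2, 2, 2], convs_per_block=2))  # 18
print(count_layers([3, 4, 6, 3], convs_per_block=3))  # 50
```

The list `[3, 4, 6, 3]` is the number of blocks stacked in each of the four stages.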

In [0]:
class ResNet(nn.Module):
  def __init__(self,block,num_layers,classes=10):
    super().__init__()
    self.input_planes=64
    self.conv1 = nn.Conv2d(3,64,kernel_size=3,stride=1,padding=1)
    self.bn1 = nn.BatchNorm2d(64)
    self.layer1 = self.layer_x(block,64,num_layers[0],stride=1)
    self.layer2 = self.layer_x(block,128,num_layers[1],stride=2)
    self.layer3 = self.layer_x(block,256,num_layers[2],stride=2)
    self.layer4 = self.layer_x(block,512,num_layers[3],stride=2)
    self.avgpool = nn.AvgPool2d(kernel_size=4,stride=1)
    self.fc  = nn.Linear(512*block.expansion,classes) 

  def layer_x(self,block,planes,num_layers,stride):
    layers = []
    layers.append(block(self.input_planes,planes,stride=stride))
    self.input_planes = block.expansion*planes
    for i in range(1,num_layers):
      layers.append(block(self.input_planes,planes))
      self.input_planes = planes * block.expansion
    return nn.Sequential(*layers)

  def forward(self, x):
    out = F.relu(self.bn1(self.conv1(x)))
    out = self.layer1(out)
    out = self.layer2(out)
    out = self.layer3(out)
    out = self.layer4(out)
    out = self.avgpool(out)
    out = out.view(out.size(0), -1)
    out = self.fc(out)
    return out

<h1> Analysis of data and implementation of network </h1>

The CIFAR-10 dataset consists of 60000 32x32x3 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.

<h2>Step 1</h2>

    self.conv1 = nn.Conv2d(3,64,kernel_size=3,stride=1,padding=1)
    self.bn1 = nn.BatchNorm2d(64)

32x32x3 image --> conv1 --> 32x32x64 feature maps
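
The height and width after a convolution follow from the standard formula floor((n + 2p - k) / s) + 1, where n is the input size, k the kernel size, s the stride and p the padding. A small helper (the name `conv_out` is hypothetical) makes it easy to check the sizes quoted in the steps:

```python
def conv_out(n, kernel_size, stride=1, padding=0):
    # output height/width of a convolution for an input of size n
    return (n + 2 * padding - kernel_size) // stride + 1

print(conv_out(32, kernel_size=3, stride=1, padding=1))  # 32
print(conv_out(32, kernel_size=3, stride=2, padding=1))  # 16
print(conv_out(32, kernel_size=1, stride=2, padding=0))  # 16
```

The first line is the `conv1` case above: a 3x3 kernel with stride 1 and padding 1 keeps the 32x32 size and only changes the channels.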

<h2>Step 2</h2>

    self.layer1 = self.layer_x(block,64,num_layers[0],stride=1)

We will break it into 2 parts
  1. **Overview** - 
32x32x64 --> layer1x3 --> 32x32x256
  2. **Detailed** - 

  residue = 32x32x64

*   32x32x64 --> conv1 --> 32x32x64
*   32x32x64 --> conv2 --> 32x32x64
*   32x32x64 --> conv3 --> 32x32x256 (output of bottleNeck)

  The residue (32x32x64) does not match the output (32x32x256), so we project it:


    self.dim_change = nn.Sequential(nn.Conv2d(input_planes,planes*self.expansion,kernel_size=1,stride=stride),
                                      nn.BatchNorm2d(planes*self.expansion))
                                    
  32x32x64 (residue) --> conv2d --> 32x32x256 (now matches the bottleNeck output; the sum is then activated)

This output is then fed to the next bottleNeck block of the stage.

It passes through 2 more blocks, each with stride=1 - 32x32x256(prev output) --> bottleNeck --> 32x32x256(new output)


<h2>Step 3</h2>

    self.layer2 = self.layer_x(block,128,num_layers[1],stride=2)

We will break it into 2 parts
  1. **Overview** - 
32x32x256 --> layer2x4 --> 16x16x512
  2. **Detailed** - 

  residue = 32x32x256

*   32x32x256 --> conv1 --> 32x32x128
*   32x32x128 --> conv2 --> 16x16x128
*   16x16x128 --> conv3 --> 16x16x512 (output of bottleNeck)

  The residue (32x32x256) does not match the output (16x16x512), so we project it:


    self.dim_change = nn.Sequential(nn.Conv2d(input_planes,planes*self.expansion,kernel_size=1,stride=stride),
                                      nn.BatchNorm2d(planes*self.expansion))

   32x32x256 (residue) --> conv2d --> 16x16x512 (now matches the bottleNeck output; the sum is then activated)

This output is then fed to the next bottleNeck block of the stage.

It passes through 3 more blocks, each with stride=1 - 16x16x512(prev output) --> bottleNeck --> 16x16x512(new output)

<h2>Step 4</h2>

    self.layer3 = self.layer_x(block,256,num_layers[2],stride=2)


We will break it into 2 parts
  1. **Overview** - 
16x16x512 --> layer3x6 --> 8x8x1024
  2. **Detailed** - 

  residue = 16x16x512

*   16x16x512 --> conv1 --> 16x16x256
*   16x16x256 --> conv2 --> 8x8x256
*   8x8x256 --> conv3 --> 8x8x1024 (output of bottleNeck)

  The residue (16x16x512) does not match the output (8x8x1024), so we project it:


    self.dim_change = nn.Sequential(nn.Conv2d(input_planes,planes*self.expansion,kernel_size=1,stride=stride),
                                      nn.BatchNorm2d(planes*self.expansion))

   16x16x512 (residue) --> conv2d --> 8x8x1024 (now matches the bottleNeck output; the sum is then activated)

This output is then fed to the next bottleNeck block of the stage.

It passes through 5 more blocks, each with stride=1 - 8x8x1024(prev output) --> bottleNeck --> 8x8x1024(new output)


<h2>Step 5</h2>

    self.layer4 = self.layer_x(block,512,num_layers[3],stride=2)


We will break it into 2 parts
  1. **Overview** - 
8x8x1024 --> layer4x3 --> 4x4x2048
  2. **Detailed** - 

  residue = 8x8x1024

*   8x8x1024 --> conv1 --> 8x8x512
*   8x8x512 --> conv2 --> 4x4x512
*   4x4x512 --> conv3 --> 4x4x2048 (output of bottleNeck)

  The residue (8x8x1024) does not match the output (4x4x2048), so we project it:


    self.dim_change = nn.Sequential(nn.Conv2d(input_planes,planes*self.expansion,kernel_size=1,stride=stride),
                                      nn.BatchNorm2d(planes*self.expansion))

   8x8x1024 (residue) --> conv2d --> 4x4x2048 (now matches the bottleNeck output; the sum is then activated)

This output is then fed to the next bottleNeck block of the stage.

It passes through 2 more blocks, each with stride=1 - 4x4x2048(prev output) --> bottleNeck --> 4x4x2048(new output)
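
The four stages can be summarised in a few lines. This is a sketch that only tracks shapes, assuming a 32x32 input, bottleneck blocks with expansion 4 and the stage settings used in the walkthrough; `trace_stages` is a hypothetical helper, not part of the model.

```python
def trace_stages(size=32, planes=(64, 128, 256, 512),
                 strides=(1, 2, 2, 2), expansion=4):
    shapes = []
    for p, s in zip(planes, strides):
        # only the first block of a stage uses the stride; its 3x3
        # convolution (padding 1) sets the new height and width
        size = (size + 2 * 1 - 3) // s + 1
        shapes.append((size, size, p * expansion))
    return shapes

for i, shape in enumerate(trace_stages(), start=1):
    print('layer%d output: %dx%dx%d' % ((i,) + shape))
```

The remaining blocks of each stage keep the shape unchanged, so only the first block of a stage matters for this trace.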

<h2>Step 6</h2>

    self.avgpool = nn.AvgPool2d(kernel_size=4,stride=1)

4x4x2048 --> AvgPool --> 1x1x2048 (Output that should be flattened and fed to the fully-connected layer)

<h2>Step 7</h2>

    self.fc  = nn.Linear(512*block.expansion,classes) 
Before this layer, the output is flattened using -
    
    out = out.view(out.size(0), -1)
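
The last two steps can be checked on a dummy tensor. A minimal sketch, assuming a batch of 2 and random values in place of real feature maps:

```python
import torch
import torch.nn as nn

out = torch.randn(2, 2048, 4, 4)           # assumed output of layer4
avgpool = nn.AvgPool2d(kernel_size=4, stride=1)
fc = nn.Linear(2048, 10)                   # 512 * expansion(4) -> 10 classes

pooled = avgpool(out)
flat = pooled.view(pooled.size(0), -1)     # flatten to (batch, features)
logits = fc(flat)

print(pooled.shape)  # torch.Size([2, 2048, 1, 1])
print(flat.shape)    # torch.Size([2, 2048])
print(logits.shape)  # torch.Size([2, 10])
```

Each row of `logits` holds one score per CIFAR-10 class.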





In [0]:
def test():

  #To convert data from PIL to tensor
  transform = transforms.Compose(
      [transforms.ToTensor(),
      transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]
      )

  #Load train and test set:
  train = torchvision.datasets.CIFAR10(root='./data',train=True,download=True,transform=transform)
  trainset = torch.utils.data.DataLoader(train,batch_size=128,shuffle=True)

  test = torchvision.datasets.CIFAR10(root='./data',train=False,download=True,transform=transform)
  testset = torch.utils.data.DataLoader(test,batch_size=128,shuffle=False)
  
  device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
  print(device)

  #ResNet-18 
  #net = ResNet(baseblock,[2,2,2,2],10)

  #ResNet-50
  net =  ResNet(bottleneck,[3,4,6,3])
  net.to(device)
  costFunc = nn.CrossEntropyLoss()
  optimizer =  optim.SGD(net.parameters(),lr=0.02,momentum=0.9)

  for epoch in range(1):
    closs = 0
    for i,batch in enumerate(trainset,0):
        data,output = batch
        data,output = data.to(device),output.to(device)
        prediction = net(data)
        loss = costFunc(prediction,output)
        closs += loss.item()   #accumulate the running loss

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        #print the average loss over every 100 batches
        if (i+1)%100 == 0:
            print('[%d  %d] loss: %.4f'% (epoch+1,i+1,closs/100))
            closs = 0

  correctHits=0
  total=0
  net.eval()   #use the running BatchNorm statistics during evaluation
  with torch.no_grad():   #gradients are not needed for evaluation
    for batches in testset:
        data,output = batches
        data,output = data.to(device),output.to(device)
        prediction = net(data)
        _,prediction = torch.max(prediction,1)  #returns max as well as its index
        total += output.size(0)
        correctHits += (prediction==output).sum().item()
  print('Accuracy = '+str((correctHits/total)*100))

In [0]:
test()
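
The accuracy computation inside `test()` relies on `torch.max(..., 1)` returning both the largest score in each row and its index; the index is the predicted class. A tiny worked example with made-up scores for 3 classes:

```python
import torch

# made-up scores for 4 images and 3 classes
scores = torch.tensor([[0.1, 2.0, 0.3],
                       [1.5, 0.2, 0.1],
                       [0.0, 0.1, 0.9],
                       [0.4, 0.3, 0.2]])
labels = torch.tensor([1, 0, 2, 1])

values, predicted = torch.max(scores, 1)   # max value and its index per row
correct = (predicted == labels).sum().item()

print(predicted)                        # tensor([1, 0, 2, 0])
print(correct / labels.size(0) * 100)   # 75.0
```

Three of the four predictions match their labels, which gives the 75% printed on the last line.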