# Residual Networks

1. <u>ResNets (2015)</u> are an architecture whose residual connections have been adopted across both language and image tasks since their introduction in 2015. The original paper showed large improvements on the ImageNet and CIFAR datasets by using **Residual Blocks** to build very deep networks. ResNets took network depth from the previous state of the art of ~20 layers to 150+.


##  Benefits
1. Solved **training error degradation** (NOT caused by overfitting), which is often seen in deep networks. Note: this is different from the vanishing gradient problem and from overfitting.
    * Notice how the red line is higher than the green line during training: the deeper network (56 layers) has a higher training error than the shallower one (20 layers)


<img src="PreResnetTrainingError.png" width="400" height="400">

<img src="ResnetBuildingBlock.png" width="400" height="400">


## Mathematics of Resnets

* Instead of forcing the network to learn a complete transformation, we prefer that it learns a residual transformation
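
In the notation of the ResNet paper, this can be written as follows, where $H(x)$ is the complete transformation we would like a few stacked layers to represent and $F(x)$ is the residual those layers are actually asked to learn:

$$
F(x) := H(x) - x \quad\Longrightarrow\quad y = F(x, \{W_i\}) + x
$$

If the identity mapping happens to be close to optimal, the layers only have to push $F(x)$ toward zero, which is presumably easier than fitting an identity mapping with a stack of non-linear layers.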


## Setup

In [73]:
import os
import ssl
ssl._create_default_https_context = ssl._create_unverified_context

In [74]:
import torch
from torch.utils.tensorboard import SummaryWriter
device = torch.device("cuda" if torch.cuda.is_available() else "cpu") 
writer = SummaryWriter('runs/resnet50')
log_dir = f"{os.getcwd()}/runs"

In [75]:
%%bash
if lsof -Pi :6006 -sTCP:LISTEN -t >/dev/null ; then
    echo "TensorBoard is already running"
else
    tensorboard --logdir=runs --port=6006 &
    echo "TensorBoard has started"
fi


TensorBoard has started



NOTE: Using experimental fast data loading logic. To disable, pass
    "--load_fast=false" and report issues on GitHub. More details:
    https://github.com/tensorflow/tensorboard/issues/4784

Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.12.3 at http://localhost:6006/ (Press CTRL+C to quit)


Process is interrupted.


In [76]:
import torch 
import torchvision 
import torch.nn as nn
import torch.nn.functional as F

### Torchvision Transforms
* Here we'll use the **script mode** of transforms. Script mode provides an efficient implementation of transforms by making them device compatible (CPU/GPU)
* The wrapper using `ScriptedDataTransform(torch.nn.Module)` is not strictly necessary. However, the [torch docs](https://pytorch.org/vision/stable/transforms.html) state that <i>`"For any custom transformations to be used with torch.jit.script, they should be derived from torch.nn.Module."`</i>
* Additionally, `torchvision.transforms.Compose()` is incompatible with `torch.jit.script()`, so you must use `torch.nn.Sequential`.
* In general, it's good practice to build modules like this to provide **Compatibility** and **Parameterization**. A common example is having learnable params in data transformation pipelines. For now, we don't have any such params.
* `torch.jit.script()` **does not** support `torchvision.transforms.ToTensor()`

In [77]:
# class ScriptedDataTransform(torch.nn.Module):
#     def __init__(self):   
#         super().__init__()
#         self.custom_transform = torch.nn.Sequential(
#             torchvision.transforms.Resize((224, 224)), 
#             torchvision.transforms.RandomHorizontalFlip(),
#             torchvision.transforms.ToTensor(), 
#             torchvision.transforms.Normalize(
#                 mean=(0.485, 0.456, 0.406), 
#                 std=(0.229, 0.224, 0.225)
#             )
#         )
        
#     def forward(self, x):
#         output = self.custom_transform(x)
#         return output

# transformer = ScriptedDataTransform()
# scripted_transformer = torch.jit.script(transformer)

In [78]:
custom_transform = torchvision.transforms.Compose([
    torchvision.transforms.Resize((224, 224)), 
    torchvision.transforms.RandomHorizontalFlip(),
    torchvision.transforms.ToTensor(), 
    torchvision.transforms.Normalize(
        mean=(0.485, 0.456, 0.406), 
        std=(0.229, 0.224, 0.225)
    )
])

In [84]:
## need access
# imagenet_dataset = torchvision.datasets.ImageNet(
#     root='./data', 
#     split='train', 
#     transform=scripted_transformer, 
#     download=True
# )

# Download and load training dataset
trainset = torchvision.datasets.CIFAR10(root='data/', train=True, download=True, transform=custom_transform)
trainset_loader = torch.utils.data.DataLoader(trainset, batch_size=5, shuffle=True, drop_last=True)

Files already downloaded and verified


In [85]:
testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=custom_transform)
testset_loader = torch.utils.data.DataLoader(testset, batch_size=5, shuffle=False, drop_last=False)

Files already downloaded and verified


In [86]:
print(f"size of trainset = {len(trainset)}")
print(f"size of trainloader = {len(trainset_loader)}")
print(f"size of testset = {len(testset)}")
print(f"size of testloader = {len(testset_loader)}")

size of trainset = 50000
size of trainloader = 10000
size of testset = 10000
size of testloader = 2000
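
The loader sizes printed above follow directly from the dataset size, the batch size and `drop_last`. A small sketch of that arithmetic (the helper `num_batches` is hypothetical, not part of PyTorch):

```python
import math

def num_batches(dataset_size, batch_size, drop_last):
    """Number of batches a loader yields per epoch."""
    if drop_last:
        # an incomplete final batch is discarded
        return dataset_size // batch_size
    # an incomplete final batch is kept as a smaller batch
    return math.ceil(dataset_size / batch_size)

print(num_batches(50000, 5, drop_last=True))   # 10000
print(num_batches(10000, 5, drop_last=False))  # 2000
```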


In [87]:
first_train_batch = next(iter(trainset_loader))
train_x = first_train_batch[0]
train_y  = first_train_batch[1]
print(f'train X shape: {train_x.shape} train Y: {train_y}')

train X shape: torch.Size([5, 3, 224, 224]) train Y: tensor([7, 1, 4, 3, 1])


## Model

### Residual Block

<img src="ResnetCNN.png" width="400" height="400">

* The image on the left shows a 34-layer ResNet
* Some blocks are normal while some require **downsampling**

Diagram explanation
1. **7x7 conv, 64, /2** refers to `kernel_size=7`, `out_channels=64` and `stride=2`
2. **pool, /2** refers to halving H and W by using `stride=2` and a max/avg pool with `kernel_size=2`
3. **3x3 conv, 64** refers to `kernel_size=3`, `out_channels=64` and `stride=1 [default]`
4. The **dotted lines** represent `downsampling`, which makes the tensor shapes match when computing `x = x + residual`, where `residual = downsample(x)`
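
The `/2` annotations can be checked with the standard output-size formula for convolution and pooling layers. The sketch below traces a 224-pixel side through the first layers of the diagram. The padding values (3 for the 7x7 conv, 1 for the 3x3 conv) are assumptions, since the diagram does not state them, and `conv_output_size` is a helper written for this example.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a conv/pool layer along one dimension."""
    return (size + 2 * padding - kernel_size) // stride + 1

# "7x7 conv, 64, /2" (padding=3 assumed)
after_stem = conv_output_size(224, kernel_size=7, stride=2, padding=3)
# "pool, /2" with kernel_size=2
after_pool = conv_output_size(after_stem, kernel_size=2, stride=2)
# "3x3 conv, 64" with the default stride of 1 (padding=1 assumed)
after_conv = conv_output_size(after_pool, kernel_size=3, stride=1, padding=1)

print(after_stem, after_pool, after_conv)  # 112 56 56
```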



**Excerpt from the Paper**
* Residual Network. Based on the above plain network, we insert shortcut connections (Fig. 3, right) which turn the network into its counterpart residual version. The identity shortcuts (Eqn.(1)) can be directly used when the input and output are of the same dimensions (solid line shortcuts in Fig. 3). 

* When the dimensions increase (dotted line shortcuts in Fig. 3), we consider two options: 
    * (A) The shortcut still performs identity mapping, with extra zero entries padded for increasing dimensions. This option introduces no extra parameter; 
    * (B) The projection shortcut in Eqn.(2) is used to match dimensions (done by 1×1 convolutions). For both options, when the shortcuts go across feature maps of two sizes, they are performed with a stride of 2.
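
The two shortcut options from the excerpt can be sketched as below. This is an illustration under assumptions, not the paper's reference code: `shortcut_option_a` and `shortcut_option_b` are names made up here, and the shapes (64 to 128 channels, 56 to 28 pixels) are picked to mimic one dotted-line transition.

```python
import torch
import torch.nn.functional as F

def shortcut_option_a(x, out_channels):
    """Option (A): stride-2 identity with zero-padded extra channels, no parameters."""
    x = x[:, :, ::2, ::2]  # keep every second pixel along H and W
    extra_channels = out_channels - x.shape[1]
    # F.pad pairs start from the last dim: (W_left, W_right, H_top, H_bottom, C_front, C_back)
    return F.pad(x, (0, 0, 0, 0, 0, extra_channels))

# Option (B): projection shortcut, a learnable 1x1 convolution with stride 2
shortcut_option_b = torch.nn.Conv2d(64, 128, kernel_size=1, stride=2, bias=False)

x = torch.randn(2, 64, 56, 56)
out_a = shortcut_option_a(x, out_channels=128)
out_b = shortcut_option_b(x)
print(out_a.shape, out_b.shape)
```

Both shortcuts produce the same output shape, so either can be added to the output of a block that halves H and W and doubles the channels. Only option (B) adds trainable weights.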



In [94]:
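class ResidualBlock(torch.nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride, padding, downsample=False):
        """
        Two conv -> batchnorm layers with a skip connection around them.

        Set `downsample=True` when the block reduces the image size (stride != 1) or changes
        the number of channels, so that the shortcut is projected to the output shape.
        """
        super().__init__()
        # first conv may change the number of channels and the spatial size
        self.conv1 = torch.nn.Conv2d(in_channels, out_channels, kernel_size=kernel_size, stride=stride, padding=padding)
        self.batchnorm1 = torch.nn.BatchNorm2d(num_features=out_channels)
        # second conv keeps the shape produced by the first one
        self.conv2 = torch.nn.Conv2d(out_channels, out_channels, kernel_size=kernel_size, stride=1, padding=padding)
        self.batchnorm2 = torch.nn.BatchNorm2d(num_features=out_channels)
        self.relu = torch.nn.ReLU()

        # dotted-line shortcut: 1x1 projection (option B in the paper excerpt)
        self.downsample = None
        if downsample:
            self.downsample = torch.nn.Sequential(
                torch.nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride),
                torch.nn.BatchNorm2d(num_features=out_channels),
            )

    def forward(self, input_x):
        # identity shortcut unless the shapes need to be matched
        residual = input_x if self.downsample is None else self.downsample(input_x)

        x = self.relu(self.batchnorm1(self.conv1(input_x)))  # purple rectangle from above diagram
        x = self.batchnorm2(self.conv2(x))
        x = x + residual

        return self.relu(x)

In [95]:
# padding=1 keeps H and W unchanged for a 3x3 kernel, so the identity shortcut can be added
residual_block = ResidualBlock(in_channels=3, out_channels=3, kernel_size=3, stride=1, padding=1)
residual_block(train_x).shape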

### Resnet 
* Here we'll use the ResidualBlock prepared above to build our ResNet
* We'll start with [see diagram above]
    * A simple conv layer
    * followed by a max pool
    * followed by a series of residual blocks
    * followed by the FC layer

**Note**: We won't be constructing the original 152-layer ResNet. That's too big to train on a PC. Instead, we'll try to emulate the 34-layer network.
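
As a quick sanity check on the "34-layer" name, the weighted layers can be counted. This assumes the stage sizes of 3, 4, 6 and 3 residual blocks read off the diagram, two conv layers per block, and that the projection shortcuts are not counted.

```python
blocks_per_stage = [3, 4, 6, 3]  # purple, green, red and blue stages in the diagram
convs_per_block = 2              # each residual block stacks two 3x3 convs

# initial conv + all convs inside the residual blocks + final FC layer
total_layers = 1 + convs_per_block * sum(blocks_per_stage) + 1
print(total_layers)  # 34
```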

<img src="ResnetFullDiagram.png" width="400" height="400">

In [None]:
class Resnet(torch.nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()

        # initial block: "7x7 conv, 64, /2" from the diagram
        self.layer1 = torch.nn.Sequential(
            torch.nn.Conv2d(in_channels=3, out_channels=64, kernel_size=7, stride=2, padding=3),
            torch.nn.BatchNorm2d(num_features=64),
            torch.nn.ReLU(),
        )

        self.maxpool = torch.nn.MaxPool2d(kernel_size=2, stride=2)

        # -------- Residual blocks -------------
        # Purple block of 3 skip connection blocks
        self.block1 = self._make_block(num_layers_in_block=3, in_channels=64, out_channels=64, stride=1)

        # Green block of 1 downsample (dotted line) and 3 normal
        self.block2 = self._make_block(num_layers_in_block=4, in_channels=64, out_channels=128, stride=2)

        # Red block of 1 downsample and 5 normal
        self.block3 = self._make_block(num_layers_in_block=6, in_channels=128, out_channels=256, stride=2)

        # Blue block of 1 downsample and 2 normal
        self.block4 = self._make_block(num_layers_in_block=3, in_channels=256, out_channels=512, stride=2)

        # final layers: global average pool, then one FC layer per class score
        self.avg_pool = torch.nn.AdaptiveAvgPool2d(output_size=1)
        self.fc = torch.nn.Linear(in_features=512, out_features=num_classes)

    def _make_block(self, num_layers_in_block, in_channels, out_channels, stride, kernel_size=3, padding=1):
        """
        Helper function to create a stage of residual blocks on demand
        """
        # the first block connects via the dotted line whenever the shapes change
        downsample = stride != 1 or in_channels != out_channels
        layers = [
            ResidualBlock(in_channels, out_channels, kernel_size=kernel_size, stride=stride, padding=padding, downsample=downsample)
        ]
        # the remaining blocks keep the shape and use identity shortcuts
        for _ in range(1, num_layers_in_block):
            layers.append(ResidualBlock(out_channels, out_channels, kernel_size=kernel_size, stride=1, padding=padding))
        return torch.nn.Sequential(*layers)

    def forward(self, x):
        x = self.maxpool(self.layer1(x))
        x = self.block4(self.block3(self.block2(self.block1(x))))
        x = torch.flatten(self.avg_pool(x), start_dim=1)  # (B, 512, 1, 1) -> (B, 512)
        return self.fc(x)

 

## Read More
* What is a bottleneck
* Advances in ResNets over the years
* Uses of ResNets over the years, for example Transformers.
* Significance of 1x1 convolutions for dimensionality reduction across the channel axis.

In [51]:
import torch

B = 2
C = 3
H = 10
W = 10

a = torch.randn(B, C, H, W)

# `a` is already a tensor, so convert it instead of re-wrapping it with torch.tensor()
input_t = a.to(torch.float32)
input_t.shape


torch.Size([2, 3, 10, 10])

* As you can see below, the entire volume of shape (C, K, K) gets compressed into one scalar value. This is done in two steps
    * element-wise multiplication of the input patch with the kernel (the core of convolution/cross-correlation)
    * squash the volume into a scalar by summing all the values


In [52]:
conv = torch.nn.Conv2d(in_channels=C, out_channels=8, stride=1, kernel_size=5, bias=False)
kernel_weights = conv.weight[0]  # first of the 8 filters, shape (3, 5, 5)

input_to_convolve = a[0][:, :5, :5]
input_to_convolve.shape, kernel_weights.shape

corr = torch.multiply(input_to_convolve, kernel_weights)
torch.sum(corr).item()

0.7579530477523804

In [53]:
output_t = conv(input_t)
first_channel_h_w = output_t[0][0]
first_channel_h_w[0][0].item()

0.7579531073570251
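
The same view of convolution helps with the 1x1 item under Read More. With `kernel_size=1` the volume being summed is just the channel vector at one pixel, so the layer acts as a per-pixel linear map across channels. A minimal sketch, with the channel counts (8 in, 4 out) picked arbitrarily for illustration:

```python
import torch

torch.manual_seed(0)
x = torch.randn(2, 8, 10, 10)  # (B, C, H, W)

# 1x1 conv that reduces 8 channels to 4 while leaving H and W untouched
conv1x1 = torch.nn.Conv2d(in_channels=8, out_channels=4, kernel_size=1, bias=False)
out = conv1x1(x)

# the same result as a plain matrix multiply over the channel axis at every pixel
weight = conv1x1.weight[:, :, 0, 0]  # (out_channels, in_channels)
manual = torch.einsum("oc,bchw->bohw", weight, x)

print(out.shape, torch.allclose(out, manual, atol=1e-5))
```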