## ResNet

**Paper:** https://arxiv.org/abs/1512.03385 (Kaiming He is everywhere!)

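The big idea in ResNet was the residual block. In a residual block we pass the input x through a few layers, which compute F(x), and at the end we add the input back in through an identity shortcut, so the block's output H(x) is:

H(x) = F(x) + x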

This helps with the vanishing gradient problem, because the identity path gives gradients a direct route back to earlier layers.
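To make the gradient argument concrete, here is a minimal sketch using a single scalar "layer". The weight `w` is a hypothetical stand-in for a layer whose gradient is tiny; it is not part of the ResNet paper or the block defined later in these notes. The sketch only compares the gradient of `w * x` with the gradient of `w * x + x`.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
w = torch.tensor(1e-3)  # hypothetical small weight standing in for a layer

# Plain layer: the gradient with respect to x is just w, which is tiny.
plain = w * x
(grad_plain,) = torch.autograd.grad(plain, x)

# Residual form: the "+ x" term contributes a constant 1 to the gradient.
residual = w * x + x
(grad_residual,) = torch.autograd.grad(residual, x)

print(grad_plain.item(), grad_residual.item())
```

The residual gradient is the plain gradient plus one, so even when the layer itself passes back almost nothing, the skip connection still carries signal to earlier layers.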

In [4]:
import torch 
import torch.nn as nn

In [79]:
x = torch.randn(1,3,28,28)
x.shape

torch.Size([1, 3, 28, 28])

The important thing to note here is that the shape of x after conv1 and conv2 has to match the identity. To ensure this, we use padding in the convolutions to preserve the height and width, and conv2 maps back to the original number of channels. Otherwise the addition would fail because the shapes don't match up.
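A quick way to see why the padding matters is to compare the same 3x3 convolution with and without it. This is a small sketch, not part of the block itself; the names `conv_no_pad` and `conv_pad` are made up for illustration.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 28, 28)

# A 3x3 kernel without padding trims one pixel from each border.
conv_no_pad = nn.Conv2d(3, 3, kernel_size=3, padding=0)
# padding=1 restores the border, so height and width are preserved.
conv_pad = nn.Conv2d(3, 3, kernel_size=3, padding=1)

out_no_pad = conv_no_pad(x)
out_pad = conv_pad(x)
print(out_no_pad.shape)  # torch.Size([1, 3, 26, 26])
print(out_pad.shape)     # torch.Size([1, 3, 28, 28])

# Adding the identity only works when the shapes agree.
try:
    out_no_pad + x
    add_failed = False
except RuntimeError:
    add_failed = True
print(add_failed)
```

Without padding, the 26x26 output cannot be added to the 28x28 input, which is exactly the failure the residual block has to avoid.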

In [84]:
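class ResBlock(nn.Module):
    def __init__(self):
        super().__init__()
        # padding=1 with a 3x3 kernel keeps the 28x28 spatial size
        self.conv1 = nn.Conv2d(3, 9, 3, padding=1)  # 3 -> 9 channels
        self.conv2 = nn.Conv2d(9, 3, 3, padding=1)  # 9 -> 3 channels, matching the input

    def forward(self, x):
        identity = x  # keep the input for the skip connection
        x = torch.relu(self.conv1(x))
        x = self.conv2(x)
        # add the identity before the final activation
        x = torch.relu(x + identity)
        return x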

In [85]:
rb = ResBlock()

In [86]:
y = rb(x)

In [83]:
y.shape

torch.Size([1, 3, 28, 28])