# ResNet architecture

Making a network too deep introduces two problems. In forward propagation, the last
few layers of the network have almost no information about what the original image
was. In backpropagation, the first few layers near the input receive hardly any
gradient updates because of vanishing gradients (in other words, the gradients that
reach them are almost zero). To address both problems, residual networks (ResNet) use
a highway-like connection that passes the unmodified output of earlier layers on to
later layers. In theory, even the last layer can then access the information in the
original image through this highway. And because these connections skip layers, the
gradients flow back to the initial layers with little modification.
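
The effect on gradients can be checked with a toy experiment. The sketch below is only an illustration under assumed settings: a stack of 30 fully connected `tanh` layers of width 16 (both numbers are arbitrary choices, not taken from ResNet), run once as a plain stack and once with the input of every layer added back to its output. It compares the norm of the gradient that reaches the input in each case.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Illustrative sizes, chosen only for this toy experiment
depth, width = 30, 16
layers = nn.ModuleList([nn.Linear(width, width) for _ in range(depth)])

def input_grad_norm(use_skip):
    """Norm of the gradient reaching the input after `depth` layers."""
    x = torch.randn(8, width, requires_grad=True)
    h = x
    for layer in layers:
        out = torch.tanh(layer(h))
        # With a skip connection, the layer input is added back to its output
        h = out + h if use_skip else out
    h.sum().backward()
    return x.grad.norm().item()

plain_norm = input_grad_norm(use_skip=False)
skip_norm = input_grad_norm(use_skip=True)
print(f"plain: {plain_norm:.2e}  with skip connections: {skip_norm:.2e}")
```

With the skip connections in place, the gradient at the input should be orders of magnitude larger than in the plain stack, where it shrinks at every layer.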

The term residual refers to the additional information that a block is expected to
learn on top of what it receives from the previous layer, and that needs to be passed
on to the next layer.

A typical residual block appears as follows:

![imgs](./imgs/trans1.png)

So far, we have been interested in extracting the value F(x), where x is the value
coming from the previous layer. In a residual network, as the figure shows, we not
only compute F(x) by passing x through the weight layers, but also add the original
value x to it.
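
In symbols, this can be written as follows. The name $H(x)$ for the mapping the block should ultimately represent is notation assumed here for illustration; it does not appear in the text above.

$$
H(x) = F(x) + x \quad\Longrightarrow\quad F(x) = H(x) - x
$$

So the weight layers only have to model the difference between the desired output and the input, which is the residual. Differentiating the block output $y = F(x) + x$ with respect to its input gives

$$
\frac{\partial y}{\partial x} = \frac{\partial F(x)}{\partial x} + I
$$

where $I$ is the identity. The identity term is why gradients can reach earlier layers even when $\partial F / \partial x$ is small.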

So far, we have been using standard layers that perform either a linear or a
convolutional transformation F(x), followed by some non-linear activation. Both of
these operations in some sense destroy the input information. For the first time, we
are seeing a layer that not only transforms the input, but also preserves it, by adding
the input directly to the transformation, that is, F(x) + x. This way, in certain
scenarios, the layer has very little burden in remembering what the input is, and can
focus on learning the correct transformation for the task.
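
A small check makes the "preserves the input" point concrete. The sketch below assumes an extreme case chosen for illustration: a convolution whose weights and bias are set to zero, so that F(x) is zero everywhere. The tensor sizes are arbitrary. A plain layer then loses the input entirely, while F(x) + x returns it unchanged.

```python
import torch
from torch import nn

# A convolution that has "learned nothing": all weights and biases are zero
conv = nn.Conv2d(3, 3, kernel_size=3, padding=1)
nn.init.zeros_(conv.weight)
nn.init.zeros_(conv.bias)

x = torch.randn(1, 3, 8, 8)
with torch.no_grad():
    plain_out = torch.relu(conv(x))         # F(x): the input is lost
    residual_out = torch.relu(conv(x)) + x  # F(x) + x: the input passes through

print(plain_out.abs().sum().item())
print(torch.equal(residual_out, x))
```

In other words, a residual block can fall back to the identity when the transformation contributes nothing, whereas a plain layer has to learn to reproduce its input.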

```python
import torch
from torch import nn
```

```python
class ResLayer(nn.Module):
    def __init__(self, ni, no, kernel_size, stride=1):
        super().__init__()
        # Padding that preserves the spatial size for odd kernel sizes at stride 1
        padding = kernel_size // 2
        self.conv = nn.Sequential(
            nn.Conv2d(ni, no, kernel_size, stride, padding=padding),
            nn.ReLU()
        )

    def forward(self, x):
        # Skip connection: the shapes only match when ni == no and stride == 1
        return self.conv(x) + x
```
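
The addition in `forward` only works when the convolution output has the same shape as `x`. The sketch below checks when that holds, assuming odd kernel sizes and a padding of `kernel_size // 2` (one choice that preserves the spatial size at stride 1; for a kernel size of 3 it coincides with `kernel_size - 2`). The channel count and image size are arbitrary.

```python
import torch
from torch import nn

x = torch.randn(1, 4, 32, 32)

# Stride 1 with padding k // 2: the output shape matches the input for odd k
shapes = {}
for k in (1, 3, 5, 7):
    conv = nn.Conv2d(4, 4, kernel_size=k, stride=1, padding=k // 2)
    shapes[k] = tuple(conv(x).shape)

# Stride 2 halves the spatial size, so `conv(x) + x` would raise a shape error
strided = nn.Conv2d(4, 4, kernel_size=3, stride=2, padding=1)
strided_shape = tuple(strided(x).shape)

print(shapes)
print(strided_shape)
```

The same constraint applies to channels: the skip connection as written needs the number of output channels to equal the number of input channels.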

