
CNN-ResNet

๐Ÿ•ต๐Ÿป Vision : Model 5: ResNet : Image Classification

Paper : "Deep Residual Learning for Image Recognition" | Talk : CVPR 2016

2015 ILSVRC (ImageNet Large-Scale Visual Recognition Challenge) Winner in all three categories - classification, localization & detection.

ILSVRC is a benchmark competition where teams across the world compete to classify, localize, and detect objects in images from 1000 categories, taken from the ImageNet dataset. The ImageNet dataset holds 15M images in 22K categories, but for this contest 1.2M images in 1K categories were chosen.

The ILSVRC 2015 classification challenge involves classifying an image into one of 1000 leaf-node categories in the ImageNet hierarchy. There are about 1.2 million images for training, 50,000 for validation and 100,000 images for testing. Each image is associated with one ground-truth category, and performance is measured based on the highest-scoring classifier predictions. Two numbers are usually reported: the top-1 accuracy rate, which compares the ground truth against the first predicted class, and the top-5 error rate, which compares the ground truth against the first 5 predicted classes: an image is deemed correctly classified if the ground truth is among the top 5, regardless of its rank among them. The challenge uses the top-5 error rate for ranking purposes. Team "MSRA" achieved a top-5 test error rate of 3.6%, which was remarkably good. Check the ILSVRC 2015 results.
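As a minimal sketch of how these metrics are computed (not code from this repository), the snippet below evaluates top-1 and top-5 error from an array of raw class scores; the `scores` and `labels` arrays are illustrative placeholders:

```python
# Sketch: top-k error rate from class scores, assuming `scores` has shape
# (num_images, num_classes) and `labels` holds ground-truth class indices.
import numpy as np

def topk_error(scores: np.ndarray, labels: np.ndarray, k: int = 5) -> float:
    """Fraction of images whose ground-truth class is NOT among the top-k predictions."""
    topk = np.argsort(scores, axis=1)[:, -k:]          # indices of the k highest scores
    hit = (topk == labels[:, None]).any(axis=1)        # ground truth anywhere in top-k?
    return 1.0 - hit.mean()

# Toy usage with random scores for 4 images and 1000 classes.
rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 1000))
labels = np.array([3, 10, 999, 42])
print("top-1 error:", topk_error(scores, labels, k=1))
print("top-5 error:", topk_error(scores, labels, k=5))
```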

This paper is important because it was the first to bring the computer-vision error rate (3.6% achieved) below the human-level error of 5-10%. In other words, computers became less error-prone than humans at classifying objects in an image. Several experiments were done to measure the human error rate; one of the most famous is covered in Andrej Karpathy's blog post "Experiences with competing against ConvNets on the ImageNet challenge".

Overview

ResNet is a Deep Convolutional Neural Network proposed by Kaiming He, Xiangyu Zhang, Shaoqing Ren and Jian Sun from Microsoft Research Asia.

Architecture

As we have seen so far, increasing the depth should increase the accuracy of the network, as long as over-fitting is taken care of. But the problem with increased depth is that the signal required to change the weights, which arises at the end of the network by comparing the prediction with the ground truth, becomes very small by the time it reaches the earlier layers. As a result, the earlier layers learn almost nothing; this is the vanishing gradient problem. The second problem with training deeper networks is performing optimization over a huge parameter space: naively adding layers leads to higher training error. This is called the degradation problem. Residual networks make such deep networks trainable by constructing the network out of modules called residual blocks, as shown in the figure.
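A toy numerical illustration of this intuition (my own sketch, not from the paper): back-propagating through many plain layers multiplies many per-layer derivatives, so if each factor is below 1 the product collapses, while a skip connection contributes a derivative of 1 + F'(x) per layer, which keeps the signal from vanishing:

```python
# Toy sketch of the vanishing-gradient intuition; f_prime is an assumed
# per-layer derivative magnitude of the residual branch F.
depth = 50
f_prime = 0.5

plain = f_prime ** depth            # plain chain: product of small factors -> ~1e-15
with_skip = (1.0 + f_prime) ** depth  # residual chain: d(x + F(x))/dx = 1 + F'(x) per layer

print(f"plain network gradient factor:    {plain:.3e}")     # vanishes
print(f"residual network gradient factor: {with_skip:.3e}") # does not vanish
```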

We can see that the ResNet architecture is made up of repeated skip-connection blocks, each containing two convolutional layers. This block is called the "residual block". A general notion of a residual block can be visualized as,

Imagine a network A which produces x amount of training error. Construct a network B by adding a few layers on top of A, and set the parameter values in those layers so that they do nothing to the outputs of A. Let's call the additional layers C. This would mean the same x amount of training error for the new network, so while training network B the training error should not rise above the training error of A. But in practice it does, and the only explanation is that learning the identity mapping (doing nothing to the inputs and just copying them through) with the added layers C is not a trivial problem, and the solver fails to achieve it. To solve this, the module shown above creates a direct path between the input and output of the module, implying an identity mapping, so the added layers C only need to learn features on top of the already available input. Since C is learning only the residual, the whole module is called a residual module.

The idea behind a residual block is that your input x goes through a conv-relu-conv series. This gives you some F(x). That result is then added to the original input x. Let's call that H(x) = F(x) + x. In traditional CNNs, your H(x) would just be equal to F(x), right? So instead of computing that transformation straight from x to F(x), we're computing the term F(x) that has to be added to the input x. Basically, the residual block shown above is computing a "delta", a slight change to the original input x, to get a slightly altered representation (in a traditional CNN we go from x to F(x), a completely new representation that keeps no information about the original x). The authors believe that "it is easier to optimize the residual mapping than to optimize the original, unreferenced mapping".
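A minimal sketch of such a basic residual block, written with the Keras functional API (an assumption on my part; the repository's own training code may differ). F(x) is a conv-BN-ReLU-conv-BN stack and the block output is ReLU(F(x) + x):

```python
# Basic residual block sketch (Keras functional API).
# Assumes the input already has `filters` channels so the identity shortcut
# can be added without a projection.
from tensorflow.keras import layers

def basic_residual_block(x, filters: int):
    shortcut = x                                          # the "x" in H(x) = F(x) + x
    y = layers.Conv2D(filters, 3, padding="same", use_bias=False)(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same", use_bias=False)(y)
    y = layers.BatchNormalization()(y)                    # this stack is F(x), the residual
    y = layers.Add()([y, shortcut])                       # H(x) = F(x) + x
    return layers.ReLU()(y)
```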

For deeper ResNets (50+ layers), 1x1 convolutions are introduced to first reduce and then restore the channel dimension around the 3x3 convolution (a "bottleneck"), keeping the computation manageable, similar to the idea used in GoogLeNet; see the sketch below.
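The following is a sketch of such a bottleneck block (again Keras functional API, as an illustrative assumption rather than this repository's exact code): a 1x1 conv shrinks the channel count, a 3x3 conv operates on the thinner volume, and a final 1x1 conv expands it back before the addition.

```python
# Bottleneck residual block sketch used in the deeper ResNet variants.
# Assumes the input already has filters * expansion channels so the identity
# shortcut can be added directly.
from tensorflow.keras import layers

def bottleneck_block(x, filters: int, expansion: int = 4):
    shortcut = x
    y = layers.Conv2D(filters, 1, use_bias=False)(x)                  # 1x1: reduce channels
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same", use_bias=False)(y)  # 3x3 on the thin volume
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters * expansion, 1, use_bias=False)(y)      # 1x1: restore channels
    y = layers.BatchNormalization()(y)
    y = layers.Add()([y, shortcut])                                   # identity shortcut
    return layers.ReLU()(y)
```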

Andrew Ng has explained ResNets in his videos Residual Networks (ResNets) and Why Residual Network Works Well?

model.summary()

Model Visualizations 🚴🏼‍♀️ ResNet-50 🚴🏼‍♀️ ResNet-101 🚴🏼‍♀️ ResNet-152

Important Points

1. After only the first two layers, the spatial size is compressed from an input volume of 224x224 to a 56x56 volume.
2. The group tried a 1202-layer network but got lower test accuracy, presumably due to overfitting.
3. Trained on an 8-GPU machine for two to three weeks.
4. Batch Normalization after every CONV layer.
5. Xavier/2 initialization from He et al.
6. SGD + Momentum (0.9)
7. Learning rate 0.1, divided by 10 when the validation error plateaus.
8. Mini-batch size of 256.
9. Weight decay of 1e-5.
10. No dropout used. A sketch of this training setup follows below.
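As a hedged sketch of how these points map onto Keras (my own translation of the bullet list, not code from this repository; the model and data pipeline are stand-ins):

```python
# Training-configuration sketch for the "Important Points" above.
from tensorflow import keras

# Hypothetical stand-in network; the repository's own model would go here.
model = keras.applications.ResNet50(weights=None, classes=1000)

optimizer = keras.optimizers.SGD(learning_rate=0.1, momentum=0.9)   # points 6-7
model.compile(
    optimizer=optimizer,
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy", keras.metrics.SparseTopKCategoricalAccuracy(k=5)],
)

callbacks = [
    # Divide the learning rate by 10 when validation error plateaus (point 7).
    keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.1, patience=5),
]

# Training call (ImageNet data pipeline omitted): mini-batch size 256 (point 8),
# no dropout (point 10). He / "Xavier/2" initialization (point 5) and weight
# decay (point 9) would be set on the individual Conv2D layers when building
# a model by hand, e.g. kernel_initializer="he_normal" and keras.regularizers.l2.
# model.fit(train_ds, validation_data=val_ds, callbacks=callbacks, epochs=...)
```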

Practical

Will Update : Training...

Versions

1. Layers - 34 / 50 / 101 / 152 [ https://arxiv.org/abs/1512.03385 ]
2. Identity Mappings in Deep Residual Networks [ https://arxiv.org/abs/1603.05027 ]
3. Deep Residual Network [ https://arxiv.org/abs/1512.03385 ]
4. Deep Networks with Stochastic Depth [ https://arxiv.org/abs/1603.09382 ]
5. Wide Residual Network [ https://arxiv.org/abs/1605.07146 ]
6. ResNeXt [ https://arxiv.org/abs/1611.05431 ]

The different versions of residual blocks can be visualized as,

Reference

  1. Paper Summary

  2. Overview of ResNet

  3. Deep Residual Learning

  4. CS231n : Lecture 9 | CNN Architectures - ResNet

Other related papers:

  1. ResNet v2 - 50, 101, 152
  2. Identity Mappings in Deep Residual Networks
  3. Resnet in Resnet: Generalizing Residual Architectures
  4. Residual Networks of Residual Networks: Multilevel Residual Networks
