
CNN-ResNet

๐Ÿ•ต๐Ÿป Vision : Model 5: ResNet : Image Classification

Paper : "Deep Residual Learning for Image Recognition" | Talk : CVPR 2016

2015 ILSVRC (ImageNet Large-Scale Visual Recognition Challenge) Winner in all three categories - classification, localization & detection.

ILSVRC is a benchmark competition where teams across the world compete to classify, localize, and detect objects in images from 1000 categories, taken from the ImageNet dataset. The ImageNet dataset holds 15M images in 22K categories, but for this contest 1.2M images in 1K categories were chosen.

The ILSVRC 2015 classification challenge involves classifying an image into one of 1000 leaf-node categories in the ImageNet hierarchy. There are about 1.2 million images for training, 50,000 for validation and 100,000 images for testing. Each image is associated with one ground-truth category, and performance is measured based on the highest-scoring classifier predictions. Two numbers are usually reported: the top-1 accuracy rate, which compares the ground truth against the first predicted class, and the top-5 error rate, which compares the ground truth against the first 5 predicted classes: an image is deemed correctly classified if the ground truth is among the top 5, regardless of its rank among them. The challenge uses the top-5 error rate for ranking purposes. Team "MSRA" achieved a top-5 test error rate of 3.6%, which was remarkably good. Check the ILSVRC 2015 results.
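As a minimal sketch of how these metrics are computed (not code from this repository), the snippet below evaluates top-1 and top-5 error from an array of raw class scores; the `scores` and `labels` arrays are illustrative placeholders:

```python
# Sketch: top-k error rate from class scores, assuming `scores` has shape
# (num_images, num_classes) and `labels` holds ground-truth class indices.
import numpy as np

def topk_error(scores: np.ndarray, labels: np.ndarray, k: int = 5) -> float:
    """Fraction of images whose ground-truth class is NOT among the top-k predictions."""
    topk = np.argsort(scores, axis=1)[:, -k:]          # indices of the k highest scores
    hit = (topk == labels[:, None]).any(axis=1)        # ground truth anywhere in top-k?
    return 1.0 - hit.mean()

# Toy usage with random scores for 4 images and 1000 classes.
rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 1000))
labels = np.array([3, 10, 999, 42])
print("top-1 error:", topk_error(scores, labels, k=1))
print("top-5 error:", topk_error(scores, labels, k=5))
```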

This paper is important because it was the first to bring the computer-vision error rate (3.6% achieved) below the human-level error of 5-10%. In other words, computers became less error-prone than humans at classifying objects in an image. Several experiments were done to measure the human error rate; one of the most famous is covered in Andrej Karpathy's blog post "Experiences with competing against ConvNets on the ImageNet challenge".

Overview

ResNet is a Deep Convolutional Neural Network proposed by Kaiming He, Xiangyu Zhang, Shaoqing Ren and Jian Sun from Microsoft Research Asia.

Architecture

As we have seen so far, increasing the depth should increase the accuracy of the network, as long as over-fitting is taken care of. But the problem with increased depth is that the signal required to change the weights, which arises at the end of the network by comparing the prediction with the ground truth, becomes very small by the time it reaches the earlier layers. As a result, the earlier layers learn almost nothing; this is the vanishing gradient problem. The second problem with training deeper networks is performing optimization over a huge parameter space: naively adding layers leads to higher training error. This is called the degradation problem. Residual networks make such deep networks trainable by constructing the network out of modules called residual blocks, as shown in the figure.
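A toy numerical illustration of this intuition (my own sketch, not from the paper): back-propagating through many plain layers multiplies many per-layer derivatives, so if each factor is below 1 the product collapses, while a skip connection contributes a derivative of 1 + F'(x) per layer, which keeps the signal from vanishing:

```python
# Toy sketch of the vanishing-gradient intuition; f_prime is an assumed
# per-layer derivative magnitude of the residual branch F.
depth = 50
f_prime = 0.5

plain = f_prime ** depth            # plain chain: product of small factors -> ~1e-15
with_skip = (1.0 + f_prime) ** depth  # residual chain: d(x + F(x))/dx = 1 + F'(x) per layer

print(f"plain network gradient factor:    {plain:.3e}")     # vanishes
print(f"residual network gradient factor: {with_skip:.3e}") # does not vanish
```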

We can see that the ResNet architecture is made up of repeated skip-connection blocks, each containing two convolutional layers. This block is called the "residual block". A general notion of a residual block can be visualized as,

Imagine a network A which produces x amount of training error. Construct a network B by adding a few layers on top of A, and set the parameter values in those layers so that they do nothing to the outputs of A. Let's call the additional layers C. This would mean the same x amount of training error for the new network, so while training network B the training error should not rise above the training error of A. But in practice it does, and the only explanation is that learning the identity mapping (doing nothing to the inputs and just copying them through) with the added layers C is not a trivial problem, and the solver fails to achieve it. To solve this, the module shown above creates a direct path between the input and output of the module, implying an identity mapping, so the added layers C only need to learn features on top of the already available input. Since C is learning only the residual, the whole module is called a residual module.

The idea behind a residual block is that your input x goes through a conv-relu-conv series. This gives you some F(x). That result is then added to the original input x. Let's call that H(x) = F(x) + x. In traditional CNNs, your H(x) would just be equal to F(x), right? So instead of computing that transformation straight from x to F(x), we're computing the term F(x) that has to be added to the input x. Basically, the residual block shown above is computing a "delta", a slight change to the original input x, to get a slightly altered representation (in a traditional CNN we go from x to F(x), a completely new representation that keeps no information about the original x). The authors believe that "it is easier to optimize the residual mapping than to optimize the original, unreferenced mapping".
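A minimal sketch of such a basic residual block, written with the Keras functional API (an assumption on my part; the repository's own training code may differ). F(x) is a conv-BN-ReLU-conv-BN stack and the block output is ReLU(F(x) + x):

```python
# Basic residual block sketch (Keras functional API).
# Assumes the input already has `filters` channels so the identity shortcut
# can be added without a projection.
from tensorflow.keras import layers

def basic_residual_block(x, filters: int):
    shortcut = x                                          # the "x" in H(x) = F(x) + x
    y = layers.Conv2D(filters, 3, padding="same", use_bias=False)(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same", use_bias=False)(y)
    y = layers.BatchNormalization()(y)                    # this stack is F(x), the residual
    y = layers.Add()([y, shortcut])                       # H(x) = F(x) + x
    return layers.ReLU()(y)
```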

For deeper ResNets (50+ layers), 1x1 convolutions are introduced to first reduce and then restore the channel dimension around the 3x3 convolution (a "bottleneck"), keeping the computation manageable, similar to the idea used in GoogLeNet; see the sketch below.
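The following is a sketch of such a bottleneck block (again Keras functional API, as an illustrative assumption rather than this repository's exact code): a 1x1 conv shrinks the channel count, a 3x3 conv operates on the thinner volume, and a final 1x1 conv expands it back before the addition.

```python
# Bottleneck residual block sketch used in the deeper ResNet variants.
# Assumes the input already has filters * expansion channels so the identity
# shortcut can be added directly.
from tensorflow.keras import layers

def bottleneck_block(x, filters: int, expansion: int = 4):
    shortcut = x
    y = layers.Conv2D(filters, 1, use_bias=False)(x)                  # 1x1: reduce channels
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same", use_bias=False)(y)  # 3x3 on the thin volume
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters * expansion, 1, use_bias=False)(y)      # 1x1: restore channels
    y = layers.BatchNormalization()(y)
    y = layers.Add()([y, shortcut])                                   # identity shortcut
    return layers.ReLU()(y)
```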

Andrew Ng has explained ResNets in his videos Residual Networks (ResNets) and Why Residual Network Works Well?

model.summary()

Model Visualizations 🚴🏼‍♀️ ResNet-50 🚴🏼‍♀️ ResNet-101 🚴🏼‍♀️ ResNet-152

Important Points

1. After only the first two layers, the spatial size is compressed from an input volume of 224x224 to a 56x56 volume.
2. The group tried a 1202-layer network but got lower test accuracy, presumably due to overfitting.
3. Trained on an 8-GPU machine for two to three weeks.
4. Batch Normalization after every CONV layer.
5. Xavier/2 initialization from He et al.
6. SGD + Momentum (0.9)
7. Learning rate 0.1, divided by 10 when the validation error plateaus.
8. Mini-batch size of 256.
9. Weight decay of 1e-5.
10. No dropout used. A sketch of this training setup follows below.
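As a hedged sketch of how these points map onto Keras (my own translation of the bullet list, not code from this repository; the model and data pipeline are stand-ins):

```python
# Training-configuration sketch for the "Important Points" above.
from tensorflow import keras

# Hypothetical stand-in network; the repository's own model would go here.
model = keras.applications.ResNet50(weights=None, classes=1000)

optimizer = keras.optimizers.SGD(learning_rate=0.1, momentum=0.9)   # points 6-7
model.compile(
    optimizer=optimizer,
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy", keras.metrics.SparseTopKCategoricalAccuracy(k=5)],
)

callbacks = [
    # Divide the learning rate by 10 when validation error plateaus (point 7).
    keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.1, patience=5),
]

# Training call (ImageNet data pipeline omitted): mini-batch size 256 (point 8),
# no dropout (point 10). He / "Xavier/2" initialization (point 5) and weight
# decay (point 9) would be set on the individual Conv2D layers when building
# a model by hand, e.g. kernel_initializer="he_normal" and keras.regularizers.l2.
# model.fit(train_ds, validation_data=val_ds, callbacks=callbacks, epochs=...)
```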

Practical

Will Update : Training...

Versions

1. Layers - 34 / 50 / 101 / 152 [ https://arxiv.org/abs/1512.03385 ]
2. Identity Mappings in Deep Residual Networks [ https://arxiv.org/abs/1603.05027 ]
3. Deep Residual Network [ https://arxiv.org/abs/1512.03385 ]
4. Deep Networks with Stochastic Depth [ https://arxiv.org/abs/1603.09382 ]
5. Wide Residual Network [ https://arxiv.org/abs/1605.07146 ]
6. ResNeXt [ https://arxiv.org/abs/1611.05431 ]

The different versions of residual blocks can be visualized as,

Reference

  1. Paper Summary

  2. Overview of ResNet

  3. Deep Residual Learning

  4. CS231n : Lecture 9 | CNN Architectures - ResNet

Other related papers:

  1. ResNet v2 - 50, 101, 152
  2. Identity Mappings in Deep Residual Networks
  3. Resnet in Resnet: Generalizing Residual Architectures
  4. Residual Networks of Residual Networks: Multilevel Residual Networks
