![](./img/deeper.gif)

## LeNet (1998)

##### original paper: http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf

![image.png](attachment:image.png)

#### Keras Implementation 

In [1]:
from tensorflow import keras
from tensorflow.keras.layers import Conv2D, Dense, Flatten, MaxPooling2D

In [8]:
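# LeNet-style network. The original LeNet-5 took 32x32 grayscale inputs and used
# subsampling (average pooling); this version uses RGB inputs and max pooling.
lenet = keras.Sequential()

# First conv stage: 6 feature maps with 5x5 kernels (32x32 -> 28x28), then 2x2 pooling (-> 14x14)
lenet.add(Conv2D(filters=6, kernel_size=(5, 5), strides=(1, 1), activation="tanh", input_shape=(32, 32, 3)))
lenet.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))

# Second conv stage: 16 feature maps (14x14 -> 10x10), then 2x2 pooling (-> 5x5)
lenet.add(Conv2D(filters=16, kernel_size=(5, 5), strides=(1, 1), activation="tanh"))
lenet.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))

# Flatten the 5x5x16 volume into a 400-dimensional vector
lenet.add(Flatten())

# Fully connected classifier ending in a softmax over 10 classes
lenet.add(Dense(units=120, activation="tanh"))
lenet.add(Dense(units=84, activation="tanh"))
lenet.add(Dense(units=10, activation="softmax"))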


In [9]:
lenet.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 28, 28, 6)         456       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 14, 14, 6)         0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 10, 10, 16)        2416      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 16)          0         
_________________________________________________________________
flatten (Flatten)            (None, 400)               0         
_________________________________________________________________
dense (Dense)                (None, 120)               48120     
_________________________________________________________________
dense_1 (Dense)              (None, 84)                1
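
The output shapes and parameter counts in the summary above can be checked by hand. The snippet below is a minimal sketch in plain Python (no TensorFlow needed); the helper names `conv2d_params`, `dense_params`, and `valid_conv_size` are made up for this illustration and are not part of Keras.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter for a standard 2D convolution."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(in_units, out_units):
    """Weights plus one bias per output unit for a fully connected layer."""
    return (in_units + 1) * out_units


def valid_conv_size(size, kernel, stride=1):
    """Output width/height of an unpadded ("valid") convolution or pooling."""
    return (size - kernel) // stride + 1


# Spatial sizes for a 32x32 input, matching the Output Shape column.
after_conv1 = valid_conv_size(32, 5)              # conv2d
after_pool1 = valid_conv_size(after_conv1, 2, 2)  # max_pooling2d
after_conv2 = valid_conv_size(after_pool1, 5)     # conv2d_1
after_pool2 = valid_conv_size(after_conv2, 2, 2)  # max_pooling2d_1

# Parameter counts, matching the Param # column.
conv1 = conv2d_params(5, 5, 3, 6)        # 3 input channels (RGB), 6 filters
conv2 = conv2d_params(5, 5, 6, 16)       # 6 input channels, 16 filters
flat = after_pool2 * after_pool2 * 16    # size of the flattened vector
dense1 = dense_params(flat, 120)         # first fully connected layer

print(after_conv1, after_pool1, after_conv2, after_pool2)
print(conv1, conv2, flat, dense1)
```

Pooling and flatten layers have no trainable weights, which is why they show 0 parameters.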

## AlexNet (2012)
###### Builds on LeNet's convolutional design with a much deeper and wider network.
###### original paper: https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

###### Main Points

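###### Trained the network on ImageNet data. The full ImageNet dataset contains over 15 million annotated images from over 22,000 categories; the ILSVRC subset used for the competition has roughly 1.2 million training images in 1,000 categories.
###### Used ReLU for the nonlinearity functions (found to decrease training time: networks with ReLUs train several times faster than equivalent networks with the conventional tanh units).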
###### Used data augmentation techniques that consisted of image translations, horizontal reflections, and patch extractions.
###### Implemented dropout layers in order to combat the problem of overfitting to the training data.
###### Trained the model using mini-batch stochastic gradient descent (batch size 128), with specific values for momentum (0.9) and weight decay (0.0005).
###### Trained on two GTX 580 GPUs for five to six days.
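
The momentum and weight-decay update from the list above can be sketched in a few lines. The update rule and the default values (momentum 0.9, weight decay 0.0005, initial learning rate 0.01) follow the AlexNet paper; the scalar weight and the toy quadratic loss are assumptions made purely for illustration, and `sgd_step` is a hypothetical helper name.

```python
def sgd_step(w, v, grad, lr=0.01, momentum=0.9, weight_decay=0.0005):
    """One momentum update for a single scalar weight.

    v_new = momentum * v - weight_decay * lr * w - lr * grad
    w_new = w + v_new
    """
    v_new = momentum * v - weight_decay * lr * w - lr * grad
    w_new = w + v_new
    return w_new, v_new


# Toy problem (an assumption, not from the paper): minimize
# L(w) = 0.5 * (w - 3)^2, whose gradient is (w - 3).
w, v = 0.0, 0.0
for _ in range(500):
    w, v = sgd_step(w, v, grad=w - 3.0)

# Weight decay pulls the solution slightly below the unregularized optimum of 3.
print(w)
```

In a real network the same update is applied to every weight, with the gradient averaged over the mini-batch.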

![image.png](attachment:image.png)

## VGG Net (2014)

###### original paper: https://arxiv.org/pdf/1409.1556.pdf

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![](./img/vgg.png)

###### Main Points

###### The use of only 3x3 filters is quite different from AlexNet’s 11x11 filters in the first layer and ZF Net’s 7x7 filters. The authors’ reasoning is that two stacked 3x3 conv layers have an effective receptive field of 5x5, which simulates a larger filter while keeping the benefits of smaller filters. One benefit is fewer parameters; another is that two conv layers allow two ReLU layers instead of one.
###### Three 3x3 conv layers back to back have an effective receptive field of 7x7.
###### As the spatial size of the input volumes at each layer decreases (a result of the pool layers; the padded 3x3 convs preserve spatial size), the depth of the volumes increases due to the increased number of filters as you go down the network.
###### Interesting to notice that the number of filters doubles after each maxpool layer (64, 128, 256, 512) until it reaches 512. This reinforces the idea of shrinking spatial dimensions, but growing depth.
###### Worked well on both image classification and localization tasks. The authors used a form of localization as regression (see page 10 of the paper for all details).
###### Built model with the Caffe toolbox.
###### Used scale jittering as one data augmentation technique during training.
###### Used ReLU layers after each conv layer and trained with mini-batch gradient descent with momentum.
###### Trained on 4 Nvidia Titan Black GPUs for two to three weeks.
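
The receptive-field and parameter-count argument for stacked 3x3 filters can be checked with a little arithmetic. This is a minimal sketch assuming stride-1 convolutions with the same number of input and output channels; the channel count `C = 256` and the helper names are hypothetical, chosen only for illustration.

```python
def stacked_receptive_field(num_layers, kernel=3):
    """Receptive field of num_layers stride-1 convolutions stacked back to back."""
    return 1 + num_layers * (kernel - 1)


def conv_weights(kernel, channels, num_layers=1):
    """Weight count (biases ignored) for num_layers conv layers, each with
    `channels` input channels and `channels` output channels."""
    return num_layers * kernel * kernel * channels * channels


C = 256  # hypothetical channel count, for illustration only

for n, big_kernel in [(2, 5), (3, 7)]:
    rf = stacked_receptive_field(n)
    small = conv_weights(3, C, num_layers=n)
    large = conv_weights(big_kernel, C)
    print(f"{n} stacked 3x3 layers: receptive field {rf}x{rf}, {small} weights; "
          f"one {big_kernel}x{big_kernel} layer: {large} weights")
```

Per pair of channels, two 3x3 layers cost 18 weights against 25 for a single 5x5 layer, and three 3x3 layers cost 27 against 49 for a single 7x7 layer.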

## GoogLeNet or InceptionNet (2015)
##### original paper: https://arxiv.org/pdf/1409.4842.pdf

![image.png](attachment:image.png)

### Inception Module

![image.png](attachment:image.png)
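
An Inception module runs 1x1, 3x3, and 5x5 convolutions and a pooling branch in parallel and concatenates their outputs along the channel axis, with 1x1 convolutions reducing the channel count before the expensive 3x3 and 5x5 filters. The sketch below counts channels and weights for one module, using the filter counts listed for the "inception (3a)" block in the paper's architecture table; biases are ignored, and the "naive" variant without 1x1 reductions is included only for comparison.

```python
def conv_weights(kernel, in_ch, out_ch):
    """Weight count of a kernel x kernel convolution, biases ignored."""
    return kernel * kernel * in_ch * out_ch


IN_CH = 192                 # channels entering the module
B1 = 64                     # 1x1 branch
B3_REDUCE, B3 = 96, 128     # 1x1 reduction, then 3x3 branch
B5_REDUCE, B5 = 16, 32      # 1x1 reduction, then 5x5 branch
POOL_PROJ = 32              # 1x1 projection after the 3x3 max pool

# The branch outputs are concatenated along the channel axis.
out_channels = B1 + B3 + B5 + POOL_PROJ

with_reduction = (
    conv_weights(1, IN_CH, B1)
    + conv_weights(1, IN_CH, B3_REDUCE) + conv_weights(3, B3_REDUCE, B3)
    + conv_weights(1, IN_CH, B5_REDUCE) + conv_weights(5, B5_REDUCE, B5)
    + conv_weights(1, IN_CH, POOL_PROJ)
)

# Same branch widths, but 3x3 and 5x5 applied directly to all 192 input channels.
naive = (
    conv_weights(1, IN_CH, B1)
    + conv_weights(3, IN_CH, B3)
    + conv_weights(5, IN_CH, B5)
)

print(out_channels, with_reduction, naive)
```

Even with the extra 1x1 layers, the reduced module needs fewer than half the weights of the naive one.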

###### Main Points

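###### Used 9 Inception modules in the whole architecture, with about 100 layers in total when every building block is counted (22 layers deep when counting only layers with parameters). Now that is deep…
###### No large fully connected layers! They use an average pool instead, to go from a 7x7x1024 volume to a 1x1x1024 volume, followed by a single linear classification layer. This saves a huge number of parameters.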
###### Uses 12x fewer parameters than AlexNet.
###### During testing, multiple crops of the same image were created, fed into the network, and the softmax probabilities were averaged to give us the final solution.
###### Utilized concepts from R-CNN for their detection model.
###### There are updated versions of the Inception module, such as Inception-v2 and Inception-v3.
###### Trained on “a few high-end GPUs within a week”.
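
The parameter saving from average pooling can be made concrete. Below is a minimal sketch: a global average pool over nested Python lists, followed by a comparison of classifier weight counts with and without pooling. The 1,000-class output and the hypothetical dense layer on the flattened 7x7x1024 volume are assumptions for illustration; the paper does not build that larger layer.

```python
def global_average_pool(volume):
    """Average each channel of an H x W x C volume (nested lists) to one value."""
    height, width, channels = len(volume), len(volume[0]), len(volume[0][0])
    return [
        sum(volume[i][j][c] for i in range(height) for j in range(width))
        / (height * width)
        for c in range(channels)
    ]


# Tiny 2 x 2 x 3 volume to show the operation.
volume = [
    [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]],
    [[5.0, 0.0, 2.0], [7.0, 4.0, 2.0]],
]
pooled = global_average_pool(volume)
print(pooled)

# Classifier weights (biases ignored) for a 1,000-class output.
H, W, C, NUM_CLASSES = 7, 7, 1024, 1000
fc_on_flattened = H * W * C * NUM_CLASSES   # dense layer on the raw 7x7x1024 volume
fc_after_pooling = C * NUM_CLASSES          # dense layer on the pooled 1x1x1024 volume
print(fc_on_flattened, fc_after_pooling)
```

Pooling first shrinks the classifier by a factor of 49 (the 7x7 spatial positions).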


## Microsoft ResNet (2015)
##### original paper: https://arxiv.org/pdf/1512.03385.pdf

![](./img/resnet.png)

### Residual Block

![image.png](attachment:image.png)
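
A residual block computes y = F(x) + x, so the stacked layers only need to learn the residual F(x) rather than the full mapping. The sketch below is a simplified illustration in plain Python: it uses small fully connected layers in place of the convolutions used in the paper, and the helper names are hypothetical.

```python
def relu(vec):
    """Element-wise ReLU."""
    return [max(0.0, x) for x in vec]


def linear(weights, vec):
    """Multiply a square weight matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def residual_block(x, w1, w2):
    """y = relu(F(x) + x), with F(x) = W2 * relu(W1 * x)."""
    fx = linear(w2, relu(linear(w1, x)))
    return relu([f + xi for f, xi in zip(fx, x)])


x = [1.0, 2.0, 0.5]
zeros = [[0.0] * 3 for _ in range(3)]

# With all weights at zero, F(x) = 0 and the block passes x through unchanged
# (for non-negative inputs), which is what makes very deep stacks easy to start from.
y = residual_block(x, zeros, zeros)
print(y)
```

The skip connection adds no parameters; it only adds the input back to the output of the stacked layers.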

###### “Ultra-deep” – Yann LeCun.
###### 152 layers…
###### Interesting note that after only the first 2 layers, the spatial size gets compressed from an input volume of 224x224 to a 56x56 volume.
###### Authors claim that a naïve increase of layers in plain nets results in higher training and test error (Figure 1 in the paper).
###### The group tried a 1202-layer network on CIFAR-10, but got a lower test accuracy than with a 110-layer network, presumably due to overfitting.
###### Trained on an 8 GPU machine for two to three weeks.
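
The drop from 224x224 to 56x56 in the first two layers follows from the standard output-size formula for convolution and pooling. In the sketch below, the kernel sizes and strides (a 7x7 conv with stride 2, then a 3x3 max pool with stride 2) come from the paper; the padding values of 3 and 1 are assumptions that match common implementations, and `conv_output_size` is a hypothetical helper name.

```python
def conv_output_size(size, kernel, stride, padding):
    """Output width/height of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


# 7x7 conv, stride 2 (padding 3 is an assumption)
after_conv1 = conv_output_size(224, kernel=7, stride=2, padding=3)

# 3x3 max pool, stride 2 (padding 1 is an assumption)
after_pool = conv_output_size(after_conv1, kernel=3, stride=2, padding=1)

print(after_conv1, after_pool)
```

Each stride-2 layer halves the spatial size, so two of them reduce it by a factor of four.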

## Comparison on ImageNet

![image.png](attachment:image.png)

### References
###### https://adeshpande3.github.io/adeshpande3.github.io/The-9-Deep-Learning-Papers-You-Need-To-Know-About.html
###### https://www.youtube.com/playlist?list=PLkDaE6sCZn6Gl29AoE31iwdVwSG-KnDzF