## VGGNet

### About VGGNet...
- Published as a conference paper at ICLR 2015 by the Visual Geometry Group (VGG) at the University of Oxford
- The original paper **Very Deep Convolutional Networks for Large-Scale Image Recognition** can be found at <https://arxiv.org/abs/1409.1556>
- Secured the first and second place in the localization and classification tracks (respectively) of ILSVRC 2014
- Proposed CNN configurations at 4 depths: 11, 13, 16 and 19 weight layers, counting convolutional and fully connected layers (Yes, 19 layers counted as very deep back in 2014!)

### Importing the necessary libraries

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F

### Complete architecture...

![](https://cdn-5f733ed3c1ac190fbc56ef88.closte.com/wp-content/uploads/2017/03/VGGNet.png "VGG Architecture")

### Defining architecture skeletons...

- All the convolution layers use 3x3 filters with padding=1 and stride=1, thereby maintaining the spatial resolution
- The integers represent the number of filters to be used in a convolutional layer
- The letter 'M' represents a max-pooling layer with kernel_size=2 and stride=2 (i.e. it halves the spatial resolution along each dimension)
- Designed for RGB images of size 224x224 (the usual ImageNet crop size) and 1000-class classification
- Batch normalization was not part of the original paper, but we add a BatchNorm2d layer after every convolution layer. As in the paper, we use ReLU activations throughout and Dropout layers in the FC portion of the network
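
The shape bookkeeping implied by the points above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes the usual output-size formula for convolution and pooling layers, floor((n + 2p - k) / s) + 1; the helper name `out_size` is introduced here for illustration and is not part of PyTorch.

```python
def out_size(n, kernel, stride, padding=0):
    """Spatial size after a conv/pool layer: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1

size = 224

# A 3x3 convolution with stride 1 and padding 1 leaves the size unchanged.
assert out_size(size, kernel=3, stride=1, padding=1) == size

# Each of the five 'M' entries (2x2 max-pool, stride 2) halves the size.
sizes = [size]
for _ in range(5):
    sizes.append(out_size(sizes[-1], kernel=2, stride=2))

print(sizes)                 # [224, 112, 56, 28, 14, 7]
print(512 * sizes[-1] ** 2)  # 25088 features reach the first FC layer
```

This is where the `512*7*7` input size of the first fully connected layer comes from: 512 channels on a 7x7 feature map.
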


In [2]:
VGG_archs = {
    'VGG11': [64, 'M',  128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'VGG13': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'VGG16': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
    'VGG19': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M']
}
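
As a quick consistency check, the depth in each variant's name can be recovered from its list: every integer is one convolutional layer, and the classifier contributes three fully connected layers. A minimal sketch follows; the dictionary is repeated so the snippet runs on its own, and `weight_layers` is a helper name introduced here for illustration.

```python
VGG_archs = {
    'VGG11': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'VGG13': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'VGG16': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
    'VGG19': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'],
}

def weight_layers(cfg, num_fc=3):
    """Count convolutional entries (integers) plus the FC layers."""
    return sum(isinstance(layer, int) for layer in cfg) + num_fc

depths = {name: weight_layers(cfg) for name, cfg in VGG_archs.items()}
pools = {name: cfg.count('M') for name, cfg in VGG_archs.items()}

print(depths)  # {'VGG11': 11, 'VGG13': 13, 'VGG16': 16, 'VGG19': 19}
print(pools)   # {'VGG11': 5, 'VGG13': 5, 'VGG16': 5, 'VGG19': 5}
```

Every variant contains exactly five max-pooling stages, so all four reduce a 224x224 input to a 7x7 feature map.
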

In [7]:
class VGGNet(nn.Module):
  def __init__(self, in_channels=3, num_classes=1000):
    super(VGGNet, self).__init__()
    self.in_channels = in_channels
    self.conv_layers = self.create_conv_layers(VGG_archs['VGG16'])
    self.fcs = nn.Sequential(
        nn.Linear(512*7*7, 4096),  # 512 channels on a 7x7 map (224 halved by five max-pools)
        nn.ReLU(),
        nn.Dropout(p=0.5),

        nn.Linear(4096, 4096),
        nn.ReLU(),
        nn.Dropout(p=0.5),

        nn.Linear(4096, num_classes)
        )
  
  def forward(self, x):
    x = self.conv_layers(x)
    x = x.reshape(x.shape[0], -1)
    x = self.fcs(x)
    return x
  
  def create_conv_layers(self, architecture):
    layers = []
    in_channels = self.in_channels

    for layer in architecture:
      if isinstance(layer, int):
        out_channels = layer

        layers += [nn.Conv2d(in_channels=in_channels, out_channels=out_channels,
                             kernel_size=(3,3), stride=(1,1), padding=(1,1)),
                   nn.BatchNorm2d(out_channels),
                   nn.ReLU()]
        in_channels = out_channels #updating input channels for the next layer...
      elif layer == 'M':
        layers += [nn.MaxPool2d(kernel_size=(2,2), stride=(2,2))]

    print(len(layers))  # prints the number of modules before the FC layers (Conv2d, BatchNorm2d, ReLU and MaxPool2d)
    print(type(layers[0]), type(layers[1]), type(layers[2]), type(layers[6]))  # should be Conv2d, BatchNorm2d, ReLU and MaxPool2d
    return nn.Sequential(*layers)

### Sanity check...

In [8]:
model = VGGNet(in_channels=3, num_classes=1000)
x = torch.randn(5, 3, 224, 224)
print(model(x).shape)

44
<class 'torch.nn.modules.conv.Conv2d'> <class 'torch.nn.modules.batchnorm.BatchNorm2d'> <class 'torch.nn.modules.activation.ReLU'> <class 'torch.nn.modules.pooling.MaxPool2d'>
torch.Size([5, 1000])
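
Beyond the output shape, the parameter count is another useful check. The sketch below counts learnable parameters by hand for the VGG16 configuration as built above: 3x3 convolutions with bias, a BatchNorm scale and shift per channel, and three FC layers. It assumes a 224x224 input, so the first FC layer sees 512*7*7 features. `count_params` is a hypothetical helper written for this example, not a PyTorch function.

```python
VGG16 = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
         512, 512, 512, 'M', 512, 512, 512, 'M']

def count_params(cfg, in_channels=3, num_classes=1000, batch_norm=True):
    """Count learnable parameters of the conv stack plus the three FC layers."""
    total = 0
    for layer in cfg:
        if layer == 'M':
            continue  # max-pooling has no learnable parameters
        total += 3 * 3 * in_channels * layer + layer  # conv weights + biases
        if batch_norm:
            total += 2 * layer  # BatchNorm scale (weight) and shift (bias)
        in_channels = layer
    # Fully connected layers: weights + biases for each consecutive pair.
    fc_sizes = [in_channels * 7 * 7, 4096, 4096, num_classes]
    for n_in, n_out in zip(fc_sizes, fc_sizes[1:]):
        total += n_in * n_out + n_out
    return total

print(count_params(VGG16))                    # 138365992
print(count_params(VGG16, batch_norm=False))  # 138357544
```

The first number should agree with `sum(p.numel() for p in model.parameters())` for the `model` created above; the second is the count without the BatchNorm layers. Roughly 124 million of these parameters sit in the three FC layers.
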
