# GoogLeNet v1

Paper: [Going Deeper with Convolutions](https://arxiv.org/pdf/1409.4842.pdf)

## Difficulties in Training Deep Neural Networks

*Deeper Networks are More Prone to Overfitting*

- Deeper networks have more parameters
- More parameters mean a greater risk of overfitting, especially when training data is limited
- Collecting more data is not always a practical remedy, as quality training data can be expensive and time-consuming to obtain

*Dramatic Increase in Computational Requirements with Network Size*

- The larger the input and the more channels each layer has, the more computation is required
- The example given in the paper: when two convolutional layers are chained, any uniform increase in the number of their filters results in a quadratic increase in computation.
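
To make the quadratic scaling concrete, the sketch below counts multiply-accumulate operations (MACs) for the second of two chained 3x3 convolutions. The helper names `conv_macs` and `chained_macs`, the 28 x 28 feature map and the filter counts are illustrative assumptions, not figures from the paper, and biases are ignored.

```python
def conv_macs(h, w, in_ch, out_ch, k):
    """MACs of a k x k convolution producing an h x w output map (bias ignored)."""
    return h * w * in_ch * out_ch * k * k


def chained_macs(filters, h=28, w=28, k=3):
    """Cost of the second of two chained conv layers that both use `filters` filters."""
    # The second layer reads `filters` channels and writes `filters` channels,
    # so its cost grows with the square of `filters`.
    return conv_macs(h, w, filters, filters, k)


base = chained_macs(64)
doubled = chained_macs(128)
print(doubled / base)  # 4.0
```

Doubling the filter count of both layers quadruples the cost of the second layer, which is the quadratic growth the paper refers to.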


## Moving to Sparse Architecture

One solution suggested in the paper is to move towards sparse connections within the building blocks of the network. This can help to:
 - Reduce overfitting, as sparse architectures have fewer parameters than dense ones
 - Avoid wasting computation on connections whose weights end up close to zero
 

## The Inception Module

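One way of building such a sparse network is to find units whose outputs represent largely the same features (i.e. have high correlation), group them, and use each group as a unit of the next layer. The Inception module approximates this with dense building blocks: parallel 1x1, 3x3 and 5x5 convolutions and a pooling branch, whose outputs are concatenated along the channel dimension.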

<div>
<img src="./assets/GoogLeNet_InceptionModule.png" width = 800px>
</div>


## 1x1 Convolution as Compression

Drawing inspiration from word embeddings, the Inception module uses 1x1 convolutions (with ReLU activation) as a tensor compressor that learns a lower-dimensional embedding of the previous layer's output.

This embedding reduces the number of input channels seen by the 3x3 and 5x5 convolutions, and with it their computational cost.
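
As a rough illustration of the saving, the sketch below compares the weight count of a direct 5x5 convolution with that of a 1x1 reduction followed by the same 5x5 convolution. The channel sizes (256 in, 24 reduced, 64 out) are borrowed from one of the Inception units configured in this notebook; the helper name `conv_weights` is hypothetical and biases are ignored.

```python
def conv_weights(in_ch, out_ch, k):
    """Number of weights in a k x k convolution (bias ignored)."""
    return in_ch * out_ch * k * k


in_ch, reduce_ch, out_ch = 256, 24, 64

# 5x5 convolution applied directly to all 256 input channels
direct = conv_weights(in_ch, out_ch, 5)

# 1x1 reduction to 24 channels, then the 5x5 convolution
reduced = conv_weights(in_ch, reduce_ch, 1) + conv_weights(reduce_ch, out_ch, 5)

print(direct)                      # 409600
print(reduced)                     # 44544
print(round(direct / reduced, 1))  # 9.2
```

Because both variants produce a feature map of the same spatial size, the same ratio applies to the number of multiply-accumulate operations.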

In [0]:
import torch
import torch.nn as nn

In [0]:
def sameConv2d(in_channel, out_channel, kernel_size, stride=1):
    # padding of kernel_size//2 keeps the spatial size unchanged for odd kernels at stride 1
    return nn.Conv2d(in_channel, out_channel, kernel_size, stride, padding=kernel_size//2)
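
The `padding=kernel_size//2` choice preserves the spatial size only for odd kernel sizes at stride 1. A quick check with the standard convolution output-size formula, assuming a dilation of 1 (the helper `conv_out_size` is a hypothetical name used only for this illustration):

```python
def conv_out_size(n, kernel_size, stride=1):
    """Output length along one axis for padding = kernel_size // 2, dilation 1."""
    padding = kernel_size // 2
    return (n + 2 * padding - kernel_size) // stride + 1


for k in (1, 3, 5, 7):
    print(k, conv_out_size(32, k))  # every odd kernel keeps the size at 32

print(conv_out_size(32, 4))  # 33: an even kernel grows the map by one
```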


In [0]:
class InceptionUnit(torch.nn.Module):
    def __init__(self, inChannel, out1x1, reduce3x3, out3x3, reduce5x5, out5x5, poolProj):
        super().__init__()
        self.feature1_1 = nn.Sequential(
            sameConv2d(inChannel, out1x1, kernel_size = 1),
            nn.ReLU(inplace = True)
        )
        self.feature3_3 = nn.Sequential(
            sameConv2d(inChannel, reduce3x3, kernel_size = 1),
            nn.ReLU(inplace = True),
            sameConv2d(reduce3x3, out3x3, kernel_size = 3),
            nn.ReLU(inplace = True)
        )
        self.feature5_5 = nn.Sequential(
            sameConv2d(inChannel, reduce5x5, kernel_size = 1),
            nn.ReLU(inplace = True),
            sameConv2d(reduce5x5, out5x5, kernel_size=  5),
            nn.ReLU(inplace = True)
        )
        self.parallelPool = nn.Sequential(
            nn.MaxPool2d(kernel_size = 3, stride = 1, padding = 1, ceil_mode=True),
            sameConv2d(inChannel, poolProj, kernel_size = 1),
            nn.ReLU(inplace = True)
        )
    
    def forward(self, input):
        out1 = self.feature1_1(input)
        out2 = self.feature3_3(input)
        out3 = self.feature5_5(input)
        out4 = self.parallelPool(input)

        # concatenate the four branches along the channel dimension
        return torch.cat([out1, out2, out3, out4], dim = 1)

## Training a Small GoogLeNet

As the original GoogLeNet is very deep, with multiple pooling layers to digest the 224 x 224 ImageNet training images, the architecture listed in the paper may not be suitable for the 32 x 32 images in CIFAR-10. For demonstration purposes, a smaller GoogLeNet was built following 2 key principles:

 - As deeper layers tend to capture features that occupy larger areas of the input, the proportion of 3x3 and 5x5 filters should increase as the network grows deeper
 - The same method of flattening with 2D average pooling is used, with the kernel size adjusted to the expected output size of the smaller GoogLeNet
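
The expected output size can be worked out by hand. The sketch below traces a 32 x 32 input through the stem layers used in the `SmallGoogLeNet` definition in this notebook (7x7 stride-2 convolution with padding 3, 3x3 stride-2 max pooling with `ceil_mode=True`), assuming the usual output-size formulas for convolution and ceil-mode pooling. The helper names `conv_out` and `pool_out_ceil` are hypothetical. The Inception units keep the spatial size unchanged, so they are skipped.

```python
import math


def conv_out(n, k, s, p):
    """Convolution output length along one axis (dilation 1)."""
    return (n + 2 * p - k) // s + 1


def pool_out_ceil(n, k, s, p=0):
    """Pooling output length along one axis with ceil_mode=True."""
    out = math.ceil((n + 2 * p - k) / s) + 1
    # the last window must start inside the input or its left padding
    if (out - 1) * s >= n + p:
        out -= 1
    return out


size = 32
size = conv_out(size, k=7, s=2, p=3)  # 7x7 stride-2 convolution
print(size)  # 16
size = pool_out_ceil(size, k=3, s=2)  # 3x3 stride-2 max pooling
print(size)  # 8
size = pool_out_ceil(size, k=8, s=1)  # 8x8 average pooling
print(size)  # 1
```

An 8 x 8 feature map reaches the average pooling layer, so a kernel size of 8 collapses it to a single value per channel.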

In [0]:
class SmallGoogLeNet(torch.nn.Module):
    def __init__(self, nClass):
        super().__init__()
        self.nClass = nClass
        self.conv1 = nn.Sequential(
            sameConv2d(3, 64, kernel_size=7, stride=2),
            nn.ReLU(inplace=True),
            nn.LocalResponseNorm(5),
            nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True)
        )

        self.inception2 = nn.Sequential(
            InceptionUnit(64, 64, 16, 32, 8, 16, 16),
            InceptionUnit(128, 64, 32, 64, 16, 32, 32),
            InceptionUnit(192, 48, 56, 112, 32, 64, 32),
            InceptionUnit(256, 32, 64, 128, 24, 64, 32),
            InceptionUnit(256, 16, 112, 128, 24, 80, 32),
            nn.AvgPool2d(kernel_size=8, stride=1, ceil_mode=True),
        )

        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(0.4),
            nn.Linear(256, self.nClass)
        )
    
    def forward(self, input):
        out = self.conv1(input)
        out = self.inception2(out)
        out = self.classifier(out)

        return out
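
As a sanity check on the configuration above, the sketch below verifies in plain Python that each Inception unit's concatenated output matches the next unit's input channels, and reports the share of 3x3 and 5x5 filters per unit. The helper names `out_channels` and `large_kernel_share` are hypothetical; the tuples are copied from the `SmallGoogLeNet` definition.

```python
# (inChannel, out1x1, reduce3x3, out3x3, reduce5x5, out5x5, poolProj)
units = [
    (64, 64, 16, 32, 8, 16, 16),
    (128, 64, 32, 64, 16, 32, 32),
    (192, 48, 56, 112, 32, 64, 32),
    (256, 32, 64, 128, 24, 64, 32),
    (256, 16, 112, 128, 24, 80, 32),
]


def out_channels(cfg):
    """Channels after concatenating the four branches."""
    _, out1x1, _, out3x3, _, out5x5, pool_proj = cfg
    return out1x1 + out3x3 + out5x5 + pool_proj


def large_kernel_share(cfg):
    """Fraction of output channels produced by the 3x3 and 5x5 branches."""
    _, _, _, out3x3, _, out5x5, _ = cfg
    return (out3x3 + out5x5) / out_channels(cfg)


for cfg in units:
    print(out_channels(cfg), large_kernel_share(cfg))
# 128 0.375
# 192 0.5
# 256 0.6875
# 256 0.75
# 256 0.8125
```

The share of 3x3 and 5x5 filters rises from unit to unit, in line with the first design principle, and the final 256 channels match the input size of the linear classifier.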

In [0]:
from torch.optim import Adam
from torch.nn.init import kaiming_normal_, normal_
from torch.utils.data import DataLoader
from torchvision.transforms import ToTensor

from torchvision.datasets import CIFAR10

from helper import training

In [0]:
import os

# seeding the random number generators
# ensures some form of determinism in the outputs 
seed = 2020
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
os.environ['PYTHONHASHSEED']=str(seed)

In [0]:
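def initWeight(unit):
    # He (Kaiming) initialisation for the weights of every linear and convolutional layer
    if isinstance(unit, (torch.nn.Linear, torch.nn.Conv2d)):
        kaiming_normal_(unit.weight, nonlinearity='relu')
        # biases are drawn from a standard normal distribution
        normal_(unit.bias)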
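
For a sense of scale, `kaiming_normal_` with its default fan-in mode and `nonlinearity='relu'` draws weights with standard deviation sqrt(2 / fan_in). The sketch below evaluates that for two layer shapes that appear in this notebook; the helper name `kaiming_normal_std` is hypothetical.

```python
import math


def kaiming_normal_std(in_channels, kernel_size):
    """Std of He-normal weights for ReLU, fan-in mode, for a square conv kernel."""
    fan_in = in_channels * kernel_size * kernel_size
    return math.sqrt(2.0 / fan_in)


print(kaiming_normal_std(3, 7))    # first 7x7 convolution on RGB input
print(kaiming_normal_std(256, 1))  # a 1x1 convolution on 256 channels
```

Both values are roughly 0.1, so the biases, which `normal_` draws with a standard deviation of 1, start out considerably larger than the weights.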

In [9]:
# We will be using the CIFAR-10 dataset
trainset = CIFAR10(
    root = "../data",
    train = True,
    download = True,
    transform = ToTensor()
)

testset = CIFAR10(
    root = "../data",
    train = False,
    download = True,
    transform = ToTensor()
)

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ../data/cifar-10-python.tar.gz


Extracting ../data/cifar-10-python.tar.gz to ../data
Files already downloaded and verified


In [0]:
trainer = training.Trainer(nEpoch=30, logInterval=50)

In [0]:
trainer.addDataloader(
    dataloader = DataLoader(
        trainset, batch_size=256,
        shuffle=True, num_workers=0),
    loaderType = 'train')

trainer.addDataloader(
    dataloader = DataLoader(
        testset, batch_size=256, 
        shuffle=True, num_workers=0),
    loaderType = 'test')

trainer.addLossFn(nn.CrossEntropyLoss())
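
With a batch size of 256 and the default `drop_last=False`, the number of batches per epoch follows from the dataset sizes. A small check, assuming the standard CIFAR-10 split of 50,000 training and 10,000 test images:

```python
import math

train_size, test_size, batch_size = 50_000, 10_000, 256

print(math.ceil(train_size / batch_size))  # 196 training batches per epoch
print(math.ceil(test_size / batch_size))   # 40 test batches
print(train_size % batch_size)             # 80 samples in the last training batch
```

The 196 training batches match the batch counter shown in the training log.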

In [0]:
model = SmallGoogLeNet(10)
model.apply(initWeight)

optimizer = Adam(model.parameters())

In [13]:
trainer.train(model, optimizer)

Epoch [ 1 /30 ]  Batch [ 50  / 196 ]  Loss: 1.9571
Epoch [ 1 /30 ]  Batch [ 100 / 196 ]  Loss: 1.9566
Epoch [ 1 /30 ]  Batch [ 150 / 196 ]  Loss: 1.7039
Epoch [ 1 /30 ]  Batch [ 196 / 196 ]  Loss: 1.8689
Epoch [ 2 /30 ]  Batch [ 50  / 196 ]  Loss: 1.7074
Epoch [ 2 /30 ]  Batch [ 100 / 196 ]  Loss: 1.5226
Epoch [ 2 /30 ]  Batch [ 150 / 196 ]  Loss: 1.5517
Epoch [ 2 /30 ]  Batch [ 196 / 196 ]  Loss: 1.5534
Epoch [ 3 /30 ]  Batch [ 50  / 196 ]  Loss: 1.4778
Epoch [ 3 /30 ]  Batch [ 100 / 196 ]  Loss: 1.5808
Epoch [ 3 /30 ]  Batch [ 150 / 196 ]  Loss: 1.5632
Epoch [ 3 /30 ]  Batch [ 196 / 196 ]  Loss: 1.3423
Epoch [ 4 /30 ]  Batch [ 50  / 196 ]  Loss: 1.3912
Epoch [ 4 /30 ]  Batch [ 100 / 196 ]  Loss: 1.4686
Epoch [ 4 /30 ]  Batch [ 150 / 196 ]  Loss: 1.4015
Epoch [ 4 /30 ]  Batch [ 196 / 196 ]  Loss: 1.3187
Epoch [ 5 /30 ]  Batch [ 50  / 196 ]  Loss: 1.2524
Epoch [ 5 /30 ]  Batch [ 100 / 196 ]  Loss: 1.4099
Epoch [ 5 /30 ]  Batch [ 150 / 196 ]  Loss: 1.4379
Epoch [ 5 /30 ]  Batch [ 196 / 

In [0]:
trainer.test(model, 10)

In [15]:
# Accuracy
torch.true_divide(torch.diagonal(trainer.confMatrix).sum(), trainer.confMatrix.sum()).item()

0.7168999910354614

In [16]:
trainer.confMatrix

tensor([[834,  26,  91,  34,  33,  21,  11,  20, 104,  41],
        [ 15, 801,   2,   8,   1,   5,   0,   1,  17,  52],
        [ 15,   0, 489,  25,  37,  21,  13,   9,   8,   2],
        [ 25,  11,  73, 529,  36, 183,  55,  24,  12,  17],
        [ 12,   1,  90,  70, 645,  34,  28,  30,   2,   3],
        [  3,   4,  44, 121,  12, 533,  13,  22,   3,   2],
        [ 15,  10, 107,  79,  67,  58, 829,  10,   6,   5],
        [ 12,   7,  75,  90, 152, 126,  29, 869,   7,  27],
        [ 38,  24,  13,  19,  11,   7,  12,   2, 808,  19],
        [ 31, 116,  16,  25,   6,  12,  10,  13,  33, 832]])
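
The confusion matrix can also be broken down per class. The sketch below recomputes the overall accuracy and the per-class recall in plain Python. It assumes that columns hold the true labels, which is an inference from every column summing to 1,000 (the number of CIFAR-10 test images per class) rather than something stated by the `helper` module, and that classes follow the standard CIFAR-10 order.

```python
conf_matrix = [
    [834,  26,  91,  34,  33,  21,  11,  20, 104,  41],
    [ 15, 801,   2,   8,   1,   5,   0,   1,  17,  52],
    [ 15,   0, 489,  25,  37,  21,  13,   9,   8,   2],
    [ 25,  11,  73, 529,  36, 183,  55,  24,  12,  17],
    [ 12,   1,  90,  70, 645,  34,  28,  30,   2,   3],
    [  3,   4,  44, 121,  12, 533,  13,  22,   3,   2],
    [ 15,  10, 107,  79,  67,  58, 829,  10,   6,   5],
    [ 12,   7,  75,  90, 152, 126,  29, 869,   7,  27],
    [ 38,  24,  13,  19,  11,   7,  12,   2, 808,  19],
    [ 31, 116,  16,  25,   6,  12,  10,  13,  33, 832],
]
# standard CIFAR-10 class order (assumed to match the label indices)
classes = ["airplane", "automobile", "bird", "cat", "deer",
           "dog", "frog", "horse", "ship", "truck"]

n = len(conf_matrix)
correct = sum(conf_matrix[i][i] for i in range(n))
total = sum(sum(row) for row in conf_matrix)
print(correct / total)  # 0.7169

# assumption: column c holds all test images whose true label is c
col_totals = [sum(conf_matrix[r][c] for r in range(n)) for c in range(n)]
recall = {classes[c]: conf_matrix[c][c] / col_totals[c] for c in range(n)}

worst = min(recall, key=recall.get)
print(worst, recall[worst])  # bird 0.489
```

Under these assumptions the weakest classes are bird, cat and dog, each with a recall of roughly 0.5.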