
Rajesh Sakhamuru

11-7-2020
# GoogLeNet Implementation


In [1]:
import numpy as np
import pandas as pd
import tensorflow as tf
import matplotlib.pyplot as plt

In [2]:
strategy = tf.distribute.MirroredStrategy()

INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)


In [5]:
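class inceptionBlock(tf.keras.Model):
    """Inception block: four parallel branches concatenated along the channel axis.

    n1: filters of the 1x1 branch
    n2: (1x1 reduction filters, 3x3 filters) of the second branch
    n3: (1x1 reduction filters, 5x5 filters) of the third branch
    n4: filters of the 1x1 convolution that follows 3x3 max pooling
    """

    def __init__(self, n1, n2, n3, n4):
        super().__init__()
        # branch 1: a single 1x1 convolution
        self.branch1 = tf.keras.layers.Conv2D(n1, kernel_size=1, activation='relu')

        # branch 2: 1x1 channel reduction followed by a 3x3 convolution
        self.branch2 = tf.keras.models.Sequential([
                tf.keras.layers.Conv2D(n2[0], 1, activation='relu'),
                tf.keras.layers.Conv2D(n2[1], 3, padding='same', activation='relu')])

        # branch 3: 1x1 channel reduction followed by a 5x5 convolution
        self.branch3 = tf.keras.models.Sequential([
                tf.keras.layers.Conv2D(n3[0], 1, activation='relu'),
                tf.keras.layers.Conv2D(n3[1], 5, padding='same', activation='relu')])

        # branch 4: 3x3 max pooling (stride 1) followed by a 1x1 convolution
        self.branch4 = tf.keras.models.Sequential([
                tf.keras.layers.MaxPool2D(3, 1, padding='same'),
                tf.keras.layers.Conv2D(n4, 1, activation='relu')])

    def call(self, X):
        # every branch preserves height and width, so the outputs can be stacked along the channel axis
        return tf.concat([self.branch1(X), self.branch2(X), self.branch3(X), self.branch4(X)], axis=-1)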
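As a sanity check on the block above, a small pure-Python helper can count each block's output channels and trainable parameters. This is a sketch, not part of the original notebook: the name `inception_block_stats` is hypothetical, and it assumes every Conv2D has a bias (the Keras default, as in the block above) while pooling adds no weights. For the first block (192 input channels) it should reproduce the 256 channels and 163,696 parameters reported for `inception_block_63` in the model summary, and for the second block the 480 output channels.

```python
def inception_block_stats(in_channels, n1, n2, n3, n4):
    """Return (output_channels, parameter_count) for inceptionBlock(n1, n2, n3, n4).

    Hypothetical helper: assumes each Conv2D has a bias and pooling has no weights.
    """
    def conv_params(kernel, c_in, c_out):
        # kernel*kernel*c_in weights per filter, plus one bias per filter
        return kernel * kernel * c_in * c_out + c_out

    params = (
        conv_params(1, in_channels, n1)                                       # branch 1: 1x1
        + conv_params(1, in_channels, n2[0]) + conv_params(3, n2[0], n2[1])   # branch 2: 1x1 -> 3x3
        + conv_params(1, in_channels, n3[0]) + conv_params(5, n3[0], n3[1])   # branch 3: 1x1 -> 5x5
        + conv_params(1, in_channels, n4)                                     # branch 4: pool -> 1x1
    )
    # branch outputs are concatenated, so their channel counts add up
    out_channels = n1 + n2[1] + n3[1] + n4
    return out_channels, params


# first Inception block in the model: its input has 192 channels
print(inception_block_stats(192, 64, (96, 128), (16, 32), 32))    # (256, 163696)
# second Inception block: its input is the 256-channel output of the first
print(inception_block_stats(256, 128, (128, 192), (32, 96), 64))  # (480, 388736)
```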

In [15]:
with strategy.scope():
    (train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.fashion_mnist.load_data()

    train_labels = tf.convert_to_tensor(train_labels)
    test_labels = tf.convert_to_tensor(test_labels)

    train_images = train_images.reshape((train_images.shape[0], 28, 28, 1))
    test_images = test_images.reshape((test_images.shape[0], 28, 28, 1))

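    # Upsample the 28x28 images to 96x96 so the stride-2 convolution and max-pooling
    # layers don't shrink the feature maps too much. resize_with_pad preserves the
    # aspect ratio and pads with 0s where needed; since Fashion-MNIST images are
    # square, no padding is actually added here.
    train_images = tf.image.resize_with_pad(train_images, 96, 96)
    test_images = tf.image.resize_with_pad(test_images, 96, 96)

    # scale pixel values from [0, 255] to [0, 1]; the resize already returns float32 tensors
    train_images = train_images / 255.0
    test_images = test_images / 255.0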
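
To see why the upsampling matters, the short pure-Python sketch below traces the spatial size through the five stride-2 layers of the model defined in the next cell (the 7x7 convolution and the four max-pooling layers; the Inception blocks and stride-1 convolutions keep the size unchanged). It assumes `padding='same'`, for which Keras produces `ceil(input / stride)` outputs per axis; the helper name `feature_map_sizes` is hypothetical.

```python
import math


def feature_map_sizes(input_size, strides=(2, 2, 2, 2, 2)):
    """Spatial size after each stride-2 layer, assuming padding='same'.

    Hypothetical helper: with 'same' padding each layer outputs ceil(size / stride).
    """
    sizes = []
    size = input_size
    for s in strides:
        size = math.ceil(size / s)
        sizes.append(size)
    return sizes


print(feature_map_sizes(28))  # [14, 7, 4, 2, 1]
print(feature_map_sizes(96))  # [48, 24, 12, 6, 3]
```

At the native 28x28 resolution the last Inception blocks would operate on a 1x1 feature map, whereas at 96x96 they still see a 3x3 map; the 48, 24, and 12 sizes match the output shapes in the model summary.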

In [35]:
with strategy.scope():

    googLeNet = tf.keras.models.Sequential([
            tf.keras.layers.Conv2D(64, 7, strides=2, padding='same', activation='relu', input_shape=(96, 96, 1)),
            tf.keras.layers.MaxPool2D(pool_size=3, strides=2, padding='same'),
            tf.keras.layers.Conv2D(64, 1, activation='relu'),
            tf.keras.layers.Conv2D(192, 3, padding='same', activation='relu'),
            tf.keras.layers.MaxPool2D(pool_size=3, strides=2, padding='same'),
            inceptionBlock(64, (96, 128), (16, 32), 32),
            inceptionBlock(128, (128, 192), (32, 96), 64), 
            tf.keras.layers.MaxPool2D(pool_size=3, strides=2, padding='same'),
            inceptionBlock(64, (96, 128), (16, 32), 32),
            inceptionBlock(128, (128, 192), (32, 96), 64), 
            inceptionBlock(128, (128, 192), (32, 96), 64),
            inceptionBlock(128, (128, 192), (32, 96), 64), 
            inceptionBlock(256, (160, 320), (32, 128), 128), 
            tf.keras.layers.MaxPool2D(pool_size=3, strides=2, padding='same'),
            inceptionBlock(64, (96, 128), (16, 32), 32),
            inceptionBlock(128, (128, 192), (32, 96), 64), 
            tf.keras.layers.GlobalAvgPool2D(),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(10)])


    optimizer = tf.keras.optimizers.SGD(learning_rate=0.02)
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    googLeNet.compile(optimizer=optimizer, loss=loss, metrics=['accuracy'])

    googLeNetHistory = googLeNet.fit(train_images, train_labels, epochs=15, batch_size=64, validation_data=(test_images, test_labels), verbose=2)

Epoch 1/15
938/938 - 46s - loss: 2.0254 - accuracy: 0.2891 - val_loss: 1.9307 - val_accuracy: 0.3842
Epoch 2/15
938/938 - 44s - loss: 0.7287 - accuracy: 0.7129 - val_loss: 0.6991 - val_accuracy: 0.7354
Epoch 3/15
938/938 - 44s - loss: 0.5134 - accuracy: 0.8059 - val_loss: 0.7775 - val_accuracy: 0.7070
Epoch 4/15
938/938 - 44s - loss: 0.4324 - accuracy: 0.8382 - val_loss: 0.4823 - val_accuracy: 0.8154
Epoch 5/15
938/938 - 44s - loss: 0.3834 - accuracy: 0.8571 - val_loss: 0.4184 - val_accuracy: 0.8525
Epoch 6/15
938/938 - 44s - loss: 0.3516 - accuracy: 0.8694 - val_loss: 0.4246 - val_accuracy: 0.8371
Epoch 7/15
938/938 - 44s - loss: 0.3251 - accuracy: 0.8781 - val_loss: 0.3367 - val_accuracy: 0.8743
Epoch 8/15
938/938 - 44s - loss: 0.3042 - accuracy: 0.8862 - val_loss: 0.3360 - val_accuracy: 0.8717
Epoch 9/15
938/938 - 44s - loss: 0.2861 - accuracy: 0.8935 - val_loss: 0.3132 - val_accuracy: 0.8844
Epoch 10/15
938/938 - 44s - loss: 0.2722 - accuracy: 0.8977 - val_loss: 0.3500 - val_accura

# Conclusion

The number of model parameters for GoogLeNet is 3,749,146 (see the model summary below), roughly six times fewer than VGG-11 (22,121,802 parameters) and AlexNet (21,598,922 parameters), even though the three networks perform similarly. That said, VGG-11 still performed ~1.5% better than both GoogLeNet and AlexNet.

GoogLeNet keeps its parameter count low mainly because of the Inception block, used 9 times in this network. Each block runs 1x1, 3x3, and 5x5 convolutions (plus a max-pooling branch) in parallel to capture detail at several scales, and the 1x1 convolutions in particular keep this cheap: they reduce the number of channels fed into the 3x3 and 5x5 convolutions, which prevents the parameter count from blowing up as the network gets deeper.
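The savings from a 1x1 reduction can be worked out by hand. The sketch below uses the channel counts of the 5x5 branch in the first Inception block of this notebook (192 input channels, a 16-channel 1x1 reduction, then 32 filters of size 5x5) and compares it with applying the 5x5 convolution directly to all 192 channels. `conv_params` is our own helper, not a Keras function, and assumes each convolution has a bias.

```python
def conv_params(kernel, c_in, c_out):
    """Weights plus biases of a Conv2D layer with a square kernel (bias assumed)."""
    return kernel * kernel * c_in * c_out + c_out


# 5x5 convolution applied directly: 192 input channels -> 32 output channels
direct = conv_params(5, 192, 32)

# Inception-style branch: 1x1 reduction to 16 channels, then the 5x5 convolution
bottleneck = conv_params(1, 192, 16) + conv_params(5, 16, 32)

print(direct, bottleneck)  # 153632 15920
```

The bottlenecked branch needs almost ten times fewer parameters for the same output shape, and the gap widens as the number of input channels grows in deeper blocks.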

In addition, GoogLeNet replaces the large fully connected layers at the end of AlexNet and VGG with a global average pooling layer after the last convolutional layer, followed here by a single Dense layer that produces the 10 class logits. This drastically reduces the number of parameters, because the features used for classification are extracted entirely by the convolutional layers and only a small classifier is needed on top of them.
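A rough comparison of the two classifier heads, as a pure-Python sketch: the final feature map of this model is 3x3 spatially (96x96 input halved by five stride-2 layers) with 480 channels (128 + 192 + 96 + 64 from the last Inception block). The fully connected head is a hypothetical, simplified AlexNet/VGG-style head with a single 4096-unit hidden layer; the real AlexNet and VGG heads use two such layers, so they are even larger. `dense_params` is our own helper.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


h, w, c = 3, 3, 480  # final feature map of this model: 3x3 spatial, 480 channels

# global average pooling -> Dense(10), as in the model above (pooling and Flatten add no weights)
gap_head = dense_params(c, 10)

# hypothetical simplified AlexNet/VGG-style head: flatten -> Dense(4096) -> Dense(10)
fc_head = dense_params(h * w * c, 4096) + dense_params(4096, 10)

print(gap_head, fc_head)  # 4810 17739786
```

Even this simplified fully connected head would add about 17.7 million parameters, more than four times the entire GoogLeNet model here.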

In [37]:
googLeNet.summary()

Model: "sequential_227"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_411 (Conv2D)          (None, 48, 48, 64)        3200      
_________________________________________________________________
max_pooling2d_95 (MaxPooling (None, 24, 24, 64)        0         
_________________________________________________________________
conv2d_412 (Conv2D)          (None, 24, 24, 64)        4160      
_________________________________________________________________
conv2d_413 (Conv2D)          (None, 24, 24, 192)       110784    
_________________________________________________________________
max_pooling2d_96 (MaxPooling (None, 12, 12, 192)       0         
_________________________________________________________________
inception_block_63 (inceptio (None, 12, 12, 256)       163696    
_________________________________________________________________
inception_block_64 (inceptio (None, 12, 12, 480)    