## Inception Module

The general idea behind the inception module is to pass the same input through several types of layers at once, so that distinct features are extracted in parallel and then concatenated. This lets the model learn both local and more abstract features, which in turn improves model performance.

In the actual model proposed in the paper, the inception module branches into four distinct paths.

+ The first path learns local features using a convolutional layer with 1×1 filters.
+ The second path first applies 1×1 convolutions for dimensionality reduction, which prepares the input for the 3×3 convolutions that follow.
+ The third path is the same as the second, except that it uses 5×5 convolutions. Both the second and the third paths are tasked with learning more general features in images.
+ The fourth path is known as the pool projection branch. It applies 3×3 max pooling before learning features using a 1×1 convolutional layer.
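
To see why the 1×1 reduction in the second and third paths matters, the weight counts can be compared directly. The sketch below is plain arithmetic rather than part of the model. The helper name `conv_weights` and the channel counts (192 input channels, 16 reduction filters, 32 5×5 filters) are assumptions chosen for illustration, and biases are ignored.

```python
def conv_weights(kernel, in_channels, filters):
    """Number of weights in a square convolution, ignoring biases."""
    return kernel * kernel * in_channels * filters

in_channels = 192     # assumed depth of the incoming feature map
reduce_filters = 16   # assumed number of 1x1 reduction filters
out_filters = 32      # assumed number of 5x5 filters

# 5x5 convolution applied directly to the input
direct = conv_weights(5, in_channels, out_filters)

# 1x1 reduction first, then the 5x5 convolution on the thinner tensor
reduced = (conv_weights(1, in_channels, reduce_filters)
           + conv_weights(5, reduce_filters, out_filters))

print(direct)   # 153600
print(reduced)  # 15872
```

Under these assumptions the reduced path needs roughly a tenth of the weights of the direct 5×5 convolution.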

These branches operate in parallel on the same input (same in value, not the same instance), and their outputs are later concatenated. To make the concatenation possible, 'same' padding is used throughout the module, so every branch produces an output with the same height and width.
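
The effect of the padding choice can be checked with the usual output-size formulas for stride-1 layers. This is a minimal sketch, assuming a 28×28 input feature map; `out_size` and `branch_sizes` are hypothetical helpers written for this illustration, not Keras functions.

```python
import math

def out_size(n, kernel, stride=1, padding="same"):
    """Spatial output size of a convolution or pooling layer."""
    if padding == "same":
        return math.ceil(n / stride)
    return (n - kernel) // stride + 1  # "valid" padding

n = 28  # assumed height/width of the input feature map

# Kernel sizes applied one after another in each of the four paths
kernels = {
    "1x1": [1],
    "1x1 -> 3x3": [1, 3],
    "1x1 -> 5x5": [1, 5],
    "pool 3x3 -> 1x1": [3, 1],
}

def branch_sizes(padding):
    sizes = {}
    for name, ks in kernels.items():
        size = n
        for k in ks:
            size = out_size(size, k, stride=1, padding=padding)
        sizes[name] = size
    return sizes

same_sizes = branch_sizes("same")
valid_sizes = branch_sizes("valid")
print(same_sizes)   # every path stays at 28
print(valid_sizes)  # 28, 26, 24, 26: the paths no longer line up
```

With 'valid' padding the four paths end at different spatial sizes, so they could not be concatenated along the channel axis.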

![](./fig/inception_block.png)

## Convolutional Module Implementation

![](./fig/conv_module.png)

**Arguments:**  
input_layer: input tensor to be processed.  
filters: number of filters in the Conv2D layer.  
kernel_size: size of the filters.  
strides: stride of the filters.  
padding: padding mode; defaults to 'same' for the whole model.

In [11]:
def conv_module(input_layer, filters, kernel_size, strides, padding="same"):
    # Conv2D -> BatchNormalization -> ReLU
    x = Conv2D(filters=filters, kernel_size=kernel_size, strides=strides, padding=padding)(input_layer)
    x = BatchNormalization()(x)
    x = Activation("relu")(x)
    return x
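
As a rough guide to what one such module costs, the parameter count can be worked out by hand. The sketch below assumes the Keras defaults (Conv2D with a bias term, BatchNormalization with both scale and offset) and an illustrative input of 192 channels; `conv_module_params` is a hypothetical helper, not part of the notebook.

```python
def conv_module_params(in_channels, filters, kernel_size):
    """Parameter counts for Conv2D (with bias) followed by BatchNormalization."""
    conv = kernel_size * kernel_size * in_channels * filters + filters
    bn_trainable = 2 * filters      # gamma and beta
    bn_non_trainable = 2 * filters  # moving mean and moving variance
    return conv, bn_trainable, bn_non_trainable

# Assumed example: 64 filters of size 1x1 on a 192-channel input
conv, bn_train, bn_fixed = conv_module_params(in_channels=192, filters=64, kernel_size=1)
print(conv, bn_train, bn_fixed)    # 12352 128 128
print(conv + bn_train + bn_fixed)  # 12608
```

The ReLU activation adds no parameters, so the total comes from the convolution and the batch normalization alone.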

## Inception Module Implementation

We define our modified inception module on top of conv_module, so every convolution is followed by batch normalization and a ReLU activation. The module is built as follows:

+ The first path uses 1×1 filters (used to learn local features in images).
+ The second and third paths use 3×3 and 5×5 filters respectively (responsible for learning general features), each preceded by a 1×1 reduction.
+ The pool projection path applies 3×3 max pooling with stride 1, followed by a 1×1 convolution.
+ After that, we concatenate the path outputs along the channel dimension.

In [15]:
def Inception_block(input_layer, f1, f2_conv1, f2_conv3, f3_conv1, f3_conv5, f4):
    # Input:
    # - f1: number of filters of the 1x1 convolutional layer in the first path
    # - f2_conv1, f2_conv3: number of filters of the 1x1 and 3x3 convolutional layers in the second path
    # - f3_conv1, f3_conv5: number of filters of the 1x1 and 5x5 convolutional layers in the third path
    # - f4: number of filters of the 1x1 convolutional layer in the fourth path

    # 1st path: 1x1 convolution
    path1 = conv_module(input_layer, filters=f1, kernel_size=(1,1), strides=(1,1))

    # 2nd path: 1x1 reduction followed by a 3x3 convolution
    path2 = conv_module(input_layer, filters=f2_conv1, kernel_size=(1,1), strides=(1,1))
    path2 = conv_module(path2, filters=f2_conv3, kernel_size=(3,3), strides=(1,1))

    # 3rd path: 1x1 reduction followed by a 5x5 convolution
    path3 = conv_module(input_layer, filters=f3_conv1, kernel_size=(1,1), strides=(1,1))
    path3 = conv_module(path3, filters=f3_conv5, kernel_size=(5,5), strides=(1,1))

    # 4th path: 3x3 max pooling (stride 1) followed by a 1x1 pool projection
    path4 = MaxPooling2D(pool_size=(3,3), strides=(1,1), padding='same')(input_layer)
    path4 = conv_module(path4, filters=f4, kernel_size=(1,1), strides=(1,1))

    # concatenation
    output_layer = concatenate([path1, path2, path3, path4], axis=-1)

    return output_layer
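
Because the four paths are concatenated along the channel axis, the depth of the block's output is the sum of the filter counts of the last convolution in each path; the 1×1 reduction filters do not contribute. A minimal sketch of that arithmetic, using the filter counts of the first two blocks of the network defined further down (the helper name `inception_out_channels` is hypothetical):

```python
def inception_out_channels(n_1x1, n_3x3, n_5x5, n_pool_proj):
    """Output depth of an inception block: the four paths are stacked channel-wise."""
    return n_1x1 + n_3x3 + n_5x5 + n_pool_proj

first = inception_out_channels(64, 128, 32, 32)
second = inception_out_channels(128, 192, 96, 64)
print(first)   # 256
print(second)  # 480
```

So the second block receives a 256-channel input and produces a 480-channel output.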

In [16]:
def GoogleNet():
    # input layer 
    input_layer = Input(shape = (224, 224, 3))
    
    # convolutional layer: filters = 64, kernel_size = (7,7), strides = 2
    X = Conv2D(filters = 64, kernel_size = (7,7), strides = 2, padding = 'valid', activation = 'relu')(input_layer)
    
    # max-pooling layer: pool_size = (3,3), strides = 2
    X = MaxPooling2D(pool_size = (3,3), strides = 2)(X)

    # convolutional layer: filters = 64, strides = 1
    X = Conv2D(filters = 64, kernel_size = (1,1), strides = 1, padding = 'same', activation = 'relu')(X)

    # convolutional layer: filters = 192, kernel_size = (3,3)
    X = Conv2D(filters = 192, kernel_size = (3,3), padding = 'same', activation = 'relu')(X)

    # max-pooling layer: pool_size = (3,3), strides = 2
    X = MaxPooling2D(pool_size= (3,3), strides = 2)(X)

    # 1st Inception block
    X = Inception_block(X, f1 = 64, f2_conv1 = 96, f2_conv3 = 128, f3_conv1 = 16, f3_conv5 = 32, f4 = 32)

    # 2nd Inception block
    X = Inception_block(X, f1 = 128, f2_conv1 = 128, f2_conv3 = 192, f3_conv1 = 32, f3_conv5 = 96, f4 = 64)

    # max-pooling layer: pool_size = (3,3), strides = 2
    X = MaxPooling2D(pool_size= (3,3), strides = 2)(X)

    # 3rd Inception block
    X = Inception_block(X, f1 = 192, f2_conv1 = 96, f2_conv3 = 208, f3_conv1 = 16, f3_conv5 = 48, f4 = 64)

    # Extra network 1:
    X1 = AveragePooling2D(pool_size = (5,5), strides = 3)(X)
    X1 = Conv2D(filters = 128, kernel_size = (1,1), padding = 'same', activation = 'relu')(X1)
    X1 = Flatten()(X1)
    X1 = Dense(1024, activation = 'relu')(X1)
    X1 = Dropout(0.7)(X1)
    X1 = Dense(1000, activation = 'softmax')(X1)


    # 4th Inception block
    X = Inception_block(X, f1 = 160, f2_conv1 = 112, f2_conv3 = 224, f3_conv1 = 24, f3_conv5 = 64, f4 = 64)

    # 5th Inception block
    X = Inception_block(X, f1 = 128, f2_conv1 = 128, f2_conv3 = 256, f3_conv1 = 24, f3_conv5 = 64, f4 = 64)

    # 6th Inception block
    X = Inception_block(X, f1 = 112, f2_conv1 = 144, f2_conv3 = 288, f3_conv1 = 32, f3_conv5 = 64, f4 = 64)

    # Extra network 2:
    X2 = AveragePooling2D(pool_size = (5,5), strides = 3)(X)
    X2 = Conv2D(filters = 128, kernel_size = (1,1), padding = 'same', activation = 'relu')(X2)
    X2 = Flatten()(X2)
    X2 = Dense(1024, activation = 'relu')(X2)
    X2 = Dropout(0.7)(X2)
    X2 = Dense(1000, activation = 'softmax')(X2)


    # 7th Inception block
    X = Inception_block(X, f1 = 256, f2_conv1 = 160, f2_conv3 = 320, f3_conv1 = 32, 
                      f3_conv5 = 128, f4 = 128)

    # max-pooling layer: pool_size = (3,3), strides = 2
    X = MaxPooling2D(pool_size = (3,3), strides = 2)(X)

    # 8th Inception block
    X = Inception_block(X, f1 = 256, f2_conv1 = 160, f2_conv3 = 320, f3_conv1 = 32, f3_conv5 = 128, f4 = 128)

    # 9th Inception block
    X = Inception_block(X, f1 = 384, f2_conv1 = 192, f2_conv3 = 384, f3_conv1 = 48, f3_conv5 = 128, f4 = 128)

    # Global Average pooling layer 
    X = GlobalAveragePooling2D(name = 'GAPL')(X)

    # Dropout layer 
    X = Dropout(0.4)(X)

    # output layer 
    X = Dense(1000, activation = 'softmax')(X)

    # model
    model = Model(input_layer, [X, X1, X2], name = 'GoogLeNet')

    return model
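
It can help to trace the spatial size of the feature maps through the downsampling layers above. The sketch below does this by hand, assuming the Keras default of 'valid' padding for the pooling layers as in the code. The `padding="same"` run is a hypothetical variation, not the code above, included only for comparison; `out_size` and `trace` are illustrative helpers.

```python
import math

def out_size(n, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer."""
    if padding == "same":
        return math.ceil(n / stride)
    return (n - kernel) // stride + 1  # "valid" padding

def trace(padding):
    """Sizes after the 7x7 stem convolution and the four max-pooling layers."""
    steps = [("conv 7x7, stride 2", 7, 2),
             ("max pool 1", 3, 2),
             ("max pool 2", 3, 2),
             ("max pool 3", 3, 2),
             ("max pool 4", 3, 2)]
    n = 224
    sizes = []
    for _, kernel, stride in steps:
        n = out_size(n, kernel, stride, padding)
        sizes.append(n)
    return sizes

as_written = trace("valid")
with_same = trace("same")
print(as_written)  # [109, 54, 26, 12, 5]
print(with_same)   # [112, 56, 28, 14, 7]

# The extra networks attach after max pool 3:
# 5x5 average pooling with stride 3 and 'valid' padding
aux_as_written = out_size(as_written[3], 5, 3, "valid")
aux_with_same = out_size(with_same[3], 5, 3, "valid")
print(aux_as_written, aux_with_same)  # 3 4
```

The 1×1 and 3×3 stem convolutions and the inception blocks use 'same' padding with stride 1, so they leave these sizes unchanged. The 'same' variant yields the 112, 56, 28, 14, 7 progression listed in the GoogLeNet paper's architecture table, while the code as written runs on slightly smaller feature maps.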

In [18]:
model = GoogleNet()