## Image classification with GoogLeNet

***NOTE***

Be sure to enable hardware acceleration so that the notebook runs on a GPU: click `Runtime` > `Change runtime type` and select `GPU` as the *hardware accelerator*.

* Add batch normalisation to GoogLeNet.
* Remove the preprocessing applied in the `resize_images` function, which is responsible for the normalisation, and instead add a batch normalisation layer before the first convolutional layer in the network. Depending on where you add batch norm, you may run into problems calling the `summary()` function.
* How does the performance compare in both cases versus GoogLeNet without batch normalisation?
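For reference when comparing the two set-ups: in training mode, batch normalisation standardises each channel using the mean and variance computed over the batch and spatial axes, then applies a learned scale (gamma) and shift (beta). The sketch below is a minimal NumPy version, assuming channels-last input and an epsilon of 1e-3 (the Keras default); the moving averages that Keras keeps for inference are deliberately left out.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-3):
    """Training-mode batch normalisation for a channels-last batch (N, H, W, C).

    Statistics are computed per channel over the batch and spatial axes.
    The moving averages Keras uses at inference time are not modelled here.
    """
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta  # gamma and beta broadcast over the channel axis

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(8, 4, 4, 3))
y = batch_norm_train(x, gamma=np.ones(3), beta=np.zeros(3))
print(y.mean(axis=(0, 1, 2)))  # close to 0 for every channel
print(y.std(axis=(0, 1, 2)))   # close to 1 for every channel
```

Unlike per-image standardisation, these statistics depend on the whole mini-batch, and gamma and beta are learned during training.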

## Imports first

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.preprocessing import LabelEncoder
# Use tf.keras throughout so that the model and its layers come from the same Keras package
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (BatchNormalization, Conv2D, Dense,
                                     Dropout, Flatten, MaxPool2D)
# keras.utils.np_utils is not available in recent Keras versions;
# tf.keras.utils provides the same to_categorical function
from tensorflow.keras import utils as np_utils
%matplotlib inline

print(tf.__version__)

# Module 1
The first module uses a 64-channel 7x7 convolutional layer.

In [None]:
model = Sequential()

# 2D conv
model.add(tf.keras.layers.Conv2D(64, 7, strides=2, padding='same',
                                   activation='relu', input_shape = (96,96,1)))

# Batch normalisation after the first convolution
model.add(tf.keras.layers.BatchNormalization())
# Max pooling
model.add(tf.keras.layers.MaxPool2D(pool_size=3, strides=2,
                                      padding='same'))

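# Module 2
The second module uses a 64-channel 1x1 convolutional layer followed by a 3x3 convolutional layer that increases the number of channels to 192.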

In [None]:
# 1x1 conv
model.add(tf.keras.layers.Conv2D(64, 1, activation='relu'))

# 2D 3x3 conv
model.add(tf.keras.layers.Conv2D(192, 3, padding='same', activation='relu'))

# Pooling
model.add(tf.keras.layers.MaxPool2D(pool_size=3, strides=2, padding='same'))

In [None]:
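class Inception(tf.keras.Model):
    # `c1`--`c4` are the number of output channels for each branch
    def __init__(self, c1, c2, c3, c4):
        super().__init__()
        # Branch 1: a single 1x1 convolution
        self.b1_1 = tf.keras.layers.Conv2D(c1, 1, activation='relu')
        # Branch 2: 1x1 convolution (channel reduction) followed by a 3x3 convolution
        self.b2_1 = tf.keras.layers.Conv2D(c2[0], 1, activation='relu')
        self.b2_2 = tf.keras.layers.Conv2D(c2[1], 3, padding='same', activation='relu')
        # Branch 3: 1x1 convolution (channel reduction) followed by a 5x5 convolution
        self.b3_1 = tf.keras.layers.Conv2D(c3[0], 1, activation='relu')
        self.b3_2 = tf.keras.layers.Conv2D(c3[1], 5, padding='same', activation='relu')
        # Branch 4: 3x3 max pooling (stride 1) followed by a 1x1 convolution
        self.b4_1 = tf.keras.layers.MaxPool2D(3, 1, padding='same')
        self.b4_2 = tf.keras.layers.Conv2D(c4, 1, activation='relu')

    def call(self, x):
        b1 = self.b1_1(x)
        b2 = self.b2_2(self.b2_1(x))
        b3 = self.b3_2(self.b3_1(x))
        b4 = self.b4_2(self.b4_1(x))
        # Concatenate the four branches along the channel axis
        return tf.keras.layers.Concatenate()([b1, b2, b3, b4])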

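# Module 3
The third module connects two complete Inception blocks in series. The first block outputs 64+128+32+32 = 256 channels.

In [None]:
model.add(Inception(64, (96, 128), (16, 32), 32))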

The second Inception block increases the number of output channels to 128+192+96+64 = 480.

In [None]:
model.add(Inception(128, (128, 192), (32, 96), 64))

# As in GoogLeNet, a 3x3 max-pooling layer with stride 2 follows this module
model.add(tf.keras.layers.MaxPool2D(pool_size=3, strides=2, padding='same'))
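The output channel count of an Inception block is the sum over its four branches: `c1`, the second entry of `c2`, the second entry of `c3`, and `c4`. The first entries of `c2` and `c3` are the 1x1 reduction channels; they feed the 3x3 and 5x5 convolutions inside the same branch and never reach the concatenation. A small hypothetical helper (not part of the notebook) makes the arithmetic for the two Inception blocks added above explicit:

```python
def inception_out_channels(c1, c2, c3, c4):
    """Channels produced by an Inception block (hypothetical helper).

    c2 and c3 are (reduce, out) pairs; only the `out` entries reach the
    concatenation, because each 1x1 `reduce` convolution feeds the 3x3 or
    5x5 convolution in the same branch.
    """
    return c1 + c2[1] + c3[1] + c4

print(inception_out_channels(64, (96, 128), (16, 32), 32))    # 256
print(inception_out_channels(128, (128, 192), (32, 96), 64))  # 480
```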

# Module 4
The fourth module is more complicated. It connects five Inception blocks in series:

* block 1 outputs: 192+208+48+64 = 512

* block 2 outputs: 160+224+64+64 = 512

* block 3 outputs: 128+256+64+64 = 512

* block 4 outputs: 112+288+64+64 = 528

* block 5 outputs: 256+320+128+128 = 832

Each block learns a different number of features per branch. The blocks also differ in the number of channels produced by the 1x1 reduction convolutions at the start of the 3x3 and 5x5 branches (the first entries of `c2` and `c3`).
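The 1x1 reduction convolutions matter mainly for cost: they shrink the channel count before the more expensive 3x3 and 5x5 convolutions. As an illustration, the sketch below counts the parameters (weights plus biases) of branch 2 of the first block in this module, which receives the 480 channels produced by the two preceding Inception blocks, with and without the 1x1 reduction to 96 channels. `conv_params` is a hypothetical helper assuming a standard `Conv2D` layer with a bias term.

```python
def conv_params(kernel, c_in, c_out):
    """Trainable parameters of a Conv2D layer with bias (hypothetical helper)."""
    return kernel * kernel * c_in * c_out + c_out

c_in = 480              # channels entering module 4
c_reduce, c_out = 96, 208  # branch 2 of Inception(192, (96, 208), (16, 48), 64)

# 1x1 reduction followed by the 3x3 convolution
with_bottleneck = conv_params(1, c_in, c_reduce) + conv_params(3, c_reduce, c_out)
# 3x3 convolution applied directly to all 480 input channels
direct = conv_params(3, c_in, c_out)

print(with_bottleneck)  # 226096
print(direct)           # 898768
```

In this branch the bottleneck needs roughly a quarter of the parameters, which is why each block's choice of reduction channels shapes its capacity.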

In [None]:
model.add(Inception(192, (96, 208), (16, 48), 64))
model.add(Inception(160, (112, 224), (24, 64), 64))
model.add(Inception(128, (128, 256), (24, 64), 64))
model.add(Inception(112, (144, 288), (32, 64), 64))
model.add(Inception(256, (160, 320), (32, 128), 128))

# As in GoogLeNet, a 3x3 max-pooling layer with stride 2 follows this module
model.add(tf.keras.layers.MaxPool2D(pool_size=3, strides=2, padding='same'))

# Module 5

The fifth module has two Inception blocks:

* block 1 output: 256+320+128+128 = 832

* block 2 output: 384+384+128+128 = 1024

* The module ends with a global average pooling layer that reduces the height and width of each channel to 1, just as in NiN.

In [None]:
model.add(Inception(256, (160, 320), (32, 128), 128))
model.add(Inception(384, (192, 384), (48, 128), 128))
model.add(tf.keras.layers.GlobalAvgPool2D())
model.add(tf.keras.layers.Flatten())
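Global average pooling replaces each channel's feature map by its mean over height and width, so a `(batch, height, width, channels)` tensor becomes `(batch, channels)`. The `Flatten` layer after it therefore leaves the shape unchanged. A NumPy sketch of the same reduction, assuming the 3x3x1024 feature maps that a 96x96 input should produce at this point:

```python
import numpy as np

features = np.random.default_rng(0).random((2, 3, 3, 1024))  # (batch, H, W, C)

# Global average pooling: average over the two spatial axes
pooled = features.mean(axis=(1, 2))
print(pooled.shape)  # (2, 1024)

# Flattening an already 2D (batch, C) array does not change its shape
flattened = pooled.reshape(pooled.shape[0], -1)
print(flattened.shape)  # (2, 1024)
```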

# Output

In [None]:
model.add(tf.keras.layers.Dense(10, activation="softmax"))
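With `padding='same'`, a convolution or pooling layer with stride s maps a spatial size n to ceil(n / s), whatever the kernel size. Applying that rule to the stride-2 layers above traces a 96x96 input through the network. This is a hand calculation, so it is worth cross-checking against `model.summary()`:

```python
import math

def same_out(n, stride):
    """Spatial size after a 'same'-padded conv/pool layer with the given stride."""
    return math.ceil(n / stride)

# Only the stride-2 layers change the spatial size; the stride-1 convolutions
# and the Inception blocks (1x1 kernels or 'same' padding) preserve it.
stride2_layers = [
    "conv 7x7/2 (module 1)",
    "max pool 3x3/2 (module 1)",
    "max pool 3x3/2 (module 2)",
    "max pool 3x3/2 (after the first two Inception blocks)",
    "max pool 3x3/2 (module 4)",
]

size = 96
trace = [("input", size)]
for name in stride2_layers:
    size = same_out(size, 2)
    trace.append((name, size))

for name, s in trace:
    print(f"{name:55s} {s}x{s}")
# The global average pooling in module 5 then reduces the 3x3 maps to 1x1.
```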

<hr>

## Load the dataset

In [None]:
# load data
(X_train, Y_train), (X_test, Y_test) = tf.keras.datasets.fashion_mnist.load_data()

## Find the unique classes in the training labels

In [None]:
classes = np.unique(Y_train)
nClasses = len(classes)
print('Total number of outputs : ', nClasses)
print('Output classes : ', classes)

## Reshape needed

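Keras needs to know the depth (number of channels) of an image.

For CNNs, Keras (with its default `channels_last` layout) expects the data in the format `[batch, height, width, channels]`.

In this case the colour channel/depth of the images is 1. Currently `X_train` has shape `(60000, 28, 28)` and `X_test` has shape `(10000, 28, 28)`.

These shapes have no depth value, so we reshape them to add a trailing channel axis.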

In [None]:
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], X_train.shape[2], 1))
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], X_test.shape[2], 1))
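An equivalent way to add the trailing channel axis is `np.expand_dims`. A quick check on a small dummy array (a stand-in for `X_train`, so it runs without downloading the data):

```python
import numpy as np

images = np.zeros((5, 28, 28), dtype=np.uint8)  # dummy stand-in for X_train

reshaped = np.reshape(images, (images.shape[0], images.shape[1], images.shape[2], 1))
expanded = np.expand_dims(images, axis=-1)  # add a trailing channel axis
print(reshaped.shape, expanded.shape)  # (5, 28, 28, 1) (5, 28, 28, 1)
```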

## Convert integer class labels to one-hot encoded vectors

In this case there are 10 classes, so each label is converted into a vector of length 10.

In [None]:
num_classes = 10
Y_train = np_utils.to_categorical(Y_train, num_classes)
Y_test = np_utils.to_categorical(Y_test, num_classes)
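One-hot encoding picks one row of the identity matrix per label. The NumPy sketch below reproduces what `to_categorical` returns for integer labels (float32 rows with a single 1); it is for illustration only, and the notebook itself uses the Keras function.

```python
import numpy as np

labels = np.array([0, 9, 3])
one_hot = np.eye(10, dtype=np.float32)[labels]  # row i of the identity matrix encodes class i
print(one_hot.shape)             # (3, 10)
print(one_hot.argmax(axis=-1))   # [0 9 3]
```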

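## Small twist: a `tf.data` input pipeline

Instead of passing NumPy arrays to `fit`, the data is wrapped in `tf.data.Dataset` objects so that each image can be standardised and resized to 96x96 on the fly.

API: https://www.tensorflow.org/api_docs/python/tf/data/Dataset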

In [None]:
train_ds = tf.data.Dataset.from_tensor_slices((X_train, Y_train))
test_ds = tf.data.Dataset.from_tensor_slices((X_test, Y_test))

In [None]:
def resize_images(image, label):
    # Normalize images to have a mean of 0 and standard deviation of 1
    image = tf.image.per_image_standardization(image)

    image = tf.image.resize(image, (96,96))
    return image, label
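`tf.image.per_image_standardization` is documented as computing `(x - mean) / adjusted_stddev` over all pixels of a single image, where `adjusted_stddev = max(stddev, 1 / sqrt(num_elements))` protects against division by zero on uniform images. The NumPy re-implementation below is a sketch of that formula, which helps when comparing it with the batch normalisation layer the exercise asks you to use instead:

```python
import numpy as np

def per_image_standardize(image):
    """NumPy sketch of the formula documented for tf.image.per_image_standardization."""
    x = image.astype(np.float32)
    mean = x.mean()
    adjusted_std = max(x.std(), 1.0 / np.sqrt(x.size))  # avoid dividing by zero
    return (x - mean) / adjusted_std

img = np.random.default_rng(0).integers(0, 256, size=(28, 28, 1))
out = per_image_standardize(img)
print(out.mean(), out.std())  # approximately 0 and 1

blank = np.full((28, 28, 1), 7)
print(per_image_standardize(blank).max())  # 0.0: a uniform image maps to all zeros
```

Note that these statistics come from one image only, whereas batch normalisation uses statistics of the whole mini-batch plus learned scale and shift parameters.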

In [None]:
train_ds = (train_ds
                  .map(resize_images)
                  .shuffle(buffer_size=10000)
                  .batch(batch_size=64, drop_remainder=True))
test_ds = (test_ds
                  .map(resize_images)
                  .batch(batch_size=32, drop_remainder=False))
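Because the datasets are batched here, the number of steps per epoch follows directly from the dataset sizes (60,000 training and 10,000 test images in Fashion-MNIST). `drop_remainder=True` discards the final incomplete training batch, while the test set keeps its smaller last batch:

```python
import math

n_train, n_test = 60000, 10000

train_steps = n_train // 64           # drop_remainder=True: incomplete last batch is dropped
dropped = n_train - train_steps * 64  # training images skipped in each epoch
test_steps = math.ceil(n_test / 32)   # drop_remainder=False: last batch may be smaller

print(train_steps, dropped, test_steps)  # 937 32 313
```

Since the training set is reshuffled each epoch, the 32 dropped images should differ from epoch to epoch, and every test image is still predicted.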

In [None]:
model.compile(loss='categorical_crossentropy',
             optimizer=tf.keras.optimizers.Adam(learning_rate=0.005),
             metrics=['accuracy'])

## Begin training

In [None]:
# The batch size (64) is already set by the tf.data pipeline, so batch_size is not passed for a Dataset
model.fit(train_ds, epochs=2, verbose=1)

## Predict on all the test data

In [None]:
predictions = model.predict(test_ds)

In [None]:
predictions.shape

In [None]:
correct_values = np.argmax(Y_test,axis=-1)
predicted_classes = np.argmax(predictions,axis=-1)

In [None]:
accuracy_score(predicted_classes,correct_values)*100
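For single-label classification, `accuracy_score` on the argmax predictions is simply the fraction of samples where the predicted and true classes agree. A NumPy equivalent on made-up toy arrays, purely for illustration:

```python
import numpy as np

# Toy softmax outputs for 4 samples over 3 classes, and one-hot targets
toy_probs = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.8, 0.1],
                      [0.3, 0.3, 0.4],
                      [0.6, 0.3, 0.1]])
toy_targets = np.eye(3)[[0, 1, 1, 0]]

toy_predicted = toy_probs.argmax(axis=-1)  # [0, 1, 2, 0]
toy_true = toy_targets.argmax(axis=-1)     # [0, 1, 1, 0]
toy_accuracy = (toy_predicted == toy_true).mean() * 100
print(toy_accuracy)  # 75.0
```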

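Alternatively, collect the labels directly from `test_ds`. Because `test_ds` is not shuffled, they match `Y_test`, but reading them from the dataset guarantees that they are in the same order as the predictions: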

In [None]:
targets = []
for _, y in test_ds.as_numpy_iterator():
    targets.extend(y)

In [None]:
np.asarray(targets).shape

In [None]:
accuracy_score(predicted_classes,np.argmax(targets,axis=-1))*100