# GoogLeNet for Cats and Dogs predictions

<br>

![](https://www.allaboutpetsprovo.com/wp-content/uploads/2019/09/cat-dog-exchange.jpg)

<center>Image taken from <a href="https://www.allaboutpetsprovo.com/cats-vs-dogs.html">here</a></center>
<br>
<br>

In this lesson, you will build the GoogLeNet neural network from **scratch** using the Keras (TensorFlow 2.x) library and train it to recognize images of cats and dogs. Let's start!

### Steps:

1. Import libraries and download the dataset
2. Create an InceptionBlock
3. Build the original GoogLeNet architecture
4. Load data using tensorflow ImageDataGenerator
5. Train the model

### Topics covered and learning objectives

- Load image data from folders using _ImageDataGenerators_
- GoogLeNet model - Implementation and network architecture
- Inception blocks
- Build the GoogLeNet model from scratch using the Keras (TensorFlow) library

### Time estimates:

- Reading/Watching materials: 1h 45min
- Exercises: 1h 10min
  <br><br>
- **Total**: ~3h


In [1]:
from pathlib import PurePath, Path
import os
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
from tests import *

from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Conv2D, Layer, MaxPool2D, GlobalAvgPool2D, Dense, AveragePooling2D, Flatten, Dropout, Input

# For loading YouTube videos
from IPython.display import IFrame

## Download the dataset

Before starting the project, you'll need to download the dataset from Kaggle and place it inside the **data** folder created for you. [Download here](https://www.kaggle.com/c/dogs-vs-cats)

For this exercise, we will use 2,000 images, which is only a subset of the entire dataset of 25,000 images.

![](images/download.png)

NOTE: The download might take a while! It is about 800 MB.

Once downloaded, your zip file will contain **two (2)** zip files.

Extract only these two files:

- train.zip
- test1.zip

Extract both of them in the **data/module_1** folder inside the root directory of the repo for the following to work!

After extracting everything, this was my folder structure:

<pre>
<b>module_1</b>
|__ <b>train</b>
    |______ <b>cats</b>: [cat.0.jpg, cat.1.jpg, cat.2.jpg ...]
    |______ <b>dogs</b>: [dog.0.jpg, dog.1.jpg, dog.2.jpg ...]
|__ <b>test1</b>
    |______ <b>cats</b>: [cat.2000.jpg, cat.2001.jpg, cat.2002.jpg ...]
    |______ <b>dogs</b>: [dog.2000.jpg, dog.2001.jpg, dog.2002.jpg ...]
</pre>
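
A quick way to confirm the layout before moving on is to count the images in each class folder. The sketch below is not part of the original notebook: `count_images_per_class` is a hypothetical helper, and it assumes the images use the `.jpg` extension shown in the listing above.

```python
from pathlib import Path


def count_images_per_class(split_dir):
    """Return {class_folder_name: number_of_jpg_files} for one split folder."""
    split_dir = Path(split_dir)
    counts = {}
    # Each class lives in its own subfolder (e.g. cats/, dogs/)
    for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(1 for _ in class_dir.glob("*.jpg"))
    return counts
```

Once `train_dir` and `validation_dir` are defined, calling `count_images_per_class(train_dir)` should report a non-zero count for both `cats` and `dogs`. An empty dictionary would suggest the archives were extracted into a different folder.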

If everything is okay with this step, let's go and build the first part of our network, the InceptionBlock.


In [2]:
# Dataset paths setup - used later in the code. You don't have to change anything here
REPO_DIR = Path(os.getcwd()).parent

# Note: Please put the data into the data folder in the root of the repo for the following to work!
train_dir = REPO_DIR / "data/module_1/train"
validation_dir = REPO_DIR / "data/module_1/test1"

![Inception block](https://cdn.analyticsvidhya.com/wp-content/uploads/2018/10/Screenshot-from-2018-10-17-11-14-10.png)

<center>Image taken from <a href="https://www.analyticsvidhya.com/blog/2018/10/understanding-inception-network-from-scratch/">here</a></center>

<br><br>
As seen in the image above, the Inception Block has 4 different parts which analyze an image (or input to the block) in different ways.

Instead of processing the layer input with a single kernel size, the Inception Block lets us extract features at several scales at once, which makes the network more robust and ultimately more accurate.

Okay, but what is the architecture?
It's straightforward to build.

#### 1st part

- One conv layer with a kernel size of 1, ReLu activation

#### 2nd part

- First, conv layer with a kernel size of 1, ReLu activation
- Second, conv layer with a kernel size of 3, ReLu activation and padding same

#### 3rd part

- First, conv layer with a kernel size of 1, ReLu activation
- Second, conv layer with a kernel size of 5, ReLu activation and padding same

#### 4th part

- First, MaxPool layer with a pool size of 3
- Second, conv layer with a kernel size of 1, ReLu activation

#### 5th part

- Return the concatenation of all 4 channels using tf.concat
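
Because the four parts are concatenated along the channel axis, the depth of the block's output is easy to predict. The helper below is only an illustrative sketch (`inception_output_channels` is a hypothetical name, not part of the notebook), and it assumes every part keeps the spatial size unchanged (stride 1 and same padding), which is what makes the concatenation possible.

```python
def inception_output_channels(filters_1, filters_2, filters_3, filters_4):
    """Channels after concatenating the four parts of an Inception block.

    Only the LAST conv layer of each part contributes channels; the first
    1x1 conv in parts 2 and 3 only reduces depth inside that part.
    """
    return filters_1 + filters_2[1] + filters_3[1] + filters_4


print(inception_output_channels(10, (10, 10), (10, 10), 10))   # 40
print(inception_output_channels(64, (96, 128), (16, 32), 32))  # 256
```

The arguments follow the same order and types as the `InceptionBlock` function in Exercise 1, so the helper can be used to sanity-check the last dimension of your block's output shape.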

Links to learn more about Inception blocks:

Reading:

- https://paperswithcode.com/method/inception-module
- https://deepai.org/machine-learning-glossary-and-terms/inception-module


In [3]:
IFrame("https://www.youtube.com/embed/C86ZXvgpejM", 1000, 500)

**In some cases IPython widgets do not work!**

If this is the case, here is the link to the YouTube video from the cell above: https://www.youtube.com/watch?v=C86ZXvgpejM


In [4]:
IFrame("https://www.youtube.com/embed/KfV8CJh7hE0", 1000, 500)

**In some cases IPython widgets do not work!**

If this is the case, here is the link to the YouTube video from the cell above: https://www.youtube.com/embed/KfV8CJh7hE0


In [5]:
IFrame("https://www.youtube.com/embed/STTrebkhnIk", 1000, 500)

**In some cases IPython widgets do not work!**

If this is the case, here is the link to the YouTube video from the cell above: https://www.youtube.com/embed/STTrebkhnIk


## Exercise 1:

Using the explanations and resources provided, complete the **InceptionBlock** function.


In [6]:
from tensorflow.keras import layers

def InceptionBlock(inputs, filters_1, filters_2, filters_3, filters_4):
    """
    Implement Inception block here

    Args:
        Inputs - previous layer from the network
        filters_1 :int: - Number of filters used in the Part 1 of the Inception block. E.g. 32
        filters_2 :Tuple: - Number of filters used for the two layers of the Part 2 E.g. (32, 32)
        filters_3 :Tuple: - Number of filters used for the two layers of the Part 3 E.g. (32, 32)
        filters_4 :int: - Number of filters used in the Part 4 of the Inception block E.g. 32

    Return:
        tf.concat - of all 4 parts
    """
    # YOUR CODE HERE

    # 1x1 conv
    part1 = Conv2D(filters_1, (1, 1), activation="relu", padding="same")(inputs)
    # 1x1 conv, 3x3 conv
    part2 = Conv2D(filters_2[0], (1, 1), activation="relu",padding="same")(inputs)
    part2 = Conv2D(filters_2[1], (3, 3),  activation="relu",padding="same")(part2)
    # 1x1 conv, 5x5 conv
    part3 = Conv2D(filters_3[0], (1, 1), activation="relu", padding="same")(inputs)
    part3 = Conv2D(filters_3[1], (5, 5),  activation="relu", padding="same")(part3)

    # max pool
    part4 = MaxPool2D(pool_size=(3, 3), strides=1, padding="same")(inputs)
    part4 = Conv2D(filters_4, (1, 1), activation="relu", padding="same")(part4)

    return layers.Concatenate(axis=-1)([part1, part2, part3, part4])
    # return tf.concat([part1, part2, part3, part4], axis=-1)

    # raise NotImplementedError

In [7]:
aaa = InceptionBlock(Input(shape = (224, 224, 3)), 10, (10, 10), (10, 10), 10).shape[1:]



In [8]:
list(aaa)

[224, 224, 40]

In [9]:
# RUN THIS CELL TO CHECK IF YOUR SOLUTION IS CORRECT
TEST_INCEPTIONBLOCK(InceptionBlock)

# GoogLeNet

### Implementing GoogLeNet from scratch

Like all big and famous architectures, GoogLeNet was created for the ImageNet competition. This architecture was later used to develop SOTA Face recognition applications, Reverse Image search, and many other Google products.

What is so special about this model?

GoogLeNet was created to solve the overfitting problem of big architectures. This was achieved by using Inception modules (layers) instead of the regular ones. Besides this _trick_, the authors have added two **mini-networks** in the middle of the model. These mini-networks are called Auxiliary classifiers.

Read this -> https://medium.com/analytics-vidhya/cnns-architectures-lenet-alexnet-vgg-googlenet-resnet-and-more-666091488df5

### Auxiliary classifier

Auxiliary classifiers are small networks used ONLY at training time to mitigate the vanishing gradient problem in larger, deeper networks.

These small networks have the same output layer as the primary (big) model, with the Softmax/Sigmoid function. Calculating the loss at these intermediate points helps preserve gradients in the lower layers of the model and makes the weight updates during training more effective.

Note: The number of outputs depends on the number of classes. Our task here is cats vs. dogs. Since this is binary classification, we will use Sigmoid instead of Softmax, with only 1 (one) neuron as the output.

While this is awesome for the training process, it is useless at inference time, so we would keep only the main model in production.

Links to learn more about Auxiliary classifiers:

- https://towardsdatascience.com/deep-learning-googlenet-explained-de8861c82765

![Auxiliary classifier](https://miro.medium.com/max/550/1*htr2D6tKh3JMS7Acy4BDTw.png)

<center>Image taken from <a href="https://towardsdatascience.com/deep-learning-googlenet-explained-de8861c82765">here</a></center>

<br><br>

The architecture of the Auxiliary Classifier is pretty simple.

- Start with AveragePooling with a pool size of 5x5 and strides of 3
- Put that through Conv layer with 128 feature maps, kernel size of 1, padding same, and activation relu
- Flatten the output of the Conv layer
- Use Dense layer with activation relu and 1024 units
- Add dropout layer of 0.7 or 70% drop
- Complete it with a Dense (output) layer with 1 unit for binary classification or with the same number as the number of classes for multi-class classification. (Sigmoid or Softmax)

### Exercise 2 Complete the Auxiliary Classifier function

Using the explanation and links provided, complete the _AuxiliaryClassifier_ function and run tests to check if your implementation is correct.


In [10]:
def AuxilaryClassifier(X):
    """
    Implement the Auxiliary Classifier here

    Args:
        X - previous layer from the network

    Return:
        Last layer of the Auxiliary Classifier (Softmax/Sigmoid)
    """
    # YOUR CODE HERE

    raise NotImplementedError

In [11]:
# RUN THIS CELL TO CHECK IF YOUR IMPLEMENTATION IS CORRECT
TEST_AUXILARY(AuxilaryClassifier)

NotImplementedError: 

## GoogLeNet architecture

![GoogleNet model](https://paperswithcode.com/media/methods/Screen_Shot_2020-06-22_at_3.28.59_PM.png)

<center>Image taken from <a href="https://paperswithcode.com">here</a></center>
<br><br>
Now that we have the most crucial components of the GoogLeNet model (**InceptionBlock** and **AuxiliaryClassifier**), let's walk through the whole architecture and implement it inside the **GoogLeNet function**.

GoogLeNet implementation guide:

1. Start by defining the Input layer. In the original paper, the model accepted (224, 224, 3) size, so let's keep that.
2. Define the first part of the model that goes:
   - Conv with 64 feature maps, Kernel size of 7 and strides of 2, padding=valid
   - Followed by MaxPooling layer with pooling size of 3 and strides 2, padding = same
   - Conv with 64 feature maps with a kernel size of 1
   - Conv with 192 feature maps, a kernel size of 3, and padding same
   - Finish this part with a MaxPooling layer with a pool size of 3 and strides of 2
3. The next part (two Inception blocks, a MaxPooling layer, and a third Inception block) is given to you as a reference in the GoogLeNet function
4. Define the first Auxiliary Classifier
5. Followed by 3 Inception Blocks
   - 1st block: 160, (112, 224), (24, 64), 64
   - 2nd block: 128, (128, 256), (24, 64), 64
   - 3rd block: 112, (144, 288), (32, 64), 64
6. Define the second Auxiliary Classifier
7. Define the last part of the network

   - Inception block with config: 256, (160, 320), (32, 128), 128
   - MaxPooling layer with pooling size of 3, strides are 2, and padding is same
   - Inception block: 256, (160, 320), (32, 128), 128
   - Inception block: 384, (192, 384), (48, 128), 128
   - Global Average pooling layer
   - Complete the network with Dense layer with the number of units 1 (Dogs vs. cats), activation sigmoid, and name="output"

8. Define the model using keras Model, where inputs will be inputs defined from the 1st step, and the outputs will be a list of 3 things - Last layer of the model, auxiliary classifier 1 outputs, and auxiliary classifier 2 outputs
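
To see how the layers in step 2 shrink the image, it can help to trace the spatial size by hand. The sketch below applies the standard Keras/TensorFlow output-size rules for `valid` and `same` padding. `output_size` is a hypothetical helper, and where the guide does not state a stride or padding, the Keras defaults (stride 1, `valid`) are assumed.

```python
import math


def output_size(size, kernel, stride=1, padding="valid"):
    """Spatial output size of a conv/pool layer for a square input."""
    if padding == "same":
        return math.ceil(size / stride)
    if padding == "valid":
        return (size - kernel) // stride + 1
    raise ValueError(f"unknown padding: {padding!r}")


# (name, kernel, stride, padding) for the layers listed in step 2
stem = [
    ("conv 7x7", 7, 2, "valid"),
    ("max pool", 3, 2, "same"),
    ("conv 1x1", 1, 1, "valid"),  # padding not stated in the guide; default assumed
    ("conv 3x3", 3, 1, "same"),
    ("max pool", 3, 2, "valid"),  # padding not stated in the guide; default assumed
]

size = 224
sizes = []
for name, kernel, stride, padding in stem:
    size = output_size(size, kernel, stride, padding)
    sizes.append(size)
    print(f"{name:<9} -> {size} x {size}")
```

Under these assumptions the sizes come out as 109, 55, 55, 55 and 27. If your model summary shows different numbers, the padding of one of the layers is the first thing to check.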

Learn more about GoogLeNet:

- https://towardsdatascience.com/deep-learning-googlenet-explained-de8861c82765
- https://www.geeksforgeeks.org/understanding-googlenet-model-cnn-architecture/


In [None]:
def GoogLeNet():

    # input layer

    # First part of the network

    # THIS PART IS GIVEN TO YOU AS A REFERENCE
    X = InceptionBlock(X, 64, (96, 128), (16, 32), 32)
    X = InceptionBlock(X, 128, (128, 192), (32, 96), 64)
    X = MaxPool2D(pool_size=3, strides=2)(X)
    X = InceptionBlock(X, 192, (96, 208), (16, 48), 64)

    # 1st Aux classifier

    # Inception blocks

    # 2nd aux classifier

    # Last part of the network

    # Define the model

    return model

### Let's use your completed function GoogLeNet() and define the model


In [None]:
model = GoogLeNet()

In [None]:
# RUN THIS CELL TO CHECK IF YOUR IMPLEMENTATION OF GOOGLENET IS CORRECT
TEST_GOOGLENET(model)

### Compile the model


In [None]:
model.compile(loss=['binary_crossentropy', 'binary_crossentropy', 'binary_crossentropy'],
              loss_weights=[1, 0.3, 0.3],
              optimizer=RMSprop(learning_rate=0.001),
              metrics=['acc'])
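
With `loss_weights`, Keras minimizes a weighted sum of the individual output losses, so the two auxiliary classifiers contribute less than the main output. The plain-Python sketch below mirrors that arithmetic for a single example. `binary_crossentropy` and `total_loss` are hypothetical helpers written for illustration, not Keras functions, and the output order (main, auxiliary 1, auxiliary 2) is assumed to match step 8 of the implementation guide.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Binary cross-entropy for a single example (y_true is 0 or 1)."""
    y_pred = min(max(y_pred, eps), 1 - eps)  # clip to avoid log(0)
    return -(y_true * math.log(y_pred) + (1 - y_true) * math.log(1 - y_pred))


def total_loss(y_true, main_pred, aux1_pred, aux2_pred, weights=(1.0, 0.3, 0.3)):
    """Weighted sum of the three output losses, as with loss_weights=[1, 0.3, 0.3]."""
    preds = (main_pred, aux1_pred, aux2_pred)
    return sum(w * binary_crossentropy(y_true, p) for w, p in zip(weights, preds))


# A dog image (label 1): confident main output, less confident auxiliary outputs
print(total_loss(1, main_pred=0.9, aux1_pred=0.6, aux2_pred=0.7))
```

The weight of 0.3 keeps the auxiliary signals useful for gradient flow without letting them dominate the objective of the main classifier.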

### Setting up config (Hyperparams) for the model


In [None]:
IMG_SIZE=(224, 224) # <- DO NOT CHANGE

# Experiment with batch_size and epochs
batch_size=32
epochs=15

## Data loading and preprocessing

To help ourselves in loading and processing images, let's use **ImageDataGenerator** provided as a part of the TensorFlow library.

To learn more about data generators and how to use them, read this blog:

- https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html


In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# For this project we will use only scaling as the image preprocessing step (All pixels between 0-1)
train_datagen = ImageDataGenerator(rescale=1./255)
val_datagen = ImageDataGenerator(rescale=1./255)

### Running Data Generators over the downloaded dataset

Using ImageDataGenerators allows us to load images in many ways. In our case, all images are inside the **data** folder, and each class has its own subfolder _["cats", "dogs"]_. This is the perfect setup for the function called **flow_from_directory**!

This function takes data from a specified folder, automatically detects the number of images and the number of classes, and loads the images in batches during training. When defining the generator, you can specify a target image size, and all loaded images will be resized to it.

Here is the link to learn more about _flow_from_directory_:

- https://vijayabhaskar96.medium.com/tutorial-image-classification-with-keras-flow-from-directory-and-generators-95f75ebe5720
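
When no class list is given, **flow_from_directory** infers the labels from the subfolder names and assigns the indices in alphanumeric order. The sketch below imitates that rule in plain Python so you can predict which class becomes label 1. `infer_class_indices` is a hypothetical helper, and a plain `sorted` is assumed to be a close enough stand-in for the ordering Keras applies to simple folder names like these.

```python
def infer_class_indices(class_folder_names):
    """Map class folder names to integer labels in sorted order."""
    return {name: index for index, name in enumerate(sorted(class_folder_names))}


print(infer_class_indices(["dogs", "cats"]))  # {'cats': 0, 'dogs': 1}
```

Once a generator has been created, the actual mapping can be read from its `class_indices` attribute, which is the safer source when interpreting the sigmoid output.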


In [None]:
# Flow training images in batches of `batch_size` using train_datagen generator
train_generator = train_datagen.flow_from_directory(
        train_dir,  # This is the source directory for training images
        target_size=IMG_SIZE,  # All images will be resized to IMG_SIZE (224x224)
        batch_size=batch_size,
        # Since we use binary_crossentropy loss, we need binary labels
        class_mode='binary')

# Flow validation images in batches of `batch_size` using val_datagen generator
validation_generator = val_datagen.flow_from_directory(
        validation_dir,
        target_size=IMG_SIZE,
        batch_size=batch_size,
        class_mode='binary')

### Exercise 4 Train the _model_ using all the parameters, **train_generator** and **validation_generator**

HINT: Here is the post that explains how to train a model using data generators: https://www.pyimagesearch.com/2018/12/24/how-to-use-keras-fit-and-fit_generator-a-hands-on-tutorial/
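
When training from a generator, one epoch is one pass over all of its batches, and the last batch may be smaller than the rest. The small sketch below shows the arithmetic. `batches_per_epoch` is a hypothetical helper, and the image count used in the example is illustrative rather than the actual size of your training folder.

```python
import math


def batches_per_epoch(num_images, batch_size):
    """Number of batches needed to see every image once."""
    return math.ceil(num_images / batch_size)


print(batches_per_epoch(2000, 32))  # 63 (62 full batches plus one batch of 16)
```

The same number is available as `len(train_generator)`, which is convenient when judging how long an epoch will take for a given `batch_size`.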


In [None]:
# YOUR CODE HERE

### Using trained model to make predictions


In [None]:
predictions = np.where(model.predict(validation_generator)[0] < 0.5, 0, 1)
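
The thresholded values are class indices, so a last step is to translate them back into names. The sketch below does this in plain Python. `to_class_names` is a hypothetical helper, and the index-to-name order (`cats` = 0, `dogs` = 1) is an assumption that should be checked against `validation_generator.class_indices`.

```python
def to_class_names(probabilities, class_names=("cats", "dogs"), threshold=0.5):
    """Turn sigmoid outputs into names: below threshold -> index 0, else index 1."""
    return [class_names[0] if p < threshold else class_names[1] for p in probabilities]


print(to_class_names([0.1, 0.5, 0.93]))  # ['cats', 'dogs', 'dogs']
```

Keep in mind that `flow_from_directory` shuffles by default, so comparing predictions with `validation_generator.classes` is only meaningful for a generator created with `shuffle=False`.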