# Convolutional Neural Network

## Setup and Context

### Introduction

The following is a Convolutional Neural Network that differentiates between cats and dogs.

### Import Statements

In [60]:
import tensorflow as tf
from keras.preprocessing import image
from keras.preprocessing.image import ImageDataGenerator
import numpy as np

In [46]:
tf.__version__

'2.15.0'

## Data Preprocessing

### The Training Set

We want to apply some **transformations** (such as rotation, flipping, zooming, translation and shearing) to the images of the training set. These transformations expose the model to a diverse range of input variations during training, which helps prevent it from memorizing specific details of the training set. This reduces the risk of **overfitting**.

In [47]:
train_datagen = ImageDataGenerator(
        rescale=1./255,
        rotation_range=40,
        width_shift_range=0.2,
        height_shift_range=0.2,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True,
        fill_mode='nearest')
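To make one of these transformations concrete, here is a minimal NumPy sketch of what a horizontal flip does to an image. The tiny 2x3 array is a hypothetical stand-in for a single-channel image; `ImageDataGenerator` applies such flips at random during training, whereas this sketch always flips.

```python
import numpy as np

# Hypothetical 2x3 single-channel "image"; each number stands for a pixel value.
image_rows = np.array([[1, 2, 3],
                       [4, 5, 6]])

# A horizontal flip mirrors the image left to right by reversing the column order.
flipped = image_rows[:, ::-1]

print(flipped)
```

The content of the picture is unchanged, but the pixel layout the network sees is different, which is what makes the flipped copy useful as an extra training example.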

The `rescale` parameter of the `ImageDataGenerator` will apply **feature scaling** to every pixel. By scaling the pixel values by 1/255, we normalize them to the range [0, 1].
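As an illustration of the rescaling step, the sketch below multiplies a few hypothetical 8-bit pixel values by 1/255 in NumPy. It assumes the usual 0 to 255 range of 8-bit images and is not how `ImageDataGenerator` is implemented internally.

```python
import numpy as np

# Hypothetical 2x2 single-channel image with 8-bit pixel values (0 to 255).
pixels = np.array([[0, 64], [128, 255]], dtype=np.uint8)

# Equivalent of rescale=1./255: multiply every pixel by 1/255.
scaled = pixels * (1.0 / 255)

print(scaled.min(), scaled.max())
```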

Below is a generator that will read images found in subfolders of `data/train` and indefinitely generate batches of augmented image data.

In [48]:
training_set = train_datagen.flow_from_directory(
        'data/train',  # this is the target directory
        target_size=(150, 150),  # all images will be resized to 150x150
        batch_size=32,
        class_mode='binary') # this is a binary classification problem i.e either cats or dogs

Found 8005 images belonging to 2 classes.


### The Test Set

We do not apply these transformations to the test data, because the test images should stay unaltered. We want the evaluation of the model's performance to reflect its ability to generalize to new, unseen data in a real-world scenario. We do, however, still need to rescale the pixel values, since the network expects inputs in the same range it was trained on.

In [49]:
test_datagen = ImageDataGenerator(rescale=1./255)

Similarly to the training set, we create a generator for the test set. The only difference is the directory of the images.

In [50]:
test_set = test_datagen.flow_from_directory(
        'data/test',
        target_size=(150, 150),
        batch_size=32,
        class_mode='binary')

Found 2023 images belonging to 2 classes.


## Building the Architecture

### Initializing the CNN

A Convolutional Neural Network is a sequence of layers. Therefore we initialize our model with the `Sequential` class.

In [51]:
cnn = tf.keras.Sequential()

### Convolution Layer

We add the Convolution Layer with the `Conv2D` class to our model while specifying `filters` (number of feature detectors you want to apply to your images), `kernel_size` (the size of the feature detectors), `activation` (the activation function) and `input_shape` (the input shape).

When we add the very first layer, whether a convolution layer or a dense layer, we have to specify the shape of our inputs. The input shape is (150, 150, 3): our images are 150x150 after preprocessing, and because they are colour images, the last dimension holds the 3 RGB channels.

For other layers after the input layer, we do not specify the input shape.

In [52]:
cnn.add(tf.keras.layers.Conv2D(filters=32, kernel_size=3, activation='relu', input_shape=(150, 150, 3)))
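To see what this layer produces, the sketch below works out its output shape and number of trainable parameters by hand. It assumes the `Conv2D` defaults of a stride of 1 and no padding (`padding='valid'`); the helper names are made up for this illustration.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


def conv_param_count(kernel_size, in_channels, filters):
    """Trainable parameters: one kernel plus one bias per filter."""
    return (kernel_size * kernel_size * in_channels + 1) * filters


side = conv_output_size(150, 3)      # 150 - 3 + 1 = 148
params = conv_param_count(3, 3, 32)  # (3 * 3 * 3 + 1) * 32 = 896

print((side, side, 32), params)
```

Under these assumptions each 150x150x3 image becomes a 148x148x32 stack of feature maps, one map per filter.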

### Pooling Layer

Next we add a layer for pooling, more specifically **max pooling**. We set two parameters of the `MaxPool2D` class: `pool_size` (size of the pooling window) and `strides` (step size of the pooling window). Both are optional in Keras, but we state them explicitly for clarity.

In [53]:
cnn.add(tf.keras.layers.MaxPool2D(pool_size=2, strides=2))
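The following NumPy sketch shows what max pooling with a 2x2 window and a stride of 2 does to a single feature map. The 4x4 input is a hypothetical example and `max_pool_2d` is an illustrative helper, not part of Keras.

```python
import numpy as np


def max_pool_2d(feature_map, pool_size=2, stride=2):
    """Max pooling over a 2D array, dropping any incomplete window at the edges."""
    rows = (feature_map.shape[0] - pool_size) // stride + 1
    cols = (feature_map.shape[1] - pool_size) // stride + 1
    pooled = np.empty((rows, cols), dtype=feature_map.dtype)
    for i in range(rows):
        for j in range(cols):
            window = feature_map[i * stride:i * stride + pool_size,
                                 j * stride:j * stride + pool_size]
            pooled[i, j] = window.max()  # keep only the strongest activation
    return pooled


feature_map = np.array([[1, 3, 2, 0],
                        [4, 6, 5, 1],
                        [7, 2, 9, 8],
                        [0, 1, 3, 4]])
pooled = max_pool_2d(feature_map)

print(pooled)  # rows: [6 5] and [7 9]
```

Each 2x2 window is reduced to its largest value, so the width and height of the feature map are halved.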

### Adding a Second Convolution Layer

Although it is not strictly necessary, a second convolutional layer allows the model to capture higher-level features by learning more complex patterns and representations from the output of the first layer.

In [54]:
cnn.add(tf.keras.layers.Conv2D(filters=32, kernel_size=3, activation='relu'))
cnn.add(tf.keras.layers.MaxPool2D(pool_size=2, strides=2))

### Flattening

We follow up by flattening the results of our convolutions and poolings into a one-dimensional vector.

In [55]:
cnn.add(tf.keras.layers.Flatten())
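To get a feel for the size of the flattened vector, the sketch below traces the spatial size of the data through the layers added so far. It assumes unpadded 3x3 convolutions with a stride of 1 and 2x2 pooling with a stride of 2, as configured above; the helper names are illustrative.

```python
def valid_conv(size, kernel_size=3):
    """Output size of an unpadded convolution with a stride of 1."""
    return size - kernel_size + 1


def max_pool(size, pool_size=2, stride=2):
    """Output size of max pooling, ignoring incomplete windows."""
    return (size - pool_size) // stride + 1


size = 150
size = valid_conv(size)  # first convolution:  148
size = max_pool(size)    # first pooling:      74
size = valid_conv(size)  # second convolution: 72
size = max_pool(size)    # second pooling:     36

# 32 feature maps of 36x36 are flattened into one long vector.
flattened_length = size * size * 32

print(flattened_length)  # 41472
```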

Now we can pass it on to a fully connected Neural Network.

### Full Connection

Next we add a fully connected (dense) hidden layer. Because of the complexity of computer vision, the hidden layer(s) can have a large number of neurons; here we use 128.

In [56]:
cnn.add(tf.keras.layers.Dense(units=128, activation='relu'))

### Output Layer

Because we are doing a binary classification problem, we only need 1 neuron in the output layer.

In [57]:
cnn.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))

ReLU is not a good choice for the activation function of the output layer, because its output is not limited to the range [0, 1]. The sigmoid function works better in this binary classification problem, as its output can be read as the probability of the class labelled 1. If we were doing a multi-class classification problem, then softmax would be the usual choice.
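The difference between the two output activations can be sketched in plain Python. The input values below are hypothetical raw scores (logits), not values produced by this model.

```python
import math


def sigmoid(z):
    """Squash a single raw score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    """Turn a list of raw scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


print(sigmoid(0.0))  # 0.5
print(softmax([2.0, 1.0, 0.1]))
```

Sigmoid yields one probability, which is enough for two classes, while softmax yields one probability per class.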

## Training

### Compiling the Neural Network

By compiling, we mean connecting the Neural Network to an optimizer, a loss function and some metrics. We are using the **Adam optimizer**, a variant of **Stochastic Gradient Descent**. Our **Loss Function** is the **Binary Cross Entropy Loss**. We also use **Accuracy** as our metric, since it is a common and easily interpreted way to measure a classification model.

In [58]:
cnn.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
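For intuition about the loss, here is a minimal pure-Python sketch of binary cross entropy averaged over a batch. The labels and predicted probabilities are hypothetical, and the `eps` clipping is an assumption made here to avoid `log(0)`; it is not meant to mirror the exact Keras implementation.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep p away from exactly 0 or 1
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


labels = [1, 0, 1]  # 1 = dog, 0 = cat
good_predictions = [0.9, 0.1, 0.8]
bad_predictions = [0.2, 0.7, 0.4]

print(binary_cross_entropy(labels, good_predictions))
print(binary_cross_entropy(labels, bad_predictions))
```

Confident, correct predictions give a loss close to 0, while confident, wrong predictions are penalized heavily.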

### Training and Evaluating

Using our training set, we train the neural network and use the test set to evaluate its performance. For this example we will run 30 **epochs**.

In [59]:
cnn.fit(x=training_set, validation_data=test_set, epochs=30)

Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30


<keras.src.callbacks.History at 0x158ca18db90>
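As a rough guide to how much work one epoch involves, the sketch below computes the number of batches per epoch from the image counts reported by the generators. It assumes each epoch makes one full pass over the generator and that the last, smaller batch is kept.

```python
import math

batch_size = 32
training_images = 8005  # reported by the training generator
test_images = 2023      # reported by the test generator

# The final batch may hold fewer than 32 images, hence the ceiling.
training_steps = math.ceil(training_images / batch_size)
validation_steps = math.ceil(test_images / batch_size)

print(training_steps, validation_steps)  # 251 64
```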

## Making a Prediction

Time to make a prediction. We have an example image that we are going to check.

<p align='center'>
<img src='./assets/images/cnn-dog-test.jpg'>
</p>

<p align='center'>An image of a dog</p>

Firstly, we load the image. The image MUST be the same size as the ones used during training.

In [75]:
test_image = image.load_img('./assets/images/cnn-dog-test.jpg', target_size=(150, 150))

Our test image must also be converted into a 3D array of shape (150, 150, 3), as this is the per-image format expected by our Neural Network. We also rescale the pixel values by 1/255, because the network was trained on images rescaled to the range [0, 1].

In [76]:
test_image = image.img_to_array(test_image) / 255.0  # same rescaling as the training generator

The `predict` method has to be called with the same input format that was used during training. Our Convolutional Neural Network was trained on batches of images rather than single images, so it expects an extra batch dimension. We therefore put our image in a batch of size 1. The batch is always the first dimension.

In [77]:
test_image = np.expand_dims(test_image, axis=0)

Now we predict.

In [78]:
result = cnn.predict(test_image)

The sigmoid output layer gives us a value between 0 and 1, which we map to the nearest class label. Let us see what each label represents.

In [79]:
training_set.class_indices

{'cats': 0, 'dogs': 1}

So 0 represents cats and 1 represents dogs.

Let us see the result of our prediction. Recall that our result is a batch (of size 1) containing a single element. We treat a value above 0.5 as a dog and anything else as a cat.

In [80]:
if result[0][0] > 0.5:
    prediction = "dog"
else:
    prediction = "cat"

prediction

'dog'
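The preprocessing and decision steps can be collected into two small helpers. This is a sketch in plain NumPy: the function names are hypothetical, the 0.5 threshold is an assumption, and the probability passed in at the end is a made-up value rather than an output of the model above.

```python
import numpy as np


def prepare_image_array(pixels):
    """Rescale 8-bit pixel values to [0, 1] and add a leading batch dimension."""
    scaled = np.asarray(pixels, dtype=np.float32) / 255.0
    return np.expand_dims(scaled, axis=0)


def label_from_probability(probability, class_indices, threshold=0.5):
    """Map a sigmoid output to a class name using a decision threshold."""
    index_to_name = {index: name for name, index in class_indices.items()}
    return index_to_name[1 if probability > threshold else 0]


# A blank 150x150 RGB image stands in for a loaded photo.
batch = prepare_image_array(np.zeros((150, 150, 3), dtype=np.uint8))

print(batch.shape)  # (1, 150, 150, 3)
print(label_from_probability(0.93, {'cats': 0, 'dogs': 1}))  # dogs
```

Looking the label up in `class_indices` avoids hard-coding which class is 0 and which is 1.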

Let us see another example.

<p align='center'>
<img src='./assets/images/cnn-cat-test.jpg'>
</p>

<p align='center'>Image of a Cat</p>

In [83]:
test_image = image.load_img('./assets/images/cnn-cat-test.jpg', target_size=(150, 150))
test_image = image.img_to_array(test_image) / 255.0  # same rescaling as the training generator
test_image = np.expand_dims(test_image, axis=0)  # add the batch dimension

result = cnn.predict(test_image)
if result[0][0] > 0.5:  # sigmoid output above 0.5 means class 1 (dogs)
    prediction = "dog"
else:
    prediction = "cat"

prediction



'cat'