# Building a Convolutional Neural Network (CNN) with Keras

In this notebook, we will use Keras to build a Convolutional Neural Network (CNN) for a Computer Vision use case.

**Attention:** The code in this notebook creates Google Cloud resources that can incur costs.

Refer to the Google Cloud pricing documentation for details.

For example:

* [Vertex AI Pricing](https://cloud.google.com/vertex-ai/pricing)


## Install protobuf 3.19

This step is required to work with older versions of TensorFlow.

In [None]:
!pip uninstall protobuf -y
!pip install protobuf==3.19.* --quiet

### Restart the kernel

After you install the additional packages, you need to restart the notebook kernel so it can find the packages.

In [None]:
# Automatically restart kernel after installs
import os
import IPython
app = IPython.Application.instance()
app.kernel.do_shutdown(True)

# (Wait for the kernel to restart before continuing...)

## Import libraries

Import the necessary libraries.

In [None]:
import tensorflow as tf
from tensorflow.keras import layers, models, datasets, optimizers, utils
import numpy as np

## Check if GPU is available and set the device accordingly

We have attached a GPU to our Vertex AI Notebook Instance. In order to make our code more flexible, we will check for the presence of a GPU. If a GPU is present, we will set it as our training device; otherwise, we'll specify CPU as our training device.

You can experiment with this code to see whether you notice any differences when training on CPU vs. GPU. Bear in mind that GPUs are designed to handle parallelizable tasks more efficiently.

In [None]:
if tf.config.list_physical_devices('GPU'):
    device = '/GPU:0'
else:
    device = '/CPU:0'

## Load and process the dataset

In the next cell, we will load the [CIFAR-10 dataset](https://keras.io/api/datasets/cifar10/), which consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.

The classes are: 'airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', and 'truck'.

The code in the following cell loads the dataset into four variables:
* x_train: Training images (50,000 images, each represented as a 32x32x3 array of pixel values).
* y_train: Training labels (50,000 labels, each an integer representing the class of the corresponding image, from 0 to 9).
* x_test: Test images (10,000 images, with the same structure as x_train).
* y_test: Test labels (10,000 labels, with the same structure as y_train).

In [None]:
(x_train, y_train), (x_test, y_test) = datasets.cifar10.load_data()
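
Before going further, it can help to inspect the layout of the arrays. The sketch below does not download CIFAR-10; it uses small stand-in arrays (`x_demo` and `y_demo`, hypothetical names with only 100 images) that mimic the layout `load_data()` returns, where labels arrive as a column of integers with shape `(N, 1)`.

```python
import numpy as np

# Stand-in arrays with the same layout as the CIFAR-10 training split,
# but only 100 images (assumption: the real data is not loaded here).
x_demo = np.zeros((100, 32, 32, 3), dtype=np.uint8)
y_demo = (np.arange(100) % 10).astype(np.uint8).reshape(-1, 1)

print(x_demo.shape, x_demo.dtype)  # images: (count, height, width, channels)
print(y_demo.shape, y_demo.dtype)  # labels: one integer per image, as a column

# Count images per class; flatten the label column first.
per_class = np.bincount(y_demo.ravel(), minlength=10)
print(per_class)
```

Running the same three `print` calls on `x_train` and `y_train` should confirm the 50,000-image split and the balanced classes described above.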

### Normalize the pixel values of the images

Next, we will normalize the pixel values of both the training and test images to a range between 0 and 1. 
(Pixel values in images usually range from 0 to 255, so dividing by 255 scales them to 0-1.)

This is important because:
1. Neural networks often work better with normalized input data.
1. It prevents features with larger ranges from dominating the learning process.

In [None]:
x_train, x_test = x_train / 255.0, x_test / 255.0
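
As a quick check of what the division does, here is a minimal sketch on a tiny synthetic image (the `pixels` array is made up for illustration and is not part of the dataset).

```python
import numpy as np

# Synthetic 2x2 RGB "image" with uint8 pixel values (illustrative data only).
pixels = np.array([[[0, 128, 255], [64, 64, 64]],
                   [[255, 0, 0], [10, 20, 30]]], dtype=np.uint8)

scaled = pixels / 255.0  # true division promotes uint8 to float64

print(pixels.dtype, pixels.min(), pixels.max())
print(scaled.dtype, scaled.min(), scaled.max())
```

Note that the result is `float64`, which takes eight times the memory of the original `uint8` data. Casting with `.astype('float32')` is a common option if memory becomes a concern.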

### Perform one-hot encoding on our class labels

Next, we will perform one-hot encoding to convert the class labels (which are represented as integers from 0 to 9 in the source dataset) into binary class matrices. 

This means that each label is represented as a 10-dimensional vector with a 1 in the position corresponding to the class, and 0s elsewhere (e.g., the label "3" would become [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]). This format is more suitable for neural networks.

In [None]:
# Convert class vectors to binary class matrices
y_train = utils.to_categorical(y_train, 10)
y_test = utils.to_categorical(y_test, 10)
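
To see what one-hot encoding produces without relying on Keras, the following sketch builds the same kind of matrix with plain NumPy. It is an illustrative equivalent under the assumption of 10 classes, not the internals of `to_categorical`, and the three labels are made up.

```python
import numpy as np

labels = np.array([[3], [0], [9]])  # column of integer labels, like y_train before encoding
num_classes = 10

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(num_classes, dtype=np.float32)[labels.ravel()]
print(one_hot[0])

# argmax along the class axis recovers the original integer labels.
recovered = np.argmax(one_hot, axis=1)
print(recovered)
```

The `argmax` round trip is the same operation we use later in this notebook to turn one-hot labels and predicted probabilities back into class indices.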

## Define our CNN model

The code in the next cell will define a CNN with the following layers (see the descriptions in Chapter 14 of our book for reference):

* **Convolutional layers:** These layers extract features from the input images using filters. 
* **Max pooling layers:** These layers downsample the feature maps by taking the maximum value in each 2x2 patch. This reduces the dimensionality of the features, making the model more efficient and less prone to overfitting.
* **Flattening layer:** This layer flattens the 3D output of the convolutional layers into a 1D vector, suitable for input to the dense layers.
* **Dense layers:** These layers perform the final classification. We define one dense layer with 64 neurons and ReLU activation, and then an output layer with 10 neurons (one for each class) and softmax activation. The softmax output is a probability score for each class, i.e., the model's estimate of how likely the input image is to belong to that class.
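
To make the pooling and softmax descriptions concrete, here is a minimal NumPy sketch of both operations. The helper names `max_pool_2x2` and `softmax` are hypothetical and written only for illustration; the Keras layers handle batches and channels, which this sketch ignores.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Max-pool a 2D array with a 2x2 window and stride 2 (even dimensions assumed)."""
    h, w = feature_map.shape
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

def softmax(logits):
    """Turn a 1D array of raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

feature_map = np.array([[1, 3, 2, 0],
                        [4, 2, 1, 1],
                        [0, 0, 5, 6],
                        [1, 2, 7, 8]])
pooled = max_pool_2x2(feature_map)
print(pooled)  # each entry is the maximum of one 2x2 patch

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs, probs.sum())
```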

In [None]:
# Define the CNN model
def create_model():
    model = models.Sequential()
    model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Conv2D(64, (3, 3), activation='relu'))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Conv2D(64, (3, 3), activation='relu'))

    model.add(layers.Flatten())
    model.add(layers.Dense(64, activation='relu'))
    model.add(layers.Dense(10, activation='softmax'))
    return model
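
It can be useful to trace how the tensor shape changes through this architecture. The sketch below redoes the arithmetic by hand, assuming the Keras defaults used above (stride 1 and `'valid'` padding for `Conv2D`, and a pooling stride equal to the 2x2 window). The helper functions are hypothetical names for illustration.

```python
def conv_out(size, kernel=3):
    """Output size of a 'valid' convolution with stride 1 (the Conv2D defaults)."""
    return size - kernel + 1

def pool_out(size, window=2):
    """Output size of max pooling with a 2x2 window and matching stride."""
    return size // window

def conv_params(kernel, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    """One weight per input-unit pair plus one bias per unit."""
    return inputs * units + units

size = 32
size = pool_out(conv_out(size))   # Conv2D(32) then pooling: 32 -> 30 -> 15
size = pool_out(conv_out(size))   # Conv2D(64) then pooling: 15 -> 13 -> 6
size = conv_out(size)             # final Conv2D(64): 6 -> 4
flat = size * size * 64           # Flatten: 4 * 4 * 64 = 1024

total = (conv_params(3, 3, 32) + conv_params(3, 32, 64) + conv_params(3, 64, 64)
         + dense_params(flat, 64) + dense_params(64, 10))
print(size, flat, total)
```

You can compare these numbers with the output of `create_model().summary()`, which reports the output shape and parameter count of each layer.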

## Train our CNN model

The code in the next cell will perform the following steps:

1. Set up the context manager for device placement: `with tf.device(device):` ensures that the subsequent code runs on the specified device (i.e., CPU or GPU, based on the check we performed at the beginning of this notebook).
1. Call the previously defined `create_model()` function to build the CNN model architecture, and assign the resulting model object to the variable `net`.
1. Configure the training process with the following parameters:
    * Optimizer: This sets the optimizer to be used during training. Here, we use the Stochastic Gradient Descent (SGD) optimizer with a learning rate of 0.001 and momentum of 0.9.
    * Loss function: This sets the loss function to be minimized during training. Here, we use categorical crossentropy, which is a common choice for multi-class classification problems.
    * Metrics: This specifies the metrics to be monitored during training and evaluation. Here, we use `accuracy`, which measures the fraction of correctly classified examples.
1. Train the model (using `net.fit()` and specifying the required configurations, such as input and validation datasets, batch size, and number of epochs).
1. Save the resulting model to the file `cifar_net.keras`.

In [None]:
with tf.device(device):
    net = create_model()

    # Compile the model
    net.compile(optimizer=optimizers.SGD(learning_rate=0.001, momentum=0.9),
                loss='categorical_crossentropy',
                metrics=['accuracy'])

    # Train the network
    history = net.fit(x_train, y_train, epochs=20, batch_size=4,
                      validation_data=(x_test, y_test))    

# Save the trained model
net.save('cifar_net.keras')   
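
To get a feel for the training configuration, the sketch below works out how many weight updates the `fit()` call performs and applies the SGD-with-momentum update rule to a toy one-dimensional problem. The update follows the rule described in the Keras `SGD` documentation for `nesterov=False`; the function `sgd_momentum_step` and the toy objective are assumptions made for illustration, not part of the notebook's training code.

```python
import math

# Number of weight updates implied by the fit() call above.
steps_per_epoch = math.ceil(50000 / 4)   # training images / batch size
total_steps = steps_per_epoch * 20       # times the number of epochs
print(steps_per_epoch, total_steps)

def sgd_momentum_step(w, velocity, grad, learning_rate=0.001, momentum=0.9):
    """One SGD-with-momentum update: the velocity accumulates past gradients."""
    velocity = momentum * velocity - learning_rate * grad
    return w + velocity, velocity

# Toy problem (illustrative only): minimize f(w) = w**2, whose gradient is 2 * w.
w, velocity = 5.0, 0.0
for _ in range(2000):
    w, velocity = sgd_momentum_step(w, velocity, grad=2 * w)
print(w)  # very close to the minimum at 0
```

With a batch size of 4, each epoch performs many small updates, which is one reason training can take a while. Increasing `batch_size` is a common way to trade update frequency for speed.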

## Evaluate our model

Next, we load our trained model and use the `net.evaluate` function to evaluate the model against the test dataset.
Finally, we print the accuracy score from the evaluation.

In [None]:
# Load the saved model
net = models.load_model('cifar_net.keras')

# Evaluate the model on the test dataset
test_loss, test_acc = net.evaluate(x_test, y_test, verbose=2)
print(f'Accuracy of the network on the 10000 test images: {test_acc * 100:.2f} %')
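
The two numbers returned by `net.evaluate` are the loss and the accuracy. As a rough illustration of what they measure, the sketch below computes categorical crossentropy and accuracy by hand with NumPy. The labels and probabilities are made up, and this is a simplified version of the computation rather than Keras's exact implementation.

```python
import numpy as np

# Made-up one-hot labels and predicted probabilities for 3 images and 3 classes.
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1]], dtype=np.float64)
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.5, 0.3, 0.2]])

# Categorical crossentropy: mean over images of -log(probability given to the true class).
eps = 1e-7  # guards against log(0)
loss = -np.mean(np.sum(y_true * np.log(np.clip(y_pred, eps, 1.0)), axis=1))

# Accuracy: fraction of images whose highest-probability class matches the label.
accuracy = np.mean(np.argmax(y_pred, axis=1) == np.argmax(y_true, axis=1))

print(loss, accuracy)
```

In this example the third image is misclassified, so the accuracy is 2/3, and the low probability assigned to its true class contributes most of the loss.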

## Get predictions from our model

The code in the next cell will perform the following steps:

1. Define the class names, in the order of the integer labels used in the source dataset ('airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck').
1. Define a function to display images from the dataset. This function takes an image as input, rescales its pixel values from 0-1 back to 0-255, and displays it using Matplotlib's `imshow` function.
1. Get some random testing images and labels.
1. Print the images.
1. Predict labels for the images.

In [None]:
import matplotlib.pyplot as plt
import numpy as np

# Class names, ordered so that each list index matches the integer label in the source dataset
classes = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

# Function to display images
def imshow(img):
    img = img * 255
    plt.imshow(img.astype('uint8'))
    plt.show()

# Get some random testing images and labels
n = 4  # Number of images to display
idx = np.random.choice(x_test.shape[0], n, replace=False) # Randomly select n indices from the test set without replacement.
images, labels = x_test[idx], y_test[idx] # Retrieves the images and labels at those indices.

# Print images
imshow(np.hstack(images)) # Concatenates the images horizontally to display them side-by-side.
print('GroundTruth: ', ' '.join(classes[np.argmax(label)] for label in labels)) # Recovers the class index from each one-hot label and joins the class names into a string separated by spaces.


# Use the trained model to predict the classes for the selected images.
predicted = net.predict(images) 
predicted_classes = np.argmax(predicted, axis=1) # Finds the index of the highest probability class for each prediction.

# Convert the predicted indices back to class names and join them into a string for printing.
print('Predicted: ', ' '.join(classes[cls] for cls in predicted_classes)) 
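
To compare predictions with the ground truth image by image, something like the following sketch may help. It uses made-up stand-ins (`labels_demo` and `predicted_demo`, hypothetical names) in place of the real `labels` and `predicted` arrays so that it runs on its own.

```python
import numpy as np

classes = ['airplane', 'automobile', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck']

# Made-up stand-ins for `labels` (one-hot) and `predicted` (softmax output) for 3 images.
labels_demo = np.eye(10)[[3, 8, 5]]
predicted_demo = np.full((3, 10), 0.05)
predicted_demo[0, 3] = 0.55   # correct: cat
predicted_demo[1, 8] = 0.55   # correct: ship
predicted_demo[2, 3] = 0.55   # wrong: predicts cat for a dog

true_idx = np.argmax(labels_demo, axis=1)
pred_idx = np.argmax(predicted_demo, axis=1)

# Print one line per image: true class, predicted class, and its probability.
for t, p, probs in zip(true_idx, pred_idx, predicted_demo):
    status = 'correct' if t == p else 'wrong'
    print(f'{classes[t]:>10} -> {classes[p]:<10} ({probs[p]:.2f}) {status}')
```

Replacing the stand-ins with `labels` and `predicted` from the cell above should give the same kind of per-image report for the randomly selected test images.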