<div class="alert alert-block alert-success">
    <b>ARTIFICIAL INTELLIGENCE (E016350A)</b> <br>
ALEKSANDRA PIZURICA <br>
GHENT UNIVERSITY <br>
AY 2024/2025 <br>
Assistant: Nicolas Vercheval
</div>

# Convolutional neural networks (CNN) - image classification (part 1)


This tutorial demonstrates training a simple Convolutional Neural Network (CNN) to classify [CIFAR](https://www.cs.toronto.edu/~kriz/cifar.html) images.

### Import TensorFlow

In [None]:
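import os
# Both variables must be set before TensorFlow is imported.
# Hide all GPUs so that TensorFlow runs on the CPU.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
# Suppress TensorFlow's C++ info, warning and error log messages.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
import tensorflow as tf
from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt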

### CIFAR10 dataset

The CIFAR10 dataset contains 60,000 color images in 10 classes, with 6,000 images in each class. The dataset is divided into 50,000 training images and 10,000 testing images. The classes are mutually exclusive and there is no overlap between them.

In [None]:
(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()

# Normalize pixel values to be between 0 and 1
train_images, test_images = train_images / 255.0, test_images / 255.0

In [None]:
train_images[0].shape

To verify that the dataset looks correct, we plot the first 25 images from the training set and display the class name below each image.



In [None]:
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(5, 5, i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(train_images[i], cmap=plt.cm.binary)
    # The CIFAR labels happen to be arrays, 
    # which is why you need the extra index
    plt.xlabel(class_names[train_labels[i][0]])
plt.show()

### Defining the model

Below, we define the first part of our model. In this section, we explain the [convolutional](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2D) and [pooling](https://www.tensorflow.org/api_docs/python/tf/keras/layers/MaxPool2D) layers, which aim to learn a new representation of the data. This new representation will make the task of the classification model (the second part of our model) easier.

Convolutional layers are created with the `Conv2D` class. The number of filters is specified first (parameter `filters`, usually passed positionally), then the kernel size (`kernel_size` parameter), the stride (`strides` parameter) and the padding (`padding` parameter). <img src = https://upload.wikimedia.org/wikipedia/commons/5/55/Convolution_arithmetic_-_Arbitrary_padding_no_strides.gif style = 'height: 300px'>

As input, the network takes a tensor of shape (image height, image width, number of channels), with an additional leading dimension for the batch size. The number of channels will be $3$ because the CIFAR10 images are in colour, i.e., there are red, green and blue channels.

For example, the figure shows a kernel (grey square) that slides over an input (blue square) one position at a time, both horizontally and vertically (a stride of 1). The white squares added around the input represent padding. Depending on the padding, the output (green square) can have the same spatial dimensions as the input (in Keras, `padding='same'`) or be somewhat smaller (in Keras, `padding='valid'`, which adds no padding).
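
To make the effect of stride and padding concrete, here is a minimal pure-Python sketch of a single-channel convolution. The function name `conv2d_single_channel` and the toy 5x5 input are illustrative assumptions, not part of Keras; the real `Conv2D` layer additionally handles several channels, several filters, biases and batches.

```python
def conv2d_single_channel(image, kernel, stride=1, padding=0):
    """Slide `kernel` over `image` (lists of lists) and return the output grid."""
    k = len(kernel)
    # Surround the input with `padding` rows and columns of zeros.
    width = len(image[0]) + 2 * padding
    padded = [[0.0] * width for _ in range(padding)]
    padded += [[0.0] * padding + list(row) + [0.0] * padding for row in image]
    padded += [[0.0] * width for _ in range(padding)]

    out_h = (len(padded) - k) // stride + 1
    out_w = (width - k) // stride + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Sum of element-wise products between the kernel and the patch it covers.
            total = sum(
                kernel[a][b] * padded[i * stride + a][j * stride + b]
                for a in range(k)
                for b in range(k)
            )
            row.append(total)
        output.append(row)
    return output


image = [[float(r * 5 + c) for c in range(5)] for r in range(5)]  # toy 5x5 input
kernel = [[1.0] * 3 for _ in range(3)]                            # 3x3 filter of ones

valid = conv2d_single_channel(image, kernel, padding=0)   # behaves like padding='valid'
same = conv2d_single_channel(image, kernel, padding=1)    # behaves like padding='same'
strided = conv2d_single_channel(image, kernel, stride=2)  # skips every other position

print(len(valid), len(valid[0]))      # 3 3
print(len(same), len(same[0]))        # 5 5
print(len(strided), len(strided[0]))  # 2 2
```

With a 3x3 kernel and stride 1, one ring of zero padding is exactly what keeps the output the same size as the input.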

The pooling layers (`MaxPooling2D` and `AveragePooling2D`) shrink the feature maps by replacing each block of the given size with its maximum or average value. The `pool_size` parameter specifies the block size. <img src = https://d3i71xaburhd42.cloudfront.net/65a6f29bb5d9418f7ef6547c612d3c3445b7f962/3-Figure1-1.png style = 'width: 600px'>
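
The pooling operation can be sketched in a few lines of plain Python. The helper names `pool2d` and `mean` are hypothetical, and the sketch assumes non-overlapping blocks (stride equal to the block size, which is the Keras default when `strides` is not given) and a single channel.

```python
def pool2d(grid, pool_size=2, reduce=max):
    """Replace each non-overlapping pool_size x pool_size block with one value."""
    out_h = len(grid) // pool_size
    out_w = len(grid[0]) // pool_size
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            block = [
                grid[i * pool_size + a][j * pool_size + b]
                for a in range(pool_size)
                for b in range(pool_size)
            ]
            row.append(reduce(block))
        output.append(row)
    return output


def mean(values):
    return sum(values) / len(values)


grid = [
    [1, 3, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]

max_pooled = pool2d(grid, pool_size=2, reduce=max)
avg_pooled = pool2d(grid, pool_size=2, reduce=mean)
print(max_pooled)  # [[6, 8], [3, 4]]
print(avg_pooled)  # [[3.75, 5.25], [2.0, 2.0]]
```

A 4x4 grid becomes 2x2 in both cases; only the rule for summarising each block differs.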

As the dimensions of the images are $32\times 32$, the network will take the input size $(32,32,3)$, or $(3,32,32)$ in some other libraries that expect the number of channels to go first.

To define the input dimension, we add an `Input` layer with the named argument `shape` as the first layer of the `Sequential` model; older Keras code often passes `input_shape` to the first layer instead.

In [None]:
model = models.Sequential()
# The input is a 32x32 image with 3 colour channels.
model.add(layers.Input(shape=(32, 32, 3)))
# We add a convolutional layer that has 32 3x3 filters with
# the ReLU activation function.
model.add(layers.Conv2D(32, (3, 3), activation='relu'))
# We add a pooling layer that uses the maximum function
# where the filter size is 2x2.
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))

Let's display the architecture of your model so far by using the `summary` function.

In [None]:
model.summary()

You can see above that the output of every Conv2D and MaxPooling2D layer is a 3D tensor of shape (height, width, channels). The width and height dimensions tend to shrink as you go deeper into the network. The first argument controls the number of output channels for each Conv2D layer (e.g., 32 or 64). Typically, as the width and height shrink, you can afford (computationally) to add more output channels in each Conv2D layer.
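
The summary also lists a parameter count for each layer. As a rough sketch of where those numbers come from, a convolutional layer has one weight per kernel position, input channel and filter, plus one bias per filter; pooling layers have no parameters. The helper `conv2d_params` below is a hypothetical name introduced only for this illustration.

```python
def conv2d_params(kernel_size, in_channels, filters):
    """Weights of every filter plus one bias per filter."""
    return kernel_size * kernel_size * in_channels * filters + filters


conv_layers = [
    # (kernel_size, in_channels, filters) for the three Conv2D layers above
    (3, 3, 32),
    (3, 32, 64),
    (3, 64, 64),
]
counts = [conv2d_params(*layer) for layer in conv_layers]
print(counts)       # [896, 18496, 36928]
print(sum(counts))  # 56320
```

The count grows with the product of input and output channels, which is why adding channels is cheaper to afford once the feature maps have become small.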

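The output dimensions of a convolutional or pooling layer are given by:
$$
O_w = \left\lfloor \frac{W - K + 2P}{S} \right\rfloor + 1
$$

$$
O_h = \left\lfloor \frac{H - K + 2P}{S} \right\rfloor + 1,
$$

where:
- $O_w$: output dimension - width
- $O_h$: output dimension - height
- $W$: input width
- $H$: input height
- $K$: kernel (or pooling window) size
- $P$: padding
- $S$: stride
- $\lfloor \cdot \rfloor$: rounding down, which matters only when the division is not exact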

Often, width and height are equal ($W = H$). For example, the first convolutional layer we added transforms
(32, 32, 3) into (30, 30, 32), where the 32 output channels come from its 32 filters.

- $W = H = 32$
- $P = 0$
- $S = 1$
- $K = 3$

$$
O = \frac{32 - 3 + 2 \cdot 0}{1} + 1 = 32 - 3 + 0 + 1 = 30
$$
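
The same calculation can be repeated for every layer of the model defined above. The sketch below assumes no padding, stride 1 for the convolutions and stride 2 for the 2x2 pooling layers; the function name `output_size` is illustrative only.

```python
def output_size(size, kernel, stride=1, padding=0):
    """Output width (or height) of a convolutional or pooling layer."""
    return (size - kernel + 2 * padding) // stride + 1


size = 32
trace = [size]
layers_spec = [
    # (description, kernel size, stride)
    ("conv 3x3", 3, 1),
    ("max pool 2x2", 2, 2),
    ("conv 3x3", 3, 1),
    ("max pool 2x2", 2, 2),
    ("conv 3x3", 3, 1),
]
for name, kernel, stride in layers_spec:
    size = output_size(size, kernel, stride)
    trace.append(size)
    print(f"{name}: {size} x {size}")

print(trace)  # [32, 30, 15, 13, 6, 4]
```

Note the second pooling layer: 13 is not divisible by 2, so the result is rounded down to 6.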



### Classification model

The second part of our model will be the classification part and will be similar to the previous exercise, where we made a classification model for predicting fuel efficiency.

As the current output of the model has shape (4, 4, 64), it must be transformed into a vector before it can be passed to the fully connected neural network that acts as the classifier. We use `Flatten` to turn each stack of feature maps into a vector (the batch dimension stays the same).

That is, if a tensor has dimensions $(\text{batch size}, 4, 4, 64)$, flattening it gives a tensor of dimensions $(\text{batch size}, 4\times 4\times 64) = (\text{batch size}, 1024)$.
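
As a small sketch of this shape arithmetic (the helper `flatten_shape` and the tiny 2 x 2 x 3 example are assumptions made for illustration, not Keras code):

```python
from math import prod


def flatten_shape(shape):
    """Keep the batch dimension and merge all remaining dimensions."""
    batch, *rest = shape
    return (batch, prod(rest))


print(flatten_shape((64, 4, 4, 64)))  # (64, 1024)

# Flattening one tiny 2 x 2 x 3 feature map in row-major order.
feature_map = [[[1, 2, 3], [4, 5, 6]],
               [[7, 8, 9], [10, 11, 12]]]
flat = [value for row in feature_map for pixel in row for value in pixel]
print(len(flat))  # 12
```

No values are changed or discarded; they are only rearranged into a single row per sample.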

CIFAR10 has ten output classes, so we use a final `Dense` layer with ten outputs and a softmax activation.
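
The final layer in the code uses the `softmax` activation, which turns the ten raw scores into probabilities. A minimal sketch of that function, with made-up scores for illustration:

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for the ten classes.
logits = [2.0, 1.0, 0.1, 0.0, -1.0, 0.5, 0.3, -0.2, 1.5, 0.0]
probs = softmax(logits)
predicted = max(range(len(probs)), key=probs.__getitem__)
print(round(sum(probs), 6))  # 1.0
print(predicted)             # 0
```

The predicted class is the index with the highest probability, here class 0 because it has the largest score.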


In [None]:
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))

Here's the complete architecture of your model:

In [None]:
model.summary()

### Compiling and training the model

In [None]:
epochs = 10
batch_size = 64
num_classes = 10

train_labels_cat = tf.keras.utils.to_categorical(train_labels, num_classes)
test_labels_cat = tf.keras.utils.to_categorical(test_labels, num_classes)

print(f'train_labels.shape={train_labels.shape}')
print(f'test_labels.shape={test_labels.shape}')
print(f'train_labels_cat.shape={train_labels_cat.shape}')
print(f'test_labels_cat.shape={test_labels_cat.shape}')

model.compile(optimizer='adam',
              loss=tf.keras.losses.CategoricalCrossentropy(),
              metrics=['accuracy'])


history = model.fit(train_images, train_labels_cat, epochs=epochs,
                    batch_size=batch_size,
                    validation_data=(test_images, test_labels_cat))
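
The cell above converts the integer labels to one-hot vectors with `to_categorical` and trains with the categorical cross-entropy loss. The following pure-Python sketch shows what both steps compute for a single sample; the helper names and the example probabilities are assumptions made for illustration.

```python
import math


def one_hot(label, num_classes):
    """Pure-Python analogue of to_categorical for a single integer label."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def categorical_crossentropy(target, probs, eps=1e-7):
    """Negative log-probability that the model assigns to the true class."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, probs))


target = one_hot(3, 10)
print(target)  # [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

confident = [0.01] * 10
confident[3] = 0.91   # most of the probability on the true class
uniform = [0.1] * 10  # no preference for any class

print(round(categorical_crossentropy(target, confident), 4))  # 0.0943
print(round(categorical_crossentropy(target, uniform), 4))    # 2.3026
```

The loss is small when the model puts most of its probability on the correct class and equals $\ln 10 \approx 2.30$ for a uniform guess over ten classes.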

### Evaluating the model

In [None]:
plt.plot(history.history['accuracy'], label='accuracy')
plt.plot(history.history['val_accuracy'], label = 'val_accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.ylim([0.5, 1])
plt.legend(loc='lower right')

test_loss, test_acc = model.evaluate(test_images,  test_labels_cat, verbose=2)

In [None]:
print(test_acc)

Your simple CNN has achieved a test accuracy of around 70%. There is still plenty of room for improvement.
In the next notebook, we will see some techniques for improving the accuracy, namely image augmentation and dropout regularization.
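
For reference, the accuracy metric is the fraction of images whose most probable class matches the true label. A minimal sketch with hypothetical predictions for four samples and three classes:

```python
def accuracy(probabilities, labels):
    """Fraction of samples whose highest-probability class matches the label."""
    correct = 0
    for probs, label in zip(probabilities, labels):
        predicted = max(range(len(probs)), key=probs.__getitem__)
        correct += predicted == label
    return correct / len(labels)


# Hypothetical model outputs (one row of class probabilities per sample).
probabilities = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.2, 0.6],
]
labels = [0, 1, 2, 2]
print(accuracy(probabilities, labels))  # 0.75
```

Three of the four predictions are correct, so the accuracy is 0.75.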

Try to improve this accuracy yourself! Some of the ideas:
- Use a deeper model
- Increase the number of filters
- Train the network longer
- Change `batch_size`
- Use dropout
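
As a hint for the last idea, the sketch below illustrates what dropout does to a layer's activations during training: each value is zeroed with probability `rate`, and the surviving values are scaled up so that the expected sum stays the same. This is a plain-Python illustration with hypothetical names; in Keras the corresponding layer is `layers.Dropout`.

```python
import random


def dropout(activations, rate, rng):
    """Zero each activation with probability `rate` and rescale the rest."""
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)  # fixed seed so that the sketch is repeatable
activations = [1.0] * 10
dropped = dropout(activations, rate=0.5, rng=rng)
print(dropped)  # a mix of 0.0 and 2.0
```

At test time dropout is switched off and the activations pass through unchanged.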

This notebook is partially based on the official TensorFlow [tutorial](https://www-tensorflow-org.translate.goog/tutorials/images/cnn?_x_tr_sl=en&_x_tr_tl=nl&_x_tr_hl=nl&_x_tr_pto=nui%2Csc).