_This notebook is part of the material for the [ML Tutorials](https://github.com/NNPDF/como-2025) session._

# Convolutional Neural Network for Classification


In this exercise, we build a convolutional neural network (CNN) that learns to distinguish images of cats from images of dogs.

### Importing the libraries

In [1]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.preprocessing.image import ImageDataGenerator

## Part 1 - Data Preprocessing

### Preprocessing the Training set

To bring the images into a format the network can consume, we need to preprocess them. A common first step is to divide the pixel values by 255, which maps them into the range $[0, 1]$. We also want all images to have the same size, so we resize everything to 64x64 pixels. Finally, rotating, shifting, flipping, or zooming into an image of a cat or a dog does not change which animal it shows: a cat remains a cat, and a dog remains a dog. In other words, the class label should be invariant under these transformations.
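As a minimal sketch of the rescaling step, the snippet below divides a randomly generated stand-in image (not one from the dataset) by 255 using NumPy. The variable names are illustrative only.

```python
import numpy as np

# Hypothetical 8-bit RGB image with random pixel values in 0..255
rng = np.random.default_rng(seed=0)
raw_image = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)

# Convert to float and divide by 255 so that all values lie in [0, 1]
scaled_image = raw_image.astype(np.float32) / 255.0

print(raw_image.dtype, raw_image.min(), raw_image.max())
print(scaled_image.dtype, float(scaled_image.min()), float(scaled_image.max()))
```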

By construction, convolutional layers are equivariant under translations (a shifted input yields a correspondingly shifted feature map), and together with pooling this makes the network approximately translation-invariant, so we do not need to handle translations in the preprocessing. However, to make the network learn that a flipped, rotated, or zoomed image remains in the same class, we typically perform data augmentation: take an image, apply a random flip, rotation, or zoom, and add the transformed image with the original label to the dataset. In this way, the network effectively learns that these transformations are symmetries of the problem.
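The idea of data augmentation can be sketched in a few lines of NumPy. The `augment` helper, the stand-in image, and the label value below are assumptions made for illustration. Rotations are restricted to multiples of 90° only because they need no interpolation.

```python
import numpy as np

def augment(image, rng):
    """Return a randomly transformed copy of an image of shape (H, W, C)."""
    out = image.copy()
    # Random horizontal flip: reverse the width axis with probability 0.5
    if rng.random() < 0.5:
        out = out[:, ::-1, :]
    # Random rotation by a multiple of 90 degrees in the (H, W) plane
    k = int(rng.integers(0, 4))
    out = np.rot90(out, k=k, axes=(0, 1))
    return out

rng = np.random.default_rng(seed=1)
image = rng.random((64, 64, 3))  # stand-in for a preprocessed image
label = 0                        # hypothetical label, e.g. 0 = cat

# Each augmented copy keeps the label of the original image
augmented_dataset = [(augment(image, rng), label) for _ in range(4)]
for aug_image, aug_label in augmented_dataset:
    print(aug_image.shape, aug_label)
```

Flips and 90° rotations only rearrange pixels, so every augmented copy contains exactly the same pixel values as the original image.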

In many cases it is also possible to embed symmetries directly into the network architecture, which removes the need to add transformed copies of the images to the training set. Here we do not have such an architecture available, so we have to apply these transformations to the data ourselves.
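To illustrate what building a symmetry into the model can mean, here is a toy sketch that averages a model's output over an image and its mirror image. This is only one simple option and is not used in this notebook. The `toy_model` and its random weights are hypothetical stand-ins for a trained network.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
weights = rng.normal(size=(64, 64, 3))  # hypothetical parameters of a toy model

def toy_model(image):
    """Toy scorer that is not flip-invariant: sigmoid of a weighted pixel sum."""
    z = np.sum(weights * image) / np.sqrt(image.size)
    return 1.0 / (1.0 + np.exp(-z))

def flip(image):
    """Mirror the image along its width axis."""
    return image[:, ::-1, :]

def symmetrized_model(image):
    """Average over the image and its mirror image; invariant under flips."""
    return 0.5 * (toy_model(image) + toy_model(flip(image)))

image = rng.random((64, 64, 3))
print(toy_model(image), toy_model(flip(image)))                  # generally differ
print(symmetrized_model(image), symmetrized_model(flip(image)))  # identical
```

Because flipping twice gives back the original image, the symmetrized output is the same for an image and its mirror image, without any augmented training data.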

Luckily, Keras provides a class that does this automatically: `ImageDataGenerator` ([Doc](https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator)), which we use in the following:

In [2]:
train_datagen = ImageDataGenerator(rescale = 1./255,
                                   shear_range = 0.2,
                                   zoom_range = 0.2,
                                   horizontal_flip = True)
training_set = train_datagen.flow_from_directory('dataset/catdogs/training_set',
                                                 target_size = (64, 64),
                                                 batch_size = 32,
                                                 class_mode = 'binary')

Found 8000 images belonging to 2 classes.


### Preprocessing the Test set

In [3]:
test_datagen = ImageDataGenerator(rescale = 1./255)
test_set = test_datagen.flow_from_directory('dataset/catdogs/test_set',
                                            target_size = (64, 64),
                                            batch_size = 32,
                                            class_mode = 'binary')

Found 2000 images belonging to 2 classes.


## Part 2 - Building the CNN

In [4]:
cnn = Sequential()
cnn.add(keras.Input(shape=(64, 64, 3)))

# Convolution and Max-Pooling
cnn.add(keras.layers.Conv2D(filters=32, kernel_size=3, padding='same', activation='relu'))
cnn.add(keras.layers.MaxPool2D(pool_size=2, strides=2))
# 2nd Convolution and Max-Pooling
cnn.add(keras.layers.Conv2D(filters=64, kernel_size=3, padding='same', activation='relu'))
cnn.add(keras.layers.MaxPool2D(pool_size=2, strides=2))
# 3rd Convolution and Max-Pooling
cnn.add(keras.layers.Conv2D(filters=128, kernel_size=3, padding='same', activation='relu'))
cnn.add(keras.layers.MaxPool2D(pool_size=2, strides=2))
# Flatten
cnn.add(keras.layers.Flatten())
# Fully connected
cnn.add(keras.layers.Dense(units=16, activation='relu'))
# Output layer
cnn.add(keras.layers.Dense(units=1, activation='sigmoid'))
cnn.summary()
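The layer output shapes and parameter counts of the architecture above can be checked by hand. The sketch below redoes that bookkeeping in plain Python. The helper names are illustrative. It assumes, as in the code above, 3x3 kernels with `padding='same'` and 2x2 pooling with stride 2.

```python
def conv_params(kernel_size, in_channels, filters):
    # One kernel_size x kernel_size x in_channels kernel plus one bias per filter
    return kernel_size * kernel_size * in_channels * filters + filters

def dense_params(inputs, units):
    # One weight per input-unit pair plus one bias per unit
    return inputs * units + units

size, channels = 64, 3
total = 0
for filters in (32, 64, 128):
    total += conv_params(3, channels, filters)  # padding='same' keeps the spatial size
    channels = filters
    size //= 2                                  # 2x2 max-pooling with stride 2 halves it
    print(f"after conv({filters}) + pool: {size}x{size}x{channels}")

flat = size * size * channels
total += dense_params(flat, 16) + dense_params(16, 1)
print("flattened features:", flat)     # 8192
print("trainable parameters:", total)  # 224353
```

Most of the parameters sit in the first dense layer, which connects all 8192 flattened features to 16 units.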


## Part 3 - Training the CNN

### Compiling the CNN

In [5]:
cnn.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
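For reference, the binary cross-entropy averages $-[y \log p + (1 - y) \log(1 - p)]$ over the samples, where $y$ is the label and $p$ the predicted probability. The NumPy sketch below is a simplified re-implementation for illustration, not the Keras code itself. The clipping constant `eps` is an assumption.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy between labels and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip predictions away from 0 and 1 to avoid log(0)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)
    losses = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    return float(np.mean(losses))

labels = [0, 1, 1, 0]
print(binary_crossentropy(labels, [0.5, 0.5, 0.5, 0.5]))  # chance level: ln(2), about 0.693
print(binary_crossentropy(labels, [0.1, 0.9, 0.8, 0.2]))  # confident and correct: lower loss
```

A model that outputs 0.5 for every image has a loss of $\ln 2 \approx 0.693$, which is a useful baseline when reading the training log.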

### Training the CNN on the Training set and evaluating it on the Test set

In [6]:
cnn.fit(
    x = training_set, 
    validation_data = test_set, 
    epochs = 25
)

Epoch 1/25
250/250 ━━━━━━━━━━━━━━━━━━━━ 8s 30ms/step - accuracy: 0.5122 - loss: 0.6946 - val_accuracy: 0.5970 - val_loss: 0.6621
Epoch 2/25
250/250 ━━━━━━━━━━━━━━━━━━━━ 8s 32ms/step - accuracy: 0.5875 - loss: 0.6667 - val_accuracy: 0.6995 - val_loss: 0.6078
Epoch 3/25
250/250 ━━━━━━━━━━━━━━━━━━━━ 8s 34ms/step - accuracy: 0.6765 - loss: 0.6010 - val_accuracy: 0.7120 - val_loss: 0.5641
Epoch 4/25
250/250 ━━━━━━━━━━━━━━━━━━━━ 9s 35ms/step - accuracy: 0.7072 - loss: 0.5653 - val_accuracy: 0.7410 - val_loss: 0.5236
Epoch 5/25
250/250 ━━━━━━━━━━━━━━━━━━━━ 9s 36ms/step - accuracy: 0.7327 - loss: 0.5307 - val_accuracy: 0.7530 - val_loss: 0.4999
Epoch 6/25
250/250 ━━━━━━━━━━━━━━━━━━━━ 9s 36ms/step - accuracy: 0.7436 - loss: 0.5163 - val_accuracy: 0.7575 - val_loss: 0.4960
Epoch 7/25
250/250 ━

<keras.src.callbacks.history.History at 0x16b0de540>
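The `250/250` in the training log counts batches per epoch, not images. A quick check with the image counts reported by `flow_from_directory` and the batch size of 32 used above:

```python
import math

batch_size = 32
image_counts = {"training": 8000, "test": 2000}

# The last batch may be smaller than batch_size, hence the ceiling
steps_per_epoch = {name: math.ceil(n / batch_size) for name, n in image_counts.items()}
for name, steps in steps_per_epoch.items():
    print(f"{name}: {steps} batches per epoch")
```

This gives 250 batches for the training set and 63 for the test set, where the final test batch holds only 16 images.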

## Part 4 - Making a single prediction

Check whether your model predicts the correct class for the single pictures in `dataset/catdogs/single_prediction`.

In [7]:
import numpy as np
from keras.preprocessing import image
test_image = image.load_img('dataset/catdogs/single_prediction/cat_or_dog_2.jpg', target_size = (64, 64))
test_image = image.img_to_array(test_image)/255.0 # <- scaling important!
test_image = np.expand_dims(test_image, axis = 0)
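The `np.expand_dims` call is needed because the network expects a batch of images, even when that batch contains only one. A small sketch with a placeholder array (not an actual image from the dataset):

```python
import numpy as np

# Placeholder for one preprocessed image: height x width x channels
single_image = np.zeros((64, 64, 3), dtype=np.float32)

# Add a leading batch axis so the array looks like a batch of size 1
batch = np.expand_dims(single_image, axis=0)

print(single_image.shape)  # (64, 64, 3)
print(batch.shape)         # (1, 64, 64, 3)
```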

In [8]:
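result = cnn.predict(test_image)
# The sigmoid output is the predicted probability of the class with index 1.
# flow_from_directory assigns indices alphabetically by folder name
# (see training_set.class_indices); index 1 is assumed to be 'dog' here.
if result[0][0] > 0.5:
    prediction = 'dog'
else:
    prediction = 'cat'
print(prediction)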

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 33ms/step
cat
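Rather than hard-coding which class belongs to which side of the threshold, the mapping can be read from a `class_indices`-style dictionary. The helper `decode_prediction` and the folder names `cats` and `dogs` below are assumptions for illustration. In the notebook, the actual mapping is available as `training_set.class_indices`.

```python
def decode_prediction(probability, class_indices, threshold=0.5):
    """Map a sigmoid output to a class name using a name -> index dictionary."""
    index_to_class = {index: name for name, index in class_indices.items()}
    predicted_index = int(probability > threshold)
    return index_to_class[predicted_index]

# Assumed mapping: indices are assigned alphabetically by folder name
class_indices = {"cats": 0, "dogs": 1}

print(decode_prediction(0.92, class_indices))  # dogs
print(decode_prediction(0.13, class_indices))  # cats
```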
