# **Lab 4.3: Convolutional Neural Networks (CNN)**

<hr>

## **1. Introduction**

The neural networks we have considered so far have always worked with *structured data*, meaning data that can be stored in tables. Nowadays, it is increasingly common to encounter problems that require handling **unstructured data**, such as images, text, or audio, with images being the most frequent type. In these kinds of problems, unlike those we've seen so far, the model inputs are not vectors of values extracted from a dataset, but rather images.

<div class="alert alert-block alert-warning">
    <strong>Note:</strong> Grayscale images are encoded as two-dimensional <b>matrices</b>, while color images are represented as three-dimensional <b>tensors</b>.
</div>

<center>
    <div style="border-radius:5px; padding:10px; background:white; max-width:1200px">
        <img src="https://i.imgur.com/urLXh6F.png">   
    </div>
</center>

The models we have seen so far are designed to work with vectors, so they cannot work directly with images. If we wanted to use a classic model (such as logistic regression or SVM) with images, it would be necessary to transform the image into a vector. A common way to do this is to *flatten* the image. Flattening an image means converting the 2D matrix (or 3D tensor) into a one-dimensional vector by concatenating the rows (or color channels) of the image one after another.

For example, a $28\times28$ pixel image would be transformed into a vector of $784$ elements. This resulting vector could then be used as input for a classic model. Although this method may imply the loss of relevant spatial information from the original image, it represents a simple way to adapt classic models to image processing. In traditional networks with dense (fully connected) layers, such as the ones we have seen so far, the same process would be necessary. This means that, even for small images like the one in the example, we obtain large vectors, which requires learning a large number of weights or parameters.
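
The flattening step and the resulting number of weights can be checked with a few lines of NumPy. This is only an illustrative sketch: the arrays are toy stand-ins for real images, and the layer width `hidden_units = 128` is an assumed value, not one taken from this lab.

```python
import numpy as np

# Toy stand-ins for a grayscale image (2D matrix) and a color image (3D tensor)
gray_image = np.zeros((28, 28))
color_image = np.zeros((28, 28, 3))

# Flattening concatenates all rows (and channels) into a single vector
gray_vector = gray_image.reshape(-1)
color_vector = color_image.reshape(-1)
print(gray_vector.shape, color_vector.shape)

# Parameters of one dense layer: inputs * units weights plus one bias per unit
hidden_units = 128  # illustrative value, not taken from the lab
dense_params = gray_vector.size * hidden_units + hidden_units
print(dense_params)
```

Even with a single hidden layer of this size, the small grayscale image already requires on the order of a hundred thousand parameters.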

**Convolutional neural networks (CNNs)** were introduced to address these problems. Their architecture is composed of two parts:

* **Feature extractor**: This part is responsible for extracting the relevant features from the image, that is, it learns a low-dimensional vector that represents the image. It is composed of a series of **convolutional** layers and **pooling** layers. 
* **Fully connected part**: This part uses the learned vector to solve the task at hand (classification, regression, ...). It is composed of a series of dense (fully connected) layers, like the networks used previously.

<center>
    <div style="border-radius:5px; padding:10px; background:white; max-width:1200px">
        <img src="https://i.imgur.com/vsSJBcu.png">   
    </div>
</center>

<div class="alert alert-block alert-warning">
    <strong>Note:</strong> The <i>convolutional part</i> or the <i>feature extractor</i> of a CNN is responsible for learning a <b>vector representation</b> of the image which will then be the input to the fully connected part responsible for making the final prediction.
</div>
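
To make the two building blocks of the feature extractor more concrete, the sketch below implements an unpadded convolution and a max pooling step in plain NumPy. The helper names `conv2d_valid` and `max_pool2d` are hypothetical, and the kernel is hand-picked for illustration; in a real CNN the kernel values are learned during training.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image without padding and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2d(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size window."""
    h = feature_map.shape[0] // size
    w = feature_map.shape[1] // size
    trimmed = feature_map[:h * size, :w * size]
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)  # toy 6x6 "image"
kernel = np.array([[1.0, 0.0, -1.0]] * 3)          # hand-picked 3x3 filter

features = conv2d_valid(image, kernel)  # 6x6 input, 3x3 kernel -> 4x4 map
pooled = max_pool2d(features)           # 4x4 map, 2x2 pooling -> 2x2 map
print(features.shape, pooled.shape)
```

Each convolution slightly shrinks the image and each pooling step halves it, which is how the feature extractor ends up with a compact representation.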

### **Objective**
In this lab, you will learn how to solve an image classification problem by creating a convolutional network with `tensorflow` and `keras`.

## **2. Set up GPU**

In order to accelerate training as much as possible, in this lab we will make use of the *GPUs* installed on the lab computers.

So far, all the models we have trained have run on the *CPU*, since a standard `tensorflow` installation on Windows uses this device. On Windows, the library supports the *GPU* only through **Windows Subsystem for Linux (WSL)**.

This feature allows us to have a Linux environment directly on Windows without the need to create a virtual machine or set up dual boot. As you will see, we can run Linux command-line programs directly on Windows.

##### **Activate and install WSL**

You will need to open a `cmd.exe` command window and run:

In [None]:
wsl --install

* Enter your **UO** as both the username and the password.

Inside the Linux console (Ubuntu by default), we will need to install `conda` and create the environment for the course.

##### **Install conda**

Download `miniconda` (a minimal version of `conda`) and install it.

<div class="alert alert-block alert-warning">
    <strong>You will need to answer <code>yes</code> to the question <i>Do you wish to update your shell profile to automatically initialize conda?</i></strong>
</div>

In [None]:
curl https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -o Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

* Close the WSL terminal and reopen it to update the environment variables.

You can find it again in the Windows menu by searching for `WSL`.

Next, we will recreate the `SSII` environment.

In [None]:
conda create --name "SSII" python=3.10
conda activate "SSII"

Install the necessary libraries within the environment, including the version of `tensorflow` with GPU support.

In [None]:
pip install ipykernel pandas seaborn scikit-learn tabulate
pip install "tensorflow[and-cuda]"

##### **Visual Studio Code**

At this point, we have two operating systems: native Windows and a Linux running within it. Our goal is to run this Notebook inside Linux.

To do this, we need to tell Visual Studio Code **to connect to a different machine**. The easiest way to do this is:

* Open a new VSCode window (`File > New Window`).
* In the new VSCode window, at the bottom left, there is an icon of two opposing arrows.
* Clicking on it will bring up a top menu, from which we select <i>Connect to WSL</i>.
* If everything works, at the bottom left, something like **WSL: Ubuntu** will appear.

Next, we will install the **VSCode extensions**. You already installed them on Windows, but they must also be installed in the Linux distribution.

* Install *Python* and *Jupyter*.

##### **File System**

Remember, we are now inside another machine, so we will need to store the practice notebooks within it.

Windows makes this access much easier. If you open a file explorer, you simply need to go to the *Linux* folder that appears in the quick access on the left:

<center>
    <div style="border-radius:5px; padding:10px; background:white; max-width:600px">
        <img src="https://i.imgur.com/dEqSKsg.png" style="height:400px">   
    </div>
</center>

* Create the `SSII` folder in `Linux/Ubuntu/home/<your UO>/`.
* Copy this notebook into it.
* Open that folder from VSCode (`File > Open Folder...`).
* Open the notebook and select the `conda` environment you just created as the kernel.

The easiest way to verify that `tensorflow` has access to the GPU is by running the following code:

In [None]:
import os

# Must be set before importing tensorflow; level 2 hides INFO and WARNING messages
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

import tensorflow as tf

seed = 2533

print('-'*50)
print('Num GPUs available: ', len(tf.config.list_physical_devices('GPU')))
print('-'*50)

You can also check the current GPU usage with the following command:

In [None]:
! nvidia-smi

If you prefer to monitor the GPU usage continuously, you can open a WSL terminal (from Windows or from VSCode in `Terminal > New Terminal`) and run the following command:

In [None]:
watch -n1 nvidia-smi

This command simply executes the previous command every 1 second. To exit, you will need to press `Ctrl+C`.

<hr>

## **3. Convolutional Neural Networks**

Once everything is set up, let's propose a potential problem.

Imagine you work for the IT department at the post office. Currently, one person manually reads the handwritten recipient details on each letter and places the letter in one box or another according to its postal code.

As you can see, this process is very slow, so they propose installing a camera that takes images of each letter and automatically reads the postal codes.

They already have images of the postal codes of the letters, but they don't know how to extract the digits from each image, so they present you with the following problem:

<div class="alert alert-block alert-success">
    <b>Create a model that, given an image of a handwritten number, can recognize which number it is.</b>
</div>

### **3.1 Data Preprocessing**

To help you, they have already created a manually curated dataset that contains images of individual digits and labels indicating the number that appears in the image.

<center>
    <div style="border-radius:5px; padding:10px; background:white; max-width:1000px">
        <img src="https://i.imgur.com/cHlLRXB.png">   
    </div>
</center>

You can download the dataset using the following code:

In [None]:
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

This dataset contains $70,000$ grayscale images of $28\times28$ handwritten digits from $0$ to $9$.

As you can see, it is already divided into $X$, $Y$, as well as into training ($60,000$) and test ($10,000$) sets.

In [None]:
print(x_train.shape, y_train.shape)

To simplify, let's keep only 20,000 random training examples.

In [None]:
import numpy as np

np.random.seed(seed)
indices = np.random.choice(len(x_train), size=20000, replace=False)

# Obtain the samples
x_train = x_train[indices]
y_train = y_train[indices]

print(x_train.shape, y_train.shape)

<div class="alert alert-block alert-info">
    <b>Exercise:</b> Manually normalize the data between 0 and 1.
    <hr>
    The classes from <code>scikit-learn</code> do not work in this case, as they are designed to work with vectors.
</div>

In [None]:
# Your code here
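
As a hint, the idea can be sketched on a toy array. The sketch assumes 8-bit pixel values in the range 0 to 255; `toy_images` is made-up data, not the lab dataset.

```python
import numpy as np

# Toy batch of two 2x2 "images" with 8-bit pixel values (0-255)
toy_images = np.array([[[0, 128], [255, 64]],
                       [[32, 16], [8, 255]]], dtype=np.uint8)

# Cast to float32 (the default float type in Keras) and rescale to [0, 1]
toy_normalized = toy_images.astype('float32') / 255.0
print(toy_normalized.min(), toy_normalized.max())
```

Whatever scaling is chosen, the same operation should be applied to both the training and the test images.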

Let's create a plot to view one of the examples from the training set.

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize = (5,5))
plt.imshow(x_train[0], cmap = 'Greys')
plt.title(f'Example image of digit {y_train[0]}')
plt.show()

<div class="alert alert-block alert-info">
    <b>Exercise:</b> Visualize image 1523 from the test set. Add its real label (class) in the title.
</div>

In [None]:
# Your code here

### **3.2 Machine Learning**

Remember, to tackle this problem with classic machine learning models, we first need to transform the $28\times28$ images into vectors.

We create two new variables to store the "flattened" version of the images, keeping the original images for later use.

In [None]:
x_train_vector = x_train.reshape(-1, 28 * 28)
x_test_vector = x_test.reshape(-1, 28 * 28)

print(x_train.shape)
print(x_train_vector.shape)

Once the data has been adapted, we can attempt to tackle the problem with classic machine learning models.

To determine which metric is most appropriate, let's first analyze how many images there are of each number (class).

In [None]:
sns.countplot(x = y_train, hue = y_train.astype(str))

In this case, the classes are fairly balanced, so the `f1 macro` metric will be sufficient.
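
As a reminder of what the macro average does, the sketch below counts the examples per class and computes the F1 score of each class separately before averaging them with equal weight. The helper `macro_f1` and the toy labels are purely illustrative; in the lab itself, `f1_score(..., average = 'macro')` from `scikit-learn` is used.

```python
import numpy as np

def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        # A class with no true and no predicted examples contributes 0
        scores.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(scores))

toy_true = np.array([0, 0, 1, 1, 2, 2])
toy_pred = np.array([0, 0, 1, 2, 2, 2])

counts = np.bincount(toy_true, minlength=3)  # examples per class
score = macro_f1(toy_true, toy_pred, n_classes=3)
print(counts, round(score, 3))
```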

Keep in mind that methods other than baselines will take some time to train, as now each example consists of $784$ inputs.

In [None]:
from sklearn.metrics import accuracy_score, f1_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.dummy import DummyClassifier
from tabulate import tabulate
    
def evaluate_model(Y_test, preds_test, model_name, average = 'binary'):
    preds_test = (preds_test >= 0.5).astype(int)
    metrics = [
        ('Accuracy', accuracy_score(Y_test, preds_test)),
        ('F1', f1_score(Y_test,preds_test, average = average))
    ]
    
    print(f'Results for {model_name}:')
    print(tabulate(metrics, headers = ['Metric', 'TEST'], tablefmt = 'rounded_outline'))
    print()
    
# Baseline Random
baseline_random = DummyClassifier(strategy = 'uniform')
baseline_random.fit(x_train_vector, y_train)
preds_test = baseline_random.predict(x_test_vector)
evaluate_model(y_test, preds_test, 'Baseline Random', average = 'macro')

# Baseline Zero-R
baseline_zero = DummyClassifier(strategy = 'most_frequent')
baseline_zero.fit(x_train_vector, y_train)
preds_test = baseline_zero.predict(x_test_vector)
evaluate_model(y_test, preds_test, 'Baseline Zero-R', average = 'macro')

# KNN
model_knn = KNeighborsClassifier()
model_knn.fit(x_train_vector, y_train)
preds_test = model_knn.predict(x_test_vector)
evaluate_model(y_test, preds_test, 'KNN', average = 'macro')

# Decision Trees
model_tree = DecisionTreeClassifier()
model_tree.fit(x_train_vector, y_train)
preds_test = model_tree.predict(x_test_vector)
evaluate_model(y_test, preds_test, 'Decision Tree', average = 'macro')

This returns the following results:

<center>

| Model             | Accuracy (Test) | F1 (Test) |
|-------------------|-----------------|-----------|
| Baseline Random   | 0.114           | 0.032     |
| Baseline Zero-R   | 0.114           | 0.020     |
| KNN               | 0.211           | 0.120     |
| Decision Tree     | 0.202           | 0.112     |

</center>
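
One detail worth checking in `evaluate_model` is the `>= 0.5` threshold. Applied to integer class labels, it maps every label above 0 to 1, which may affect the scores obtained for the classic models. The sketch below shows the effect on a toy array, together with one possible way of keeping multi-class labels intact. The helper `to_class_labels` is hypothetical and not part of the lab code.

```python
import numpy as np

def to_class_labels(preds):
    """Hypothetical helper: turn model outputs into integer class labels."""
    preds = np.asarray(preds)
    if preds.ndim == 2:
        # (n_samples, n_classes) scores or one-hot rows: pick the largest entry
        return preds.argmax(axis=1)
    # Already a vector of class labels: leave the values untouched
    return preds.astype(int)

labels = np.array([0, 3, 7, 1])
thresholded = (labels >= 0.5).astype(int)  # every label above 0 collapses to 1
print(thresholded)

probs = np.array([[0.1, 0.7, 0.2],
                  [0.8, 0.1, 0.1]])
recovered = to_class_labels(probs)  # index of the most probable class per row
kept = to_class_labels(labels)      # integer labels pass through unchanged
print(recovered, kept)
```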

### **3.3 Deep Learning**

We are going to create two options, a *fully connected network* that receives the $784$-dimensional vector, and then a *convolutional network* that directly takes the images.

Before continuing, we will need to correctly encode the expected outputs $Y$ for our model.

<div class="alert alert-block alert-info">
    <b>Exercise:</b> Encode and overwrite the Y using one-hot encoding.
</div>

In [None]:
# Your code here
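
As a hint, one-hot encoding can be sketched in NumPy on a toy label vector. The names `toy_labels` and `toy_one_hot` are illustrative; Keras also provides `tf.keras.utils.to_categorical` for the same purpose.

```python
import numpy as np

toy_labels = np.array([0, 3, 9, 3])
num_classes = 10

# Row i of the identity matrix is the one-hot vector for class i
toy_one_hot = np.eye(num_classes, dtype='float32')[toy_labels]
print(toy_one_hot.shape)
print(toy_one_hot[1])
```

Each row contains a single 1 at the position of its class, which is the format expected by the `categorical_crossentropy` loss.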

##### **Fully Connected Network**

We will first set the seeds for `tensorflow` and recreate the helper function `plot_loss_history()` to visualize the evolution of the loss during training.

In [None]:
import tensorflow as tf
import pandas as pd
import numpy as np
import os, random

# Set the seeds for the libraries to ensure reproducibility of the results.
os.environ['PYTHONHASHSEED'] = str(seed)
random.seed(seed)
np.random.seed(seed)
tf.random.set_seed(seed)

def plot_loss_history(history):
    # Extract loss data from history
    loss = history.history['loss']
    val_loss = history.history.get('val_loss', None)  # May not exist if validation was not used
    epochs = range(1, len(loss) + 1)

    # Create a DataFrame for seaborn (validation rows only if validation was used)
    has_val = val_loss is not None
    data = pd.DataFrame({
        'Epoch': list(epochs) * (2 if has_val else 1),
        'Loss': loss + (val_loss if has_val else []),
        'Type': ['Train'] * len(loss) + (['Validation'] * len(val_loss) if has_val else [])
    })

    # Create the plot
    plt.figure(figsize = (10, 5))
    sns.lineplot(data = data, x = 'Epoch', y = 'Loss', hue = 'Type')
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.title('Loss Evolution During Training')
    plt.legend(title = 'Dataset')
    plt.grid(True)
    plt.show()

<div class="alert alert-block alert-info">
    <b>Exercise:</b> Create a <i>fully connected network</i> to solve this problem and fill in the table. Adjust the hyperparameters if you wish.
</div>

<center>

| Model             | Accuracy (Test) | F1 (Test) |
|-------------------|-----------------|-----------|
| Baseline Random   | 0.114           | 0.032     |
| Baseline Zero-R   | 0.114           | 0.020     |
| KNN               | 0.211           | 0.120     |
| Decision Tree     | 0.202           | 0.112     |
| Neural Network    |                 |           |

</center>

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.optimizers import Adam

def red_fully_connected(learning_rate):

    model = Sequential()

    # Your code here

    optim = Adam(learning_rate = learning_rate)
    model.compile(loss = 'categorical_crossentropy', optimizer = optim, metrics = ['accuracy'])

    return model

# Create the model
model_fcn = red_fully_connected(learning_rate = 0.0005)

# Show model summary
model_fcn.summary()

# Train the model
history = model_fcn.fit(x_train_vector, y_train, validation_split = 0.2, batch_size = 256, epochs = 20, verbose = 2)

# Visualize loss history
plot_loss_history(history)

# Evaluate the model on test data
preds_test = model_fcn.predict(x_test_vector)
evaluate_model(y_test, preds_test, 'Neural Network', average = 'macro')

##### **Convolutional Neural Network**

Below is the architecture of a convolutional neural network designed to solve this problem.

<div class="alert alert-block alert-info">
    <b>Exercise:</b> Complete the <i>input size</i>, the <i>activation function of the last layer</i>, and the <i>loss function</i>.
</div>

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input, Conv2D, MaxPooling2D, Flatten
from tensorflow.keras.optimizers import Adam

def convolutional_network(learning_rate):
    model = Sequential()
    model.add(Input(shape = ()))  # Input as grayscale image

    # FEATURE EXTRACTOR -------------------------------------------
    # First convolution and max pooling block
    model.add(Conv2D(16, kernel_size = (3, 3), activation = 'relu'))
    model.add(MaxPooling2D(pool_size = (2, 2)))
    # Second convolution and max pooling block
    model.add(Conv2D(32, kernel_size = (3, 3), activation = 'relu'))
    model.add(MaxPooling2D(pool_size = (2, 2)))
    # Third convolution and max pooling block
    model.add(Conv2D(64, kernel_size = (3, 3), activation = 'relu'))
    model.add(MaxPooling2D(pool_size = (2, 2)))

    # FULLY CONNECTED PART ---------------------------------------------
    # Flatten and final dense layers
    model.add(Flatten())  # This vector is learned by the network during training and helps represent each image
    model.add(Dense(32, activation = 'relu'))
    model.add(Dense(10, activation = '', name = 'output_layer'))

    optim = Adam(learning_rate = learning_rate)
    model.compile(loss = '', optimizer = optim)

    return model

# Create the network from scratch
model_cnn = convolutional_network(learning_rate = 0.0005)

# View the summary
model_cnn.summary()
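
The shapes and parameter counts reported by `summary()` can be cross-checked by hand. The sketch below assumes a $28\times28$ grayscale input with one channel and the Keras defaults for the layers above (no padding and stride 1 in `Conv2D`, non-overlapping windows in `MaxPooling2D`). The helper names are hypothetical.

```python
def conv_output_size(size, kernel=3):
    """Side length after an unpadded, stride-1 convolution."""
    return size - kernel + 1

def pool_output_size(size, pool=2):
    """Side length after non-overlapping pooling (the remainder is dropped)."""
    return size // pool

side, channels = 28, 1  # assumed input: 28x28 grayscale image
total_params = 0
for filters in (16, 32, 64):
    # Each filter has 3*3*channels weights plus one bias
    total_params += (3 * 3 * channels + 1) * filters
    side = pool_output_size(conv_output_size(side))
    channels = filters
    print(f'after block with {filters} filters: {side}x{side}x{channels}')

flat_size = side * side * channels      # length of the vector produced by Flatten
total_params += (flat_size + 1) * 32    # Dense(32): weights plus biases
total_params += (32 + 1) * 10           # Dense(10): weights plus biases
print(flat_size, total_params)
```

Under these assumptions, the three blocks reduce each image to a short vector, so the whole network needs far fewer weights than a dense layer applied directly to the $784$ pixels.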

Finally, we train and evaluate on the test set.

In [None]:
# Train the model
history = model_cnn.fit(x_train, y_train, validation_split = 0.2, batch_size = 128, epochs = 20, verbose = 2)

# Visualize loss history
plot_loss_history(history)

<div class="alert alert-block alert-info">
    <b>Exercise:</b> Add the necessary code to evaluate your model and fill in the table.
</div>

<center>

| Model             | Accuracy (Test) | F1 (Test) |
|-------------------|-----------------|-----------|
| Baseline Random   | 0.114           | 0.032     |
| Baseline Zero-R   | 0.114           | 0.020     |
| KNN               | 0.211           | 0.120     |
| Decision Tree     | 0.202           | 0.112     |
| Neural Network    |                 |           |
| Convolutional Net |                 |           |

</center>

In [None]:
# Your code here

<hr>

## **4. Exercises**

<div class="alert alert-block alert-success">
    <b>Create a model that, given an image of a digit, predicts if <u>it is a four or a nine.</u> The code to create the necessary dataset is already provided.</b>
</div>

In [None]:
import numpy as np
import tensorflow as tf

# Load the MNIST dataset
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

def filter_and_relabel(x, y, class_pos = 1, class_neg = 7):
    # Filter only the desired digits
    idx = np.where((y == class_pos) | (y == class_neg))[0]
    x_filtered = x[idx]
    y_filtered = y[idx]

    # Label: 1 if it's class_pos, 0 if it's class_neg
    y_binary = (y_filtered == class_pos).astype(np.uint8)

    # Balance (it should already be almost balanced)
    idx_pos = np.where(y_binary == 1)[0]
    idx_neg = np.where(y_binary == 0)[0]

    n = min(len(idx_pos), len(idx_neg))
    np.random.seed(42)
    idx_pos_sampled = np.random.choice(idx_pos, size = n, replace = False)
    idx_neg_sampled = np.random.choice(idx_neg, size = n, replace = False)

    idx_total = np.concatenate([idx_pos_sampled, idx_neg_sampled])
    np.random.shuffle(idx_total)

    return x_filtered[idx_total], y_binary[idx_total]

# Apply for train and test
x_train, y_train = filter_and_relabel(x_train, y_train, class_pos = 4, class_neg = 9)
x_test, y_test = filter_and_relabel(x_test, y_test, class_pos = 4, class_neg = 9)

In [None]:
# Your code here