# **White Blood Cell Classification**

## Objective
In this notebook, you will train a convolutional neural network (ConvNet) to classify white blood cells into two categories:
1. Lymphoblasts (indicative of leukemia)
2. Normal white blood cells

We will use a custom dataset for this classification task. Details on the initial dataset can be found here: https://scotti.di.unimi.it/all/

The exercise will guide you step-by-step from data loading and preparation to building and training a deep learning model.

Types of leukemia: https://www.leukaemiacare.org.uk/types-of-leukaemia/

Morphology of leukemia: https://www.sciencedirect.com/science/article/pii/S0185106315000724

Examples of different WBC (Fig. 2): https://www.nature.com/articles/s41597-023-02378-7

Review on blood film analysis with AI: https://www.sciencedirect.com/science/article/pii/S0268960X23001054



In [None]:
from google.colab import drive
drive.mount('/content/drive/')

## 📚 0. Import Libraries (Nothing to do)

In [None]:
# Importing necessary libraries
import tensorflow as tf  # TensorFlow for deep learning
import os  # For handling file paths
import numpy as np  # For data manipulation
import matplotlib.pyplot as plt  # For visualizing the dataset

from tensorflow.keras.preprocessing.image import ImageDataGenerator  # For loading and augmenting image data
from sklearn.model_selection import train_test_split  # For splitting custom dataset into train and test


## 👓 1. Download the data (Nothing to do)
You will use a custom dataset of images that contain lymphoblasts and normal white blood cells. Make sure the dataset is organized in two folders: "lymphoblast" and "normal", with images placed in their respective folders.
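
A note on labels: `flow_from_directory` assigns class indices from the subfolder names in alphanumeric order, and the resulting mapping can be read from `train_data.class_indices`. The sketch below only mimics that ordering in plain Python. The folder names are taken from the description above and are an assumption about what is actually on your drive.

```python
# Mimic how Keras orders class subfolders: sorted names -> indices 0, 1, ...
# The folder names are assumptions based on the dataset description.
folder_names = ["normal", "lymphoblast"]

class_indices = {name: idx for idx, name in enumerate(sorted(folder_names))}
print(class_indices)  # {'lymphoblast': 0, 'normal': 1}

# Invert the mapping to turn a numeric label back into a readable name.
index_to_class = {idx: name for name, idx in class_indices.items()}
print(index_to_class[1])  # normal
```

With these folder names, label `1` would mean "normal", so it is worth checking `class_indices` before interpreting predictions.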


In [None]:
# Define dataset paths
base_dir = '/content/drive/MyDrive/all_data/'  # Update this to the actual path
#normal_dir = os.path.join(base_dir, 'normal')
#lymphoblast_dir = os.path.join(base_dir, 'lymphoblasts')

# Define parameters
IMG_HEIGHT, IMG_WIDTH = 128, 128
BATCH_SIZE = 32
NUM_EPOCHS=10
RAN_SEED=17

# Use ImageDataGenerator for loading and splitting data
datagen = ImageDataGenerator(
    rescale=1.0/255,  # Normalize pixel values
    validation_split=0.5)  # Hold out 50% of the data for validation

# Creating train and validation datasets
train_data = datagen.flow_from_directory(
    base_dir,
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    batch_size=BATCH_SIZE,
    class_mode='binary',
    subset='training',
    seed=RAN_SEED
)

validation_data = datagen.flow_from_directory(
    base_dir,
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    batch_size=BATCH_SIZE,
    class_mode='binary',
    subset='validation',
    seed=RAN_SEED
)

print("Training and validation datasets prepared successfully.")


### 1.2: Visualize the Dataset

❗ Display 10 images from the training dataset to understand the data better. Use the function below or build your own.


In [None]:
# Function to display a few images from the dataset
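def visualize_dataset(dataset, num_images=5):
    # Class indices follow the alphanumeric order of the folder names,
    # so look the names up instead of hard-coding which index is which
    index_to_class = {idx: name for name, idx in dataset.class_indices.items()}
    plt.figure(figsize=(10, 10))
    for i, (image, label) in enumerate(dataset):
        if i >= num_images:
            break
        plt.subplot(1, num_images, i + 1)
        plt.imshow(image[0])  # Each iteration yields a batch, so we take its first image
        plt.title(index_to_class[int(label[0])])
        plt.axis("off")
    plt.show()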



# Displaying a few images from the training dataset
## ... add code here ... ##


## 🧱 3. Build a simple ConvNet

Here you will build a simple convolutional neural network (CNN) to classify lymphoblasts and normal white blood cells.

❗ **Add the missing layers**. To reduce overfitting, also add a dropout layer with a rate of 0.2 where specified. Hint: check out https://keras.io/api/layers/

In [None]:
# Build a simple CNN model
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(16, (3, 3), activation='relu', input_shape=(IMG_HEIGHT, IMG_WIDTH, 3)),
    #Add a pooling layer here. You need to add "," after each layer
    ## add your code here ###,

    tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
    #Add a pooling layer here. You need to add "," after each layer
    ## add your code here #### ,

    #Add a Convolutional layer here with 64 filters. You need to add "," after each layer
    ## add your code here.... ##,

    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    #Add a dropout layer here with a rate of 0.2. [In sequential mode, you need to add "," after each layer]
    tf.keras.layers.Dense(1, activation='sigmoid')  # Binary classification output
])

# Compile the model
model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)




❗**How many parameters does your model have?** Hint: use the keras.model API to summarize your model: https://keras.io/api/models/
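
The count reported by the summary can be cross-checked by hand. The sketch below is plain arithmetic and assumes one possible completed architecture: three 3x3 convolutions (16, 32, 64 filters) with default 'valid' padding, each followed by 2x2 max pooling, then the two dense layers. The helper names are hypothetical, and your numbers will differ if you choose other kernel or pool sizes.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(in_features, units):
    """One weight per input feature plus one bias, for each unit."""
    return (in_features + 1) * units

def conv_out(size, kernel=3):
    return size - kernel + 1  # 'valid' padding, stride 1

def pool_out(size, pool=2):
    return size // pool  # non-overlapping pooling, 'valid' padding

size, channels = 128, 3  # IMG_HEIGHT = IMG_WIDTH = 128, RGB input
total = 0
for filters in (16, 32, 64):  # assumed conv -> pool pattern, three times
    total += conv2d_params(3, 3, channels, filters)
    size = pool_out(conv_out(size))
    channels = filters

flat = size * size * channels      # length of the Flatten output
total += dense_params(flat, 128)   # Dense(128)
total += dense_params(128, 1)      # Dense(1); pooling and dropout add no parameters
print(flat, total)
```

Most of the parameters sit in the first dense layer, which is why the size of the flattened feature map matters so much.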

In [None]:
# Print the model summary
## .. add line of code here .. ##

### 3.1 Train the Model

We will now train the model using the training and validation datasets.


In [None]:
# Train the model
history = model.fit(
    train_data,
    validation_data=validation_data,
    epochs=NUM_EPOCHS,  # You can increase the number of epochs based on your needs
    verbose=1
)


### 3.2: Evaluate and Visualize Results

❗Evaluate your model on the validation data. Hint: call the appropriate method using the keras API: https://keras.io/api/models/

What is the validation accuracy?
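
For intuition about the number you will get: with a single sigmoid output, the `accuracy` metric amounts to thresholding the predicted probability and counting matches with the true labels. The sketch below reproduces that in plain Python on made-up values, assuming a threshold of 0.5; the function name and the data are hypothetical.

```python
def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    y_pred = [1 if p > threshold else 0 for p in y_prob]
    correct = sum(int(t == p) for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Made-up labels and sigmoid outputs, for illustration only
y_true = [0, 0, 1, 1, 1]
y_prob = [0.1, 0.7, 0.8, 0.4, 0.9]
print(binary_accuracy(y_true, y_prob))  # 0.6
```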

In [None]:
# Evaluate the model
loss, accuracy = ### write your line of code here ###
print(f"Validation Accuracy: {accuracy:.2f}")




# Plot training and validation accuracy over epochs
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs_range = range(len(acc))

plt.figure(figsize=(14, 5))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')

plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')

plt.show()


## ➡️ 4. Transfer Learning

In this section, you will use a pretrained MobileNet model, leveraging its features for classifying lymphoblasts vs normal white blood cells.

❗Modify the code below so that your model uses MobileNet's convolutional layers (without the top fully connected layers) pretrained on the ImageNet dataset. Hint: check out the keras applications examples: https://keras.io/api/applications/

❗Add a global pooling layer after the base model (Hint: it's already imported).

❗ Compare the following:

a) train only the FC layers (freezing the convnet layers)

b) train from scratch (no pre-trained weights) - (only if you have time left)

c) fine-tune all layers - (only if you have time left)

❗For a) you need to freeze the MobileNet base model layers so that only the newly added fully connected dense layers are optimized. (Hint: check out the keras layer documentation: https://keras.io/api/layers/base_layer/#layer-class. You need to change the value of one of the layer attributes) Or have a look at : https://keras.io/guides/transfer_learning/
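
To see what freezing and global average pooling do, here is a minimal sketch on a toy model. `toy_base` is a hypothetical stand-in for a pretrained feature extractor (it is not MobileNet and has random weights), used only so the example runs without downloading anything.

```python
import tensorflow as tf

# Hypothetical stand-in for a pretrained feature extractor (NOT MobileNet)
toy_base = tf.keras.Sequential([
    tf.keras.layers.Conv2D(4, (3, 3), activation="relu"),
    tf.keras.layers.Conv2D(8, (3, 3), activation="relu"),
])

# Freezing: the base's weights are excluded from gradient updates
toy_base.trainable = False

toy_model = tf.keras.Sequential([
    toy_base,
    tf.keras.layers.GlobalAveragePooling2D(),  # averages each feature map: (H, W, C) -> (C,)
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Pass a dummy batch through the model so that all weights get created
out = toy_model(tf.zeros((2, 32, 32, 3)))

print(out.shape)                              # one probability per image
print(len(toy_model.trainable_weights))       # kernel and bias of the Dense layer
print(len(toy_model.non_trainable_weights))   # kernels and biases of the frozen base
```

Only the layers added on top of the frozen base contribute trainable weights, which is the behaviour wanted for option a).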

In [None]:
import tensorflow
from tensorflow.keras.applications.mobilenet import MobileNet
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.layers import Input
# Load MobileNet pretrained on ImageNet and add custom layers for our binary classification task
input_tensor = Input(shape=(IMG_HEIGHT, IMG_WIDTH, 3))
# create the base pre-trained model on ImageNet with a custom input tensor

base_model = MobileNet(
     ### ... add lines here ...###
)

# Freeze the base model to use it as a feature extractor
## ... add code here ... ##

# Add custom layers on top of the base model
model_mobilenet = tf.keras.Sequential([
    base_model,
    ## ... add code here ##
    # Add a global spatial average pooling layer
    ## .. add code here ###,

    # let's add a fully-connected layer
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.3),  # Add dropout to prevent overfitting
    # and a classification layer
    tf.keras.layers.Dense(1, activation='sigmoid')  # Binary classification output
])

#model_mobilenet2 = tensorflow.keras.models.clone_model(model_mobilenet)

# Compile the model
model_mobilenet.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)

# Print the model summary
#model_mobilenet.summary()

### 4.1 Train the MobileNet Model (Nothing to do)

You will now train the model using the training and validation datasets.

In [None]:
# Train the model
history_mobilenet = model_mobilenet.fit(
    train_data,
    validation_data=validation_data,
    epochs=NUM_EPOCHS,  # You can increase the number of epochs based on your needs
    verbose=1
)

### 4.2 Visualize the results

In [None]:
# Evaluate the model
loss, accuracy = model_mobilenet.evaluate(validation_data)
print(f"Validation Accuracy: {accuracy:.2f}")

# Plot training and validation accuracy over epochs
acc = history_mobilenet.history['accuracy']
val_acc = history_mobilenet.history['val_accuracy']
loss = history_mobilenet.history['loss']
val_loss = history_mobilenet.history['val_loss']

epochs_range = range(len(acc))

plt.figure(figsize=(14, 5))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')

plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')

plt.show()

## 🔧 5. Data Augmentation

To make our model more robust and reduce overfitting, here you will apply data augmentation techniques.


❗ **Add augmentation layers to your model before the first convolutional layer below.** An example of such a layer is provided. Add one geometrical augmentation and one spectral (related to the image color/luminance). *Hint*: use the image augmentation layers provided by the keras api:
https://keras.io/api/layers/preprocessing_layers/image_augmentation/
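
As a quick illustration of how such layers behave, the sketch below applies one geometric and one colour-related augmentation to a random stand-in batch. The choice of `RandomFlip` and `RandomContrast` and their settings are assumptions for illustration, not the required answer. Augmentation layers only alter images when called in training mode and pass them through unchanged at inference.

```python
import tensorflow as tf

augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),  # geometric augmentation
    tf.keras.layers.RandomContrast(0.2),       # colour/luminance augmentation
])

# Random stand-in batch with values in [0, 1], shaped like the notebook's images
images = tf.random.uniform((4, 128, 128, 3))

train_out = augment(images, training=True)    # random transforms are applied
infer_out = augment(images, training=False)   # layers act as the identity

print(train_out.shape)
print(bool(tf.reduce_all(infer_out == images)))
```

Because the layers are inactive at inference, the validation images are evaluated without augmentation even though the same model is used.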

In [None]:
# Adding data augmentation to the training dataset

# Add augmentation layers before the base model
model_mobilenet2 = tf.keras.Sequential([
    #Add augmentation layers here [In sequential mode, you need to add "," after each layer]
   ##...add line of code here ..##

    base_model,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.3),  # Add dropout to prevent overfitting
    tf.keras.layers.Dense(1, activation='sigmoid')  # Binary classification output
])
# Compile the model
model_mobilenet2.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)
# Train the new model; the augmentation layers are active only during training
history_augmented = model_mobilenet2.fit(
    train_data,
    validation_data=validation_data,
    epochs=NUM_EPOCHS,  # You can adjust the number of epochs
    verbose=1
)

### 5.1 Visualize the results

In [None]:
# Evaluate the model
loss, accuracy = model_mobilenet2.evaluate(validation_data)
print(f"Validation Accuracy: {accuracy:.2f}")

# Plot training and validation accuracy over epochs
acc = history_augmented.history['accuracy']
val_acc = history_augmented.history['val_accuracy']
loss = history_augmented.history['loss']
val_loss = history_augmented.history['val_loss']

epochs_range = range(len(acc))

plt.figure(figsize=(14, 5))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')

plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')

plt.show()

### 5.2 Compare the performance of your models

❗**Plot the ROC curves and compute the AUC for each of your trained models**. Use the function below or build your own.
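
To interpret the AUC values: the AUC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The from-scratch sketch below computes it that way on made-up scores. It is only an illustration of the quantity, not how scikit-learn computes it internally. The function name is hypothetical, and "positive" here means label 1, whichever class that is in your dataset.

```python
def pairwise_auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))

# Made-up labels and scores, for illustration only
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
print(pairwise_auc(y_true, y_score))  # 0.75
```

An AUC of 0.5 corresponds to the random-guess diagonal in the ROC plot, and 1.0 to a perfect ranking.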

In [None]:
from sklearn.metrics import roc_curve, auc
import numpy as np

# Function to plot ROC curves for multiple models on the same plot
def plot_roc_curves(models, validation_data_list, labels, title="ROC Curve Comparison"):
    plt.figure(figsize=(10, 6))

    # Iterate over each model and its corresponding validation data
    for model, validation_data, label in zip(models, validation_data_list, labels):
        y_true = []
        y_pred = []

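        # Get true labels and predicted probabilities
        validation_data.reset()  # Start from the first batch of the generator
        for images, labels_batch in validation_data:
            y_true.extend(labels_batch)
            preds = model.predict(images, verbose=0)
            y_pred.extend(preds.ravel())  # Flatten (batch, 1) predictions to 1-D

            # Break after one full pass (the generator loops indefinitely)
            if len(y_true) >= validation_data.samples:
                break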

        y_true = np.array(y_true)
        y_pred = np.array(y_pred)

        # Compute False Positive Rate (FPR) and True Positive Rate (TPR)
        fpr, tpr, _ = roc_curve(y_true, y_pred)
        roc_auc = auc(fpr, tpr)

        # Plot the ROC Curve for the current model
        plt.plot(fpr, tpr, lw=2, label=f'{label} (AUC = {roc_auc:.2f})')

    # Plot the random guess line
    plt.plot([0, 1], [0, 1], color='gray', linestyle='--')  # Random guess line

    # Configure plot settings
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title(title)
    plt.legend(loc="lower right")
    plt.show()

# Plot ROC curves for the original model and MobileNet models on the same plot

###... Add your code here ... ###

