# DLiP Group 1: Malaria Detection Using Machine Learning

# Table of Contents

* [Introduction](#section-one)
* [Data Preprocessing](#section-two)
* [Model Fitting](#section-three)
    - [Basic Convolutional Network from scratch](#subsection-one)
    - [Pretrained Convolutional Network 1](#subsection-two)
        - [Model Improvement](#subsection-two-one)
    - [Pretrained Convolutional Network 2](#subsection-three)
        - [Model Improvement](#subsection-three-one)
    - [Customized Convolutional Network](#subsection-four)
* [Results](#section-four)
    - [Model Interpretation](#subsection-fourone)
* [Conclusion & Discussion](#section-five)
* [References](#section-six)
* [Member Contribution](#section-seven)

<a id="section-one"></a>
# 1. Introduction

## Data considerations

This project aims to build an algorithm that detects malaria parasites in thin blood smear images. It is adapted from the study by Sivaramakrishnan et al. (2018) and the article by Sarkar (2019), and it uses the image dataset from the official NIH website. Our goal is to replicate, or improve on, the performance reported in these works. The research question of this project is therefore:

> <h4> Can we detect malaria parasites in blood smear images? </h4>   

First, we would like to give an overview of the data:

**1. Where does the data come from?**
* The data comes from the official [U.S. National Library of Medicine (NLM) website](https://ceb.nlm.nih.gov/repositories/malaria-datasets/); the NLM is part of the National Institutes of Health (NIH). The dataset is provided to support research on automated image analysis for clinical decision-making in disease screening and diagnostics.
* We only use thin smear images, not thick ones, so our automated approach may not generalize to thick smears or to all kinds of malaria parasites. Additionally, we do not train on blood images infected with other, similar parasites, so the network could confuse malaria with other diseases when it is applied to other sets of blood images.

**2. What are candidate machine learning methods?**

* We based our choice of candidate models on Sivaramakrishnan et al. (2018) and Sarkar (2019):
<div style="display:flex; flex-direction: row; flex-wrap: nowrap; align-items: stretch; width:100%;">
    <div>
<h5> Model Candidates: </h5>
        <ul>
<li> Basic convolutional neural network from scratch
<li> Pre-trained convolutional neural network
<li> Pre-trained convolutional neural network with image augmentation
<li> Customized convolutional neural network
        </ul>
    </div>
</div>

**3. What is the Bayes' error bound? (Any guesstimate from the scientific literature or web resources?)** 
* The Bayes error cannot be computed directly, so we approximate it with the best available 'machine' for identifying malaria parasites in blood images: human experts and standard diagnostic tests. Their error rate is an upper bound on the Bayes error (equivalently, their performance is a lower bound on the best achievable performance). A widely used reference method for malaria diagnosis is the polymerase chain reaction (PCR). According to the findings of Feleke, Alemu, and Yemanebirhane (2021), the average performance across several diagnostic tests was:


| AUC  |  SENSITIVITY  |  SPECIFICITY |
|---------:|--------:|--------:|
|  83.00%   | 75.20%   | 97.12%   |

--------------------------------

<a id="section-two"></a>
# 2. Data Preprocessing

This step involves:
1. Importing the necessary packages and loading the data into a data frame;
2. Checking image size and quality;
3. Resizing images;
4. Splitting the data into training, validation, and test sets.

## Import packages & Load data
All the packages necessary for the project are loaded below:

In [None]:
# import packages
import os
import glob
import itertools

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import cv2
import PIL
from PIL import Image
from tqdm import tqdm
from skimage.io import imread, imshow
from skimage.transform import resize
import sklearn.metrics as skmet
from sklearn.utils import shuffle
from sklearn.model_selection import train_test_split

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, callbacks
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.layers.experimental import preprocessing
# use tf.keras consistently instead of mixing in the standalone keras package
from tensorflow.keras.models import load_model

Below, we specify the paths to 1) the infected and 2) the uninfected folders, which contain cell images with and without malaria parasites, respectively. We then combine both sets of files into a single shuffled data frame with one label per image, shown below.

In [None]:
#Load in the data 
infected_path = "/kaggle/input/cell-images-for-detecting-malaria/cell_images/Parasitized"
uninfected_path = "/kaggle/input/cell-images-for-detecting-malaria/cell_images/Uninfected"
infected_files = glob.glob(infected_path+'/*.png')
uninfected_files = glob.glob(uninfected_path+'/*.png')

np.random.seed(42)

files_df = pd.DataFrame({
    'filename': infected_files + uninfected_files,
    'label': ['malaria'] * len(infected_files) + ['healthy'] * len(uninfected_files)
}).sample(frac=1, random_state=42).reset_index(drop=True)

files_df.head()

## Check image size and quality

Now, we check whether the images were loaded correctly. The following code displays several images so that we can inspect their color and quality.

In [None]:
# display several images in one figure
# create figure
fig = plt.figure(figsize = (10, 7))

# number of rows and columns in the grid
rows = 2
columns = 3

# reading images (M = malaria, H = healthy)
M1 = Image.open(infected_files[567])
M2 = Image.open(infected_files[4783])
M3 = Image.open(infected_files[89])
H1 = Image.open(uninfected_files[20])
H2 = Image.open(uninfected_files[678])
H3 = Image.open(uninfected_files[472])

examples = [(M1, "Malaria"), (M2, "Malaria"), (M3, "Malaria"),
            (H1, "Healthy"), (H2, "Healthy"), (H3, "Healthy")]

# add one subplot per image, filling the grid row by row
for position, (image, title) in enumerate(examples, start = 1):
    fig.add_subplot(rows, columns, position)
    plt.imshow(image)
    plt.axis("off")
    plt.title(title)

As the quality seems appropriate, we next check whether the images have the same size and whether that size is suitable for model fitting.

In [None]:
# Check if the images are the same size

# get width and height
width1 = M1.width
height1 = M1.height
  
# display width and height
print("The height of image 1 is: ", height1)
print("The width of image 1 is: ", width1)

# get width and height
width2 = M2.width
height2 = M2.height
  
# display width and height
print("The height of image 2 is: ", height2)
print("The width of image 2 is: ", width2)


As the outputs show, the images are not all the same size, so we first need to resize them. Since we have quite a lot of data, we resize them to **50 x 50** pixels, which reduces memory usage and allows faster model fitting. We also rescale the pixel values from [0, 255] to [0, 1] by dividing them by **255** (as in the article by Sarkar (2019)), which generally makes training more stable.

In [None]:
resized_df = []
labels_names = []
for file in range(len(files_df)):
    # cv2.imread returns the channels in BGR order; convert to RGB for PIL/matplotlib
    img_array = cv2.cvtColor(cv2.imread(files_df.iloc[file, 0]), cv2.COLOR_BGR2RGB)
    img = Image.fromarray(img_array, "RGB")
    # Image.ANTIALIAS was removed in Pillow 10; it was an alias for LANCZOS
    img_resized = img.resize((50, 50), Image.LANCZOS)
    resized_df.append(np.array(img_resized))
    labels_names.append(files_df.iloc[file, 1])

# recode labels to numbers: 1 = malaria (parasitized), 0 = healthy (uninfected)
labels_recoded = [1 if name == "malaria" else 0 for name in labels_names]

# change the data and the labels to arrays so they are suitable for the CNN
data = np.array(resized_df)
labels = np.array(labels_recoded)

# change to data types suitable for the CNN
data = data.astype(np.float32)
labels = labels.astype(np.int32)

# scale the pixel values from [0, 255] to [0, 1]
data = data / 255
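OpenCV's `cv2.imread` returns the color channels in BGR order, whereas PIL and matplotlib interpret arrays as RGB; without a conversion, red and blue are swapped in the stored images. For a 3-channel image, converting between the two orders amounts to reversing the last axis, which is what `cv2.cvtColor(img, cv2.COLOR_BGR2RGB)` does. A minimal numpy sketch (the helper name `bgr_to_rgb` is our own, for illustration only):

```python
import numpy as np


def bgr_to_rgb(image: np.ndarray) -> np.ndarray:
    """Reverse the channel axis of an (H, W, 3) image, turning BGR into RGB."""
    return image[..., ::-1]


# a single 1 x 1 "image" whose pixel is pure blue in BGR order (B=255, G=0, R=0)
bgr_pixel = np.array([[[255, 0, 0]]], dtype=np.uint8)
rgb_pixel = bgr_to_rgb(bgr_pixel)
print(rgb_pixel[0, 0])  # blue now sits in the last position of (R, G, B)
```

Applying the reversal twice returns the original array, so the same operation also converts RGB back to BGR.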

Now, we can check if the resizing went well:

In [None]:
# check if resizing worked
print("There are", data.shape[0], "pictures.")
print("All pictures have a height of", data.shape[1], "and a width of", data.shape[2], "pixels.")
print("All pictures have", data.shape[3], "color channels.")

All pictures are there and the resizing worked. As an example, the following picture shows what an image looks like after resizing:

In [None]:
# get image
img = data[1]

# show the image
plt.imshow(img)
plt.axis('off')
plt.show();

## Splitting data

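To be able to detect overfitting (the model fitting the training data so closely that it performs worse on new data) and to obtain an unbiased estimate of performance, we split the data into training, validation, and test sets in a 60:10:30 ratio. The models are fitted on the training set, the validation set is used to monitor training and compare settings, and the test set is only used for the final evaluation.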

In [None]:
# split the data into train, validation and test sets using a 60:10:30 split
# first, set aside 30% of all images as the test set
train_data, test_data, train_labels, test_labels = train_test_split(data,
                                                                    labels, 
                                                                    test_size = 0.3,
                                                                    random_state = 42)
# then take 1/7 of the remaining 70% (= 10% of all images) as the validation set
train_data, val_data, train_labels, val_labels = train_test_split(train_data,
                                                                  train_labels,
                                                                  test_size = 1/7,
                                                                  random_state = 42)

In [None]:
print("Train files:", len(train_data),"\nVal files:", len(val_data),"\nTest files:", len(test_data))
print('Train labels:', len(train_labels), '\nVal labels:', len(val_labels), '\nTest labels:', len(test_labels))
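Because `train_test_split` interprets `test_size` relative to the array it receives, the fraction in the second split applies to the remaining 70% of the data, not to the full dataset. The plain-Python sketch below (`split_fractions` is a hypothetical helper) computes the resulting shares of the full dataset, up to rounding of the image counts: a second `test_size` of 0.1 gives roughly 63:7:30, while 0.1 / 0.7 = 1/7 gives 60:10:30.

```python
def split_fractions(test_size: float, val_size_of_rest: float) -> tuple:
    """Return the (train, validation, test) shares of the full dataset after
    two successive splits: first the test set, then validation from the rest."""
    rest = 1.0 - test_size
    val = rest * val_size_of_rest
    train = rest - val
    return round(train, 4), round(val, 4), round(test_size, 4)


print(split_fractions(0.3, 0.1))    # validation taken as 10% of the remaining 70%
print(split_fractions(0.3, 1 / 7))  # validation chosen to be 10% of all images
```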

<a id="section-three"></a>
# 3. Model Fitting

In total, we fitted four models: the first three were adapted from Sarkar (2019), and the last one is a customized network designed to improve performance. The models are:
1. Basic Convolutional Network from scratch
2. Pretrained Convolutional Network (VGG-19)
3. Pretrained Convolutional Network with Data Augmentation and Fine Tuning
4. Customized Convolutional Network

Before diving into the specifics of each model, note that all models share the same input shape, because the images are not modified any further. We therefore specify the input shape once for all models.

In [None]:
input_shape = [50, 50, 3]

<a id="subsection-one"></a>
## 3.1 Basic Convolutional Network from scratch

#### Model Architecture
The model is specified following Sarkar (2019) and uses three types of layers: convolutional, pooling, and fully connected layers.

In [None]:
model1 = keras.Sequential([
    # Input Shape
    layers.InputLayer(input_shape = input_shape),

    # First Convolutional Block
    layers.Conv2D(filters = 32, kernel_size = 3, activation = "relu", padding = "same"),
    layers.MaxPool2D(),

    # Second Convolutional Block
    layers.Conv2D(filters = 64, kernel_size = 3, activation = "relu", padding = "same"),
    layers.MaxPool2D(),

    # Third Convolutional Block
    layers.Conv2D(filters = 128, kernel_size = 3, activation = "relu", padding = "same"),
    layers.MaxPool2D(),

    # Classifier Head
    layers.Flatten(),
    layers.Dense(units = 512, activation = "relu"),
    layers.Dropout(rate = 0.3),
    layers.Dense(units = 512, activation = "relu"),
    layers.Dropout(rate = 0.3),
    layers.Dense(1, activation = "sigmoid")
])
model1.summary()
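To see where the parameters of model 1 are, the output shapes can be worked out by hand: with `padding = "same"` each convolution keeps the spatial size, and each `MaxPool2D()` (default 2 x 2 pool, stride 2, no padding) halves it, rounding down. A plain-Python sketch of this arithmetic, assuming those Keras defaults (the helper names are our own):

```python
def conv_params(kernel: int, in_ch: int, out_ch: int) -> int:
    """Weights plus biases of a 2D convolution with a square kernel."""
    return kernel * kernel * in_ch * out_ch + out_ch


def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


size, channels, total = 50, 3, 0
for filters in (32, 64, 128):           # the three convolutional blocks
    total += conv_params(3, channels, filters)
    channels = filters
    size = size // 2                    # 'same' conv keeps the size, pooling halves it

flat = size * size * channels           # length of the Flatten() output
head = dense_params(flat, 512) + dense_params(512, 512) + dense_params(512, 1)
total += head
print(size, flat, head, total)
```

In this calculation, almost all of the roughly 2.7 million parameters sit in the first Dense layer, which receives 6 x 6 x 128 = 4608 features; `model1.summary()` should report the same totals.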

#### Train the model
Now, we fit model 1 on the training data and monitor it on the validation data.
During training, we occasionally had bad runs, which seemed to be caused by unfavorable random starting values of the weights. We therefore saved the model from a good run with `model1.save`, so that it can be reloaded for the evaluation in the Results section.

In [None]:
model1.compile(
    optimizer = "adam",
    loss = "binary_crossentropy",
    metrics = ["accuracy"]
)

history1 = model1.fit(
    train_data,
    train_labels,
    validation_data = (val_data, val_labels),
    batch_size = 64,
    epochs = 25,
    verbose = 1,
)

model1.save("Model1.h5")

#### Plot
Then, we plot the model performance by comparing the accuracy and loss of train and validation sets.


In [None]:
f, (ax1, ax2) = plt.subplots(1, 2, figsize = (12, 4))
t = f.suptitle("Model 1 Performance", fontsize = 12)
f.subplots_adjust(top = 0.85, wspace = 0.3)

max_epoch = len(history1.history["accuracy"])+1
epoch_list = list(range(1, max_epoch))
ax1.plot(epoch_list, history1.history["accuracy"], label = "Train Accuracy")
ax1.plot(epoch_list, history1.history["val_accuracy"], label = "Validation Accuracy")
ax1.set_xticks(np.arange(1, max_epoch, 5))
ax1.set_yticks(np.arange(0.75, 1, 0.02))
ax1.set_ylabel("Accuracy Value")
ax1.set_xlabel("Epoch")
ax1.set_title("Accuracy")
l1 = ax1.legend(loc = 4)

ax2.plot(epoch_list, history1.history["loss"], label = "Train Loss")
ax2.plot(epoch_list, history1.history["val_loss"], label = "Validation Loss")
ax2.set_xticks(np.arange(1, max_epoch, 5))
ax2.set_yticks(np.arange(0.0, 0.5, 0.05))
ax2.set_ylabel("Loss Value")
ax2.set_xlabel("Epoch")
ax2.set_title("Loss")
l2 = ax2.legend(loc = "best")

Both the training and validation accuracy are high (around 0.95), and after about 20 epochs the training accuracy reaches (almost) 1.
The validation loss, however, starts to increase steadily after about 6 epochs, which indicates that the model is overfitting.

In [None]:
history1_frame = pd.DataFrame(history1.history)
print("Model 1 maximum validation accuracy: {}".format(history1_frame["val_accuracy"].max()))
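Besides the maximum validation accuracy, it can help to report the epoch at which it (or the minimum validation loss) occurred, since that is roughly where overfitting starts to dominate. A small sketch that works directly on a `history.history`-style dictionary (`best_epoch` is a hypothetical helper, and the numbers below are made up for illustration, not our results):

```python
def best_epoch(history: dict, key: str = "val_loss", mode: str = "min") -> tuple:
    """Return (epoch, value) of the best entry for `key`, with epochs counted from 1."""
    values = history[key]
    pick = min if mode == "min" else max
    best_value = pick(values)
    return values.index(best_value) + 1, best_value


# illustrative values only, in the same format as history1.history
example_history = {
    "val_loss": [0.25, 0.18, 0.15, 0.16, 0.19, 0.24],
    "val_accuracy": [0.91, 0.94, 0.95, 0.95, 0.94, 0.93],
}
print(best_epoch(example_history))                                  # lowest validation loss
print(best_epoch(example_history, key="val_accuracy", mode="max"))  # highest validation accuracy
```

For the real model, `best_epoch(history1.history)` would give the epoch with the lowest validation loss; ties are resolved in favor of the earliest epoch.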

<a id="subsection-two"></a>
## 3.2 Pretrained Convolutional Network (VGG-19)

#### Model Architecture
We use the VGG-19 model as described in the article. The name refers to its 19 weight layers: 16 convolutional layers with 3x3 filters, followed by two fully connected layers of 4096 units each and a final dense layer of 1000 units (one per ImageNet class). The article, however, removes these last three fully connected layers from the original VGG-19 model (`include_top = False`) so that the network can be used as a feature extractor.
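With `include_top = False` and a 50 x 50 input, the spatial size after VGG-19's five 2 x 2 max-pooling layers (stride 2, no padding; the 3x3 convolutions use "same" padding) follows from repeated floor division, which tells us how many features `Flatten()` passes to the classifier head. A plain-Python sketch of this arithmetic, assuming the standard 512 filters in the last block:

```python
size = 50
sizes = [size]
for _ in range(5):          # one max-pooling layer at the end of each of the 5 blocks
    size = size // 2        # 2 x 2 pooling with stride 2 and 'valid' padding
    sizes.append(size)

flat_features = size * size * 512   # the last block has 512 filters
# trainable parameters of the Dense head used in the article (weights + biases)
head_params = (flat_features * 512 + 512) + (512 * 512 + 512) + (512 + 1)
print(sizes, flat_features, head_params)
```

So, for our small images, the base reduces each image to a 1 x 1 x 512 feature map, and with the base frozen only about 0.5 million head parameters are trained.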


In [None]:
# loading the VGG-19 model 
vgg = tf.keras.applications.vgg19.VGG19(include_top = False, weights = "imagenet", 
                                        input_shape = input_shape)

# Freeze the layers
for layer in vgg.layers:
    layer.trainable = False
    
# The pretrained base should not be trainable;
# otherwise the pretrained weights would be updated by backpropagation.
# (Fine-tuning some of the layers is explored later, in model 3.)
vgg.trainable = False

# Attach a head of Dense layers to perform the classification, as in the article
model2 = keras.Sequential([
    vgg,
    layers.Flatten(),
    layers.Dense(units = 512, activation = "relu"),
    layers.Dropout(rate = 0.3),
    layers.Dense(units = 512, activation = "relu"),
    layers.Dropout(rate = 0.3),
    layers.Dense(1, activation = "sigmoid")
])

In [None]:
print("Total Layers:", len(model2.layers))
print("Total trainable layers:", 
      sum([1 for l in model2.layers if l.trainable]))


#### Train the model

In [None]:
# specify loss function to be minimized & performance metrics
model2.compile(optimizer = tf.keras.optimizers.RMSprop(learning_rate = 1e-4),
                loss = "binary_crossentropy",
                metrics = ["accuracy"])
model2.summary()

In [None]:
history2 = model2.fit(train_data,
                    train_labels, 
                    validation_data = (val_data,val_labels),
                    batch_size = 64,
                    epochs = 25,
                    verbose = 1)

model2.save("Model2.h5")

#### Plot
Now that the model is fitted, we check its performance by plotting the accuracy and loss on both the training and validation sets.

In [None]:
f, (ax1, ax2) = plt.subplots(1, 2, figsize = (12, 4))
t = f.suptitle("Model 2 Performance", fontsize = 12)
f.subplots_adjust(top = 0.85, wspace = 0.3)

max_epoch = len(history2.history["accuracy"])+1
epoch_list = list(range(1, max_epoch))
ax1.plot(epoch_list, history2.history["accuracy"], label = "Train Accuracy")
ax1.plot(epoch_list, history2.history["val_accuracy"], label = "Validation Accuracy")
ax1.set_xticks(np.arange(1, max_epoch, 5))
ax1.set_yticks(np.arange(0.75, 1, 0.02))
ax1.set_ylabel("Accuracy Value")
ax1.set_xlabel("Epoch")
ax1.set_title("Accuracy")
l1 = ax1.legend(loc = 4)

ax2.plot(epoch_list, history2.history["loss"], label = "Train Loss")
ax2.plot(epoch_list, history2.history["val_loss"], label = "Validation Loss")
ax2.set_xticks(np.arange(1, max_epoch, 5))
ax2.set_yticks(np.arange(0.0, 0.5, 0.05))
ax2.set_ylabel("Loss Value")
ax2.set_xlabel("Epoch")
ax2.set_title("Loss")
l2 = ax2.legend(loc = "best")

The plot shows that the validation accuracy is quite wiggly and unstable. The accuracy also seems to have dropped compared to the basic CNN (model 1), although the model seems to overfit less than before.

In [None]:
history2_frame = pd.DataFrame(history2.history)
print("Model 2 maximum validation accuracy: {}".format(history2_frame["val_accuracy"].max()))

<a id="subsection-two-one"></a>
### Improving Model 2
Because of the wiggly validation curve and the lower performance of model 2, we try to improve it by adding early stopping and switching to the Adam optimizer.
Below, we first specify early stopping and then define the model architecture as before.

In [None]:
early_stopping = EarlyStopping(
    min_delta = 0.001, # minimum amount of change to count as an improvement
    patience = 5, # how many epochs to wait before stopping
    restore_best_weights = True,
)

model2_2 = keras.Sequential([
    vgg,
    layers.Flatten(),
    layers.Dense(units = 512, activation = "relu"),
    layers.Dropout(rate = 0.3),
    layers.Dense(units = 512, activation = "relu"),
    layers.Dropout(rate = 0.3),
    layers.Dense(1, activation = "sigmoid")
])
model2_2.summary()
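The `EarlyStopping` callback above monitors `val_loss` (the Keras default), counts an epoch as an improvement only if the loss drops by more than `min_delta`, and stops once `patience` epochs in a row have passed without improvement; with `restore_best_weights = True`, the weights from the best epoch are restored when early stopping triggers. The plain-Python sketch below mimics this rule on a list of validation losses. It is a simplified approximation written for illustration (`simulate_early_stopping` is hypothetical, not the Keras implementation), and the loss values are made up.

```python
def simulate_early_stopping(val_losses, min_delta=0.001, patience=5):
    """Approximate the EarlyStopping rule for a monitored loss (lower is better).

    Returns (stop_epoch, best_epoch), both counted from 1.
    """
    best = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:       # a large enough drop counts as an improvement
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:          # `patience` epochs in a row without improvement
                return epoch, best_epoch
    return len(val_losses), best_epoch    # ran out of epochs before stopping


# made-up validation losses: the loss bottoms out at epoch 3, then drifts upwards
losses = [0.30, 0.20, 0.15, 0.16, 0.17, 0.15, 0.18, 0.19, 0.21]
print(simulate_early_stopping(losses))
```

In this example training would stop after epoch 8 and the weights of epoch 3 would be restored; the loss at epoch 6 equals the best value but does not beat it by more than `min_delta`, so it does not reset the counter.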

#### Train the model

In [None]:
model2_2.compile(
    optimizer = "adam",
    loss = "binary_crossentropy",
    metrics = ["accuracy"]
)

history2_2 = model2_2.fit(
    train_data,
    train_labels,
    validation_data = (val_data,val_labels),
    batch_size = 64,
    epochs = 25,
    callbacks = [early_stopping],
    verbose = 1
)

model2_2.save("Model2_2.h5")

#### Plot

In [None]:
f, (ax1, ax2) = plt.subplots(1, 2, figsize = (12, 4))
t = f.suptitle("Model 2.2 Performance", fontsize = 12)
f.subplots_adjust(top = 0.85, wspace = 0.3)

max_epoch = len(history2_2.history["accuracy"])+1
epoch_list = list(range(1, max_epoch))
ax1.plot(epoch_list, history2_2.history["accuracy"], label = "Train Accuracy")
ax1.plot(epoch_list, history2_2.history["val_accuracy"], label = "Validation Accuracy")
ax1.set_xticks(np.arange(1, max_epoch, 5))
ax1.set_yticks(np.arange(0.75, 1, 0.02))
ax1.set_ylabel("Accuracy Value")
ax1.set_xlabel("Epoch")
ax1.set_title("Accuracy")
l1 = ax1.legend(loc = 4)

ax2.plot(epoch_list, history2_2.history["loss"], label = "Train Loss")
ax2.plot(epoch_list, history2_2.history["val_loss"], label = "Validation Loss")
ax2.set_xticks(np.arange(1, max_epoch, 5))
ax2.set_yticks(np.arange(0.0, 0.5, 0.05))
ax2.set_ylabel("Loss Value")
ax2.set_xlabel("Epoch")
ax2.set_title("Loss")
l2 = ax2.legend(loc = "best")

The validation curves are somewhat more stable and follow the same trend as the training curves. The loss also seems to have decreased, suggesting that the model is less prone to overfitting.

In [None]:
history2_2_frame = pd.DataFrame(history2_2.history)
print("Model 2.2 maximum validation accuracy: {}".format(history2_2_frame["val_accuracy"].max()))

<a id="subsection-three"></a>
## 3.3 Pretrained Convolutional Network with Data Augmentation and Fine Tuning

#### Model Architecture
In this section, we explore fine-tuning the weights of some layers of the pretrained VGG-19 model, combined with image augmentation. We use `ImageDataGenerator` from `tf.keras`, which generates randomly transformed (augmented) batches of images on the fly.

In [None]:
train_datagen = tf.keras.preprocessing.image.ImageDataGenerator(zoom_range = 0.05, 
                                                                rotation_range = 25,
                                                                width_shift_range = 0.05, 
                                                                height_shift_range = 0.05, 
                                                                shear_range = 0.05,
                                                                horizontal_flip = True, 
                                                                fill_mode = "nearest")

val_datagen = tf.keras.preprocessing.image.ImageDataGenerator()

# configurations as in article
# build image augmentation generators
train_generator = train_datagen.flow(train_data, train_labels, batch_size = 64, shuffle = True)
val_generator = val_datagen.flow(val_data, val_labels, batch_size = 64, shuffle = False)

The example transformations below show that the augmented images vary slightly from one another.

In [None]:
img_id = 0
sample_generator = train_datagen.flow(train_data[img_id:img_id+1], train_labels[img_id:img_id+1],
                                      batch_size = 1)
sample = [next(sample_generator) for i in range(0,5)]
fig, ax = plt.subplots(1,5, figsize = (16, 6))
print('Labels:', [item[1][0] for item in sample])
l = [ax[i].imshow(sample[i][0][0]) for i in range(0,5)]

Next, as in the article, we make the last two convolutional blocks of VGG-19 (block 4 and block 5) trainable, while the earlier blocks stay frozen:

In [None]:
vgg = tf.keras.applications.vgg19.VGG19(include_top = False, weights = "imagenet", 
                                        input_shape = input_shape)

# Unfreeze the base; the loop below re-freezes every layer before block 4
vgg.trainable = True

set_trainable = False
for layer in vgg.layers:
    if layer.name in ["block5_conv1", "block4_conv1"]:
        set_trainable = True
    if set_trainable:
        layer.trainable = True
    else:
        layer.trainable = False

# model based on article 
model3 = keras.Sequential([
    vgg,
    layers.Flatten(),
    layers.Dense(units = 512, activation = "relu"),
    layers.Dropout(rate = 0.3),
    layers.Dense(units = 512, activation = "relu"),
    layers.Dropout(rate = 0.3),
    layers.Dense(1, activation = "sigmoid")
])
model3.summary()

model3.compile(optimizer = tf.keras.optimizers.RMSprop(learning_rate = 1e-5),
                loss = "binary_crossentropy",
                metrics = ["accuracy"])

print("Total Layers:", len(model3.layers))
print("Total trainable layers:", sum([1 for l in model3.layers if l.trainable]))
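The loop above switches `set_trainable` on at the first of the two listed names it meets, `block4_conv1`, and never switches it off again, so every layer from block 4 onwards is unfrozen (the `block5_conv1` check is therefore redundant). The sketch below reproduces that rule on the convolutional layers of VGG-19 without loading Keras; the layer names and filter counts are hard-coded from the standard VGG-19 architecture as an assumption, and pooling layers are omitted because they have no parameters.

```python
# (block, number of 3x3 conv layers, filters) for the VGG-19 convolutional base
VGG19_BLOCKS = [(1, 2, 64), (2, 2, 128), (3, 4, 256), (4, 4, 512), (5, 4, 512)]

layers_info = []  # (layer name, parameter count)
in_channels = 3
for block, n_convs, filters in VGG19_BLOCKS:
    for i in range(1, n_convs + 1):
        params = 3 * 3 * in_channels * filters + filters  # weights + biases
        layers_info.append((f"block{block}_conv{i}", params))
        in_channels = filters

# same rule as the notebook loop: once a listed name is reached, all later layers are trainable
set_trainable = False
trainable_params = 0
for name, params in layers_info:
    if name in ["block5_conv1", "block4_conv1"]:
        set_trainable = True
    if set_trainable:
        trainable_params += params

total_params = sum(params for _, params in layers_info)
print(trainable_params, total_params)
```

Roughly 88% of the base's 20 million convolutional parameters are therefore updated during fine-tuning, which is consistent with using a very small learning rate (1e-5) to avoid destroying the pretrained features.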

#### Train the model

In [None]:
train_steps_per_epoch = train_generator.n // train_generator.batch_size
val_steps_per_epoch = val_generator.n // val_generator.batch_size
# fit_generator is deprecated; fit accepts generators directly
history3 = model3.fit(train_generator,
                      steps_per_epoch = train_steps_per_epoch,
                      epochs = 25,
                      validation_data = val_generator,
                      validation_steps = val_steps_per_epoch, 
                      callbacks = [early_stopping],
                      verbose = 1)


model3.save("Model3.h5")

#### Plot

In [None]:
f, (ax1, ax2) = plt.subplots(1, 2, figsize = (12, 4))
t = f.suptitle("Model 3 Performance", fontsize = 12)
f.subplots_adjust(top = 0.85, wspace = 0.3)

max_epoch = len(history3.history["accuracy"])+1
epoch_list = list(range(1, max_epoch))
ax1.plot(epoch_list, history3.history["accuracy"], label = "Train Accuracy")
ax1.plot(epoch_list, history3.history["val_accuracy"], label = "Validation Accuracy")
ax1.set_xticks(np.arange(1, max_epoch, 5))
ax1.set_yticks(np.arange(0.75, 1, 0.02))
ax1.set_ylabel("Accuracy Value")
ax1.set_xlabel("Epoch")
ax1.set_title("Accuracy")
l1 = ax1.legend(loc = 4)

ax2.plot(epoch_list, history3.history["loss"], label = "Train Loss")
ax2.plot(epoch_list, history3.history["val_loss"], label = "Validation Loss")
ax2.set_xticks(np.arange(1, max_epoch, 5))
ax2.set_yticks(np.arange(0.0, 0.5, 0.05))
ax2.set_ylabel("Loss Value")
ax2.set_xlabel("Epoch")
ax2.set_title("Loss")
l2 = ax2.legend(loc = "best")

The performance seems to have increased compared to the previous model. Moreover, neither the accuracy nor the loss curves show a visible trend of overfitting.

In [None]:
history3_frame = pd.DataFrame(history3.history)
print("Model 3 maximum validation accuracy: {}".format(history3_frame["val_accuracy"].max()))

<a id="subsection-three-one"></a>
### Improving Model 3
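We try to improve the model by switching to the Adam optimizer, to check whether this tuning increases the accuracy. Note that `model3_2` reuses the same `vgg` object as model 3, so its convolutional base starts from the weights already fine-tuned for model 3, and that Adam's default learning rate (0.001) is 100 times larger than the 1e-5 used for model 3.
Spoiler alert: this did not help.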

In [None]:
model3_2 = keras.Sequential([
    vgg,
    layers.Flatten(),
    layers.Dense(units = 512, activation = "relu"),
    layers.Dropout(rate = 0.3),
    layers.Dense(units = 512, activation = "relu"),
    layers.Dropout(rate = 0.3),
    layers.Dense(1, activation = "sigmoid")
])
model3_2.summary()

In [None]:
model3_2.compile(optimizer = "adam",
                loss = "binary_crossentropy",
                metrics = ["accuracy"])

# fit_generator is deprecated; fit accepts generators directly
history3_2 = model3_2.fit(
    train_generator, 
    steps_per_epoch = train_steps_per_epoch, 
    epochs = 25,
    validation_data = val_generator, 
    validation_steps = val_steps_per_epoch, 
    callbacks = [early_stopping],
    verbose = 1)

model3_2.save("Model3_2.h5")

#### Plot

In [None]:
f, (ax1, ax2) = plt.subplots(1, 2, figsize = (12, 4))
t = f.suptitle("Model 3.2 Performance", fontsize = 12)
f.subplots_adjust(top = 0.85, wspace = 0.3)

max_epoch = len(history3_2.history["accuracy"])+1
epoch_list = list(range(1, max_epoch))
ax1.plot(epoch_list, history3_2.history["accuracy"], label = "Train Accuracy")
ax1.plot(epoch_list, history3_2.history["val_accuracy"], label = "Validation Accuracy")
ax1.set_xticks(np.arange(1, max_epoch, 5))
ax1.set_yticks(np.arange(0.75, 1, 0.02))
ax1.set_ylabel("Accuracy Value")
ax1.set_xlabel("Epoch")
ax1.set_title("Accuracy")
l1 = ax1.legend(loc = 4)

ax2.plot(epoch_list, history3_2.history["loss"], label = "Train Loss")
ax2.plot(epoch_list, history3_2.history["val_loss"], label = "Validation Loss")
ax2.set_xticks(np.arange(1, max_epoch, 5))
ax2.set_yticks(np.arange(0.0, 0.5, 0.05))
ax2.set_ylabel("Loss Value")
ax2.set_xlabel("Epoch")
ax2.set_title("Loss")
l2 = ax2.legend(loc = "best")

As anticipated, the accuracy and loss are similar to those of the previous model, which indicates that the Adam optimizer did not improve performance. Furthermore, the validation accuracy is quite wiggly and unstable. In short, these adjustments did not improve the model.

In [None]:
history3_2_frame = pd.DataFrame(history3_2.history)
print("Model 3.2 maximum validation accuracy: {}".format(history3_2_frame["val_accuracy"].max()))

<a id="subsection-four"></a>
## 3.4 Customized Convolutional Network

Finally, based on the models from the article and our attempts to tune them, we built a customized model to increase the overall performance.

#### Attempts & Model Architecture
We tried multiple options and settings to see whether we could increase the accuracy:
- Multiple data augmentations and different settings for the data augmentation arguments
- More/fewer convolutional layers
- More/fewer filters
- Bigger/smaller kernel sizes
- Changing the padding
- More/fewer hidden layers in the classifier head
- More/fewer nodes in the hidden layers

Most changes did not improve the accuracy, so in the end we decided to keep the network simple. This keeps the training time low without noticeably affecting the accuracy. We did, however, include a lot of data augmentation to account for possible variations in the data.

**In the end, we settled on these options:**
- Data Augmentation
    - Random Contrast (factor 0.05, 0.10, and 0.15)
    - Random Flip (horizontal only, as vertical flipping drastically decreased the accuracy for reasons we could not determine)
    - Random Rotation (factor 0.05, 0.10, and 0.15)
 
 
- 3 convolutional layers, each followed by max pooling
    - Layer 1: 32 filters, kernel_size of 3, relu activation, and "same" padding.
    - Layer 2: 64 filters, kernel_size of 3, relu activation, and "same" padding.
    - Layer 3: 128 filters, kernel_size of 3, relu activation, and "same" padding.
    

- A classifier head with 2 hidden layers of 10 units each, both with ReLU activation.

- Include early stopping to prevent overfitting
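Because the augmentation layers are stacked, their effects compound. According to the Keras documentation, `RandomContrast(factor)` multiplies each channel's deviation from its mean by a random factor drawn from [1 - factor, 1 + factor], and `RandomRotation(factor)` rotates by a random fraction of a full turn drawn from [-factor, factor]. A plain-Python sketch of the resulting worst-case ranges for our three contrast and three rotation layers (`combined_ranges` is a hypothetical helper; any clipping of pixel values is ignored):

```python
import math


def combined_ranges(contrast_factors, rotation_factors):
    """Worst-case range of the total contrast multiplier and the largest total
    rotation (degrees) when several layers are applied one after another."""
    low = math.prod(1 - f for f in contrast_factors)    # contrast factors multiply
    high = math.prod(1 + f for f in contrast_factors)
    max_degrees = sum(f * 360 for f in rotation_factors)  # rotation angles add up
    return (round(low, 5), round(high, 5)), round(max_degrees, 5)


contrast, rotation = combined_ranges([0.05, 0.10, 0.15], [0.05, 0.10, 0.15])
print("contrast multiplier range:", contrast)
print("maximum rotation: +/-", rotation, "degrees")
```

Taken together, the three rotation layers can rotate an image by up to about 108 degrees, which is considerably more than any single layer's setting suggests.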

In [None]:
model4 = keras.Sequential([
    layers.InputLayer(input_shape = input_shape),
    
    # Data Augmentation
    preprocessing.RandomContrast(factor = 0.05),
    preprocessing.RandomContrast(factor = 0.10),
    preprocessing.RandomContrast(factor = 0.15),
    preprocessing.RandomFlip(mode = "horizontal"),
    preprocessing.RandomRotation(factor = 0.05),
    preprocessing.RandomRotation(factor = 0.10),
    preprocessing.RandomRotation(factor = 0.15),
    
    # First Convolutional Block
    layers.Conv2D(filters = 32, kernel_size = 3, activation = "relu", padding = "same"),
    layers.MaxPool2D(),

    # Second Convolutional Block
    layers.Conv2D(filters = 64, kernel_size = 3, activation = "relu", padding = "same"),
    layers.MaxPool2D(),

    # Third Convolutional Block
    layers.Conv2D(filters = 128, kernel_size = 3, activation = "relu", padding = "same"),
    layers.MaxPool2D(),


    # Classifier Head
    layers.Flatten(),
    layers.Dense(units = 10, activation = "relu"),
    layers.Dense(units = 10, activation = "relu"),
    layers.Dense(units = 1, activation = "sigmoid"),
])
model4.summary()

#### Train model
Since this is a two-class problem, we used the binary versions of cross-entropy and accuracy. We also chose the Adam optimizer, as it generally performs well.

In [None]:
model4.compile(
    optimizer = "adam",
    loss = "binary_crossentropy",
    metrics = ["accuracy"]
)

history4 = model4.fit(
    train_data,
    train_labels,
    validation_data = (val_data,val_labels),
    epochs = 40,
    callbacks = [early_stopping],
    verbose = 1
)
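For reference, binary cross-entropy averages -[y log(p) + (1 - y) log(1 - p)] over the samples, where y is the 0/1 label and p the sigmoid output, and binary accuracy counts a prediction as positive when p > 0.5. A numpy sketch of both (the helper names and the clipping constant, which mimics the small epsilon Keras uses to avoid log(0), are assumptions for illustration):

```python
import numpy as np


def binary_cross_entropy(y_true, p, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    y_true = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))


def binary_accuracy(y_true, p, threshold=0.5):
    """Share of samples whose thresholded prediction matches the label."""
    predicted = (np.asarray(p) > threshold).astype(int)
    return float(np.mean(predicted == np.asarray(y_true)))


labels_example = [1, 0, 1, 0]
probs_example = [0.9, 0.2, 0.4, 0.1]   # made-up sigmoid outputs
print(binary_cross_entropy(labels_example, probs_example))
print(binary_accuracy(labels_example, probs_example))   # the third sample is misclassified
```

Confident wrong predictions (p close to 0 for an infected cell) are penalized heavily, which is why the validation loss can keep rising while the validation accuracy barely changes.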

#### Plot

In [None]:
f, (ax1, ax2) = plt.subplots(1, 2, figsize = (12, 4))
t = f.suptitle("Model 4 Performance", fontsize = 12)
f.subplots_adjust(top = 0.85, wspace = 0.3)

max_epoch = len(history4.history["accuracy"])+1
epoch_list = list(range(1, max_epoch))
ax1.plot(epoch_list, history4.history["accuracy"], label = "Train Accuracy")
ax1.plot(epoch_list, history4.history["val_accuracy"], label = "Validation Accuracy")
ax1.set_xticks(np.arange(1, max_epoch, 5))
ax1.set_yticks(np.arange(0.75, 1, 0.02))
ax1.set_ylabel("Accuracy Value")
ax1.set_xlabel("Epoch")
ax1.set_title("Accuracy")
l1 = ax1.legend(loc = 4)

ax2.plot(epoch_list, history4.history["loss"], label = "Train Loss")
ax2.plot(epoch_list, history4.history["val_loss"], label = "Validation Loss")
ax2.set_xticks(np.arange(1, max_epoch, 5))
ax2.set_yticks(np.arange(0.0, 0.5, 0.05))
ax2.set_ylabel("Loss Value")
ax2.set_xlabel("Epoch")
ax2.set_title("Loss")
l2 = ax2.legend(loc = "best")

This appears to be one of our best-performing models: its validation accuracy is the highest of the four models, and both the training and validation loss stay low.

In [None]:
history4_frame = pd.DataFrame(history4.history)
print("Model 4 maximum validation accuracy: {}".format(history4_frame["val_accuracy"].max()))

<a id="section-four"></a>
# 4. Results
Now, we compare the models in the following steps:
1. Use all six models (the four main models and the two improved variants) to predict the test data;
2. Compute the confusion matrix and the accuracy, precision, recall, and F1-score of every model.

To plot the confusion matrices, we adapted a function written by [Roi Polanitzer](https://medium.com/@polanitzer/building-a-convolutional-neural-network-in-python-predict-digits-from-gray-scale-images-of-550d79b358b), with a few changes to suit our task.


In [None]:
# Confusion matrix function

def plot_confusion_matrix(cm,
                          classes,
                          title = "Confusion matrix",
                          cmap = plt.cm.Blues):
    """
    Plot the confusion matrix `cm` as a heat map, with the class names
    on both axes and the count written in each cell.
    """
    plt.imshow(cm, interpolation = "nearest", cmap = cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation = 45)
    plt.yticks(tick_marks, classes)
    thresh = cm.max() / 2
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, 
                 cm[i, j],
                 horizontalalignment = "center",
                 color = "white" if cm[i, j] > thresh else "black")
        
    plt.tight_layout()
    plt.ylabel("True label")
    plt.xlabel("Predicted label")

### Confusion Matrix
First, we use this function to visualize the confusion matrix of every model; we discuss what the results show at the end of Section 4.
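Before looking at the plots, it helps to recall how the matrix is laid out. Following scikit-learn's convention, row *i* counts images whose true label is *i* and column *j* counts images predicted as *j*; assuming `0 = Uninfected` and `1 = Infected` (the order of `cm_plot_labels`), the matrix reads `[[TN, FP], [FN, TP]]`. The pure-Python sketch below rebuilds that layout from hypothetical labels and sigmoid outputs. Like `np.round` in the cells below, it assigns class 1 only to probabilities strictly above 0.5.

```python
def confusion_matrix_2x2(y_true, y_prob, threshold=0.5):
    """Return [[TN, FP], [FN, TP]] for 0/1 labels and predicted probabilities."""
    cm = [[0, 0], [0, 0]]
    for y, p in zip(y_true, y_prob):
        # Same decision as np.round for probabilities in [0, 1] (0.5 rounds to 0)
        y_hat = 1 if p > threshold else 0
        cm[y][y_hat] += 1  # row = true label, column = predicted label
    return cm

labels = [0, 0, 1, 1, 1]           # hypothetical: 0 = Uninfected, 1 = Infected
probs = [0.1, 0.7, 0.8, 0.3, 0.95]
print(confusion_matrix_2x2(labels, probs))  # [[1, 1], [1, 2]]
```

Reading the result: one uninfected image is correctly rejected (TN), one is flagged as infected (FP), one infected image is missed (FN), and two are detected (TP).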

#### Model 1: Basic Convolutional Network from scratch

In [None]:
# Model 1
model1 = load_model("Model1.h5")

y_pred_model1 = model1.predict(test_data)
y_pred1 = np.round(y_pred_model1)

cm_plot_labels = ["Uninfected", "Infected"]
cm1 = skmet.confusion_matrix(y_true = test_labels, y_pred = y_pred1)

plot_confusion_matrix(cm = cm1, classes = cm_plot_labels, title = "Model 1")


#### Model 2: Pretrained Convolutional Network (VGG-19)

In [None]:
# Model 2
model2 = load_model("Model2.h5")

y_pred_model2 = model2.predict(test_data)
y_pred2 = np.round(y_pred_model2)

cm2 = skmet.confusion_matrix(y_true = test_labels, y_pred = y_pred2)

plot_confusion_matrix(cm = cm2, classes = cm_plot_labels, title = "Model 2")


#### Model 2.2: Pretrained Convolutional Network (VGG-19) with Model Improvement

In [None]:
# Model 2.2
model2_2 = load_model("Model2_2.h5")

y_pred_model2_2 = model2_2.predict(test_data)
y_pred2_2 = np.round(y_pred_model2_2)

cm2_2 = skmet.confusion_matrix(y_true = test_labels, y_pred = y_pred2_2)

plot_confusion_matrix(cm = cm2_2, classes = cm_plot_labels, title = "Model 2.2")

#### Model 3: Pretrained Convolutional Network with Data Augmentation and Fine-Tuning

In [None]:
# Model 3
model3 = load_model("Model3.h5")

y_pred_model3 = model3.predict(test_data)
y_pred3 = np.round(y_pred_model3)

cm3 = skmet.confusion_matrix(y_true = test_labels, y_pred = y_pred3)

plot_confusion_matrix(cm = cm3, classes = cm_plot_labels, title = "Model 3")

#### Model 3.2: Pretrained Convolutional Network with Data Augmentation and Fine-Tuning & Model Improvement

In [None]:
# Model 3.2
model3_2 = load_model("Model3_2.h5")

y_pred_model3_2 = model3_2.predict(test_data)
y_pred3_2 = np.round(y_pred_model3_2)

cm3_2 = skmet.confusion_matrix(y_true = test_labels, y_pred = y_pred3_2)

plot_confusion_matrix(cm = cm3_2, classes = cm_plot_labels, title = "Model 3.2")

#### Model 4: Customized Convolutional Network

In [None]:
# Model 4
model4 = load_model("Model4.h5")

y_pred_model4 = model4.predict(test_data)
y_pred4 = np.round(y_pred_model4)

cm4 = skmet.confusion_matrix(y_true = test_labels, y_pred = y_pred4)

plot_confusion_matrix(cm = cm4, classes = cm_plot_labels, title = "Model 4")

### Model comparison
Next, we compute the accuracy, precision, recall, and F1-score of each model from its test-set predictions and collect them in a single data frame to compare the models side by side.

In [None]:
# Test-set predictions of every model, in the order of the comparison table
predictions = {
    "Model 1": y_pred1,
    "Model 2": y_pred2,
    "Model 2.2": y_pred2_2,
    "Model 3": y_pred3,
    "Model 3.2": y_pred3_2,
    "Model 4": y_pred4,
}

# Compute accuracy, precision, recall, and F1-score for each model
data = {"Accuracy": [], "Precision": [], "Recall": [], "F1-score": []}
for y_pred in predictions.values():
    data["Accuracy"].append(skmet.accuracy_score(test_labels, y_pred))
    data["Precision"].append(skmet.precision_score(test_labels, y_pred))
    data["Recall"].append(skmet.recall_score(test_labels, y_pred))
    data["F1-score"].append(skmet.f1_score(test_labels, y_pred))

# Create dataframe
model_comp = pd.DataFrame(data, index = list(predictions.keys()))
model_comp

<a id="subsection-fourone"></a>
## Table interpretation
### 1. Accuracy
Among all the models, `Model 3` (the pretrained model with data augmentation and fine-tuning) appears to be the most accurate on the test set. `Model 1`, `Model 3.2`, and `Model 4` achieve similarly high accuracy scores.
In Sarkar (2019), the Basic CNN, pretrained VGG-19, and fine-tuned VGG-19 models reached accuracies of 0.9497, 0.9376, and 0.9600, respectively. Allowing for randomness in the data split and model fitting, our results appear to broadly replicate these findings.

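### 2. Precision
Precision is again the highest for `Model 3`. Precision is the proportion of images predicted as infected that are actually infected (assuming label 1 encodes infected images, as the confusion-matrix labels `["Uninfected", "Infected"]` indicate, and since scikit-learn treats label 1 as the positive class by default). A precision of 0.972 therefore means that about 97.2% of Model 3's "Infected" predictions are correct. The precision scores of `Model 1`, `Model 3.2`, and `Model 4` are similarly high.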

### 3. Recall / Sensitivity
Recall (sensitivity) is the proportion of truly infected images that the model detects, which matters in screening because every missed infection is a false negative. `Model 3` again has the highest recall. Apart from `Model 2` and `Model 2.2`, whose recall scores are around 0.90, every model performs well, with recall above 0.93.

### 4. F1-Score
The F1-score is the harmonic mean of precision and recall, so it is high only when both are high. `Model 3` again performs best, and its high F1-score indicates that it produces few false positives and few false negatives.
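All four metrics can be computed from the confusion-matrix counts. The sketch below uses plain Python and hypothetical counts (not our actual results), taking "Infected" as the positive class. It shows that F1, as a harmonic mean, is pulled toward the lower of precision and recall and is therefore smaller than their arithmetic mean whenever the two differ.

```python
def metrics_from_counts(tn, fp, fn, tp):
    """Accuracy, precision, recall, and F1 from binary confusion-matrix counts.

    Assumes tp + fp > 0 and tp + fn > 0 (scikit-learn handles the zero cases
    through its `zero_division` parameter instead).
    """
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    precision = tp / (tp + fp)  # share of predicted-infected images that are infected
    recall = tp / (tp + fn)     # share of infected images that are detected
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1

# Hypothetical counts: 90 TN, 10 FP, 40 FN, 60 TP
acc, prec, rec, f1 = metrics_from_counts(tn=90, fp=10, fn=40, tp=60)
print(acc, prec, rec, f1)  # 0.75, ~0.857, 0.6, ~0.706
print((prec + rec) / 2)    # arithmetic mean, ~0.729, larger than F1
```

Equivalently, F1 = 2TP / (2TP + FP + FN), here 120 / 170.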

### Robustness analysis
When we compared the accuracy scores on the training data with those on the test data, they fell in a similar range. This suggests that the models do not overfit strongly and that their performance holds up on unseen images from the same dataset. However, this does not guarantee a reliable, general-purpose solution for malaria detection with ML, because we assume that new data are as clean and well-organized as the provided dataset. More specific limitations are discussed in [Conclusion & Discussion](#section-five).
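One way to make this comparison explicit is to compute, for each model, the gap between training and test accuracy. The sketch below is illustrative only: the model names and accuracy values are hypothetical placeholders, and the 0.02 tolerance is an arbitrary assumption rather than a standard cutoff. In the notebook, the real values could come from Keras' `model.evaluate(...)`, which returns the loss followed by the compiled metrics.

```python
def generalization_gaps(train_acc, test_acc, tolerance=0.02):
    """Return {model: (gap, within_tolerance)} where gap = train - test accuracy."""
    report = {}
    for name, train in train_acc.items():
        gap = train - test_acc[name]
        report[name] = (round(gap, 4), abs(gap) <= tolerance)
    return report

# Hypothetical accuracies, not results from this project
train_acc = {"Model A": 0.97, "Model B": 0.99}
test_acc = {"Model A": 0.96, "Model B": 0.93}
for name, (gap, ok) in generalization_gaps(train_acc, test_acc).items():
    print(name, gap, "similar" if ok else "large gap")
```

A large positive gap (as for the hypothetical "Model B") would point to overfitting, whereas a small gap is consistent with the pattern described above.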

<a id="section-five"></a>
# 5. Overall Conclusion & Discussion

Our project aimed to 1) build the best-performing algorithm to detect and classify malaria parasites in thin blood smear images and 2) replicate the findings of Sarkar (2019). We trained a basic convolutional network from scratch, a pretrained convolutional network (VGG-19), a pretrained convolutional network with data augmentation and fine-tuning, and a customized convolutional network, as well as two variants (Models 2.2 and 3.2) that changed several hyperparameters to improve the pretrained models. Based on the confusion matrices and the accuracy, precision, recall, and F1-scores, we conclude that **Model 3 (Pretrained Convolutional Network with Data Augmentation and Fine-Tuning)** performs the best among all the models. Because the test set was held out from training and validation, these test scores should give a reasonable estimate of how the models perform on new images from the same dataset.

However, our project has two main limitations. First, our results might not generalize to all populations infected with malaria parasites, because we trained and tested our models solely on thin blood smear images. Second, the images we used were clean and well-organized (there was no noise in the images), so they might not represent real-world data, which tend to be noisier and less standardized.

During the project, we also identified some room for improvement. First, we could try different model configurations, for example decreasing the learning rate, which may allow the model to converge to a better set of weights. Second, we could use other datasets of malaria-infected blood images to add variation to the training data. In addition, we could include images of other diseases so that the model learns to differentiate malaria from them. These changes could improve performance and make the models better reflect real-world conditions.
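A common way to lower the learning rate during training is to reduce it whenever the validation loss stops improving; Keras offers this as the `ReduceLROnPlateau` callback (with parameters such as `factor`, `patience`, and `min_lr`). The pure-Python sketch below is a simplified illustration of that rule, not Keras' implementation, and the starting rate, factor, patience, and loss values are arbitrary assumptions.

```python
def plateau_schedule(val_losses, lr=1e-3, factor=0.5, patience=2, min_lr=1e-6):
    """Learning rate in effect after each epoch under a simplified reduce-on-plateau rule."""
    best = float("inf")
    wait = 0
    schedule = []
    for loss in val_losses:
        if loss < best:
            best, wait = loss, 0  # improvement: reset the patience counter
        else:
            wait += 1
            if wait >= patience:  # no improvement for `patience` epochs
                lr = max(lr * factor, min_lr)
                wait = 0
        schedule.append(lr)
    return schedule

# Hypothetical validation losses that stall after the third epoch
print(plateau_schedule([0.40, 0.30, 0.25, 0.26, 0.27, 0.25, 0.26]))
# [0.001, 0.001, 0.001, 0.001, 0.0005, 0.0005, 0.00025]
```

In the notebook itself, the corresponding step would be to add a `keras.callbacks.ReduceLROnPlateau(monitor="val_loss", ...)` instance to the `callbacks` list passed to `fit`, or simply to compile with a smaller fixed rate such as `keras.optimizers.Adam(learning_rate=1e-4)`.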

<a id="section-six"></a>
# References

Feleke, D. G., Alemu, Y., & Yemanebirhane, N. (2021). Performance of rapid diagnostic tests, microscopy, loop-mediated isothermal amplification (LAMP) and PCR for malaria diagnosis in Ethiopia: a systematic review and meta-analysis. *Malaria Journal*, *20*(1), 1-11.

Polanitzer, R. (2022, February 5). Building a convolutional neural network in python; predict digits from gray-scale images of... Medium. Retrieved December 19, 2022, from https://medium.com/@polanitzer/building-a-convolutional-neural-network-in-python-predict-digits-from-gray-scale-images-of-550d79b358b 

Rajaraman, S., Antani, S. K., Poostchi, M., Silamut, K., Hossain, M. A., Maude, R. J., ... & Thoma, G. R. (2018). Pre-trained convolutional neural networks as feature extractors toward improved malaria parasite detection in thin blood smear images. *PeerJ*, *6*, e4568.

Sarkar, D. (2019). *Detecting malaria with Deep Learning*. Medium. Retrieved from https://towardsdatascience.com/detecting-malaria-with-deep-learning-9e45c1e34b60 