# Identifying Pneumonia with CNNs

### Deep Learning-Based Pneumonia Diagnosis from Chest X-ray Images | Transfer Learning & Keras Tuner

This project aims to develop an accurate and efficient machine learning model for the early detection and diagnosis of pneumonia using deep learning techniques. It's prompted by Kaggle's Chest-X-Ray Pneumonia Dataset.

**From Kaggle:**

*"The dataset is meticulously organized into three folders: train, test, and validation. Each folder contains subfolders for each image category, namely Pneumonia and Normal. The dataset comprises 5,863 X-Ray images (JPEG format) across two categories (Pneumonia/Normal).
The chest X-ray images (anterior-posterior) were selected from retrospective cohorts of pediatric patients aged one to five years old from the Guangzhou Women and Children’s Medical Center, Guangzhou. All chest X-ray imaging was performed as part of the patients' routine clinical care."*


## Filepaths

In [1]:
import os
import cv2 

current_path = os.getcwd()

testing_path = f"{current_path}/test"
training_path = f"{current_path}/train"
validation_path = f"{current_path}/val"

# print("File Paths:",
#       f"Testing: {testing_path}",
#       f"Training: {training_path}"
# ,
#       f"Validation: {validation_path}",
#       sep="\n"
# )

## Libraries

In [2]:
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras

### Layers

In [3]:
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping
from keras import Sequential
from keras.layers import Activation, BatchNormalization, Conv2D, Dense, Dropout, Flatten, MaxPooling2D, GlobalAveragePooling2D
from keras.applications import ResNet50

### Optimizers

In [4]:
from tensorflow.keras.optimizers import Adam, SGD,RMSprop, Adadelta, Nadam
from tensorflow.keras.regularizers import l2
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping

### Metrics

In [5]:
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.model_selection import train_test_split

## Loading Data into Sets

In [6]:
# Collecting current image sizes:
files = (os.listdir(f"{training_path}/normal"))

image = cv2.imread(f"{training_path}/normal/{files[0]}")
width = image.shape[1]
height = image.shape[0]

print("The images are {}x{}".format(width, height)) # Too big, recommended reshaping

The images are 2090x1858


In [7]:
IMAGE_SIZE = (255, 255)
BATCH_SIZE = 32
SEED_TRAIN = 7 # Enables reproducibility of experiments

In [8]:
# Using Keras to collect the data from the directories. It automatically recognizes the classification "normal", "pneumonia"

training_dataset = keras.preprocessing.image_dataset_from_directory(
    directory=training_path,
    image_size=IMAGE_SIZE,
    batch_size=BATCH_SIZE,
    seed=SEED_TRAIN,
    color_mode='rgb', # (255, 255, 255) x 3-Dimensional Output
    label_mode='categorical',
    shuffle=True
)

validation_dataset = keras.preprocessing.image_dataset_from_directory(
    directory=validation_path,
    image_size=IMAGE_SIZE,
    batch_size=BATCH_SIZE,
    seed=SEED_TRAIN,
    color_mode='rgb',
    label_mode='categorical',
    shuffle=True
)

testing_dataset = keras.preprocessing.image_dataset_from_directory(
    directory=testing_path,
    image_size=IMAGE_SIZE,
    batch_size=BATCH_SIZE,
    seed=SEED_TRAIN,
    color_mode='rgb',
    label_mode='categorical',
    shuffle=False
)

Found 5216 files belonging to 2 classes.
Found 16 files belonging to 2 classes.
Found 624 files belonging to 2 classes.


## Pretrained ResNet Model with Keras Tuner - Transfered Learning & HP Optimization

[Keras Tuner Documentation](https://keras.io/keras_tuner/) -> A Tuner Searches for the best architecture (hyperparameters) for the model, including number of perceptrons, the best optimizer and the loss function

In [9]:
import keras_tuner
from keras.models import Sequential, Model

### Step 1: Define Tuner Function with all Hyperparameters

In [10]:
def build_model(hp):
    """Params: hp, hyperparameters"""
    
    LEARNING_RATES = [1e-2, 1e-3, 1e-4] # Optimizers learning rate
    
    # Initialize pretrained model to build on top of
    pretrained_model = ResNet50(
        include_top=False,
        weights='imagenet',
        input_shape=(IMAGE_SIZE[0], IMAGE_SIZE[1], 3) # rgb dimensions
    )
    
    # Freeze the layers of the pre-trained model! It doesn't need fine-tuning
    for layer in pretrained_model.layers:
        layer.trainable = False
        
    # Flattening the Output of ResNet50 to add to next layer:
    output = pretrained_model.output
    output = Dropout(0.2)(output)
    output = Flatten()(output)
    
    preds = Dense(2, activation='softmax')(output) # 2: normal, pneumonia
    
    # Stacking the existing base_model on top of the new prediction layer 
    fine_tuning_candidate = Model(inputs=pretrained_model.input, outputs=preds)
    
    # Optimizers. The hp parameter sets the iteration's compiler during the tuner search
    adam = Adam(hp.Choice('learning_rate',values=LEARNING_RATES))
    sgd = SGD(hp.Choice('learning_rate',values=LEARNING_RATES))
    rmsprop = RMSprop(hp.Choice('learning_rate',values=LEARNING_RATES))
    nadam = Nadam(hp.Choice('learning_rate',values=LEARNING_RATES))
    adadelta = Adadelta(hp.Choice('learning_rate',values=LEARNING_RATES))
    
    optimizer = hp.Choice('optimizer', values = ['adam','sgd','rmsprop','nadam','adadelta'])
    
    # Compiling our candidate
    fine_tuning_candidate.compile(optimizer=optimizer,
                           loss = 'categorical_crossentropy',
                           metrics = ['accuracy'])
    
    return fine_tuning_candidate

### Step 2: Perform Tuner Search by defining metric to be optimized

In [13]:
tuner = keras_tuner.RandomSearch(
    build_model,  # Pass the build_model function that defines the model architecture
    keras_tuner.Objective("accuracy", direction="max"),  # Optimization objective 
    max_trials=5,
    executions_per_trial=1,
)

tuner.search(
    # Finding the best set of hyperparameters to achieve optimal performance
    training_dataset, 
    epochs = 6, 
    validation_data=validation_dataset,
    steps_per_epoch = len(training_dataset)//32 , # Goes through all of the data
    validation_steps=len(validation_dataset),
)

Trial 5 Complete [00h 01m 22s]
accuracy: 0.956250011920929

Best accuracy So Far: 0.96875
Total elapsed time: 00h 06m 39s
INFO:tensorflow:Oracle triggered exit


### Step 3: Use the best parameters as the final model

In [14]:
# Best Model
print(tuner.get_best_hyperparameters()[0].values)

# Fitting!
model = tuner.get_best_models(num_models=1)[0]
model.fit(training_dataset, 
    epochs = 20, 
    validation_data=validation_dataset,
    steps_per_epoch = len(training_dataset),
    validation_steps=len(validation_dataset),
)

{'learning_rate': 0.0001, 'optimizer': 'adam'}
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.callbacks.History at 0x1af92209250>

## Saving the Model

In [15]:
cwd = os.getcwd()
path = f"{cwd}/pneumonia_detector"

# Create the directory if it doesn't exist
if not os.path.exists(path):
    os.makedirs(path)
    
model.save(path) # Save




INFO:tensorflow:Assets written to: C:\Users\gabri\OneDrive\Desktop\Coding\A.I. and Machine Learning\pneumonia-chest-xray/pneumonia_detector\assets


INFO:tensorflow:Assets written to: C:\Users\gabri\OneDrive\Desktop\Coding\A.I. and Machine Learning\pneumonia-chest-xray/pneumonia_detector\assets


## Predicting and Evaluating

In [16]:
# from keras.models import load_model

# Loading the model
# model = load_model(path)


# Making a prediction
predictions = model.predict(testing_dataset)
test_predicted_labels = np.argmax(predictions, axis=1)

# Evaluating predicted vs. true labels with one_hot_encoding
test_true_labels = []
for _, labels in testing_dataset:
    test_true_labels.extend(labels.numpy())  # Convert the labels tensor to a numpy array and add it to the list

test_true_labels = np.array(test_true_labels)  # Convert to NumPy array if not already

def one_hot_to_category(one_hot_encoded, categories):
    index = np.argmax(one_hot_encoded, axis=1)
    return [categories[i] for i in index]

test_true_labels = one_hot_to_category(test_true_labels,list(range(len(['normal', 'pneumonia']))))


report = classification_report(test_true_labels, test_predicted_labels, target_names=['normal', 'pneumonia'], digits=4)
print(report)

              precision    recall  f1-score   support

      normal     0.9828    0.4872    0.6514       234
   pneumonia     0.7638    0.9949    0.8641       390

    accuracy                         0.8045       624
   macro avg     0.8733    0.7410    0.7578       624
weighted avg     0.8459    0.8045    0.7844       624

