### Authors: Prof. Dr. Soumi Ray, Ravi Teja Kothuru and Abhay Srivastav

### Acknowledgements:
I would like to thank my teammates Prof. Dr. Soumi Ray and Abhay Srivastav for their guidance and support throughout this project.

**Title of the Project:** Comparative Analysis of Image-Based and Feature-Based Approaches for Pneumonia Detection in Chest X-rays

**Description of the Project:** This project focuses on detecting pneumonia from chest X-ray images using machine learning and deep learning techniques (Rajpurkar et al., 2017; Wang et al., 2017). Using a dataset of annotated pneumonia and normal chest X-ray images, we develop and compare image-based and feature-based approaches. Our goal is to identify the most effective method for accurate and interpretable pneumonia detection, contributing to improved patient outcomes through early diagnosis and treatment. The resulting model classifies each patient's chest X-ray image as either pneumonia (1) or no pneumonia (0).

**Objectives of the Project:** 

- **Image Analysis:** Develop and evaluate deep learning models, particularly Convolutional Neural Networks (CNNs), that perform end-to-end image classification. The models process raw chest X-ray images directly and classify them as normal or pneumonia.

- **Feature Analysis:** Extract meaningful features from the chest X-ray images and use them as inputs to train and evaluate traditional machine learning models. The process includes feature extraction, selection, and transformation, followed by the application of machine learning algorithms such as Support Vector Machines (SVMs) and Random Forests.
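
As a rough illustration of the feature-extraction step, the sketch below computes a few first-order intensity statistics (mean, variance, skewness, kurtosis, and histogram entropy) from a grayscale image. It is a minimal sketch and not the project's actual extractor: the function name `first_order_features`, the choice of statistics, and the toy 8-bit image are assumptions made for illustration, and plain Python lists stand in for a real image array.

```python
import math
from collections import Counter


def first_order_features(image):
    """Compute simple intensity statistics for a 2-D grayscale image.

    `image` is a list of rows of integer pixel values (a hypothetical
    input format chosen to keep this sketch dependency-free).
    """
    pixels = [value for row in image for value in row]
    n = len(pixels)
    if n == 0:
        raise ValueError("image must contain at least one pixel")

    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n  # population variance
    std = math.sqrt(variance)

    # Standardized third and fourth moments; set to 0 for a flat image.
    if std > 0:
        skewness = sum((p - mean) ** 3 for p in pixels) / (n * std ** 3)
        kurtosis = sum((p - mean) ** 4 for p in pixels) / (n * std ** 4)
    else:
        skewness = 0.0
        kurtosis = 0.0

    # Shannon entropy (in bits) of the intensity histogram.
    counts = Counter(pixels)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())

    return {
        "mean": mean,
        "variance": variance,
        "skewness": skewness,
        "kurtosis": kurtosis,
        "entropy": entropy,
    }


# Toy image (assumption): half the pixels black, half white.
toy_image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
]
features = first_order_features(toy_image)
print(features)
```

Each image would yield one such feature dictionary; stacking them row by row gives the tabular input that a classifier such as an SVM or Random Forest expects.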

**Name of the Dataset:** This project uses the Chest X-ray dataset from the research paper **Labeled Optical Coherence Tomography (OCT) and Chest X-Ray Images for Classification**.

**Description of the Dataset:** The dataset contains chest X-ray images, each labeled as either normal or pneumonia and organized into separate train and test folders. The target variable for classification is whether a patient has pneumonia or not.

**Dataset Source:** 

- https://data.mendeley.com/datasets/rscbjbr9sj/2

**Type of the Dataset:**

- X-ray Images

**Description of Dataset:** 
The dataset provides the following:
- Separate folders for training and for validating/testing the model.
- A sufficient number of chest X-ray images to train a model to detect pneumonia.
- A binary target variable indicating whether the patient has pneumonia.

**Goal of the Project using this Dataset:**
The goal of this project is to conduct a comprehensive comparative analysis of image-based and feature-based approaches for pneumonia detection using chest X-ray images. By evaluating the performance, robustness, and interpretability of deep learning and traditional machine learning models, we aim to identify the most effective method for accurately classifying chest X-rays as normal or pneumonia. This comparison will provide valuable insights into the strengths and limitations of each approach, ultimately contributing to improved detection and diagnosis of pneumonia, which can enhance patient outcomes and survival rates.

**Why did we choose this dataset?**
We selected this dataset for the following reasons:
- The dataset is extensive, providing a large number of images suitable for evaluating and training deep learning models.
- It aligns well with the project's objectives by offering a challenging and realistic scenario for developing an image classification model using deep learning, specifically for Chest X-ray images.
- The images are labeled with two classes (normal and pneumonia), enabling the development of a binary classification model.
- It is publicly available, facilitating easy access for research and development purposes.

**Size of dataset:**
- Total size of all images = 1.27 GB
- Dataset has 2 folders:
  -  **Train:**
    -  Normal (without Pneumonia) = 1349 images
    -  Pneumonia = 3883 images
  -  **Test:**
    -  Normal (without Pneumonia) = 234 images
    -  Pneumonia = 390 images
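
These counts can be checked against a local copy of the dataset. The sketch below walks one split folder and counts the image files in each class sub-folder. The helper name `count_images_per_class`, the set of file extensions, and the `train/NORMAL`-style layout are assumptions for illustration and may differ from the actual download; the demo runs on a throwaway directory rather than the real data.

```python
import tempfile
from pathlib import Path

# Assumed image extensions; adjust to match the actual files.
IMAGE_EXTENSIONS = {".jpeg", ".jpg", ".png"}


def count_images_per_class(split_dir):
    """Return {class_folder_name: image_count} for one split directory."""
    counts = {}
    for class_dir in sorted(Path(split_dir).iterdir()):
        if not class_dir.is_dir():
            continue
        counts[class_dir.name] = sum(
            1
            for path in class_dir.iterdir()
            if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts


# Demo on a temporary directory so the sketch runs without the dataset.
with tempfile.TemporaryDirectory() as tmp:
    for class_name, n_files in (("NORMAL", 2), ("PNEUMONIA", 3)):
        class_dir = Path(tmp) / "train" / class_name
        class_dir.mkdir(parents=True)
        for i in range(n_files):
            (class_dir / f"img_{i}.jpeg").touch()
    demo_counts = count_images_per_class(Path(tmp) / "train")

print(demo_counts)
```

Pointing `count_images_per_class` at the real train and test folders gives a quick consistency check before a long feature-extraction run.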
    
**Expected Behaviors and Problem Handling:**
- Classify Chest X-ray images with high accuracy.
- Handle variations in image quality, resolution, and orientation.
- Be robust to noise and artifacts in the images.
- Provide interpretable results.

**Issues to focus on:**
- Improving model interpretability and explainability.
- Optimizing model performance on a held-out test set.
- Following AI Ethics and Data Safety practices.
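
Because the two classes are not equally represented, accuracy alone can hide weak performance on the smaller class. The following sketch computes accuracy together with sensitivity, specificity, and precision from binary labels, using the 1 = pneumonia, 0 = normal convention described earlier. The function name and the toy labels are illustrative assumptions, not results from this project.

```python
def binary_classification_metrics(y_true, y_pred):
    """Confusion-matrix metrics for labels 1 (pneumonia) and 0 (normal)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def safe_ratio(numerator, denominator):
        # Report 0.0 instead of failing when a class is absent.
        return numerator / denominator if denominator else 0.0

    return {
        "accuracy": safe_ratio(tp + tn, len(pairs)),
        "sensitivity": safe_ratio(tp, tp + fn),  # recall on pneumonia
        "specificity": safe_ratio(tn, tn + fp),  # recall on normal
        "precision": safe_ratio(tp, tp + fp),
    }


# Toy labels (made up for illustration only).
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
metrics = binary_classification_metrics(y_true, y_pred)
print(metrics)
```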

# Import all the required files and libraries

In [21]:
import os
import ssl

# Disable SSL certificate verification
ssl._create_default_https_context = ssl._create_unverified_context

# Automatically reload imported modules when their source code changes
%load_ext autoreload
%reload_ext autoreload
%autoreload 2

# Import the local module that provides the feature extraction class
from cxr_image_features_extraction import CxrImageFeatureExtraction

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


# Perform Chest X-ray Images Feature Extraction

## Create an object of the Image Feature Extraction class

In [22]:
image_feature_extraction = CxrImageFeatureExtraction()

## Fetch the absolute paths of the features Excel file and the image dataset

In [23]:
excel_file_name = image_feature_extraction.fetch_images_features_excel_file_path()
print(f"Absolute path of the Excel file where the features will be updated - {excel_file_name}")

Absolute path of the Excel file where the features will be updated - /Users/raviteja/Documents/Teja_Career/Master_Degree/USD/MS_AAI/AAI-501/Final_Project/pneumonia-detection-in-chest-X-rays/image_information/chest_xray_images_features.xlsx


In [24]:
# Define the path to the dataset
dataset_path = image_feature_extraction.get_base_path_of_dataset() + "_nrm"
print(f"Normalized Dataset Path = {dataset_path}")

# Fetch train, test, NORMAL and PNEUMONIA folder names
train_folder_name = str(image_feature_extraction.train_test_image_dirs[0])
test_folder_name = str(image_feature_extraction.train_test_image_dirs[1])

normal_img_folder_name = str(image_feature_extraction.normal_pneumonia_image_dirs[0])
pneumonia_img_folder_name = str(image_feature_extraction.normal_pneumonia_image_dirs[1])

# Define the paths to the train and test datasets
# Train
train_normal = os.path.join(dataset_path, train_folder_name + "_nrm", normal_img_folder_name + "_nrm")
train_pneumonia = os.path.join(dataset_path, train_folder_name + "_nrm", pneumonia_img_folder_name + "_nrm")

# Test
test_normal = os.path.join(dataset_path, test_folder_name + "_nrm", normal_img_folder_name + "_nrm")
test_pneumonia = os.path.join(dataset_path, test_folder_name + "_nrm", pneumonia_img_folder_name + "_nrm")

# Print the paths to the train and test datasets
print("\nNormalized Train Images")
print("************************")
print(f"NORMAL = {train_normal}")
print(f"\nPNEUMONIA = {train_pneumonia}")

print("\n\nNormalized Test Images")
print("***************************")
print(f"NORMAL = {test_normal}")
print(f"\nPNEUMONIA = {test_pneumonia}")

Normalized Dataset Path = /Users/raviteja/Documents/Teja_Career/Master_Degree/USD/MS_AAI/AAI-501/Final_Project/pneumonia-detection-in-chest-X-rays/dataset/chest_xray_nrm

Normalized Train Images
************************
NORMAL = /Users/raviteja/Documents/Teja_Career/Master_Degree/USD/MS_AAI/AAI-501/Final_Project/pneumonia-detection-in-chest-X-rays/dataset/chest_xray_nrm/train_nrm/NORMAL_nrm

PNEUMONIA = /Users/raviteja/Documents/Teja_Career/Master_Degree/USD/MS_AAI/AAI-501/Final_Project/pneumonia-detection-in-chest-X-rays/dataset/chest_xray_nrm/train_nrm/PNEUMONIA_nrm


Normalized Test Images
***************************
NORMAL = /Users/raviteja/Documents/Teja_Career/Master_Degree/USD/MS_AAI/AAI-501/Final_Project/pneumonia-detection-in-chest-X-rays/dataset/chest_xray_nrm/test_nrm/NORMAL_nrm

PNEUMONIA = /Users/raviteja/Documents/Teja_Career/Master_Degree/USD/MS_AAI/AAI-501/Final_Project/pneumonia-detection-in-chest-X-rays/dataset/chest_xray_nrm/test_nrm/PNEUMONIA_nrm
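
The four directory paths printed above follow one pattern: every path component carries a `_nrm` suffix. A minimal sketch of that pattern as a reusable helper follows. `build_normalized_paths` and its parameters are hypothetical names and are not part of `CxrImageFeatureExtraction`; the split and class folder names are taken from the printed output, and the base path is a placeholder.

```python
import os


def build_normalized_paths(base_path, splits, classes, suffix="_nrm"):
    """Map (split, class) pairs to suffixed directory paths.

    Mirrors the layout <base><suffix>/<split><suffix>/<class><suffix>.
    """
    dataset_path = base_path + suffix
    return {
        (split, cls): os.path.join(dataset_path, split + suffix, cls + suffix)
        for split in splits
        for cls in classes
    }


paths = build_normalized_paths(
    base_path=os.path.join("dataset", "chest_xray"),  # placeholder base path
    splits=("train", "test"),
    classes=("NORMAL", "PNEUMONIA"),
)
for (split, cls), path in paths.items():
    print(f"{split:5s} {cls:9s} -> {path}")
```

Keeping the paths in one dictionary avoids repeating the suffix logic for each of the four folders.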


In [25]:
image_feature_extraction.update_features_to_excel_file(
    excel_file_name=excel_file_name,
    train_normal=train_normal,
    train_pneumonia=train_pneumonia,
    test_normal=test_normal,
    test_pneumonia=test_pneumonia,
)

Extracting all image features from : /Users/raviteja/Documents/Teja_Career/Master_Degree/USD/MS_AAI/AAI-501/Final_Project/pneumonia-detection-in-chest-X-rays/dataset/chest_xray_nrm/train_nrm/NORMAL_nrm


Train Normal: 100%|██████████████████████████████████████████████████████████████| 1349/1349 [3:44:04<00:00,  9.97s/it]


Extracted to: /Users/raviteja/Documents/Teja_Career/Master_Degree/USD/MS_AAI/AAI-501/Final_Project/pneumonia-detection-in-chest-X-rays/image_information/chest_xray_images_features.xlsx
Sheet Name: Train Normal Features


Extracting all image features from : /Users/raviteja/Documents/Teja_Career/Master_Degree/USD/MS_AAI/AAI-501/Final_Project/pneumonia-detection-in-chest-X-rays/dataset/chest_xray_nrm/train_nrm/PNEUMONIA_nrm


Train Pneumonia: 100%|███████████████████████████████████████████████████████████| 3883/3883 [5:00:59<00:00,  4.65s/it]


Extracted to: /Users/raviteja/Documents/Teja_Career/Master_Degree/USD/MS_AAI/AAI-501/Final_Project/pneumonia-detection-in-chest-X-rays/image_information/chest_xray_images_features.xlsx
Sheet Name: Train Pneumonia Features


Extracting all image features from : /Users/raviteja/Documents/Teja_Career/Master_Degree/USD/MS_AAI/AAI-501/Final_Project/pneumonia-detection-in-chest-X-rays/dataset/chest_xray_nrm/test_nrm/NORMAL_nrm


Test Normal: 100%|███████████████████████████████████████████████████████████████████| 234/234 [39:44<00:00, 10.19s/it]


Extracted to: /Users/raviteja/Documents/Teja_Career/Master_Degree/USD/MS_AAI/AAI-501/Final_Project/pneumonia-detection-in-chest-X-rays/image_information/chest_xray_images_features.xlsx
Sheet Name: Test Normal Features


Extracting all image features from : /Users/raviteja/Documents/Teja_Career/Master_Degree/USD/MS_AAI/AAI-501/Final_Project/pneumonia-detection-in-chest-X-rays/dataset/chest_xray_nrm/test_nrm/PNEUMONIA_nrm


Test Pneumonia: 100%|████████████████████████████████████████████████████████████████| 390/390 [23:04<00:00,  3.55s/it]


Extracted to: /Users/raviteja/Documents/Teja_Career/Master_Degree/USD/MS_AAI/AAI-501/Final_Project/pneumonia-detection-in-chest-X-rays/image_information/chest_xray_images_features.xlsx
Sheet Name: Test Pneumonia Features


All the First and Second Order features are extracted to the Excel file: /Users/raviteja/Documents/Teja_Career/Master_Degree/USD/MS_AAI/AAI-501/Final_Project/pneumonia-detection-in-chest-X-rays/image_information/chest_xray_images_features.xlsx
Please check the Excel file for further analysis and interpretation
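
The output above refers to first- and second-order features. Second-order (texture) features are commonly derived from a gray-level co-occurrence matrix (GLCM), which counts how often pairs of gray levels occur at a fixed pixel offset. The sketch below is an assumption about what such features might look like, not the code inside `CxrImageFeatureExtraction`: the function names, the single horizontal offset, and the three statistics (contrast, angular second moment, homogeneity) are chosen only for illustration.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix for offset (dx, dy).

    `image` is a list of rows with integer gray levels in [0, levels).
    """
    matrix = [[0] * levels for _ in range(levels)]
    height = len(image)
    width = len(image[0])
    total = 0
    for y in range(height):
        for x in range(width):
            ny, nx = y + dy, x + dx
            if 0 <= ny < height and 0 <= nx < width:
                matrix[image[y][x]][image[ny][nx]] += 1
                total += 1
    if total == 0:
        raise ValueError("offset leaves no valid pixel pairs")
    return [[count / total for count in row] for row in matrix]


def glcm_features(p):
    """Texture statistics from a normalized co-occurrence matrix `p`."""
    size = range(len(p))
    contrast = sum(p[i][j] * (i - j) ** 2 for i in size for j in size)
    asm = sum(p[i][j] ** 2 for i in size for j in size)
    homogeneity = sum(
        p[i][j] / (1 + (i - j) ** 2) for i in size for j in size
    )
    return {
        "contrast": contrast,
        "angular_second_moment": asm,
        "homogeneity": homogeneity,
    }


# Toy 4-level image (assumption); real X-rays would be quantized first.
toy_image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = glcm(toy_image, levels=4)
texture = glcm_features(p)
print(texture)
```

A pure-Python loop like this is slow on full-resolution images, so an array-based implementation would be preferable for a dataset of this size.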
