### Authors: Prof. Dr. Soumi Ray, Ravi Teja Kothuru and Abhay Srivastav

### Acknowledgements:
I would like to thank my teammates Prof. Dr. Soumi Ray and Abhay Srivastav for their guidance and support throughout this project.

**Title of the Project:** Comparative Analysis of Image-Based and Feature-Based Approaches for Pneumonia Detection in Chest X-rays

**Description of the Project:** This project detects pneumonia in chest X-ray images using machine learning and deep learning techniques (Rajpurkar et al., 2017; Wang et al., 2017). Using a dataset of annotated pneumonia and normal images, we develop and compare image-based and feature-based approaches. Our goal is to identify the most effective method for accurate and interpretable pneumonia detection, contributing to improved patient outcomes through early diagnosis and treatment. Each model classifies a patient's chest X-ray as pneumonia (1) or no pneumonia (0).

**Objectives of the Project:** 

- **Image Analysis:** Develop and evaluate deep learning models, particularly Convolutional Neural Networks (CNNs), that classify chest X-rays end to end. The models take raw chest X-ray images as input and label each one as normal or pneumonia.

- **Feature Analysis:** Extract meaningful features from the chest X-ray images and use them to train and evaluate traditional machine learning models. The process covers feature extraction, selection, and transformation, followed by classification with algorithms such as Support Vector Machines (SVMs) and Random Forests.

**Name of the Dataset:** This project uses the chest X-ray images from the dataset **Labeled Optical Coherence Tomography (OCT) and Chest X-Ray Images for Classification**.

**Description of the Dataset:** The dataset contains chest X-ray images, each labeled as either normal or pneumonia. The target variable for classification is whether a patient has pneumonia (1) or not (0).

**Dataset Source:** 

- https://data.mendeley.com/datasets/rscbjbr9sj/2

**Type of the Dataset:**

- X-ray Images

**Structure of the Dataset:** 
The dataset provides the following:
- Separate folders for training and for validating/testing the model.
- A sufficient number of chest X-ray images to train a model to detect pneumonia.
- A binary target variable: whether the patient has pneumonia or not.

**Goal of the Project using this Dataset:**
The goal of this project is to conduct a comprehensive comparative analysis of image-based and feature-based approaches for pneumonia detection using chest X-ray images. By evaluating the performance, robustness, and interpretability of deep learning and traditional machine learning models, we aim to identify the most effective method for accurately classifying chest X-rays as normal or pneumonia. This comparison will provide valuable insights into the strengths and limitations of each approach, ultimately contributing to improved detection and diagnosis of pneumonia, which can enhance patient outcomes and survival rates.

**Why did we choose this dataset?**
We selected this dataset for the following reasons:
- The dataset is extensive, providing a large number of images suitable for evaluating and training deep learning models.
- It aligns well with the project's objectives by offering a challenging and realistic scenario for developing an image classification model using deep learning, specifically for Chest X-ray images.
- The images are labeled with two classes (normal and pneumonia), enabling the development of a binary classification model.
- It is publicly available, facilitating easy access for research and development purposes.

**Size of dataset:**
- Total images size = 1.27 GB
- Dataset has 2 folders:
  -  **Train:**
    -  Normal (without Pneumonia) = 1349 images
    -  Pneumonia = 3883 images
  -  **Test:**
    -  Normal (without Pneumonia) = 234 images
    -  Pneumonia = 390 images
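
The counts above show that pneumonia images outnumber normal images by roughly three to one in the training split. The sketch below shows one way to recount the images per class folder and derive balanced class weights, computed as `n_samples / (n_classes * class_count)`. The helper names `count_images` and `balanced_class_weights` and the extension list are assumptions made for illustration; they are not part of the project code.

```python
from pathlib import Path

# Assumed image extensions; adjust to match the files actually on disk.
IMAGE_EXTENSIONS = {".jpeg", ".jpg", ".png"}


def count_images(split_dir):
    """Return {class_folder_name: image_count} for one split folder (e.g. train)."""
    counts = {}
    for class_dir in sorted(Path(split_dir).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1
                for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
            )
    return counts


def balanced_class_weights(counts):
    """Weight each non-empty class by n_samples / (n_classes * class_count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {name: total / (n_classes * n) for name, n in counts.items() if n > 0}
```

Filtering by extension keeps hidden files such as `.DS_Store` out of the totals, which is a common source of off-by-one differences between a folder listing and the number of images actually processed.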
    
**Expected Behaviors and Problem Handling:**
- Classify Chest X-ray images with high accuracy.
- Handle variations in image quality, resolution, and orientation.
- Be robust to noise and artifacts in the images.
- Provide interpretable results.

**Issues to focus on:**
- Improving model interpretability and explainability.
- Optimizing model performance on a held-out test set.
- Following AI Ethics and Data Safety practices.

# Import all the required files and libraries

In [1]:
import os
import ssl

# Disable SSL certificate verification
ssl._create_default_https_context = ssl._create_unverified_context

# Automatically reload imported modules when their source code changes
%load_ext autoreload
%reload_ext autoreload
%autoreload 2

# Import the feature-extraction class from the local Python module
from cxr_image_features_extraction import CxrImageFeatureExtraction

# Perform Chest X-ray Images Feature Extraction

## Create an object of the Image Feature Extraction class

In [2]:
image_feature_extraction = CxrImageFeatureExtraction()

## Fetch the absolute paths of the normalized image dataset

In [3]:
# Define the path to the dataset
dataset_path = image_feature_extraction.get_base_path_of_dataset() + "_nrm"
print(f"Normalized Dataset Path = {dataset_path}")

# Fetch train, test, NORMAL and PNEUMONIA folder names
train_folder_name = str(image_feature_extraction.train_test_image_dirs[0])
test_folder_name = str(image_feature_extraction.train_test_image_dirs[1])

normal_img_folder_name = str(image_feature_extraction.normal_pneumonia_image_dirs[0])
pneumonia_img_folder_name = str(image_feature_extraction.normal_pneumonia_image_dirs[1])

# Define the paths to the train and test datasets
# Train
train_normal = os.path.join(dataset_path, train_folder_name + "_nrm", normal_img_folder_name + "_nrm")
train_pneumonia = os.path.join(dataset_path, train_folder_name + "_nrm", pneumonia_img_folder_name + "_nrm")

# Test
test_normal = os.path.join(dataset_path, test_folder_name + "_nrm", normal_img_folder_name + "_nrm")
test_pneumonia = os.path.join(dataset_path, test_folder_name + "_nrm", pneumonia_img_folder_name + "_nrm")

# Print the paths to the train and test datasets
print("\nNormalized Train Images")
print("************************")
print(f"NORMAL = {train_normal}")
print(f"\nPNEUMONIA = {train_pneumonia}")

print("\n\nNormalized Test Images")
print("***************************")
print(f"NORMAL = {test_normal}")
print(f"\nPNEUMONIA = {test_pneumonia}")

Normalized Dataset Path = /Users/ravkothu/Documents/Personal_items_at_Oracle/Master_Degree/University_of_San_Diego/Online_Masters/MS_in_Applied_AI/Subjects_and_Resources/AAI-501_Introduction_to_AI/AAI-501_Final_Team_Project/pneumonia_detection/dataset/chest_xray_nrm

Normalized Train Images
************************
NORMAL = /Users/ravkothu/Documents/Personal_items_at_Oracle/Master_Degree/University_of_San_Diego/Online_Masters/MS_in_Applied_AI/Subjects_and_Resources/AAI-501_Introduction_to_AI/AAI-501_Final_Team_Project/pneumonia_detection/dataset/chest_xray_nrm/train_nrm/NORMAL_nrm

PNEUMONIA = /Users/ravkothu/Documents/Personal_items_at_Oracle/Master_Degree/University_of_San_Diego/Online_Masters/MS_in_Applied_AI/Subjects_and_Resources/AAI-501_Introduction_to_AI/AAI-501_Final_Team_Project/pneumonia_detection/dataset/chest_xray_nrm/train_nrm/PNEUMONIA_nrm


Normalized Test Images
***************************
NORMAL = /Users/ravkothu/Documents/Personal_items_at_Oracle/Master_Degree/Univers
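
The four folder paths follow a single pattern: the `_nrm` suffix is appended to the dataset root, to the split folder, and to the class folder. As a minimal sketch of that pattern, the helper below builds all four paths with `pathlib` and reports any that do not exist. The function names `normalized_dirs` and `missing_dirs` are hypothetical, and the default folder names are read off the printed paths rather than taken from the `CxrImageFeatureExtraction` class.

```python
from pathlib import Path


def normalized_dirs(base_path, splits=("train", "test"),
                    classes=("NORMAL", "PNEUMONIA"), suffix="_nrm"):
    """Map (split, class) to <base><suffix>/<split><suffix>/<class><suffix>."""
    root = Path(str(base_path) + suffix)
    return {
        (split, label): root / f"{split}{suffix}" / f"{label}{suffix}"
        for split in splits
        for label in classes
    }


def missing_dirs(dirs):
    """Return the paths from `dirs` that are not existing directories."""
    return [path for path in dirs.values() if not path.is_dir()]
```

Calling `missing_dirs(normalized_dirs(base))` before a long extraction run is a cheap way to catch a misspelled or missing folder early.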

## Convert all Normalized image folder absolute paths to a list

In [4]:
image_normalized_folders = [
    train_normal, train_pneumonia,
    test_normal, test_pneumonia
]

image_normalized_folders

['/Users/ravkothu/Documents/Personal_items_at_Oracle/Master_Degree/University_of_San_Diego/Online_Masters/MS_in_Applied_AI/Subjects_and_Resources/AAI-501_Introduction_to_AI/AAI-501_Final_Team_Project/pneumonia_detection/dataset/chest_xray_nrm/train_nrm/NORMAL_nrm',
 '/Users/ravkothu/Documents/Personal_items_at_Oracle/Master_Degree/University_of_San_Diego/Online_Masters/MS_in_Applied_AI/Subjects_and_Resources/AAI-501_Introduction_to_AI/AAI-501_Final_Team_Project/pneumonia_detection/dataset/chest_xray_nrm/train_nrm/PNEUMONIA_nrm',
 '/Users/ravkothu/Documents/Personal_items_at_Oracle/Master_Degree/University_of_San_Diego/Online_Masters/MS_in_Applied_AI/Subjects_and_Resources/AAI-501_Introduction_to_AI/AAI-501_Final_Team_Project/pneumonia_detection/dataset/chest_xray_nrm/test_nrm/NORMAL_nrm',
 '/Users/ravkothu/Documents/Personal_items_at_Oracle/Master_Degree/University_of_San_Diego/Online_Masters/MS_in_Applied_AI/Subjects_and_Resources/AAI-501_Introduction_to_AI/AAI-501_Final_Team_Project/

### Extract Second Order GLRLM Features of All Images and Write Into the Existing Excel File

1. **Short Run Emphasis (SRE)**

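   **Definition**: Emphasizes short runs: each run is weighted by the inverse square of its length, so the value is larger for fine textures. Here \( p(i,j) \) is the number of runs of gray level \( i \) with length \( j \), \( G \) is the number of gray levels, and \( R \) is the longest run length.

   **Formula**:
   $$
   \text{SRE} = \frac{\sum_{i=1}^{G} \sum_{j=1}^{R} \frac{p(i,j)}{j^2}}{\sum_{i=1}^{G} \sum_{j=1}^{R} p(i,j)}
   $$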

2. **Long Run Emphasis (LRE)**

   **Definition**: Emphasizes long runs: each run is weighted by the square of its length, so the value is larger for coarse textures.

   **Formula**:
   $$
   \text{LRE} = \frac{\sum_{i=1}^{G} \sum_{j=1}^{R} p(i,j) \cdot j^2}{\sum_{i=1}^{G} \sum_{j=1}^{R} p(i,j)}
   $$

3. **Gray Level Non-Uniformity (GLN)**

   **Definition**: Measures the non-uniformity of gray levels in the image.

   **Formula**:
   $$
   \text{GLN} = \frac{\sum_{i=1}^{G} \left(\sum_{j=1}^{R} p(i,j)\right)^2}{\sum_{i=1}^{G} \sum_{j=1}^{R} p(i,j)}
   $$

4. **Run Length Non-Uniformity (RLN)**

   **Definition**: Measures the non-uniformity of run lengths in the image.

   **Formula**:
   $$
   \text{RLN} = \frac{\sum_{j=1}^{R} \left(\sum_{i=1}^{G} p(i,j)\right)^2}{\sum_{i=1}^{G} \sum_{j=1}^{R} p(i,j)}
   $$

5. **Run Percentage (RP)**

   **Definition**: Measures the percentage of runs in the image.

   **Formula**:
   $$
   \text{RP} = \frac{\sum_{i=1}^{G} \sum_{j=1}^{R} p(i,j)}{N}
   $$
   where \( N \) is the total number of pixels in the image.

6. **Low Gray Level Run Emphasis (LGLRE)**

   **Definition**: Measures the distribution of low gray level runs in the image.

   **Formula**:
   $$
   \text{LGLRE} = \frac{\sum_{i=1}^{G} \sum_{j=1}^{R} \frac{p(i,j)}{i^2}}{\sum_{i=1}^{G} \sum_{j=1}^{R} p(i,j)}
   $$

7. **High Gray Level Run Emphasis (HGLRE)**

   **Definition**: Measures the distribution of high gray level runs in the image.

   **Formula**:
   $$
   \text{HGLRE} = \frac{\sum_{i=1}^{G} \sum_{j=1}^{R} p(i,j) \cdot i^2}{\sum_{i=1}^{G} \sum_{j=1}^{R} p(i,j)}
   $$

8. **Short Run Low Gray Level Emphasis (SRLGLE)**

   **Definition**: Measures the distribution of short runs with low gray levels in the image.

   **Formula**:
   $$
   \text{SRLGLE} = \frac{\sum_{i=1}^{G} \sum_{j=1}^{R} \frac{p(i,j)}{i^2 \cdot j^2}}{\sum_{i=1}^{G} \sum_{j=1}^{R} p(i,j)}
   $$

9. **Short Run High Gray Level Emphasis (SRHGLE)**

   **Definition**: Measures the distribution of short runs with high gray levels in the image.

   **Formula**:
   $$
   \text{SRHGLE} = \frac{\sum_{i=1}^{G} \sum_{j=1}^{R} \frac{p(i,j) \cdot i^2}{j^2}}{\sum_{i=1}^{G} \sum_{j=1}^{R} p(i,j)}
   $$

10. **Long Run Low Gray Level Emphasis (LRLGLE)**

    **Definition**: Measures the distribution of long runs with low gray levels in the image.

    **Formula**:
    $$
    \text{LRLGLE} = \frac{\sum_{i=1}^{G} \sum_{j=1}^{R} \frac{p(i,j) \cdot j^2}{i^2}}{\sum_{i=1}^{G} \sum_{j=1}^{R} p(i,j)}
    $$

11. **Long Run High Gray Level Emphasis (LRHGLE)**

    **Definition**: Measures the distribution of long runs with high gray levels in the image.

    **Formula**:
    $$
    \text{LRHGLE} = \frac{\sum_{i=1}^{G} \sum_{j=1}^{R} p(i,j) \cdot i^2 \cdot j^2}{\sum_{i=1}^{G} \sum_{j=1}^{R} p(i,j)}
    $$

12. **Run Variance (RV)**

    **Definition**: Measures the variance of run lengths in the image.

    **Formula**:
    $$
    \text{RV} = \frac{\sum_{i=1}^{G} \sum_{j=1}^{R} p(i,j) \cdot (j - \mu)^2}{\sum_{i=1}^{G} \sum_{j=1}^{R} p(i,j)}
    $$
    where \( \mu \) is the mean run length.

13. **Run Entropy (RE)**

    **Definition**: Measures the entropy of run lengths in the image.

    **Formula**:
    $$
    \text{RE} = -\sum_{i=1}^{G} \sum_{j=1}^{R} \hat{p}(i,j) \cdot \log_2 \hat{p}(i,j), \qquad \hat{p}(i,j) = \frac{p(i,j)}{\sum_{i=1}^{G} \sum_{j=1}^{R} p(i,j)}
    $$

14. **Difference Average (DA)**

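    **Definition**: Measures the average absolute difference between gray levels in the image. In this and the next two features, both indices of \( p(i,j) \) run over gray levels (both sums go up to \( G \)), as in a co-occurrence matrix, rather than over gray level and run length.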

    **Formula**:
    $$
    \text{DA} = \frac{\sum_{i=1}^{G} \sum_{j=1}^{G} p(i,j) \cdot |i-j|}{\sum_{i=1}^{G} \sum_{j=1}^{G} p(i,j)}
    $$

15. **Difference Variance (DV)**

    **Definition**: Measures the variance of differences between gray levels in the image.

    **Formula**:
    $$
    \text{DV} = \frac{\sum_{i=1}^{G} \sum_{j=1}^{G} p(i,j) \cdot (|i-j| - \mu)^2}{\sum_{i=1}^{G} \sum_{j=1}^{G} p(i,j)}
    $$
    where \( \mu \) is the mean difference.

16. **Difference Entropy (DE)**

    **Definition**: Measures the entropy of differences between gray levels in the image.

    **Formula**:
    $$
    \text{DE} = -\sum_{k=0}^{G-1} p_d(k) \cdot \log_2 p_d(k), \qquad p_d(k) = \frac{\sum_{|i-j|=k} p(i,j)}{\sum_{i=1}^{G} \sum_{j=1}^{G} p(i,j)}
    $$

17. **Number of Runs**

    **Definition**: The total number of runs in the image.

    **Formula**:
    $$
    \text{Num_of_runs} = \sum_{i=1}^{G} \sum_{j=1}^{R} p(i,j)
    $$

18. **Number of Pixels**

    **Definition**: The total number of pixels in the image.

    **Formula**:
    $$
    \text{Num_of_pixels} = \text{Image_Height} \times \text{Image_Width}
    $$
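
To make the run-length notation concrete, the sketch below builds a GLRLM for horizontal runs only and evaluates a subset of the features listed above (GLN, RLN, RP, LGLRE, HGLRE, RV, Num_of_runs, Num_of_pixels) with NumPy. It is an illustration under stated assumptions, not the implementation behind `update_second_order_glrlm_features_to_excel_file`, whose source is not shown here. The function names `glrlm_horizontal` and `glrlm_features` are hypothetical, a single direction is used, and gray level 0 is mapped to index \( i = 1 \) so that the \( 1/i^2 \) terms are defined.

```python
import numpy as np


def glrlm_horizontal(image, levels):
    """Count horizontal runs: P[g, j - 1] = number of runs of gray level g with length j.

    `image` is a 2-D array of integer gray levels in [0, levels - 1].
    """
    image = np.asarray(image, dtype=np.int64)
    P = np.zeros((levels, image.shape[1]), dtype=np.int64)
    for row in image:
        run_value, run_length = row[0], 1
        for value in row[1:]:
            if value == run_value:
                run_length += 1
            else:
                P[run_value, run_length - 1] += 1
                run_value, run_length = value, 1
        P[run_value, run_length - 1] += 1  # close the last run of the row
    return P


def glrlm_features(P):
    """Evaluate a subset of the run-length features from a GLRLM."""
    P = np.asarray(P, dtype=np.float64)
    i = np.arange(1, P.shape[0] + 1)[:, None]  # gray-level index, 1-based
    j = np.arange(1, P.shape[1] + 1)[None, :]  # run length, 1-based
    n_runs = P.sum()
    n_pixels = (P * j).sum()  # every pixel belongs to exactly one run
    mu = n_pixels / n_runs    # mean run length
    return {
        "GLN": (P.sum(axis=1) ** 2).sum() / n_runs,
        "RLN": (P.sum(axis=0) ** 2).sum() / n_runs,
        "RP": n_runs / n_pixels,
        "LGLRE": (P / i**2).sum() / n_runs,
        "HGLRE": (P * i**2).sum() / n_runs,
        "RV": (P * (j - mu) ** 2).sum() / n_runs,
        "Num_of_runs": n_runs,
        "Num_of_pixels": n_pixels,
    }


# Tiny 3x4 example with three gray levels.
example = [[0, 0, 1, 1],
           [0, 1, 1, 1],
           [2, 2, 2, 2]]
features = glrlm_features(glrlm_horizontal(example, levels=3))
```

For the 3x4 example, the rows split into 5 runs covering 12 pixels. A full extractor would typically quantize the image to a fixed number of gray levels first and combine several run directions.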

In [5]:
image_feature_extraction.update_second_order_glrlm_features_to_excel_file(folders=image_normalized_folders)

Extracted GLRLM features will be saved to - /Users/ravkothu/Documents/Personal_items_at_Oracle/Master_Degree/University_of_San_Diego/Online_Masters/MS_in_Applied_AI/Subjects_and_Resources/AAI-501_Introduction_to_AI/AAI-501_Final_Team_Project/pneumonia_detection/image_information/chest_xray_images_second_order_features_glrlm.xlsx


Extracting second-order features GLRLM from: /Users/ravkothu/Documents/Personal_items_at_Oracle/Master_Degree/University_of_San_Diego/Online_Masters/MS_in_Applied_AI/Subjects_and_Resources/AAI-501_Introduction_to_AI/AAI-501_Final_Team_Project/pneumonia_detection/dataset/chest_xray_nrm/train_nrm/NORMAL_nrm


Folder: train_nrm/NORMAL_nrm: 100%|██████████████████████████████████████████████████████| 1349/1349 [06:31<00:00,  3.44it/s]


Extracting second-order features GLRLM from: /Users/ravkothu/Documents/Personal_items_at_Oracle/Master_Degree/University_of_San_Diego/Online_Masters/MS_in_Applied_AI/Subjects_and_Resources/AAI-501_Introduction_to_AI/AAI-501_Final_Team_Project/pneumonia_detection/dataset/chest_xray_nrm/train_nrm/PNEUMONIA_nrm


Folder: train_nrm/PNEUMONIA_nrm: 100%|███████████████████████████████████████████████████| 3883/3883 [07:46<00:00,  8.32it/s]


Extracting second-order features GLRLM from: /Users/ravkothu/Documents/Personal_items_at_Oracle/Master_Degree/University_of_San_Diego/Online_Masters/MS_in_Applied_AI/Subjects_and_Resources/AAI-501_Introduction_to_AI/AAI-501_Final_Team_Project/pneumonia_detection/dataset/chest_xray_nrm/test_nrm/NORMAL_nrm


Folder: test_nrm/NORMAL_nrm: 100%|█████████████████████████████████████████████████████████| 234/234 [01:07<00:00,  3.49it/s]


Extracting second-order features GLRLM from: /Users/ravkothu/Documents/Personal_items_at_Oracle/Master_Degree/University_of_San_Diego/Online_Masters/MS_in_Applied_AI/Subjects_and_Resources/AAI-501_Introduction_to_AI/AAI-501_Final_Team_Project/pneumonia_detection/dataset/chest_xray_nrm/test_nrm/PNEUMONIA_nrm


Folder: test_nrm/PNEUMONIA_nrm: 100%|██████████████████████████████████████████████████████| 390/390 [00:38<00:00, 10.05it/s]




All first-order features are extracted to the Excel file: /Users/ravkothu/Documents/Personal_items_at_Oracle/Master_Degree/University_of_San_Diego/Online_Masters/MS_in_Applied_AI/Subjects_and_Resources/AAI-501_Introduction_to_AI/AAI-501_Final_Team_Project/pneumonia_detection/image_information/chest_xray_images_second_order_features_glrlm.xlsx
Please check the Excel file for further analysis and interpretation
