lcswsher/Group_4_Project_4

Convolutional Neural Networks:
Machine Learning and Pneumonia

In this project we developed, trained and evaluated a convolutional neural network model (VGG16) that physicians can use to help diagnose pneumonia from X-ray images.

Project motivation

According to the American Thoracic Society and the American Lung Association, pneumonia is the world’s leading cause of death among children under 5 years of age.

More than 150,000 people are estimated to die from lung cancer each year, and infections, including pneumonia, are the second most common cause of death in people with lung cancer.

With the onset of COVID, the health system is stretched to its limit. Physicians, and pulmonologists in particular, are in short supply, with many doctors working overtime for the second year running. This is why a tool capable of speeding up X-ray image processing, and ultimately pneumonia diagnostics, is so vital.

Dataset

Data for this study was sourced from Kaggle.

The dataset is organized into 3 folders (train, test, val) and contains subfolders for each image category (Pneumonia/Normal). There are 5,863 X-Ray images (JPEG) and 2 categories (Pneumonia/Normal).
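
The folder layout described above can be checked with a short script before training. This is a minimal sketch: the class folder names `NORMAL` and `PNEUMONIA` and the function name `count_images` are assumptions made for illustration, so adjust them to match the downloaded archive.

```python
from pathlib import Path

SPLITS = ("train", "test", "val")
# Assumed class folder names; change these if the archive uses different ones.
CLASSES = ("NORMAL", "PNEUMONIA")
JPEG_SUFFIXES = {".jpg", ".jpeg"}


def count_images(root):
    """Count JPEG files per split and per class under the dataset root."""
    root = Path(root)
    counts = {}
    for split in SPLITS:
        counts[split] = {}
        for label in CLASSES:
            folder = root / split / label
            if folder.is_dir():
                counts[split][label] = sum(
                    1 for p in folder.iterdir()
                    if p.is_file() and p.suffix.lower() in JPEG_SUFFIXES
                )
            else:
                # A missing folder is reported as zero images.
                counts[split][label] = 0
    return counts
```

Calling `count_images("chest_xray")` (a hypothetical local path) returns a nested dictionary such as `counts["train"]["PNEUMONIA"]`, which makes class imbalance between the two categories easy to spot.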

Chest X-ray images (anterior-posterior) were selected from retrospective cohorts of pediatric patients aged one to five years from Guangzhou Women and Children’s Medical Center, Guangzhou, China.
All chest X-ray imaging was performed as part of patients’ routine clinical care.

For the analysis of chest x-ray images, all chest radiographs were initially screened for quality control by removing all low quality or unreadable scans. The diagnoses for the images were then graded by two expert physicians before being cleared for training the AI system. In order to account for any grading errors, the evaluation set was also checked by a third expert. Figure 1 contains some of the sample X-ray images from the dataset.

Figure 1. Examples of Chest X-Rays in Patients with Pneumonia

Model Construction, Testing & Optimization

The CNN model is based on VGG16 from the Keras library and was trained on publicly available X-ray images from Kaggle: 5,216 chest X-ray images (1,341 negative, 3,875 positive). The model was trained for varying numbers of epochs while monitoring precision and recall to find the best-performing configuration. The final notebook (developed using Google Colab) can be found here, with the corresponding model saved to the chest_xray_30_Epochs.h5 file.

Tensorflow Keras
  • Model - VGG16 as a basis for CNN model
  • Image size = 224, 224
Compiling Model
  • Loss = categorical crossentropy
  • Optimizer = adam
Creating Generator
  • Keras ImageDataGenerator
  • Scaled (1./255)
    (to scale RGB pixel
    values from range [0,255] -> [0,1])
Building and Fitting Models
  • 10, 20, 30, 40, 100 Epochs
  • Precision and Recall scores
  • Validation Loss/Accuracy
  • Train Loss/Accuracy
  • Confusion Matrix
Final Training Model (hdf5 file)
  • 1-30 Epochs
  • 62% Precision
  • 67% Recall
  • 92% Accuracy
Figure 2. Data Model Implementation
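
The pipeline in Figure 2 could look roughly like the following in TensorFlow Keras. Only the VGG16 base, the 224 x 224 image size, the categorical crossentropy loss, the adam optimizer and the 1./255 rescaling come from the figure. The frozen ImageNet weights, the Flatten plus softmax head, the batch size and the helper names `build_model` and `make_generator` are assumptions, so treat this as a sketch rather than the notebook's actual code.

```python
IMAGE_SIZE = (224, 224)
NUM_CLASSES = 2  # Pneumonia / Normal


def build_model():
    """Build a VGG16-based classifier (assumed head: Flatten + softmax)."""
    # Imported lazily so the module can be loaded without TensorFlow installed.
    from tensorflow.keras.applications import VGG16
    from tensorflow.keras.layers import Dense, Flatten
    from tensorflow.keras.models import Model

    # Assumption: ImageNet weights with the original classifier removed.
    base = VGG16(input_shape=IMAGE_SIZE + (3,), weights="imagenet",
                 include_top=False)
    for layer in base.layers:
        layer.trainable = False  # assumption: convolutional base is frozen

    x = Flatten()(base.output)
    outputs = Dense(NUM_CLASSES, activation="softmax")(x)
    model = Model(inputs=base.input, outputs=outputs)
    model.compile(loss="categorical_crossentropy", optimizer="adam",
                  metrics=["accuracy"])
    return model


def make_generator(directory, batch_size=32):
    """Stream images from a class-per-folder directory, scaled to [0, 1]."""
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    datagen = ImageDataGenerator(rescale=1.0 / 255)
    return datagen.flow_from_directory(
        directory,
        target_size=IMAGE_SIZE,
        batch_size=batch_size,      # assumed value
        class_mode="categorical",   # one-hot labels for categorical crossentropy
    )
```

Training would then be along the lines of `build_model().fit(make_generator("chest_xray/train"), validation_data=make_generator("chest_xray/val"), epochs=30)`, where the directory paths are hypothetical. `ImageDataGenerator` is marked as deprecated in recent TensorFlow releases, so newer code may prefer `tf.keras.utils.image_dataset_from_directory`.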

Model Accuracy

As with any CNN model, the above model's predictions are not always accurate. The confusion matrix with the corresponding TP, TN, FP and FN counts is shown in Figure 3. Training accuracy is 92%, with less favorable test scores of 62% precision and 67% recall.

During the model testing phase (624 test images), the best CNN model was obtained when training for roughly 1-30 epochs. Model accuracy declines significantly beyond 50 epochs due to overfitting.

Figure 3. Confusion matrix & accuracy calculation
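
For reference, accuracy, precision and recall follow directly from the four confusion matrix counts. The sketch below shows the standard formulas. The function name `classification_metrics` is an assumption, and the counts in the usage note are made-up illustrative values, not the project's results.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision and recall from confusion matrix counts."""
    total = tp + tn + fp + fn
    predicted_positive = tp + fp
    actual_positive = tp + fn
    return {
        # Share of all predictions that were correct.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Share of predicted pneumonia cases that really were pneumonia.
        "precision": tp / predicted_positive if predicted_positive else 0.0,
        # Share of real pneumonia cases that the model caught.
        "recall": tp / actual_positive if actual_positive else 0.0,
    }
```

With illustrative counts of 8 true positives, 8 true negatives, 2 false positives and 2 false negatives, `classification_metrics(8, 8, 2, 2)` gives 0.8 for all three metrics. In a diagnostic setting recall deserves particular attention, because a false negative means a missed pneumonia case.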

Data Model Optimization:

  • Training the model to make more accurate predictions required several steps

  • Training with different numbers of epochs (1, 10, 20, 100, etc.)

    • As the number of epochs grew, both validation loss and training accuracy increased
    • Validation accuracy remained at around 90%
  • The risk of overfitting to the training data remained, and it grew once training went beyond 30 epochs

  • Created a confusion matrix to determine true positive/negative and false positive/negative values

    Figure 4. Change in time (epochs) of the training vs validation loss
    of the developed model

    Figure 5. Change in time (epochs) of the training vs validation accuracy
    of the developed model
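
One simple way to read curves like those in Figures 4 and 5 is to find the epoch with the lowest validation loss and to watch how far validation loss drifts above training loss. The sketch below works on plain lists such as the `loss` and `val_loss` entries of a Keras `History.history` dictionary. The helper names `best_epoch` and `overfitting_gap` are assumptions, and this is not necessarily how the epoch count was chosen in this project.

```python
def best_epoch(val_loss):
    """Return the 1-based epoch number with the lowest validation loss."""
    if not val_loss:
        raise ValueError("val_loss must contain at least one epoch")
    best_index = min(range(len(val_loss)), key=val_loss.__getitem__)
    return best_index + 1


def overfitting_gap(train_loss, val_loss):
    """Return validation loss minus training loss for each epoch.

    A gap that keeps growing suggests the model is fitting the training
    data more closely than it generalizes.
    """
    return [v - t for t, v in zip(train_loss, val_loss)]
```

For example, with a `history` object returned by `model.fit`, `best_epoch(history.history["val_loss"])` points to the epoch whose weights generalize best by this criterion.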

Frontend & deployment

The frontend was implemented in Flask so that users can upload X-ray images.

To run the tool, clone the repository, run python main.py from the project folder, and then open the localhost address printed to stdout in the Git Bash window.
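
A minimal sketch of what a Flask upload endpoint for this kind of tool could look like is shown below. It is not the repository's main.py: the `/predict` route, the `file` form field, the `uploads` directory and the helper names are all hypothetical, and the model inference step is left as a comment.

```python
from pathlib import Path

ALLOWED_SUFFIXES = {".jpg", ".jpeg", ".png"}


def allowed_file(filename):
    """Accept only common image extensions (case-insensitive)."""
    return Path(filename).suffix.lower() in ALLOWED_SUFFIXES


def create_app(upload_dir="uploads"):
    """Create a Flask app with a single, hypothetical upload route."""
    # Imported lazily so the helper above can be used without Flask installed.
    from flask import Flask, jsonify, request
    from werkzeug.utils import secure_filename

    app = Flask(__name__)
    Path(upload_dir).mkdir(parents=True, exist_ok=True)

    @app.route("/predict", methods=["POST"])
    def predict():
        uploaded = request.files.get("file")
        if uploaded is None or not allowed_file(uploaded.filename or ""):
            return jsonify(error="expected a JPEG or PNG upload"), 400
        # secure_filename strips path components from the client-supplied name.
        target = Path(upload_dir) / secure_filename(uploaded.filename)
        uploaded.save(str(target))
        # The saved image would be resized to 224 x 224, scaled to [0, 1]
        # and passed to the trained model here.
        return jsonify(saved_as=target.name)

    return app
```

Running `create_app().run()` starts the development server, which prints the local address to open in a browser.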

Contributors

The following people have contributed to the work presented in this repository:

Clay Swisher

Iryna Marchiano

Simon Castellanos

Elijah Abuel
