Convolutional Neural Networks:
Machine Learning and Pneumonia

In this project we have developed, trained and evaluated a convolutional neural network model (VGG16) which can be used by physicians for pneumonia diagnostics based on X-ray image processing.

Project motivation

According to the American Thoracic Society and the American Lung Association, pneumonia is the world’s leading cause of death among children under 5 years of age.

More than 150,000 people are estimated to die from lung cancer each year, with infections, including pneumonia, being the second most common cause of death in people with lung cancer.

These days, with the onset of COVID, the health system is stretched to its limit. Physicians, and pulmonologists in particular, are scarce, with many doctors working overtime for the second year running. This is why developing a tool capable of speeding up X-ray image processing and, ultimately, pneumonia diagnostics is so vital.

Dataset

Data for this study was sourced from Kaggle.

The dataset is organized into 3 folders (train, test, val), each containing a subfolder per image category (Pneumonia/Normal). There are 5,863 X-ray images (JPEG) across the 2 categories.

Chest X-ray images (anterior-posterior) were selected from retrospective cohorts of pediatric patients aged one to five years from Guangzhou Women and Children’s Medical Center, Guangzhou, China.
All chest X-ray imaging was performed as part of patients’ routine clinical care.

For the analysis of chest X-ray images, all chest radiographs were initially screened for quality control by removing all low-quality or unreadable scans. The diagnoses for the images were then graded by two expert physicians before being cleared for training the AI system. To account for any grading errors, the evaluation set was also checked by a third expert. Figure 1 contains some sample X-ray images from the dataset.

Figure 1. Examples of Chest X-Rays in Patients with Pneumonia

Model Construction, Testing & Optimization

This CNN model is from the Keras library and was trained and fitted using publicly available X-ray images provided on Kaggle: 5,216 chest X-ray images (1,341 negative, 3,875 positive). In addition, the model was trained for varying numbers of epochs while monitoring accuracy, precision and recall to find the best-performing configuration. The final notebook (developed using Google Colab) can be found here, with the corresponding model saved in a chest_xray_30_Epochs.h5 file.

TensorFlow Keras pipeline, in order:

- Model: VGG16 as the basis for the CNN model; image size = 224, 224
- Compiling the model: loss = categorical crossentropy; optimizer = adam
- Creating the generator: Keras ImageDataGenerator, scaled by 1./255 (to scale RGB pixel values from the range [0,255] to [0,1])
- Building and fitting models: 10, 20, 30, 40 and 100 epochs; precision and recall scores; validation loss and accuracy; training loss and accuracy; confusion matrix
- Final training model (hdf5 file): 1-30 epochs; 62% precision; 67% recall; 92% accuracy
Figure 2. Data Model Implementation
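
The pipeline in Figure 2 could be sketched in Keras roughly as follows. This is a minimal illustration and not the code from the project notebook: the ImageNet weights, the frozen VGG16 base, the Flatten/Dense classification head, the batch size, and the function names build_model and make_generator are all assumptions. Only the 224 x 224 image size, the categorical crossentropy loss, the adam optimizer and the 1./255 rescaling come from the description above. TensorFlow is imported inside the functions so the constants can be inspected without it installed.

```python
# Values taken from the pipeline description.
IMAGE_SIZE = (224, 224)
INPUT_SHAPE = IMAGE_SIZE + (3,)   # RGB images
NUM_CLASSES = 2                   # Pneumonia / Normal
RESCALE = 1.0 / 255               # maps pixel values [0, 255] -> [0, 1]


def build_model(weights="imagenet"):
    """Build and compile a VGG16-based classifier (hypothetical head)."""
    from tensorflow.keras.applications import VGG16
    from tensorflow.keras.layers import Dense, Flatten
    from tensorflow.keras.models import Model

    # Assumption: reuse the convolutional base without its original classifier.
    base = VGG16(weights=weights, include_top=False, input_shape=INPUT_SHAPE)
    for layer in base.layers:
        layer.trainable = False   # assumption: train only the new head

    x = Flatten()(base.output)
    outputs = Dense(NUM_CLASSES, activation="softmax")(x)

    model = Model(inputs=base.input, outputs=outputs)
    model.compile(
        loss="categorical_crossentropy",
        optimizer="adam",
        metrics=["accuracy"],
    )
    return model


def make_generator(directory, batch_size=32):
    """Yield rescaled image batches from a folder with one subfolder per class."""
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    datagen = ImageDataGenerator(rescale=RESCALE)
    return datagen.flow_from_directory(
        directory,
        target_size=IMAGE_SIZE,
        batch_size=batch_size,
        class_mode="categorical",   # one-hot labels, matching the loss
    )
```

Under these assumptions, training would look something like model.fit(make_generator("train"), validation_data=make_generator("val"), epochs=30), where the folder names follow the dataset layout described earlier. Note that ImageDataGenerator is marked as deprecated in newer Keras releases.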

Model Accuracy

As with any CNN model, the predictions of the above model are not always accurate. The confusion matrix containing the corresponding TP, TN, FP and FN counts is shown in Figure 3. Training accuracy is 92%, with less favorable test scores of 62% precision and 67% recall.

During the model testing phase (624 test images), the best CNN model was obtained with roughly 1-30 training epochs. Model accuracy declines significantly beyond 50 epochs due to overfitting.

Figure 3. Confusion matrix & accuracy calculation
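
For reference, precision, recall and accuracy follow directly from the four confusion-matrix counts. The sketch below shows the arithmetic; the counts passed in are made-up illustrative numbers, not the project's results, and the function name classification_metrics is hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute precision, recall and accuracy from confusion-matrix counts."""
    total = tp + tn + fp + fn
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy}


# Illustrative counts only (NOT taken from Figure 3).
metrics = classification_metrics(tp=80, tn=60, fp=20, fn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Precision answers "of the images flagged as pneumonia, how many really were?", while recall answers "of the real pneumonia cases, how many were flagged?". In a diagnostic setting a missed case (FN) is usually the costlier error, so recall deserves particular attention.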

Data Model Optimization:

Training the model to be more accurate at predictions required several steps:
- Training for more epochs, with different iterations of 1, 10, 20, 100, etc.
  - With more epochs, validation loss and training accuracy increased
  - Validation accuracy remained at around 90%
- The risk of overfitting to the training data remained, and grew as the number of epochs rose above 30
- Created a confusion matrix to determine true positive/negative and false positive/negative values

Figure 4. Change over time (epochs) of the training vs validation loss of the developed model
Figure 5. Change over time (epochs) of the training vs validation accuracy of the developed model
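
One way to turn the loss curves into a stopping rule is to track the epoch with the lowest validation loss and stop once it has failed to improve for a set number of epochs. The sketch below is an illustration of that idea, not the procedure used in the notebooks; the function names, the patience value and the loss values are all hypothetical.

```python
def best_epoch(val_losses):
    """Return the 1-based epoch with the lowest validation loss."""
    if not val_losses:
        raise ValueError("val_losses must not be empty")
    best_index = min(range(len(val_losses)), key=val_losses.__getitem__)
    return best_index + 1


def early_stop_epoch(val_losses, patience=3):
    """Return the 1-based epoch at which training would stop, or None.

    Training stops once the validation loss has not improved for
    `patience` consecutive epochs.
    """
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return None


# Made-up validation losses showing the typical overfitting pattern:
# the loss falls at first, then rises while training continues.
history = [0.90, 0.60, 0.50, 0.55, 0.60, 0.70, 0.80]
print("best epoch:", best_epoch(history))
print("stop at epoch:", early_stop_epoch(history, patience=3))
```

Keras provides the same behavior through its EarlyStopping callback, which can monitor val_loss with a patience setting, so the epoch count would not need to be tuned by hand.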

Frontend & deployment

The frontend was implemented with Flask to enable users to upload X-ray images.

To run the tool, clone the repository, run python main.py from the project folder, and then open the localhost address printed in the Git Bash output (stdout).
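
An upload endpoint typically checks the file type before passing an image to the model. The helper below is a hypothetical sketch of such a check and is not taken from app.py or main.py; the function name and the set of allowed extensions are assumptions (the dataset itself is JPEG).

```python
import os

# Assumption: accept common image formats; the dataset images are JPEG.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def is_allowed_image(filename):
    """Return True if the filename has an accepted image extension."""
    _, ext = os.path.splitext(filename)
    return ext.lower() in ALLOWED_EXTENSIONS


print(is_allowed_image("chest_scan.JPEG"))
print(is_allowed_image("notes.txt"))
```

An extension check only filters obvious mistakes; a production tool would also need to validate the file contents.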

Contributors

The following people have contributed to the work presented in this repository:

