# Pneumonia Detection using X-Ray Images

This project is a practice exercise in using a convolutional neural network (CNN) to detect pneumonia from chest X-ray images. Given the current COVID-19 pandemic, the topic is both meaningful and interesting. Along with physical examination, imaging plays a central role in diagnosing pneumonia. In chest X-ray images, areas of opacity often correspond to regions affected by pneumonia, but identifying these areas is sometimes challenging. Machine learning can help detect pneumonia from chest X-ray images.

## 1. The dataset

The dataset for this project is an adapted version of the dataset submitted by Paul Mooney at [Kaggle](https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia); the adapted version has a more balanced train-validation-test split. In total, the dataset contains 5856 images, split into 4192 training examples (1082 normal and 3110 opacity), 1040 validation examples (267 normal and 773 opacity), and 624 testing examples (234 normal and 390 opacity).
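The split sizes quoted above can be checked, and the class balance of each split made explicit, with a few lines of plain Python. This is only a sketch: the counts are copied from the text, while the `splits` dictionary and the `summarize` helper are hypothetical names introduced for illustration.

```python
# Split sizes as quoted in the dataset description (not read from disk).
splits = {
    "train": {"normal": 1082, "opacity": 3110},
    "validation": {"normal": 267, "opacity": 773},
    "test": {"normal": 234, "opacity": 390},
}


def summarize(splits):
    """Return per-split totals and opacity shares, plus the grand total."""
    summary = {}
    for name, counts in splits.items():
        total = counts["normal"] + counts["opacity"]
        summary[name] = {
            "total": total,
            "opacity_share": counts["opacity"] / total,
        }
    grand_total = sum(stats["total"] for stats in summary.values())
    return summary, grand_total


summary, grand_total = summarize(splits)
for name, stats in summary.items():
    print(f"{name}: {stats['total']} images, "
          f"{stats['opacity_share']:.1%} opacity")
print("all splits:", grand_total)
```

Roughly 74% of the training and validation images are labeled opacity, against 62.5% of the test images. A classifier that always predicts opacity would therefore already reach 62.5% test accuracy, which is worth keeping in mind when reading accuracy figures.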

All chest X-ray images were selected from retrospective cohorts of pediatric patients aged one to five years at Guangzhou Women and Children Medical Center, Guangzhou. All chest X-ray imaging was performed as part of the patients' routine clinical care.

## 2. Setup

### 2.1 Import libraries

Along with traditional machine learning libraries, Keras image preprocessing ([ImageDataGenerator](https://keras.io/api/preprocessing/image/#imagedatagenerator-class)) and deep learning objects (Sequential, Conv2D, MaxPooling2D, Flatten, Dense) are used in this project.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

from sklearn.metrics import confusion_matrix

from keras.preprocessing.image import ImageDataGenerator, array_to_img
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

Using TensorFlow backend.


### 2.2 Some hyperparameters

- 'hyper_dimension': target image width and height in pixels, used when the original images are rescaled for processing;
- 'hyper_epochs': number of epochs (learning iterations, each of which exposes the whole training set to the model for weight updates);
- 'hyper_batch_size': number of images per batch;
- 'hyper_feature_maps': reference number of feature maps generated by the convolutional layers;
- 'hyper_channels' and 'hyper_mode': number of channels used in the learning process. For color RGB images, hyper_channels = 3 and hyper_mode = 'rgb'; for grayscale images, hyper_channels = 1 and hyper_mode = 'grayscale'.

In [2]:
hyper_dimension = 500
hyper_epochs = 100
hyper_batch_size = 16
hyper_feature_maps = 32
hyper_channels = 1
hyper_mode = 'grayscale'
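To make these settings concrete, the sketch below derives a few quantities from them: the per-image input shape, the shape of one batch, the number of batches per epoch, and the approximate memory footprint of one batch. It assumes that each epoch passes over all 4192 training images once and that pixels are stored as 32-bit floats; neither assumption is stated in this section, and `n_train` and `bytes_per_value` are hypothetical names.

```python
import math

# Values copied from the hyperparameter cell above.
hyper_dimension = 500
hyper_batch_size = 16
hyper_channels = 1

# Assumption: training-set size taken from the dataset section.
n_train = 4192
# Assumption: pixels are held as float32 (4 bytes per value).
bytes_per_value = 4

# Shape of a single image as seen by the first convolution layer.
input_shape = (hyper_dimension, hyper_dimension, hyper_channels)

# Shape of one batch of images.
batch_shape = (hyper_batch_size,) + input_shape

# Batches needed to see every training image once.
steps_per_epoch = math.ceil(n_train / hyper_batch_size)

# Approximate size of one input batch in megabytes.
batch_megabytes = math.prod(batch_shape) * bytes_per_value / 1e6

print("input shape:", input_shape)
print("batch shape:", batch_shape)
print("steps per epoch:", steps_per_epoch)
print("batch size in MB:", batch_megabytes)
```

Under these assumptions the training set divides evenly into 262 batches of 16 images, and one batch of inputs occupies about 16 MB.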

## 3. Deep learning: Convolutional Neural Network (CNN)

### 3.1 Create and compile the CNN

In [3]:
# Initializing the CNN
classifier = Sequential()

# Convolution & pooling - First convolution layer
classifier.add(Conv2D(hyper_feature_maps, (3, 3),
                      input_shape = (hyper_dimension,
                                     hyper_dimension,
                                     hyper_channels),
                      activation = 'relu'))
classifier.add(MaxPooling2D(pool_size = (2, 2)))

# Convolution & pooling - Second convolution layer (same as first layer)
# input_shape is only needed on the first layer; Keras infers it afterwards
classifier.add(Conv2D(hyper_feature_maps, (3, 3), activation = 'relu'))
classifier.add(MaxPooling2D(pool_size = (2, 2)))

# Convolution & pooling - Third convolution layer (twice as many feature maps)
classifier.add(Conv2D(hyper_feature_maps * 2, (3, 3), activation = 'relu'))
classifier.add(MaxPooling2D(pool_size = (2, 2)))

# Flattening
classifier.add(Flatten())

# Full connection
classifier.add(Dense(units = hyper_feature_maps * 2, activation = 'relu'))
classifier.add(Dense(units = 1, activation = 'sigmoid'))

# Compiling the CNN
classifier.compile(optimizer = 'adam',
                   loss = 'binary_crossentropy',
                   metrics = ['accuracy'])
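
The size of this network can be worked out by hand. The sketch below traces the spatial size through the three convolution and pooling stages and counts trainable parameters, assuming the Keras defaults the code above relies on (`padding='valid'` and stride 1 for `Conv2D`, and a pooling stride equal to the pool size). The helper functions are hypothetical and written in plain Python, so they do not call Keras.

```python
def conv_out(size, kernel=3):
    """Spatial size after a 'valid' convolution with stride 1."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Spatial size after non-overlapping max pooling (floor division)."""
    return size // pool


def conv_params(in_channels, filters, kernel=3):
    """Weights plus biases of a Conv2D layer."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(in_units, units):
    """Weights plus biases of a Dense layer."""
    return in_units * units + units


# Start from a 500 x 500 grayscale image (hyper_dimension, hyper_channels).
size, channels = 500, 1
total_params = 0

# Three conv + pool stages with 32, 32 and 64 feature maps.
for filters in (32, 32, 64):
    total_params += conv_params(channels, filters)
    size = pool_out(conv_out(size))
    channels = filters

# Flatten, then Dense(64) and Dense(1).
flat_units = size * size * channels
first_dense = dense_params(flat_units, 64)
total_params += first_dense + dense_params(64, 1)

print("final feature map:", (size, size, channels))
print("flattened units:", flat_units)
print("total parameters:", total_params)
print("share in first Dense layer:", round(first_dense / total_params, 4))
```

With these settings the final feature map is 60 x 60 x 64, so almost all of the roughly 14.8 million parameters sit in the first fully connected layer. Reducing `hyper_dimension` or adding further pooling would shrink the model far more than changing the number of feature maps.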

