<a href="https://colab.research.google.com/github/reizkian/chestxray/blob/master/main_chestxray.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Chest XRay - Pneumonia Identification**

<p align="justify">
The 2019 novel coronavirus (COVID-19) presents several unique features. While the diagnosis is confirmed using polymerase chain reaction (PCR), infected patients with pneumonia may present on chest X-ray and computed tomography (CT) images with a pattern that is only moderately characteristic for the human eye (Ng, 2020). Controlling COVID-19’s rate of transmission depends on our capacity to reliably identify infected patients with a low rate of false negatives. In addition, a low rate of false positives is required to avoid further burdening the healthcare system by quarantining patients unnecessarily. Along with proper infection control, timely detection of the disease would enable the supportive care required by patients affected by COVID-19.
</p>

<p align="justify">
In late January 2020, a Chinese team published a paper detailing the clinical and paraclinical features of COVID-19. They reported that patients present abnormalities in chest CT images, with most having bilateral involvement (Huang, 2020). Bilateral multiple lobular and subsegmental areas of consolidation constitute the typical findings in chest CT images of intensive care unit (ICU) patients on admission (Huang, 2020). In comparison, non-ICU patients show bilateral ground-glass opacity and subsegmental areas of consolidation in their chest CT images (Huang, 2020). In these patients, later chest CT images display bilateral ground-glass opacity with resolved consolidation (Huang, 2020).
</p>

<p align="justify">
COVID-19 is possibly better diagnosed using radiological imaging (Fang, 2020; Ai, 2020).
</p>


**citation**

[1] Joseph Paul Cohen and Paul Morrison and Lan Dao. COVID-19 image data collection, arXiv, 2020. https://github.com/ieee8023/covid-chestxray-dataset

[2] https://github.com/JordanMicahBennett/SMART-CT-SCAN_BASED-COVID19_VIRUS_DETECTOR/



## Data Preparation

**The 1.11 GB of chest X-ray images can be downloaded directly from:**

https://www.kaggle.com/alifrahman/chestxraydataset (uploaded to Kaggle by Alif Rahman, 31 August 2020)

### download data

Before we can import data from Kaggle into Google Colab, we need to download an API token via **Login to Kaggle > My Account > Home > Create New API Token**. The token is downloaded as **kaggle.json**, which we then upload to the Google Colab hosted runtime.


In [None]:
# upload your 'kaggle.json' to hosted runtime
from google.colab import files
files.upload()
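
Before copying the token into place, it may be worth checking that the uploaded file is well formed. The sketch below assumes the token is a JSON object with non-empty `username` and `key` string fields; the helper name `check_kaggle_token` is hypothetical and not part of the Kaggle library.

```python
import json

def check_kaggle_token(path):
    """Return True if the file at `path` looks like a Kaggle API token.

    Assumption: a token is a JSON object with non-empty string
    fields "username" and "key".
    """
    with open(path) as f:
        try:
            token = json.load(f)
        except json.JSONDecodeError:
            return False
    return isinstance(token, dict) and all(
        isinstance(token.get(field), str) and bool(token.get(field))
        for field in ("username", "key")
    )
```

In the notebook this could be called as `check_kaggle_token("kaggle.json")` right after the upload cell.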

Next we need a few setup steps: install the kaggle library with pip, copy the API token into place, and adjust its access permissions.

In [None]:
# Install kaggle library 
!pip install -q kaggle
# Make ".kaggle" directory in root directory
!mkdir -p ~/.kaggle
# Copy the API token to the kaggle directory
!cp kaggle.json ~/.kaggle/
# Check the directory
!ls ~/.kaggle
# Adjust access permissions (token readable and writable by the owner only)
!chmod 600 ~/.kaggle/kaggle.json

The cell below contains the command that downloads the data into your hosted runtime directory (on Google's servers). In effect, the whole dataset is copied from Kaggle's servers to Google's.

In [None]:
# Download the data
# you need to copy the API command from the kaggle link above
!kaggle datasets download -d alifrahman/chestxraydataset 

In [None]:
# unzip the data
!unzip -q chestxraydataset.zip -d .
!ls

We need to define the paths to the train and test data; each of these folders contains a PNEUMONIA and a NORMAL subfolder of images.

In [None]:
import os
# label
pneumo = "PNEUMONIA"
normal = "NORMAL"

base_directory = "/content/chest_xray"
# Train dataset directory
train_dir = os.path.join(base_directory,"train") 
train_dir_pneumo = os.path.join(train_dir,pneumo)
train_dir_normal = os.path.join(train_dir,normal)
# Test dataset directory
test_dir = os.path.join(base_directory,"test")
test_dir_pneumo = os.path.join(test_dir,pneumo)
test_dir_normal = os.path.join(test_dir,normal)

# Print every path together with whether the directory actually exists
print("DATASET PATHS (True = directory exists)")
print()
for path in (base_directory,
             train_dir, train_dir_pneumo, train_dir_normal,
             test_dir, test_dir_pneumo, test_dir_normal):
    print(path, os.path.isdir(path))
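
Once the paths are defined, counting the images in each class folder gives a quick view of class balance. This is a minimal sketch: the helper `count_images_per_class` and the list of accepted file extensions are assumptions for illustration, not part of the original notebook.

```python
import os

# Assumption: images are stored with one of these extensions
IMAGE_EXTENSIONS = (".jpeg", ".jpg", ".png")

def count_images_per_class(split_dir, class_names=("NORMAL", "PNEUMONIA")):
    """Count image files in each class subfolder of split_dir."""
    counts = {}
    for name in class_names:
        class_dir = os.path.join(split_dir, name)
        if not os.path.isdir(class_dir):
            # A missing folder is reported as zero images
            counts[name] = 0
            continue
        counts[name] = sum(
            1 for fname in os.listdir(class_dir)
            if fname.lower().endswith(IMAGE_EXTENSIONS)
        )
    return counts
```

For example, `count_images_per_class(train_dir)` returns a dictionary with one count per class; a large gap between the two counts suggests looking at metrics beyond plain accuracy.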

In [None]:
# PREVIEW RANDOM IMAGES IN TRAIN DATA
# INSPECT RAW DATA BEFORE AUGMENT
import matplotlib.pyplot as plt
import tensorflow as tf

nrows = 2
ncols = 4
img_index = int(nrows*ncols / 2)

# Set up matplotlib fig, and size it to fit 2x4 pics
fig = plt.gcf()
fig.set_size_inches(ncols*4, nrows*4)


def RandomSamplingClassPath(path, number_of_images):
  """
  Take a class directory path (e.g. train/NORMAL) and return the
  paths of number_of_images randomly sampled images from it.
  """
  fnames = os.listdir(path)
  fnames = (tf.random.shuffle(fnames)[:number_of_images]).numpy()
  fnames = [tf.compat.as_str(fname) for fname in fnames]
  fnames = [os.path.join(path, fname) for fname in fnames]
  return fnames

# Pick random image from train normal
normal_images = RandomSamplingClassPath(train_dir_normal, img_index)
# Pick random image from train pneumo
pneumonia_images = RandomSamplingClassPath(train_dir_pneumo, img_index)

print("SAMPLE OF TRAINING IMAGES (Before Augmentation)")
for i, img_path in enumerate(normal_images + pneumonia_images):
    # Set up subplot; subplot indices start at 1
    sp = plt.subplot(nrows, ncols, i+1)
    sp.axis('Off') # Don't show axes (or gridlines)
    if i < len(normal_images):
        plt.title('NORMAL', fontweight='bold')
    else:
        plt.title('PNEUMONIA', fontweight='bold')
    img = plt.imread(img_path)
    plt.imshow(img,cmap="bone")

plt.show()

### Preprocessing data
We apply image augmentation (rescale, rotation, zoom, flip; shear and shifts are left commented out) on the fly: images are read from the hard drive in batches and augmented as they are loaded into RAM.

In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator
batch_size = 20 # load 20 images per batch, for preprocessing and later training
target_size = (256,256) # resize images to 256x256 pixels

# Data Generator (to specify the augmentation)
train_datagen = ImageDataGenerator(rescale = 1.0/255,
                                   rotation_range = 3,
                                   #width_shift_range = 0.2,
                                   #height_shift_range = 0.2,
                                   #shear_range = 0.1,
                                   horizontal_flip=True,
                                   zoom_range=0.1,
                                   #featurewise_std_normalization=True,
                                   #featurewise_center=True,
                                   #fill_mode="nearest"
                                   )
test_datagen = ImageDataGenerator(rescale = 1.0/255)

# Specify Image Class Name
classes = ["NORMAL","PNEUMONIA"]

# Train Generator (augmentation is applied as each batch is loaded into RAM)
train_generator = train_datagen.flow_from_directory(train_dir,
                                                    shuffle=True,
                                                    batch_size=batch_size,
                                                    #color_mode="grayscale",
                                                    class_mode="binary",
                                                    classes=classes,
                                                    target_size=target_size
                                                    )
test_generator = test_datagen.flow_from_directory(test_dir,
                                                  batch_size=batch_size,
                                                  #color_mode="grayscale",
                                                  class_mode="binary",
                                                  classes=classes,
                                                  target_size=target_size
                                                  )

In [None]:
# PREVIEW IMAGES DATA (after augmentation)
print("SAMPLE OF TRAINING IMAGES (After Augmentation)")
import matplotlib.pyplot as plt
plt.style.use('seaborn')
import tensorflow as tf
# Obtain one batch of training images
images, labels = next(train_generator)
labels = labels.astype('int')
# Plot the images in the batch, along with their true labels
nrows = 4
ncols = batch_size // nrows  # integer division: subplot grid sizes must be integers
fig = plt.figure(figsize=(15, 14))
for idx in range(len(images)):
    ax = fig.add_subplot(nrows, ncols, idx+1, xticks=[], yticks=[])
    plt.imshow(images[idx], cmap="bone")
    ax.set_title(classes[labels[idx]])

## Deep Learning Model
First we build a standard model with several **Convolution, MaxPooling, and Dense** layers as a base model. Later we try to implement more advanced models using **Transfer Learning** and **Fine Tuning**.

### Base Model

In [None]:
# Build Model

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, BatchNormalization, Flatten, Dense, Dropout
from tensorflow.keras.optimizers import Adam, RMSprop, SGD

model = Sequential([
    # Note the input shape is the desired size of the image, 256x256 with 3 color channels
    Conv2D(32, (3,3), activation='relu', input_shape=(256, 256, 3)),
    MaxPooling2D(2,2),
    Conv2D(32, (3,3), activation='relu'),
    MaxPooling2D(2,2), 
    Conv2D(32, (3,3), activation='relu'), 
    MaxPooling2D(2,2),
    Conv2D(32, (3,3), activation='relu'), 
    MaxPooling2D(2,2),
    Conv2D(32, (3,3), activation='relu'), 
    MaxPooling2D(2,2),
    # Flatten the results to feed into a DNN
    Flatten(), 
    # Dense hidden layer
    Dense(512, activation='relu'),
    Dense(256, activation='relu'),
    Dense(128, activation='relu'),   
    # Only 1 output neuron. It will contain a value from 0-1, where 0 stands for one class ('NORMAL') and 1 for the other ('PNEUMONIA')
    Dense(1, activation='sigmoid')  
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()
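
To see where the size of the flattened vector comes from, the spatial size can be traced through the five convolution and pooling blocks by hand. The sketch below assumes what the Keras defaults imply for the layers above: 3x3 convolutions with 'valid' padding and stride 1, and 2x2 max pooling with stride 2. The function name `conv_stack_output_size` is hypothetical.

```python
def conv_stack_output_size(size, n_blocks, kernel=3, pool=2):
    """Spatial size after n_blocks of (valid conv -> max pool)."""
    for _ in range(n_blocks):
        size = size - kernel + 1   # 'valid' convolution, stride 1
        size = size // pool        # max pooling, stride equal to pool size
    return size

side = conv_stack_output_size(256, n_blocks=5)
flattened = side * side * 32       # 32 filters in the last Conv2D layer
print(side, flattened)             # 6 1152
```

Under these assumptions the 256x256 input shrinks to 6x6 (256, 254, 127, 125, 62, 60, 30, 28, 14, 12, 6), so `Flatten()` yields 1152 values and the first `Dense(512)` layer holds 1152 x 512 + 512 = 590,336 parameters, which can be compared against the output of `model.summary()`.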

In [None]:
# Callback

from tensorflow.keras.callbacks import Callback, EarlyStopping
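class myCallback(Callback):
    def on_epoch_end(self, epoch, logs=None):
        # logs may be None, and 'accuracy' may be missing from it
        accuracy = (logs or {}).get('accuracy')
        if accuracy is not None and accuracy > 0.93:
            print("\nReached 93% accuracy so cancelling training!")
            self.model.stop_training = True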
# callbacks = myCallback()
early_stopping = EarlyStopping(monitor='val_loss', patience=3)
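
The effect of `patience=3` can be illustrated with a simplified, pure-Python imitation of the stopping rule: training stops once the monitored loss has failed to improve for three consecutive epochs. This is only a sketch of the idea, not the Keras implementation (which also supports options such as `min_delta`); the function name `early_stopping_epoch` is hypothetical.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the 0-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            # Improvement: remember it and reset the counter
            best = loss
            wait = 0
        else:
            # No improvement: count the epoch and stop if patience runs out
            wait += 1
            if wait >= patience:
                return epoch
    return None
```

With validation losses `[1.0, 0.9, 0.95, 0.96, 0.97, 0.5]` the function returns 4: the best value is reached at epoch 1, the next three epochs bring no improvement, and the later drop to 0.5 is never seen.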

In [None]:
# Train the model
history = model.fit(train_generator,
                    validation_data=test_generator,
                    steps_per_epoch=10,
                    epochs=35,
                    validation_steps=130,
                    callbacks=[early_stopping],
                    verbose=2)
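
With `steps_per_epoch=10` and a batch size of 20, each epoch draws only 200 augmented training images. The sketch below shows how the number of steps needed for one full pass could be computed; the helper names are hypothetical and the dataset size of 1000 in the usage note is a made-up figure, not the size of this dataset.

```python
import math

def steps_to_cover(n_samples, batch_size):
    """Number of batches needed to see every sample once."""
    return math.ceil(n_samples / batch_size)

def images_seen_per_epoch(steps_per_epoch, batch_size):
    """Number of images drawn from the generator in one epoch."""
    return steps_per_epoch * batch_size
```

For a hypothetical set of 1000 images, `steps_to_cover(1000, 20)` gives 50 steps. In the notebook, the actual number of training images should be available as `train_generator.samples`.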

In [None]:
print(history.history.keys())

#### training performance
By extracting the accuracy and loss values recorded during training, we can see how they evolve from epoch to epoch.

In [None]:
import matplotlib.pyplot as plt
plt.style.use("default")
# Training accuracy and training loss, one point per epoch
plt.plot(history.history['accuracy'],'C0')
plt.plot(history.history['loss'],'C1')
plt.title('Model Training Performance')
plt.ylabel('accuracy / loss')
plt.xlabel('epoch')
plt.legend(['accuracy', 'loss'], loc='upper right')
plt.show()

#### heat map of features
<p align="justify">By plotting a heat map we can see which parts of an image a convolution layer attends to when the model classifies it. To produce the heat map we use a technique called Grad-CAM (Gradient-weighted Class Activation Mapping). The idea is simple: to find how important each feature map is for a given class score, we take the gradient of that score with respect to the output of the final convolutional layer, average it over the spatial positions to obtain one weight per feature map, and then use these weights to combine the feature maps into a single map.
</p>
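
The weighting described above can be written compactly. The notation is an assumption introduced here for illustration: $y^c$ is the score for class $c$, $A^k$ is the $k$-th feature map of the chosen convolutional layer, and $Z$ is the number of spatial positions in a feature map.

$$
\alpha_k^c = \frac{1}{Z} \sum_{i} \sum_{j} \frac{\partial y^c}{\partial A^k_{ij}},
\qquad
L^c_{\text{Grad-CAM}} = \mathrm{ReLU}\left( \sum_{k} \alpha_k^c A^k \right)
$$

The code in the next cell averages the weighted feature maps over $k$ instead of summing them; the two differ only by a constant factor, which disappears when the heat map is divided by its maximum.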

In [None]:
import numpy as np
import tensorflow as tf

def HeatMapExtract(conv2d_n, class_type):
  """
  Return the Grad-CAM heat map of the convolution layer named conv2d_n
  for one training image of class_type (0: NORMAL, 1: PNEUMONIA)
  """

  # take a batch of training data (1 batch = 20 images)
  images, labels = next(train_generator)

  # take the first image in the batch that belongs to class_type
  sample_from_batch_images = None
  for i in range(len(labels)):
    if labels[i] == class_type:
      sample_from_batch_images = images[i]
      break
  if sample_from_batch_images is None:
    raise ValueError("no image of the requested class in this batch")

  # add a batch dimension to fit the model input
  x = np.expand_dims(sample_from_batch_images, axis=0)

  # model returning both the prediction and the chosen layer's activations
  conv_layer = model.get_layer(conv2d_n)
  iterate = tf.keras.models.Model(model.inputs, [model.output, conv_layer.output])

  with tf.GradientTape() as tape:
    model_out, conv_out = iterate(x)
    # single sigmoid unit: index 0 is the PNEUMONIA score
    class_out = model_out[:, 0]
  grads = tape.gradient(class_out, conv_out)
  pooled_grads = tf.reduce_mean(grads, axis=(0, 1, 2))
  # creating heat map
  heatmap = tf.reduce_mean(tf.multiply(pooled_grads, conv_out), axis=-1)
  heatmap = np.maximum(heatmap.numpy(), 0)
  heatmap /= (np.max(heatmap) + 1e-8)  # avoid division by zero
  return heatmap[0]

# NOTE: layer names depend on the session; check model.summary() for the
# name of the last convolution layer and adjust "conv2d_42" if needed
HeatMapPneumo = HeatMapExtract("conv2d_42",1)
plt.imshow(HeatMapPneumo, cmap="jet")
plt.grid(False)
plt.title("HEAT MAP OF PNEUMONIA")
plt.colorbar()
plt.show()

HeatMapNormal = HeatMapExtract("conv2d_42",0)
plt.imshow(HeatMapNormal, cmap="jet")
plt.grid(False)
plt.title("HEAT MAP OF NORMAL")
plt.colorbar()
plt.show()

In [None]:
import matplotlib.pyplot as plt
plt.style.use('seaborn')
import numpy as np
import cv2

# Obtain one batch of testing images
images_test, labels_test = next(test_generator)
labels_test = labels_test.astype('int')

def HeatMapApply(image_vector, heatmap):
  """Overlay a heat map (values in 0-1) on an RGB image (values in 0-1)."""
  INTENSITY = 0.6
  hm = cv2.resize(heatmap, (image_vector.shape[1], image_vector.shape[0]))
  hm = cv2.applyColorMap(np.uint8(255*hm), cv2.COLORMAP_JET)
  hm = cv2.cvtColor(hm, cv2.COLOR_BGR2RGB)  # OpenCV returns BGR, matplotlib expects RGB
  # keep the blended image inside the valid 0-1 range for imshow
  return np.clip((hm/255)*INTENSITY + image_vector, 0, 1)

# Plot test images with the heat map overlaid, titled with their true labels.
# The same heat map (computed from a single training image) is used for every image.
nrows = 4
ncols = batch_size // nrows  # integer division: subplot grid sizes must be integers
fig = plt.figure(figsize=(15, 14))
for idx in range(10):
    ax = fig.add_subplot(nrows, ncols, idx+1, xticks=[], yticks=[])
    imhm = HeatMapApply(images_test[idx], HeatMapPneumo)
    plt.imshow(imhm)
    ax.set_title(classes[labels_test[idx]])