# Jonathan Chua's JumpStart DL Project - Predicting Abnormal ECG
While doctors are trained to read ECGs and interpret them with very high accuracy, ECGs are far from straightforward for a layperson to read. Some portable ECG devices are already on the market, so automatic ECG interpretation would let users know whether they should seek further medical advice. This ECG dataset comprises over 4,000 normal and 10,000 abnormal ECGs. Each entry represents one heartbeat and is stored as a normalized array of 187 values. The data could be analyzed with an RNN, and that may well be the best and most efficient method. Humans, however, generally read an ECG as a 2D graph. I therefore aim to build a CNN that mimics a human doctor: it interprets the ECG as an image and determines whether it is abnormal.

### Building the Model
The data was first processed and each heartbeat was saved as a PNG image in Google Drive. This notebook loads those images and trains a CNN model to classify the two categories of ECG (normal and abnormal).
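The rendering step itself is not part of this notebook. The sketch below illustrates the idea by drawing one normalized 187-value beat as a white line on a black 299 x 299 grayscale canvas with Pillow's `ImageDraw`. The helper name `beat_to_image`, the line width, and the white-on-black styling are assumptions, not the author's actual plotting settings. A single-channel (mode `"L"`) image is used because the model below expects inputs of shape (299, 299, 1).

```python
import numpy as np
from PIL import Image, ImageDraw


def beat_to_image(beat, size=299, line_width=2):
    """Hypothetical helper: draw one heartbeat (values in [0, 1]) as a line plot.

    Returns an 8-bit grayscale PIL image with a white trace on a black background.
    """
    beat = np.asarray(beat, dtype=float)
    img = Image.new("L", (size, size), color=0)  # single-channel black canvas
    draw = ImageDraw.Draw(img)
    # Spread the samples evenly across the image width
    xs = np.linspace(0, size - 1, num=len(beat))
    # Image rows grow downward, so invert the amplitude to keep peaks at the top
    ys = (1.0 - beat) * (size - 1)
    draw.line(list(zip(xs.tolist(), ys.tolist())), fill=255, width=line_width)
    return img


# Usage with a synthetic 187-sample "beat" (a single bump), not real ECG data
t = np.linspace(0, 1, 187)
fake_beat = np.exp(-((t - 0.3) ** 2) / 0.002)
img = beat_to_image(fake_beat)
pixels = np.asarray(img)
print(img.mode, pixels.shape)  # L (299, 299)
```

Saving each result with `img.save(...)` under names such as `n_graph_{i}.png` would produce files in the layout that the loading code below reads.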


In [0]:
from google.colab import drive
drive.mount('/content/gdrive')

root_path = 'gdrive/My Drive/Colab Notebooks/JumpStart Intro to Deep Learning'

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


In [0]:
import pandas as pd
import seaborn as sns
import numpy as np
from PIL import Image

import tensorflow as tf
import tensorflow.keras

from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Dropout, Flatten,Input
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras.regularizers import l2



### Load Data
The PNG images are loaded as (299, 299) NumPy arrays, with pixel values scaled to [0, 1]. Only 2,000 images per category were used because Google Colab did not have enough RAM to load more.
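A quick estimate shows why RAM runs out. In NumPy, dividing a `uint8` image array by 255 produces `float64` values, which take 8 bytes per pixel. Using `float32` would halve that. The helper below gives only a rough figure. It ignores Python list overhead and the temporary copy made by `np.concatenate`, which roughly doubles peak usage during that step.

```python
import numpy as np


def dataset_bytes(n_images, height=299, width=299, channels=1, dtype=np.float64):
    """Bytes needed to hold n_images arrays of the given shape and dtype in RAM."""
    return n_images * height * width * channels * np.dtype(dtype).itemsize


# 2 classes x 2000 images, as loaded in this notebook
n = 2 * 2000
gb64 = dataset_bytes(n) / 1e9
gb32 = dataset_bytes(n, dtype=np.float32) / 1e9
print(f"float64: {gb64:.2f} GB, float32: {gb32:.2f} GB")  # float64: 2.86 GB, float32: 1.43 GB
```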


In [0]:
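normal_arr = []

# Load 2000 normal-beat images as grayscale and scale pixels to [0, 1].
# float32 uses half the RAM of the float64 produced by a plain `/ 255`.
for i in range(2000):
  with Image.open(f"{root_path}/normal_graphs/n_graph_{i}.png") as im:
    im_arr = np.asarray(im.convert('L'), dtype=np.float32) / 255  # shape (299, 299)
  normal_arr.append(im_arr)

normal_arr = np.asarray(normal_arr)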

In [0]:
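abnormal_arr = []

# Load 2000 abnormal-beat images the same way as the normal ones
for i in range(2000):
  with Image.open(f"{root_path}/abnormal_graphs/a_graph_{i}.png") as im:
    im_arr = np.asarray(im.convert('L'), dtype=np.float32) / 255  # shape (299, 299)
  abnormal_arr.append(im_arr)

abnormal_arr = np.asarray(abnormal_arr)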

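Inputs and targets are stored in NumPy arrays, with label 1 for normal and 0 for abnormal. They are then shuffled so the two classes are interleaved rather than stored as two contiguous blocks, which could bias training.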

In [0]:
X = np.concatenate((normal_arr, abnormal_arr))
y = np.concatenate((np.ones(2000), np.zeros(2000)))

# clear variables to conserve RAM
normal_arr, abnormal_arr = None, None

from sklearn.utils import shuffle
X, y = shuffle(X, y)
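`sklearn.utils.shuffle` applies the same permutation to every array passed to it, so each image stays paired with its label. The toy check below illustrates this with small stand-in arrays instead of real images. Passing `random_state` is an optional addition that makes the order reproducible; the notebook does not set it.

```python
import numpy as np
from sklearn.utils import shuffle

# Toy stand-ins: 6 tiny "images" whose pixel value equals their class label
X_toy = np.concatenate((np.ones((3, 2, 2)), np.zeros((3, 2, 2))))
y_toy = np.concatenate((np.ones(3), np.zeros(3)))  # 1 = normal, 0 = abnormal, as in the notebook

X_shuf, y_shuf = shuffle(X_toy, y_toy, random_state=0)

# Every image still carries the label it started with
pairs_intact = all(X_shuf[i].mean() == y_shuf[i] for i in range(len(y_shuf)))
print(pairs_intact)  # True
```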

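Use `train_test_split` twice to get a 70/20/10 train/validation/test split. The first call holds out 30% of the data. The second call assigns one third of that held-out portion (10% overall) to the test set and the remaining two thirds (20% overall) to validation.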

In [0]:
from sklearn.model_selection import train_test_split
X_train, X_val_test, y_train, y_val_test = train_test_split(X, y, test_size=0.30, random_state=42)

# clear variables to conserve RAM
X, y = None, None

X_val, X_test, y_val, y_test = train_test_split(X_val_test, y_val_test, test_size=0.3333, random_state=36)
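The sizes work out as follows on stand-in data of the same shape as the notebook's labels. The `stratify` argument is an optional variant that the notebook does not use. It keeps the 50/50 class balance identical in every split, whereas a plain random split only keeps it approximately.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-ins with the notebook's sizes: 2000 normal (1) and 2000 abnormal (0)
X_demo = np.arange(4000).reshape(-1, 1)
y_demo = np.concatenate((np.ones(2000), np.zeros(2000)))

# 70% train, 30% held out; stratify keeps the class balance in both parts
X_tr, X_vt, y_tr, y_vt = train_test_split(
    X_demo, y_demo, test_size=0.30, stratify=y_demo, random_state=42)
# One third of the held-out 30% becomes test (10% overall), the rest validation (20%)
X_va, X_te, y_va, y_te = train_test_split(
    X_vt, y_vt, test_size=0.3333, stratify=y_vt, random_state=36)

print(len(X_tr), len(X_va), len(X_te))  # 2800 800 400
```

The 400 test images match the size of the test set evaluated at the end of the notebook.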

### Build CNN Model

Create and train the model. Because this is a binary classification task, the network does not need many filters. However, previous runs reached very high training accuracy (~99%), which suggests the model was overfitting. L2 regularization and dropout are therefore added to improve validation accuracy.
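For reference, Keras's `l2(factor)` regularizer adds `factor * sum(w ** 2)` to the training loss for each tensor it is attached to. Here that means every regularized kernel and bias, with factor 0.015. The NumPy sketch below reproduces that term for illustration; it is not the Keras implementation itself. Because these penalties are added to the cross-entropy, the loss values Keras reports, including the test loss below, are not pure cross-entropy.

```python
import numpy as np


def l2_penalty(weights, factor=0.015):
    """L2 term contributed by one weight tensor: factor * sum of squared entries."""
    return factor * np.sum(np.square(weights))


# One 3x3 single-channel kernel of ones contributes 0.015 * 9
kernel = np.ones((3, 3, 1, 1))
print(l2_penalty(kernel))

# Doubling every weight quadruples the penalty, so large weights are discouraged
print(l2_penalty(2 * kernel) / l2_penalty(kernel))  # 4.0
```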

In [0]:
Inp = Input(shape=(299, 299, 1))
x = Conv2D(16, kernel_size=(3, 3), activation='relu', name = 'Conv_01')(Inp)
x = Conv2D(16, (3, 3), activation='relu', kernel_regularizer=l2(0.015), bias_regularizer=l2(0.015), name = 'Conv_02')(x)
x = MaxPooling2D(pool_size=(2, 2),name = 'MaxPool_01')(x)
x = Conv2D(32, (3, 3), activation='relu', kernel_regularizer=l2(0.015), bias_regularizer=l2(0.015), name = 'Conv_03')(x)
x = Conv2D(32, (3, 3), activation='relu', kernel_regularizer=l2(0.015), bias_regularizer=l2(0.015), name = 'Conv_04')(x)
x = MaxPooling2D(pool_size=(2, 2),name = 'MaxPool_02')(x)
x = Conv2D(64, (3, 3), activation='relu', kernel_regularizer=l2(0.015), bias_regularizer=l2(0.015), name = 'Conv_05')(x)
x = MaxPooling2D(pool_size=(2, 2),name = 'MaxPool_03')(x)
x = Dropout(0.4 ,name = 'Dropout_01')(x)
x = Flatten(name = 'Flatten_01')(x)
x = Dense(32, activation='relu', name = 'Dense_01')(x)
x = Dropout(0.5, name = 'Dropout_02')(x)
output = Dense(1, activation='sigmoid', name = 'Dense_02')(x)

model = Model(Inp,output)

In [0]:
model.summary()

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, 299, 299, 1)]     0         
_________________________________________________________________
Conv_01 (Conv2D)             (None, 297, 297, 16)      160       
_________________________________________________________________
Conv_02 (Conv2D)             (None, 295, 295, 16)      2320      
_________________________________________________________________
MaxPool_01 (MaxPooling2D)    (None, 147, 147, 16)      0         
_________________________________________________________________
Conv_03 (Conv2D)             (None, 145, 145, 32)      4640      
_________________________________________________________________
Conv_04 (Conv2D)             (None, 143, 143, 32)      9248      
_________________________________________________________________
MaxPool_02 (MaxPooling2D)    (None, 71, 71, 32)        0     
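The summary output above is cut off after `MaxPool_02`, but the remaining shapes can be worked out by hand. Every `Conv2D` here uses the default `'valid'` padding with stride 1, so each 3 x 3 convolution trims 2 pixels. Each 2 x 2 max pool halves the size, rounding down. The sketch below is arithmetic only, not output from the notebook. Its first values agree with the rows that are shown, such as 160 parameters for `Conv_01` and 71 x 71 after `MaxPool_02`.

```python
def conv_valid(size, kernel=3):
    """Output size of an unpadded ('valid'), stride-1 convolution."""
    return size - kernel + 1


def pool(size, p=2):
    """Output size of non-overlapping p x p max pooling with 'valid' padding."""
    return size // p


def conv_params(k, c_in, c_out):
    """Weights plus biases of a k x k Conv2D layer."""
    return k * k * c_in * c_out + c_out


s = 299
s = conv_valid(conv_valid(s))   # Conv_01, Conv_02 -> 295
s = pool(s)                     # MaxPool_01 -> 147
s = conv_valid(conv_valid(s))   # Conv_03, Conv_04 -> 143
s = pool(s)                     # MaxPool_02 -> 71
s = conv_valid(s)               # Conv_05 -> 69
s = pool(s)                     # MaxPool_03 -> 34 (dropout does not change the shape)

flat = s * s * 64               # Flatten_01 output length
dense_01 = flat * 32 + 32       # Dense_01 weights + biases
conv_total = (conv_params(3, 1, 16) + conv_params(3, 16, 16) + conv_params(3, 16, 32)
              + conv_params(3, 32, 32) + conv_params(3, 32, 64))
total = conv_total + dense_01 + (32 + 1)  # + Dense_02
print(s, flat, dense_01, total)  # 34 73984 2367520 2402417
```

By this count, `Dense_01` accounts for about 98.5% of all parameters (2,367,520 of 2,402,417).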

In [0]:
opt = tf.keras.optimizers.Adam(learning_rate=0.001)  # `lr` is a deprecated alias of `learning_rate`

model.compile(loss= tf.keras.losses.binary_crossentropy, optimizer= opt, metrics=['accuracy'])
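Binary cross-entropy pairs with the single sigmoid output. For a label y in {0, 1} and predicted probability p, the per-example loss is -[y ln p + (1 - y) ln(1 - p)], averaged over the batch. The NumPy sketch below computes it. The clipping epsilon of 1e-7 mirrors the Keras backend's default epsilon, but treating it as exactly what Keras does internally is an assumption.

```python
import numpy as np


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    return float(np.mean(-(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))))


# Label 1 = normal, 0 = abnormal, matching how y was built in this notebook
loss = binary_crossentropy([1, 0], [0.9, 0.2])
print(round(loss, 4))  # 0.1643
```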

Train for 30 epochs, which seemed adequate to reach high accuracy.

In [0]:
hist = model.fit(X_train, y_train,
          batch_size=64,
          epochs=30,
          verbose=1,
          callbacks = None,
          validation_data=(X_val, y_val))

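The per-epoch metrics are not kept in this export, but `hist.history` still holds them. It is a dict that maps names such as `'loss'`, `'accuracy'`, `'val_loss'` and `'val_accuracy'` to one value per epoch. A small helper, the hypothetical `best_epoch` below, can show whether 30 epochs was more than needed. The numbers in the example are made up for illustration.

```python
def best_epoch(history, monitor="val_loss"):
    """Return (1-based epoch, value) with the lowest monitored value in a Keras-style history dict."""
    values = history[monitor]
    idx = min(range(len(values)), key=values.__getitem__)
    return idx + 1, values[idx]


# Made-up numbers for illustration only; in the notebook, pass hist.history instead
example_history = {"loss": [0.9, 0.5, 0.4, 0.35], "val_loss": [0.8, 0.45, 0.42, 0.47]}
print(best_epoch(example_history))  # (3, 0.42)
```

Keras can also stop automatically with `tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)` passed via `callbacks=[...]`. The patience value here is only an example.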


Evaluate the model on the held-out test set of 400 images, which was not used for training or validation.

In [0]:
results = model.evaluate(X_test, y_test, batch_size=40)
print(f'test loss: {results[0]}, test acc: {results[1]}')

test loss: 0.33955714106559753, test acc: 0.949999988079071
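Accuracy alone (0.95 here) does not show which kind of mistake the model makes. For a screening use case, calling an abnormal beat normal is arguably the more costly error, and a confusion matrix separates the two. In this notebook, label 1 means normal and 0 means abnormal, so the sigmoid output is the predicted probability of a normal beat. The sketch below uses made-up values. With the notebook, `y_test` and `model.predict(X_test)` would take their place. The 0.5 threshold is an assumption that could be tuned.

```python
import numpy as np
from sklearn.metrics import confusion_matrix


def probs_to_labels(probs, threshold=0.5):
    """Turn sigmoid outputs into hard labels (1 = normal, 0 = abnormal in this notebook)."""
    return (np.asarray(probs).ravel() >= threshold).astype(int)


# Made-up values for illustration; shape (n, 1) mimics the output of model.predict
y_true = np.array([1, 1, 0, 0, 0])
probs = np.array([[0.92], [0.30], [0.10], [0.65], [0.05]])

y_pred = probs_to_labels(probs)
# Rows are true labels, columns are predictions, both ordered [abnormal (0), normal (1)]
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
print(cm)
```

In this layout, the top-right cell counts abnormal beats predicted as normal.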
