This dataset is provided by MIT. This notebook builds multiclass classification models that determine whether a heartbeat is normal or contains an abnormality.

# Description

The dataset contains one heartbeat per row, recorded as 80 time steps with one measurement per time step.
The measurements are in columns T1 to T80, and each heartbeat is classified into one of the following categories:

- 0 = Normal
- 1 = Supraventricular premature beat
- 2 = Premature ventricular contraction
- 3 = Fusion of ventricular and normal beat
- 4 = Unclassifiable beat

## Goal

Use the dataset **heartbeat_cleaned.csv** to predict the column **Target**. The input variables are the columns **T1** to **T80**.

# Preparing the Data

In [1]:
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras
# Seed NumPy and TensorFlow so that results are more repeatable between runs
np.random.seed(63829204)
tf.random.set_seed(63829204)

In [2]:
heart = pd.read_csv(r"./heartbeat_cleaned.csv")

In [3]:
heart.head()

|   | T1 | T2 | T3 | T4 | T5 | T6 | T7 | T8 | T9 | T10 | ... | T72 | T73 | T74 | T75 | T76 | T77 | T78 | T79 | T80 | Target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.987 | 0.892 | 0.461 | 0.113 | 0.149 | 0.19 | 0.165 | 0.162 | 0.147 | 0.138 | ... | 0.197 | 0.197 | 0.196 | 0.203 | 0.201 | 0.199 | 0.201 | 0.205 | 0.208 | 0 |
| 1 | 1.0 | 0.918 | 0.621 | 0.133 | 0.105 | 0.125 | 0.117 | 0.0898 | 0.0703 | 0.0781 | ... | 0.195 | 0.191 | 0.152 | 0.172 | 0.207 | 0.211 | 0.207 | 0.207 | 0.172 | 0 |
| 2 | 1.0 | 0.751 | 0.143 | 0.104 | 0.0961 | 0.0519 | 0.0442 | 0.0416 | 0.0364 | 0.0857 | ... | 0.226 | 0.242 | 0.244 | 0.286 | 0.468 | 0.816 | 0.977 | 0.452 | 0.0519 | 0 |
| 3 | 1.0 | 0.74 | 0.235 | 0.0464 | 0.0722 | 0.0567 | 0.0103 | 0.0155 | 0.0284 | 0.0155 | ... | 0.0851 | 0.0747 | 0.0515 | 0.0593 | 0.067 | 0.0361 | 0.121 | 0.451 | 0.869 | 0 |
| 4 | 1.0 | 0.833 | 0.309 | 0.0191 | 0.101 | 0.12 | 0.104 | 0.0874 | 0.0765 | 0.0765 | ... | 0.205 | 0.421 | 0.803 | 0.951 | 0.467 | 0.0 | 0.0519 | 0.082 | 0.0628 | 0 |


In [4]:
heart.shape

(7960, 81)

In [5]:
heart['Target'].unique()

array([0, 1, 2, 3, 4], dtype=int64)

In [6]:
y = heart['Target']
x = heart.drop('Target', axis=1)

In [7]:
from sklearn.model_selection import train_test_split

# Random 80/20 train/test split (not stratified by class)
train_x, test_x, train_y, test_y = train_test_split(x, y, test_size=0.2)
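The split above is random but not stratified, and class 3 has only 61 beats, so its share can differ between the two sets. The test set is also reused later as `validation_data`, where it drives early stopping, so the reported test scores are not fully independent of training. A minimal sketch of a stratified three-way split follows, using scikit-learn's `stratify` and `random_state` parameters. The labels are a synthetic stand-in built from the class counts reported under "Finding the baseline", the features are placeholders, and the 20% validation fraction is an assumption.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for heart['Target'] (class counts taken from this notebook);
# the features are zero placeholders with the same 80-column width.
counts = {0: 4633, 4: 1584, 2: 1237, 1: 445, 3: 61}
labels = np.concatenate([np.full(n, label) for label, n in counts.items()])
features = np.zeros((len(labels), 80), dtype=np.float32)

# Hold out 20% as a test set used only for the final evaluation.
rest_x, te_x, rest_y, te_y = train_test_split(
    features, labels, test_size=0.2, stratify=labels, random_state=0)

# Carve a validation set (assumed 20% of the remainder) for early stopping.
tr_x, val_x, tr_y, val_y = train_test_split(
    rest_x, rest_y, test_size=0.2, stratify=rest_y, random_state=0)

print(len(tr_y), len(val_y), len(te_y))  # 5094 1274 1592
```

With the real data, `x` and `y` would replace the synthetic arrays, and the validation set (reshaped like the others) would be passed as `validation_data` so that the test set is only used by `model.evaluate`.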

# Data Transformation

In [8]:
train_y = np.array(train_y)
test_y = np.array(test_y)

In [9]:
train_y[0:10]

array([4, 4, 2, 1, 0, 0, 0, 0, 0, 2], dtype=int64)

In [10]:
train_x = np.array(train_x)
test_x = np.array(test_x)

train_x = train_x.astype(np.float32)
test_x = test_x.astype(np.float32)

In [11]:
train_x

array([[0.964 , 0.478 , 0.481 , ..., 0.353 , 0.338 , 0.344 ],
       [1.    , 0.535 , 0.581 , ..., 0.221 , 0.198 , 0.189 ],
       [0.    , 0.0332, 0.0452, ..., 0.335 , 0.309 , 0.312 ],
       ...,
       [0.958 , 0.953 , 0.708 , ..., 0.0729, 0.0833, 0.0781],
       [0.948 , 0.817 , 0.187 , ..., 0.0835, 0.0875, 0.0855],
       [0.797 , 0.131 , 0.0254, ..., 0.428 , 0.403 , 0.428 ]],
      dtype=float32)

In [12]:
# Reshape both sets to (samples, time_steps, features) for the Keras layers
train_x = np.reshape(train_x, (train_x.shape[0], train_x.shape[1], 1))
test_x = np.reshape(test_x, (test_x.shape[0], test_x.shape[1], 1))
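Keras recurrent layers expect 3-D input of shape `(samples, time_steps, features)`. Since each time step holds a single measurement, the reshape only appends a feature axis of size 1. A small sketch on a random toy batch (not the notebook's data) showing that `np.newaxis` and `np.expand_dims` produce the same result:

```python
import numpy as np

# Toy batch: 3 heartbeats x 80 time steps (random values, used only for shapes).
batch = np.random.default_rng(0).random((3, 80), dtype=np.float32)

# Same operation as in the cell above: add a trailing feature axis of size 1.
reshaped = np.reshape(batch, (batch.shape[0], batch.shape[1], 1))

# Equivalent, shorter spellings.
same_a = batch[..., np.newaxis]
same_b = np.expand_dims(batch, axis=-1)

print(reshaped.shape)  # (3, 80, 1)
```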

In [13]:
train_x.shape, test_x.shape

((6368, 80, 1), (1592, 80, 1))

# Finding the baseline

In [14]:
heart['Target'].value_counts()

0    4633
4    1584
2    1237
1     445
3      61
Name: Target, dtype: int64

In [15]:
heart['Target'].value_counts()/len(heart)

0    0.582035
4    0.198995
2    0.155402
1    0.055905
3    0.007663
Name: Target, dtype: float64

Baseline: always predicting the majority class (0 = Normal) gives 58.2% accuracy.
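The baseline is the accuracy of a classifier that always predicts the most frequent class. A minimal sketch of that arithmetic, using the class counts printed above:

```python
# Class counts copied from heart['Target'].value_counts() above.
counts = {0: 4633, 4: 1584, 2: 1237, 1: 445, 3: 61}

total = sum(counts.values())                   # 7960 heartbeats
majority_class = max(counts, key=counts.get)   # class 0 (Normal)
baseline_accuracy = counts[majority_class] / total

print(majority_class, f"{baseline_accuracy:.1%}")  # 0 58.2%
```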

In [16]:
heart['Target'].count()

7960

# A cross-sectional deep model using Keras

In [21]:
model = keras.models.Sequential([
    # Flatten treats the 80 time steps as 80 independent input features
    keras.layers.Flatten(input_shape=[80, 1]),
    keras.layers.Dense(80, activation='relu'),
    keras.layers.Dense(40, activation='relu'),
    keras.layers.Dense(20, activation='relu'),
    keras.layers.Dense(5, activation='softmax'),  # one probability per class
])
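As a rough check on model size, the parameter count of this dense network can be worked out by hand: each Dense layer has one weight per input/output pair plus one bias per unit, and Flatten has no weights. The sketch below does the arithmetic. Its total should match the "Total params" line of `model.summary()`, which this notebook does not show.

```python
def dense_params(n_in, n_out):
    # Kernel (n_in x n_out weights) plus one bias per output unit.
    return n_in * n_out + n_out

# Flatten turns each (80, 1) sample into 80 inputs; then Dense 80 -> 40 -> 20 -> 5.
layer_sizes = [80, 80, 40, 20, 5]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(per_layer, sum(per_layer))  # [6480, 3240, 820, 105] 10645
```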

In [22]:
optimizer = tf.keras.optimizers.Nadam(learning_rate=0.01)

model.compile(loss="sparse_categorical_crossentropy", optimizer=optimizer, metrics=['accuracy'])

history = model.fit(train_x, train_y, epochs=15,
                    validation_data=(test_x, test_y))

Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15


In [23]:
# Evaluate the cross-sectional Keras model on the test set

scores = model.evaluate(test_x, test_y, verbose=0)

scores

[0.2293974757194519, 0.9371859431266785]

In [24]:
print(f'{model.metrics_names[0]}: {round(scores[0],3)}')
print(f'{model.metrics_names[1]}: {round(scores[1]*100,2)}%')

loss: 0.229
accuracy: 93.72%


# Starting with a shallow LSTM model

In [120]:
model = keras.models.Sequential([
    # One 80-unit LSTM layer; activation='relu' replaces the LSTM default (tanh)
    keras.layers.LSTM(80, activation='relu', input_shape=[80, 1]),
    keras.layers.Dense(5, activation='softmax')
])
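For comparison with the dense model, an LSTM layer with `units` cells and `input_dim` features per step holds four weight blocks (the input, forget and output gates plus the candidate cell update), each with an input kernel, a recurrent kernel and a bias. A sketch of that count for this model, assuming the Keras default `use_bias=True`:

```python
def lstm_params(units, input_dim):
    # 4 blocks x (input kernel + recurrent kernel + bias vector).
    return 4 * (units * input_dim + units * units + units)

lstm = lstm_params(80, 1)   # one measurement per time step
head = 80 * 5 + 5           # Dense(5) on the 80-unit final state

print(lstm, head, lstm + head)  # 26240 405 26645
```

Despite having roughly 2.5 times the parameters of the dense model, the LSTM sees the heartbeat one step at a time through its recurrent state rather than all 80 values at once.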

In [121]:
# Create the optimizer object (rather than passing the string 'nadam') so the learning rate can be set
optimizer = tf.keras.optimizers.Nadam(learning_rate=0.01)

model.compile(loss="sparse_categorical_crossentropy", optimizer=optimizer, metrics=['accuracy'])

history = model.fit(train_x, train_y, epochs=15,
                    validation_data=(test_x, test_y))

Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15


In [122]:
# Evaluate the shallow LSTM model on the test set

scores = model.evaluate(test_x, test_y, verbose=0)

scores

[1.090551495552063, 0.5822864174842834]

In [123]:
print(f'{model.metrics_names[0]}: {round(scores[0],3)}')
print(f'{model.metrics_names[1]}: {round(scores[1]*100,2)}%')

loss: 1.091
accuracy: 58.23%
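A test accuracy of 58.23% is essentially the share of Normal beats, which is what a model predicting class 0 for every heartbeat would score. Per-class recall makes this kind of collapse easy to spot. The sketch below uses hypothetical toy labels, not this notebook's data:

```python
import numpy as np

# Hypothetical toy labels: 6 Normal beats and 4 abnormal ones.
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 2, 2, 4])
# A collapsed model that predicts "Normal" for every beat.
y_pred = np.zeros_like(y_true)

accuracy = (y_pred == y_true).mean()

# Recall per class: fraction of each true class that was predicted correctly.
recall = {int(c): float((y_pred[y_true == c] == c).mean()) for c in np.unique(y_true)}

print(accuracy)  # 0.6
print(recall)    # {0: 1.0, 1: 0.0, 2: 0.0, 4: 0.0}
```

On the real test set, the predicted labels would come from `model.predict(test_x).argmax(axis=1)`; scikit-learn's `classification_report(test_y, y_pred)` prints the same per-class recall along with precision and F1.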


# Two-Layer LSTM Model

In [108]:
from tensorflow.keras.callbacks import EarlyStopping

# Stop training once val_loss has not improved for 3 consecutive epochs
earlystop = EarlyStopping(monitor='val_loss', patience=3, mode='auto')
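The stopping rule can be illustrated without Keras. This is a simplified sketch of patience-based early stopping on a loss history, not Keras's actual implementation, and the loss values are hypothetical:

```python
def stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the 1-based epoch at which training would stop, or None.

    Simplified sketch: stop once the loss has failed to beat the best value
    seen so far (by more than min_delta) for `patience` consecutive epochs.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Hypothetical history: best at epoch 3, then three epochs without improvement.
print(stopping_epoch([1.20, 1.15, 1.13, 1.14, 1.16, 1.15]))  # 6
```

By default Keras's `EarlyStopping` has `restore_best_weights=False`, so the model keeps the weights from the last epoch run rather than those from the best epoch; passing `restore_best_weights=True` changes that.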

In [109]:
model = keras.models.Sequential([
    keras.layers.LSTM(80, return_sequences=True),
    keras.layers.LSTM(20),
    keras.layers.Dense(5, activation='softmax')
])

In [110]:
optimizer = keras.optimizers.Nadam(learning_rate=0.01)

model.compile(loss="sparse_categorical_crossentropy", optimizer=optimizer, metrics=['accuracy'])

history = model.fit(train_x, train_y, epochs=15,
                    validation_data=(test_x, test_y), callbacks=[earlystop])

Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15


In [111]:
# Evaluate the two-layer LSTM model on the test set

scores = model.evaluate(test_x, test_y, verbose=0)

scores

[1.134664535522461, 0.5816583037376404]

In [112]:
print(f'{model.metrics_names[0]}: {round(scores[0],3)}')
print(f'{model.metrics_names[1]}: {round(scores[1]*100,2)}%')

loss: 1.135
accuracy: 58.17%


# Sequential deep GRU Model

In [92]:
model = keras.models.Sequential([
    keras.layers.GRU(80, return_sequences=True),
    keras.layers.GRU(40),
    keras.layers.Dense(5, activation='softmax')
])
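A GRU replaces the LSTM's four weight blocks with three (update gate, reset gate and candidate state), so it is smaller for the same number of units. A sketch of the parameter count for this model, assuming the TensorFlow 2 Keras default `reset_after=True`, which gives each block two bias vectors:

```python
def gru_params(units, input_dim, reset_after=True):
    # 3 blocks x (input kernel + recurrent kernel + biases); reset_after=True
    # keeps separate input and recurrent biases, doubling the bias count.
    biases = 2 * units if reset_after else units
    return 3 * (units * input_dim + units * units + biases)

gru1 = gru_params(80, 1)    # first GRU sees one measurement per time step
gru2 = gru_params(40, 80)   # second GRU sees the 80-dim sequence from the first
head = 40 * 5 + 5           # Dense(5) on the 40-unit final state

print(gru1, gru2, head, gru1 + gru2 + head)  # 19920 14640 205 34765
```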

In [96]:
optimizer = keras.optimizers.Nadam(learning_rate=0.001)

model.compile(loss="sparse_categorical_crossentropy", optimizer=optimizer, metrics=['accuracy'])

history = model.fit(train_x, train_y, epochs=15,
                    validation_data=(test_x, test_y), callbacks=[earlystop])

Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15


In [97]:
scores = model.evaluate(test_x, test_y, verbose=0)

scores

[0.29914870858192444, 0.9026381969451904]

In [98]:
print(f'{model.metrics_names[0]}: {round(scores[0],3)}')
print(f'{model.metrics_names[1]}: {round(scores[1]*100,2)}%')

loss: 0.299
accuracy: 90.26%


## Results

Evaluating each model on the test set gave the following accuracies:

- 93.72% for the cross-sectional deep (dense) model
- 58.23% for the shallow LSTM model
- 58.17% for the two-layer LSTM model
- 90.26% for the deep GRU model

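The regular dense deep model performs best. For short sequences such as these 80-step heartbeats, both RNNs and regular dense networks can handle sequential data, and dense layers remain well suited to the sequences in this dataset. How each network handles its weights and training may also be a factor.

An LSTM would seem suited to this task, yet both LSTM models scored at the baseline: 58.23% and 58.17% are essentially the share of Normal beats, which suggests they predicted class 0 for (nearly) every heartbeat. I suspect the forget gate and the logistic (sigmoid) gate activations contribute to the poor results, although the GRU, which is based on the LSTM and also uses sigmoid gates, performed well. Two untested differences may also matter: the LSTMs were trained with a learning rate of 0.01 while the GRU used 0.001, and the shallow LSTM replaced its default tanh activation with ReLU. LSTMs and GRUs typically show their advantage on longer sequences than the ones used here.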

The dense model performs much better than the 58.2% baseline, reaching 93.72% accuracy.
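The comparison with the baseline can be made concrete in two ways: the gain in percentage points, and the share of the baseline's errors that each model removes. A short sketch of that arithmetic using the test accuracies reported in this notebook:

```python
# Test accuracies reported in this notebook, and the majority-class baseline.
baseline = 0.582
results = {
    "cross-sectional dense": 0.9372,
    "shallow LSTM": 0.5823,
    "two-layer LSTM": 0.5817,
    "two-layer GRU": 0.9026,
}

summary = {}
for name, acc in results.items():
    gain_pp = (acc - baseline) * 100              # gain in percentage points
    error_cut = 1 - (1 - acc) / (1 - baseline)    # share of baseline errors removed
    summary[name] = (gain_pp, error_cut)
    print(f"{name:>21}: {gain_pp:+6.2f} pp, {error_cut:6.1%} of baseline errors removed")
```

By this measure the dense model removes roughly 85% of the errors the majority-class baseline makes, while both LSTMs stay within a fraction of a percentage point of it.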