# Unit 3 Assessment

# Swetha Veerla (U62395128)

This assignment focuses on healthcare. The data set, made available by MIT, contains 9,026 heartbeat measurements. Each row represents a single measurement captured along a timeline, and each measurement has 80 data points (columns). This is a multiclass classification task: predict whether a measurement represents a normal heartbeat or one of several anomalies.

## Description of Variables

You will use the **heartbeat_cleaned.csv** data set for this assignment. Each row represents a single measurement. Columns labeled T1 through T80 are the time steps on the timeline (there are 80 time steps, and each time step has exactly one measurement).

The last column is the target variable. It shows the label (category) of the measurement as follows:<br>
0 = Normal<br>
1 = Supraventricular premature beat<br>
2 = Premature ventricular contraction<br>
3 = Fusion of ventricular and normal beat<br>
4 = Unclassifiable beat
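
For readable reports, the integer codes above can be mapped back to their names. A minimal sketch, where `CLASS_NAMES` and `decode_labels` are hypothetical names introduced here for illustration only:

```python
# Label codes exactly as listed in the variable description above.
CLASS_NAMES = {
    0: "Normal",
    1: "Supraventricular premature beat",
    2: "Premature ventricular contraction",
    3: "Fusion of ventricular and normal beat",
    4: "Unclassifiable beat",
}


def decode_labels(codes):
    """Translate integer target codes into their class names."""
    return [CLASS_NAMES[int(code)] for code in codes]


print(decode_labels([0, 4, 2]))
```

The same dictionary can label the rows of a per-class report once predictions are available.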

## Goal

Use the data set **heartbeat_cleaned.csv** to predict the column called **Target**. The input variables are the columns labeled **T1 to T80**.

## Submission:

Please save and submit this Jupyter notebook file. The correctness of the code matters for your grade. **Readability and organization of your code are also important.** You may lose points for submitting unreadable or undecipherable code. Therefore, use markdown cells to create sections, and use comments where necessary.


# Read and Prepare the Data (1 point)

In [1]:
# Insert as many cells as you need for data prep

In [2]:
# Common imports
import numpy as np
import tensorflow as tf
from tensorflow import keras
import pandas as pd



In [3]:
data = pd.read_csv("heartbeat_cleaned.csv")

In [4]:
data.shape

(7960, 81)

In [5]:
data.head()

| | T1 | T2 | T3 | T4 | T5 | T6 | T7 | T8 | T9 | T10 | ... | T72 | T73 | T74 | T75 | T76 | T77 | T78 | T79 | T80 | Target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.987 | 0.892 | 0.461 | 0.113 | 0.149 | 0.19 | 0.165 | 0.162 | 0.147 | 0.138 | ... | 0.197 | 0.197 | 0.196 | 0.203 | 0.201 | 0.199 | 0.201 | 0.205 | 0.208 | 0 |
| 1 | 1.0 | 0.918 | 0.621 | 0.133 | 0.105 | 0.125 | 0.117 | 0.0898 | 0.0703 | 0.0781 | ... | 0.195 | 0.191 | 0.152 | 0.172 | 0.207 | 0.211 | 0.207 | 0.207 | 0.172 | 0 |
| 2 | 1.0 | 0.751 | 0.143 | 0.104 | 0.0961 | 0.0519 | 0.0442 | 0.0416 | 0.0364 | 0.0857 | ... | 0.226 | 0.242 | 0.244 | 0.286 | 0.468 | 0.816 | 0.977 | 0.452 | 0.0519 | 0 |
| 3 | 1.0 | 0.74 | 0.235 | 0.0464 | 0.0722 | 0.0567 | 0.0103 | 0.0155 | 0.0284 | 0.0155 | ... | 0.0851 | 0.0747 | 0.0515 | 0.0593 | 0.067 | 0.0361 | 0.121 | 0.451 | 0.869 | 0 |
| 4 | 1.0 | 0.833 | 0.309 | 0.0191 | 0.101 | 0.12 | 0.104 | 0.0874 | 0.0765 | 0.0765 | ... | 0.205 | 0.421 | 0.803 | 0.951 | 0.467 | 0.0 | 0.0519 | 0.082 | 0.0628 | 0 |


In [6]:
y = data['Target']
x = data.drop('Target', axis=1)

## Split the Data

In [7]:
from sklearn.model_selection import train_test_split

train_x, test_x, train_y, test_y = train_test_split(x, y, test_size=0.3)
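
The split above is unseeded, so each run produces a different partition, and it is not stratified, so rare classes may be divided unevenly between the two sets. The sketch below shows the `random_state` and `stratify` arguments of `train_test_split` on synthetic stand-in data. The arrays `demo_x` and `demo_y`, the class proportions, and the seed 42 are all made up for illustration; applying the same arguments to the real data would change the numbers reported later in this notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 1000 rows, 80 columns, 5 imbalanced classes.
rng = np.random.default_rng(0)
demo_x = rng.random((1000, 80))
demo_y = rng.choice(5, size=1000, p=[0.6, 0.1, 0.15, 0.05, 0.1])

# random_state makes the split repeatable; stratify keeps class shares aligned.
tr_x, te_x, tr_y, te_y = train_test_split(
    demo_x, demo_y, test_size=0.3, random_state=42, stratify=demo_y
)

train_share = np.bincount(tr_y, minlength=5) / len(tr_y)
test_share = np.bincount(te_y, minlength=5) / len(te_y)
print(train_share)
print(test_share)
```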

## Data Transformation

In [8]:
#Target variables need to be an array with integer type
train_y = np.array(train_y)
test_y = np.array(test_y)

train_y = train_y.astype(np.int32)
test_y = test_y.astype(np.int32)



In [9]:
#Check the first 10 values of the train_y data set
train_y[0:10]

array([0, 0, 4, 4, 0, 0, 0, 0, 0, 2], dtype=int32)

In [10]:
# Convert input variables to a 2-D array with float data type
train_x = np.array(train_x)
test_x = np.array(test_x)

train_x = train_x.astype(np.float32)
test_x = test_x.astype(np.float32)

In [11]:
train_x

array([[1.    , 0.654 , 0.276 , ..., 0.352 , 0.354 , 0.376 ],
       [0.988 , 0.931 , 0.82  , ..., 0.235 , 0.227 , 0.234 ],
       [0.918 , 0.484 , 0.5   , ..., 0.186 , 0.189 , 0.186 ],
       ...,
       [0.887 , 0.496 , 0.521 , ..., 0.202 , 0.213 , 0.199 ],
       [0.931 , 0.868 , 0.637 , ..., 0.0686, 0.0294, 0.0686],
       [1.    , 0.546 , 0.586 , ..., 0.132 , 0.139 , 0.132 ]],
      dtype=float32)

In [12]:
#Keras expects a different input format:
#Data needs to have 3 dimensions

train_x = np.reshape(train_x, (train_x.shape[0], train_x.shape[1], 1))
test_x = np.reshape(test_x, (test_x.shape[0], test_x.shape[1], 1))
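
Recurrent layers in Keras take input shaped `(samples, time steps, features)`; here each of the 80 time steps carries a single feature. A small sketch on a made-up array (`demo` is hypothetical) suggests that the reshape only appends a trailing axis and leaves the values untouched:

```python
import numpy as np

# 2 samples with 3 time steps each, standing in for the (n, 80) input matrix.
demo = np.arange(6, dtype=np.float32).reshape(2, 3)

# Same call pattern as in the cell above: add a features axis of size 1.
reshaped = np.reshape(demo, (demo.shape[0], demo.shape[1], 1))
print(reshaped.shape)  # (2, 3, 1)

# Indexing with np.newaxis yields the same array.
same = demo[..., np.newaxis]
print(np.array_equal(reshaped, same))  # True
```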

In [13]:
train_x.shape, train_y.shape

((5572, 80, 1), (5572,))

In [14]:
train_x

array([[[1.    ],
        [0.654 ],
        [0.276 ],
        ...,
        [0.352 ],
        [0.354 ],
        [0.376 ]],

       [[0.988 ],
        [0.931 ],
        [0.82  ],
        ...,
        [0.235 ],
        [0.227 ],
        [0.234 ]],

       [[0.918 ],
        [0.484 ],
        [0.5   ],
        ...,
        [0.186 ],
        [0.189 ],
        [0.186 ]],

       ...,

       [[0.887 ],
        [0.496 ],
        [0.521 ],
        ...,
        [0.202 ],
        [0.213 ],
        [0.199 ]],

       [[0.931 ],
        [0.868 ],
        [0.637 ],
        ...,
        [0.0686],
        [0.0294],
        [0.0686]],

       [[1.    ],
        [0.546 ],
        [0.586 ],
        ...,
        [0.132 ],
        [0.139 ],
        [0.132 ]]], dtype=float32)

# Find the baseline (0.5 point)

In [15]:
from sklearn.dummy import DummyClassifier

dummy_clf = DummyClassifier(strategy="most_frequent")

dummy_clf.fit(train_x, train_y)

DummyClassifier(strategy='most_frequent')

In [16]:
from sklearn.metrics import accuracy_score

In [17]:
#Baseline Train Accuracy
dummy_train_pred = dummy_clf.predict(train_x)

baseline_train_acc = accuracy_score(train_y, dummy_train_pred)

print(f'Baseline Train Accuracy: {baseline_train_acc}')

Baseline Train Accuracy: 0.5816582914572864


In [18]:
#Baseline Test Accuracy
dummy_test_pred = dummy_clf.predict(test_x)

baseline_test_acc = accuracy_score(test_y, dummy_test_pred)

print(f'Baseline Test Accuracy: {baseline_test_acc}')

Baseline Test Accuracy: 0.5829145728643216
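
With `strategy="most_frequent"`, the dummy classifier always predicts the most common training label, so its accuracy on any set is simply the share of that label in the set. A sketch with made-up label vectors (`demo_train_y` and `demo_test_y` are hypothetical, not the notebook's data):

```python
import numpy as np

demo_train_y = np.array([0, 0, 0, 2, 4, 0, 1, 0, 2, 0])
demo_test_y = np.array([0, 2, 0, 0, 4])

# The most frequent label in the training set becomes the constant prediction.
majority = np.bincount(demo_train_y).argmax()
pred = np.full_like(demo_test_y, majority)

# Accuracy equals the fraction of test labels that match the majority label.
acc = np.mean(pred == demo_test_y)
print(majority, acc)  # 0 0.6
```

Read this way, the baseline of about 0.58 reported above indicates that roughly 58% of the measurements carry the most frequent label.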


# Build a cross-sectional (i.e., a regular) Neural Network model using Keras (with only one hidden layer) (2 points)

# A normal (cross-sectional) NN


In [19]:
from tensorflow.keras.callbacks import EarlyStopping


earlystop = EarlyStopping(monitor='val_loss', patience=5, verbose=1, mode='auto')

callback = [earlystop]
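
The callback above halts training once `val_loss` has failed to improve for `patience` consecutive epochs. The sketch below mimics that rule in plain Python to show where the stop lands. `stop_epoch` is a hypothetical helper, and the loss values are invented rather than taken from this notebook's runs.

```python
def stop_epoch(val_losses, patience=5):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, wait = loss, 0  # improvement: remember it and reset the counter
        else:
            wait += 1  # no improvement this epoch
            if wait >= patience:
                return epoch
    return None


# Invented losses: the best value occurs at epoch 8, followed by 5 worse epochs.
demo_losses = [0.90, 0.70, 0.60, 0.50, 0.45, 0.40, 0.35, 0.30,
               0.31, 0.32, 0.33, 0.31, 0.34]
print(stop_epoch(demo_losses))  # 13
```

Because `restore_best_weights` is left at its default of `False`, the model keeps the weights from the final epoch rather than from the best one. The notebook also passes the test set as `validation_data`, so the stopping decision is informed by test data; a separate validation split carved from the training set would keep the test score independent.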

In [20]:
model = keras.models.Sequential([
    
    keras.layers.Flatten(input_shape=[80, 1]),
    keras.layers.Dense(80, activation='relu'),
    keras.layers.Dense(5, activation='softmax')
    
])
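
As a sanity check on the size of this network, the trainable parameters can be counted by hand: a dense layer has one weight per input-output pair plus one bias per output, and `Flatten` has none. A minimal sketch (`dense_params` is a hypothetical helper); `model.summary()` should report matching figures.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus biases (n_out) of a fully connected layer."""
    return n_in * n_out + n_out


hidden = dense_params(80, 80)  # Flatten turns (80, 1) into 80 inputs
output = dense_params(80, 5)   # 5 softmax outputs, one per class
print(hidden, output, hidden + output)  # 6480 405 6885
```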

In [21]:
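# The model was already built in the previous cell, so these seeds cannot
# influence its initial weights; they only affect what runs from here on
np.random.seed(42)
tf.random.set_seed(42)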

optimizer = tf.keras.optimizers.Nadam(learning_rate=0.01)

# If multiclass, use "sparse_categorical_crossentropy" as the loss function
model.compile(loss="sparse_categorical_crossentropy", optimizer=optimizer, metrics=['accuracy'])


history = model.fit(train_x, train_y, epochs=50,
                    validation_data=(test_x, test_y),callbacks=callback)

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 13: early stopping
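
The loss used above, sparse categorical cross-entropy, takes integer labels directly and averages the negative log of the probability the model assigns to the true class. A NumPy sketch of that definition follows; the function and the `demo_*` arrays are illustrative, and the small clipping constant Keras applies for numerical stability is omitted.

```python
import numpy as np


def sparse_categorical_crossentropy(y_true, probs):
    """Mean of -log(probability assigned to the true class)."""
    probs = np.asarray(probs, dtype=np.float64)
    picked = probs[np.arange(len(y_true)), y_true]  # one probability per row
    return float(np.mean(-np.log(picked)))


# Two made-up softmax outputs over the 5 classes, with true labels 0 and 2.
demo_probs = np.array([
    [0.70, 0.10, 0.10, 0.05, 0.05],
    [0.20, 0.20, 0.50, 0.05, 0.05],
])
demo_true = np.array([0, 2])
loss = sparse_categorical_crossentropy(demo_true, demo_probs)
print(round(loss, 4))

# A model that spreads probability evenly over 5 classes scores ln(5).
uniform_loss = sparse_categorical_crossentropy(demo_true, np.full((2, 5), 0.2))
print(round(uniform_loss, 4))
```

The uniform case gives a reference point: a test loss well below ln(5), about 1.61, means the model is doing better than guessing evenly.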


In [22]:
# evaluate the model

scores = model.evaluate(test_x, test_y, verbose=0)

scores

# In results, first is loss, second is accuracy

[0.24461400508880615, 0.9237855672836304]

In [23]:
# extract the accuracy from model.evaluate

print("%s: %.2f" % (model.metrics_names[0], scores[0]))
print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))


loss: 0.24
accuracy: 92.38%
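
Accuracy alone can hide poor performance on the rarer classes, since the most frequent label already accounts for about 58% of the data. A per-class view is one way to check this. The sketch below builds a confusion matrix and per-class recall with NumPy on made-up labels (`demo_true` and `demo_pred` are hypothetical); in the notebook, predicted classes could be obtained with `np.argmax(model.predict(test_x), axis=1)`.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes=5):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)  # count each (true, predicted) pair
    return cm


demo_true = np.array([0, 0, 0, 0, 2, 2, 4, 1])
demo_pred = np.array([0, 0, 0, 0, 2, 0, 4, 0])

cm = confusion_matrix(demo_true, demo_pred)
support = cm.sum(axis=1)  # number of true samples per class

# Recall per class; classes with no samples are left at 0.
recall = np.divide(np.diag(cm), support,
                   out=np.zeros(len(cm)), where=support > 0)
accuracy = np.trace(cm) / cm.sum()
print(cm)
print(recall, accuracy)
```

In this toy example the overall accuracy is 0.75 even though class 1 is never predicted correctly.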


# Build a deep cross-sectional (i.e., regular) Neural Network model using Keras (with two or more hidden layers) (2 points)

In [24]:
model = keras.models.Sequential([
    
    keras.layers.Flatten(input_shape=[80, 1]),
    keras.layers.Dense(80, activation='relu'),
    keras.layers.Dense(80, activation='relu'),
    keras.layers.Dense(80, activation='relu'),
    keras.layers.Dense(5, activation='softmax')
    
])

In [25]:
np.random.seed(42)
tf.random.set_seed(42)

optimizer = keras.optimizers.Nadam(learning_rate=0.01)

model.compile(loss="sparse_categorical_crossentropy", optimizer=optimizer, metrics=['accuracy'])

history = model.fit(train_x, train_y, epochs=50,
                   validation_data = (test_x, test_y), callbacks=callback)

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 8: early stopping


In [26]:
# evaluate the model

scores = model.evaluate(test_x, test_y, verbose=0)

scores

# In results, first is loss, second is accuracy

[0.3584342896938324, 0.8881909251213074]

In [27]:
# extract the accuracy from model.evaluate

print("%s: %.2f" % (model.metrics_names[0], scores[0]))
print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))


loss: 0.36
accuracy: 88.82%


# Build an LSTM Model (with only one layer) (2 points)

In [28]:
n_steps = 80
n_inputs = 1

model = keras.models.Sequential([
    
    keras.layers.LSTM(40, input_shape=[n_steps, n_inputs]),
    keras.layers.Dense(5, activation='softmax')
])
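
An LSTM layer holds four weight sets (input, forget, and output gates plus the cell candidate), each with input weights, recurrent weights, and a bias. Under that standard formulation, the size of the model above can be estimated as follows (`lstm_params` and `dense_params` are hypothetical helpers):

```python
def lstm_params(units, n_features):
    """4 weight sets, each with input weights, recurrent weights, and a bias."""
    return 4 * (units * (n_features + units) + units)


def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


lstm = lstm_params(40, 1)     # 40 units, 1 feature per time step
output = dense_params(40, 5)  # last hidden state feeds 5 softmax outputs
print(lstm, output, lstm + output)  # 6720 205 6925
```

The count does not depend on the 80 time steps, because the same weights are reused at every step.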

In [29]:
np.random.seed(42)
tf.random.set_seed(42)

optimizer = keras.optimizers.Nadam(learning_rate=0.01)

model.compile(loss="sparse_categorical_crossentropy", optimizer=optimizer, metrics=['accuracy'])

history = model.fit(train_x, train_y, epochs=50,
                   validation_data = (test_x, test_y), callbacks=callback)

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 42: early stopping


In [30]:
scores = model.evaluate(test_x, test_y, verbose=0)

scores


[0.2727125585079193, 0.9221105575561523]

In [31]:
print("%s: %.2f" % (model.metrics_names[0], scores[0]))
print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))


loss: 0.27
accuracy: 92.21%


# Build a deep LSTM Model (with only two layers) (2 points)

In [32]:
n_steps = 80
n_inputs = 1

model = keras.models.Sequential([
    keras.layers.LSTM(40, return_sequences=True, input_shape=[n_steps, n_inputs]),
    keras.layers.LSTM(40),
    keras.layers.Dense(5, activation='softmax')
])
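
The first layer needs `return_sequences=True` because the second recurrent layer expects one vector per time step, not only the final state. The sketch below illustrates the resulting shapes with a plain tanh RNN written in NumPy. It is a stand-in for shape bookkeeping only, not an LSTM, and `simple_rnn` with its random weights is hypothetical.

```python
import numpy as np


def simple_rnn(x, units, return_sequences=False, seed=0):
    """Plain tanh RNN forward pass, used here only to show output shapes."""
    rng = np.random.default_rng(seed)
    batch, steps, features = x.shape
    w_in = rng.normal(scale=0.1, size=(features, units))
    w_rec = rng.normal(scale=0.1, size=(units, units))
    h = np.zeros((batch, units))
    states = []
    for t in range(steps):
        h = np.tanh(x[:, t, :] @ w_in + h @ w_rec)  # update the hidden state
        states.append(h)
    # Either every hidden state (3-D) or only the last one (2-D).
    return np.stack(states, axis=1) if return_sequences else h


x = np.ones((4, 80, 1))                         # 4 samples, 80 steps, 1 feature
seq = simple_rnn(x, 40, return_sequences=True)  # input for a second layer
last = simple_rnn(seq, 40)                      # final state of the second layer
print(seq.shape, last.shape)  # (4, 80, 40) (4, 40)
```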

In [33]:
np.random.seed(42)
tf.random.set_seed(42)

optimizer = keras.optimizers.Nadam(learning_rate=0.01)

model.compile(loss="sparse_categorical_crossentropy", optimizer=optimizer, metrics=['accuracy'])

history = model.fit(train_x, train_y, epochs=50,
                   validation_data = (test_x, test_y), callbacks=callback)

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 37: early stopping


In [34]:
scores = model.evaluate(test_x, test_y, verbose=0)

scores


[0.3553027808666229, 0.8994975090026855]

In [35]:
print("%s: %.2f" % (model.metrics_names[0], scores[0]))
print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))


loss: 0.36
accuracy: 89.95%


# Build a GRU Model (with only one layer) (2 points)

In [36]:
model = keras.models.Sequential([
    keras.layers.GRU(80, input_shape=[n_steps, n_inputs]),
    keras.layers.Dense(5, activation='softmax')
])
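
A GRU has three weight sets instead of the LSTM's four, but this layer uses 80 units where the LSTM models use 40, so it ends up considerably larger. The estimate below assumes the Keras default `reset_after=True`, which keeps two bias vectors per weight set; `gru_params` and `dense_params` are hypothetical helpers.

```python
def gru_params(units, n_features):
    """3 weight sets with input weights, recurrent weights, and 2 biases each.

    Assumes the reset_after=True variant (two bias vectors per weight set).
    """
    return 3 * (units * (n_features + units) + 2 * units)


def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


gru = gru_params(80, 1)       # 80 units, 1 feature per time step
output = dense_params(80, 5)  # last hidden state feeds 5 softmax outputs
print(gru, output, gru + output)  # 19920 405 20325
```

By the same kind of count, the single-layer LSTM model with 40 units has 6,925 parameters, so the two single-layer recurrent models are not matched in capacity.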

In [37]:
np.random.seed(42)
tf.random.set_seed(42)

optimizer = keras.optimizers.Nadam(learning_rate=0.01)

model.compile(loss="sparse_categorical_crossentropy", optimizer=optimizer, metrics=['accuracy'])

history = model.fit(train_x, train_y, epochs=50,
                   validation_data = (test_x, test_y), callbacks=callback)

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 20: early stopping


In [38]:
scores = model.evaluate(test_x, test_y, verbose=0)

scores

[0.224001944065094, 0.9329982995986938]

In [39]:
print("%s: %.2f" % (model.metrics_names[0], scores[0]))
print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))

loss: 0.22
accuracy: 93.30%


# Build a deep GRU Model (with only two layers) (2 points)

In [40]:
model = keras.models.Sequential([
    keras.layers.GRU(40, return_sequences=True, input_shape=[n_steps, n_inputs]),
    keras.layers.GRU(40),
    keras.layers.Dense(5, activation='softmax')
])

In [41]:
np.random.seed(42)
tf.random.set_seed(42)

optimizer = keras.optimizers.Nadam(learning_rate=0.01)

model.compile(loss="sparse_categorical_crossentropy", optimizer=optimizer, metrics=['accuracy'])

history = model.fit(train_x, train_y, epochs=50,
                   validation_data = (test_x, test_y), callbacks=callback)

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 20: early stopping


In [42]:
scores = model.evaluate(test_x, test_y, verbose=0)

scores

[0.27363279461860657, 0.9221105575561523]

In [43]:
print("%s: %.2f" % (model.metrics_names[0], scores[0]))
print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))

loss: 0.27
accuracy: 92.21%


# Discussion

## List the test values of each model you built (0.5 points)
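
The test values printed by the `model.evaluate` cells above can be gathered in one place for comparison. The sketch below copies those reported numbers (rounded to four decimals) into a dictionary and prints them sorted by accuracy; the short model names and variable names are labels chosen here for illustration.

```python
# (test loss, test accuracy) as reported by model.evaluate in this notebook.
results = {
    "NN (1 hidden layer)": (0.2446, 0.9238),
    "Deep NN (3 hidden)": (0.3584, 0.8882),
    "LSTM (1 layer)": (0.2727, 0.9221),
    "Deep LSTM (2 layers)": (0.3553, 0.8995),
    "GRU (1 layer)": (0.2240, 0.9330),
    "Deep GRU (2 layers)": (0.2736, 0.9221),
}
baseline_acc = 0.5829  # most-frequent-class baseline on the test set

ranked = sorted(results.items(), key=lambda item: item[1][1], reverse=True)
for name, (loss, acc) in ranked:
    gain = acc - baseline_acc
    print(f"{name:<22} loss={loss:.4f}  accuracy={acc:.2%}  vs. baseline={gain:+.2%}")

best = max(results, key=lambda name: results[name][1])
print("Best test accuracy:", best)
```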

## Which model performs the best and why? (0.5 points) 

The single-layer GRU has the highest test accuracy (93.30%) and the lowest test loss (0.22). The one-hidden-layer NN (92.38%), the single-layer LSTM (92.21%), and the two-layer GRU (92.21%) are all within about one percentage point of it, and because the train/test split is not seeded, a gap that small may not persist across runs. In every model family the deeper variant scored lower than its shallower counterpart. One possible explanation, not demonstrated here, is that the extra capacity does not help with 5,572 training rows, a learning rate of 0.01, and early stopping at a patience of 5.

## How does it compare to baseline? (0.5 points)

The most-frequent-class baseline reaches 58.29% test accuracy. Every network exceeds it by at least 30 percentage points (the weakest, the deep NN, scores 88.82%), and the single-layer GRU exceeds it by about 35 points (93.30% vs. 58.29%).