# **Predicting Heart Attacks With TensorFlow Deep Learning**

### About the Dataset

**This dataset contains a series of recorded medical attributes from patients with varying likelihoods of heart attack.**

*From the University of California, Irvine Machine Learning Repository:*

<br>
Creators:
1. Hungarian Institute of Cardiology. Budapest: Andras Janosi, M.D.
2. University Hospital, Zurich, Switzerland: William Steinbrunn, M.D.
3. University Hospital, Basel, Switzerland: Matthias Pfisterer, M.D.
4. V.A. Medical Center, Long Beach and Cleveland Clinic Foundation: Robert Detrano, M.D., Ph.D.

Of the four original datasets (Cleveland, Hungary, Switzerland, VA Long Beach), only the Cleveland dataset is used.

The original 76 attributes recorded in the study were narrowed down to 14 (13 features and a predicted value).

<br>
The features used are:

1) age<br>
2) sex<br>
3) chest pain type (4 values)<br>
4) resting blood pressure<br>
5) serum cholesterol in mg/dl<br>
6) fasting blood sugar > 120 mg/dl<br>
7) resting electrocardiographic results (values 0,1,2)<br>
8) maximum heart rate achieved<br>
9) exercise induced angina<br>
10) oldpeak = ST depression induced by exercise relative to rest<br>
11) the slope of the peak exercise ST segment<br>
12) number of major vessels (0-3) colored by fluoroscopy<br>
13) thal: 0 = normal; 1 = fixed defect; 2 = reversible defect<br>

<br>
The predicted value is based on the likelihood of a heart attack occurring:
- Value 0: < 50% diameter narrowing in any major vessel
- Value 1: > 50% diameter narrowing in any major vessel

## Training a Deep Learning Model to Predict Heart Attack Likelihood

#### *Brief Overview:*
We will use TensorFlow and Keras to train a model on the dataset.

* To begin, we will load the .csv data into a pandas DataFrame using the pd.read_csv() method.
* We will then convert the data to a NumPy array so that it can be sliced and passed to TensorFlow directly.
* From here, we will process the data and feed it into a TensorFlow artificial neural network (ANN) with a single hidden layer.
* Finally, we will examine the prediction accuracy of the model.


### Import dependencies

In [None]:
from __future__ import absolute_import, division, print_function, unicode_literals
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

### Loading data

We begin by loading the .csv file using pd.read_csv().

We then call describe() to view summary statistics for each column.

In [None]:
heart_df = pd.read_csv("../input/health-care-data-set-on-heart-attack-possibility/heart.csv")
heart_df.describe()

### Data preprocessing

We now convert the pandas DataFrame into a NumPy array, for easier use with TensorFlow.

We also shuffle the rows so that any ordering present in the file does not carry over into the train/test split.

In [None]:
heart_np = heart_df.to_numpy()

np.random.shuffle(heart_np)
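
Without a seed, np.random.shuffle produces a different row order, and therefore a different split, on every run. The following is a minimal sketch of a reproducible variant, assuming NumPy's Generator API; the seed value 42 and the toy array are illustrative and do not come from the original notebook.

```python
import numpy as np

# Toy stand-in for heart_np: 5 rows, 3 columns (illustrative values only).
toy = np.arange(15).reshape(5, 3)

# A seeded Generator makes the row order repeatable across runs.
rng = np.random.default_rng(42)  # 42 is an arbitrary, assumed seed
shuffled = toy.copy()
rng.shuffle(shuffled, axis=0)    # permutes whole rows in place

# Rows are reordered, but the values within each row stay together.
print(shuffled)
```

Shuffling along axis 0 moves complete rows, so each patient's features remain attached to the correct label.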

We allocate 80% of the data to the training set and 20% to the test set.

In [None]:
train_test_split = .8

num_examples = heart_np.shape[0]
num_train_examples = int(np.floor(num_examples*train_test_split))
num_test_examples = num_examples - num_train_examples

print("Training Examples:", num_train_examples)
print("Test Examples:", num_test_examples)
print("\nTotal Examples:", num_examples)
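
As a worked example of the split arithmetic, the sketch below assumes the file has 303 rows. That count is inferred from the 242 training examples reported later, not stated in the notebook. Taking the test count as the remainder guarantees that the two subsets add up to the total.

```python
import math

num_examples = 303        # assumed row count, consistent with 242 training rows
train_fraction = 0.8

num_train = math.floor(num_examples * train_fraction)  # floor(242.4) = 242
num_test = num_examples - num_train                     # 303 - 242 = 61

print(num_train, num_test)  # 242 61
```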

We then split the dataset between train and test data, and separate the label column from each subset to obtain the necessary format for training.

In [None]:
train_data = heart_np[0:num_train_examples, :]
test_data = heart_np[num_train_examples:len(heart_np), :]

X_train = train_data[:, 0:-1]
y_train = train_data[:, -1]

X_test = test_data[:, 0:-1]
y_test = test_data[:, -1]
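
The slicing pattern can be checked on a small array. This is only a sketch: the four rows and the names toy, X_tr, y_tr, X_te and y_te are made up for illustration.

```python
import numpy as np

# Toy array: 3 feature columns followed by a final label column.
toy = np.array([
    [63, 1, 145, 1],
    [37, 1, 130, 1],
    [41, 0, 130, 0],
    [56, 1, 120, 0],
])

n_train = 3
# Rows before n_train form the training set; the last column is the label.
X_tr, y_tr = toy[:n_train, :-1], toy[:n_train, -1]
X_te, y_te = toy[n_train:, :-1], toy[n_train:, -1]

print(X_tr.shape, y_tr.shape)  # (3, 3) (3,)
print(X_te.shape, y_te.shape)  # (1, 3) (1,)
```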

We can see that our training set contains 242 training examples.<br>
Our input has 13 features and our output will be a single column used for binary classification.

In [None]:
print(X_train.shape)
print(y_train.shape)

### Building and compiling our model

We define our neural network model with a single hidden layer consisting of 16 units.<br>
A simple design like this tends to suit a small dataset, since it has fewer parameters to overfit.

In [None]:
inputs = keras.Input(shape=(13,), name="features")
x = layers.Dense(16, activation="relu", name="dense_1")(inputs)
outputs = layers.Dense(2, activation="softmax", name="predictions")(x)

model = keras.Model(inputs=inputs, outputs=outputs)

model.summary()
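
The parameter counts that model.summary() reports can be worked out by hand: each Dense layer has one weight per input-output pair plus one bias per output. The arithmetic below is a sketch with illustrative variable names.

```python
n_features, n_hidden, n_classes = 13, 16, 2

# Dense layer parameters = inputs * outputs (weights) + outputs (biases).
hidden_params = n_features * n_hidden + n_hidden  # 13*16 + 16 = 224
output_params = n_hidden * n_classes + n_classes  # 16*2 + 2 = 34
total_params = hidden_params + output_params

print(hidden_params, output_params, total_params)  # 224 34 258
```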

We will train our model for 300 epochs with a batch size of 64.

In [None]:
BATCH_SIZE = 64
EPOCHS = 300
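
To get a feel for what these settings mean, the sketch below counts the weight updates they imply, assuming the 242 training examples reported earlier and that the final, smaller batch of each epoch is kept.

```python
import math

num_train_examples = 242
batch_size = 64
epochs = 300

steps_per_epoch = math.ceil(num_train_examples / batch_size)            # 4
last_batch_size = num_train_examples - (steps_per_epoch - 1) * batch_size  # 50
total_updates = steps_per_epoch * epochs                                # 1200

print(steps_per_epoch, last_batch_size, total_updates)  # 4 50 1200
```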

We will use the Adam optimization algorithm with the Keras default values for $\beta_1$, $\beta_2$, and $\epsilon$.<br>
Our learning rate will be 0.001, which is also the default.

In [None]:
optimizer = tf.keras.optimizers.Adam(
    learning_rate=0.001,
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-07
)
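
To show what these hyperparameters control, here is a minimal NumPy sketch of a single Adam update in its textbook form. The function adam_step and the sample values are hypothetical, and the Keras implementation may differ in details such as where $\epsilon$ is applied.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=0.001, beta_1=0.9, beta_2=0.999, eps=1e-7):
    """Apply one textbook Adam update; t is the 1-based step count."""
    m = beta_1 * m + (1 - beta_1) * grad       # running mean of gradients
    v = beta_2 * v + (1 - beta_2) * grad ** 2  # running mean of squared gradients
    m_hat = m / (1 - beta_1 ** t)              # bias correction for the zero start
    v_hat = v / (1 - beta_2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

w = np.array([1.0, -2.0])  # illustrative weights
g = np.array([0.5, -0.1])  # illustrative gradient
w_new, m, v = adam_step(w, g, np.zeros(2), np.zeros(2), t=1)
print(w_new)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so each weight moves by roughly the learning rate regardless of the gradient's magnitude.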

Compile the model using Keras' SparseCategoricalCrossentropy loss function, which accepts integer class labels (0 or 1) rather than one-hot vectors.

In [None]:
model.compile(
    optimizer=optimizer,
    loss=keras.losses.SparseCategoricalCrossentropy(),
    metrics=[keras.metrics.SparseCategoricalAccuracy()]
)
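
The loss and metric chosen here can be reproduced by hand on a few rows. The sketch below uses made-up softmax outputs and labels: the loss is the mean negative log of the probability assigned to the true class, and the accuracy is the share of rows where the most probable class matches the label.

```python
import numpy as np

# Illustrative softmax outputs for three examples (each row sums to 1).
probs = np.array([[0.9, 0.1],
                  [0.2, 0.8],
                  [0.6, 0.4]])
labels = np.array([0, 1, 1])  # integer class labels, not one-hot

# Probability the model gave to the true class of each example.
picked = probs[np.arange(len(labels)), labels]  # [0.9, 0.8, 0.4]

loss = -np.mean(np.log(picked))
accuracy = np.mean(np.argmax(probs, axis=1) == labels)

print(loss, accuracy)
```

The third example is misclassified, so it contributes the largest term to the loss.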

### Training the model

We are ready to train our model.<br>
We pass in X_train as our array of features, and use y_train as our label column.

We enable shuffling so that the training examples are reordered before each epoch, which further reduces any bias from their ordering.

In [None]:
history = model.fit(
    X_train,
    y_train,
    shuffle=True,
    batch_size=BATCH_SIZE,
    epochs=EPOCHS
)

### Post-training evaluation

We will now examine the performance of the model on the test set.<br>
Let's create an array of predictions for the data in X_test.

In [None]:
predictions = model.predict(X_test)

We define a helper function that pairs each of our model's predicted classes with the actual value in y_test.

In [None]:
def get_prediction_array(predictions, y):
    pred_arr = (predictions[:, 0] < 0.5)
    pred_arr = np.column_stack((pred_arr, y))
    return pred_arr
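
With two output classes, thresholding the class-0 probability at 0.5 gives the same class as taking the argmax of each row. The sketch below checks this on made-up probabilities, including an exact tie.

```python
import numpy as np

probs = np.array([[0.7, 0.3],
                  [0.1, 0.9],
                  [0.5, 0.5]])  # last row is an exact tie

by_threshold = (probs[:, 0] < 0.5).astype(int)  # rule used in get_prediction_array
by_argmax = np.argmax(probs, axis=1)            # general multi-class rule

print(by_threshold)  # [0 1 0]
print(by_argmax)     # [0 1 0]
```

The argmax form would carry over unchanged if the output layer had more than two classes.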

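We tally up the number of correct predictions and divide by the number of test examples to obtain the accuracy of the model.

We obtain an accuracy in the 80-90% range. The exact value varies between runs because of the random shuffle and the random weight initialization.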

In [None]:
results = get_prediction_array(predictions, y_test)

num_correct = 0

for i in range(num_test_examples):
    if results[i, 0] == results[i, 1]:
        num_correct += 1

print("Accuracy:", num_correct/num_test_examples)
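
The same tally can be written without a loop. This is a sketch on made-up arrays: comparing the two columns elementwise yields booleans, and their mean is the fraction that match.

```python
import numpy as np

pred = np.array([1, 0, 1, 1, 0])    # illustrative predicted classes
actual = np.array([1, 0, 0, 1, 0])  # illustrative true labels

# True counts as 1 and False as 0, so the mean is the accuracy.
accuracy = np.mean(pred == actual)

print(accuracy)  # 0.8
```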

We can see that Keras' evaluate() method gives us the same value.

In [None]:
score = model.evaluate(X_test, y_test)
print("\nAccuracy:", score[1])

Create an array for prediction success (1 = Correct, 0 = Incorrect).

In [None]:
performance = (results[:, 0] == results[:, 1]).astype(int)
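
The counts behind such an array can also be read off directly. The sketch below applies np.bincount to a made-up 0/1 array; performance_toy is an illustrative name.

```python
import numpy as np

performance_toy = np.array([1, 1, 0, 1, 0, 1])  # 1 = correct, 0 = incorrect

# Index 0 holds the number of zeros, index 1 the number of ones.
counts = np.bincount(performance_toy, minlength=2)
n_incorrect, n_correct = counts

print(n_correct, n_incorrect)  # 4 2
```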

We plot a histogram of this array to compare the number of correct and incorrect predictions on the test set.

In [None]:
plt.title("Model Performance (Correct vs. Incorrect)")
plt.xlabel("1 = Correct  0 = Incorrect")
plt.ylabel("Count")
plt.xticks([1, 0])
plt.xlim(1.25, -0.25)

plt.hist(performance)
plt.show()