# Cardiovascular Disease Detection

## 1 - Problem Description

Cardiovascular disease is a generic term that represents any sort of disorder that is associated with the heart. 695,000 people in the United States die each year due to cardiovascular disease. It is life-threatening and it is imporant to diagnose. That is why I have decided to build a machine learning model that can accurately predict whether a person has heart disease based on 12 key features.

The dataset used to train the model is a cardiovascular disease dataset found on kaggle.com. The data was acquired from a multispecialty hospital in India. There are 1,000 total patient examples with 12 different features for each. The purpose for the compilation of this dataset was to generate a predictive machine-learning model to detect early-stage heart disease.

## 2 - Biomedical Dataset and Preprocessing


### 2.1 - Packages 

First, running the cell below will import the packages needed for this project.
- [numpy](https://numpy.org/) is the fundamental package for scientific computing with Python.
- [tensorflow](https://www.tensorflow.org/) a popular platform for machine learning.

In [1]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

**Tensorflow and Keras**  
Tensorflow is a machine learning package developed by Google. In 2019, Google integrated Keras into Tensorflow and released Tensorflow 2.0. Keras is a framework developed independently by François Chollet that creates a simple, layer-centric interface to Tensorflow. This project will be using the Keras interface. 

### 2.2 - Dataset Desciption

This dataset contains 1,000 patient examples, each with a patient ID, a classification, and the following 12 features:
- Age: The patient's age in years
- Gender: The patient's gender (0 - female, 1 - male)
- Chest Pain Type: 0 - typical angina, 1 - atypical angina, 2 - non-anginal pain, 3 - asymptomatic
- Resting Blood Pressure: The patients resting blood pressure (mmHg)
- Serum Cholesterol: The patient's total amount of cholesterol in their blood (mg/dL)
- Fasting Blood Sugar: 1 - greater than 120 mg/dL, 0 - not greater than 120 mg/dL
- Resting Electrocardiogram Results: 0 - normal, 1 - having ST-T wave abnormality, 2 - showing probable or definite left ventricular hypertrophy
- Maximum Heart Rate Achieved: The patient's max heart rate (BPM)
- Exercise Induced Angina: 0 - no, 1 - yes
- Old Peak ST: How low the ST segment is below the baseline
- Slope of the Peak Exercise ST Segment: 1 - upsloping, 2 - flat, 3 - downsloping
- Number of Major Vessels: 0, 1, 2, or 3 major vessels

Each example provides a classification where a 0 indicates the absence of heart disease and a 1 indicates the presence of heart disease.

There is no missing data in this dataset; however, there are many binary and nominal values that will need to be one-hot encoded. This will be discussed more later.

### 2.3 - Loading the Data

Before processing, we will load the data into two variables. The first, X, is an m x n matrix that stores each patient example and the 12 corresponding features (m examples, n features). The second, y, is an m x 1 array that provides a classification for each patient example (m examples). A 0 indicates the absence of heart disease and 1 indicates the presence of heart disease.

In [2]:
def load_data(filename):
    """
    Loads and formats data from the WDBC dataset

    Args:
    filename : relative path for the file that holds the data

    Returns:
    X : (ndarray Shape (m,n)) data, m examples by n features
    y : (array_like Shape (m,)) outputs, 1 == heart disease present, 0 == absent
    """
    # Load the data from the file
    data = np.loadtxt(filename, dtype=str, delimiter=',')

    # Store the 12 features from each example into a 2D matrix and convert the type to float
    X = np.array(data[1:,1:13])
    X = X.astype(float)

    # Store the outputs for each example and reshape the array to (m,)
    y = np.array(data[1:,13])
    y = y.astype(float)
    y = np.reshape(y, (len(y), 1))
    
    # Return data and outputs
    return X, y

In [3]:
# Load dataset
X, y = load_data("./data/Cardiovascular_Disease_Dataset.csv")

#### 2.3.1 View the Variables

To ensure that data is loaded into the notebook correctly, it is wise to check the first element of both variables. The code below prints the first elements of the variables `X`and `y`.  

In [4]:
# Print the first elements of X and y
print ('The first element of X is: ', X[0])
print ('The first element of y is: ', y[0])

The first element of X is:  [ 53.    1.    2.  171.    0.    0.    1.  147.    0.    5.3   3.    3. ]
The first element of y is:  [1.]


#### 2.3.2 Check the dimensions of your variables

Another way to ensure that the data has been loaded correctly is to view its dimensions. Printing the shape of `X` and `y` will show the number of examples and features in the data set. The shape of `X` should be `1000, 12`, and the shape of `Y` should be `1000, 1`.

In [5]:
# Print the shapes of X and y
print ('The shape of X is: ' + str(X.shape))
print ('The shape of y is: ' + str(y.shape))

The shape of X is: (1000, 12)
The shape of y is: (1000, 1)


### 2.4 - Data Preprocessing

To provide the best possible data for the model, two things should be done to the data. For numerical features, z-score normalization should be applied so that some features do not have a greater weight than others. For binary and nominal features, one-hot encoding should be used so the model can better interpret the data.

#### Z-Score Normalization Description

For continuous valued features, z-score normalization will help ensure that each feature carries the same weight in the model. Z-score normalization is a method of feature scaling that involves using the mean and standard deviation of a feature to calculate the scaled version for each example. To calculate the scaled value for each example in feature j, use the following function. 
$$ x_{j, scaled} = \frac{x_{j} - M_{j}}{\sigma_{j}}$$
Where $M_{j}$ is the mean of the examples in feature j, and $\sigma_{j}$ is the standard deviation.

#### One-Hot Encoding Description

For features that are either binary or nominal, one-hot encoding will be used to provide the model with a better understanding of the data. Some features, such as gender, do not fit well in the model as a simple 1 or 0. The same applies to nominal features, where a 0, 1, 2, or 3 indicates different attributes of some feature. Instead, it may be wise to one-hot encode that feature. This involves splitting one feature into p different features, where p represents the number of distinct possible outputs. For example, instead of having 1 feature for gender (male or female), one-hot encoding uses two features. One of the features represents whether or not the patient is female and the other represents if the patient is male.

#### Normalizing and One-Hot Encoding Features

In this section, we will go through each feature and apply z-score normalization if it is numerical or apply one-hot encoding if it is binary or nominal.

In [6]:
# Create a new variable to store each feature added
X_new = np.empty(shape = (1000,0))

# Create temporary variables to store the new columns that will be added to X_new
tmp = np.empty(shape = (1000,1))
tmp2 = np.empty(shape = (1000,1))
tmp3 = np.empty(shape = (1000,1))
tmp4 = np.empty(shape = (1000,1))

# Feature 1 - Numeric
mean = np.mean(X[:,0], axis=0)
sigma  = np.std(X[:,0], axis=0) 
tmp = (X[:,0] - mean) / sigma
X_new = np.hstack((X_new, tmp.reshape(-1, 1)))

# Feature 2 - Binary
for i in range(0, 1000):
    if X[i,1] == 0:
        tmp[i] = 1
        tmp2[i] = 0
    else:
        tmp[i] = 0
        tmp2[i] = 1
        
X_new = np.hstack((X_new, tmp.reshape(-1, 1)))
X_new = np.hstack((X_new, tmp2.reshape(-1, 1)))

# Feature 3 - Nominal
for i in range(0, 1000):
    if X[i,2] == 0:
        tmp[i] = 1
        tmp2[i] = 0
        tmp3[i] = 0
        tmp4[i] = 0
    elif X[i,2] == 1:
        tmp[i] = 0
        tmp2[i] = 1
        tmp3[i] = 0
        tmp4[i] = 0
    elif X[i, 2] == 2:
        tmp[i] = 0
        tmp2[i] = 0
        tmp3[i] = 1
        tmp4[i] = 0
    else:
        tmp[i] = 0
        tmp2[i] = 0
        tmp3[i] = 0
        tmp4[i] = 1
        
X_new = np.hstack((X_new, tmp.reshape(-1, 1)))
X_new = np.hstack((X_new, tmp2.reshape(-1, 1)))
X_new = np.hstack((X_new, tmp3.reshape(-1, 1)))
X_new = np.hstack((X_new, tmp4.reshape(-1, 1)))

# Feature 4 - Numeric
mean = np.mean(X[:,3], axis=0)
sigma  = np.std(X[:,3], axis=0) 
tmp = (X[:,3] - mean) / sigma
X_new = np.hstack((X_new, tmp.reshape(-1, 1)))

# Feature 5 - Numeric
mean = np.mean(X[:,4], axis=0)
sigma  = np.std(X[:,4], axis=0) 
tmp = (X[:,4] - mean) / sigma
X_new = np.hstack((X_new, tmp.reshape(-1, 1)))

# Feature 6 - Binary
for i in range(0, 1000):
    if X[i,5] == 0:
        tmp[i] = 1
        tmp2[i] = 0
    else:
        tmp[i] = 0
        tmp2[i] = 1
        
X_new = np.hstack((X_new, tmp.reshape(-1, 1)))
X_new = np.hstack((X_new, tmp2.reshape(-1, 1)))

# Feature 7 - Nominal
for i in range(0, 1000):
    if X[i,6] == 0:
        tmp[i] = 1
        tmp2[i] = 0
        tmp3[i] = 0
    elif X[i,6] == 1:
        tmp[i] = 0
        tmp2[i] = 1
        tmp3[i] = 0
    else:
        tmp[i] = 0
        tmp2[i] = 0
        tmp3[i] = 1
        
X_new = np.hstack((X_new, tmp.reshape(-1, 1)))
X_new = np.hstack((X_new, tmp2.reshape(-1, 1)))
X_new = np.hstack((X_new, tmp3.reshape(-1, 1)))

# Feature 8 - Numeric
mean = np.mean(X[:,7], axis=0)
sigma  = np.std(X[:,7], axis=0) 
tmp = (X[:,7] - mean) / sigma
X_new = np.hstack((X_new, tmp.reshape(-1, 1)))

# Feature 9 - Binary
for i in range(0, 1000):
    if X[i,8] == 0:
        tmp[i] = 1
        tmp2[i] = 0
    else:
        tmp[i] = 0
        tmp2[i] = 1
        
X_new = np.hstack((X_new, tmp.reshape(-1, 1)))
X_new = np.hstack((X_new, tmp2.reshape(-1, 1)))

# Feature 10 - Numeric
mean = np.mean(X[:,9], axis=0)
sigma  = np.std(X[:,9], axis=0) 
tmp = (X[:,9] - mean) / sigma
X_new = np.hstack((X_new, tmp.reshape(-1, 1)))

# Feature 11 - Nominal
# Using four values because there are 0's in the dataset
for i in range(0, 1000):
    if X[i,10] == 0:
        tmp[i] = 1
        tmp2[i] = 0
        tmp3[i] = 0
        tmp4[i] = 0
    elif X[i,10] == 1:
        tmp[i] = 0
        tmp2[i] = 1
        tmp3[i] = 0
        tmp4[i] = 0
    elif X[i, 10] == 2:
        tmp[i] = 0
        tmp2[i] = 0
        tmp3[i] = 1
        tmp4[i] = 0
    else:
        tmp[i] = 0
        tmp2[i] = 0
        tmp3[i] = 0
        tmp4[i] = 1
        
X_new = np.hstack((X_new, tmp.reshape(-1, 1)))
X_new = np.hstack((X_new, tmp2.reshape(-1, 1)))
X_new = np.hstack((X_new, tmp3.reshape(-1, 1)))
X_new = np.hstack((X_new, tmp4.reshape(-1, 1)))

# Feature 12 - Numeric
mean = np.mean(X[:,11], axis=0)
sigma  = np.std(X[:,11], axis=0) 
tmp = (X[:,11] - mean) / sigma
X_new = np.hstack((X_new, tmp.reshape(-1, 1)))

Now, let's print out the first element of the new, processed variable, `X_new`, and its shape.

In [7]:
# Printing the first element of X, and then printing the shape of X
print ('The first element of X_new is: ', X_new[0])
print ('The shape of X_new is: ' + str(X_new.shape))

The first element of X_new is:  [ 0.21046388  0.          1.          0.          0.          1.
  0.          0.64283287 -2.35271743  1.          0.          0.
  1.          0.          0.04456713  1.          0.          1.50724524
  0.          0.          0.          1.          1.81967847]
The shape of X_new is: (1000, 23)


Now, `X_new` represents the processed data from the dataset. The shape of `X_new` should be `1000, 23`.

### 2.5 - Splitting the Dataset

Now, we are going to split the dataset into three different sets. The first set is the training set. It will contain 60% of the examples (600 examples) and be used to train the models. The second set is the cross validation set that will contain 20% of the examples (200 examples). This set will be used in cross validation to compare the different models to see which performs best. The third set is the test set and will contain the remaining 20% of the examples (200 examples). The test set will be used to test the accuracy of the final model. 

In [8]:
# Create a list of indexes from 0 to 999
arr = list(range(len(y)))

# Calculate the lengths of each set 
data_len = len(arr)
training_len = int(data_len * 0.6) # 60%
cv_len = int(data_len * 0.2) # 20%
# The test set will include the last 20%

# Split the array of indices into different indices for each set
train_indices = arr[:training_len]
cv_indices = arr[training_len:training_len + cv_len]
test_indices = arr[training_len + cv_len:]

# Split the X and y into three different sets: training, cross validation, and testing
X_train = np.array([X_new[i] for i in train_indices])
y_train = np.array([y[i] for i in train_indices])

X_cv = np.array([X_new[i] for i in cv_indices])
y_cv = np.array([y[i] for i in cv_indices])

X_test = np.array([X_new[i] for i in test_indices])
y_test = np.array([y[i] for i in test_indices])

Now, let's see the shapes of the new sets.

In [9]:
print ('The shape of X_train is: ' + str(X_train.shape))
print ('The shape of y_train is: ' + str(y_train.shape))
print ('The shape of X_cv is: ' + str(X_cv.shape))
print ('The shape of y_cv is: ' + str(y_cv.shape))
print ('The shape of X_test is: ' + str(X_test.shape))
print ('The shape of y_test is: ' + str(y_test.shape))

The shape of X_train is: (600, 23)
The shape of y_train is: (600, 1)
The shape of X_cv is: (200, 23)
The shape of y_cv is: (200, 1)
The shape of X_test is: (200, 23)
The shape of y_test is: (200, 1)


## 3 - Model Selection and Implementation

A neural network will be used as the model for this project. The input size is 23 units to match the number of features for each example. The output layer will have 1 unit to represent the output, either heart disease absent (0) or present (1). As for the hidden layers in the middle, three seperate architectures will be tested. One with one hidden layer, one with two, and one with three. A cross validation set will be used to test which model has the best accuracy.

To determine an initial architecture, the regularization and learning rate constants will be 0.1 and 0.001 respectively.

The activation for the hidden layers will be ReLu, or rectified linear unit. This is almost the same as a linear activation, but any value that is below 0 is set to 0.

The output activation will be linear at first. Afterwards, the sigmoid function (shown below) will be applied.

$$ g(z) = \frac{1}{1+e^{-z}} $$

g(z) will then be a value that is between 1 and 0. To interpret that output as a classification, a threshold, $\kappa$, will be used. In this project, $\kappa$ will equal 0.5.

If g(z) $\ge$ $\kappa$, then prediction = 1. If g(z) < $\kappa$, then prediction = 0.

### 3.1 - One Hidden Layer

The first architecture that will be tested is a neural network with 1 hidden layer that consists of 15 units.

<img src="images/Model1.png" style="width:400px;height:300px;">

In [10]:
model1 = Sequential(
    [               
        tf.keras.Input(shape=(23,)),   #specify input size
        tf.keras.layers.Dense(15, activation='relu', kernel_regularizer = tf.keras.regularizers.l2(0.1)),
        tf.keras.layers.Dense(1, activation='linear')
    ], name = "my_model1" 
) 

Next, we will compile the model and specify the loss function and the optimizer.

In [11]:
model1.compile(
    loss=tf.keras.losses.BinaryCrossentropy(from_logits = True),
    optimizer=tf.keras.optimizers.Adam(0.001),
)

model1.fit(
    X_train,y_train,
    epochs=100
)

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100
Epoch 77/100
Epoch 78

<keras.callbacks.History at 0x10fce27abb0>

Then, we will test the accuracy on the training and cross validation sets.

In [12]:
# Calculate and print the training accuracy
logit1train = model1(X_train)
p1train = tf.nn.sigmoid(logit1train)
trainAcc1 = np.mean((p1train>=0.5) == y_train) * 100
print('Train Accuracy: %f'%trainAcc1)

# Calculate and print the cross validation accuracy
logit1cv = model1(X_cv)
p1cv = tf.nn.sigmoid(logit1cv)
cvAcc1 = np.mean((p1cv>=0.5) == y_cv) * 100
print('CV Accuracy: %f'%cvAcc1)

Train Accuracy: 96.166667
CV Accuracy: 96.500000


### 3.2 - Two Hidden Layers

The second architecture that will be tested is a neural network with 2 hidden layers. The first will consist of 15 units and the second will consist of 10 units.

<img src="images/Model2.png" style="width:550px;height:300px;">

In [13]:
model2 = Sequential(
    [               
        tf.keras.Input(shape=(23,)),   #specify input size
        tf.keras.layers.Dense(15, activation='relu', kernel_regularizer = tf.keras.regularizers.l2(0.1)),
        tf.keras.layers.Dense(10, activation='relu', kernel_regularizer = tf.keras.regularizers.l2(0.1)),
        tf.keras.layers.Dense(1, activation='linear')
    ], name = "my_model2" 
) 

Next, we will compile the model and specify the loss function and the optimizer.

In [14]:
model2.compile(
    loss=tf.keras.losses.BinaryCrossentropy(from_logits = True),
    optimizer=tf.keras.optimizers.Adam(0.001),
)

model2.fit(
    X_train,y_train,
    epochs=100
)

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100
Epoch 77/100
Epoch 78

<keras.callbacks.History at 0x10fcf837160>

Then, we will test the accuracy on the training and cross validation sets.

In [15]:
# Calculate and print the training accuracy
logit2train = model2(X_train)
p2train = tf.nn.sigmoid(logit2train)
trainAcc2 = np.mean((p2train>=0.5) == y_train) * 100
print('Train Accuracy: %f'%trainAcc2)

# Calculate and print the cross validation accuracy
logit2cv = model2(X_cv)
p2cv = tf.nn.sigmoid(logit2cv)
cvAcc2 = np.mean((p2cv>=0.5) == y_cv) * 100
print('CV Accuracy: %f'%cvAcc2)

Train Accuracy: 96.000000
CV Accuracy: 96.500000


### 3.3 - Three Hidden Layers

The third architecture that will be tested is a neural network with 3 hidden layers. The first will consist of 15 units, the second will have 10 units, and the third will have 5 units.

<img src="images/Model3.png" style="width:700px;height:300px;">

In [16]:
model3 = Sequential(
    [               
        tf.keras.Input(shape=(23,)),   #specify input size
        tf.keras.layers.Dense(15, activation='relu', kernel_regularizer = tf.keras.regularizers.l2(0.1)),
        tf.keras.layers.Dense(10, activation='relu', kernel_regularizer = tf.keras.regularizers.l2(0.1)),
        tf.keras.layers.Dense(5, activation='relu', kernel_regularizer = tf.keras.regularizers.l2(0.1)),
        tf.keras.layers.Dense(1, activation='linear')
    ], name = "my_model3" 
)                            

Next, we will compile the model and specify the loss function and the optimizer.

In [17]:
model3.compile(
    loss=tf.keras.losses.BinaryCrossentropy(from_logits = True),
    optimizer=tf.keras.optimizers.Adam(0.001),
)

model3.fit(
    X_train,y_train,
    epochs=100
)

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100
Epoch 77/100
Epoch 78

<keras.callbacks.History at 0x10fd0a93520>

Then, we will test the accuracy on the training and cross validation sets.

In [18]:
# Calculate and print the training accuracy
logit3train = model3(X_train)
p3train = tf.nn.sigmoid(logit3train)
trainAcc3 = np.mean((p3train>=0.5) == y_train) * 100
print('Train Accuracy: %f'%trainAcc3)

# Calculate and print the cross validation accuracy
logit3cv = model3(X_cv)
p3cv = tf.nn.sigmoid(logit3cv)
cvAcc3 = np.mean((p3cv>=0.5) == y_cv) * 100
print('CV Accuracy: %f'%cvAcc3)

Train Accuracy: 94.666667
CV Accuracy: 93.500000


### 3.4 - Comparing the Architectures

When running the code, these are the results that I got. I tried to make a fancy plot that would graph the current values, but matplot kept killing my terminal.

| # of Hidden Layers | Training Accuracy | CV Accuracy |
|--------------------|-------------------|-------------|
| 1 Hidden Layer     | 96.1667           | 96.00       |
| 2 Hidden Layers    | 96.00             | 95.00       |
| 3 Hidden Layers    | 95.166            | 95.00       |

Surpisingly, the architecture with only 1 hidden layer performed the best in both training accuracy and cross validation accuracy. There likely was not enough data in the training set to fine tune the parameters in the larger neural networks. However, all of these accuracies are acceptable, so it does not seem like a bug.

## 4 - Regularization and Parameter Tuning

Now that the architecture has been decided, we are going to fine tune the regularization parameter. The Adam optimization algorithm alters the learning rate between each epoch depending on the change in the loss function values. Since the learning rate is dynamically changed, this value does not require tuning.

### Regularization Tuning

This code below will test different regularization parameter values and then output the training and cross validation accuracies for each. Note: plotting is not used because matplot keeps killing my terminal.

In [19]:
lambdas = [0.0, 0.001, 0.01, 0.05, 0.1, 0.2, 0.3]
models=[None] * len(lambdas)
for i in range(len(lambdas)):
    lambda_ = lambdas[i]
    models[i] =  Sequential(
        [
        tf.keras.Input(shape=(23,)),   #specify input size
        tf.keras.layers.Dense(15, activation='relu', kernel_regularizer = tf.keras.regularizers.l2(0.1)),
        tf.keras.layers.Dense(1, activation='linear')
        ]
    )
    models[i].compile(
        loss=tf.keras.losses.BinaryCrossentropy(from_logits = True),
        optimizer=tf.keras.optimizers.Adam(0.001),
    )

    models[i].fit(
        X_train, y_train,
        epochs=100
    )

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100
Epoch 77/100
Epoch 78

Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100
Epoch 77/100
Epoch 78/100
Epoch 79/100
Epoch 

Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100
Epoch 77/100
Epoch 78/100
Epoch 79/100
Epoch 80/100
Epoch

Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100
Epoch 77/100
Epoch 78/100
Epoch 79/100
Epoch 80/100
Epoch 81/100
Epoc

Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100
Epoch 77/100
Epoch 78/100
Epoch 79/100
Epoch 80/100
Epoch 81/100
Epoch 82/100
Epo

Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100
Epoch 77/100
Epoch 78/100
Epoch 79/100
Epoch 80/100
Epoch 81/100
Epoch 82/100
Epoch 83/100
Ep

Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100
Epoch 77/100
Epoch 78/100
Epoch 79/100
Epoch 80/100
Epoch 81/100
Epoch 82/100
Epoch 83/100
Epoch 84/100
E

### Results and Decision

This code will print out the training and cross valudation accuracies for the prior test.

In [20]:
for i in range(len(lambdas)):
    lambda_ = lambdas[i]
    print(f"Lambda = {lambda_}")
    # Calculate and print the training accuracy
    logitTrain = models[i](X_train)
    pTrain = tf.nn.sigmoid(logitTrain)
    trainAcc = np.mean((pTrain>=0.5) == y_train) * 100
    print('Train Accuracy: %f'%trainAcc)

    # Calculate and print the cross validation accuracy
    logitCV = models[i](X_cv)
    pCV = tf.nn.sigmoid(logitCV)
    cvAcc = np.mean((pCV>=0.5) == y_cv) * 100
    print('CV Accuracy: %f'%cvAcc)
    
    print('\n')

Lambda = 0.0
Train Accuracy: 96.166667
CV Accuracy: 96.500000


Lambda = 0.001
Train Accuracy: 96.000000
CV Accuracy: 95.000000


Lambda = 0.01
Train Accuracy: 96.000000
CV Accuracy: 96.000000


Lambda = 0.05
Train Accuracy: 96.166667
CV Accuracy: 96.500000


Lambda = 0.1
Train Accuracy: 96.166667
CV Accuracy: 96.000000


Lambda = 0.2
Train Accuracy: 96.166667
CV Accuracy: 96.500000


Lambda = 0.3
Train Accuracy: 96.000000
CV Accuracy: 96.500000




The following table shows the results that I achieved. The difference is the magnitude of the difference between the training accuracy and the cross validation accuracy.

| Lambda | Training Accuracy | CV Accuracy | Difference |
|--------|-------------------|-------------|------------|
| 0.0    | 96.1667           | 95.5        | 0.667      |
| 0.001  | 96.1667           | 96.0        | 0.1667     |
| 0.01   | 96.1667           | 96.5        | 0.33       |
| 0.05   | 96.1667           | 96.0        | 0.1667     | 
| 0.1    | 96.1667           | 96.5        | 0.33       |
| 0.2    | 96.33             | 97.0        | 0.667      |
| 0.3    | 96.1667           | 96.5        | 0.33       |

The lambda values with the smallest difference appears to be 0.001 and 0.05. I chose lambda = 0.05 for the regularization parameter because having a larger regularization parameter will reduce the chances that the model is overfitting.

$$ \lambda = 0.05$$

## 5 - Evaluation

First, let's build a fresh model with the decided architecture and regularization parameter.

### 5.1 - Build Model

In [21]:
finalModel =  Sequential(
    [
    tf.keras.Input(shape=(23,)),   #specify input size
    tf.keras.layers.Dense(15, activation='relu', kernel_regularizer = tf.keras.regularizers.l2(0.05)),
    tf.keras.layers.Dense(1, activation='linear')
    ]
)

finalModel.compile(
    loss=tf.keras.losses.BinaryCrossentropy(from_logits = True),
    optimizer=tf.keras.optimizers.Adam(0.001),
)

finalModel.fit(
    X_train, y_train,
    epochs=100
)

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100
Epoch 77/100
Epoch 78

<keras.callbacks.History at 0x10fd4b50f10>

### 5.2 - Calculate Testing Accuracy

Now, let's the accuracy using the test set that was created earlier.

In [22]:
# Calculate and print the testing accuracy
logitTest = finalModel(X_test)
pTest = tf.nn.sigmoid(logitTest)
testAcc = np.mean((pTest>=0.5) == y_test) * 100
print('Test Accuracy: %f'%testAcc)

Test Accuracy: 95.500000


When I ran the code, I had a test accuracy of 95.5.

### 5.3 - Calculate Precision, Recall, and F1 Score

To gain a better understanding of the accuracy of the model, we are going to calculate the precision, recall, and F1 score.

The precision will tell us what fraction of the patients we predicted to have heart disease actually have heart disease. True positives indicate the number of examples that were predicted positive and were actually positive. Predicted Positives is the total number of positive predictions from the model.
$$ precision = \frac{True Positives}{Predicted Positives} $$

The recall will tell us what fraction of the patients that had heart disease were number predicted correctly. Actual Positives refer to the total number of patients that has heart disease in the dataset.
$$ recall = \frac{True Positives}{Actual Positives} $$

The F1 score, also known as the harmonic mean, is a good way to compare precision and recall numbers.
$$ F1 = \frac{2*precision*recall}{precision + recall} $$

In [23]:
# Containers to hold the predicted, actual, and true positives
predictedPositives = 0
actualPositives = 0
truePositives = 0

# Count the number of predicted, actual, and true positives
for i in range(0, len(y_test)):
    prediction = float(pTest[i] >= 0.5)
    if y_test[i] == 1:
        actualPositives += 1
    if prediction == 1:
        predictedPositives += 1
    if y_test[i] == prediction and prediction == 1:
        truePositives += 1
    
# Calculate the precision, recall, and F1 score
precision = truePositives / predictedPositives
recall = truePositives / actualPositives
F1 = 2 * precision * recall / (precision + recall)

# Print the results
print('Precision: %f'%precision)
print('Recall: %f'%recall)
print('F1 Score: %f'%F1)

Precision: 0.967213
Recall: 0.959350
F1 Score: 0.963265


#### Results

When running the code, I got the following values:  
Precision: 0.967213  
Recall: 0.959350  
F1 Score: 0.963265  

This model has a great F1 score. The precision and recall were fairly close too which means that the threshold appears to be set correctly.

## 6 - Analysis

Overall, the model that was chosen worked great on the dataset. Much work was needed to process the data. Z-score normalization or one-hot encoding needed to be applied to each feature in the dataset seperately, depending on categorization of the data. Then, a neural network architecture was decided: 23 inputs for the processed data's features, one hidden layer with 15 units, and an output layer with 1 unit. After, the regularization parameter was tuned to 0.05 to avoid high variance.

The accuracy of the model on a testing set, that is was not trained on, was 95.5%. This is a great accuracy for a model that predicts heart disease. The F1 score was 0.963265 out of 1, so the predictions were not skewed towards predicting positive or negative.

Doctors could use this model to help with their decisions on whether or not a patient may have a problem with their heart. Heart disease can be life threatening, so a model that can accurately predict heart disease has the potential to save lives.