# Heart Disease UCI Data Preprocessing - Neural Network
The raw dataset was downloaded from Kaggle. The goal is to build a machine learning model that predicts whether there is a chance of a heart attack based on the available parameters. In this document I use a neural network to learn from the preprocessed dataset.
## 1. Load the data
In this part I import the datasets that were created when running the logistic regression. The inputs are stored as floats and the targets as integers.

In [1]:
import numpy as np
import tensorflow as tf

In [2]:
# A temporary variable npz holds each loaded file while its inputs and targets are assigned
npz = np.load('Heart_data_train.npz')
train_inputs, train_targets = npz['inputs'].astype(np.float64), npz['targets'].astype(np.int64)

npz = np.load('Heart_data_validation.npz')
validation_inputs, validation_targets = npz['inputs'].astype(np.float64), npz['targets'].astype(np.int64)

npz = np.load('Heart_data_test.npz')
test_inputs, test_targets = npz['inputs'].astype(np.float64), npz['targets'].astype(np.int64)
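
To make the file layout concrete, here is a minimal sketch of the `.npz` round trip. It assumes the files hold two arrays named `inputs` and `targets`, as in the cell above. The demo arrays and the file name `demo_data.npz` are hypothetical stand-ins, not the heart dataset.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in data: 4 samples with 3 features each
demo_inputs = np.array([[0.1, 1.2, -0.3],
                        [0.5, -0.7, 0.9],
                        [-1.1, 0.4, 0.2],
                        [0.8, 0.0, -0.6]])
demo_targets = np.array([0, 1, 1, 0])

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo_data.npz")
    # Save both arrays in one archive under the keys 'inputs' and 'targets'
    np.savez(demo_path, inputs=demo_inputs, targets=demo_targets)
    # Load the archive and cast each array to the dtype the model expects
    with np.load(demo_path) as npz:
        loaded_inputs = npz["inputs"].astype(np.float64)
        loaded_targets = npz["targets"].astype(np.int64)

print(loaded_inputs.shape, loaded_inputs.dtype)
print(loaded_targets.shape, loaded_targets.dtype)
```

Checking the shapes and dtypes right after loading is a cheap way to catch a mismatch between inputs and targets before training.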

## 2. Build the model
In this part, I use tf.keras.layers.Dense to build the model. The hyperparameters to tune include the hidden layer size, the number of layers, the activation function and the batch size. After tuning, I chose a 6-layer neural network (5 hidden layers plus the output layer) with a batch size of 5 and a hidden layer size of 10. The training accuracy is about 0.89 and the validation accuracy is about 0.84.
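
As a rough sense of the model's size, the sketch below counts the trainable parameters of a stack of dense layers. The helper `dense_param_count` is hypothetical. The input width of 13 features is an assumption, since the number of columns in the preprocessed data is not stated here. The hidden size of 10 follows the `hidden_layer_size` set in the code cell.

```python
def dense_param_count(n_inputs, layer_sizes):
    """Count weights and biases for a chain of fully connected layers."""
    total = 0
    fan_in = n_inputs
    for units in layer_sizes:
        # Each dense layer has fan_in * units weights plus one bias per unit
        total += fan_in * units + units
        fan_in = units
    return total


n_features = 13                 # assumption: width of the preprocessed inputs
layer_sizes = [10] * 5 + [1]    # 5 hidden layers of 10 units, then 1 output unit
total_params = dense_param_count(n_features, layer_sizes)
print(total_params)
```

With only 181 training samples, a count like this helps when judging whether a wider or deeper network is likely to overfit.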

In [3]:
# Set the output size and the hidden layer size
output_size = 1
# Just choose the same size for all the hidden layers
hidden_layer_size = 10
    
# define the model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 1st hidden layer
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 2nd hidden layer
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 3rd hidden layer
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 4th hidden layer
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 5th hidden layer
    # the output layer uses a sigmoid activation, so it returns a probability between 0 and 1
    tf.keras.layers.Dense(output_size, activation='sigmoid') # output layer
])


### Choose the "adam" optimizer and the "binary_crossentropy" loss function, and use 'accuracy' as the metric
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

### Training
# set the batch size
batch_size = 5

# set a maximum number of training epochs
max_epochs = 100

# set an early stopping mechanism, use patience=2
early_stopping = tf.keras.callbacks.EarlyStopping(patience=2)

# fit the model
model.fit(train_inputs, # train inputs
          train_targets, # train targets
          batch_size=batch_size, # batch size
          epochs=max_epochs, # maximum number of epochs (assuming early stopping doesn't kick in)
          # callbacks are functions called at specific points during training
          # here EarlyStopping checks after each epoch whether val_loss has stopped improving
          callbacks=[early_stopping], # early stopping
          validation_data=(validation_inputs, validation_targets), # validation data
          verbose=2 # print one line per epoch
          )

Train on 181 samples, validate on 61 samples
Epoch 1/100
181/181 - 1s - loss: 0.6766 - accuracy: 0.6354 - val_loss: 0.6638 - val_accuracy: 0.7377
Epoch 2/100
181/181 - 0s - loss: 0.6500 - accuracy: 0.7017 - val_loss: 0.6267 - val_accuracy: 0.7705
Epoch 3/100
181/181 - 0s - loss: 0.6078 - accuracy: 0.7293 - val_loss: 0.5621 - val_accuracy: 0.7869
Epoch 4/100
181/181 - 0s - loss: 0.5437 - accuracy: 0.7790 - val_loss: 0.4769 - val_accuracy: 0.8197
Epoch 5/100
181/181 - 0s - loss: 0.4807 - accuracy: 0.8066 - val_loss: 0.4244 - val_accuracy: 0.8197
Epoch 6/100
181/181 - 0s - loss: 0.4314 - accuracy: 0.8343 - val_loss: 0.4149 - val_accuracy: 0.8197
Epoch 7/100
181/181 - 0s - loss: 0.3962 - accuracy: 0.8508 - val_loss: 0.4102 - val_accuracy: 0.8197
Epoch 8/100
181/181 - 0s - loss: 0.3772 - accuracy: 0.8729 - val_loss: 0.4090 - val_accuracy: 0.8197
Epoch 9/100
181/181 - 0s - loss: 0.3539 - accuracy: 0.8674 - val_loss: 0.4148 - val_accuracy: 0.8361
Epoch 10/100
181/181 - 0s - loss: 0.3373 - acc

<tensorflow.python.keras.callbacks.History at 0x1c3e7192388>
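
The patience rule can be traced by hand on the log above. The sketch below is a simplified simulation, not the Keras implementation: it stops once `val_loss` has failed to improve on its best value for `patience` consecutive epochs. The `val_loss` values are copied from the nine complete epochs in the log. The truncated tenth epoch is left out, and the extra value `0.4150` is hypothetical.

```python
def simulate_early_stopping(val_losses, patience=2):
    """Return (stop_epoch, wait); stop_epoch is None if training would continue."""
    best_loss = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            # New best value: remember it and reset the counter
            best_loss = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, wait
    return None, wait


# val_loss for epochs 1 to 9, copied from the training log
logged_val_losses = [0.6638, 0.6267, 0.5621, 0.4769, 0.4244,
                     0.4149, 0.4102, 0.4090, 0.4148]
stop_epoch, wait = simulate_early_stopping(logged_val_losses, patience=2)
print(stop_epoch, wait)

# Hypothetical tenth value that does not beat the best loss of 0.4090
hypothetical_losses = logged_val_losses + [0.4150]
hypothetical_stop, _ = simulate_early_stopping(hypothetical_losses, patience=2)
print(hypothetical_stop)
```

After epoch 9 the counter sits at 1, so under this rule a second epoch without improvement would end training.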

## 3. Test the accuracy
In this part the accuracy on the test set is calculated as 0.77, slightly better than the logistic regression (0.75).

In [4]:
test_loss, test_accuracy = model.evaluate(test_inputs, test_targets)
print(f'Test loss: {test_loss:.2f}. Test accuracy: {test_accuracy:.2f}')
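
For intuition about the two numbers that `model.evaluate` returns, here is a minimal NumPy sketch of binary cross-entropy and thresholded accuracy. The helper names, the `1e-7` clipping constant, and the example targets and probabilities are assumptions for illustration. None of these values come from the heart dataset or from the trained model.

```python
import numpy as np


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy between 0/1 targets and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=np.float64)
    # Clip probabilities away from 0 and 1 so the logarithms stay finite
    y_prob = np.clip(np.asarray(y_prob, dtype=np.float64), eps, 1.0 - eps)
    losses = -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    return float(np.mean(losses))


def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Share of samples whose thresholded probability matches the target."""
    y_pred = (np.asarray(y_prob) > threshold).astype(int)
    return float(np.mean(y_pred == np.asarray(y_true)))


# Hypothetical targets and sigmoid outputs
example_targets = np.array([1, 0, 1, 0])
example_probs = np.array([0.9, 0.2, 0.4, 0.6])

example_loss = binary_crossentropy(example_targets, example_probs)
example_accuracy = binary_accuracy(example_targets, example_probs)
print(f"loss={example_loss:.4f}, accuracy={example_accuracy:.2f}")
```

Two of the four hypothetical predictions land on the wrong side of the 0.5 threshold, so the accuracy is 0.5 even though the loss also reflects how confident each prediction was.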

