# Heart Disease Prediction

In [1]:
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, accuracy_score

from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Dense

## Heart Disease

Heart disease describes a range of conditions that affect the heart. The most common heart diseases include:

* Blood vessel disease, such as coronary artery disease
* Arrhythmias (heart rhythm problems)
* Congenital (i.e. born-with) heart defects 
* Heart valve disease

Heart disease is the leading cause of death in the United States, causing about 1 in 4 deaths. The most common type is coronary artery disease (CAD).

## Data set

The [Heart Disease Data Set](http://archive.ics.uci.edu/ml/datasets/Heart+Disease) from the University of California, Irvine (UCI) Machine Learning Repository will be used. The data were collected at several locations around the world; the Cleveland subset used here contains heart disease diagnosis records for 303 patients. The full database has 76 attributes, but the processed file provides the following 14:

* Age.
* Sex.
* Chest pain type (1: typical angina, 2: atypical angina, 3: non-anginal pain, 4: asymptomatic).
* Resting blood pressure.
* Cholesterol levels.
* Fasting blood sugar.
* Resting electrocardiographic results (0: normal, 1: having ST-T wave abnormality, 2: showing probable or definite left ventricular hypertrophy by Estes' criteria).
* Maximum heart rate achieved.
* Exercise induced angina (1: yes, 0: no).
* ST depression induced by exercise relative to rest.
* Slope of the peak exercise ST segment.
* Number of major vessels (0-3) colored by fluoroscopy.
* Thal: (3: normal, 6: fixed defect, 7: reversible defect).
* Presence of heart disease (0: no presence, 1-4: presence).

The target to predict is the diagnosis of the heart disease.

In [10]:
url = "http://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/processed.cleveland.data"
names = ['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach', 'exang', 'oldpeak', 'slope', 'ca', 'thal', 'class']
df = pd.read_csv(url, names=names)

In [11]:
print(f'Shape: {df.shape}')
df.head()

Shape: (303, 14)


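|   | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | class |
|---|-----|-----|----|----------|------|-----|---------|---------|-------|---------|-------|----|------|-------|
| 0 | 63.0 | 1.0 | 1.0 | 145.0 | 233.0 | 1.0 | 2.0 | 150.0 | 0.0 | 2.3 | 3.0 | 0.0 | 6.0 | 0 |
| 1 | 67.0 | 1.0 | 4.0 | 160.0 | 286.0 | 0.0 | 2.0 | 108.0 | 1.0 | 1.5 | 2.0 | 3.0 | 3.0 | 2 |
| 2 | 67.0 | 1.0 | 4.0 | 120.0 | 229.0 | 0.0 | 2.0 | 129.0 | 1.0 | 2.6 | 2.0 | 2.0 | 7.0 | 1 |
| 3 | 37.0 | 1.0 | 3.0 | 130.0 | 250.0 | 0.0 | 0.0 | 187.0 | 0.0 | 3.5 | 3.0 | 0.0 | 3.0 | 0 |
| 4 | 41.0 | 0.0 | 2.0 | 130.0 | 204.0 | 0.0 | 2.0 | 172.0 | 0.0 | 1.4 | 1.0 | 0.0 | 3.0 | 0 |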


## Data Preparation

In [14]:
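# missing values are encoded as '?': mask them to NaN
df = df[~df.isin(['?'])]
# drop every row that contains at least one missing value
df = df.dropna(axis=0)
# columns that contained '?' were read as strings; convert them to numbers
df = df.apply(pd.to_numeric)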
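
A possible alternative to masking `'?'` after loading is to let `read_csv` treat it as missing through its `na_values` parameter. The sketch below uses two in-memory rows in the layout of the processed file. The first row is the first record shown above; the second row is made up for illustration and is not taken from the data set.

```python
import io

import pandas as pd

names = ['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg',
         'thalach', 'exang', 'oldpeak', 'slope', 'ca', 'thal', 'class']

# Second row is hypothetical: its 'ca' value is the missing-value marker '?'
sample = io.StringIO(
    "63.0,1.0,1.0,145.0,233.0,1.0,2.0,150.0,0.0,2.3,3.0,0.0,6.0,0\n"
    "50.0,0.0,2.0,120.0,200.0,0.0,0.0,160.0,0.0,0.0,1.0,?,3.0,0\n"
)

# na_values='?' parses the marker as NaN, so every column stays numeric
sample_df = pd.read_csv(sample, names=names, na_values='?')

n_missing = int(sample_df.isna().sum().sum())  # count of missing cells
clean_df = sample_df.dropna(axis=0)            # keep complete rows only

print(f'Missing cells: {n_missing}, rows kept: {len(clean_df)}')
```

With this approach the separate `pd.to_numeric` step should not be needed, because no column is ever read as text.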

In [27]:
X = np.array(df.drop(['class'], axis=1))
y = np.array(df['class'])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
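
Because the split above is random, the reported scores will vary from run to run. A minimal sketch of a reproducible, class-balanced split follows, using the `random_state` and `stratify` parameters of `train_test_split`. The data here are synthetic stand-ins, and the seed value `42` is an arbitrary assumption.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 samples, 13 features, 60/40 class balance
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 13))
y_demo = np.array([0] * 60 + [1] * 40)

# random_state fixes the shuffle; stratify keeps the class ratio in both parts
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print(X_tr.shape, X_te.shape)
print('Positive share in test set:', y_te.mean())
```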

In [28]:
# convert into binary classification problem - heart disease or no heart disease
y_train_binary = y_train.copy()
y_test_binary = y_test.copy()

y_train_binary[y_train_binary > 0] = 1
y_test_binary[y_test_binary > 0] = 1

Y_train_binary = to_categorical(y_train_binary)
Y_test_binary = to_categorical(y_test_binary)
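
To make the two steps above concrete, here is a small NumPy-only sketch of what the thresholding and the one-hot encoding produce. The label values are hypothetical, and `np.eye` is used as a stand-in that yields the same 0/1 layout as `to_categorical` for two classes.

```python
import numpy as np

# Hypothetical diagnosis labels in the original 0-4 coding
labels = np.array([0, 2, 1, 0, 4])

# Any value above 0 means heart disease is present
binary = (labels > 0).astype(int)

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(2)[binary]

print(binary)
print(one_hot)
```

Column 0 of the result marks "no heart disease" and column 1 marks "heart disease", which matches the two softmax outputs of the network.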

## Neural Network Model

In [56]:
model = Sequential([
    Dense(8, input_dim=13, kernel_initializer='normal', activation='relu'),
    Dense(4, kernel_initializer='normal', activation='relu'),
    Dense(2, activation='softmax'),
])
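
As a quick sanity check on the size of this network, the number of trainable parameters can be worked out by hand: each `Dense` layer has one weight per input-output pair plus one bias per output. The helper `dense_params` below is a hypothetical name introduced only for this illustration.

```python
# Layer widths: 13 input features, then the three Dense layers
layer_sizes = [13, 8, 4, 2]


def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus biases (n_out) of a fully connected layer."""
    return n_in * n_out + n_out


per_layer = [dense_params(n_in, n_out)
             for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer)  # [112, 36, 10]
print(total)      # 158
```

Calling `model.summary()` on the model should list the same per-layer counts.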

In [57]:
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
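
For intuition about the `categorical_crossentropy` loss, the following NumPy sketch computes the quantity it is based on: the negative log of the probability assigned to the true class, averaged over the batch. The probabilities are hypothetical, and the clipping constant `eps` is an assumption chosen to avoid `log(0)`, not necessarily the value Keras uses.

```python
import numpy as np


def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_prob))."""
    y_prob = np.clip(y_prob, eps, 1.0)
    return float(np.mean(-np.sum(y_true * np.log(y_prob), axis=1)))


# Two samples: one-hot targets and hypothetical softmax outputs
y_true = np.array([[1.0, 0.0], [0.0, 1.0]])
y_prob = np.array([[0.9, 0.1], [0.2, 0.8]])

loss = categorical_crossentropy(y_true, y_prob)
print(f'Loss: {loss:.4f}')
```

The loss shrinks toward zero as the probability on the correct class approaches 1.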

### Train

In [58]:
model.fit(X_train, Y_train_binary, epochs=100, batch_size=10, verbose=0)

<keras.callbacks.History at 0x172781595e0>

### Test

In [59]:
# round the softmax probabilities to 0/1 to obtain one-hot predictions
Y_pred = np.round(model.predict(X_test)).astype(int)

print(f'Accuracy: {accuracy_score(Y_test_binary, Y_pred):.2f}')
print(classification_report(Y_test_binary, Y_pred))

Accuracy: 0.88
              precision    recall  f1-score   support

           0       0.90      0.88      0.89        32
           1       0.86      0.89      0.88        28

   micro avg       0.88      0.88      0.88        60
   macro avg       0.88      0.88      0.88        60
weighted avg       0.88      0.88      0.88        60
 samples avg       0.88      0.88      0.88        60
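
The `micro avg` and `samples avg` rows appear because one-hot targets are treated as a multilabel problem. One alternative, sketched below with hypothetical probabilities, is to take the `argmax` of each row so that both targets and predictions are plain class labels. This also sidesteps an edge case of rounding: a tie of `[0.5, 0.5]` rounds to `[0, 0]`, which is not a valid one-hot row.

```python
import numpy as np
from sklearn.metrics import accuracy_score, classification_report

# Hypothetical softmax outputs and one-hot targets for four patients
probs = np.array([[0.9, 0.1], [0.3, 0.7], [0.6, 0.4], [0.2, 0.8]])
true_one_hot = np.array([[1, 0], [0, 1], [0, 1], [0, 1]])

# argmax picks the most probable class for every row
pred_labels = probs.argmax(axis=1)
true_labels = true_one_hot.argmax(axis=1)

acc = accuracy_score(true_labels, pred_labels)
report = classification_report(true_labels, pred_labels)
print(f'Accuracy: {acc:.2f}')
print(report)

# Rounding a tie yields an all-zero row because NumPy rounds 0.5 to 0.0
tie_rounded = np.round(np.array([[0.5, 0.5]])).astype(int)
print(tie_rounded)
```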

