# Heart Failure Prediction

Cardiovascular diseases (CVDs) are the number 1 cause of death globally, taking an estimated 17.9 million lives each year, which accounts for 31% of all deaths worldwide. Heart failure is a common event caused by CVDs.

In this project, I will develop a model that can be used to predict mortality by heart failure based on 12 health-related features. The dataset is obtained from [Kaggle](https://www.kaggle.com/andrewmvd/heart-failure-clinical-data). 

In [53]:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
from collections import Counter
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.compose import ColumnTransformer
from tensorflow.keras.utils import to_categorical
from tensorflow.keras import Sequential
from tensorflow.keras.layers import InputLayer
from tensorflow.keras.layers import Dense
from sklearn.metrics import classification_report

## 1. Loading the Data

In [22]:
data = pd.read_csv('heart_failure.csv')

In [23]:
#print all columns and their types
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 299 entries, 0 to 298
Data columns (total 13 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   age                       299 non-null    float64
 1   anaemia                   299 non-null    int64  
 2   creatinine_phosphokinase  299 non-null    int64  
 3   diabetes                  299 non-null    int64  
 4   ejection_fraction         299 non-null    int64  
 5   high_blood_pressure       299 non-null    int64  
 6   platelets                 299 non-null    float64
 7   serum_creatinine          299 non-null    float64
 8   serum_sodium              299 non-null    int64  
 9   sex                       299 non-null    int64  
 10  smoking                   299 non-null    int64  
 11  time                      299 non-null    int64  
 12  DEATH_EVENT               299 non-null    int64  
dtypes: float64(3), int64(10)
memory usage: 30.5 KB


In [24]:
#print distribution of the 'DEATH EVENT' using collection.counter
print('Classes and number of values in the dataset', Counter(data['DEATH_EVENT']))

Classes and number of values in the dataset Counter({0: 203, 1: 96})


In [25]:
#extract the label column
y = data['DEATH_EVENT']

#extract the features columns
x = data[['age','anaemia','creatinine_phosphokinase','diabetes','ejection_fraction','high_blood_pressure','platelets','serum_creatinine','serum_sodium','sex','smoking','time']]

## 2. Preprocessing the Data

In [26]:
#convert the categorical variables into one-hot encoding vectors
x = pd.get_dummies(x)

#split the data into train and test data
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size = 0.33, random_state = 42)

#standardize the numeric data
ct = ColumnTransformer([("numeric",StandardScaler(),['age','creatinine_phosphokinase','ejection_fraction','platelets','serum_creatinine','serum_sodium','time'])])

#train the scaler instance on the training and testdata
x_train = ct.fit_transform(x_train)
x_test = ct.fit_transform(x_test)

## 3. Preparing labels for classification

In [27]:
#initialize an instance of LabelEncoder
le = LabelEncoder()

#fit the encoder instance to the training and test labels
y_train = le.fit_transform(y_train.astype(str))
y_test = le.transform(y_test.astype(str))

#transform the encoded training and test labels into a binary vector
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

## 4. Designing the Model

In [47]:
#create a model instance
model = Sequential()

#create an input layer and add it to the model
model.add(InputLayer(input_shape= (x_train.shape[1],)))

#create a hidden layer and add it to the model
model.add(Dense(12,activation ='relu'))

#create a output layer and add it to the model
model.add(Dense(2,activation = 'softmax'))

#compile the model
model.compile(loss = 'categorical_crossentropy',optimizer = 'adam', metrics = ['accuracy'])

## 5. Train and evaluate the model

In [48]:
model.fit(x_train,y_train,epochs = 100, batch_size = 16,verbose =1)

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100
Epoch 77/100
Epoch 78

Epoch 82/100
Epoch 83/100
Epoch 84/100
Epoch 85/100
Epoch 86/100
Epoch 87/100
Epoch 88/100
Epoch 89/100
Epoch 90/100
Epoch 91/100
Epoch 92/100
Epoch 93/100
Epoch 94/100
Epoch 95/100
Epoch 96/100
Epoch 97/100
Epoch 98/100
Epoch 99/100
Epoch 100/100


<tensorflow.python.keras.callbacks.History at 0x7fc75fdb1bb0>

In [51]:
#evaluate the model
loss , accu = model.evaluate(x_test,y_test,verbose=0)
print('Loss: ' , loss, 'Accuracy: ',accu)

Loss:  0.5971583127975464 Accuracy:  0.7575757503509521


## 6. Generating a Classification Report 

In [62]:
#predict for the test data
y_estimate = model.predict(x_test)
y_estimate

array([[6.77040279e-01, 3.22959721e-01],
       [9.99863863e-01, 1.36083690e-04],
       [9.01164353e-01, 9.88356397e-02],
       [1.24361210e-01, 8.75638783e-01],
       [9.82932448e-01, 1.70675907e-02],
       [9.98566687e-01, 1.43330148e-03],
       [3.47249478e-01, 6.52750552e-01],
       [9.30057168e-01, 6.99427649e-02],
       [1.19142003e-01, 8.80858004e-01],
       [9.87243176e-01, 1.27568459e-02],
       [9.50004280e-01, 4.99956757e-02],
       [9.46313143e-01, 5.36868982e-02],
       [9.63024080e-01, 3.69758718e-02],
       [9.63236690e-01, 3.67632844e-02],
       [8.66636872e-01, 1.33363113e-01],
       [7.05428958e-01, 2.94571012e-01],
       [9.44984317e-01, 5.50157018e-02],
       [6.95940614e-01, 3.04059386e-01],
       [9.40160334e-01, 5.98396398e-02],
       [5.71161807e-01, 4.28838223e-01],
       [7.74347842e-01, 2.25652143e-01],
       [6.33207738e-01, 3.66792291e-01],
       [7.08959460e-01, 2.91040570e-01],
       [2.55182415e-01, 7.44817615e-01],
       [2.867381

In [63]:
#select the indices of the true classes for each label encoding 
y_true = np.argmax(y_test,axis =1 )
y_true

array([0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0,
       0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1,
       0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1,
       1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1])

In [None]:

#print a classification report
#print(classification_report(y_test,y_estimate))