# Harnessing Machine Learning for Healthcare: Predicting Death Events of Patients with Prior Heart Failure

One of machine learning's most significant applications in healthcare is predicting patient outcomes from demographics and medical histories. When a model flags a patient as "high risk", healthcare professionals can provide more specialized care and take proactive measures that may save lives.

This project analyzes the characteristics of 299 patients who have experienced heart failure. The target variable is a death event, denoting whether a patient died during the follow-up period. Patient features include demographics such as age and sex, along with clinical measurements such as creatinine phosphokinase (CPK) level and platelet count. Following the exploratory data analysis (EDA), several machine learning models will be trained and compared to find the one with the highest accuracy. Lastly, a dashboard will showcase key statistics, modeling outcomes, and predictions for user-supplied patient statistics.

<input type="checkbox"/><label for="checkbox"> EDA</label><br>
<input type="checkbox"/><label for="checkbox"> Modeling</label><br>
<input type="checkbox"/><label for="checkbox"> Dashboard</label><br>

In [1]:
__author__ = 'Jared Paul Guevara'

In [11]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

train = pd.read_csv('data/heart_failure_train.csv')
train.head()

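| Unnamed: 0 | age | anaemia | creatinine_phosphokinase | diabetes | ejection_fraction | high_blood_pressure | platelets | serum_creatinine | serum_sodium | sex | smoking | time | death_event |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 75.0 | 0 | 582 | 0 | 45 | 1 | 263358.03 | 1.18 | 137 | 1 | 0 | 87 | 0 |
| 1 | 65.0 | 0 | 113 | 1 | 25 | 0 | 497000.0 | 1.83 | 135 | 1 | 0 | 67 | 1 |
| 2 | 85.0 | 0 | 23 | 0 | 45 | 0 | 360000.0 | 3.0 | 132 | 1 | 0 | 28 | 1 |
| 3 | 70.0 | 1 | 171 | 0 | 60 | 1 | 176000.0 | 1.1 | 145 | 1 | 1 | 146 | 0 |
| 4 | 75.0 | 1 | 582 | 0 | 30 | 0 | 225000.0 | 1.83 | 134 | 1 | 0 | 113 | 1 |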


In [12]:
# Features: every column except the last one
X_train = train.iloc[:, :-1]
display(X_train.head())

# Target: the last column (death_event)
y_train = train.iloc[:, -1]
y_train.head()

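| Unnamed: 0 | age | anaemia | creatinine_phosphokinase | diabetes | ejection_fraction | high_blood_pressure | platelets | serum_creatinine | serum_sodium | sex | smoking | time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 75.0 | 0 | 582 | 0 | 45 | 1 | 263358.03 | 1.18 | 137 | 1 | 0 | 87 |
| 1 | 65.0 | 0 | 113 | 1 | 25 | 0 | 497000.0 | 1.83 | 135 | 1 | 0 | 67 |
| 2 | 85.0 | 0 | 23 | 0 | 45 | 0 | 360000.0 | 3.0 | 132 | 1 | 0 | 28 |
| 3 | 70.0 | 1 | 171 | 0 | 60 | 1 | 176000.0 | 1.1 | 145 | 1 | 1 | 146 |
| 4 | 75.0 | 1 | 582 | 0 | 30 | 0 | 225000.0 | 1.83 | 134 | 1 | 0 | 113 |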


0    0
1    1
2    1
3    0
4    1
Name: death_event, dtype: int64
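
The positional split above relies on `death_event` being the last column. As a minimal sketch of a name-based alternative, the block below retypes only the five rows displayed in the output above into a small frame and selects the target by name. The names `sample`, `X_sample`, `y_sample`, and `class_counts` are hypothetical and are not part of the notebook.

```python
import pandas as pd

# The five rows shown in the train.head() output, retyped by hand
# for illustration only (not the full 299-patient dataset).
sample = pd.DataFrame({
    'age': [75.0, 65.0, 85.0, 70.0, 75.0],
    'anaemia': [0, 0, 0, 1, 1],
    'creatinine_phosphokinase': [582, 113, 23, 171, 582],
    'diabetes': [0, 1, 0, 0, 0],
    'ejection_fraction': [45, 25, 45, 60, 30],
    'high_blood_pressure': [1, 0, 0, 1, 0],
    'platelets': [263358.03, 497000.0, 360000.0, 176000.0, 225000.0],
    'serum_creatinine': [1.18, 1.83, 3.0, 1.1, 1.83],
    'serum_sodium': [137, 135, 132, 145, 134],
    'sex': [1, 1, 1, 1, 1],
    'smoking': [0, 0, 0, 1, 0],
    'time': [87, 67, 28, 146, 113],
    'death_event': [0, 1, 1, 0, 1],
})

# Select the target by name, so the split does not depend on column order.
X_sample = sample.drop(columns='death_event')
y_sample = sample['death_event']

# Class balance is worth checking before accuracy is used as the metric.
class_counts = y_sample.value_counts()
print(X_sample.shape)
print(class_counts)
```

Selecting by name keeps the split correct even if columns are later reordered or new ones are appended.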

In [None]:
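# Each pipeline standardizes the features, then fits one classifier.
log_pipe = make_pipeline(StandardScaler(), LogisticRegression())
svc_pipe = make_pipeline(StandardScaler(), SVC())
# Scaling is not required for tree-based models, but it is harmless here.
rfc_pipe = make_pipeline(StandardScaler(), RandomForestClassifier())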
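
One way pipelines like these might be compared on accuracy is 5-fold cross-validation. The sketch below is not the author's evaluation code. It runs on synthetic data from `make_classification`, an assumption standing in for the real training set, and the names `pipelines`, `X_demo`, `y_demo`, and `cv_accuracy` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the real data (assumption): 299 rows, 12 features.
X_demo, y_demo = make_classification(
    n_samples=299, n_features=12, n_informative=5, random_state=0
)

# Same three model families as in the cell above; random_state is fixed
# on the forest only so the sketch is reproducible.
pipelines = {
    'logistic_regression': make_pipeline(StandardScaler(), LogisticRegression()),
    'svc': make_pipeline(StandardScaler(), SVC()),
    'random_forest': make_pipeline(
        StandardScaler(), RandomForestClassifier(random_state=0)
    ),
}

# Mean accuracy over 5 folds for each pipeline.
cv_accuracy = {}
for name, pipe in pipelines.items():
    scores = cross_val_score(pipe, X_demo, y_demo, cv=5, scoring='accuracy')
    cv_accuracy[name] = scores.mean()

for name, acc in cv_accuracy.items():
    print(f'{name}: {acc:.3f}')
```

Because the scaler sits inside each pipeline, it is refit on the training folds only, which keeps validation-fold statistics out of the preprocessing step.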

