# Using Machine Learning to predict heart disease

Here, we train a logistic regression model to predict whether a person will develop coronary heart disease (CHD) within ten years, using systolic blood pressure (sysBP) and diastolic blood pressure (diaBP) as features.

In [None]:
import numpy as np
import pandas as pd

In [None]:
df = pd.read_csv("par_disease.csv")

In [None]:
df = df[["sysBP", "diaBP", "TenYearCHD"]]
df

Removing the rows that contain NaN values:

In [None]:
df = df.dropna()

In [None]:
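# reset_index returns a new DataFrame, so assign it; drop=True discards the old index
df = df.reset_index(drop=True)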

In [None]:
df.info()
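
As a small illustration of the two cleaning steps above, the sketch below uses a made-up three-row table (the values are hypothetical, not taken from `par_disease.csv`). `dropna()` removes every row that contains a NaN, and `reset_index(drop=True)` renumbers the remaining rows from 0. Both return a new DataFrame, so the result has to be assigned.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with the same column names as the notebook
toy = pd.DataFrame({
    "sysBP": [120.0, np.nan, 150.0],
    "diaBP": [80.0, 85.0, 95.0],
    "TenYearCHD": [0, 0, 1],
})

# Drop the row with the missing sysBP value
toy_clean = toy.dropna()

# The old labels (0 and 2) survive dropna; renumber them as 0 and 1
toy_clean = toy_clean.reset_index(drop=True)

print(toy_clean)
```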

Separating the features (X) from the target (Y):

In [None]:
data_elements = set(df.columns) - set(["TenYearCHD"])

In [None]:
X = df[["sysBP", "diaBP"]]
X

In [None]:
Y = df["TenYearCHD"]
Y

Splitting the data into training and test sets:

In [None]:
from sklearn.model_selection import train_test_split

# Hold out 20 % of the rows for testing; this has to run before the model is fitted
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.2)

Training the model:

In [None]:
from sklearn import linear_model

In [None]:
log_reg_chd = linear_model.LogisticRegression()

In [None]:
log_reg_chd.fit(X_train, Y_train)

Let's predict:

In [None]:
prediction = log_reg_chd.predict(X_test)

Let's look at people in the test set with very high systolic pressure, who are likely to have CHD within ten years:

In [None]:
X_test[X_test["sysBP"] > 200]

In [None]:
# Positional slice selecting a single row of X; 1189 is the row picked from the listing above
pred_individual = log_reg_chd.predict(X[1189:1190])
pred_individual

Right on point!

Let's see the accuracy:

In [None]:
from sklearn.metrics import accuracy_score

In [None]:
accuracy_score(Y_test, prediction)

The accuracy is 86.67 %. The exact value changes from run to run because the split is random.
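
Accuracy on its own can be hard to interpret when one class is much more common than the other, because always predicting the majority class already scores well. Below is a minimal sketch of a sanity check, assuming synthetic data (the generated blood-pressure values and labels are invented for illustration and are not the notebook's dataset): compare the model against scikit-learn's `DummyClassifier` and look at the confusion matrix. The sketch also sets `random_state` and `stratify`, which the notebook does not, so that its numbers are repeatable.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the notebook's data (all values are assumptions)
rng = np.random.default_rng(0)
n = 1000
sys_bp = rng.normal(130, 20, n)
dia_bp = 0.5 * sys_bp + rng.normal(18, 8, n)
# Hypothetical rule: the chance of a positive label rises with systolic pressure
p = 1 / (1 + np.exp(-(sys_bp - 175) / 15))
y = (rng.random(n) < p).astype(int)
X_demo = np.column_stack([sys_bp, dia_bp])

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y, test_size=0.2, random_state=0, stratify=y
)

# Baseline that always predicts the most frequent training class
baseline = DummyClassifier(strategy="most_frequent").fit(X_tr, y_tr)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

baseline_acc = accuracy_score(y_te, baseline.predict(X_te))
model_acc = accuracy_score(y_te, model.predict(X_te))
# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_te, model.predict(X_te), labels=[0, 1])

print("baseline accuracy:", baseline_acc)
print("model accuracy:   ", model_acc)
print(cm)
```

If the model's accuracy is close to the baseline's, the confusion matrix shows whether it is detecting any positive cases at all.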

Plotting the data points to see how the two classes are separated:

In [None]:
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")

In [None]:
sns.relplot(data = df.reset_index(), x = "diaBP", y = "sysBP", hue = "TenYearCHD")

Although the boundary between the two classes is not linear, the model still reached 86.67 % accuracy!
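
For context on why the shape of the boundary matters: a fitted logistic regression classifies a point by the sign of a linear score, w1·sysBP + w2·diaBP + b, so its decision boundary in a plot like the one above is always a straight line. The sketch below checks this on synthetic data (invented values; `X_demo`, `y_demo` and the labelling rule are assumptions made for the example) by rebuilding the predictions from `coef_` and `intercept_`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the notebook's data (all values are assumptions)
rng = np.random.default_rng(1)
n = 500
sys_bp = rng.normal(130, 20, n)
dia_bp = rng.normal(82, 12, n)
X_demo = np.column_stack([sys_bp, dia_bp])
# Hypothetical noisy labelling rule
y_demo = ((sys_bp + 0.5 * dia_bp + rng.normal(0, 15, n)) > 190).astype(int)

model = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)

# Learned weights (one per feature) and intercept
w = model.coef_.ravel()
b = model.intercept_[0]

# Linear score; the boundary is the line where the score equals zero
scores = X_demo @ w + b
manual_pred = (scores > 0).astype(int)

agreement = (manual_pred == model.predict(X_demo)).mean()
print("weights:", w, "intercept:", b)
print("agreement with model.predict:", agreement)
```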

Trying another model: a multilayer perceptron (MLP)

In [None]:
from sklearn.neural_network import MLPClassifier

In [None]:
mlp_chd = MLPClassifier(hidden_layer_sizes = (3,3), random_state = 5, learning_rate_init = 0.01, activation = "logistic", verbose = True)

In [None]:
mlp_chd.fit(X_train, Y_train)
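
One caveat when fitting an MLP on raw blood-pressure values: scikit-learn's documentation notes that multilayer perceptrons are sensitive to feature scaling. Below is a minimal sketch, assuming synthetic data (all values and the labelling rule are invented for the example), of wrapping the same `MLPClassifier` settings in a pipeline with `StandardScaler`, so that the scaler is fitted on the training rows only. The larger `max_iter` is also an assumption, not a setting from the notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the notebook's data (all values are assumptions)
rng = np.random.default_rng(2)
n = 600
sys_bp = rng.normal(130, 20, n)
dia_bp = rng.normal(82, 12, n)
X_demo = np.column_stack([sys_bp, dia_bp])
y_demo = ((sys_bp + 0.5 * dia_bp + rng.normal(0, 15, n)) > 190).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0, stratify=y_demo
)

# Standardise each feature to zero mean and unit variance, then fit the MLP
scaled_mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(
        hidden_layer_sizes=(3, 3),
        random_state=5,
        learning_rate_init=0.01,
        activation="logistic",
        max_iter=1000,  # raised from the default of 200 to give training room to converge
    ),
)
scaled_mlp.fit(X_tr, y_tr)

scaled_acc = scaled_mlp.score(X_te, y_te)
print("accuracy with scaling:", scaled_acc)
```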

In [None]:
mlp_prediction = mlp_chd.predict(X_test)

In [None]:
accuracy_score(Y_test, mlp_prediction)

The MLP reaches 87 % accuracy, slightly above logistic regression. This is consistent with an MLP being able to learn non-linear decision boundaries, which logistic regression cannot.
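
A single random split gives a noisy estimate, so a gap of a fraction of a percentage point between two models is hard to interpret. One way to compare them more robustly is k-fold cross-validation. The following is a sketch on synthetic data (invented values and labelling rule, not the notebook's dataset) using `cross_val_score` with five folds. The scaling step and `max_iter` value are assumptions added for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the notebook's data (all values are assumptions)
rng = np.random.default_rng(3)
n = 600
sys_bp = rng.normal(130, 20, n)
dia_bp = rng.normal(82, 12, n)
X_demo = np.column_stack([sys_bp, dia_bp])
y_demo = ((sys_bp + 0.5 * dia_bp + rng.normal(0, 15, n)) > 190).astype(int)

models = {
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression()),
    "MLP": make_pipeline(
        StandardScaler(),
        MLPClassifier(
            hidden_layer_sizes=(3, 3),
            random_state=5,
            learning_rate_init=0.01,
            activation="logistic",
            max_iter=1000,
        ),
    ),
}

# Five accuracy values per model, one for each held-out fold
cv_scores = {}
for name, model in models.items():
    scores = cross_val_score(model, X_demo, y_demo, cv=5, scoring="accuracy")
    cv_scores[name] = scores
    print(f"{name}: mean={scores.mean():.3f}, std={scores.std():.3f}")
```

If the difference between the mean scores is smaller than the spread across folds, the data do not clearly favour either model.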