# Applying Logistic Regression

Now I will train a machine learning model to predict heart disease. As mentioned at the beginning of the article, I will use the logistic regression algorithm.
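To make the algorithm a little more concrete, here is a minimal sketch of how a fitted logistic regression model turns features into a prediction: it computes a linear score and passes it through the sigmoid function. The data comes from `make_classification` and the names `X_demo`, `y_demo`, and `demo_clf` are placeholders I made up for illustration; this is not the heart disease dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (an assumption; not the heart disease dataset)
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)

demo_clf = LogisticRegression(solver="liblinear").fit(X_demo, y_demo)

# Linear score z = X.w + b, then the sigmoid maps z to a probability in (0, 1)
z = X_demo @ demo_clf.coef_.ravel() + demo_clf.intercept_[0]
manual_proba = 1.0 / (1.0 + np.exp(-z))

# scikit-learn's probability for class 1 should match the manual computation
sklearn_proba = demo_clf.predict_proba(X_demo)[:, 1]
print(np.allclose(manual_proba, sklearn_proba))

# A positive score (probability above 0.5) corresponds to predicting class 1
manual_pred = (z > 0).astype(int)
print((manual_pred == demo_clf.predict(X_demo)).all())
```

Both checks compare a manual computation against scikit-learn's own output, so they are a quick way to confirm the intuition on any binary dataset.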

Before training the model, I will load the processed dataset and define a helper function that prints the accuracy score, classification report, and confusion matrix of a machine learning model:

In [2]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# Render plots inline in the notebook and use seaborn's white grid style
%matplotlib inline
sns.set_style("whitegrid")

# Load the preprocessed heart disease dataset
df_processed = pd.read_csv("/PROJECTS/Data_Science/heart_disease_prediction/data/processed/heart_processed.csv")


In [3]:
import pandas as pd
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

def print_score(clf, X_train, y_train, X_test, y_test, train=True):
    """Print accuracy, classification report, and confusion matrix for one split."""
    # Evaluate on the training split when train=True, otherwise on the test split
    if train:
        label, X, y = "Train", X_train, y_train
    else:
        label, X, y = "Test", X_test, y_test

    pred = clf.predict(X)
    clf_report = pd.DataFrame(classification_report(y, pred, output_dict=True))
    print(f"{label} Result:\n================================================")
    print(f"Accuracy Score: {accuracy_score(y, pred) * 100:.2f}%")
    print("_______________________________________________")
    print(f"CLASSIFICATION REPORT:\n{clf_report}")
    print("_______________________________________________")
    print(f"Confusion Matrix: \n {confusion_matrix(y, pred)}\n")

Now let’s split the data into a training set (70%) and a test set (30%):



In [4]:
from sklearn.model_selection import train_test_split

X = df_processed.drop('target', axis=1)
y = df_processed.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
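As a quick sanity check on what this call does, the sketch below runs the same `train_test_split` arguments on placeholder arrays. The row count of 303 is an assumption, inferred from the support totals reported in this article (212 training rows and 91 test rows); the values themselves are synthetic, and the `_demo` names are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with 303 rows (assumed row count; values are synthetic)
n_rows = 303
X_demo = np.arange(n_rows * 2).reshape(n_rows, 2)
y_demo = np.arange(n_rows) % 2

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3, random_state=42)
print(X_tr.shape, X_te.shape)  # (212, 2) (91, 2)

# Repeating the call with the same random_state yields the identical split
X_tr2, X_te2, _, _ = train_test_split(X_demo, y_demo, test_size=0.3, random_state=42)
print(np.array_equal(X_te, X_te2))
```

The test size is rounded up (30% of 303 is 90.9, so 91 rows), and fixing `random_state` makes the split reproducible between runs.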

Now let’s train the machine learning model and print the classification report of our logistic regression model:

In [5]:
from sklearn.linear_model import LogisticRegression

lr_clf = LogisticRegression(solver='liblinear')
lr_clf.fit(X_train, y_train)

print_score(lr_clf, X_train, y_train, X_test, y_test, train=True)
print_score(lr_clf, X_train, y_train, X_test, y_test, train=False)

Train Result:
Accuracy Score: 87.26%
_______________________________________________
CLASSIFICATION REPORT:
                   0           1  accuracy   macro avg  weighted avg
precision   0.888889    0.860656  0.872642    0.874772      0.873574
recall      0.824742    0.913043  0.872642    0.868893      0.872642
f1-score    0.855615    0.886076  0.872642    0.870845      0.872139
support    97.000000  115.000000  0.872642  212.000000    212.000000
_______________________________________________
Confusion Matrix: 
 [[ 80  17]
 [ 10 105]]

Test Result:
Accuracy Score: 81.32%
_______________________________________________
CLASSIFICATION REPORT:
                   0          1  accuracy  macro avg  weighted avg
precision   0.800000   0.823529  0.813187   0.811765      0.812928
recall      0.780488   0.840000  0.813187   0.810244      0.813187
f1-score    0.790123   0.831683  0.813187   0.810903      0.812958
support    41.000000  50.000000  0.813187  91.000000     91.000000
_____________
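To see where the numbers in the classification report come from, the following sketch recomputes precision, recall, F1-score, and accuracy directly from the training confusion matrix printed above. It assumes scikit-learn's convention that rows are true classes and columns are predicted classes; the variable names are my own.

```python
import numpy as np

# Training confusion matrix from the output above:
# rows are true classes, columns are predicted classes
cm = np.array([[80, 17],
               [10, 105]])

true_pos = np.diag(cm)                 # correctly predicted count per class
precision = true_pos / cm.sum(axis=0)  # divide by predicted totals (columns)
recall = true_pos / cm.sum(axis=1)     # divide by true totals (rows)
f1 = 2 * precision * recall / (precision + recall)
accuracy = true_pos.sum() / cm.sum()

print("precision:", precision.round(6))
print("recall:   ", recall.round(6))
print("f1-score: ", f1.round(6))
print("accuracy: ", round(accuracy, 6))
```

The results should agree with the per-class rows and the accuracy column of the training report, for example a recall of 80 / 97 for class 0.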

In [6]:
test_score = accuracy_score(y_test, lr_clf.predict(X_test)) * 100
train_score = accuracy_score(y_train, lr_clf.predict(X_train)) * 100

results_df = pd.DataFrame(data=[["Logistic Regression", train_score, test_score]], 
                          columns=['Model', 'Training Accuracy %', 'Testing Accuracy %'])
results_df

                 Model  Training Accuracy %  Testing Accuracy %
0  Logistic Regression            87.264151           81.318681


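The model reaches 87.26% accuracy on the training set and 81.32% on the test set. The gap of roughly six percentage points hints at mild overfitting, but the test accuracy suggests that the model generalizes reasonably well to unseen data, which is a key requirement for reliable heart disease prediction. Keep in mind that the test set contains only 91 samples, so this estimate carries some uncertainty.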
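A single train/test split gives only one estimate of generalization. One common way to get a more stable picture is k-fold cross-validation, which I did not run in this article. The sketch below shows the idea on synthetic data from `make_classification`; the feature count and all `_demo` names are arbitrary assumptions, and you would substitute `X` and `y` from above to apply it to the heart disease data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (an assumption); substitute X and y from the article
X_demo, y_demo = make_classification(n_samples=303, n_features=10, random_state=42)

cv_clf = LogisticRegression(solver="liblinear")

# 5-fold cross-validation: each fold serves once as the held-out set
cv_scores = cross_val_score(cv_clf, X_demo, y_demo, cv=5, scoring="accuracy")

print("Fold accuracies:", cv_scores.round(3))
print(f"Mean: {cv_scores.mean():.3f}, std: {cv_scores.std():.3f}")
```

A small standard deviation across folds would support the claim that the model's accuracy does not depend heavily on one particular split.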

I hope you found this article on heart disease prediction with machine learning informative. Feel free to ask your questions in the comments below!