# Heart Disease Detection

## 1. Data Preprocessing

In [9]:
# Standard imports
import pickle

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split

In [2]:
# getting the data ready
heart_disease= pd.read_csv("datasets/heart-disease.csv")
heart_disease.head()

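| | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 63 | 1 | 3 | 145 | 233 | 1 | 0 | 150 | 0 | 2.3 | 0 | 0 | 1 | 1 |
| 1 | 37 | 1 | 2 | 130 | 250 | 0 | 1 | 187 | 0 | 3.5 | 0 | 0 | 2 | 1 |
| 2 | 41 | 0 | 1 | 130 | 204 | 0 | 0 | 172 | 0 | 1.4 | 2 | 0 | 2 | 1 |
| 3 | 56 | 1 | 1 | 120 | 236 | 0 | 1 | 178 | 0 | 0.8 | 2 | 0 | 2 | 1 |
| 4 | 57 | 0 | 0 | 120 | 354 | 0 | 1 | 163 | 1 | 0.6 | 2 | 0 | 2 | 1 |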


age: The person’s age in years

sex: The person’s sex (1 = male, 0 = female)

cp: chest pain type

— Value 0: asymptomatic

— Value 1: atypical angina

— Value 2: non-anginal pain

— Value 3: typical angina

trestbps: The person’s resting blood pressure (mm Hg on admission to the hospital)

chol: The person’s cholesterol measurement in mg/dl

fbs: The person’s fasting blood sugar (> 120 mg/dl, 1 = true; 0 = false)

restecg: resting electrocardiographic results

— Value 0: showing probable or definite left ventricular hypertrophy by Estes’ criteria

— Value 1: normal

— Value 2: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV)

thalach: The person’s maximum heart rate achieved

exang: Exercise induced angina (1 = yes; 0 = no)

oldpeak: ST depression induced by exercise relative to rest (‘ST’ relates to positions on the ECG plot)

slope: the slope of the peak exercise ST segment — 0: downsloping; 1: flat; 2: upsloping

ca: The number of major vessels (0–3)

thal: A blood disorder called thalassemia

Value 0: NULL (dropped from the dataset previously)

Value 1: fixed defect (no blood flow in some part of the heart)

Value 2: normal blood flow

Value 3: reversible defect (a blood flow is observed but it is not normal)

target: Heart disease (1 = no, 0= yes)
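
The coded values listed above can double as a sanity check on the data. The following is a minimal sketch, not part of the original notebook: `ALLOWED_CODES` and `find_invalid_codes` are hypothetical names, and the allowed sets are simply transcribed from the list above. It may be worth running, since the `describe()` output further down reports a maximum of 4 for `ca` and a minimum of 0 for `thal`, both outside the documented codes.

```python
# Allowed codes per coded feature, transcribed from the feature list above.
# ALLOWED_CODES and find_invalid_codes are illustrative names, not part of the notebook.
ALLOWED_CODES = {
    "sex": {0, 1},
    "cp": {0, 1, 2, 3},
    "fbs": {0, 1},
    "restecg": {0, 1, 2},
    "exang": {0, 1},
    "slope": {0, 1, 2},
    "ca": {0, 1, 2, 3},
    "thal": {1, 2, 3},  # 0 is documented as NULL
    "target": {0, 1},
}


def find_invalid_codes(record):
    """Return {feature: value} for every coded feature whose value is not allowed."""
    return {
        name: record[name]
        for name, allowed in ALLOWED_CODES.items()
        if name in record and record[name] not in allowed
    }


# First row of heart_disease.head(), as shown earlier.
first_row = {"age": 63, "sex": 1, "cp": 3, "trestbps": 145, "chol": 233, "fbs": 1,
             "restecg": 0, "thalach": 150, "exang": 0, "oldpeak": 2.3, "slope": 0,
             "ca": 0, "thal": 1, "target": 1}
print(find_invalid_codes(first_row))             # {}
print(find_invalid_codes({"ca": 4, "thal": 0}))  # {'ca': 4, 'thal': 0}
```

With pandas, the same check could be applied row by row, for example over `heart_disease.to_dict("records")`.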

In [3]:
heart_disease.describe()

| | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 |
| mean | 54.366337 | 0.683168 | 0.966997 | 131.623762 | 246.264026 | 0.148515 | 0.528053 | 149.646865 | 0.326733 | 1.039604 | 1.39934 | 0.729373 | 2.313531 | 0.544554 |
| std | 9.082101 | 0.466011 | 1.032052 | 17.538143 | 51.830751 | 0.356198 | 0.52586 | 22.905161 | 0.469794 | 1.161075 | 0.616226 | 1.022606 | 0.612277 | 0.498835 |
| min | 29.0 | 0.0 | 0.0 | 94.0 | 126.0 | 0.0 | 0.0 | 71.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 47.5 | 0.0 | 0.0 | 120.0 | 211.0 | 0.0 | 0.0 | 133.5 | 0.0 | 0.0 | 1.0 | 0.0 | 2.0 | 0.0 |
| 50% | 55.0 | 1.0 | 1.0 | 130.0 | 240.0 | 0.0 | 1.0 | 153.0 | 0.0 | 0.8 | 1.0 | 0.0 | 2.0 | 1.0 |
| 75% | 61.0 | 1.0 | 2.0 | 140.0 | 274.5 | 0.0 | 1.0 | 166.0 | 1.0 | 1.6 | 2.0 | 1.0 | 3.0 | 1.0 |
| max | 77.0 | 1.0 | 3.0 | 200.0 | 564.0 | 1.0 | 2.0 | 202.0 | 1.0 | 6.2 | 2.0 | 4.0 | 3.0 | 1.0 |


In [4]:
heart_disease.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 303 entries, 0 to 302
Data columns (total 14 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   age       303 non-null    int64  
 1   sex       303 non-null    int64  
 2   cp        303 non-null    int64  
 3   trestbps  303 non-null    int64  
 4   chol      303 non-null    int64  
 5   fbs       303 non-null    int64  
 6   restecg   303 non-null    int64  
 7   thalach   303 non-null    int64  
 8   exang     303 non-null    int64  
 9   oldpeak   303 non-null    float64
 10  slope     303 non-null    int64  
 11  ca        303 non-null    int64  
 12  thal      303 non-null    int64  
 13  target    303 non-null    int64  
dtypes: float64(1), int64(13)
memory usage: 33.3 KB


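The `info()` output shows that every column is numeric (13 `int64` and one `float64`) and that none has missing values, so the data can be passed to the model without encoding or imputation.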

In [5]:
# drop any rows that contain null values
heart_disease.dropna(inplace=True)

In [6]:
# create x (feature set) and y (target)
x=heart_disease.drop("target", axis=1)
y=heart_disease["target"]

In [7]:
# splitting the data into training and testing parts
np.random.seed(45)
x_train,x_test,y_train,y_test=train_test_split(x,y, test_size=0.2)

In [8]:
x_train.shape,x_test.shape,y_train.shape,y_test.shape

((242, 13), (61, 13), (242,), (61,))
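
The shapes follow from how the 20% test fraction is rounded. As a sketch of the arithmetic, assuming scikit-learn rounds the test count up for a fractional `test_size` (which matches the shapes printed above); `split_sizes` is a hypothetical helper, not a library function:

```python
import math


def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size, rounding the test count up."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


print(split_sizes(303, 0.2))  # (242, 61)
```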

## 2. Model Training

This is a classification problem, so I have picked the Random Forest classifier to solve it.

In [30]:
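np.random.seed(45)
best_cv_score = 0
best_n_estimators = 0
for n_estimators in range(10, 150, 10):
    print(f"Trying the model with {n_estimators} n_estimator...")
    clf = RandomForestClassifier(n_estimators=n_estimators).fit(x_train, y_train)
    # cross_val_score refits clones of clf on folds of the full dataset;
    # the clf fitted on x_train above is the one scored on the test set and saved
    cv_score = np.mean(cross_val_score(clf, x, y, cv=10))
    print(f"Model accuracy by 10 fold cross validation is {cv_score*100:.2f}%")
    print(f"Model single accuracy {clf.score(x_test, y_test)*100:.2f}%")
    # keep the model with the highest cross-validated accuracy so far
    if cv_score > best_cv_score:
        best_cv_score = cv_score
        best_n_estimators = n_estimators
        with open("model.pkl", "wb") as f:
            pickle.dump(clf, f)
    print("")
print(f"\nHighest accuracy model with {best_n_estimators} n_estimator Saved")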

Trying the model with 10 n_estimator...
Model accuracy by 10 fold cross validation is 80.86%
Model single accuracy 77.05%

Trying the model with 20 n_estimator...
Model accuracy by 10 fold cross validation is 79.88%
Model single accuracy 75.41%

Trying the model with 30 n_estimator...
Model accuracy by 10 fold cross validation is 81.83%
Model single accuracy 81.97%

Trying the model with 40 n_estimator...
Model accuracy by 10 fold cross validation is 80.81%
Model single accuracy 86.89%

Trying the model with 50 n_estimator...
Model accuracy by 10 fold cross validation is 82.81%
Model single accuracy 90.16%

Trying the model with 60 n_estimator...
Model accuracy by 10 fold cross validation is 82.83%
Model single accuracy 83.61%

Trying the model with 70 n_estimator...
Model accuracy by 10 fold cross validation is 83.14%
Model single accuracy 86.89%

Trying the model with 80 n_estimator...
Model accuracy by 10 fold cross validation is 81.80%
Model single accuracy 83.61%

Trying the model

In [31]:
# Evaluate the saved model: single test-set score and 10-fold cross-validation score
np.random.seed(45)
with open("model.pkl", "rb") as f:
    loaded_model = pickle.load(f)
print(f"The single score is {loaded_model.score(x_test,y_test)*100:.2f}%")
print(f"The 10 fold Cross val score is {np.mean(cross_val_score(loaded_model,x,y,cv=10))*100:.2f}%")

The single score is 85.25%
The 10 fold Cross val score is 83.15%
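
For readers new to the term, 10-fold cross-validation partitions the rows into 10 folds, holds out each fold once for scoring while training on the other nine, and averages the 10 scores. The sketch below shows only the index bookkeeping for plain, unshuffled k-fold, using a hypothetical helper named `kfold_indices`. scikit-learn's `cross_val_score` additionally stratifies the folds by class for classifiers, so its folds will differ in detail.

```python
def kfold_indices(n_samples, n_folds):
    """Yield (train_idx, test_idx) lists for plain, unshuffled k-fold splitting."""
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `extra` folds take one additional sample each.
        size = base + 1 if fold < extra else base
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


folds = list(kfold_indices(303, 10))
print([len(test) for _, test in folds])  # [31, 31, 31, 30, 30, 30, 30, 30, 30, 30]
```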


### Run the standard imports cell and then the cells below to get results faster (no retraining)

In [32]:
# getting the data ready
heart_disease= pd.read_csv("datasets/heart-disease.csv")
heart_disease.dropna(inplace=True)
x=heart_disease.drop("target", axis=1)
y=heart_disease["target"]

In [33]:
np.random.seed(45)
x_train,x_test,y_train,y_test=train_test_split(x,y, test_size=0.2)

In [34]:
with open("model.pkl", "rb") as f:
    loaded_model = pickle.load(f)
print("Model Accuracy:",round(loaded_model.score(x_test,y_test)*100,2),"%")

Model Accuracy: 85.25 %


In [35]:
y_label=loaded_model.predict(x_test)

In [36]:
# classification Report
print(classification_report(y_test,y_label))

              precision    recall  f1-score   support

           0       0.75      0.91      0.82        23
           1       0.94      0.82      0.87        38

    accuracy                           0.85        61
   macro avg       0.84      0.86      0.85        61
weighted avg       0.87      0.85      0.85        61
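
The `macro avg` and `weighted avg` rows differ only in how the per-class scores are combined: the macro average treats both classes equally, while the weighted average weights each class by its support. A small sketch for the precision column, using the counts from the confusion matrix printed in the next cell (`macro_and_weighted` is a hypothetical helper):

```python
def macro_and_weighted(scores, supports):
    """Return (macro average, support-weighted average) of per-class scores."""
    macro = sum(scores) / len(scores)
    weighted = sum(s * n for s, n in zip(scores, supports)) / sum(supports)
    return macro, weighted


# Per-class precision from the confusion matrix [[21, 2], [7, 31]]:
# class 0 was predicted 28 times (21 correct), class 1 was predicted 33 times (31 correct).
precisions = [21 / 28, 31 / 33]
supports = [23, 38]  # true instances per class in the test set

macro, weighted = macro_and_weighted(precisions, supports)
print(f"{macro:.2f} {weighted:.2f}")  # 0.84 0.87
```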



In [37]:
# confusion_matrix
print(confusion_matrix(y_test,y_label))

[[21  2]
 [ 7 31]]



### Confusion Matrix Metrics
From our confusion matrix, we can calculate five different metrics measuring the validity of our model.

* Accuracy (all correct / all) = (TP + TN) / (TP + TN + FP + FN)

* Misclassification (all incorrect / all) = (FP + FN) / (TP + TN + FP + FN)

* Precision (true positives / predicted positives) = TP / (TP + FP)

* Sensitivity aka Recall (true positives / all actual positives) = TP / (TP + FN)

* Specificity (true negatives / all actual negatives) = TN / (TN + FP)
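
As a worked example, the five metrics can be computed directly from the matrix printed above. This is a sketch with a hypothetical helper, `confusion_metrics`; it assumes scikit-learn's layout for a binary confusion matrix and treats label 1 as the positive class.

```python
def confusion_metrics(tn, fp, fn, tp):
    """Compute the five metrics listed above from the four confusion-matrix counts."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        "misclassification": (fp + fn) / total,
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# scikit-learn lays out a binary confusion matrix as [[TN, FP], [FN, TP]],
# with rows as true labels and columns as predicted labels.
(tn, fp), (fn, tp) = [[21, 2], [7, 31]]

metrics = confusion_metrics(tn, fp, fn, tp)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The accuracy comes out at 52/61, about 0.8525, which agrees with the 85.25% test score reported earlier. Precision (31/33) and sensitivity (31/38) round to the 0.94 and 0.82 shown for class 1 in the classification report.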

## 3. Deployment

In [48]:
def heart_disease_check():
    vage=int(input("Enter the patient age:"))
    vsex=int(input("Press 1 if you are male and 0 for female:"))
    vcp=int(input('''cp: chest pain type

    — Value 0: asymptomatic

    — Value 1: atypical angina

    — Value 2: non-anginal pain

    — Value 3: typical angina

    Enter Value Here:'''))

    vtrestbps=int(input("Enter resting blood pressure (mm Hg on admission to the hospital):"))

    vchol=int(input("Enter cholesterol measurement in mg/dl:"))

    vfbs=int(input("fasting blood sugar (> 120 mg/dl, 1 = true; 0 = false):"))

    vrestecg=int(input('''resting electrocardiographic results

    — Value 0: showing probable or definite left ventricular hypertrophy by Estes’ criteria

    — Value 1: normal

    — Value 2: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV)
    Enter Value Here:'''))

    vthalach=int(input("Enter maximum heart rate achieved:"))

    vexang=int(input("Exercise induced angina (1 = yes; 0 = no):"))

    voldpeak=float(input("ST depression induced by exercise relative to rest (‘ST’ relates to positions on the ECG plot):"))

    vslope=int(input("the slope of the peak exercise ST segment — 0: downsloping; 1: flat; 2: upsloping:"))

    vca=int(input("The number of major vessels (0–3):"))

    vthal=int(input("A blood disorder called thalassemia (1: fixed defect; 2: normal blood flow; 3: reversible defect):"))

    # build a one-row DataFrame with the same column names the model was trained on
    dicto={"age":[vage],"sex":[vsex],"cp":[vcp],"trestbps":[vtrestbps],"chol":[vchol],"fbs":[vfbs],"restecg":[vrestecg],"thalach":[vthalach],"exang":[vexang],"oldpeak":[voldpeak],"slope":[vslope],"ca":[vca],"thal":[vthal]}
    df=pd.DataFrame(dicto)
    y_pred=loaded_model.predict(df)
    print("\n **The result is** \n")
    # per the feature list above, target 1 = no heart disease and 0 = heart disease
    if (y_pred[0]==0):
        print("This person has heart disease.")
    else:
        print("This person has no heart disease.")
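
Because the model was trained on a DataFrame, the columns passed to `predict` should carry the same names in the same order as in training. One way to make that explicit, and to allow predictions without interactive prompts, is sketched below. `FEATURE_ORDER` and `build_feature_row` are hypothetical names; the order is taken from the `info()` listing with `target` removed.

```python
# Column order used for training, taken from heart_disease.info() minus "target".
FEATURE_ORDER = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
                 "thalach", "exang", "oldpeak", "slope", "ca", "thal"]


def build_feature_row(values):
    """Return the feature values as a list in training-column order.

    Raises KeyError if any feature is missing and ValueError on unexpected keys.
    """
    missing = [name for name in FEATURE_ORDER if name not in values]
    if missing:
        raise KeyError(f"missing features: {missing}")
    unexpected = sorted(set(values) - set(FEATURE_ORDER))
    if unexpected:
        raise ValueError(f"unexpected features: {unexpected}")
    return [values[name] for name in FEATURE_ORDER]


# Feature values of the first row of heart_disease.head(), deliberately out of order.
patient = {"thal": 1, "ca": 0, "slope": 0, "oldpeak": 2.3, "exang": 0, "thalach": 150,
           "restecg": 0, "fbs": 1, "chol": 233, "trestbps": 145, "cp": 3, "sex": 1, "age": 63}
print(build_feature_row(patient))
# [63, 1, 3, 145, 233, 1, 0, 150, 0, 2.3, 0, 0, 1]
```

The resulting list could then be wrapped as `pd.DataFrame([row], columns=FEATURE_ORDER)` before calling `loaded_model.predict`.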

In [50]:
heart_disease_check()

Enter the patient age:60
Press 1 if you are male and 0 for female:1
cp: chest pain type

    — Value 0: asymptomatic

    — Value 1: atypical angina

    — Value 2: non-anginal pain

    — Value 3: typical angina

    Enter Value Here:3
Enter resting blood pressure (mm Hg on admission to the hospital):172
Enter cholesterol measurement in mg/dl:200
fasting blood sugar (> 120 mg/dl, 1 = true; 0 = false):1
resting electrocardiographic results

    — Value 0: showing probable or definite left ventricular hypertrophy by Estes’ criteria

    — Value 1: normal

    — Value 2: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV)
    Enter Value Here:1
Enter maximum heart rate achieved:210
Exercise induced angina (1 = yes; 0 = no):0
ST depression induced by exercise relative to rest (‘ST’ relates to positions on the ECG plot):2
the slope of the peak exercise ST segment — 0: downsloping; 1: flat; 2: upsloping:2
The number of major vessels (0–3):3
A bloo