<a href="https://colab.research.google.com/github/sachinacharyaa/Heart_Disease-Predictor/blob/main/Heart_Disease_Prediction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

1. **Problem**

**Background**

Heart disease is one of the leading causes of death globally. Early detection and intervention are crucial in reducing mortality and improving the quality of life for patients. Medical practitioners use various clinical parameters and diagnostic tests to identify the risk of heart disease.

**Objective**

The primary objective of this project is to develop a machine learning model that can accurately predict the presence of heart disease based on a set of clinical parameters. The model will analyze input features such as age, sex, blood pressure, cholesterol levels, and other relevant metrics to determine the likelihood of a patient having heart disease.

**Dataset Description**

Age: Age of the patient in years

Sex: Gender of the patient (1 = male, 0 = female)

Chest pain type: Type of chest pain experienced (values 1, 2, 3, 4)

BP: Resting blood pressure (in mm Hg)

Cholesterol: Serum cholesterol in mg/dl

FBS over 120: Fasting blood sugar > 120 mg/dl (1 = true, 0 = false)

EKG results: Resting electrocardiographic results (values 0, 1, 2)

Max HR: Maximum heart rate achieved

Exercise angina: Exercise-induced angina (1 = yes, 0 = no)

ST depression: ST depression induced by exercise relative to rest

Slope of ST: Slope of the peak exercise ST segment (values 1, 2, 3)

Number of vessels fluro: Number of major vessels (0-3) colored by fluoroscopy

Thallium: Thalassemia (3 = normal; 6 = fixed defect; 7 = reversible defect)

Heart Disease: Presence of heart disease, stored as the text labels `Presence` or `Absence`
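Because most columns are coded categories rather than free numbers, it can help to check that a loaded file only contains the codes listed above. The sketch below is a hypothetical helper (`find_unexpected_codes` and `ALLOWED_CODES` are illustrative names, not part of the original notebook); for the target it uses the string labels actually stored in the CSV. It runs on a tiny hand-made frame so it is self-contained.

```python
import pandas as pd

# Documented codes for the categorical columns (continuous columns such as
# Age, BP, Cholesterol, Max HR and ST depression are not checked).
ALLOWED_CODES = {
    "Sex": {0, 1},
    "Chest pain type": {1, 2, 3, 4},
    "FBS over 120": {0, 1},
    "EKG results": {0, 1, 2},
    "Exercise angina": {0, 1},
    "Slope of ST": {1, 2, 3},
    "Number of vessels fluro": {0, 1, 2, 3},
    "Thallium": {3, 6, 7},
    "Heart Disease": {"Presence", "Absence"},
}


def find_unexpected_codes(df: pd.DataFrame) -> dict:
    """Return {column: sorted values that fall outside the documented codes}."""
    problems = {}
    for column, allowed in ALLOWED_CODES.items():
        if column not in df.columns:
            continue  # skip columns absent from this frame
        unexpected = set(df[column].unique()) - allowed
        if unexpected:
            problems[column] = sorted(unexpected, key=str)
    return problems


# Hypothetical sample values, not taken from the CSV.
sample = pd.DataFrame({
    "Sex": [1, 0],
    "Chest pain type": [4, 5],  # 5 is not a documented chest pain code
    "Thallium": [3, 7],
    "Heart Disease": ["Presence", "Absence"],
})
print(find_unexpected_codes(sample))
```

On the real data, `find_unexpected_codes(heart_data)` returning an empty dict would indicate that every categorical column sticks to its documented codes.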

In [58]:
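# check the number of rows and columns in the dataset
# (requires heart_data, which is loaded in the Data Collection and Processing step)
heart_data.shape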

(270, 14)

**Problem Statement**

Given the dataset of 270 patients with their respective clinical parameters, the goal is to build a predictive model that classifies whether a patient has heart disease (Heart Disease = Presence) or not (Heart Disease = Absence). The challenge is to develop a model with high accuracy and reliability that could serve as a potential tool for early diagnosis in clinical settings.

2. **Import the required dependencies**

In [59]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score


**Data Collection and Processing**

In [60]:
heart_data = pd.read_csv('/content/Heart_Disease_Prediction.csv')

In [61]:
heart_data.head()  # print the first 5 rows of the dataset

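| | Age | Sex | Chest pain type | BP | Cholesterol | FBS over 120 | EKG results | Max HR | Exercise angina | ST depression | Slope of ST | Number of vessels fluro | Thallium | Heart Disease |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 70 | 1 | 4 | 130 | 322 | 0 | 2 | 109 | 0 | 2.4 | 2 | 3 | 3 | Presence |
| 1 | 67 | 0 | 3 | 115 | 564 | 0 | 2 | 160 | 0 | 1.6 | 2 | 0 | 7 | Absence |
| 2 | 57 | 1 | 2 | 124 | 261 | 0 | 0 | 141 | 0 | 0.3 | 1 | 0 | 7 | Presence |
| 3 | 64 | 1 | 4 | 128 | 263 | 0 | 0 | 105 | 1 | 0.2 | 2 | 1 | 7 | Absence |
| 4 | 74 | 0 | 2 | 120 | 269 | 0 | 2 | 121 | 1 | 0.2 | 1 | 1 | 3 | Absence |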


In [62]:
heart_data.tail()  # print the last 5 rows of the dataset

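| | Age | Sex | Chest pain type | BP | Cholesterol | FBS over 120 | EKG results | Max HR | Exercise angina | ST depression | Slope of ST | Number of vessels fluro | Thallium | Heart Disease |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 265 | 52 | 1 | 3 | 172 | 199 | 1 | 0 | 162 | 0 | 0.5 | 1 | 0 | 7 | Absence |
| 266 | 44 | 1 | 2 | 120 | 263 | 0 | 0 | 173 | 0 | 0.0 | 1 | 0 | 7 | Absence |
| 267 | 56 | 0 | 2 | 140 | 294 | 0 | 2 | 153 | 0 | 1.3 | 2 | 0 | 3 | Absence |
| 268 | 57 | 1 | 4 | 140 | 192 | 0 | 0 | 148 | 0 | 0.4 | 2 | 0 | 6 | Absence |
| 269 | 67 | 1 | 4 | 160 | 286 | 0 | 2 | 108 | 1 | 1.5 | 2 | 3 | 3 | Presence |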


Info about the data

In [63]:
heart_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 270 entries, 0 to 269
Data columns (total 14 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   Age                      270 non-null    int64  
 1   Sex                      270 non-null    int64  
 2   Chest pain type          270 non-null    int64  
 3   BP                       270 non-null    int64  
 4   Cholesterol              270 non-null    int64  
 5   FBS over 120             270 non-null    int64  
 6   EKG results              270 non-null    int64  
 7   Max HR                   270 non-null    int64  
 8   Exercise angina          270 non-null    int64  
 9   ST depression            270 non-null    float64
 10  Slope of ST              270 non-null    int64  
 11  Number of vessels fluro  270 non-null    int64  
 12  Thallium                 270 non-null    int64  
 13  Heart Disease            270 non-null    object 
dtypes: float64(1), int64(12), object(1)

Checking for missing values

In [64]:
heart_data.isnull().sum()

Age                        0
Sex                        0
Chest pain type            0
BP                         0
Cholesterol                0
FBS over 120               0
EKG results                0
Max HR                     0
Exercise angina            0
ST depression              0
Slope of ST                0
Number of vessels fluro    0
Thallium                   0
Heart Disease              0
dtype: int64

In [65]:
# statistical measures of the data
heart_data.describe()

| | Age | Sex | Chest pain type | BP | Cholesterol | FBS over 120 | EKG results | Max HR | Exercise angina | ST depression | Slope of ST | Number of vessels fluro | Thallium |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 270.0 | 270.0 | 270.0 | 270.0 | 270.0 | 270.0 | 270.0 | 270.0 | 270.0 | 270.0 | 270.0 | 270.0 | 270.0 |
| mean | 54.433333 | 0.677778 | 3.174074 | 131.344444 | 249.659259 | 0.148148 | 1.022222 | 149.677778 | 0.32963 | 1.05 | 1.585185 | 0.67037 | 4.696296 |
| std | 9.109067 | 0.468195 | 0.95009 | 17.861608 | 51.686237 | 0.355906 | 0.997891 | 23.165717 | 0.470952 | 1.14521 | 0.61439 | 0.943896 | 1.940659 |
| min | 29.0 | 0.0 | 1.0 | 94.0 | 126.0 | 0.0 | 0.0 | 71.0 | 0.0 | 0.0 | 1.0 | 0.0 | 3.0 |
| 25% | 48.0 | 0.0 | 3.0 | 120.0 | 213.0 | 0.0 | 0.0 | 133.0 | 0.0 | 0.0 | 1.0 | 0.0 | 3.0 |
| 50% | 55.0 | 1.0 | 3.0 | 130.0 | 245.0 | 0.0 | 2.0 | 153.5 | 0.0 | 0.8 | 2.0 | 0.0 | 3.0 |
| 75% | 61.0 | 1.0 | 4.0 | 140.0 | 280.0 | 0.0 | 2.0 | 166.0 | 1.0 | 1.6 | 2.0 | 1.0 | 7.0 |
| max | 77.0 | 1.0 | 4.0 | 200.0 | 564.0 | 1.0 | 2.0 | 202.0 | 1.0 | 6.2 | 3.0 | 3.0 | 7.0 |
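The summary shows that the features live on very different scales: Sex ranges over 0 to 1, while Cholesterol ranges from 126 to 564 mg/dl. Standardization rescales a value as z = (x − mean) / std, i.e. how many standard deviations it lies from the column mean. The worked example below (a sketch; `z_score` is an illustrative helper) applies this to patient 0's Cholesterol of 322 mg/dl using the mean and std printed above. Note that scikit-learn's `StandardScaler` divides by the population standard deviation (ddof=0), whereas `describe()` reports the sample standard deviation (ddof=1), so the scaler's value would differ very slightly.

```python
# Mean and standard deviation of Cholesterol, copied from describe() above.
chol_mean = 249.659259
chol_std = 51.686237


def z_score(value: float, mean: float, std: float) -> float:
    """Standardize one value: its distance from the mean in standard deviations."""
    return (value - mean) / std


# Patient 0 in head() has Cholesterol = 322 mg/dl.
z = z_score(322, chol_mean, chol_std)
print(round(z, 2))  # 1.4, i.e. about 1.4 standard deviations above the mean
```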


In [66]:
# checking the distribution of the target variable
heart_data['Heart Disease'].value_counts()

Heart Disease
Absence     150
Presence    120
Name: count, dtype: int64

The target is stored as text labels:

Absence: Heart-Healthy, heart disease is absent (150 patients)

Presence: Cardio-Challenged, heart disease is present (120 patients)
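scikit-learn's `LogisticRegression` accepts these string labels directly, so the notebook does not need to encode them. If a numeric 0/1 target is wanted (matching the encoding in the problem statement, or for metrics that expect a numeric positive class), a minimal sketch follows; `LABEL_TO_INT` and `encode_target` are hypothetical names, not part of the original notebook.

```python
import pandas as pd

# 1 = Presence (heart disease), 0 = Absence.
LABEL_TO_INT = {"Absence": 0, "Presence": 1}


def encode_target(labels: pd.Series) -> pd.Series:
    """Map the text labels to 0/1, raising on any label not in LABEL_TO_INT."""
    encoded = labels.map(LABEL_TO_INT)
    # map() silently produces NaN for unknown labels; fail loudly instead.
    if encoded.isna().any():
        bad = sorted(labels[encoded.isna()].unique())
        raise ValueError(f"Unexpected labels: {bad}")
    return encoded.astype(int)


# The first five labels shown by head():
y_example = pd.Series(["Presence", "Absence", "Presence", "Absence", "Absence"])
print(encode_target(y_example).tolist())  # [1, 0, 1, 0, 0]
```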

Splitting the features (Age, Sex, ...) and the target


In [67]:
# features: every clinical column; target: the Heart Disease label
x = heart_data.drop(columns='Heart Disease')
y = heart_data['Heart Disease']
print(x)

     Age  Sex  Chest pain type   BP  Cholesterol  FBS over 120  EKG results  \
0     70    1                4  130          322             0            2   
1     67    0                3  115          564             0            2   
2     57    1                2  124          261             0            0   
3     64    1                4  128          263             0            0   
4     74    0                2  120          269             0            2   
..   ...  ...              ...  ...          ...           ...          ...   
265   52    1                3  172          199             1            0   
266   44    1                2  120          263             0            0   
267   56    0                2  140          294             0            2   
268   57    1                4  140          192             0            0   
269   67    1                4  160          286             0            2   

     Max HR  Exercise angina  ST depression  Slope 

In [68]:
print(y)

0      Presence
1       Absence
2      Presence
3       Absence
4       Absence
         ...   
265     Absence
266     Absence
267     Absence
268     Absence
269    Presence
Name: Heart Disease, Length: 270, dtype: object


Splitting the data into Training Data & Test Data

In [69]:
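# hold out 20% of patients for testing; stratify=y keeps the Absence/Presence
# ratio the same in both splits, and random_state=2 makes the split reproducible
X_train, X_test, Y_train, Y_test = train_test_split(x, y, test_size=0.2, stratify=y, random_state=2)
print(X_train.shape, X_test.shape)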

(216, 13) (54, 13)
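With 150 Absence and 120 Presence labels, a stratified 80/20 split should put exactly 30 Absence and 24 Presence patients in the 54-row test set (54 × 150/270 = 30 and 54 × 120/270 = 24). The sketch below checks this on a synthetic stand-in with the same class counts; the feature column is just a row index, so it only illustrates the stratification, not the real patients.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic labels with the same class counts as the real target.
labels = pd.Series(["Absence"] * 150 + ["Presence"] * 120)
features = np.arange(len(labels)).reshape(-1, 1)  # placeholder feature

_, _, y_tr, y_te = train_test_split(
    features, labels, test_size=0.2, stratify=labels, random_state=2
)
print(y_tr.value_counts().to_dict())  # counts: Absence 120, Presence 96
print(y_te.value_counts().to_dict())  # counts: Absence 30, Presence 24
```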


Model Training


In [70]:
#Logistic regression
model = LogisticRegression()

In [77]:
# train the logistic regression model on the training data
model.fit(X_train,Y_train)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
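This warning means the default `lbfgs` solver hit its iteration limit (`max_iter=100` by default) before converging, so the fitted coefficients may not be optimal. The warning itself suggests scaling the data or raising `max_iter`. A minimal sketch of both, using a `Pipeline` of `StandardScaler` and `LogisticRegression`, is below; it runs on synthetic data from `make_classification` with the training split's shape (216 × 13), and one column is inflated to mimic an unscaled feature such as Cholesterol. The choice `max_iter=1000` is an assumption, not a value from the original notebook.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with the same shape as X_train.
X_demo, y_demo = make_classification(n_samples=216, n_features=13, random_state=0)
X_demo[:, 4] = X_demo[:, 4] * 50 + 250  # mimic a large-valued, unscaled column

# Standardize every feature before the solver sees it, and allow more iterations.
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Record any warnings raised during fitting so we can count convergence problems.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    scaled_model.fit(X_demo, y_demo)

convergence_warnings = [w for w in caught if issubclass(w.category, ConvergenceWarning)]
print("Convergence warnings:", len(convergence_warnings))
```

In the notebook, such a pipeline could replace `model = LogisticRegression()` and be fitted on `X_train, Y_train`. Because the pipeline stores the training-set mean and standard deviation, any later `predict` call applies the same scaling automatically. The accuracy figures below would need to be re-measured after this change.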


Model Evaluation

Accuracy Score

In [72]:
# accuracy on training data
X_train_prediction = model.predict(X_train)
training_data_accuracy = accuracy_score(Y_train, X_train_prediction)  # (y_true, y_pred)
print('Accuracy on training data : ', training_data_accuracy)

Accuracy on training data :  0.875


In [73]:
# accuracy on test data
X_test_prediction = model.predict(X_test)
testing_data_accuracy = accuracy_score(Y_test, X_test_prediction)  # (y_true, y_pred)
print('Accuracy on testing data : ', testing_data_accuracy)

Accuracy on testing data :  0.8333333333333334
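An accuracy figure is easier to judge next to a trivial baseline. Since the split is stratified, the 54-row test set holds 30 Absence and 24 Presence patients, so a "model" that always predicts the majority class (Absence) would score 30/54 ≈ 0.556. The reported test accuracy of 0.8333 corresponds to 45 of 54 patients classified correctly. The arithmetic is sketched below; the class counts are derived from the stratified split rather than read from the notebook's output.

```python
# Test-set class counts implied by a stratified 20% split of 150 Absence / 120 Presence.
test_counts = {"Absence": 30, "Presence": 24}
n_test = sum(test_counts.values())  # 54, matching X_test.shape

# Always predicting the majority class is right on every Absence row only.
majority_baseline = max(test_counts.values()) / n_test
print(f"Majority-class baseline accuracy: {majority_baseline:.3f}")  # 0.556

reported_test_accuracy = 0.8333333333333334  # from the cell above
print(f"Correct test predictions: {round(reported_test_accuracy * n_test)} of {n_test}")  # 45 of 54
print(f"Gain over baseline: {reported_test_accuracy - majority_baseline:.3f}")  # 0.278
```

scikit-learn's `sklearn.dummy.DummyClassifier(strategy="most_frequent")`, fitted on `X_train, Y_train` and scored on `X_test, Y_test`, would compute the same baseline directly from the data.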


Building a Predictive System

In [75]:
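# one patient's clinical values, in the same column order as x
input_data = (44, 1, 2, 130, 219, 0, 2, 188, 0, 0, 1, 0, 3)
# a one-row DataFrame with the training column names avoids scikit-learn's
# "X does not have valid feature names" warning that a bare NumPy array triggers
input_df = pd.DataFrame([input_data], columns=X_train.columns)
prediction = model.predict(input_df)
print(prediction)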




['Absence']
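The objective mentions the *likelihood* of heart disease, but `predict` returns only a label. A fitted `LogisticRegression` also exposes `predict_proba`, which returns one probability per class with columns ordered as in `classes_` (sorted, so `Absence` first, then `Presence`). The sketch below uses a hypothetical six-patient toy set with a single feature so that it runs on its own; in the notebook, the analogous call would be `model.predict_proba` on the one-row patient input.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: one feature and string labels, like the notebook's target.
X_toy = pd.DataFrame({"ST depression": [0.0, 0.2, 0.5, 2.0, 2.5, 3.0]})
y_toy = pd.Series(["Absence", "Absence", "Absence", "Presence", "Presence", "Presence"])
clf = LogisticRegression().fit(X_toy, y_toy)

# classes_ is sorted, so predict_proba column 0 is "Absence" and column 1 is "Presence".
print(list(clf.classes_))

new_patient = pd.DataFrame({"ST depression": [0.0]})
proba = clf.predict_proba(new_patient)[0]
for label, p in zip(clf.classes_, proba):
    print(f"P({label}) = {p:.3f}")
print("Predicted label:", clf.predict(new_patient)[0])
```

`predict` simply picks the class whose probability is highest, so reporting `predict_proba` alongside it shows how confident the model is for a given patient.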


