<a href="https://colab.research.google.com/github/sinthumerlin96/python/blob/main/Heart_Disease_Prediction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Heart Disease Prediction Using Machine Learning
**Project Overview**

This project uses machine learning to predict the likelihood of heart disease in patients from their medical parameters.
It uses Logistic Regression and can be extended to other classification algorithms.
The goal is to assist healthcare professionals in the early detection of heart disease.

**Dataset**

Source: heart.csv (303 patients, 13 medical features + 1 target column)

Features include:

age: Age of patient

sex: Male = 1, Female = 0

cp: Chest pain type

trestbps: Resting blood pressure (mm Hg)

chol: Serum cholesterol (mg/dl)

fbs: Fasting blood sugar > 120 mg/dl (1 = true, 0 = false)

restecg: Resting electrocardiographic results

thalach: Maximum heart rate achieved

exang: Exercise-induced angina (1 = yes, 0 = no)

oldpeak: ST depression induced by exercise relative to rest

slope: Slope of the peak exercise ST segment

ca: Number of major vessels colored by fluoroscopy

thal: Thalassemia

Target: target (1 = Heart Disease, 0 = Healthy)
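To catch a differently formatted copy of heart.csv early, the schema listed above can be checked right after loading. This is a minimal sketch: `EXPECTED_FEATURES` and `check_schema` are hypothetical names, the column names come from the feature list above, and the sample row is copied from the `tail()` output further down.

```python
import pandas as pd

# The 13 feature columns and the target column described above
EXPECTED_FEATURES = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal",
]
TARGET = "target"


def check_schema(df: pd.DataFrame) -> list:
    """Return the expected columns missing from df (empty list if none)."""
    expected = EXPECTED_FEATURES + [TARGET]
    missing = [col for col in expected if col not in df.columns]
    # The target must be binary: 1 = heart disease, 0 = healthy
    if TARGET in df.columns and not set(df[TARGET].unique()) <= {0, 1}:
        raise ValueError("target column must contain only 0 and 1")
    return missing


# Usage on one row (values copied from row 302 of the tail() output)
row = pd.DataFrame(
    [[57, 0, 1, 130, 236, 0, 0, 174, 0, 0.0, 1, 1, 2, 0]],
    columns=EXPECTED_FEATURES + [TARGET],
)
print(check_schema(row))                       # → []
print(check_schema(row.drop(columns="thal")))  # → ['thal']
```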

**Data Preprocessing**

Checked for null values (none present)

Split dataset into features (X) and target (Y)

Split data into training (80%) and testing (20%) sets, stratified by target so both sets keep the same class ratio
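The three preprocessing steps can be bundled into one function so they always run in the same order. A sketch under assumptions: `prepare_data` is a hypothetical helper, the split settings mirror the notebook (`test_size=0.2`, `stratify`, `random_state=2`), and the demo frame is synthetic, not heart.csv.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def prepare_data(df: pd.DataFrame, target: str = "target", seed: int = 2):
    """Null check, feature/target split, then a stratified 80/20 split."""
    # 1. Fail loudly if any value is missing
    null_counts = df.isnull().sum()
    if null_counts.any():
        raise ValueError(f"missing values found:\n{null_counts[null_counts > 0]}")

    # 2. Features (X) and target (Y)
    X = df.drop(columns=target)
    Y = df[target]

    # 3. 80% train / 20% test, keeping the class ratio in both parts
    return train_test_split(X, Y, test_size=0.2, stratify=Y, random_state=seed)


# Usage with a small synthetic frame standing in for heart.csv
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "age": rng.integers(30, 80, size=100),
    "chol": rng.integers(120, 400, size=100),
    "target": [1] * 55 + [0] * 45,
})
X_tr, X_te, Y_tr, Y_te = prepare_data(demo)
print(X_tr.shape, X_te.shape)  # → (80, 2) (20, 2)
```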

**Model Used**

Logistic Regression (a standard linear classifier for binary outcomes)

Model trained on training data

Performance metric: Accuracy

Training data accuracy: ~85% (0.851); accuracy on the 20% test set is not reported in this notebook
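The model and metric above can be written out as follows. The notation is introduced here: $\mathbf{x}_i$ is patient $i$'s 13 features, $y_i$ the true label, $\mathbf{w}$ and $b$ the learned weights and intercept. scikit-learn's `predict` returns class 1 when the estimated probability exceeds 0.5. The reported training accuracy of 0.8512 corresponds to 206 of the 242 training patients being classified correctly.

```latex
P(y_i = 1 \mid \mathbf{x}_i) = \sigma(\mathbf{w}^\top \mathbf{x}_i + b)
  = \frac{1}{1 + e^{-(\mathbf{w}^\top \mathbf{x}_i + b)}},
\qquad
\hat{y}_i =
\begin{cases}
  1 & \text{if } P(y_i = 1 \mid \mathbf{x}_i) > 0.5 \\
  0 & \text{otherwise}
\end{cases}

\text{Accuracy} = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}[\hat{y}_i = y_i],
\qquad
\text{training accuracy} = \frac{206}{242} \approx 0.8512
```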

**Key Features**

Predicts heart disease using 13 medical features

Can be extended with other ML models (Random Forest, SVM, Neural Networks)

Could be integrated into web dashboards or AI assistants
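Extending to other models can be sketched with 5-fold cross-validation, which compares candidates on held-out folds rather than on training accuracy. This runs on synthetic data from `make_classification` (303 samples, 13 features, to mirror the dataset's shape); substituting the notebook's `X` and `Y` would use the real data. The model settings are illustrative assumptions, not tuned values, and a `StandardScaler` is placed in front of Logistic Regression and SVM because both are sensitive to feature scale.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in with the same shape as heart.csv (303 rows, 13 features)
X_demo, Y_demo = make_classification(n_samples=303, n_features=13, random_state=2)

# Candidate models; scale-sensitive ones get a StandardScaler in front
models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=2),
    "SVM (RBF kernel)": make_pipeline(StandardScaler(), SVC()),
}

# Stratified folds keep the class ratio in every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=2)
results = {}
for name, clf in models.items():
    scores = cross_val_score(clf, X_demo, Y_demo, cv=cv, scoring="accuracy")
    results[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f} ± {scores.std():.3f}")
```

Reporting the fold standard deviation alongside the mean helps show whether a gap between two models is larger than the fold-to-fold noise.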

**Future Enhancements**

Integrate Claude/OpenAI API for automated report summaries

Deploy as a web application using Streamlit / Flask / FastAPI

Add feature importance analysis to explain predictions

Include additional datasets for improved accuracy
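Feature importance analysis could start from scikit-learn's `permutation_importance`, which measures how much test accuracy drops when one feature's values are shuffled. A minimal sketch: the column names are borrowed from the feature list above, but the values are synthetic, so the resulting ranking carries no medical meaning.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

FEATURES = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
            "thalach", "exang", "oldpeak", "slope", "ca", "thal"]

# Synthetic stand-in data labelled with the heart.csv column names
X_arr, y_demo = make_classification(n_samples=303, n_features=13, random_state=2)
X_demo = pd.DataFrame(X_arr, columns=FEATURES)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=2)

model_demo = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model_demo.fit(X_tr, y_tr)

# Shuffle each column 20 times on the test set and record the mean accuracy drop
result = permutation_importance(model_demo, X_te, y_te, n_repeats=20,
                                scoring="accuracy", random_state=2)
importances = pd.Series(result.importances_mean, index=FEATURES).sort_values(ascending=False)
print(importances.head(5))
```

Computing importance on the test set rather than the training set keeps the ranking from rewarding features the model merely memorized.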

**Technologies Used**

Python 3

Pandas & NumPy

Scikit-learn (LogisticRegression, train_test_split, accuracy_score)

**Project Outcome**

With further validation on held-out data, this project could help healthcare providers detect heart disease early, potentially saving lives by identifying high-risk patients before symptoms worsen.

In [27]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

In [28]:
heart_data = pd.read_csv('heart.csv')

In [29]:
heart_data.tail()

     age  sex  cp  trestbps  chol  fbs  restecg  thalach  exang  oldpeak  slope  ca  thal  target
298   57    0   0       140   241    0        1      123      1      0.2      1   0     3       0
299   45    1   3       110   264    0        1      132      0      1.2      1   0     3       0
300   68    1   0       144   193    1        1      141      0      3.4      1   2     3       0
301   57    1   0       130   131    0        1      115      1      1.2      1   1     3       0
302   57    0   1       130   236    0        0      174      0      0.0      1   1     2       0


In [30]:
heart_data.shape

(303, 14)

In [31]:
heart_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 303 entries, 0 to 302
Data columns (total 14 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   age       303 non-null    int64  
 1   sex       303 non-null    int64  
 2   cp        303 non-null    int64  
 3   trestbps  303 non-null    int64  
 4   chol      303 non-null    int64  
 5   fbs       303 non-null    int64  
 6   restecg   303 non-null    int64  
 7   thalach   303 non-null    int64  
 8   exang     303 non-null    int64  
 9   oldpeak   303 non-null    float64
 10  slope     303 non-null    int64  
 11  ca        303 non-null    int64  
 12  thal      303 non-null    int64  
 13  target    303 non-null    int64  
dtypes: float64(1), int64(13)
memory usage: 33.3 KB


In [32]:
heart_data.isnull().sum()

age         0
sex         0
cp          0
trestbps    0
chol        0
fbs         0
restecg     0
thalach     0
exang       0
oldpeak     0
slope       0
ca          0
thal        0
target      0
dtype: int64


In [33]:
heart_data['target'].value_counts()

target
1    165
0    138
Name: count, dtype: int64


1 - Defective heart (heart disease)
0 - Healthy heart

The two classes are reasonably balanced (165 vs. 138 patients, about 54% vs. 46%).

In [34]:
X = heart_data.drop(columns='target')
Y = heart_data['target']

In [35]:
print(X)

     age  sex  cp  trestbps  chol  fbs  restecg  thalach  exang  oldpeak  \
0     63    1   3       145   233    1        0      150      0      2.3   
1     37    1   2       130   250    0        1      187      0      3.5   
2     41    0   1       130   204    0        0      172      0      1.4   
3     56    1   1       120   236    0        1      178      0      0.8   
4     57    0   0       120   354    0        1      163      1      0.6   
..   ...  ...  ..       ...   ...  ...      ...      ...    ...      ...   
298   57    0   0       140   241    0        1      123      1      0.2   
299   45    1   3       110   264    0        1      132      0      1.2   
300   68    1   0       144   193    1        1      141      0      3.4   
301   57    1   0       130   131    0        1      115      1      1.2   
302   57    0   1       130   236    0        0      174      0      0.0   

     slope  ca  thal  
0        0   0     1  
1        0   0     2  
2        2   0    

In [36]:
print(Y)

0      1
1      1
2      1
3      1
4      1
      ..
298    0
299    0
300    0
301    0
302    0
Name: target, Length: 303, dtype: int64


In [37]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, stratify=Y, random_state=2)

In [38]:
print(X.shape, X_train.shape, X_test.shape)

(303, 13) (242, 13) (61, 13)
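The 242/61 split and the effect of `stratify=Y` can be checked on labels with the same class counts as heart.csv (165 ones, 138 zeros). This sketch uses a placeholder feature column instead of the real data; the `*_demo` names are introduced here so the notebook's own `X` and `Y` are not overwritten.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Labels with the same class counts as heart.csv: 165 ones, 138 zeros
Y_demo = np.array([1] * 165 + [0] * 138)
X_demo = np.arange(len(Y_demo)).reshape(-1, 1)  # placeholder feature column

X_tr, X_te, Y_tr, Y_te = train_test_split(
    X_demo, Y_demo, test_size=0.2, stratify=Y_demo, random_state=2)

# test_size=0.2 of 303 rows rounds up to 61 test rows, leaving 242 for training
print(len(Y_tr), len(Y_te))  # → 242 61
# stratify keeps the share of class 1 close to 165/303 in both parts
print(round(Y_demo.mean(), 3), round(Y_tr.mean(), 3), round(Y_te.mean(), 3))
```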


In [39]:
model = LogisticRegression()

In [40]:
model.fit(X_train,Y_train)

STOP: TOTAL NO. OF ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
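The warning above means the default `lbfgs` solver stopped at its default limit of 100 iterations before converging, which is common when features sit on very different scales (for example `chol` in the hundreds versus binary `sex`). Following the warning's own advice, one option is to standardize the features and raise `max_iter`. A minimal sketch on synthetic data; the per-column scale factors are arbitrary assumptions chosen to mimic mismatched units.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features, rescaled so some columns are in the hundreds (like chol)
X_demo, y_demo = make_classification(n_samples=303, n_features=13, random_state=2)
X_demo = X_demo * np.array([1, 100, 1, 50, 200, 1, 1, 150, 1, 1, 1, 1, 1])

# StandardScaler gives every feature mean 0 and unit variance before the solver runs;
# max_iter=1000 adds headroom over the default of 100
model_demo = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model_demo.fit(X_demo, y_demo)

# n_iter_ holds the number of iterations lbfgs actually used
print(model_demo[-1].n_iter_)
```

Applied to the notebook, the pipeline would be fitted on `X_train` only, so the scaling statistics are not influenced by the test rows. Note that changing the model this way would also change the accuracy reported below.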


In [41]:
X_train_prediction = model.predict(X_train)
# accuracy_score expects (y_true, y_pred); accuracy is symmetric, but this is the documented order
training_data_accuracy = accuracy_score(Y_train, X_train_prediction)

In [42]:
print('Accuracy on training data:', training_data_accuracy)

Accuracy on training data: 0.8512396694214877
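Training accuracy measures how well the model fits data it has already seen; the held-out 20% (`X_test`, `Y_test`) produced by the split is the fairer check of generalization. This sketch computes test accuracy, a confusion matrix, and a per-class report. The `evaluate` helper is hypothetical, and the demo runs on synthetic data, so its numbers are not the notebook's.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def evaluate(model, X_train, X_test, Y_train, Y_test):
    """Return train accuracy, test accuracy, and the test confusion matrix."""
    train_acc = accuracy_score(Y_train, model.predict(X_train))
    test_pred = model.predict(X_test)
    test_acc = accuracy_score(Y_test, test_pred)
    # Rows = true class, columns = predicted class, ordered [0 (healthy), 1 (disease)]
    cm = confusion_matrix(Y_test, test_pred, labels=[0, 1])
    print(classification_report(Y_test, test_pred, target_names=["healthy", "heart disease"]))
    return train_acc, test_acc, cm


# Synthetic stand-in for the notebook's stratified 80/20 split
X_demo, Y_demo = make_classification(n_samples=303, n_features=13, random_state=2)
X_tr, X_te, Y_tr, Y_te = train_test_split(
    X_demo, Y_demo, test_size=0.2, stratify=Y_demo, random_state=2)
model_demo = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X_tr, Y_tr)

train_acc, test_acc, cm = evaluate(model_demo, X_tr, X_te, Y_tr, Y_te)
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")
print(cm)
```

In the notebook, the same helper could be called as `evaluate(model, X_train, X_test, Y_train, Y_test)`. For a screening use case, the count of missed heart-disease cases (row 1, column 0 of the confusion matrix) is often more informative than accuracy alone.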


In [45]:
# Feature values for one patient, in the same order as the columns of X
input_data = (62, 0, 0, 140, 268, 0, 0, 160, 0, 3.6, 0, 2, 2)
input_data_as_numpy_array = np.asarray(input_data)
# Reshape to one row (1 sample x 13 features), since predict() expects 2-D input
input_data_reshaped = input_data_as_numpy_array.reshape(1, -1)
prediction = model.predict(input_data_reshaped)
print(prediction)

if prediction[0] == 0:
    print('The person does not have heart disease')
else:
    print('The person has heart disease')

[0]
The person does not have heart disease
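Because the model was fitted on a DataFrame, recent scikit-learn versions typically warn when `predict` receives a plain NumPy array without feature names. Wrapping the tuple in a one-row DataFrame with the training column names avoids that and guards against column-order mistakes, and `predict_proba` exposes the estimated probability behind the 0/1 label. A sketch under assumptions: `predict_patient` is a hypothetical helper, and the model here is fitted on random stand-in data only so the block runs on its own, so the printed result is not the notebook's.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

FEATURES = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
            "thalach", "exang", "oldpeak", "slope", "ca", "thal"]


def predict_patient(model, values):
    """Return (label, probability of heart disease) for one patient."""
    if len(values) != len(FEATURES):
        raise ValueError(f"expected {len(FEATURES)} values, got {len(values)}")
    # One-row DataFrame with the same column names used during fit
    row = pd.DataFrame([values], columns=FEATURES)
    label = int(model.predict(row)[0])
    prob = float(model.predict_proba(row)[0, 1])  # column 1 = class 1 (heart disease)
    return label, prob


# Model fitted on random stand-in data, only to make the sketch runnable
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.normal(size=(100, 13)), columns=FEATURES)
y_demo = (X_demo["thalach"] > 0).astype(int)
model_demo = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)

label, prob = predict_patient(model_demo, (62, 0, 0, 140, 268, 0, 0, 160, 0, 3.6, 0, 2, 2))
print("Heart disease" if label == 1 else "No heart disease", f"(p = {prob:.2f})")
```

With the notebook's fitted model, the call would be `predict_patient(model, input_data)`.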


