
Importing the Dependencies

In [None]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split  # split the data into training and test sets
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score  # evaluate the model's accuracy

Data Collection and Processing

In [None]:
# loading the csv data to a Pandas DataFrame
heart_data = pd.read_csv('/content/heart_disease_data.csv')

In [None]:
# print first 5 rows of the dataset
heart_data.head()

Unnamed: 0,age,sex,chest_pain,resting_BP,cholestroal(mg/dl),fasting Blood sugar,resting cardiographic results,maximum heart rate achieved,exercise induced angina,oldpeak,slope,number of major vessels,thal,target
0,63,1,3,145,233,1,0,150,0,2.3,0,0,1,1
1,37,1,2,130,250,0,1,187,0,3.5,0,0,2,1
2,41,0,1,130,204,0,0,172,0,1.4,2,0,2,1
3,56,1,1,120,236,0,1,178,0,0.8,2,0,2,1
4,57,0,0,120,354,0,1,163,1,0.6,2,0,2,1


In [None]:
# print last 5 rows of the dataset
heart_data.tail()

Unnamed: 0,age,sex,chest_pain,resting_BP,cholestroal(mg/dl),fasting Blood sugar,resting cardiographic results,maximum heart rate achieved,exercise induced angina,oldpeak,slope,number of major vessels,thal,target
298,57,0,0,140,241,0,1,123,1,0.2,1,0,3,0
299,45,1,3,110,264,0,1,132,0,1.2,1,0,3,0
300,68,1,0,144,193,1,1,141,0,3.4,1,2,3,0
301,57,1,0,130,131,0,1,115,1,1.2,1,1,3,0
302,57,0,1,130,236,0,0,174,0,0.0,1,1,2,0


In [None]:
# number of rows and columns in the dataset
heart_data.shape

(303, 14)

In [None]:
# getting some info about the data
heart_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 303 entries, 0 to 302
Data columns (total 14 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   age                            303 non-null    int64  
 1   sex                            303 non-null    int64  
 2   chest_pain                     303 non-null    int64  
 3   resting_BP                     303 non-null    int64  
 4   cholestroal(mg/dl)             303 non-null    int64  
 5   fasting Blood sugar            303 non-null    int64  
 6   resting cardiographic results  303 non-null    int64  
 7   maximum heart rate achieved    303 non-null    int64  
 8   exercise induced angina        303 non-null    int64  
 9   oldpeak                        303 non-null    float64
 10  slope                          303 non-null    int64  
 11  number of major vessels        303 non-null    int64  
 12  thal                           303 non-null    int

In [None]:
# checking for missing values
heart_data.isnull().sum()

Unnamed: 0,0
age,0
sex,0
chest_pain,0
resting_BP,0
cholestroal(mg/dl),0
fasting Blood sugar,0
resting cardiographic results,0
maximum heart rate achieved,0
exercise induced angina,0
oldpeak,0


In [None]:
# statistical measures about the data
heart_data.describe()

Unnamed: 0,age,sex,chest_pain,resting_BP,cholestroal(mg/dl),fasting Blood sugar,resting cardiographic results,maximum heart rate achieved,exercise induced angina,oldpeak,slope,number of major vessels,thal,target
count,303.0,303.0,303.0,303.0,303.0,303.0,303.0,303.0,303.0,303.0,303.0,303.0,303.0,303.0
mean,54.366337,0.683168,0.966997,131.623762,246.264026,0.148515,0.528053,149.646865,0.326733,1.039604,1.39934,0.729373,2.313531,0.544554
std,9.082101,0.466011,1.032052,17.538143,51.830751,0.356198,0.52586,22.905161,0.469794,1.161075,0.616226,1.022606,0.612277,0.498835
min,29.0,0.0,0.0,94.0,126.0,0.0,0.0,71.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,47.5,0.0,0.0,120.0,211.0,0.0,0.0,133.5,0.0,0.0,1.0,0.0,2.0,0.0
50%,55.0,1.0,1.0,130.0,240.0,0.0,1.0,153.0,0.0,0.8,1.0,0.0,2.0,1.0
75%,61.0,1.0,2.0,140.0,274.5,0.0,1.0,166.0,1.0,1.6,2.0,1.0,3.0,1.0
max,77.0,1.0,3.0,200.0,564.0,1.0,2.0,202.0,1.0,6.2,2.0,4.0,3.0,1.0


In [None]:
# checking the distribution of Target Variable
heart_data['target'].value_counts()  # how many records have heart disease and how many do not

target,count
1,165
0,138


1 --> Defective Heart

0 --> Healthy Heart

The two classes are roughly balanced (165 vs. 138), so no resampling is needed before training.
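
The class balance noted above can be checked numerically. The sketch below is only an illustration: it rebuilds the target column from the printed counts instead of reading the CSV, and the names `class_share` and `imbalance_ratio` are introduced here, not taken from the notebook.

```python
import pandas as pd

# Rebuild the target column from the counts printed above (165 ones, 138 zeros).
target = pd.Series([1] * 165 + [0] * 138, name="target")

# normalize=True returns each class's share of the rows instead of raw counts.
class_share = target.value_counts(normalize=True)
print(class_share.round(3))

# Ratio of the larger class to the smaller one; values near 1 indicate balance.
imbalance_ratio = class_share.max() / class_share.min()
print(round(imbalance_ratio, 2))
```

On the real data, the same check would be `heart_data['target'].value_counts(normalize=True)`.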

Splitting the Features and Target

In [None]:
X = heart_data.drop(columns='target')
Y = heart_data['target']

In [None]:
print(X)  # the features, without the target column

     age  sex  chest_pain  resting_BP  cholestroal(mg/dl)  \
0     63    1           3         145                 233   
1     37    1           2         130                 250   
2     41    0           1         130                 204   
3     56    1           1         120                 236   
4     57    0           0         120                 354   
..   ...  ...         ...         ...                 ...   
298   57    0           0         140                 241   
299   45    1           3         110                 264   
300   68    1           0         144                 193   
301   57    1           0         130                 131   
302   57    0           1         130                 236   

     fasting Blood sugar  resting cardiographic results  \
0                      1                              0   
1                      0                              1   
2                      0                              0   
3                      0       

In [None]:
print(Y)  # the target labels

0      1
1      1
2      1
3      1
4      1
      ..
298    0
299    0
300    0
301    0
302    0
Name: target, Length: 303, dtype: int64


Splitting the Data into Training data & Test Data

In [None]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, stratify=Y, random_state=2)
# X is split into X_train and X_test; Y is split into the matching Y_train and Y_test
# test_size=0.2 holds out 20% of the rows for testing and keeps 80% for training
# stratify=Y preserves the ratio of 1s to 0s in both the training and test sets
# random_state=2 makes the split reproducible

In [None]:
print(X.shape, X_train.shape, X_test.shape)

(303, 13) (242, 13) (61, 13)


In [None]:
print(Y.shape, Y_train.shape, Y_test.shape)

(303,) (242,) (61,)
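
To see what `stratify` does, the following sketch compares the share of class 1 before and after a stratified split. It is an illustration under assumptions: the labels are synthetic and only mimic the class counts of this dataset, and `X_demo` is a placeholder feature invented for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels with the same class counts as the dataset (165 ones, 138 zeros).
y = np.array([1] * 165 + [0] * 138)
X_demo = np.arange(len(y)).reshape(-1, 1)  # placeholder feature; its values are irrelevant here

_, _, y_train, y_test = train_test_split(
    X_demo, y, test_size=0.2, stratify=y, random_state=2
)

# Share of class 1 overall, in the training part, and in the test part.
# Stratification keeps the three values close to each other.
print(round(y.mean(), 3), round(y_train.mean(), 3), round(y_test.mean(), 3))
```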


Model Training

Logistic Regression

In [None]:
model = LogisticRegression()
# logistic regression is used here for binary classification (target is 0 or 1)

In [None]:
# training the LogisticRegression model with Training data
model.fit(X_train, Y_train)  # learn the relationship between the features in X_train and the labels in Y_train

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
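
The warning above suggests two remedies: raising `max_iter` or scaling the features. A minimal sketch of both follows, assuming a `StandardScaler` in front of the classifier. It runs on synthetic data from `make_classification`, not the heart dataset, and `X_demo`, `y_demo`, and `scaled_model` are illustrative names. The accuracy figures recorded in this notebook come from the unscaled model and would not necessarily match a scaled one.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the heart data: 303 rows, 13 features.
X_demo, y_demo = make_classification(n_samples=303, n_features=13, random_state=2)
X_demo[:, 0] *= 1000  # put one feature on a much larger scale than the others

# Standardize each feature, then fit logistic regression with a higher iteration cap.
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

with warnings.catch_warnings():
    # Turn a convergence warning into an error so a failure cannot go unnoticed.
    warnings.simplefilter("error", ConvergenceWarning)
    scaled_model.fit(X_demo, y_demo)

print(scaled_model.score(X_demo, y_demo))  # accuracy on the synthetic training data
```

Wrapping the scaler and the classifier in one pipeline means the same scaling is applied automatically at prediction time.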


Model Evaluation

We use the accuracy score as the evaluation metric.
The model predicts the target for each record, and these predictions are compared with the true target values; accuracy is the fraction of predictions that match.
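
As a small worked example of this comparison, the sketch below computes accuracy by hand and with `accuracy_score`. The labels and predictions are made up for illustration and are not taken from the dataset.

```python
import numpy as np
from sklearn.metrics import accuracy_score

y_true = np.array([1, 0, 1, 1, 0])  # made-up true labels
y_pred = np.array([1, 0, 0, 1, 0])  # made-up predictions with one mistake

# Accuracy is the fraction of positions where prediction and truth agree.
manual_accuracy = (y_true == y_pred).mean()
library_accuracy = accuracy_score(y_true, y_pred)

print(manual_accuracy, library_accuracy)  # 4 of 5 correct, so both are 0.8
```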

In [None]:
# accuracy on training data
X_train_prediction = model.predict(X_train)
training_data_accuracy = accuracy_score(Y_train, X_train_prediction)

In [None]:
print('Accuracy on Training data : ', training_data_accuracy)

Accuracy on Training data :  0.8512396694214877


In [None]:
# accuracy on test data
X_test_prediction = model.predict(X_test)
test_data_accuracy = accuracy_score(Y_test, X_test_prediction)

In [None]:
print('Accuracy on Test data : ', test_data_accuracy)

Accuracy on Test data :  0.819672131147541


Building a Predictive System

In [None]:
input_data = (36,1,2,130,250,0,1,187,0,3.5,0,0,2)

# change the input data to a numpy array
input_data_as_numpy_array = np.array(input_data)

# reshape the numpy array as we are predicting for only one instance
input_data_reshaped = input_data_as_numpy_array.reshape(1, -1)

prediction = model.predict(input_data_reshaped)
print(prediction)

if prediction[0] == 0:
  print('The Person does not have Heart Disease')
else:
  print('The Person has Heart Disease')

[1]
The Person has Heart Disease
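
Because the model was fitted on a DataFrame, passing a bare NumPy array at prediction time can make scikit-learn warn about missing feature names. One way to avoid this is to wrap the single record in a DataFrame with the training columns. The sketch below is an assumption-laden illustration: `predict_one` is a hypothetical helper, and the two-column training set is made up so the example runs on its own.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression


def predict_one(model, feature_names, values):
    """Return (label, probability of class 1) for one record. Hypothetical helper."""
    # A one-row DataFrame keeps the column names the model saw during fit.
    row = pd.DataFrame([values], columns=feature_names)
    label = int(model.predict(row)[0])
    prob = float(model.predict_proba(row)[0, 1])
    return label, prob


# Tiny made-up training set, used only to have a fitted model to call.
train = pd.DataFrame({
    "age": [30, 40, 50, 60, 70, 80],
    "oldpeak": [0.0, 0.5, 1.0, 2.0, 3.0, 4.0],
})
labels = [0, 0, 0, 1, 1, 1]
demo_model = LogisticRegression(max_iter=1000).fit(train, labels)

label, prob = predict_one(demo_model, list(train.columns), (75, 3.5))
print(label, round(prob, 2))
```

With the notebook's own objects, the equivalent call would be `predict_one(model, list(X.columns), input_data)`.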


