# **Heart Prediction Model**

1. Import all the libraries

In [1]:
import numpy as np
import pandas as pd

from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import RFE

from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split

from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report

import pickle

import warnings
warnings.filterwarnings('ignore')

2. Load the dataset

In [2]:
url = 'https://raw.githubusercontent.com/nyp-sit/sdaai-iti103/master/session-8/Heart_failure_clinical_records_dataset.csv'
data = pd.read_csv(url)
print('Data read successfully')

data.head()

Data read successfully


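|   | age | anaemia | creatinine_phosphokinase | diabetes | ejection_fraction | high_blood_pressure | platelets | serum_creatinine | serum_sodium | sex | smoking | time | DEATH_EVENT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 75.0 | 0 | 582 | 0 | 20 | 1 | 265000.0 | 1.9 | 130 | 1 | 0 | 4 | 1 |
| 1 | 55.0 | 0 | 7861 | 0 | 38 | 0 | 263358.03 | 1.1 | 136 | 1 | 0 | 6 | 1 |
| 2 | 65.0 | 0 | 146 | 0 | 20 | 0 | 162000.0 | 1.3 | 129 | 1 | 1 | 7 | 1 |
| 3 | 50.0 | 1 | 111 | 0 | 20 | 0 | 210000.0 | 1.9 | 137 | 1 | 0 | 7 | 1 |
| 4 | 65.0 | 1 | 160 | 1 | 20 | 0 | 327000.0 | 2.7 | 116 | 0 | 0 | 8 | 1 |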


3. Separate the dataset into features and label

In [3]:
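# .values returns one float array, so the labels become 0.0 / 1.0
array = data.values
X = array[:, :12]   # first 12 columns: clinical features
Y = array[:, 12]    # last column: DEATH_EVENT (the label)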
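Positional slicing relies on `DEATH_EVENT` being the 13th column. A minimal sketch of the same split done by column name, which keeps working if the column order ever changes. The two-row DataFrame is only a stand-in for the full dataset (its values are copied from `data.head()` above); the column names are the dataset's own.

```python
import pandas as pd

# Two rows copied from data.head(); a stand-in for the full dataset
data = pd.DataFrame(
    [[75.0, 0, 582, 0, 20, 1, 265000.0, 1.9, 130, 1, 0, 4, 1],
     [55.0, 0, 7861, 0, 38, 0, 263358.03, 1.1, 136, 1, 0, 6, 1]],
    columns=['age', 'anaemia', 'creatinine_phosphokinase', 'diabetes',
             'ejection_fraction', 'high_blood_pressure', 'platelets',
             'serum_creatinine', 'serum_sodium', 'sex', 'smoking', 'time',
             'DEATH_EVENT'])

# Pick the label by name and treat every other column as a feature
X = data.drop(columns=['DEATH_EVENT']).values
Y = data['DEATH_EVENT'].values

# Same values as the positional slicing used in the notebook
array = data.values
assert (X == array[:, :12]).all()
assert (Y == array[:, 12]).all()
print(X.shape, Y.shape)  # (2, 12) (2,)
```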

4. Split the dataset into training and test sets

In [4]:
# Hold out 67% of the rows as the test set; random_state fixes the shuffle
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.67, random_state=1)
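With `test_size=0.67`, scikit-learn rounds the number of test rows up. Assuming the dataset has 299 records (the size of the public heart failure clinical records dataset), that gives 201 test rows, which matches the support of 201 in the classification report further down, and leaves only 98 rows for training. The sketch below checks that arithmetic on random placeholder data of the assumed size. The `stratify=Y` argument is an optional extra not used in the notebook; it keeps the class ratio the same in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: an assumed 299 rows and 12 features
rng = np.random.default_rng(0)
X = rng.normal(size=(299, 12))
Y = np.array([0] * 203 + [1] * 96)  # placeholder class counts

x_train, x_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.67, random_state=1, stratify=Y)

# ceil(0.67 * 299) = 201 test rows; the remaining 98 are for training
print(len(x_train), len(x_test))  # 98 201
```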

5. Data preprocessing: StandardScaler

In [5]:
# Standardize each feature to zero mean and unit variance,
# fitting the scaler on the training set only
scaler = StandardScaler()
rescaled_X = scaler.fit_transform(x_train)
# print(rescaled_X[:5])
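`StandardScaler` computes z = (x - mean) / std per column, using the mean and the population standard deviation of the data it was fitted on. A small check of that formula on made-up numbers (the two columns only loosely mimic `age` and `platelets`):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up training values: two columns on very different scales
x_train = np.array([[50.0, 150000.0],
                    [60.0, 250000.0],
                    [70.0, 350000.0]])

scaler = StandardScaler()
rescaled = scaler.fit_transform(x_train)

# Same result by hand: subtract the column mean and divide by the
# population standard deviation (ddof=0), as StandardScaler does
manual = (x_train - x_train.mean(axis=0)) / x_train.std(axis=0)
print(np.allclose(rescaled, manual))  # True
```

The fitted `scaler.mean_` and `scaler.scale_` are reused by `transform`, so test data should be scaled with the training statistics rather than refitted; the pipeline in step 7 takes care of this.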

6. Build the model: select 6 features with Recursive Feature Elimination (RFE)

In [6]:
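# Recursive Feature Elimination: repeatedly fit a logistic regression on the
# scaled training data and drop the feature with the smallest absolute
# coefficient until 6 features remain
model = LogisticRegression()
rfe = RFE(model, n_features_to_select=6)
fit = rfe.fit(rescaled_X, y_train)

# Keep only the 6 selected columns
transformed_X = fit.transform(rescaled_X)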
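The fitted `RFE` object records which columns survived: `support_` is a boolean mask and `ranking_` assigns rank 1 to every selected feature. A sketch of mapping the mask back to column names, run on synthetic data from `make_classification` with placeholder names `f0` to `f11`; with the notebook's objects you would use `fit` and the first 12 column names of `data` instead.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 12 heart-failure features
X, y = make_classification(n_samples=100, n_features=12, n_informative=4,
                           random_state=1)
feature_names = [f'f{i}' for i in range(12)]  # placeholder names

rescaled_X = StandardScaler().fit_transform(X)
rfe = RFE(LogisticRegression(), n_features_to_select=6).fit(rescaled_X, y)

# Keep the names whose mask entry is True
selected = [name for name, keep in zip(feature_names, rfe.support_) if keep]
print(selected)
print(rfe.ranking_)  # 1 = selected; higher numbers were eliminated earlier
print(rfe.transform(rescaled_X).shape)  # (100, 6)
```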


7. Build the full pipeline (scaling, RFE, LDA), train it and evaluate it on the test data

In [7]:
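# Chain the steps so the scaler and RFE are fitted on the training data only
# and then applied unchanged to the test data; LDA is the final classifier
steps = [('scaler', StandardScaler()),
         ('RFE', RFE(LogisticRegression(), n_features_to_select=6)),
         ('lda', LinearDiscriminantAnalysis())]

pipeline = Pipeline(steps)
pipeline.fit(x_train, y_train)
predictions = pipeline.predict(x_test)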
print('The accuracy score of the test dataset : ', accuracy_score(y_test, predictions))
print('\nThe confusion matrix : \n', confusion_matrix(y_test, predictions))
print('\nThe classification report : \n', classification_report(y_test, predictions))
# pipeline.score returns the same accuracy as accuracy_score above
print('Score : ', pipeline.score(x_test, y_test))

The accuracy score of the test dataset :  0.7860696517412935

The confusion matrix : 
 [[120  15]
 [ 28  38]]

The classification report : 
               precision    recall  f1-score   support

         0.0       0.81      0.89      0.85       135
         1.0       0.72      0.58      0.64        66

    accuracy                           0.79       201
   macro avg       0.76      0.73      0.74       201
weighted avg       0.78      0.79      0.78       201

Score :  0.7860696517412935
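Every figure in the report can be derived from the confusion matrix. scikit-learn puts the true labels in rows and the predictions in columns, so `[[120, 15], [28, 38]]` means 120 true negatives, 15 false positives, 28 false negatives and 38 true positives, with class 1.0 being `DEATH_EVENT`. The arithmetic, for checking:

```python
# Confusion matrix printed above: rows = true label, columns = predicted label
tn, fp = 120, 15
fn, tp = 28, 38

accuracy = (tp + tn) / (tn + fp + fn + tp)   # 158 / 201
precision_1 = tp / (tp + fp)                 # 38 / 53
recall_1 = tp / (tp + fn)                    # 38 / 66
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)
precision_0 = tn / (tn + fn)                 # 120 / 148
recall_0 = tn / (tn + fp)                    # 120 / 135

print(round(accuracy, 4))                                          # 0.7861
print(round(precision_1, 2), round(recall_1, 2), round(f1_1, 2))  # 0.72 0.58 0.64
print(round(precision_0, 2), round(recall_0, 2))                  # 0.81 0.89
```

The recall of 0.58 for class 1.0 means the model misses 28 of the 66 deaths in the test set, which the overall accuracy of 0.79 does not reveal.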


8. Save the model. The pickled pipeline will be loaded later in the Flask application.

In [8]:
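# Save the whole fitted pipeline (scaler + RFE + LDA) to model.pkl;
# the with-block closes the file after writing
with open('model.pkl', 'wb') as f:
    pickle.dump(pipeline, f)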
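Because scaling and RFE are part of the pickled pipeline, the Flask application can restore it with `pickle.load` and pass raw, unscaled feature rows straight to `predict`. A self-contained sketch of that round trip, using synthetic data and a temporary file in place of the notebook's data and `model.pkl` (the Flask app itself is not shown here, so the loading side is only an assumption about how it might look). Only unpickle files you trust, and load with the same scikit-learn version that saved the model.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data with 12 features, like the notebook
X, y = make_classification(n_samples=100, n_features=12, random_state=1)

pipeline = Pipeline([('scaler', StandardScaler()),
                     ('RFE', RFE(LogisticRegression(), n_features_to_select=6)),
                     ('lda', LinearDiscriminantAnalysis())])
pipeline.fit(X, y)

path = os.path.join(tempfile.mkdtemp(), 'model.pkl')
with open(path, 'wb') as f:      # save, as in step 8
    pickle.dump(pipeline, f)

with open(path, 'rb') as f:      # load, as a Flask app might at startup
    loaded = pickle.load(f)

# One raw (unscaled) row in, one class label out
print(loaded.predict(X[:1]))
print(np.array_equal(loaded.predict(X), pipeline.predict(X)))  # True
```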