# Heart Disease Prediction

## Dataset Attributes
- id: unique ID for each patient
- age: age of the patient in years
- origin: place of study (stored in the `dataset` column of the CSV)
- sex: Male/Female
- cp: chest pain type [typical angina, atypical angina, non-anginal, asymptomatic]
- trestbps: resting blood pressure (in mm Hg on admission to the hospital)
- chol: serum cholesterol in mg/dl
- fbs: whether fasting blood sugar > 120 mg/dl
- restecg: resting electrocardiographic results [normal, stt abnormality, lv hypertrophy]
- thalach: maximum heart rate achieved
- exang: exercise-induced angina (True/False)
- oldpeak: ST depression induced by exercise relative to rest
- slope: the slope of the peak exercise ST segment
- ca: number of major vessels (0-3) colored by fluoroscopy
- thal: [normal, fixed defect, reversible defect]
- num: the predicted attribute


## Objective

Predict whether a patient has heart disease from the attributes listed above.

## Importing Libraries

In [1]:
import numpy as np
import pandas as pd

from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

from keras import backend
from keras.layers import BatchNormalization, Dense
from keras.models import Sequential

## Data Preprocessing

In [2]:
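# load the dataset and drop every row that has a missing value
dataset = pd.read_csv('heart_disease_uci.csv')
dataset.dropna(inplace=True)

# drop the patient identifier and the study-site column
dataset = dataset.drop(['id', 'dataset'], axis=1)

# one-hot encode the categorical columns; get_dummies appends the encoded
# columns after the remaining ones, in the order listed here
dataset = pd.get_dummies(dataset, columns=['sex', 'slope', 'cp', 'fbs', 'thal', 'restecg', 'exang'])

# split the data into features and target by position; after the encoding
# above, the last column is the final `exang` dummy rather than `num`.
# The cast gives Keras a numeric array (newer pandas returns boolean dummies).
X = dataset.iloc[:, :-1].values.astype('float32')
y = dataset.iloc[:, -1].values.astype('float32')

# hold out 20% of the rows as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)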
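
Because `pd.get_dummies` appends the encoded columns after the untouched ones, selecting the target by position can pick up a dummy column instead of `num`. The sketch below uses a small made-up frame (the values are illustrative, not taken from the dataset) to show the resulting column order and one possible way to select the target by name. It assumes that `num` equal to 0 means no disease and any larger value means disease; that reading is an assumption, not something stated above.

```python
import pandas as pd

# Tiny made-up frame with the same kinds of columns as the heart disease data.
toy = pd.DataFrame({
    "age": [63, 41, 57, 52],
    "exang": [False, True, False, True],
    "num": [0, 2, 1, 0],
})

# The encoded columns are appended after the columns that were left alone.
encoded = pd.get_dummies(toy, columns=["exang"])
print(list(encoded.columns))

# Select the target by name and collapse it to 0 / 1 (assumed meaning of num).
y = (encoded["num"] > 0).astype("float32").to_numpy()

# Everything except the target becomes the feature matrix.
X = encoded.drop(columns="num").to_numpy(dtype="float32")
```

Selecting by name keeps the split correct even if the list of encoded columns changes later.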

## Model Building

In [5]:
# metric for monitoring the model during training:
# root mean squared error over the whole batch
def rmse(y_true, y_pred):
    return backend.sqrt(backend.mean(backend.square(y_pred - y_true)))

# initializing the model
model = Sequential()

# adding layers to the model
# first hidden layer; input_dim is the number of feature columns
model.add(Dense(units=512, activation='relu', input_dim=X_train.shape[1]))
model.add(BatchNormalization())

# second hidden layer
model.add(Dense(units=256, activation='relu'))
model.add(BatchNormalization())

# third hidden layer
model.add(Dense(units=128, activation='relu'))
model.add(BatchNormalization())

# fourth hidden layer
model.add(Dense(units=32, activation='relu'))
model.add(BatchNormalization())

# output layer: a single sigmoid unit, so predictions lie in (0, 1)
model.add(Dense(units=1, activation='sigmoid'))

# compiling the model
model.compile(optimizer='adam', loss='mean_squared_error', metrics=[rmse])
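
A custom RMSE metric is sensitive to the axis it reduces over. The NumPy sketch below, on made-up numbers, compares reducing over the last axis with reducing over the whole batch. It assumes that Keras hands the metric both tensors with shape `(batch, 1)` and averages whatever per-sample values the metric returns; under that assumption, a last-axis reduction on a single-output model reports the mean absolute error rather than the RMSE.

```python
import numpy as np

# Made-up targets and predictions for a batch of four, shape (4, 1).
y_true = np.array([[0.0], [1.0], [1.0], [0.0]])
y_pred = np.array([[0.1], [0.7], [0.4], [0.2]])

squared_error = np.square(y_pred - y_true)

# Reducing over the last axis first gives one value per sample: |error|.
per_sample = np.sqrt(np.mean(squared_error, axis=-1))
mean_of_per_sample = per_sample.mean()  # equals the mean absolute error

# Reducing over the whole batch gives the usual RMSE.
batch_rmse = np.sqrt(np.mean(squared_error))

print(mean_of_per_sample, batch_rmse)
```

The two numbers differ whenever the errors are not all the same size, so the choice of axis changes what the training log reports.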

## Model Training

In [6]:
# fitting the model on the training data, validating on the test set
history = model.fit(X_train, y_train, epochs=100, batch_size=32, validation_data=(X_test, y_test))

Epoch 1/100
...
Epoch 100/100


## Evaluation

In [7]:
# predicting and measuring performance
y_pred = model.predict(X_test)
print("RMS Error: ", np.sqrt(mean_squared_error(y_test, y_pred)))

RMS Error:  0.10981164
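
Since the objective is a yes/no prediction, the sigmoid outputs can also be turned into class labels and scored directly. The sketch below uses made-up labels and probabilities in place of `y_test` and `model.predict(X_test)`; the 0.5 threshold is an assumption, not a value taken from the notebook. It also shows the RMSE computed by hand, flattening the predictions first so that the subtraction does not broadcast.

```python
import numpy as np

# Made-up labels and sigmoid outputs; model.predict returns shape (n, 1).
y_test = np.array([0.0, 1.0, 1.0, 0.0, 1.0])
y_pred = np.array([[0.2], [0.9], [0.4], [0.1], [0.8]])

# Flatten first: (n,) minus (n, 1) would broadcast to an (n, n) matrix.
y_prob = y_pred.ravel()
rmse = np.sqrt(np.mean((y_prob - y_test) ** 2))

# Threshold the probabilities to get class labels (0.5 is an assumed cutoff).
y_label = (y_prob >= 0.5).astype(int)

# Fraction of patients whose predicted label matches the true label.
accuracy = np.mean(y_label == y_test)

print("RMS Error:", rmse)
print("Accuracy:", accuracy)
```

Accuracy is easier to interpret than RMSE for a binary target, although on imbalanced data a confusion matrix or recall may be more informative.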
