**CHECKING WHETHER A PATIENT HAS HEART RISKS OR NOT USING ARTIFICIAL NEURAL NETWORKS.**

**About this dataset**
* age : age of the patient
* sex : sex of the patient
* exng : exercise-induced angina (1 = yes; 0 = no)
* caa : number of major vessels (0-3)
* cp : chest pain type
  * Value 1: typical angina
  * Value 2: atypical angina
  * Value 3: non-anginal pain
  * Value 4: asymptomatic
* trtbps : resting blood pressure (in mm Hg)
* chol : cholesterol in mg/dl fetched via BMI sensor
* fbs : fasting blood sugar > 120 mg/dl (1 = true; 0 = false)
* restecg : resting electrocardiographic results
  * Value 0: normal
  * Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV)
  * Value 2: showing probable or definite left ventricular hypertrophy by Estes' criteria
* thalachh : maximum heart rate achieved
* output : 0 = less chance of heart attack; 1 = more chance of heart attack

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import tensorflow as tf

In [None]:
ds = pd.read_csv('../input/heart-attack-analysis-prediction-dataset/heart.csv')
ds.head()

In [None]:
ds.isnull().sum()

**NO MISSING VALUES**

In [None]:
ds.info()

In [None]:
sns.countplot(x = 'output', data =ds)

**IT IS NOT SPECIFIED WHETHER 0 IS MALE OR FEMALE, THEREFORE WE'LL REFER TO THE GENDERS AS 0 AND 1**
* AROUND 160 PATIENTS ARE AT RISK OF HEART DISEASE

In [None]:
ds.columns

In [None]:
plt.figure(figsize=(18,10))
#CATEGORICAL FEATURES: ONE COUNT PLOT PER FEATURE, SPLIT BY OUTPUT
col = ['sex', 'cp', 'fbs', 'restecg', 'exng', 'slp', 'caa', 'thall']
for c, i in enumerate(col, start=1):
    plt.subplot(2, 4, c)
    sns.countplot(x = i, data = ds, hue = 'output')
    plt.xlabel(i)
plt.tight_layout()

**ASSUMPTIONS -**
* **sex 1 has more patients than sex 0, but the percentage of high-risk patients is higher for sex 0.**
* **cp stands for chest pain type, of which there are 4; cp 0 has the most patients, but cp 2 has the most high-risk patients.**
* **fbs stands for fasting blood sugar (fbs > 120 mg/dl; 1 = true, 0 = false); most high-risk patients have fbs = 0 (not above 120 mg/dl).**
* **restecg (resting electrocardiographic results) with value 1 has the highest risk.**
* **exng is exercise-induced angina (1 = yes, 0 = no); exng = 0 has more patients.**
* **slp with value 2 has the highest risk.**
* **caa = 0 has the most patients with heart risks.**
* **thall with value 2 has the most patients.**
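
The percentages read off the count plots can also be computed directly. The sketch below is only an illustration: it uses a small hypothetical frame (made-up values, not rows from heart.csv) that reuses the column names `sex` and `output`. Because `output` is 0/1, its mean per group is the share of high-risk patients in that group.

```python
import pandas as pd

# Hypothetical toy data for illustration only, NOT rows from heart.csv
toy = pd.DataFrame({
    'sex':    [0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
    'output': [1, 1, 1, 0, 1, 1, 0, 0, 0, 0],
})

# number of patients per group
counts = toy.groupby('sex')['output'].count()

# mean of a 0/1 column = fraction of ones; times 100 gives a percentage
risk_share = toy.groupby('sex')['output'].mean() * 100

print(counts)
print(risk_share.round(1))
```

On the real data the same two lines would be applied to `ds` instead of `toy`, and any of the categorical columns can take the place of `sex`.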


In [None]:
plt.figure(figsize=(18,10))
#NUMERIC FEATURES: ONE VIOLIN PLOT PER FEATURE, SPLIT BY OUTPUT
col = ['age', 'trtbps', 'chol', 'thalachh', 'oldpeak']
for c, i in enumerate(col, start=1):
    plt.subplot(2, 3, c)
    sns.violinplot(y = i, x = 'output', data = ds)
plt.tight_layout()

**ASSUMPTIONS-**
* **ages between 50 and 60 have the highest chance of heart risks.**
* **trtbps (resting blood pressure) has similar distributions for both outputs, hence it has very little effect on heart risks.**
* **cholesterol around the average of 250 carries risks, but cholesterol values of 400-600 have a high likelihood of heart risks.**
* **maximum heart rates of 140-180 have high risks.**
* **patients with high risk mostly have oldpeak values between 0 and 1.**
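
The violin plots compare the two groups by eye; a per-group summary gives the same comparison as numbers. The sketch below is a hedged illustration on a hypothetical frame (made-up values, not heart.csv) that reuses the column names `trtbps`, `thalachh` and `output`. The median is used here as one possible summary; other statistics would work as well.

```python
import pandas as pd

# Hypothetical toy data for illustration only, NOT rows from heart.csv
toy = pd.DataFrame({
    'trtbps':   [120, 130, 140, 120, 130, 140],
    'thalachh': [110, 125, 140, 150, 165, 180],
    'output':   [0, 0, 0, 1, 1, 1],
})

# median of each numeric feature within each output group
summary = toy.groupby('output')[['trtbps', 'thalachh']].median()
print(summary)
```

A feature whose medians are close across the two rows (here `trtbps`) separates the groups poorly, while a large gap (here `thalachh`) suggests the feature is more informative.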

In [None]:
plt.figure(figsize=(16,4))

#RELATION BETWEEN AGE AND CHOLESTEROL
plt.subplot(1,3,1)
sns.scatterplot(x = 'age', y = 'chol', data =ds,hue = 'output')
plt.title('AGE V/S CHOLESTEROL')

#RELATION BETWEEN AGE AND MAXIMUM HEART RATE
plt.subplot(1,3,2)
sns.scatterplot(x = 'age', y = 'thalachh', data =ds,hue = 'output')
plt.title('AGE V/S MAXIMUM HEART RATE')

#RELATION BETWEEN AGE AND OLDPEAK
plt.subplot(1,3,3)
sns.scatterplot(x = 'age', y = 'oldpeak', data =ds,hue = 'output')
plt.title('AGE V/S OLDPEAK')

plt.tight_layout()

In [None]:
plt.figure(figsize=(12,10))
sns.heatmap(ds.corr() ,annot = True)

**NO SIGNIFICANT CORRELATION**
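
One way to double-check a heatmap reading is to list each feature's correlation with `output`, ordered by magnitude. The sketch below is illustrative only and runs on a hypothetical frame (made-up values, not heart.csv) that reuses the column names `thalachh`, `oldpeak` and `output`.

```python
import pandas as pd

# Hypothetical toy data for illustration only, NOT rows from heart.csv
toy = pd.DataFrame({
    'thalachh': [110, 120, 135, 150, 160, 175],
    'oldpeak':  [2.5, 2.0, 1.8, 0.6, 0.4, 0.0],
    'output':   [0, 0, 0, 1, 1, 1],
})

# Pearson correlation of every feature with the target column
corr_with_output = toy.corr()['output'].drop('output')

# order the features by absolute correlation, strongest first
order = corr_with_output.abs().sort_values(ascending=False).index
ranked = corr_with_output.reindex(order)
print(ranked)
```

The sign shows the direction of the relationship and the absolute value its strength, so both positive and negative entries are worth reading.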

In [None]:
from numpy.random import seed
seed(0)
tf.random.set_seed(0)

In [None]:
#defining dependent and independent variables
x = ds.drop('output', axis=1)
y = ds['output']

In [None]:
#splitting data into training and testing set
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2, random_state = 0)

In [None]:
#applying feature scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
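#reuse the training-set mean and standard deviation; refitting on the test set would scale it differently
x_test = sc.transform(x_test)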
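
Scaling is usually fitted on the training set only and then applied unchanged to the test set. The sketch below, on small synthetic arrays (hypothetical values; `sc_demo` is a name introduced only for this example), shows how the scaled test values differ depending on whether the scaler reuses the training statistics or is refitted on the test set.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical single-feature data for illustration only
train = np.array([[50.0], [60.0], [70.0]])
test = np.array([[60.0], [80.0]])

sc_demo = StandardScaler()
train_scaled = sc_demo.fit_transform(train)  # learns mean and std from train
test_scaled = sc_demo.transform(test)        # reuses the training mean and std

# refitting on the test set uses the test set's own mean and std instead
refit = StandardScaler().fit_transform(test)

print(test_scaled.ravel())
print(refit.ravel())
```

With the training statistics, a test value equal to the training mean maps to 0; after refitting, the same value maps to -1, so the model would see the test data on a different scale from the one it was trained on.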

In [None]:
#CREATING THE ANN AS SEQUENCE OF LAYERS
ann = tf.keras.models.Sequential()

#ADDING FIRST HIDDEN LAYER WITH 24 NEURONS; THE INPUT LAYER WILL BE ADDED AUTOMATICALLY
ann.add(tf.keras.layers.Dense(units = 24,activation = 'relu'))

#ADDING 2ND HIDDEN LAYER WITH 24 NEURONS
ann.add(tf.keras.layers.Dense(units = 24,activation = 'relu'))

#ADDING 3RD HIDDEN LAYER WITH 12 NEURONS
ann.add(tf.keras.layers.Dense(units = 12,activation = 'relu'))


#ADDING OUTPUT LAYER WITH 1 NEURON , AS THIS IS A BINARY CLASSIFICATION
ann.add(tf.keras.layers.Dense(units = 1,activation = 'sigmoid'))

#COMPILING THE ANN WITH THE ADAM OPTIMIZER (AN ADAPTIVE VARIANT OF STOCHASTIC GRADIENT DESCENT)
ann.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])


#TRAINING THE ANN WITH BATCH SIZE OF 32 (THIS IS MINI-BATCH LEARNING)
ann.fit(x_train, y_train,batch_size = 32, epochs = 50)
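
To get a feel for the size of this network, the number of trainable parameters can be worked out by hand: a dense layer has one weight per input per neuron plus one bias per neuron. The sketch below assumes 13 input features (the 8 categorical and 5 numeric columns plotted earlier, which is an assumption about what remains after dropping `output`); `dense_params` is a helper written only for this example.

```python
def dense_params(n_inputs, units):
    """Weights (n_inputs * units) plus one bias per unit."""
    return n_inputs * units + units

n_features = 13                # assumption: 13 feature columns feed the network
layer_units = [24, 24, 12, 1]  # the three hidden layers and the output layer

per_layer = []
n_in = n_features
for units in layer_units:
    per_layer.append(dense_params(n_in, units))
    n_in = units               # this layer's outputs feed the next layer

total = sum(per_layer)
print(per_layer, total)
```

If the input really has 13 columns, the total should match the parameter count that `ann.summary()` reports once the model has been built.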

In [None]:
losses = pd.DataFrame(ann.history.history)
losses.plot()

In [None]:
#GETTING ACCURACY AND CONFUSION MATRIX
from sklearn.metrics import confusion_matrix,accuracy_score
y_pred = ann.predict(x_test)
y_pred  = y_pred > 0.5
#scikit-learn expects the true labels first and the predictions second
cm = confusion_matrix(y_test,y_pred)
ac = accuracy_score(y_test,y_pred)

sns.heatmap(cm,annot = True)
plt.title('CONFUSION MATRIX')
print('Accuracy - {0:.2f}%'.format(ac*100))
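
Accuracy alone hides which kind of mistake the model makes, so it can help to read precision and recall off the confusion matrix as well. The sketch below is illustrative only: the labels and probabilities are hypothetical, not model output, and `y_true`, `probs`, `y_hat` and `cm_demo` are names introduced for this example.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical labels and predicted probabilities for illustration only
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0])
probs = np.array([0.9, 0.8, 0.6, 0.3, 0.7, 0.6, 0.1, 0.4])

# same 0.5 threshold as in the cell above
y_hat = (probs > 0.5).astype(int)

# rows are true classes, columns are predicted classes
cm_demo = confusion_matrix(y_true, y_hat)
tn, fp, fn, tp = cm_demo.ravel()

precision = tp / (tp + fp)  # share of predicted high-risk that are truly high-risk
recall = tp / (tp + fn)     # share of truly high-risk patients that were found

print(cm_demo)
print(precision, recall)
```

For a screening task like this one, recall is often the number to watch, since a false negative means a high-risk patient was missed.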