## Introduction

A heart attack, or acute myocardial infarction, is the obstruction of the blood supply to the heart muscle.
A heart attack is a medical emergency. It usually happens when a blood clot blocks the blood supply to the heart. Without blood, the heart tissue does not receive oxygen and begins to die.

Motivation:
- Try to understand what causes heart attacks.
- Explore the data through EDA and data visualisation.
- Detect and extract relevant features in order to build a prediction model.
The plan:
- Libraries

- Load data and first look

- Data Analysis and feature selection 

- Model

## Libraries

In [None]:
import seaborn as sns
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

## Load data and first look

In [None]:
heart = pd.read_csv('../input/heart-attack-analysis-prediction-dataset/heart.csv')
heart.head()

In [None]:
heart.info()

- Age : Age of the patient

- Sex : Sex of the patient

- exang: exercise induced angina (1 = yes; 0 = no)

- ca: number of major vessels (0-3)

- cp : Chest pain type

        Value 1: typical angina
        Value 2: atypical angina
        Value 3: non-anginal pain
        Value 4: asymptomatic
- trtbps : resting blood pressure (in mm Hg)

- chol : cholesterol in mg/dl fetched via BMI sensor

- fbs : (fasting blood sugar > 120 mg/dl) (1 = true; 0 = false)

- restecg : resting electrocardiographic results

        Value 0: normal
        Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV)
        Value 2: showing probable or definite left ventricular hypertrophy by Estes' criteria
        
- thalach : maximum heart rate achieved

- output : 0 = less chance of heart attack, 1 = more chance of heart attack

From the DataFrame info we observe that all columns are numerical and there are no missing values, which makes our work easier.

In [None]:
heart.isnull().sum()

No missing values.

## Data Analysis and feature selection

In [None]:
sns.countplot(x="output", data=heart).set(xlabel='Chance of heart attack (0 = less, 1 = more)', title='Distribution of outputs', ylabel='Count')

The dataset is roughly balanced between the two classes.
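The countplot gives a visual check; the same check can be made numerically with `value_counts(normalize=True)`. A minimal sketch on hypothetical toy labels standing in for `heart['output']`; the 40% minority-class cut-off is an arbitrary assumption, not a standard rule.

```python
import pandas as pd

# Hypothetical toy labels standing in for heart['output']
output = pd.Series([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])

# Share of each class; values near 0.5 indicate a balanced binary target
class_share = output.value_counts(normalize=True).sort_index()
print(class_share)

# Assumed rule of thumb: call the target balanced if the minority class is at least 40%
is_balanced = class_share.min() >= 0.4
print(is_balanced)
```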

In [None]:
#Chest Pain type is a column with categorical variables
name=['typical_angina','atypical_angina','non_anginal_pain','asymptomatic']

cd=pd.get_dummies(heart['cp'])
cd.columns=name

In [None]:
#Rest_ecg is a column with categorical variables
name=['normal','having_abnormality','definite_left_ventricular_hypertrophy']

ad=pd.get_dummies(heart['restecg'])
ad.columns=name

In [None]:
#Drop the original categorical columns cp and restecg and append their one-hot encodings
heart=heart.drop(labels=['cp','restecg'],axis=1)
heart=pd.concat([heart,cd,ad],axis=1)
heart
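Assigning `cd.columns=name` relies on `pd.get_dummies` ordering its columns by sorted category code and on every code being present in the data. An alternative sketch maps codes to labels first and lets `get_dummies` name the columns, so each name is tied to its code explicitly. The toy DataFrame is hypothetical, and the code-to-label mapping assumes `cp` takes the values 0-3 and `restecg` 0-2, mirroring the order of the `name` lists above.

```python
import pandas as pd

# Hypothetical toy stand-in for the relevant columns of heart
df = pd.DataFrame({'cp': [0, 2, 1, 3, 0],
                   'restecg': [1, 0, 2, 1, 0],
                   'age': [63, 37, 41, 56, 57]})

# Assumed code-to-label mappings, following the order of the name lists above
cp_labels = {0: 'typical_angina', 1: 'atypical_angina',
             2: 'non_anginal_pain', 3: 'asymptomatic'}
restecg_labels = {0: 'normal', 1: 'having_abnormality',
                  2: 'definite_left_ventricular_hypertrophy'}

# Replace codes by labels, then one-hot encode; other columns are kept as-is
encoded = pd.get_dummies(
    df.assign(cp=df['cp'].map(cp_labels),
              restecg=df['restecg'].map(restecg_labels)),
    columns=['cp', 'restecg'],
    prefix=['cp', 'restecg'],
    dtype=int,
)
print(encoded.columns.tolist())
```

The `cp_`/`restecg_` prefixes also keep the encoded column names distinct if two categorical columns share a label.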

Now the dataset is ready for analysis.

Pearson correlation

In [None]:
corr=heart.corr(method ='pearson')

mask = np.triu(np.ones_like(corr, dtype=bool))

# Set up the matplotlib figure
f, ax = plt.subplots(figsize=(11, 9))

# Generate a custom diverging colormap
cmap = sns.diverging_palette(230, 20, as_cmap=True)
sns.heatmap(corr, mask=mask, cmap=cmap, vmax=.3, center=0,
            square=True, linewidths=.5, cbar_kws={"shrink": .5}).set(title='Pearson correlation of parameters')

Spearman correlation

In [None]:
corr=heart.corr(method ='spearman')

mask = np.triu(np.ones_like(corr, dtype=bool))

# Set up the matplotlib figure
f, ax = plt.subplots(figsize=(11, 9))

# Generate a custom diverging colormap
cmap = sns.diverging_palette(230, 20, as_cmap=True)
sns.heatmap(corr, mask=mask, cmap=cmap, vmax=.3, center=0,
            square=True, linewidths=.5, cbar_kws={"shrink": .5}).set(title='Spearman correlation of parameters')

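The two correlation heatmaps are very similar. Because Spearman correlation measures monotonic association while Pearson correlation measures linear association, this agreement suggests that the monotonic relationships in the data are approximately linear.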
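To see why comparing the two coefficients is informative, here is a small sketch on synthetic data (not the heart dataset): for a linear relationship both coefficients equal 1, while for a monotonic but strongly non-linear one Spearman stays at 1 and Pearson drops.

```python
import numpy as np
import pandas as pd

# Synthetic data, chosen only to contrast the two coefficients
x = np.arange(1, 11, dtype=float)
df = pd.DataFrame({
    'x': x,
    'linear': 2 * x + 1,       # linear, hence also monotonic
    'exponential': np.exp(x),  # monotonic but strongly non-linear
})

pearson = df.corr(method='pearson')['x']
spearman = df.corr(method='spearman')['x']
print(pd.DataFrame({'pearson': pearson, 'spearman': spearman}))
```

A large gap between the two matrices would point to monotonic but non-linear relationships; similar matrices are consistent with roughly linear ones.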

The correlation matrices also show that some parameters, such as age, sex, trtbps, chol and fbs, are only weakly correlated with output. These parameters therefore appear to carry little information about the chance of a heart attack, at least as measured by correlation.
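Rather than reading weak correlations off the heatmap, the correlations with the target can be ranked directly. A minimal sketch on hypothetical synthetic data with one informative and one noise column; the 0.1 cut-off is an illustrative assumption, not the threshold used in this notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic data: one feature tied to the target, one pure noise
rng = np.random.default_rng(0)
n = 300
output = rng.integers(0, 2, size=n)
df = pd.DataFrame({
    'informative': output + rng.normal(0, 0.5, size=n),
    'noise': rng.normal(0, 1, size=n),
    'output': output,
})

# Absolute correlation of every feature with the target, strongest first
target_corr = (
    df.corr(method='spearman')['output']
    .drop('output')
    .abs()
    .sort_values(ascending=False)
)
print(target_corr)

# Assumed cut-off: keep features whose |correlation| exceeds 0.1
threshold = 0.1
selected = target_corr[target_corr > threshold].index.tolist()
print(selected)
```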

## Predict probability of heart attack with NN

The previous analysis is used to choose the inputs of a neural network: only the parameters with the highest correlation with output are kept. The network is built and trained with TensorFlow and Keras.

In [None]:
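y = heart['output']
# Keep only the features most correlated with the output
X = heart.drop(labels=['age', 'sex', 'trtbps', 'chol', 'fbs', 'output'], axis=1)

In [None]:
# Split before scaling so the scaler only sees training data (avoids test-set leakage)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15,
                                                    stratify=y, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
X_train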
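`StandardScaler` rescales each column to $z = (x - \mu)/\sigma$, where $\mu$ and $\sigma$ are the column mean and (population) standard deviation of whatever data `fit` sees. That is why it should be fitted on the training split only and then applied unchanged to the test split. A minimal sketch on hypothetical one-feature toy data checking the scaler against the formula:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data with a single feature; rows are samples
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
X_test = np.array([[10.0]])

scaler = StandardScaler().fit(X_train)  # learns mean_ and scale_ from training rows only

# Manual z-score with the training statistics (numpy std defaults to ddof=0, as StandardScaler)
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)
manual_test = (X_test - mean) / std

print(scaler.mean_, scaler.scale_)
print(scaler.transform(X_test), manual_test)
```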

In [None]:
model = keras.Sequential([
    keras.layers.Dense(64, activation='relu',input_dim=X_train.shape[1]),
    keras.layers.Dense(32, activation='relu'),
    keras.layers.Dense(16, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid'),
])

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

callbacks = [
    keras.callbacks.ModelCheckpoint('./hearts_attack_model.h5', save_best_only=True)
]
model.fit(X_train,y_train,epochs=200,batch_size = 32,validation_split = 0.2,callbacks=callbacks)

In [None]:
model = keras.models.load_model('./hearts_attack_model.h5')
y_predict=(model.predict(X_test) > 0.5).astype("int32")
cf_matrix = confusion_matrix(y_test, y_predict)
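# confusion_matrix puts actual classes on rows (y axis) and predicted classes on columns (x axis)
sns.heatmap(cf_matrix, annot=True).set(xlabel='Predicted values', ylabel='Actual values')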
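The confusion matrix can be summarised with the usual metrics. A minimal sketch on hypothetical labels and sigmoid outputs standing in for `y_test` and `model.predict(X_test)`, using the same 0.5 threshold as the cell above:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels and predicted probabilities (not results from this notebook)
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
probs = np.array([0.9, 0.2, 0.4, 0.8, 0.6, 0.1, 0.7, 0.3])
y_pred = (probs > 0.5).astype('int32')  # same 0.5 decision threshold

cm = confusion_matrix(y_true, y_pred)  # rows = actual class, columns = predicted class
tn, fp, fn, tp = cm.ravel()            # binary layout: [[tn, fp], [fn, tp]]

accuracy = (tp + tn) / cm.sum()
precision = tp / (tp + fp)  # share of predicted positives that are correct
recall = tp / (tp + fn)     # share of actual positives the model catches
print(cm)
print(accuracy, precision, recall)
```

For a medical screening task, recall (missed heart-attack cases are `fn`) is often the metric worth watching alongside accuracy.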