# <center> <div class="alert alert-block alert-info">  <span style="color:crimson;"> Heart Attack Prediction  </center>

![Heart Attack](http://www.enloe.org/media/Image/heart-attack.jpg)

**Import Libraries**

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import cufflinks as cf
from scipy import stats
import plotly.express as px
import matplotlib.pyplot as plt

from plotly.offline import download_plotlyjs,init_notebook_mode,plot,iplot
init_notebook_mode(connected=True)
cf.go_offline()

# <center> <div class="alert alert-block alert-info">  <span style="color:crimson;">  Exploratory Data Analysis  </center>

In [None]:
df = pd.read_csv('../input/heart-attack-analysis-prediction-dataset/heart.csv')
df.head(n=5)

In [None]:
df.shape

**Descriptive Statistics**

In [None]:
df.describe()

**Observations**

* The average age is about 54 and the maximum is 77; the patients are mostly older adults and there are no children in this data.
* Resting blood pressure ranges from 94 to 200 mm Hg.
* Maximum heart rate achieved ranges from 71 to 202.
* Cholesterol ranges from 126 to 564 mg/dl.

In [None]:
df.info()

# Data Cleaning

**Missing values**

In [None]:
df.isnull().sum()

**Repeated values**

In [None]:
df.duplicated().sum()

In [None]:
duplicate = df[df.duplicated()]
duplicate

In [None]:
df1 = df.drop_duplicates()

In [None]:
df1.duplicated().sum()
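A minimal sketch of the duplicate check above, using a toy frame rather than rows from `heart.csv` (the values are made up for illustration). Passing `keep=False` to `duplicated` shows every copy of a repeated row, which can make it easier to confirm that the rows really are identical before dropping them.

```python
import pandas as pd

# Toy data for illustration only; not taken from heart.csv.
toy = pd.DataFrame({"age": [38, 38, 54], "chol": [175, 175, 239]})

# keep=False marks every member of a duplicate group, not just the later copies.
dupes = toy[toy.duplicated(keep=False)]

# drop_duplicates keeps the first occurrence; reset_index tidies the row labels.
deduped = toy.drop_duplicates().reset_index(drop=True)

print(dupes)
print(deduped.shape)
```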

**Data Dictionary**

* The dataset has 303 rows and 14 columns (before duplicates are removed). Column descriptions are below:


* **age** : age of the patient
* **sex** : sex of the patient (1 = male; 0 = female)
* **exng** : exercise-induced angina (1 = yes; 0 = no), i.e. whether chest pain occurs after exercise
* **caa** : number of major vessels (0-3)
* **cp** : chest pain type

    Value 1: typical angina
    
    Value 2: atypical angina
    
    Value 3: non-anginal pain
    
    Value 4: asymptomatic
* **trtbps** : resting blood pressure (in mm Hg)
* **chol** : cholesterol in mg/dl fetched via BMI sensor
* **fbs** : fasting blood sugar > 120 mg/dl (1 = true; 0 = false)
* **oldpeak** : ST depression induced by exercise relative to rest
* **restecg** : resting electrocardiographic results

    Value 0: normal
    
    Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV)
    
    Value 2: showing probable or definite left ventricular hypertrophy by Estes' criteria
    
* **slp**: slope - the slope of the peak exercise ST segment (2 = upsloping; 1 = flat; 0 = downsloping)
* **thalachh** : maximum heart rate achieved
* **thall** : 2 = normal; 1 = fixed defect; 3 = reversible defect
* **output** : 0 = less chance of heart attack; 1 = more chance of heart attack

# Data Visualization

In [None]:
fig = px.histogram(df1, x ='sex',color='sex' , barmode = 'group')
fig.show()

* After duplicates are removed, the data contains 96 females and 206 males.

In [None]:
fig = px.histogram(df1, x ='output',color='sex' , barmode = 'group')
fig.show()

* A smaller share of males are in the higher-risk class (output = 1).
* A larger share of females are in the higher-risk class.
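Because there are roughly twice as many males as females in the data, raw bar heights can be hard to compare. One way to check the two observations above is a row-normalised cross-tabulation. This is a sketch on toy data; the helper name `class_share_by_group` is hypothetical, and in the notebook it would be called as `class_share_by_group(df1, 'sex', 'output')`.

```python
import pandas as pd

def class_share_by_group(frame, group_col, target_col):
    """Return the share of each target class within each group (rows sum to 1)."""
    return pd.crosstab(frame[group_col], frame[target_col], normalize="index")

# Toy data for illustration only; not taken from heart.csv.
toy = pd.DataFrame({
    "sex":    [0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
    "output": [1, 1, 1, 0, 1, 1, 0, 0, 0, 0],
})

shares = class_share_by_group(toy, "sex", "output")
print(shares)
```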

In [None]:
px.histogram(df1,x='age',color='output',pattern_shape='sex')

* The most common age bin in the data is 58-59.
* The 52-53 age bin has the largest number of patients in the higher-risk class (output = 1).
* Within that 52-53 bin, males account for most of the higher-risk patients.

In [None]:
px.histogram(df1, x ='exng',color='output' , barmode = 'relative')

* Patients with exercise-induced chest pain (exng = 1) are less often in the higher-risk class (output = 1).

In [None]:
px.histogram(df1,x='cp',color='output',barmode='group')

* Patients with the typical angina chest pain type are less often in the higher-risk class.
* Patients with the non-anginal pain type are more often in the higher-risk class.

In [None]:
px.histogram(df1,x='chol',color='output')

In [None]:
px.histogram(df1,x='thalachh',color='output')

* Patients with thalachh in the 160-164 and 170-174 bins are most often in the higher-risk class.

In [None]:
px.histogram(df1,x='thall',color='output')

* Patients with thall type 3 are more often in the higher-risk class.

In [None]:
px.histogram(df1,x='trtbps',color='output')

In [None]:
# Use the de-duplicated frame (df1), consistent with the other plots
px.scatter(df1,x='age',y='chol',color='cp',size='cp',hover_data=['trtbps','thall','thalachh','output'])

In [None]:
px.box(df1,points='all',color='output')

* chol has outliers; they are easier to see in a violin plot.

In [None]:
px.violin(df1,x='chol',box=True,points='all',color='sex')

In [None]:
px.violin(df1,x='thalachh',box=True,points='all',color='sex')

In [None]:
px.imshow(df1.corr())

* cp and thalachh are somewhat positively correlated with output.
* Many of the pairwise correlations are negative.
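The heatmap can be complemented by listing each feature's correlation with the target, ordered by strength. The sketch below uses a toy frame; the helper name `rank_by_correlation` is hypothetical, and in the notebook it would be called as `rank_by_correlation(df1, 'output')`.

```python
import pandas as pd

def rank_by_correlation(frame, target_col):
    """Pearson correlation of every column with target_col, strongest first."""
    corr = frame.corr()[target_col].drop(target_col)
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)  # signs are kept; only the ordering uses |r|

# Toy data for illustration only; not taken from heart.csv.
toy = pd.DataFrame({
    "a": [1, 2, 3, 4, 5, 6],
    "b": [6, 5, 4, 3, 2, 1],
    "c": [1, 0, 1, 0, 1, 0],
    "output": [0, 0, 0, 1, 1, 1],
})

ranked = rank_by_correlation(toy, "output")
print(ranked)
```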

**Outliers removal**

In [None]:
zscore = np.abs(stats.zscore(df1))
print(zscore)

In [None]:
threshold = 4
# Row and column positions of values whose absolute z-score exceeds the threshold
print(np.where(zscore > threshold))

In [None]:
# Keep only rows whose absolute z-score is below the threshold in every column
df_clean = df1[(zscore < threshold).all(axis=1)]

In [None]:
df1.shape,df_clean.shape

In [None]:
px.violin(df_clean,x='chol',box=True,points='all',color='sex')

In [None]:
px.violin(df_clean,x='thalachh',box=True,points='all',color='sex')

* The extreme outliers have been removed.
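The filtering steps above can be wrapped in one reusable function. This is a sketch that computes the z-scores with pandas instead of `scipy.stats.zscore`; using `ddof=0` is intended to match scipy's default. The function name `drop_zscore_outliers` and the toy values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def drop_zscore_outliers(frame, threshold=4.0):
    """Keep rows whose absolute z-score is below `threshold` in every column."""
    # Population standard deviation (ddof=0), as in scipy.stats.zscore by default.
    z = (frame - frame.mean()) / frame.std(ddof=0)
    keep = (z.abs() < threshold).all(axis=1)
    return frame[keep]

# Toy data for illustration only: 30 ordinary chol values plus one extreme value.
toy = pd.DataFrame({
    "age": np.arange(40, 71),
    "chol": [190, 210] * 15 + [600],
})

cleaned = drop_zscore_outliers(toy, threshold=4.0)
print(toy.shape, cleaned.shape)
```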

# <center> <div class="alert alert-block alert-info">  <span style="color:crimson;"> Data Preprocessing </center>

In [None]:
# Separate the features from the target; the 'slp' column is also dropped from the features
df2 = df_clean.drop(['output','slp'],axis=1)
data_target = df_clean['output']

In [None]:
data_dummies=df2[['sex','cp','fbs','restecg','exng','caa','thall']]
data_dummies= pd.get_dummies(data_dummies,columns=['sex','cp','fbs','restecg','exng','caa','thall'])

In [None]:
# Keep only the numeric columns; the categorical ones are re-attached as dummies below
data = df2.drop(['sex','cp','fbs','restecg','exng','caa','thall'],axis=1)

In [None]:
data=data.merge(data_dummies,left_index=True, right_index=True,how='left')
data.head()

In [None]:
data.shape
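The split, encode, and merge steps above can also be done in a single call, since `pd.get_dummies` leaves the columns not listed in `columns` untouched. A sketch on a toy frame (the values are made up for illustration); in the notebook the equivalent call would be `pd.get_dummies(df2, columns=['sex','cp','fbs','restecg','exng','caa','thall'])`.

```python
import pandas as pd

# Toy data for illustration only; not taken from heart.csv.
toy = pd.DataFrame({
    "age": [61, 45, 52, 58],
    "cp":  [3, 2, 1, 1],
    "sex": [1, 1, 0, 1],
})

categorical = ["cp", "sex"]

# Columns not listed in `columns` are passed through; the encoded ones are appended.
encoded = pd.get_dummies(toy, columns=categorical)
print(encoded.columns.tolist())
```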

In [None]:
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test = train_test_split(data,data_target,test_size=0.3,random_state=42)

In [None]:
from sklearn.preprocessing import MinMaxScaler
scaler=MinMaxScaler()
x_train=scaler.fit_transform(x_train)
x_test=scaler.transform(x_test)
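The cell above fits the scaler on the training split only, which keeps test data out of the scaling step. If cross-validation is used later (for example inside a grid search), one option is to put the scaler in a pipeline so that it is refit on each training fold. The sketch below assumes synthetic data from `make_classification` as a stand-in for the heart data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data; not the heart dataset.
X, y = make_classification(n_samples=200, n_features=8, random_state=42)

# The scaler is fit inside each training fold, so validation folds never leak into it.
pipe = make_pipeline(MinMaxScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(pipe, X, y, cv=5)

print(scores.mean())
```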

# <center> <div class="alert alert-block alert-info">  <span style="color:crimson;"> Models </center>

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score,confusion_matrix

**LogisticRegression**

In [None]:
log_reg = LogisticRegression()
log_reg.fit(x_train,y_train)

log_acc=accuracy_score(y_test,log_reg.predict(x_test))

print("Train Set Accuracy:"+str(accuracy_score(y_train,log_reg.predict(x_train))*100))
print("Test Set Accuracy:"+str(accuracy_score(y_test,log_reg.predict(x_test))*100))

**DecisionTreeClassifier**

In [None]:
d_tree = DecisionTreeClassifier()
d_tree.fit(x_train,y_train)

d_acc=accuracy_score(y_test,d_tree.predict(x_test))

print("Train Set Accuracy:"+str(accuracy_score(y_train,d_tree.predict(x_train))*100))
print("Test Set Accuracy:"+str(accuracy_score(y_test,d_tree.predict(x_test))*100))

**RandomForestClassifier**

In [None]:
r_for = RandomForestClassifier()
r_for.fit(x_train,y_train)

r_acc=accuracy_score(y_test,r_for.predict(x_test))

print("Train Set Accuracy:"+str(accuracy_score(y_train,r_for.predict(x_train))*100))
print("Test Set Accuracy:"+str(accuracy_score(y_test,r_for.predict(x_test))*100))

**KNeighborsClassifier**

In [None]:
k_nei = KNeighborsClassifier()
k_nei.fit(x_train,y_train)

k_acc = accuracy_score(y_test,k_nei.predict(x_test))

print("Train set Accuracy:"+str(accuracy_score(y_train,k_nei.predict(x_train))*100))
print("Test Set Accuracy:"+str(accuracy_score(y_test,k_nei.predict(x_test))*100))

**Support vector classifier**

In [None]:
s_vec = SVC()
s_vec.fit(x_train,y_train)

s_acc = accuracy_score(y_test,s_vec.predict(x_test))

print("Train set Accuracy:"+str(accuracy_score(y_train,s_vec.predict(x_train))*100))
print("Test Set Accuracy:"+str(accuracy_score(y_test,s_vec.predict(x_test))*100))

**GaussianNB**

In [None]:
g_clf = GaussianNB()
g_clf.fit(x_train,y_train)

g_acc = accuracy_score(y_test,g_clf.predict(x_test))

print("Train set Accuracy:"+str(accuracy_score(y_train,g_clf.predict(x_train))*100))
print("Test Set Accuracy:"+str(accuracy_score(y_test,g_clf.predict(x_test))*100))
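The model cells above repeat the same fit, predict, and score pattern, which makes copy-paste slips easy. A loop over a dictionary of estimators is one way to reduce that risk. This is a sketch on synthetic data; the helper name `evaluate` is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

def evaluate(models, x_train, x_test, y_train, y_test):
    """Fit each estimator and return its train and test accuracy."""
    results = {}
    for name, estimator in models.items():
        estimator.fit(x_train, y_train)
        results[name] = {
            "train": accuracy_score(y_train, estimator.predict(x_train)),
            "test": accuracy_score(y_test, estimator.predict(x_test)),
        }
    return results

# Synthetic stand-in data; not the heart dataset.
X, y = make_classification(n_samples=200, n_features=8, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

models = {
    "Logistic": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "Gaussian": GaussianNB(),
}
results = evaluate(models, X_tr, X_te, y_tr, y_te)
for name, accs in results.items():
    print(name, accs)
```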

**Pre Pruning**

Try pre-pruning the decision tree to increase test accuracy and avoid overfitting.

In [None]:
from sklearn.model_selection import  GridSearchCV
params = {'max_depth': [2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34],
'min_samples_split': [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18],
'min_samples_leaf': [1,2,3,4,5,6,7,8,9,10,11,12,13,14]}
clf = DecisionTreeClassifier()
gcv = GridSearchCV(estimator=clf,param_grid=params)
gcv.fit(x_train,y_train)

In [None]:
# best_estimator_ has already been refit on x_train by GridSearchCV (refit=True is the default)
modelD = gcv.best_estimator_
y_train_pred = modelD.predict(x_train)
y_test_pred = modelD.predict(x_test)
p_acc = accuracy_score(y_test,y_test_pred)
print(f'Train score {accuracy_score(y_train,y_train_pred)*100}')
print(f'Test score {accuracy_score(y_test,y_test_pred)*100}')

* This is still not sufficient.
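When a grid search does not help, it can be useful to look at which parameters were selected and at the cross-validated score, not only at the test accuracy. The sketch below uses a much smaller grid than the one above and synthetic data, both chosen only to keep the example quick to run.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; not the heart dataset.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

# A deliberately small grid for illustration.
grid = {"max_depth": [2, 4, 6], "min_samples_leaf": [1, 5, 10]}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), grid, cv=5)
search.fit(X_tr, y_tr)

best = search.best_estimator_          # already refit on all of X_tr
test_acc = best.score(X_te, y_te)
print(search.best_params_)             # selected hyperparameters
print(search.best_score_, test_acc)    # mean CV accuracy vs. held-out accuracy
```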

# Ensemble Best Models
**Voting Classifier**


In [None]:
from sklearn.ensemble import VotingClassifier
model1 = LogisticRegression(random_state=1)
model2 = RandomForestClassifier(random_state=1)
model3 = SVC(random_state=1)
model = VotingClassifier(estimators=[('lr', model1), ('rf', model2),('sc',model3)], voting='hard')
model.fit(x_train,y_train)
model.score(x_test,y_test)
m_acc = accuracy_score(y_test,model.predict(x_test))

In [None]:
score = model.score(x_test,y_test)
train_scored = model.score(x_train,y_train)
y_predict=model.predict(x_test)
m_acc = accuracy_score(y_test,y_predict)
print("VotingClassifier Train Score:",train_scored)
print("VotingClassifier Test Score:",score)
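With `voting='hard'`, the ensemble predicts the class that most of its member models predict. A small worked example with NumPy shows the idea for three voters and binary labels; the predictions below are hypothetical and do not come from the fitted models.

```python
import numpy as np

# Hypothetical predictions from three classifiers for five samples.
preds = np.array([
    [1, 0, 1, 1, 0],  # e.g. the 'lr' model
    [1, 1, 0, 1, 0],  # e.g. the 'rf' model
    [0, 0, 1, 1, 1],  # e.g. the 'sc' model
])

# With binary labels and three voters, class 1 wins when it gets at least 2 votes.
majority = (preds.sum(axis=0) >= 2).astype(int)
print(majority)
```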

In [None]:
models = pd.DataFrame({
    'Model': ['Logistic','KNN', 'SVC',  'Decision Tree Classifier',
             'Random Forest Classifier',  'Gaussian','Voting Classifier','Prepruning'],
    'Score': [ log_acc,k_acc, s_acc, d_acc, r_acc, g_acc,m_acc,p_acc]
})

models.sort_values(by = 'Score', ascending = False)

In [None]:
px.bar(models,x='Model',y='Score',color='Model')

In [None]:
y_predict=model.predict(x_test)
# scikit-learn expects the true labels first, then the predictions
conf_mat = confusion_matrix(y_test,y_predict)

In [None]:
from mlxtend.plotting import plot_confusion_matrix
 
fig, ax = plot_confusion_matrix(conf_mat=conf_mat, figsize=(6, 6), cmap=plt.cm.Greens)
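For a binary problem the confusion matrix can also be unpacked into its four counts, from which precision and recall follow directly. This is a sketch with made-up labels; note that `confusion_matrix` takes the true labels as its first argument, so rows correspond to true classes and columns to predicted classes.

```python
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration only.
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 1, 0, 1, 0]

cm = confusion_matrix(y_true, y_pred)

# For binary labels, ravel() yields the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = cm.ravel()

precision = tp / (tp + fp)  # share of predicted positives that are correct
recall = tp / (tp + fn)     # share of actual positives that are found
print(tn, fp, fn, tp, precision, recall)
```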

**<center> Any Suggestions are accepted </center>**

# <center> <div class="alert alert-block alert-info">  <span style="color:crimson;"> Done </center>