# Heart Attack Prediction 

**Information about Data**

**The columns are:**
1. (age) Age of the patient.
1. (sex) Sex, encoded numerically: 1 is male and 0 is female.
1. (cp) Chest pain type, with values between 0 and 3.
 These are the types of angina described in the research paper. The higher the number, the lower the odds of a heart attack.
   - Value 1: typical angina
   - Value 2: atypical angina
   - Value 3: non-anginal pain
1. (trestbps) Resting blood pressure, measured with no exercise.
1. (chol) Serum cholesterol level in mg/dl. High cholesterol is associated with blockage of the blood supply in the blood vessels.
1. (fbs) Fasting blood sugar > 120 mg/dl (1 = true; 0 = false).
 This is the blood sugar measured after a long gap between a meal and the test. Typically, it is taken before any meal in the morning.
1. (restecg) Resting ECG results: ECG values taken while the person is at rest, with no exercise and the heart functioning normally.
1. (thalach) Maximum heart rate achieved.
1. (exang) Exercise-induced angina (1 = yes; 0 = no).
 This is chest pain while exercising or doing any physical activity.
1. (oldpeak) ST depression: the difference between the ECG value at rest and after exercise.
1. (slope) The slope of the peak exercise ST segment.
     - Value 1: upsloping
     - Value 2: flat
     - Value 3: downsloping
1. (ca) The number of major blood vessels (0-3) supplying blood to the heart that are blocked.
1. (thal) The type of thalassemia
    (3 = normal;
    6 = fixed defect;
    7 = reversible defect).
1. (target) The predicted attribute: diagnosis of heart disease (angiographic disease status).
      - Value 0: < 50% diameter narrowing
      - Value 1: > 50% diameter narrowing

**Heart attack prediction: 1 denotes that a heart attack occurred and 0 that it did not.**


# Important libraries

In [None]:
#For uploading and accessing the data
import numpy as np
import pandas as pd
#For visualizations
import seaborn as sns
%matplotlib inline
import matplotlib.pyplot as plt
#for splitting the data
from sklearn.model_selection import train_test_split
#for calculating mean_squared_error and mean_absolute_error
from sklearn.metrics import mean_squared_error , mean_absolute_error
#for fitting models
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
#for calculating the confusion_matrix
from sklearn.metrics import confusion_matrix

# Read Data

In [None]:
data = pd.read_csv('../input/heart-disease-uci/heart.csv' , sep=',' , encoding='utf8')

In [None]:
#show information about data
data.info()

In [None]:
data.shape

* The data contains 303 rows and 14 columns (features).
* No null values were found.
* All columns are integers except oldpeak, which is a float.
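
The three observations above can also be checked programmatically instead of being read off `data.info()`. The sketch below uses a small hypothetical frame (`demo`, with made-up rows and only three of the columns) rather than the real `heart.csv`, so the values it prints are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for heart.csv: three columns, four made-up rows.
demo = pd.DataFrame({
    "age": [63, 37, 41, 56],
    "chol": [233, 250, 204, 236],
    "oldpeak": [2.3, 3.5, 1.4, 0.8],
})

# Shape of the frame: (rows, columns).
n_rows, n_cols = demo.shape

# Total number of missing values across all columns.
null_total = int(demo.isnull().sum().sum())

# Names of the columns stored as floats; the rest are integers here.
float_cols = demo.select_dtypes(include=["float64"]).columns.tolist()

print(n_rows, n_cols, null_total, float_cols)
```

Running the same three expressions on `data` should reproduce the bullet points above.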

In [None]:
#show the first 5 rows of the data
data.head()

In [None]:
data.tail()

**Renaming the column headers for better understanding of visualizations.**

In [None]:
data.rename(columns = {'age':'Age','sex':'Gender','cp':'Chest_pain' ,'trestbps':'Resting_blood_pressure','chol':'Cholesterol','fbs':'Fasting_blood_sugar',
                    'restecg':'ECG_results','thalach':'Maximum_heart_rate','exang':'Exercise_induced_angina','oldpeak':'ST_depression','ca':'Major_vessels',
                   'thal':'Thalassemia_types','target':'Heart_attack','slope':'ST_slope'} , inplace = True)

In [None]:
#show heading of columns
data.head()

In [None]:
data.info()

In [None]:
#show the number of null values per column
data.isnull().sum()
#again, no null values are found

In [None]:
#show all statistics
data.describe(include='all')

In [None]:
#replace numeric codes with labels to make the data easier to read
#data['Gender'].replace({1:'Male' , 0:'Female'},inplace = True)

In [None]:
#replace numeric codes with labels to make the data easier to read
#data['Heart_attack'].replace({1:'Heart_attack-Yes' , 0:'Heart_attack-No'} ,inplace = True)

In [None]:
data.head()

In [None]:
data.tail()

In [None]:
#GAH refers to Gender, Age and Heart_attack
#number of records for each gender and age (count() tallies rows, whatever the Heart_attack value)
GAH = data.groupby(['Gender','Age'])['Heart_attack'].count().reset_index().sort_values(by='Heart_attack',ascending=False)
GAH.head(20).style.background_gradient(cmap='Purples')

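* 1: Male
* 0: Female
* Among the 20 largest gender and age groups, males dominate. Because `count()` tallies every record in a group, these figures are the number of patients per gender and age, not only those with a heart attack.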
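
One point worth making explicit about the grouped tables in this notebook: `count()` on the `Heart_attack` column returns the number of rows in each group, whether the value is 0 or 1. A minimal sketch on hypothetical toy records (not the real data) shows how `sum` and `mean` give the number and share of positive cases instead:

```python
import pandas as pd

# Hypothetical toy records; column names follow the renamed notebook columns.
toy = pd.DataFrame({
    "Gender":       [1, 1, 1, 0, 0, 0],
    "Heart_attack": [1, 0, 0, 1, 1, 0],
})

summary = (
    toy.groupby("Gender")["Heart_attack"]
       .agg(records="count",  # rows in the group, regardless of outcome
            cases="sum",      # rows where Heart_attack == 1
            rate="mean")      # share of the group with Heart_attack == 1
       .reset_index()
)
print(summary)
```

In this toy frame both genders have 3 records, but the number of cases differs, which a `count()` table alone cannot show.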

In [None]:
#GcH refers to Gender, Chest_pain and Heart_attack
#number of records for each Gender and Chest_pain group
GcH = data.groupby(['Gender' , 'Chest_pain'])['Heart_attack'].count().reset_index().sort_values(by='Heart_attack' , ascending=False)
GcH.head(20).style.background_gradient(cmap='Blues')

* The higher the Chest_pain value, the lower the odds of a heart attack.
* A large number of males have a heart attack with a low Chest_pain value.
* Only 19 men have a heart attack with a high Chest_pain value.
* Only 4 women have a heart attack with a high Chest_pain value.
* 39 women have a heart attack with a low Chest_pain value.

In [None]:
data.columns

In [None]:
data.Chest_pain

In [None]:
#GRH refers to Gender, Resting_blood_pressure and Heart_attack
#number of records for each Gender and Resting_blood_pressure group
GRH = data.groupby(['Gender' , 'Resting_blood_pressure'])['Heart_attack'].count().reset_index().sort_values(by='Heart_attack' , ascending=False)
GRH.head(8).style.background_gradient(cmap='coolwarm')

16 males have a Resting_blood_pressure of 110, while 12 females have a Resting_blood_pressure of 130.

In [None]:
#GCH refers to Gender, Cholesterol and Heart_attack
#number of records for each Gender and Cholesterol group
GCH = data.groupby(['Gender' , 'Cholesterol'])['Heart_attack'].count().reset_index().sort_values(by='Heart_attack' , ascending=False)
GCH.head(10).style.background_gradient(cmap='OrRd')

* The count of males at a high Cholesterol value is small (Cholesterol 212: 5).
* For females, Cholesterol was high and the number of people with a heart attack was smaller.

In [None]:
#GFH refers to Gender, Fasting_blood_sugar and Heart_attack
#number of records for each Gender and Fasting_blood_sugar group
GFH = data.groupby(['Gender' , 'Fasting_blood_sugar'])['Heart_attack'].count().reset_index().sort_values(by='Heart_attack' , ascending=False)
GFH.style.background_gradient(cmap='YlGn')

When Fasting_blood_sugar was False, the heart attack count was higher for both males and females.

In [None]:
data.Fasting_blood_sugar.shape

In [None]:
#GECH refers to Gender, ECG_results and Heart_attack
#number of records for each Gender and ECG_results group
GECH = data.groupby(['Gender' , 'ECG_results'])['Heart_attack'].count().reset_index().sort_values(by='Heart_attack' , ascending=False)
GECH.style.background_gradient(cmap='bone')

* When the ECG_results value is small, the heart attack count is higher.
* The count is higher for males than for females.

In [None]:
#GEH refers to Gender, Maximum_heart_rate and Heart_attack
#number of records for each Gender and Maximum_heart_rate group
GEH = data.groupby(['Gender' , 'Maximum_heart_rate'])['Heart_attack'].count().reset_index().sort_values(by='Heart_attack' , ascending=False)
GEH.head(10).style.background_gradient(cmap='summer')

Maximum heart rates were higher for males, which resulted in heart attacks.

In [None]:
#GEiH refers to Gender, Exercise_induced_angina and Heart_attack
#number of records for each Gender and Exercise_induced_angina group
GEiH = data.groupby(['Gender' , 'Exercise_induced_angina'])['Heart_attack'].count().reset_index().sort_values(by='Heart_attack' , ascending=False)
GEiH.style.background_gradient(cmap='cool')

Exercise-induced chest pain was more common in males, and more of those cases resulted in heart attacks.

In [None]:
#GSH refers to Gender, ST_depression and Heart_attack
#number of records for each Gender and ST_depression group
GSH = data.groupby(['Gender' , 'ST_depression'])['Heart_attack'].count().reset_index().sort_values(by='Heart_attack' , ascending=False)
GSH.head(10).style.background_gradient(cmap='Oranges')

The lower the ST depression, the higher the number of heart attack cases.

In [None]:
data.ST_depression.head()

In [None]:
#GSlH refers to Gender, ST_slope and Heart_attack
#number of records for each Gender and ST_slope group
GSlH = data.groupby(['Gender' , 'ST_slope'])['Heart_attack'].count().reset_index().sort_values(by='Heart_attack' , ascending=False)
GSlH.head(10).style.background_gradient(cmap='afmhot')

* The lower ST_slope (2 less than 1 less than 0), the higher the cases were for heart attack.

* The higher the slope value, the higher were the cases for Heart attack

In [None]:
#GMH refers to Gender, Major_vessels and Heart_attack
#number of records for each Gender and Major_vessels group
GMH = data.groupby(['Gender' , 'Major_vessels'])['Heart_attack'].count().reset_index().sort_values(by='Heart_attack' , ascending=False)
GMH.head(10).style.background_gradient(cmap='afmhot')

* The lower the number of blocked vessels, the higher the number of heart attack cases.
* This suggests that 0 means all 4 major blood vessels were blocked
* and 4 means all vessels were free for flow.
* The count is higher for males than for females.

In [None]:
#GTH refers to Gender, Thalassemia_types and Heart_attack
#number of records for each Gender and Thalassemia_types group
GTH = data.groupby(['Gender' , 'Thalassemia_types'])['Heart_attack'].count().reset_index().sort_values(by='Heart_attack' , ascending=False)
GTH.head(20).style.background_gradient(cmap='GnBu')

The higher the Thalassemia type, the higher were the cases of heart attack.

In [None]:
data.columns   

In [None]:
data_corr=data.corr().style.background_gradient(cmap='plasma')
data_corr

In [None]:
#correlation matrix with all data
plt.figure(figsize=(11,7))
sns.heatmap(cbar=True,annot=True,fmt=".0%",data=data.corr(),cmap='copper')
plt.title('Correlation Matrix')
plt.show()

In [None]:
#correlation matrix with the important data
#data.drop(['Cholesterol' , 'Resting_blood_pressure', 'ECG_results' , 'Fasting_blood_sugar' , 'ST_slope']
  #       ,axis = 'columns' , inplace = True)
plt.figure(figsize=(10,6))
sns.heatmap(data.corr() , annot=True,fmt=".0%",linewidth=0.5, cmap='PuRd')#linewidth=0.5 sets the width of the lines between cells

In [None]:
mask = np.zeros_like(data.corr())
mask[np.triu_indices_from(mask)] = True
with sns.axes_style("ticks"):
    f, ax = plt.subplots(figsize=(9, 5))
    ax = sns.heatmap(data.corr(), mask=mask, vmax=.3,annot=True,fmt=".0%",linewidth=0.5,square=False)
    #annot is numbers in squares
    #square=True the square is small but square=False the square is large
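
The upper-triangle mask used in the cell above can also be built directly as a boolean array, which states the intent a little more plainly. A minimal sketch on random, hypothetical data (`demo` is not the heart dataset), without the plotting step:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 50 rows, 4 numeric columns.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(50, 4)), columns=list("abcd"))
corr = demo.corr()

# True marks the cells to hide: the diagonal and everything above it.
mask = np.triu(np.ones(corr.shape, dtype=bool))

# The cells that stay visible are the strictly lower triangle.
visible = corr.mask(mask)
print(visible.round(2))
```

The same `mask` can be passed to `sns.heatmap(corr, mask=mask, ...)` so that each pair of columns appears only once.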

In [None]:
#Show histogram for Age 
sns.histplot(data['Age'], kde=True, stat='density')

In [None]:
#Show histogram for Gender 
sns.histplot(data['Gender'])

In [None]:
#Show histogram for Chest_pain
sns.histplot(data['Chest_pain'])

In [None]:
sns.relplot(x ='Age', y ='Chest_pain', col = 'Gender', data =data, color = 'crimson')

In [None]:
#Show histogram for Maximum_heart_rate
sns.histplot(data['Maximum_heart_rate'], kde=True, stat='density')

In [None]:
sns.relplot(x ='Age', y ='Maximum_heart_rate', col = 'Gender', data =data, color = 'orange')

In [None]:
#Show histogram for Exercise_induced_angina
sns.histplot(data['Exercise_induced_angina'])

In [None]:
sns.relplot(x ='Age', y ='Exercise_induced_angina', col = 'Gender', data =data, color = 'teal')

In [None]:
#Show histogram for ST_depression
sns.histplot(data['ST_depression'], kde=True, stat='density')

In [None]:
sns.relplot(x ='Age', y ='ST_depression', col = 'Gender', data =data, color = 'crimson')

In [None]:
#Show histogram for Major_vessels
sns.histplot(data['Major_vessels'])

In [None]:
sns.relplot(x ='Age', y ='Major_vessels', col = 'Gender', data =data, color = 'crimson')

In [None]:
#Show histogram for Thalassemia_types
sns.histplot(data['Thalassemia_types'])

In [None]:
sns.relplot(x ='Age', y ='Thalassemia_types', col = 'Gender', data =data, color = 'crimson')

In [None]:
#Show histogram for Heart_attack
sns.histplot(data['Heart_attack'])

In [None]:
sns.set_style('whitegrid') #styling using sns
sns.countplot(x='Heart_attack',hue='Gender',data=data,palette='Blues') #palette is also styling parameter
#Insight from the graph:
#more males than females suffer from heart disease

In [None]:
data.columns

In [None]:
data.shape

In [None]:
data.plot(kind='density' , subplots=True , layout=(4,4) , sharex=False ,
          fontsize=8 , figsize=(10,10))
plt.tight_layout()

In [None]:
sns.lmplot(x='Age' , y='ST_depression' , data=data , hue='Heart_attack' , fit_reg=False , height=5)
plt.show()

In [None]:
sns.lmplot(x='Age' , y='Major_vessels' , data=data , hue='Heart_attack' , fit_reg=False , height=5)
plt.show()

In [None]:
sns.pairplot(data,height=5)

In [None]:
data.head()

In [None]:
#columns that correlate strongly with the target (Heart_attack)
#data=data[['Age', 'Gender', 'Chest_pain', 'Maximum_heart_rate',
  #     'Exercise_induced_angina', 'ST_slope', 'Major_vessels',
   #    'Thalassemia_types', 'Heart_attack']]
data = data[['Chest_pain','Maximum_heart_rate','Resting_blood_pressure' ,'Fasting_blood_sugar','Cholesterol','ST_slope' , 'ECG_results' ,'Heart_attack']]

In [None]:
data.columns

In [None]:
#show correlation
mask = np.zeros_like(data.corr())
mask[np.triu_indices_from(mask)] = True
with sns.axes_style("ticks"):
    f, ax = plt.subplots(figsize=(9, 5))
    ax = sns.heatmap(data.corr(), mask=mask, vmax=.3,annot=True,fmt=".0%",linewidth=0.5,square=False)
    #annot is numbers in squares
    #square=True the square is small but square=False the square is large

In [None]:
#show all statistics
data.describe(include='all')

In [None]:
#important data
data

In [None]:
#show outliers
sns.boxplot(x=data['Chest_pain'])

In [None]:
data.plot(kind='box' , subplots=True , layout=(4,4) , sharex=False ,
          fontsize=8 , figsize=(10,10))
plt.tight_layout()

In [None]:
plt.scatter(data['Maximum_heart_rate'] , data['Heart_attack'] , color='blue')

In [None]:
#plt.scatter(data['Major_vessels'] , data['Heart_attack'] , color='blue')


In [None]:
#data[data['Maximum_heart_rate']<75]
#data[data['Major_vessels']>=3.5]

In [None]:
#print outliers
print('outliers' , data[(data['Cholesterol']>500)]['Cholesterol'].count())

In [None]:
#print outliers
print('outliers' , data[(data['Resting_blood_pressure']>180)]['Resting_blood_pressure'].count())
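
The thresholds in the two cells above (500 for cholesterol, 180 for resting blood pressure) are chosen by hand. A common alternative is the 1.5 x IQR rule that box plots use for their whiskers; this is an assumption added for illustration, not something the notebook applies. A minimal sketch on hypothetical cholesterol values:

```python
import pandas as pd

# Hypothetical cholesterol readings with one extreme value.
chol = pd.Series([180, 200, 210, 220, 230, 240, 250, 260, 564], name="Cholesterol")

# Interquartile range and the usual 1.5 * IQR fences.
q1 = chol.quantile(0.25)
q3 = chol.quantile(0.75)
iqr = q3 - q1
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Values outside the fences are flagged as outliers.
outliers = chol[(chol < lower) | (chol > upper)]
print("fences:", lower, upper)
print("outliers:", outliers.tolist())
```

Applied to `data['Cholesterol']`, the same lines would give data-driven fences to compare with the fixed cut-off of 500.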

In [None]:
data.columns

In [None]:
#print outliers
#print('outliers' , data[(data['Major_vessels']>=3.5)]['Major_vessels'].count())
#print('outliers' , data[(data['Thalassemia_types']<=0)]['Thalassemia_types'].count())

In [None]:
data.shape

# Splitting Data

In [None]:
#Splitting Data
#X = clean_data.drop(['Heart_attack'] , axis=1).values
X = data.drop(['Heart_attack'] , axis=1).values
#Y = clean_data['Heart_attack'].values
Y = data['Heart_attack'].values
#split data 70% for train and 30% for test
x_train , x_test , y_train , y_test = train_test_split(X,Y , test_size=0.30 ,random_state=40 )
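
With only about 300 rows, a random 70/30 split can leave the two classes in slightly different proportions in the train and test sets. `train_test_split` accepts a `stratify` argument that keeps the class ratio the same in both parts. The sketch below is an optional variation on hypothetical arrays (`X_demo`, `y_demo`), not the split used in this notebook:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 20 samples, 2 features, perfectly balanced labels.
X_demo = np.arange(40).reshape(20, 2)
y_demo = np.array([0] * 10 + [1] * 10)

# stratify=y_demo preserves the 50/50 class ratio in both subsets.
x_tr, x_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.30, random_state=40, stratify=y_demo
)

print(len(x_tr), len(x_te))        # sizes of the two subsets
print(y_tr.mean(), y_te.mean())    # share of class 1 in each subset
```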

# Normalization (MinMaxScaler)

In [None]:
#Import MinMaxScaler
from sklearn.preprocessing import MinMaxScaler
#Create a MinMaxScaler object
s = MinMaxScaler()
#fit the scaler on the training data and transform it
x_train_scaled = s.fit_transform(x_train)
#transform the test data with the scaler fitted on the training data
x_test_scaled = s.transform(x_test)
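
MinMaxScaler maps each feature to (x - min) / (max - min), where min and max come from the data it was fitted on. Since the scaler is fitted on the training set only, test values outside the training range land outside [0, 1]. A small worked sketch with hypothetical numbers:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical single feature: training range is 100..200.
train_col = np.array([[100.0], [150.0], [200.0]])
test_col = np.array([[125.0], [220.0]])

scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(train_col)  # learns min=100, max=200
test_scaled = scaler.transform(test_col)        # reuses the training min and max

# Same formula by hand: (x - min) / (max - min)
manual = (test_col - 100.0) / (200.0 - 100.0)

print(train_scaled.ravel())
print(test_scaled.ravel())
print(manual.ravel())
```

The test value 220 scales to 1.2, which is expected and not an error: refitting the scaler on the test set would leak information from it.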

# Fit Model With Scaled Data

**Linear Regression**

In [None]:
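#Linear Regression
#the features are already scaled by MinMaxScaler, so no extra normalization is needed
#(the normalize argument was removed in scikit-learn 1.2)
reg = LinearRegression(fit_intercept=True)
#fitting Model
reg.fit(x_train_scaled,y_train)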


In [None]:
reg.get_params()

In [None]:
#Coefficients (one weight per feature)
reg.coef_

In [None]:
#Intercept
reg.intercept_

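* The fitted model is a linear equation with one coefficient per scaled feature (the values in `reg.coef_`) plus the intercept `reg.intercept_`.
* The best-fit line reported for this data is Y = 0.497618 x + (-0.05118767185421935).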
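
To make the role of the coefficients and the intercept concrete, the sketch below rebuilds the predictions of a fitted `LinearRegression` by hand as `X @ coef_ + intercept_`. It uses random hypothetical data (`X_demo`, `y_demo`), not the heart dataset. The 0.5 threshold at the end is an added assumption for turning the continuous output into a 0/1 label; the notebook itself does not do this.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: 30 samples, 3 features in [0, 1], binary target.
rng = np.random.default_rng(0)
X_demo = rng.random((30, 3))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 1.0).astype(int)

model = LinearRegression().fit(X_demo, y_demo)

# One coefficient per feature, plus a single intercept.
manual = X_demo @ model.coef_ + model.intercept_

# The manual formula matches model.predict.
print(np.allclose(manual, model.predict(X_demo)))

# Assumed decision rule: predict class 1 when the output is at least 0.5.
labels = (manual >= 0.5).astype(int)
print(labels[:10])
```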

In [None]:
#prediction
y_pred_reg = reg.predict(x_test_scaled)

In [None]:
#Training score
Train_score = reg.score(x_train_scaled,y_train)
#Testing score
Test_score = reg.score(x_test_scaled,y_test)

In [None]:
Train_score

In [None]:
Test_score

In [None]:
#R^2 score for the model (this is not classification accuracy)
from sklearn.metrics import r2_score
r2 = r2_score(y_test , y_pred_reg)
r2

# MSE and MAE

In [None]:
MSE = mean_squared_error(y_test, y_pred_reg)
MSE

In [None]:
RMSE = np.sqrt(MSE)
RMSE

In [None]:
MAE = mean_absolute_error(y_test, y_pred_reg)
MAE
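
For reference, the three error measures above can be written out by hand: MSE is the mean of the squared errors, RMSE is its square root, and MAE is the mean of the absolute errors. A small worked sketch on hypothetical values (`y_true`, `y_hat`), checked against the scikit-learn functions:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error

# Hypothetical true labels and continuous predictions.
y_true = np.array([1.0, 0.0, 1.0, 1.0])
y_hat = np.array([0.8, 0.3, 0.4, 1.1])

errors = y_true - y_hat            # 0.2, -0.3, 0.6, -0.1

mse = np.mean(errors ** 2)         # (0.04 + 0.09 + 0.36 + 0.01) / 4
rmse = np.sqrt(mse)
mae = np.mean(np.abs(errors))      # (0.2 + 0.3 + 0.6 + 0.1) / 4

print(mse, rmse, mae)
print(mean_squared_error(y_true, y_hat), mean_absolute_error(y_true, y_hat))
```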

**LogisticRegression**

In [None]:
#LogisticRegression: compare train and test accuracy for several values of C
a_index=[0.001 , 0.01 , 0.1 , 1 , 10 , 100 , 1000]
#plain lists are used because Series.append was removed in pandas 2.0
train=[]
test=[]
for i in a_index:
    classifier=LogisticRegression(C=i)
    classifier.fit(x_train_scaled,y_train)
    y_pred_LR=classifier.predict(x_test_scaled)
    train.append(classifier.score(x_train_scaled,y_train))
    test.append(classifier.score(x_test_scaled,y_test))

plt.plot(a_index, train, label='train')
plt.plot(a_index, test, label='test')
#C spans several orders of magnitude, so a log scale keeps the points readable
plt.xscale('log')
plt.xticks(a_index)
plt.legend()
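
The loop above compares the values of C on a single train/test split. An alternative, shown here only as a sketch and not as the method used in this notebook, is to choose C by cross-validation with `GridSearchCV`. The data below is synthetic (`make_classification`) and `max_iter=1000` is an assumed setting to avoid convergence warnings.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data: 200 samples, 7 features.
X_demo, y_demo = make_classification(n_samples=200, n_features=7, random_state=0)

# Same grid of C values as in the loop above.
c_grid = [0.001, 0.01, 0.1, 1, 10, 100, 1000]

# 5-fold cross-validation over the grid; scoring defaults to accuracy.
search = GridSearchCV(LogisticRegression(max_iter=1000), {"C": c_grid}, cv=5)
search.fit(X_demo, y_demo)

print(search.best_params_)
print(round(search.best_score_, 3))
```

In the notebook this would be fitted on `x_train_scaled` and `y_train`, keeping the test set for the final evaluation only.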

In [None]:
#confusion_matrix expects the true labels first, then the predictions
cm=confusion_matrix(y_test,y_pred_LR)
cm
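
The order of the arguments to `confusion_matrix` matters: the signature is `confusion_matrix(y_true, y_pred)`, with rows for the true class and columns for the predicted class. Swapping the two transposes the matrix. A short sketch on hypothetical labels, which also derives accuracy, precision and recall from the four cells:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 3 true negatives-class samples, 5 positive-class samples.
y_true = np.array([0, 0, 0, 1, 1, 1, 1, 1])
y_hat = np.array([0, 1, 1, 1, 1, 0, 1, 1])

cm_demo = confusion_matrix(y_true, y_hat)      # rows = true, columns = predicted
cm_swapped = confusion_matrix(y_hat, y_true)   # arguments swapped: transposed

tn, fp, fn, tp = cm_demo.ravel()
accuracy = (tp + tn) / cm_demo.sum()
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print(cm_demo)
print(cm_swapped)
print(accuracy, precision, recall)
```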

In [None]:
#Support Vector Machine (Classifier)
#Fitting Model
svm = SVC(kernel="rbf")
svm.fit(x_train_scaled,y_train)

y_pred_SVM=svm.predict(x_test_scaled)

#Training Score
Train_Score = svm.score(x_train_scaled,y_train)
#Test Score
test_score = svm.score(x_test_scaled,y_test)
#Confusion matrix (true labels first, then predictions)
cm=confusion_matrix(y_test,y_pred_SVM)

In [None]:
y_pred_SVM

In [None]:
Train_Score

In [None]:
test_score

In [None]:
cm

In [None]:
#GaussianNB
#fit Model Gaussian
gnb = GaussianNB()
gnb.fit(x_train_scaled,y_train)
#Training Score
GN_train_score = gnb.score(x_train_scaled,y_train)
#Test Score
GN_test_score = gnb.score(x_test_scaled,y_test)
#prediction (use the fitted GaussianNB model, not the logistic regression classifier)
y_pred_GN=gnb.predict(x_test_scaled)

In [None]:
y_pred_GN

In [None]:
GN_train_score

In [None]:
GN_test_score

In [None]:
cm=confusion_matrix(y_test,y_pred_GN)
cm

**DecisionTreeClassifier**

In [None]:
from sklearn.tree import DecisionTreeClassifier

In [None]:
dt = DecisionTreeClassifier(max_depth=4)
dt.fit(x_train_scaled,y_train)

In [None]:
y_pred=dt.predict(x_test_scaled)

In [None]:
y_pred

In [None]:
#Training Score
dt.score(x_train_scaled,y_train)

In [None]:
#Test Score
dt.score(x_test_scaled,y_test)

In [None]:
from sklearn.ensemble import RandomForestClassifier

In [None]:
rf=RandomForestClassifier(max_depth=4 ,max_features=5 )
rf.fit(x_train_scaled,y_train)

In [None]:
#Training Score
rf.score(x_train_scaled,y_train)

In [None]:
#Test Score
rf.score(x_test_scaled,y_test)
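
The train and test scores of the classifiers above are spread over many cells. One way to see them side by side is to fit the models in a loop and collect the scores in a single table. The sketch below does this on synthetic data (`make_classification`), so its numbers say nothing about the heart dataset; the `random_state` values are added assumptions to make the run repeatable.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the heart data: 300 rows, 7 features.
X_demo, y_demo = make_classification(n_samples=300, n_features=7, random_state=0)
x_tr, x_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.30, random_state=40)

# Scale with statistics from the training set only.
scaler = MinMaxScaler()
x_tr_s = scaler.fit_transform(x_tr)
x_te_s = scaler.transform(x_te)

# Same model families and main settings as in the cells above.
models = {
    "LogisticRegression": LogisticRegression(),
    "SVC": SVC(kernel="rbf"),
    "GaussianNB": GaussianNB(),
    "DecisionTree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "RandomForest": RandomForestClassifier(max_depth=4, max_features=5, random_state=0),
}

rows = []
for name, model in models.items():
    model.fit(x_tr_s, y_tr)
    rows.append({
        "model": name,
        "train_score": model.score(x_tr_s, y_tr),  # accuracy on training data
        "test_score": model.score(x_te_s, y_te),   # accuracy on held-out data
    })

results = pd.DataFrame(rows).sort_values("test_score", ascending=False)
print(results)
```

A large gap between `train_score` and `test_score` for a model is a sign of overfitting, which is easier to spot in one table than across separate cells.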