# Heart Attack Prediction

In this project, we predict whether a person is prone to a heart attack. Before modeling, we do some exploratory data analysis (EDA).

**Importing Libraries**

In [None]:
# EDA
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

# Modeling and Prediction 
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score

In [None]:
df = pd.read_csv("/kaggle/input/heart-attack-analysis-prediction-dataset/heart.csv")

In [None]:
df.head()

**Exploratory Data Analysis**

In [None]:
df.shape

In [None]:
df.info()

In [None]:
df.describe()

In [None]:
df.isnull().sum()

In [None]:
sns.set_style("darkgrid")
sns.countplot(x='output',data=df, palette = 'Pastel1')

In [None]:
df['output'].value_counts()

The dataset contains 165 people who are prone to heart attacks (output = 1) and 138 people who are not (output = 0).

In [None]:
sns.set_style("darkgrid")
sns.countplot(x='output',hue='sex',data=df, palette = 'Pastel1')

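From the plot above, we can observe that:
* In absolute numbers, more people with sex = 1 fall in the output = 1 group than people with sex = 0. Because the plot shows counts rather than rates, this alone does not establish that sex = 1 is more prone to a heart attack.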
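A count plot shows absolute counts, so a group that is larger overall can dominate the bars even if its rate is lower. One way to check this is to compare row-normalized proportions. The sketch below uses a small made-up frame; `toy` and its values are hypothetical and are not the real dataset.

```python
import pandas as pd

# Toy stand-in for df (hypothetical values, NOT the heart dataset)
toy = pd.DataFrame({
    "sex":    [1, 1, 1, 1, 1, 1, 0, 0, 0, 0],
    "output": [1, 1, 1, 0, 0, 0, 1, 1, 1, 0],
})

# Absolute counts per (sex, output) pair: what a count plot displays
counts = pd.crosstab(toy["sex"], toy["output"])

# Share of each output value within each sex group (every row sums to 1)
rates = pd.crosstab(toy["sex"], toy["output"], normalize="index")

print(counts)
print(rates)
```

In this toy frame both groups have the same count for output = 1, yet their rates differ. On the notebook's data the equivalent call would be `pd.crosstab(df['sex'], df['output'], normalize='index')`.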

In [None]:
sns.boxplot(x='output',y='age',data=df,palette='Pastel1')

In [None]:
sns.pairplot(df, hue = 'output')

In [None]:
df.corr()

In [None]:
plt.figure(figsize=(14,10))
sns.heatmap(df.corr(), annot=True)

**Splitting and Scaling the Data**

In [None]:
X = df.drop(['output'], axis = 1)
Y = df['output']

In [None]:
# Train-test split: 70% train, 30% test
X_train, X_test, y_train, y_test = train_test_split(X,Y,test_size=0.3, 
                                                    random_state=101)

In [None]:
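# Scaling the data
scaler = StandardScaler()
# Fit the scaler on the training data only
X_train = scaler.fit_transform(X_train)
# Apply the training mean and standard deviation to the test data
X_test = scaler.transform(X_test)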
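A common convention is to fit the scaler on the training split only and then apply that same transformation to the test split, so that no test-set statistics enter preprocessing. The sketch below illustrates this on synthetic arrays; every name ending in `_demo` is hypothetical and the data are random, not the heart data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in features (hypothetical, not the heart dataset)
X_train_demo = rng.normal(loc=50.0, scale=10.0, size=(70, 3))
X_test_demo = rng.normal(loc=50.0, scale=10.0, size=(30, 3))

scaler_demo = StandardScaler()

# Learn the per-column mean and standard deviation from the training split
X_train_scaled = scaler_demo.fit_transform(X_train_demo)

# Reuse those training statistics on the test split (no refitting)
X_test_scaled = scaler_demo.transform(X_test_demo)

print(X_train_scaled.mean(axis=0))  # approximately zero by construction
print(X_test_scaled.mean(axis=0))   # near zero, but generally not exactly zero
```

The test split's scaled mean is usually not exactly zero, which is expected: it is measured against the training distribution, as unseen data would be.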

**Models and Accuracy**

We will be using:
* Logistic Regression
* Decision Tree
* Random Forest 
* KNN
* SVM

In [None]:
def models(X_train,y_train):
    
    #Logistic Regression
    log = LogisticRegression(random_state=0)
    log.fit(X_train, y_train)
    
    #Decision Tree
    decision_tree = DecisionTreeClassifier(criterion = 'entropy', random_state = 0)
    decision_tree.fit(X_train, y_train)
    
    #Random Forest
    random_forest = RandomForestClassifier(n_estimators=10,criterion = 'entropy', random_state=0)
    random_forest.fit(X_train, y_train)
    
    #KNN
    knn = KNeighborsClassifier(n_neighbors = 3)
    knn.fit(X_train, y_train)
    
    #SVM
    svm = SVC(kernel='linear', C=1, random_state=101).fit(X_train,y_train)
    
    #Model Accuracy on Training Data
    print('[0]Logistic Regression Training Acc:', log.score(X_train,y_train))
    print('[1]Decision Tree Training Acc:', decision_tree.score(X_train,y_train))
    print('[2]Random Forest Training Acc:', random_forest.score(X_train,y_train))
    print('[3]KNN Training Acc:', knn.score(X_train,y_train))
    print('[4]SVM Training Acc:', svm.score(X_train,y_train))
    
    return log, decision_tree, random_forest, knn, svm

In [None]:
model = models(X_train,y_train)

In [None]:
# Accuracy on Testing Data

for i in range(len(model)):
    print('Model ', i)
    cm = confusion_matrix(y_test, model[i].predict(X_test))

    # scikit-learn layout: rows are true labels, columns are predicted labels
    tn = cm[0][0]
    fp = cm[0][1]
    fn = cm[1][0]
    tp = cm[1][1]

    print(cm)
    print('Testing Acc = ', (tp + tn)/(tp + tn + fn + fp))
    print()
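For binary labels 0 and 1, scikit-learn's `confusion_matrix` puts true labels on the rows and predicted labels on the columns, so the matrix reads `[[TN, FP], [FN, TP]]`. The sketch below shows this layout on a few hand-written labels (`y_true_demo` and `y_pred_demo` are hypothetical) and checks the manual accuracy against `accuracy_score`.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hand-written labels for illustration only (hypothetical)
y_true_demo = np.array([0, 0, 0, 1, 1, 1, 1, 0])
y_pred_demo = np.array([0, 1, 0, 1, 1, 0, 1, 0])

cm_demo = confusion_matrix(y_true_demo, y_pred_demo)

# Row-major unpacking of the 2x2 matrix: [[TN, FP], [FN, TP]]
tn, fp, fn, tp = cm_demo.ravel()

manual_acc = (tp + tn) / (tp + tn + fp + fn)
library_acc = accuracy_score(y_true_demo, y_pred_demo)

# Precision and recall depend on which cells are TP/FP/FN, unlike accuracy
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print(cm_demo)
print(manual_acc, library_acc, precision, recall)
```

Accuracy only sums the diagonal, so it is unaffected by swapped cell names. Precision and recall are not, which is why the layout matters once those metrics are added.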

**The testing accuracies are:**
* Logistic Regression: 0.86
* Decision Tree: 0.86
* Random Forest: 0.78
* KNN: 0.86
* SVM: 0.87

SVM has the highest testing accuracy (0.87), narrowly ahead of Logistic Regression, Decision Tree, and KNN (0.86 each).
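Gaps of about 0.01 on a single test split can come from the particular split. One way to check whether a ranking is stable is k-fold cross-validation, with scaling done inside a pipeline so that each fold fits its own scaler. The sketch below runs on synthetic data from `make_classification`; `X_demo`, `y_demo`, and `candidates` are hypothetical stand-ins, and the printed scores say nothing about the heart dataset.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (hypothetical, not the heart dataset)
X_demo, y_demo = make_classification(n_samples=300, n_features=13, random_state=0)

# Two of the notebook's models, with the same hyperparameters
candidates = {
    "Logistic Regression": LogisticRegression(random_state=0),
    "SVM": SVC(kernel="linear", C=1, random_state=101),
}

cv_results = {}
for name, estimator in candidates.items():
    # Scaling inside the pipeline is refit on the training part of each fold
    pipeline = make_pipeline(StandardScaler(), estimator)
    scores = cross_val_score(pipeline, X_demo, y_demo, cv=5, scoring="accuracy")
    cv_results[name] = scores
    print(f"{name}: mean={scores.mean():.3f}, std={scores.std():.3f}")
```

If the spread across folds is larger than the gap between two models' means, the ranking from a single split should be read with caution.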