# Heart Attack Prediction

In this notebook, we try to predict whether a person has a higher or lower chance of a heart attack.

Dataset: https://www.kaggle.com/nareshbhat/health-care-data-set-on-heart-attack-possibility


## Importing Libraries

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns
import matplotlib.pyplot as plt

## Importing dataset

In [None]:

import os

# List the files available in the Kaggle input directory
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Load the heart attack dataset into a DataFrame
data = pd.read_csv('../input/health-care-data-set-on-heart-attack-possibility/heart.csv')

In [None]:
data

In [None]:
data.info()
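
`data.info()` reports dtypes and non-null counts. A quick follow-up is to count missing values and duplicated rows explicitly. The following is a minimal sketch: the helper name `summarize_quality` and the toy frame are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd


def summarize_quality(df: pd.DataFrame) -> dict:
    """Return basic data-quality counts for a DataFrame."""
    return {
        "rows": len(df),
        "missing_per_column": df.isna().sum().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
    }


# Toy stand-in for the notebook's `data` (values are made up for illustration).
toy = pd.DataFrame(
    {
        "age": [63, 37, 37, None],
        "chol": [233, 250, 250, 204],
        "target": [1, 1, 1, 0],
    }
)
report = summarize_quality(toy)
print(report)
```

In the notebook itself this would be called as `summarize_quality(data)`.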

### Variable description

1. age
2. sex
3. cp: chest pain type (4 values)
4. trestbps : resting blood pressure
5. chol : serum cholesterol in mg/dl
6. fbs : fasting blood sugar > 120 mg/dl (1 = true; 0 = false)
7. restecg : resting electrocardiographic results (values 0,1,2)
8. thalach : maximum heart rate achieved
9. exang : exercise induced angina
10. oldpeak : ST depression induced by exercise relative to rest
11. slope : the slope of the peak exercise ST segment
12. ca : number of major vessels (0-3) colored by fluoroscopy
13. thal : 0 = normal; 1 = fixed defect; 2 = reversible defect
14. target : 0 = less chance of heart attack; 1 = more chance of heart attack

## EDA

In [None]:
sns.countplot(x = 'target', data = data)

In [None]:
sns.countplot(x = 'sex', hue = 'target', data = data)

In [None]:
sns.countplot(x = 'cp', hue = 'target', data = data)

In [None]:
sns.countplot(x = 'fbs', hue = 'target', data = data)

In [None]:
sns.countplot(x = 'restecg', hue = 'target', data = data)

In [None]:
sns.countplot(x = 'exang', hue = 'target', data = data)

In [None]:
sns.countplot(x = 'slope', hue = 'target', data = data)

In [None]:
sns.countplot(x = 'ca', hue = 'target', data = data)

In [None]:
sns.countplot(x = 'thal', hue = 'target', data = data)
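
The count plots above show how the target is distributed within each category. The same relationship can be read as numbers with a row-normalised cross-tabulation. This is a minimal sketch: `target_rate_by` is a hypothetical helper name and the toy frame contains made-up values.

```python
import pandas as pd


def target_rate_by(df: pd.DataFrame, column: str, target: str = "target") -> pd.DataFrame:
    """Share of each target class within every category of `column`."""
    # normalize="index" makes each row sum to 1
    return pd.crosstab(df[column], df[target], normalize="index")


# Toy stand-in for the notebook's `data` (made-up values).
toy = pd.DataFrame({"sex": [0, 0, 1, 1, 1, 1], "target": [1, 1, 1, 0, 0, 0]})
rates = target_rate_by(toy, "sex")
print(rates)
```

In the notebook this could be applied to any of the categorical columns, for example `target_rate_by(data, "cp")`.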

In [None]:
plt.hist(data["age"], bins = 30)

In [None]:
plt.hist(data["trestbps"], bins = 30)

In [None]:
plt.hist(data["chol"], bins = 50)

In [None]:
plt.hist(data["thalach"], bins = 30)

In [None]:
# Use a descriptive name instead of `list`, which would shadow the Python built-in
columns = ["age", "sex", "trestbps", "chol", "thalach", "oldpeak", "cp", "fbs", "restecg", "exang", "slope", "ca", "thal", "target"]
plt.figure(figsize=(15, 10))
sns.heatmap(data[columns].corr(), annot=True, fmt=".2f")
plt.show()
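
The heatmap shows every pairwise correlation. To focus on the relationship with `target`, the features can be ranked by the absolute value of their correlation with it. This is a minimal sketch on synthetic data: the helper `rank_by_target_correlation` and the columns `strong` and `noise` are assumptions for illustration only.

```python
import numpy as np
import pandas as pd


def rank_by_target_correlation(df: pd.DataFrame, target: str = "target") -> pd.Series:
    """Pearson correlation of each feature with the target, strongest first."""
    corr = df.corr()[target].drop(target)
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


# Synthetic stand-in: one feature tied to the target, one pure noise.
rng = np.random.default_rng(0)
signal = rng.normal(size=200)
toy = pd.DataFrame(
    {
        "strong": signal + rng.normal(scale=0.1, size=200),
        "noise": rng.normal(size=200),
        "target": (signal > 0).astype(int),
    }
)
ranking = rank_by_target_correlation(toy)
print(ranking)
```

Correlation only captures linear association, so a low rank here does not prove that a feature is useless to a model.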

## Train-Test Split

In [None]:
x = data.drop(["target"], axis=1)
y = data["target"]

In [None]:
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
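
`train_test_split` shuffles randomly unless `random_state` is set, so scores can differ between runs. The `stratify` argument additionally keeps the class ratio the same in both parts. Here is a minimal sketch on synthetic arrays; the seed 42 and the toy data are arbitrary assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 100 samples, 40% class 0 and 60% class 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.array([0] * 40 + [1] * 60)

# Fixed seed for reproducibility, stratified to preserve the class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print("train class-1 share:", y_train.mean())
print("test class-1 share:", y_test.mean())
```

With the notebook's variables, the equivalent call would pass `x`, `y` and `stratify=y`.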



## Model Training

In [None]:
from sklearn.metrics import confusion_matrix, accuracy_score, roc_curve, classification_report
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

In [None]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)

In [None]:
loreg = LogisticRegression(C = 10)

loreg.fit(x_train, y_train)

y_pred = loreg.predict(x_test)
print(confusion_matrix(y_test, y_pred))
print(accuracy_score(y_test, y_pred))
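
Accuracy alone hides how the errors are distributed between the two classes. Per-class precision and recall, and the ROC AUC computed from predicted probabilities, give a fuller picture. This is a minimal sketch that uses `make_classification` as a synthetic stand-in for the heart dataset, so the numbers it prints say nothing about the real data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with 13 features, mirroring the number of predictors.
X, y = make_classification(n_samples=300, n_features=13, n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0, stratify=y)

# Fit the scaler on the training part only, then apply it to the test part.
scaler = StandardScaler()
X_tr = scaler.fit_transform(X_tr)
X_te = scaler.transform(X_te)

model = LogisticRegression(C=10).fit(X_tr, y_tr)
y_pred = model.predict(X_te)
y_score = model.predict_proba(X_te)[:, 1]  # probability of class 1

report = classification_report(y_te, y_pred)
auc = roc_auc_score(y_te, y_score)
print(report)
print(f"ROC AUC: {auc:.3f}")
```

In the notebook, the same two calls could be made with `y_test`, `y_pred` and `loreg.predict_proba(x_test)[:, 1]`.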


In [None]:
GNB = GaussianNB()

GNB.fit(x_train, y_train)

y_pred = GNB.predict(x_test)
print(confusion_matrix(y_test, y_pred))
print(accuracy_score(y_test, y_pred))


In [None]:
knn = KNeighborsClassifier(n_neighbors=10)
knn.fit(x_train, y_train)

y_pred = knn.predict(x_test)
print(confusion_matrix(y_test, y_pred))
print(accuracy_score(y_test, y_pred))
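
A single train/test split gives one accuracy figure per model, which can be noisy on a small dataset. Cross-validation averages over several splits and gives a steadier comparison. The sketch below wraps each model in a pipeline so the scaler is refit inside every fold. It runs on synthetic data from `make_classification`, and the 5-fold setting and seed are assumptions, not choices taken from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the notebook's x and y.
X, y = make_classification(n_samples=300, n_features=13, n_informative=5, random_state=0)

# Same three models and hyperparameters as in the cells above.
models = {
    "LogisticRegression": LogisticRegression(C=10),
    "GaussianNB": GaussianNB(),
    "KNeighbors": KNeighborsClassifier(n_neighbors=10),
}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for name, model in models.items():
    # Scaling happens inside the pipeline, so each fold scales on its own training part.
    pipe = make_pipeline(StandardScaler(), model)
    scores = cross_val_score(pipe, X, y, cv=cv, scoring="accuracy")
    results[name] = (scores.mean(), scores.std())
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

To apply this to the real data, `X` and `y` would be replaced by the unscaled `x` and `y` from the split section.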

## Observation

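Gaussian Naive Bayes gives the best accuracy of the three models on this test split. Because the split is random and the test set is only 20% of the data, the ranking may change between runs.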