# Heart Attack Prediction - Finding the Best Predictive Model

### **Table of Contents**
* [<span style="color:#A690A4"> 0. Executive Summary</span>](#exe_sum)
* [<span style="color:#A690A4"> 1. Introduction</span>](#intro)
* [<span style="color:#A690A4"> 2. Collect, Wrangle & Explore</span>](#process)
* [<span style="color:#A690A4"> 3. Predict Heart Attacks</span>](#predict)

# <span style="color:#5E6997">Executive Summary</span> <a class="anchor" id="exe_sum"></a>

# <span style="color:#5E6997">Introduction</span> <a class="anchor" id="intro"></a>

# <span style="color:#5E6997">Collect, Wrangle, and Explore</span> <a class="anchor" id="process"></a>

This section loads the dataset, cleans it, and prepares it for modeling.

The data is also explored visually in this [Tableau chart](https://public.tableau.com/views/HeartAttackAnalysis_17120977864890/Sheet1?:language=en-US&:sid=&:display_count=n&:origin=viz_share_link).

In [None]:
import seaborn as sns
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler 
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
import xgboost as xgb
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestRegressor
from sklearn.naive_bayes import BernoulliNB
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix
import warnings

In [None]:
df = pd.read_csv('/kaggle/input/heart-attack-analysis-prediction-dataset/heart.csv')
df.head()

#### Description of Data

<ul>
    <li><b>age</b> - age in years</li>
    <li><b>sex</b> - sex (1 = male; 0 = female)</li>
    <li><b>cp</b> - chest pain type 
        <ul>
            <li><b>0</b> - Asymptomatic</li>
            <li><b>1</b> - Typical Angina</li>
            <li><b>2</b> - Atypical Angina</li>
            <li><b>3</b> - Non-Anginal Pain</li>
        </ul>
    </li>
    <li><b>trestbps</b> - resting blood pressure (in mm Hg on admission to the hospital)</li>
    <li><b>chol</b> - serum cholesterol in mg/dl</li>
    <li><b>fbs</b> - fasting blood sugar > 120 mg/dl (1 = true; 0 = false)</li>
    <li><b>restecg</b> - resting electrocardiographic results
        <ul>
            <li><b>0</b> - Hypertrophy</li>
            <li><b>1</b> - Normal</li>
            <li><b>2</b> - Having ST-T wave abnormality</li>
        </ul>
    </li>
    <li><b>thalach</b> - maximum heart rate achieved</li>
    <li><b>exang</b> - exercise induced angina (1 = yes; 0 = no)</li>
    <li><b>oldpeak</b> - ST depression induced by exercise relative to rest</li>
    <li><b>slope</b> - the slope of the peak exercise ST segment</li>
    <li><b>ca</b> - number of major vessels (0-3) colored by fluoroscopy</li>
    <li><b>thal</b> - coded as follows:
        <ul>
            <li><b>1</b> - Fixed defect</li>
            <li><b>2</b> - Normal</li>
            <li><b>3</b> - Reversible defect</li>
        </ul>
    </li>
    <li><b>output</b> - the predicted attribute - diagnosis of heart disease (angiographic disease status) (Value 0 = less than 50% diameter narrowing; Value 1 = greater than 50% diameter narrowing)</li>

</ul>

In [None]:
df.info()

In [None]:
df.describe()

Check for any NULL values.

In [None]:
df.isnull().sum()

Check for duplicates.

In [None]:
df.duplicated().sum()

In [None]:
df.shape

In [None]:
df = df.drop_duplicates()

In [None]:
df.shape
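As a small illustration of what the duplicate check and removal do, here is a sketch on a toy frame. The values are made up for the example and are not taken from `heart.csv`.

```python
import pandas as pd

# Toy frame (hypothetical values): the first and third rows are identical.
toy = pd.DataFrame({
    "age": [63, 37, 63, 41],
    "chol": [233, 250, 233, 204],
    "output": [1, 1, 1, 0],
})

# duplicated() flags only the later repeats, not the first occurrence.
n_duplicates = int(toy.duplicated().sum())

# drop_duplicates() keeps the first occurrence of each repeated row.
deduped = toy.drop_duplicates()

print(n_duplicates)
print(deduped.shape)
```

Note that `drop_duplicates()` keeps the original index labels, so the index may have gaps afterwards.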

One-hot encode the 'cp' column.

In [None]:
df = pd.get_dummies(df, columns=['cp'], prefix=['cp'], dtype=int)
df.head()
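To make the effect of the encoding concrete, the sketch below applies the same `pd.get_dummies` call to a toy frame (hypothetical values). It also shows `drop_first=True`, an optional variant that is not used in this notebook but can help linear models by removing one redundant indicator column.

```python
import pandas as pd

# Toy frame with hypothetical values; 'cp' takes the four chest-pain codes.
toy = pd.DataFrame({"cp": [0, 2, 1, 3, 2], "output": [0, 1, 1, 1, 0]})

# Same call as in the notebook: one 0/1 indicator column per 'cp' value.
encoded = pd.get_dummies(toy, columns=["cp"], prefix=["cp"], dtype=int)
print(encoded.columns.tolist())

# Optional variant (an assumption, not the notebook's choice):
# drop the first level so the indicators are not perfectly collinear.
encoded_k_minus_1 = pd.get_dummies(
    toy, columns=["cp"], prefix=["cp"], dtype=int, drop_first=True
)
print(encoded_k_minus_1.columns.tolist())
```

Each row has exactly one of the `cp_*` indicators set to 1 in the full encoding.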

Create the correlation matrix.

In [None]:
correlation_matrix = df.corr()

In [None]:
plt.figure(figsize=(8, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt=".1f", square=True)
plt.title("Correlation Heatmap")
plt.show()

In [None]:
correlation_matrix['output'].sort_values(ascending=False)

Create input (X) and output (y) data.

In [None]:
X = df.drop(labels=["output"], axis=1)
y = df.output

Split the data into training and testing sets.

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size= 0.2, random_state= 0)
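With a small dataset, a random split can leave the two classes in slightly different proportions in the training and testing sets. The sketch below shows the optional `stratify` argument of `train_test_split`, which preserves the class ratio. It uses synthetic data (the names `X_demo` and `y_demo` are made up for the example) and is a variant, not what the notebook does above.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 300 rows, 55% positive labels.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 5))
y_demo = np.array([1] * 165 + [0] * 135)

# stratify=y_demo keeps the class ratio the same in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0, stratify=y_demo
)

print("train positive rate:", y_tr.mean())
print("test positive rate: ", y_te.mean())
```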

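Feature scaling. The scaler is fit on the training set only and then applied to the test set, so no test-set statistics leak into training.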

In [None]:
df.columns

In [None]:
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
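The sketch below illustrates what `StandardScaler` does on a single synthetic feature (the numbers are made up and only loosely resemble a blood-pressure column). After `fit_transform`, the training column has mean 0 and standard deviation 1. The test column is transformed with the training mean and standard deviation, so its own mean and standard deviation are only approximately 0 and 1.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic single-feature data (hypothetical values).
rng = np.random.default_rng(0)
train = rng.normal(loc=130, scale=15, size=(200, 1))
test = rng.normal(loc=130, scale=15, size=(50, 1))

scaler = StandardScaler()
train_s = scaler.fit_transform(train)  # learns mean/std from train, then scales
test_s = scaler.transform(test)        # reuses the train mean/std

print("train mean/std:", train_s.mean(), train_s.std())
print("test mean/std: ", test_s.mean(), test_s.std())
```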

# <span style="color:#5E6997">Predict Heart Attacks</span> <a class="anchor" id="predict"></a>

1. Logistic Regression

In [None]:
model = LogisticRegression()
model.fit(X_train, y_train)
predicted = model.predict(X_test)
conf = confusion_matrix(y_test, predicted)
print("Confusion Matrix : \n", conf)
print()
print("The accuracy of Logistic Regression is : ", accuracy_score(y_test, predicted)*100, "%")
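Accuracy alone hides which kind of error a model makes, which matters for a medical label. As a sketch, the confusion matrix can be unpacked into precision and recall. The labels below are made up for the example and are not predictions from this notebook.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical true labels and predictions.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 1])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0, 1, 0])

# For binary labels, scikit-learn orders the matrix as [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)  # of the predicted positives, how many are real
recall = tp / (tp + fn)     # of the real positives, how many were found

print(tn, fp, fn, tp)
print(accuracy, precision, recall)
```

The hand-computed values agree with `precision_score` and `recall_score` from `sklearn.metrics`.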

2. Gaussian Naive Bayes

In [None]:
model = GaussianNB()
model.fit(X_train, y_train)
  
predicted = model.predict(X_test)
  
print("The accuracy of Gaussian Naive Bayes model is : ", accuracy_score(y_test, predicted)*100, "%")

3. Bernoulli Naive Bayes

In [None]:
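# BernoulliNB binarizes each feature at 0.0 by default; on standardized data
# this means "above or below the training mean".
model = BernoulliNB()
model.fit(X_train, y_train)

predicted = model.predict(X_test)

print("The accuracy of Bernoulli Naive Bayes model is : ", accuracy_score(y_test, predicted)*100, "%")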
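`BernoulliNB` expects binary features and, by default, thresholds every input at `binarize=0.0`. The sketch below, on synthetic data with made-up names, checks that this default is equivalent to thresholding the features by hand and fitting with `binarize=None`.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Synthetic, roughly standardized features (hypothetical data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(int)

# Default behaviour: the model thresholds the inputs at 0.0 internally.
nb_auto = BernoulliNB(binarize=0.0).fit(X_demo, y_demo)

# Manual equivalent: threshold first, then disable internal binarization.
X_bits = (X_demo > 0.0).astype(float)
nb_manual = BernoulliNB(binarize=None).fit(X_bits, y_demo)

same = np.array_equal(nb_auto.predict(X_demo), nb_manual.predict(X_bits))
print(same)
```

Because only the sign of each standardized feature is kept, this model discards the magnitude information that the Gaussian variant uses.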

4. Support Vector Machine

In [None]:
model = SVC()
model.fit(X_train, y_train)
  
predicted = model.predict(X_test)
print("The accuracy of SVM is : ", accuracy_score(y_test, predicted)*100, "%")

5. Random Forest

In [None]:
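# Heart disease is a binary label, so use the classifier rather than the regressor.
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
predicted = model.predict(X_test)
print("The accuracy of Random Forest is : ", accuracy_score(y_test, predicted)*100, "%")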

6. K Nearest Neighbors

In [None]:
model = KNeighborsClassifier(n_neighbors=1)
model.fit(X_train, y_train)
predicted = model.predict(X_test)

print(confusion_matrix(y_test, predicted))
print("The accuracy of KNN is : ", accuracy_score(y_test, predicted)*100, "%")

In [None]:
# Note: K is compared on the test set here, so these error rates are
# optimistic estimates of how the chosen K will perform on new data.
error_rate = []

for i in range(1, 40):
    model = KNeighborsClassifier(n_neighbors=i)
    model.fit(X_train, y_train)
    pred_i = model.predict(X_test)
    error_rate.append(np.mean(pred_i != y_test))

plt.figure(figsize=(10, 6))
plt.plot(range(1, 40), error_rate, color='blue',
         linestyle='dashed', marker='o',
         markerfacecolor='red', markersize=10)

plt.title('Error Rate vs. K Value')
plt.xlabel('K')
plt.ylabel('Error Rate')
plt.show()

In [None]:
model = KNeighborsClassifier(n_neighbors=36)

model.fit(X_train, y_train)
predicted = model.predict(X_test)

print('Confusion Matrix :')
print(confusion_matrix(y_test, predicted))

print()
print("The accuracy of KNN is : ", accuracy_score(y_test, predicted)*100, "%")
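Picking K by its error on the test set means the reported test accuracy is no longer an unbiased estimate. One common alternative, sketched here on synthetic data, is to choose K by cross-validation on the training set and touch the test set only once at the end. The data and the names `X_demo`, `y_demo`, and `best_k` are assumptions for the example, not results from this notebook.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the heart data (hypothetical).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=13, n_informative=6, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

k_values = range(1, 40)
cv_error = []
for k in k_values:
    # Scaling lives inside the pipeline so each CV fold fits its own scaler.
    pipe = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    acc = cross_val_score(pipe, X_tr, y_tr, cv=5).mean()
    cv_error.append(1 - acc)

# Choose K using training data only, then evaluate once on the test set.
best_k = k_values[int(np.argmin(cv_error))]
final = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=best_k))
final.fit(X_tr, y_tr)
test_acc = final.score(X_te, y_te)

print("best K:", best_k)
print("test accuracy:", test_acc)
```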

7. XGBoost

In [None]:
# use_label_encoder is deprecated in recent XGBoost releases and is not needed here.
model = xgb.XGBClassifier()
model.fit(X_train, y_train)

predicted = model.predict(X_test)

cm = confusion_matrix(y_test, predicted)
print("Confusion Matrix : \n", cm)
print()
print("The accuracy of XGBoost is : ", accuracy_score(y_test, predicted)*100, "%")
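Since the goal is to find the best model, it can help to collect the accuracies in one table instead of reading them from separate cells. The sketch below does this for the scikit-learn models on synthetic data. The data, the `models` dictionary, and the `results` frame are assumptions for illustration. XGBoost is left out only to keep the example dependent on scikit-learn alone, and the numbers it prints say nothing about the heart dataset.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the heart data (hypothetical).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=13, n_informative=6, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Gaussian Naive Bayes": GaussianNB(),
    "Support Vector Machine": SVC(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "K Nearest Neighbors": KNeighborsClassifier(n_neighbors=5),
}

rows = []
for name, clf in models.items():
    # Same preprocessing for every model: scale, then fit.
    pipe = make_pipeline(StandardScaler(), clf)
    pipe.fit(X_tr, y_tr)
    rows.append({"model": name, "accuracy": accuracy_score(y_te, pipe.predict(X_te))})

# Sort so the best-scoring model appears first.
results = (
    pd.DataFrame(rows)
    .sort_values("accuracy", ascending=False)
    .reset_index(drop=True)
)
print(results)
```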