# Heart Attack Prediction

Data Set - https://www.kaggle.com/ronitf/heart-disease-uci

## Table of Contents

1. **[Import Libraries](#lib)**
2. **[About the Data Set](#about)**
3. **[Data Preparation](#prep)**
    - 3.1 - **[Read Data](#read)**
    - 3.2 - **[Analysing Missing Values](#miss)**
4. **[Creating and Saving the Final Pipeline](#pipe)**




<a id="lib"></a>
## 1. Import Libraries

In [1]:
import pandas as pd
import numpy as np 
import seaborn as sns 
import matplotlib.pyplot as plt

import pickle

from sklearn.pipeline import Pipeline
from sklearn.impute import KNNImputer
from sklearn.preprocessing import Normalizer

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.ensemble import RandomForestRegressor

plt.rcParams['figure.figsize']=[12,8]

<a id="about"></a>
## 2. About the Dataset


1. age: age of the patient
2. sex: sex of the patient (1 = male, 0 = female)
3. exang: exercise-induced angina (1 = yes, 0 = no)
4. ca: number of major vessels (0-3)
5. cp: chest pain type (the `heart.csv` output below shows codes 0-3)
    - Value 1: typical angina
    - Value 2: atypical angina
    - Value 3: non-anginal pain
    - Value 4: asymptomatic
6. trestbps: resting blood pressure (in mm Hg)
7. chol: serum cholesterol in mg/dl
8. fbs: fasting blood sugar > 120 mg/dl (1 = true, 0 = false)
9. restecg: resting electrocardiographic results
    - Value 0: normal
    - Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV)
    - Value 2: showing probable or definite left ventricular hypertrophy by Estes' criteria
10. slope: the slope of the peak exercise ST segment
11. oldpeak: ST depression induced by exercise relative to rest
12. thalach: maximum heart rate achieved
13. thal: thal rate

Dependent/Target Variable:

14. target: 0 = lower chance of heart attack, 1 = higher chance of heart attack
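Copies of this dataset circulate with slightly different column names: the `heart.csv` loaded below uses headers such as `trestbps`, `slope` and `target`, while names like `trtbps`, `slp` and `output` also appear in descriptions of it. A minimal sketch of a check that a loaded DataFrame has the expected columns before it reaches the pipeline; `EXPECTED_COLUMNS` and `check_columns` are hypothetical helpers, not part of the original notebook.

```python
import pandas as pd

# Hypothetical helper: column names as they appear in the heart.csv output below
EXPECTED_COLUMNS = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
                    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "target"]

def check_columns(df: pd.DataFrame) -> dict:
    """Report expected columns that are missing and columns that are not expected."""
    present = set(df.columns)
    expected = set(EXPECTED_COLUMNS)
    return {
        "missing": sorted(expected - present),
        "unexpected": sorted(present - expected),
    }

# Example: an empty frame that uses the alternative names 'trtbps' and 'output'
sample = pd.DataFrame(columns=["age", "sex", "cp", "trtbps", "chol", "fbs", "restecg",
                               "thalach", "exang", "oldpeak", "slope", "ca", "thal", "output"])
report = check_columns(sample)
print(report)
```

Renaming the columns (for example with `DataFrame.rename`) before fitting keeps the saved feature lists and the pipeline consistent.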

<a id="prep"></a>
## 3. Data Preparation

<a id="read"></a>
## 3.1 Read Data

In [2]:
df=pd.read_csv('heart.csv')
df.head()

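|   | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|-----|-----|----|----------|------|-----|---------|---------|-------|---------|-------|----|------|--------|
| 0 | 63 | 1 | 3 | 145 | 233 | 1 | 0 | 150 | 0 | 2.3 | 0 | 0 | 1 | 1 |
| 1 | 37 | 1 | 2 | 130 | 250 | 0 | 1 | 187 | 0 | 3.5 | 0 | 0 | 2 | 1 |
| 2 | 41 | 0 | 1 | 130 | 204 | 0 | 0 | 172 | 0 | 1.4 | 2 | 0 | 2 | 1 |
| 3 | 56 | 1 | 1 | 120 | 236 | 0 | 1 | 178 | 0 | 0.8 | 2 | 0 | 2 | 1 |
| 4 | 57 | 0 | 0 | 120 | 354 | 0 | 1 | 163 | 1 | 0.6 | 2 | 0 | 2 | 1 |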


In [4]:
df.tail()

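|     | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|-----|-----|-----|----|----------|------|-----|---------|---------|-------|---------|-------|----|------|--------|
| 298 | 57 | 0 | 0 | 140 | 241 | 0 | 1 | 123 | 1 | 0.2 | 1 | 0 | 3 | 0 |
| 299 | 45 | 1 | 3 | 110 | 264 | 0 | 1 | 132 | 0 | 1.2 | 1 | 0 | 3 | 0 |
| 300 | 68 | 1 | 0 | 144 | 193 | 1 | 1 | 141 | 0 | 3.4 | 1 | 2 | 3 | 0 |
| 301 | 57 | 1 | 0 | 130 | 131 | 0 | 1 | 115 | 1 | 1.2 | 1 | 1 | 3 | 0 |
| 302 | 57 | 0 | 1 | 130 | 236 | 0 | 0 | 174 | 0 | 0.0 | 1 | 1 | 2 | 0 |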


In [5]:
df.shape

(303, 14)


<a id="miss"></a>
## 3.2 Analysing Missing Values

In [6]:
df.isnull().sum()

age         0
sex         0
cp          0
trestbps    0
chol        0
fbs         0
restecg     0
thalach     0
exang       0
oldpeak     0
slope       0
ca          0
thal        0
target      0
dtype: int64

## Splitting into Train and Test

In [4]:
X=df.drop('target',axis=1)   # features: every column except the target
y=df['target']

In [5]:
from sklearn.model_selection import train_test_split
xtrain,xtest,ytrain,ytest=train_test_split(X,y,test_size=0.3,random_state=48)
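The split above is random but not stratified, so the share of positive labels in `ytest` can drift from the share in `y`, which matters on a dataset of only 303 rows. A minimal sketch of the stratified alternative using the `stratify` argument of `train_test_split`, shown on synthetic labels (the arrays are made up, not the heart data):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 300 rows, 60% positive labels
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 4))
y_demo = np.array([1] * 180 + [0] * 120)

# stratify=y_demo keeps the positive rate the same in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=48, stratify=y_demo
)

print(len(y_tr), len(y_te))      # 210 90
print(y_tr.mean(), y_te.mean())  # both equal the overall positive rate of 0.6
```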

## Numerical Variables

In [11]:
df_num=X[['age','trestbps','thalach','chol','oldpeak']]
df_num

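|     | age | trestbps | thalach | chol | oldpeak |
|-----|-----|----------|---------|------|---------|
| 0   | 63  | 145 | 150 | 233 | 2.3 |
| 1   | 37  | 130 | 187 | 250 | 3.5 |
| 2   | 41  | 130 | 172 | 204 | 1.4 |
| 3   | 56  | 120 | 178 | 236 | 0.8 |
| 4   | 57  | 120 | 163 | 354 | 0.6 |
| ... | ... | ... | ... | ... | ... |
| 298 | 57  | 140 | 123 | 241 | 0.2 |
| 299 | 45  | 110 | 132 | 264 | 1.2 |
| 300 | 68  | 144 | 141 | 193 | 3.4 |
| 301 | 57  | 130 | 115 | 131 | 1.2 |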


In [12]:
# Record the numerical feature names and save them for reuse at deployment time
feat_num=list(df_num.columns)
print(feat_num)
with open('feat_numv1','wb') as f:
    pickle.dump(feat_num,f)

['age', 'trestbps', 'thalach', 'chol', 'oldpeak']


In [13]:
# Pipeline for imputing and scaling numerical values
# (Normalizer rescales each row to unit L2 norm; it does not scale each column)
num_pipe=Pipeline([('imputer',KNNImputer()),('normalize',Normalizer())])
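`Normalizer` rescales each *row* (one patient's five numerical values taken together) to unit L2 norm; it does not standardize each *column*. If per-feature scaling is what is intended, `StandardScaler` is the usual choice. A minimal sketch comparing the two on made-up values (not rows of the dataset), with `KNNImputer` filling one missing entry first as in `num_pipe`; `n_neighbors=2` is an assumption chosen only because the toy array has four rows:

```python
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import Normalizer, StandardScaler

# Illustrative values for columns: age, trestbps, thalach, chol, oldpeak
X_demo = np.array([
    [63.0, 145.0, 150.0, 233.0, 2.3],
    [37.0, 130.0, 187.0, np.nan, 3.5],   # one missing cholesterol value
    [41.0, 130.0, 172.0, 204.0, 1.4],
    [56.0, 120.0, 178.0, 236.0, 0.8],
])

row_wise = Pipeline([("imputer", KNNImputer(n_neighbors=2)), ("normalize", Normalizer())])
col_wise = Pipeline([("imputer", KNNImputer(n_neighbors=2)), ("scale", StandardScaler())])

X_row = row_wise.fit_transform(X_demo)
X_col = col_wise.fit_transform(X_demo)

# Normalizer: every row has L2 norm 1
print(np.linalg.norm(X_row, axis=1))
# StandardScaler: every column has mean 0 and standard deviation 1
print(X_col.mean(axis=0).round(6), X_col.std(axis=0).round(6))
```

With `Normalizer`, a large value such as `chol` dominates each row's norm, so the scaled `age` or `oldpeak` of a patient depends on that patient's cholesterol.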

## Categorical Variables

In [14]:
# Categorical features: every column that is not in the numerical list; save the names for deployment
df_cat=X.drop(feat_num,axis=1)
feat_cat=list(df_cat.columns)
with open('feat_catv1','wb') as f:
    pickle.dump(feat_cat,f)
df_cat

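|     | sex | cp | fbs | restecg | exang | slope | ca | thal |
|-----|-----|----|-----|---------|-------|-------|----|------|
| 0   | 1   | 3  | 1   | 0   | 0   | 0   | 0   | 1   |
| 1   | 1   | 2  | 0   | 1   | 0   | 0   | 0   | 2   |
| 2   | 0   | 1  | 0   | 0   | 0   | 2   | 0   | 2   |
| 3   | 1   | 1  | 0   | 1   | 0   | 2   | 0   | 2   |
| 4   | 0   | 0  | 0   | 1   | 1   | 2   | 0   | 2   |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 298 | 0   | 0  | 0   | 1   | 1   | 1   | 0   | 3   |
| 299 | 1   | 3  | 0   | 1   | 0   | 1   | 0   | 3   |
| 300 | 1   | 0  | 1   | 1   | 0   | 1   | 2   | 3   |
| 301 | 1   | 0  | 0   | 1   | 1   | 1   | 1   | 3   |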
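The two pickled name lists let deployment code know which inputs the pipeline expects. A minimal sketch of reading them back and turning one incoming record into a one-row DataFrame with those columns; the record values are made up, and the files are written to a temporary directory rather than the notebook's working directory:

```python
import os
import pickle
import tempfile

import pandas as pd

feat_num = ["age", "trestbps", "thalach", "chol", "oldpeak"]
feat_cat = ["sex", "cp", "fbs", "restecg", "exang", "slope", "ca", "thal"]

with tempfile.TemporaryDirectory() as tmp:
    paths = {"num": os.path.join(tmp, "feat_numv1"), "cat": os.path.join(tmp, "feat_catv1")}
    # Save both lists, as the notebook does
    for key, feats in (("num", feat_num), ("cat", feat_cat)):
        with open(paths[key], "wb") as f:
            pickle.dump(feats, f)
    # Read them back, as a deployment script would
    with open(paths["num"], "rb") as f:
        loaded_num = pickle.load(f)
    with open(paths["cat"], "rb") as f:
        loaded_cat = pickle.load(f)

# Hypothetical incoming record (values made up)
record = {"age": 57, "sex": 1, "cp": 0, "trestbps": 130, "chol": 236, "fbs": 0,
          "restecg": 1, "thalach": 174, "exang": 0, "oldpeak": 0.0, "slope": 1,
          "ca": 1, "thal": 2}

# Check for missing inputs, then build a one-row frame with the expected columns
missing = [c for c in loaded_num + loaded_cat if c not in record]
row = pd.DataFrame([record])[loaded_num + loaded_cat]
print(missing, row.shape)  # [] (1, 13)
```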


In [None]:
# Pipeline for preprocessing numeric and categorical variables together
data_pipeline=ColumnTransformer([('numeric',num_pipe,feat_num),
                                ('categorical',OneHotEncoder(),feat_cat)],
                                remainder='passthrough')
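With its default `handle_unknown='error'`, `OneHotEncoder` raises a `ValueError` at prediction time if a categorical column contains a value it did not see during `fit`, which can happen with rare codes once the data is split 70/30. A minimal sketch on toy data showing the alternative `handle_unknown='ignore'`, which encodes an unseen value as an all-zero block; `sparse_threshold=0` is set only so the output is a dense array:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Toy training data: 'thal' only takes the values 1, 2 and 3 here
train = pd.DataFrame({"age": [63, 37, 41, 56], "thal": [1, 2, 2, 3]})
# New row with thal=0, a category never seen during fit
new = pd.DataFrame({"age": [57], "thal": [0]})

strict = ColumnTransformer([("categorical", OneHotEncoder(), ["thal"])],
                           remainder="passthrough", sparse_threshold=0)
lenient = ColumnTransformer([("categorical", OneHotEncoder(handle_unknown="ignore"), ["thal"])],
                            remainder="passthrough", sparse_threshold=0)

strict.fit(train)
lenient.fit(train)

try:
    strict.transform(new)
except ValueError as err:
    print("default encoder fails:", err)

# Unseen category -> three zeros for 'thal'; 'age' is passed through at the end
encoded = lenient.transform(new)
print(encoded)
```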

<a id="pipe"></a>
## 4. Creating and Saving the Final Pipeline

In [16]:
# Final pipeline: preprocessing followed by the random forest model
full_pipe=Pipeline([('pre_process',data_pipeline),
                    ('model',RandomForestRegressor(max_depth=7, max_features=10, max_leaf_nodes=5,
                                                   n_estimators=170, random_state=10))])
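An integer `max_features` counts columns *after* `data_pipeline`, that is, the five numerical columns plus one column per category level produced by `OneHotEncoder`, not the 13 raw input columns; an integer larger than that width makes `fit` fail with a `ValueError`. A minimal sketch on random toy data that measures the preprocessed width; the `min(10, n_cols)` guard is a hypothetical convenience, not part of the original pipeline:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import Normalizer, OneHotEncoder

# Random toy data (not the heart data): 2 numerical and 2 categorical columns
rng = np.random.default_rng(10)
toy = pd.DataFrame({
    "age": rng.integers(30, 80, size=50),
    "chol": rng.integers(150, 350, size=50),
    "cp": rng.integers(0, 4, size=50),
    "sex": rng.integers(0, 2, size=50),
})
y_toy = rng.integers(0, 2, size=50)

pre = ColumnTransformer([("numeric", Normalizer(), ["age", "chol"]),
                         ("categorical", OneHotEncoder(), ["cp", "sex"])],
                        sparse_threshold=0)

# Width seen by the model = numerical columns + one column per observed category level
n_cols = pre.fit_transform(toy).shape[1]
print(n_cols, "columns after preprocessing")

# Hypothetical guard: never request more features per split than the model receives
max_feats = min(10, n_cols)
model = Pipeline([("pre_process", pre),
                  ("model", RandomForestRegressor(n_estimators=20, max_features=max_feats,
                                                  random_state=10))])
model.fit(toy, y_toy)
```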

In [17]:
# Training the pipeline
full_pipe.fit(xtrain,ytrain)

Pipeline(steps=[('pre_process',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('numeric',
                                                  Pipeline(steps=[('imputer',
                                                                   KNNImputer()),
                                                                  ('normalize',
                                                                   Normalizer())]),
                                                  ['age', 'trestbps', 'thalach',
                                                   'chol', 'oldpeak']),
                                                 ('categorical',
                                                  OneHotEncoder(),
                                                  ['sex', 'cp', 'fbs',
                                                   'restecg', 'exang', 'slope',
                                                   'ca', 'thal'])])),
                (

In [19]:
# Predicting values through Pipeline
full_pipe.predict(xtest[:1])

array([0.78483351])
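Because the final step is a `RandomForestRegressor` trained on the 0/1 `target`, `predict` returns an average of leaf means between 0 and 1 (such as the value above) rather than a class label. A minimal sketch of turning such scores into labels with an assumed 0.5 cut-off and evaluating them, on synthetic data from `make_classification` rather than the heart data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary data standing in for the heart data
X_syn, y_syn = make_classification(n_samples=300, n_features=8, random_state=48)
X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.3, random_state=48)

# A regressor fit on 0/1 labels predicts averaged leaf means, i.e. scores in [0, 1]
reg = RandomForestRegressor(n_estimators=170, max_depth=7, random_state=10)
reg.fit(X_tr, y_tr)
scores = reg.predict(X_te)

# Threshold the scores at 0.5 (an assumed cut-off) to get class labels
labels = (scores >= 0.5).astype(int)
acc = accuracy_score(y_te, labels)
auc = roc_auc_score(y_te, scores)
print("accuracy:", acc)
print("ROC AUC:", auc)
```

ROC AUC uses the raw scores and so does not depend on the chosen cut-off, while accuracy does.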

In [20]:
# Save the fitted pipeline (preprocessing + model) with pickle so it can be loaded for deployment
# in another Python environment with a compatible scikit-learn version
with open('full_pipeline','wb') as f:
    pickle.dump(full_pipe,f)
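The saved file can be read back with `pickle.load`. A minimal sketch of the round trip, using a small stand-in pipeline on random data instead of `full_pipe`. Only unpickle files from trusted sources, since loading a pickle can execute arbitrary code, and where possible load it with the same scikit-learn version that was used for training.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import Normalizer

# Small stand-in pipeline fit on random data (hypothetical; not the heart model)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 3))
y_demo = rng.integers(0, 2, size=40)
pipe = Pipeline([("normalize", Normalizer()),
                 ("model", RandomForestRegressor(n_estimators=10, random_state=0))])
pipe.fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "full_pipeline")
    # Save: the context manager closes the file after writing
    with open(path, "wb") as f:
        pickle.dump(pipe, f)
    # Load, as a serving application would
    with open(path, "rb") as f:
        loaded = pickle.load(f)

# The loaded pipeline reproduces the original predictions
same = np.allclose(loaded.predict(X_demo[:5]), pipe.predict(X_demo[:5]))
print(same)  # True
```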