# Heart Failure Prediction

## About Dataset

### Source

This dataset was created by combining 5 different heart disease datasets across 11 common features, which, according to its Kaggle page, makes it the largest heart disease dataset available so far for research purposes.
The dataset can be found at the following link: [Heart Failure Prediction Dataset](https://www.kaggle.com/datasets/fedesoriano/heart-failure-prediction)

### Features

|    Feature     |  Type   | Description                                                 | Values                                                                                                                         |
| :------------: | :-----: | :---------------------------------------------------------- | :----------------------------------------------------------------------------------------------------------------------------- |
|      Age       |  int64  | Age of the patient                                          | Years                                                                                                                          |
|      Sex       | object  | Sex of the patient                                          | **M**: Male <br/> **F**: Female                                                                                                |
| ChestPainType  | object  | Chest pain type                                             | **TA**: Typical Angina <br/> **ATA**: Atypical Angina <br/> **NAP**: Non-Anginal Pain <br/> **ASY**: Asymptomatic              |
|   RestingBP    |  int64  | Resting blood pressure                                      | mm Hg                                                                                                                          |
|  Cholesterol   |  int64  | Serum cholesterol                                           | mg/dl                                                                                                                          |
|   FastingBS    |  int64  | Fasting blood sugar                                         | **1**: If FastingBS > 120 mg/dl <br/> **0**: Otherwise                                                                         |
|   RestingECG   | object  | Resting electrocardiogram results                           | **Normal**: Normal <br/> **ST**: Having ST-T wave abnormality <br/> **LVH**: Probable or definite left ventricular hypertrophy |
|     MaxHR      |  int64  | Maximum heart rate achieved                                 | Numeric value between **60** and **202**                                                                                       |
| ExerciseAngina | object  | Exercise-induced angina                                     | **Y**: Yes <br/> **N**: No                                                                                                     |
|    Oldpeak     | float64 | Oldpeak = [ST](https://en.wikipedia.org/wiki/ST_depression) | Numeric value of the ST depression induced by exercise relative to rest                                                        |
|    ST_Slope    | object  | The slope of the peak exercise ST segment                   | **Up**: Upsloping <br/> **Flat**: Flat <br/> **Down**: Downsloping                                                             |
|  HeartDisease  |  int64  | Output class                                                | **1**: Heart disease <br/> **0**: Normal                                                                                       |

**Features With Missing Data:**  
`None`

**Features To Encode:**

|    Feature     | Values                                           | Encoder       |
| :------------: | :----------------------------------------------- | :------------ |
|      Sex       | **M** <br/> **F**                                | OneHotEncoder |
| ChestPainType  | **TA** <br/> **ATA** <br/> **NAP** <br/> **ASY** | OneHotEncoder |
|   RestingECG   | **Normal** <br/> **ST** <br/> **LVH**            | OneHotEncoder |
| ExerciseAngina | **Y** <br/> **N**                                | OneHotEncoder |
|    ST_Slope    | **Up** <br/> **Flat** <br/> **Down**             | OneHotEncoder |

**Features To Scale:**

|   Feature   | Scaler         |
| :---------: | :------------- |
|     Age     | StandardScaler |
|  RestingBP  | StandardScaler |
| Cholesterol | StandardScaler |
|    MaxHR    | StandardScaler |
|   Oldpeak   | StandardScaler |

## Preprocessing

**Steps**
1. `Importing` Libraries and Dataset
2. Dealing With `Missing Data`
3. `Encoding` Categorical Data
4. `Splitting` The Dataset Into **Training Set** and **Test Set**
5. Features `Scaling`
6. Deal With `Outliers`

### 1. `Importing` Libraries and Dataset

In [21]:
# Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Importing Dataset
heartDB = pd.read_csv("HeartDB.csv")


heartDB.head()

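|     | Age | Sex | ChestPainType | RestingBP | Cholesterol | FastingBS | RestingECG | MaxHR | ExerciseAngina | Oldpeak | ST_Slope | HeartDisease |
| :-: | :-: | :-: | :-----------: | :-------: | :---------: | :-------: | :--------: | :---: | :------------: | :-----: | :------: | :----------: |
|  0  | 40  |  M  |      ATA      |    140    |     289     |     0     |   Normal   |  172  |       N        |   0.0   |    Up    |      0       |
|  1  | 49  |  F  |      NAP      |    160    |     180     |     0     |   Normal   |  156  |       N        |   1.0   |   Flat   |      1       |
|  2  | 37  |  M  |      ATA      |    130    |     283     |     0     |     ST     |  98   |       N        |   0.0   |    Up    |      0       |
|  3  | 48  |  F  |      ASY      |    138    |     214     |     0     |   Normal   |  108  |       Y        |   1.5   |   Flat   |      1       |
|  4  | 54  |  M  |      NAP      |    150    |     195     |     0     |   Normal   |  122  |       N        |   0.0   |    Up    |      0       |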


### 2. Dealing With `Missing Data`

In [2]:
heartDB.isnull()

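|     |  Age  |  Sex  | ChestPainType | RestingBP | Cholesterol | FastingBS | RestingECG | MaxHR | ExerciseAngina | Oldpeak | ST_Slope | HeartDisease |
| :-: | :---: | :---: | :-----------: | :-------: | :---------: | :-------: | :--------: | :---: | :------------: | :-----: | :------: | :----------: |
|  0  | False | False |     False     |   False   |    False    |   False   |   False    | False |     False      |  False  |  False   |    False     |
|  1  | False | False |     False     |   False   |    False    |   False   |   False    | False |     False      |  False  |  False   |    False     |
|  2  | False | False |     False     |   False   |    False    |   False   |   False    | False |     False      |  False  |  False   |    False     |
|  3  | False | False |     False     |   False   |    False    |   False   |   False    | False |     False      |  False  |  False   |    False     |
|  4  | False | False |     False     |   False   |    False    |   False   |   False    | False |     False      |  False  |  False   |    False     |
| ... |  ...  |  ...  |      ...      |    ...    |     ...     |    ...    |    ...     |  ...  |      ...       |   ...   |   ...    |     ...      |
| 913 | False | False |     False     |   False   |    False    |   False   |   False    | False |     False      |  False  |  False   |    False     |
| 914 | False | False |     False     |   False   |    False    |   False   |   False    | False |     False      |  False  |  False   |    False     |
| 915 | False | False |     False     |   False   |    False    |   False   |   False    | False |     False      |  False  |  False   |    False     |
| 916 | False | False |     False     |   False   |    False    |   False   |   False    | False |     False      |  False  |  False   |    False     |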


In [3]:
heartDB.isnull().values.any()

False

There is no missing data in the dataset.
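
`isnull().values.any()` collapses the whole frame into a single boolean, so it cannot say which column is affected, and it does not catch placeholder values. The training-set preview later in this notebook shows rows with `Cholesterol` equal to 0, which is unlikely to be a real measurement. The sketch below counts both cases per column. It runs on a toy frame built from four rows of that preview; treating 0 as a placeholder in `RestingBP` and `Cholesterol` is an assumption, and the name `suspect_columns` is illustrative.

```python
import pandas as pd

# Toy frame using four rows from the training-set preview (illustration only)
toy = pd.DataFrame({
    "Age": [51, 63, 60, 50],
    "RestingBP": [140, 139, 125, 133],
    "Cholesterol": [0, 217, 0, 218],
})

# Per-column count of NaN values (all zeros when nothing is missing)
missing_per_column = toy.isnull().sum()

# Assumption: a 0 in these columns is a placeholder rather than a measurement
suspect_columns = ["RestingBP", "Cholesterol"]
zeros_per_column = (toy[suspect_columns] == 0).sum()

print(missing_per_column)
print(zeros_per_column)
```

If such zeros turn out to be placeholders in the full dataset, they could be replaced with `NaN` and imputed, or handled in the outlier step.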

### 3. `Encoding` Categorical Data

In [4]:
# OneHotEncoder object
# Set "sparse_output=False" to return a dense 2D array instead of a sparse matrix
# (scikit-learn versions before 1.2 name this parameter "sparse")
oheEncoder = OneHotEncoder(sparse_output=False)

# Select the categorical features
categoricalColumns = heartDB.select_dtypes("object")

# "fit_transform()" returns a 2D array rather than a DataFrame
arrayHotEncoded = oheEncoder.fit_transform(categoricalColumns)
# Convert it to a DataFrame
dataHotEncoded = pd.DataFrame(arrayHotEncoded)
# Name the new columns after the encoded categories
dataHotEncoded.columns = oheEncoder.get_feature_names_out(categoricalColumns.columns)

# Merge the original columns (without the categorical ones) with the new encoded columns
encodedHeartDB = pd.concat([heartDB.drop(columns=categoricalColumns.columns), dataHotEncoded], axis=1)

encodedHeartDB.head()

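|     | Age | RestingBP | Cholesterol | FastingBS | MaxHR | Oldpeak | HeartDisease | Sex_F | Sex_M | ChestPainType_ASY | ... | ChestPainType_NAP | ChestPainType_TA | RestingECG_LVH | RestingECG_Normal | RestingECG_ST | ExerciseAngina_N | ExerciseAngina_Y | ST_Slope_Down | ST_Slope_Flat | ST_Slope_Up |
| :-: | :-: | :-------: | :---------: | :-------: | :---: | :-----: | :----------: | :---: | :---: | :---------------: | :-: | :---------------: | :--------------: | :------------: | :---------------: | :-----------: | :--------------: | :--------------: | :-----------: | :-----------: | :---------: |
|  0  | 40  |    140    |     289     |     0     |  172  |   0.0   |      0       |  0.0  |  1.0  |        0.0        | ... |        0.0        |       0.0        |      0.0       |        1.0        |      0.0      |       1.0        |       0.0        |      0.0      |      0.0      |     1.0     |
|  1  | 49  |    160    |     180     |     0     |  156  |   1.0   |      1       |  1.0  |  0.0  |        0.0        | ... |        1.0        |       0.0        |      0.0       |        1.0        |      0.0      |       1.0        |       0.0        |      0.0      |      1.0      |     0.0     |
|  2  | 37  |    130    |     283     |     0     |  98   |   0.0   |      0       |  0.0  |  1.0  |        0.0        | ... |        0.0        |       0.0        |      0.0       |        0.0        |      1.0      |       1.0        |       0.0        |      0.0      |      0.0      |     1.0     |
|  3  | 48  |    138    |     214     |     0     |  108  |   1.5   |      1       |  1.0  |  0.0  |        1.0        | ... |        0.0        |       0.0        |      0.0       |        1.0        |      0.0      |       0.0        |       1.0        |      0.0      |      1.0      |     0.0     |
|  4  | 54  |    150    |     195     |     0     |  122  |   0.0   |      0       |  0.0  |  1.0  |        0.0        | ... |        1.0        |       0.0        |      0.0       |        1.0        |      0.0      |       1.0        |       0.0        |      0.0      |      0.0      |     1.0     |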


### 4. `Splitting` The Dataset

In [37]:
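# Separate the features (X) from the target column (y)
xFrame = encodedHeartDB.drop(['HeartDisease'], axis=1)
yFrame = encodedHeartDB["HeartDisease"]
# Hold out 25% of the rows as the test set; random_state makes the split reproducible
xTrain, xTest, yTrain, yTest = train_test_split(xFrame, yFrame, test_size=0.25, random_state=77)

# train_test_split already returns pandas objects that keep the original row index;
# wrapping them here turns the target Series into one-column DataFrames
xTrainFrame = pd.DataFrame(xTrain)
yTrainFrame = pd.DataFrame(yTrain)
xTestFrame = pd.DataFrame(xTest)
yTestFrame = pd.DataFrame(yTest)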


In [38]:
xTrainFrame

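|     | Age | RestingBP | Cholesterol | FastingBS | MaxHR | Oldpeak | Sex_F | Sex_M | ChestPainType_ASY | ChestPainType_ATA | ChestPainType_NAP | ChestPainType_TA | RestingECG_LVH | RestingECG_Normal | RestingECG_ST | ExerciseAngina_N | ExerciseAngina_Y | ST_Slope_Down | ST_Slope_Flat | ST_Slope_Up |
| :-: | :-: | :-------: | :---------: | :-------: | :---: | :-----: | :---: | :---: | :---------------: | :---------------: | :---------------: | :--------------: | :------------: | :---------------: | :-----------: | :--------------: | :--------------: | :-----------: | :-----------: | :---------: |
| 390 | 51  |    140    |      0      |     0     |  60   |   0.0   |  0.0  |  1.0  |        1.0        |        0.0        |        0.0        |       0.0        |      0.0       |        1.0        |      0.0      |       1.0        |       0.0        |      0.0      |      1.0      |     0.0     |
| 485 | 63  |    139    |     217     |     1     |  128  |   1.2   |  0.0  |  1.0  |        0.0        |        1.0        |        0.0        |       0.0        |      0.0       |        0.0        |      1.0      |       0.0        |       1.0        |      0.0      |      1.0      |     0.0     |
| 311 | 60  |    125    |      0      |     1     |  110  |   0.1   |  0.0  |  1.0  |        1.0        |        0.0        |        0.0        |       0.0        |      0.0       |        1.0        |      0.0      |       1.0        |       0.0        |      0.0      |      0.0      |     1.0     |
| 530 | 50  |    133    |     218     |     0     |  128  |   1.1   |  0.0  |  1.0  |        1.0        |        0.0        |        0.0        |       0.0        |      0.0       |        1.0        |      0.0      |       0.0        |       1.0        |      0.0      |      1.0      |     0.0     |
| 173 | 49  |    140    |     187     |     0     |  172  |   0.0   |  0.0  |  1.0  |        0.0        |        0.0        |        1.0        |       0.0        |      0.0       |        1.0        |      0.0      |       1.0        |       0.0        |      0.0      |      0.0      |     1.0     |
| ... | ... |    ...    |     ...     |    ...    |  ...  |   ...   |  ...  |  ...  |        ...        |        ...        |        ...        |       ...        |      ...       |        ...        |      ...      |       ...        |       ...        |      ...      |      ...      |     ...     |
| 293 | 65  |    115    |      0      |     0     |  93   |   0.0   |  0.0  |  1.0  |        1.0        |        0.0        |        0.0        |       0.0        |      0.0       |        1.0        |      0.0      |       0.0        |       1.0        |      0.0      |      1.0      |     0.0     |
| 235 | 39  |    120    |     200     |     0     |  160  |   1.0   |  0.0  |  1.0  |        0.0        |        1.0        |        0.0        |       0.0        |      0.0       |        1.0        |      0.0      |       0.0        |       1.0        |      0.0      |      1.0      |     0.0     |
| 596 | 57  |    122    |     264     |     0     |  100  |   0.0   |  0.0  |  1.0  |        1.0        |        0.0        |        0.0        |       0.0        |      1.0       |        0.0        |      0.0      |       1.0        |       0.0        |      0.0      |      1.0      |     0.0     |
| 607 | 53  |    144    |     300     |     1     |  128  |   1.5   |  0.0  |  1.0  |        1.0        |        0.0        |        0.0        |       0.0        |      0.0       |        0.0        |      1.0      |       0.0        |       1.0        |      0.0      |      1.0      |     0.0     |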
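
A plain random split can leave the two classes in slightly different proportions in the training and test sets. `train_test_split` accepts a `stratify` argument that keeps the class ratio the same in both. The sketch below illustrates this on synthetic data, since the real CSV is not bundled here; the 60/40 class balance and the names `X` and `y` are assumptions made for the example, not properties of the dataset.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 100 rows, 60 positive and 40 negative labels (assumed balance)
rng = np.random.default_rng(0)
X = pd.DataFrame({"Age": rng.integers(28, 78, size=100)})
y = pd.Series([1] * 60 + [0] * 40, name="HeartDisease")

# stratify=y preserves the class proportions in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=77, stratify=y
)

# Share of positive labels in each split
print(y_train.mean(), y_test.mean())
```

The returned objects are already a `DataFrame` and a `Series` that carry the original row labels, so the features and the target stay aligned by index.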


### 5. Features `Scaling`

In [69]:
# Create a StandardScaler instance
stdScaler = StandardScaler()

# Names of the columns to scale (one-hot encoded columns are excluded)
# Note: FastingBS is a 0/1 flag that the "Features To Scale" table omits, but it is scaled here
scaleColumns = ['Age', 'RestingBP', 'Cholesterol', 'FastingBS', 'MaxHR', 'Oldpeak']

# Fit the scaler on the training data only and scale it,
# then store the scaled array in a DataFrame with named columns
xTrainScaled = stdScaler.fit_transform(xTrainFrame[scaleColumns])
xTrainScaledFrame = pd.DataFrame(xTrainScaled, columns=scaleColumns)

# Scale the test data with the statistics learned from the training data
xTestScaled = stdScaler.transform(xTestFrame[scaleColumns])

# Store the scaled array in a DataFrame with named columns
xTestScaledFrame = pd.DataFrame(xTestScaled, columns=scaleColumns)
xTestScaledFrame

|     |    Age    | RestingBP | Cholesterol | FastingBS |   MaxHR   |  Oldpeak  |
| :-: | :-------: | :-------: | :---------: | :-------: | :-------: | :-------: |
|  0  | 1.191353  | -0.737523 |  -0.182404  | -0.550482 | 0.154929  | -0.442030 |
|  1  | 0.030823  | 0.144939  |  0.207328   | -0.550482 | 0.154929  | 2.042115  |
|  2  | -0.813199 | -0.185984 |  0.225455   | -0.550482 | -0.954567 | -0.824206 |
|  3  | -1.340713 | -0.737523 |  0.887093   | -0.550482 | 1.343674  | -0.824206 |
|  4  | -0.707697 | 0.365555  |  -0.037387  | -0.550482 | 0.353053  | 0.131234  |
| ... |    ...    |    ...    |     ...     |    ...    |    ...    |    ...    |
| 225 | 0.663839  | -0.737523 |  0.442980   | -0.550482 | -0.043196 | -0.824206 |
| 226 | 0.347331  | 0.586170  |  0.660505   | 1.816590  | 0.947426  | 1.086675  |
| 227 | -0.707697 | -1.289061 |  -1.786648  | 1.816590  | -0.637569 | -0.824206 |
| 228 | -0.602194 | -0.516907 |  0.696759   | -0.550482 | 1.185175  | -0.346486 |
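
The scaled frames above are indexed from 0 and contain only the scaled columns, so the original row labels and the one-hot encoded columns are no longer attached to them. One way to keep everything together is to write the scaled values back into a copy of each split. The following is a sketch on toy splits, not the notebook's code; `x_train`, `x_test`, and `scale_cols` are illustrative names and the values are only examples.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy splits with shuffled row labels, as train_test_split would return them
x_train = pd.DataFrame(
    {"Age": [51, 63, 60, 50], "MaxHR": [60, 128, 110, 128], "Sex_M": [1.0, 1.0, 1.0, 1.0]},
    index=[390, 485, 311, 530],
)
x_test = pd.DataFrame(
    {"Age": [65, 39], "MaxHR": [93, 160], "Sex_M": [1.0, 1.0]},
    index=[293, 235],
)
scale_cols = ["Age", "MaxHR"]

# Work on float copies so the originals stay untouched
x_train_scaled = x_train.astype({col: "float64" for col in scale_cols})
x_test_scaled = x_test.astype({col: "float64" for col in scale_cols})

scaler = StandardScaler()
# Fit on the training split only, then reuse the same statistics for the test split
x_train_scaled[scale_cols] = scaler.fit_transform(x_train[scale_cols])
x_test_scaled[scale_cols] = scaler.transform(x_test[scale_cols])

print(x_train_scaled)
```

Each row keeps its original label, which makes it straightforward to match scaled features with the corresponding target values.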


### 6. Deal With `Outliers`