# Heart Failure Prediction

## About Dataset

### Source

This dataset was created by combining 5 different heart datasets over their 11 common features, which, according to its description, makes it the largest heart disease dataset available so far for research purposes.
The dataset can be found at the following link: [Heart Failure Prediction Dataset](https://www.kaggle.com/datasets/fedesoriano/heart-failure-prediction)

### Features

|    Feature     |  Type   | Description                                                 | Values                                                                                                                         |
| :------------: | :-----: | :---------------------------------------------------------- | :----------------------------------------------------------------------------------------------------------------------------- |
|      Age       |  int64  | Age of the patient                                          | Years                                                                                                                          |
|      Sex       | object  | Sex of the patient                                          | **M**: Male <br/> **F**: Female                                                                                                |
| ChestPainType  | object  | Chest pain type                                             | **TA**: Typical Angina <br/> **ATA**: Atypical Angina <br/> **NAP**: Non-Anginal Pain <br/> **ASY**: Asymptomatic              |
|   RestingBP    |  int64  | Resting blood pressure                                      | mm Hg                                                                                                                          |
|  Cholesterol   |  int64  | Serum cholesterol                                           | mg/dl                                                                                                                          |
|   FastingBS    |  int64  | Fasting blood sugar                                         | **1**: If FastingBS > 120 mg/dl <br/> **0**: Otherwise                                                                         |
|   RestingECG   | object  | Resting electrocardiogram results                           | **Normal**: Normal <br/> **ST**: Having ST-T wave abnormality <br/> **LVH**: Probable or definite left ventricular hypertrophy |
|     MaxHR      |  int64  | Maximum heart rate achieved                                 | Numeric value between **60** and **202**                                                                                       |
| ExerciseAngina | object  | Exercise-induced angina                                     | **Y**: Yes <br/> **N**: No                                                                                                     |
|    Oldpeak     | float64 | Oldpeak = [ST](https://en.wikipedia.org/wiki/ST_depression) | Numeric value of the ST depression                                                                                             |
|    ST_Slope    | object  | The slope of the peak exercise ST segment                   | **Up**: Upsloping <br/> **Flat**: Flat <br/> **Down**: Downsloping                                                             |
|  HeartDisease  |  int64  | Output class                                                | **1**: Heart disease <br/> **0**: Normal                                                                                       |

**Features With Missing Data:**  
`None`

**Features To Encode:**

|    Feature     | Values                                           | Encoder       |
| :------------: | :----------------------------------------------- | :------------ |
|      Sex       | **M** <br/> **F**                                | OneHotEncoder |
| ChestPainType  | **TA** <br/> **ATA** <br/> **NAP** <br/> **ASY** | OneHotEncoder |
|   RestingECG   | **Normal** <br/> **ST** <br/> **LVH**            | OneHotEncoder |
| ExerciseAngina | **Y** <br/> **N**                                | OneHotEncoder |
|    ST_Slope    | **Up** <br/> **Flat** <br/> **Down**             | OneHotEncoder |

**Features To Scale:**

|   Feature   | Scaler         |
| :---------: | :------------- |
|     Age     | StandardScaler |
|  RestingBP  | StandardScaler |
| Cholesterol | StandardScaler |
|    MaxHR    | StandardScaler |
|   Oldpeak   | StandardScaler |

## Preprocessing

**Steps**
1. `Importing` Libraries and Dataset
2. Dealing With `Missing Data`
3. `Encoding` Categorical Data
4. `Splitting` The Dataset Into **Training Set** and **Test Set**
5. Feature `Scaling`
6. Deal With `Outliers`

### 1. `Importing` Libraries and Dataset

```python
# Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Importing Dataset
heartDB = pd.read_csv("HeartDB.csv")
xFrame = heartDB.iloc[:, :-1]  # all columns except the last (features)
yFrame = heartDB.iloc[:, -1]   # last column (HeartDisease target)
xArray = xFrame.values  # 2D array
yArray = yFrame.values  # 1D array

heartDB.head()
```

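|     | Age | Sex | ChestPainType | RestingBP | Cholesterol | FastingBS | RestingECG | MaxHR | ExerciseAngina | Oldpeak | ST_Slope | HeartDisease |
| :-: | :-: | :-: | :-----------: | :-------: | :---------: | :-------: | :--------: | :---: | :------------: | :-----: | :------: | :----------: |
|  0  | 40  |  M  |      ATA      |    140    |     289     |     0     |   Normal   |  172  |       N        |   0.0   |    Up    |      0       |
|  1  | 49  |  F  |      NAP      |    160    |     180     |     0     |   Normal   |  156  |       N        |   1.0   |   Flat   |      1       |
|  2  | 37  |  M  |      ATA      |    130    |     283     |     0     |     ST     |  98   |       N        |   0.0   |    Up    |      0       |
|  3  | 48  |  F  |      ASY      |    138    |     214     |     0     |   Normal   |  108  |       Y        |   1.5   |   Flat   |      1       |
|  4  | 54  |  M  |      NAP      |    150    |     195     |     0     |   Normal   |  122  |       N        |   0.0   |    Up    |      0       |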


### 2. Dealing With `Missing Data`

`None`
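
The `None` result can be checked by counting missing values per column. The following is a minimal sketch, not necessarily the check used originally: `missing_per_feature` is a hypothetical helper, and the small `sample` frame stands in for `heartDB` using a few values from the rows shown earlier.

```python
import pandas as pd


def missing_per_feature(df: pd.DataFrame) -> pd.Series:
    """Return the number of missing (NaN/None) values in each column."""
    return df.isna().sum()


# Stand-in for heartDB: a few columns from the first rows of the dataset.
sample = pd.DataFrame({
    "Age": [40, 49, 37],
    "Sex": ["M", "F", "M"],
    "Cholesterol": [289, 180, 283],
})

counts = missing_per_feature(sample)
# Names of the columns that contain at least one missing value.
features_with_missing = counts[counts > 0].index.tolist()
print(features_with_missing)
```

Note that `isna` only detects `NaN`/`None`; placeholder values (for example a `0` used where no measurement was taken) would need a separate check.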

### 3. `Encoding` Categorical Data
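
One way to apply `OneHotEncoder` to the five categorical features from the "Features To Encode" table is sketched below. This is an illustration under assumptions: wrapping the encoder in `ColumnTransformer` is a choice made here (the imports above only include the encoder itself), the names `CATEGORIES`, `x_sample` and `x_encoded` are hypothetical, and `x_sample` stands in for `xFrame` using the first rows of the dataset.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Categorical features and their possible values, from the features table.
CATEGORIES = {
    "Sex": ["M", "F"],
    "ChestPainType": ["TA", "ATA", "NAP", "ASY"],
    "RestingECG": ["Normal", "ST", "LVH"],
    "ExerciseAngina": ["Y", "N"],
    "ST_Slope": ["Up", "Flat", "Down"],
}

# Stand-in for xFrame: the first five rows of the dataset (target excluded).
x_sample = pd.DataFrame({
    "Age": [40, 49, 37, 48, 54],
    "Sex": ["M", "F", "M", "F", "M"],
    "ChestPainType": ["ATA", "NAP", "ATA", "ASY", "NAP"],
    "RestingBP": [140, 160, 130, 138, 150],
    "Cholesterol": [289, 180, 283, 214, 195],
    "FastingBS": [0, 0, 0, 0, 0],
    "RestingECG": ["Normal", "Normal", "ST", "Normal", "Normal"],
    "MaxHR": [172, 156, 98, 108, 122],
    "ExerciseAngina": ["N", "N", "N", "Y", "N"],
    "Oldpeak": [0.0, 1.0, 0.0, 1.5, 0.0],
    "ST_Slope": ["Up", "Flat", "Up", "Flat", "Up"],
})

encoder = ColumnTransformer(
    transformers=[
        (
            "onehot",
            # Fixing the categories keeps the output width stable even when
            # a value (e.g. "TA") does not occur in the data being fitted.
            OneHotEncoder(categories=list(CATEGORIES.values())),
            list(CATEGORIES.keys()),
        ),
    ],
    remainder="passthrough",  # numeric columns are appended unchanged
    sparse_threshold=0,       # always return a dense NumPy array
)

x_encoded = encoder.fit_transform(x_sample)
print(x_encoded.shape)  # 14 one-hot columns + 6 numeric columns
```

The one-hot columns come first in the output, followed by the untouched numeric columns in their original order.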

### 4. `Splitting` The Dataset
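
The split can be done with `train_test_split`, which is already imported above. The sketch below uses small stand-in arrays in place of `xArray` and `yArray`; the 80/20 ratio, the seed and the stratification are assumptions made for illustration, not settings taken from the original notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-ins for xArray / yArray: 10 rows, 2 features, binary target.
x_demo = np.arange(20).reshape(10, 2)
y_demo = np.array([0, 1] * 5)

x_train, x_test, y_train, y_test = train_test_split(
    x_demo,
    y_demo,
    test_size=0.2,    # assumed 80/20 split
    random_state=0,   # assumed seed, makes the split reproducible
    stratify=y_demo,  # keep the class ratio similar in both sets
)

print(x_train.shape, x_test.shape)
```

Stratifying on the target is optional; it is used here so that both sets contain heart disease and normal cases in similar proportions.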

### 5. Feature `Scaling`
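
A minimal sketch of applying `StandardScaler` to the features listed under "Features To Scale". The scaler is fitted on the training set only and then reused on the test set, which is why this step comes after the split. The frames `x_train` and `x_test` are hypothetical stand-ins built from the first rows of the dataset.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Numeric features listed under "Features To Scale".
SCALED_FEATURES = ["Age", "RestingBP", "Cholesterol", "MaxHR", "Oldpeak"]

# Stand-ins for the training and test sets (first rows of the dataset).
x_train = pd.DataFrame(
    [
        [40, 140, 289, 172, 0.0],
        [49, 160, 180, 156, 1.0],
        [37, 130, 283, 98, 0.0],
        [48, 138, 214, 108, 1.5],
    ],
    columns=SCALED_FEATURES,
)
x_test = pd.DataFrame([[54, 150, 195, 122, 0.0]], columns=SCALED_FEATURES)

scaler = StandardScaler()

# Learn the mean and standard deviation from the training set only...
train_scaled = pd.DataFrame(
    scaler.fit_transform(x_train[SCALED_FEATURES]),
    columns=SCALED_FEATURES,
    index=x_train.index,
)

# ...and reuse them on the test set, so no test information leaks in.
test_scaled = pd.DataFrame(
    scaler.transform(x_test[SCALED_FEATURES]),
    columns=SCALED_FEATURES,
    index=x_test.index,
)

print(train_scaled.round(2))
```

After fitting, each scaled training column has zero mean and unit standard deviation; the test columns generally do not, because they are transformed with the training statistics.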

### 6. Deal With `Outliers`
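
The method used for this step is not specified here. As one common option, the sketch below flags outliers with the interquartile range (IQR) rule. This is an assumption, not the author's procedure: `iqr_outlier_mask` is a hypothetical helper, the factor `k=1.5` is the conventional default, and the `cholesterol` values are made up for the demonstration (the last one is deliberately extreme).

```python
import pandas as pd


def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return True for values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


# Made-up demonstration values; the last one is deliberately extreme.
cholesterol = pd.Series([289, 180, 283, 214, 195, 900], name="Cholesterol")

mask = iqr_outlier_mask(cholesterol)
outliers = cholesterol[mask]      # rows flagged as outliers
cleaned = cholesterol[~mask]      # rows kept after removing them

print(outliers.tolist())
```

Whether flagged rows should be dropped, capped or kept depends on the model and on whether the values are plausible measurements, so the mask is returned separately from any removal step.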