# Predict heart disease using machine learning

This notebook (QA) uses Python-based machine learning libraries to build a model that predicts whether a person has heart disease,
based on their medical attributes.

Steps:
1. Define problem statement
2. Understand data
3. Evaluation (set the success criterion used to choose the most appropriate model)
4. Features
5. Modelling
6. Experimentation

### 1. Define Problem Statement

> Given a patient's clinical attributes, can we predict whether they have heart disease?

### 2. Understand Data

The [data](https://www.kaggle.com/datasets/redwankarimsony/heart-disease-data/data) is hosted on Kaggle.

### 3. Evaluation

> If the model reaches 95% accuracy at predicting whether a patient has heart disease during the proof of concept, we will pursue the project.

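The evaluation goal above can be written as a simple check. This is a minimal sketch using only the standard library; the names `TARGET_ACCURACY`, `accuracy`, and `meets_poc_goal` are hypothetical helpers, and the labels are made up for illustration, not results from this project.

```python
from typing import Sequence

# Proof-of-concept threshold taken from the evaluation goal (95% accuracy)
TARGET_ACCURACY = 0.95


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def meets_poc_goal(y_true: Sequence[int], y_pred: Sequence[int],
                   threshold: float = TARGET_ACCURACY) -> bool:
    """Return True when accuracy reaches the proof-of-concept threshold."""
    return accuracy(y_true, y_pred) >= threshold


# Made-up labels purely for illustration (9 of 10 predictions are correct)
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]

print(accuracy(y_true, y_pred))        # 0.9
print(meets_poc_goal(y_true, y_pred))  # False
```

Accuracy alone can be misleading when the classes are imbalanced, which is why the notebook also imports precision, recall, and the confusion matrix.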
### 4. Features

**Data dictionary**
- **age:** The person’s age in years
- **sex:** The person’s sex (1 = male, 0 = female)
- **cp:** Chest pain type
  - Value 0: asymptomatic
  - Value 1: atypical angina
  - Value 2: non-anginal pain
  - Value 3: typical angina
- **trestbps:** The person’s resting blood pressure (mm Hg on admission to the hospital)
- **chol:** The person’s cholesterol measurement in mg/dl
- **fbs:** Whether the person’s fasting blood sugar is > 120 mg/dl (1 = true; 0 = false)
- **restecg:** Resting electrocardiographic results
  - Value 0: showing probable or definite left ventricular hypertrophy by Estes’ criteria
  - Value 1: normal
  - Value 2: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV)
- **thalach:** The person’s maximum heart rate achieved
- **exang:** Exercise-induced angina (1 = yes; 0 = no)
- **oldpeak:** ST depression induced by exercise relative to rest
- **slope:** The slope of the peak exercise ST segment
  - Value 0: downsloping
  - Value 1: flat
  - Value 2: upsloping
- **ca:** The number of major vessels (0–3) colored by fluoroscopy
- **thal:** A blood disorder called thalassemia
  - Value 0: NULL (dropped from the dataset previously)
  - Value 1: fixed defect (no blood flow in some part of the heart)
  - Value 2: normal blood flow
  - Value 3: reversible defect (a blood flow is observed but it is not normal)
- **target:** Heart disease (1 = no, 0 = yes)

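The data dictionary above can also be kept in code, so that coded columns can be turned into readable labels during exploration. The following is a rough sketch: `VALUE_LABELS` and `decode_categories` are hypothetical names, the two sample rows are invented, and the mappings simply mirror the dictionary as written above (only a subset of columns is shown).

```python
import pandas as pd

# Code-to-label mappings copied from the data dictionary above (subset of columns)
VALUE_LABELS = {
    "sex": {0: "female", 1: "male"},
    "cp": {
        0: "asymptomatic",
        1: "atypical angina",
        2: "non-anginal pain",
        3: "typical angina",
    },
    "slope": {0: "downsloping", 1: "flat", 2: "upsloping"},
    "exang": {0: "no", 1: "yes"},
    # Target coding as stated in the data dictionary above
    "target": {0: "disease", 1: "no disease"},
}


def decode_categories(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with coded columns replaced by readable labels."""
    decoded = df.copy()
    for column, labels in VALUE_LABELS.items():
        if column in decoded.columns:
            # Codes missing from the mapping become NaN
            decoded[column] = decoded[column].map(labels)
    return decoded


# Two invented rows, for illustration only
sample = pd.DataFrame(
    {"age": [63, 41], "sex": [1, 0], "cp": [0, 3], "target": [0, 1]}
)
print(decode_categories(sample))
```

Keeping the original numeric frame for modelling and using the decoded copy only for plots and tables avoids having to re-encode the labels later.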

In [9]:
# Import tools

# EDA (exploratory data analysis) and plotting libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Ensure plots appear inside the notebook (only needed for Jupyter Notebooks)
%matplotlib inline

# Import models from scikit-learn
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

# Model evaluation
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.model_selection import RandomizedSearchCV, GridSearchCV
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.metrics import precision_score, recall_score
from sklearn.metrics import RocCurveDisplay

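To show how the imported tools typically fit together, here is a minimal sketch that splits the data, fits the three classifiers, and compares their accuracy. It runs on synthetic data from `make_classification` as a stand-in (13 features, matching the number of attributes in the data dictionary); the helper `fit_and_score` and all parameter values are assumptions for illustration, not the notebook's final setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (assumption): 300 samples, 13 features, binary target
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=6, random_state=42
)

# Hold out 20% of the samples for testing, preserving the class balance
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "KNeighborsClassifier": KNeighborsClassifier(),
    "RandomForestClassifier": RandomForestClassifier(random_state=42),
}


def fit_and_score(models, X_train, X_test, y_train, y_test):
    """Fit each model on the training split and return its test accuracy."""
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = model.score(X_test, y_test)  # mean accuracy
    return scores


scores = fit_and_score(models, X_train, X_test, y_train, y_test)
for name, score in scores.items():
    print(f"{name}: {score:.3f}")

# 5-fold cross-validation gives a less split-dependent accuracy estimate
cv_scores = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, cv=5, scoring="accuracy"
)
print(f"Cross-validated accuracy: {cv_scores.mean():.3f}")
```

A single train/test split is a quick baseline; the cross-validated score is the safer number to compare against the 95% goal.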