# Meta-Data Overview

## Context
This multivariate dataset contains 14 attributes commonly used for heart disease diagnosis, selected from a larger set of 76 attributes. It also includes an `id` column and a `dataset` column that records the study location. The primary objective is to predict the presence of heart disease from these patient attributes.

## Content

### Column Descriptions:
- **id**: Unique patient identifier
- **age**: Patient's age (years)
- **dataset**: Study location (e.g. Cleveland)
- **sex**: Sex (Male/Female)
- **cp**: Chest pain type (typical angina, atypical angina, non-anginal, asymptomatic)
- **trestbps**: Resting blood pressure (mm Hg)
- **chol**: Serum cholesterol (mg/dl)
- **fbs**: Fasting blood sugar > 120 mg/dl (True/False)
- **restecg**: Resting electrocardiographic results (normal, stt abnormality, lv hypertrophy)
- **thalch**: Maximum heart rate achieved (named `thalach` in the original UCI documentation)
- **exang**: Exercise-induced angina (True/False)
- **oldpeak**: ST depression induced by exercise relative to rest
- **slope**: Slope of the peak exercise ST segment (upsloping, flat, downsloping)
- **ca**: Number of major vessels (0-3) visible via fluoroscopy
- **thal**: Thalassemia status (normal, fixed defect, reversible defect)
- **num**: Diagnosis of heart disease (integer; 0 = absence, 1-4 = presence)
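`num` is stored as an integer grade rather than a yes/no flag (the sample output further down contains 0, 1 and 2), and the UCI documentation notes that work on this data has concentrated on distinguishing presence (1-4) from absence (0). A minimal pandas sketch of that binarisation, assuming a new column named `target` (a hypothetical name, not part of the dataset), using the `num` values from the first five rows:

```python
import pandas as pd

# The first five `num` values shown by df.head() (ids 1-5)
df = pd.DataFrame({"id": [1, 2, 3, 4, 5], "num": [0, 2, 1, 0, 0]})

# Assumption: any non-zero grade (1-4) counts as "disease present"
# `target` is a hypothetical column name chosen for this sketch
df["target"] = (df["num"] > 0).astype(int)

print(df)
```

Keeping `num` alongside `target` leaves the option of a multi-class model open later.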

### Creators:
- Andras Janosi, M.D. (Hungarian Institute of Cardiology)
- William Steinbrunn, M.D. (University Hospital, Zurich)
- Matthias Pfisterer, M.D. (University Hospital, Basel)
- Robert Detrano, M.D., Ph.D. (V.A. Medical Center, Long Beach and Cleveland Clinic Foundation)

### Relevant Papers:
1. Detrano et al. (1989) - Diagnosis of coronary artery disease
2. Aha & Kibler - Instance-based prediction using the Cleveland database
3. Gennari et al. - Models of incremental concept formation in AI

### Citation Request:
Publications utilizing this data should acknowledge the principal investigators from the respective institutions.

## Aims and Objectives: 
To be defined following exploratory data analysis (EDA).

## Library Imports:
Start by importing the necessary libraries for the project.

In [2]:
```python
import pandas as pd
import numpy as np  # conventional alias; `import numpy as numpy` defeats the purpose of aliasing
import matplotlib.pyplot as plt
import seaborn as sns
```

In [3]:
```python
# To preprocess the data
from sklearn.preprocessing import StandardScaler, MinMaxScaler, LabelEncoder
from sklearn.impute import SimpleImputer, KNNImputer
```
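The preprocessing classes imported above can be combined into a simple impute, scale and encode pass. This is a minimal sketch on a toy frame built from the first five rows, assuming median imputation for numeric columns; one `trestbps` value has been replaced with `NaN` purely for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Toy frame from df.head(); the NaN in trestbps is injected for illustration
sample = pd.DataFrame({
    "trestbps": [145.0, 160.0, np.nan, 130.0, 130.0],
    "chol": [233.0, 286.0, 229.0, 250.0, 204.0],
    "cp": ["typical angina", "asymptomatic", "asymptomatic",
           "non-anginal", "atypical angina"],
})
num_cols = ["trestbps", "chol"]

# 1. Fill missing numeric values with each column's median
imputer = SimpleImputer(strategy="median")
sample[num_cols] = imputer.fit_transform(sample[num_cols])

# 2. Standardise numeric columns to zero mean and unit variance
scaler = StandardScaler()
sample[num_cols] = scaler.fit_transform(sample[num_cols])

# 3. Map each chest-pain category to an integer (classes sorted alphabetically)
encoder = LabelEncoder()
sample["cp_encoded"] = encoder.fit_transform(sample["cp"])

print(sample)
print(list(encoder.classes_))
```

`LabelEncoder` is designed for target labels; for feature columns such as `cp`, `OrdinalEncoder` or `OneHotEncoder` is the more conventional choice, and one-hot encoding avoids implying an order between unordered categories.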

In [4]:
```python
# IterativeImputer is still experimental: this import must run before it can be imported
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Model selection and evaluation
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score
```
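The notebook imports `IterativeImputer` and the model-selection helpers but has not yet chosen a model. A minimal sketch of how they could fit together, assuming `LogisticRegression` as a placeholder classifier and synthetic data standing in for the real numeric features (nothing here reflects results on the heart disease data):

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for numeric features (NOT the real dataset)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Knock out roughly 10% of entries to simulate missing measurements
X[rng.random(X.shape) < 0.1] = np.nan

# Hold out a stratified test set before any fitting
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Imputer and scaler live inside the pipeline, so they are re-fitted on each CV training fold
model = make_pipeline(
    IterativeImputer(random_state=0),
    StandardScaler(),
    LogisticRegression(max_iter=1000),
)

scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
print("CV accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))

model.fit(X_train, y_train)
print("Held-out accuracy: %.3f" % model.score(X_test, y_test))
```

Fitting the imputer inside the pipeline, rather than on the full dataset beforehand, keeps statistics from validation folds from leaking into training. The same pipeline object can be passed to `GridSearchCV` for hyperparameter tuning.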

In [8]:
```python
# Load the dataset
df = pd.read_csv("heart_disease_uci.csv")
df.head()
```

| | id | age | sex | dataset | cp | trestbps | chol | fbs | restecg | thalch | exang | oldpeak | slope | ca | thal | num |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 63 | Male | Cleveland | typical angina | 145.0 | 233.0 | True | lv hypertrophy | 150.0 | False | 2.3 | downsloping | 0.0 | fixed defect | 0 |
| 1 | 2 | 67 | Male | Cleveland | asymptomatic | 160.0 | 286.0 | False | lv hypertrophy | 108.0 | True | 1.5 | flat | 3.0 | normal | 2 |
| 2 | 3 | 67 | Male | Cleveland | asymptomatic | 120.0 | 229.0 | False | lv hypertrophy | 129.0 | True | 2.6 | flat | 2.0 | reversable defect | 1 |
| 3 | 4 | 37 | Male | Cleveland | non-anginal | 130.0 | 250.0 | False | normal | 187.0 | False | 3.5 | downsloping | 0.0 | normal | 0 |
| 4 | 5 | 41 | Female | Cleveland | atypical angina | 130.0 | 204.0 | False | lv hypertrophy | 172.0 | False | 1.4 | upsloping | 0.0 | normal | 0 |
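A natural first EDA step after loading is to check which columns have missing values, since that decides which of the imputers imported earlier is needed. A minimal sketch, using a hypothetical helper `missing_summary` (not part of pandas) and a toy frame with a few `NaN`s injected for illustration:

```python
import numpy as np
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: count and percentage of missing values per column, worst first."""
    counts = frame.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(2),
    })
    # Keep only columns that actually have gaps, most-missing first
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Toy frame with NaNs injected for illustration (not the real data)
toy = pd.DataFrame({
    "age": [63, 67, 67, 37],
    "ca": [0.0, 3.0, np.nan, np.nan],
    "chol": [233.0, np.nan, 229.0, 250.0],
})
print(missing_summary(toy))
```

On the loaded data the same call would be `missing_summary(df)`; `df.info()` and `df.describe()` give complementary views of dtypes and value ranges.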
