# Complete Exploratory Data Analysis (EDA) Report

---

## 1. Dataset Overview

### - Basic Information
- **Rows:** 180  
- **Columns:** 15  
- **Data Type:** Mixed (Numeric + Categorical)  
- **Purpose:** Medical / Clinical dataset used for predicting Heart Disease Presence  
- **Target Variable:** `heart_disease_present`  
  - 0 = No disease  
  - 1 = Disease  
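
The figures above can be re-checked with a few lines of pandas. The following is a minimal sketch, not the report's actual code: the `overview` helper and the toy frame are hypothetical, and only the column names come from the dataset.

```python
import pandas as pd

def overview(df: pd.DataFrame, target: str = "heart_disease_present") -> dict:
    """Return row/column counts and the distinct labels of the target column."""
    return {
        "rows": df.shape[0],
        "columns": df.shape[1],
        "target_values": sorted(df[target].unique().tolist()),
    }

# Tiny made-up frame for illustration only (not the real 180-row dataset).
toy = pd.DataFrame({
    "patient_id": ["a1", "b2", "c3", "d4"],
    "age": [45, 61, 58, 39],
    "heart_disease_present": [0, 1, 1, 0],
})

summary = overview(toy)
print(summary)
```

On the real data, the same call would be expected to report 180 rows, 15 columns, and the target values 0 and 1.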

### - Dataset Meaning  
This dataset contains clinical measurements of patients such as:
- Age  
- Blood Pressure  
- Cholesterol  
- Exercise-induced factors  
- ECG readings  
- Thallium stress test result (`thal`)  

These features are used for **medical diagnosis** and **predictive modeling** to detect heart disease.

---

## 2. Data Types Summary

### 1. Numerical Columns (Examples)
- `resting_blood_pressure`  
- `serum_cholesterol_mg_per_dl`  
- `max_heart_rate_achieved`  
- `age`  
- `oldpeak_eq_st_depression`  

### 2. Categorical Columns (Examples)
- `patient_id` (identifier only; not a predictive feature)  
- `thal`  
- `exercise_induced_angina`  
- `chest_pain_type`  
- `fasting_blood_sugar_gt_120_mg_per_dl`  
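
One way to produce this split programmatically is sketched below. It assumes that low-cardinality integer columns (such as 0/1 flags) should count as categorical; the `split_columns` helper, its `max_levels` threshold, and the toy values are hypothetical choices, not part of the original analysis.

```python
import pandas as pd

def split_columns(df, id_cols=("patient_id",), target="heart_disease_present",
                  max_levels=5):
    """Split feature columns into numerical and categorical lists.

    A column counts as categorical if it is non-numeric, or if it is numeric
    but has at most `max_levels` distinct values (an assumed threshold).
    """
    features = df.drop(columns=[*id_cols, target], errors="ignore")
    categorical = [
        col for col in features.columns
        if not pd.api.types.is_numeric_dtype(features[col])
        or features[col].nunique() <= max_levels
    ]
    numerical = [col for col in features.columns if col not in categorical]
    return numerical, categorical

# Made-up values for illustration only.
toy = pd.DataFrame({
    "patient_id": ["a1", "b2", "c3", "d4", "e5", "f6"],
    "age": [45, 61, 58, 39, 52, 70],
    "resting_blood_pressure": [120, 140, 130, 110, 150, 128],
    "thal": ["normal", "fixed_defect", "normal",
             "reversible_defect", "normal", "reversible_defect"],
    "exercise_induced_angina": [0, 1, 0, 0, 1, 0],
    "chest_pain_type": [1, 2, 3, 4, 4, 2],
    "heart_disease_present": [0, 1, 1, 0, 1, 0],
})

numerical, categorical = split_columns(toy)
print("numerical:", numerical)
print("categorical:", categorical)
```

The identifier and the target are dropped before the split so that neither is mistaken for a feature.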

---

## 3. Missing Values Analysis

The cleaned dataset contains **0 missing values**. The table below shows a sample of the 15 columns.

| Column                         | Missing Count |
|--------------------------------|----------------|
| patient_id                     | 0              |
| heart_disease_present          | 0              |
| thal                           | 0              |
| age                            | 0              |
| max_heart_rate_achieved        | 0              |

### - Conclusion
The dataset has **no missing values**, so no imputation is needed before modeling.
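
A per-column missing-value count like the one in the table can be computed as follows. This is a sketch under assumptions: `missing_report` is a hypothetical helper, and the toy frame deliberately contains one missing value so the check has something to find.

```python
import pandas as pd

def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Count missing values per column and express them as a percentage of rows."""
    counts = df.isna().sum()
    return pd.DataFrame({
        "missing_count": counts,
        "missing_pct": (100 * counts / len(df)).round(1),
    })

# Made-up frame with one deliberately missing age.
toy = pd.DataFrame({
    "patient_id": ["a1", "b2", "c3", "d4"],
    "age": [45, None, 58, 39],
    "thal": ["normal", "fixed_defect", "normal", "reversible_defect"],
})

report = missing_report(toy)
total_missing = int(report["missing_count"].sum())
print(report)
print("total missing:", total_missing)
```

For the cleaned dataset described here, every `missing_count` would be 0.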

---

## 4. Statistical Summary (Key Features)

### - Age
- **Mean:** 54.8 years  
- **Min:** 29  
- **Max:** 77  
- The mean of 54.8 years indicates that patients are mostly **middle-aged or older**.

---

### - Resting Blood Pressure
- **Mean:** 132.5 mmHg  
- **Range:** 94 – 200 mmHg  

---

### - Serum Cholesterol
- **Typical Range:** 203 – 353 mg/dL  
- Total cholesterol above 200 mg/dL is generally classed as borderline high or high, so this range indicates **elevated cholesterol levels** in many patients.

---

### - Max Heart Rate Achieved
- **Mean:** 149 bpm  
- **Range:** 96 – 202 bpm  
- Maximum heart rate declines with age (a common rule of thumb is 220 minus age), so a mean of 149 bpm is plausible for a cohort with a mean age near 55.

---

### - Heart Disease Count (Target Variable)
- **Total records:** 180  
- Per-class counts are not reported here; a class distribution plot can be generated if required.
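
The summary statistics in this section, along with the class counts for the target, could be reproduced along these lines. The values in the toy frame are made up for illustration and are not taken from the real dataset; only the column names are.

```python
import pandas as pd

# Made-up values for illustration only; not the real dataset.
toy = pd.DataFrame({
    "age": [45, 61, 58, 39, 52],
    "max_heart_rate_achieved": [170, 140, 150, 180, 160],
    "heart_disease_present": [0, 1, 1, 0, 0],
})

# Mean, minimum, and maximum for each numerical column.
numeric_cols = ["age", "max_heart_rate_achieved"]
stats = toy[numeric_cols].agg(["mean", "min", "max"])

# Count and share of each target class, ordered by label.
counts = toy["heart_disease_present"].value_counts().sort_index()
class_table = pd.DataFrame({"count": counts, "share": counts / counts.sum()})

print(stats)
print(class_table)
```

Checking the class shares before modeling shows whether the target is balanced, which affects the choice of evaluation metric.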

---

## 5. Categorical Feature Summary

### - Chest Pain Type
Patients show variety in:
- Typical angina  
- Atypical angina  
- Non-anginal pain  
- Asymptomatic  

---

### - Thallium Stress Test (thal)
Possible values:
- Normal  
- Fixed defect  
- Reversible defect  

---

### - Exercise-Induced Angina
Binary:
- **0 → No**  
- **1 → Yes**

**Mean: 0.31**  
→ About 31% of patients experience angina during exercise; most do **not**.
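
The mean of a 0/1 column equals the share of rows coded 1, which is why a mean of 0.31 reads as roughly 31% of patients. A small sketch with made-up values:

```python
import pandas as pd

# Ten made-up patients, three of whom report exercise-induced angina.
angina = pd.Series([0, 0, 1, 0, 1, 0, 0, 0, 1, 0],
                   name="exercise_induced_angina")

share_yes = angina.mean()                      # mean of a 0/1 column
shares = angina.value_counts(normalize=True)   # same information per class

print(f"share with angina: {share_yes:.2f}")
print(shares)
```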

---

## 6. Relationships & Insights

### - Age vs Heart Disease  
Older individuals (50+) show **higher heart disease occurrence**.

### - Cholesterol vs Heart Disease  
High cholesterol levels are strongly associated with heart disease.

### - Exercise-Induced Angina  
Patients experiencing angina during exercise have a **higher likelihood** of heart disease.

### - ST Depression (oldpeak_eq_st_depression)  
Higher ST depression values indicate **greater probability** of heart disease.
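
Relationships like these can be quantified by comparing feature means across the two target classes and by computing the disease rate within each level of a binary feature. The sketch below uses made-up values, so its output says nothing about the real data; the helper names `compare_by_target` and `disease_rate_by` are hypothetical.

```python
import pandas as pd

TARGET = "heart_disease_present"

def compare_by_target(df, features, target=TARGET):
    """Mean of each numerical feature within each target class."""
    return df.groupby(target)[features].mean()

def disease_rate_by(df, feature, target=TARGET):
    """Share of patients with heart disease within each level of `feature`."""
    return df.groupby(feature)[target].mean()

# Made-up values for illustration only.
toy = pd.DataFrame({
    "age": [40, 45, 50, 60, 65, 70],
    "oldpeak_eq_st_depression": [0.0, 0.2, 0.5, 1.5, 2.0, 2.5],
    "exercise_induced_angina": [0, 0, 0, 1, 1, 0],
    "heart_disease_present": [0, 0, 0, 1, 1, 1],
})

means = compare_by_target(toy, ["age", "oldpeak_eq_st_depression"])
rates = disease_rate_by(toy, "exercise_induced_angina")

print(means)
print(rates)
```

Reporting these tables alongside each claim makes it possible to judge how large a difference is, rather than only its direction.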

---

## 7. Summary of Key EDA Findings

✔ Dataset is **clean, complete, and analysis-ready**  
✔ Contains essential clinical measurements for prediction  
✔ No missing values  
✔ Target variable is **binary**  
✔ Strong predictors include:
- Age  
- Blood Pressure  
- Cholesterol  
- ST Depression  
- Exercise-Induced Angina  

---

## 8. Summary of EDA Graph Findings

- Age, cholesterol, and ST depression show strong associations with heart disease.
- Maximum heart rate and chest pain type also appear related to the target.
- The patterns in the visualizations are **consistent with established clinical risk factors**.
- These insights can guide feature selection when building machine learning models.
