> # **Heart Disease Prediction Project**

## **Introduction**

This project is part of the **Digital Egypt Pioneers Initiative (DEPI)**, a scholarship program provided by Egypt's Ministry of Communications and Information Technology. Under the guidance of our instructor, **Ahmed Moustafa**, our team (**Ahmed Elsayed**, **Ammar Osama**, **Moustafa Alaa**, and **Mahmoud Mohamed Helmy**) aims to analyze heart disease data and build a model that predicts its presence.

### **Objective**

The primary goal of this project is to apply data science techniques, including **Exploratory Data Analysis (EDA)** and **Machine Learning (ML)**, to predict the presence of heart disease from patient features. By analyzing the dataset, we hope to identify which factors are most strongly associated with heart disease and then build a machine learning model that predicts its presence.

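### **Dataset**

The dataset (`Heart_Disease_Prediction.csv`) contains 270 patient records with 13 features and one target column, `Heart Disease`, labeled either `Presence` or `Absence`.

#### **Features Description**

The following key features in the dataset are used to predict heart disease: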

1. **Age**  
   - **Description**: The age of the patient in years.  

2. **Sex**  
   - **Description**: The sex of the patient.  
     - `1` = Male  
     - `0` = Female  

3. **Chest Pain Type**  
   - **Description**: The type of chest pain experienced by the patient.  
     - `1` = Typical angina (related to reduced blood supply to the heart)  
     - `2` = Atypical angina (chest pain that does not meet all the criteria for typical angina)  
     - `3` = Non-anginal pain (chest pain unrelated to the heart)  
     - `4` = Asymptomatic (no chest pain)  

4. **Resting Blood Pressure (BP)**  
   - **Description**: The patient’s resting blood pressure, measured in mm Hg.  

5. **Cholesterol Levels**  
   - **Description**: The serum cholesterol level of the patient, measured in mg/dL.  

6. **Fasting Blood Sugar (FBS > 120)**  
   - **Description**: A binary feature that indicates whether the patient's fasting blood sugar is greater than 120 mg/dL.  
     - `1` = True (FBS > 120)  
     - `0` = False (FBS ≤ 120)  

7. **Max Heart Rate (Max HR)**  
   - **Description**: The maximum heart rate achieved by the patient during a stress test, measured in beats per minute (bpm).  

8. **Exercise-Induced Angina**  
   - **Description**: A binary feature indicating whether the patient experiences angina (chest pain) during physical activity.  
     - `1` = Yes  
     - `0` = No  

9. **ST Depression**  
   - **Description**: ST-segment depression induced by exercise relative to rest, measured in mm. The ST segment is part of an electrocardiogram (ECG) trace. ST depression during exercise suggests that part of the heart muscle is not receiving enough oxygen, a condition known as ischemia, and is an important indicator of heart disease.
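For plots and summary tables it can help to replace some of the integer codes above with readable labels. The sketch below is one way to do that with pandas, assuming the column names shown in the data sample (`Chest pain type`, `FBS over 120`, `Exercise angina`); `CODE_LABELS` and `decode_categoricals` are hypothetical names chosen for illustration, and the label strings simply restate the descriptions above.

```python
import pandas as pd

# Hypothetical label maps restating the feature descriptions above.
CODE_LABELS = {
    "Chest pain type": {
        1: "Typical angina",
        2: "Atypical angina",
        3: "Non-anginal pain",
        4: "Asymptomatic",
    },
    "FBS over 120": {0: "FBS <= 120", 1: "FBS > 120"},
    "Exercise angina": {0: "No", 1: "Yes"},
}


def decode_categoricals(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with coded columns replaced by readable labels."""
    out = df.copy()  # never modify the caller's frame
    for column, mapping in CODE_LABELS.items():
        if column in out.columns:
            # Series.map turns any code missing from the mapping into NaN,
            # which makes unexpected values easy to spot.
            out[column] = out[column].map(mapping)
    return out
```

A decoded copy such as `decode_categoricals(df)` is meant for presentation only; the integer codes are better kept as-is for modeling.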



> ### **Project Scope**

1. **Exploratory Data Analysis (EDA)**:  
   - We will start by understanding the dataset, visualizing distributions, and examining relationships between variables.
   - Through statistical analysis, we aim to identify key risk factors for heart disease.

2. **Machine Learning (ML)**:  
   - After EDA, we will develop machine learning models to predict the likelihood of heart disease.
   - We will experiment with various classification algorithms, evaluate model performance, and optimize accuracy.
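One way to compare several classifiers, as described above, is k-fold cross-validation with scikit-learn. This is a sketch rather than the project's final pipeline: it assumes scikit-learn is installed and that the target has already been encoded as 0/1 (for example `(df["Heart Disease"] == "Presence").astype(int)`). The helper `compare_models`, the two candidate models, and the choice of 5 stratified folds are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_models(X: pd.DataFrame, y: pd.Series, n_splits: int = 5) -> dict:
    """Return the mean cross-validated accuracy of each candidate classifier."""
    candidates = {
        # Scaling sits inside the pipeline, so it is refit on each training
        # fold and never sees the validation fold (no data leakage).
        "logistic_regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    }
    # Stratified folds keep the Presence/Absence ratio similar in every fold.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    return {
        name: float(np.mean(cross_val_score(model, X, y, cv=cv, scoring="accuracy")))
        for name, model in candidates.items()
    }
```

Passing `df.drop(columns="Heart Disease")` as `X` treats coded columns such as `Chest pain type` as ordinary numbers; one-hot encoding them is a common alternative worth testing.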

This project not only provides a practical application of data science methodologies but also highlights the importance of predictive analytics in healthcare. By using historical patient data, we aim to aid in early detection and prevention of heart disease, potentially contributing to improved healthcare outcomes.

> **Let's dive into the analysis!**


### Importing Libraries

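```python
import numpy as np               # numerical arrays
import pandas as pd              # tabular data handling
import matplotlib.pyplot as plt  # static plots
import seaborn as sns            # statistical plots built on matplotlib
import plotly.express as px      # interactive plots
```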




#### Loading the data

```python
df = pd.read_csv('Heart_Disease_Prediction.csv')
```
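Because later steps rely on specific column names, a quick sanity check right after loading can catch a wrong or modified file early. A minimal sketch, assuming the 14 column names and the `Absence`/`Presence` labels that appear in the data sample; `EXPECTED_COLUMNS` and `load_heart_data` are hypothetical names, not part of pandas.

```python
import pandas as pd

# Hypothetical list of the columns this notebook expects, as seen in the data sample.
EXPECTED_COLUMNS = [
    "Age", "Sex", "Chest pain type", "BP", "Cholesterol", "FBS over 120",
    "EKG results", "Max HR", "Exercise angina", "ST depression",
    "Slope of ST", "Number of vessels fluro", "Thallium", "Heart Disease",
]


def load_heart_data(path: str) -> pd.DataFrame:
    """Read the CSV and fail fast if its layout differs from what we expect."""
    df = pd.read_csv(path)
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"Missing expected columns: {missing}")
    # The target should contain only the two labels seen in the sample.
    unexpected = set(df["Heart Disease"].unique()) - {"Absence", "Presence"}
    if unexpected:
        raise ValueError(f"Unexpected target labels: {unexpected}")
    return df
```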

#### Exploring the data and data types

```python
df.sample(5)
```

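|     | Age | Sex | Chest pain type | BP  | Cholesterol | FBS over 120 | EKG results | Max HR | Exercise angina | ST depression | Slope of ST | Number of vessels fluro | Thallium | Heart Disease |
|-----|-----|-----|-----------------|-----|-------------|--------------|-------------|--------|-----------------|---------------|-------------|-------------------------|----------|---------------|
| 98  | 64  | 0   | 3               | 140 | 313         | 0            | 0           | 133    | 0               | 0.2           | 1           | 0                       | 7        | Absence       |
| 231 | 39  | 1   | 4               | 118 | 219         | 0            | 0           | 140    | 0               | 1.2           | 2           | 0                       | 7        | Presence      |
| 63  | 60  | 0   | 1               | 150 | 240         | 0            | 0           | 171    | 0               | 0.9           | 1           | 0                       | 3        | Absence       |
| 237 | 43  | 1   | 4               | 120 | 177         | 0            | 2           | 120    | 1               | 2.5           | 2           | 0                       | 7        | Presence      |
| 36  | 61  | 1   | 4               | 140 | 207         | 0            | 2           | 138    | 1               | 1.9           | 1           | 1                       | 7        | Presence      |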


```python
df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 270 entries, 0 to 269
Data columns (total 14 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   Age                      270 non-null    int64  
 1   Sex                      270 non-null    int64  
 2   Chest pain type          270 non-null    int64  
 3   BP                       270 non-null    int64  
 4   Cholesterol              270 non-null    int64  
 5   FBS over 120             270 non-null    int64  
 6   EKG results              270 non-null    int64  
 7   Max HR                   270 non-null    int64  
 8   Exercise angina          270 non-null    int64  
 9   ST depression            270 non-null    float64
 10  Slope of ST              270 non-null    int64  
 11  Number of vessels fluro  270 non-null    int64  
 12  Thallium                 270 non-null    int64  
 13  Heart Disease            270 non-null    object 
dtypes: float64(1), int64(12), object(1)
```

> There are no missing values: every one of the 14 columns has 270 non-null entries.
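Since there is nothing to impute, a natural next EDA step is to check the class balance and compare feature averages between patients with and without heart disease. A minimal sketch, assuming the target column is `Heart Disease`; `summarize_by_target` is a hypothetical helper built on standard pandas calls (`isna`, `value_counts`, `groupby`).

```python
import pandas as pd


def summarize_by_target(df: pd.DataFrame, target: str = "Heart Disease"):
    """Return (class proportions, per-class mean of every numeric feature)."""
    # Re-check the "no missing values" observation in code.
    if df.isna().sum().sum() != 0:
        raise ValueError("unexpected missing values")
    # Share of each target label, e.g. how many rows are Presence vs Absence.
    balance = df[target].value_counts(normalize=True)
    # Mean of each numeric column within each target group.
    means = df.groupby(target).mean(numeric_only=True)
    return balance, means
```

Calling `balance, means = summarize_by_target(df)` gives a first look at which features differ most between the two groups before any formal statistical testing.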