# Heart Disease Dataset - Exploratory Data Analysis (EDA)

### Dataset Overview
This dataset contains 1025 patient records in 14 columns. The columns cover:
* Demographic: `age`, `sex`
* Chest pain types: `cp`
* Resting blood pressure: `trestbps`
* Serum cholesterol in mg/dl: `chol`
* Fasting blood sugar > 120 mg/dl: `fbs`
* Resting electrocardiographic (ECG) results: `restecg`
* Maximum heart rate achieved: `thalach`
* Exercise induced angina: `exang`
* ST depression induced by exercise relative to rest: `oldpeak`
* Slope of the peak exercise ST segment: `slope`
* Number of major vessels colored by fluoroscopy: `ca`
* Thalassemia: `thal` (0 = normal, 1 = fixed defect, 2 = reversible defect)
* Heart disease diagnosis: `target` (0 = No disease, 1 = Disease)
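
As a minimal sketch of how the coded columns above could be decoded and sanity-checked, the snippet below maps `target` to the labels given in the list and reports any `thal` codes that the documented mapping does not cover. The `thal` values are copied from the first five rows of the `df.head()` output; the `target` values, the helper `undocumented_codes`, and the dictionary names are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# Label mappings copied from the feature list above.
TARGET_LABELS = {0: "No disease", 1: "Disease"}
THAL_LABELS = {0: "normal", 1: "fixed defect", 2: "reversible defect"}


def undocumented_codes(series, labels):
    """Return the sorted codes present in `series` that `labels` does not cover."""
    return sorted(int(code) for code in set(series.dropna().unique()) - set(labels))


# Small illustrative frame: `thal` matches the first five rows of df.head();
# the `target` values are made up for the demo.
sample = pd.DataFrame({"thal": [3, 3, 3, 3, 2], "target": [0, 1, 0, 1, 0]})

# Add a readable label column next to the numeric code.
sample["target_label"] = sample["target"].map(TARGET_LABELS)

# Codes that appear in the data but have no entry in the mapping.
extra_thal = undocumented_codes(sample["thal"], THAL_LABELS)

print(sample)
print("thal codes without a documented label:", extra_thal)
```

If the same check reports extra codes on the full dataset, the mapping in the list should be confirmed against the dataset's own documentation before `thal` is interpreted.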

### Objective
The goal of this analysis is to:
* Understand patient demographics
* Analyze key medical indicators
* Explore relationships between features and heart disease
* Identify patterns that can guide predictive modeling

In [1]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set plotting style
sns.set_theme(style="whitegrid")

### Load the Dataset
We will load the dataset and inspect the first few rows to understand its structure.

In [4]:
# Load the dataset
df = pd.read_csv('../Data/heart.csv')

# Inspect the first 5 rows
df.head()

   age  sex  cp  trestbps  chol  fbs  restecg  thalach  exang  oldpeak  slope  ca  thal  target
0   52    1   0       125   212    0        1      168      0      1.0      2   2     3       0
1   53    1   0       140   203    1        0      155      1      3.1      0   0     3       0
2   70    1   0       145   174    0        1      125      1      2.6      0   0     3       0
3   61    1   0       148   203    0        1      161      0      0.0      2   1     3       0
4   62    0   0       138   294    1        1      106      0      1.9      1   3     2       0
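
The loading step can be made a little more defensive by confirming that the expected columns are present right after reading the file. The sketch below is one possible way to do that, not the notebook's own method: `load_heart_data` and `EXPECTED_COLUMNS` are hypothetical names, and an in-memory CSV built from the first two rows shown above stands in for `'../Data/heart.csv'` so the example runs on its own.

```python
import io

import pandas as pd

# Column names as they appear in the df.head() output above.
EXPECTED_COLUMNS = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "target",
]


def load_heart_data(source):
    """Read the CSV and fail early if any expected column is absent."""
    df = pd.read_csv(source)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Missing expected columns: {missing}")
    return df


# Stand-in for the real file: header plus the first two rows shown above.
csv_text = (
    "age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target\n"
    "52,1,0,125,212,0,1,168,0,1.0,2,2,3,0\n"
    "53,1,0,140,203,1,0,155,1,3.1,0,0,3,0\n"
)

demo = load_heart_data(io.StringIO(csv_text))
print(demo.shape)
```

With the real file, the call would be `load_heart_data('../Data/heart.csv')`, since `pd.read_csv` accepts either a path or a file-like object.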


### Dataset Shape and Info
We check the number of records and columns, the data type of each column, and the number of missing values per column.

In [5]:
# Dataset Shape
print("Dataset Shape: ", df.shape)

# Column info and datatypes
df.info()

# Check for missing values
df.isnull().sum()

Dataset Shape:  (1025, 14)
<class 'pandas.DataFrame'>
RangeIndex: 1025 entries, 0 to 1024
Data columns (total 14 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   age       1025 non-null   int64  
 1   sex       1025 non-null   int64  
 2   cp        1025 non-null   int64  
 3   trestbps  1025 non-null   int64  
 4   chol      1025 non-null   int64  
 5   fbs       1025 non-null   int64  
 6   restecg   1025 non-null   int64  
 7   thalach   1025 non-null   int64  
 8   exang     1025 non-null   int64  
 9   oldpeak   1025 non-null   float64
 10  slope     1025 non-null   int64  
 11  ca        1025 non-null   int64  
 12  thal      1025 non-null   int64  
 13  target    1025 non-null   int64  
dtypes: float64(1), int64(13)
memory usage: 112.2 KB


age         0
sex         0
cp          0
trestbps    0
chol        0
fbs         0
restecg     0
thalach     0
exang       0
oldpeak     0
slope       0
ca          0
thal        0
target      0
dtype: int64
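
The checks above print three separate outputs. As an optional sketch, the same information can be gathered into a single per-column table, which is easier to scan as the number of columns grows. The helper name `summarize_columns` and the small demo frame (including its one missing value) are assumptions made for illustration; on the real data the call would be `summarize_columns(df)`.

```python
import pandas as pd


def summarize_columns(df):
    """Return one row per column with its dtype, non-null count, missing count, and distinct values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "non_null": df.notna().sum(),
        "missing": df.isna().sum(),
        "n_unique": df.nunique(),
    })


# Illustrative frame with one deliberately missing value in `oldpeak`.
demo = pd.DataFrame({
    "age": [52, 53, 70],
    "oldpeak": [1.0, None, 2.6],
})

summary = summarize_columns(demo)
print(summary)
```

The `n_unique` column also gives a quick hint about which integer columns are categorical codes (few distinct values) rather than continuous measurements.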