In [1]:
import pandas as pd


Loading dataset:

In [4]:
# loading dataset and showing top 5 results
hfp_dataset = pd.read_csv('../data/heart.csv')
hfp_dataset.head(5)

   Age  Sex  ChestPainType  RestingBP  Cholesterol  FastingBS  RestingECG  MaxHR  ExerciseAngina  Oldpeak  ST_Slope  HeartDisease
0   40    M            ATA        140          289          0      Normal    172               N      0.0        Up             0
1   49    F            NAP        160          180          0      Normal    156               N      1.0      Flat             1
2   37    M            ATA        130          283          0          ST     98               N      0.0        Up             0
3   48    F            ASY        138          214          0      Normal    108               Y      1.5      Flat             1
4   54    M            NAP        150          195          0      Normal    122               N      0.0        Up             0


In [8]:
# checking for null values
hfp_dataset.isnull().sum()



Age               0
Sex               0
ChestPainType     0
RestingBP         0
Cholesterol       0
FastingBS         0
RestingECG        0
MaxHR             0
ExerciseAngina    0
Oldpeak           0
ST_Slope          0
HeartDisease      0
dtype: int64

In [10]:
# Get min values for each column
min_values = hfp_dataset.min()

# Get max values for each column
max_values = hfp_dataset.max()

# Display results
print("Min values for each column:")
print(min_values)

print("\nMax values for each column:")
print(max_values)

Min values for each column:
Age                 28
Sex                  F
ChestPainType      ASY
RestingBP            0
Cholesterol          0
FastingBS            0
RestingECG         LVH
MaxHR               60
ExerciseAngina       N
Oldpeak           -2.6
ST_Slope          Down
HeartDisease         0
dtype: object

Max values for each column:
Age                77
Sex                 M
ChestPainType      TA
RestingBP         200
Cholesterol       603
FastingBS           1
RestingECG         ST
MaxHR             202
ExerciseAngina      Y
Oldpeak           6.2
ST_Slope           Up
HeartDisease        1
dtype: object
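
Note that min() and max() on the string columns compare values alphabetically, so an entry such as ASY for ChestPainType is just the first label in sorted order, not a meaningful minimum. A minimal sketch of splitting the two cases is shown here. It rebuilds a few columns of the five rows printed by head(5) inline so it runs without the CSV; the names sample, numeric_range and categorical are illustrative and not part of the original notebook.

```python
import pandas as pd

# A few columns of the first five rows shown by head(5) above,
# rebuilt inline so the sketch runs without heart.csv.
sample = pd.DataFrame({
    "Age": [40, 49, 37, 48, 54],
    "Sex": ["M", "F", "M", "F", "M"],
    "ChestPainType": ["ATA", "NAP", "ATA", "ASY", "NAP"],
    "RestingBP": [140, 160, 130, 138, 150],
    "Cholesterol": [289, 180, 283, 214, 195],
    "Oldpeak": [0.0, 1.0, 0.0, 1.5, 0.0],
})

# Min and max only make sense as a range for numeric columns.
numeric = sample.select_dtypes(include="number")
numeric_range = numeric.agg(["min", "max"])
print(numeric_range)

# For the remaining columns, list the distinct labels instead.
categorical = sample.select_dtypes(exclude="number")
for col in categorical.columns:
    print(col, sorted(categorical[col].unique()))
```

On the full dataset the same two calls could be applied to hfp_dataset directly.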


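The minimum values show 0 for RestingBP, Cholesterol and FastingBS. FastingBS only ranges from 0 to 1, which suggests a binary flag where 0 is a valid value, but a resting blood pressure or cholesterol of 0 is not physiologically plausible and most likely marks a missing measurement. We should account for this in our analysis. Let's see how many 0 values are present in the dataset.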

In [11]:
# counting 0 values in each column
zero_counts = (hfp_dataset == 0).sum()

# Display the counts
print("Count of 0s in each column:")
print(zero_counts)

Count of 0s in each column:
Age                 0
Sex                 0
ChestPainType       0
RestingBP           1
Cholesterol       172
FastingBS         704
RestingECG          0
MaxHR               0
ExerciseAngina      0
Oldpeak           368
ST_Slope            0
HeartDisease      410
dtype: int64
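
One possible way to handle zeros that stand in for missing measurements is to convert them to NaN, so that isnull() and later imputation steps can see them. The sketch below is an assumption about how this could be done, not a step from the original notebook: it treats 0 as a placeholder only in RestingBP and Cholesterol, and leaves columns where 0 looks like a valid value untouched. The small frame and the names zero_as_missing and cleaned are made up for illustration.

```python
import numpy as np
import pandas as pd

# Small illustrative frame; these rows are invented for the example
# and are not taken from heart.csv.
sample = pd.DataFrame({
    "RestingBP": [140, 0, 130, 138],
    "Cholesterol": [289, 180, 0, 0],
    "FastingBS": [0, 0, 1, 0],
})

# Assumption: 0 is a placeholder only in these columns.
zero_as_missing = ["RestingBP", "Cholesterol"]

cleaned = sample.copy()
cleaned[zero_as_missing] = cleaned[zero_as_missing].replace(0, np.nan)

# The placeholders now show up in the usual null check.
missing_counts = cleaned.isnull().sum()
print(missing_counts)
```

Whether to drop or impute the resulting NaN values afterwards depends on the analysis and is left open here.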


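As we can see, Cholesterol has 172 zero values and RestingBP has one; these most likely stand for missing measurements. The zeros in FastingBS, Oldpeak and HeartDisease are consistent with valid values (a 0/1 flag, an ST depression of 0.0, and the negative class of the target). Let's print the unique values for each column to check for missing-data placeholders in the string-typed columns: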

In [14]:
# Get unique values for all columns
unique_values_per_column = {col: hfp_dataset[col].unique() for col in hfp_dataset.columns}

# Display unique values for each column
for col, unique_vals in unique_values_per_column.items():
    print(f"Unique values in '{col}': {unique_vals}")


Unique values in 'Age': [40 49 37 48 54 39 45 58 42 38 43 60 36 44 53 52 51 56 41 32 65 35 59 50
 47 31 46 57 55 63 66 34 33 61 29 62 28 30 74 68 72 64 69 67 73 70 77 75
 76 71]
Unique values in 'Sex': ['M' 'F']
Unique values in 'ChestPainType': ['ATA' 'NAP' 'ASY' 'TA']
Unique values in 'RestingBP': [140 160 130 138 150 120 110 136 115 100 124 113 125 145 112 132 118 170
 142 190 135 180 108 155 128 106  92 200 122  98 105 133  95  80 137 185
 165 126 152 116   0 144 154 134 104 139 131 141 178 146 158 123 102  96
 143 172 156 114 127 101 174  94 148 117 192 129 164]
Unique values in 'Cholesterol': [289 180 283 214 195 339 237 208 207 284 211 164 204 234 273 196 201 248
 267 223 184 288 215 209 260 468 188 518 167 224 172 186 254 306 250 177
 227 230 294 264 259 175 318 216 340 233 205 245 194 270 213 365 342 253
 277 202 297 225 246 412 265 182 218 268 163 529 100 206 238 139 263 291
 229 307 210 329 147  85 269 275 179 392 466 129 241 255 276 282 338 160
 156 272 240 393 161 228 292 


The 'ChestPainType' values ['ATA', 'NAP', 'ASY', 'TA'] represent different types of chest pain experienced by individuals, often used in medical datasets or heart disease prediction models. Here’s what each abbreviation typically stands for:

ATA (Atypical Angina):
Atypical angina refers to chest pain that does not fit the typical description of angina. It may present with symptoms such as discomfort, pressure, or indigestion, but it is not always related to exertion or relieved by rest. The pain may also be less predictable.
NAP (Non-Anginal Pain):
Non-anginal pain refers to chest pain that is not related to the heart. This type of pain can have other causes, such as gastrointestinal issues, musculoskeletal problems, or anxiety. It is usually not related to physical exertion and does not respond to nitroglycerin (which is commonly used to treat angina).
ASY (Asymptomatic):
Asymptomatic means that the individual does not experience any chest pain or symptoms, despite possibly having underlying heart issues, such as silent ischemia (where the heart muscle is not getting enough oxygen but without causing noticeable symptoms).
TA (Typical Angina):
Typical angina is the classic type of chest pain related to heart disease. It is usually caused by reduced blood flow to the heart due to coronary artery disease. The pain typically occurs with exertion or stress and is relieved by rest or medication (such as nitroglycerin). It is often described as a squeezing or pressure-like sensation in the chest.
Summary:
ATA: Atypical Angina (unpredictable or not classic angina symptoms)
NAP: Non-Anginal Pain (not related to the heart)
ASY: Asymptomatic (no chest pain)
TA: Typical Angina (classic chest pain related to heart disease)
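
The abbreviations above can be attached to the data as readable labels. The following is a minimal sketch using the ChestPainType values of the five rows printed by head(5); the dictionary chest_pain_labels and the variable names are illustrative and not part of the original notebook.

```python
import pandas as pd

# Code-to-label mapping taken from the descriptions above.
chest_pain_labels = {
    "ATA": "Atypical Angina",
    "NAP": "Non-Anginal Pain",
    "ASY": "Asymptomatic",
    "TA": "Typical Angina",
}

# ChestPainType values of the first five rows shown by head(5).
codes = pd.Series(["ATA", "NAP", "ATA", "ASY", "NAP"], name="ChestPainType")

# map() returns NaN for any code missing from the dictionary,
# which doubles as a check for unexpected category values.
labels = codes.map(chest_pain_labels)
unmapped = codes[labels.isna()].unique()

print(labels.tolist())
print("Unmapped codes:", list(unmapped))
```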

The 'RestingECG' values ['Normal', 'ST', 'LVH'] refer to different types of results that can be seen on a Resting Electrocardiogram (ECG), a test that measures the electrical activity of the heart. Here's what each abbreviation typically means:

Normal:
This indicates that the ECG shows no abnormalities in the heart’s electrical activity. The heart's rhythm, rate, and electrical signals are within normal ranges. A normal resting ECG does not by itself rule out heart disease.
ST (ST-T Wave Abnormalities):
ST refers to abnormalities in the ST segment and/or T wave of the ECG. These changes can suggest various heart problems, such as:
Myocardial ischemia: Reduced blood flow to the heart, possibly due to coronary artery disease (CAD).
ST elevation or depression: Can indicate issues such as a heart attack (if elevated) or ischemia (if depressed). ST-T wave abnormalities are often assessed during stress tests or episodes of chest pain.
LVH (Left Ventricular Hypertrophy):
LVH means there is thickening of the muscle wall of the heart's left ventricle (the main pumping chamber). This condition often results from high blood pressure or other conditions that cause the heart to work harder. An enlarged left ventricle can make the heart less efficient and is a risk factor for heart failure or arrhythmias.
Summary:
Normal: No abnormalities in the ECG; heart function appears normal.
ST: ST-T wave abnormalities, often indicating myocardial ischemia or other electrical disturbances in the heart.
LVH: Left Ventricular Hypertrophy, indicating an enlarged or thickened left ventricle, which may suggest high blood pressure or heart strain.
These are common classifications used in medical settings to assess heart function and potential heart-related conditions.
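
To see how these ECG categories relate to the target column, a cross-tabulation is one simple option. The sketch below uses only the RestingECG and HeartDisease values of the five rows printed by head(5), so LVH does not appear and the counts say nothing about the full dataset; the names sample and ecg_vs_target are illustrative.

```python
import pandas as pd

# RestingECG and HeartDisease for the first five rows shown by head(5).
sample = pd.DataFrame({
    "RestingECG": ["Normal", "Normal", "ST", "Normal", "Normal"],
    "HeartDisease": [0, 1, 0, 1, 0],
})

# Rows: ECG category, columns: HeartDisease class (0 or 1).
ecg_vs_target = pd.crosstab(sample["RestingECG"], sample["HeartDisease"])
print(ecg_vs_target)
```

On the full data the same call would take hfp_dataset["RestingECG"] and hfp_dataset["HeartDisease"].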

The unique values in 'ST_Slope' ['Up', 'Flat', 'Down'] refer to the slope of the ST segment on an electrocardiogram (ECG), especially during a stress test. The slope of the ST segment provides important information about heart function, particularly regarding blood flow to the heart and possible ischemia (reduced blood flow). Here’s what each term means:

Up (Up-sloping):
An upward-sloping ST segment is generally considered benign or normal. It can occur during exercise or stress testing and often doesn't indicate significant heart problems. However, it can also appear with less severe ischemia.
Flat (Horizontal):
A flat or horizontal ST segment is more concerning. It often suggests myocardial ischemia, which occurs when the heart muscle isn't getting enough oxygen, typically due to blockages in the coronary arteries. This condition requires further investigation, as it can indicate coronary artery disease (CAD).
Down (Down-sloping):
A down-sloping ST segment is usually the most serious indicator of ischemia or heart disease. It suggests significantly reduced blood flow to the heart, often due to severe blockages in the coronary arteries. It is associated with a higher risk of adverse cardiac events and warrants prompt clinical evaluation.
Summary:
Up (Up-sloping): Often normal, but can indicate mild issues.
Flat (Horizontal): Suggests potential myocardial ischemia and moderate concern.
Down (Down-sloping): Typically a sign of significant ischemia and a serious concern for heart disease.
The ST slope is an important diagnostic marker in stress tests, helping to evaluate the likelihood of ischemia and guiding further treatment or intervention.
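
Because the three slope values are described above in increasing order of concern, they could be stored as an ordered categorical instead of plain strings. This is a sketch under that assumption only: the order Up < Flat < Down mirrors the description in this notebook and is not a clinical scale. The values are the ST_Slope entries of the five rows printed by head(5), and the names slope_order, ordered_slope and slope_rank are illustrative.

```python
import pandas as pd

# Assumed order of concern, following the description above.
slope_order = ["Up", "Flat", "Down"]

# ST_Slope values of the first five rows shown by head(5).
st_slope = pd.Series(["Up", "Flat", "Up", "Flat", "Up"], name="ST_Slope")

ordered_slope = pd.Categorical(st_slope, categories=slope_order, ordered=True)

# Integer codes follow the category order: Up=0, Flat=1, Down=2.
slope_rank = pd.Series(ordered_slope.codes, name="ST_Slope_rank")
print(slope_rank.tolist())

# An ordered categorical also supports comparisons against a category.
flat_or_worse = ordered_slope >= "Flat"
print(list(flat_or_worse))
```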