# Data Science Bootcamp project

## Business problem

A hospital in India has collected heart disease data from 1000 subjects, covering 12 common clinical characteristics. The goal is to use this data to develop predictive models for the early detection of heart disease and to generate insights that can help improve diagnosis and treatment.

Problem Description:
The hospital's doctors want to build an early prediction system that can help identify patients at high risk of heart disease before they develop severe symptoms. The idea is to use historical patient data to train machine learning models that can predict the likelihood of a patient developing heart disease.

However, there is a challenge: the doctors need a system that not only makes predictions but also provides clear and understandable interpretations of the results. They need to understand which characteristics are the most important in influencing the risk of heart disease to apply appropriate preventive treatments and make informed decisions.

## Dataset and Description

The dataset used is available at the following link:
https://data.mendeley.com/datasets/dzz48mvjht/1

This heart disease dataset was acquired from one of the multispecialty hospitals in India. It records 14 attributes (a patient identifier, 12 clinical features, and the target label) for 1000 subjects, making it one of the heart disease datasets available for research purposes. It is useful for building early-stage heart disease detection systems as well as predictive machine learning models.

## Attribute Table

| No. | Attribute                                   | Assigned Code     | Values / Unit                                                                 | Type of Data |
|-----|---------------------------------------------|-------------------|-------------------------------------------------------------------------------|--------------|
| 1   | Patient Identification Number               | patientid         | Number                                                                        | Numeric      |
| 2   | Age                                         | age               | In years                                                                      | Numeric      |
| 3   | Gender                                      | gender            | 0, 1 (0 = female, 1 = male)                                                   | Binary       |
| 4   | Chest pain type                             | chestpain         | 0, 1, 2, 3 (0: typical angina, 1: atypical angina, 2: non-anginal pain, 3: asymptomatic) | Nominal      |
| 5   | Resting blood pressure                      | restingBP         | 94-200 (mm Hg)                                                                | Numeric      |
| 6   | Serum cholesterol                           | serumcholestrol   | 126-564 (mg/dl)                                                               | Numeric      |
| 7   | Fasting blood sugar > 120 mg/dl             | fastingbloodsugar | 0, 1 (0 = false, 1 = true)                                                    | Binary       |
| 8   | Resting electrocardiogram results           | restingrelectro   | 0, 1, 2 (0: normal, 1: ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV), 2: probable or definite left ventricular hypertrophy by Estes' criteria) | Nominal      |
| 9   | Maximum heart rate achieved                 | maxheartrate      | 71-202 (bpm)                                                                  | Numeric      |
| 10  | Exercise induced angina                     | exerciseangia     | 0, 1 (0 = no, 1 = yes)                                                        | Binary       |
| 11  | Oldpeak (ST segment change at peak exercise) | oldpeak          | 0-6.2                                                                         | Numeric      |
| 12  | Slope of the peak exercise ST segment       | slope             | 1, 2, 3 (1: upsloping, 2: flat, 3: downsloping)                               | Nominal      |
| 13  | Number of major vessels                     | noofmajorvessels  | 0, 1, 2, 3                                                                    | Numeric      |
| 14  | Classification                              | target            | 0, 1 (0 = absence of heart disease, 1 = presence of heart disease)            | Binary       |
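The codes and ranges in the attribute table can double as a data-quality check before any modeling. The sketch below is a minimal, hypothetical validator: the dictionary names, the `validate_row` helper and the sample record are illustrative and not part of the original project, and `age` is not checked because the table gives no range for it. Real records that fall outside the stated codes or ranges would simply be reported, not corrected.

```python
# Hypothetical schema check built from the attribute table.
# Codes and ranges are copied from the table; all names are illustrative.

# Allowed codes for the categorical columns
CATEGORICAL_CODES = {
    "gender": {0, 1},
    "chestpain": {0, 1, 2, 3},
    "fastingbloodsugar": {0, 1},
    "restingrelectro": {0, 1, 2},
    "exerciseangia": {0, 1},
    "slope": {1, 2, 3},
    "noofmajorvessels": {0, 1, 2, 3},
    "target": {0, 1},
}

# Inclusive value ranges stated for the numeric columns
NUMERIC_RANGES = {
    "restingBP": (94, 200),
    "serumcholestrol": (126, 564),
    "maxheartrate": (71, 202),
    "oldpeak": (0.0, 6.2),
}


def validate_row(row: dict) -> list:
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    for column, allowed in CATEGORICAL_CODES.items():
        value = row.get(column)
        if value not in allowed:
            problems.append(f"{column}={value!r} not in {sorted(allowed)}")
    for column, (low, high) in NUMERIC_RANGES.items():
        value = row.get(column)
        if value is None or not low <= value <= high:
            problems.append(f"{column}={value!r} outside [{low}, {high}]")
    return problems


# Made-up record for illustration only (not taken from the dataset)
sample = {
    "patientid": 1, "age": 53, "gender": 1, "chestpain": 2,
    "restingBP": 171, "serumcholestrol": 250, "fastingbloodsugar": 0,
    "restingrelectro": 1, "maxheartrate": 147, "exerciseangia": 0,
    "oldpeak": 5.3, "slope": 3, "noofmajorvessels": 3, "target": 1,
}

print(validate_row(sample))                      # []
print(validate_row({**sample, "chestpain": 5}))  # ['chestpain=5 not in [0, 1, 2, 3]']
```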


## Features detail

This section explains each feature of the dataset to give context for the analysis. For each feature, it also describes its relevance to the presence or absence of heart disease and how its values should be interpreted.

1. Patient Identification Number.

It is a unique numeric identifier assigned to each individual in the dataset. It serves as a distinct label to differentiate one patient from another in medical records. This number helps accurately track and manage patient information, ensuring confidentiality and facilitating efficient data management and communication across different healthcare settings.
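Because `patientid` only labels a record, it carries no clinical information, and a model trained on it could only pick up spurious patterns. A minimal sketch, assuming the records are held as plain Python dicts, checks that identifiers are unique and drops both the identifier and the `target` label when selecting model inputs. The helper names are hypothetical; the column names come from the attribute table.

```python
# Sketch: keep `patientid` as a record key, never as a model input.
# Column names come from the attribute table; helper names are hypothetical.

ID_COLUMN = "patientid"
TARGET_COLUMN = "target"

ALL_COLUMNS = [
    "patientid", "age", "gender", "chestpain", "restingBP", "serumcholestrol",
    "fastingbloodsugar", "restingrelectro", "maxheartrate", "exerciseangia",
    "oldpeak", "slope", "noofmajorvessels", "target",
]


def has_unique_ids(records: list) -> bool:
    """True if no two records share the same patient identifier."""
    ids = [record[ID_COLUMN] for record in records]
    return len(ids) == len(set(ids))


def feature_columns(columns: list) -> list:
    """Drop the identifier and the label, keeping only the clinical features."""
    return [c for c in columns if c not in (ID_COLUMN, TARGET_COLUMN)]


print(len(feature_columns(ALL_COLUMNS)))  # 12
print(has_unique_ids([{"patientid": 1}, {"patientid": 2}, {"patientid": 1}]))  # False
```

The 12 remaining columns match the 12 clinical features described for the dataset.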

2. Age.

In clinical settings, age is crucial for assessing health risks, diagnosing conditions, and determining appropriate treatments. It plays a significant role in understanding disease prevalence, response to therapies, and overall health outcomes.

3. Gender.

It can influence health outcomes, access to healthcare services, and the experience of health and illness.

4. Chest pain type.

Chest pain type refers to the categorization of symptoms related to chest discomfort or pain that patients report.

Value 0: Typical Angina

Typical angina refers to chest pain or discomfort that occurs when the heart muscle does not receive enough blood flow (ischemia) due to narrowed coronary arteries. It is typically triggered by physical exertion or emotional stress and is relieved by rest or nitroglycerin.
The pain is often described as squeezing, pressure, heaviness, or tightness in the chest. It may radiate to the left arm, shoulder, jaw, or back. Patients may also experience shortness of breath, sweating, or nausea.

Value 1: Atypical Angina

Atypical angina describes chest discomfort that does not fit the typical pattern of classic angina. It may have different characteristics or triggers compared to typical angina.
The pain may be sharp, stabbing, burning, or dull. It may occur with physical exertion but can also occur at rest or during emotional stress. It may not be relieved completely by rest or nitroglycerin.

Value 2: Non-Anginal Pain

Non-anginal pain refers to chest pain or discomfort that is not related to coronary artery disease or ischemia. It may originate from other structures within the chest, such as muscles, bones, nerves, or the gastrointestinal tract.
The pain may vary widely in quality and intensity. It may be sharp, stabbing, or fleeting. It may not worsen with physical exertion and may not be associated with other typical angina symptoms.

Value 3: Asymptomatic

Asymptomatic means that the patient does not report any chest pain or discomfort at all.
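The attribute table marks `chestpain` as nominal: the codes 0 to 3 are labels, not quantities, so feeding them to a model as numbers would impose an artificial order (for example, "asymptomatic" being three times "atypical angina"). One common remedy is one-hot encoding. The sketch below is a hypothetical stdlib version for a single value; the column-naming scheme is an assumption. In pandas, `pandas.get_dummies` performs the same expansion on a whole column.

```python
# Minimal one-hot encoding sketch for the nominal `chestpain` column.
# Codes and meanings follow the attribute table; the function name and
# the output column names are illustrative assumptions.

CHEST_PAIN_LABELS = {
    0: "typical_angina",
    1: "atypical_angina",
    2: "non_anginal_pain",
    3: "asymptomatic",
}


def one_hot_chestpain(code: int) -> dict:
    """Expand one chestpain code into four 0/1 indicator columns."""
    if code not in CHEST_PAIN_LABELS:
        raise ValueError(f"unknown chestpain code: {code}")
    return {
        f"chestpain_{label}": int(code == value)
        for value, label in CHEST_PAIN_LABELS.items()
    }


print(one_hot_chestpain(2))
# {'chestpain_typical_angina': 0, 'chestpain_atypical_angina': 0,
#  'chestpain_non_anginal_pain': 1, 'chestpain_asymptomatic': 0}
```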

5. Resting blood pressure.

Measurement of the force that circulating blood exerts on the artery walls while the patient is at rest. In clinical practice, normal resting blood pressure for adults generally falls within a range of around 90/60 mm Hg to 120/80 mm Hg. Blood pressure values above this range may indicate hypertension (high blood pressure), which can increase the risk of cardiovascular diseases such as heart attack and stroke.
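The dataset stores a single `restingBP` value per patient (94-200 mm Hg), so it can only be compared with one side of the range quoted above. Assuming, as an unverified assumption, that this value is the systolic reading, a minimal sketch flags readings above the 120 mm Hg upper bound. The flag is named "above normal" rather than "hypertensive" because the text only says such values *may* indicate hypertension; the function name is hypothetical.

```python
# Sketch: flag resting blood pressure readings above the normal range
# quoted in the text. ASSUMPTION: `restingBP` holds the systolic value in mm Hg.

NORMAL_SYSTOLIC_MAX = 120  # upper bound of the 90/60-120/80 mm Hg range


def above_normal_bp(resting_bp: float) -> bool:
    """True when the (assumed systolic) reading exceeds the normal upper bound."""
    return resting_bp > NORMAL_SYSTOLIC_MAX


readings = [110, 120, 145, 171]
print([above_normal_bp(r) for r in readings])  # [False, False, True, True]
```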

6. Serum cholesterol.

Level of cholesterol present in the blood serum. Cholesterol is a waxy substance that is essential for building cell membranes, producing certain hormones, and synthesizing vitamin D. However, high levels of cholesterol in the blood can increase the risk of heart disease and stroke.

When we refer to "serum cholesterol," we are generally talking about the total amount of cholesterol present in the blood serum, which includes LDL and HDL cholesterol as well as smaller fractions such as VLDL.

Low-Density Lipoprotein (LDL) Cholesterol: Often referred to as "bad" cholesterol because high levels can lead to plaque buildup in the arteries, increasing the risk of heart disease.

High-Density Lipoprotein (HDL) Cholesterol: Known as "good" cholesterol because it helps remove LDL cholesterol from the bloodstream, reducing the risk of heart disease.

![image-2.png](attachment:image-2.png)
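Since the dataset records only total serum cholesterol, the LDL/HDL split above cannot be recovered from it, but the total can still be put into clinical context. The sketch below uses the commonly cited adult thresholds for total cholesterol (below 200 mg/dl desirable, 200-239 borderline high, 240 and above high, as in the NCEP ATP III guidelines). These cut-offs are not stated in the dataset description, and the function name is hypothetical.

```python
# Sketch: bucket total serum cholesterol (mg/dl) into commonly cited
# adult categories. Thresholds are an assumption taken from general
# clinical guidance, not from the dataset description.


def cholesterol_category(total_mg_dl: float) -> str:
    """Map a total cholesterol value to a descriptive category."""
    if total_mg_dl < 200:
        return "desirable"
    if total_mg_dl < 240:
        return "borderline high"
    return "high"


print([cholesterol_category(v) for v in (126, 210, 239, 240, 564)])
# ['desirable', 'borderline high', 'borderline high', 'high', 'high']
```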

7. Fasting blood sugar.

Level of glucose (sugar) in the bloodstream after a period of fasting, typically for at least 8 hours. It provides insights into the body's ability to regulate glucose during periods of fasting. Elevated levels may indicate impaired glucose tolerance or diabetes mellitus, while low levels can sometimes indicate conditions such as hypoglycemia.

In the values of the dataset:

0 = False: This indicates that the individual's fasting blood sugar level is not greater than 120 mg/dl. It suggests that the fasting blood sugar level is within a normal range.

1 = True: This indicates that the individual's fasting blood sugar level is greater than 120 mg/dl. In practical terms, this means their blood sugar level exceeds 120 mg/dl after fasting for at least 8 hours. A value of 1 (true) suggests that the fasting blood sugar level is elevated, which could indicate impaired glucose tolerance or diabetes mellitus.
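The dataset keeps only the binary flag, not the glucose reading itself. A minimal sketch of how a raw fasting glucose value (hypothetical inputs, in mg/dl) maps onto that flag makes the boundary explicit: because the rule is "greater than 120 mg/dl", a reading of exactly 120 is encoded as 0. The function name is illustrative.

```python
# Sketch: encode a raw fasting glucose reading (mg/dl) as the dataset's
# `fastingbloodsugar` flag. The readings below are made-up examples.

FBS_THRESHOLD_MG_DL = 120


def fasting_blood_sugar_flag(glucose_mg_dl: float) -> int:
    """Return 1 if glucose is strictly above 120 mg/dl, else 0."""
    return int(glucose_mg_dl > FBS_THRESHOLD_MG_DL)


print([fasting_blood_sugar_flag(g) for g in (95, 120, 121, 180)])  # [0, 0, 1, 1]
```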

8. Resting electrocardiogram results.

Resting electrocardiogram (ECG) results refer to the outcomes of an ECG test performed while the patient is at rest. An ECG is a non-invasive test that measures the electrical activity of the heart to identify various cardiac conditions. The resting ECG provides a snapshot of the heart's electrical function and can reveal abnormalities in heart rhythm, structure, and function.

Value 0 = Normal: Within the expected range and no significant abnormalities. Normal heart rhythm and electrical conduction. This suggests that the heart's electrical system is functioning properly, and there are no apparent signs of heart disease or other cardiac conditions at the time of the test.

Value 1 = ST-T Wave Abnormality: This value indicates abnormalities in the ST segment or T wave on the ECG. These abnormalities can include T wave inversions, ST segment elevation, or ST segment depression greater than 0.05 millivolts (mV). ST-T wave abnormalities can suggest various cardiac issues, including ischemia (reduced blood flow to the heart muscle), myocardial infarction (heart attack), or other cardiac stress. These findings may warrant further investigation and monitoring.

Value 2 = Probable or Definite Left Ventricular Hypertrophy by Estes' Criteria.  This value indicates the presence of left ventricular hypertrophy (LVH), which is an enlargement and thickening of the walls of the left ventricle, based on specific criteria established by Estes.  LVH is often a response to increased workload on the heart, commonly due to conditions like hypertension (high blood pressure) or valvular heart disease. Detecting LVH is important as it can increase the risk of cardiovascular events and may influence treatment decisions.

![image.png](attachment:image.png)
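The business problem asks for results that doctors can interpret, so coded values such as `restingrelectro` should be translated back into clinical language when a prediction is explained. A minimal sketch, with labels paraphrased from the attribute table and hypothetical names, returns a readable description for each code and a clear message for unexpected ones. For modeling, the same nominal column would normally still be encoded as indicator variables rather than used as a number.

```python
# Sketch: turn coded resting ECG values into readable labels for reports.
# Labels paraphrase the attribute table; names are hypothetical.

RESTING_ECG_LABELS = {
    0: "normal",
    1: "ST-T wave abnormality",
    2: "probable or definite left ventricular hypertrophy (Estes' criteria)",
}


def describe_resting_ecg(code: int) -> str:
    """Return a human-readable description of a restingrelectro code."""
    return RESTING_ECG_LABELS.get(code, f"unknown code {code}")


print(describe_resting_ecg(1))  # ST-T wave abnormality
print(describe_resting_ecg(9))  # unknown code 9
```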

9. Maximum heart rate achieved.

Refers to the highest number of beats per minute (bpm) that a person's heart reaches during a stress test, such as an exercise test. This feature helps healthcare providers assess how well the heart is functioning, detect abnormal heart rhythms, and diagnose conditions like coronary artery disease.
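Maximum heart rate declines with age, so a raw `maxheartrate` value is easier to interpret relative to the patient's `age`. One widely used rule of thumb (not part of the dataset description) estimates age-predicted maximum heart rate as 220 minus age; in exercise testing, reaching roughly 85% of this estimate is often treated as an adequate effort. The sketch below computes that percentage as a possible derived feature; the function name is hypothetical and the formula is only an approximation.

```python
# Sketch: express the achieved peak heart rate as a percentage of the
# age-predicted maximum. ASSUMPTION: the common "220 - age" estimate.


def percent_of_predicted_max(max_heart_rate: float, age: float) -> float:
    """Peak heart rate as a percentage of the 220 - age estimate."""
    predicted_max = 220 - age
    return 100 * max_heart_rate / predicted_max


print(round(percent_of_predicted_max(153, 50), 1))  # 90.0
```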

10. Exercise induced angina. 

Chest pain or discomfort that occurs during physical exertion or stress and typically subsides with rest. It is a symptom of underlying heart conditions, most commonly coronary artery disease (CAD), where the heart's blood supply is restricted due to narrowed or blocked coronary arteries.

11. Oldpeak = ST.

The term "Oldpeak" refers to the depression or elevation in the ST segment of an electrocardiogram (ECG) reading during peak exercise compared to the baseline level when at rest. The ST segment is part of the ECG tracing that represents the interval between ventricular depolarization and repolarization. Changes in the ST segment during exercise can indicate the presence of myocardial ischemia or other cardiac conditions.

12. Slope of the peak exercise ST segment.

The "slope of the peak exercise ST segment" refers to the incline of the ST segment during maximal exercise in a cardiac stress test (exercise ECG). The ST segment on an electrocardiogram (ECG) is the portion of the tracing that follows the QRS complex and precedes the T wave.

During a stress test, doctors observe how the ST segment changes in response to increasing physical exertion. Normally, the ST segment shows an upsloping pattern during exercise. The slope of the ST segment is classified into three types:

Upsloping: Indicates a gradual upward change of the ST segment during exercise. This is considered a normal response.

Flat: The ST segment remains horizontal during exercise. This interpretation can vary depending on clinical context and other factors.

Downsloping: Involves a downward descent of the ST segment during exercise. This response may indicate an underlying issue such as myocardial ischemia (inadequate blood flow to the heart).

Assessing the "slope of the peak exercise ST segment" is crucial in interpreting the stress test as it can provide clues about the presence and severity of coronary artery disease (CAD) or other cardiac conditions. This information is used by doctors to guide clinical decisions, including the need for additional tests, changes in treatment, or recommendations for invasive procedures.

![image-5.png](attachment:image-5.png)

13. Number of major vessels.

This feature records the number of major coronary vessels (0 to 3) for each patient and reflects how extensively the coronary arteries are involved. The severity of coronary artery disease is assessed based on the number of narrowings or blockages in each vessel, as well as the overall extent of coronary involvement. This assessment is crucial for determining appropriate management and treatment, ranging from lifestyle changes and medications to invasive procedures such as angioplasty or coronary bypass surgery.

Left Coronary Artery (LCA): This artery divides into two main branches:

- Left Anterior Descending (LAD) artery
- Circumflex artery: extends around the left side of the heart and may vary in size and location.

Right Coronary Artery (RCA): This artery supplies blood mainly to the right ventricle and the posterior part of the heart.

Coronary dominance: A heart is described as right-dominant or left-dominant depending on whether the posterior descending artery arises from the right coronary artery or from the circumflex branch of the left coronary artery. This anatomical variation can play an important role in assessing the impact of a blockage.

![image-4.png](attachment:image-4.png)

14. Classification. 

Presence or absence of heart disease in patients. It is a binary classification used to indicate whether a patient has been diagnosed with heart disease or not based on various medical assessments and test results. It plays a critical role in diagnosing, managing, and treating patients based on their cardiovascular health status.
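Before training a classifier on `target`, it is worth checking how the two classes are distributed, because a strongly imbalanced label makes plain accuracy misleading. A minimal stdlib sketch computes the share of each class; the labels in the example are made up, and on the real data the function would be applied to the `target` column.

```python
# Sketch: share of each target class (0 = absence, 1 = presence of
# heart disease). Example labels are hypothetical.
from collections import Counter


def class_balance(labels) -> dict:
    """Return the fraction of records in each target class."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: counts[label] / total for label in (0, 1)}


print(class_balance([0, 1, 1, 1]))  # {0: 0.25, 1: 0.75}
```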
