In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os
import pingouin as pg
import scipy.stats as stats

# Heart Failure Data Analysis

In this notebook, we analyze the popular [Heart Failure Prediction](https://www.kaggle.com/andrewmvd/heart-failure-clinical-data) dataset from Kaggle. The dataset contains 12 features that can be used to predict mortality from heart failure, so a careful analysis could show which clinical factors are most closely associated with death.

## Context

Contrary to what the name suggests, heart failure does not mean the heart has stopped, and it does not imply sudden death. It is, however, a serious condition in which the heart pumps less blood than usual, which affects normal body functions. More than 6 million adults have heart failure in the United States alone, and it can also afflict children. Heart failure is often caused by other underlying conditions, such as coronary artery disease and high blood pressure. While it has no cure, lifestyle changes and targeted treatment can help people live longer and better lives. As mentioned above, the main task for this dataset is to predict whether a patient <b>dies</b> from heart failure.

Below we list and explain the features of the dataset. The descriptions are in no way exhaustive, but they can help with the analysis down the line.
- Anemia: A condition in which the blood does not have enough healthy red blood cells. It is mostly a mild condition, but it causes weakness and can disrupt an individual's normal routine.
- Creatinine Phosphokinase (CPK): An enzyme, more commonly called creatine phosphokinase or creatine kinase (CK), that is found in various tissues. Its primary function is to catalyze the conversion of creatine to phosphocreatine, which is used as an energy store in tissues. Elevated CPK levels in the blood are often used as a marker of damage to CK-rich tissue, for example in a myocardial infarction (heart attack).
- Diabetes: A condition in which the body's ability to produce or respond to the hormone insulin is impaired, resulting in abnormal metabolism of carbohydrates and elevated levels of glucose in the blood and urine. If glucose levels are not controlled appropriately, various organs can be damaged.
- Ejection Fraction: The percentage of blood the left ventricle pumps out with each contraction. It is used to help classify heart failure and guide treatment. The normal range is between 50% and 70%.
- High Blood Pressure: A condition in which an individual's blood pressure is persistently elevated. If it is not controlled, it can lead to complications such as heart disease, stroke, and kidney failure.
- Platelets: These are blood cells that help the body form clots to stop bleeding. If the platelet count is too low, internal bleeding can occur.
- Serum Creatinine: A waste product of the normal energy-producing processes in the muscles. It can be measured to see how well the kidneys are filtering the blood. Higher than normal levels could indicate kidney damage.
- Serum Sodium: A measure of how much sodium is present in the blood. Sodium is an important mineral that helps regulate fluids and balance the pH of the body. Values above or below the normal range could indicate a kidney problem, amongst other things.
- Sex: The sex of the patient, encoded as a binary value
- Smoking: Whether the patient smokes or not
- Time: Follow-up period in days.
- Death Event: Whether the patient died during the follow-up period
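
As a small illustration of how these descriptions can feed into the analysis, the sketch below flags patients whose ejection fraction falls below the 50% lower bound of the normal range quoted above. The helper name `flag_low_ejection_fraction` and the constant `NORMAL_EF_LOWER` are hypothetical and introduced here only for illustration. The sample values are the `ejection_fraction` and `DEATH_EVENT` entries from the first five rows of the dataset, which are displayed further down.

```python
import pandas as pd

# Lower bound of the normal ejection-fraction range quoted in the feature list (50%).
# Assumption: this constant is illustrative and not part of the dataset.
NORMAL_EF_LOWER = 50


def flag_low_ejection_fraction(df, threshold=NORMAL_EF_LOWER):
    """Return a boolean Series that is True where ejection_fraction < threshold."""
    return df["ejection_fraction"] < threshold


# Two columns from the first five rows of the dataset.
sample = pd.DataFrame({
    "ejection_fraction": [20, 38, 20, 20, 20],
    "DEATH_EVENT": [1, 1, 1, 1, 1],
})

low_ef = flag_low_ejection_fraction(sample)
print(int(low_ef.sum()), "of", len(sample), "sample rows are below the normal range")
```

On the full data, the same helper could be applied to the loaded DataFrame and the resulting flag compared against `DEATH_EVENT`. Whether 50% is the right cut-off for this cohort is a judgment call.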

The key aspect to note is that all of these patients have suffered from heart failure. The task is to predict which of them died during the follow-up period, and the data should be analyzed with this in mind.

In [4]:
heart_data_v1 = pd.read_csv('./heart_failure_clinical_records_dataset.csv')
heart_data_v1.head()

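| | age | anaemia | creatinine_phosphokinase | diabetes | ejection_fraction | high_blood_pressure | platelets | serum_creatinine | serum_sodium | sex | smoking | time | DEATH_EVENT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 75.0 | 0 | 582 | 0 | 20 | 1 | 265000.0 | 1.9 | 130 | 1 | 0 | 4 | 1 |
| 1 | 55.0 | 0 | 7861 | 0 | 38 | 0 | 263358.03 | 1.1 | 136 | 1 | 0 | 6 | 1 |
| 2 | 65.0 | 0 | 146 | 0 | 20 | 0 | 162000.0 | 1.3 | 129 | 1 | 1 | 7 | 1 |
| 3 | 50.0 | 1 | 111 | 0 | 20 | 0 | 210000.0 | 1.9 | 137 | 1 | 0 | 7 | 1 |
| 4 | 65.0 | 1 | 160 | 1 | 20 | 0 | 327000.0 | 2.7 | 116 | 0 | 0 | 8 | 1 |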
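
Since the task is to predict `DEATH_EVENT`, a natural first check after loading is the shape of the data, the number of missing values, and the class proportions of the target. The sketch below is one possible way to do this. `quick_overview` is a hypothetical helper, not part of pandas, and it is run here on the five rows displayed above so that the block is self-contained.

```python
import io

import pandas as pd

# The first five rows as displayed above (row index omitted).
SAMPLE_CSV = """age,anaemia,creatinine_phosphokinase,diabetes,ejection_fraction,high_blood_pressure,platelets,serum_creatinine,serum_sodium,sex,smoking,time,DEATH_EVENT
75.0,0,582,0,20,1,265000.0,1.9,130,1,0,4,1
55.0,0,7861,0,38,0,263358.03,1.1,136,1,0,6,1
65.0,0,146,0,20,0,162000.0,1.3,129,1,1,7,1
50.0,1,111,0,20,0,210000.0,1.9,137,1,0,7,1
65.0,1,160,1,20,0,327000.0,2.7,116,0,0,8,1
"""


def quick_overview(df, target="DEATH_EVENT"):
    """Hypothetical helper: shape, total missing values, and class proportions."""
    return {
        "shape": df.shape,
        "missing_values": int(df.isna().sum().sum()),
        "class_proportions": df[target].value_counts(normalize=True).to_dict(),
    }


sample = pd.read_csv(io.StringIO(SAMPLE_CSV))
overview = quick_overview(sample)
print(overview)
```

All five sample rows have `DEATH_EVENT = 1`, so these proportions say nothing about the full dataset. Calling `quick_overview(heart_data_v1)` on the loaded frame would give the actual class balance.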
