
# Project: Heart Failure Prediction

**Table of Contents**

Introduction

Data Wrangling

Exploratory Data Analysis

Conclusions

# Introduction

# Dataset Description

A rise in health-related issues has made it critical that everyone has easy access to health-care services. ***Cardiovascular diseases*** (CVDs), e.g. *hypertension*, *myocardial infarction* and *heart failure*, are the number one cause of death in Nigeria and globally, taking an estimated 17.9 million lives each year, which accounts for 31% of all deaths worldwide.
Heart failure is a common event caused by CVDs, and this dataset contains 12 features that can be used to predict mortality from heart failure.

Most cardiovascular diseases can be prevented by addressing behavioural risk factors such as tobacco use, unhealthy diet and obesity, physical inactivity and harmful use of alcohol using population-wide strategies.

People with cardiovascular disease, or who are at high cardiovascular risk (due to the presence of one or more risk factors such as hypertension, diabetes, hyperlipidaemia or already established disease), need early detection and management, and this is where a machine learning model can be of great help.

# Variables in this Dataset

1) "Age" = the age of the patient.

2) "Anaemia" = indicates whether the patient lacks enough healthy red blood cells to carry adequate oxygen to the body's tissues.

3) "Creatine phosphokinase" (column `creatinine_phosphokinase`) = elevated levels indicate inflammation of muscles (myositis) or muscle damage due to muscle disorders (myopathies) such as muscular dystrophy; the measurement also helps diagnose rhabdomyolysis in a person who shows signs and symptoms.

4) "Diabetes" = indicates whether the patient has diabetes, a chronic disease that occurs either when the pancreas does not produce enough insulin or when the body cannot effectively use the insulin it produces.

5) "Ejection fraction" = the measurement, expressed as a percentage, of how much blood the left ventricle pumps out with each contraction.

6) "High blood pressure" = indicates a condition in which the long-term force of the blood against the artery walls is high enough that it may eventually cause health problems, such as heart disease.

7) "Platelets" = small, colorless cell fragments in the blood that form clots and stop or prevent bleeding.

8) "Serum creatinine" = the level of creatinine in the blood, measured by a blood test. It tells how well the kidneys are working.

9) "Serum sodium" = sodium is an essential electrolyte that helps maintain the balance of water in and around the cells. It is important for proper muscle and nerve function, and it also helps maintain a stable blood pressure level.

10) "Sex" = the gender of the patient.

11) "Smoking" = indicates whether the patient smokes, i.e. inhales and exhales the fumes of burning plant material.

12) "Time" = the follow-up period of the patient, in days.

13) "DEATH_EVENT" = indicates whether the patient died during the follow-up period.



NB: The High Blood Pressure variable will be used as our dependent variable, while the others will be used as independent variables.
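The split into dependent and independent variables can be sketched as follows. This is only a minimal illustration: it assumes the column names reported by `df.dtypes` in this notebook, the two rows are the ones displayed by `df.head(2)`, and the names `sample`, `target`, `X` and `y` are hypothetical.

```python
import pandas as pd

# Two sample rows copied from the df.head(2) output in this notebook.
sample = pd.DataFrame({
    "age": [75.0, 55.0],
    "anaemia": [0, 0],
    "creatinine_phosphokinase": [582, 7861],
    "diabetes": [0, 0],
    "ejection_fraction": [20, 38],
    "high_blood_pressure": [1, 0],
    "platelets": [265000.0, 263358.03],
    "serum_creatinine": [1.9, 1.1],
    "serum_sodium": [130, 136],
    "sex": [1, 1],
    "smoking": [0, 0],
    "time": [4, 6],
    "DEATH_EVENT": [1, 1],
})

# Dependent variable (assumed column name: high_blood_pressure).
target = "high_blood_pressure"
y = sample[target]

# Independent variables: every remaining column.
X = sample.drop(columns=[target])
```

Keeping the target name in one variable makes it easy to switch the dependent variable later without touching the rest of the code.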

**Question(s) for Analysis**

We will try to answer the following questions:

1) Can mortality caused by heart failure be predicted?

2) Is alcohol a major determinant of heart failure?

3) Is there any significant relationship between patients' ejection fraction and elevated blood pressure?

4) Is there any significant relationship between anaemia and elevated blood pressure?

5) Is there any significant relationship between serum creatinine and elevated blood pressure?
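A first, descriptive look at relationships like these could compare each variable across the two blood-pressure groups. The sketch below is an assumption about how that might be done, not the analysis of this report: the frame `toy` holds made-up values, `mean_by_blood_pressure` is a hypothetical helper, and a formal significance test would still be needed before calling any relationship significant.

```python
import pandas as pd

def mean_by_blood_pressure(df, column, target="high_blood_pressure"):
    """Return the mean of `column` for each value of the binary target."""
    return df.groupby(target)[column].mean()

# Hypothetical toy values, NOT taken from the real dataset.
toy = pd.DataFrame({
    "ejection_fraction": [20, 38, 20, 60],
    "anaemia": [0, 0, 1, 1],
    "high_blood_pressure": [1, 0, 0, 1],
})

# Numeric variable: compare group means.
ef_means = mean_by_blood_pressure(toy, "ejection_fraction")

# Binary variable: count patients in each anaemia / blood-pressure combination.
anaemia_table = pd.crosstab(toy["anaemia"], toy["high_blood_pressure"])
```

Group means suit numeric variables such as ejection fraction and serum creatinine, while a contingency table suits binary variables such as anaemia.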

# Acknowledgements

## Citation

Davide Chicco, Giuseppe Jurman: Machine learning can predict survival of patients with heart failure from serum creatinine and ejection fraction alone. BMC Medical Informatics and Decision Making 20, 16 (2020).

## License
CC BY 4.0

## Splash icon
Icon by Freepik, available on Flaticon.

## Splash banner
Wallpaper by jcomp, available on Freepik.

# **Data Wrangling**

In this section of the report, we are going to perform the following tasks:

1) load in the data.

2) check for cleanliness and orderliness.

3) trim the dataset.

4) clean the dataset for analysis. 
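The cleanliness check in step 2 could look roughly like the sketch below. It assumes that "clean" means no missing values and no duplicated rows; `cleanliness_report` is a hypothetical helper and `toy` is a made-up frame, not the real dataset.

```python
import pandas as pd

def cleanliness_report(df):
    """Summarize missing values per column and the number of duplicate rows."""
    return {
        "missing_per_column": df.isnull().sum().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
    }

# Hypothetical toy frame with one missing age and one repeated row.
toy = pd.DataFrame({"age": [75.0, None, 75.0], "smoking": [0, 1, 0]})
report = cleanliness_report(toy)
```

Running the same helper on `df` after loading would show whether any trimming or cleaning is needed at all.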


# **Data Gathering**

In [22]:
# import statements for all packages 

# For Data analysis

import pandas as pd
import numpy as np


In [23]:
# For data Visualization
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [3]:
#For unzipping files

import zipfile

In [20]:
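# Pin pandas to 0.25.0, the first release that provides DataFrame.explode().
# Note: on an environment with a newer pandas preinstalled, this pin downgrades it.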

!pip install --upgrade pandas==0.25.0

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [21]:
#Extract all content from zip file

with zipfile.ZipFile('archive.zip', 'r') as myzip:
  myzip.extractall()
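A slightly more defensive version of the extraction step is sketched below. It is an illustration only: `extract_archive` is a hypothetical helper, and the demonstration builds a throwaway archive with a made-up file name so that the snippet runs without `archive.zip` being present.

```python
import tempfile
import zipfile
from pathlib import Path

def extract_archive(zip_path, dest_dir="."):
    """Extract every member of zip_path into dest_dir and return the member names."""
    zip_path = Path(zip_path)
    if not zip_path.exists():
        raise FileNotFoundError(f"{zip_path} does not exist")
    with zipfile.ZipFile(zip_path, "r") as archive:
        names = archive.namelist()
        archive.extractall(dest_dir)
    return names

# Demonstration on a throwaway archive (hypothetical file name and content).
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = Path(tmp) / "demo.zip"
    with zipfile.ZipFile(demo_zip, "w") as archive:
        archive.writestr("records.csv", "age,anaemia\n75.0,0\n")
    extracted = extract_archive(demo_zip, Path(tmp) / "out")
    extracted_ok = (Path(tmp) / "out" / "records.csv").exists()
```

Returning the member names makes it easy to confirm that the expected CSV file was actually in the archive before trying to read it.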

# General Properties

*I am going to avoid performing too many operations in each cell, and I will create cells freely to explore my data.*

In [24]:
# Loading my data 

df = pd.read_csv('heart_failure_clinical_records_dataset.csv')

#Inspecting the first two rows of the dataset

df.head(2)



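| | age | anaemia | creatinine_phosphokinase | diabetes | ejection_fraction | high_blood_pressure | platelets | serum_creatinine | serum_sodium | sex | smoking | time | DEATH_EVENT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 75.0 | 0 | 582 | 0 | 20 | 1 | 265000.0 | 1.9 | 130 | 1 | 0 | 4 | 1 |
| 1 | 55.0 | 0 | 7861 | 0 | 38 | 0 | 263358.03 | 1.1 | 136 | 1 | 0 | 6 | 1 |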


In [36]:
#getting information on the characteristics of each attribute in the data
#(info is a method, so it must be called with parentheses)

df.info()
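`DataFrame.info` is a method, so it only prints the column summary when it is called with parentheses; evaluating `df.info` without them merely displays the bound method. The sketch below, which uses a hypothetical two-column frame named `toy`, also shows how the summary can be captured as text through the `buf` parameter.

```python
import io
import pandas as pd

# Hypothetical two-column frame standing in for the real dataset.
toy = pd.DataFrame({"age": [75.0, 55.0], "anaemia": [0, 0]})

# Write the summary into an in-memory buffer instead of printing it.
buffer = io.StringIO()
toy.info(buf=buffer)
summary = buffer.getvalue()
```

The captured `summary` string lists the number of entries, the column names, their non-null counts and their dtypes.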

In [30]:
#checking the size of the dataset
df.shape

(299, 13)

# **The dataset I am working on has:**

1) 299 records/rows

2) 13 fields/columns

In [33]:
df.dtypes

age                         float64
anaemia                       int64
creatinine_phosphokinase      int64
diabetes                      int64
ejection_fraction             int64
high_blood_pressure           int64
platelets                   float64
serum_creatinine            float64
serum_sodium                  int64
sex                           int64
smoking                       int64
time                          int64
DEATH_EVENT                   int64
dtype: object
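The `df.dtypes` output stores yes/no indicators such as `anaemia` as `int64`, so the dtype alone does not separate them from true counts. One possible way to list the indicator columns is sketched below; `binary_columns` is a hypothetical helper, and the sample uses a subset of the columns and the two rows shown by `df.head(2)`.

```python
import pandas as pd

def binary_columns(df):
    """Return the names of columns whose non-null values are all 0 or 1."""
    return [
        name for name in df.columns
        if df[name].dropna().isin([0, 1]).all()
    ]

# Subset of the two rows displayed by df.head(2) in this notebook.
sample = pd.DataFrame({
    "age": [75.0, 55.0],
    "anaemia": [0, 0],
    "ejection_fraction": [20, 38],
    "high_blood_pressure": [1, 0],
    "serum_creatinine": [1.9, 1.1],
    "DEATH_EVENT": [1, 1],
})

flags = binary_columns(sample)
```

On the full dataset, the resulting list could then be used to treat those columns as categories rather than as numeric measurements.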