# Heart Disease Analysis 2022
Case Study: Cleveland, Ohio, USA  
Analyst: Frank Ebere  
Data Ref: https://www.kaggle.com/datasets/ritwikb3/heart-disease-cleveland

Heart disease is a serious health concern that affects millions of people in the United States. According to the Centers for Disease Control and Prevention (CDC), heart disease is the leading cause of death for both men and women in the country. This data report focuses on heart disease in Cleveland, Ohio, one of the major cities in the United States. It prepares the Cleveland heart disease dataset referenced above for an analysis of the prevalence of heart disease and the risk factors associated with it. By understanding the trends and patterns of heart disease in Cleveland, we hope to inform policymakers, healthcare providers, and the public on how to improve heart health and reduce the burden of heart disease in the city.



Libraries required for this analysis:

Pandas  
NumPy  
Drive

**Pandas** is used for manipulating and analyzing the data.  
**Drive** is used to read the CSV file from Google Drive (through the google.colab package).  
**NumPy** provides support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions that operate on these arrays.

In [25]:
import numpy as np
import pandas as pd
from google.colab import drive
drive.mount('/content/drive')


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
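
The mount step above only works inside Google Colab. As a minimal sketch, assuming the same Drive folder as this notebook and a hypothetical local fallback folder named data, the CSV location could be chosen depending on where the code runs. The helper names running_in_colab and dataset_path are illustrative and not part of the original notebook.

```python
import importlib.util
from pathlib import Path

# File name and Drive folder taken from this notebook;
# LOCAL_DIR is a hypothetical fallback folder (an assumption).
FILE_NAME = "Heart_disease_cleveland_new.csv"
DRIVE_DIR = Path("/content/drive/MyDrive/Datasets")
LOCAL_DIR = Path("data")


def running_in_colab() -> bool:
    """Return True when the google.colab package can be found."""
    try:
        return importlib.util.find_spec("google.colab") is not None
    except (ImportError, ValueError):
        # Raised when the parent package "google" is missing or has no spec.
        return False


def dataset_path(in_colab: bool) -> Path:
    """Choose the Drive folder in Colab and the local folder elsewhere."""
    return (DRIVE_DIR if in_colab else LOCAL_DIR) / FILE_NAME


print(dataset_path(running_in_colab()))
```

Keeping the decision in one small function means the later read_csv call does not need to change when the notebook is run outside Colab.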


**Part 1**

We will review the dataset as exported from Kaggle and rename each column header to match the descriptions below. Columns marked (Nominal) hold categorical codes and are converted to text labels later in this part.




**age**:  
Patient's age in years (Numeric)

**sex**:   
Sex of the patient (Male: 1; Female: 0) (Nominal)

**cp**:   
Type of chest pain experienced by the patient, in one of four categories:
0: typical angina; 1: atypical angina; 2: non-anginal pain; 3: asymptomatic (Nominal)


**trestbps**:   
Patient's resting blood pressure in mm Hg (Numeric)

**chol**:   
Serum cholesterol in mg/dl (Numeric)


**fbs**:   
Whether fasting blood sugar exceeds 120 mg/dl (1: true; 0: false) (Nominal)


**restecg**:   
Resting electrocardiogram result, represented by three distinct values:
0: normal; 1: ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV);
2: probable or definite left ventricular hypertrophy by Estes' criteria (Nominal)


**thalach**:   
Maximum heart rate achieved (Numeric)

**exang**:   
Exercise-induced angina (0: no; 1: yes) (Nominal)

**oldpeak**:   
ST depression induced by exercise relative to rest (Numeric)

**slope**:   
Slope of the ST segment at peak exercise
0: up sloping; 1: flat; 2: down sloping (Nominal)

**ca**:   
The number of major vessels (0–3) (Nominal)

**thal**:   
A blood disorder called thalassemia
0: NULL; 1: normal blood flow; 2: fixed defect (no blood flow in some part of the heart); 3: reversible defect (blood flow is observed but it is not normal) (Nominal)


**target**:   
The target variable to predict: 1 means the patient has heart disease and 0 means the patient is normal.
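
The descriptions above can also be written down as a code book and used to check that a column contains only documented codes. The following is a minimal sketch, not part of the original notebook: the name CODE_BOOK, the helper undocumented_codes, and the sample frame are assumptions for illustration, and the code book covers only a subset of the nominal columns.

```python
import pandas as pd

# Code book for some of the nominal columns, copied from the descriptions above.
CODE_BOOK = {
    "sex": {0: "Female", 1: "Male"},
    "cp": {0: "typical angina", 1: "atypical angina",
           2: "non-anginal pain", 3: "asymptomatic"},
    "fbs": {0: "False", 1: "True"},
    "exang": {0: "No", 1: "Yes"},
    "slope": {0: "up sloping", 1: "flat", 2: "down sloping"},
}


def undocumented_codes(frame: pd.DataFrame, code_book: dict) -> dict:
    """Return, per column, the set of values that the code book does not list."""
    problems = {}
    for column, mapping in code_book.items():
        present = set(frame[column].dropna().unique().tolist())
        extra = present - set(mapping)
        if extra:
            problems[column] = extra
    return problems


# Illustrative frame: the first row uses codes from the dataset preview,
# the second row contains a deliberately invalid chest pain code (9).
sample = pd.DataFrame({
    "sex": [1, 0], "cp": [0, 9], "fbs": [1, 0], "exang": [0, 0], "slope": [2, 0],
})
print(undocumented_codes(sample, CODE_BOOK))
```

An empty result means every value in the checked columns has a label, so a later code-to-label conversion will not leave stray numbers behind.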

In [26]:
# Read the dataset from Google Drive and preview the first five rows.
df = pd.read_csv('/content/drive/MyDrive/Datasets/Heart_disease_cleveland_new.csv')
df.head()


Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
0,63,1,0,145,233,1,2,150,0,2.3,2,0,2,0
1,67,1,3,160,286,0,2,108,1,1.5,1,3,1,1
2,67,1,3,120,229,0,2,129,1,2.6,1,2,3,1
3,37,1,2,130,250,0,0,187,0,3.5,2,0,1,0
4,41,0,1,130,204,0,2,172,0,1.4,0,0,1,0
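
Before renaming anything, it can help to confirm the shape, data types, and missing values of the raw data. The sketch below is an illustration only: it rebuilds the five preview rows shown above from an inline string instead of reading the full CSV, and the variable names preview_csv and preview are assumptions.

```python
import io

import pandas as pd

# The five preview rows shown above, used here in place of the full CSV.
preview_csv = """age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
63,1,0,145,233,1,2,150,0,2.3,2,0,2,0
67,1,3,160,286,0,2,108,1,1.5,1,3,1,1
67,1,3,120,229,0,2,129,1,2.6,1,2,3,1
37,1,2,130,250,0,0,187,0,3.5,2,0,1,0
41,0,1,130,204,0,2,172,0,1.4,0,0,1,0
"""
preview = pd.read_csv(io.StringIO(preview_csv))

print(preview.shape)                # number of rows and columns
print(preview.dtypes)               # every column is numeric at this point
print(preview.isna().sum())         # missing values per column
print(preview.duplicated().sum())   # fully duplicated rows
```

The same four calls can be run on df once the full file is loaded.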


**Step 1.  
Rename Column Header**

In [27]:
# List the current column names before renaming them.
df.columns

Index(['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach',
       'exang', 'oldpeak', 'slope', 'ca', 'thal', 'target'],
      dtype='object')

In [28]:
# Replace the column names with the descriptive headers; refer to the descriptions above.
df.columns = ['age', 'sex', 'chest_pain_type', 'resting_blood_pressure', 'cholesterol',
              'fasting_blood_sugar', 'rest_ecg', 'max_heart_rate_achieved',
              'exercise_induced_angina', 'st_depression', 'st_slope', 'ca',
              'thalassemia', 'target']

df.head()

Unnamed: 0,age,sex,chest_pain_type,resting_blood_pressure,cholesterol,fasting_blood_sugar,rest_ecg,max_heart_rate_achieved,exercise_induced_angina,st_depression,st_slope,ca,thalassemia,target
0,63,1,0,145,233,1,2,150,0,2.3,2,0,2,0
1,67,1,3,160,286,0,2,108,1,1.5,1,3,1,1
2,67,1,3,120,229,0,2,129,1,2.6,1,2,3,1
3,37,1,2,130,250,0,0,187,0,3.5,2,0,1,0
4,41,0,1,130,204,0,2,172,0,1.4,0,0,1,0
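
Assigning a list to df.columns relies on the columns being in exactly the expected order. As an alternative sketch, not the approach used in this notebook, a rename based on an old-to-new mapping is independent of column order. The name NEW_NAMES and the one-row frame below are assumptions for illustration; the row holds the values of the first preview row with the columns deliberately reversed.

```python
import pandas as pd

# Old name -> new name, matching the headers used in this notebook.
NEW_NAMES = {
    "cp": "chest_pain_type",
    "trestbps": "resting_blood_pressure",
    "chol": "cholesterol",
    "fbs": "fasting_blood_sugar",
    "restecg": "rest_ecg",
    "thalach": "max_heart_rate_achieved",
    "exang": "exercise_induced_angina",
    "oldpeak": "st_depression",
    "slope": "st_slope",
    "thal": "thalassemia",
}

# First preview row, with the column order reversed on purpose.
row = pd.DataFrame([{
    "target": 0, "thal": 2, "ca": 0, "slope": 2, "oldpeak": 2.3, "exang": 0,
    "thalach": 150, "restecg": 2, "fbs": 1, "chol": 233, "trestbps": 145,
    "cp": 0, "sex": 1, "age": 63,
}])

# errors="raise" makes pandas fail if a key of NEW_NAMES is not a column.
renamed = row.rename(columns=NEW_NAMES, errors="raise")
print(list(renamed.columns))
```

Columns that are not listed in the mapping (age, sex, ca, target) keep their names.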


**Step 2.  
Using the details provided above, we convert selected columns from integer codes to text labels (object dtype).**

The nominal columns described above, except ca, are converted in this step by passing a dictionary of code-to-label pairs to replace().

In [29]:
# Use a dictionary to replace the numeric codes in the selected column with labels.
# Refer to the column descriptions above.


df['sex'] = df['sex'].replace({1 : 'Male', 0: 'Female'})
df['sex'].value_counts()
df.head()

Unnamed: 0,age,sex,chest_pain_type,resting_blood_pressure,cholesterol,fasting_blood_sugar,rest_ecg,max_heart_rate_achieved,exercise_induced_angina,st_depression,st_slope,ca,thalassemia,target
0,63,Male,0,145,233,1,2,150,0,2.3,2,0,2,0
1,67,Male,3,160,286,0,2,108,1,1.5,1,3,1,1
2,67,Male,3,120,229,0,2,129,1,2.6,1,2,3,1
3,37,Male,2,130,250,0,0,187,0,3.5,2,0,1,0
4,41,Female,1,130,204,0,2,172,0,1.4,0,0,1,0
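
One property of replace() is worth keeping in mind: a code that is missing from the dictionary is left unchanged, so it can go unnoticed. The sketch below compares replace() with map(), which turns unmapped codes into NaN instead. The series is illustrative and contains a made-up code 2 that the dataset description does not define.

```python
import pandas as pd

sex_labels = {1: "Male", 0: "Female"}

# Illustrative column: 0 and 1 are documented codes, 2 is not.
codes = pd.Series([1, 0, 2], name="sex")

replaced = codes.replace(sex_labels)  # the unmapped 2 is kept unchanged
mapped = codes.map(sex_labels)        # the unmapped 2 becomes NaN

print(replaced.tolist())
print(mapped.tolist())
print(mapped.isna().sum(), "value(s) without a label")
```

Counting the NaN values after map() gives a quick check that every code received a label.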


In [30]:
df['chest_pain_type'] = df['chest_pain_type'].replace({0 : 'typical angina', 1 : 'atypical angina', 2 : 'non- anginal pain', 3 : 'asymptomatic'})

df['chest_pain_type'].value_counts()

df.head()

Unnamed: 0,age,sex,chest_pain_type,resting_blood_pressure,cholesterol,fasting_blood_sugar,rest_ecg,max_heart_rate_achieved,exercise_induced_angina,st_depression,st_slope,ca,thalassemia,target
0,63,Male,typical angina,145,233,1,2,150,0,2.3,2,0,2,0
1,67,Male,asymptomatic,160,286,0,2,108,1,1.5,1,3,1,1
2,67,Male,asymptomatic,120,229,0,2,129,1,2.6,1,2,3,1
3,37,Male,non- anginal pain,130,250,0,0,187,0,3.5,2,0,1,0
4,41,Female,atypical angina,130,204,0,2,172,0,1.4,0,0,1,0
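
A notebook cell displays only the value of its last expression, so in the cells above the result of value_counts() is computed and then discarded because df.head() comes last. A minimal sketch of printing the counts explicitly follows; the series holds only the five chest pain labels from the preview above, so the numbers are not the counts for the full dataset.

```python
import pandas as pd

# Chest pain labels of the five preview rows shown above.
chest_pain = pd.Series(
    ["typical angina", "asymptomatic", "asymptomatic",
     "non- anginal pain", "atypical angina"],
    name="chest_pain_type",
)

counts = chest_pain.value_counts()                 # absolute frequencies
shares = chest_pain.value_counts(normalize=True)   # relative frequencies

print(counts)
print(shares.round(2))
```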


In [31]:
df['fasting_blood_sugar'] = df['fasting_blood_sugar'].replace({1 : 'TRUE', 0: 'FALSE'})
df['fasting_blood_sugar'].value_counts()
df.head()

Unnamed: 0,age,sex,chest_pain_type,resting_blood_pressure,cholesterol,fasting_blood_sugar,rest_ecg,max_heart_rate_achieved,exercise_induced_angina,st_depression,st_slope,ca,thalassemia,target
0,63,Male,typical angina,145,233,True,2,150,0,2.3,2,0,2,0
1,67,Male,asymptomatic,160,286,False,2,108,1,1.5,1,3,1,1
2,67,Male,asymptomatic,120,229,False,2,129,1,2.6,1,2,3,1
3,37,Male,non- anginal pain,130,250,False,0,187,0,3.5,2,0,1,0
4,41,Female,atypical angina,130,204,False,2,172,0,1.4,0,0,1,0


In [32]:
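# Note: the label for code 2 is misspelled; the intended text is
# 'definite left ventricular hypertrophy'. The outputs below show the label as typed.
df['rest_ecg'] = df['rest_ecg'].replace({0 : 'Normal', 1: 'Abnormality in ST-T wave',
2: 'efinite left ventricular hypertrophyby'})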

df['rest_ecg'].value_counts()
df.head()

Unnamed: 0,age,sex,chest_pain_type,resting_blood_pressure,cholesterol,fasting_blood_sugar,rest_ecg,max_heart_rate_achieved,exercise_induced_angina,st_depression,st_slope,ca,thalassemia,target
0,63,Male,typical angina,145,233,True,efinite left ventricular hypertrophyby,150,0,2.3,2,0,2,0
1,67,Male,asymptomatic,160,286,False,efinite left ventricular hypertrophyby,108,1,1.5,1,3,1,1
2,67,Male,asymptomatic,120,229,False,efinite left ventricular hypertrophyby,129,1,2.6,1,2,3,1
3,37,Male,non- anginal pain,130,250,False,Normal,187,0,3.5,2,0,1,0
4,41,Female,atypical angina,130,204,False,efinite left ventricular hypertrophyby,172,0,1.4,0,0,1,0


In [33]:
df['exercise_induced_angina'] = df['exercise_induced_angina'].replace({0: 'NO', 1 : 'YES'})

df['exercise_induced_angina'].value_counts()

df.head()

Unnamed: 0,age,sex,chest_pain_type,resting_blood_pressure,cholesterol,fasting_blood_sugar,rest_ecg,max_heart_rate_achieved,exercise_induced_angina,st_depression,st_slope,ca,thalassemia,target
0,63,Male,typical angina,145,233,True,efinite left ventricular hypertrophyby,150,NO,2.3,2,0,2,0
1,67,Male,asymptomatic,160,286,False,efinite left ventricular hypertrophyby,108,YES,1.5,1,3,1,1
2,67,Male,asymptomatic,120,229,False,efinite left ventricular hypertrophyby,129,YES,2.6,1,2,3,1
3,37,Male,non- anginal pain,130,250,False,Normal,187,NO,3.5,2,0,1,0
4,41,Female,atypical angina,130,204,False,efinite left ventricular hypertrophyby,172,NO,1.4,0,0,1,0


In [34]:
df['st_slope'] = df['st_slope'].replace({0: 'up sloping', 1: 'flat', 2: 'down sloping'})
df['st_slope'].value_counts()
df.head()

Unnamed: 0,age,sex,chest_pain_type,resting_blood_pressure,cholesterol,fasting_blood_sugar,rest_ecg,max_heart_rate_achieved,exercise_induced_angina,st_depression,st_slope,ca,thalassemia,target
0,63,Male,typical angina,145,233,True,efinite left ventricular hypertrophyby,150,NO,2.3,down sloping,0,2,0
1,67,Male,asymptomatic,160,286,False,efinite left ventricular hypertrophyby,108,YES,1.5,flat,3,1,1
2,67,Male,asymptomatic,120,229,False,efinite left ventricular hypertrophyby,129,YES,2.6,flat,2,3,1
3,37,Male,non- anginal pain,130,250,False,Normal,187,NO,3.5,down sloping,0,1,0
4,41,Female,atypical angina,130,204,False,efinite left ventricular hypertrophyby,172,NO,1.4,up sloping,0,1,0


In [35]:
df['thalassemia'] = df['thalassemia'].replace({
    0: 'NULL',
    1: 'normal blood flow',
    2: 'fixed defect (no blood flow in some part of the heart)',
    3: 'reversible defect (a blood flow is observed but it is not normal)',
})
df['thalassemia'].value_counts()
df.head()

Unnamed: 0,age,sex,chest_pain_type,resting_blood_pressure,cholesterol,fasting_blood_sugar,rest_ecg,max_heart_rate_achieved,exercise_induced_angina,st_depression,st_slope,ca,thalassemia,target
0,63,Male,typical angina,145,233,True,efinite left ventricular hypertrophyby,150,NO,2.3,down sloping,0,fixed defect (no blood flow in some part of th...,0
1,67,Male,asymptomatic,160,286,False,efinite left ventricular hypertrophyby,108,YES,1.5,flat,3,normal blood flow,1
2,67,Male,asymptomatic,120,229,False,efinite left ventricular hypertrophyby,129,YES,2.6,flat,2,reversible defect (a blood flow is observed bu...,1
3,37,Male,non- anginal pain,130,250,False,Normal,187,NO,3.5,down sloping,0,normal blood flow,0
4,41,Female,atypical angina,130,204,False,efinite left ventricular hypertrophyby,172,NO,1.4,up sloping,0,normal blood flow,0


In [36]:
df['target'] = df['target'].replace({1: 'heart disease', 0 : 'normal'})
df['target'].value_counts()
df.head()

Unnamed: 0,age,sex,chest_pain_type,resting_blood_pressure,cholesterol,fasting_blood_sugar,rest_ecg,max_heart_rate_achieved,exercise_induced_angina,st_depression,st_slope,ca,thalassemia,target
0,63,Male,typical angina,145,233,True,efinite left ventricular hypertrophyby,150,NO,2.3,down sloping,0,fixed defect (no blood flow in some part of th...,normal
1,67,Male,asymptomatic,160,286,False,efinite left ventricular hypertrophyby,108,YES,1.5,flat,3,normal blood flow,heart disease
2,67,Male,asymptomatic,120,229,False,efinite left ventricular hypertrophyby,129,YES,2.6,flat,2,reversible defect (a blood flow is observed bu...,heart disease
3,37,Male,non- anginal pain,130,250,False,Normal,187,NO,3.5,down sloping,0,normal blood flow,normal
4,41,Female,atypical angina,130,204,False,efinite left ventricular hypertrophyby,172,NO,1.4,up sloping,0,normal blood flow,normal


**Step 3**:  

*   Define age categories ranging from 20 to 80

*   Label the categories and assign them to a new age_group column in the DataFrame


*   Drop the age column, since it is now represented by the age_group column.







In [37]:
# Replace the '-' placeholder, if present, with 0 so the column can be cast to int
df['age'] = df['age'].replace({'-': 0})

# Convert the age column to the integer dtype
df['age'] = df['age'].astype(int)

# Define the age bin edges and the label for each bin
age_ranges = [20, 40, 60, 80]
age_labels = ['Early Adulthood', 'Middle Adulthood', 'Old Age']

# Use pd.cut() to assign each age to a bin: (20, 40], (40, 60], (60, 80]
df['age_group'] = pd.cut(df['age'], bins=age_ranges, labels=age_labels)
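
With its default settings, pd.cut() builds intervals that are open on the left and closed on the right, so an age of exactly 40 falls in the first group and an age of 20 or below receives no group at all. The sketch below illustrates this with ages from the preview rows, two boundary ages chosen for illustration, and the placeholder value 0 used in the cell above.

```python
import pandas as pd

age_ranges = [20, 40, 60, 80]
age_labels = ["Early Adulthood", "Middle Adulthood", "Old Age"]

# Preview ages (63, 67, 37, 41), two boundary ages (40, 60), and 0,
# the value that the cell above substitutes for '-'.
ages = pd.Series([63, 67, 37, 41, 40, 60, 0])

groups = pd.cut(ages, bins=age_ranges, labels=age_labels)
print(pd.DataFrame({"age": ages, "age_group": groups}))

# Ages outside (20, 80] receive no group; here that is only the placeholder 0.
print(ages[groups.isna()].tolist())
```

Any row whose age was replaced by 0 would therefore end up with a missing age_group, which is worth checking with isna() before the age column is dropped.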


In [38]:
df.describe()

Unnamed: 0,age,resting_blood_pressure,cholesterol,max_heart_rate_achieved,st_depression,ca
count,303.0,303.0,303.0,303.0,303.0,303.0
mean,54.438944,131.689769,246.693069,149.607261,1.039604,0.663366
std,9.038662,17.599748,51.776918,22.875003,1.161075,0.934375
min,29.0,94.0,126.0,71.0,0.0,0.0
25%,48.0,120.0,211.0,133.5,0.0,0.0
50%,56.0,130.0,241.0,153.0,0.8,0.0
75%,61.0,140.0,275.0,166.0,1.6,1.0
max,77.0,200.0,564.0,202.0,6.2,3.0


In [39]:
# Drop the age column; it is now represented by age_group
df = df.drop(columns=['age'])

In [40]:
df.head()

Unnamed: 0,sex,chest_pain_type,resting_blood_pressure,cholesterol,fasting_blood_sugar,rest_ecg,max_heart_rate_achieved,exercise_induced_angina,st_depression,st_slope,ca,thalassemia,target,age_group
0,Male,typical angina,145,233,True,efinite left ventricular hypertrophyby,150,NO,2.3,down sloping,0,fixed defect (no blood flow in some part of th...,normal,Old Age
1,Male,asymptomatic,160,286,False,efinite left ventricular hypertrophyby,108,YES,1.5,flat,3,normal blood flow,heart disease,Old Age
2,Male,asymptomatic,120,229,False,efinite left ventricular hypertrophyby,129,YES,2.6,flat,2,reversible defect (a blood flow is observed bu...,heart disease,Old Age
3,Male,non- anginal pain,130,250,False,Normal,187,NO,3.5,down sloping,0,normal blood flow,normal,Early Adulthood
4,Female,atypical angina,130,204,False,efinite left ventricular hypertrophyby,172,NO,1.4,up sloping,0,normal blood flow,normal,Middle Adulthood


**Step 4**:

Export the cleaned data.  
The cleaned data will be visualized with graphs plotted in Power BI.

In [41]:
# Summarize the text (object) columns. The built-in object is used because the
# np.object alias was deprecated in NumPy 1.20 and removed in later releases.
df.describe(include=[object])

# Uncomment to write the cleaned data to Google Drive:
#df.to_csv(r'/content/drive/MyDrive/Datasets/Clean_Heart_disease_cleveland_new.csv', index = False)


Unnamed: 0,sex,chest_pain_type,fasting_blood_sugar,rest_ecg,exercise_induced_angina,st_slope,thalassemia,target
count,303,303,303,303,303,303,303,303
unique,2,4,2,3,2,3,3,2
top,Male,asymptomatic,FALSE,Normal,NO,up sloping,normal blood flow,normal
freq,206,144,258,151,204,142,168,164
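
The export described in Step 4 can be sketched as a write followed by a read-back check. This is an illustration under stated assumptions: it writes to a temporary folder with a hypothetical file name instead of the Google Drive path, and it uses two cleaned rows (a subset of columns) copied from the preview above.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Two cleaned rows copied from the preview above (a subset of columns).
cleaned = pd.DataFrame({
    "sex": ["Male", "Female"],
    "cholesterol": [233, 204],
    "target": ["normal", "normal"],
    "age_group": ["Old Age", "Middle Adulthood"],
})

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical output file; the notebook itself targets a Google Drive path.
    out_path = Path(tmp) / "clean_heart_disease.csv"
    # index=False keeps the row index out of the file, so no extra unnamed
    # column appears when the file is opened elsewhere.
    cleaned.to_csv(out_path, index=False)
    restored = pd.read_csv(out_path)

print(restored.equals(cleaned))
```

Note that a CSV file stores only text, so a categorical column such as age_group is read back as plain strings and its category order has to be set again in the tool that consumes the file.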
