<p  style="text-align: center;"><font size="12"><b>HEART FAILURE PREDICTION</b></font></p>

![](https://inteng-storage.s3.amazonaws.com/img/iea/Xy6xeK3Wwr/sizes/heart-attack-ai-oxford_md.jpg)


In this notebook we evaluate several variables to determine how they may relate to whether a patient dies or survives a heart failure event. 

The variables that are included in this data set are:
* Age                         
* Anemia  
* Creatine Phosphokinase (CPK; stored in the `creatinine_phosphokinase` column)  
* Diabetes       
* Ejection Fraction   
* High Blood Pressure  
* Platelets   
* Serum Creatinine 
* Serum Sodium 
* Sex    
* Smoking 
* Time (Follow-up period (days))


<h3 class="list-group-item list-group-item-action active" data-toggle="list"  role="tab" aria-controls="home">Table of Contents</h3>

* <a href='#1'>I. Load Libraries & Packages</a>  
* <a href='#2'>II. Data Overview & Insights</a>  
* <a href='#3'>III. Outliers</a>  
* <a href='#4'>IV. Exploratory Data Analysis</a>  
    * <a href='#4a'>IVa. Univariate Analysis</a>  
    * <a href='#4b'>IVb. Bivariate Analysis</a>  
    * <a href='#4c'>IVc. Multivariate Analysis</a>  
* <a href='#5'>V. Data Normalization</a>  (coming soon) 
* <a href='#6'>VI. Model Development</a>  (coming soon)

# <a id="1">I. LIBRARIES & PACKAGES</a>

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import missingno as msno
import plotly.graph_objects as go
import plotly.figure_factory as ff
from plotly.subplots import make_subplots


# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

# <a id="2">II. DATA OVERVIEW & INSIGHTS</a>

In [None]:
df = pd.read_csv('../input/heart-failure-clinical-data/heart_failure_clinical_records_dataset.csv')
df.head()

In [None]:
df.columns

In [None]:
# CHANGE SPELLING OF ANAEMIA COLUMN TO 'ANEMIA'
df.rename(columns={'anaemia':'anemia'}, inplace=True)

In [None]:
df.info()

In [None]:
df.describe()

## **MISSING VALUES**

There are no missing values: the percentage of missing entries computed below is 0 for every column.

In [None]:
missing_percentage=df.isna().sum()*100/df.shape[0]
missing_percentage

In [None]:
df_survived = df.loc[df['DEATH_EVENT'] == 0]
df_died = df.loc[df['DEATH_EVENT'] == 1]

df_cat = df[['anemia', 'diabetes', 'high_blood_pressure', 'sex', 'smoking']]
df_cont = df[['age', 'creatinine_phosphokinase', 'ejection_fraction', 'platelets', 'serum_creatinine', 'serum_sodium', 'time']]

In [None]:
#PRINT VALUE COUNTS FOR VARIABLES WITH BINARY VALUES:

print("ANEMIA:")
print(df['anemia'].value_counts())
print("")
print("DIABETES:")
print(df['diabetes'].value_counts())
print("")
print("HIGH BLOOD PRESSURE:")
print(df['high_blood_pressure'].value_counts())
print("")
print("SEX:")
print(df['sex'].value_counts())
print("")
print("SMOKING:")
print(df['smoking'].value_counts())
print("")
print("DEATH EVENT:")
print(df['DEATH_EVENT'].value_counts())
print("")

#### GET RANGES OF CONTINUOUS VARIABLES

In [None]:
print("Range of Age Column: ", df['age'].min(), "to", df['age'].max())
print("")
print("Range of Creatinine Phosphokinase Column: ", df['creatinine_phosphokinase'].min(), "to", df['creatinine_phosphokinase'].max())
print("")
print("Range of Platelets Column: ", df['platelets'].min(), "to", df['platelets'].max())
print("")
print("Range of Serum Creatinine Column: ", df['serum_creatinine'].min(), "to", df['serum_creatinine'].max())
print("")
print("Range of Serum Sodium Column: ", df['serum_sodium'].min(), "to", df['serum_sodium'].max())
print("")
print("Range of Time Column: ", df['time'].min(), "to", df['time'].max())
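
The per-column prints above can also be condensed into a single call. The sketch below uses a small made-up `toy` frame as a stand-in for `df` (the values are illustrative only, not taken from the dataset) and relies on `DataFrame.agg` to return the minimum and maximum of each listed column.

```python
import pandas as pd

# Toy stand-in for the notebook's `df`; the values are made up for illustration.
toy = pd.DataFrame({
    "age": [50, 65, 42, 80],
    "serum_sodium": [137, 130, 141, 134],
    "time": [10, 120, 250, 33],
})

# One call returns the min and max of every listed column as a small table.
ranges = toy[["age", "serum_sodium", "time"]].agg(["min", "max"])
print(ranges)
```

With the real data, the same call on the continuous columns would also cover `ejection_fraction`, which the prints above leave out.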



# <a id='3'>III. OUTLIERS</a>

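We'll visualize the variables with the most extreme values using box plots, then delete the rows that contain major outliers. Removing them may improve the accuracy of our predictive models, at the cost of discarding some patients from the analysis.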

In [None]:
fig = px.box(df, x="creatinine_phosphokinase")
fig.update_layout(title_text='CREATININE PHOSPHOKINASE')
fig.show()

In [None]:
fig = px.box(df, x="platelets")
fig.update_layout(title_text='PLATELETS')
fig.show()

In [None]:
fig = px.box(df, x="serum_creatinine")
fig.update_layout(title_text='SERUM CREATININE')
fig.show()

In [None]:
fig = px.box(df, x="serum_sodium")
fig.update_layout(title_text='SERUM SODIUM')
fig.show()

In [None]:
# DROP ROWS WITH OUTLIER VALUES

df.drop(df[df['creatinine_phosphokinase'] >= 1380].index, inplace = True) 
df.drop(df[df['platelets'] >= 448000].index, inplace = True) 
df.drop(df[df['platelets'] <= 73000].index, inplace = True) 
df.drop(df[df['serum_creatinine'] >= 1.7].index, inplace = True) 
df.drop(df[df['serum_sodium'] <= 127].index, inplace = True) 
df.drop(df[df['serum_sodium'] >= 148].index, inplace = True) 
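
The cutoffs above are fixed numbers. As a point of comparison, here is a minimal sketch of one common rule of thumb, Tukey's IQR fences, which flags values more than 1.5 × IQR beyond the quartiles. The helper name `iqr_bounds` and the toy values are assumptions made for illustration; this is not necessarily how the thresholds in this notebook were chosen.

```python
import pandas as pd

def iqr_bounds(series, k=1.5):
    """Return the (lower, upper) Tukey fences for a numeric Series."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Toy values (made up), with one obvious extreme.
toy = pd.DataFrame({"serum_creatinine": [0.9, 1.0, 1.1, 1.2, 1.3, 9.0]})

lower, upper = iqr_bounds(toy["serum_creatinine"])

# Keep only the rows inside the fences; `toy` itself is left unchanged.
trimmed = toy[toy["serum_creatinine"].between(lower, upper)]
print(lower, upper, len(trimmed))
```

Filtering into a new frame, rather than dropping in place, keeps the original rows available in case we want to compare results with and without the outliers.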


# <a id="4">IV. EXPLORATORY DATA ANALYSIS</a>

In this section we'll explore our data and create some visualizations to give us further insight. 

## <a id='4a'>IVa. UNIVARIATE ANALYSIS</a>


In [None]:
# sort_index() fixes the order as 0 (survived), 1 (died) so it always matches the labels used below
values = df['DEATH_EVENT'].value_counts().sort_index()

fig = make_subplots(rows=1, cols=2, 
                    specs=[[{"type": "xy"}, {"type": "domain"}]],
                    subplot_titles=('Death Event Count', 'Death Event Percentage'))

fig.add_trace(go.Bar(x=['Survived','Died'],
                     y=values, 
                     name='Death Event Count', 
                     marker=dict(color=['#2ad4cb','#e6c822'])), row=1, col=1)

fig.add_trace(go.Pie(labels=['Survived','Died'], 
                     values=values, 
                     name='Death Event Percentage',
                     hole = 0.5,
                     marker=dict(colors=['#2ad4cb','#e6c822'])), row=1, col=2)


fig.update_layout(height=500, 
                  title_text='DEATH EVENT STATS',
                  showlegend=True)

fig.show()


The dataset is imbalanced: deaths make up a relatively small share of the records, which we'll need to keep in mind when we evaluate models later.
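
To put a number on the imbalance, the counts and shares of each class can be tabulated side by side. This is a minimal sketch on a made-up `toy` target column (1 = died, 0 = survived); the values are illustrative and not taken from the dataset.

```python
import pandas as pd

# Toy target column (made up): 1 = died, 0 = survived.
toy = pd.DataFrame({"DEATH_EVENT": [0, 0, 0, 1, 0, 0, 1, 0]})

counts = toy["DEATH_EVENT"].value_counts()
shares = toy["DEATH_EVENT"].value_counts(normalize=True)

# Align counts and shares on the class label (0 or 1).
summary = pd.DataFrame({"count": counts, "share": shares.round(3)})
print(summary)
```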

<a id="cat1"></a>

### VISUALIZE ALL CATEGORICAL VARIABLES

In [None]:
def plot_cats(feat):
    # Read from df (after outlier removal); df_cat was copied before those rows were dropped
    values = df[feat].value_counts().sort_index()
    labels = values.index.astype(str).tolist()

    fig = make_subplots(rows=1, cols=2, 
                        specs=[[{"type": "xy"}, {"type": "domain"}]],
                        subplot_titles=((feat.title() + ' Count'), (feat.title() + ' Percentage')))

    fig.add_trace(go.Bar(x=labels,
                         y=values, 
                         name=(feat.title() + ' Count'), 
                         marker=dict(color=['#2ad4cb','#e6c822'])), row=1, col=1)

    # labels must be a flat list with one entry per slice
    fig.add_trace(go.Pie(labels=labels, 
                         values=values, 
                         name=feat.title() + ' Percentage',
                         hole = 0.5,
                         marker=dict(colors=['#2ad4cb','#e6c822'])), row=1, col=2)

    fig.update_layout(height=500, 
                      title_text=feat.upper() + ' STATS',
                      showlegend=True)

    fig.show()

In [None]:
plot_cats('anemia')

In [None]:
plot_cats('diabetes')

In [None]:
plot_cats('high_blood_pressure')

In [None]:
plot_cats('sex')

In [None]:
plot_cats('smoking')

## <a id='4b'>IVb. BIVARIATE ANALYSIS</a>


In this section we'll visualize how each categorical and continuous variable correlates with our target variable, "DEATH EVENT". 

<a id="cat2"></a>

### CATEGORICAL VARIABLES

### ANEMIA x DEATH EVENT

**OBSERVATIONS**

* The dataset is closely split between patients with and without anemia. A small majority (103, or 52.3%) have no anemia, while 94 (47.7%) do. 
* Patients with anemia died at a higher rate than those without: 27.7% of patients with anemia died, compared with 18.4% of patients without. 
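
Death rates like the ones quoted above can be computed directly, without reading them off a chart. Because `DEATH_EVENT` is coded 0/1, its mean within a group is the fraction of patients who died. The sketch below uses a made-up `toy` frame, so its numbers are illustrative only.

```python
import pandas as pd

# Toy data (made up): anemia flag and outcome for ten patients.
toy = pd.DataFrame({
    "anemia":      [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
    "DEATH_EVENT": [0, 0, 0, 0, 1, 0, 0, 0, 1, 1],
})

# count = patients in the group, sum = deaths, mean = death rate.
rates = toy.groupby("anemia")["DEATH_EVENT"].agg(
    patients="count", deaths="sum", death_rate="mean"
)
print(rates)
```

Swapping `"anemia"` for any other binary column gives the same table for that variable.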

In [None]:
df_anemia = df.groupby(['anemia', 'DEATH_EVENT'])[['DEATH_EVENT']].count()
df_anemia.columns = ['count']
df_anemia.reset_index(inplace=True)

anemia_count = df_anemia.groupby(['anemia'])[['count']].sum()
anemia_count.reset_index(inplace=True)

noanemia_death = df_anemia.loc[df_anemia['anemia'] == 0]
anemia_death = df_anemia.loc[df_anemia['anemia'] == 1]

subplot_titles=['ANEMIA COUNT', 'ANEMIA x DEATH EVENT COUNT', 'ANEMIA PERCENTAGES', 
                'OVERALL ANEMIA & DEATH EVENT', 'NO ANEMIA x DEATH', 'ANEMIA x DEATH']

fig = make_subplots(rows=3, cols=2, specs=[[{"type": "xy"}, {"type": "xy"}],
                                           [{"type": "domain"}, {"type": "domain"}],
                                           [{"type": "domain"}, {"type": "domain"}]],
                   subplot_titles=subplot_titles,
                   vertical_spacing = 0.13)

label1 = ['No Anemia', 'Anemia']
label2 = ['No Anemia: Survived', 'No Anemia: Died', 'Anemia: Survived', 'Anemia: Died']
label3 = ['No Anemia: Survived', 'No Anemia: Died']
label4 = ['Anemia: Survived', 'Anemia: Died']

fig.add_trace(go.Bar(x=label1, y=anemia_count['count'], name='Anemia Count', marker_color='rgb(26, 118, 255)'), row=1, col=1)
fig.add_trace(go.Bar(x=label2, y=df_anemia['count'], name='Anemia vs Death Event', marker_color='rgb(235, 186, 40)'), row=1, col=2)
fig.add_trace(go.Pie(labels=label1, values=anemia_count['count']), row=2, col=1)
fig.add_trace(go.Pie(labels=label2, values=df_anemia['count']), row=2, col=2)
fig.add_trace(go.Pie(labels=label3, values=noanemia_death['count']), row=3, col=1)
fig.add_trace(go.Pie(labels=label4, values=anemia_death['count']), row=3, col=2)

# fig.update_traces(hoverinfo="label+name+value")
fig.update_layout(height=1000, showlegend=True, title_text='ANEMIA x DEATH EVENT')

fig.show()

### DIABETES x DEATH EVENT

**OBSERVATIONS**

* Most patients do not have diabetes: they outnumber patients with diabetes by 29, or about 14.8 percentage points. 
* 113 patients (57.4%) do NOT have diabetes; 84 patients (42.6%) do. 
* Death rates for patients with diabetes are only slightly higher than for those without: the gap is approximately 1.8 percentage points. 
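
Whether a gap in death rates like this one is statistically meaningful can be checked with a two-proportion z-test. The sketch below is a minimal version using only the standard library; the function name and the counts passed to it are hypothetical, chosen for illustration rather than taken from this dataset.

```python
import math

def two_proportion_z_test(deaths_a, n_a, deaths_b, n_b):
    """Two-sided z-test for the difference between two proportions."""
    p_a = deaths_a / n_a
    p_b = deaths_b / n_b
    # Pooled proportion under the null hypothesis that both rates are equal.
    pooled = (deaths_a + deaths_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts for illustration only; substitute the real group counts.
z, p_value = two_proportion_z_test(deaths_a=30, n_a=120, deaths_b=33, n_b=150)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

A large p-value would suggest that the observed gap could plausibly be due to chance. With groups this small, the test has limited power, so a non-significant result does not show that the variable is irrelevant.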

In [None]:
df_diabetes = df.groupby(['diabetes', 'DEATH_EVENT'])[['DEATH_EVENT']].count()
df_diabetes.columns = ['count']
df_diabetes.reset_index(inplace=True)

diabetes_count = df_diabetes.groupby(['diabetes'])[['count']].sum()
diabetes_count.reset_index(inplace=True)

nodiabetes_death = df_diabetes.loc[df_diabetes['diabetes'] == 0]
diabetes_death = df_diabetes.loc[df_diabetes['diabetes'] == 1]

subplot_titles=['DIABETES COUNT', 'DIABETES x DEATH EVENT COUNT', 'DIABETES PERCENTAGES', 
                'OVERALL DIABETES & DEATH EVENT', 'NO DIABETES x DEATH', 'DIABETES x DEATH']

fig = make_subplots(rows=3, cols=2, specs=[[{"type": "xy"}, {"type": "xy"}],
                                           [{"type": "domain"}, {"type": "domain"}],
                                           [{"type": "domain"}, {"type": "domain"}]],
                   subplot_titles=subplot_titles,
                   vertical_spacing = 0.13)

label1 = ['No Diabetes', 'Diabetes']
label2 = ['No Diabetes: Survived', 'No Diabetes: Died', 'Diabetes: Survived', 'Diabetes: Died']
label3 = ['No Diabetes: Survived', 'No Diabetes: Died']
label4 = ['Diabetes: Survived', 'Diabetes: Died']

fig.add_trace(go.Bar(x=label1, y=diabetes_count['count'], name='Diabetes Count', marker_color='rgb(26, 118, 255)'), row=1, col=1)
fig.add_trace(go.Bar(x=label2, y=df_diabetes['count'], name='Diabetes vs Death Event', marker_color='rgb(235, 186, 40)'), row=1, col=2)
fig.add_trace(go.Pie(labels=label1, values=diabetes_count['count']), row=2, col=1)
fig.add_trace(go.Pie(labels=label2, values=df_diabetes['count']), row=2, col=2)
fig.add_trace(go.Pie(labels=label3, values=nodiabetes_death['count']), row=3, col=1)
fig.add_trace(go.Pie(labels=label4, values=diabetes_death['count']), row=3, col=2)

# fig.update_traces(hoverinfo="label+name+value")
fig.update_layout(height=1000, showlegend=True, title_text='DIABETES x DEATH EVENT')

fig.show()

### HIGH BLOOD PRESSURE x DEATH EVENT

**OBSERVATIONS**

* Most patients (121 or 61.4%) do not have high blood pressure, while 76 or 38.6% do. 
* Patients with high blood pressure are more likely to die than those without. 27.6% of patients with high blood pressure died while 19.8% of patients without high blood pressure died. 

In [None]:
df_hbp = df.groupby(['high_blood_pressure', 'DEATH_EVENT'])[['DEATH_EVENT']].count()
df_hbp.columns = ['count']
df_hbp.reset_index(inplace=True)

hbp_count = df_hbp.groupby(['high_blood_pressure'])[['count']].sum()
hbp_count.reset_index(inplace=True)

nohbp_death = df_hbp.loc[df_hbp['high_blood_pressure'] == 0]
hbp_death = df_hbp.loc[df_hbp['high_blood_pressure'] == 1]

subplot_titles=['HBP COUNT', 'HBP x DEATH EVENT COUNT', 'HBP PERCENTAGES', 
                'OVERALL HBP & DEATH EVENT', 'NO HBP x DEATH', 'HBP x DEATH']

fig = make_subplots(rows=3, cols=2, specs=[[{"type": "xy"}, {"type": "xy"}],
                                           [{"type": "domain"}, {"type": "domain"}],
                                           [{"type": "domain"}, {"type": "domain"}]],
                   subplot_titles=subplot_titles,
                   vertical_spacing = 0.13)

label1 = ['No HBP', 'HBP']
label2 = ['No HBP: Survived', 'No HBP: Died', 'HBP: Survived', 'HBP: Died']
label3 = ['No HBP: Survived', 'No HBP: Died']
label4 = ['HBP: Survived', 'HBP: Died']

fig.add_trace(go.Bar(x=label1, y=hbp_count['count'], name='HBP Count', marker_color='rgb(26, 118, 255)'), row=1, col=1)
fig.add_trace(go.Bar(x=label2, y=df_hbp['count'], name='HBP vs Death Event', marker_color='rgb(235, 186, 40)'), row=1, col=2)
fig.add_trace(go.Pie(labels=label1, values=hbp_count['count']), row=2, col=1)
fig.add_trace(go.Pie(labels=label2, values=df_hbp['count']), row=2, col=2)
fig.add_trace(go.Pie(labels=label3, values=nohbp_death['count']), row=3, col=1)
fig.add_trace(go.Pie(labels=label4, values=hbp_death['count']), row=3, col=2)

# fig.update_traces(hoverinfo="label+name+value")
fig.update_layout(height=1000, showlegend=True, title_text='HIGH BLOOD PRESSURE (HBP) x DEATH EVENT')

fig.show()

### SEX x DEATH EVENT

**OBSERVATIONS**

* In this dataset, males outnumber females by 11.4%. 
* Female patients died at a slightly higher rate than males: 23.7% of females died compared with 22.3% of males, a gap of 1.4 percentage points.  

In [None]:
df_sex = df.groupby(['sex', 'DEATH_EVENT'])[['DEATH_EVENT']].count()
df_sex.columns = ['count']
df_sex.reset_index(inplace=True)

sex_count = df_sex.groupby(['sex'])[['count']].sum()
sex_count.reset_index(inplace=True)

female_death = df_sex.loc[df_sex['sex'] == 0]
male_death = df_sex.loc[df_sex['sex'] == 1]

subplot_titles=['SEX COUNT', 'SEX x DEATH EVENT COUNT', 'SEX PERCENTAGES', 
                'OVERALL SEX & DEATH EVENT', 'FEMALE x DEATH', 'MALE x DEATH']

fig = make_subplots(rows=3, cols=2, specs=[[{"type": "xy"}, {"type": "xy"}],
                                           [{"type": "domain"}, {"type": "domain"}],
                                           [{"type": "domain"}, {"type": "domain"}]],
                   subplot_titles=subplot_titles,
                   vertical_spacing = 0.13)

label1 = ['Female', 'Male']
label2 = ['Female: Survived', 'Female: Died', 'Male: Survived', 'Male: Died']
label3 = ['Female: Survived', 'Female: Died']
label4 = ['Male: Survived', 'Male: Died']

fig.add_trace(go.Bar(x=label1, y=sex_count['count'], name='Sex Count', marker_color='rgb(26, 118, 255)'), row=1, col=1)
fig.add_trace(go.Bar(x=label2, y=df_sex['count'], name='Sex vs Death Event', marker_color='rgb(235, 186, 40)'), row=1, col=2)
fig.add_trace(go.Pie(labels=label1, values=sex_count['count']), row=2, col=1)
fig.add_trace(go.Pie(labels=label2, values=df_sex['count']), row=2, col=2)
fig.add_trace(go.Pie(labels=label3, values=female_death['count']), row=3, col=1)
fig.add_trace(go.Pie(labels=label4, values=male_death['count']), row=3, col=2)

# fig.update_traces(hoverinfo="label+name+value")
fig.update_layout(height=1000, showlegend=True, title_text='SEX x DEATH EVENT')

fig.show()

### SMOKING x DEATH EVENT

**OBSERVATIONS:** 

* Most people in the dataset are non-smokers: 131 (66.5%) are non-smokers and 66 (33.5%) are smokers. 
* 24.2% of smokers died, compared with 22.1% of non-smokers. 

In [None]:
df_smoking = df.groupby(['smoking', 'DEATH_EVENT'])[['DEATH_EVENT']].count()
df_smoking.columns = ['count']
df_smoking.reset_index(inplace=True)

smoking_count = df_smoking.groupby(['smoking'])[['count']].sum()
smoking_count.reset_index(inplace=True)

nonsmoking_death = df_smoking.loc[df_smoking['smoking'] == 0]
smoking_death = df_smoking.loc[df_smoking['smoking'] == 1]

subplot_titles=['SMOKING COUNT', 'SMOKING x DEATH EVENT COUNT', 'SMOKING PERCENTAGES', 
                'OVERALL SMOKING & DEATH EVENT', 'NO SMOKING x DEATH', 'SMOKING x DEATH']

fig = make_subplots(rows=3, cols=2, specs=[[{"type": "xy"}, {"type": "xy"}],
                                           [{"type": "domain"}, {"type": "domain"}],
                                           [{"type": "domain"}, {"type": "domain"}]],
                   subplot_titles=subplot_titles,
                   vertical_spacing = 0.13)

label1 = ['Non Smoker', 'Smoker']
label2 = ['Non Smoker: Survived', 'Non Smoker: Died', 'Smoker: Survived', 'Smoker: Died']
label3 = ['Non Smoker: Survived', 'Non Smoker: Died']
label4 = ['Smoker: Survived', 'Smoker: Died']

fig.add_trace(go.Bar(x=label1, y=smoking_count['count'], name='Smoker Count', marker_color='rgb(26, 118, 255)'), row=1, col=1)
fig.add_trace(go.Bar(x=label2, y=df_smoking['count'], name='Smoker vs Death Event', marker_color='rgb(235, 186, 40)'), row=1, col=2)
fig.add_trace(go.Pie(labels=label1, values=smoking_count['count']), row=2, col=1)
fig.add_trace(go.Pie(labels=label2, values=df_smoking['count']), row=2, col=2)
fig.add_trace(go.Pie(labels=label3, values=nonsmoking_death['count']), row=3, col=1)
fig.add_trace(go.Pie(labels=label4, values=smoking_death['count']), row=3, col=2)

# fig.update_traces(hoverinfo="label+name+value")
fig.update_layout(height=1000, showlegend=True, title_text='SMOKING x DEATH EVENT')

fig.show()

### CONTINUOUS VARIABLES

The charts below show the distribution of each continuous variable ('age', 'creatinine_phosphokinase', 'ejection_fraction', 'platelets', 'serum_creatinine', 'serum_sodium', 'time'), split by whether the patient died or survived. 

In [None]:
fig = px.histogram(df, x="age", 
                   color="DEATH_EVENT",
                   color_discrete_sequence=['#e6c822','#2ad4cb'],
                   marginal="box", 
                   nbins=10, hover_data=df.columns) 

fig.update_layout(height=500, title_text='AGE x DEATH EVENT', showlegend=True)

fig.show()

In [None]:
fig = px.histogram(df, 
                   x="creatinine_phosphokinase", 
                   color="DEATH_EVENT", 
                   color_discrete_sequence=['#e6c822','#2ad4cb'],
                   marginal="box", 
                   hover_data=df.columns)

fig.update_layout(height=500, title_text='CREATININE PHOSPHOKINASE x DEATH EVENT', showlegend=True)

fig.show()

In [None]:
fig = px.histogram(df, 
                   x="ejection_fraction",
                   color="DEATH_EVENT", 
                   color_discrete_sequence=['#e6c822','#2ad4cb'],
                   marginal="box", 
                   hover_data=df.columns)

fig.update_layout(height=500, title_text='EJECTION FRACTION x DEATH EVENT', showlegend=True)

fig.show()

In [None]:
fig = px.histogram(df, x="platelets", 
                   color="DEATH_EVENT", 
                   color_discrete_sequence=['#e6c822','#2ad4cb'],
                   marginal="box", 
                   hover_data=df.columns)

fig.update_layout(height=500, title_text='PLATELETS x DEATH EVENT', showlegend=True)

fig.show()

In [None]:
fig = px.histogram(df, 
                   x="serum_creatinine", 
                   color="DEATH_EVENT", 
                   color_discrete_sequence=['#e6c822','#2ad4cb'],
                   marginal="box", 
                   hover_data=df.columns)

fig.update_layout(height=500, title_text='SERUM CREATININE x DEATH EVENT', showlegend=True)

fig.show()

In [None]:
fig = px.histogram(df, 
                   x="serum_sodium", 
                   color="DEATH_EVENT", 
                   color_discrete_sequence=['#e6c822','#2ad4cb'],
                   marginal="box", 
                   hover_data=df.columns)

fig.update_layout(height=500, title_text='SERUM SODIUM x DEATH EVENT', showlegend=True)

fig.show()

In [None]:
fig = px.histogram(df, 
                   x="time", 
                   color="DEATH_EVENT", 
                   color_discrete_sequence=['#e6c822','#2ad4cb'],
                   marginal="box", 
                   hover_data=df.columns)

fig.update_layout(height=500, title_text='TIME x DEATH EVENT', showlegend=True)

fig.show()
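
The histograms above can be complemented with a numeric summary. A minimal sketch, assuming a made-up `toy` frame in place of `df`: grouping by `DEATH_EVENT` and taking the median of each continuous column gives one row per outcome, which makes shifts between the two groups easy to compare.

```python
import pandas as pd

# Toy data (made up) with two of the continuous columns.
toy = pd.DataFrame({
    "DEATH_EVENT":       [0, 0, 0, 1, 1, 1],
    "ejection_fraction": [40, 45, 38, 25, 30, 20],
    "serum_creatinine":  [1.0, 0.9, 1.1, 1.4, 1.3, 1.5],
})

cont_cols = ["ejection_fraction", "serum_creatinine"]

# One row per outcome (0 = survived, 1 = died), one column per variable.
medians = toy.groupby("DEATH_EVENT")[cont_cols].median()
print(medians)
```

The median is used here instead of the mean because it is less sensitive to any extreme values that remain after the outlier step.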

## <a id='4c'>IVc. MULTIVARIATE ANALYSIS</a>


### DEATH & SURVIVAL PERCENTAGES BY SEX & HIGH BLOOD PRESSURE

In [None]:
df_de = df.groupby(['DEATH_EVENT', 'sex', 'high_blood_pressure'])[['age']].count()
df_de.reset_index(inplace=True)
df_de.rename(columns={'age':'count'}, inplace=True)

survived_female = df_de.loc[(df_de['DEATH_EVENT'] == 0) & (df_de['sex'] == 0)]
survived_male = df_de.loc[(df_de['DEATH_EVENT'] == 0) & (df_de['sex'] == 1)]
died_female = df_de.loc[(df_de['DEATH_EVENT'] == 1) & (df_de['sex'] == 0)]
died_male = df_de.loc[(df_de['DEATH_EVENT'] == 1) & (df_de['sex'] == 1)]

subplot_titles = ['FEMALE x SURVIVED' , 'MALE x SURVIVED', 'FEMALE x DIED', 'MALE x DIED']

fig = make_subplots(rows=2, cols=2, specs=[[{"type": "domain"}, {"type": "domain"}],
                                           [{"type": "domain"}, {"type": "domain"}]],
                   subplot_titles=subplot_titles, 
                   vertical_spacing = 0.10)

label = ['No HBP', 'HBP']

fig.add_trace(go.Pie(labels=label, values=survived_female['count'], hole=0.5, marker=dict(colors=['#2ad4cb','#e6c822'])), row=1, col=1)
fig.add_trace(go.Pie(labels=label, values=survived_male['count'], hole=0.5, marker=dict(colors=['#2ad4cb','#e6c822'])), row=1, col=2)
fig.add_trace(go.Pie(labels=label, values=died_female['count'], hole=0.5, marker=dict(colors=['#2ad4cb','#e6c822'])), row=2, col=1)
fig.add_trace(go.Pie(labels=label, values=died_male['count'], hole=0.5, marker=dict(colors=['#2ad4cb','#e6c822'])), row=2, col=2)

# fig.update_traces(hoverinfo="label+name+value")
fig.update_layout(height=750, showlegend=True, title_text='DEATH & SURVIVAL PERCENTAGES BY SEX & HIGH BLOOD PRESSURE')

fig.show()
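
The shares shown in the four pies can also be produced as a single table. The sketch below uses `pd.crosstab` with `normalize="index"` on a made-up `toy` frame, so each row (one outcome and sex combination) sums to 1. One practical difference from grouped counts: a combination with no patients is simply missing from a `groupby(...).count()` result, which could pair the remaining count with the wrong pie label, whereas `crosstab` fills such cells with 0.

```python
import pandas as pd

# Toy data (made up); no surviving male here has high blood pressure.
toy = pd.DataFrame({
    "sex":                 [0, 0, 0, 0, 1, 1, 1, 1],
    "high_blood_pressure": [0, 1, 1, 0, 0, 0, 1, 0],
    "DEATH_EVENT":         [0, 0, 1, 0, 0, 1, 1, 0],
})

# Rows: (DEATH_EVENT, sex). Columns: high_blood_pressure (0 or 1).
table = pd.crosstab(
    index=[toy["DEATH_EVENT"], toy["sex"]],
    columns=toy["high_blood_pressure"],
    normalize="index",
)
print(table)
```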

### DEATH & SURVIVAL PERCENTAGES BY SEX & SMOKING STATUS

In [None]:
df_de2 = df.groupby(['DEATH_EVENT', 'sex', 'smoking'])[['age']].count()
df_de2.reset_index(inplace=True)
df_de2.rename(columns={'age':'count'}, inplace=True)

survived_female = df_de2.loc[(df_de2['DEATH_EVENT'] == 0) & (df_de2['sex'] == 0)]
survived_male = df_de2.loc[(df_de2['DEATH_EVENT'] == 0) & (df_de2['sex'] == 1)]
died_female = df_de2.loc[(df_de2['DEATH_EVENT'] == 1) & (df_de2['sex'] == 0)]
died_male = df_de2.loc[(df_de2['DEATH_EVENT'] == 1) & (df_de2['sex'] == 1)]

subplot_titles = ['FEMALE x SURVIVED' , 'MALE x SURVIVED', 'FEMALE x DIED', 'MALE x DIED']

fig = make_subplots(rows=2, cols=2, specs=[[{"type": "domain"}, {"type": "domain"}],
                                           [{"type": "domain"}, {"type": "domain"}]],
                   subplot_titles=subplot_titles,
                   vertical_spacing = 0.10)

label = ['Non-Smoking', 'Smoking']

fig.add_trace(go.Pie(labels=label, values=survived_female['count'], hole=0.5, marker=dict(colors=['#2ad4cb','#e6c822']), rotation=-45), row=1, col=1)
fig.add_trace(go.Pie(labels=label, values=survived_male['count'], hole=0.5, marker=dict(colors=['#2ad4cb','#e6c822'])), row=1, col=2)
fig.add_trace(go.Pie(labels=label, values=died_female['count'], hole=0.5, marker=dict(colors=['#2ad4cb','#e6c822'])), row=2, col=1)
fig.add_trace(go.Pie(labels=label, values=died_male['count'], hole=0.5, marker=dict(colors=['#2ad4cb','#e6c822'])), row=2, col=2)

fig.update_traces(hoverinfo="label+name+value")
fig.update_layout(height=700, 
                  showlegend=True, 
                  margin=dict(t=100, b=0, l=0, r=0),
                  title_text='DEATH & SURVIVAL PERCENTAGES BY SEX & SMOKING STATUS')

fig.show()

### SEX & AGE of those who Died

In [None]:
# df_de_age = df.groupby(['DEATH_EVENT', 'sex', 'age'])[['anemia']].count()
# df_de_age.reset_index(inplace=True)
# df_de_age.rename(columns={'anemia':'count'}, inplace=True)
# df_de_age = df_de_age.loc[df_de_age['DEATH_EVENT'] == 1]
# df_de_age

# Keep only the patients who died
df_de1_age = df[['DEATH_EVENT', 'sex', 'age']].loc[df['DEATH_EVENT'] == 1]

fig = px.histogram(df_de1_age, 
                   x="age", 
                   color="sex", 
                   marginal="box",
                   color_discrete_sequence=['#e6c822','#2ad4cb'],
                   hover_data=df_de1_age.columns)

fig.update_layout(height=500, title_text='SEX & AGE x DEATH EVENT', showlegend=True)

fig.show()


### SEX & AGE of the Survivors

In [None]:
# Keep only the patients who survived
df_de0_age = df[['DEATH_EVENT', 'sex', 'age']].loc[df['DEATH_EVENT'] == 0]

fig = px.histogram(df_de0_age, 
                   x="age", 
                   color="sex", 
                   marginal="box",
                   color_discrete_sequence=['#e6c822','#2ad4cb'],
                   hover_data=df_de0_age.columns)

fig.update_layout(height=500, title_text='SEX & AGE of the Survivors', showlegend=True)

fig.show()

# <a id="5">V. DATA NORMALIZATION</a>

# <a id="6">VI. MODEL DEVELOPMENT</a>