# <div style="text-align: center; background-color: #191414;font-size:200%; font-family:Courier New; color: #EE4355; padding: 20px; line-height: 1;border-radius:10px"><b>Heart Disease EDA</b></div>

<!-- ### Observations: -->

<h1 style="text-align: left;background-color: #191414; font-size:200%; font-family:Courier New; color: #F9B1B8; padding: 14px; line-height: 1; border-radius:10px"> <b>About Dataset</b></h1>

<blockquote style="margin-right:auto; font-family:Courier New; margin-left:auto; color:white; background-color: #4e2e4e; padding: 1em; margin:24px;">
   
<ul>
<li> <font color="white" size=+1.0><b>Age</b></font> :
    <ul>
        <li> Age of the patient [years]
    </ul>
<br>
    
<li> <font color="white" size=+1.0><b>Sex</b></font> :
    <ul>
    <li> Sex of the patient [M: Male, F: Female]
    </ul>
<br>

<li> <font color="white" size=+1.0><b>ChestPainType</b></font> :
    <ul>
        <li> Chest Pain Type [TA: Typical Angina, ATA: Atypical Angina, NAP: Non-Anginal Pain, ASY: Asymptomatic]
    </ul>
    
<br>
    
<li> <font color="white" size=+1.0><b>RestingBP</b></font> :
    <ul>
        <li> Resting blood pressure [mm Hg]
    </ul>
    
<br>
    
<li> <font color="white" size=+1.0><b>Cholesterol</b></font> :
    <ul>
        <li> Serum cholesterol [mg/dl]
    </ul>
    
<br>
    
<li> <font color="white" size=+1.0><b>FastingBS</b></font> :
    <ul>
        <li> Fasting blood sugar [1: if FastingBS > 120 mg/dl, 0: otherwise]
    </ul>
    
<br>
    
<li> <font color="white" size=+1.0><b>RestingECG</b></font> :
    <ul>
        <li> Resting Electrocardiogram results [Normal: Normal, ST: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV), LVH: showing probable or definite left ventricular hypertrophy by Estes' criteria]
    </ul>
    
<br>
    
<li> <font color="white" size=+1.0><b>MaxHR</b></font> :
    <ul>
        <li> Maximum heart rate achieved [Numeric value between 60 and 202]
    </ul>
    
<br>
    
<li> <font color="white" size=+1.0><b>ExerciseAngina</b></font> :
    <ul>
        <li> Exercise-induced angina [Y: Yes, N: No]
    </ul>
    
<br>
    
<li> <font color="white" size=+1.0><b>Oldpeak</b></font> :
    <ul>
        <li> ST depression induced by exercise relative to rest [numeric value]
    </ul>
    
<br>
    
<li> <font color="white" size=+1.0><b>ST_Slope</b></font> :
    <ul>
        <li> The slope of the peak exercise ST segment [Up: upsloping, Flat: flat, Down: downsloping]
    </ul>
    
<br>
    
<li> <font color="white" size=+1.0><b>HeartDisease</b></font> :
    <ul>
        <li> Output class [1: heart disease, 0: Normal]
    </ul>
</ul>
</blockquote>                   
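
The coded values in the description above can be expanded into readable labels before plotting. The following is a minimal sketch, assuming a DataFrame that uses the column names listed above. The mapping dictionaries and the `decode_categories` helper are illustrative names that do not appear in the dataset or the notebook, and the three rows are made up.

```python
import pandas as pd

# Code-to-label mappings copied from the dataset description.
CHEST_PAIN_LABELS = {
    "TA": "Typical Angina",
    "ATA": "Atypical Angina",
    "NAP": "Non-Anginal Pain",
    "ASY": "Asymptomatic",
}
SEX_LABELS = {"M": "Male", "F": "Female"}


def decode_categories(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with coded columns replaced by readable labels (hypothetical helper)."""
    out = frame.copy()
    out["ChestPainType"] = out["ChestPainType"].map(CHEST_PAIN_LABELS)
    out["Sex"] = out["Sex"].map(SEX_LABELS)
    return out


# Made-up rows for illustration only.
sample = pd.DataFrame({"Sex": ["M", "F", "M"], "ChestPainType": ["ATA", "NAP", "ASY"]})
decoded = decode_categories(sample)
print(decoded)
```

Working on a copy keeps the original codes available for the grouping cells later in the notebook.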

<h1 style="text-align: left;background-color: #191414; font-size:200%; font-family:Courier New; color: #F9B1B8; padding: 14px; line-height: 1; border-radius:10px"> <b>Table of Contents</b></h1>


<a id="top"></a>
<div class="list-group" id="list-tab" role="tablist">
    
   * [1. Donut Charts](#1)
   * [2. Histograms for Feature Distributions](#2)
   * [3. Correlation Matrix](#3)
   * [4. 2D Density Plots](#4)
   * [5. Male vs Female Scatter Plots](#5)
   * [6. Male vs Female Sunburst](#6)
   * [7. Heart Disease Histograms](#7)

In [1]:
import pandas as pd
import numpy as np

# visualization
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.figure_factory as ff
import plotly.graph_objects as go
from plotly.subplots import make_subplots

from sklearn.linear_model import LinearRegression

theme_colors = ['#F9B1B8',  '#EE4355',  '#B60618','#820815']

color_palette = ['#FFFFFF', '#F9B1B8', '#EE4355', '#B60618', '#820815', '#000000']

font = 'Courier New'



In [2]:
df = pd.read_csv('../input/heart-failure-prediction/heart.csv')

In [3]:
df

Unnamed: 0,Age,Sex,ChestPainType,RestingBP,Cholesterol,FastingBS,RestingECG,MaxHR,ExerciseAngina,Oldpeak,ST_Slope,HeartDisease
0,40,M,ATA,140,289,0,Normal,172,N,0.0,Up,0
1,49,F,NAP,160,180,0,Normal,156,N,1.0,Flat,1
2,37,M,ATA,130,283,0,ST,98,N,0.0,Up,0
3,48,F,ASY,138,214,0,Normal,108,Y,1.5,Flat,1
4,54,M,NAP,150,195,0,Normal,122,N,0.0,Up,0
...,...,...,...,...,...,...,...,...,...,...,...,...
913,45,M,TA,110,264,0,Normal,132,N,1.2,Flat,1
914,68,M,ASY,144,193,1,Normal,141,N,3.4,Flat,1
915,57,M,ASY,130,131,0,Normal,115,Y,1.2,Flat,1
916,57,F,ATA,130,236,0,LVH,174,N,0.0,Flat,1


In [4]:
st_slope_count = df.groupby(['ST_Slope']).size().reset_index().rename(columns={0: 'count'})
st_slope_count

Unnamed: 0,ST_Slope,count
0,Down,63
1,Flat,460
2,Up,395


In [5]:
# A resting blood pressure of 0 mm Hg is not physiologically plausible, so drop those rows
df = df[df['RestingBP'] != 0]
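
The cell above keeps only rows whose `RestingBP` is non-zero; a zero there most likely marks a missing reading rather than a real measurement. One way to look for similar placeholders in other columns is to count exact zeros per column before filtering. This is a minimal sketch on made-up rows; `count_zeros` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd


def count_zeros(frame: pd.DataFrame, columns: list[str]) -> pd.Series:
    """Count exact zeros in each of the given columns (hypothetical helper)."""
    return (frame[columns] == 0).sum()


# Made-up rows for illustration only.
toy = pd.DataFrame({
    "RestingBP": [140, 0, 130, 120],
    "Cholesterol": [289, 0, 0, 214],
    "MaxHR": [172, 156, 98, 108],
})

zeros = count_zeros(toy, ["RestingBP", "Cholesterol", "MaxHR"])
print(zeros)

# Same filter as the notebook cell: keep rows with a non-zero RestingBP.
cleaned = toy[toy["RestingBP"] != 0]
print(len(toy), "->", len(cleaned), "rows")
```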

<a id="1"></a>
# <div style="text-align: center; background-color: #191414;font-size:120%; font-family:Courier New; color: #EE4355; padding: 20px; line-height: 1;border-radius:10px"><b>1. Donut Charts</b></div>

In [6]:
sex_count = df.groupby(['Sex']).size().reset_index().rename(columns={0: 'count'})
cp_count = df.groupby(['ChestPainType']).size().reset_index().rename(columns={0: 'count'})
fasting_bs_count = df.groupby(['FastingBS']).size().reset_index().rename(columns={0: 'count'})
resting_ecg_count = df.groupby(['RestingECG']).size().reset_index().rename(columns={0: 'count'})
angina_count = df.groupby(['ExerciseAngina']).size().reset_index().rename(columns={0: 'count'})
st_slope_count = df.groupby(['ST_Slope']).size().reset_index().rename(columns={0: 'count'})

fig = make_subplots(rows=3, cols=2,
                    specs=[[{'type':'domain'}, {'type':'domain'}],
                           [{'type':'domain'}, {'type':'domain'}],
                           [{'type':'domain'}, {'type':'domain'}]
                          ])


## Sex Donut Chart
fig.add_trace(
    go.Pie(
        labels=sex_count['Sex'],
        values=sex_count['count'],
        hole=.6,
        title={'text': 'Sex', 'font': {'size': 24}},
        ),
    row=1,col=1
    )

## Chest Pain Type Donut Chart
fig.add_trace(
    go.Pie(
        labels=cp_count['ChestPainType'],
        values=cp_count['count'],
        hole=.6,
        title={'text': 'Chest Pain Type', 'font': {'size': 24}},
        ),
    row=1,col=2
    )

## Fasting Blood Sugar Donut Chart
fig.add_trace(
    go.Pie(
        labels=fasting_bs_count['FastingBS'],
        values=fasting_bs_count['count'],
        hole=.6,
        title={'text': 'Fasting Blood Sugar', 'font': {'size': 24}},
        ),
    row=2,col=1
    )


## RestingECG Donut Chart
fig.add_trace(
    go.Pie(
        labels=resting_ecg_count['RestingECG'],
        values=resting_ecg_count['count'],
        hole=.6,
        title={'text': 'Resting ECG', 'font': {'size': 24}},
        ),
    row=2,col=2
    )

## ExerciseAngina Donut Chart
fig.add_trace(
    go.Pie(
        labels=angina_count['ExerciseAngina'],
        values=angina_count['count'],
        hole=.6,
        title={'text': 'Exercise Angina', 'font': {'size': 24}},
        ),
    row=3,col=1
    )

## ST_Slope Donut Chart
fig.add_trace(
    go.Pie(
        labels=st_slope_count['ST_Slope'],
        values=st_slope_count['count'],
        hole=.6,
        title={'text': 'ST Slope', 'font': {'size': 24}},
        ),
    row=3,col=2
    )

fig.update_traces(
    hoverinfo='label+value',
    textinfo='label+percent',
    textfont_size=12,
    marker=dict(
        colors=theme_colors,
        line=dict(color='#EEEEEE',
                  width=2)
        )
    )


fig.update_layout(title_text="<b>Categorical Features Donut Charts</b>",
                  title_font={'size': 24, 'family': 'Courier New'},
                  showlegend=False,
                  height=1000,
                  width=1000,
                  template='plotly_dark',
                  title_x=0.5
                  )
fig.show()
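
Each donut above is drawn from a two-column table of category and count, and the `percent` shown on a slice is that count divided by the column total. The sketch below reproduces those two steps on made-up `ST_Slope` values. It uses `reset_index(name='count')` as a shorter equivalent of the notebook's `reset_index().rename(...)` chain; the `percent` column is added here only for illustration.

```python
import pandas as pd

# Made-up values for illustration only.
toy = pd.DataFrame({"ST_Slope": ["Up", "Flat", "Flat", "Down", "Flat", "Up"]})

# One row per category with its frequency, sorted by category label.
counts = toy.groupby("ST_Slope").size().reset_index(name="count")

# Share of the total that a donut slice would display.
counts["percent"] = (100 * counts["count"] / counts["count"].sum()).round(1)
print(counts)
```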

<a id="2"></a>
# <div style="text-align: center; background-color: #191414;font-size:120%; font-family:Courier New; color: #EE4355; padding: 20px; line-height: 1;border-radius:10px"><b>2. Histograms for Feature Distributions</b></div>

In [7]:
fig = make_subplots(rows=2, cols=2, subplot_titles=('<i>Age</i>', '<i>Resting BP</i>', '<i>Cholesterol</i>', '<i>MaxHR</i>'))
fig.add_trace(go.Histogram(x=df['Age'], name='Age'), row=1, col=1)
fig.add_trace(go.Histogram(x=df['RestingBP'], name='RestingBP'), row=1, col=2)
fig.add_trace(go.Histogram(x=df['Cholesterol'], name='Cholesterol'), row=2, col=1)
fig.add_trace(go.Histogram(x=df['MaxHR'], name='MaxHR'), row=2, col=2)

fig.update_layout(height=600, width=1000, title_text='<b>Feature Distribution</b>', title_x=0.5,
                  font_size=20, font_family='Courier New', template='plotly_dark')
fig.show()

<a id="3"></a>
# <div style="text-align: center; background-color: #191414;font-size:120%; font-family:Courier New; color: #EE4355; padding: 20px; line-height: 1;border-radius:10px"><b>3. Correlation Matrix</b></div>

In [8]:
color_palette = ['#FFFFFF', '#F9B1B8', '#EE4355', '#B60618', '#820815', '#000000']

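# Correlate numeric columns only; string columns such as Sex cannot be correlated directly
corr = df.select_dtypes('number').corr()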
fig = go.Figure(data= go.Heatmap(z=corr,
                                 x=corr.index.values,
                                 y=corr.columns.values,
                                 colorscale=color_palette,
                                 text = corr.round(2), texttemplate="%{text}"
                                 )
                )

fig.update_layout(title_text='<b>Correlation Matrix</b>',
                  title_x=0.5,
                  title_font={'size': 24, 'family': 'Courier New'},
                  width=600, height=600,
                  xaxis_showgrid=False,
                  yaxis_showgrid=False,
                  yaxis_autorange='reversed', 
                  paper_bgcolor=None,
                  )

fig.show()
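
A heatmap shows every pairwise correlation, but for this dataset the column of most interest is usually the target. The sketch below, on made-up rows, restricts the frame to numeric columns and then ranks the remaining features by the absolute value of their correlation with `HeartDisease`. The variable names and the toy values are assumptions for illustration and say nothing about the real data.

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "Age": [40, 49, 37, 48, 54, 61],
    "MaxHR": [172, 156, 170, 108, 122, 100],
    "Sex": ["M", "F", "M", "F", "M", "M"],
    "HeartDisease": [0, 1, 0, 1, 0, 1],
})

# Keep numeric columns only, then compute Pearson correlations.
corr = toy.select_dtypes("number").corr()

# Correlation of each feature with the target, strongest first by magnitude.
target_corr = (
    corr["HeartDisease"]
    .drop("HeartDisease")
    .sort_values(key=abs, ascending=False)
)
print(target_corr)
```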

<a id="4"></a>
# <div style="text-align: center; background-color: #191414;font-size:120%; font-family:Courier New; color: #EE4355; padding: 20px; line-height: 1;border-radius:10px"><b>4. 2D Density Plots</b></div>

In [9]:
fig = px.density_heatmap(
    df, x='Age', y='RestingBP',
    marginal_x='histogram', marginal_y='histogram', histfunc='count'
)

fig.update_layout(
    title="Resting Blood Pressure X Age Groups Density Plot",
    xaxis_title="Age Groups",
    yaxis_title="Resting Blood Pressure",
    font=dict(
        family="Rubik",
        size=14
    )
)

fig.show()

In [10]:
fig = px.density_heatmap(
    df, x='Age', y='Cholesterol',
    marginal_x='histogram', marginal_y='histogram', histfunc='count'
)

fig.update_layout(
    title="Cholesterol X Age Groups Density Plot",
    xaxis_title="Age Groups",
    yaxis_title="Cholesterol",
    font=dict(
        family="Rubik",
        size=14
    )
)

fig.show()

In [11]:
fig = px.density_heatmap(
    df, x='Age', y='MaxHR',
    marginal_x='histogram', marginal_y='histogram', histfunc='count'
)

fig.update_layout(
    title="Max Heart Rate X Age Groups Density Plot",
    xaxis_title="Age Groups",
    yaxis_title="Max Heart Rate",
    font=dict(
        family="Rubik",
        size=14
    )
)

fig.show()
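
With `histfunc='count'`, each cell of the density heatmaps above is the number of rows that fall into a given pair of x and y bins, and the marginal histograms are the row and column sums of that grid. The sketch below computes the same kind of grid with `numpy.histogram2d` on made-up values. The 2x2 binning and the explicit ranges are assumptions chosen to keep the output small; Plotly picks its own bins automatically.

```python
import numpy as np

# Made-up values for illustration only.
age = np.array([40, 42, 45, 58, 59, 61])
resting_bp = np.array([120, 125, 150, 130, 135, 160])

# 2 age bins x 2 blood-pressure bins over fixed ranges.
counts, age_edges, bp_edges = np.histogram2d(
    age, resting_bp, bins=[2, 2], range=[[40, 62], [120, 160]]
)

age_marginal = counts.sum(axis=1)  # rows per age bin
bp_marginal = counts.sum(axis=0)   # rows per blood-pressure bin
print(counts)
```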

<a id="5"></a>
# <div style="text-align: center; background-color: #191414;font-size:120%; font-family:Courier New; color: #EE4355; padding: 20px; line-height: 1;border-radius:10px"><b>5. Male vs Female Scatter Plots</b></div>

In [12]:
male_df = df[df['Sex'] == 'M']
female_df = df[df['Sex'] == 'F']

In [13]:
def makeLine(df, x, y):
    """Fit y ~ x by ordinary least squares and return the fitted y values."""
    X = df[[x]].to_numpy()  # 2-D feature matrix, as scikit-learn expects
    model = LinearRegression().fit(X, df[y].to_numpy())
    return model.predict(X)

In [14]:
fig = make_subplots(
    rows=2, cols=2, subplot_titles=("Male Resting Blood Pressure", "Female Resting Blood Pressure", "Male Max Heart Rate", "Female Max Heart Rate")
)

# Add traces
fig.add_trace(go.Scatter(x=male_df['Age'], y=male_df['RestingBP'], mode='markers'), row=1, col=1)
fig.add_trace(go.Scatter(x=male_df['Age'], y=makeLine(male_df, 'Age', 'RestingBP'), mode='lines',name="Linear_reg_fit", marker_color='white'), row=1, col=1)

fig.add_trace(go.Scatter(x=female_df['Age'], y=female_df['RestingBP'], mode='markers'), row=1, col=2)
fig.add_trace(go.Scatter(x=female_df['Age'], y=makeLine(female_df, 'Age', 'RestingBP'), mode='lines',name="Linear_reg_fit", marker_color='white'), row=1, col=2)


fig.add_trace(go.Scatter(x=male_df['Age'], y=male_df['MaxHR'], mode='markers'), row=2, col=1)
fig.add_trace(go.Scatter(x=male_df['Age'], y=makeLine(male_df, 'Age', 'MaxHR'), mode='lines',name="Linear_reg_fit", marker_color='white'), row=2, col=1)

fig.add_trace(go.Scatter(x=female_df['Age'], y=female_df['MaxHR'], mode='markers'), row=2, col=2)
fig.add_trace(go.Scatter(x=female_df['Age'], y=makeLine(female_df, 'Age', 'MaxHR'), mode='lines',name="Linear_reg_fit", marker_color='white'), row=2, col=2)


# Update xaxis properties
# All four panels use a linear Age axis so the fitted lines are directly comparable
fig.update_xaxes(title_text="Age", row=1, col=1)
fig.update_xaxes(title_text="Age", row=1, col=2)
fig.update_xaxes(title_text="Age", row=2, col=1)
fig.update_xaxes(title_text="Age", row=2, col=2)

# Update yaxis properties
fig.update_yaxes(title_text="Resting Blood Pressure", row=1, col=1)
fig.update_yaxes(title_text="Resting Blood Pressure", row=1, col=2)
fig.update_yaxes(title_text="Max Heart Rate", row=2, col=1)
fig.update_yaxes(title_text="Max Heart Rate", row=2, col=2)

# Update title and height
fig.update_layout(title_text="Male vs Female WRT Age", title_x=0.5, height=700, template='plotly_dark', showlegend=False,
        font=dict(
            family="Rubik",
            size=14)
)

fig.show()

### Observations:

<blockquote style="margin-right:auto; font-family:Courier New; margin-left:auto; color:white; background-color: #4e2e4e; padding: 1em; margin:24px;">
   
<ul>
<li> <font color="white" size=+1.0><b>Males</b></font> :
    <ul>
        <li> Resting blood pressure rises with age; the fitted line gains about <b>0.46</b> mm Hg per year.
        <li> Max heart rate falls with age; the fitted line drops by about <b>1.06</b> per year.
    </ul>
<li> <font color="white" size=+1.0><b>Females</b></font> :
    <ul>
        <li> Resting blood pressure rises with age; the fitted line gains about <b>0.64</b> mm Hg per year.
        <li> Max heart rate falls with age; the fitted line drops by about <b>0.71</b> per year.
    </ul>
</ul>
</blockquote>
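
The `makeLine` helper returns only the fitted values, so the slopes quoted in these observations have to be read from the fit itself. With scikit-learn the slope of a single-feature fit is `model.coef_[0]`. The sketch below uses `numpy.polyfit` with degree 1 instead, which gives the same least-squares line for one predictor; that substitution, the variable names, and the made-up points are assumptions for illustration.

```python
import numpy as np

# Made-up points lying exactly on y = 100 + 0.5 * x, for illustration only.
age = np.array([30.0, 40.0, 50.0, 60.0])
resting_bp = 100.0 + 0.5 * age

# Degree-1 least-squares fit; coefficients come back highest power first.
slope, intercept = np.polyfit(age, resting_bp, deg=1)
print(f"slope = {slope:.2f} per year of age, intercept = {intercept:.1f}")
```

Applied to `male_df` and `female_df` separately, the same call would give one slope per group for each of the two measurements.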

<a href="#top">Back to top</a>                                                                                                                                                   

<a id="6"></a>
# <div style="text-align: center; background-color: #191414;font-size:120%; font-family:Courier New; color: #EE4355; padding: 20px; line-height: 1;border-radius:10px"><b>6. Male vs Female Sunburst</b></div>

In [15]:
## Grouping Datasets
male_cp_fbs = male_df.groupby(['ChestPainType', 'FastingBS']).size().reset_index().rename(columns={0: 'count'})
female_cp_fbs = female_df.groupby(['ChestPainType', 'FastingBS']).size().reset_index().rename(columns={0: 'count'})

male_st_ecg = male_df.groupby(['ST_Slope', 'RestingECG']).size().reset_index().rename(columns={0: 'count'})
female_st_ecg = female_df.groupby(['ST_Slope', 'RestingECG']).size().reset_index().rename(columns={0: 'count'})

male_ea_cp = male_df.groupby(['ExerciseAngina', 'ChestPainType']).size().reset_index().rename(columns={0: 'count'})
female_ea_cp = female_df.groupby(['ExerciseAngina', 'ChestPainType']).size().reset_index().rename(columns={0: 'count'})

## Creating Sunburst Figures
sb1 = px.sunburst(male_cp_fbs, values='count', path=['ChestPainType', 'FastingBS'])
sb2 = px.sunburst(female_cp_fbs, values='count', path=['ChestPainType', 'FastingBS'])

sb3 = px.sunburst(male_st_ecg, values='count', path=['ST_Slope', 'RestingECG'])
sb4 = px.sunburst(female_st_ecg, values='count', path=['ST_Slope', 'RestingECG'])

sb5 = px.sunburst(male_ea_cp, values='count', path=['ExerciseAngina', 'ChestPainType'])
sb6 = px.sunburst(female_ea_cp, values='count', path=['ExerciseAngina', 'ChestPainType'])

## Subplots
fig = make_subplots(rows=3, cols=2, specs=[
    [{"type": "sunburst"}, {"type": "sunburst"}],
    [{"type": "sunburst"}, {"type": "sunburst"}],
    [{"type": "sunburst"}, {"type": "sunburst"}]
], subplot_titles=("Male Chest Pain with Fasting Blood Sugar", "Female Chest Pain with Fasting Blood Sugar",
                   "Male ST Slope with Resting ECG", "Female ST Slope with Resting ECG",
                   "Male Exercise Angina with Chest Pain Type", "Female Exercise Angina with Chest Pain Type"))

## Plotting Figures
fig.add_trace(sb1.data[0], row=1, col=1)
fig.add_trace(sb2.data[0], row=1, col=2)
fig.add_trace(sb3.data[0], row=2, col=1)
fig.add_trace(sb4.data[0], row=2, col=2)
fig.add_trace(sb5.data[0], row=3, col=1)
fig.add_trace(sb6.data[0], row=3, col=2)

fig.update_traces(textinfo="label+percent parent")

# Update title and height
fig.update_layout(title_text="Male vs Female Sunburst", title_x=0.5, height=1200, template='plotly_dark', showlegend=False,
        font=dict(
            family="Rubik",
            size=14)
)

fig.show()
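
With `textinfo="label+percent parent"`, each outer-ring sector shows its count as a share of the inner-ring sector it belongs to. The sketch below computes that share directly from a grouped table on made-up rows; the `percent_parent` column name is an assumption for illustration.

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "ChestPainType": ["ASY", "ASY", "ASY", "ATA", "ATA"],
    "FastingBS": [0, 0, 1, 0, 1],
})

# Same grouping shape as the notebook: one row per (parent, child) pair.
grouped = toy.groupby(["ChestPainType", "FastingBS"]).size().reset_index(name="count")

# Total count of each parent category, aligned row by row with `grouped`.
parent_total = grouped.groupby("ChestPainType")["count"].transform("sum")
grouped["percent_parent"] = 100 * grouped["count"] / parent_total
print(grouped)
```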

<a id="7"></a>
# <div style="text-align: center; background-color: #191414;font-size:120%; font-family:Courier New; color: #EE4355; padding: 20px; line-height: 1;border-radius:10px"><b>7. Heart Disease Histograms</b></div>

In [16]:
fig = px.histogram(df, x="Age", color="HeartDisease", marginal="violin", template='plotly_dark')

fig.update_layout(title_text="Age vs Heart Disease", title_x=0.5, height=500, template='plotly_dark',
        font=dict(
            family="Rubik",
            size=14)
)

fig.show()

In [17]:
fig = px.histogram(df, x="MaxHR", color="HeartDisease", marginal="violin", template='plotly_dark')

fig.update_layout(title_text="Max Heart Rate vs Heart Disease", title_x=0.5, height=500, template='plotly_dark',
        font=dict(
            family="Rubik",
            size=14)
)

fig.show()
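
Stacked histograms show counts per class, which makes it hard to compare bins of different sizes. One complementary view is the share of positive cases per bin. The sketch below bins made-up ages with `pandas.cut` and takes the mean of the 0/1 `HeartDisease` column in each band; the bin edges and the `AgeBand` column name are assumptions for illustration.

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "Age": [35, 38, 45, 47, 55, 58, 62, 66],
    "HeartDisease": [0, 0, 0, 1, 1, 0, 1, 1],
})

# Ten-year bands, closed on the left: [30, 40), [40, 50), ...
bins = [30, 40, 50, 60, 70]
toy["AgeBand"] = pd.cut(toy["Age"], bins=bins, right=False)

# The mean of a 0/1 column is the fraction of positive cases in each band.
rate = toy.groupby("AgeBand", observed=True)["HeartDisease"].mean()
print(rate)
```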

<a id="8"></a>
# <div style="text-align: center; background-color: #191414;font-size:120%; font-family:Courier New; color: #EE4355; padding: 20px; line-height: 1;border-radius:10px"><b>The End</b></div>

<a href="#top">Back to top</a>