# Fundamentals of Data Visualization Final Project
For this project, we'll look at the Heart Disease Dataset, which can be found here:
https://www.kaggle.com/datasets/johnsmith88/heart-disease-dataset?select=heart.csv

The dataset contains the following fields:
1. age: The person’s age in years
2. sex: The person’s sex (1 = male, 0 = female)
3. cp: chest pain type
    - Value 0: asymptomatic
    - Value 1: atypical angina
    - Value 2: non-anginal pain
    - Value 3: typical angina
4. trestbps: The person’s resting blood pressure (mm Hg on admission to the hospital)
5. chol: The person’s cholesterol measurement in mg/dl
6. fbs: Whether the person’s fasting blood sugar exceeds 120 mg/dl (1 = true; 0 = false)
7. restecg: resting electrocardiographic results
    - Value 0: showing probable or definite left ventricular hypertrophy by Estes’ criteria
    - Value 1: normal
    - Value 2: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV)
8. thalach: The person’s maximum heart rate achieved
9. exang: Exercise induced angina (1 = yes; 0 = no)
10. oldpeak: ST depression induced by exercise relative to rest (‘ST’ relates to positions on the ECG plot)
11. slope: the slope of the peak exercise ST segment
    - Value 0: downsloping
    - Value 1: flat
    - Value 2: upsloping
12. ca: The number of major vessels (0–3) colored by fluoroscopy
13. thal: A blood disorder called thalassemia 
    - Value 0: NULL 
    - Value 1: fixed defect (no blood flow in some part of the heart)
    - Value 2: normal blood flow
    - Value 3: reversible defect (blood flow is observed but is not normal)
14. target: Heart Disease (0 or 1)
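
Because most of these fields are stored as integer codes, a small lookup table can make them easier to read in tables and chart labels. The sketch below is illustrative only: `CODE_LABELS` and `decode` are hypothetical names that do not appear in the dataset or in the rest of this notebook, and the labels are copied from the list above for a few of the coded fields.

```python
# Hypothetical lookup table built from the field descriptions above.
CODE_LABELS = {
    "sex": {0: "female", 1: "male"},
    "cp": {
        0: "asymptomatic",
        1: "atypical angina",
        2: "non-anginal pain",
        3: "typical angina",
    },
    "slope": {0: "downsloping", 1: "flat", 2: "upsloping"},
    "thal": {
        0: "NULL",
        1: "fixed defect",
        2: "normal blood flow",
        3: "reversible defect",
    },
}


def decode(field, value):
    """Return the label for a coded value, or the value unchanged if no label is known."""
    return CODE_LABELS.get(field, {}).get(value, value)


print(decode("cp", 2))      # coded field: prints the label
print(decode("chol", 233))  # numeric field: passes through unchanged
```

Falling back to the raw value keeps numeric fields such as `chol` untouched, so the same helper can be applied to any column.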

# Primary Goals
Our goal is to identify relationships among the heart measurements that could aid in the diagnosis of heart disease. Additionally, we'd like to see whether these relationships are affected by factors such as sex and age. For this project, we will look at the following comparisons:
1. Chest pain type vs. Heart Disease plotted alongside Age vs. Resting Blood Pressure
2. Maximum Heart Rate vs. Resting Blood Pressure separated by Sex
3. Heart Disease separated by Sex plotted alongside Age vs. Cholesterol

In [13]:
# Modules used for this project
import polars as pl
import altair as alt
alt.renderers.enable("default")

RendererRegistry.enable('default')

In [3]:
# Import the data
url = "https://raw.githubusercontent.com/nsxydis/boulder_fdv/main/heart2.csv"
df = pl.read_csv(url)

In [39]:
# Transformation Functions
def addSex(x):
    '''Converts 0 & 1 to Female and Male'''
    if x == 0:
        return "Female"
    elif x == 1:
        return "Male"
    else:
        return "Error"

def defineTarget(x):
    '''Returns No Disease or Diseased for 0 & 1 respectively'''
    if x == 0:
        return "No Disease"
    elif x == 1:
        return "Diseased"
    else:
        return "Error"

In [44]:
# Make any necessary manipulations
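# map_elements is the current name for Expr.apply (renamed in polars 0.19);
# on older polars versions, use .apply instead
df = df.with_columns(pl.col('sex').map_elements(addSex, return_dtype = pl.Utf8).alias('Sex'))
df = df.with_columns(pl.col('target').map_elements(defineTarget, return_dtype = pl.Utf8).alias('Heart Disease'))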

# Convert to pandas for Altair (note: `pd` here is the DataFrame, not the pandas module)
pd = df.to_pandas()

In [72]:
# Look at chest pain type versus heart disease
selection = alt.selection(type = 'multi', fields = ['cp', "Heart Disease"])
chestPain = alt.Chart(pd).mark_bar().encode(
    x = alt.X('cp:N', title = "Chest Pain Type"),
    y = 'count()',
    color = 'Heart Disease:N',
    column = 'Heart Disease:N'
).add_selection(selection)

# Add side graph of age versus blood pressure
pressure = alt.Chart(pd).mark_circle().encode(
    x = alt.X('age', title = "Patient Age in Years"),
    y = alt.Y('trestbps', title = "Resting Blood Pressure"),
    color = 'Sex:N'
).transform_filter(selection)

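# Add a linear regression line to the graph; the fold on a dummy field
# only creates a 'Regression' label so the line appears in the legend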
params = pressure.transform_regression('age', 'trestbps').mark_line().encode(
    color = alt.Color('Regression:N')
).transform_fold(
    ['Linear Regression'],
    as_ = ["Regression", "y"]
)

pressure = pressure + params
chestPain | pressure

# Chest Pain versus Heart Disease
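The regression line layered on the age versus resting blood pressure plot comes from `transform_regression`, whose default method is a linear fit. As a rough illustration of what such a fit computes, here is a minimal plain-Python sketch of an ordinary least-squares line. The function name `linear_fit` is hypothetical, the numbers are made up for the example and are not taken from the dataset, and this is not the library's own code.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel out)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up ages and resting blood pressures, for illustration only
ages = [40, 50, 60, 70]
pressures = [120, 125, 130, 135]

slope, intercept = linear_fit(ages, pressures)
print(slope, intercept)
```

Because the regression chart is derived from the filtered scatter plot, the line is refit on whichever rows remain after a chest pain bar is selected.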


In [66]:
# Maximum Heart rate vs. Resting Blood Pressure separated by Sex
selection = alt.selection(type='multi', fields = ['Heart Disease'])
heartrate = alt.Chart(pd).mark_circle().encode(
    x = alt.X('thalach', title = "Maximum Heart Rate"),
    y = alt.Y('trestbps', title = "Resting Blood Pressure"),
    color = alt.condition(selection, "Heart Disease", alt.value('lightgray')),
    column = "Sex:N",
    tooltip = [
        'Sex', 
        alt.Tooltip('thalach', title = "Maximum Heart Rate"),
        alt.Tooltip('trestbps', title = "Resting Blood Pressure")
        ]
).add_selection(selection).interactive()

heartrate

In [68]:
# Heart Disease separated by Sex plotted alongside Age vs. Cholesterol
selection = alt.selection(type = 'multi', fields = ['Sex', 'Heart Disease'])
disease = alt.Chart(pd).mark_bar().encode(
    x = 'Sex:N',
    y = 'count()',
    column = "Heart Disease:N"
).add_selection(selection)

cholesterol = alt.Chart(pd).mark_circle().encode(
    x = alt.X('age', title = "Patient Age in Years"),
    y = alt.Y('chol', title = "Cholesterol Level"),
    color = "Heart Disease:N"
).transform_filter(selection)

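# Add a linear regression line to the graph; the fold on a dummy field
# only creates a 'Regression' label so the line appears in the legend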
params = cholesterol.transform_regression('age', 'chol').mark_line().encode(
    color = alt.Color('Regression:N')
).transform_fold(
    ['Linear Regression'],
    as_ = ["Regression", "y"]
)
cholesterol = cholesterol + params
disease | cholesterol
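
The linked charts in this notebook rely on a selection over named fields: clicking a bar records that bar's field values, and `transform_filter` keeps only the rows that match. The sketch below mimics that logic in plain Python so the behaviour is easier to reason about. It is a simplified model under our own assumptions, not Altair's implementation; `filter_rows` is a hypothetical helper and the rows are made up for illustration.

```python
def filter_rows(rows, selected, fields):
    """Keep rows whose values for `fields` match any selected combination.

    An empty selection keeps every row, which mirrors the default
    behaviour of an Altair selection used in transform_filter.
    """
    if not selected:
        return list(rows)
    wanted = {tuple(s[f] for f in fields) for s in selected}
    return [r for r in rows if tuple(r[f] for f in fields) in wanted]


# Made-up rows, for illustration only
rows = [
    {"Sex": "Male", "Heart Disease": "Diseased", "age": 63, "chol": 233},
    {"Sex": "Female", "Heart Disease": "Diseased", "age": 41, "chol": 204},
    {"Sex": "Male", "Heart Disease": "No Disease", "age": 67, "chol": 286},
]

# Simulates clicking the "Male" bar in the "Diseased" panel
selected = [{"Sex": "Male", "Heart Disease": "Diseased"}]
kept = filter_rows(rows, selected, ["Sex", "Heart Disease"])
print(kept)
```

Holding shift while clicking adds further combinations to a multi selection, which corresponds to passing more than one entry in `selected`.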