In [1]:
library(tidyverse)
library(tidymodels)


── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.2     ✔ purrr   1.0.1
✔ tibble  3.2.1     ✔ dplyr   1.1.1
✔ tidyr   1.3.0     ✔ stringr 1.5.0
✔ readr   2.1.3     ✔ forcats 0.5.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
── Attaching packages ────────────────────────────────────── tidymodels 1.0.0 ──
✔ broom        1.0.2     ✔ rsample      1.1.1
✔ dials        1.1.0     ✔ tune         1.0.1
✔ infer        1.0.4     ✔ workflows    1.1.2

Heart (cardiovascular) disease is a term covering a wide range of conditions that affect the heart, its associated blood vessels, and the surrounding muscles (Heart and Stroke Foundation Canada, n.d.). These conditions can have short-term or long-term effects on heart function, and those effects can spread to other internal organs. Among them is coronary artery disease, common in the United States, in which a patient's blood vessels narrow and restrict the blood supply to the heart. Many factors may influence the likelihood of developing coronary artery disease or another form of cardiovascular disease, including but not limited to fasting blood sugar, cholesterol, and resting blood pressure.

High resting blood pressure is among the leading causes of cardiovascular disease, which can result in stroke. It damages the lining of the arteries, which increases the probability of plaque buildup that narrows the arteries leading to the heart. Additionally, excess cholesterol can build up inside the blood vessels and restrict blood flow to the heart, brain, lungs, and kidneys (Centers for Disease Control and Prevention, 2022). Similarly, studies have identified fasting blood sugar as an underlying predictor of mortality from heart disease (National Library of Medicine, 2013).

The objective of this project is to classify patients by their potential risk of developing heart disease.

The question we will address is: how likely is a patient to be at risk of heart disease, given their cholesterol, fasting blood sugar, and resting blood pressure?

Columns: 
      1. #3  (age)       
      2. #4  (sex)       
      3. #9  (cp)        
      4. #10 (trestbps)  
      5. #12 (chol)      
      6. #16 (fbs)       
      7. #19 (restecg)   
      8. #32 (thalach)   
      9. #38 (exang)     
      10. #40 (oldpeak)   
      11. #41 (slope)     
      12. #44 (ca)        
      13. #51 (thal)      
      14. #58 (num)       (the predicted attribute)
      
1. age: age in years
2. sex: sex of patient (1 = male; 0 = female)
3. cp: chest pain type
        -- Value 1: typical angina
        -- Value 2: atypical angina
        -- Value 3: non-anginal pain
        -- Value 4: asymptomatic
4. trestbps: resting blood pressure (in mm Hg on admission to the hospital)
5. chol: serum cholesterol in mg/dl
6. fbs: (fasting blood sugar > 120 mg/dl)  (1 = true; 0 = false)
7. restecg: resting electrocardiographic results
        -- Value 0: normal
        -- Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV)
        -- Value 2: showing probable or definite left ventricular hypertrophy by Estes' criteria
8. thalach: maximum heart rate achieved
9. exang: exercise induced angina (1 = yes; 0 = no)
10. oldpeak: ST depression induced by exercise relative to rest
11. slope: the slope of the peak exercise ST segment
        -- Value 1: upsloping
        -- Value 2: flat
        -- Value 3: downsloping
12. ca: number of major vessels (0-3) colored by fluoroscopy
13. thal: 3 = normal; 6 = fixed defect; 7 = reversible defect
14. num: diagnosis of heart disease (angiographic disease status)
        -- Value 0: < 50% diameter narrowing
        -- Value 1: > 50% diameter narrowing
        (in any major vessel: attributes 59 through 68 are vessels)
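
The coded attributes above are easier to read once the integer codes are mapped to their labels. A minimal sketch in base R, assuming the 1 to 4 coding for `cp` listed above; the `cp_codes` vector is made up for illustration and is not taken from the dataset:

```r
# Hypothetical coded values, for illustration only (not from the dataset).
cp_codes <- c(1, 4, 3, 2, 4)

# Map each integer code to the label given in the attribute list.
cp_labels <- factor(
  cp_codes,
  levels = 1:4,
  labels = c("typical angina", "atypical angina",
             "non-anginal pain", "asymptomatic")
)

# Count how often each chest pain type occurs.
table(cp_labels)
```

The same pattern would apply to `restecg`, `slope`, and `thal`, each with its own set of codes and labels.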


In [2]:
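# Load the data, keep the columns of interest, and give them readable names.
# Note: `fast_bp` holds the fasting blood sugar indicator (fbs), not a blood pressure.
heart_data <- as_tibble(read.csv("heart.csv")) |>
    select(cp, fbs, chol, target) |>
    rename(chestpain = cp,
           fast_bp = fbs,
           cholesterol = chol,
           heart_disease = target) |>
    # Convert the 0/1 outcome to a factor, then give the levels readable labels.
    mutate(heart_disease = as_factor(heart_disease)) |>
    mutate(heart_disease = fct_recode(heart_disease, "Yes" = "1", "No" = "0"))

head(heart_data)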

chestpain,fast_bp,cholesterol,heart_disease
<int>,<int>,<int>,<fct>
0,0,212,No
0,1,203,No
0,0,174,No
0,0,203,No
0,1,294,No
0,0,248,Yes


The columns in the data frame represent the following:   
chestpain: Chest Pain Type:    0 = None   
                        1 = Typical Angina    
                        2 = Atypical Angina   
                        3 = Non-Anginal Pain   
                        4 = Asymptomatic (No values in the present table)  
                        
fast_bp: Whether fasting blood sugar exceeds 120 milligrams per deciliter (mg/dL) of blood. Despite the column name, this is blood sugar, not blood pressure.  
                        0 = Below 120 mg/dL  
                        1 = Above 120 mg/dL
                            
cholesterol: Serum cholesterol in milligrams per deciliter (mg/dL) of blood. High cholesterol is considered to be over 240 mg/dL. 

heart_disease: Presence of heart disease in general.  
                        0 = No  
                        1 = Yes
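
The thresholds described above can be applied directly to the data frame. The sketch below is illustrative only: it rebuilds the six rows printed by `head(heart_data)` as a stand-in called `heart_preview` (a hypothetical name), so it runs without `heart.csv`. The derived columns `high_cholesterol` and `high_fasting_sugar` are also assumptions added for illustration.

```r
library(dplyr)
library(tibble)

# Stand-in data: the six rows shown in the head(heart_data) output.
heart_preview <- tibble(
  chestpain     = c(0L, 0L, 0L, 0L, 0L, 0L),
  fast_bp       = c(0L, 1L, 0L, 0L, 1L, 0L),
  cholesterol   = c(212L, 203L, 174L, 203L, 294L, 248L),
  heart_disease = factor(c("No", "No", "No", "No", "No", "Yes"),
                         levels = c("No", "Yes"))
)

# Flag rows above the 240 mg/dL cholesterol threshold and rows whose
# fasting blood sugar indicator is set (above 120 mg/dL).
heart_labelled <- heart_preview |>
  mutate(
    high_cholesterol   = cholesterol > 240,
    high_fasting_sugar = fast_bp == 1
  )

# Cross-tabulate the outcome against the cholesterol flag.
count(heart_labelled, heart_disease, high_cholesterol)
```

Six rows are far too few to draw conclusions from; the same calls on the full `heart_data` would give a first look at how each flag relates to the outcome.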
