### Preliminary exploratory data analysis ###

In [19]:
# Run this cell before continuing.
library(tidyverse)
library(repr)
library(tidymodels)
options(repr.matrix.max.rows = 6)

# Load the helper scripts only if they exist in the working directory,
# so that a missing file does not stop the notebook.
if (file.exists("tests.R")) source("tests.R")
if (file.exists("cleanup.R")) source("cleanup.R")


**Demonstrate that the dataset can be read from the web into R.**

The dataset, stored locally as `processed.cleveland.data` and loaded below as `processed_cleveland`, can be downloaded from the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/Heart+Disease
Note that the file has no column names, but these can be added manually using the attribute information provided on that page.
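
As a rough sketch of reading the file with its column names supplied at read time, the helper below wraps `read_csv`. The name `read_cleveland` is hypothetical, and the direct file URL in `data_url` is an assumption that should be checked against the dataset page. The demonstration runs offline on two rows copied from the output shown in this notebook, and treating `?` as a missing value is a choice made here, not something the original code does.

```r
library(readr)

# Column names, following the attribute information for the dataset.
heart_cols <- c("age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
                "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num")

# ASSUMPTION: direct link to the raw file; confirm it on the dataset page.
data_url <- "https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/processed.cleveland.data"

# Hypothetical helper: reads from a local path or a URL, names the columns,
# and treats "?" as a missing value.
read_cleveland <- function(path_or_url) {
  read_csv(path_or_url, col_names = heart_cols, na = "?",
           show_col_types = FALSE)
}

# Offline demonstration on two rows copied from this notebook's output.
sample_file <- tempfile(fileext = ".csv")
writeLines(c(
  "63,1,1,145,233,1,2,150,0,2.3,3,0.0,6.0,0",
  "38,1,3,138,175,0,0,173,0,0.0,1,?,3.0,0"
), sample_file)
sample_data <- read_cleveland(sample_file)
sample_data

# With network access, the same helper should accept the URL directly:
# processed_cleveland <- read_cleveland(data_url)
```

Because `?` is read as `NA` in this sketch, `ca` and `thal` parse as numeric columns rather than character columns.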

**Clean and wrangle your data into a tidy format.**

In [62]:
processed_cleveland <- read_csv("data/processed.cleveland.data", col_names = FALSE) |>
    rename(
        age = X1, 
        sex = X2, 
        cp = X3,
        trestbps = X4, 
        chol = X5,
        fbs = X6, 
        restecg = X7, 
        thalach = X8,
        exang = X9, 
        oldpeak = X10, 
        slope = X11, 
        ca = X12, 
        thal = X13, 
        num = X14
    ) |>
    mutate(cp = as_factor(cp))

processed_cleveland

Rows: 303 Columns: 14
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (2): X12, X13
dbl (12): X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11, X14

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,num
<dbl>,<dbl>,<fct>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>,<chr>,<dbl>
63,1,1,145,233,1,2,150,0,2.3,3,0.0,6.0,0
67,1,4,160,286,0,2,108,1,1.5,2,3.0,3.0,2
67,1,4,120,229,0,2,129,1,2.6,2,2.0,7.0,1
⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮
57,1,4,130,131,0,0,115,1,1.2,2,1.0,7.0,3
57,0,2,130,236,0,2,174,0,0.0,2,1.0,3.0,1
38,1,3,138,175,0,0,173,0,0.0,1,?,3.0,0


Since sex and chest pain type are coded as numbers rather than their corresponding names, we are going to rename them.

In [92]:
# Pivot to wide format so the coded values become column names that can be renamed,
# then pivot back to long format to restore one column per variable.
cleveland_sex_wide <- processed_cleveland |>
                        pivot_wider(names_from = sex, values_from = cp) |>
                        rename("male" = "1", "female" = "0")

cleveland_sex_normal <- cleveland_sex_wide |>
                        pivot_longer(cols = male:female, names_to = "sex", values_to = "cp") |>
                        drop_na() |>
                        mutate(sex = as_factor(sex))

cleveland_cp_wide <- cleveland_sex_normal |>
                        pivot_wider(names_from = cp, values_from = sex) |>
                        rename("typical angina" = "1", "atypical angina" = "2", "non-anginal pain" = "3", "asymptomatic" = "4")

cleveland_cp_normal <- cleveland_cp_wide |>
                        pivot_longer(cols = "typical angina":"atypical angina", names_to = "cp", values_to = "sex") |>
                        drop_na() |>
                        mutate(cp = as_factor(cp))

cleveland_data_rename <- cleveland_cp_normal

cleveland_data_rename

age,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,num,cp,sex
<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>,<chr>,<dbl>,<fct>,<fct>
63,145,233,1,2,150,0,2.3,3,0.0,6.0,0,typical angina,male
67,160,286,0,2,108,1,1.5,2,3.0,3.0,2,asymptomatic,male
67,120,229,0,2,129,1,2.6,2,2.0,7.0,1,asymptomatic,male
⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮
57,130,131,0,0,115,1,1.2,2,1.0,7.0,3,asymptomatic,male
57,130,236,0,2,174,0,0.0,2,1.0,3.0,1,atypical angina,female
38,138,175,0,0,173,0,0.0,1,?,3.0,0,non-anginal pain,male
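
The same relabelling can probably be done without reshaping, by mapping each code to a label with `factor()`. The sketch below is an alternative to the pivoting approach above, not the method used in this notebook. It runs on three rows copied from the earlier output, and `heart_sample` and `heart_labelled` are illustrative names.

```r
library(dplyr)
library(tibble)

# Three rows (age, sex, cp only) copied from the output above.
heart_sample <- tibble(
  age = c(63, 67, 57),
  sex = c(1, 1, 0),
  cp  = c(1, 4, 2)
)

# Map each numeric code to its descriptive label in place.
heart_labelled <- heart_sample |>
  mutate(
    sex = factor(sex, levels = c(1, 0), labels = c("male", "female")),
    cp  = factor(cp, levels = 1:4,
                 labels = c("typical angina", "atypical angina",
                            "non-anginal pain", "asymptomatic"))
  )

heart_labelled
```

This version keeps the original column order and does not rely on `drop_na()`, so rows are not at risk of being dropped because of missing values in unrelated columns.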


Splitting the data into male and female subsets (more description to be added later).

In [93]:
cleveland_male <- cleveland_data_rename |>
                    filter(sex == "male")

cleveland_female <- cleveland_data_rename |>
                    filter(sex == "female")

cleveland_data_rename
cleveland_male
cleveland_female

age,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,num,cp,sex
<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>,<chr>,<dbl>,<fct>,<fct>
63,145,233,1,2,150,0,2.3,3,0.0,6.0,0,typical angina,male
67,160,286,0,2,108,1,1.5,2,3.0,3.0,2,asymptomatic,male
67,120,229,0,2,129,1,2.6,2,2.0,7.0,1,asymptomatic,male
⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮
57,130,131,0,0,115,1,1.2,2,1.0,7.0,3,asymptomatic,male
57,130,236,0,2,174,0,0.0,2,1.0,3.0,1,atypical angina,female
38,138,175,0,0,173,0,0.0,1,?,3.0,0,non-anginal pain,male


age,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,num,cp,sex
<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>,<chr>,<dbl>,<fct>,<fct>
63,145,233,1,2,150,0,2.3,3,0.0,6.0,0,typical angina,male
67,160,286,0,2,108,1,1.5,2,3.0,3.0,2,asymptomatic,male
67,120,229,0,2,129,1,2.6,2,2.0,7.0,1,asymptomatic,male
⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮
68,144,193,1,0,141,0,3.4,2,2.0,7.0,2,asymptomatic,male
57,130,131,0,0,115,1,1.2,2,1.0,7.0,3,asymptomatic,male
38,138,175,0,0,173,0,0.0,1,?,3.0,0,non-anginal pain,male


age,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,num,cp,sex
<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>,<chr>,<dbl>,<fct>,<fct>
41,130,204,0,2,172,0,1.4,1,0.0,3.0,0,atypical angina,female
62,140,268,0,2,160,0,3.6,3,2.0,3.0,3,asymptomatic,female
57,120,354,0,0,163,1,0.6,1,0.0,3.0,0,asymptomatic,female
⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮
63,124,197,0,0,136,1,0.0,2,0.0,3.0,1,asymptomatic,female
57,140,241,0,0,123,1,0.2,2,0.0,7.0,1,asymptomatic,female
57,130,236,0,2,174,0,0.0,2,1.0,3.0,1,atypical angina,female
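
One quick sanity check after a split like this is to confirm that the two subsets together account for every row. The sketch below does this on four rows copied from the outputs above, and also shows `count()` as a way to summarise by group without creating separate data frames. All object names here are illustrative.

```r
library(dplyr)
library(tibble)

# Four rows (age, sex, cp only) copied from the outputs above.
heart_sample <- tibble(
  age = c(63, 67, 41, 57),
  sex = factor(c("male", "male", "female", "female")),
  cp  = factor(c("typical angina", "asymptomatic",
                 "atypical angina", "atypical angina"))
)

sample_male   <- heart_sample |> filter(sex == "male")
sample_female <- heart_sample |> filter(sex == "female")

# The two subsets should partition the data: no row lost, none duplicated.
stopifnot(nrow(sample_male) + nrow(sample_female) == nrow(heart_sample))

# Grouped counts give a per-sex summary without splitting the data.
sex_cp_counts <- heart_sample |> count(sex, cp)
sex_cp_counts
```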

