# DSCI-100 Project Proposal: Classifying Heart Disease

### Introduction:
Explain heart disease and which variables contribute to it. Make sure to cite sources.



References:

### Data
Column names:
- age: age in years
- ca: number of major vessels (0-3)
- chol: serum cholesterol in mg/dl
- cp: chest pain type 
  - Value 1: typical angina
  - Value 2: atypical angina
  - Value 3: non-anginal pain
  - Value 4: asymptomatic
- exang: exercise induced angina 
  - Value 0: no
  - Value 1: yes
- fbs: fasting blood sugar > 120 mg/dl
  - Value 0: false
  - Value 1: true
- num: diagnosis of heart disease 
  - Value 0: < 50% diameter narrowing
  - Value 1: > 50% diameter narrowing
- oldpeak: ST depression induced by exercise relative to rest
- restecg: resting electrocardiographic results
  - Value 0: normal
  - Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV)
  - Value 2: showing probable or definite left ventricular hypertrophy by Estes' criteria
- sex: sex
  - Value 0: female
  - Value 1: male
- slope: the slope of the peak exercise ST segment
  - Value 1: upsloping
  - Value 2: flat
  - Value 3: downsloping
- thal
  - Value 3: normal
  - Value 6: fixed defect
  - Value 7: reversible defect
- thalach: maximum heart rate achieved
- trestbps: resting blood pressure (mm Hg)

Dataset retrieved from: https://archive.ics.uci.edu/ml/datasets/heart+Disease
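
The integer codes listed above can be turned into labelled factors so that plots and summaries show readable categories. The following is a minimal sketch, not the project's actual code: the three-row `coded` tibble is hypothetical, and only `sex`, `cp`, and `exang` are recoded, using the labels from the column list.

```r
library(dplyr)

# Hypothetical three-row example; values are illustrative, not taken from the dataset
coded <- tibble::tibble(
  sex = c(0, 1, 1),
  cp = c(1, 4, 3),
  exang = c(0, 1, 0)
)

# Map each integer code to the label given in the column description
labelled <- coded |>
  mutate(
    sex = factor(sex, levels = c(0, 1), labels = c("female", "male")),
    cp = factor(cp, levels = 1:4,
                labels = c("typical angina", "atypical angina",
                           "non-anginal pain", "asymptomatic")),
    exang = factor(exang, levels = c(0, 1), labels = c("no", "yes"))
  )

labelled
```

Listing every level explicitly keeps the label order fixed even when a code is absent from the data.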

In [1]:
# Load all libraries
library(repr)
library(tidyverse)
library(tidymodels)
options(repr.matrix.max.rows = 6)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

✔ ggplot2 3.3.6     ✔ purrr   0.3.4
✔ tibble  3.1.7     ✔ dplyr   1.0.9
✔ tidyr   1.2.0     ✔ stringr 1.4.0
✔ readr   2.1.2     ✔ forcats 0.5.1

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

── Attaching packages ────────────────────────────────────── tidymodels 1.0.0 ──

✔ broom        1.0.0     ✔ rsample      1.0.0
✔ dials        1.0.0     ✔ tune         1.0.0
✔ infer        1.0.2     ✔ workflows    1.0.0

In [3]:
# Load the dataset and set its column names accordingly:
heart_all_data <- read_csv("data/processed.switzerland.data",
                  col_names = c("age", "sex", "cp", "trestbps", "chol", 
                              "fbs", "restecg", "thalach", "exang", 
                              "oldpeak", "slope", "ca", "thal", 
                              "num"))
# Change all ? into NA
heart_all_data[heart_all_data == "?"] <- NA

heart_all_data

Rows: 123 Columns: 14
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (9): trestbps, fbs, restecg, thalach, exang, oldpeak, slope, ca, thal
dbl (5): age, sex, cp, chol, num

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


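| age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | num |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| `<dbl>` | `<dbl>` | `<dbl>` | `<chr>` | `<dbl>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<dbl>` |
| 32 | 1 | 1 | 95 | 0 | NA | 0 | 127 | 0 | .7 | 1 | NA | NA | 1 |
| 34 | 1 | 4 | 115 | 0 | NA | NA | 154 | 0 | .2 | 1 | NA | NA | 1 |
| 35 | 1 | 4 | NA | 0 | NA | 0 | 130 | 1 | NA | NA | NA | 7 | 3 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 72 | 1 | 3 | 160 | 0 | NA | 2 | 114 | 0 | 1.6 | 2 | 2 | NA | 0 |
| 73 | 0 | 3 | 160 | 0 | 0 | 1 | 121 | 0 | 0 | 1 | NA | 3 | 1 |
| 74 | 1 | 2 | 145 | 0 | NA | 1 | 123 | 0 | 1.3 | 1 | NA | NA | 1 |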
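
Because the raw file marks missing values with `?`, `read_csv` guesses nine of the columns as character. One possible alternative, sketched here under assumptions, is to declare `?` as the missing-value marker at read time and request doubles, so the numeric columns arrive as `<dbl>` without a separate replacement step. The `raw_text` string is a stand-in for the file: it holds the first three displayed rows, with `?` assumed wherever the output shows a missing value, and `heart_sample` is a hypothetical name.

```r
library(readr)

# Stand-in for the raw file: the first three displayed rows, with "?" assumed
# wherever the displayed output shows a missing value
raw_text <- "32,1,1,95,0,?,0,127,0,.7,1,?,?,1
34,1,4,115,0,?,?,154,0,.2,1,?,?,1
35,1,4,?,0,?,0,130,1,?,?,?,7,3"

heart_sample <- read_csv(
  I(raw_text),                                # I() tells readr this is literal data, not a path
  col_names = c("age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
                "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num"),
  na = "?",                                   # treat "?" as NA while parsing
  col_types = cols(.default = col_double())   # read every column as a double
)

heart_sample
```

With the real file, the `I(raw_text)` argument would be replaced by the file path; whether all fourteen columns should stay numeric or be converted to factors afterwards is a separate modelling decision.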


In [None]:
# Select and mutate columns that we're interested in into their appropriate datatypes
# Age, oldpeak, cholesterol

heart_data <- heart_all_data |>
              select(age, oldpeak, chol, num) |>
              mutate(a