# Predicting Angiographic Disease Status

# Introduction
Globalisation has brought marked improvements in quality of life for people around the world. Inventions ranging from plastics to the internet have changed how human society functions, and industrialisation has come hand in hand with innovations in food technology. Like any large change, however, this has affected human life both positively and negatively.

In modern times, the prevalence of diseases fuelled by unhealthy lifestyles has risen sharply, with long-lasting effects on people's lives.

Angiographic disease refers to a condition affecting the blood vessels and the flow of blood through them. In this project, we analyse the *processed.cleveland.data* file and use it to predict the type of chest pain an individual has if the individual is sick.

Chest pains are classified into the following four types:
1. Typical angina
2. Atypical angina
3. Non-anginal pain
4. Asymptomatic

# Preliminary Exploratory Data Analysis

# Methods

# Expected Outcomes And Significance
## 1. What do we expect to find?
With our project, we expect to be able to predict the angiographic heart disease status of an individual using the other columns as predictors.
## 2. What impact could such findings have?
This 
## 3. What future questions could this lead to?

# Glossary
To help keep track of terms that may be ambiguous to the reader, we have added a glossary so that you can quickly check what each term means.

X1. (age)

X2. (sex)\
(1 = male; 0 = female)

X3. (cp)\
(chest pain type)
-- Value 1: typical angina
-- Value 2: atypical angina
-- Value 3: non-anginal pain
-- Value 4: asymptomatic

X4. (trestbps)\
resting blood pressure (in mm Hg on admission to the hospital)

X5. (chol)\
serum cholesterol in mg/dl

X6. (fbs)\
(fasting blood sugar > 120 mg/dl) (1 = true; 0 = false)

X7. (restecg)\
resting electrocardiographic results
-- Value 0: normal
-- Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV)
-- Value 2: showing probable or definite left ventricular hypertrophy by Estes' criteria

X8. (thalach)\
maximum heart rate achieved

X9. (exang)\
exercise induced angina (1 = yes; 0 = no)

X10. (oldpeak)\
ST depression induced by exercise relative to rest

X11. (slope)\
the slope of the peak exercise ST segment
-- Value 1: upsloping
-- Value 2: flat
-- Value 3: downsloping

X12. (ca)\
number of major vessels (0-3) colored by fluoroscopy

X13. (thal)\
3 = normal; 6 = fixed defect; 7 = reversible defect

X14. (num) (the predicted attribute)\
diagnosis of heart disease (angiographic disease status)
-- Value 0: < 50% diameter narrowing
-- Value 1: > 50% diameter narrowing
(in any major vessel: attributes 59 through 68 are vessels)
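
The coded values in this glossary can be turned into readable labels when inspecting the data. The following is a minimal sketch in base R, assuming the codes listed above; the lookup vectors `sex_labels` and `cp_labels` and the small `example_codes` data frame are hypothetical and made up for illustration, not part of the original analysis.

```r
# Hypothetical lookup vectors built from the glossary entries for sex and cp
sex_labels <- c("0" = "female", "1" = "male")
cp_labels <- c("1" = "typical angina",
               "2" = "atypical angina",
               "3" = "non-anginal pain",
               "4" = "asymptomatic")

# Made-up coded values, for illustration only
example_codes <- data.frame(sex = c(1, 0, 1), cp = c(1, 4, 3))

# factor() matches each code against `levels` and displays it using `labels`
decoded <- data.frame(
  sex = factor(example_codes$sex, levels = names(sex_labels), labels = unname(sex_labels)),
  cp  = factor(example_codes$cp,  levels = names(cp_labels),  labels = unname(cp_labels))
)

print(decoded)
```

The same pattern should carry over to the other coded columns (for example `restecg` or `slope`) by adding one lookup vector per column.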

In [15]:
# Importing Necessary Libraries
library(tidyverse)
# library(testthat)
# library(digest)
# library(repr)
# library(tidymodels)


# To make the output cleaner and easier to read
# options(repr.matrix.max.rows = 6)

In [10]:
# Loading Data (the tidyverse must be attached first so that read_csv() is available)
url <- "https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/processed.cleveland.data"
heart_disease_data <- read_csv(url, col_names = FALSE)

# Converting Data Frame To Tibble (read_csv() already returns a tibble, so this is a no-op)
heart_disease_data <- as_tibble(heart_disease_data)

# Checking Data
heart_disease_data


In [35]:
# Preprocessing Of Data: convert the categorical columns to factors
heart_disease_data <- heart_disease_data |>
  mutate(X2  = as.factor(X2),
         X3  = as.factor(X3),
         X6  = as.factor(X6),
         X7  = as.factor(X7),
         X9  = as.factor(X9),
         X11 = as.factor(X11),
         X12 = as.factor(X12),
         X13 = as.factor(X13),
         X14 = as.factor(X14))

# Renaming Columns (named column_names to avoid shadowing base R's names() function)
column_names <- c("age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
                  "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num")

colnames(heart_disease_data) <- column_names

heart_disease_data

age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,num
<dbl>,<fct>,<fct>,<dbl>,<dbl>,<fct>,<fct>,<dbl>,<fct>,<dbl>,<fct>,<fct>,<fct>,<fct>
63,1,1,145,233,1,2,150,0,2.3,3,0.0,6.0,0
67,1,4,160,286,0,2,108,1,1.5,2,3.0,3.0,2
67,1,4,120,229,0,2,129,1,2.6,2,2.0,7.0,1
⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮
57,1,4,130,131,0,0,115,1,1.2,2,1.0,7.0,3
57,0,2,130,236,0,2,174,0,0.0,2,1.0,3.0,1
38,1,3,138,175,0,0,173,0,0.0,1,?,3.0,0
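
The preview above shows two things worth handling before modelling: a `?` placeholder in the `ca` column, and `num` values greater than 1 even though the glossary describes only values 0 and 1. The sketch below, in base R on made-up rows, shows one possible way to deal with both. It assumes that `?` marks a missing value and that every `num` value above 0 indicates disease; the `sample_rows` data frame and the `disease` column are hypothetical names introduced here.

```r
# Made-up rows mimicking the preview; `ca` is character because of the "?" marker
sample_rows <- data.frame(
  ca  = c("0.0", "3.0", "?", "1.0"),
  num = c(0, 2, 0, 3),
  stringsAsFactors = FALSE
)

# Assumption: "?" marks a missing value, so replace it with NA before converting
sample_rows$ca[sample_rows$ca == "?"] <- NA
sample_rows$ca <- as.numeric(sample_rows$ca)

# Assumption: num > 0 means disease is present; collapse it to a two-level factor
sample_rows$disease <- factor(as.integer(sample_rows$num > 0), levels = c(0, 1))

print(sample_rows)
```

When loading with `read_csv()`, a similar effect on the placeholder can usually be had by passing `na = "?"`, which would make `ca` and `thal` parse as numbers rather than text.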


In [8]:
# Splitting the data into training (75%) and testing (25%) sets
library(tidymodels)  # provides initial_split(), training() and testing()

set.seed(45768)
heart_disease_data_split <- initial_split(heart_disease_data, prop = 0.75, strata = num)
training_data <- training(heart_disease_data_split)
testing_data <- testing(heart_disease_data_split)
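
To illustrate what stratifying on `num` is meant to achieve, here is a rough base R sketch of a stratified 75/25 split on made-up data. It is not how `initial_split()` is implemented, only an approximation of the idea: rows are sampled within each class so that both subsets keep about the same class proportions. The `toy` data frame and its 60/40 class balance are invented for the example.

```r
set.seed(45768)  # same seed value as the split above, but the random draws will differ

# Made-up outcome column: 60 rows of class 0 and 40 rows of class 1
toy <- data.frame(id = 1:100, num = factor(rep(c(0, 1), times = c(60, 40))))

# Within each level of `num`, sample 75% of that level's row indices
train_idx <- unlist(lapply(split(seq_len(nrow(toy)), toy$num), function(rows) {
  rows[sample.int(length(rows), size = round(0.75 * length(rows)))]
}))

toy_train <- toy[train_idx, ]
toy_test  <- toy[-train_idx, ]

# Class proportions should match in both subsets
print(prop.table(table(toy_train$num)))
print(prop.table(table(toy_test$num)))
```

Comparing the two proportion tables is a quick sanity check that could also be applied to `training_data` and `testing_data`.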


In [36]:
# Visualizations

In [32]:
# Classifications

In [33]:
# Regressions