# Using MANOVAs to analyze heart attack data

In [1]:
# Load Libraries
library("mvnormtest")
library("car")

"package 'car' was built under R version 4.1.3"
Loading required package: carData

"package 'carData' was built under R version 4.1.3"


In [3]:
# Load Data
heartattack <- read.csv("heartAttacks.csv")

### Details: 
Independent variable (IV): sex. Dependent variables (DVs): resting blood pressure (trestbps) and cholesterol (chol).

### Question Setup
It is well-known that men are more likely to have heart attacks than women. How does gender (sex) influence some of the heart attack predictors like resting blood pressure (trestbps) and cholesterol (chol)?

### Requirements
1) Test for MANOVA assumptions
2) Run MANOVA

## Data Wrangling

In [5]:
# Ensure variables are numeric
str(heartattack$trestbps)
str(heartattack$chol)

 int [1:303] 145 130 130 120 120 140 140 120 172 150 ...
 int [1:303] 233 250 204 236 354 192 294 263 199 168 ...


In [8]:
# Convert to numeric
heartattack$trestbps <- as.numeric(heartattack$trestbps)
heartattack$chol <- as.numeric(heartattack$chol)

In [9]:
# Subsetting
keeps <- c("trestbps", "chol")
heartattack1 <- heartattack[keeps]

In [10]:
# Format as Matrix
heartattack2 <- as.matrix(heartattack1)

# Test Assumptions

### Sample Size
With 303 observations, the sample is large enough for a MANOVA with two dependent variables, so this assumption is met.

### Multivariate Normality

In [11]:
# Drop any missing values
heartattack3 <- na.omit(heartattack2)

In [12]:
mshapiro.test(t(heartattack3))


	Shapiro-Wilk normality test

data:  Z
W = 0.94568, p-value = 3.93e-09


# Results 
The p-value is less than .05, which violates the assumption of multivariate normality; however, we will continue for learning purposes.
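
The Shapiro-Wilk result only says that normality is rejected, not where the departure comes from. A chi-square Q-Q plot of squared Mahalanobis distances is one way to look at that. The sketch below is an addition, not part of the original analysis: `mahalanobis_qq` is a hypothetical helper name, and the simulated matrix stands in for `heartattack3` so the block runs on its own.

```r
# Hypothetical helper: chi-square Q-Q plot of squared Mahalanobis distances.
# Under multivariate normality the points should fall near the 45-degree line.
mahalanobis_qq <- function(x) {
  x <- as.matrix(x)
  d2 <- mahalanobis(x, center = colMeans(x), cov = cov(x))
  theoretical <- qchisq(ppoints(nrow(x)), df = ncol(x))
  plot(theoretical, sort(d2),
       xlab = "Chi-square quantiles",
       ylab = "Squared Mahalanobis distance")
  abline(0, 1)
  invisible(d2)
}

# Simulated stand-in for heartattack3 (NOT the real data)
set.seed(42)
sim <- cbind(trestbps = rnorm(303, mean = 130, sd = 17),
             chol     = rnorm(303, mean = 246, sd = 52))
d2 <- mahalanobis_qq(sim)
```

With the notebook's own objects, the call would be `mahalanobis_qq(heartattack3)`; points that drift far above the line in the upper tail would point to outlying observations.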

## Homogeneity of Variance

In [13]:
leveneTest(heartattack$trestbps, heartattack$sex, data=heartattack)

"heartattack$sex coerced to factor."


       Df  F value   Pr(>F)
group   1 1.359311  0.24458
      301


In [14]:
leveneTest(heartattack$chol, heartattack$sex, data=heartattack)

"heartattack$sex coerced to factor."


       Df  F value       Pr(>F)
group   1 11.37598 0.0008413142
      301
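
Read against the usual .05 cutoff, the two Levene results say that the variance of trestbps is similar across sexes (p = .245), while the variance of chol is not (p < .001). Homogeneity of variance therefore holds for only one of the two dependent variables. For readers curious what `leveneTest` computes: with car's default `center = median`, it is a one-way ANOVA on the absolute deviations from each group's median. The base-R sketch below illustrates that idea; `levene_by_hand` is a hypothetical name and the data are simulated, so its numbers will not match the output above.

```r
# Hypothetical helper: median-centred Levene's test as an ANOVA on
# absolute deviations from the group medians.
levene_by_hand <- function(y, g) {
  g <- factor(g)
  z <- abs(y - ave(y, g, FUN = median))  # distance of each point from its group median
  anova(lm(z ~ g))
}

# Simulated stand-in data (NOT the heart attack data): the second group is more variable
set.seed(1)
sex <- rep(c(0, 1), times = c(100, 203))
y   <- rnorm(303, mean = 246, sd = ifelse(sex == 1, 60, 40))
res <- levene_by_hand(y, sex)
res
```

On the notebook's data the equivalent call would be `levene_by_hand(heartattack$chol, heartattack$sex)`.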


## Absence of Multicollinearity

In [15]:
# cor.test() drops incomplete pairs on its own; `use` is an argument of cor(), not cor.test()
cor.test(heartattack$trestbps, heartattack$chol, method="pearson")


	Pearson's product-moment correlation

data:  heartattack$trestbps and heartattack$chol
t = 2.1534, df = 301, p-value = 0.03208
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.01064389 0.23262366
sample estimates:
      cor 
0.1231742 

The two dependent variables are only weakly correlated (r = .12), far from the strong correlation that would signal multicollinearity, so this assumption is met.


## The Analysis 

In [18]:
MANOVA <- manova(cbind(trestbps, chol) ~ sex, data = heartattack)
summary(MANOVA)

           Df   Pillai approx F num Df den Df   Pr(>F)   
sex         1 0.040235   6.2882      2    300 0.002112 **
Residuals 301                                            
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
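
With two groups there is a single hypothesis degree of freedom, and in that case the F reported for Pillai's trace can be reproduced by hand as F = V / (1 - V) * (den Df / num Df). The sketch below plugs in the rounded values printed above, so it agrees with the table only up to rounding; the variable names are illustrative.

```r
# Values copied from the summary(MANOVA) output above
pillai <- 0.040235
num_df <- 2
den_df <- 300

# Valid when there is one hypothesis degree of freedom (two groups)
approx_F <- pillai / (1 - pillai) * (den_df / num_df)
p_value  <- pf(approx_F, num_df, den_df, lower.tail = FALSE)

round(approx_F, 3)
signif(p_value, 3)
```

Both values should land close to the 6.2882 and 0.002112 shown in the table.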

## Post Hocs

In [20]:
# Univariate ANOVA for each dependent variable (summary.aov has no multivariate `test` option)
summary.aov(MANOVA)

 Response trestbps :
             Df Sum Sq Mean Sq F value Pr(>F)
sex           1    299  299.36  0.9732 0.3247
Residuals   301  92592  307.61               

 Response chol :
             Df Sum Sq Mean Sq F value  Pr(>F)    
sex           1  31778   31778  12.271 0.00053 ***
Residuals   301 779523    2590                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
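
Because two univariate tests follow a single MANOVA, some analysts adjust the follow-up p-values for multiple comparisons. As a sketch (the choice of Bonferroni is an assumption here, not something the original analysis does), base R's `p.adjust` can be applied to the p-values printed above:

```r
# p-values copied from the summary.aov output above
p_raw <- c(trestbps = 0.3247, chol = 0.00053)

# Bonferroni: multiply each p-value by the number of tests (capped at 1)
p_adj <- p.adjust(p_raw, method = "bonferroni")
p_adj
p_adj < 0.05
```

The conclusion does not change after adjustment: chol remains significant and trestbps does not.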


# Results
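The MANOVA shows a significant multivariate effect of sex on the combination of resting blood pressure and cholesterol (Pillai's trace = 0.040, F(2, 300) = 6.29, p = .002). The univariate follow-ups show that this effect is driven by cholesterol, which differs significantly by sex (F(1, 301) = 12.27, p < .001); resting blood pressure does not (F(1, 301) = 0.97, p = .325). Because the multivariate normality assumption and, for cholesterol, the homogeneity of variance assumption were violated, these results should be interpreted with caution.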