# Analysis of Covariance {#ANCOVA}
[Screencasted Lecture Link](https://youtube.com/playlist?list=PLtz5cFLQl4KM2GjVUMy1Vy816d5lgbOvi)
```{r include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(comment = NA) #keeps out the hashtags in the knits
options(scipen=999)#eliminates scientific notation
```
The focus of this lecture is analysis of covariance. Sticking with the same research vignette as we used for the mixed design ANOVA, we rearrange the variables a bit to see how they work in an ANCOVA design. The results help clarify the distinction between *moderator* and *covariate.*
## Navigating this Lesson
There is just about an hour of lecture. If you work through the materials with me, plan for an additional hour or two.
While the majority of R objects and data you will need are created within the R script that sources the chapter, occasionally there are some that cannot be created from within the R framework. Additionally, sometimes links fail. All original materials are provided at the [Github site](https://github.com/lhbikos/ReCenterPsychStats) that hosts the book. More detailed guidelines for ways to access all these materials are provided in the OER's [introduction](#ReCintro).
### Learning Objectives
Learning objectives from this lecture include the following:
* Define a *covariate* and distinguish it from a *moderator.*
* Recognize the case where ANCOVA is a defensible statistical approach for analyzing the data.
* Name and test the assumptions underlying ANCOVA.
* Analyze, interpret, and write up results for ANCOVA.
* List the conditions that are prerequisite for the appropriate use of a covariate or control variable.
### Planning for Practice
In each of these lessons I provide suggestions for practice that allow you to select from problems that vary in degree of difficulty. The least complex is to change the random seed and rework the problem demonstrated in the lesson. The results *should* map onto the ones obtained in the lecture.
The second option comes from the research vignette. For this ANCOVA article, I take a lot of liberties with the variables and research design. You could further mix and match for a different ANCOVA constellation.
As a third option, you are welcome to use data to which you have access and that are suitable for ANCOVA. In any case the practice options suggest that you:
* test the statistical assumptions
* conduct an ANCOVA, including
- omnibus test and effect size
- report main effects and engage in any follow-up testing
- interpret results in light of the role of the second predictor variable as a *covariate* (as opposed to the moderating role in the prior lessons)
* write a results section to include a figure and tables
### Readings & Resources
In preparing this chapter, I drew heavily from the following resource(s). Other resources are cited (when possible, linked) in the text with complete citations in the reference list.
* Green, S. B., & Salkind, N. J. (2017). One-Way Analysis of Covariance (Lesson 27). In *Using SPSS for Windows and Macintosh: Analyzing and understanding data* (Eighth edition., pp. 151–160). Boston: Pearson. OR
- This lesson provides an excellent review of ANCOVA with examples of APA style write-ups. The downside is that it is written for use in SPSS.
* ANCOVA in R: The Ultimate Practical Guide. (n.d.). Retrieved from https://www.datanovia.com/en/lessons/ancova-in-r/
- This is the workflow we are using for the lecture and written specifically for R.
* Bernerth, J. B., & Aguinis, H. (2016). A critical review and best‐practice recommendations for control variable usage. *Personnel Psychology, 69*(1), 229–283. https://doi.org/10.1111/peps.12103
- An article from the industrial-organizational psychology world. Especially relevant for this lesson is the flowchart on page 273 and the discussion (pp. 270 to the end).
* Murrar, S., & Brauer, M. (2018). Entertainment-education effectively reduces prejudice. *Group Processes & Intergroup Relations, 21*(7), 1053–1077. https://doi.org/10.1177/1368430216682350
- This article is the source of our research vignette. I used this same article in the lesson on [mixed design ANOVA](#Mixed). Swapping variable roles can be useful in demonstrating how ANCOVA is different than mixed design ANOVA.
### Packages
The packages used in this lesson are embedded in this code. When the hashtags are removed, the script below will (a) check to see if the following packages are installed on your computer and, if not (b) install them.
```{r tidy=TRUE, tidy.opts=list(width.cutoff=70)}
#used to convert data from long to wide
#if(!require(reshape2)){install.packages("reshape2")}
#if(!require(broom)){install.packages("broom")}
#if(!require(tidyverse)){install.packages("tidyverse")}
#if(!require(psych)){install.packages("psych")}
#easy plots
#if(!require(ggpubr)){install.packages("ggpubr")}
#pipe-friendly R functions
#if(!require(rstatix)){install.packages("rstatix")}
#export objects for table making
#if(!require(MASS)){install.packages("MASS")}
#if(!require(knitr)){install.packages("knitr")}
#if(!require(dplyr)){install.packages("dplyr")}
#if(!require(apaTables)){install.packages("apaTables")}
```
## Introducing Analysis of Covariance (ANCOVA)
Analysis of covariance (ANCOVA) evaluates the null hypothesis that
* population means on a dependent variable are equal across levels of a factor(s) adjusting for differences on a covariate(s); stated differently -
* the population adjusted means are equal across groups
This lecture introduces a distinction between **moderators** and **covariates**.
**Moderator**: a variable that changes the strength or direction of an effect between two variables X (predictor, independent variable) and Y (criterion, dependent variable).
**Covariate**: an observed, continuous variable that (when used properly) has a relationship with the dependent variable. It is included in the analysis as a predictor so that the predictive relationship between the independent variable (IV) and dependent variable (DV) is adjusted for it.
Understanding this difference may be facilitated by understanding one of the assumptions of ANCOVA: that the slopes relating the covariate to the dependent variable are the same for all groups (i.e., the homogeneity-of-slopes assumption). If this assumption is violated, the between-group differences in adjusted means are not interpretable. In that case the covariate should be treated as a moderator, and analyses that assess the simple main effects (i.e., follow-up to a significant interaction) should be conducted.
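It may help to write the two roles side by side. The notation below is illustrative shorthand (an assumption for this sketch, not notation taken from the readings): $Y_{ij}$ is the DV for person $i$ in group $j$, $X_{ij}$ is that person's score on the continuous variable, $\bar{X}$ is its grand mean, and $\alpha_j$ is the effect of group $j$.

$$Y_{ij} = \mu + \alpha_j + \beta\,(X_{ij} - \bar{X}) + \varepsilon_{ij} \qquad \text{(covariate: one common slope } \beta\text{)}$$

$$Y_{ij} = \mu + \alpha_j + \beta_j\,(X_{ij} - \bar{X}) + \varepsilon_{ij} \qquad \text{(moderator: a separate slope } \beta_j \text{ for each group)}$$

Homogeneity of slopes is the claim that $\beta_1 = \beta_2 = \dots = \beta_k$. When it holds, the first model applies and the adjusted mean for group $j$ is simply $\mu + \alpha_j$, the predicted DV for that group at the grand mean of $X$. When it does not hold, the gap between groups depends on where along $X$ we look, which is why a single set of adjusted means is no longer interpretable.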
A one-way ANCOVA requires three variables:
* IV/factor -- categorical (2 or more)
* DV -- continuous
* covariate -- continuous
Green and Salkind [-@green_one-way_2017] identified common uses of ANCOVA:
* Studies with a pretest and random assignment of subjects to factor levels. Variations on this research design include:
- assignment to factor levels based on that pretest,
- matching based on the pretest, and random assignment to factor levels,
- simply using the pretest as a covariate for the posttest DV.
* Studies with a potentially confounding variable (best when there is theoretical justification and prior empirical evidence for such) over which the researcher wants "control"
Although it is possible to have multi-way (e.g., 2-way, 3-way) ANCOVA, in this lecture we will work only two one-way ANCOVAs representing these common use cases.
### Workflow for ANCOVA
Our analytic process will be similar to others in the ANOVA series. An ANCOVA workflow maps this in further detail.
![Image of the ANCOVA workflow](images/ANCOVA/wf_ancova.jpg)
1. Prepare the data
2. Evaluate potential violation of the assumptions
3. Compute the omnibus ANCOVA, and follow-up accordingly
- If significant: follow-up with post-hoc comparisons, planned contrasts, and/or polynomial trends
- If non-significant: stop.
ANCOVA has four primary assumptions:
**Linearity**: The covariate is linearly related to the dependent variable within all levels of the factor (IV).
**Homogeneity of regression slopes**: The weights or slopes relating the covariate to the DV are equal across all levels of the factor.
**Normally distributed**: The DV is normally distributed in the population for any specific value of the covariate and for any one level of a factor. This assumption applies to every combination of the values of the covariate and levels of the factor and requires them all to be normally distributed. To the degree that population distributions are not normal and sample sizes are small, *p* values may not be trustworthy and power may be reduced. Evaluating this is frequently operationalized by inspecting the residuals and identifying outliers.
**Homogeneity of variances**: The variances of the DV for the conditional distributions (i.e., every combination of the values of the covariate and levels of the factor) are equal.
We are following the approach to analyzing ANCOVA identified in the Datanovia lesson on ANCOVA [@datanovia_ancova_nodate].
## Research Vignette
We will continue with the example used in the [mixed design ANOVA lesson](#Mixed). The article does not contain any ANCOVA analyses, but there is enough data that I can demonstrate the two general ways (i.e., controlling for the pretest, controlling for a potentially confounding variable) that ANCOVA is used.
Here is a quick reminder of the research vignette.
Murrar and Brauer's [-@murrar_entertainment-education_2018] article described the results of two studies designed to reduce prejudice against Arabs/Muslims. In the lesson on mixed design ANOVA, we only worked the first of two experiments reported in the study. Participants (*N* = 193), all of whom were White, were randomly assigned to one of two conditions where they watched six episodes of the sitcom [*Friends*](http://www.friends-tv.org/) or [*Little Mosque on the Prairie*](https://en.wikipedia.org/wiki/Little_Mosque_on_the_Prairie). The sitcoms and specific episodes were selected after significant pilot testing. The selection balanced a tension: the stimuli needed to be as similar as possible, yet the intervention-oriented sitcom needed to invoke psychological processes known to reduce prejudice. The authors felt that both series had characters who were likable and relatable and who were engaged in activities of daily living. The Friends series featured characters who were predominantly White, cis-gendered, and straight. The Little Mosque series portrays the experience of Western Muslims and Arabs as they live in a small Canadian town. This study involved assessment across three waves: baseline (before watching the assigned episodes), post1 (immediately after watching the episodes), and post2 (completed 4-6 weeks after watching the episodes).
The study used *feelings and liking thermometers*, on which participants rated their feelings and liking toward 10 different groups of people on a 0 to 100 sliding scale (with higher scores reflecting greater liking and positive feelings). For the purpose of this analysis, the ratings of attitudes toward White people and attitudes toward Arabs/Muslims were used. A third metric was introduced by subtracting the attitudes towards Arabs/Muslims from the attitudes toward Whites. Higher scores indicated more positive attitudes toward Whites, whereas scores near zero indicated no difference in attitudes. To recap, there were three potential dependent variables, all continuously scaled:
* AttWhite: attitudes toward White people; higher scores reflect greater liking
* AttArab: attitudes toward Arab people; higher scores reflect greater liking
* Diff: the difference between AttWhite and AttArab; higher scores reflect a greater liking for White people
With random assignment, nearly equal cell sizes, a condition with two levels (Friends, Little Mosque), and three waves (baseline, post1, post2), this is perfect for mixed design ANOVA and also suitable for an ANCOVA demonstration.
![Image of the design for the Murrar and Brauer (2018) study](images/mixed/Murrar_design.jpg)
### Data Simulation
Below is the code I have used to simulate the data. The simulation includes two dependent variables (AttWhite, AttArab), Wave (baseline, post1, post2), and COND (condition; Friends, Little_Mosque). There is also a caseID (repeated three times across the three waves) and rowID (giving each observation within each case an ID). You can use this simulation for two of the three practice suggestions.
```{r message=FALSE, warning=FALSE, tidy=TRUE, tidy.opts=list(width.cutoff=70)}
library(tidyverse)
#change this to any different number (and rerun the simulation) to rework the chapter problem
set.seed(210813)
#sample size, M and SD for each cell; this will put it in a long file
AttWhite<-round(c(rnorm(98,mean=76.79,sd=18.55),rnorm(95,mean=75.37,sd=18.99),rnorm(98, mean=77.47, sd=18.95), rnorm(95, mean=75.81, sd=19.29), rnorm(98, mean=77.79, sd=17.25), rnorm(95, mean=75.89, sd=19.44)),3)
#set upper bound for variable
AttWhite[AttWhite>100]<-100
#set lower bound for variable
AttWhite[AttWhite<0]<-0
AttArab<-round(c(rnorm(98,mean=64.11,sd=20.97),rnorm(95,mean=64.37,sd=20.03),rnorm(98, mean=64.16, sd=21.64), rnorm(95, mean=70.52, sd=18.55), rnorm(98, mean=65.29, sd=19.76), rnorm(95, mean=70.30, sd=17.98)),3)
#set upper bound for variable
AttArab[AttArab>100]<-100
#set lower bound for variable
AttArab[AttArab<0]<-0
rowID <- factor(seq(1,579))
caseID <- rep((1:193),3)
Wave <- c(rep("Baseline",193), rep("Post1", 193), rep ("Post2", 193))
COND <- c(rep("Friends", 98), rep("LittleMosque", 95), rep("Friends", 98), rep("LittleMosque", 95), rep("Friends", 98), rep("LittleMosque", 95))
#groups the variables into a single df: row ID, case ID, wave, condition, and the two DVs
Murrar_df<- data.frame(rowID, caseID, Wave, COND, AttArab, AttWhite)
#make caseID a factor
Murrar_df[,'caseID'] <- as.factor(Murrar_df[,'caseID'])
#make Wave a factor with its levels in the intended order
Murrar_df$Wave <- factor(Murrar_df$Wave, levels = c("Baseline", "Post1", "Post2"))
#make COND a factor with its levels in the intended order
Murrar_df$COND <- factor(Murrar_df$COND, levels = c("Friends", "LittleMosque"))
#creates the difference score
Murrar_df$Diff <- Murrar_df$AttWhite - Murrar_df$AttArab
```
Let's check the structure. We want
* rowID and caseID to be unordered factors,
* Wave and COND to be factors with their levels in the intended order (Baseline, Post1, Post2; Friends, LittleMosque),
* AttArab, AttWhite, and Diff to be numerical
```{r}
str(Murrar_df)
```
The structure looks satisfactory. R will automatically "order" factors alphabetically or numerically. In this lesson's example the alphabetical ordering (i.e., Baseline, Post1, Post2; Friends, LittleMosque) is consistent with the logic in our study.
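A small sketch can make the point about level ordering concrete. The toy vector below is hypothetical (it is not part of the Murrar simulation) and every object name starting with `demo_` is invented for illustration. It shows the default alphabetical ordering, how `levels =` overrides it, and that neither result is an "ordered factor" in R's formal sense unless `ordered = TRUE` is requested.

```{r tidy=TRUE, tidy.opts=list(width.cutoff=70)}
#hypothetical toy vector; not part of the Murrar simulation
demo_wave <- c("Post2", "Baseline", "Post1", "Baseline")

#default: levels are sorted alphabetically
demo_default <- factor(demo_wave)
levels(demo_default)

#explicit levels: we control the order (reversed here purely for illustration)
demo_custom <- factor(demo_wave, levels = c("Post2", "Post1", "Baseline"))
levels(demo_custom)

#neither of the above is an "ordered factor" in R's formal sense
is.ordered(demo_default)

#ordered = TRUE creates one; this also changes the default contrasts R uses
demo_ordered <- factor(demo_wave, levels = c("Baseline", "Post1", "Post2"), ordered = TRUE)
is.ordered(demo_ordered)
```

For a two-group ANCOVA the practical consequence of level order is mostly which group is treated as the reference and how output is sorted.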
If you want to export this data as a file to your computer, remove the hashtags to save it (and re-import it) as a .csv ("Excel lite") or .rds (R object) file. This is not a necessary step.
The code for the .rds file will retain the formatting of the variables, but is not easy to view outside of R. This is what I would do. *Note: My students and I have discovered that the psych::describeBy() function seems to not work with files in the .rds format, but does work when the data are imported with .csv.*
```{r tidy=TRUE, tidy.opts=list(width.cutoff=70)}
#to save the df as an .rds (think "R object") file on your computer;
#it should save in the same folder as the .rmd file you are working with
#saveRDS(Murrar_df, "Murrar_RDS.rds")
#bring back the simulated data from an .rds file
#Murrar_df <- readRDS("Murrar_RDS.rds")
```
The code for .csv will likely lose the formatting (i.e., stripping Wave and COND of their factor formatting), but it is easy to view in Excel.
```{r tidy=TRUE, tidy.opts=list(width.cutoff=70)}
#write the simulated data as a .csv
#write.table(Murrar_df, file="DiffCSV.csv", sep=",", col.names=TRUE, row.names=FALSE)
#bring back the simulated data from a .csv file
#Murrar_df <- read.csv ("DiffCSV.csv", header = TRUE)
```
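If you do go the .csv route, the factor formatting can be restored after re-importing. The sketch below uses a hypothetical three-row data frame (`demo_df`) standing in for the simulated data and writes to a temporary file so that nothing is left in your working directory. All `demo_` names are invented for illustration.

```{r tidy=TRUE, tidy.opts=list(width.cutoff=70)}
#hypothetical toy data frame standing in for Murrar_df
demo_df <- data.frame(
  COND = factor(c("Friends", "LittleMosque", "Friends"), levels = c("Friends", "LittleMosque")),
  AttArab = c(61.2, 70.5, 58.9)
)

#write to a temporary file
demo_path <- tempfile(fileext = ".csv")
write.table(demo_df, file = demo_path, sep = ",", col.names = TRUE, row.names = FALSE)

#re-import; whether COND returns as character or factor depends on
#the R version and the read.csv() settings
demo_back <- read.csv(demo_path, header = TRUE)
class(demo_back$COND)

#restore the factor with the intended level order
demo_back$COND <- factor(demo_back$COND, levels = c("Friends", "LittleMosque"))
str(demo_back)

#clean up the temporary file
unlink(demo_path)
```

Re-declaring the factor explicitly works regardless of how the column came back, so it is a safe habit after any .csv import.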
## Working the ANCOVA -- Scenario #1: Controlling for the pretest
So that we can begin to understand how the covariate operates, we are going to predict attitudes towards Arabs at post-test (AttArabP1) by condition (COND), controlling for attitudes toward Arabs at baseline (AttArabB). You may notice that in this analysis we are ignoring the second post-test. This is because I am simply demonstrating ANCOVA. In an actual study, ignoring the second post-test would be a significant loss of information.
### Preparing the data
When the covariate in ANCOVA is a pretest, we need three variables:
* IV that has two or more levels; in our case it is the Friends and Little Mosque conditions
* DV that is continuous; in our case it is the attitudes toward Arabs at post1
* Covariate that is continuous; in our case it is the attitudes toward Arabs at baseline
The form of our data matters. The simulation created a *long* form (formally called the *person-period* form) of data. That is, each observation for each person is listed in its own row. In this dataset where we have 193 people with 3 observations (baseline, post1, post2) each, we have 579 rows. In ANCOVA where we use the pre-test as a covariate, we need all the data for each person to be on a single row. This is termed the *person level* form of data. We can restructure the data with the *reshape2* package's *dcast()* function.
```{r tidy=TRUE, tidy.opts=list(width.cutoff=70)}
# Create a new df (Murrar_wide)
# Identify the original df
#In the transition from long-to-wide it seems like you can only do one time-varying variable at a time
#When there are multiple time-varying and time-static variables,
#put all the time-static variables on the left side of the tilde
#Put the name of the single time-varying variable in the concatenated list
Murrar1 <- reshape2::dcast(data = Murrar_df, formula =caseID +COND ~ Wave, value.var = "AttArab")
#before restructuring a second variable, rename the first variable
Murrar1 <- rename(Murrar1, AttArabB = "Baseline", AttArabP1 = "Post1", AttArabP2 = "Post2")
#repeat the process for additional variables; but give the new df new names -- otherwise you'll overwrite your work
Murrar2 <- reshape2::dcast(data = Murrar_df, formula =caseID ~Wave, value.var = "AttWhite")
Murrar2 <- rename(Murrar2, AttWhiteB = "Baseline", AttWhiteP1 = "Post1", AttWhiteP2 = "Post2")
#Now we join them
Murrar_wide <- dplyr::full_join(Murrar1, Murrar2, by = c("caseID"))
str(Murrar_wide )
```
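As an alternative sketch (not the workflow used in this lesson), base R's `reshape()` can spread more than one time-varying variable in a single call. The toy long-form data frame below is hypothetical, with three cases and two waves, and all `demo_` names are invented for illustration.

```{r tidy=TRUE, tidy.opts=list(width.cutoff=70)}
#hypothetical long-form data: 3 cases observed at 2 waves
demo_long <- data.frame(
  caseID = rep(1:3, times = 2),
  COND = rep(c("Friends", "LittleMosque", "Friends"), times = 2),
  Wave = rep(c("Baseline", "Post1"), each = 3),
  AttArab = c(60, 55, 70, 62, 68, 71),
  AttWhite = c(80, 75, 78, 79, 77, 80)
)

#idvar: the time-static variables that identify a person
#timevar: the variable that indexes the waves
#v.names: the time-varying variables to spread into columns
demo_wide <- reshape(demo_long, idvar = c("caseID", "COND"), timevar = "Wave", v.names = c("AttArab", "AttWhite"), direction = "wide")

#one row per person; new columns are named variable.wave
names(demo_wide)
nrow(demo_wide)
```

The resulting column names (e.g., `AttArab.Baseline`) differ from the `AttArabB` style used in this lesson, so they would need renaming before being dropped into the code that follows.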
If you want to export this data as a file to your computer, remove the hashtags to save it (and re-import it) as a .csv ("Excel lite") or .rds (R object) file. This is not a necessary step.
The code for the .rds file will retain the formatting of the variables, but is not easy to view outside of R. This is what I would do.
```{r tidy=TRUE, tidy.opts=list(width.cutoff=70)}
#to save the df as an .rds (think "R object") file on your computer;
#it should save in the same folder as the .rmd file you are working with
#saveRDS(Murrar_wide, "MurrarW_RDS.rds")
#bring back the simulated data from an .rds file
#Murrar_wide <- readRDS("MurrarW_RDS.rds")
```
The code for .csv will likely lose the formatting (i.e., stripping caseID and COND of their factor formatting), but it is easy to view in Excel.
```{r tidy=TRUE, tidy.opts=list(width.cutoff=70)}
#write the simulated data as a .csv
#write.table(Murrar_wide, file="MurrarW_CSV.csv", sep=",", col.names=TRUE, row.names=FALSE)
#bring back the simulated data from a .csv file
#Murrar_wide <- read.csv ("MurrarW_CSV.csv", header = TRUE)
```
### Evaluating the statistical assumptions
There are a number of assumptions in ANCOVA. These include:
* random sampling
* independence in the scores representing the dependent variable
- there is, of course, intentional dependence in any repeated measures or within-subjects variable
* linearity of the relationship between the covariate and DV within all levels of the independent variable
* homogeneity of the regression slopes
* a normally distributed DV for any specific value of the covariate and for any one level of a factor
* homogeneity of variance
These are depicted in the flowchart, below.
![Image of the ANCOVA workflow, showing our current place in the process](images/ANCOVA/wf_ANCOVA_assumptions.jpg)
#### Linearity assumption
ANCOVA assumes that there is linearity between the covariate and outcome variable at each level of the grouping variable. In our case this means that there is linearity between the pre-test (covariate) and post-test (outcome variable) at each level of the intervention (Friends, Little Mosque).
We can create a scatterplot (with regression lines) between covariate (our pretest) and the outcome (post-test1).
```{r tidy=TRUE, tidy.opts=list(width.cutoff=70)}
ggpubr::ggscatter (
Murrar_wide, x = "AttArabB", y = "AttArabP1",
color = "COND", add = "reg.line"
)+
ggpubr::stat_regline_equation(
aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~~"), color = COND)
)
```
As is not surprising (because we tested a similar set of variables in the mixed design chapter), this relationship looks like an interaction effect. Let's continue our exploration.
#### Homogeneity of regression slopes
This assumption requires that the slopes of the regression lines formed by the covariate and the outcome variable are the same for each group. That is, there should be no interaction between the covariate and the grouping variable in predicting the outcome. The plotted regression lines should be parallel.
```{r tidy=TRUE, tidy.opts=list(width.cutoff=70)}
Murrar_wide %>% rstatix::anova_test(AttArabP1 ~ COND*AttArabB)
```
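The logic of this test can also be sketched in base R as a comparison of two nested models: one that forces a common slope and one that lets the slope differ by group. To keep the sketch self-contained I use R's built-in `mtcars` data as a stand-in (an assumption for illustration only): `mpg` plays the DV, `wt` the covariate, and `am` the two-level factor. All `demo_` names are invented.

```{r tidy=TRUE, tidy.opts=list(width.cutoff=70)}
#built-in data as a stand-in: mpg = DV, wt = covariate, am = factor
demo_cars <- transform(mtcars, am = factor(am, labels = c("automatic", "manual")))

#ANCOVA model: one common slope for the covariate
demo_common <- lm(mpg ~ wt + am, data = demo_cars)
#moderator model: the slope is allowed to differ by group
demo_separate <- lm(mpg ~ wt * am, data = demo_cars)

#F test for the added wt:am term; a significant result signals unequal slopes
anova(demo_common, demo_separate)

#the group-specific slopes themselves
demo_slopes <- sapply(split(demo_cars, demo_cars$am), function(d) coef(lm(mpg ~ wt, data = d))[["wt"]])
demo_slopes
```

Looking at the group-specific slopes alongside the test is useful because a statistically significant difference in slopes can still be small in practical terms, and vice versa.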
Because the statistically significant interaction term indicates a violation of homogeneity of regression slopes $(F [1, 189] = 4.297, p = .040, \eta^2 = 0.022)$, we should not proceed with ANCOVA as a statistical option. However, for the sake of demonstration, I will continue. One of the reasons I wanted to work this example as ANCOVA is to demonstrate that covariates and moderators each have their role. We can already see that these data are best analyzed with mixed design ANOVA.
#### Normality of residuals
Our goal here is to specify a model and extract *residuals*: the difference between the observed value of the DV and its predicted value. Each data point has one residual. The sum and mean of residuals are equal to 0.
Once we have saved the residuals, we can treat them as data and evaluate the shape of their distribution. We hope that the distribution is not statistically significantly different from a normal one. We first compute the model with *lm()* (lm stands for "linear model"). This is a linear regression.
```{r tidy=TRUE, tidy.opts=list(width.cutoff=70)}
#Create a linear regression model predicting DV from COV & IV
AttArabB_Mod <- lm (AttArabP1 ~ AttArabB + COND, data = Murrar_wide)
AttArabB_Mod
```
With the *broom::augment()* function we can augment our *lm()* model object to add fitted values and residuals.
```{r tidy=TRUE, tidy.opts=list(width.cutoff=70)}
#new model by augmenting the lm model
AttArabB_Mod.metrics <- broom::augment(AttArabB_Mod)
#shows the first three rows of AttArabB_Mod.metrics
head(AttArabB_Mod.metrics,3)
```
From this, we can assess the normality of the residuals using the Shapiro-Wilk test.
```{r tidy=TRUE, tidy.opts=list(width.cutoff=70)}
#apply shapiro_test to that augmented model
rstatix::shapiro_test(AttArabB_Mod.metrics$.resid)
```
The statistically significant Shapiro-Wilk test indicated a violation of the normality assumption (*W* = 0.984, *p* = .026).
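To make the idea of a residual concrete, the sketch below computes residuals by hand, confirms that they sum to zero, and applies base R's `shapiro.test()`. It uses the built-in `mtcars` data as a stand-in (an assumption for illustration: `mpg` as DV, `wt` as covariate, `am` as factor), and all `demo_` names are invented.

```{r tidy=TRUE, tidy.opts=list(width.cutoff=70)}
#built-in data as a stand-in: mpg = DV, wt = covariate, am = factor
demo_cars <- transform(mtcars, am = factor(am, labels = c("automatic", "manual")))
demo_mod <- lm(mpg ~ wt + am, data = demo_cars)

#a residual is the observed DV minus the model's fitted value
demo_resid <- demo_cars$mpg - fitted(demo_mod)
all.equal(unname(demo_resid), unname(residuals(demo_mod)))

#with an intercept in the model, residuals sum to zero (within rounding error)
sum(demo_resid)

#Shapiro-Wilk test on the residuals
shapiro.test(demo_resid)
```

Because the Shapiro-Wilk test becomes more sensitive as sample size grows, a QQ plot of the same residuals (e.g., with `qqnorm()`) is a common companion check.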
#### Homogeneity of variances
ANCOVA presumes that the variance of the residuals is equal for all groups. We can check this with Levene's test.
```{r tidy=TRUE, tidy.opts=list(width.cutoff=70)}
AttArabB_Mod.metrics %>% rstatix::levene_test(.resid ~ COND)
```
A non-significant Levene's test indicated no violation of the homogeneity of the residual variances for all groups $(F[1, 191] = 3.515, p = .062)$.
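For readers curious about what a Levene-type test does under the hood, here is a minimal base R sketch. It assumes the median-centered variant (I have not confirmed that this matches the defaults of `rstatix::levene_test()`, so treat any numerical agreement as something to check). The built-in `mtcars` data serve as a stand-in (`mpg` as DV, `wt` as covariate, `am` as factor) and all `demo_` names are invented.

```{r tidy=TRUE, tidy.opts=list(width.cutoff=70)}
#built-in data as a stand-in: mpg = DV, wt = covariate, am = factor
demo_cars <- transform(mtcars, am = factor(am, labels = c("automatic", "manual")))
demo_mod <- lm(mpg ~ wt + am, data = demo_cars)
demo_cars$resid <- residuals(demo_mod)

#absolute deviation of each residual from its own group's median
demo_cars$absdev <- abs(demo_cars$resid - ave(demo_cars$resid, demo_cars$am, FUN = median))

#a one-way ANOVA on those absolute deviations is the Levene-type test
demo_levene <- anova(lm(absdev ~ am, data = demo_cars))
demo_levene
```

The intuition is that if one group's residuals are more spread out, its absolute deviations will be larger on average, and the one-way ANOVA will detect that difference.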
#### Outliers
We can identify outliers by examining the standardized (or studentized) residuals. This is the residual divided by its estimated standard error. Standardized residuals are interpreted as the number of standard errors away from the regression line.
```{r tidy=TRUE, tidy.opts=list(width.cutoff=70)}
#from our model metrics
#show us any standardized residuals that are >3
AttArabB_Mod.metrics%>%
filter(abs(.std.resid)>3)%>%
as.data.frame()
```
We do have one outlier with a standardized residual that has an absolute value greater than 3. At this point I am making a mental note of this. If this were "for real" I might more closely inspect these data. I would look at the whole response. If any response seemed invalid (e.g., random, extreme, or erratic responding) I would delete it. If the responses seemed valid, I *could* truncate them to exactly 3 SEs. I could also ignore it. Kline [-@kline_data_2016] has a great section on some of these options.
Code for deleting outliers can be found in earlier chapters, including [Mixed Design ANOVA](#Mixed).
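The definition of a standardized residual (the residual divided by its estimated standard error) can be verified by hand. The sketch below does so with the built-in `mtcars` data as a stand-in (an assumption for illustration: `mpg` as DV, `wt` as covariate, `am` as factor) and compares the result with base R's `rstandard()`. All `demo_` names are invented.

```{r tidy=TRUE, tidy.opts=list(width.cutoff=70)}
#built-in data as a stand-in: mpg = DV, wt = covariate, am = factor
demo_cars <- transform(mtcars, am = factor(am, labels = c("automatic", "manual")))
demo_mod <- lm(mpg ~ wt + am, data = demo_cars)

#estimated standard error of each residual:
#residual SD times sqrt(1 - leverage)
demo_se <- sigma(demo_mod) * sqrt(1 - hatvalues(demo_mod))

#standardized residual = raw residual / its estimated standard error
demo_std <- residuals(demo_mod) / demo_se
all.equal(demo_std, rstandard(demo_mod))

#list any cases more than 3 standard errors from the regression line
demo_cars[abs(demo_std) > 3, c("mpg", "wt", "am")]
```

Dividing by a case-specific standard error matters because high-leverage cases pull the regression line toward themselves, which makes their raw residuals look smaller than they otherwise would.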
As noted by the suggestion of an interaction effect, our preliminary analyses suggest that ANCOVA is not the best option. We know from the prior lesson that a mixed design ANOVA worked well. In the spirit of an example, here's a preliminary write-up so far:
#### Summarizing results from the analysis of assumptions
>A one-way analysis of covariance (ANCOVA) was conducted. The independent variable, condition, had two levels: Friends, Little Mosque. The dependent variable was attitudes towards Arabs expressed by the participant at post-test and the covariate was the pre-test assessment of the same variable. A preliminary analysis evaluating the homogeneity-of-slopes assumption indicated that the relationship between the covariate and the dependent variable differed significantly as a function of the independent variable, $F (1, 189) = 4.297, p = .040, \eta^2 = 0.022$. Regarding the assumption that the dependent variable is normally distributed in the population for any specific value of the covariate and for any one level of a factor, results of the Shapiro-Wilk test of normality on the model residuals were also significant, $W = 0.984, p = .026$. Only one datapoint (in the Little Mosque condition) had a standardized residual (-3.37) that exceeded an absolute value of 3.0. A non-significant Levene's test indicated no violation of the homogeneity of the residual variances for all groups, $F(1, 191) = 3.515, p = .062$.
### Calculating the Omnibus ANOVA
We are ready to conduct the omnibus ANCOVA.
![Image of the ANCOVA workflow, showing our current place in the process.](images/ANCOVA/wf_ANCOVA_omnibus.jpg)
*Order of variable entry* matters in ANCOVA. Thinking of the *controlling for* language associated with covariates, we want to remove the effect of the covariate before we run the one-way ANOVA. With this ANCOVA we are asking the question, "Does the condition (Friends or Little Mosque) contribute to more positive attitudes toward Arabs, when controlling for the pre-test score?"
In repeated measures projects, we expect there to be dependency in the data. That is, in most cases prior waves will significantly predict later waves. When ANCOVA uses a prior assessment or wave as a covariate, that variable "claims" as much variance as possible and the subsequent variable can capture what is left over.
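The "claims as much variance as possible" idea is easiest to see with sequential (Type I) sums of squares, which is what base R's `anova()` reports for an `lm()` object. The sketch below uses the built-in `mtcars` data as a stand-in (an assumption for illustration: `mpg` as DV, `wt` as covariate, `am` as factor) because its covariate and factor are correlated. All `demo_` names are invented. Note that `rstatix::anova_test()` has a `type` argument controlling which sums of squares are reported, so its output need not match this sequential version; its documentation is the place to confirm the default.

```{r tidy=TRUE, tidy.opts=list(width.cutoff=70)}
#built-in data as a stand-in: mpg = DV, wt = covariate, am = factor
demo_cars <- transform(mtcars, am = factor(am, labels = c("automatic", "manual")))

#covariate entered first, then the factor
demo_cov_first <- anova(lm(mpg ~ wt + am, data = demo_cars))
#factor entered first, then the covariate
demo_fac_first <- anova(lm(mpg ~ am + wt, data = demo_cars))

demo_cov_first
demo_fac_first

#sum of squares credited to the factor under each order of entry
c(after_covariate = demo_cov_first["am", "Sum Sq"], before_covariate = demo_fac_first["am", "Sum Sq"])
```

The residual sum of squares is identical in both tables because the same model is being fit. Only the way the explained variance is divided between the two predictors changes.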
In the code below, we are predicting attitudes toward Arabs at post1 from the condition (Friends or Little Mosque), controlling for attitudes toward Arabs at baseline.
The *ges* column provides the effect size, generalized eta squared (reported here as $\eta^2$). Conventionally, values of .01, .06, and .14 are considered to be small, medium, and large effect sizes, respectively.
You may see different values (.02, .13, .26) offered as small, medium, and large -- these values are used when multiple regression is used. A useful summary of effect sizes, guide to interpreting their magnitudes, and common usage can be found [here](https://imaging.mrc-cbu.cam.ac.uk/statswiki/FAQ/effectSize) [@watson_rules_2020].
```{r tidy=TRUE, tidy.opts=list(width.cutoff=70)}
MurrarB_ANCOVA <- Murrar_wide %>%
rstatix::anova_test(AttArabP1 ~ AttArabB + COND)
rstatix::get_anova_table(MurrarB_ANCOVA )
```
There was a non-significant effect of the baseline covariate on the post-test $(F[1, 190] = 0.665, p = .416, \eta^2 = 0.003)$. After controlling for the baseline attitudes toward Arabs, there was a statistically significant effect of condition on post-test attitudes toward Arabs, $F(1,190) = 26.361, p < .001, \eta^2 = 0.122$. This effect appears to be moderate-to-large in size.
### Post-hoc pairwise comparisons (controlling for the covariate)
Just like in one-way ANOVA, we follow-up the significant effect of condition. We'll use all-possible pairwise comparisons. In our case, we only have two levels of the categorical factor, so this run wouldn't be necessary. I included it to provide the code for doing so. If there were three or more levels, we would see all possible comparisons.
```{r tidy=TRUE, tidy.opts=list(width.cutoff=70)}
pwc_B <- Murrar_wide %>%
rstatix::emmeans_test(
AttArabP1 ~ COND, covariate = AttArabB,
p.adjust.method = "none"
)
pwc_B
```
Not surprisingly (since this single pairwise comparison is redundant with the omnibus ANCOVA), results suggest a statistically significant difference between Friends and Little Mosque at Post1.
With the script below we can obtain the covariate-adjusted marginal means. These are termed *estimated marginal means.* Take a look at these and compare them to what we would see in the regular descriptives. It is helpful to see the grand mean of the covariate (AttArabB), at which the adjustment is made, and then the marginal means (emmean).
```{r tidy=TRUE, tidy.opts=list(width.cutoff=70)}
emmeans_B <- rstatix::get_emmeans(pwc_B)
emmeans_B
```
Note that the *emmeans* process produces slightly different means than the raw means produced with the *psych* package's *describeBy()* function. Why? Because the *get_emmeans()* function uses the model that included the covariate. That is, the *estimated* means are covariate-adjusted.
```{r tidy=TRUE, tidy.opts=list(width.cutoff=70)}
descripts_P1 <- psych::describeBy(AttArabP1 ~ COND, data = Murrar_wide, mat = TRUE)
descripts_P1
#Note. Recently my students and I have been having intermittent struggles with the describeBy function in the psych package. We have noticed that it is problematic when using .rds files and when using data directly imported from Qualtrics. If you are having similar difficulties, try uploading the .csv file and making the appropriate formatting changes.
```
The unadjusted descriptives are: Friends $(M = 59.02, SD = 21.65)$ and Little Mosque $(M = 73.92, SD = 18.51)$.
In our case the adjustments are very minor. Why? The effect of the attitudes toward Arabs baseline test on the attitudes toward Arabs post test was nonsignificant. We can see this in the bivariate correlations, below.
```{r tidy=TRUE, tidy.opts=list(width.cutoff=70)}
MurP1_Rmat <- psych::corr.test(Murrar_wide[c("AttArabB", "AttArabP1")])
MurP1_Rmat
```
The correlation between attitudes toward Arabs at baseline and post-test is nearly negligible $(r = -0.05, p = .47)$.
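To make the link between the covariate and the size of the adjustment concrete, here is a minimal sketch using simulated data (the data frame `toy` and its columns are hypothetical, not the Murrar data). It assumes the usual ANCOVA setup in which an estimated marginal mean is the model's prediction for each group with the covariate held at its grand mean. Under that assumption, each adjusted mean equals the raw group mean minus the common slope times the gap between that group's covariate mean and the grand covariate mean. A slope near zero, or groups with nearly equal covariate means, therefore leaves the means essentially unchanged.

```r
set.seed(2024)
n_per_group <- 60

# Hypothetical toy data: two groups that differ somewhat on the covariate
toy <- data.frame(
  cond = factor(rep(c("A", "B"), each = n_per_group)),
  covar = c(rnorm(n_per_group, mean = 48, sd = 10),
            rnorm(n_per_group, mean = 52, sd = 10))
)
toy$dv <- 20 + 0.6 * toy$covar + ifelse(toy$cond == "B", 8, 0) +
  rnorm(2 * n_per_group, sd = 5)

# ANCOVA as a linear model: covariate first, then the factor
toy_fit <- lm(dv ~ covar + cond, data = toy)

# Adjusted means: predictions for each group at the grand mean of the covariate
grid <- data.frame(
  cond = factor(levels(toy$cond), levels = levels(toy$cond)),
  covar = mean(toy$covar)
)
adjusted_means <- unname(predict(toy_fit, newdata = grid))

# Same quantities by hand: raw mean - slope * (group covariate mean - grand mean)
raw_means <- tapply(toy$dv, toy$cond, mean)
cov_means <- tapply(toy$covar, toy$cond, mean)
slope <- coef(toy_fit)[["covar"]]
by_hand <- as.numeric(raw_means - slope * (cov_means - mean(toy$covar)))

data.frame(cond = levels(toy$cond), raw = as.numeric(raw_means),
           adjusted = adjusted_means, by_hand = by_hand)
```

In this toy example the slope is deliberately large and the groups differ on the covariate, so the raw and adjusted columns diverge noticeably; shrinking either ingredient pulls them back together.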
### APA style results for Scenario 1
As we assemble the elements for an APA style results section, a table with the means, adjusted means, and correlations may be helpful.
```{r tidy=TRUE, tidy.opts=list(width.cutoff=70)}
apaTables::apa.cor.table(Murrar_wide[c("AttArabB", "AttArabP1")], table.number = 1 )
#You can save this as a Microsoft word document by adding this statement into the command: filename = "your_filename.doc"
```
Additionally, writing this output to .csv files helped create the table that follows. The *MASS* package is useful for exporting these objects to .csv files. They are easily opened in Excel where they can be manipulated into tables for presentations and manuscripts.
```{r tidy=TRUE, tidy.opts=list(width.cutoff=70)}
MASS::write.matrix(pwc_B, sep = ",", file = "pwc_B.csv")
MASS::write.matrix(emmeans_B, sep = ",", file = "emmeans_B.csv")
MASS::write.matrix(descripts_P1, sep = ",", file = "descripts_P1.csv")
```
Ultimately, I would want a table that included this information. Please refer to the APA style manual for the proper formatting if your manuscript requires APA style.
|Table 1
|:-----------------------------------------------|
|Unadjusted and Covariate-Adjusted Descriptive Statistics
|Condition |Unadjusted |Covariate-Adjusted
|:--------------|:-----------:|:----------------:|
| |*M* |*SD* |*EMM* |*SE*
|:--------------|:----:|:----:|:----:|:---:|
|Friends |59.02 |21.65 |59.01 |2.04 |
|Little Mosque |73.92 |18.51 |73.93 |2.07 |
Unlike the figure we created when we were testing assumptions, this script creates a plot from the model (which identifies AttArabB in its role as covariate). Thus, the relationship between condition and AttArabP1 controls for the effect of the AttArabB covariate.
```{r tidy=TRUE, tidy.opts=list(width.cutoff=70)}
pwc_B <- pwc_B %>% rstatix::add_xy_position(x = "COND", fun = "mean_se")
ggpubr::ggline(rstatix::get_emmeans(pwc_B), x = "COND", y = "emmean", title = "Figure 1. Post-test Attitudes by Condition, Controlling for Pre-test Attitudes") +
geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.2) +
ggpubr::stat_pvalue_manual(pwc_B, hide.ns = TRUE, tip.length = .02, y.position = c(80))
```
**Results**
>A one-way analysis of covariance (ANCOVA) was conducted. The independent variable, condition, had two levels: Friends, Little Mosque. The dependent variable was attitudes towards Arabs expressed by the participant at post-test and the covariate was the baseline assessment of the same variable. Descriptive statistics are presented in Table 1. A preliminary analysis evaluating the homogeneity-of-slopes assumption indicated that the relationship between the covariate and the dependent variable differed significantly as a function of the independent variable, $F (1, 189) = 4.297, p = .040, \eta^2 = 0.022$. Regarding the assumption that the dependent variable is normally distributed in the population for any specific value of the covariate and for any one level of a factor, results of the Shapiro-Wilk test of normality on the model residuals were also significant, $W = 0.984, p = .026$. Only one datapoint (in the Little Mosque condition) had a standardized residual (-3.37) that exceeded an absolute value of 3.0. A non-significant Levene's test indicated no violation of the homogeneity of the residual variances for all groups, $F(1, 191) = 3.515, p = .062$.
>There was a non-significant effect of the baseline covariate on the post-test $(F[1, 190] = 0.665, p = .416, \eta^2 = 0.003)$. After controlling for the baseline attitudes toward Arabs, there was a statistically significant effect of condition on post-test attitudes toward Arabs, $F(1,190) = 26.361, p < .001, \eta^2 = 0.122$. This effect appears to be moderate-to-large. Given there were only two conditions, no further follow-up was required. As illustrated in Figure 1, results suggest that those in the Little Mosque condition $(M = 73.92, SD = 18.51)$ had more favorable attitudes toward Arabs than those in the Friends condition $(M = 59.02, SD = 21.65)$. Means and covariate-adjusted means are presented in Table 1.
## Working the ANCOVA -- Scenario #2: Controlling for a confounding or covarying variable
In the scenario below, I am simulating a one-way ANCOVA, predicting attitudes toward Arabs at post1 as a function of sitcom condition (Friends, Little Mosque), controlling for the participants' attitudes toward Whites. That is, the ANCOVA will compare the means of the two groups (at post1, only), adjusted for level of attitudes toward Whites.
TO BE CLEAR: This is not the best way to analyze this data. With such a strong, balanced design, the multi-way, mixed design ANOVAs were an excellent choice that provided much fuller information than this demonstration, below. The purpose of this over-simplified demonstration is merely to give another example of using a variable as a *covariate* rather than a *moderator*.
### Preparing the data
When the covariate in ANCOVA is a potentially confounding variable, we need three variables:
* IV that has two or more levels; in our case it is the Friends and Little Mosque sitcom conditions.
* DV that is continuous; in our case it is attitudes toward Arabs at post1 (AttArabP1).
* Covariate that is continuous; in our case it is attitudes toward Whites at post1 (AttWhiteP1). *Note*: We could have also chosen attitudes toward Whites at baseline.
We can continue using the Murrar_wide df.
### Evaluating the statistical assumptions
There are a number of assumptions in ANCOVA. These include:
* random sampling
* independence in the scores representing the dependent variable
* linearity of the relationship between the covariate and DV within all levels of the independent variable
* homogeneity of the regression slopes
* a normally distributed DV for any specific value of the covariate and for any one level of a factor
* homogeneity of variance
These are depicted in the flowchart, below.
![Image of the ANCOVA workflow, showing our current place in the process](images/ANCOVA/wf_ANCOVA_assumptions.jpg)
#### Linearity assumption
ANCOVA assumes that there is linearity between the covariate and outcome variable at each level of the grouping variable. In our case this means that there is linearity between the attitudes toward Whites (covariate) and attitudes toward Arabs (outcome variable) at each level of the intervention (Friends, Little Mosque).
We can create a scatterplot (with regression lines) between the covariate (attitudes toward Whites) and the outcome (attitudes toward Arabs).
```{r tidy=TRUE, tidy.opts=list(width.cutoff=70)}
ggpubr::ggscatter (
Murrar_wide, x = "AttWhiteP1", y = "AttArabP1",
color = "COND", add = "reg.line"
)+
ggpubr::stat_regline_equation(
aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~~"), color = COND)
)
```
As we look at this scatterplot, we are trying to determine if there is an interaction effect (rather than a covarying effect). The linearity here looks reasonable and not terribly "interacting" (which helps us decide whether attitudes toward Whites should be a covariate or a moderator). More testing can help us make this distinction.
#### Homogeneity of regression slopes
This assumption requires that the slopes of the regression lines formed by the covariate and the outcome variable are the same for each group. In other words, we evaluate whether there is an interaction between the covariate and the grouping variable. When the assumption holds, the plotted regression lines should be roughly parallel.
```{r tidy=TRUE, tidy.opts=list(width.cutoff=70)}
Murrar_wide %>% rstatix::anova_test(AttArabP1 ~ COND*AttWhiteP1)
```
Preliminary analysis supported ANCOVA as a statistical option in that there was no violation of the homogeneity of regression slopes as the interaction term was not statistically significant, $F (1, 189) = 1.886, p = .171, \eta^2 = 0.010$.
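The same assumption can be framed as a comparison of two nested models: one that forces a common slope and one that lets the slope differ by group. The sketch below uses base R and simulated data (the `slopes_toy` data frame and its variables are hypothetical, not the Murrar data); the *F* test comparing the two models is the test of the interaction term.

```r
set.seed(11)
n_per_group <- 80

# Hypothetical toy data generated with a common slope in both groups
slopes_toy <- data.frame(
  group = factor(rep(c("ctrl", "trt"), each = n_per_group)),
  covar = rnorm(2 * n_per_group, mean = 50, sd = 10)
)
slopes_toy$dv <- 10 + 0.4 * slopes_toy$covar +
  ifelse(slopes_toy$group == "trt", 5, 0) +
  rnorm(2 * n_per_group, sd = 6)

# Common-slope (ANCOVA) model versus separate-slopes model
common_slope <- lm(dv ~ covar + group, data = slopes_toy)
separate_slopes <- lm(dv ~ covar * group, data = slopes_toy)

# Nested-model F test: a small p-value would suggest non-parallel slopes
slopes_test <- anova(common_slope, separate_slopes)
slopes_test

# With a two-level factor, this F equals the squared t of the interaction coefficient
interaction_t <- coef(summary(separate_slopes))["covar:grouptrt", "t value"]
c(F = slopes_test$F[2], t_squared = interaction_t^2)
```

Thinking of the test as "does adding the interaction improve the model?" makes it easier to see why a non-significant result supports keeping the simpler, parallel-slopes ANCOVA.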
#### Normality of residuals
Assessing the normality of residuals means running the model, capturing the unexplained portion of the model (i.e., the *residuals*), and then seeing if they are normally distributed. Proper use of ANCOVA is predicated on normally distributed residuals.
We first compute the model with *lm()*. The *lm()* function is actually testing what we want to test. However, at this early stage, we are just doing a "quick run and interpretation" to see if we are within the assumptions of ANCOVA.
```{r tidy=TRUE, tidy.opts=list(width.cutoff=70)}
#Create a linear regression model predicting DV from COV & IV
WhCov_mod <- lm (AttArabP1 ~ AttWhiteP1 + COND, data = Murrar_wide)
WhCov_mod
```
We can use the *augment(model)* function from the *broom* package to add fitted values and residuals.
```{r tidy=TRUE, tidy.opts=list(width.cutoff=70)}
WhCov_mod.metrics <- broom::augment(WhCov_mod)
#shows the first three rows of WhCov_mod.metrics
head(WhCov_mod.metrics,3)
```
Now we assess the normality of residuals using the Shapiro-Wilk test. The script below captures the ".resid" column from the model.
```{r tidy=TRUE, tidy.opts=list(width.cutoff=70)}
rstatix::shapiro_test(WhCov_mod.metrics$.resid)
```
The statistically significant Shapiro-Wilk test indicates a violation of the normality assumption ($W = 0.984, p = .029$). As I mentioned before, there are better ways to analyze this research vignette. Nonetheless, we will continue with this demonstration so that you will have the procedural and conceptual framework for conducting ANCOVA.
#### Homogeneity of variances
ANCOVA presumes that the variance of the residuals is equal for all groups. We can check this with Levene's test.
```{r tidy=TRUE, tidy.opts=list(width.cutoff=70)}
WhCov_mod.metrics %>% rstatix::levene_test(.resid ~ COND)
```
Contributing more evidence that ANCOVA is not the best way to analyze this data, a statistically significant Levene's test indicates a violation of the homogeneity of the residual variances $(F[1, 191] = 4.539, p = .034)$.
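Conceptually, a Levene-type test is a one-way ANOVA on how far each residual sits from the center of its group. The sketch below works this out in base R with simulated residuals (the `lev_toy` object is hypothetical). It centers on the group median, which is my assumption about the variant in use; centering on the group mean is the other common choice.

```r
set.seed(7)

# Hypothetical residuals from two groups with unequal spread
lev_toy <- data.frame(
  group = factor(rep(c("g1", "g2"), each = 100)),
  resid = c(rnorm(100, sd = 5), rnorm(100, sd = 9))
)

# Absolute deviation of each residual from its own group's median
group_median <- ave(lev_toy$resid, lev_toy$group, FUN = median)
lev_toy$abs_dev <- abs(lev_toy$resid - group_median)

# One-way ANOVA on the absolute deviations: its F test is the Levene-type test
levene_by_hand <- anova(lm(abs_dev ~ group, data = lev_toy))
levene_by_hand
```

Seen this way, a significant result simply means that the typical distance from the group center differs across groups, which is what unequal variances look like.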
#### Outliers
We can identify outliers by examining the standardized (or studentized) residual. This is the residual divided by its estimated standard error. Standardized residuals are interpreted as the number of standard errors away from the regression line.
```{r tidy=TRUE, tidy.opts=list(width.cutoff=70)}
WhCov_mod.metrics %>%
dplyr::filter(abs(.std.resid) > 3) %>%
as.data.frame()
```
There is one outlier with a standardized residual whose absolute value is greater than 3. At this point I am making a mental note of this. If this were "for real" I might more closely inspect these data. I would look at the whole response. If the response seems invalid (e.g., random, erratic, or extreme responding) I would delete it. If the response seems valid, I *could* truncate it to within 3 SEs. I could also ignore it. Kline [-@kline_data_2016] has a great section on some of these options.
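To make "the residual divided by its estimated standard error" concrete, the following base R sketch computes standardized residuals by hand and checks them against `rstandard()`. The data are simulated and the object names (`out_toy`, `out_fit`) are hypothetical; one extreme score is planted so the filter has something to find.

```r
set.seed(42)
n_per_group <- 50

# Hypothetical toy data with one planted extreme score
out_toy <- data.frame(
  group = factor(rep(c("g1", "g2"), each = n_per_group)),
  covar = rnorm(2 * n_per_group, mean = 50, sd = 10)
)
out_toy$dv <- 30 + 0.3 * out_toy$covar +
  ifelse(out_toy$group == "g2", 6, 0) + rnorm(2 * n_per_group, sd = 5)
out_toy$dv[1] <- out_toy$dv[1] + 60

out_fit <- lm(dv ~ covar + group, data = out_toy)

# Standardized residual = residual / (residual SD * sqrt(1 - leverage))
raw_resid <- residuals(out_fit)
leverage <- hatvalues(out_fit)
resid_sd <- summary(out_fit)$sigma
std_by_hand <- raw_resid / (resid_sd * sqrt(1 - leverage))

# Agrees with base R's built-in standardized residuals
all.equal(std_by_hand, rstandard(out_fit))

# Flag cases more than 3 standard errors from the fitted line
which(abs(std_by_hand) > 3)
```

Note that a single extreme score also inflates the residual standard deviation, which is one reason a threshold of 3 should be read as a screening rule rather than a verdict.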
#### Summarizing the results from the analysis of assumptions
>A one-way analysis of covariance (ANCOVA) was conducted. The independent variable, sitcom condition, had two levels: Friends, Little Mosque. The dependent variable was attitudes towards Arabs at post-test. Preliminary analyses which tested the assumptions of ANCOVA were mixed. Results suggesting that the relationship between the covariate and the dependent variable did not differ significantly as a function of the independent variable $(F [1, 189] = 1.886, p = .171, \eta^2 = 0.010)$ provided evidence that we did not violate the homogeneity-of-slopes assumption. In contrast, the Shapiro-Wilk test of normality on the model residuals was statistically significant $(W = 0.984, p = .029)$. This means that we likely violated the assumption that the dependent variable is normally distributed in the population for any specific value of the covariate and for any one level of a factor. Regarding outliers, one datapoint (-3.38) had a standardized residual that exceeded an absolute value of 3.0. Further, a statistically significant Levene's test indicated a violation of the homogeneity of the residual variances for all groups, $(F[1, 191] = 4.539, p = .034)$.
Because the intent of this analysis was to demonstrate how ANCOVA differs from mixed design ANOVA we proceeded with the analysis. Were this for "real research" we would have chosen a different analysis.
### Calculating the Omnibus ANOVA
We are ready to conduct the omnibus ANCOVA.
![Image of the ANCOVA workflow, showing our current place in the process.](images/ANCOVA/wf_ANCOVA_omnibus.jpg)
*Order of variable entry* matters in ANCOVA. Thinking of the *controlling for* language associated with covariates, we first want to remove the effect of the covariate.
In the code below we are predicting attitudes toward Arabs at post1 from attitudes toward Whites at post1 (the covariate) and sitcom condition (Friends, Little Mosque).
The *ges* column provides the effect size, $\eta^2$ where a general rule-of-thumb for interpretation is .01 (small), .06 (medium), and .14 (large) [@lakens_calculating_2013].
```{r tidy=TRUE, tidy.opts=list(width.cutoff=70)}
WhCov_ANCOVA <- Murrar_wide %>%
rstatix::anova_test(AttArabP1 ~ AttWhiteP1 + COND)
rstatix::get_anova_table(WhCov_ANCOVA)
```
There was a non-significant effect of the attitudes toward Whites covariate on the attitudes toward Arabs at post-test, $F (1, 190) = 0.014, p = .907, \eta^2 < .001$. After controlling for attitudes toward Whites, there was a statistically significant effect in attitudes toward Arabs at post-test between the conditions, $F(1, 190) = 26.119, p < .001, \eta^2 = 0.121$. The effect size was moderate-to-large.
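The point about order of entry can be seen directly with sequential (Type I) sums of squares, which base R's `anova()` reports for an `lm` fit. In the sketch below (simulated data; the `ord_toy` names are hypothetical) the covariate differs between groups, so the sum of squares credited to the factor depends on whether it enters before or after the covariate. Whether a particular function is sensitive to order depends on the type of sums of squares it computes, so treat this as an illustration of the idea rather than a description of any one package.

```r
set.seed(99)
n_per_group <- 70

# Hypothetical toy data in which the covariate differs between groups
ord_toy <- data.frame(
  group = factor(rep(c("g1", "g2"), each = n_per_group)),
  covar = c(rnorm(n_per_group, mean = 45, sd = 8),
            rnorm(n_per_group, mean = 55, sd = 8))
)
ord_toy$dv <- 15 + 0.5 * ord_toy$covar +
  ifelse(ord_toy$group == "g2", 4, 0) + rnorm(2 * n_per_group, sd = 6)

# Covariate entered first: the factor is tested after removing the covariate
cov_first <- anova(lm(dv ~ covar + group, data = ord_toy))

# Factor entered first: the factor is credited with variance it shares with the covariate
group_first <- anova(lm(dv ~ group + covar, data = ord_toy))

c(group_SS_after_covariate = cov_first["group", "Sum Sq"],
  group_SS_entered_first = group_first["group", "Sum Sq"])
```

The residual sum of squares is identical in both fits; only the way the explained variance is divided between the covariate and the factor changes.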
### Post-hoc pairwise comparisons (controlling for the covariate)
With only two levels of sitcom condition (Friends, Little Mosque), we do not need to conduct post-hoc pairwise comparisons. However, because many research designs involve three or more levels, I will use code that would evaluate them here.
```{r tidy=TRUE, tidy.opts=list(width.cutoff=70)}
pwc_cond <- Murrar_wide %>%
rstatix::emmeans_test(
AttArabP1 ~ COND, covariate = AttWhiteP1,
p.adjust.method = "none"
)
pwc_cond
```
Results suggest a statistically significant post-test difference between the Friends and Little Mosque sitcom conditions.
With the script below we can obtain the covariate-adjusted marginal means. These are termed *estimated marginal means.*
```{r tidy=TRUE, tidy.opts=list(width.cutoff=70)}
emmeans_cond <- rstatix::get_emmeans(pwc_cond)
emmeans_cond
```
As before, these means are usually different (even if only ever-so-slightly) than the raw means you would obtain from the descriptives.
```{r tidy=TRUE, tidy.opts=list(width.cutoff=70)}
descripts_cond <- psych::describeBy(AttArabP1 ~ COND, data = Murrar_wide, mat = TRUE)
descripts_cond
```
### APA style results for Scenario 2
Tables with the means, adjusted means, and pairwise comparison output may be helpful. The *apa.cor.table()* function in the *apaTables* package is helpful for providing means, standard deviations, and correlations.
```{r tidy=TRUE, tidy.opts=list(width.cutoff=70)}
apaTables::apa.cor.table(Murrar_wide[c("AttArabP1", "AttWhiteP1")], table.number = 2 )
#You can save this as a Microsoft word document by adding this statement into the command: filename = "your_filename.doc"
```
Writing this output to .csv files (which open easily in Excel) helped create the table that follows.
```{r tidy=TRUE, tidy.opts=list(width.cutoff=70)}
MASS::write.matrix(pwc_cond, sep = ",", file = "pwc_con.csv")
MASS::write.matrix(emmeans_cond, sep = ",", file = "emmeans_con.csv")
MASS::write.matrix(descripts_cond, sep = ",", file = "descripts_con.csv")
```
Ultimately, I would want a table that included this information. Please refer to the APA style manual for the proper formatting if your manuscript requires APA style.
|Table 1b
|:-----------------------------------------------|
|Unadjusted and Covariate-Adjusted Descriptive Statistics
|Condition |Unadjusted |Covariate-Adjusted
|:--------------|:-----------:|:----------------:|
| |*M* |*SD* |*EMM* |*SE*
|:--------------|:----:|:----:|:----:|:---:|
|Friends |59.02 |21.65 |59.03 |2.04 |
|Little Mosque |73.92 |18.51 |73.92 |2.08 |
Unlike the figure we created when we were testing assumptions, this script creates a plot from the model (which identifies AttWhiteP1 in its role as covariate). Thus, the relationship between condition and AttArabP1 controls for the effect of the AttWhiteP1 covariate.
```{r tidy=TRUE, tidy.opts=list(width.cutoff=70)}
pwc_cond <- pwc_cond %>% rstatix::add_xy_position(x = "COND", fun = "mean_se")
ggpubr::ggline(rstatix::get_emmeans(pwc_cond), x = "COND", y = "emmean", title = "Figure 1. Attitudes toward Arabs by Condition, Controlling for Attitudes toward Whites") +
geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.2) +
ggpubr::stat_pvalue_manual(pwc_cond, hide.ns = TRUE, tip.length = .02, y.position = c(80))
```
**Results**
>A one-way analysis of covariance (ANCOVA) was conducted. The independent variable, sitcom condition, had two levels: Friends, Little Mosque. The dependent variable was attitudes towards Arabs at post-test. We controlled for attitudes toward Whites. Preliminary analyses which tested the assumptions of ANCOVA were mixed. Results suggesting that the relationship between the covariate and the dependent variable did not differ significantly as a function of the independent variable $(F [1, 189] = 1.886, p = .171, \eta^2 = 0.010)$ provided evidence that we did not violate the homogeneity-of-slopes assumption. In contrast, the Shapiro-Wilk test of normality on the model residuals was statistically significant $(W = 0.984, p = .029)$. This means that we likely violated the assumption that the dependent variable is normally distributed in the population for any specific value of the covariate and for any one level of a factor. Regarding outliers, one datapoint (-3.38) had a standardized residual that exceeded an absolute value of 3.0. Further, a statistically significant Levene's test indicated a violation of the homogeneity of the residual variances for all groups, $(F[1, 191] = 4.539, p = .034)$.
Because the intent of this analysis was to demonstrate how ANCOVA differs from mixed design ANOVA we proceeded with the analysis. Were this for "real research" we would have chosen a different analysis.
>There was a non-significant effect of the attitudes toward Whites covariate on the attitudes toward Arabs post-test, $F (1, 190) = 0.014, p = .907, \eta^2 < .001$. After controlling for attitudes toward Whites, there was a statistically significant effect in attitudes toward Arabs at post-test between the conditions, $F(1, 190) = 26.119, p < .001, \eta^2 = 0.121$. The effect size was moderately large. Means and covariate-adjusted means are presented in Table 1b.
## More (and a recap) on covariates
Covariates, sometimes termed *controls*, are often used to gain statistical control over variables that are difficult to control in a research design. That is, it may be impractical to polychotomize an otherwise continuous variable and/or to include multiple factors, so a covariate is a more manageable approach. Common reasons for including covariates include [@bernerth_critical_2016]:
* they mathematically remove variance associated with nonfocal variables,
* the *purification principle* -- removing unwanted or confusing variance,
* they remove the *noise* in the analysis to clear up the relationship between IVs and DVs.
Perhaps it is an oversimplification, but we can think of three categories of variables: moderators, covariates, and mediators. Through ANOVA and ANCOVA, we distinguish between moderator and covariate.
**Moderator**: a variable that changes the strength or direction of an effect between two variables X (predictor, independent variable) and Y (criterion, dependent variable).
**Covariate**: an observed, continuous variable that (when used properly) has a relationship with the dependent variable. It is included in the analysis, as a predictor, so that the predictive relationship between the independent (IV) and dependent (DV) variables is adjusted.
Bernerth and Aguinis [-@bernerth_critical_2016] conducted a review of how and when control variables were used in nearly 600 articles published between 2003 and 2012. Concurrently with their analysis, they provided guidance for when to use control variables (covariates). The flowchart that accompanies their article is quite helpful. Control variables (covariates) should only be used when:
1. Theory suggests that the potential covariate(s) relate(s) to variable(s) in the current study.
2. There is empirical justification for including the covariate in the study.
3. The covariate can be measured reliably.
Want more? Instructions for calculating a two-way ANCOVA are here: https://www.datanovia.com/en/lessons/ancova-in-r/
## Practice Problems
The suggestions for homework differ in degree of complexity. I encourage you to start with a problem that feels "do-able" and then try at least one more problem that challenges you in some way. At a minimum your data should have three levels in the independent variable. At least one of the problems you work should have a statistically significant interaction effect that you work all the way through.
Regardless, your choices should meet you where you are (e.g., in terms of your self-efficacy for statistics, your learning goals, and competing life demands). Whichever you choose, you will focus on these larger steps in one-way ANCOVA, including:
* test the statistical assumptions
* conduct an ANCOVA
* if the predictor variable has three or more levels, conduct follow-up testing
* present both means and covariate-adjusted means
* write a results section to include a figure and tables
### Problem #1: Play around with this simulation.
Copy the script for the simulation and then change (at least) one thing in the simulation to see how it impacts the results.
* If ANCOVA is new to you, perhaps you just change the number in "set.seed(210813)" from 210813 to something else. Then rework Scenario#1, Scenario#2, or both. Your results should parallel those obtained in the lecture, making it easier for you to check your work as you go.
* If you are interested in power, change the sample size to something larger or smaller.
* If you are interested in variability (i.e., the homogeneity of variance assumption), perhaps you change the standard deviations in a way that violates the assumption.
### Problem #2: Conduct a one-way ANCOVA with the DV and covariate at post2.
The Murrar et al. [-@murrar_entertainment-education_2018] article has three waves: baseline, post1, post2. In this lesson, I focused on the post1 wave. Rerun this analysis using the post2 wave data.
### Problem #3: Try something entirely new.
Using data for which you have permission and access (e.g., IRB approved data you have collected or from your lab; data you simulate from a published article; data from an open science repository; data from other chapters in this OER), complete an ANCOVA.
### Grading Rubric
Regardless of which option(s) you choose, use the elements in the grading rubric to guide you through the practice.
Using the lecture and workflow (chart) as a guide, please work through all the steps listed in the proposed assignment/grading rubric.
|Assignment Component | Points Possible | Points Earned|
|:-------------------------------------- |:----------------: |:------------:|
|1. Narrate the research vignette, describing the IV, DV, and COV. | 5 |_____ |
|2. Simulate (or import) and format data. | 5 |_____ |
|3. Evaluate statistical assumptions. | 5 |_____ |
|4. Conduct omnibus ANCOVA (with effect size).| 5 | _____ |
|5. If the IV has three or more levels, conduct follow-up tests.| 5 |_____ |
|6. Present means and covariate-adjusted means; interpret them.| 5 |_____ |
|7. APA style results with table(s) and figure.| 5 |_____ |
|8. Explanation to grader. | 5 |_____ |
|**Totals** | 35 |_____ |
```{r, child= 'Worked_Examples/15-10-woRked_ANCOVA.Rmd'}
```
```{r include=FALSE}
sessionInfo()
```