# Analysis of covariance (ANCOVA)

ANCOVA is used to compare one variable (**the dependent variable**) across two or more populations while controlling for one or more other continuous variables.

These continuous variables, which are not part of the main experimental manipulation but have an influence on the dependent variable, are known as **covariates**.

## Loading Data

In [3]:
viagraData <- read.delim("data/ViagraCovariate.dat", stringsAsFactors = TRUE, header = TRUE)

viagraData$dose <- factor(viagraData$dose, levels = c(1:3), labels = c("Placebo","Low Dose", "High Dose"))
summary(viagraData)

viagraData

        dose        libido      partnerLibido  
 Placebo  : 9   Min.   :2.000   Min.   :0.000  
 Low Dose : 8   1st Qu.:3.000   1st Qu.:1.000  
 High Dose:13   Median :4.000   Median :2.500  
                Mean   :4.367   Mean   :2.733  
                3rd Qu.:5.750   3rd Qu.:4.000  
                Max.   :9.000   Max.   :7.000  

dose,libido,partnerLibido
<fct>,<int>,<int>
Placebo,3,4
Placebo,2,1
Placebo,5,5
Placebo,2,1
Placebo,2,2
Placebo,2,2
Placebo,7,7
Placebo,2,4
Placebo,4,5
Low Dose,7,5


## Checking homogeneity of variance

In [4]:
library(car)

leveneTest(viagraData$libido, viagraData$dose, center=median)

Loading required package: carData



,Df,F value,Pr(>F)
,<int>,<dbl>,<dbl>
group,2,0.3255637,0.7249156
,27,,


Levene's test is **not significant** (F(2, 27) = 0.33, p = 0.72), so there is no evidence that the variance of **libido** differs across the three dose groups.
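
To make it clearer what the test above is doing, here is a minimal sketch of the median-centred Levene test computed by hand in base R: it is a one-way ANOVA on the absolute deviations of each score from its group median. The data frame `toy` and the names `absDev` and `leveneByHand` are hypothetical and simulated for illustration only; they are not the Viagra data.

```r
# Hypothetical simulated data: three groups of unequal size (not the Viagra data)
set.seed(1)
toy <- data.frame(
  group = factor(rep(c("A", "B", "C"), times = c(9, 8, 13))),
  score = c(rnorm(9, 3, 1.8), rnorm(8, 5, 1.7), rnorm(13, 5, 2.1))
)

# Absolute deviation of each score from the median of its own group
groupMedian <- ave(toy$score, toy$group, FUN = median)
toy$absDev <- abs(toy$score - groupMedian)

# One-way ANOVA on the deviations; its F-ratio is the Levene statistic
leveneByHand <- anova(lm(absDev ~ group, data = toy))
leveneByHand
```

On the same data, this F-ratio should agree with what leveneTest() reports with center=median, since the function fits the same linear model internally.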

## Checking the predictor variable and covariates are independent

ANCOVA assumes that the predictor variable (groups) and covariates are independent. 
We can test this by running an ANOVA with the covariate as the outcome.

In [7]:
summary( aov(partnerLibido ~ dose , data=viagraData) )


            Df Sum Sq Mean Sq F value Pr(>F)
dose         2  12.77   6.385   1.979  0.158
Residuals   27  87.10   3.226               

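The F-ratio is **not significant** (F(2, 27) = 1.98, p = 0.158), so there is no evidence that **partnerLibido** differs across the dose groups, and we can carry out the ANCOVA.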

## Fitting an ANCOVA model

### Type I and Type III sum of squares

In [9]:
covariateFirst <- aov(libido ~ partnerLibido + dose, data=viagraData) 
summary(covariateFirst)

doseFirst <- aov(libido ~ dose + partnerLibido, data=viagraData) 
summary(doseFirst)

              Df Sum Sq Mean Sq F value Pr(>F)  
partnerLibido  1   6.73   6.734   2.215 0.1487  
dose           2  25.19  12.593   4.142 0.0274 *
Residuals     26  79.05   3.040                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

              Df Sum Sq Mean Sq F value Pr(>F)  
dose           2  16.84   8.422   2.770 0.0812 .
partnerLibido  1  15.08  15.076   4.959 0.0348 *
Residuals     26  79.05   3.040                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

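Note that the order in which we enter predictors into a model makes a difference to the effects in the overall ANOVA, because aov() uses Type I (sequential) sums of squares: each term is evaluated after only the terms entered before it.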

In model **libido ~ partnerLibido + dose**, **partnerLibido** is not significant and **dose** is significant.

In model **libido ~ dose + partnerLibido**, **dose** is not significant and **partnerLibido** is significant.
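
The order dependence can be reproduced on any unbalanced design in which the covariate is correlated with the grouping factor. The following is a minimal sketch on simulated data; `toy`, `seqCovFirst` and `seqDoseFirst` are hypothetical names, and the numbers are not the Viagra results.

```r
# Hypothetical simulated data: unequal group sizes, covariate related to group
set.seed(42)
n <- c(9, 8, 13)
toy <- data.frame(
  dose = factor(rep(c("Placebo", "Low", "High"), times = n),
                levels = c("Placebo", "Low", "High"))
)
g <- as.integer(toy$dose)
toy$covariate <- rnorm(sum(n), mean = c(2, 3, 4)[g], sd = 1.5)
toy$outcome <- 2 + 0.5 * toy$covariate + c(0, 1, 1.5)[g] + rnorm(sum(n))

# Same model, two orders of entry; anova() reports sequential (Type I) SS
seqCovFirst <- anova(lm(outcome ~ covariate + dose, data = toy))
seqDoseFirst <- anova(lm(outcome ~ dose + covariate, data = toy))

# The SS attributed to each term depends on the order of entry...
c(seqCovFirst["dose", "Sum Sq"], seqDoseFirst["dose", "Sum Sq"])

# ...but the residual SS is the same, because the fitted model is the same
c(seqCovFirst["Residuals", "Sum Sq"], seqDoseFirst["Residuals", "Sum Sq"])
```

Only the split of the explained variation between the two terms changes; the overall fit does not.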

We can use Type III sums of squares (instead of Type I) via the Anova() function from the car package to get results that do not depend on the order of entry.

In [14]:
Anova(covariateFirst, type="III")

Anova(doseFirst, type="III")

,Sum Sq,Df,F value,Pr(>F)
,<dbl>,<dbl>,<dbl>,<dbl>
(Intercept),12.94306,1,4.257204,0.04920158
partnerLibido,15.07575,1,4.958681,0.03483338
dose,25.18519,2,4.141929,0.02744654
Residuals,79.04712,26,,


,Sum Sq,Df,F value,Pr(>F)
,<dbl>,<dbl>,<dbl>,<dbl>
(Intercept),12.94306,1,4.257204,0.04920158
dose,25.18519,2,4.141929,0.02744654
partnerLibido,15.07575,1,4.958681,0.03483338
Residuals,79.04712,26,,


Note that although the two tables are now consistent, they should **not** be relied on as they stand: Type III sums of squares require **orthogonal contrasts**, and R's default dummy (treatment) coding is non-orthogonal.

### ANCOVA and Type I sum of squares

If we want Type I sum of squares, we should enter the covariates first, then the independent variables in ANCOVA.
So we should use **libido ~ partnerLibido + dose**.

In [15]:
viagraModel <- aov(libido ~ partnerLibido + dose, data=viagraData) 

summary(viagraModel)
summary.lm(viagraModel)

              Df Sum Sq Mean Sq F value Pr(>F)  
partnerLibido  1   6.73   6.734   2.215 0.1487  
dose           2  25.19  12.593   4.142 0.0274 *
Residuals     26  79.05   3.040                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Call:
aov(formula = libido ~ partnerLibido + dose, data = viagraData)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.2622 -0.7899 -0.3230  0.8811  4.5699 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)  
(Intercept)     1.7892     0.8671   2.063   0.0492 *
partnerLibido   0.4160     0.1868   2.227   0.0348 *
doseLow Dose    1.7857     0.8494   2.102   0.0454 *
doseHigh Dose   2.2249     0.8028   2.771   0.0102 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.744 on 26 degrees of freedom
Multiple R-squared:  0.2876,	Adjusted R-squared:  0.2055 
F-statistic:   3.5 on 3 and 26 DF,  p-value: 0.02954


### ANCOVA and Type III sum of squares

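We must use orthogonal contrasts to get correct Type III sums of squares. First we inspect the default contrasts for **dose**, then replace them with two orthogonal contrasts: placebo versus both dose groups, and low dose versus high dose.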

In [22]:
contrasts(viagraData$dose)

,Low Dose,High Dose
Placebo,0,0
Low Dose,1,0
High Dose,0,1


In [25]:
contrasts(viagraData$dose) <- cbind(c(-2,1,1),c(0,-1,1))

contrasts(viagraData$dose)

0,1,2
Placebo,-2,0
Low Dose,1,-1
High Dose,1,1
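
As a quick check that the new coding really is orthogonal while the default is not, here is a small base R sketch. The helper `isOrthogonal` is a hypothetical name introduced for illustration; it treats a coding as orthogonal when every column sums to zero and every pair of columns has a zero cross-product.

```r
# Contrast weights: the planned contrasts set above, and R's default dummy coding
planned <- cbind(c(-2, 1, 1), c(0, -1, 1))
dummy <- contr.treatment(3)

# Hypothetical helper: TRUE when each column sums to zero and all
# pairwise cross-products between columns are zero
isOrthogonal <- function(m) {
  cp <- crossprod(m)
  all(colSums(m) == 0) && all(cp[upper.tri(cp)] == 0)
}

isOrthogonal(planned)  # TRUE
isOrthogonal(dummy)    # FALSE: the dummy columns do not sum to zero
```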


In [27]:
viagraModel <- aov(libido ~ partnerLibido + dose, data=viagraData) 

Anova(viagraModel, type ="III")

,Sum Sq,Df,F value,Pr(>F)
,<dbl>,<dbl>,<dbl>,<dbl>
(Intercept),76.06904,1,25.020457,3.342399e-05
partnerLibido,15.07575,1,4.958681,0.03483338
dose,25.18519,2,4.141929,0.02744654
Residuals,79.04712,26,,


In [28]:
summary.lm(viagraModel)


Call:
aov(formula = libido ~ partnerLibido + dose, data = viagraData)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.2622 -0.7899 -0.3230  0.8811  4.5699 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)     3.1260     0.6250   5.002 3.34e-05 ***
partnerLibido   0.4160     0.1868   2.227  0.03483 *  
dose1           0.6684     0.2400   2.785  0.00985 ** 
dose2           0.2196     0.4056   0.541  0.59284    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.744 on 26 degrees of freedom
Multiple R-squared:  0.2876,	Adjusted R-squared:  0.2055 
F-statistic:   3.5 on 3 and 26 DF,  p-value: 0.02954
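
The contrast coefficients can be related back to the dummy-coded coefficients reported earlier. A rough sketch of the arithmetic, assuming the usual reading of these weights: **dose1** (weights -2, 1, 1) is one third of the adjusted difference between the average of the two dose groups and placebo, and **dose2** (weights 0, -1, 1) is half of the adjusted difference between high and low dose. The variable names below are hypothetical; the input values are copied from the dummy-coded summary.lm output.

```r
# Coefficients reported under dummy coding (placebo is the baseline)
bIntercept <- 1.7892  # adjusted placebo value when partnerLibido = 0
bLow <- 1.7857        # low dose minus placebo, adjusted for partnerLibido
bHigh <- 2.2249       # high dose minus placebo, adjusted for partnerLibido

# Contrast 1: average of the two dose groups vs placebo, spread over 3 units
dose1 <- ((bLow + bHigh) / 2) / 3

# Contrast 2: high dose vs low dose, spread over 2 units
dose2 <- (bHigh - bLow) / 2

# Intercept under contrast coding: unweighted mean of the three group values
intercept <- bIntercept + (0 + bLow + bHigh) / 3

round(c(intercept = intercept, dose1 = dose1, dose2 = dose2), 4)
```

Up to rounding of the inputs, these values match the estimates for (Intercept), dose1 and dose2 in the output above, which shows that the two codings describe the same fitted model.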


## Testing for homogeneity of regression slopes

ANCOVA assumes that the relationship between the covariate and the outcome variable (in this case **partnerLibido** and **libido**) is similar at each level of the predictor variable (in this case the three **dose** groups). We can test this by adding the interaction between the covariate and the predictor to the model.

In [31]:
hoRS <- aov(libido ~ partnerLibido * dose, data = viagraData)

Anova(hoRS, type="III")

,Sum Sq,Df,F value,Pr(>F)
,<dbl>,<dbl>,<dbl>,<dbl>
(Intercept),53.54187,1,21.920735,9.32259e-05
partnerLibido,17.18222,1,7.034625,0.0139474621
dose,36.55756,2,7.483569,0.0029795645
partnerLibido:dose,20.42659,2,4.181456,0.0276671129
Residuals,58.62052,24,,


The interaction between **partnerLibido** and **dose** (**partnerLibido:dose**) is significant (F(2, 24) = 4.18, p = 0.028), so the assumption of homogeneity of regression slopes is violated for these data.
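
One way to see what a violation of this assumption looks like is to estimate the slope separately within each group and compare the additive model against the model with the interaction. The sketch below uses simulated data built to have different slopes; `toy`, `trueSlope`, `slopes`, `additive`, `withInteraction` and `slopeTest` are hypothetical names, and the numbers are not the Viagra results.

```r
# Hypothetical simulated data in which the slope differs between groups
set.seed(7)
toy <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 10)),
  x = runif(30, min = 0, max = 7)
)
trueSlope <- c(A = 0.8, B = 0.1, C = -0.4)
toy$y <- 3 + unname(trueSlope[as.character(toy$group)]) * toy$x + rnorm(30)

# Slope of y on x estimated separately within each group
slopes <- sapply(split(toy, toy$group),
                 function(d) coef(lm(y ~ x, data = d))[["x"]])
slopes

# F test of the interaction: does allowing separate slopes improve the fit?
additive <- lm(y ~ x + group, data = toy)
withInteraction <- lm(y ~ x * group, data = toy)
slopeTest <- anova(additive, withInteraction)
slopeTest
```

When the per-group slopes differ markedly, a single pooled slope for the covariate is a poor summary, which is why a significant interaction undermines the standard ANCOVA adjustment.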