# One-Way ANOVA

In statistics, one-way analysis of variance (abbreviated **one-way ANOVA**) is a technique for comparing the means of two or more samples using the F distribution.
It requires a numerical response variable (the "Y", usually a single variable) and a single input variable (the "X"), which is usually categorical but may be numerical. Having exactly one input variable is why it is called "one-way".

## Entering Data

In [1]:
libido <- c(3,2,1,1,4,5,2,4,2,3,7,4,5,3,6)
dose <- gl(3,5, labels = c("Placebo","Low Dose","High Dose"))
viagraData<-data.frame(dose, libido)

viagraData
summary(viagraData)

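| dose (`<fct>`) | libido (`<dbl>`) |
| :- | :-: |
| Placebo | 3 |
| Placebo | 2 |
| Placebo | 1 |
| Placebo | 1 |
| Placebo | 4 |
| Low Dose | 5 |
| Low Dose | 2 |
| Low Dose | 4 |
| Low Dose | 2 |
| Low Dose | 3 |
| High Dose | 7 |
| High Dose | 4 |
| High Dose | 5 |
| High Dose | 3 |
| High Dose | 6 |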


        dose       libido     
 Placebo  :5   Min.   :1.000  
 Low Dose :5   1st Qu.:2.000  
 High Dose:5   Median :3.000  
               Mean   :3.467  
               3rd Qu.:4.500  
               Max.   :7.000  

## Exploring the data

In [2]:
library(pastecs)

by(viagraData$libido, viagraData$dose, stat.desc)

viagraData$dose: Placebo
     nbr.val     nbr.null       nbr.na          min          max        range 
   5.0000000    0.0000000    0.0000000    1.0000000    4.0000000    3.0000000 
         sum       median         mean      SE.mean CI.mean.0.95          var 
  11.0000000    2.0000000    2.2000000    0.5830952    1.6189318    1.7000000 
     std.dev     coef.var 
   1.3038405    0.5926548 
------------------------------------------------------------ 
viagraData$dose: Low Dose
     nbr.val     nbr.null       nbr.na          min          max        range 
   5.0000000    0.0000000    0.0000000    2.0000000    5.0000000    3.0000000 
         sum       median         mean      SE.mean CI.mean.0.95          var 
  16.0000000    3.0000000    3.2000000    0.5830952    1.6189318    1.7000000 
     std.dev     coef.var 
   1.3038405    0.4074502 
------------------------------------------------------------ 
viagraData$dose: High Dose
     nbr.val     nbr.null       nbr.na          min       

## Checking homogeneity of variance

ANOVA assumes that **the variances of the groups are equal**. We use **Levene's test** to test this assumption.

In [3]:
library(car)

leveneTest(viagraData$libido, viagraData$dose, center = median)

Loading required package: carData



| | Df | F value | Pr(>F) |
| :- | :-: | :-: | :-: |
| group | 2 | 0.1176471 | 0.8900225 |
| | 12 | | |


The p-value is 0.89 > 0.05, so Levene's test is **not significant**.
For these data there is no evidence that the variances differ between groups, so the equal-variance assumption is reasonable and we can proceed with the ANOVA.
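
To make the test less of a black box, here is a minimal sketch of what Levene's test with `center = median` computes: a one-way ANOVA on the absolute deviations of each score from its group median. It uses only base R. The variable names (`abs_dev`, `levene_by_hand`, `levene_F`, `levene_p`) are illustrative and not part of the `car` package.

```r
# Data as entered at the top of this notebook
libido <- c(3, 2, 1, 1, 4, 5, 2, 4, 2, 3, 7, 4, 5, 3, 6)
dose <- gl(3, 5, labels = c("Placebo", "Low Dose", "High Dose"))

# Absolute deviation of each score from the median of its own group
abs_dev <- abs(libido - ave(libido, dose, FUN = median))

# A one-way ANOVA on those deviations; its F-test is the Levene statistic
levene_by_hand <- anova(lm(abs_dev ~ dose))
levene_F <- levene_by_hand[["F value"]][1]
levene_p <- levene_by_hand[["Pr(>F)"]][1]

print(levene_by_hand)
```

The F value and p-value in the first row should agree with the `leveneTest()` output above.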

## Main Analysis

ANOVA uses the **F-test (F-statistic)** to test the null hypothesis that the means of all groups are the same.
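
As a rough illustration of where the F-statistic comes from, the sketch below computes it by hand as the ratio of the between-group mean square to the within-group mean square. It uses only base R, and the variable names (`ss_between`, `ss_within`, `F_stat`, `p_value`) are chosen here for illustration.

```r
# Data as entered at the top of this notebook
libido <- c(3, 2, 1, 1, 4, 5, 2, 4, 2, 3, 7, 4, 5, 3, 6)
dose <- gl(3, 5, labels = c("Placebo", "Low Dose", "High Dose"))

k <- nlevels(dose)      # number of groups
N <- length(libido)     # total number of observations

group_means <- tapply(libido, dose, mean)
group_n <- tapply(libido, dose, length)
grand_mean <- mean(libido)

# Between-group variation: group means around the grand mean
ss_between <- sum(group_n * (group_means - grand_mean)^2)

# Within-group variation: scores around their own group mean
ss_within <- sum((libido - ave(libido, dose))^2)

# F is the ratio of the two mean squares
F_stat <- (ss_between / (k - 1)) / (ss_within / (N - k))

# Upper-tail probability under the F distribution with (k - 1, N - k) df
p_value <- pf(F_stat, df1 = k - 1, df2 = N - k, lower.tail = FALSE)

c(F = F_stat, p = p_value)
```

These values should match the F-statistic and p-value reported by `lm()` and `aov()` in the following sections.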

### Analysis using lm()

In [4]:
model1<- lm(libido~dose, data=viagraData)

summary(model1)


Call:
lm(formula = libido ~ dose, data = viagraData)

Residuals:
   Min     1Q Median     3Q    Max 
  -2.0   -1.2   -0.2    0.9    2.0 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)   
(Intercept)     2.2000     0.6272   3.508  0.00432 **
doseLow Dose    1.0000     0.8869   1.127  0.28158   
doseHigh Dose   2.8000     0.8869   3.157  0.00827 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.402 on 12 degrees of freedom
Multiple R-squared:  0.4604,	Adjusted R-squared:  0.3704 
F-statistic: 5.119 on 2 and 12 DF,  p-value: 0.02469


### Analysis using aov()

In [5]:
model2 <- aov(libido ~ dose , data = viagraData)

summary(model2)

            Df Sum Sq Mean Sq F value Pr(>F)  
dose         2  20.13  10.067   5.119 0.0247 *
Residuals   12  23.60   1.967                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Both methods give the same F-statistic (F(2, 12) = 5.119), with a p-value of 0.0247 < 0.05.
So we can **reject the null hypothesis**: there is a significant difference among the means of the groups.
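
The agreement is expected, because `aov()` fits the same linear model as `lm()`. The sketch below checks this, and also shows how to read the `lm()` coefficients: assuming R's default treatment coding for unordered factors, the intercept is the Placebo mean and the other two coefficients are differences from it. The names `fit_lm`, `fit_aov`, `f_lm` and `f_aov` are illustrative.

```r
# Data as entered at the top of this notebook
libido <- c(3, 2, 1, 1, 4, 5, 2, 4, 2, 3, 7, 4, 5, 3, 6)
dose <- gl(3, 5, labels = c("Placebo", "Low Dose", "High Dose"))
viagraData <- data.frame(dose, libido)

fit_lm <- lm(libido ~ dose, data = viagraData)
fit_aov <- aov(libido ~ dose, data = viagraData)

# The overall F-statistic from each fit
f_lm <- summary(fit_lm)$fstatistic[["value"]]
f_aov <- summary(fit_aov)[[1]][["F value"]][1]
c(lm = f_lm, aov = f_aov)

# Under treatment coding, coefficients are the baseline mean and differences from it
group_means <- tapply(viagraData$libido, viagraData$dose, mean)
mean_diffs <- c(group_means[["Placebo"]],
                group_means[["Low Dose"]] - group_means[["Placebo"]],
                group_means[["High Dose"]] - group_means[["Placebo"]])
cbind(coefficient = coef(fit_lm), from_means = mean_diffs)
```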

## Planned Contrasts

We know from the F-ratio that at least one of the differences between means is statistically significant.
The next step is to find out which groups differ. We can use **planned contrasts** (also called **planned comparisons**) to do this.

We assign **weights** to **dummy variables**, as in the following table, to carry out the contrasts:

| Group | Dummy variable 1 <br />($contrast_1$) | Dummy variable 2 <br />($contrast_2$) |  Product <br />($contrast_1*contrast_2$) |
| :- | :-: | :-: | :-: |
| Placebo | -2 | 0 | 0 |
| Low dose | 1 | -1 | -1 |
| High dose | 1 | 1 | 1 |
| Total | 0 | 0 | 0 |


Note that the values in the last column (**product of all dummy variables**) sum to zero. This makes the contrasts *independent*, that is, **orthogonal**.

First we contrast the "Low dose" and "High dose" groups together against "Placebo" (see the column "Dummy variable 1"; note that its three weights sum to zero).

Then we contrast "Low dose" with "High dose" (the weight of "Placebo" in contrast 2 is 0, so it is left out of this comparison).
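
The two conditions described above (each set of weights sums to zero, and the products of the weights sum to zero) are easy to check numerically. A small sketch, with `weights_sum` and `product_sum` as illustrative names:

```r
# Weights from the table above, in the order Placebo, Low Dose, High Dose
contrast1 <- c(-2, 1, 1)
contrast2 <- c(0, -1, 1)

# Each contrast's weights should sum to zero
weights_sum <- c(contrast1 = sum(contrast1), contrast2 = sum(contrast2))

# Orthogonality: the element-wise products should also sum to zero
product_sum <- sum(contrast1 * contrast2)

weights_sum
product_sum
```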

### Set the weights

In [14]:
contrast1 <- c(-2,1,1)
contrast2 <- c(0,-1,1)

contrasts(viagraData$dose) <- cbind(contrast1,contrast2)
contrasts(viagraData$dose)

| | contrast1 | contrast2 |
| :- | :-: | :-: |
| Placebo | -2 | 0 |
| Low Dose | 1 | -1 |
| High Dose | 1 | 1 |


### Conduct the contrast

In [16]:
model1<- lm(libido~dose, data=viagraData)

summary(model1)


Call:
lm(formula = libido ~ dose, data = viagraData)

Residuals:
   Min     1Q Median     3Q    Max 
  -2.0   -1.2   -0.2    0.9    2.0 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)     3.4667     0.3621   9.574 5.72e-07 ***
dosecontrast1   0.6333     0.2560   2.474   0.0293 *  
dosecontrast2   0.9000     0.4435   2.029   0.0652 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.402 on 12 degrees of freedom
Multiple R-squared:  0.4604,	Adjusted R-squared:  0.3704 
F-statistic: 5.119 on 2 and 12 DF,  p-value: 0.02469


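We can see that **contrast 1 is significant** (p = 0.0293), while contrast 2 is not significant at the 0.05 level (p = 0.0652). In other words, the two dose groups taken together differ from placebo, but the difference between high dose and low dose is not significant.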
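
To see what the contrast estimates mean, the sketch below recomputes them from the group means. It assumes the groups are the same size and the contrasts are orthogonal, as they are here. In that case each estimate equals the weighted sum of the group means divided by the sum of the squared weights. The names `fit`, `est1` and `est2` are illustrative.

```r
# Data as entered at the top of this notebook
libido <- c(3, 2, 1, 1, 4, 5, 2, 4, 2, 3, 7, 4, 5, 3, 6)
dose <- gl(3, 5, labels = c("Placebo", "Low Dose", "High Dose"))
viagraData <- data.frame(dose, libido)

contrast1 <- c(-2, 1, 1)
contrast2 <- c(0, -1, 1)
contrasts(viagraData$dose) <- cbind(contrast1, contrast2)

fit <- lm(libido ~ dose, data = viagraData)

# Group means in level order: Placebo, Low Dose, High Dose
m <- as.vector(tapply(viagraData$libido, viagraData$dose, mean))

# Estimate = sum(weight * mean) / sum(weight^2), valid for balanced orthogonal contrasts
est1 <- sum(contrast1 * m) / sum(contrast1^2)
est2 <- sum(contrast2 * m) / sum(contrast2^2)

# With this coding the intercept is the grand mean
cbind(from_lm = unname(coef(fit)), by_hand = c(mean(m), est1, est2))
```

So contrast 1 is one third of the gap between the average of the two dose groups and placebo, and contrast 2 is half of the gap between high dose and low dose.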

## Trend analysis

By using polynomial weights for the dummy variables, we can conduct **polynomial contrasts**, which test for trends in the data.

Note: The categories to contrast must have a meaningful **order** (here Placebo, then Low Dose, then High Dose).

In [18]:
contrasts(viagraData$dose) <- contr.poly(3)

contrasts(viagraData$dose)

| .L | .Q |
| :-: | :-: |
| -0.7071068 | 0.4082483 |
| -7.850462e-17 | -0.8164966 |
| 0.7071068 | 0.4082483 |


In [19]:
model1<- lm(libido~dose, data=viagraData)

summary(model1)


Call:
lm(formula = libido ~ dose, data = viagraData)

Residuals:
   Min     1Q Median     3Q    Max 
  -2.0   -1.2   -0.2    0.9    2.0 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   3.4667     0.3621   9.574 5.72e-07 ***
dose.L        1.9799     0.6272   3.157  0.00827 ** 
dose.Q        0.3266     0.6272   0.521  0.61201    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.402 on 12 degrees of freedom
Multiple R-squared:  0.4604,	Adjusted R-squared:  0.3704 
F-statistic: 5.119 on 2 and 12 DF,  p-value: 0.02469


Dummy variable "L" tests a **linear trend**, which is significant (p = 0.00827).

Dummy variable "Q" tests a **quadratic trend**, which is not significant (p = 0.612).
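
As a sketch of how these trend estimates arise, the code below applies the `contr.poly(3)` weights directly to the group means. It assumes equal group sizes, as in these data. Because the polynomial weight columns are orthonormal, each trend coefficient is then simply the weighted sum of the group means. The names `poly_weights`, `trend_by_hand` and `fit_poly` are illustrative.

```r
# Data as entered at the top of this notebook
libido <- c(3, 2, 1, 1, 4, 5, 2, 4, 2, 3, 7, 4, 5, 3, 6)
dose <- gl(3, 5, labels = c("Placebo", "Low Dose", "High Dose"))
viagraData <- data.frame(dose, libido)

poly_weights <- contr.poly(3)   # columns .L (linear) and .Q (quadratic)

# The columns are orthonormal: their cross-product is the identity matrix
round(crossprod(poly_weights), 10)

# Group means in level order: Placebo, Low Dose, High Dose
m <- as.vector(tapply(viagraData$libido, viagraData$dose, mean))

# Weighted sums of the group means give the linear and quadratic estimates
trend_by_hand <- drop(crossprod(poly_weights, m))

# Compare with the coefficients from the fitted model
contrasts(viagraData$dose) <- contr.poly(3)
fit_poly <- lm(libido ~ dose, data = viagraData)
cbind(from_lm = unname(coef(fit_poly)[2:3]), by_hand = unname(trend_by_hand))
```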