## Multiple Comparisons


When you perform a hypothesis test or compute a confidence interval, you usually try to control the chance of an error. In testing, the Type I error rate is the probability of rejecting the null hypothesis when the null hypothesis is true, that is, the chance of a false positive. One often wants that chance to be no larger than a specified level $\alpha$, e.g., $0.05$. With confidence intervals, you want the probability that the interval covers the parameter of interest to be at least $1-\alpha$.


Suppose several tests or confidence intervals are being considered simultaneously. For example, in the leprosy example, one may want confidence intervals for $\gamma_1 = (\alpha_1 + \alpha_2)/2 - \alpha_3$ and $\gamma_2 = \alpha_1 - \alpha_2$. If each confidence interval has a $0.95$ chance of covering its true value, what is the chance that both cover their true values simultaneously? It is somewhere between $0.90$ and $0.95$, so without further assumptions we can only guarantee a $0.90$ chance that both are correct. The difference between $0.90$ and $0.95$ may not be a big deal, but consider more than two intervals, say 5, 10, or 20. The chance that ten $0.95$ intervals are all correct is bounded from below by only $0.50$, and for twenty the bound is $0$!
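The bounds quoted above are just $1 - J \times 0.05$, truncated at zero. A short R sketch of that arithmetic follows; the `independent_coverage` column is an extra comparison that assumes the $J$ intervals are independent, an assumption the Bonferroni bound itself does not need.

```r
# Bonferroni lower bound on the chance that J intervals, each with
# coverage 0.95, all cover their parameters: 1 - J * 0.05, floored at 0.
alpha <- 0.05
J <- c(2, 5, 10, 20)
bonferroni_bound <- pmax(0, 1 - J * alpha)

# Comparison only: joint coverage IF the intervals were independent
# (an assumption the Bonferroni bound does not require).
independent_coverage <- (1 - alpha)^J

print(data.frame(J, bonferroni_bound,
                 independent_coverage = round(independent_coverage, 3)))
```

The gap between the two columns grows with $J$, which is the conservativeness discussed below.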

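For $i = 1, \dots, J$, let $A_i$ be the event that the $i$th interval covers its parameter:

$$
A_i = \left\{\gamma_i \in \hat{\gamma}_i \pm t_{\nu,\alpha_i/2}\,\mathrm{se}(\hat{\gamma}_i)\right\}, \qquad P(A_i) = 1 - \alpha_i
$$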


Then, if $1 - \alpha_0$ is the desired chance that all intervals cover their parameters, we need

$$
P(A_1 \cap A_2 \cap \dots \cap A_J) \geq 1-\alpha_0
$$

Looking at complements instead, we have that $P(\bar{A_i})=\alpha_i$ and wish to have



$$
P(\bar{A_1} \cup \bar{A_2} \cup \dots \cup \bar{A_J}) \leq \alpha_0
$$

(why?)


The probability of a union is at most the sum of the individual probabilities (Boole's inequality), hence

$$
P(\bar{A_1} \cup \bar{A_2} \cup \dots \cup \bar{A_J})\leq \sum_{i=1}^{J}\alpha_i
$$

So how can we choose the $\alpha_i$ so that $\sum_{i}\alpha_i \leq \alpha_0$?


#### Q1. How do we choose the individual Type I error rates $\alpha_i$ so that the experimentwise error rate is at most $\alpha_0$?


The big advantage of the Bonferroni approach is that it makes no special assumptions about the relationships between the intervals. They can be of any type (t, z, etc.); they can come from the same experiment or from different experiments; they can be on any combination of parameters. The main drawback is conservativeness: because the union bound is only an inequality, the guaranteed coverage $1-\alpha_0$ can understate the true simultaneous coverage, perhaps by a great deal when $J$ is large, so the intervals are wider than necessary. More accurate bounds in special cases lead to shorter intervals (which is good) without violating the coverage requirement. Several of the procedures below (Tukey's, Dunnett's, and Scheffé's) deal with such cases.
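A minimal sketch of the Bonferroni recipe, assuming the common choice of splitting $\alpha_0$ equally so that $\alpha_i = \alpha_0/J$ and $\sum_i \alpha_i = \alpha_0$. The helper name `bonferroni_t` is hypothetical, and the 30 degrees of freedom are only illustrative.

```r
# Bonferroni: split the experimentwise level alpha0 equally over J intervals,
# so the individual error rates sum to alpha0.
bonferroni_t <- function(alpha0, J, df) {
  alpha_i <- alpha0 / J
  # Two-sided critical value t_{df, alpha_i / 2} for each interval
  list(alpha_i = alpha_i, crit = qt(1 - alpha_i / 2, df))
}

# Illustrative use: two intervals (like gamma_1 and gamma_2), 30 df assumed.
res <- bonferroni_t(alpha0 = 0.05, J = 2, df = 30)
res$alpha_i   # 0.025 per interval
res$crit      # larger than the unadjusted qt(0.975, 30)
```

For pairwise comparisons of group means, base R's `pairwise.t.test(x, g, p.adjust.method = "bonferroni")` applies the same correction to p-values rather than to critical values.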


#### Fisher's Least Significant Difference


1. Perform an analysis of variance to test $H_0: \mu_1 = \mu_2 = \dots = \mu_t$ against the alternative hypothesis that at least one of the means differs from the rest.
2. If there is insufficient evidence to reject $H_0$ using $F=MSB/MSW$, proceed no further.
3. If $H_0$ is rejected, define the least significant difference (LSD) to be the observed difference between two sample means necessary to declare the corresponding population means different.

4. For a specified value of $\alpha$, the least significant difference for comparing $\mu_i$ and $\mu_j$ is

$$
LSD_{ij} = t_{\alpha/2,\nu}\sqrt{S^2_W\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}
$$

where $S^2_W$ is the mean square within and $\nu$ is its degrees of freedom.

5. Compare all pairs of sample means. If $|\bar{y}_i-\bar{y}_j| \geq LSD_{ij}$, the corresponding population means $\mu_i$ and $\mu_j$ are declared different.

6. For each pairwise comparison of population means, the probability of a Type I error is fixed at the specified $\alpha$ (e.g., $\alpha=0.05$); this is a per-comparison (individual) error rate rather than an experimentwise one.
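A sketch of steps 4 and 5 in R, using the group means, $S_W^2 = 1.24$, and the 7 observations per agent from the weight-loss example that follows. It assumes $\nu = 35 - 5 = 30$ error degrees of freedom and that the overall $F$ test has already rejected $H_0$ (step 2).

```r
# Fisher's LSD (steps 4-5); only meaningful after the F test rejects H0.
means <- c(a1 = 11.95714, a2 = 11.14286, a3 = 10.3, a4 = 11.61429, s = 9.428571)
Sw2 <- 1.24                       # mean square within, S_W^2
n <- 7                            # observations per agent
nu <- length(means) * (n - 1)     # N - t = 35 - 5 = 30 error df (assumed)
alpha <- 0.05

# Equal group sizes, so LSD_ij is the same for every pair
lsd <- qt(1 - alpha / 2, nu) * sqrt(Sw2 * (1 / n + 1 / n))

# Pairwise |ybar_i - ybar_j|, flagged TRUE when it reaches the LSD
diffs <- abs(outer(means, means, "-"))
significant <- diffs >= lsd

lsd
significant
```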

Researchers conducted an experiment to compare the effectiveness of four new weight-reducing agents to that of an existing agent. The researchers randomly divided a random sample of 50 males into five equal groups, with preparation A1 assigned to the first group, A2 to the second group, and so on. They then gave a prestudy physical to each person in the experiment and told him how many pounds overweight he was. A comparison of the mean number of pounds overweight for the groups showed no significant differences. The researchers then began the study, and each group took the prescribed preparation for a fixed period of time. The weight losses recorded at the end of the study period are given in the data below. The standard agent is labeled S, and the four new agents are labeled A1, A2, A3, and A4 (`s` and `a1` to `a4` in the R code).

```r
# Weight losses: one row per agent, 7 subjects per agent
data <- scan(text = '
12.4 10.7 11.9 11.0 12.4 12.3 13.0
9.1 11.5 11.3 9.7 13.2 12.5 10.7
8.5 11.6 10.2 10.9 9.0 10.6 11.3
12.7 13.2 11.8 11.9 12.2 9.6 9.9
8.7 9.3 8.2 8.3 9.0 11.3 11.2')

data <- matrix(data, ncol = 7, nrow = 5, byrow = TRUE)
rownames(data) <- c("a1", "a2", "a3", "a4", "s")
data
```

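| Agent | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| a1 | 12.4 | 10.7 | 11.9 | 11.0 | 12.4 | 12.3 | 13.0 |
| a2 | 9.1 | 11.5 | 11.3 | 9.7 | 13.2 | 12.5 | 10.7 |
| a3 | 8.5 | 11.6 | 10.2 | 10.9 | 9.0 | 10.6 | 11.3 |
| a4 | 12.7 | 13.2 | 11.8 | 11.9 | 12.2 | 9.6 | 9.9 |
| s | 8.7 | 9.3 | 8.2 | 8.3 | 9.0 | 11.3 | 11.2 |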


```r
library(reshape2)
library(plyr)

# One column per agent, then reshape to long format (agent, weight)
data <- data.frame(t(data))
data_m <- melt(data)
colnames(data_m) <- c("agent", "weight")
data_m$agent <- factor(data_m$agent)

# Sample mean weight loss for each agent
ms <- ddply(data_m, .(agent), summarize, mean = mean(weight))
```



```r
ms
```

| agent | mean |
|---|---|
| a1 | 11.95714 |
| a2 | 11.14286 |
| a3 | 10.3 |
| a4 | 11.61429 |
| s | 9.428571 |


```r
Sw2 <- 1.24   # mean square within, S_W^2 (given)
```

#### Q2. Using the data in the table and the $S_{w}^2$ provided, perform Fisher's Least Significant Difference procedure at $\alpha=0.05$. Report the agent means that are significantly different from each other.
(You can also use LSD.test in the agricolae package).

#### Tukey's W Procedure

1. Rank the $t$ sample means.
2. Two population means $\mu_i$ and $\mu_j$ are declared different if $|\bar{y}_i-\bar{y}_j| \geq W$, where $W=q_{\alpha}(t,v)\sqrt{s_w^2/n}$, $s_w^2$ is the mean square within samples based on $v$ degrees of freedom, $q_{\alpha}(t, v)$ is the upper-tail critical value of the Studentized range for comparing $t$ different populations, and $n$ is the number of observations in each sample.
3. The error rate that is controlled is an experimentwise error rate: the probability of observing an experiment with one or more pairwise comparisons falsely declared significant is specified at $\alpha$.

A limitation of Tukey’s procedure is the requirement that all the sample means are based on the same number of data values. Tukey (1953) and Kramer (1956) independently proposed an approximate procedure in the case of unequal sample sizes. 
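A sketch of the $W$ computation, using base R's `qtukey()` for $q_\alpha(t, v)$ and the weight-loss values ($t = 5$, $S_W^2 = 1.24$, $n = 7$, and an assumed $v = 30$). The hypothetical helper `tukey_kramer_w` shows the Tukey-Kramer form for unequal sizes, which replaces $\sqrt{s_w^2/n}$ by $\sqrt{s_w^2(1/n_i + 1/n_j)/2}$ and reduces to $W$ when $n_i = n_j = n$.

```r
# Tukey's W for equal sample sizes: q_alpha(t, v) * sqrt(s_w^2 / n)
tukey_w <- function(alpha, n_groups, v, s2w, n) {
  qtukey(1 - alpha, nmeans = n_groups, df = v) * sqrt(s2w / n)
}

# Tukey-Kramer version for groups of sizes ni and nj (approximate procedure)
tukey_kramer_w <- function(alpha, n_groups, v, s2w, ni, nj) {
  qtukey(1 - alpha, nmeans = n_groups, df = v) *
    sqrt(s2w / 2 * (1 / ni + 1 / nj))
}

# Weight-loss setting: 5 agents, 7 per agent, S_W^2 = 1.24, v = 30 (assumed)
W <- tukey_w(0.05, n_groups = 5, v = 30, s2w = 1.24, n = 7)
W
```

Because $W$ controls the experimentwise rate, it is larger than the LSD computed from the same $S_W^2$ and $n$.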

#### Q3. Using the weight loss data, apply Tukey's W procedure to all pairwise comparisons between the agents.

(You may use the TukeyHSD function)

```r
install.packages(c("agricolae", "DescTools", "multcomp"), repos = 'http://cran.us.r-project.org')
```




```r
modl <- lm(weight ~ agent, data = data_m)
### TukeyHSD() expects an aov fit:
### TukeyHSD(aov(weight ~ agent, data = data_m))
```

#### Q4. Report the means that are significantly different.

#### Dunnett's Procedure

In experiments in which a control is included, the researchers would want to determine whether the mean responses from the active treatments differ from the mean for the control. Dunnett (1955) developed a procedure for comparisons to a control that controls the experimentwise Type I error rate.

1. For a specified experimentwise error rate $\alpha$, Dunnett's $D$ value for comparing $\mu_i$ to the control mean $\mu_c$ is
$D = d_{\alpha}(k,v)\sqrt{2s^2_w/n}$, where $n$ is the common sample size for the treatment groups (including the control); $k = t-1$ is the number of noncontrol treatments; $s_w^2$ is the mean square within; $v$ is the degrees of freedom associated with $s_w^2$; and $d_{\alpha}(k,v)$ is the critical Dunnett value (Table 11 of the Appendix).

2. For the two-sided alternative $H_a: \mu_i\neq\mu_c$, we declare $\mu_i$ different from $\mu_c$ if
$|\bar{y}_i - \bar{y}_c| \geq D$, where $d_{\alpha}(k,v)$ is the two-sided value in Table 11 of the Appendix.

3. For the one-sided alternative $H_a: \mu_i>\mu_c$, we declare $\mu_i$ greater than $\mu_c$ if
$\bar{y}_i - \bar{y}_c \geq D$, where $d_{\alpha}(k,v)$ is the one-sided value in Table 11 of the Appendix.

4. For the one-sided alternative $H_a: \mu_i < \mu_c$, we declare $\mu_i$ less than $\mu_c$ if
$\bar{y}_i - \bar{y}_c \leq -D$, where $d_{\alpha}(k,v)$ is the one-sided value in Table 11 of the Appendix.

5. The Type I error rate that is controlled is the experimentwise error.
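Base R has no Dunnett quantile function, so the sketch below approximates the two-sided $d_\alpha(k, v)$ by simulation instead of reading Table 11. Under $H_0$ with equal $n$, each $(\bar{y}_i - \bar{y}_c)/\sqrt{2s_w^2/n}$ has the same distribution as $(Z_i - Z_c)/(\sqrt{2}\,s_w/\sigma)$, with independent standard normals $Z$ and $s_w^2/\sigma^2 \sim \chi^2_v/v$, so the $(1-\alpha)$ quantile of the largest absolute value estimates $d_\alpha(k, v)$. The helper name, the simulation size, and $v = 30$ are assumptions, and the result carries Monte Carlo error of roughly a hundredth.

```r
# Monte Carlo approximation of the two-sided Dunnett critical value d_alpha(k, v).
# Under H0 with equal n, (ybar_i - ybar_c) / sqrt(2 s_w^2 / n) behaves like
# (Z_i - Z_c) / (sqrt(2) * s), with Z ~ N(0, 1) and s^2 ~ chisq(v) / v.
dunnett_crit_mc <- function(alpha, k, v, nsim = 1e5, seed = 1) {
  set.seed(seed)
  z <- matrix(rnorm(nsim * (k + 1)), ncol = k + 1)  # column 1 plays the control
  s <- sqrt(rchisq(nsim, df = v) / v)               # shared s_w / sigma per replicate
  stat <- abs(z[, -1, drop = FALSE] - z[, 1]) / (sqrt(2) * s)
  unname(quantile(apply(stat, 1, max), 1 - alpha))
}

# Weight-loss setting: k = 4 new agents vs. control s, v = 30 (assumed)
d <- dunnett_crit_mc(alpha = 0.05, k = 4, v = 30)
D <- d * sqrt(2 * 1.24 / 7)   # D = d_alpha(k, v) * sqrt(2 s_w^2 / n)
c(d = d, D = D)
```

For a one-sided alternative such as $H_a: \mu_i > \mu_c$, using the signed difference in place of `abs()` gives the one-sided value. The estimate should land between the unadjusted $t$ value and the Bonferroni value for $k$ comparisons, which is a quick sanity check.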


#### Q5. Treat agent $s$ as the control and perform Dunnett's procedure at $\alpha=0.05$.

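```r
### Try this code!
library(multcomp)

### Dunnett contrasts in multcomp compare every level with the first factor
### level, so make agent s the reference (control) level before fitting.
data_m$agent <- relevel(data_m$agent, ref = "s")
modl <- aov(weight ~ agent, data = data_m)
### summary(glht(modl, linfct = mcp(agent = "Dunnett")))
```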


#### Scheffé's S Procedure

A more general procedure, proposed by Scheffé (1953), can be used to make all possible comparisons among the $t$ population means. Although Scheffé's procedure can be applied to pairwise comparisons among the $t$ population means, it is more conservative than Tukey's W procedure for detecting pairwise differences.


1. Consider any linear comparison (contrast) among the $t$ population means of the form
$l = a_1\mu_1 + \dots + a_t\mu_t$, where $\sum_{i=1}^{t} a_i = 0$.
2. We wish to test the null hypothesis
$H_0: l=0$ vs. $H_a: l\neq 0$.
3. The test statistic is the estimate $\hat{l} = a_1\bar{y}_1 + \dots + a_t\bar{y}_t$.
4. $S = \sqrt{\hat{V}(\hat{l})\,(t-1)\,F_{\alpha, df_1, df_2}}$, where $F_{\alpha, df_1, df_2}$ is the upper-tail critical value of the $F$ distribution with $df_1 = t-1$ and $df_2 = v$, the degrees of freedom of $s_w^2$.
5. $\hat{V}(\hat{l}) = s_w^2 \sum_{i=1}^{t} \frac{a_i^2}{n_i}$.
6. Reject $H_0$ if $|\hat{l}| > S$.
7. The error rate that is controlled is an experimentwise error rate: if we consider all imaginable contrasts, the probability of observing an experiment with one or more contrasts falsely declared significant is $\alpha$.
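A sketch of steps 3 to 6 for a single contrast, using base R's `qf()` for the $F$ critical value. The helper `scheffe_test` is hypothetical, and the example reuses the weight-loss means with $S_W^2 = 1.24$, $n = 7$, and an assumed $v = 30$; the contrast $\mu_{a2} - \mu_{a3}$ is the third row of the Q6 matrix.

```r
# Scheffe's S test for one contrast l = sum(a_i * mu_i) with sum(a_i) = 0
scheffe_test <- function(a, means, n, s2w, v, alpha = 0.05) {
  stopifnot(length(a) == length(means), abs(sum(a)) < 1e-8)
  n_groups <- length(means)
  l_hat <- sum(a * means)                    # estimate of l
  v_hat <- s2w * sum(a^2 / n)                # estimated variance of l_hat
  S <- sqrt(v_hat * (n_groups - 1) * qf(1 - alpha, n_groups - 1, v))
  list(l_hat = l_hat, S = S, reject = abs(l_hat) > S)
}

means <- c(a1 = 11.95714, a2 = 11.14286, a3 = 10.3, a4 = 11.61429, s = 9.428571)
# Contrast mu_a2 - mu_a3; n = 7 per agent, S_W^2 = 1.24, v = 30 (assumed)
res <- scheffe_test(c(0, 1, -1, 0, 0), means, n = 7, s2w = 1.24, v = 30)
str(res)
```

Each row of `contr` can be passed as `a` in the same way; `n` may also be a vector of group sizes.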

#### Q6. Use Scheffé's S procedure to test the following set of linear contrasts.

```r
# One contrast per row; columns are the agents a1, a2, a3, a4, s
contr <- matrix(scan(text = "
4 -1 -1 -1 -1
0  1  1 -1 -1
0  1 -1  0  0
0  0  0  1 -1"), ncol = 5, nrow = 4, byrow = TRUE)

colnames(contr) <- c("a1", "a2", "a3", "a4", "s")
contr
```

| a1 | a2 | a3 | a4 | s |
|---|---|---|---|---|
| 4 | -1 | -1 | -1 | -1 |
| 0 | 1 | 1 | -1 | -1 |
| 0 | 1 | -1 | 0 | 0 |
| 0 | 0 | 0 | 1 | -1 |
