Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dimensionality of Scheffe Method #171

Closed
zkuizx opened this issue Feb 6, 2020 · 15 comments
Closed

Dimensionality of Scheffe Method #171

zkuizx opened this issue Feb 6, 2020 · 15 comments

Comments

@zkuizx
Copy link

zkuizx commented Feb 6, 2020

I am using a one-way anova analysis with six levels. I have three contrasts of interest. When I use Scheffe method to get the confidence interval of them, the dimensionality from the output is 2 instead of 5. This only happens when I use the newest version of emmeans. When I use an old version of emmeans, the dimensionality is still 5. As my understanding, the dimensionality from Scheffe method should be always 5 = 6 - 1 if I have 6 levels and only focus on the contrasts. Is this a bug?

The attached are part of the R codes and output:

my.con = list(tc1.2 = c(0.5, -0.5, 0, 0.5, -0.5, 0), tc1.3 = c(0.5, 0, -0.5, 0.5, 0, -0.5) ,tc2.3 = c(0, 0.5, -0.5, 0, 0.5, -0.5))

summary(contrast(weight.lm, method = my.con, adjust = "scheffe"), level = 1 - alpha, side = "two-sided", infer = c(T,T))

contrast estimate SE df lower.CL upper.CL t.ratio p.value
tc1.2 -5.64 0.445 60 -6.72 -4.55 -12.656 <.0001
tc1.3 -8.00 0.445 60 -9.08 -6.92 -17.963 <.0001
tc2.3 -2.36 0.445 60 -3.45 -1.28 -5.307 <.0001

Confidence level used: 0.94
Conf-level adjustment: scheffe method with dimensionality 2
P value adjustment: scheffe method with dimensionality 2

@rvlenth
Copy link
Owner

rvlenth commented Feb 6, 2020 via email

@zkuizx
Copy link
Author

zkuizx commented Feb 6, 2020

Hi, Russ,

Thank you very much! Your explanation confirmed my initial thought. I know my three contrasts listed above can be written as functions of two contrasts. Actually, I tried several other combinations of contrasts and found the change of dimension.

I agree with you that if there are only three pre-planned contrasts, then the adjustment with 5 dimension here is way too conservative. But if I do some "data snooping" and find that these three contrasts, I think we should still use the dimension of 5 and your suggestion provides me a perfect solution.

Again, thank you very much for the help!

@rvlenth
Copy link
Owner

rvlenth commented Feb 8, 2020

I think your concerns are important enough to provide for manually specifying the rank to use in the Scheffe adjustment. After all, the purpose is data-snooping, and it seems it would not be uncommon to not specify all the contrasts we want in one call. So I have incorporated the possibility of a "hidden" argument scheffe.rank that may be included among ....

> library(emmeans)
> fm1 = aov(count ~ spray, InsectSprays)
> emm1 = emmeans(fm1, "spray")
> con1 = emmeans:::consec.emmc(LETTERS[1:6])[c(1,2,5)]

> summary(contrast(emm1, con1), adjust = "scheffe")
 contrast estimate  SE df t.ratio p.value
 B - A       0.833 1.6 66  0.520  0.9651 
 C - B     -13.250 1.6 66 -8.276  <.0001 
 F - E      13.167 1.6 66  8.223  <.0001 

P value adjustment: scheffe method with rank 3 

> summary(contrast(emm1, con1), adjust = "scheffe", scheffe.rank = 5)
 contrast estimate  SE df t.ratio p.value
 B - A       0.833 1.6 66  0.520  0.9981 
 C - B     -13.250 1.6 66 -8.276  <.0001 
 F - E      13.167 1.6 66  8.223  <.0001 

P value adjustment: scheffe method with rank 5

Note that I changed the annotation form "dimensionality" to "rank". It was just less awkward to use "rank" in the revised documentation, and so the annotation is changed to make it consistent.

The updated version will be pushed to github soon.

@zkuizx
Copy link
Author

zkuizx commented Feb 8, 2020

Hi, Russell,

Thank you very much! This is fantastic. Such implementation of manual specification of rank also provides a solution for another potential problem. When I used the contrast (1, 1, 1, -1, -1, -1) / 3 and several other contrasts, the rank of 6 instead of 5 was used. I think due to the numerical error, sometimes the system may not treat (1, 1, 1, -1, -1, -1) / 3 as an contrast. Your implementation actually provides a solution for this: I can manually set the rank to 5.

Again, thank you very much for the help! Have a nice weekend!

@rvlenth
Copy link
Owner

rvlenth commented Feb 9, 2020

I think if you got a dimensionality of 6, you must have had a linear function somewhere in there that is not a contrast (coefficients do not sum to zero). I suppose it is possible that division by 3 creates a numerical inexactness, but I seriously doubt that that would exceed the tolerances for rank determination used by qr().

You can check this manually: Make a matrix of the contrasts you used, and run qr((). Then look at the $rank slot of the result. You can also calculate the sum of each set of coefficients.

> mat = cbind(as.matrix(emmeans:::pairwise.emmc(1:6)), c(1, 1, 1, -1, -1, -1) / 3)
> qr(mat)$rank
[1] 5

@zkuizx
Copy link
Author

zkuizx commented Feb 9, 2020

Hi, Russel,

I am quite surprised too. Here is what I got from R. You can see that only contrasts are in there. The rank from qr is 5 but the dimensionality is 6. By the way, how did you embed your R code in your message? They look much nicer than mine.

> my.con <- list(a = c(1, 1, 1, -1, -1, -1) / 3, 
    b12 = c(1, 0, 0, -1, 0, 0), 
    b13 = c(0, 1, 0, 0, -1, 0), 
    b23 = c(0, 0, 1, 0, 0, -1))
> my.con.mix <- c(my.con, emmeans:::consec.emmc(1:6))
> summary(contrast(rec.lsm, method = my.con.mix, adjust = "scheffe"), infer = c(T,T))
 contrast estimate      SE df lower.CL upper.CL t.ratio p.value
 a        -0.07233 0.00802 12 -0.10633 -0.03834 -9.021  0.0001 
 b12      -0.08333 0.01389 12 -0.14222 -0.02445 -6.000  0.0042 
 b13      -0.08067 0.01389 12 -0.13955 -0.02178 -5.808  0.0055 
 b23      -0.05300 0.01389 12 -0.11189  0.00589 -3.816  0.0901 
 2 - 1    -0.00633 0.01389 12 -0.06522  0.05255 -0.456  0.9997 
 3 - 2     0.03333 0.01389 12 -0.02555  0.09222  2.400  0.4905 
 4 - 3     0.05633 0.01389 12 -0.00255  0.11522  4.056  0.0647 
 5 - 4    -0.00900 0.01389 12 -0.06789  0.04989 -0.648  0.9981 
 6 - 5     0.00567 0.01389 12 -0.05322  0.06455  0.408  0.9999 

Confidence level used: 0.95 
Conf-level adjustment: scheffe method with dimensionality 6 
P value adjustment: scheffe method with dimensionality 6 
> 
> aa <- cbind(my.con[[1]], my.con[[2]], my.con[[3]], my.con[[4]], as.matrix(emmeans:::consec.emmc(1:6)))
> aa
                      2 - 1 3 - 2 4 - 3 5 - 4 6 - 5
1  0.3333333  1  0  0    -1     0     0     0     0
2  0.3333333  0  1  0     1    -1     0     0     0
3  0.3333333  0  0  1     0     1    -1     0     0
4 -0.3333333 -1  0  0     0     0     1    -1     0
5 -0.3333333  0 -1  0     0     0     0     1    -1
6 -0.3333333  0  0 -1     0     0     0     0     1
> qr(aa)$rank
[1] 5

@rvlenth
Copy link
Owner

rvlenth commented Feb 9, 2020

Hmmm. OK, I have verified this. The rank determination is actually based on the linfct slot. In debug mode (on a different example with 6 means), I found:

Browse[2]> object@linfct
       (Intercept)     sprayB     sprayC     sprayD     sprayE     sprayF
 [1,] 1.110223e-16  0.3333333  0.3333333 -0.3333333 -0.3333333 -0.3333333
 [2,] 0.000000e+00  0.0000000  0.0000000 -1.0000000  0.0000000  0.0000000
 [3,] 0.000000e+00  1.0000000  0.0000000  0.0000000 -1.0000000  0.0000000
 [4,] 0.000000e+00  0.0000000  1.0000000  0.0000000  0.0000000 -1.0000000
 [5,] 0.000000e+00  1.0000000  0.0000000  0.0000000  0.0000000  0.0000000
 [6,] 0.000000e+00 -1.0000000  1.0000000  0.0000000  0.0000000  0.0000000
 [7,] 0.000000e+00  0.0000000 -1.0000000  1.0000000  0.0000000  0.0000000
 [8,] 0.000000e+00  0.0000000  0.0000000 -1.0000000  1.0000000  0.0000000
 [9,] 0.000000e+00  0.0000000  0.0000000  0.0000000 -1.0000000  1.0000000
Browse[2]> qr(object@linfct)$rank
[1] 6

so apparently, even a minuscule value in the intercept column rules the day. If I add zapsmall() to it,

Browse[2]> zapsmall(object@linfct)
      (Intercept)     sprayB     sprayC     sprayD     sprayE     sprayF
 [1,]           0  0.3333333  0.3333333 -0.3333333 -0.3333333 -0.3333333
 [2,]           0  0.0000000  0.0000000 -1.0000000  0.0000000  0.0000000
 [3,]           0  1.0000000  0.0000000  0.0000000 -1.0000000  0.0000000
 [4,]           0  0.0000000  1.0000000  0.0000000  0.0000000 -1.0000000
 [5,]           0  1.0000000  0.0000000  0.0000000  0.0000000  0.0000000
 [6,]           0 -1.0000000  1.0000000  0.0000000  0.0000000  0.0000000
 [7,]           0  0.0000000 -1.0000000  1.0000000  0.0000000  0.0000000
 [8,]           0  0.0000000  0.0000000 -1.0000000  1.0000000  0.0000000
 [9,]           0  0.0000000  0.0000000  0.0000000 -1.0000000  1.0000000
Browse[2]> qr(zapsmall(object@linfct))$rank
[1] 5

so I modified the code accordingly. Not sure when I'll push that up, but that should at least help.

Thanks again

@rvlenth
Copy link
Owner

rvlenth commented Feb 9, 2020

Highlight the code and click on <> in the toolbar. OR manually type a line with 3 back-ticks before and after the code block (```r to get syntax highlighting)

@zkuizx
Copy link
Author

zkuizx commented Feb 9, 2020

Hi, Russel,

Thank you very much for the quick response! This is really helpful! I will definitely try the tricks you mentioned to me in future.

Best wishes,

@rvlenth
Copy link
Owner

rvlenth commented Feb 10, 2020

I think this is resolved, so am closing.

@rvlenth rvlenth closed this as completed Feb 10, 2020
@meierluk
Copy link

Could it be the case the scheffe.rank is not working when there is only one contrast?

> data("PlantGrowth")
> fit <- aov(weight ~ group, data = PlantGrowth)
> fit.emm <- emmeans(fit, specs = ~ group)
> 
> ## Want to test the contrast c(1/2, -1, 1/2) with Scheffé
> 
> ## The following just gives the uncorrected p-value (of the regular t-test)
> summary(contrast(fit.emm, method = list(c(1/2, -1, 1/2)), adjust = "scheffe"),
+         scheffe.rank = 2)
 contrast        estimate    SE df t.ratio p.value
 c(0.5, -1, 0.5)    0.618 0.241 27 2.560   0.0164 

> 
> ## Adding a second contrast seems to help
> summary(contrast(fit.emm, method = list(c(1/2, -1, 1/2),
+                                         c(1, 0, -1)), adjust = "scheffe"),
+         scheffe.rank = 2)
 contrast        estimate    SE df t.ratio p.value
 c(0.5, -1, 0.5)    0.618 0.241 27  2.560  0.0532 
 c(1, 0, -1)       -0.494 0.279 27 -1.772  0.2265 

P value adjustment: scheffe method with rank 2 

@rvlenth
Copy link
Owner

rvlenth commented May 10, 2021 via email

@rvlenth rvlenth added the bug label May 10, 2021
@rvlenth
Copy link
Owner

rvlenth commented May 10, 2021

Indeed. It took more doing than you might think to correct this. But definitely, if we want Scheffe adjustment with specified rank, it should not ignore that. Now I have, with your example:

> confint(contrast(fit.emm, method = list(c(1/2, -1, 1/2)), adjust = "sch"), scheffe.rank = 2)
 contrast        estimate    SE df lower.CL upper.CL
 c(0.5, -1, 0.5)    0.618 0.241 27 -0.00732     1.24

Confidence level used: 0.95 
Conf-level adjustment: scheffe method with rank 2 

> test(contrast(fit.emm, method = list(c(1/2, -1, 1/2)), adjust = "sch"), scheffe.rank = 2)
 contrast        estimate    SE df t.ratio p.value
 c(0.5, -1, 0.5)    0.618 0.241 27 2.560   0.0532 

P value adjustment: scheffe method with rank 2 

> test(contrast(fit.emm, method = list(c(1/2, -1, 1/2)), adjust = "bon"), scheffe.rank = 2)
 contrast        estimate    SE df t.ratio p.value
 c(0.5, -1, 0.5)    0.618 0.241 27 2.560   0.0164 

So we have it working right for both CIs and tests. The last run verifies that it still ignores families of one in a non-Scheffe situation, even if we specified a scheffe rank.

One more test case:

> test(contrast(fit.emm, method = list(c(1/2, -1, 1/2)), adjust = "sch"), scheffe.rank = 1)
 contrast        estimate    SE df t.ratio p.value
 c(0.5, -1, 0.5)    0.618 0.241 27 2.560   0.0164 

P value adjustment: scheffe method with rank 1

It could have ignored this case since scheffe.rank == 1, but I think it is OK to explicitly annotate that anyway.

Thanks for reporting this.

@rvlenth rvlenth reopened this May 10, 2021
rvlenth added a commit that referenced this issue May 10, 2021
Correction to Scheffe correction bug (#171 addendum)
@meierluk
Copy link

Wonderful, thanks for the fast fix!

@rvlenth
Copy link
Owner

rvlenth commented Jun 30, 2021

I think this issue is resolved, so am closing.

@rvlenth rvlenth closed this as completed Jun 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants