
joint_tests omits covariate transformed with asin(sqrt()) via make.tran() #438

Closed

hnguyen19 opened this issue Aug 29, 2023 · 8 comments

@hnguyen19
Describe the bug

joint_tests omits covariate transformed with asin(sqrt()) via make.tran()

To reproduce

set.seed(83)
library(emmeans)

dummy <- expand.grid(
  covar  = runif(72, min = 0, max = 1),
  resp = runif(72, min = 0, max = 2),
  site = c("a", "b", "c", "d", "e", "g"),
  C = 1:4,
  R = c("i", "j", "k")
)

as.tran <- make.tran("asin.sqrt")
mod <- with(as.tran, lm(sqrt(resp) ~ site  +  C * R * linkfun(covar), data = dummy))
joint_tests(mod)

model term df1    df2 F.ratio p.value
 site         5 373231   0.000  1.0000
 C            1 373231   0.000  1.0000
 R            2 373231   0.000  1.0000
 C:R          2 373231   0.000  1.0000

Switching resp and covar, joint_tests(mod2) below gave the full ANCOVA table:

mod2 <- with(as.tran, lm( linkfun(covar) ~ site  +  C * R * sqrt(resp), data = dummy))
joint_tests(mod2)
 model term df1    df2 F.ratio p.value
 site         5 373231   0.000  1.0000
 C            1 373231   0.000  1.0000
 R            2 373231   0.000  1.0000
 resp         1 373231   0.000  1.0000
 C:R          2 373231   0.000  1.0000
 C:resp       1 373231   0.000  1.0000
 R:resp       2 373231   0.000  1.0000
 C:R:resp     2 373231   0.000  1.0000

Expected behavior

A full ANCOVA table similar to the table obtained from joint_tests(mod2).

@hnguyen19 hnguyen19 added the bug label Aug 29, 2023
@rvlenth rvlenth added question and removed bug labels Aug 29, 2023
@rvlenth
Owner

rvlenth commented Aug 29, 2023

The covariate is actually taken into account, but the results are then suppressed because those terms have zero d.f.:

> joint_tests(mod, show0df = TRUE)
 model term   df1    df2 F.ratio p.value note
 site           5 373231   0.000  1.0000     
 C              1 373231   0.000  1.0000     
 R              2 373231   0.000  1.0000     
 covar          0     NA      NA      NA  d e
 C:R            2 373231   0.000  1.0000     
 C:covar        0     NA      NA      NA  d e
 R:covar        0     NA      NA      NA  d e
 C:R:covar      0     NA      NA      NA  d e
 (confounded)   0 373231     NaN     NaN     

d: df1 reduced due to linear dependence 
e: df1 reduced due to non-estimability 

This in turn happens because by default, joint_tests() reduces the covariate using the meanint() function:

> meanint(dummy$covar)
[1] -0.5176167  1.4823833
> as.tran$linkfun(meanint(dummy$covar))
[1] NaN NaN
Warning messages:
1: In sqrt(mu/alpha) : NaNs produced
2: In asin(sqrt(mu/alpha)) : NaNs produced

That is, the transformation is applied to values outside its valid range. If you use a function that yields values within the interval (0,1), you get less puzzling results:

> joint_tests(mod, cov.reduce = function(x) mean(x) + c(-0.1, 0.1))
 model term df1    df2 F.ratio p.value
 site         5 373231   0.000  1.0000
 C            1 373231   0.000  1.0000
 R            2 373231   0.000  1.0000
 covar        1 373231   0.000  1.0000
 C:R          2 373231   0.000  1.0000
 C:covar      1 373231   0.000  1.0000
 R:covar      2 373231   0.000  1.0000
 C:R:covar    2 373231   0.000  1.0000
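To make the domain restriction concrete, here is a minimal base-R sketch (an illustration, not part of the thread; the helper f() simply mirrors the asin.sqrt linkfun):

```r
# asin(sqrt(p)) is real only for p in [0, 1]
f <- function(p) asin(sqrt(p))

f(c(0, 0.5, 1))                     # defined: 0, pi/4, pi/2
suppressWarnings(f(c(-0.5, 1.5)))   # NaN: outside the valid range,
                                    # just like the meanint() endpoints above
```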

@rvlenth
Owner

rvlenth commented Aug 29, 2023

BTW, the example is kind of flaky because you use response and covariate values as dimensions in the grid you created. I think you intended to do:

set.seed(1234)  # so random numbers are reproducible

dummy <- expand.grid(
  site = c("a", "b", "c", "d", "e", "g"),
  C = 1:4,
  R = c("i", "j", "k")
)
dummy$covar <- runif(72, min = 0, max = 1)
dummy$resp <- runif(72, min = 0, max = 2)

Now if we re-fit the model, we see P values other than 1:

> joint_tests(mod, cov.reduce = \(x) mean(x) + c(-0.1, 0.1))
 model term df1 df2 F.ratio p.value
 site         5  55   1.038  0.4049
 C            1  55   0.728  0.3972
 R            2  55   4.993  0.0102
 covar        1  55   0.000  0.9948
 C:R          2  55   0.623  0.5400
 C:covar      1  55   0.661  0.4197
 R:covar      2  55   0.513  0.6013
 C:R:covar    2  55   0.202  0.8173

@hnguyen19
Author

Thank you for the explanation and for correcting my dummy data set. I thought asin.sqrt accepted [0, 1] inclusive, since mod ran error-free (I did see NAs when some values on the original scale were larger than 1). I am confused about why the 0-df terms occurred: transforming covar and saving it as a column before fitting the model did not return any 0-df terms. I'm using your revised dummy data set in the next step.

dummy$covar.t <- asin(sqrt(dummy$covar))
model term  df1 df2 F.ratio p.value
 site          5  55   1.038  0.4049
 C             1  55   0.692  0.4090
 R             2  55   5.055  0.0097
 covar.t       1  55   0.000  0.9948
 C:R           2  55   0.623  0.5402
 C:covar.t     1  55   0.661  0.4197
 R:covar.t     2  55   0.513  0.6013
 C:R:covar.t   2  55   0.202  0.8173

In cases where covar must be transformed before fitting, would you recommend cov.reduce = \(x) mean(x) + c(-0.1, 0.1) as the default way to get the full ANCOVA table? When exploring different options for cov.reduce or cov.keep, I saw very different pairs of F and p values for the same factor with the same model.

@rvlenth
Owner

rvlenth commented Aug 29, 2023

If you transform first and save it as a predictor, then you are talking about a different model, with covar.t as a predictor instead of covar. The joint tests are based, respectively, on a symmetric interval on the covar scale or a symmetric interval on the covar.t scale; since the transformation is nonlinear, an interval that is symmetric on one scale is not symmetric on the other.
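This asymmetry is easy to demonstrate in base R (a sketch with an illustrative mean of 0.3, not a value taken from the thread; f() mirrors the asin.sqrt linkfun):

```r
# A symmetric interval on the covar scale maps to an asymmetric
# interval on the asin(sqrt()) scale
f <- function(p) asin(sqrt(p))   # the asin.sqrt link
m <- 0.3                         # illustrative mean, away from 0.5
f(m + c(-0.1, 0.1)) - f(m)       # offsets of unequal magnitude
                                 # (about -0.116 and +0.105)
```

Note that around p = 0.5 the link happens to be symmetric (f(p) + f(1 - p) = pi/2), so the asymmetry only shows up for means away from 0.5.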

I do not have a recommendation as to which is right or which is preferable. As is discussed on the help page, covariates create complications, and there are infinitely many possible results for joint_tests() depending on what intervals you choose for covar (or covar.t). Precisely because this is a point of confusion, it is really difficult to explain what it is you are testing with joint_tests(), so you are probably better off not using it at all.

What may be more meaningful to you might be results like this:

> joint_tests(mod, at = list(covar = c(0.25, 0.5, 0.75)), by = "covar")
covar = 0.25:
 model term df1 df2 F.ratio p.value
 site         5  55   1.038  0.4049
 C            1  55   0.041  0.8394
 R            2  55   4.804  0.0119
 C:R          2  55   0.562  0.5732

covar = 0.50:
 model term df1 df2 F.ratio p.value
 site         5  55   1.038  0.4049
 C            1  55   0.934  0.3380
 R            2  55   4.514  0.0153
 C:R          2  55   0.616  0.5439

covar = 0.75:
 model term df1 df2 F.ratio p.value
 site         5  55   1.038  0.4049
 C            1  55   1.311  0.2572
 R            2  55   1.566  0.2181
 C:R          2  55   0.365  0.6961

which gives you three different anova tables for just the factors, depending on which covariate value is used. Those tests are based on easy-to-understand hypotheses, formulated for those specific covariate values.

@hnguyen19
Author

hnguyen19 commented Aug 31, 2023

Thank you for the details. I was looking at the impact of covar on resp and wanted to see how the F-value associated with covar compared to those of the categorical variables, and then decide whether to use covar in predicting resp.

For cases where covar is transformed with log, which is also nonlinear, the 0-df terms do not occur. Is it inappropriate to call joint_tests() without specifying the levels of covar?

mod5 <- lm(sqrt(resp) ~ site + C * R * log(covar + 0.01), data = dummy)
joint_tests(mod5)
 model term df1 df2 F.ratio p.value
 site         5  55   1.152  0.3445
 C            1  55   1.579  0.2142
 R            2  55   1.797  0.1755
 covar        1  55   0.040  0.8416
 C:R          2  55   0.671  0.5155
 C:covar      1  55   1.033  0.3139
 R:covar      2  55   0.060  0.9415
 C:R:covar    2  55   0.665  0.5186

In this case, though, covar should not be used, but in my real data set, the covar had the biggest impact on resp.

@rvlenth
Owner

rvlenth commented Aug 31, 2023

You are asking if joint_tests() should be used for model selection, and the answer is decidedly no. That function, and indeed the entire emmeans package, is designed to perform post hoc analyses of an existing model. None of it is designed for model selection. And I would never use type-3 tests for model selection anyway. You should use type-1 (anova()) or type-2 (car::Anova()) tests, or other methods designed for selecting models.
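As a sketch of that suggestion, refitting mod5 on the corrected dummy data from above so the block is self-contained (car::Anova() is commented out since it requires the car package):

```r
set.seed(1234)  # same seed as the corrected example above
dummy <- expand.grid(site = c("a", "b", "c", "d", "e", "g"),
                     C = 1:4, R = c("i", "j", "k"))
dummy$covar <- runif(72, min = 0, max = 1)
dummy$resp  <- runif(72, min = 0, max = 2)

mod5 <- lm(sqrt(resp) ~ site + C * R * log(covar + 0.01), data = dummy)
anova(mod5)                    # type-1: sequential tests, order-dependent
# car::Anova(mod5, type = 2)   # type-2, if the car package is installed
```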

This series of questions seems to be following a meandering path. We (or at least I) have established that this is not a bug, and now we're not even talking about the same transformation.

@rvlenth
Owner

rvlenth commented Sep 4, 2023

It looks like this thread is completed, so I am closing it.

@rvlenth rvlenth closed this as completed Sep 4, 2023
@hnguyen19
Author

Thanks for correcting. I appreciate your help.

rvlenth added a commit that referenced this issue Sep 6, 2023
Also new make.meanint() and make.symmint() fcns
  (inspired by #438)