Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Emmeans no longer recognizes stratified factors in Survival #473

Closed
mdjpe opened this issue Mar 31, 2024 · 9 comments
Closed

Emmeans no longer recognizes stratified factors in Survival #473

mdjpe opened this issue Mar 31, 2024 · 9 comments

Comments

@mdjpe
Copy link

mdjpe commented Mar 31, 2024

I sent the below to Terry Therneau (maintainer of the survival package) and he gave an unhelpful response:

"Why are you asking me about a failure in another package? Ask the author of emmeans."

As an ordinary user, all I know is that script which used to work doesn't work now. Could you take a look?

Survival models with a stratified factor appear to have stopped cooperating with emmeans.

For example, the following model, which was fully functional on R for Windows last year (and was used in my published work), currently produces errors when run in R for Ubuntu (I don't currently have R on a Windows machine, so can't check that):

cphfr <- coxph(Surv(tstart, tstop, death) ~ strata(Region) * Patties + Fume, newBHPsurv, cluster=ColonyGroup)

emm <- emmeans(cphfr, ~ Patties | Region)
seem <- summary(pairs(emm, reverse = T), by = NULL, type="response", adjust = "none", infer = c(T, T))
names(seem)[1] <- "Patties effect"
pander(seem)

Error in emmeans(cphfr, ~Patties | Region) :
No variable named Region in the reference grid
In addition: Warning message:
In model.matrix.default(trms, m, contrasts.arg = object$contrasts) :
variable 'strata(Region)' is absent, its contrast will be ignored

The above definitely worked in:

R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

survival_3.4-0 emmeans_1.8.2

And is not working in:

R version 4.3.3 (2024-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS
[1] survival_3.5-7 emmeans_1.10.0

@mdjpe mdjpe added the bug label Mar 31, 2024
@rvlenth
Copy link
Owner

rvlenth commented Mar 31, 2024 via email

@rvlenth
Copy link
Owner

rvlenth commented Apr 1, 2024

There was a change in emmeans version 1.8.8 in relation to issue #429. It seems to be exactly the reverse of what you are reporting. Can you review that issue please, and let me know how yours differs?

Note

I edited and deleted from your initial submission my instructions for reporting issues. The idea is for you to use the same sectioning (the prompts that start with ##) and replace the instructions with your comments relevant to those instructions. I would appreciate your following that practice in the future.

@rvlenth
Copy link
Owner

rvlenth commented Apr 1, 2024

PS -- I cannot reproduce your issue because I do not have the dataset newBPHsurv. My issue guidelines clearly say that I need a reproducible example. So I will close this issue until you re-open it and either provide the dataset or prepare a similar example using data I can access, e.g. a dataset in the survey package. (Please re-open this issue; do not create a new issue.)

@rvlenth rvlenth closed this as completed Apr 1, 2024
@mdjpe
Copy link
Author

mdjpe commented Apr 1, 2024

I don't see a re-open button, so maybe the comments section will do.

Here is a reproducible example (cox model borrowed from Therneau: "A package for survival analysis in R". My copy is dated Aug 5, 2022.):

require(survival)
Loading required package: survival
require(emmeans)
Loading required package: emmeans

cfit2 <- coxph(Surv(time, status) ~ age + sex + wt.loss + strata(inst), data=lung)

emm <- emmeans(cfit2, ~ inst)
Error in emmeans(cfit2, ~inst) :
No variable named inst in the reference grid

Specifying strata(inst) instead does not help:

emm <- emmeans(cfit2, ~ strata(inst))
Error in emmeans(cfit2, ~strata(inst)) :
No variable named inst in the reference grid

Other variables in the model produce the expected results:

emm <- emmeans(cfit2, ~ wt.loss)
emm
wt.loss emmean SE df asymp.LCL asymp.UCL
9.78 0.679 0.736 Inf -0.763 2.12

Replacing strata(inst) with inst in the model allows emmeans to be generated (but presumably violates the proportional hazards assumption)

cfit222 <- coxph(Surv(time, status) ~ age + sex + wt.loss + inst, data=lung)
emm <- emmeans(cfit222, ~ inst)
emm
inst emmean SE df asymp.LCL asymp.UCL
11 0.41 0.679 Inf -0.92 1.74

My own case requires an interaction between a treatment and a stratified factor. Emmeans are not available for either the stratified factor or the interacting treatment. For example,

cfit222dd <- coxph(Surv(time, status) ~ strata(sex)*age, data=lung)
emm <- emmeans(cfit222dd, ~ sex)

Error in emmeans(cfit222dd, ~sex) :
No variable named sex in the reference grid
In addition: Warning message:
In model.matrix.default(trms, m, contrasts.arg = object$contrasts) :
variable 'strata(sex)' is absent, its contrast will be ignored

emm <- emmeans(cfit222dd, ~ age)
Warning message:
In model.matrix.default(trms, m, contrasts.arg = object$contrasts) :
variable 'strata(sex)' is absent, its contrast will be ignored
emm
Error in X[ii, ii, drop = FALSE] %*% y[ii] : non-conformable arguments

Below is the full output from sessionInfo()

sessionInfo()
R version 4.3.3 (2024-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
[1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_CA.UTF-8 LC_COLLATE=en_CA.UTF-8
[5] LC_MONETARY=en_CA.UTF-8 LC_MESSAGES=en_CA.UTF-8
[7] LC_PAPER=en_CA.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C

time zone: America/Winnipeg
tzcode source: system (glibc)

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] emmeans_1.10.0 survival_3.5-7

loaded via a namespace (and not attached):
[1] compiler_4.3.3 Matrix_1.6-5 estimability_1.4.1 tools_4.3.3
[5] coda_0.19-4.1 mvtnorm_1.2-4 splines_4.3.3 grid_4.3.3
[9] xtable_1.8-4 lattice_0.22-5

@mdjpe
Copy link
Author

mdjpe commented Apr 1, 2024 via email

@mdjpe
Copy link
Author

mdjpe commented Apr 1, 2024 via email

@rvlenth
Copy link
Owner

rvlenth commented Apr 1, 2024

An issue can be reopened via the "Reopen with comment" button just next to the "Comment" button below the comment compose window. Similarly, the issue can be closed with the "Close with comment" button. I think that since you are listed as a "participant" on this issue, I think that you can reopen it; and I think I have experienced this happening with past issue submitters. But maybe I'm wrong.

I asked you to look at issue #429, but you offer no comment on that. To tell the truth, I am surprised that your example worked in the past, due to that very bug. I suppose it has to do with the interaction. And if so (see below), some coeffficients are not available, and it's possible that what seemed to "work" in fact didn't, so that the results obtained are nonsense.

This seems to be a no-win situation. I fixed #429 on July 10, 2023 by disallowing strata factors. Restoring strata factors will possibly fix the problem here, but will revive the bug in #429.

Meanwhile, I've looked at various articles on the web about strata in Cox regression, and several of them say that you cannot do inference about the stratification factors. So that, along with other considerations, suggests to me that it is correct to not allow stratification factors in the reference grid. My intuition suggests that strata are something akin to random block effects in a linear model. This seems to be borne-out by the coefficient of inst not being included in the summary:

> summary(cfit2)
Call:
coxph(formula = Surv(time, status) ~ age + sex + wt.loss + strata(inst), 
    data = lung)

  n= 213, number of events= 151 
   (15 observations deleted due to missingness)

             coef exp(coef)  se(coef)      z Pr(>|z|)
age      0.023535  1.023814  0.010415  2.260  0.02383
sex     -0.516036  0.596882  0.190617 -2.707  0.00679
wt.loss -0.001698  0.998303  0.006848 -0.248  0.80413

        exp(coef) exp(-coef) lower .95 upper .95
age        1.0238     0.9767    1.0031    1.0449
sex        0.5969     1.6754    0.4108    0.8672
wt.loss    0.9983     1.0017    0.9850    1.0118

Concordance= 0.637  (se = 0.034 )
Likelihood ratio test= 14.06  on 3 df,   p=0.003
Wald test            = 13.32  on 3 df,   p=0.004
Score (logrank) test = 13.67  on 3 df,   p=0.003

For the other model, we get

> summary(cfit222dd)
Call:
coxph(formula = Surv(time, status) ~ strata(sex) * age, data = lung)

  n= 228, number of events= 165 

                          coef exp(coef)  se(coef)      z Pr(>|z|)
age                   0.019064  1.019247  0.011353  1.679   0.0931
strata(sex)sex=2:age -0.008353  0.991682  0.019336 -0.432   0.6657

                     exp(coef) exp(-coef) lower .95 upper .95
age                     1.0192     0.9811    0.9968     1.042
strata(sex)sex=2:age    0.9917     1.0084    0.9548     1.030

Concordance= 0.546  (se = 0.026 )
Likelihood ratio test= 3.37  on 2 df,   p=0.2
Wald test            = 3.29  on 2 df,   p=0.2
Score (logrank) test = 3.29  on 2 df,   p=0.2

This summary does include coefficients for the interaction, but not a main effect of strata(sex). But I wonder if this is just an oversight, or if this model makes sense. I don't know.

Adding further confusion is that strata() is just a function, and though its documentation says that it is designed for use with coxph models, we could in fact use it with any model, e.g. one fitted with lm():

> junk <- lm(time ~ age + sex + wt.loss + strata(inst), data = lung)
> emmeans(junk, "sex")
 sex emmean   SE  df lower.CL upper.CL
   1    276 25.3 192      226      326
   2    336 26.3 192      284      388
>    . . .
> junk2 <- lm(time ~ age + sex + wt.loss + inst, data = lung)
> emmeans(junk2, "sex")
 sex emmean   SE  df lower.CL upper.CL
   1    294 19.0 208      256      331
   2    344 23.2 208      298      390

The model junk2 makes some sense, at least mechanically, whereas junk itself is allowed, but doesn't make sense and yields different results. Something like that could explain why I was at first mistaken in allowing stratification factors as factors of inferential interest.

The bottom line

I cannot allow stratification factors as factors of inferential interest in Cox models, because even if it's appropriate to do so in the thinking of some users, we don't have regression coefficients, nor variances and covariances for them. I do not know why it seemed to work in earlier versions, but I am guessing that those results are suspect.

@rvlenth
Copy link
Owner

rvlenth commented Apr 4, 2024

OK, here's the deal. I don't know what's right -- even after going through the user manual for survival and looking at all 136 cases where the word "strata" is used (many, to my irritation, being grammatical errors where the author meant "stratum"). So I decided to play along with it. That means allowing any factors in strata in the reference grid but deleting columns of the X matrix if there are no corresponding coefficients. So now, everything runs without error, as demonstrated below. But please note that we can have no main effect of a strata() factor, so if you have strata(x) as an additive term in a model, you get the same results for each level of x -- and if you compare levels of x, you'll get zeros.

The changes I made also side-step the bug observed in #429.

Just keep in mind that I'm not keeping you from doing anything that might be nonsensical -- it's on you to do appropriate analyses.

## What the hell to do about `strata()`?

library(survival)
library(emmeans)

#### Additive model
moda <- coxph(Surv(time, status) ~ wt.loss + age + strata(sex), data = lung)
#### Interaction model
modi <- coxph(Surv(time, status) ~ wt.loss + age*strata(sex), data = lung)
### survreg additive model
sra <- survreg(Surv(time, status) ~ wt.loss + age + strata(sex), data = lung)
### survreg interactive model
sri <- survreg(Surv(time, status) ~ wt.loss + age * strata(sex), data = lung)

at = list(age = c(20,30))

emmeans(moda, ~ age*sex, at = at)
##  age sex emmean    SE  df asymp.LCL asymp.UCL
##   20   1  0.386 0.203 Inf  -0.01142     0.783
##   30   1  0.578 0.296 Inf  -0.00178     1.158
##   20   2  0.386 0.203 Inf  -0.01142     0.783
##   30   2  0.578 0.296 Inf  -0.00178     1.158
## 
## Results are given on the log (not the response) scale. 
## Confidence level used: 0.95

emmeans(modi, ~ age*sex, at = at)
##  age sex emmean    SE  df asymp.LCL asymp.UCL
##   20   1  0.459 0.247 Inf   -0.0255     0.943
##   30   1  0.688 0.364 Inf   -0.0259     1.402
##   20   2  0.247 0.332 Inf   -0.4042     0.897
##   30   2  0.369 0.492 Inf   -0.5956     1.335
## 
## Results are given on the log (not the response) scale. 
## Confidence level used: 0.95

emmeans(sra, ~ age*sex, at = at)
##  age sex emmean    SE  df lower.CL upper.CL
##   20   1   6.70 0.311 209     6.09     7.32
##   30   1   6.56 0.242 209     6.08     7.04
##   20   2   6.70 0.311 209     6.09     7.32
##   30   2   6.56 0.242 209     6.08     7.04
## 
## Results are given on the log (not the response) scale. 
## Confidence level used: 0.95

emmeans(sri, ~ age*sex, at = at)
##  age sex emmean    SE  df lower.CL upper.CL
##   20   1   6.58 0.310 208     5.97     7.20
##   30   1   6.43 0.243 208     5.95     6.91
##   20   2   6.70 0.308 208     6.09     7.31
##   30   2   6.61 0.241 208     6.13     7.08
## 
## Results are given on the log (not the response) scale. 
## Confidence level used: 0.95

Created on 2024-04-04 with reprex v2.0.2

@rvlenth rvlenth added bug and removed wontfix labels Apr 4, 2024
rvlenth added a commit that referenced this issue Apr 4, 2024
@mdjpe
Copy link
Author

mdjpe commented Apr 19, 2024

Hi, Dr. Lenth. Following your revisions, I have run the script from my previous work and it returns the results that it used to, which is correct according to my understanding. That is, it gives estimates for the non-stratified factor at each level of the stratified factor, and they are the same estimates as previously. Thank you for your help.

@mdjpe mdjpe closed this as completed Apr 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants