Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

emtrends odd behavior with nuisance variables #448

Closed
jnugentkp opened this issue Oct 6, 2023 · 6 comments
Closed

emtrends odd behavior with nuisance variables #448

jnugentkp opened this issue Oct 6, 2023 · 6 comments
Labels

Comments

@jnugentkp
Copy link

jnugentkp commented Oct 6, 2023

Describe the bug

Hi Russ - thanks for your work on this great package. I have recently been using the emmeans() function on categorical variables in a model with 50+ variables and 250K+ observations, so employing the "nuisance" argument to simplify the grid has been important to that workflow. I recently tried the same approach using the emtrends() function with a continuous variable. It works as expected when I don't include any nuisance variables, but when including them, the output doesn't make sense to me (has only one group when it should have two; has an odd trend estimate). See example below. I can't tell if I'm misunderstanding how emtrends() handles nuisance arguments, or whether it's a bug. Thanks!
Josh Nugent
Kaiser Permanente Division of Research

To reproduce

library(emmeans)
    
set.seed(1)
n <- 5000
A <- rnorm(n)
x1 <- rnorm(n)
x2 <- rbinom(n, size = 2, prob = .25)
group <- rbinom(n = n, size = 1, prob = .5)
    
slope0 <- .25
slope1 <- .5
noise <- rnorm(n)
    
y0 <- slope0*A + group + noise
y1 <- slope1*A + group + noise
y <- ifelse(group, y1, y0)
   
group_fact <- as.factor(group)
    
mod <- lm(y ~ group_fact*A + x1 + x2)
summary(mod)
    
# Nuisance parameters that happen to be noise as well
W <- c("x1", "x2")


# Works in simulated data without specifying nuisance set,
# but not with large data set with many nuisance variables...
emtrends(object = mod, 
         specs = ~ group_fact*A,
         var = "A")

# These are the correct estimates of the group-level slopes
# group_fact        A A.trend     SE   df lower.CL upper.CL
# 0          -0.00319   0.243 0.0193 4994    0.205    0.281
# 1          -0.00319   0.490 0.0197 4994    0.451    0.529
    
# Does not work as expected
emtrends(object = mod, 
                specs = ~ group_fact*A,
                var = "A", nuisance = W)

# group_fact        A A.trend   SE   df lower.CL upper.CL
# 0          -0.00319     135 3.79 4994      128      143
# Results are averaged over the levels of: 2 nuisance factors 
# Confidence level used: 0.95

Expected behavior

I expect that the slopes should be around .25 for the "0" group and .5 for the "1" group, as they are when I exclude the nuisance argument.

@jnugentkp jnugentkp added the bug label Oct 6, 2023
@rvlenth
Copy link
Owner

rvlenth commented Oct 6, 2023

Thanks for reporting this. Clearly there's a problem, and I'll look into it.

PS --- It's a little confusing to include A in the specs, since its value isn't material to the estimates. We get less confusing output by leaving it out:

> emtrends(object = mod, 
+          specs = ~ group_fact,
+          var = "A")
 group_fact A.trend     SE   df lower.CL upper.CL
 0            0.243 0.0193 4994    0.205    0.281
 1            0.490 0.0197 4994    0.451    0.529

Confidence level used: 0.95 

@rvlenth
Copy link
Owner

rvlenth commented Oct 6, 2023

In the meantime, it is possible to do it by manually computing the difference quotients, even with nuisance factors included:

> rg = ref_grid(mod, at = list(A = c(-.5,.5)), nuis = W)
> confint(contrast(emmeans(rg, ~A|group_fact), "consec"), by = NULL)
 contrast       group_fact estimate     SE   df lower.CL upper.CL
 A0.5 - (A-0.5) 0             0.243 0.0193 4994    0.200    0.286
 A0.5 - (A-0.5) 1             0.490 0.0197 4994    0.446    0.534

Results are averaged over the levels of: 2 nuisance factors 
Confidence level used: 0.95 
Conf-level adjustment: mvt method for 2 estimates 

@jnugentkp
Copy link
Author

Thanks for that workaround - very helpful. I'll use that in the meantime and stay tuned to see if there's an update at some point. Many thanks-

@rvlenth
Copy link
Owner

rvlenth commented Oct 6, 2023

It was a bookkeeping error. I made a fix to the code for nuisance factors so that the grid slot is consistent with the linfct slot. That alone actually fixes it, but in the meantime I had made a fix to emtrends so it counts the rows in linfct instead of grid. That also fixed the bug.

> emtrends(mod, "group_fact", "A", nuis = W)
 group_fact A.trend     SE   df lower.CL upper.CL
 0            0.243 0.0193 4994    0.205    0.281
 1            0.490 0.0197 4994    0.451    0.529

Results are averaged over the levels of: 2 nuisance factors 
Confidence level used: 0.95 

@rvlenth rvlenth closed this as completed Oct 6, 2023
@rvlenth
Copy link
Owner

rvlenth commented Oct 6, 2023

PS you can install it from GitHub (see how on the main code page) and it should work

@jnugentkp
Copy link
Author

Great, thanks so much! I figured it was a something small. Glad to hear it was an easy fix. I'll use the GitHub installation method and move forward. Have a great weekend.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants