Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

predict: hard-to-fix bug with 'intermediate' re.form #402

Open
bbolker opened this Issue Nov 10, 2016 · 3 comments

Comments

Projects
None yet
2 participants
Owner

bbolker commented Nov 10, 2016

In predict.merMod, we can set re.form=NA to include no random effect terms in the prediction and re.form=NULL to include all random effect terms. These both work fine. Specifying a subset of the random effect terms will fail at present. This is hard to fix, because in the process of doing the new prediction we are trying to build a new model frame from the new data based on a different random-effects model term. The new model frame must be built including information about any data-based bases (e.g. poly, splines::ns terms) in the original random-effects formula, which means needing to inherit the predvars attribute of the terms attribute of the original model frame, but this will break if there are variables in predvars that are not in the new data frame.

A workaround would be to require the user to include all predictor variables in the original formula to in the new prediction data frame (even though their value would be ignored, e.g. could be NA); having the code add these unneeded variables (if we can figure out what they are) would be a way we could hack around the problem.

(The following explanation is long but I can't see how to minimize it further.)

Suppose we have a model based on these minimal data (bogus, but good enough for illustration) and this formula:

dd <- data.frame(x=1:10,f1=letters[1:2],f2=LETTERS[1:2],f3=factor(1:2))
ff <- ~ 1 + (1|f1) + (1|f2) + (poly(x,2)|f3)    

In order to construct the model frame (i.e. select variables from the data, process NA values, get ready to construct a model matrix), we use lme4::subbars to replace bars with plus signs:

(ff2 <- lme4::subbars(ff))
## ~1 + (1 + f1) + (1 + f2) + (poly(x, 2) + f3)
## (simplifies to ~f1 + f2 + poly(x, 2) + f3)
m1 <- model.frame(ff2,data=dd)
names(m1)
## [1] "f1"         "f2"         "poly(x, 2)" "f3"        
attr(terms(m1),"predvars")
## list(f1, f2,
## poly(x, 2, coefs = list(alpha = c(5.5, 5.5), norm2 = c(1, 10, 82.5, 528))), f3)

Now I want to be able to construct new model frames/matrices in two scenarios.

(1) based on these data (subset of x values, but all predictors):

dd2 <- dd[1,]

(a)

model.frame(terms(m1),data=dd2)

This works fine because we have predvars stored, in particular the information for constructing the poly() basis.

But if we tried to use the original formula rather than the terms object from the matrix we'd be in trouble

(b)

model.frame(lme4::subbars(ff),dd2)
## Error in poly(x, 2) : 'degree' must be less than number of unique points

(this is actually a good outcome - if we had a few x values we wouldn't get the error, but we would get the wrong orthogonal basis!)

(2) based on these data:

dd3 <- dd[c("f1","f2")]

but only using the first two random effects (i.e. re.form = ~(1|f1)+(1|f2)):

ff3 <- ~(1|f1) + (1|f2)

This (using the new formula) is fine:

(a)

model.frame(lme4::subbars(ff3),dd3)

But this (using the old formula with predvars in it) is not:

(b)

model.frame(terms(m1),dd3)

It fails because we don't have x (or f3) in the new model frame.

Is there a clean general solution to this problem, particularly one that doesn't involve operating on the predvars attribute (which is an unevaluated languag object, so somewhat ugly to process ...)

Owner

bbolker commented Nov 10, 2016

Still banging my head on this. Came up with (slightly ugly) machinery for removing missing variables from predvars, but that fails because we end up with an inconsistent terms object. I really don't want to write all of the machinery to remove a variable everywhere it needs to be removed from a terms object. Alternatively I can try to replace any complex/data-dependent predvars elements with their values from the original fit ... ??

@bbolker bbolker added a commit that referenced this issue Nov 10, 2016

@bbolker bbolker fighting with #402 a04e5be
Contributor

pitakakariki commented Nov 11, 2016

If you're entertaining ugly solutions...

What if we passed model.frame a data argument that's an environment. The environment has the variables we want; the parent has the variables model.frame thinks it needs but really doesn't.

Contributor

pitakakariki commented Nov 11, 2016

Something like:

dd4 <- list2env(dd3, parent=list2env(dd))
model.frame(terms(m1), dd4)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment