predict: hard-to-fix bug with 'intermediate' re.form #402
Comments
Still banging my head on this. Came up with (slightly ugly) machinery for removing missing variables from |
bbolker added a commit that referenced this issue Nov 10, 2016: a04e5be
If you're entertaining ugly solutions... What if we passed |
|
Something like:
|
bbolker commented Nov 10, 2016
In predict.merMod, we can set re.form=NA to include no random effect terms in the prediction and re.form=NULL to include all random effect terms. These both work fine. Specifying a subset of the random effect terms will fail at present. This is hard to fix, because in the process of doing the new prediction we are trying to build a new model frame from the new data based on a different random-effects model term. The new model frame must be built including information about any data-based bases (e.g. poly, splines::ns terms) in the original random-effects formula, which means needing to inherit the predvars attribute of the terms attribute of the original model frame, but this will break if there are variables in predvars that are not in the new data frame.

A workaround would be to require the user to include all predictor variables from the original formula in the new prediction data frame (even though their values would be ignored, e.g. could be NA); having the code add these unneeded variables (if we can figure out what they are) would be a way we could hack around the problem.

(The following explanation is long but I can't see how to minimize it further.)
Suppose we have a model based on these minimal data (bogus, but good enough for illustration) and this formula:
In order to construct the model frame (i.e. select variables from the data, process NA values, get ready to construct a model matrix), we use lme4::subbars to replace bars with plus signs:
Now I want to be able to construct new model frames/matrices in two scenarios.
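A minimal base-R sketch of the kind of setup described so far may help. Everything in it is an assumption for illustration: the data frame dd, its values, and the formula are hypothetical stand-ins rather than the original example (only the variable names x, f1, f2, f3 come from the text). The bar-to-plus substitution is written out by hand instead of calling lme4::subbars, so the block runs without lme4.

```r
## Hypothetical stand-in data; not the data from the original post
set.seed(101)
dd <- data.frame(
  y  = rnorm(12),
  x  = rep(1:6, 2),
  f1 = factor(rep(1:2, each = 6)),
  f2 = factor(rep(1:3, 4)),
  f3 = factor(rep(1:4, 3))
)

## Hypothetical mixed-model formula with a data-based basis (poly) in a
## random-effects term
form <- y ~ 1 + (1 | f1) + (1 | f2) + (poly(x, 2) | f3)

## The same formula with each bar replaced by a plus sign, written by hand
## (this is the transformation the text attributes to lme4::subbars)
form_sub <- y ~ 1 + (1 + f1) + (1 + f2) + (poly(x, 2) + f3)

## Build the model frame; model.frame() records the poly() coefficients
## in the predvars attribute of the terms
mf <- model.frame(form_sub, data = dd)
pv <- attr(attr(mf, "terms"), "predvars")
print(pv)
```

The printed predvars call contains poly(x, 2, coefs = ...), which is the stored information needed to rebuild the same orthogonal basis on new data.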
(1) based on these data (subset of x values, but all predictors):
(a)
This works fine because we have predvars stored, in particular the information for constructing the poly() basis.
But if we tried to use the original formula rather than the terms object from the matrix we'd be in trouble:
(b)
(this is actually a good outcome - if we had a few x values we wouldn't get the error, but we would get the wrong orthogonal basis!)
(2) based on these data:
but only using the first two random effects (i.e. re.form = ~(1|f1)+(1|f2)):
This (using the new formula) is fine:
(a)
But this (using the old formula with predvars in it) is not:
(b)
It fails because we don't have x (or f3) in the new model frame.
Is there a clean general solution to this problem, particularly one that doesn't involve operating on the predvars attribute (which is an unevaluated language object, so somewhat ugly to process ...)?
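Both scenarios can be reproduced with base R alone. The sketch below is only an illustration under stated assumptions: dd, nd1, nd2 and the formula are hypothetical stand-ins for the original example, and the bar-to-plus formula is written by hand rather than produced by lme4::subbars. The last step lists which variables the stored terms expect but the new data lack, i.e. the "figure out what they are" part of the workaround; it does not attempt to add them.

```r
## Hypothetical stand-in data and formula; not the original example
set.seed(101)
dd <- data.frame(y = rnorm(12), x = rep(1:6, 2),
                 f1 = factor(rep(1:2, each = 6)),
                 f2 = factor(rep(1:3, 4)),
                 f3 = factor(rep(1:4, 3)))
form_sub <- y ~ 1 + (1 + f1) + (1 + f2) + (poly(x, 2) + f3)
mf <- model.frame(form_sub, data = dd)
tt <- attr(mf, "terms")                      # carries the predvars attribute

## Helper (hypothetical name): pull the poly() columns out as a plain vector
basis <- function(frame) as.vector(unclass(frame[["poly(x, 2)"]]))

## Scenario 1: subset of x values, all predictors present
keep <- dd$x <= 4
nd1  <- dd[keep, ]
mf1a <- model.frame(tt, data = nd1)          # (a) terms: stored coefs reused
mf1b <- model.frame(form_sub, data = nd1)    # (b) formula: basis recomputed
basis_orig <- as.vector(unclass(mf[["poly(x, 2)"]])[keep, ])
print(all.equal(basis(mf1a), basis_orig))    # same basis as the original fit
print(isTRUE(all.equal(basis(mf1b), basis_orig)))  # recomputed basis differs

## Scenario 2: new data with only f1 and f2
nd2  <- dd[, c("f1", "f2")]
mf2a <- model.frame(~ 1 + (1 + f1) + (1 + f2), data = nd2)  # (a) new formula
res2b <- tryCatch(                           # (b) old terms with predvars
  model.frame(delete.response(tt), data = nd2),
  error = function(e) e
)
print(conditionMessage(res2b))

## Variables the old terms need that the new data do not supply
missing_vars <- setdiff(all.vars(delete.response(tt)), names(nd2))
print(missing_vars)
```

Note that if placeholder columns were filled with NA, model.frame's default na.action would drop those rows, so any padding approach would presumably also need na.action = na.pass.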