Prediction for new group levels (rstanarm vs. lme4) #268
For brms reference models, there is a related (though slightly different) issue: paul-buerkner/brms#1286.
After some internal correspondence, we came to the conclusion that this is indeed a bug and needs to be fixed (by making the lme4 prediction draw new group-level effects for new levels).
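To make the intended behaviour concrete, here is a minimal base-R sketch of "drawing new group-level effects for new levels". It is not projpred's implementation: the helper `group_effects_for()`, the made-up estimates, and the assumption that new effects are drawn from a zero-mean normal distribution with the estimated group-level standard deviation are all illustrative.

```r
# Hypothetical helper (not part of projpred or lme4): returns one group-level
# intercept per distinct requested level. Known levels re-use their estimates;
# unseen levels get a draw from N(0, sd_group^2), which is an assumption here.
group_effects_for <- function(levels_new, est_effects, sd_group) {
  levels_new <- unique(levels_new)
  is_new <- !(levels_new %in% names(est_effects))
  out <- setNames(numeric(length(levels_new)), levels_new)
  out[!is_new] <- est_effects[levels_new[!is_new]]
  out[is_new] <- rnorm(sum(is_new), mean = 0, sd = sd_group)
  out
}

set.seed(268)
est <- c(a = -1.2, b = 0.4, c = 0.8)  # made-up estimates for levels a, b, c
eff <- group_effects_for(c("a", "d", "e"), est_effects = est, sd_group = 1.5)
eff
```

Level `a` keeps its estimate, whereas `d` and `e` receive random draws instead of being fixed at zero.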
fweber144 added a commit to fweber144/projpred that referenced this issue (Mar 4, 2022):
downstream code on-the-fly. The approach implemented now changes the default of the `seed` (and `.seed`) arguments from `NULL` to `sample.int(.Machine$integer.max, 1)`. For top-level functions which are only called by the user and not within projpred (this should only be `cv_varsel()`, `project()`, and `proj_predict()`), this is not strictly necessary, but it is done for consistency with lower-level functions which are also called within projpred. The main advantage of generating seeds on-the-fly is that more seeds can easily be added and used in downstream code (which will be important for fixing issue stan-dev#268) without always having to add a new argument to top-level functions. Apart from that, this approach lets users set a seed once at the beginning of their script and then rely on the default `seed` (and `.seed`) arguments: they will then get reproducible results, which was not the case with the former implementation. However, due to the resetting of `.Random.seed`, this approach does not yet prevent the same seed from being re-used multiple times (which is probably bad practice from a theoretical point of view).
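The seed handling described in this commit message can be sketched in base R as follows. This is only an illustration under assumptions: `draw_with_seed()` is a hypothetical function, and the way it saves and restores `.Random.seed` is a guess at the general pattern, not a copy of projpred's code.

```r
# Hypothetical function (not part of projpred): draws `n` normal variates
# under a seed that is generated on-the-fly unless the caller supplies one.
draw_with_seed <- function(n, seed = sample.int(.Machine$integer.max, 1)) {
  # Remember the caller's RNG state (if there is one) and restore it on exit.
  if (exists(".Random.seed", envir = .GlobalEnv)) {
    rng_state_old <- get(".Random.seed", envir = .GlobalEnv)
    on.exit(assign(".Random.seed", rng_state_old, envir = .GlobalEnv))
  }
  # The default `seed` is evaluated lazily here, so it is drawn from the
  # caller's RNG stream and therefore depends on the user's set.seed().
  set.seed(seed)
  rnorm(n)
}

# Setting a seed once makes the default-seed call reproducible.
set.seed(2022)
first <- draw_with_seed(3)
set.seed(2022)
second <- draw_with_seed(3)
identical(first, second)

# Caveat from the commit message: the RNG state is reset on exit, so a
# further call with the default `seed` re-uses the very same seed.
third <- draw_with_seed(3)
identical(second, third)
```

Both comparisons should evaluate to `TRUE`; the second one illustrates the re-use of the same seed that the commit message flags as an open problem.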
fweber144 added a commit to fweber144/projpred that referenced this issue (Mar 11, 2022).
Note that I was talking of lme4 above (so GLMMs), but GAMMs are probably concerned, too.
For multilevel rstanarm reference models, there is an inconsistency between the reference model and the submodels with respect to the calculation of the linear predictor for new group levels in a K-fold CV: rstanarm reference models call `rstanarm::posterior_linpred.stanreg()` with no arguments other than `object` and `newdata` (`projpred/R/refmodel.R`, line 498 in d7c64e4), whereas for the submodels, `lme4::predict.merMod()` is called with no arguments other than `object`, `newdata`, and `allow.new.levels = TRUE` (`projpred/R/divergence_minimizers.R`, line 427 in d7c64e4).
My question is now: Do you think this needs to be changed and, if so, how? The problem is that rstanarm's options for dealing with new levels are not quite the same as lme4's options. The only common possibility (as far as I understand these two packages) is to set the group-level effects for all levels (i.e., existing and new ones) to zero. This, however, has important implications for the evaluation of the models' predictive performance: Adding a group-level effect to a submodel which previously only included population-level effects would then make no (or, if there is a dispersion parameter, only little) difference with respect to predictive performance measures like ELPD, RMSE, etc. In other words, due to the existing levels, the value of the performance measures might look worse than the predictive performance actually is. Apart from that, I'm not sure if, for families with a dispersion parameter, the original dispersion parameter estimates would then still be appropriate (similarly to this).
Needed for the following examples:
Example for how rstanarm deals with existing and new levels:
The same for lme4:
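A minimal sketch of the lme4 behaviour, assuming lme4 is installed. The `sleepstudy` data (shipped with lme4), the random-intercept model, and the level name `"new_subject"` are illustrative choices and not taken from the issue.

```r
library(lme4)

# Random-intercept model on lme4's bundled sleepstudy data.
fit <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# One existing level ("308") and one level that was not seen during fitting.
newdat <- data.frame(
  Days = c(0, 5),
  Subject = factor(c("308", "new_subject"))
)

# Predictions including group-level effects; new levels are allowed.
pred_all <- predict(fit, newdata = newdat, allow.new.levels = TRUE)
# Population-level predictions only (all group-level effects set to zero).
pred_pop <- predict(fit, newdata = newdat, re.form = NA)

cbind(pred_all, pred_pop)
```

For the existing subject, the two predictions differ by the estimated group-level intercept. For the new subject, they coincide, because `allow.new.levels = TRUE` makes lme4 fall back to the population-level prediction, i.e., a group-level effect of zero rather than a newly drawn one.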