Adapt `summary()` and `print()` output to `pct_solution_terms_cv` #289
Thank you for explaining #316 and pointing me to this! For us, the proposed solution would be excellent. Access to solution_terms_cv would enable us to explore model selection frequency à la Heinze et al. (https://avehtari.github.io/modelselection/bodyfat.html). It would also allow us to perform further modelling on solution_terms_cv directly: rather than a single variable selection, our work typically involves variable reduction to inform expanded data collection, before a final variable selection.
PR #406 introduced some major new features concerning this issue, so you can now use them via the GitHub version (branch |
I think the `summary()` and `print()` output for `vsel` objects from `cv_varsel()` needs to be adapted to properly account for `pct_solution_terms_cv`. I'm not sure yet about the best way to achieve this, so I'll leave this open for discussion. At the very end of this comment, I started some thoughts.

Currently, only the solution path from the full-data search is shown. However, the CV folds may have differing solution paths (this is why `pct_solution_terms_cv` exists), at least as long as we have the search included in the CV, i.e., LOO CV with `validate_search = TRUE` or K-fold CV (the latter currently only supports `validate_search = TRUE`). If `pct_solution_terms_cv` is not incorporated into the `summary()` and `print()` output (as is currently the case), users might get the false impression that there is no uncertainty with respect to the solution path and that the printed values of the performance measures (e.g., the ELPD values) are based on the full-data search (which they are not, because they are based on the cross-validated searches).

Example:
In this case, the CV folds have different solution paths: `cvvs$pct_solution_terms_cv` gives

To see the different solution paths more clearly, debug the `cv_varsel()` call above using `debug(projpred:::kfold_varsel)` until `solution_terms_cv` has been created and then inspect `solution_terms_cv`:

The `print()` output, however, does not point out the uncertainty with respect to the different solution paths: `print(cvvs)` gives:
Perhaps a first step would be to remove column `solution_terms` from the `Selection Summary:` output table (see above). The `solution_terms()` accessor function can always be used to get the solution path from the full-data search. For accessing the possibly differing solution paths from the CV, a new `prop_solution_terms_cv()` (or similarly named) accessor function for `<vsel_object_with_cv>$pct_solution_terms_cv` could be created. Its documentation could say something like "For each model size `size` (rows) and each solution term `solterm` (columns), the returned matrix contains the proportion of the CV folds which have `solterm` at position `size` of the solution terms.". Alternatively, instead of creating a new `prop_solution_terms_cv()` accessor function, the existing `solution_terms()` accessor function could be extended.
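To make the proposed documentation sentence concrete, here is a minimal, language-neutral sketch (in Python, not projpred's R code) of how such a proportion matrix could be computed from per-fold solution paths. The function name is borrowed from the proposal above but is purely hypothetical here, the fold paths are made-up data, and the column order (first appearance across folds) is an assumption for illustration only.

```python
from collections import Counter


def prop_solution_terms_cv(fold_paths):
    """Hypothetical helper: proportion of CV folds having each term at each position.

    fold_paths: list of solution paths, one per CV fold, each a list of term names.
    Returns (terms, rows), where rows[size - 1][j] is the proportion of folds
    whose solution path has terms[j] at position `size`.
    """
    if not fold_paths:
        raise ValueError("need at least one CV fold")
    n_folds = len(fold_paths)
    max_size = max(len(path) for path in fold_paths)

    # Assumed column order: terms in order of first appearance across folds.
    terms = []
    for path in fold_paths:
        for term in path:
            if term not in terms:
                terms.append(term)

    rows = []
    for pos in range(max_size):
        # Count which term each fold selected at this position of its path.
        counts = Counter(path[pos] for path in fold_paths if pos < len(path))
        rows.append([counts[term] / n_folds for term in terms])
    return terms, rows


# Made-up solution paths from four CV folds (illustrative only).
fold_paths = [
    ["x1", "x2", "x3"],
    ["x1", "x3", "x2"],
    ["x2", "x1", "x3"],
    ["x1", "x2", "x3"],
]
terms, props = prop_solution_terms_cv(fold_paths)
for size, row in enumerate(props, start=1):
    print(size, dict(zip(terms, row)))
```

In this toy example, no row puts all of its mass on a single term, which is exactly the solution-path uncertainty that a single full-data solution path in the `print()` output would hide.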