Miscellaneous modifications and fixes #223

fweber144 · 2021-10-07T16:01:00Z

This PR performs some minor enhancements, including some bug fixes. See the commit messages for details.

…`solution_terms.projection()` method.

…or messages. Use the new `solution_terms()` S3 generic.

…l()`. Make the defaults clear from the function signature. This also fixes a bug: `is.null(cv_search)` was not the default anymore (neither in `project()`, nor in `varsel()`, nor in `cv_varsel()`), especially not for `datafit`s where this condition is used to set `cv_search` to `FALSE`.

…, namely: ```r library(rstanarm) data(bball1970) options(mc.cores = parallel::detectCores(logical = FALSE)) myfit <- stan_glmer( cbind(Hits, AB - Hits) ~ (1 | Player), data = bball1970, family = binomial("logit"), prior_intercept = normal(-1, 1), seed = 101 ) library(projpred) myprj <- project(myfit, solution_terms = c("(1 | Player)"), nclusters = 2) bball1970_new <- data.frame(Player = "newplayer1", Hits = 11, AB = 45) mylinpred <- proj_linpred(myprj, newdata = bball1970_new) ```

…it"`s.

…arsel()` and then `cv_varsel()` should pass all these arguments to `parse_args_cv_varsel()`.

…modified. Make the defaults clear from the function signature.

otherwise, `cv_search = FALSE` causes the error "Please provide an `object` of class "vsel" or use `cv_search = TRUE`." ).

…y (which doesn't seem to have had any consequences up to now, though): Argument `obs` of `k_refmodel$fetch_data()` referred to the original dataset whereas argument `obs` of `k_refmodel$family$mu_fun()` referred to the k-fold dataset. This also affected how `newdata` was subsetted.

becomes `$fetch_data()`) since this was only used in `$mu_fun()` and can be solved more easily there.

…ric(ynew)` (since `inherits(1L, "numeric")` is `FALSE` but `is.numeric(1L)` is `TRUE` ).

…l.R (the more appropriate place).

… and notify the user of the possible solution terms.

… are as many of them as there are clusters of draws), not just of the first one.

… (`offsetnew` and `weightsnew` may be numeric vectors, for example).

…med. This is important for `extract_model_data()` functions which have `...` instead of explicit `wrhs` and `orhs` arguments (see, e.g., brms version <= 2.16.1).

…)` (and if necessary, adapt them). This is important for `extract_model_data()` functions which return weights and offsets of length 0 (see, e.g., brms version <= 2.16.1).

…cal vector`.

`.get_sub_summaries()` is not needed.

…onsistency with `cv_varsel()` and to avoid confusion with `p_sel`, `p_pred`, and `p_ref`).

fweber144 · 2021-10-07T18:14:10Z

I forgot to mention the following:

as.matrix.projection() doesn't seem to be possible for datafits (where project() has been applied to an object of class vsel). I don't know if this is on purpose. For this PR, I was assuming yes, so we now throw an appropriate error message.
project() doesn't seem to be possible for datafits directly. I don't know if this is on purpose. For this PR, I was assuming yes, so we now throw an appropriate error message.
For this PR, I was assuming that for datafits, the default of argument cv_search in varsel() and cv_varsel() should be FALSE because project() behaves that way and because the outdated is.null(cv_search) condition in parse_args_varsel() hinted at that behavior.

* can we run for 2 hours? * minor fix * average dis? * force numeric * drop dimension if vector * fix lpd computation #105 * speed up dependencies? * does this work? * fake commit * Update NEWS file. (#111) * expect length to be n * bump version * error parsing * Fix #106 (#112) Fix output dimensions of lpd output when only 1 observation is present. * Update NEWS file and bump version number (#113) * Add missing `else` cases for `weights` and `offset` in `.extract_model_data()`. (#114) * Fix formula parsing with multidimensional interactions (#103) For now, `t2(x, z)` can be added without `x` or `z`. * fix #115 with a single predictor solution_terms_cv_ch is a vector, ensure it's always a matrix * Update/fix NEWS file. (#117) * update NEWS * Fixes for argument `nterms` of `proj_helper()` (#110) * In proj_helper(): `list(...)$nterms` is always `NULL` as there is a predefined argument `nterms`. * Fix a bug for calls like ```r mylinpred <- proj_linpred(project(myfit, solution_terms = c("V3")), nterms = 2) ``` which previously raised an error (but should work without raising an error) and for calls like ```r mylinpred <- proj_linpred(project(myfit, solution_terms = c("V3")), nterms = 1) ``` which previously did not raise an error (but should raise an error). * Fix what is probably a precedence typo, see `?Syntax`. * In proj_helper(): Use `count_terms_chosen(x$solution_terms)` instead of `(length(x$solution_terms) + 1)` to be better prepared for no-intercept models. * Argument `nterms` in proj_helper() is passed to project(), so it doesn't match the output of count_terms_chosen() (since count_terms_chosen() counts the intercept but argument `nterms` of project() does not). Therefore, add +1 to `nterms` (if specified). This also affects the behavior of `nterms` in the case `inherits(object, "projection") || (length(object) > 0 && inherits(object[[1]], "projection"))`, i.e. when `nterms` is used to filter `object`. * Do not use `nterms` in the case `inherits(object, "projection") || (length(object) > 0 && inherits(object[[1]], "projection"))`, i.e. do not use `nterms` to filter `object` if it results from project() (filtering wasn't possible before commit 8909d38e372f5945364d6083d33a02856555cf32 anyway). * Fixup commit cd04242196ca6d915db9b8c79bc5ba1f3939577f. * Update/fix NEWS file and add contributor (#118) * fix nclusters[_pred] handling!! only use clusters for less than 20 draws * don't use 5 clusters for kfold predictions * make sure we are using rstanarm * Fix a `fetch_submodel()` for intercept-only submodels (#119) * Fix fetch_submodel() for intercept-only "datafits". * Fix a unit test according to the change by commit 9264ad830b71385e77bb5e153b44237ed1f70951. * Update NEWS file. * Miscellaneous fixes and improvements (#92) * Increase development-specific version number. * Remove "*.Rproj" from Git ignore list. * Add RStudio's default .Rproj file. * projpred.Rproj: Add "StripTrailingWhitespace: Yes". * projpred.Rproj: Add "AutoAppendNewline: Yes". * Add "/src/projpred.dll" to Git ignore list. * In the .Rproj file: Set "PackageRoxygenize" field to common values. * NEWS.md: Remove unnecessary whitespace. * Docs for varsel(): Correct typos and improve formatting (and reflow the roxygen2 comments and then re-document). * Docs for varsel(): Correct a possible typo in the documentation of argument `object`. * Docs for varsel(): Remove defaults for `ndraws` and `ndraws_pred` as they could be misunderstood by the user (in fact, the default for `ndraws` and `ndraws_pred` is to be ignored since `nclusters` and `nclusters_pred` are specified by default). Correct the default for `nclusters` (20 instead of 10). Correct the default for `nclusters_pred` mentioned in a comment in the code (400 instead of 5). NEWS.md: Incorporate the increase of the defaults for arguments `nclusters` and `nclusters_pred` (already at earlier versions, 2.0.3 and 2.0.4). * Docs for varsel(): Re-document. * Docs for project(): Correct a possible typo for argument `nterms`, namely replaced "If a list, [...]" by "If a numeric vector, [...]". Also reflow the corresponding roxygen2 comment and re-document. * Docs for project(): Unify two `@param regul` items (one of which had slipped into the previous `@param` item). * Slightly improve an error message in project(). * Comment `fit_glm_callback()` (does not seem to be used). * Use version number 2.0.4.9000 instead of 2.0.4.9001 (to avoid too fine-grained version numbers). * Remove an unused ellipsis causing the message `** byte-compile and prepare package for lazy loading\n Note: ... may be used in an incorrect context` upon package installation. * Simplify/generalize documentation of argument `object` of proj_linpred() and proj_predict() and a related error message. * Re-document. * Re-document. * After the merge with 'develop': Re-apply fixes and formatting in the documentation of varsel.R * Correct documentation of argument `seed` of varsel.refmodel(). * Fix unit tests (to match commit 4dfb4b74979ee4f1ff78ef931bd9cf57c074d244). * Improve some test titles. * Use backticks consistently to indicate code. * Fix typo in the NEWS file. * Improve an error message in proj_helper(). * Slightly improve documentation for argument `integrated` of proj_linpred(). * Correct documentation for proj_linpred() and proj_predict(): Only proj_linpred() supports output element `lpd`. Also slightly improve the formatting. * Update NEWS file. * Avoid the `no visible binding for global variable` note in R CMD check (see <https://dplyr.tidyverse.org/articles/programming.html#eliminating-r-cmd-check-notes>). * Fix a syntax error in a commented part of the vignettes' YAML headers. * Function where() is not exported by package tidyselect, so redefine it in misc.R to avoid a note in R CMD check which would occur for usage of tidyselect:::where(). * Remove a mistakable comment in compute_lpd(). * Slightly improve a comment in proj_linpred(). * Improve an error message in the unit tests. * Simplify definition of `ynew` in proj_linpred(). * Simplify calls to extract_model_data(). * Simplify the div_minimizer() call in project_submodel() (addresses GitHub issue #88). * Remove an unnecessary `form <- refmodel$formula` definition. * Slightly improve an error messsage for `nterms` in proj_helper(). * Use as.vector() instead of as.numeric() (for consistency with the `pred` definition above). * Revert the creation of a new object `predictions` (introduced by commit 26d83e63c8a7f28bff44a9be002ed72c65ed273f) which is not necessary anymore since commit be0c1dc5639bbf5e00f004f1ce6ded904de208c4. * Remove argument `nterms` in proj_linpred(), proj_predict(), and proj_helper() since it may simply be passed via the ellipsis. * Improve documentation of the ellipsis for proj_linpred() and proj_predict(). * Rename the local function proj_predict() to proj_predict_local() to distinguish it from the exported function proj_predict(). * After commit 32a1a53285eb0127cb29503a87fc07674269ecbf, there is no need anymore for proj_helper()'s internal objects `nterms` and `projected_sizes`. * Use `projs` and `proj` consistently in `proj_helper()` (and thereby save the memory for `proj`). * Improve the documentation of the return value of proj_linpred() and proj_predict(). * NEWS file: Create a new heading for the current version. * Revert "fake commit" This reverts commit 1388efb27bed17ca1158cab48d5a82561474cd8d. * Remove packages MASS and optimx from Imports field to avoid the R CMD check note ``` Namespaces in Imports field not imported from: ‘MASS’ ‘optimx’ All declared Imports should be used. ``` * Rename the local functions passed by proj_linpred() and proj_predict() to proj_helper() (so that they now contain a midfix "_linpred_" and "_predict_", respectively). * Move proj_linpred_aux() and proj_predict_aux() out of proj_linpred() and proj_predict() (respectively) to make debugging easier. * Replace `proj_predict_local` by `onesub_fun` to avoid confusion with proj_predict(). * Be explicit about the origin of the objects used in onesub_fun() (for better readability, but also to avoid the R CMD check note `no visible binding for global variable`). * Fixup last commit f1c1545c900403001c6f015d3550da2505448c04 (" Be explicit about the origin of the objects used in onesub_fun() (for better readability, but also to avoid the R CMD check note `no visible binding for global variable`). "): Also handle proj_predict()'s argument `ndraws`. * Set the seed for proj_predict() inside of proj_predict(), not inside of proj_helper() (to show explicitly that proj_linpred() does not require a PRNG). * Avoid a clash between proj_predict()'s former arguments `ndraws` and `seed` with project()'s arguments `ndraws` and `seed`. * Fix test_misc.R (due to commit d9fea25a649f1290643415e44807ccceec1da20e). * Fix test_proj_pred.R (due to commit d9fea25a649f1290643415e44807ccceec1da20e). * Fix a test name. * Fix a NEWS entry (see <https://discourse.mc-stan.org/t/projpred-behavior-of-argument-search-terms/19132>). * Fix the documentation of argument `search_terms` in varsel() and cv_varsel() since `search_terms` is usually a character vector, not a list (it seems like `search_terms` can also be a list, but doesn't seem to be the usual case, see e.g. <https://discourse.mc-stan.org/t/projpred-behavior-of-argument-search-terms/19132> or <https://discourse.mc-stan.org/t/projection-predictive-feature-selection-for-multilevel-phylogenetic-models/21402>). Also mention that the intercept needs to be included explicitly. Use the same documentation of argument `search_terms` in varsel() and cv_varsel(). * Replace `source(file.path("helpers", "SW.R"))` by `source(testthat::test_path("helpers", "SW.R"))` to make debugging of tests easier. * Move the helper functions proj_linpred_aux() and proj_predict_aux() *below* their respective main function (proj_linpred() and proj_predict(), respectively). * Incorporate the fix from commit 387d73a1706ad07e7b84922ef0322c26bad03bc2 of branch fweber144:fix_nterms_proj_helper (see PR #110). * Add argument `nterms_filter` to proj_linpred() and proj_predict(). * Fixup commit c1f4c17d571199cb353dae451af58c99a2055015 for the case `inherits(object, "projection")`. * Replace `nterms_filter` by `filter_nterms` to avoid a clash with argument `nterms` which may be passed via `...`. * Fix a NEWS entry. * Fix a test name. * Improve the documentation for proj_linpred()'s and proj_predict()'s return value. * Fix a typo in the documentation of argument `offsetnew`. * Re-document. * Fix a comment. * Save memory by avoiding a separate definition of `p_sel <- search_path$p_sel`. * Break lines in long paste()-d test names. * Add another line break in a test name. * Re-indent the test files. * Reflow the test files to margin width 80. * Re-indent the R files. * Re-document. * Reflow the R files to margin width 80. * Re-document. * Apply `min(NCOL(refmodel$mu), [...])` also to user-specified `ndraws`, `nclusters`, `ndraws_pred`, and `nclusters_pred`. Remove unnecessary `ndraws <- ` assignments. * Pre-specify the defaults for `ndraws` and `ndraws_pred` in the function arguments. Update the documentation. * Use the same behavior for arguments `ndraws` and `nclusters` in project() as in varsel() and cv_varsel(). * Minor wording improvement in the documentation. * Update NEWS file. * Update NEWS file. * Fix the position (projpred version) of some NEWS entries added by PR #92. (#120) * refmodel: Remove brmsfit-specific code (#122) * Fix outdated tests (`ndraws` and `nclusters` are now internally cut off at `NCOL(refmodel$mu)`, so there is no error thrown by project() anymore). (#121) * Add checks for required arguments passed via the ellipsis. (#123) * Disallow no-intercept models, improvements for efficiency (#124) * Avoid calling pseudo_data() if not necessary. * Remove an unnecessary assignment. * Remove unused code. * Throw an error for a reference model without an intercept (see PR #100 and in particular <https://github.com/stan-dev/projpred/pull/100#issuecomment-809464069>). * Insert GitHub PR number. * hotfix: explicitly check formula object * hotfix: fix readme example * Fix `ndraws` remnants (#128) * Fix a bug caused by the fact that `ndraws` must be non-NULL now (since commit 16d5afd40908207ebdc6da32e3b6671bf6c581b5). * Fix a warning message. * Fix output of print.vselsummary() (either `ndraws` or `nclusters` is used, but not both) and take into account that `ndraws` must be non-NULL now (since commit 16d5afd40908207ebdc6da32e3b6671bf6c581b5). * Fix PR #128 (#129) * Suppress an expected warning when using as.matrix.projection() for a clustered projection. * Fix a bug introduced by PR #128. * vignettes/quickstart_glmm.Rmd: Remove the term `(1 | phylo)` from the Poisson model (since we have (#133) `identical(data_pois$phylo, paste0("sp_", data_pois$obs))`, so including both `obs` and `phylo` doesn't make sense). * Change arguments `size_sub` and `seed_sub` (#135) * Apply `size_sub` only to clustered projection. For non-clustered projection, all draws are used. * Update documentation. * Rename argument `seed_sub` to `seed_ppd`. * Update unit tests to the new handling of `size_sub`. * Rename argument `size_sub` to `nclusters_resample`. * Update NEWS.md * Use brms's naming convention also for submodels of class `"subfit"` (#132) * Resolve an S3 generic/method inconsistency (issued by R CMD check). * For submodels of class "subfit", make the column names of as.matrix.projection()'s output matrix consistent with other classes of submodels (by fixing coef.subfit()). * Update NEWS.md * Update unit tests in test_as_matrix.R to also reflect the use of brms's naming convention. * NEWS.md: Insert GitHub PR number #132. * Minor cleaning. * Move the project() calls out of SW(). * Avoid varsel() in the unit tests for as.matrix.projection() (for the sake of speed). * Re-add SW() for the binomial reference model (with > 1 trials), see GitHub issue #136. * Minor improvements for as.matrix.projection()'s unit tests. * Refactor the unit tests for as.matrix.projection() so they may be extended more easily. * as.matrix.projection(): Also test the case of a single (clustered) draw. * Reduce the number of draws to be tested (for the sake of speed). * Add unit tests for as.matrix.projection() in case of Gaussian and binomial reference models with multilevel but without additive terms. * Fix `as.matrix.projection()` for 1 (clustered) draw (#130) * Simplify as.matrix.projection(). * Fix a bug (hard-coded use of as.matrix.lm() for the case of a single draw after projection). * Add `@keywords internal` to as.matrix.projection(). * Export methods which were previously unexported (necessary to fix the recent commit "Simplify as.matrix.projection()."). * Further simplify as.matrix.projection(). * Remove the as.matrix.noquote() method (doesn't seem to be used and overwrites as.matrix.noquote() from the "base" package). * Resolve an S3 generic/method inconsistency (NOTEd by R CMD check). * Do not import the "tidyselect" package (already imported by "dplyr" and causing a NOTE in R CMD check since tidyselect:::where() is imported only indirectly). * Add as.matrix.glmerMod() which is currently redirecting to as.matrix.lmerMod(). * Add a comment to `%:::%`. * Use `%:::%` consistently. * Move `%:::%` from R/gamm4.R to R/misc.R * Throw an error for unavailable as.matrix.<class_name>() methods. * Add missing ellipses. * Add as.matrix.gamm4(). * Update NEWS.md * Fix typo in the documentation. * Fix `plot.vsel()` if there is just the intercept-only submodel (#138) * Fix a bug for `nterms_max` in plot.vsel() if there is just the intercept-only submodel. * Add a more informative error message in plot.vsel() if there is just the intercept-only submodel. * Update NEWS.md * Fix error for L1 search applied to an empty (i.e. intercept-only) reference model (#139) * Fix a bug for L1 search applied to an empty (i.e. intercept-only) reference model. * Update NEWS.md * Minor modifications (mostly no-op) (#140) * Fix a comment. * Throw a proper error message for nonsupported families. * Fix documentation of project()'s ellipsis. * Fix an error message. * Fix a bug in compute_lpd() (wrong margin taken for averaging across the posterior draws). But this bug shouldn't be relevant as the corresponding case (`integrated && !is.null(dim(lpd))`) should not occur. * compute_lpd(): Throw an error in the case `integrated && !is.null(dim(lpd))` as it should not occur. * Replace a hard-coded method by its generic. * Remove a line which is unnecessary due to replace_intercept_name(), replace_population_names() and the avoidance of "alpha". * Remove README.md from .Rbuildignore (to show the README also on the CRAN website). * Add warnings for argument `solution_terms` of project(). * Fix documentation. * Replace remaining occurrences of numeric `solution_terms` by character. * Update NEWS.md * Remove unused functions. * Improve documentation for `nterms_max` (state explicitly that it doesn't count the intercept). * Improve documentation for `nterms` (state explicitly that it doesn't count the intercept). * Move the common part of `varsel()` and `cv_varsel()`'s documentation out in a single file (a roxygen "template") and refer to that. * Fix a bug for L1 search applied to an empty (i.e. intercept-only) reference model. * Update NEWS.md * Add the `man-roxygen` folder to .Rbuildignore * Unify the documentation for argument `verbose`. * Fix an error message (previously giving `argument is missing, with no default`). * Fix documentation of `nterms_max`. * Improve formatting of some documentation. * Fix documentation of `method`. * Improve documentation of `method`. * Minor wording improvements in the documentation. * Fix strange list names. * Update NEWS.md * Clarify docs for `cvfun`/`cvfits` (`cvfits` takes precedence over `cvfun`). * Clarify docs for `cvfits` (mentioning attributes `"K"` and `"folds"`). * Remove a trailing comma. * Clarify docs for `cvfits` (mentioning sublist `"fits"`). * Minor wording improvement in the documentation of argument `object`. * Minor wording improvement in the documentation. * Improve documentation of argument `object` (in the "get-refmodel" documentation tag). * Improve documentation of argument `formula` (in the "get-refmodel" documentation tag). * Further improve documentation of argument `formula` (in the "get-refmodel" documentation tag). * Improve title and description in the "get-refmodel" documentation tag. * Improve documentation of several arguments in the "get-refmodel" documentation tag. * Improve documentation of several arguments in the "get-refmodel" documentation tag. * Add an error message for `is.null(extract_model_data)`. * Harmonize error messages. * Remove argument `fetch_data` of `get_refmodel.default()` as this argument seems to be a remnant of older code. * Remove argument `offset` of `get_refmodel.default()` as this argument seems to be a remnant of older code. * Set defaults for `get_refmodel.default()`'s arguments where possible. * Remove argument `wobs` of `get_refmodel.default()` as this argument seems to be a remnant of older code. * Update NEWS.md * NEWS.md: Insert GitHub PR number. * Fix PR #135 (#141) * Revert the order of `proj_linpred()`'s and `proj_predict()`'s arguments as it is on branch `master`. * Rename argument `nclusters_resample` to `nresample_clusters` to avoid a clash in partial matching when supplying `project()`'s argument `nclusters` via `...`. * Rename argument `seed_ppd` to `ppd_seed` to avoid a clash in partial matching when supplying `project()`'s argument `seed` via `...`. * Fix `extract_model_data()` in `get_refmodel.stanreg()` (#142) * Fix a bug in get_refmodel.stanreg(). * Update NEWS.md * Enhancements (and fixes) for `proj_linpred()` and `proj_predict()` (#143) * `test_datafit.R`: Improve indentation and remove a trailing comma. * Add brackets to if() to avoid a strange error in RStudio. * Avoid an unnecessary repetition of proj_linpred() (otherwise, `predd_list` would be unused). * `test_proj_pred.R`: Generalize the value supplied to `info`, making the local objects `i_inf` obsolete. * Improve indentation. * Generalize the iterator for `i` (always `seq_len(nfams)` with `nfams <- length(fam_nms)`). * Improve/correct test names and add test "output of proj_linpred is sensible with \"refmodel\" object as input". * Iterate over names, not indices (therefore also remove `nfams`). * Add expectations. * Use integer instead of numeric (to facilitate comparisons). * Speed up the tests from `test_proj_pred.R` by specifying (where necessary) `nclusters` and `nclusters_pred` in calls to varsel(), project(), proj_linpred(), and proj_predict(). * Fix an expectation after using `nclusters` and `nclusters_pred` more extensively. * Avoid the soft-deprecated argument `info` of `testthat::expect_equal()` and friends by following the advice from `?testthat::quasi_label` (see also `?"equality-expectations"`). * Remove old argument `intercept`. * Add object `nresample_clusters_tst` and fix an incorrect expectation. * Move projpred-related code out of SW() where warnings are not expected. * Remove unnecessary specifications of `nresample_clusters`. * Move some code for better readability. * Argument `nterms_max` doesn't count the intercept, so reduce to `nterms`. * Improve/correct test names and add test "output of proj_predict is sensible with \"refmodel\" object as input". * Use a consistent order of arguments `transform` and `integrated`. * Efficiency improvement in proj_linpred_aux(): Use `crossprod(x, y)` for `t(x) %*% y`. * Let `proj_linpred()` always return a S x N matrix, no matter the value of `integrated`. * Minor wording improvements in the documentation of `proj_linpred()`. * Add dim() tests for proj_linpred() and improve some of the existing tests. * Add test "proj_linpred's output is also correct in edge cases". * Remove duplicated test/expectation. * Extend the output structure (dimensionality) tests for `proj_predict()`. * Use a consistent test name structure. * Add test "proj_predict: output structure is also correct in edge cases". * Remove unnecessary uses of `y`. * Also test `ndraws` instead of `nclusters`. * `proj_linpred_aux()`: Do not transpose `pred` internally; only at the very end. This is needed for families where the link function and the inverse link function are no simple scalar functions. Such families may arise, e.g., in the augmented-data projection. * Remove remnant argument `integrated´ from `compute_lpd()`. * Average the LPD (across the projected posterior draws) instead of taking the LPD at the averaged linear predictors. * Simplify `log_weighted_mean_exp()`. * Docs: Use math mode. * Update NEWS.md * Enhance documentation. * Use `testthat::expect_named()`. * Fix an example (remnant occurrence of argument `nv`). * Enhance formatting and comments in the example for `proj_linpred()` and `proj_predict()`. * Insert GitHub PR number in NEWS.md * Offer the possibility to run `proj_linpred()` without supplying the response in `newdata`. Before, this reprex: ```r library(rstanarm) data(bball1970) options(mc.cores = parallel::detectCores(logical = FALSE)) myfit <- stan_glm( cbind(Hits, AB - Hits) ~ 1, data = bball1970, family = binomial("logit"), prior_intercept = normal(-1, 1), seed = 101 ) library(projpred) set.seed(36467) i_resampled <- sample.int(nrow(bball1970)) stopifnot(identical(sort(i_resampled), seq_len(nrow(bball1970)))) mynewdata <- bball1970[i_resampled, setdiff(names(bball1970), "Hits"), drop = FALSE] mypl <- proj_linpred(myfit, solution_terms = character(), nclusters = 2, newdata = mynewdata) ``` threw the following error: ``` Error in eval(formula[[2]], data, environment(formula)) : object 'Hits' not found ``` Now, it returns `NULL` for output element `lpd`. * Update NEWS.md * `proj-pred` docs: Use `\code{\link{<function_name>}}`. * `proj-pred` docs: Enhance the documentation of arguments `newdata`, `offsetnew`, `weightsnew`, `transform`, and `integrated`. * Fix documentation for `ppd_seed`. * Fix documentation of `weightsnew`. * `proj-pred` docs: Use "only." instead of "only:" to avoid ambiguities. * `proj-pred` docs: Minor improvements. * Re-document. * Documentation of `ppd_seed`: Clarify the meaning of the abbreviation "PPD". * Replace `ppd_seed` by `.seed`. * Minor modifications, mostly documentation (#153) * Add argument `extract_y` also to `extract_model_data()` in `get_refmodel.default()`. * Let the user pass arguments via the ellipsis (`...`) from `get_refmodel.default()` to `init_refmodel()`. Enhance documentation in `refmodel.R`. Minor cleaning in `refmodel.R`. * Re-document. * Minor cleaning in the docs of `project.R`. * Fix documentation (KL output of `project()`). * Update and fix documentation (output of `project()`). * Re-indent documentation (output of `project()`). * Re-document. * Minor enhancements in the documentation of `project()`. * Remove `glm_forward_c()` and `glm_forward()` (unused). * Improve documentation of argument `seed` and perform minor cleaning. * Re-document. * Add a missing ellipsis. * `linear_multilevel_mle()`: Remove unused argument `regul`. * Use a roxygen template for arguments `newdata`, `offsetnew`, `weightsnew`. * Enhance a comment. * Don't apply the inverse-link within `ref_predfun_datafit()`. * In `ref_predfun_datafit()`, the case `!is.null(fit)` should not occur (since a `"refmodel"` of additional class `"datafit"` is only possible if `object` (which is later used as `fit`) is `NULL`). * Simplify `ref_predfun_datafit()`. * Document that argument `ref_predfun` is degenerate ( i.e., ignored and the internal placeholder always returns `NA`) if argument `object` is `NULL` (i.e., for a `"refmodel"` of additional class `"datafit"`). * Minor cleaning in `varsel.R`. * Minor cleaning in `cv_varsel.R`. * Minor cleaning in `methods.R`. * Enhance documentation for `project()`'s argument `nterms`. * Remove `fit_glm_callback()` (commented). * Minor cleaning in `divergence_minimizers.R`. * `parse_args_varsel()`: Improve readability of `nterms_max`-related code. * `parse_args_varsel()`: Improve readability of `nterms_max`-related and `intercept`-related code. * `NEWS.md`: Shorten commit hashes to 7 characters. * Update `NEWS.md`. * fix #161: avoid Inf nclusters_pred when both NULL * fix #162: offsetnew not always used in predict.refmodel * fix #164: suggest_size consistently shouldn't include the intercept * fix #160: wrong handling of cvfits$omitted * fix #159: don't include intercept in solution_terms * fix #158: don't add main effects to match solution_terms * fix #157: provide informative error when projecting fit object with cv_search = FALSE * fix #144: refer to lme4::ranef * fix #146: error when expanding the formula with a naked group effect * Fix the fix for issue #159 (no. 2) (#166) * Fix the fix for issue #159: Consistently return type `character`. * Make sure `search_forward([...])$solution_terms` has type `character`. (This commit should only be relevant for models without an intercept.) * Simplify `fetch_submodel()` for the case `nterms == 0`. * Consistently use `head()`. * Fix an issue with `filter_nterms` caused by the fix for issue #159. * Generalize the fix for the `filter_nterms` issue caused by the fix for issue #159 and use this generalized fix when constructing the names for `proj_helper()`'s output object. * fix #169: set 0 relative penalty for the intercept * fix #167: add offset slot to d_test * fix #146: \\| * Fix `proj_predfun()` for GLMMs (#174) * `linear_multilevel_proj_predfun()`: Ensure a 1-column matrix as output if there is only 1 projected (clustered) draw, consistently with `linear_proj_predfun()` and `additive_proj_predfun()`. * `additive_proj_predfun()`: Remove a now obsolete `as.matrix()` call. * fix #171: l1 search is only supported for GLMs * Fix tests (#170) * Adopt `test_as_matrix.R` to the fix of issue #158. * Adopt `test_project.R` to the fix of issue #159. * Adopt `test_varsel.R` to the fix of issue #167. * fix #172: pass weigts[inds] * Compared to v1.0.6, Rcpp v1.0.7 introduced a minor change in the automatically created `src/RcppExports.cpp` file. (#175) * fix #167: add d_test offset to relevant places * Fix `proj_predfun()` for `"datafit"`s (where the case `is.null(beta)` is possible, so `predict.subfit()` needs to make sure to return a matrix in that case). (#177) * Fix the names of `summary.vsel()$selection` for `"vsel"` objects created by `varsel()` (where `is.null(cv_suffix)`). (#179) * fix search forward when search_terms are not consecutive in size * update rcpp * minor fix undefined variable * Fix PR #142 (#184) * Re-apply Rcpp PR #175 (i.e., revert commit 2529121133c1acf1b8a8905e611c79480c39bcf7). (#187) * Fix commit e529ec1080e45de2b27aaaa6efde8ff5285e0f90 ("minor fix undefined variable"). (#188) * Collection of minor changes (#189) * Avoid `R CMD check` NOTE "no visible global function definition for ‘tail’". * Add package optimx to `Suggests:` field (may be needed when running tests via `R CMD check`). * Comment `get_replace_response()` (seems to be unused). * Revert/adjust an incorrect comment modification introduced by PR #153. * `ref_predfun_datafit()`: Remove unused argument `offset`. * `print.vselsummary()`: Remove object `nterms_max` (and replace its later occurrence) to make it clear that `print.vselsummary()` has no purpose other than printing. * Minor cleaning. * Define `hf()` (from `kfold_varsel()`) globally. * Comment `.loo_subsample()` (seems to be unused). * Fix a typo in a comment. * Consistently use `inherits()`. * `project()`: Improve readability of a modification of `cv_search`. * Remove an unnecessary comment. * `parse_args_cv_varsel()`: Remove unnecessary redefinitions of `nclusters` and `nclusters_pred`. * `.get_sub_summaries()`: Remove unused `solution_terms` object. * Enhance a comment. * Replace the abbreviation `object$fam$` by `object$family$` to avoid ambiguities. * Remove unused object `w` in `predict.subfit()`. * After line `sub_fit <- refmodel$div_minimizer(`: Swap order of arguments `family` and `weights` (for consistency with the definitions of the divergence minimizers). * Package **rngtools** is not necessary (use `.Random.seed` as is already done at similar places). * Minor cleaning. * Fix commit e68c6fd6d19d68e9c121a6a94b740517bc085d3d. (#192) * Fix issue #185. (#193) * `select_possible_terms_size()`: Avoid that variables in an interaction term are ordered differently than in the reference model. (#191) * Fix `"datafit"`s with binomial family and two-column response (#194) * Improve a comment concerning `.get_standard_y()`. * `.get_standard_y()`: Add an error for a two-column response and a non-binomial family. Improve an existing error message. * Fix a bug introduced by PR #193 when calculating `mu` for `"datafit"`s with a binomial family and a two-column response. Also fix another (potential) bug (in `mu <- y / weights`, `weights` could have length zero). * Fix a bug in `.get_standard_y()` (the second column of a two-column response is the number of failures, not the number of trials). * Use `unname()` in `.get_standard_y()` consistently. * Use `+` instead of `rowSums()` to avoid converting integer to numeric. * Fix offset issues (#196) * Add a workaround for rstanarm issues stan-dev/rstanarm#541 and stan-dev/rstanarm#542. * Since `posterior_linpred()` is supposed to include the offsets in its result, subtract them in `ref_predfun()`. * Improve comments. * Adopt the tests to PR #196. (#197) * Enhance/fix `fit_glmer_callback()` (#201) * Improve indentation. * `fit_glmer_callback()`: Pass ellipsis (`...`) to `lme4::lmer()` and `lme4::glmer()`. * Avoid the `preprocess_data()` call in `fit_glmer_callback()` since: - the scaling of predictors changes their regression coefficients; - `preprocess_data()` is not adopted to categorical predictors; - `preprocess_data()`'s implementation for interactions is probably incorrect. * `fit_glmer_callback()`: Handle the error "pwrssUpdate did not converge in (maxit) iterations". * Fix PR #196 for `"brmsfit"`s. (#203) * Fix observation weights (#198) * Fix observation weights for L1 search and remove other code which is obsolete. * Don't normalize observation weights. * Fix a bug in `proj_linpred()` (for `integrated = TRUE` and `is.null(ynew)`). (#209) * `print.vselsummary()`: Don't print the formula environment. (#214) * Refactored tests (#217) * Clean-up code related to reference models (#219) * Remove argument `folds` (#220) * `init_refmodel()`: Remove argument `folds` (effectively unused). * Don't run the brms tests for brms version <= 2.16.1 (see the warning message added here for details). * Fixes for the binomial family (#221) * Improve warnings and errors for the binomial family. * Fix issue #136. * Adapt the tests. * Add test "binomial family with 1-column response and weights which are not all ones warns" * Fix in `print.vsel()` (argument `digits` ignored) (#222) * Miscellaneous modifications and fixes (#223) * Use the S3 system for `solution_terms()`. This also allows to have a `solution_terms.projection()` method. * Add warnings to `project()` when `cv_search` is modified. Improve error messages. Use the new `solution_terms()` S3 generic. * Adapt the tests. * Add a warning when `cv_search` is modified in `varsel()` or `cv_varsel()`. Make the defaults clear from the function signature. This also fixes a bug: `is.null(cv_search)` was not the default anymore (neither in `project()`, nor in `varsel()`, nor in `cv_varsel()`), especially not for `datafit`s where this condition is used to set `cv_search` to `FALSE`. * Fix a bug in `predict_multilevel_callback()` (concerning inheritance), namely: ```r library(rstanarm) data(bball1970) options(mc.cores = parallel::detectCores(logical = FALSE)) myfit <- stan_glmer( cbind(Hits, AB - Hits) ~ (1 | Player), data = bball1970, family = binomial("logit"), prior_intercept = normal(-1, 1), seed = 101 ) library(projpred) myprj <- project(myfit, solution_terms = c("(1 | Player)"), nclusters = 2) bball1970_new <- data.frame(Player = "newplayer1", Hits = 11, AB = 45) mylinpred <- proj_linpred(myprj, newdata = bball1970_new) ``` * Fix a typo. * Improve readability in `.loo_subsample_pps()`. * Add an error to `as.matrix.projection()` for objects based on `"datafit"`s. * Add an error to `project()` for `"datafit"` objects. * Adapt the tests. * Remove defaults from `parse_args_cv_varsel()` (should be set in `cv_varsel()` and then `cv_varsel()` should pass all these arguments to `parse_args_cv_varsel()`. * Clean up `parse_args_cv_varsel()`. Add a warning when `cv_method` is modified. Make the defaults clear from the function signature. * Adapt the tests. * `project()`: Move the error for `"datafit"`s to the beginning ( otherwise, `cv_search = FALSE` causes the error "Please provide an `object` of class "vsel" or use `cv_search = TRUE`." ). * Simplify `.init_kfold_refmodel()` and thereby resolve an inconsistency (which doesn't seem to have had any consequences up to now, though): Argument `obs` of `k_refmodel$fetch_data()` referred to the original dataset whereas argument `obs` of `k_refmodel$family$mu_fun()` referred to the k-fold dataset. This also affected how `newdata` was subsetted. * Minor cleaning in `.get_sub_summaries()`. * Use `as.data.frame()` in `fetch_data()`. * Remove argument `newdata` of `fetch_data_wrapper()` (which becomes `$fetch_data()`) since this was only used in `$mu_fun()` and can be solved more easily there. * `predict.refmodel()`: Replace `inherits(ynew, "numeric")` by `is.numeric(ynew)` (since `inherits(1L, "numeric")` is `FALSE` but `is.numeric(1L)` is `TRUE` ). * Adapt the tests. * `get_refmodel.stanreg()`: Add an input check for `object$data`. * Adapt the tests. * Move `fetch_data()` unchanged from divergence_minimizers.R to refmodel.R (the more appropriate place). * Improve documentation of `div_minimizer`. * Minor cleaning in `misc.R`. * Minor cleaning in `project.R`. * `project()`: Improve the warning message when dropping solution terms and notify the user of the possible solution terms. * Adapt the tests. * `as.matrix.projection()`: Check the class of all submodel fits (there are as many of them as there are clusters of draws), not just of the first one. * `predict.refmodel()`: Set a default of `newdata = NULL`. * `predict.refmodel()`: Simplify input checks and remove incorrect ones (`offsetnew` and `weightsnew` may be numeric vectors, for example). * `predict.refmodel()`: Remove some vertical whitespace. * `proj_helper()`: Minor cleaning. * `cv_varsel.R`: Minor cleaning. * `ref_predfun()` for `"datafit"`s: Minor cleaning. * Minor cleaning in `methods.R`. * Ensure that all calls to `extract_model_data()` have all arguments named. This is important for `extract_model_data()` functions which have `...` instead of explicit `wrhs` and `orhs` arguments (see, e.g., brms version <= 2.16.1). * Check the weights and offsets after all calls to `extract_model_data()` (and if necessary, adapt them). This is important for `extract_model_data()` functions which return weights and offsets of length 0 (see, e.g., brms version <= 2.16.1). * Comment out code related to Pareto k-values (currently not needed). * Minor wording improvement of the error message `ynew must be a numerical vector`. * In the `!validate_search` part of `loo_varsel()`, the call to `.get_sub_summaries()` is not needed. * `varsel()` and `kfold_varsel()`: Rename `p_sub` to `submodels` (for consistency with `cv_varsel()` and to avoid confusion with `p_sel`, `p_pred`, and `p_ref`). * Minor improvement of an error message for `nloo`. * `suggest_size.vsel()`: Replace `pct = 0.0` by `pct = 0`. * Remove argument `weights` (the observation weights) for prediction (not needed). This (#224) fixes issue #163. * Fixes for the predictive variance functions (#225) * Fix a bug in the binomial and Poisson `predvar()`s (currently not throwing an error in `var[, s, drop = FALSE]` (in `linear_mle()`) due to lazy evaluation, but unsafe since non-lazy evaluation might be needed in the future ). * Adapt the tests. * Always ensure a N x S matrix as output when calling `family$predvar()` for multiple (clustered) draws, even for N = 1. * Docs: Activate Markdown syntax (#226) * Turn on Markdown support for **roxygen2**. * Re-document with Markdown support enabled (changes indentation and prepends another `\` to `\` occurrences ). * Fix percentages after enabling Markdown support in the **roxygen2** documentation. * Handle issue stan-dev/rstanarm#546. (#227) * Fix a bug introduced by PR #196 (concerning `predict.refmodel()` in (#228) * Extend tests (#229) * Activate tests for GAMs and GAMMs. * Activate the tests for all families (`fam_nms`). * Activate `run_cvfits_all`. * Activate `run_brms` (based on `NOT_CRAN`). * Activate more `"brmsfit"` tests (more precisely, all `fam_nms`). * Activate more `"brmsfit"` tests (more precisely, all `mod_nms`). * For speed reasons: Revert commit "Activate more `"brmsfit"` tests (more precisely, all `mod_nms`)." * Activate more `"brmsfit"` tests (more precisely, the special formula). * Restructured and modified divergence minimizers (#230) * Minor improvements in the general package documentation (and start using some Markdown syntax). * Call the "callback" fitters in a consistent and safer way and use the ellipsis (`...`) consistently. This is a key step for letting `fit_glmer_callback()` pass `var` and `regul` to `fit_glm_ridge_callback()`. * Pass `regul` via `...`. * Pass the correct `var` from `fit_glmer_callback()` to `fit_glm_ridge_callback()`. Improve documentation of arguments `ref_predfun`, `proj_predfun`, and `div_minimizer`. * `divergence_minimizers.R`: Exclude arguments from `...` which cannot be passed to the final submodel fitters. * Remove `suppressMessages()` and `suppressWarnings()` where those calls might hide important problems to the user (apart from GAMs and GAMMs; they need to be checked separately). * After avoiding `suppressMessages()`, explicitly ignore singularity problems in the **lme4** fitters (by default) to avoid messages for these problems again. * Adapt the tests (now, `regul` can also have an effect for GLMMs since `fit_glmer_callback()` passes `regul` via `...` to `fit_glm_ridge_callback()` ). * `divergence_minimizers.R`: Improve some comments. * `validate_response_formula()`: Always return a list. This makes it possible to simplify the divergence minimizers. * Adapt the tests to the modified `validate_response_formula()` (and use `nobsv` predictive variances in `var_crr` of `test_div_minimizer.R`). * `proj_predfun` functions: Now that `validate_response_formula()` always returns a list (and the divergence minimizers have been adapted accordingly): Always expect a list (and simplify the `proj_predfun` functions accordingly). * `predict_multilevel_callback()` is only used once, so avoid an explicit definition (to keep debugging simple). * Re-introduce `suppressMessages()` and `suppressWarnings()` when calling the **lme4** fitters (because of `Warning in pwrssUpdate([...]): non-integer x = [...]`). * Simplify the divergence minimizers so that `lapply()` is used around the call to `refmodel$div_minimizer()` in `project_submodel()`. This simplifies the introduction of `foreach()` later when parallelizing the projection (even though this simplification is not necessary for the parallelization). * Introduce a separate formula argument excluding multilevel terms for additive models. This is necessary to avoid a breaking change for users with a custom divergence minimizer for additive models. * Fix `test_div_minimizer.R` * Restructure `test_div_minimizer.R` * `proj_predfun` docs: The return value must always be a matrix. * Minor improvement in the docs for the return value of `div_minimizer`. * Improve readability of `split_formula_random_gamm4()`. * Determine the `div_minimizer`s and `proj_predfun`s based on the submodel's formula, not the reference model's formula. Also simplify the definitions and calls to the `fit_<...>_callback()` functions by passing as many arguments as possible via `...`. * `fit_gamm_callback()`: Move `suppressMessages(suppressWarnings())` to the innermost place (to avoid suppressing messages and warnings from code outside of the `gamm4()` call. * Ensure that `fit_glmer_callback()` doesn't get lost in an infinite loop. * Omit the offsets for the "brnll" `fam_nm` to have some scenarios without offsets (apart from the workarounds for rstanarm GAM and GAMM fits). (#231) * Refactor `search_forward()` (#232) * `search_forward()`: Remove unnecessary call to `min()`. * Simplify (and improve readability of) `search_forward()`. * Remove obsolete name `"sub_fit"` (that name was confusing because it was repeated for all model sizes, so the list `<vsel_object>$search_path$sub_fits` had only duplicated names in case of forward search). * Major enhancements in the docs (#233) * Update roxygen2 version. * Consistently use `#'` (not `##'`) for roxygen2 comments. * Speed up the examples in the documentation (and minor other improvements in the docs). * Major enhancements in the documentation. * Move documentation for `extract_model_data` to section "Details". * Enhance documentation (continued). * Fix issue #190 (note in documentation concerning interactions in L1 search). * Improve formatting in the `CITATION` file (make it more consistent). * Enhance documentation (continued). * Add `print()` to examples (otherwise, only output from the last line will be printed). * Add documentation for `as.matrix.projection()`. * Show usage of the `posterior` package (in the `as.matrix.projection()` example). * Argument `regul` is only relevant for submodels which are GLMs. Unify the documentation for `regul` (two occurrences). * Show usage of **bayesplot** in the example for `as.matrix.projection()`. * Enhance documentation (continued; `d_test`). * Use `@inheritParams varsel` instead of `@template args-vsel`. * Use a consistent (and enhanced) documentation for arguments `seed` and `.seed`. * Enhance documentation (continued; fix bugs). * Use roxygen2 syntax also in a non-roxygen2 comment (to avoid inconsistencies in case that comment is turned into a roxygen2 comment). * Use roxygen2 syntax also in an invisible documentation (to avoid inconsistencies in case that documentation is turned on). * In `DESCRIPTION`, put the `Package:` field at the beginning because some search routines require this. * Enhance the `Description:` field in `DESCRIPTION`. * Re-order the `DESCRIPTION` fields (affects the ordering in the PDF manual). * Use Markdown syntax consistently. * Resolve minor inconsistencies in the documentation. * Resolve minor inconsistencies regarding the abbreviation "CV". * Enhance documentation (continued; indentation and other formatting). * Enhance documentation (continued; `man-roxygen/args-newdata.R`). * Enhance documentation (continued; mainly `refmodel-init-get`; this probably also solves issue #156). * Enhance documentation (continued; `predict.refmodel()`). * Enhance documentation (continued; mainly `project()`; this probably also solves a part of issue #154 (namely the documentation) for now). * Document the now consistent behavior of `suggest_size()` (see issue #164). * Minor wording improvements. * Enhance documentation (continued; `projection-linpred-predict`). * Rename documentation tag `projection-linpred-predict` to `pred-projection`. * Enhance documentation (continued; `pred-projection`). * Enhance documentation (continued; mainly `varsel()`). * Enhance documentation (continued; mainly `cv_varsel()`). * Use a consistent language for "response" and "predictors". * Minor improvements in `data.R`. * Minor improvement for `acc`/`pctcorr`. * Minor wording improvement. * Make journal volume numbers bold (as in other R docs). * Add references for PSIS-LOO CV. * Fix a typo in a reference. * Examples: Add a note concerning `chains = 2, iter = 500`. * Add an example for a custom reference model via `init_refmodel()` (which partly solves issue #125) and perform minor improvements in related example code. * `DESCRIPTION`: Add the `Date:` field (important for the `CITATION` file). * Fix a typo in the docs (`as.matrix.projection()` example). * Minor wording improvement in the docs (`as.matrix.projection()` example). * Docs: Remove "[stat<...>]" for arXiv links, according to the CRAN convention. * Docs: Add a comment concerning the relevance of `ndraws<_pred>` and `nclusters<_pred>` in case of `cv_search = FALSE`. * Docs: Use the terms "search step" and "evaluation step" (as a repeating pattern, this should be easier to recognize for readers). * Docs: Avoid the `@details [...] Notes: <list>` construction (looks strange in the final documentation). Unfortunately, `@note` can't be used since otherwise, `cv_varsel()`'s documentation couldn't inherit that section from `varsel()`'s documentation. * There are no Gaussian-only `stats` (at least it isn't implemented this way). Therefore, adapt the documentation. * Docs: Add more details concerning argument `method`. * Minor wording improvements in the general package documentation (help topic `projpred-package`). * References in the docs: Replace "doi:[<...>](<...>)" by "DOI: [<...>](<...>)". In the docs, this does not seem to be necessary, but in the vignette, the former way caused the DOI hyperlinks to be rendered incorrectly. * Docs: Handle section "References" consistently. * Replace "arxiv" by "arXiv". * Docs: Replace "search step" and "evaluation step" by "search part" and "evaluation part", respectively. The reason is a possible confusion of the term "search step" with the steps of a stepwise variable selection, see e.g. `?step`. The second reason is that the term "step" might suggest an EM-like alternation of steps (which is not the case here, since the search is completed first and only after that, the evaluation takes place). * Minor wording improvements in the general package documentation (available at ``?`projpred-package` ``). * Fix a reflow bug in the general package documentation (available at ``?`projpred-package` ``). * `DESCRIPTION`: Minor cleaning. * `DESCRIPTION`: Reflow to margin width 80. * Activate the examples in Travis CI checks. * Miscellaneous modifications and fixes (again) (#234) * Remove `as.matrix.list()` (neither used nor needed). * Define the default `baseline` in the `vsel` methods directly (instead of using `NULL`) and remove the `is.null()` case in `.validate_baseline()`. * Docs: Mention that for argument `baseline`, the best submodel is decided in terms of `stats[1]`. * Docs of `summary.vsel()`: Minor wording improvements for argument `type`. * Docs of argument `baseline`: Mention the relevance. * Simplify `cvfun()` in `get_refmodel.stanreg()`. * Set a visible default for argument `K` of `cv_varsel()`. * Improve y-axis label in `plot.vsel()`. * There are no Gaussian-only `stats` (at least it isn't implemented this way). Therefore, adapt the tests. * Fix issue #210. * Adapt the tests to the fix for issue #210. * Harmonize definitions and usages of `%<name>%` binary infix operators. * Unify the equivalent special binary (infix) operators `%||%` and `%ORifNULL%`. * Avoid function definitions without curly braces because they're not shown in RStudio's document outline. * Remove `.get_proj_handle()` (not necessary and had an incorrect default for `regul`). * `search_forward()`: Remove unused argument `increasing_order`. * `projfun.R`: Enhance comments. * `predict.subfit()`: Don't access `subfit$x` if not necessary. * Minor cleaning. * Remove more unused methods. * `get_refmodel.stanreg()`: Throw an error if **rstanarm** is not installed. * Replace "search path" by "solution path" (at least where relevant and easily possible) to avoid a (purely optical) confusion with "search part". * Parallelization of the projection (#235) * Parallelize the projection (i.e., parallelize across the projected draws). A related discussion may be found in issue #77. * Introduce global option `projpred.nprjdraws_parallel`. * Extend docs for parallelization. * Improve the documentation for global option `projpred.nprjdraws_parallel` (and adapt the code correspondingly). * Increase the default for option `projpred.nprjdraws_parallel` from 100 to 200 to ensure a speed improvement even for `devtools::load_all()` instead of `library(projpred)` (for `devtools::load_all()`, parallelization seems to be a bit slower). * Iterate over the objects in the `foreach()` loop themselves instead of their index. This should avoid the export of those (large) objects. * Use the `.export` argument in the `foreach()` call to handle exports manually. This could avoid the unnecessary export of (large) objects which are unused. * Use the `.noexport` argument in the `foreach()` call to handle exports manually. This could avoid the unnecessary export of (large) objects which are unused. * Check the most important (largest) objects which should not be exported. * Set `.noexport` manually. * Explain why we have the global option `projpred.nprjdraws_parallel`. * Set `.export` manually (necessary for the `doFuture` package with `options(doFuture.foreach.export = ".export")`). * Make the parallelization of the projection an optional feature. * Add comments concerning the parallelization. * Rename option `projpred.nprjdraws_parallel` to `projpred.prll_prj_trigger`. * Throw errors if packages **foreach** and **iterators** are not installed. * Add package **parallel** to `Suggests:`. * Add package **doFuture** to `Suggests:` (only needed for the parallel tests, though, which are not run on CRAN by default). * Add packages **future** and **future.callr** to `Suggests:`. * `bootstrap()`: Remove argument `oobfun` which is not used within **projpred** (and makes future modifications of `bootstrap()` more complicated). * `bootstrap()`: Replace `seq.int()` by `seq_len()`, as recommended in `?seq.int`. * `bootstrap()`: Improve a comment. * Parallelize `bootstrap()`. * Extend the docs for the parallelized projection (in ``?`projpred-package` ``). * Test the `"noclust"` setting (for projecting from a reference model) more often. * Avoid a failing "`offsetnew` works" test due to numerical inaccuracies for extreme values. * Add test "PRNG is not taking place where not expected". * Simplify the test "PRNG is not taking place where not expected". * Rename test "PRNG is not taking place where not expected" to "non-clustered projection does not require a seed". * Add tests for parallelization. * Move the code for the sequential `bootstrap()` run from `setup.R` to `test_parallel.R`. * Revert the parallelization of the bootstrap. The first reason is that in order for the parallel test results to match the sequential ones exactly, we would have to use doRNG::`%dorng%`() also in the sequential case of `bootstrap()` and then, for the sequential results in the tests, not register any **foreach** backend at all or use `registerDoSEQ()`. The second reason is that, because of the parallelization overhead, the parallelized bootstrap didn't provide a speed improvement, at least not up to the number of bootstrap samples used by default (2000). * Even if not on CRAN, `R CMD check` imposes a limit of 2 cores (at least by the defaults used in RStudio). * `vignettes/quickstart_glmm.Rmd`: Replace `ndraws = 10` by `nclusters = 10` (not necessary, but didactically better since an `ndraws` <= 20 is passed to `nclusters`). * `vignettes/quickstart_glmm.Rmd`: Fix typo. * `vignettes/quickstart_glmm.Rmd`: Remove unused argument `ns`. * `vignettes/quickstart_glmm.Rmd`: Fix the parameter names supplied to `mcmc_areas()`. * vignettes/children/SETTINGS-knitr.txt: Improve the check for the `params$eval` parameter. * Use `nclusters = 20` (or `nclusters_pred = 20`). Also improve the wording of some text. * Further reduce the runtime for the vignette. * vignettes/quickstart_glmm.Rmd: Error corrected ( but `(1 | obs)` and `(1 | phylo)` should be equally important as they are both somehow estimated to have a hierarchical SD of exactly 1 in the submodels; this needs to be inspected ). * vignettes/quickstart_glmm.Rmd: Minor wording improvements. * vignettes/quickstart_glmm.Rmd: Remove the term `(1 | phylo)` from the model (since we have `identical(data_pois$phylo, paste0("sp_", data_pois$obs))`, so including both `obs` and `phylo` doesn't make sense). * Further reduce the runtime for the vignettes. * vignettes/quickstart.Rmd: Remove unused argument `ns`. * Add `rmarkdown::` to the output type of the vignettes. * Vignettes: Activate the "vignette" YAML field. * Adapt ignore lists to vignettes (as done by `usethis::use_vignette("<vignette_name>")`). * vignettes/quickstart_glmm.Rmd: Further reduce the runtime by reducing the number of individuals in the `VerbAgg` dataset. * Add package "tidyr" to `Suggests` field (needed for building the GLMM vignette). * Add package "nlme" to `Suggests` field (needed for building the GLMM vignette). * Remove the html files from folder "vignettes" since CRAN takes the vignettes from "inst/doc" (created automatically by devtools::build(), for example). * vignettes/quickstart.Rmd: Further reduce the runtime by restricting `nterms_max`. * Add `rmarkdown` to the `VignetteBuilder` field in the DESCRIPTION file (see <https://cran.r-project.org/doc/manuals/r-release/R-exts.html>). * Fix indentation. * Replace the two former vignettes (GLM and GLMM) by a single new one which uses `datasets::CO2` as the only example. * Update the link in the documentation to the vignette. * Minor wording improvement after the two vignettes have been replaced by a single one. * For the new vignette: Add package `datasets` (part of basic R installation) to `Suggests:` and remove packages `tidyr` and `nlme`. * Update .gitignore after the two vignettes have been replaced by a single one. * Remove unused `SETTINGS-<...>.txt` files. * Change figure size. * Vignette: Fix a bug (`cv_varsel()` code was not shown). * Update `.Rbuildignore`. * Use `within()` for `CO2`. * Use cache only where necessary and reasonable (see <https://bookdown.org/yihui/rmarkdown-cookbook/cache.html>). * Since `SETTINGS-knitr.txt` is only used once, we can move its content to the place where it is used. * Simplify the setup chunk. * Minor fixes and improvements in the vignette. * Handle the encoding as suggested by `usethis::use_vignette()`. * Use the default `comment = '##'` instead of `comment = NA`. * Add section "Troubleshooting" and explain `varsel()` a bit more. * Use the terms "search step" and "evaluation step" (as a repeating pattern, this should be easier to recognize for readers). * Add more info concerning the failing projected posterior predictive check (PPPC). * Attach the **rstanarm** package in a separate chunk (needed for caching to work). * Minor wording improvements. * Switch back to the GLM example for the vignette (i.e., don't use the `CO2` GLMM example anymore) and introduce parallelization. * Use the term "MCMC" more consistently. * Fix incorrectly rendered DOI hyperlinks. * Vignette: Replace "search step" and "evaluation step" by "search part" and "evaluation part", respectively. The reason is a possible confusion of the term "search step" with the steps of a stepwise variable selection, see e.g. `?step`. The second reason is that the term "step" might suggest an EM-like alternation of steps (which is not the case here, since the search is completed first and only after that, the evaluation takes place). * Fix a URL. * Minor wording improvement in the vignette. * Fix the ordering of the references in the vignette. * Adapt the package description to the new vignette. * Update `README.md` to the replacement of the two former vignettes by a single new one. * Activate the vignette in `R CMD check` and `R CMD build`. * Activate the vignette in Travis CI checks (and builds). * Don't use markup in the "title" YAML field because this causes the following warning when building the **pkgdown** website (see also r-lib/pkgdown#989); note that setting the option mentioned in the warning below only helps when building the standalone vignette, not when building the **pkgdown** website: ``` The vignette title specified in \VignetteIndexEntry{} is different from the title in the YAML metadata. The former is "projpred: Projection predictive feature selection", and the latter is "**projpred**: Projection predictive feature selection". If that is intentional, you may set options(rmarkdown.html_vignette.check_title = FALSE) to suppress this check. ``` * Add newlines after figures (necessary for the **pkgdown** website). * Fix <https://github.com/stan-dev/projpred/pull/237#discussion_r732711453>. * Mention other possible types of reference models. Minor improvements in some other sections, too. * Additive models: Add notes that this is currently only an experimental feature. * Move the content of the setup chunk back to its own file `vignettes/children/SETTINGS-knitr.Rmd`. This is done for consistency with other Stan-related R packages. However, note that these "children" files differ slightly between the Stan-related R packages. Here, the templates were **cmdstanr**, **bayesplot**, and **rstanarm**. * Analogously to **cmdstanr** and **bayesplot**, add a TOC to the vignette. * Wording improvements (see PR review). * Update `README` (#245) * Docs: Follow-up for PR #233 etc. (#246) * Docs: Replace the term "credible" (intervals/bounds) by "confidence" (intervals/bounds), following the discussion at PR #233. Also perform some more or less related (minor) improvements. * Docs: Add reference for Gelman and Hill (2006) (used to be Gelman and Hill, 2007), following the discussion at PR #233. * Doc topic `refmodel-init-get`: Ensure that sections don't seem like subsections of section "Value". * Refactor `summary.vsel()` to improve readability. (#247) * Tests: Minor enhancements (#248) * Tests: Adapt a special case to the fact that the submodel fitter is now decided based on the submodel's (not the reference model's) formula. * Minor improvements in `run_snaps` and `run_prll` (and the associated comments). * Minor improvements in the comments for `run_snaps` and `run_prll`. * Handle issue stan-dev/rstanarm#551 and other `cvfun` changes (#249) * Remove `...` from `cvfun` definitions since all calls to `cvfun()` (which is currently only a single one) currently only use argument `folds`. * Improve an error message related to `cvfun`. * `cvfun` in `get_refmodel.stanreg()`: Force `cores = 1` in the call to `rstanarm::kfold()` (see issue stan-dev/rstanarm#551). * Vignette: Set `options(mc.cores = ncores)` to make use of parallelization in `rstanarm::kfold()`. * Offsets: Wrapper for `ref_predfun` (#250) * Move the subtraction of the offsets out of the default `ref_predfun` by introducing a wrapper function around the user-specified (or default) `ref_predfun`. This has the advantage of avoiding a breaking change for users with custom `ref_predfun`s. * Fix a bug when calling `ref_predfun()` with actual `newdata` on a `stanreg` object with offsets. This was already anticipated in PR #196 and is now necessary because the subtraction of the offsets was moved out of the default `ref_predfun` and into a wrapper function. However, this fix is still just a fragile quick-and-dirty solution. It is more desirable to adopt the **brms** way, as discussed in PR #196 and issue #218. * Docs, comments, defaults (#251) * Argument `transform` of `posterior_linpred()` has a default of `FALSE`. * `misc.R`: Add a comment concerning the origin of `.is.wholenumber()`. * Minor wording improvement in docs and a comment. * Reformat and reword some larger comments in `.get_refdist()`. * `fetch_data()`: Argument `newdata` has a default of `NULL`. * Warnings for experimental features (#252) * Add a warning for the experimental `Student_t()` family (see issue #233). * Add a warning for the experimental subsampled LOO CV (see issue #94). * Adapt the tests to the warning for subsampled LOO CV. * Additive models: Add a note in the docs and a warning in `init_refmodel()` that this is currently only an experimental feature. See the comment at <github.com/stan-dev/projpred/pull/237#issuecomment-961151653>. * Tests: Exclude additive models (GAMs and GAMMs) for now, following commit "Additive models: Add a note in the docs and a warning in `init_refmodel()` that [...]". * Add a warning for the Gamma family (support for this is only experimental; see issue #240). * Fix issue #242 (#253) * Fix issue #242 (Refactors and bugs in `.init_submodel()`), i.e., perform th…

fweber144 added 30 commits October 7, 2021 17:33

Use the S3 system for solution_terms(). This also allows to have a …

e65fe2f

…`solution_terms.projection()` method.

Add warnings to project() when cv_search is modified. Improve err…

f824f03

…or messages. Use the new `solution_terms()` S3 generic.

Adapt the tests.

23f5f0a

Fix a typo.

ffc831e

Improve readability in .loo_subsample_pps().

08dc3dc

Add an error to as.matrix.projection() for objects based on `"dataf…

980ee30

…it"`s.

Add an error to project() for "datafit" objects.

64f6bb0

Adapt the tests.

113f319

Remove defaults from parse_args_cv_varsel() (should be set in `cv_v…

eae9303

…arsel()` and then `cv_varsel()` should pass all these arguments to `parse_args_cv_varsel()`.

Clean up parse_args_cv_varsel(). Add a warning when cv_method is …

b6d85d3

…modified. Make the defaults clear from the function signature.

Adapt the tests.

40c229c

project(): Move the error for "datafit"s to the beginning (

4b20dfd

otherwise, `cv_search = FALSE` causes the error "Please provide an `object` of class "vsel" or use `cv_search = TRUE`." ).

Minor cleaning in .get_sub_summaries().

2404030

Use as.data.frame() in fetch_data().

6aba637

Remove argument newdata of fetch_data_wrapper() (which

c68bca7

becomes `$fetch_data()`) since this was only used in `$mu_fun()` and can be solved more easily there.

predict.refmodel(): Replace inherits(ynew, "numeric") by `is.nume…

6db36ee

…ric(ynew)` (since `inherits(1L, "numeric")` is `FALSE` but `is.numeric(1L)` is `TRUE` ).

Adapt the tests.

1600f19

get_refmodel.stanreg(): Add an input check for object$data.

f63f52a

Adapt the tests.

a894c63

Move fetch_data() unchanged from divergence_minimizers.R to refmode…

1e2a22a

…l.R (the more appropriate place).

Improve documentation of div_minimizer.

b4f267d

Minor cleaning in misc.R.

1dc80fd

Minor cleaning in project.R.

0aacd87

project(): Improve the warning message when dropping solution terms…

bdc4739

… and notify the user of the possible solution terms.

Adapt the tests.

2d9c4a1

as.matrix.projection(): Check the class of all submodel fits (there…

43a3272

… are as many of them as there are clusters of draws), not just of the first one.

predict.refmodel(): Set a default of newdata = NULL.

e507c6c

fweber144 added 14 commits October 7, 2021 17:56

predict.refmodel(): Simplify input checks and remove incorrect ones…

6e339b3

… (`offsetnew` and `weightsnew` may be numeric vectors, for example).

predict.refmodel(): Remove some vertical whitespace.

c0870c5

proj_helper(): Minor cleaning.

34e7325

cv_varsel.R: Minor cleaning.

4b6e1bb

ref_predfun() for "datafit"s: Minor cleaning.

1019b15

Minor cleaning in methods.R.

38ea7b3

Ensure that all calls to extract_model_data() have all arguments na…

c3208cd

…med. This is important for `extract_model_data()` functions which have `...` instead of explicit `wrhs` and `orhs` arguments (see, e.g., brms version <= 2.16.1).

Check the weights and offsets after all calls to `extract_model_data(…

72cf5ca

…)` (and if necessary, adapt them). This is important for `extract_model_data()` functions which return weights and offsets of length 0 (see, e.g., brms version <= 2.16.1).

Comment out code related to Pareto k-values (currently not needed).

084ddbe

Minor wording improvement of the error message `ynew must be a numeri…

1bb8207

…cal vector`.

In the !validate_search part of loo_varsel(), the call to

8ad1735

`.get_sub_summaries()` is not needed.

varsel() and kfold_varsel(): Rename p_sub to submodels (for c…

7fae16b

…onsistency with `cv_varsel()` and to avoid confusion with `p_sel`, `p_pred`, and `p_ref`).

Minor improvement of an error message for nloo.

b7851db

suggest_size.vsel(): Replace pct = 0.0 by pct = 0.

38c5710

AlejandroCatalina merged commit c22263a into stan-dev:develop Oct 7, 2021

fweber144 deleted the misc_mods_fixes branch October 7, 2021 18:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Miscellaneous modifications and fixes #223

Miscellaneous modifications and fixes #223

fweber144 commented Oct 7, 2021

fweber144 commented Oct 7, 2021

Miscellaneous modifications and fixes #223

Miscellaneous modifications and fixes #223

Conversation

fweber144 commented Oct 7, 2021

fweber144 commented Oct 7, 2021