Support augmented-data projection in projpred #1292

fweber144 · 2022-02-02T16:52:45Z

This PR should perform the final changes which are necessary to support the augmented-data projection in projpred. Note that I partly reverted some of my changes in get_refmodel.brmsfit() introduced earlier (by PR #1159 and PR #1223). The reason is that I found a more elegant way of handling the preparations for the augmented-data projection, namely by using a wrapper function for the user-specified ref_predfun within projpred. (Btw, this wrapper also handles the subtraction of the offsets, which is why we didn't have to subtract offsets in the brms-specific ref_predfun in PR #1287.) Sorry for the changes in get_refmodel.brmsfit() introduced earlier which are now partly reverted. But I think the new handling is much better, especially because it requires much less changes to brms and because it avoids exporting projpred functions which would only be needed by brms.

The development version of projpred currently does not support the augmented-data projection yet. If you try to create a reference model with the categorical() or an ordinal family such as cumulative() (in order to perform an augmented-data projection), projpred will throw an appropriate error (not only the development version, but also projpred's CRAN version). That's also why I did not create a NEWS entry here in this PR.

I noticed that get_refmodel.brmsfit() has argument newdata. Currently, I can't think of a situation where this would be necessary, but I left it in case there is indeed a reasonable use case. However, if there are no use cases, this argument might confuse users, so then it might be worth removing it.

I noticed that you recently replaced ilink by inv_link (in all of the code, it seems). Some time ago, I decided to use the suffix ilink in projpred, so with this PR, you get a new occurrence of ilink which, however, should not be replaced by inv_link (otherwise, we would have to change projpred code).

…arations for the augmented-data matrix (in particular, the coercion to an augmented-rows matrix) in a wrapper function for the user-specified (or default) `ref_predfun`, the `ref_predfun` defined here in brms for the augmented-data projection case may be simplified.

default of `FALSE`. This doesn't have any consequences since `link_categorical()` is not exported anyway. And it's also not used internally. (It simply exists to be passed to projpred.)

…flexibility in specifying the reference category).

…ly()` which was performed so that `...` may be used in generics like `varsel()`, `cv_varsel()`, and perhaps a future `project()` generic to pass arguments to `get_refmodel()` *as well as* down to the divergence minimizer.

paul-buerkner · 2022-02-02T17:14:53Z

Thank you! Reviewing this now.

As for the newdata argument. This allows to do projpred selection on entirely new data, or subsets of the original data, which may sometimes help with computation time. Or am I overlooking something?

paul-buerkner

Looks good. Just have two minor comments/questions.

R/distributions.R

R/projpred.R

fweber144 · 2022-02-02T19:31:35Z

As for the newdata argument. This allows to do projpred selection on entirely new data, or subsets of the original data, which may sometimes help with computation time. Or am I overlooking something?

At first, I also thought this would be newdata's purpose, but wouldn't the reference model have to be refitted if the data changes? As far as I can tell from quickly inspecting the source code of current_data(), such a refit does not take place.

<paul-buerkner#1292 (comment)>).

paul-buerkner · 2022-02-02T22:24:50Z

The reference model could serve as a well predicting model also for new data so I see no reason not to have a newdata argument.

paul-buerkner · 2022-02-02T22:25:19Z

Thank you for this PR! Looks good to me so merging now.

fweber144 · 2022-02-03T07:31:19Z

The reference model could serve as a well predicting model also for new data

Yes, but if it's only for prediction, then there's the projpred::predict.refmodel() method which also has a newdata argument. What I'm not sure about is whether the creation of a reference model where fit and data don't match complies with projpred's understanding of a reference model. I have to think about this again and maybe try to construct a use case. Currently, I still think a refit would be necessary, but I might be missing something. In any case, thank you for your feedback!

fweber144 · 2022-02-03T07:32:07Z

Thank you for this PR! Looks good to me so merging now.

Thanks for merging!

…()` was applied accidentally).

…ly (`link` was set accidentally as argument to `link_categorical()` `inv_link_categorical()`).

Fix PR #1292

fweber144 added 10 commits February 2, 2022 11:44

Activate the ref_predfun for ordinal models.

d27b6af

Further simplify the ref_predfun for ordinal models.

f12e828

Update to the new handling of the augmented-data projection in projpred.

bd06e0a

For link_categorical()'s argument return_refcat, use a

f640c5d

default of `FALSE`. This doesn't have any consequences since `link_categorical()` is not exported anyway. And it's also not used internally. (It simply exists to be passed to projpred.)

Replace insert_refcat_fam by refcat_obj (the aim is to have more …

c50481e

…flexibility in specifying the reference category).

Reflow internal docs for inv_link_categorical().

43bdf01

Argument refcat_obj: Allow for "first" which is also set as default.

78e9488

Cleaning (concerning ref_predfun).

11dd214

paul-buerkner reviewed Feb 2, 2022

View reviewed changes

R/distributions.R Outdated Show resolved Hide resolved

R/projpred.R Show resolved Hide resolved

Replace identical() by is_equal() (see

46ebaa6

<paul-buerkner#1292 (comment)>).

paul-buerkner merged commit b34093c into paul-buerkner:master Feb 2, 2022

fweber144 deleted the projpred_augdat branch February 3, 2022 07:00

fweber144 added a commit to fweber144/brms that referenced this pull request Feb 4, 2022

Fix a bug in PR paul-buerkner#1292 for the categorical() family (`t…

ac082d7

…()` was applied accidentally).

fweber144 added a commit to fweber144/brms that referenced this pull request Feb 4, 2022

Fix another bug in PR paul-buerkner#1292 for the categorical() fami…

fa7997a

…ly (`link` was set accidentally as argument to `link_categorical()` `inv_link_categorical()`).

fweber144 mentioned this pull request Feb 4, 2022

Fix PR #1292 #1294

Merged

paul-buerkner added a commit that referenced this pull request Feb 4, 2022

Merge pull request #1294 from fweber144/PR1292_fix

d795bf8

Fix PR #1292

fweber144 mentioned this pull request Mar 12, 2022

projpred: Make extract_model_data() return a factor y if necessary #1314

Merged

fweber144 mentioned this pull request May 23, 2023

projpred-related seed fix and docs/NEWS #1502

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support augmented-data projection in projpred #1292

Support augmented-data projection in projpred #1292

fweber144 commented Feb 2, 2022 •

edited

paul-buerkner commented Feb 2, 2022

paul-buerkner left a comment

fweber144 commented Feb 2, 2022

paul-buerkner commented Feb 2, 2022

paul-buerkner commented Feb 2, 2022

fweber144 commented Feb 3, 2022

fweber144 commented Feb 3, 2022

Support augmented-data projection in projpred #1292

Support augmented-data projection in projpred #1292

Conversation

fweber144 commented Feb 2, 2022 • edited

paul-buerkner commented Feb 2, 2022

paul-buerkner left a comment

Choose a reason for hiding this comment

fweber144 commented Feb 2, 2022

paul-buerkner commented Feb 2, 2022

paul-buerkner commented Feb 2, 2022

fweber144 commented Feb 3, 2022

fweber144 commented Feb 3, 2022

fweber144 commented Feb 2, 2022 •

edited