Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support augmented-data projection in projpred #1292

Merged
merged 11 commits into from
Feb 2, 2022

Conversation

fweber144
Copy link
Contributor

@fweber144 fweber144 commented Feb 2, 2022

This PR should perform the final changes which are necessary to support the augmented-data projection in projpred. Note that I partly reverted some of my changes in get_refmodel.brmsfit() introduced earlier (by PR #1159 and PR #1223). The reason is that I found a more elegant way of handling the preparations for the augmented-data projection, namely by using a wrapper function for the user-specified ref_predfun within projpred. (Btw, this wrapper also handles the subtraction of the offsets, which is why we didn't have to subtract offsets in the brms-specific ref_predfun in PR #1287.) Sorry for the changes in get_refmodel.brmsfit() introduced earlier which are now partly reverted. But I think the new handling is much better, especially because it requires much less changes to brms and because it avoids exporting projpred functions which would only be needed by brms.

The development version of projpred currently does not support the augmented-data projection yet. If you try to create a reference model with the categorical() or an ordinal family such as cumulative() (in order to perform an augmented-data projection), projpred will throw an appropriate error (not only the development version, but also projpred's CRAN version). That's also why I did not create a NEWS entry here in this PR.

I noticed that get_refmodel.brmsfit() has argument newdata. Currently, I can't think of a situation where this would be necessary, but I left it in case there is indeed a reasonable use case. However, if there are no use cases, this argument might confuse users, so then it might be worth removing it.

I noticed that you recently replaced ilink by inv_link (in all of the code, it seems). Some time ago, I decided to use the suffix ilink in projpred, so with this PR, you get a new occurrence of ilink which, however, should not be replaced by inv_link (otherwise, we would have to change projpred code).

…arations for

the augmented-data matrix (in particular, the coercion to an augmented-rows matrix) in a wrapper function for the user-specified (or default) `ref_predfun`, the `ref_predfun` defined here in brms for the augmented-data projection case may be simplified.
default of `FALSE`. This doesn't have any consequences since
`link_categorical()` is not exported anyway. And it's also
not used internally. (It simply exists to be passed to
projpred.)
…flexibility in

specifying the reference category).
…ly()` which

was performed so that `...` may be used in generics like `varsel()`, `cv_varsel()`, and perhaps a future `project()` generic to pass arguments to `get_refmodel()` *as well as* down to the divergence minimizer.
@paul-buerkner
Copy link
Owner

Thank you! Reviewing this now.

As for the newdata argument. This allows to do projpred selection on entirely new data, or subsets of the original data, which may sometimes help with computation time. Or am I overlooking something?

Copy link
Owner

@paul-buerkner paul-buerkner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Just have two minor comments/questions.

R/distributions.R Outdated Show resolved Hide resolved
R/projpred.R Show resolved Hide resolved
@fweber144
Copy link
Contributor Author

As for the newdata argument. This allows to do projpred selection on entirely new data, or subsets of the original data, which may sometimes help with computation time. Or am I overlooking something?

At first, I also thought this would be newdata's purpose, but wouldn't the reference model have to be refitted if the data changes? As far as I can tell from quickly inspecting the source code of current_data(), such a refit does not take place.

@paul-buerkner
Copy link
Owner

The reference model could serve as a well predicting model also for new data so I see no reason not to have a newdata argument.

@paul-buerkner
Copy link
Owner

Thank you for this PR! Looks good to me so merging now.

@paul-buerkner paul-buerkner merged commit b34093c into paul-buerkner:master Feb 2, 2022
@fweber144 fweber144 deleted the projpred_augdat branch February 3, 2022 07:00
@fweber144
Copy link
Contributor Author

The reference model could serve as a well predicting model also for new data

Yes, but if it's only for prediction, then there's the projpred::predict.refmodel() method which also has a newdata argument. What I'm not sure about is whether the creation of a reference model where fit and data don't match complies with projpred's understanding of a reference model. I have to think about this again and maybe try to construct a use case. Currently, I still think a refit would be necessary, but I might be missing something. In any case, thank you for your feedback!

@fweber144
Copy link
Contributor Author

Thank you for this PR! Looks good to me so merging now.

Thanks for merging!

fweber144 added a commit to fweber144/brms that referenced this pull request Feb 4, 2022
fweber144 added a commit to fweber144/brms that referenced this pull request Feb 4, 2022
…ly (`link` was set

accidentally as argument to `link_categorical()` `inv_link_categorical()`).
@fweber144 fweber144 mentioned this pull request Feb 4, 2022
paul-buerkner added a commit that referenced this pull request Feb 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants