Fixes and other improvements for argument d_test of varsel() #341

Merged
merged 16 commits into from Jul 14, 2022

Conversation

fweber144
Collaborator

As stated in the new NEWS.md entries, this PR:

  • Fixes argument d_test of varsel(): Not only the predictive performance of the reference model needs to be evaluated on the test data, but also the predictive performance of the submodels.
  • Does not consider argument d_test of varsel() as an internal feature anymore. This was possible after fixing the bug for d_test mentioned above.
  • Ensures that the order of the observations in the subelements of <vsel_object>$summaries and <vsel_object>$d_test now always corresponds to the order of the observations in the original dataset (except if <vsel_object> was created by a call to varsel([...], d_test = <non-NULL_d_test_object>), in which case the order of the observations in those subelements corresponds to the order of the observations in <non-NULL_d_test_object>). The only case not following this rule up to now was K-fold CV.
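The re-ordering described in the last item can be illustrated with a minimal base-R sketch. It is not projpred's internal code, and the names `n_obs`, `K`, `folds`, and `res_by_fold` are hypothetical. K-fold CV produces results fold by fold, so concatenating them gives fold order; indexing by the stored observation indices restores the order of the original dataset.

```r
# Hypothetical illustration (not projpred's internal code): restore the
# original observation order after K-fold CV.
set.seed(1)
n_obs <- 10
K <- 3
# Random fold assignment for each observation.
folds <- sample(rep_len(seq_len(K), n_obs))
# Stand-in for per-observation results (here simply 100 + observation index).
res_by_fold <- lapply(seq_len(K), function(k) {
  idx <- which(folds == k)
  data.frame(obs = idx, value = 100 + idx)
})
# Concatenating fold-wise results gives fold order, not dataset order.
res_fold_order <- do.call(rbind, res_by_fold)
# Re-order by the stored observation indices to match the original dataset.
res_orig_order <- res_fold_order[order(res_fold_order$obs), ]
rownames(res_orig_order) <- NULL
```

After this step, row `i` of `res_orig_order` belongs to observation `i` of the original dataset, which is the property the item above describes for the subelements of `<vsel_object>$summaries`.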

Apart from that, the existing d_test tests are enhanced, and a new test is added which checks that when d_test is set to actual test data, the <vsel_object>$summaries$sub results can be reproduced by proj_linpred() and the <vsel_object>$summaries$ref results can be reproduced by posterior_epred() and log_lik().
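The idea behind the d_test fix, and behind checking results against direct predictions, can be sketched outside projpred. The example below is only an analog under stated assumptions: it uses `lm()` fits as stand-ins for the reference model and a submodel, and `test_lpd()`, `fit_ref`, and `fit_sub` are hypothetical names, not projpred functions or objects. The point is that both models are evaluated on the same held-out data, in the order of that test data.

```r
# Hypothetical analog using lm() instead of projpred: evaluate BOTH the
# reference (full) model and a submodel on held-out test data.
set.seed(2)
n <- 60
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 + 0.5 * dat$x2 + rnorm(n)
train <- dat[1:40, ]
test <- dat[41:60, ]

fit_ref <- lm(y ~ x1 + x2, data = train)  # stand-in for the reference model
fit_sub <- lm(y ~ x1, data = train)       # stand-in for a submodel

# Pointwise log predictive density on the test data (plug-in estimate).
test_lpd <- function(fit, newdata) {
  mu <- predict(fit, newdata = newdata)
  dnorm(newdata$y, mean = mu, sd = sigma(fit), log = TRUE)
}
lpd_ref <- test_lpd(fit_ref, test)
lpd_sub <- test_lpd(fit_sub, test)
```

Each of `lpd_ref` and `lpd_sub` has one entry per test observation, so the two models can be compared observation by observation on data that neither was fitted to.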

Some refactoring of d_test-related code is also performed, to enhance readability and consistency.

submodels also needs to be evaluated on the test data (not only the predictive
performance of the reference model).
even cause confusion and be used incorrectly, given the new
`.get_sub_summaries()` arguments).
`test_points`, namely `NULL` (see argument `obs` of `fetch_data()`).
tedious and not necessary. So add that element internally.
`<vsel_object>$summaries` and `<vsel_object>$d_test` as in the original dataset (or test dataset, in case of a non-`NULL` argument `d_test` of `varsel()`, but that doesn't concern K-fold CV).
`<vsel_object>$summaries$sub` and `proj_linpred()` as well as between
`<vsel_object>$summaries$ref` and `posterior_epred()` / `log_lik()` in case of
a non-`NULL` object `d_test`.
@fweber144 fweber144 merged commit 3142d8d into stan-dev:master Jul 14, 2022
@fweber144 fweber144 deleted the d_test_sub_fix branch July 14, 2022 20:54
fweber144 added a commit to fweber144/projpred that referenced this pull request Jul 15, 2022
The re-ordering of the `summaries` results according to the original order of observations (see PR stan-dev#341) needs special care in case of augmented-length vectors.