Fixes and other improvements for argument `d_test` of `varsel()` #341

fweber144 · 2022-07-14T20:53:38Z

As stated in the new NEWS.md entries, this PR:

Fixes argument d_test of varsel(): Not only the predictive performance of the reference model needs to be evaluated on the test data, but also the predictive performance of the submodels.
Does not consider argument d_test of varsel() as an internal feature anymore. This became possible after fixing the d_test bug mentioned above.
Ensures that the order of the observations in the subelements of <vsel_object>$summaries and <vsel_object>$d_test now always corresponds to the order of the observations in the original dataset (except if <vsel_object> was created by a call to varsel([...], d_test = <non-NULL_d_test_object>), in which case the order of the observations in those subelements corresponds to the order of the observations in <non-NULL_d_test_object>). The only case not following this rule up to now was K-fold CV.

Apart from that, the existing d_test test is enhanced, and a new test is added which checks that, when d_test is set to actual test data, the <vsel_object>$summaries$sub results can be reproduced by proj_linpred() and the <vsel_object>$summaries$ref results can be reproduced by posterior_epred() and log_lik().

Some refactoring of d_test-related code is also performed, to enhance readability and consistency.
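
The ordering guarantee for K-fold CV described above can be illustrated with a minimal base-R sketch. This is not projpred's internal code: the names `folds`, `idx_by_fold`, `res_fold_order`, and `res_orig_order` are hypothetical, and the "results" are placeholder numbers chosen so that the outcome is easy to check. The sketch assumes only that per-observation results are collected fold by fold and that the original row index of each result is known.

```r
# Hypothetical illustration (base R only; not projpred's internal code).
set.seed(341)
n <- 10L # number of observations in the original dataset
K <- 3L  # number of folds

# Randomly assign each observation to one of the K folds.
folds <- sample(rep_len(seq_len(K), n))

# K-fold CV processes the observations fold by fold, so per-observation
# results are naturally collected in this order (original row indices):
idx_by_fold <- unlist(lapply(seq_len(K), function(k) which(folds == k)))

# Placeholder for a per-observation summary; here simply a function of the
# original row index, so that the re-ordering can be verified.
res_fold_order <- idx_by_fold * 10

# Restore the order of the observations in the original dataset.
res_orig_order <- res_fold_order[order(idx_by_fold)]
```

After the last step, element `i` of `res_orig_order` belongs to row `i` of the original dataset, no matter which fold that row was assigned to.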

The re-ordering of the `summaries` results according to the original order of observations (see PR stan-dev#341) needs special care in case of augmented-length vectors.

fweber144 added 16 commits July 14, 2022 12:13

Simplify the creation of d_test in case of is.null(d_test).

2424a48

Further simplify d_test-related code (now d_type).

5596d76

Fix argument d_test of varsel(): The predictive performance of the submodels also needs to be evaluated on the test data (not only the predictive performance of the reference model).

efb67f2

Element test_points of d_test is actually not needed (and could even cause confusion and be used incorrectly, given the new `.get_sub_summaries()` arguments).

667725c

Use a consistent order of arguments like newdata, offset, wobs, etc.

cc4a378

varsel(): Pick an even more straightforward value for `test_points`, namely `NULL` (see argument `obs` of `fetch_data()`).

ec44c09

Define d_test consistently (see also varsel()).

246e949

Requiring the user to always specify d_test$type = "test" is tedious and not necessary. So add that element internally.

1de3c22

Docs: Declare d_test as available for everyone (not for internal use only). Also update `NEWS.md`.

24c6d0b

Also in case of K-fold CV: Order the subelements of `<vsel_object>$summaries` and `<vsel_object>$d_test` as in the original dataset (or test dataset, in case of a non-`NULL` argument `d_test` of `varsel()`, but that doesn't concern K-fold CV).

97f5244

Update the tests.

c0540b8

Improve the structure of test_varsel.R.

a801569

Improve the d_test test.

7591367

Generate independent test data for the tests.

b7ac09e

Use the independent test data for testing equivalence between `<vsel_object>summaries$sub` and `proj_linpred()` as well as between `<vsel_object>summaries$ref` and `posterior_epred()` / `log_lik()` in case of a non-`NULL` object `d_test`.

fc39bba

Simplify small parts of the d_test tests.

3aba937

fweber144 merged commit 3142d8d into stan-dev:master Jul 14, 2022

fweber144 deleted the d_test_sub_fix branch July 14, 2022 20:54
