Add loo_subsample and loo_approximate_posterior #113

MansMeg · 2019-07-05T20:03:28Z

Hi!

Here comes the loo_subsample and loo_approximate_posterior functionality.

Implementation question/issues

I have not implemented everything we may want yet. Currently missing are approximations with truncated IS and subsample_loo for loglik arrays and matrices. The current version has the basics, and those should be easy to add.
I have now implemented ndraws() and nparameters() functions for loo_subsample(). Those may be of more general use and can then be moved somewhere else. For now they are in loo_subsample.R.
I have not profiled the code, so it may be possible to improve performance.
I have put most of the code in the loo_subsample.R and loo_approximate_posterior.R files.
I have not bumped the version in the DESCRIPTION file.

Technical questions/issues

I have now removed r_eff from loo_approximate_posterior(). I could not think of a situation that combines approximate posteriors with autocorrelation.
r_eff is currently implemented as in ordinary loo, although it would make more sense to use r_eff only for the subsampled observations. I could not find a good way of doing this.
I'm currently not handling MCSE correctly. Should I remove it for subsample_loo(), return it only for the subsampled observations, or return it for the totals? I'm not 100% sure what to do there.
I would love a code review of how I implemented the naive diff SE estimation (for different subsamples) in rows 77-96 of loo_compare.psis_loo_ss_list.R and how I compute looic in rows 1083-1085 of loo_subsample.R. Maybe @avehtari could look at this?
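
For reference while reviewing those two spots, here is a minimal sketch of the quantities involved. It assumes that looic is -2 times elpd_loo (the usual definition) and that the "naive" difference SE treats the two subsample estimates as independent, so that their variances add. The helper names and the numbers are made up for illustration, and this is not necessarily what the code in this PR does.

```r
# Hypothetical helpers for illustration; not the implementation in this PR.

# looic is elpd_loo on the deviance scale: multiply by -2.
# The standard error scales by |-2| = 2.
elpd_to_looic <- function(elpd, se_elpd) {
  c(looic = -2 * elpd, se_looic = 2 * se_elpd)
}

# Naive SE of an elpd difference when the two models were evaluated on
# different subsamples: assume the two estimates are independent,
# so their variances add.
naive_diff_se <- function(se_a, se_b) {
  sqrt(se_a^2 + se_b^2)
}

# Made-up numbers, only to show the arithmetic.
ic <- elpd_to_looic(elpd = -120.5, se_elpd = 4)
diff_se <- naive_diff_se(3, 4)
```

If both models were evaluated on the same subsample, the pointwise differences are correlated and the independence assumption above would typically overstate the SE.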

paul-buerkner · 2019-08-06T15:33:07Z

I think we are getting very close to having this PR ready. With the current loo_subsample method of brms I see for some models the following error:

Error: 'observations' is larger than total sample size.

I cannot tell whether this happens because of brms or loo but I think it would be good to figure this out before merging. Here is a reproducible example:

devtools::install_github("paul-buerkner/brms", ref = "ssloo")
library(brms)
fit1 <- brm(count ~ zAge + zBase * Trt + (1|patient),
             data = epilepsy, family = poisson())
loo_subsample(fit1)

MansMeg · 2019-08-06T18:33:31Z

Great. I just pushed some final fixes for the documentation, generics etc. Hopefully this will also fix the two failing test cases, but that remains to be seen.

The problem you encountered is that the default subsample size is 400 and the epilepsy dataset contains roughly 250 observations, so it now throws an error. I tried to make the error clearer. But maybe we should just set the observations to the full dataset and throw a warning instead?

paul-buerkner · 2019-08-06T18:53:27Z

Ah, this makes sense. Sorry, this was my bad. I think the error is clear. Perhaps one could say "Argument 'observations'" to make clear that this is something the user can simply change? Or also point out the numbers, i.e. 400 and 250 in this case?
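
A minimal sketch of the two options discussed above, assuming a hypothetical helper check_observations() (not a function in loo or brms): it either stops with a message that names the argument and reports both numbers, or falls back to the full dataset with a warning.

```r
# Hypothetical helper, for illustration only; not part of loo or brms.
# observations: requested subsample size; n: number of observations in the data.
check_observations <- function(observations, n, clamp = FALSE) {
  stopifnot(is.numeric(observations), length(observations) == 1,
            is.numeric(n), length(n) == 1)
  if (observations <= n) {
    return(observations)
  }
  msg <- sprintf(
    "Argument 'observations' (%d) is larger than the total sample size (%d).",
    as.integer(observations), as.integer(n)
  )
  if (clamp) {
    # Option 1: fall back to the full dataset and warn.
    warning(msg, " Using all ", n, " observations instead.", call. = FALSE)
    return(n)
  }
  # Option 2: fail with a message that names the argument and both numbers.
  stop(msg, call. = FALSE)
}

# Usage with the numbers from this thread (default of 400, roughly 250 rows).
res <- tryCatch(check_observations(400, 250),
                error = function(e) conditionMessage(e))
clamped <- suppressWarnings(check_observations(400, 250, clamp = TRUE))
```

Whether to clamp or to stop is a design choice: clamping is more convenient for small datasets, while stopping makes it explicit that no subsampling actually took place.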

paul-buerkner · 2019-08-07T08:04:13Z

Ok, the PR looks good to me now and is ready to be merged in my opinion.

@jgabry, @avehtari any objections to merging this?

codecov-io · 2019-08-07T08:22:19Z

Codecov Report

Merging #113 into master will decrease coverage by 3.1%.
The diff coverage is 91.46%.

@@            Coverage Diff             @@
##           master     #113      +/-   ##
==========================================
- Coverage    98.4%   95.29%   -3.11%     
==========================================
  Files          16       19       +3     
  Lines        1252     2020     +768     
==========================================
+ Hits         1232     1925     +693     
- Misses         20       95      +75

Impacted Files                        Coverage Δ
R/loo_compare.R                       94.79% <100%> (+0.6%)
R/psis_approximate_posterior.R        71.42% <59.37%> (-24.58%)
R/loo.R                               91.87% <70.73%> (-5.63%)
R/loo_compare.psis_loo_ss_list.R      84.82% <84.82%> (ø)
R/loo_approximate_posterior.R         92.1% <92.1%> (ø)
R/print.R                             96.96% <94.11%> (-1.15%)
R/loo_subsample.R                     95.4% <95.4%> (ø)
R/effective_sample_sizes.R            97.27% <0%> (-0.23%)
... and 13 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update e1f21dc...01decad.

paul-buerkner · 2019-08-08T06:10:57Z

After @avehtari gave his ok yesterday, we are ready to merge this PR.

Thank you @MansMeg for your work! I think this is a really nice addition to the loo package!

jgabry · 2019-09-23T18:13:37Z

I forgot to answer this good question from @MansMeg:

@jgabry What was your reason for requiring users to compute r_eff themselves instead of having it computed internally in loo? The latter seems easily possible, as we have the log_lik values available internally regardless of whether we use the .function or .matrix approach.

It’s only possible if we know the chain_id for each draw, which we do in the array case (possibly the function case), but not the matrix case without the user supplying a chain_id variable. So I think @avehtari and I decided that loo methods in other packages like rstanarm and brms can compute r_eff automatically but the primary loo methods in loo would take r_eff as an input.

We could change this (I think without breaking backwards compatibility) if there’s a good reason.
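
To illustrate the matrix case described above, here is a minimal sketch using simulated log-likelihood values (the data, chain count, and variable names are made up for illustration). The chain structure has to be supplied explicitly through chain_id before relative_eff() can compute r_eff, which is then passed to loo().

```r
library(loo)

# Simulated draws, for illustration only: 4 chains x 250 iterations, 10 observations.
set.seed(1)
n_chains <- 4
n_iter <- 250
n_obs <- 10
log_lik <- matrix(
  rnorm(n_chains * n_iter * n_obs, mean = -1, sd = 0.3),
  nrow = n_chains * n_iter,
  ncol = n_obs
)

# A draws-by-observations matrix carries no chain information, so the
# user has to say which chain each row (draw) came from.
chain_id <- rep(seq_len(n_chains), each = n_iter)

# relative_eff() expects likelihood values, hence exp(log_lik).
r_eff <- relative_eff(exp(log_lik), chain_id = chain_id)

# r_eff is then passed to the matrix method of loo().
fit_loo <- loo(log_lik, r_eff = r_eff)
```

With a 3-D array (iterations x chains x observations) the chain structure is encoded in the dimensions, so no chain_id is needed there.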

MansMeg · 2019-09-25T07:12:36Z

Ah. Of course, it totally makes sense. I assumed we "knew" the chain structure.

MansMeg added 30 commits June 24, 2019 10:21

Matrix and array aploo behaves as they should - testsuites pass

9d803e6

Ad aploo.function + test suite - passes tests

ada5b70

Working refactoring of correction of log ratios

e3c5d2b

Removed unnessecary dependency

77b2c67

Solving class issues

412db2d

Renaming

05b6e1a

Refactoring of parallel handling of loo function -pass test suites

a84dd70

Changed argument name from log_q to log_g

ba2ca78

Change api

35811f2

Rename

5a6d9de

File rename

632113d

Init ssloo

19f898f

Roxygen fixes

f2ec07f

Fixed bugs

85901dd

Small bugg fix

b3d9711

Fixe update subsample loo

b4644d8

Added large data references

c7da94b

Handled final stuff

2edbcb5

Fixes in tests

2fed0c7

Added init test suite

5e47e5e

Bug fixes and updated test suite

0e14176

Refactored ndraws for draws objects

869a66f

Small refactoring for clarity

dd62516

Bug fix and addition tests

b48c3ed

Added tests and fixed bugs

82818aa

Small additional fix

f62e87e

Added tests

7c2bad0

Added tests

42ee8d1

Full test suite passes

fb26a81

Fixed HH test suites

e3bd2a9

MansMeg added 7 commits August 6, 2019 11:36

Minor fixes of argument shortening

e19a4c7

Changed thin_draws to .thin_draws and exported the generic

d894efe

Changed to .compute_point_estimate and exported the generic

d13a903

Final addition

0f25f89

Small namespace and doc fixes

9f51a64

Doc fixes

b6bbbc2

Remove nparameters

fd58faf

MansMeg added 2 commits August 6, 2019 21:20

Minor NAMESPACE fix

5df479e

Doc updates

3a319da

MansMeg added 11 commits August 6, 2019 22:35

Added newline to force Travis rebuild

bc5a226

Small fix and comment out failing test case

96d1a36

caching packages

e3386ed

Added release to Travis to cache packages

332c4c5

Disable tests to cache devel packages

2eaede0

Newline to start Travis

bcd2c9b

rm newline... restart Travis

38d5c67

And the add the tests

a0cbd37

And finally, add covr

c7f0490

Bump version

0d87763

Comment out dependencies

01decad

paul-buerkner merged commit ccce220 into stan-dev:master Aug 8, 2019

jgabry mentioned this pull request Nov 14, 2019

Faster loo estimate for large n using random sample loo #87

Closed
