Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add loo_subsample and loo_approximate_posterior #113

Merged
merged 76 commits into from Aug 8, 2019

Conversation

MansMeg
Copy link
Collaborator

@MansMeg MansMeg commented Jul 5, 2019

Hi!

Here come loo_subsample and loo_approximate_posterior functionality.

Implementation question/issues

  1. I have not implemented everything we may want to need now. What is currently missing is approximations with truncated IS and subsample_loo for loglik arrays and matrices. Those should be easy to implement, although the current version has the basics.
  2. I have now implemented ndraws() and nparameters() functions for loo_subsample(). Those may be of more general use and can the be moved somewhere else. Now they are in loo_subsample.R.
  3. I have not profiled the code so it may be possible to improve performance.
  4. I now put most in the loo_subsample.R and the loo_approximate_posterior.R files.
  5. I have not bumped the DESCRIPTION file

Technical questions/issues

  1. I have now removed r_eff from loo_approximate_posterior(). I could not think of a situation with approximate posteriors and autocorrelation?
  2. Now r_eff is implemented as in ordinary loo, but it makes sense to only use r_eff for the subsampled observations. But I could not find a good way of doing this.
  3. I'm currently not handling MCSE correctly. Should I remove it for subsample_loo() return it only for the subsampled observations or for the totals? I'm not 100% sure what to do there.
  4. I would love to have a code review of how I implemented the naive diff SE estimation (for different subsamples) at row 77-96 in loo_compare.psis_loo_ss_list.R and how I compute looic at row 1083-1085 in loo_subsample.R . Maybe @avehtari could look at this?

@paul-buerkner
Copy link
Contributor

I think we are getting very close to having this PR ready. With the current loo_subsample method of brms I see for some models the following error:

Error: 'observations' is larger than total sample size.

I cannot tell whether this happens because of brms or loo but I think it would be good to figure this out before merging. Here is a reproducible example:

devtools::install_github("paul-buerkner/brms", ref = "ssloo")
library(brms)
fit1 <- brm(count ~ zAge + zBase * Trt + (1|patient),
             data = epilepsy, family = poisson())
loo_subsample(fit1)

@MansMeg
Copy link
Collaborator Author

MansMeg commented Aug 6, 2019

Great. I just pushed some final fixes for the documentation, generics etc. Hopefully this will also fix the two failing test cases, but that remains to be seen.

The problem you encountered is that the default subsample size is 400 and the epilepsy dataset contain roughly 250 observations. Now it throws an error. I tried to make the error more clear. But maybe we should just set the observation to the full dataset and trow a warning instead?

@paul-buerkner
Copy link
Contributor

paul-buerkner commented Aug 6, 2019

Ah, this makes sense. Sorry this was my bad. I think the error is clear. Perhaps one could say Argument 'observations' to make clear this is something the user can simply change? Or also point out the number i.e. 400 and 250 in this case?

@paul-buerkner
Copy link
Contributor

paul-buerkner commented Aug 7, 2019

Ok, the PR looks good to me now and is ready to be merged in my opinion.

@jgabry, @avehtari any objections to merging this?

@codecov-io
Copy link

codecov-io commented Aug 7, 2019

Codecov Report

Merging #113 into master will decrease coverage by 3.1%.
The diff coverage is 91.46%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #113      +/-   ##
==========================================
- Coverage    98.4%   95.29%   -3.11%     
==========================================
  Files          16       19       +3     
  Lines        1252     2020     +768     
==========================================
+ Hits         1232     1925     +693     
- Misses         20       95      +75
Impacted Files Coverage Δ
R/loo_compare.R 94.79% <100%> (+0.6%) ⬆️
R/psis_approximate_posterior.R 71.42% <59.37%> (-24.58%) ⬇️
R/loo.R 91.87% <70.73%> (-5.63%) ⬇️
R/loo_compare.psis_loo_ss_list.R 84.82% <84.82%> (ø)
R/loo_approximate_posterior.R 92.1% <92.1%> (ø)
R/print.R 96.96% <94.11%> (-1.15%) ⬇️
R/loo_subsample.R 95.4% <95.4%> (ø)
R/effective_sample_sizes.R 97.27% <0%> (-0.23%) ⬇️
... and 13 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e1f21dc...01decad. Read the comment docs.

@paul-buerkner paul-buerkner merged commit ccce220 into stan-dev:master Aug 8, 2019
@paul-buerkner
Copy link
Contributor

After @avehtari gave his ok yesterday, we are ready to merge this PR.

Thank you @MansMeg for your work! I think this is a really nice addition to the loo package!

@jgabry
Copy link
Member

jgabry commented Sep 23, 2019

I forgot to answer this good question from @MansMeg:

@jgabry What was your reason of requring users to compute r_eff themselves instead of having it computed internally in loo? The latter seems easily possible as we have the log_lik values available internally regardless of whether we do use the .function or .matrix approach.

It’s only possible if we know the chain_id for each draw, which we do in the array case (possibly the function case), but not the matrix case without the user supplying a chain_id variable. So I think @avehtari and I decided that loo methods in other packages like rstanarm and brms can compute r_eff automatically but the primary loo methods in loo would take r_eff as an input.

We could change this (I think without breaking backwards compatibility) if there’s a good reason.

@MansMeg
Copy link
Collaborator Author

MansMeg commented Sep 25, 2019

Ah. Of course, it totally makes sense. I assumed we "knew" the chain structure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants