Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 267 ecdf plots with confidence bands #282

Conversation

TeemuSailynoja
Copy link
Collaborator

@TeemuSailynoja TeemuSailynoja commented Jan 21, 2022

A reworked implementation of the ECDF plots mentioned in issue #267 to replace my earlier pull request.

Main inclusions:
mcmc_rank_ecdf: rank ecdf plot with confidence bands for assessing if two or more chains sample the same distribution.
ppc_pit_ecdf and ppc_pit_ecdf_grouped: PIT ECDF plot with confidence bands to assess if y and yrep contain samples from the same distribution.

Both functions allow computation of the confidence bands and interpolation of these bands from precomputed values.

Copy link
Collaborator

@paul-buerkner paul-buerkner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR looks already quite good. I like the code structure and readability. However, I think there are not yet any unit tests and both the internal and user-facing documentation is a bit short at times and may need some fixes as described in my comments.

R/helpers-ppc.R Outdated Show resolved Hide resolved
R/helpers-ppc.R Outdated Show resolved Hide resolved
R/helpers-ppc.R Outdated Show resolved Hide resolved
R/helpers-ppc.R Outdated Show resolved Hide resolved
R/helpers-ppc.R Show resolved Hide resolved
R/ppc-distributions.R Show resolved Hide resolved
R/ppc-distributions.R Outdated Show resolved Hide resolved
R/ppc-distributions.R Outdated Show resolved Hide resolved
R/ppc-distributions.R Show resolved Hide resolved
NAMESPACE Show resolved Hide resolved
Copy link
Collaborator

@paul-buerkner paul-buerkner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, thank you! The only remaining issue is the storage of a couple of .svg files which I am not sure should really be tracked by git.

tests/testthat/test-ppc-distributions.R Outdated Show resolved Hide resolved
@paul-buerkner
Copy link
Collaborator

paul-buerkner commented Mar 31, 2022

Thanks! Looks good to me. Checks are now in progress.

@TeemuSailynoja
Copy link
Collaborator Author

There was one final bug introduced in a merge and I had apparently run the test without loading the merged files in. I have now fixed that in the ppc_pit_ecdf functions and all the tests should pass.

@codecov-commenter
Copy link

codecov-commenter commented Mar 31, 2022

Codecov Report

Merging #282 (f5c0dc3) into master (0f931bc) will decrease coverage by 0.75%.
The diff coverage is 87.34%.

@@            Coverage Diff             @@
##           master     #282      +/-   ##
==========================================
- Coverage   98.47%   97.72%   -0.76%     
==========================================
  Files          35       35              
  Lines        4661     4969     +308     
==========================================
+ Hits         4590     4856     +266     
- Misses         71      113      +42     
Impacted Files Coverage Δ
R/helpers-ppc.R 84.68% <76.76%> (-12.10%) ⬇️
R/ppc-distributions.R 98.24% <95.20%> (-1.76%) ⬇️
R/mcmc-traces.R 99.15% <95.38%> (-0.85%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0f931bc...f5c0dc3. Read the comment docs.

@paul-buerkner
Copy link
Collaborator

There are still some checks failing. Can you take a look and fix them?

@avehtari
Copy link
Contributor

This PR has been approved a long time again, but some checks are failing. Based on a quick lookm at least one failure is due to a typo, so it would be good to check the failure messages

@TeemuSailynoja
Copy link
Collaborator Author

I fixed the typo causing the error in testing.

@paul-buerkner
Copy link
Collaborator

Thanks! Checks are running now and I will merge as soon as they pass.

@paul-buerkner
Copy link
Collaborator

Still fails. Do you see why?

@avehtari
Copy link
Contributor

Windows:

Error: Error: processing vignette 'visual-mcmc-diagnostics.Rmd' failed with diagnostics:
invalid connection

Ubuntu:

  Error in get_interpolation_values(N, K, L, prob) : 
    No precomputed values to interpolate from for sample length of 93.
  Please use a subsample of length 100 or larger, or consider setting 'interpolate_adj' = FALSE.
  Calls: ppc_pit_ecdf_grouped ... adjust_gamma -> interpolate_gamma -> get_interpolation_values
  Execution halted

@TeemuSailynoja
Copy link
Collaborator Author

I fixed the Ubuntu issue. This windows one is on a seemingly unrelated Vignette, and I wasn't able to parse from the error message, what exactly is failing in the processing of the Rmd file.

@paul-buerkner
Copy link
Collaborator

Okay, let's try again.

@TeemuSailynoja
Copy link
Collaborator Author

Silly mistake of failing to remember to commit the updated Rd files after changing the documentation in the .R files. Still, the windows test seems to fail while compiling a Stan model from a Vignette.

@paul-buerkner
Copy link
Collaborator

All the checks that actually run (and don't fail because of unrelated reasons) have passed and the PR also passed for me locally. Accordingly, I will merge this PR now. Thank you for your work!

@paul-buerkner paul-buerkner merged commit 5a5a69f into stan-dev:master Jun 21, 2022
jgabry added a commit that referenced this pull request Nov 10, 2022
* fix r cmd check issues from #282
* fix ggplot test issues (closes #289)
* add @TeemuSailynoja as contributor
@jgabry jgabry mentioned this pull request Nov 10, 2022
@jgabry
Copy link
Member

jgabry commented Nov 10, 2022

I'm finally doing another bayesplot release so I will include this in the release. Thanks again @TeemuSailynoja for implementing this and @paul-buerkner and @avehtari for reviewing!

I noticed a few complaints from R cmd check still

❯ checking dependencies in R code ... NOTE
  There are ::: calls to the package's namespace in its code. A package
    almost never needs to use ::: for its own objects:
    ‘gamma_adj’

❯ checking R code for possible problems ... NOTE
  mcmc_rank_ecdf: no visible binding for global variable ‘parameter’
  mcmc_rank_ecdf: no visible binding for global variable ‘chain’
  ppc_pit_ecdf: no visible binding for global variable ‘y_id’
  ppc_pit_ecdf_grouped: no visible binding for global variable ‘y_id’
  Undefined global functions or variables:
    chain parameter y_id

but I think I have fixed these in #291.

@TeemuSailynoja
Copy link
Collaborator Author

Hi @jgabry, last two edits I would like to include for the ECDF plots would be to

  1. change the default behaviour to plot the regular ECDF instead of ECDF-difference.
    • Users are likely to be more familiar with this plot even if with large samples the difference shows a more dynamic range.
  2. change the default behaviour from always attempting to use interpolateion to compute the envelope to only interpolating for large samples.
    • The speed-up for small samples is minor and there is a chance of interpolation failing due to lack of precomputed data.

I already have these edits ready, if you think they can make it to the release. The question is, should I commit them to a separate branch, here, or to patch-for-cran for review?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants