Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add order statistic warning #230

Merged
merged 6 commits into from Nov 15, 2023
Merged

Conversation

yannmclatchie
Copy link
Contributor

@yannmclatchie yannmclatchie commented Sep 29, 2023

This PR adds a warning when we estimate that the elpd difference of a collection of models is purely due to chance, developed with @avehtari.

If more than 11 models are compared, then the median model by elpd is taken as the baseline model, and the risk of the difference in predictive performance being due to random noise is estimated as described by McLatchie and Vehtari (Section 3.2, 2023). This will flag a warning if it is deemed that there is a risk of over-fitting due to the selection process, and users are recommended to avoid model selection based on LOO-CV, and instead to favour of model averaging/stacking or projection predictive inference.

Copy link
Member

@jgabry jgabry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yannmclatchie Great, thank you for the PR! I put a few suggestions and questions in the review comments.

R/loo_compare.R Outdated Show resolved Hide resolved
R/loo_compare.R Outdated Show resolved Hide resolved

if (max(elpd_diff) <= order_stat) {
# flag warning if we suspect no model is theoretically better than the baseline
warning("Difference in performance potentially due to chance.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we point users to the paper in the warning message (e.g., "See McLatchie and Vehtari (2023) for details".)? @avehtari what do you think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have now added this

R/loo_compare.R Outdated
Comment on lines 45 to 46
#' If more than \eqn{11} models are compared, then the worst model by elpd is
#' taken as the baseline model, and the risk of the difference in predictive
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm slightly confused by this because the printed loo_compare results will still show the model with the best ELPD as the baseline (i.e. all elpd_diff are relative to that one), right?

If that's still true, then I think we should rephrase this to indicate that this is just happening in an internal check and doesn't affect which model is considered as the baseline for computing elpd_diff in the output. Does that make sense? Or am I just confused (totally possible)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have now updated the docstring, let me know if it is now clearer 🙏

@codecov-commenter
Copy link

codecov-commenter commented Oct 2, 2023

Codecov Report

Merging #230 (62a1adc) into master (33854f6) will increase coverage by 0.04%.
Report is 7 commits behind head on master.
The diff coverage is 100.00%.

❗ Current head 62a1adc differs from pull request most recent head 3526fbf. Consider uploading reports for the commit 3526fbf to get more accurate results

@@            Coverage Diff             @@
##           master     #230      +/-   ##
==========================================
+ Coverage   92.96%   93.00%   +0.04%     
==========================================
  Files          30       30              
  Lines        2770     2788      +18     
==========================================
+ Hits         2575     2593      +18     
  Misses        195      195              
Files Coverage Δ
R/loo_compare.R 96.49% <100.00%> (+0.65%) ⬆️

📣 Codecov offers a browser extension for seamless coverage viewing on GitHub. Try it in Chrome or Firefox today!

@yannmclatchie
Copy link
Contributor Author

Thanks for the feedback @jgabry ! I've gone through your suggestions and made the changes with also a small change in the logic: we use the median model rather than the worst model to take the internal differences and compute the order statistic.

I have now also updated the documentation to make it clear that this second layer of elpd differences is different to what the user will eventually see and is strictly internal, as you noted. Let me know if there are any further changes that could be made to clarify

Copy link
Member

@jgabry jgabry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yannmclatchie Sorry for the delay on this! I've been super busy recently. I added one more comment suggesting minor edits to the documentation, but otherwise this looks good. After you take a look at my suggestion and approve it I can merge this PR!

@avehtari Does this look good to you too?

R/loo_compare.R Outdated Show resolved Hide resolved
Co-authored-by: Jonah Gabry <jgabry@gmail.com>
@jgabry
Copy link
Member

jgabry commented Nov 10, 2023

Thanks for updating the doc @yannmclatchie (and thanks for adding this to the package!). @avehtari if you're happy with this I think we can go ahead and merge it once all the automated tests are run again.

@avehtari
Copy link
Collaborator

I can check this at latest on Tuesday (too late today)

@jgabry
Copy link
Member

jgabry commented Nov 10, 2023

Sounds good, thanks

@avehtari
Copy link
Collaborator

OK for me

Copy link
Member

@jgabry jgabry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merging now. Thanks again!

@jgabry jgabry merged commit 426c2d8 into stan-dev:master Nov 15, 2023
6 checks passed
@yannmclatchie yannmclatchie deleted the order-stat-warning branch November 16, 2023 15:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants