Warning levels #309

Open
mjskay opened this issue Nov 2, 2023 · 7 comments

Comments

@mjskay
Collaborator

mjskay commented Nov 2, 2023

Pinging off discussion in #306, it might be good to have warning levels to set the verbosity of warnings for common things (like needing to merge chains). We would need to know:

  • In what situations do we currently generate warnings?
  • In what situations do we not generate warnings but perhaps should?
  • How severe are these situations?

We might then either want ordered warning levels (1, 2, ...) and/or warning categories/sub-categories that can be turned on and off via options.
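The ordered-levels idea could look roughly like the sketch below. It is only an illustration: the option name `posterior.verbosity`, the level names, and the helper `warn_at_level()` are all hypothetical and do not exist in posterior.

```r
# Hypothetical sketch: ordered verbosity levels read from a global option.
# Neither the option name nor these helpers exist in posterior.
verbosity_levels <- c(quiet = 0L, important = 1L, all = 2L)

# Look up the configured level; default to showing everything.
current_verbosity <- function() {
  level <- getOption("posterior.verbosity", default = "all")
  if (!level %in% names(verbosity_levels)) {
    stop("Unknown verbosity level: ", level, call. = FALSE)
  }
  verbosity_levels[[level]]
}

# Emit a warning only if the configured verbosity is at least `level`.
warn_at_level <- function(level, ...) {
  if (current_verbosity() >= verbosity_levels[[level]]) {
    warning(..., call. = FALSE)
  }
  invisible(NULL)
}

old <- options(posterior.verbosity = "important")
warn_at_level("all", "Chains were dropped.")            # silenced
warn_at_level("important", "The ESS has been capped.")  # emitted
options(old)
```

Categories that can be switched on and off independently would need one option per category (or a character vector of enabled categories) instead of a single ordered scale.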

@n-kall
Collaborator

n-kall commented Jan 16, 2024

I had a look at the existing warnings and messages.

warnings:

as_draws_df, no user control, controlled internally by warn arg:
warning_no_call("Dropping 'draws_df' class as required metadata was removed.")

as_draws, no control:
warning_no_call(to," does not support non-numeric variables (e.g., factors). Converting non-numeric variables to numeric.")

convergence, no control:
warning_no_call("The ESS has been capped to avoid unstable estimates.")

merge_chains, controlled by posterior.warn_on_merge_chains:
warning_no_call( "Chains were dropped", switch(type, ".", match = " due to chain information not matching.", index = " due to manually indexing draws." ) )

pareto_smooth, no control:
warning_no_call("Input contains infinite or NA values, or is constant. Fitting of generalized Pareto distribution not performed.")

warning("Number of tail draws cannot be more than half ", "the total number of draws if both tails are fit, ",

warning("Number of tail draws cannot be less than 5. ", "Changing to ", 5, ".")

summarise_draws, no control:
warning_no_call( "The draws object contained no variables with unreserved names. ", "No summaries were computed." )

messages:

startup message:
packageStartupMessage("This is posterior version ", ver)

thin_draws, no control:
message("Automatically thinned by ", round(thin, 1), " based on ESS.")

subset_draws, no control:
message("Merging chains in order to subset via 'draw'.")

pareto_smooth, controlled with verbose arg:

" Mean does not exist, making empirical mean estimate of the draws not applicable."

" Sample size is too small, for given Pareto k-hat. Sample size larger than ", round(min_ss, 0), " is needed for reliable results.\n"
" Bias dominates when k-hat > 0.7, making empirical mean estimate of the Pareto-smoothed draws unreliable.\n"
" Pareto khat for weights is high (", round(khat, 1) ,"). This indicates a single or few weights dominate.\n", "Inference based on weighted draws will be unreliable.\n")
"Pareto k-hat = ", round(khat, 2)

@n-kall
Collaborator

n-kall commented Feb 9, 2024

Related to this, we've been discussing with @avehtari that it would be useful if messages/warnings related to e.g. convergence diagnostics would be available when the diagnostic is printed, not just when it is calculated.

One idea that we came up with would be to save the messages / warnings generated by the summary functions when calling summarise_draws, and include those as an attribute (or even another column) of the draws_summary object. These messages/warnings could then be pretty-printed as footnotes.

For example:

# A tibble: 10 × 3
   variable  rhat ess_bulk note
   <chr>    <dbl>    <dbl> 
 1 mu        1.02     558.  1
 2 tau       1.01     246.  1
 3 theta[1]  NA       400.  2
 4 theta[2]  1.02     564.
 5 theta[3]  1.01     312.
 6 theta[4]  1.02     695.
 7 theta[5]  1.01     523.  1
 8 theta[6]  1.02     548.
 9 theta[7]  1.00     434.  1
10 theta[8]  1.02     355.

Notes:
1. ESS capped to avoid unstable estimates.
2. NA or constant values in chains

@avehtari
Collaborator

Can we get some progress with this issue?

Last week I did get annoyed when doing something like

for (s in 1:S) {
   somefunction(..., subset_draws(mydraws, draw=s))
}

with S=1000, and had to use capture.output(..., type="message"). I think that specific message

message("Merging chains in order to subset via 'draw'.")

could be removed completely, as I did explicitly ask for draws and not iterations, but at least it would be nice to be able to suppress that without capture.output()
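
In the meantime, base R's `suppressMessages()` silences anything emitted with `message()` without redirecting output. The sketch below uses a stand-in function, `noisy_subset()`, which is hypothetical and only mimics the message quoted above. The same wrapper should work around the real `subset_draws()` call, assuming it emits that text via `message()` as listed earlier in this thread.

```r
# Stand-in for a function that emits a message on every call (hypothetical).
noisy_subset <- function(x, draw) {
  message("Merging chains in order to subset via 'draw'.")
  x[draw]
}

x <- c(10, 20, 30)

# suppressMessages() muffles message() conditions raised inside the expression,
# so the loop body runs silently and its return values are kept.
res <- suppressMessages(
  vapply(seq_along(x), function(s) noisy_subset(x, draw = s), numeric(1))
)
res
```

This hides every message raised inside the wrapped expression, which is why a dedicated option or message class would still be the finer-grained tool.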

@n-kall
Collaborator

n-kall commented Mar 14, 2024

rOpenSci recently published a guide/recommendations about this: https://ropensci.org/blog/2024/02/06/verbosity-control-packages/
See also here, specifically discussing Bayesian software.

Perhaps this would be a reasonable guide to follow for posterior?

@n-kall
Collaborator

n-kall commented Mar 14, 2024

I made a first attempt at implementing the note-style messages in summarise_draws. It seems that it could be done with attributes (see example below).

I'm not sure if this would be the best way, but it seems feasible with a small change to create_summary_list, and a bit more to print.draws_summary.
If this is a reasonable approach, I suppose there should still be an option to print the messages / warnings at compute time, as well as saving them in an attribute.

For example, the following works with my current implementation:

# add to the message attribute
add_message <- function(x, msg) {
  attr(x, "message") <- c(attr(x, "message"), msg)
  x
}

prob_positive <- function(x) {
  msg <- NULL
  if (all(x > 0)) {
    msg <- c(msg, "all draws are positive")
  }
  out <- Pr(x > 0)
  out <- add_message(out, msg)
  out
}

prob_negative <- function(x) {
  msg <- NULL
  # condition matches the message text: no draw is below zero
  if (!any(x < 0)) {
    msg <- c(msg, "no draws are negative")
  }
  out <- Pr(x < 0)
  out <- add_message(out, msg)
  out
}

summarise_draws(example_draws(), prob_positive, prob_negative)
 # A tibble: 10 × 4
    variable prob_positive prob_negative .messages
    <chr>            <dbl>         <dbl> <chr>
  1 mu               0.895        0.105  ""
  2 tau              1            0      "1, 2"
  3 theta[1]         0.9          0.1    ""
  4 theta[2]         0.892        0.108  ""
  5 theta[3]         0.77         0.23   ""
  6 theta[4]         0.862        0.138  ""
  7 theta[5]         0.775        0.225  ""
  8 theta[6]         0.802        0.198  ""
  9 theta[7]         0.912        0.0875 ""
 10 theta[8]         0.822        0.178  ""

 Messages:
 1 prob_positive: all draws are positive
 2 prob_negative: no draws are negative

@n-kall
Collaborator

n-kall commented Mar 15, 2024

Regarding possible levels or categories: Perhaps there could be different categories for different types of messages. For example:

  • diagnostic messages (e.g. high pareto_khat, high rhat, low ess)
  • computation messages (e.g. NA, constant or Inf values leading to NA diagnostic; ESS capping; weights being ignored)
  • draw structure messages (e.g. merging chains)

@mjskay
Collaborator Author

mjskay commented Mar 15, 2024

I think there are two different issues here:

  • our own mechanism for capturing messages and saving them on objects / summaries.
  • global options for turning on/off messages at different levels of verbosity. We already have this for the merge chains warning @avehtari encountered: you should be able to set the "posterior.warn_on_merge_chains" option to FALSE to turn it off. However, we don't have a notion of warning levels that would silence sets of message types, which I agree would be useful.

Re: the first point, since R has mechanisms for capturing messages and warnings, I'd suggest using that to gather the messages within summarise_draws/etc, then attach them to the final output object at the end. We could even use message subclasses to group messages of the same type to simplify output.
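
A minimal sketch of that approach, using only base R condition handling, might look like the following. The helper names `diagnostic_message()` and `with_collected_messages()`, the `posterior_<category>` class prefix, and the `"messages"` attribute are all hypothetical. The toy `capped_ess()` function only stands in for a real summary function.

```r
# Hypothetical sketch: signal classed messages and collect them on the result.

# Signal a message that carries an extra condition class for its category.
diagnostic_message <- function(text, category) {
  cond <- structure(
    list(message = paste0(text, "\n"), call = NULL),
    class = c(paste0("posterior_", category), "message", "condition")
  )
  message(cond)
}

# Evaluate `expr`, muffle any messages it raises, and attach them
# to the returned value as an attribute.
with_collected_messages <- function(expr) {
  collected <- list()
  value <- withCallingHandlers(
    expr,
    message = function(m) {
      collected[[length(collected) + 1L]] <<- m
      invokeRestart("muffleMessage")
    }
  )
  attr(value, "messages") <- collected
  value
}

# Toy summary function standing in for a real diagnostic.
capped_ess <- function(x) {
  diagnostic_message("The ESS has been capped.", category = "computation")
  min(length(x), 100)
}

out <- with_collected_messages(capped_ess(rnorm(500)))

# The condition class identifies the category of each stored message.
vapply(attr(out, "messages"), function(m) class(m)[1], character(1))
```

Because the stored objects are full conditions, a print method could group them by class, and a caller who wants them at compute time could re-signal them with `message()`.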
