Add convergence warnings #77
Comments
Yeah, at least for the rhat and ess warnings this makes sense. But presumably we wouldn't include the warnings about divergences, treedepth, and e-bfmi given that they are Stan-specific? Or did you want to broaden the scope of posterior to summarize sampler diagnostics? |
I didn't plan on including algorithm-specific diagnostics. @avehtari initially suggested moving the general warnings to posterior (which I agree with). I would like to hear his take on the algorithm-specific diagnostics as well.
In reply to Jonah Gabry's comment above (Fri, 19 June 2020, 21:14).
|
They are not Stan-specific, but NUTS / dynamic HMC algorithm-specific. It's a good question whether the posterior package would be the place for algorithm-specific diagnostics. There can be other inference packages using NUTS / dynamic HMC, too. How is bayesplot going forward with posterior? bayesplot has algorithm-specific diagnostic plots, but is bayesplot going to use posterior inside? If so, would it then be natural to allow (some) posterior objects to carry algorithm-specific information, too? Or would it be a Stan-specific object inherited from posterior with additional Stan-specific parts? Also, the algorithm-specific diagnostics should be improved, too. Again, there is the question of where the best place would be for advice, e.g., on what to do if any divergences are observed. |
Yeah good point.
Yeah I'd like to use posterior inside bayesplot, so I agree it would definitely be nice if posterior handled some algorithm-specific information. I think it can do this without needing to know about Stan. Taking CmdStanR as an example, both
so posterior would just need to know how to handle draws objects containing variables like
So maybe posterior just needs separate functions for diagnosing general MCMC and diagnosing particular algorithms, e.g.
which then could be wrapped up into a single method inside interfaces if desired:
fit$diagnose() # internally does the two lines above
Anyway, just brainstorming here. What do you all think?
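The split suggested in this comment, generic MCMC checks kept separate from sampler-specific ones, with an interface-level wrapper that runs both, might look roughly like the sketch below. It is written in Python purely for illustration. `diagnose_mcmc`, `diagnose_nuts`, `diagnose`, and the default thresholds are hypothetical and are not functions or values from posterior or CmdStanR; only the column names `divergent__` and `treedepth__` follow Stan's CSV output.

```python
from typing import Dict, List


def diagnose_mcmc(rhat: Dict[str, float], ess_bulk: Dict[str, float],
                  rhat_max: float = 1.01, ess_min: float = 400.0) -> List[str]:
    """Generic checks that apply to draws from any MCMC algorithm.

    The thresholds are assumptions chosen for this sketch.
    """
    msgs = []
    n_rhat = sum(1 for r in rhat.values() if r > rhat_max)
    if n_rhat:
        msgs.append(f"{n_rhat} variable(s) have Rhat above {rhat_max}")
    n_ess = sum(1 for e in ess_bulk.values() if e < ess_min)
    if n_ess:
        msgs.append(f"{n_ess} variable(s) have bulk ESS below {ess_min}")
    return msgs


def diagnose_nuts(sampler: Dict[str, List[int]], max_treedepth: int = 10) -> List[str]:
    """Checks that only make sense for NUTS / dynamic HMC draws."""
    msgs = []
    # divergent__ is a 0/1 indicator per iteration, so the sum is a count.
    n_div = sum(sampler.get("divergent__", []))
    if n_div:
        msgs.append(f"{n_div} divergent transition(s)")
    n_hit = sum(1 for d in sampler.get("treedepth__", []) if d >= max_treedepth)
    if n_hit:
        msgs.append(f"{n_hit} transition(s) hit max treedepth {max_treedepth}")
    return msgs


def diagnose(rhat: Dict[str, float], ess_bulk: Dict[str, float],
             sampler: Dict[str, List[int]]) -> List[str]:
    """Interface-level wrapper that runs both kinds of checks."""
    return diagnose_mcmc(rhat, ess_bulk) + diagnose_nuts(sampler)
```

With this layout, a package that does not use NUTS would call only `diagnose_mcmc`, while a Stan interface could expose the combined `diagnose`.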
I agree! |
I agree. We should indeed handle those in posterior. One question I think we should discuss is how to name those columns. I know Stan uses |
Yeah, what if posterior allows both as input, but always uses |
@paul-buerkner @avehtari Any further thoughts on this topic? I've been meaning to think more about this but haven't really had the time yet unfortunately. Should we try to find a time to discuss this in more detail? For better or worse, this is the kind of functionality we should try to get (close to) right the first time because the diagnostic warnings will potentially affect a lot of users/packages. |
We had a very good discussion about this with @paul-buerkner last week, and I hope he made notes afterwards. We could have a video call after Wednesday. |
Ok great, sounds good |
I made notes and will post them here in a better-written form tomorrow.
In reply to Jonah Gabry's "Ok great, sounds good" (Mon, 28 Sept. 2020, 21:45).
|
Aki and I discussed how we can make the warning system modular to achieve the following things:
This should allow both users and package developers building on top of posterior to adjust the warning system to their needs. We could think of a function creating a special S3 object class which controls all the warnings and thresholds. This could work a little like
There are a few questions remaining:
@avehtari feel free to add more if you feel I have missed anything. |
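As a rough illustration of a single object that controls which warnings are active and which thresholds they use, here is a minimal Python sketch. The thread proposes an S3 class in R; `WarningControl`, `check`, and the default values below are hypothetical stand-ins for that idea, not an existing API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class WarningControl:
    """Hypothetical stand-in for the proposed warning-control object."""
    thresholds: Dict[str, float] = field(
        default_factory=lambda: {"rhat": 1.01, "ess_bulk": 400.0})
    enabled: Dict[str, bool] = field(
        default_factory=lambda: {"rhat": True, "ess_bulk": True})


def check(diagnostics: Dict[str, Dict[str, float]],
          control: WarningControl) -> List[str]:
    """Return warning strings for enabled diagnostics that cross their threshold."""
    msgs = []
    for name, values in diagnostics.items():
        # Skip diagnostics that are switched off or have no threshold defined.
        if not control.enabled.get(name, False) or name not in control.thresholds:
            continue
        limit = control.thresholds[name]
        if name == "rhat":
            # Rhat is a problem when it is too high.
            bad = [v for v, x in values.items() if x > limit]
        else:
            # ESS-type diagnostics are a problem when they are too low.
            bad = [v for v, x in values.items() if x < limit]
        if bad:
            msgs.append(f"{name}: {len(bad)} variable(s) beyond threshold {limit}")
    return msgs
```

A package built on top could then pass its own control object explicitly, for example `check(diag, WarningControl(enabled={"rhat": False, "ess_bulk": True}))` to silence Rhat warnings while keeping the ESS ones.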
Cool, thanks for sharing the notes. Did you talk at all about whether or not to include HMC/NUTS-specific diagnostics? I think it would be good to include those so that all convergence warnings can be handled the same. But if we include those then how does that change the question of where the warnings should live? Right now
By control over content do you mean whether they should be able to turn off certain warnings but keep other ones? Or something different?
Yeah I like the idea of explicitly passing the warning object. |
I think that having two separate methods for
The per-variable diagnostics like
In theory, we can control 3 things about the convergence warnings:
I think we all agree that points 1 and 2 should be controlled via the warning object. The question is whether we should give users or developers control over point 3 (warning message content). Right now, I think we should start with some fixed warning text, but I at least wanted to mention it so that we can discuss it if necessary. |
@paul-buerkner summarized the discussion well. A couple of additional comments:
This also holds for R*, which is a generic diagnostic.
In addition to short and longer warning texts, we might also consider, for example for Rhat, that a short warning just says some Rhat is too high, while additional information would give a loo khat-type table of how many are exceeding different thresholds. Also, for each diagnostic check / warning we would have a link to a web page with more information on how to interpret the diagnostic warnings, what the next steps might be, etc. |
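The khat-style table mentioned here could be produced along the following lines. This is only a sketch: `rhat_table` and the cutoffs 1.01, 1.05, and 1.1 are assumptions made for illustration, not thresholds agreed on in this thread.

```python
import math
from typing import Dict, List, Sequence, Tuple


def rhat_table(rhat: Dict[str, float],
               cutoffs: Sequence[float] = (1.01, 1.05, 1.1)) -> List[Tuple[str, int]]:
    """Count how many variables fall into each Rhat band.

    Bands are half-open intervals (lower, upper]. NaN values match no band,
    so they are left out of every count.
    """
    bounds = [-math.inf, *cutoffs, math.inf]
    rows = []
    for lower, upper in zip(bounds[:-1], bounds[1:]):
        count = sum(1 for r in rhat.values() if lower < r <= upper)
        rows.append((f"({lower}, {upper}]", count))
    return rows


# Example: print one row per band.
for band, count in rhat_table({"mu": 1.0, "tau": 1.02, "theta[1]": 1.2}):
    print(f"{band:>14}  {count}")
```

A short warning could then state only that some Rhat values are too high, with this table available as the more detailed follow-up.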
Just a quick note from our (@jgabry; @andrewgelman; @avehtari) call today. We decided that it might be a good thing to have all detailed recommendations following warnings (e.g., divergent transitions, rhat, etc.) on web pages rather than in the warning messages themselves. The latter would then just point to the web pages. This way, the warning interface becomes much simpler and we can update our recommendations more easily without going through the CRAN submission process. |
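A minimal sketch of that design, in which the warning itself stays short and only points to a web page: the function name `warn_short` and the URLs are placeholders invented for this example, not real documentation addresses.

```python
import warnings

# Placeholder URLs (assumption): the real pages would be chosen by the maintainers.
DOC_URLS = {
    "rhat": "https://example.org/warnings/rhat",
    "divergences": "https://example.org/warnings/divergences",
}


def warn_short(diagnostic: str, summary: str) -> str:
    """Emit a short warning that links to the detailed recommendations."""
    message = f"{summary} See {DOC_URLS[diagnostic]} for details."
    warnings.warn(message, UserWarning)
    return message
```

Because the advice lives behind the link, the recommendations can change without touching the message text shipped in the package.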
Landed here after looking for existing issues requesting a diagnose draws feature, so commenting to lend support to |
In thinking about the content of the output of
|
I don't think that this info is in the standard CSV output, but maybe in the diagnostic files if those are produced? |
But surely the idea here isn't to parse existing diagnostic files in order to... run diagnostics :P |
But surely the idea here isn't to parse existing diagnostic files in order to... run diagnostics
I meant the diagnostic CSV files that are generated by CmdStan during sampling if you ask for them. They do not compute any summary diagnostics, but instead give you the full information of the trajectories, including rejected ones if I recall.
Weird, can't seem to get the formatting above to work.
|
I'm actually working on a workflow package that takes a different approach than is currently employed in cmdstanr and I'd already been considering generating and parsing the diagnostic CSVs by default, which would render the pertinent info available to post-sampling packages like posterior. (Though n.b. the eventual goal is to run checks during sampling to enable termination iff passed) But another thought is: how does cmdstan's diagnose binary produce treedepth warnings without the diagnostic CSVs? Or am I misremembering that it does so? |
CmdStan's bin/diagnose doesn't need the diagnostic files, just the standard CSV output files. The CSV comments contain the max depth option used (either specified by the user or the default).
I think it's good to have that option but it shouldn't be the default (of course it's up to you since you're doing it in your own package!). The extra diagnostic files basically double the size of what you're writing to disk, which is fine for small models but risky as a default. The information in them is also only useful if you need info about the hmc trajectories, which isn't never but it's not the case for a typical user (it's not even accessible from most interfaces). |
Although perhaps you're not intending your aria package for the "typical" user, more for power users like yourself? |
But as @jsocolar notes, couldn't you reach max treedepth without actually exceeding it? Or I guess the diagnostic is about % reaching not % exceeding? |
Yeah I think it's % reaching, but it's been a while since I checked so I could be wrong. |
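To make the "reaching" versus "exceeding" distinction concrete: the standard output records only the treedepth that each iteration actually used, so a check based on it can count iterations that reached the limit but cannot tell whether they would have gone further. A small Python sketch follows; the function name is hypothetical, and the default of 10 matches Stan's default `max_treedepth`.

```python
from typing import Sequence


def frac_at_max_treedepth(treedepth: Sequence[int], max_treedepth: int = 10) -> float:
    """Fraction of iterations whose treedepth reached the configured limit.

    This measures "reached", not "would have exceeded": the recorded
    treedepth alone cannot distinguish the two cases.
    """
    if not treedepth:
        raise ValueError("treedepth must contain at least one iteration")
    hits = sum(1 for d in treedepth if d >= max_treedepth)
    return hits / len(treedepth)
```

For example, `frac_at_max_treedepth([10, 9, 10, 8])` gives 0.5, whether or not any of those trajectories was cut short.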
Yup, I know; I'm also capturing all info and writing to a proper compressed binary format (netcdf4 with the InferenceData spec), so the files will be deleted when cmdstan is done with them. I'm also looking into hacks that either redirect the writing or straight delete content between cmdstan's writes, but that's longer term. |
Interesting. I'll be curious to follow the progress! |
Just confirmed this is true. Never realized this! One implication is that if a fraction even moderately less than 1 of transitions hit the default |
Wonder if we could model the step sizes with truncation to estimate the CDF and thereby % >= max? Anyone know if there's any theory on the expected distribution of step sizes for well-behaved chains? |
For some of my models with a lot of parameters, treedepth has almost no variability, even if the step-size is right near the border between two treedepths. |
I think it could potentially be more than very few but yeah not necessarily, and I bet you're right that most users have the wrong impression because it's not explained well enough. That said, if you're constantly hitting max treedepth exactly then that's cutting it pretty close and also worth being warned about (but the warning should be clear about what it actually means). The warning message for this in CmdStanR, for example, is
So it does correctly say "hit the maximum treedepth limit", which doesn't imply it would have exceeded the limit if allowed. However, it does then mention the prematurely terminated trajectories. Even though it doesn't imply that that actually happened I could totally see it being confusing. If you have ideas for how we could express this more clearly that would be great. Regardless, I think you're both correct that we'd need more info about the trajectories in order to distinguish between hitting max treedepth naturally and being capped at max treedepth artificially. |
Great post. I especially like the plot of step_size vs accept_stat with treedepth as the color. Very nice. I wonder, would it be useful to add a plot like that to bayesplot? |
I think it's not useful to store the information on whether max_treedepth was reached, but it would be useful to be able to easily access the number of leapfrog steps per iteration, as that is informative about the computational cost and can be compared to the expected number of leapfrog steps for a normal distribution with the optimal stepsize.
This hints that your posterior has almost constant curvature, that is, it's very close to normal. |
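The number of leapfrog steps per iteration is already part of Stan's standard output (column `n_leapfrog__`), so a cost summary along the lines suggested above can be sketched with very little code. The function name `leapfrog_cost` and the choice of summaries are assumptions; the expected number of steps for a normal distribution is not computed here.

```python
from typing import Dict, Sequence


def leapfrog_cost(n_leapfrog: Sequence[int]) -> Dict[str, float]:
    """Summarize leapfrog steps per iteration as a proxy for computational cost."""
    if not n_leapfrog:
        raise ValueError("n_leapfrog must contain at least one iteration")
    total = sum(n_leapfrog)
    return {
        "total": total,                    # gradient evaluations over all iterations
        "mean": total / len(n_leapfrog),   # average cost per iteration
        "max": max(n_leapfrog),            # most expensive single iteration
    }
```

The mean could then be set against a reference value for a well-behaved posterior, once such a reference is available.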
The current convergence warnings in our packages (e.g., in rstan and brms) are inconsistent and partly confusing to users. Examples are given in https://discourse.mc-stan.org/t/improve-warnings-for-low-ess/15355. Other examples include warnings because of rhat or ess being NA, which is most likely caused by constant parameters, which does not pose a problem if the parameters are supposed to be constant (e.g., a correlation of a variable with itself).
We should move all of these warning messages to posterior, make them consistent and improve their informativeness, by mentioning different reasons for the potential problems and pointing to more detailed doc.
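One way to avoid the confusing warnings for rhat or ess being NA is to check whether the affected variable is simply constant before warning about it. The sketch below shows that idea in Python; `split_na_rhat` is a hypothetical helper, and treating "all draws equal" as the definition of a constant parameter is an assumption of this example.

```python
import math
from typing import Dict, List, Sequence, Tuple


def split_na_rhat(draws: Dict[str, Sequence[float]],
                  rhat: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Separate variables with missing Rhat into constant and unexplained ones.

    Constant variables (all draws identical) are expected to have no Rhat,
    so only the unexplained ones would merit a warning.
    """
    constant, unexplained = [], []
    for name, value in rhat.items():
        if not math.isnan(value):
            continue  # Rhat is available, nothing to classify
        values = draws[name]
        if min(values) == max(values):
            constant.append(name)
        else:
            unexplained.append(name)
    return constant, unexplained
```

For a correlation matrix, the diagonal entries would land in the first list and could be mentioned in a note rather than a warning.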