Codecov Report
```diff
@@            Coverage Diff             @@
##           develop     #530     +/-  ##
==========================================
- Coverage    79.13%   79.12%    -0.01%
==========================================
  Files           45       45
  Lines         9441     9438        -3
==========================================
- Hits          7471     7468        -3
  Misses        1970     1970
```
Continue to review full report at Codecov.
@mitzimorris This should be ready for review now if you're able to take a look
This run is against
```python
'due to this limit will result in slow exploration.',
'For optimal performance, increase this limit.',
]
```
I hate how brittle this test is - all we really care about is that it mentions "maximum treedepth limit"
Yeah, we have a bunch of tests which are pretty reliant on exact wording. In this case I just made the minimum change required, but it would be worth doing a pass on all of these to clean them up. I'm not always sure what the intent of the test is just from reading it, though
you definitely did the right thing - minimal change, leave this mess for later.
we need models that will always fail in known ways and tests for existing diagnostics to check that the problem is correctly diagnosed - which requires more understanding of the problem than I had when I wrote this test back in the beginning.
I suppose it's sort of a question of purpose, but I could see an argument for simply testing that the diagnose command returns a (non-empty) string at all, not trying to test specific outputs. Presumably CmdStan is testing that it actually diagnoses as it should; we just need to test that the functionality is wrapped properly
> simply testing that the diagnose command returns a (non-empty) string at all, not trying to test specific outputs.
CmdStan diagnose utility was changed to always return a string - https://github.com/stan-dev/cmdstan/blob/166712876de05c8548af05bb7acbedd0410735e0/src/cmdstan/diagnose.cpp#L214-L215
Assuming that CmdStanPy will return this as well? If so, check that the string is non-empty and doesn't match "no problems detected"
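A minimal sketch of that check, assuming the wrapper surfaces the diagnose output as a plain string (the helper name and the exact "no problems detected" phrasing are assumptions, not confirmed CmdStanPy API):

```python
def diagnose_found_problems(report: str) -> bool:
    """Hypothetical helper: treat the diagnose output as flagging a problem
    when it is non-empty and does not contain CmdStan's 'no problems
    detected' phrasing (the exact wording is an assumption here)."""
    return bool(report.strip()) and "no problems detected" not in report.lower()
```

A test built on this only depends on one stable phrase rather than the full warning text.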
I meant even simpler than that: call diagnose on any model (for example Bernoulli, or this one, or a few different ones) and only check that the command didn’t fail and a string was returned.
It returning a correct string is above the wrapper's pay grade, in my opinion, and leads to tests being sensitive to wording etc.
It’s similar to the discussion of whether we should be testing accuracy of fits in tests, which is where we’d get occasional failures in the VI tests etc. We could instead just test that the results have the shape we want. If we want to test and verify that we’re correctly reading in values, we can do that from a known CSV file with fixed values
Now, this methodology would have downsides: we might not catch a bug in CmdStan if one slipped by the upstream tests, or if a change isn't communicated, like the default ID changing in 2.28. But our tests would be a lot cleaner
you're absolutely correct. keep it simple!
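The simpler variant discussed above could look something like this (a sketch; the function name is hypothetical, and it deliberately asserts nothing about the wording of the report):

```python
def check_diagnose_ran(report: str) -> None:
    """Smoke test: assert only that the diagnose command returned a
    non-empty string, leaving correctness of its contents to CmdStan's
    own upstream tests."""
    assert isinstance(report, str), "diagnose should return a string"
    assert report.strip(), "diagnose returned an empty report"
```

Run against any model fit (Bernoulli, or a few different ones), this catches a broken wrapper without being sensitive to message changes upstream.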
Submission Checklist
Summary
This fixes a few issues discovered while testing the 2.29 release candidate:
- `fixed_param` changes are not in 2.29, which we were assuming they would be
- `do_command` would combine the stderr and stdout streams. This bug exists even without 2.29; it just wasn't uncovered until the new warnings were added to stanc
- Use `atexit` in the tests to handle the intermittent issues with logging and pytest

Copyright and Licensing
Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company):
Simons Foundation
By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses: