-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix/ eigenvalue parsing bug #215
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let me know if you need me to explain more the error handling, this will likely require adjustments in a couple more upstream places as well
for _, s := range strings.Fields(line) { | ||
eigenvalue, err := strconv.ParseFloat(s, 64) | ||
if err != nil { | ||
panic(fmt.Sprintf("Attempting to calculate condition number but could not parse eigenvalues -- %v", err)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
generally, we never want to panic unless there is a really bad reason to. Unfortunately, throughout this codebase, and some of pkgr, this was not always well held as panic is an easy escape hatch of just 💣 .
If a function is going to panic on such a situation, it should communicate it in the function name, generally using the term must
, so like mustParse
If not, instead can have the function signature return the value + error, so then the if err != nil just return right there like 0, err
The idea becomes then that error can bubble all the way up the stack to the final output, or be intercepted and handled otherwise, but that is something for some other place in the code to decide what to do about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually started to code this up and then realized that nothing for about three levels above this expects an error to come through so I just got lazy and made it panic. My thinking was that this should never be a user-facing error. In other words, it should only trigger if we parse something incorrectly, in which case spewing the ugly stack trace is probably as good as anything because we'll just want them to come to us with it.
Alternatively, we could handle it a few levels up and just log the error and return something like nil
for the condition number. But that didn't feel right to me, honestly. Your thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah i guessed as much, and could also see just using the mustParse() nomenclature and leaving the panic. I would also pass the specific value that failed on parsing as well (s) to help with diagnostics.
parsers/nmparser/parse_lst_file.go
Outdated
} | ||
|
||
sort.Float64s(eigenvalues) | ||
ratio := eigenvalues[len(eigenvalues)-1] / eigenvalues[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could the code get here if no eigenvalues were found - if so len(0) - 1 could cause problems.
Also, could there be an eigenvalue of 0, in which case divide by 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could the code get here if no eigenvalues were found - if so len(0) - 1 could cause problems.
I think no. I had an explicit check for that in here (that also panicked) but it felt excessive. I think the only way we would get here with length==0 is if the EIGENVALUES
header was the last thing in the file. Because if there was a number afterwards it would parse that as an eigenvalue, and if there was anything else it would panic on line 461.
could there be an eigenvalue of 0, in which case divide by 0
hmmm, I hadn't considered this. We never checked for this before but we could. In that case... what should we do? Seems like we either have to error or substitute some very tiny number instead. But the former is what would already happen and the latter feels very wrong to me because that would just return some very large (and probably wrong) condition number.
Honestly this math is a little over my head, but a little googling tells me that a) it is possible to have an eigenvalue equal to zero and b) that might mean the condition number is technically infinite. Any guidance on this scenario is appreciated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'd defer to @kylebaron or @timwaterhouse
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it should be possible (but unlikely) to get eigenvalues of zero if you don't force the R matrix to be positive definite (all eigenvalues > 0: default for non-EM methods is to not force positive definiteness). I think the condition number doesn't mean too much if the matrix has zero or negative eigenvalues, so maybe return a very large number (or infinity if possible). Even if the lowest eigenvalue is a very small positive number, that still indicates a poor fit so a large condition number is appropriate. @kylebaron?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking that you just won't get anything if it can't come up with positive eigenvalues (like it'll say "COVARIANCE MATRIX UNOBTAINABLE" or something like that or it just won't give you anything).
I think NONMEM is making all these decisions too and implementing it in the .ext
file. If we take that output (or lack of output) bbi won't be in conflict with what NONMEM is doing and non assumptions needed on how to parse etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh interesting. I guess i never paid close attention to it before. But at least having that heuristic would be good. Would have helped on most-recent project.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so the proposal is to add another heuristic, which I'm not quite sure what to call (eigenvalue_issues
?) that would trigger if (any of):
- smallest eigenvalue is <= 0
- eigenvalue line (
-10000003
) is missing from.ext
- any eigenvalues were negative, but potentially got forced back to positive (look for "
Negative Eigenvalues in Matrix
" in the.lst
?)
I realize these are all slightly different (or maybe overlapping?) but I don't think we want to have too many heuristics. The docs for ?model_summary
would explain that this flag can mean a few different things and "the user should check...".
How does that all sound? If we like that, then we'll continue on to discussing what to return for the "NULL
" value... (it's a bit of a tangent, so I want to get this straight first)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just kicking around some ideas (I know this is going to seem like a lot, but getting some ideas out):
- Agree we don't want too many heuristics / red flags coming up
- I feel like we need to know if the user both requested the covariance matrix and requested eigenvalues get printed (it seems like if you don't request them, they don't appear in either the
.lst
or the.ext
files; @timwaterhouse can you confirm this?) - If the user didn't request it we shouldn't warn that there was an issue with eigenvalues
- If the user did request eigenvalues and the covariance matrix was unobtainable or the cov step fails, maybe warn about the covariance matrix only
- If the user requested eigenvalues and NONMEM had to force positive definiteness, then issue an eigenvalue warning
- You could look for something in that text; I'd tend toward
Forcing positive definiteness
butNegative Eigenvalues
could work too
- You could look for something in that text; I'd tend toward
- If the user requested eigenvalues and NONMEM reports a negative eigenvalue, then issue an eigenvalue warning (honestly I doubt this will happen but maybe it could)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, if you don't use $COV ... PRINT=E
then there are no eigenvalues in either .lst
or .ext
files. Agree that we shouldn't warn about missing eigenvalues if they're not requested. This should be flagged by EIGENVLS. PRINTED: NO
in the .lst
file.
@seth127, I'm good with your proposed eigenvalue_issues
(or whatever) instead of a bunch of separate ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with respect to lst vs ext - I would likewise go straight to the ext. That ship has now definitively sailed. At one point, with an objective of being similar to PsN sumo, we thought to maybe be able to give back results with only lst. At this point, its more hastle than its worth and if anything i'd rather rely on lst less and take advantage of more structured data
get largeNumberLimit from configbbi/parsers/nmparser/parse_lst_file.go Lines 465 to 470 in f148fc1
This comment was generated by todo based on a
|
@dpastoor this is ready for review again. I tagged https://github.com/metrumresearchgroup/bbi/releases/tag/3.0.3-beta.1 for testing. |
It gives me the condition number based on the eigenvalues reported by NM, which I believe is what’s expected. |
@dpastoor this is ready for your re-review. @timwaterhouse tested it on his project and it worked, plus we have tests now. Note: Drone was failing on this branch, which was fixed as discussed here as part of a red herring. That is still ongoing, but I don't think it effects this PR, mostly because it's been going on for 6 months at least and the flaky test is totally unrelated. |
Lgtm! |
Closes #214
Closes #216
Also refactors to pull condition number from
-1000000003
line of.ext
, when present, instead of.lst
file.