Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix/ eigenvalue parsing bug #215

Merged
merged 6 commits into from
May 26, 2021
Merged

Conversation

seth127
Copy link
Collaborator

@seth127 seth127 commented Apr 19, 2021

Closes #214
Closes #216

Also refactors to pull condition number from -1000000003 line of .ext, when present, instead of .lst file.

seth127 added a commit to metrumresearchgroup/bbitest that referenced this pull request Apr 20, 2021
Copy link
Contributor

@dpastoor dpastoor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let me know if you need me to explain more the error handling, this will likely require adjustments in a couple more upstream places as well

for _, s := range strings.Fields(line) {
eigenvalue, err := strconv.ParseFloat(s, 64)
if err != nil {
panic(fmt.Sprintf("Attempting to calculate condition number but could not parse eigenvalues -- %v", err))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally, we never want to panic unless there is a really bad reason to. Unfortunately, throughout this codebase, and some of pkgr, this was not always well held as panic is an easy escape hatch of just 💣 .

If a function is going to panic on such a situation, it should communicate it in the function name, generally using the term must , so like mustParse

If not, instead can have the function signature return the value + error, so then the if err != nil just return right there like 0, err

The idea becomes then that error can bubble all the way up the stack to the final output, or be intercepted and handled otherwise, but that is something for some other place in the code to decide what to do about it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually started to code this up and then realized that nothing for about three levels above this expects an error to come through so I just got lazy and made it panic. My thinking was that this should never be a user-facing error. In other words, it should only trigger if we parse something incorrectly, in which case spewing the ugly stack trace is probably as good as anything because we'll just want them to come to us with it.

Alternatively, we could handle it a few levels up and just log the error and return something like nil for the condition number. But that didn't feel right to me, honestly. Your thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah i guessed as much, and could also see just using the mustParse() nomenclature and leaving the panic. I would also pass the specific value that failed on parsing as well (s) to help with diagnostics.

}

sort.Float64s(eigenvalues)
ratio := eigenvalues[len(eigenvalues)-1] / eigenvalues[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could the code get here if no eigenvalues were found - if so len(0) - 1 could cause problems.

Also, could there be an eigenvalue of 0, in which case divide by 0

Copy link
Collaborator Author

@seth127 seth127 Apr 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could the code get here if no eigenvalues were found - if so len(0) - 1 could cause problems.

I think no. I had an explicit check for that in here (that also panicked) but it felt excessive. I think the only way we would get here with length==0 is if the EIGENVALUES header was the last thing in the file. Because if there was a number afterwards it would parse that as an eigenvalue, and if there was anything else it would panic on line 461.

could there be an eigenvalue of 0, in which case divide by 0

hmmm, I hadn't considered this. We never checked for this before but we could. In that case... what should we do? Seems like we either have to error or substitute some very tiny number instead. But the former is what would already happen and the latter feels very wrong to me because that would just return some very large (and probably wrong) condition number.

Honestly this math is a little over my head, but a little googling tells me that a) it is possible to have an eigenvalue equal to zero and b) that might mean the condition number is technically infinite. Any guidance on this scenario is appreciated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'd defer to @kylebaron or @timwaterhouse

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it should be possible (but unlikely) to get eigenvalues of zero if you don't force the R matrix to be positive definite (all eigenvalues > 0: default for non-EM methods is to not force positive definiteness). I think the condition number doesn't mean too much if the matrix has zero or negative eigenvalues, so maybe return a very large number (or infinity if possible). Even if the lowest eigenvalue is a very small positive number, that still indicates a poor fit so a large condition number is appropriate. @kylebaron?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking that you just won't get anything if it can't come up with positive eigenvalues (like it'll say "COVARIANCE MATRIX UNOBTAINABLE" or something like that or it just won't give you anything).

I think NONMEM is making all these decisions too and implementing it in the .ext file. If we take that output (or lack of output) bbi won't be in conflict with what NONMEM is doing and non assumptions needed on how to parse etc.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh interesting. I guess i never paid close attention to it before. But at least having that heuristic would be good. Would have helped on most-recent project.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so the proposal is to add another heuristic, which I'm not quite sure what to call (eigenvalue_issues?) that would trigger if (any of):

  • smallest eigenvalue is <= 0
  • eigenvalue line (-10000003) is missing from .ext
  • any eigenvalues were negative, but potentially got forced back to positive (look for "Negative Eigenvalues in Matrix" in the .lst?)

I realize these are all slightly different (or maybe overlapping?) but I don't think we want to have too many heuristics. The docs for ?model_summary would explain that this flag can mean a few different things and "the user should check...".

How does that all sound? If we like that, then we'll continue on to discussing what to return for the "NULL" value... (it's a bit of a tangent, so I want to get this straight first)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just kicking around some ideas (I know this is going to seem like a lot, but getting some ideas out):

  • Agree we don't want too many heuristics / red flags coming up
  • I feel like we need to know if the user both requested the covariance matrix and requested eigenvalues get printed (it seems like if you don't request them, they don't appear in either the .lst or the .ext files; @timwaterhouse can you confirm this?)
  • If the user didn't request it we shouldn't warn that there was an issue with eigenvalues
  • If the user did request eigenvalues and the covariance matrix was unobtainable or the cov step fails, maybe warn about the covariance matrix only
  • If the user requested eigenvalues and NONMEM had to force positive definiteness, then issue an eigenvalue warning
    • You could look for something in that text; I'd tend toward Forcing positive definiteness but Negative Eigenvalues could work too
  • If the user requested eigenvalues and NONMEM reports a negative eigenvalue, then issue an eigenvalue warning (honestly I doubt this will happen but maybe it could)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, if you don't use $COV ... PRINT=E then there are no eigenvalues in either .lst or .ext files. Agree that we shouldn't warn about missing eigenvalues if they're not requested. This should be flagged by EIGENVLS. PRINTED: NO in the .lst file.

@seth127, I'm good with your proposed eigenvalue_issues (or whatever) instead of a bunch of separate ones.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with respect to lst vs ext - I would likewise go straight to the ext. That ship has now definitively sailed. At one point, with an objective of being similar to PsN sumo, we thought to maybe be able to give back results with only lst. At this point, its more hastle than its worth and if anything i'd rather rely on lst less and take advantage of more structured data

@todo
Copy link

todo bot commented May 5, 2021

get largeNumberLimit from config

// TODO: get largeNumberLimit from config
// or derive. something like (number of parameters) * 10
largeNumberLimit := 1000.0
cb := make([]bool, len(allCondDetails))
for i, cn := range allCondDetails {
cb[i] = cn.ConditionNumber > largeNumberLimit


This comment was generated by todo based on a TODO comment in f148fc1 in #215. cc @metrumresearchgroup.

@seth127
Copy link
Collaborator Author

seth127 commented May 5, 2021

@dpastoor this is ready for review again. I tagged https://github.com/metrumresearchgroup/bbi/releases/tag/3.0.3-beta.1 for testing.

@timwaterhouse
Copy link

3.0.3-beta.1 looks good. I tested with a model that produces negative eigenvalues. Everything looks good.

**Heuristic Problem(s) Detected:**
– eigenvalue_issues 

It gives me the condition number based on the eigenvalues reported by NM, which I believe is what’s expected.

@seth127
Copy link
Collaborator Author

seth127 commented May 25, 2021

@dpastoor this is ready for your re-review. @timwaterhouse tested it on his project and it worked, plus we have tests now.

Note: Drone was failing on this branch, which was fixed as discussed here as part of a red herring. That is still ongoing, but I don't think it effects this PR, mostly because it's been going on for 6 months at least and the flaky test is totally unrelated.

@dpastoor
Copy link
Contributor

Lgtm!

@dpastoor dpastoor self-requested a review May 25, 2021 22:49
@seth127 seth127 merged commit 43f1495 into develop May 26, 2021
@seth127 seth127 deleted the fix/condition_num_extra_line_bug branch May 26, 2021 13:36
@seth127 seth127 mentioned this pull request Jul 13, 2021
@seth127 seth127 restored the fix/condition_num_extra_line_bug branch July 23, 2021 15:26
@seth127 seth127 deleted the fix/condition_num_extra_line_bug branch October 18, 2021 19:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Eigenvalue issues heuristic Condition number parsed incorrectly when eigenvalues fail to parse
4 participants