Debug "unreachable" code #364

Closed
SeanCurtis-TRI opened this issue Jan 22, 2019 · 5 comments · Fixed by #381

@SeanCurtis-TRI
Contributor

The EPA/GJK implementation contains several checks that try to confirm the algorithm and its data are in a valid state. These checks have failed in practice (e.g., #319, #322). To address such failures, we need to be able to reproduce them, which can be nearly impossible for end users: the failures may arise only in extremely complex or non-deterministic scenarios. Nevertheless, they do arise, so FCL has to look out for its own interests.

Wherever we assert that code is "impossible to reach" or that state is valid, FCL should capture enough of the geometry's state to reproduce the failure in a test. Examples of such sites:

@SeanCurtis-TRI SeanCurtis-TRI self-assigned this Jan 22, 2019
@SeanCurtis-TRI
Contributor Author

On a related note: we are also seeing the number of EPA iterations explode (an apparent infinite loop that eventually consumes all memory). To help debug these in-the-guts EPA issues, it might be worth adding a "max_epa_iterations" property to the query request so that users can tune it. Exceeding the maximum should throw the same exception as above so that the conditions under which it is reached are likewise documented.
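A minimal sketch of the proposed iteration cap, assuming a hypothetical request struct `EpaRequestSketch` with a `max_epa_iterations` field and a hypothetical `InvalidStateError` exception. Neither is FCL's actual API, and the expansion step is a caller-supplied stand-in for one EPA polytope expansion rather than FCL's real loop.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical exception type standing in for "the same exception as above";
// it is not FCL's actual class.
class InvalidStateError : public std::runtime_error {
 public:
  using std::runtime_error::runtime_error;
};

// Hypothetical request type carrying the proposed tunable cap.
struct EpaRequestSketch {
  int max_epa_iterations = 100;  // arbitrary default chosen for illustration
};

// Calls `expand` (a stand-in for one EPA polytope expansion) until it reports
// convergence or the cap is reached. Returns the number of iterations used.
template <typename ExpandStep>
int RunCappedEpa(const EpaRequestSketch& request, ExpandStep expand) {
  for (int i = 0; i < request.max_epa_iterations; ++i) {
    if (expand(i)) return i + 1;
  }
  // Exceeding the cap is reported like any other invalid state, so the
  // configuration that triggered it gets documented instead of looping forever.
  std::ostringstream msg;
  msg << "EPA failed to converge within max_epa_iterations = "
      << request.max_epa_iterations;
  throw InvalidStateError(msg.str());
}
```

The cap turns a runaway loop into a bounded failure with a diagnostic. The trade-off is that a legitimately slow-converging query can now fail, which is why the proposal makes the limit user-tunable.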

@DamrongGuoy
Contributor

I have put this issue into my OKRs. Per a face-to-face discussion with @SeanCurtis-TRI and @hongkai-dai, the low-level functions will throw a specific exception when they reach an invalid state, and the high-level functions will catch that exception and report enough of the geometry's state to reproduce the failure in a test. The report will include the kinds of geometries and their poses.
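The throw-low, catch-high split described above could look roughly like the sketch below. All names (`InvalidStateError`, `Pose`, `ShapeRecord`, `LowLevelDistance`, `HighLevelDistance`) are hypothetical stand-ins rather than FCL's API; the low-level routine only simulates reaching an invalid state.

```cpp
#include <cassert>
#include <iomanip>
#include <limits>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical exception thrown by low-level routines on an invalid state.
class InvalidStateError : public std::runtime_error {
 public:
  using std::runtime_error::runtime_error;
};

// Hypothetical minimal description of one query operand.
struct Pose {
  double translation[3];
  double quaternion[4];  // (w, x, y, z)
};

struct ShapeRecord {
  std::string kind;  // e.g. "Box", "Sphere"
  Pose pose;
};

// Low-level routine: knows only that its local state is invalid.
double LowLevelDistance(bool simulate_invalid_state) {
  if (simulate_invalid_state) {
    throw InvalidStateError("EPA reached a supposedly unreachable branch");
  }
  return 0.5;  // placeholder result
}

// Prints every double with max_digits10 so the text round-trips to the exact
// same binary value; fewer digits may fail to reproduce the failure.
std::string Describe(const ShapeRecord& s) {
  std::ostringstream out;
  out << std::setprecision(std::numeric_limits<double>::max_digits10);
  out << s.kind << " p = [" << s.pose.translation[0] << ", "
      << s.pose.translation[1] << ", " << s.pose.translation[2] << "] q = ["
      << s.pose.quaternion[0] << ", " << s.pose.quaternion[1] << ", "
      << s.pose.quaternion[2] << ", " << s.pose.quaternion[3] << "]";
  return out.str();
}

// High-level routine: knows the geometry, so it enriches the message and
// rethrows the same exception type.
double HighLevelDistance(const ShapeRecord& a, const ShapeRecord& b,
                         bool simulate_invalid_state) {
  try {
    return LowLevelDistance(simulate_invalid_state);
  } catch (const InvalidStateError& e) {
    std::ostringstream out;
    out << e.what() << "\n  shape A: " << Describe(a)
        << "\n  shape B: " << Describe(b);
    throw InvalidStateError(out.str());
  }
}
```

The low-level code stays free of geometry bookkeeping, while the high-level caller, which already holds the shapes and poses, is the natural place to attach them to the report.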

@hongkai-dai
Contributor

The anzu team reported observing this problem again in the Slack channel: https://tri-internal.slack.com/archives/CG17WNJJ1/p1551723060004100

@SeanCurtis-TRI
Contributor Author

Reopening, as #381 has proven insufficient.

The exception has "worked" in the sense that exceptions have been thrown and shape, pose, and solver parameters have been provided in the error message (see #319 for an example). However, setting up a simple signed distance query with the provided data has not reproduced the bugs seen in the wild. The only logical conclusion is that not enough information has been captured.

While we try to catch one of these bugs in the wild and extend this functionality based on what we learn, discussions have led to the following additional thoughts:

  1. The error message should document the FCL scalar type, the libccd scalar type, and the libccd version.
  2. The errors document that a problem happened and (purport to document) the configuration that led to it. It would also be insightful to see the data local to the site where the exception is thrown, e.g., to determine whether NaNs are being generated and propagated.
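Both suggestions could be folded into the error message along the lines of this sketch. `ScalarName` and `BuildDiagnostics` are hypothetical helpers; the `CcdReal` template parameter stands in for whatever real type libccd was compiled with, and the libccd version string is assumed to be supplied by the caller or build system, since the sketch does not rely on any particular libccd macro.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Maps a scalar type to a printable name (hypothetical helper).
template <typename T>
const char* ScalarName();
template <>
inline const char* ScalarName<float>() { return "float"; }
template <>
inline const char* ScalarName<double>() { return "double"; }

// FclScalar: FCL's scalar type. CcdReal: stand-in for libccd's real type.
// `local_values` are the quantities in scope where the exception is thrown.
template <typename FclScalar, typename CcdReal>
std::string BuildDiagnostics(const std::string& ccd_version,
                             const std::vector<double>& local_values) {
  std::ostringstream out;
  out << "FCL scalar: " << ScalarName<FclScalar>()
      << ", libccd scalar: " << ScalarName<CcdReal>()
      << ", libccd version: " << ccd_version << "\n";
  int nan_count = 0;
  for (std::size_t i = 0; i < local_values.size(); ++i) {
    if (std::isnan(local_values[i])) ++nan_count;
    out << "  local[" << i << "] = " << local_values[i] << "\n";
  }
  // A non-zero count flags NaNs generated or propagated before the throw.
  out << "  NaN count: " << nan_count;
  return out.str();
}
```

Recording the libccd precision matters in practice: a later comment in this thread reports that reproduction only succeeded once libccd was built with double precision.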

@hongkai-dai
Copy link
Contributor

Confirmed that we can reproduce bugs in FCL using @SeanCurtis-TRI's exception-throwing code. The caveat is that libccd must be built with double precision.
