Skip to content
This repository has been archived by the owner on Aug 20, 2019. It is now read-only.

[RST-2202] Catch potential errors when computing the covariances #18

Merged
merged 2 commits into from
Jul 9, 2019

Conversation

svwilliams
Copy link
Contributor

There are several possible errors that can cause the covariance computation to throw. One source of such error arises from numerical stability/accuracy issues, which is a runtime phenomenon and cannot be solved by code fixes elsewhere. We should catch those errors and display a message rather than letting the exception propagate and take down the whole state estimation stack.

It is an open question for me whether we should publish the state estimate without the covariance information in this case. Or if we should issue the warning and not publish. Some things (probably most things) only care about the state estimate itself, and ignore the covariance anyway. Not publishing due to lack of covariance may be overly conservative. On the other hand, failure to compute a covariance is an indication of some sort of issue with the graph (improperly constructed, numerical stability issues, etc.) and the state estimate may be garbage even if it exists. Interested in hearing opinions.

ROS_FATAL_STREAM("An error occurred computing the covariance information for " << latest_stamp_ << ":\n" <<
e.what());
}
// TODO(swilliams): Should we always publish the odometry message, even if the covariance information is missing?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will publish the last good covariance. If we're going to pubish when the covariance computation fails, we should set the odom_output_ covariances to all zeros (or any invalid covariance that can be detected in the consumer side).

I lean towards publishing even if the covariance couldn't be computed. In that case, I'd do:

  1. When the covariance computation fails, set the odom_output_.pose.covariance or odom_output_.twist.covariance to all zeros. Any consumer should check the covariance is valid, so that way it's possible to detect the covariance couldn't be computed and either use the pose or disregard it.
  2. Request the pose and twist covariance in two separate try-catch blocks. I guess there are cases where only one of them can't be computed.

In theory, as you're saying, if the covariance couldn't be computed, it means that something is wrong with the problem we're trying to optimize and the solution found is likely wrong. I think it's better to still publish and explicitly tell the consumer the problem couldn't be solve (or at least the covariance), as opposed to not publishing, which could also mean that something else happened (e.g. it took too long to publish, a configuration issue, ...).

That being said, if I had to implement a consumer of this output, TBH I wouldn't use the pose or twist if their covariance couldn't be computed. In the best case, it might just be that the covariance is rank deficient, so likely some components of the pose or twist are close to the optimal, but there are others that aren't, but even in that case I don't see myself using only some pose or twist components.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Clearing the covariance is definitely a good call.
  2. I'm somewhat hesitant to separate the covariance computation into two operations. This is a heavy computation, and is much more efficient to perform as a single operation rather than two. I'm also unsure if the current "rank deficient" error is affected by which covariance blocks are requested. I'll do some digging in the Ceres code. If the "rank deficient" error will occur regardless of what covariance blocks are requested, I am going to leave the computation as a single operation. If the covariance block request might affect the "rank deficient" error, we can discuss whether breaking them into two operations is worth the performance hit or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Ceres forms a (usually sparse) representation of the entire Jacobian, then performs a QR decomposition on the full Jacobian to determine the rank. This will occur regardless what specific covariance blocks were requested. As such I'm going to leave the position and velocity covariance requests in a single call to getCovariance().

Copy link
Collaborator

@ayrton04 ayrton04 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be valuable to add a debug parameter to optionally serialize/save the graph when this happens?

@svwilliams
Copy link
Contributor Author

In an ideal world, maybe. And while I hacked together a serialize function to debug this specific problem, it is not general purpose. I need to revive the graph serialization effort at some point.

@svwilliams svwilliams merged commit f7d23ff into devel Jul 9, 2019
@svwilliams svwilliams deleted the RST-2202-underconstrained-error branch July 9, 2019 15:16
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants