
Getting transform from TransformListener with old timestamp always gives latest transform #388

Closed
jkflying opened this issue Jul 3, 2019 · 6 comments


jkflying commented Jul 3, 2019

We had quite a long investigation into why our camera data wasn't synchronized with the robot body frame, and eventually came all the way back to the TransformListener (which just wraps TF2 these days, hence filing this bug here). We have rapid orientation shifts, so we need to use the TF2 caching buffer to correctly transform our point cloud data at the correct time, since there is a small delay (~100ms) between when the light enters the camera and when the data is available in ROS.

Code is here:
https://github.com/PX4/avoidance/blob/e83c7abf117907f74a90eab5c57779b38db4ef9a/local_planner/src/nodes/local_planner_node.cpp#L676
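In outline, the call looks something like this (frame names are illustrative; the exact code is at the link above):

```cpp
// Rough outline of the pattern (frame names illustrative; see the link above
// for the exact call). The buffer needs to cover the ~100 ms sensor delay.
#include <tf/transform_listener.h>
#include <pcl_ros/transforms.h>
#include <pcl/point_cloud.h>
#include <pcl/point_types.h>

tf::TransformListener tf_listener(ros::Duration(10.0));  // 10 s cache

// ... inside the point cloud callback; pcl_cloud's header stamp is the
// capture time, i.e. ~100 ms in the past:
pcl::PointCloud<pcl::PointXYZ> cloud_out;
pcl_ros::transformPointCloud("local_origin", pcl_cloud, cloud_out, tf_listener);
```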

We've debugged it in depth, and it seems that no matter how far in the past the timestamp for pcl_cloud is (up to the 10 s buffer limit), it is always transformed with the latest transform (as if the pcl_cloud timestamp were 0). We've checked the received transforms, and although the timestamps are correct for what we ask, the actual values are bitwise-identical to the latest transform.

We'll end up writing our own (simple) cache to solve this since we can't wait for upstream fixes to all permeate out into the repos, but I thought this bug report would still be valuable.

I know several other projects also avoid TF-based caching due to similar issues (e.g. ethz-asl/voxblox#201), so I don't think we're crazy here; something is actually wrong.

This is all tested on Kinetic. Happy to answer questions and do high-level debugging, but I mostly wanted to report this. Hey, maybe we're doing something wrong 🙂

jkflying commented Jul 3, 2019

@baumanta FYI

tfoote commented Jul 11, 2019

This is quite surprising. Is there any sort of reproduction you can provide? tf lookups are accurate enough to render point clouds from scanning lasers. My first guess is that there are several point cloud conversions happening inside transformPointCloud, one of which might be failing to copy the header information properly. For debugging I'd suggest adding a debug output of the lookupTransform, next to the call, with the relevant frames and times, to verify that the lookup itself is functioning.
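Something along these lines, right next to the transformPointCloud call (the frame name and stamp are placeholders for whatever you're already passing in):

```cpp
// Debug sketch: look up the transform directly at the cloud's stamp and
// print it, to verify the lookup itself returns a time-varying value.
// "local_origin" and cloud_stamp are placeholders for your target frame
// and the cloud's header stamp.
tf::StampedTransform tf_at_stamp;
tf_listener.lookupTransform("local_origin", pcl_cloud.header.frame_id,
                            cloud_stamp, tf_at_stamp);
ROS_INFO_STREAM("lookup at " << tf_at_stamp.stamp_
                << " translation: " << tf_at_stamp.getOrigin().x() << " "
                << tf_at_stamp.getOrigin().y() << " "
                << tf_at_stamp.getOrigin().z());
```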

tfoote commented Jul 11, 2019

Also check the return code to make sure it's succeeding at all; it might not be transforming anything. transformPointCloud catches the exceptions and has a return value for failure.
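i.e. something like:

```cpp
// transformPointCloud returns false if the lookup threw internally; in that
// case the output cloud should not be used.
if (!pcl_ros::transformPointCloud("local_origin", pcl_cloud, cloud_out,
                                  tf_listener)) {
  ROS_WARN("transformPointCloud failed for this cloud, skipping it");
  return;
}
```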

jkflying commented Jul 12, 2019

A minimal reproduction is a bit of a pain since we noticed it running on real hardware, but it was reproducible running the Local Planner with Mavros + PX4 SITL.

To debug, we inlined the transformPointCloud you linked above and examined the timestamp coming back on the transform, which was indeed correct for the timestamp we had requested. However, we also retrieved the latest transform (by passing Time(0)) and found that the actual values (excluding the timestamp) were identical to 6 decimal places (i.e. all that we were printing), which wasn't expected since the vehicle was moving at the time. We increased the timestamp 'lag' up to 1 s by subtracting a Duration from it before passing it into lookupTransform and observed the same results.
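The debug code was essentially this (simplified, frame names illustrative):

```cpp
// Compare a lookup at an old stamp against the latest available transform.
tf::StampedTransform at_stamp, latest;
tf_listener.lookupTransform("local_origin", pcl_cloud.header.frame_id,
                            cloud_stamp - ros::Duration(1.0), at_stamp);
tf_listener.lookupTransform("local_origin", pcl_cloud.header.frame_id,
                            ros::Time(0), latest);
// at_stamp.stamp_ is correct for the requested time, but the values below
// came back identical to `latest` even while the vehicle was moving.
ROS_INFO("old:    %.6f %.6f %.6f", at_stamp.getOrigin().x(),
         at_stamp.getOrigin().y(), at_stamp.getOrigin().z());
ROS_INFO("latest: %.6f %.6f %.6f", latest.getOrigin().x(),
         latest.getOrigin().y(), latest.getOrigin().z());
```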

The other logic around when the data was in the buffer seemed correct. Asking for data which hadn't arrived yet produced the expected exceptions, as did asking for data which was too old.

I'm not sure if maybe it's something specific to our transform chain? It shouldn't be anything crazy: we have a dynamic TF coming in over MAVROS for FCU->local, and a static TF between the FCU and the camera.
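For reference, the static leg is a plain static transform, published along these lines (the frame names and offset here are illustrative, not our actual calibration):

```cpp
// Illustrative static FCU -> camera transform (values are placeholders).
#include <tf2_ros/static_transform_broadcaster.h>
#include <geometry_msgs/TransformStamped.h>

tf2_ros::StaticTransformBroadcaster static_broadcaster;
geometry_msgs::TransformStamped fcu_to_camera;
fcu_to_camera.header.stamp = ros::Time::now();
fcu_to_camera.header.frame_id = "fcu";
fcu_to_camera.child_frame_id = "camera_link";
fcu_to_camera.transform.translation.x = 0.1;  // placeholder offset
fcu_to_camera.transform.rotation.w = 1.0;     // identity rotation
static_broadcaster.sendTransform(fcu_to_camera);
```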

We now have a working fix which basically caches just the final transform results, with lerp/slerp for in-between timestamps. The fact that this fixed our issue, and that the TF2 buffer works for other people, leads me to believe there must be some edge case in our setup which breaks the TF2 implementation.

Edit: this is the last checkout which doesn't use our new tf buffering: https://github.com/PX4/avoidance/tree/39717991297b00dd786682b325b222a506d8e3c9
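The replacement cache is essentially the following (a simplified sketch, not the exact code we ship):

```cpp
// Simplified sketch of the replacement cache: store stamped transforms and
// interpolate (lerp on translation, slerp on rotation) between the two
// samples that bracket the requested time.
#include <map>
#include <iterator>
#include <tf/transform_datatypes.h>

class SimpleTransformCache {
 public:
  void add(const tf::StampedTransform& tf_in) { cache_[tf_in.stamp_] = tf_in; }

  bool lookup(const ros::Time& time, tf::Transform& out) const {
    if (cache_.empty()) return false;
    auto after = cache_.lower_bound(time);  // first sample with stamp >= time
    if (after == cache_.begin()) { out = after->second; return true; }
    if (after == cache_.end()) { out = std::prev(after)->second; return true; }
    auto before = std::prev(after);
    const double span = (after->first - before->first).toSec();
    const double t = span > 0.0 ? (time - before->first).toSec() / span : 0.0;
    out.setOrigin(before->second.getOrigin().lerp(after->second.getOrigin(), t));
    out.setRotation(
        before->second.getRotation().slerp(after->second.getRotation(), t));
    return true;
  }

 private:
  std::map<ros::Time, tf::StampedTransform> cache_;
};
```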

tfoote commented Jul 19, 2019

The behavior of the lookup returning the same value at all times sounds a lot like the behavior of a static transform.

I have a sneaking suspicion that the extra static transform publisher I noticed in mavlink/mavros#1278 is initializing the fcu frame as a static frame, and tf currently does not support changing the frame type after initialization. So when the dynamic frame is published later it gets reparented, but the storage is not reallocated.

You should be able to rosrun tf2_tools view_frames.py to see whether the frame is being considered static or dynamic, though the race condition at initialization might play out differently with a different startup timing.
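A minimal check of that hypothesis outside the full system would look something like this (an untested sketch against tf2::BufferCore; the frame names and publisher ordering are just my guess at what mavros is doing):

```cpp
// Untested sketch: initialize a frame as static, then keep updating it
// dynamically, and see whether lookups at old times track the stamps or
// always return the latest value.
#include <tf2/buffer_core.h>
#include <geometry_msgs/TransformStamped.h>
#include <cstdio>

geometry_msgs::TransformStamped makeTf(double x, const ros::Time& stamp) {
  geometry_msgs::TransformStamped t;
  t.header.stamp = stamp;
  t.header.frame_id = "local_origin";
  t.child_frame_id = "fcu";
  t.transform.translation.x = x;
  t.transform.rotation.w = 1.0;
  return t;
}

int main() {
  tf2::BufferCore buffer;
  // Simulates the extra static publisher initializing the frame as static.
  buffer.setTransform(makeTf(0.0, ros::Time(1.0)), "static_pub", true);
  // Dynamic updates arriving afterwards, as they would from mavros.
  buffer.setTransform(makeTf(1.0, ros::Time(2.0)), "mavros", false);
  buffer.setTransform(makeTf(2.0, ros::Time(3.0)), "mavros", false);
  // If the frame is stuck as static, this lookup at t=2.0 would come back
  // with x == 2.0 instead of the expected 1.0.
  geometry_msgs::TransformStamped result =
      buffer.lookupTransform("local_origin", "fcu", ros::Time(2.0));
  std::printf("x at t=2.0: %f\n", result.transform.translation.x);
  return 0;
}
```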

@jkflying

This is confirmed to be a MAVROS issue, closing.
