-
Notifications
You must be signed in to change notification settings - Fork 275
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multiple TF publishers can clear TF buffer in rospy kinetic #341
Comments
Thanks for the instructions how to reproduce it and debugging into it. The resolution with a lock protecting the now check does suggest there's a race condition. I'm surprised that it's being hit here, but definitely if there's two in flight and it context switches in the middle of this method to another instance it could trigger the reset inappropriately. |
I have the same bug in Indigo... |
It looks like apt-get only holds the most recent and broken version of tf2. Is there any way to fix this without installing from source? |
Fixes #341 This could incorrectly trigger a buffer clear if two concurrent callbacks were invoked.
Fixes #341 This could incorrectly trigger a buffer clear if two concurrent callbacks were invoked.
This effects all versions. There's no older version to roll back to in the binaries. I'll roll a new release shortly. PRs are in for both active development branches. #344 and backport to #345 I believe that the lock is the right thing to do. It's necessary for the check and set. I looked at refactoring the clear call to be outside the lock since it will take more time, but it doesn't really make sense to test for reset while it's still clearing anyway so I kept it inside the lock. |
Fixes #341 This could incorrectly trigger a buffer clear if two concurrent callbacks were invoked.
Fixes #341 This could incorrectly trigger a buffer clear if two concurrent callbacks were invoked.
Hi @tfoote is it possible that the same issue occurs in the C++ side? I see the And we are getting the message in https://github.com/ros/geometry2/blob/noetic-devel/tf2_ros/src/transform_listener.cpp#L108 and running out of ideas for the cause. Thanks |
I guess there's no reason it shouldn't be able to happen in c++ too. If you're getting that error that's about the only option as long as you believe time is monotonic. Have you validated that you're not actually getting time jumps possibly from something like an NTP client? It's a very short window and would have to be a race likely with thread preemption. If you can add a lock there to protect and that resolves it then it's more of an issue than I'd have imagined. |
Hi @tfoote We actually have an update on it, and yes, you are right, it is a problem with both, not having monotonic time in some parts and also NTP corrections, in our case. While investigating, we implemented the same python test mentioned above in C++ and in fact we didn't experience the issue without any lock-protection in the C++ side, fortunately. For the monotonic part, we replicated the current All together, we are good so far. Many thanks again for the support, and confirming our findings. You can keep the issue closed ;) |
I recently had a problem with TFs clearing with a message similar to "Detected jump back in time of 0.000130s. Clearing TF buffer.", wich caused all of the nodes relying on TF information to break.
Looking at the code, it seems this pull request #329 was a fault, with the backport in #336 to Kinetic.
To confirm the issue, I tested with a simple package with multiple TF broadcasters and a single TF listener in python.
The package is available here: https://github.com/redstorm45/poc-tf-kinetic
I was able to reproduce the behavior, but testing on melodic showed no error.
I suspect a change in the way subscribers are handled between the two versions.
I was able to stop the problem from happening by adding a lock to the whole check_for_reset function here:
geometry2/tf2_ros/src/tf2_ros/transform_listener.py
Lines 67 to 72 in 764804d
Therefore, I think the last_update time is updated while another thread is checking against it, causing a buffer reset (which is not needed in that case).
I don't think this is the right solution, but the bug needs to be addressed in Kinetic.
The text was updated successfully, but these errors were encountered: