Multiple TF publishers can clear TF buffer in rospy kinetic #341

Closed
redstorm45 opened this issue Nov 13, 2018 · 7 comments

@redstorm45

I recently had a problem with the TF buffer being cleared, with a message similar to "Detected jump back in time of 0.000130s. Clearing TF buffer.", which caused all of the nodes relying on TF information to break.
Looking at the code, it seems pull request #329 is at fault, together with its Kinetic backport in #336.

To confirm the issue, I tested with a simple package containing multiple TF broadcasters and a single TF listener, written in Python.
The package is available here: https://github.com/redstorm45/poc-tf-kinetic
I was able to reproduce the behavior there, but testing on Melodic showed no error.
I suspect a change in the way subscribers are handled between the two versions.
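For reference, the test has roughly this shape (a sketch of the idea, not the package's exact code; the frame names and rate are made up):

```python
#!/usr/bin/env python
# Launch several instances of this broadcaster plus one stock listener
# node; on kinetic the listener intermittently warns "Detected jump back
# in time ... Clearing TF buffer." and drops all transforms.
import rospy
import tf2_ros
from geometry_msgs.msg import TransformStamped

def run_broadcaster():
    rospy.init_node("tf_broadcaster", anonymous=True)
    br = tf2_ros.TransformBroadcaster()
    t = TransformStamped()
    t.header.frame_id = "world"
    t.child_frame_id = rospy.get_param("~child_frame", "child")
    t.transform.rotation.w = 1.0  # identity transform
    rate = rospy.Rate(100)
    while not rospy.is_shutdown():
        t.header.stamp = rospy.Time.now()
        br.sendTransform(t)
        rate.sleep()

# The listener side is just the standard tf2_ros listener:
#     buf = tf2_ros.Buffer()
#     tf2_ros.TransformListener(buf)
#     rospy.spin()

if __name__ == "__main__":
    run_broadcaster()
```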

I was able to stop the problem from happening by adding a lock around the whole check_for_reset function, which currently reads:

```python
def check_for_reset(self):
    now = rospy.Time.now()
    if now < self.last_update:
        rospy.logwarn("Detected jump back in time of %fs. Clearing TF buffer." % (self.last_update - now).to_sec())
        self.buffer.clear()
    self.last_update = now
```

Therefore, I think the last_update time is updated by one thread while another thread is checking against it, causing a buffer reset that is not actually needed.
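The workaround looked roughly like this (a sketch only; the lock is a new attribute I added, and the class is a stand-in for the listener object that owns last_update):

```python
import threading

import rospy

class ResetChecker(object):
    # Illustrative stand-in; in tf2_ros the buffer and last_update
    # live on the TransformListener itself.
    def __init__(self, buffer):
        self.buffer = buffer
        self.last_update = rospy.Time.now()
        self._lock = threading.Lock()

    def check_for_reset(self):
        # Serialize the read-compare-write on last_update so that two
        # concurrent subscriber callbacks cannot interleave.
        with self._lock:
            now = rospy.Time.now()
            if now < self.last_update:
                rospy.logwarn("Detected jump back in time of %fs. Clearing TF buffer." % (self.last_update - now).to_sec())
                self.buffer.clear()
            self.last_update = now
```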

I don't think this is the right solution, but the bug needs to be addressed in Kinetic.

@tfoote tfoote added the bug label Nov 14, 2018
@tfoote
Member

tfoote commented Nov 14, 2018

Thanks for the instructions on how to reproduce it and for debugging into it. The fact that a lock protecting the now check resolves it does suggest a race condition. I'm surprised that it's being hit here, but if there are two callbacks in flight and the scheduler context-switches in the middle of this method, it could indeed trigger the reset inappropriately.
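Concretely, the bad interleaving can be replayed deterministically (plain Python with hard-coded timestamps, not tf2_ros code):

```python
class Listener(object):
    def __init__(self):
        self.last_update = 0.0

    def check_for_reset(self, now):
        # Unprotected read-compare-write, as in the current code.
        if now < self.last_update:
            print("Detected jump back in time of %fs. Clearing TF buffer."
                  % (self.last_update - now))
        self.last_update = now

listener = Listener()

# Callback A samples the clock at t = 1.000000 s, then is preempted
# before it reaches the comparison.
now_a = 1.000000
# Callback B runs to completion with a slightly later sample and
# advances last_update past A's sample.
listener.check_for_reset(1.000130)
# Callback A resumes: its stale sample is now behind last_update, so it
# reports a 0.000130 s jump back in time and clears the buffer.
listener.check_for_reset(now_a)
```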

@brinij

brinij commented Nov 15, 2018

I have the same bug in Indigo...

@Achllle

Achllle commented Nov 16, 2018

It looks like apt-get only holds the most recent and broken version of tf2. Is there any way to fix this without installing from source?

tfoote added a commit that referenced this issue Nov 16, 2018
Fixes #341

This could incorrectly trigger a buffer clear if two concurrent callbacks were invoked.
@tfoote
Member

tfoote commented Nov 16, 2018

This affects all versions. There's no older version in the binaries to roll back to. I'll roll a new release shortly. PRs are in for both active development branches: #344, with the backport in #345.

I believe that the lock is the right thing to do: it's necessary to make the check-and-set atomic. I looked at moving the clear call outside the lock, since it takes more time, but it doesn't really make sense to test for a reset while the buffer is still clearing, so I kept it inside the lock.

tfoote added a commit that referenced this issue Nov 16, 2018
Fixes #341

This could incorrectly trigger a buffer clear if two concurrent callbacks were invoked.
@carlosjoserg

Hi @tfoote, is it possible that the same issue occurs on the C++ side?

I see the check_for_reset equivalent here, https://github.com/ros/geometry2/blob/noetic-devel/tf2_ros/src/transform_listener.cpp#L106-L111, without the protection applied in #344.

We are getting the message from https://github.com/ros/geometry2/blob/noetic-devel/tf2_ros/src/transform_listener.cpp#L108 and are running out of ideas about the cause.

Thanks

@tfoote
Member

tfoote commented Feb 11, 2021

I guess there's no reason it couldn't happen in C++ too. If you're getting that error, that's about the only explanation, as long as you believe time is monotonic. Have you validated that you're not actually getting time jumps, possibly from something like an NTP client?

It's a very short window, and hitting it would likely require a race with thread preemption. If you can add a lock there and that resolves it, then it's more of an issue than I'd have imagined.

@carlosjoserg

Hi @tfoote

We actually have an update on it, and yes, you are right: in our case it is a problem with both non-monotonic time in some parts and NTP corrections.

While investigating, we reimplemented the Python test mentioned above in C++, and fortunately we did not experience the issue on the C++ side, even without any lock protection.

For the monotonic part, we replicated the current ros::Rate using a steady clock.
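In spirit, the idea looks like this Python analogue driven by the OS monotonic clock (illustrative only; our actual implementation is a C++ ros::Rate variant):

```python
import time

class SteadyRate(object):
    """Fixed-rate sleeper driven by time.monotonic() (Python 3), which
    NTP corrections cannot move backwards, unlike wall-clock time."""

    def __init__(self, hz):
        self.period = 1.0 / hz
        self.deadline = time.monotonic() + self.period

    def sleep(self):
        remaining = self.deadline - time.monotonic()
        if remaining > 0:
            time.sleep(remaining)
        self.deadline += self.period

rate = SteadyRate(10.0)  # 10 Hz loop
for _ in range(3):
    # ... periodic work ...
    rate.sleep()
```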

Altogether, we are good so far. Many thanks again for the support and for confirming our findings. You can keep the issue closed ;)
