Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bond breaks on forward time jumps #16

Closed
mikaelarguedas opened this issue Feb 24, 2017 · 8 comments
Closed

bond breaks on forward time jumps #16

mikaelarguedas opened this issue Feb 24, 2017 · 8 comments

Comments

@mikaelarguedas
Copy link
Member

bond is not robust to forward time jumps, if the time source is updated with a more recent time, every bond breaks (tearing down any process/nodelet relying on it).

Alternatives could be:

  1. use monotonic clock for bond timeout:
  • pros: no time jump
  • cons: will timeout if using simulated time for example. If you pause the simulation and restart it, every nodelet will die
  1. split the bond period into a bunch of smaller timeouts, and break only if all of them timeout:
  • pros: get rid of the single time jump problem
  • cons: if a time jumps occurs in the last wait of the bond period it will still crash
  1. use previous value to detect time jumps, if time jump detected reset the timeout:
  • pros: bond will never break on time jumps whatever ros time source is used
  • cons: arbitrary metric to decide at what threashold a time difference is considered a jump. Should be ok though because the bond period is much smaller than any time jumps done when updating the system clock

I'd lean toward option 3 because it respects the ros time source chosen by the user and should not break in the case of a paused simulated time or the update of the system time on a real system

@wjwwood
Copy link
Member

wjwwood commented Feb 24, 2017

In ROS 2, @tfoote and I talked about this issue and we came up with a more flexible option of a "custom time source" which could drive ROS Time. The default would basically implement ROS Time as it works in ROS 1, which is to subscribe to /clock and return time based on the values received on that topic. The reason I bring this up is this "custom time source" provides for pre and post update callbacks which can be called each time the custom time source updates the time, see:

https://github.com/ros2/rcl/blob/master/rcl/include/rcl/time.h#L59-L60

For the ROS Time implementation in ROS 1 these callbacks would be called after receiving a message on /clock but before and after setting ros::Time::now() to the value received. These callbacks could be used to detect backward jumps in time at the instance that it occurs and then provide a callback to the user when it does happen. This is only possible because ROS Time based on the /clock topic is quantized rather than continuous. So this wouldn't help with jumps in system time, unless you could get a callback from the system when this occurs (not sure if you can).

It would be a more intrusive change to ROS, but if you were to implement a method like ros::Time::register_callback_on_sim_time_jump(callback) which, after being set up, would call the given callback when time goes backward because of data from /clock. This might help bond out on this issue, because option 1 becomes more viable (at least imo) as it eliminates (or at least mitigates) the con about sim time.

@flixr
Copy link
Contributor

flixr commented Feb 25, 2017

Regarding Option 1:
I don't quite understand why it would not work with sim time....
If I see it correctly then bond always uses WallDurations and so even when running with sim time the bond timeouts are tied to WallTime (which makes total sense to me, as I don't see a reason to also pause the bond heartbeats when pausing sim time).

So it looks like we could replace WallDuration with a new "MonotonicDuration" which would work in the same way, except of not being susceptible to time jumps.

@flixr
Copy link
Contributor

flixr commented Feb 25, 2017

I quickly added a MonotonicTime class and used that in bond, which IMHO should fix the issue...
See #17 and ros/roscpp_core#55

Let me know what you think about such an approach...

@flixr
Copy link
Contributor

flixr commented Feb 25, 2017

I didn't look into the details too much, so not exactly sure what the bond message is actually used for and if we would need to also add a MonotonicTimer for publishing those...

@flixr
Copy link
Contributor

flixr commented Feb 26, 2017

added a MonotonicTimer in ros/ros_comm#1003

@wjwwood
Copy link
Member

wjwwood commented Mar 2, 2017

That's true, I don't guess the bond messages need to stop or slow down with ROS time. Using a monotonic clock sounds fine to me, especially if bond is already using wall time (system time).

@flixr
Copy link
Contributor

flixr commented Mar 6, 2017

@mikaelarguedas maybe you can test #18 as well, works nicely here...

@mikaelarguedas
Copy link
Member Author

this has been fixed for lunar in #18, if the underlying stedy timer feature is backported to other distributions I will backport #18

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants