-
Notifications
You must be signed in to change notification settings - Fork 412
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
'use_sim_time' only works when set dynamically, not through launch parameters #595
Comments
I cannot reproduce the bug as described requiring user interaction. I do find a small race condition on startup that the clock does not get initialized until after the node has been spun at least once.
Running the slightly modified program https://github.com/tfoote/bug595/blob/master/bug_595.cpp Note that the commented out get_parameter in the before spin section will have the side effect of causing the parameter value to be evaluated. https://github.com/tfoote/bug595/blob/8bca0a2c7a33dee53a247b8c12a669d5e24bd1d4/bug_595.cpp#L17 Note that I was substituting for gazebo a rostopic publisher sending Without the parameter set in the launch file I get the following expected behavior.
I believe that we should make sure that the parameters get propogated into the node on startup before a spin is called if they are set prior to node startup. But without being able to reproduce the full latency going until the user interaction I don't know what to debug. My only speculation is that somehow it's a thread starvation where the spinning doesn't get to the parameter callbacks because it's running other callbacks etc. @mkhansen-intel @crdelsey do you see this behavior with this sample package? |
@tfoote I added your example package to our workspace and ran it like this
The output I get is
It never changes to sim time till I run
in a separate terminal. In this output example, I called ros2 param set as above between count 5 and 6, so 6 show the correct sim time.
My ros2 code is from about a week ago. I'll try updating to the latest and see if the problem still occurs. |
Well, updating the ros2 source a few minutes ago made the problem go away using the bug595 example code. I'll retry the nav stack this afternoon after I get my source tree in working order again. |
I think that this is fundamentally a race condition at startup which explains why we're seeing it behave differently under different circumstances (such as the nav stack). The code currently relies on creating a subscription when a node is attached: rclcpp/rclcpp/src/rclcpp/time_source.cpp Line 81 in 33f1e17
This would explain why it has to be manually triggered by an external state change. The event will get fired and the time_source will synchronize with the node's parameter when the new set param call gets executed. Usually the call would be a no-op but since the time_source is not initialized with the state of the node it updates the time_source. The "simple" solution would be to synchronously query the parameter state after declaring the event callback. So at worst you would get a duplicate callback for an event occurring between registering the callback and reading the parameter. And you wouldn't miss any state. However we don't have an executor in this context to use to spin/wait on for the synchronous behavior as is done in the synchronous parameter client. The clock interface is also initialized before the parameter interface now because the parameter interface uses the clock interface to send events. |
The bug595 code consistently seems to work now, but the nav2 stack seems to be consistently failing still. I'll have to try some experiments. In some cases the observable time is set in the constructor before the first spin, in other cases, the nodes with observable times are composed nodes, or are in namespace. @tfoote From your description, it sounds like the TimeSource does not need to call |
So I've now dug into how this is effecting things and as @chapulina discovered there's actually two cycles necessary for the clock to get updated. The underlying flow of events is that:
The underlying issue is that the clock is relying on the published events and cannot react until the node is spun to first get the parameter and then the clock. If the clock entity could check the parameter at startup and immediately subscribe the behavior would be much more intuitive. Also we should make it so that the node internals do not require external communications for this sort of configuration changes. The obvious solution is to give the clock a pointer to the parameter interface, but that make a loop because the parameter interface already has a pointer to the clock. So to clean this up we can follow the same pattern we're already using to separate the clock from the time source. Such that the clock is just keeping the time, and the parameter can reference it appropriately. And a new time_source interface can query the parameter, subscribe to the clock topic and update the clock when appropriate. To do this we will need a new time_source_interface, which will just split out the clock and time_source datatypes already contained in the clock_interface. The second level of improvement would be that the parameter_interface be extended to provide a way to register a callback for changes to the parameters so that the time_source can get callbacks without relying on external communications. Once that callback functionality is provided the parameter event publishing can actually be separated into another component that can use the callback notifications to publish the events externally. This would also have the benefit of breaking the dependency of the parameters on the clock. And lastly it could be leveraged to deal with startup conditions to defer the publishing of events until the clock is known to be initialized to avoid all startup parameters always being published at walltime. Thanks to @wjwwood for sketching this through with me. Here's our notes from the whiteboard. |
Second fix for #595 This was causing it to require /clock topic data or a parameter change becaue the clock was attached after the node was attached. Doing it in the other order would have worked. Also homogenizing the behavior to use the last recieved value otherwise zero time when enabling sim time.
Way to go! That did the trick. Everything is coming up with sim_time_enabled now. :-) |
Resolves #595 * Separate the Node Time Source from the Node Clock * Implement initial value checking of use_sim_time parameter parameter * Be sure to update all newly attached clocks * Homogenizing the behavior to use the last received value otherwise zero time when enabling sim time. * Add virtual destructors to interface classes
Resolves ros2#595 * Separate the Node Time Source from the Node Clock * Implement initial value checking of use_sim_time parameter parameter * Be sure to update all newly attached clocks * Homogenizing the behavior to use the last received value otherwise zero time when enabling sim time. * Add virtual destructors to interface classes
Bug report
Required Info:
Steps to reproduce issue
Launch file:
Source file:
Expected behavior
When I set 'use_sim_time' through a launch file, it should start using the simulation clock from gazebo as soon as it is available. This is seen in the output below as 'Node Time'
Actual behavior
The node won't actually switch to sim time unless use_sim_time is set dynamically after the node has launched.
Additional information
Here you can see the parameter was set already by the launch file, thus the param 'get' returned True. But the node time didn't change until I set it a 2nd time dynamically.
Credit to @crdelsey who helped to find / isolate this bug and wrote the test node
The text was updated successfully, but these errors were encountered: