Improve component_manager_isolated shutdown #2085
@clalancette Now that I have this implemented, one open question is that if the |
This eliminates a potential hang when the isolated container is being shutdown via the rclcpp SignalHandler Co-authored-by: Chris Lalancette <clalancette@openrobotics.org> Signed-off-by: Michael Carroll <michael@openrobotics.org>
Force-pushed: 0dda81c to 8017ca7
@mjcarroll I think that the Without the |
Chris and I looked at this together, and we thought that the safest thing was to include both. The context check by itself was enough to prevent it from hanging on my local computer, but that does not guarantee that it is completely safe.

The issue stems from a little bit of under-documentation in the way that the executor's cancel works. The question is, should an executor be allowed to [...]

A potential alternative would be to [...]

The original fix here was to delay until the executor is spinning, but we ended up in a state where the executor is not spinning so the loop cannot be broken.
Signed-off-by: Michael Carroll <michael@openrobotics.org>
lgtm
I've only left suggestions to clean up comments a bit. Otherwise, this looks good to me.
rclcpp_components/include/rclcpp_components/component_manager_isolated.hpp
…isolated.hpp Signed-off-by: Michael Carroll <michael@openrobotics.org> Co-authored-by: Chris Lalancette <clalancette@openrobotics.org>
This eliminates a potential hang when the isolated container is being shutdown via the rclcpp SignalHandler. Signed-off-by: Michael Carroll <michael@openrobotics.org> Co-authored-by: Chris Lalancette <clalancette@openrobotics.org>
I spent 2 hours debugging this on Humble before finally finding this fix 😄 Thanks for the fix! Can you please backport it to Humble? I cherry-picked it locally and it worked well.
Adding an explicit default constructor should be fine for keeping the ABI, right? I think that is okay, but we need to be sure...
I think that is OK, but the addition of the |
This eliminates a potential hang when the isolated container is being shut down via the rclcpp::SignalHandler.

Previously, when the SignalHandler triggered the shutdown of the rest of the system, the ComponentManagerIsolated object would cancel and destroy the executor associated with each loaded component. Part of the cancel process checks whether the executor is currently spinning, because there may be a race condition in which cancel_executor is called before the thread is created. While I didn't personally observe this condition, the comment indicates that it may happen.

The issue is that there is another race in which the executor has already been shut down or cancelled (so is_spinning is always false), causing an infinite loop.

This code introduces a guard atomic to check that the loop has been started before continuing to check is_spinning, and it also checks the overall ROS context for shutdown.

Closes #2083

Signed-off-by: Michael Carroll <michael@openrobotics.org>
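
For readers who want to see the shape of the guard described above, here is a minimal, standalone sketch. It is not the code from this PR and does not use rclcpp: `FakeExecutor`, `ExecutorWrapper`, `start`, and the `context_ok` flag are hypothetical stand-ins (the last one for the ROS context shutdown check), and `thread_initialized` is an assumed name for the guard atomic.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical stand-in for an executor: spin() blocks until cancel() is called.
class FakeExecutor
{
public:
  void spin()
  {
    spinning_.store(true);
    while (!cancelled_.load()) {
      std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    spinning_.store(false);
  }
  void cancel() {cancelled_.store(true);}
  bool is_spinning() const {return spinning_.load();}

private:
  std::atomic_bool spinning_{false};
  std::atomic_bool cancelled_{false};
};

// Hypothetical stand-in for the overall context state (true until shutdown).
std::atomic_bool context_ok{true};

struct ExecutorWrapper
{
  FakeExecutor executor;
  std::thread thread;
  // Guard atomic: set by the worker thread just before it starts spinning.
  std::atomic_bool thread_initialized{false};
};

void start(ExecutorWrapper & wrapper)
{
  wrapper.thread = std::thread(
    [&wrapper]() {
      wrapper.thread_initialized.store(true);
      wrapper.executor.spin();
    });
}

void cancel_executor(ExecutorWrapper & wrapper)
{
  // Only wait for is_spinning() if the worker thread has not started yet.
  // If it already started, is_spinning() may be false forever because the
  // executor was cancelled elsewhere, and waiting on it would never end.
  if (!wrapper.thread_initialized.load()) {
    // The context check breaks the wait if shutdown has already happened.
    while (!wrapper.executor.is_spinning() && context_ok.load()) {
      std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
  }
  wrapper.executor.cancel();
  if (wrapper.thread.joinable()) {
    wrapper.thread.join();
  }
}
```

In this sketch, calling `cancel_executor` on a wrapper whose executor was already cancelled skips the wait because the guard is set, and calling it after `context_ok` becomes false leaves the wait loop immediately. Neither check alone covers both cases, which is one way to read the decision to keep both.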