Exception thrown while waiting for action result: RCLError "failed to add guard condition to wait set" #2163
Comments
There's a gap in the stack trace between There are a few places where
This would end up being called here in
In this case if
A shared_ptr wasn't used here, so this problem occurs.
I have an update in flight to the executors that should make the handling of guard conditions more thread-safe. @schornakj, I would be interested to know whether you can reproduce the issue with this PR.
@Barry-Xu-2018, can you elaborate a bit here? I think that
The problem is related to use Guard condition for node is defined at
While the following code in the executor runs: rclcpp/rclcpp/src/rclcpp/executor.cpp Lines 761 to 772 in 3088b53
the node may be destroyed before add_handles_to_wait_set() is called, so an invalid guard condition is then added to the wait set here: rclcpp/rclcpp/include/rclcpp/strategies/allocator_memory_strategy.hpp Lines 233 to 235 in 3088b53
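The interleaving described above can be replayed deterministically in a small standard C++ model. This is a sketch under assumptions, not rcl code: GuardCondition, WaitSet, and replay_race are hypothetical stand-ins, the error string is copied from this report, and finalization is modeled by nulling an implementation pointer. In the real executor the node's memory may be freed outright, which is undefined behavior rather than a clean error.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins for rcl types, NOT the real rcl API.
struct GuardConditionImpl {};

struct GuardCondition {
  GuardConditionImpl * impl = new GuardConditionImpl{};

  GuardCondition() = default;
  GuardCondition(const GuardCondition &) = delete;
  GuardCondition & operator=(const GuardCondition &) = delete;
  ~GuardCondition() { fini(); }

  // Models finalization during node teardown (an assumption): the
  // implementation is released and the pointer nulled, so later checks fail.
  void fini() {
    delete impl;
    impl = nullptr;
  }
};

struct WaitSet {
  std::vector<const GuardCondition *> guard_conditions;

  // Rejects finalized guard conditions with the message seen in this issue.
  void add(const GuardCondition * gc) {
    assert(gc != nullptr);
    if (gc->impl == nullptr) {
      throw std::runtime_error(
        "failed to add guard condition to wait set: "
        "guard condition implementation is invalid");
    }
    guard_conditions.push_back(gc);
  }
};

// Replays the interleaving in a fixed order: collect, (maybe) tear down, add.
// The GuardCondition object stays alive for the whole function so the model
// never touches freed memory; only its implementation is finalized.
std::string replay_race(bool node_destroyed_in_between) {
  GuardCondition node_guard_condition;

  // Step 1 (executor thread): collect raw, non-owning pointers.
  std::vector<const GuardCondition *> collected{&node_guard_condition};

  // Step 2 (another thread): the node is torn down before the wait set is built.
  if (node_destroyed_in_between) {
    node_guard_condition.fini();
  }

  // Step 3 (executor thread): add the previously collected pointers.
  WaitSet wait_set;
  try {
    for (const GuardCondition * gc : collected) {
      wait_set.add(gc);
    }
  } catch (const std::runtime_error & e) {
    return e.what();
  }
  return "ok";
}
```

Here replay_race(false) returns "ok", while replay_race(true) returns the message reported in this issue, because the pointer collected in step 1 no longer refers to a usable guard condition by step 3.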
In the new executor structures (https://github.com/ros2/rclcpp/pull/2143/files), I have worked around this by making the node return a These guard conditions are all added to a single waitable and held as weak pointers, which are locked right before being added to the rcl_wait_set so that we can ensure they are still valid: https://github.com/ros2/rclcpp/blob/mjcarroll/executor_structures/rclcpp/src/rclcpp/executors/executor_notify_waitable.cpp#L46-L64
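A minimal sketch of the weak-pointer idea described above, written in standard C++ so it compiles on its own. NotifyWaitable, add_guard_condition, lock_for_wait_set, and registered are hypothetical names for illustration; they only borrow the approach (hold weak pointers, lock them right before building the wait set) and do not reproduce the code in the linked executor_notify_waitable.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a guard condition owned by a node.
struct GuardCondition {
  int id;
  explicit GuardCondition(int id_) : id(id_) {}
};

// Illustrative only, not rclcpp's API: holds guard conditions weakly so the
// waitable never extends (or silently outlives) a node's lifetime.
class NotifyWaitable {
public:
  void add_guard_condition(const std::shared_ptr<GuardCondition> & gc) {
    assert(gc != nullptr);
    std::lock_guard<std::mutex> lock(mutex_);
    guard_conditions_.push_back(gc);  // stored as std::weak_ptr
  }

  // Locks every weak pointer right before the wait set is built. Expired
  // entries (owner already destroyed) are skipped and pruned; survivors are
  // returned as shared_ptrs, which keep them alive while the wait set is used.
  std::vector<std::shared_ptr<GuardCondition>> lock_for_wait_set() {
    std::lock_guard<std::mutex> lock(mutex_);
    std::vector<std::shared_ptr<GuardCondition>> alive;
    std::vector<std::weak_ptr<GuardCondition>> still_valid;
    for (const auto & weak : guard_conditions_) {
      if (auto strong = weak.lock()) {
        alive.push_back(strong);
        still_valid.push_back(weak);
      }
    }
    guard_conditions_.swap(still_valid);
    return alive;
  }

  std::size_t registered() {
    std::lock_guard<std::mutex> lock(mutex_);
    return guard_conditions_.size();
  }

private:
  std::mutex mutex_;
  std::vector<std::weak_ptr<GuardCondition>> guard_conditions_;
};
```

In this model, dropping the last shared_ptr to a guard condition (standing in for node destruction) makes its entry expire, so lock_for_wait_set() skips it instead of handing out an invalid handle. Survivors stay alive until the caller releases the returned vector.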
@mjcarroll @Barry-Xu-2018 thank you very much for the explanation 👍
Cool! That looks like a good set of changes to a very important part of the core ROS 2 functionality. This will be somewhat tricky for me to replicate on my robots, though, since I'm currently using a specific version of Humble and I'd need to work out some sort of ad-hoc Rolling-based deployment.
I've been working on isolating this issue, and I think it's related to creating or destroying a Subscriber in one thread while waiting on an action result future in a different thread. I'll try to create a reproducible example now that I have a clearer idea of what's going on. |
Yeah, that would be fantastic. It would be good to have a test that demonstrates this problem, and therefore shows that @mjcarroll's fixes actually fix it (hopefully).
Hi, in case it helps: I was doing a tutorial and ran into this problem. The error is reproducible, so it may serve as a testbed for this issue: https://github.com/fmrico/vector_transmission In different terminals
The error always happens when unloading the Consumer:
I hope this helps!!
@fmrico thank you for sharing that! I've been struggling to come up with a minimal reproducible example that's independent of my project, and your test helps narrow down the problem a lot. I modified the VectorConsumer node in your example to remove everything except for a single subscriber and the crash still happens. I think that shows that the specific cause of this crash is destroying the subscriber.
Hey folks, I just tried @fmrico's repo and reproduction steps on the latest Rolling binaries and could still reproduce the exception.
@sea-bass, the latest Rolling binaries aren't expected to fix this. I tried @fmrico's reproduction steps (#2163 (comment)) with the open PR #2142 from @mjcarroll, and it seems to work.
I'm also running into this segfault on
This issue has been mentioned on ROS Discourse. There might be relevant details there:
Bug report
Required Info:
2023-01-27 sync snapshot
Steps to reproduce issue
Calling rclcpp_action::Client::async_get_result() after rclcpp_action::Client::async_send_goal() sometimes throws rclcpp::exceptions::RCLError with a "failed to add guard condition to wait set: guard condition implementation is invalid" message.
While I don't have a fully self-contained code sample that reproduces this issue independently of my project, the code in question is similar to this:
Expected behavior
If the goal_handle future returned from async_send_goal() is set (indicating that the goal request was accepted), then I should always be able to use the goal_handle to wait for the action result.
Actual behavior
For some small proportion of action goals (on the order of 1 in 100), an exception is thrown:
Additional information
Based on the stack trace, this looks like a problem at the intersection of the rclcpp action client and the MultiThreadedExecutor, which is a place where I've had issues roughly similar to this one in the past.