rclcpp_action server could cause deadlock with its wrapper's mutex #1285
Comments
And to be clear, this isn't an issue specific to Navigation2; any user that has mutex locks on their resources will run into this problem. We typically don't hit it because we don't call an action server in quick succession immediately after it returns a result. But clearly there are cases where that would be expected behavior. If you were to use actions as workers, though, you'd absolutely run into this issue more frequently.
* Unlock action_server_reentrant_mutex_ before calling user callback functions. Add an additional lock to keep the previous behavior broken by the deadlock fix. Also add a test case to reproduce the deadlock situation in rclcpp_action. Signed-off-by: Daisuke Sato <daisukes@cmu.edu>
* Add missing locking to the rclcpp_action::ServerBase. (#1421)
  This patch actually does 4 related things:
  1. Renames the recursive mutex in the ServerBaseImpl class to action_server_reentrant_mutex_, which makes it a lot clearer what it is meant to lock.
  2. Adds some additional error checking where checks were missed.
  3. Adds a lock to publish_status so that the action_server structure is protected.
  Signed-off-by: Chris Lalancette <clalancette@openrobotics.org>
* [backport] Fix action server deadlock (#1285, #1313)
  Signed-off-by: Daisuke Sato <daisukes@cmu.edu>
* revert comment
  Signed-off-by: Daisuke Sato <daisukes@cmu.edu>
  Co-authored-by: Chris Lalancette <clalancette@openrobotics.org>
Bug report
Required Info:
Steps to reproduce issue
The original issue is linked below; I think this is an rclcpp issue:
ros-navigation/navigation2#1961
Expected behavior
Actual behavior
Problem
* User callbacks are called in the reentrant_mutex_ lock context.
* In another thread, the wrapper first locks its own mutex and then tries to lock reentrant_mutex_.

This inconsistent locking order causes deadlocks.
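
The inversion described above can be reproduced in isolation. The following is a minimal sketch, not rclcpp_action code: `server_mutex`, `wrapper_mutex`, and `run_lock_order_inversion` are hypothetical names, and timed mutexes are used only so that the demonstration gives up after a timeout instead of hanging the way plain mutexes would.

```cpp
#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical stand-ins: server_mutex plays the role of the mutex inside
// rclcpp_action, wrapper_mutex the role of the wrapper's own mutex.
struct InversionResult {
  bool server_thread_got_wrapper_mutex;
  bool worker_thread_got_server_mutex;
};

inline InversionResult run_lock_order_inversion() {
  std::timed_mutex server_mutex;
  std::timed_mutex wrapper_mutex;
  std::atomic<int> holding_first{0};
  std::atomic<int> finished_try{0};
  InversionResult result{false, false};
  const std::chrono::milliseconds timeout(50);

  // Wait until both threads have reached the same point.
  auto rendezvous = [](std::atomic<int> & counter) {
    counter.fetch_add(1);
    while (counter.load() < 2) {
      std::this_thread::yield();
    }
  };

  // Server thread: holds the server mutex, then the user callback wants
  // the wrapper mutex.
  std::thread server_thread([&] {
    std::lock_guard<std::timed_mutex> first(server_mutex);
    rendezvous(holding_first);
    std::unique_lock<std::timed_mutex> second(wrapper_mutex, timeout);
    result.server_thread_got_wrapper_mutex = second.owns_lock();
    rendezvous(finished_try);  // keep the first lock until both tries ended
  });

  // Worker thread: holds the wrapper mutex, then calls into the server,
  // which wants the server mutex. This is the opposite order.
  std::thread worker_thread([&] {
    std::lock_guard<std::timed_mutex> first(wrapper_mutex);
    rendezvous(holding_first);
    std::unique_lock<std::timed_mutex> second(server_mutex, timeout);
    result.worker_thread_got_server_mutex = second.owns_lock();
    rendezvous(finished_try);
  });

  server_thread.join();
  worker_thread.join();
  return result;
}
```

Calling `run_lock_order_inversion()` returns a result in which neither thread obtained its second mutex. With ordinary blocking locks, the same situation would leave both threads waiting forever.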
So I think rclcpp_action needs to call user callbacks outside of the reentrant_mutex_ lock context.
Additional information
Here is the gdb information.
Threads 9 and 14 are trying to acquire the two mutexes in different orders:
https://gist.github.com/daisukes/712316a97832f5d9ab851ad47c77ad98
Thread 9: server thread
Frame# 6 nav2_util::SimpleActionServer<nav2_msgs::action::ComputePathToPose, rclcpp::Node>::handle_goal
Frame# 14 rclcpp_action::ServerBase::execute_goal_request_received()
Thread 14: working thread started here
Frame# 6 rclcpp_action::ServerBase::notify_goal_terminal_state()
Frame# 11 nav2_util::SimpleActionServer<nav2_msgs::action::ComputePathToPose, rclcpp::Node>::succeeded_current
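
The two stacks above suggest one way to avoid the inversion: release the server-side mutex before invoking the user callback. The sketch below illustrates that idea under stated assumptions. `MiniServer` and `run_without_inversion` are hypothetical and are not the rclcpp_action API or the actual patch; only the method names echo the frames listed above.

```cpp
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical miniature of an action server; not the rclcpp_action API.
class MiniServer {
public:
  explicit MiniServer(std::function<void()> on_goal)
  : on_goal_(std::move(on_goal)) {}

  // Server thread path: update internal state under the lock, then call
  // the user callback only after the lock has been released.
  void execute_goal_request_received() {
    std::function<void()> callback;
    {
      std::lock_guard<std::recursive_mutex> lock(mutex_);
      ++received_goals_;
      callback = on_goal_;  // copy what is needed while protected
    }                       // internal mutex released here
    callback();             // user code runs without the internal mutex
  }

  // Worker path: user code calls this, possibly while holding its own mutex.
  void notify_goal_terminal_state() {
    std::lock_guard<std::recursive_mutex> lock(mutex_);
    ++terminal_states_;
  }

  int terminal_states() {
    std::lock_guard<std::recursive_mutex> lock(mutex_);
    return terminal_states_;
  }

private:
  std::recursive_mutex mutex_;
  std::function<void()> on_goal_;
  int received_goals_ = 0;
  int terminal_states_ = 0;
};

// Runs both paths concurrently; returns handled goals plus terminal states.
inline int run_without_inversion(int iterations) {
  std::mutex wrapper_mutex;
  int handled = 0;  // protected by wrapper_mutex
  MiniServer server([&] {
    std::lock_guard<std::mutex> lock(wrapper_mutex);
    ++handled;
  });

  std::thread server_thread([&] {
    for (int i = 0; i < iterations; ++i) {
      server.execute_goal_request_received();
    }
  });
  std::thread worker_thread([&] {
    for (int i = 0; i < iterations; ++i) {
      std::lock_guard<std::mutex> lock(wrapper_mutex);  // wrapper mutex first
      server.notify_goal_terminal_state();              // then server mutex
    }
  });

  server_thread.join();
  worker_thread.join();
  return handled + server.terminal_states();
}
```

In this sketch the server thread never holds both mutexes at once, so the only nested acquisition is wrapper mutex followed by server mutex, and no cycle can form. The trade-off is that the callback no longer runs under the server's lock, so any state it relies on has to be copied out beforehand or protected separately.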