Fix action server deadlock issue that caused by other mutexes locked in CancelCallback #1635
It's also related to #1599.

    if (!handle->is_active()) {
      return rclcpp_action::CancelResponse::REJECT;
    }
    return rclcpp_action::CancelResponse::ACCEPT;

If the CancelCallback did not include this if condition to check whether the goal is active, it would throw an exception.
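For readers who want to try the guard in isolation, here is a minimal sketch of the cancel callback quoted above. `MockGoalHandle`, the local `CancelResponse` enum, and `handle_cancel` are hypothetical stand-ins so the snippet compiles without rclcpp_action; only `is_active()` and the REJECT/ACCEPT responses come from the comment itself.

```cpp
#include <cassert>

// Hypothetical stand-in for rclcpp_action::CancelResponse (assumption:
// only the two values used in the comment above are modelled).
enum class CancelResponse { REJECT, ACCEPT };

// Hypothetical stand-in for a server goal handle; the real handle
// derives is_active() from the goal's state.
struct MockGoalHandle
{
  bool active;
  bool is_active() const { return active; }
};

// Cancel callback with the guard from the comment: a goal that is no
// longer active cannot be canceled, so the request is rejected.
CancelResponse handle_cancel(const MockGoalHandle & handle)
{
  if (!handle.is_active()) {
    return CancelResponse::REJECT;
  }
  return CancelResponse::ACCEPT;
}

// Small self-check covering both branches.
inline void check_cancel_guard()
{
  assert(handle_cancel(MockGoalHandle{true}) == CancelResponse::ACCEPT);
  assert(handle_cancel(MockGoalHandle{false}) == CancelResponse::REJECT);
}
```

In a real server the callback would receive a shared pointer to the goal handle, as in the snippet above, rather than this mock.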
@clalancette I leave it to you if you think this bug patch should make it into Galactic, but it would be very nice if it did since it impacts a number of Nav2 users 😄

Love it, thanks for the patch @KavenYau! Hopefully this is the last-last race condition issue in the action server we'll need to fix 😄
So the good news is that as it stands, this patch is ABI compatible. So we could do it in a Galactic patch release. That said, this would be a good fix to have (and is one of the reasons to do testing at this point). So I've put it on the Galactic project board to look at. That doesn't guarantee we'll get it in, but we'll definitely consider it.
Let me know if I can help speed things up. It's overall a pretty simple change, so I'd hope a maintainer here could review it shortly.
I'm concerned that this will require all cancel callbacks for servers to check the is_active() state... which isn't ideal, imo.
Also, I'm concerned we could be allowing a goal to change states from SUCCEEDED or ABORTED to CANCELED erroneously.
I don't feel confident approving this change myself, I think we need more eyes on this.
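To make the second concern concrete, here is a simplified sketch of a goal state guard. It assumes the usual action goal states (ACCEPTED, EXECUTING, CANCELING, SUCCEEDED, CANCELED, ABORTED) and that a cancel request is only valid from ACCEPTED or EXECUTING. `try_begin_cancel` and `after_cancel_request` are hypothetical helpers for illustration, not rclcpp_action functions, and the sketch ignores threading entirely.

```cpp
#include <cassert>

// Simplified model of the goal states (assumption: mirrors the action
// goal state machine, without any locking or timestamps).
enum class GoalState { ACCEPTED, EXECUTING, CANCELING, SUCCEEDED, CANCELED, ABORTED };

// Terminal states must never change again.
bool is_terminal(GoalState state)
{
  return state == GoalState::SUCCEEDED ||
         state == GoalState::CANCELED ||
         state == GoalState::ABORTED;
}

// Hypothetical helper: start canceling only from a state where that
// transition is valid. Returns false and leaves the state untouched
// otherwise, so SUCCEEDED or ABORTED can never turn into CANCELED.
bool try_begin_cancel(GoalState & state)
{
  if (state == GoalState::ACCEPTED || state == GoalState::EXECUTING) {
    state = GoalState::CANCELING;
    return true;
  }
  return false;
}

// Convenience wrapper: the state a goal ends up in after a cancel request.
GoalState after_cancel_request(GoalState state)
{
  try_begin_cancel(state);
  return state;
}

// Small self-check of the guard.
inline void check_state_guard()
{
  assert(after_cancel_request(GoalState::EXECUTING) == GoalState::CANCELING);
  assert(after_cancel_request(GoalState::SUCCEEDED) == GoalState::SUCCEEDED);
  assert(is_terminal(after_cancel_request(GoalState::ABORTED)));
}
```

The race discussed in this thread arises when the check and the transition are not performed atomically with respect to the thread that finishes the goal; this single-threaded sketch only shows which transitions the guard should refuse.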
The change proposed in this PR looks okay to me. Correct me if I'm wrong, but I think the race related to transitioning to canceling and transitioning to a terminal state is not a new bug being introduced (it already exists).
Is there any progress on this? 😁
This fix itself looks good to me.
@jacobperron @wjwwood it would probably be better if either of you took another look? and about
changes lgtm
It seems the Windows job got stuck clearing out Docker containers?
I've retriggered the Windows build.
@jacobperron thanks 👍 All green, I think this can be merged.
Should we consider backporting this one to Galactic?
I think so 😄 Edit: Can this be backported to Foxy as well?
Should I backport this to Foxy / Galactic?
Definitely Galactic. I haven't looked at the code in Foxy, but if it applies there and doesn't break ABI, then it can be backported to Foxy as well.
I've already added this to the Foxy project board.
Does that mean I don't need to open a backport PR for it because it's already planned?
It still needs someone to do the work for both Galactic and Foxy, so if you are willing to do it, it would be appreciated.
@Mergifyio backport galactic
…in CancelCallback (#1635) (#1646)
* Fix deadlock issue that caused by other mutexes locked in CancelCallback
  Signed-off-by: Kaven Yau <love29881460@qq.com>
* Add unit test for rclcpp action server deadlock
  Signed-off-by: Kaven Yau <love29881460@qq.com>
* Update rclcpp_action/test/test_server.cpp
  Co-authored-by: William Woodall <william+github@osrfoundation.org>
Co-authored-by: Kaven Yau <love29881460@qq.com>
Co-authored-by: Jacob Perron <jacob@openrobotics.org>
Co-authored-by: William Woodall <william+github@osrfoundation.org>
(cherry picked from commit fba080c)
Co-authored-by: Kaven Yau <kavenyau@foxmail.com>
…in CancelCallback (ros2#1635) Signed-off-by: Kaven Yau <kavenyau@foxmail.com>
Resolves the deadlock issue mentioned in ros-navigation/navigation2#2273 and ros-navigation/navigation2#2304
The unit test which I added could also reproduce the issue.
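As a rough illustration of the kind of deadlock described here, the sketch below uses a toy server whose cancel callback locks a user-owned mutex and then calls back into the server. Everything in it (`ToyActionServer`, `handle_cancel_request`, `demo_cancel_with_user_mutex`) is hypothetical and is not the rclcpp_action implementation or the actual patch. It only shows one common way to avoid this class of problem: do not hold the server's internal lock while the user callback runs.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical toy server, for illustration only.
class ToyActionServer
{
public:
  using CancelCallback = std::function<bool()>;

  explicit ToyActionServer(CancelCallback cb)
  : cancel_cb_(std::move(cb)) {}

  // Called from user code, e.g. when the goal finishes.
  void mark_finished()
  {
    std::lock_guard<std::mutex> lock(state_mutex_);
    active_ = false;
  }

  bool is_active()
  {
    std::lock_guard<std::mutex> lock(state_mutex_);
    return active_;
  }

  // The internal lock is released before the user callback is invoked,
  // so the callback may lock its own mutexes or call is_active() without
  // waiting on a lock this thread already holds.
  bool handle_cancel_request()
  {
    CancelCallback cb;
    {
      std::lock_guard<std::mutex> lock(state_mutex_);
      cb = cancel_cb_;
    }
    const bool accepted = cb();  // no internal lock held here
    std::lock_guard<std::mutex> lock(state_mutex_);
    if (accepted && active_) {
      canceling_ = true;
    }
    return canceling_;
  }

private:
  std::mutex state_mutex_;
  CancelCallback cancel_cb_;
  bool active_ = true;
  bool canceling_ = false;
};

// Hypothetical demo: the cancel callback takes a user mutex and then
// queries the server. Returns whether the goal started canceling.
bool demo_cancel_with_user_mutex(bool finish_first)
{
  std::mutex user_mutex;
  ToyActionServer * server_ptr = nullptr;
  ToyActionServer server([&]() {
      std::lock_guard<std::mutex> user_lock(user_mutex);
      return server_ptr->is_active();
    });
  server_ptr = &server;
  if (finish_first) {
    server.mark_finished();
  }
  return server.handle_cancel_request();
}

// Small self-check of both paths.
inline void check_toy_server()
{
  assert(demo_cancel_with_user_mutex(false));
  assert(!demo_cancel_with_user_mutex(true));
}
```

If `handle_cancel_request` kept `state_mutex_` locked while calling the callback, the `is_active()` call inside the callback would try to lock a mutex its own thread already holds. With two threads, the same pattern becomes a lock-order inversion between the user mutex and the server mutex.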