Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix action server deadlock issue that caused by other mutexes locked in CancelCallback #1635

Merged

Conversation

KavenYau
Copy link
Contributor

Resolves the deadlock issue mentioned in ros-navigation/navigation2#2273 and ros-navigation/navigation2#2304

The unit test which I added could also reproduce the issue.

@KavenYau KavenYau changed the title Fix action deadlock issue that caused by other mutexes locked in CancelCallback Fix action server deadlock issue that caused by other mutexes locked in CancelCallback Apr 15, 2021
Signed-off-by: Kaven Yau <love29881460@qq.com>
@KavenYau
Copy link
Contributor Author

KavenYau commented Apr 16, 2021

It's also related to #1599.

        if (!handle->is_active()) {
          return rclcpp_action::CancelResponse::REJECT;
        }
        return rclcpp_action::CancelResponse::ACCEPT;

if did not add the if condition to CancelCallback to check whether the goal active, it would throw exception.

@KavenYau KavenYau force-pushed the fix-goal_handle-race-condition-issue branch from 9c76968 to 4de34cd Compare April 16, 2021 01:41
Signed-off-by: Kaven Yau <love29881460@qq.com>
@KavenYau KavenYau force-pushed the fix-goal_handle-race-condition-issue branch from 4de34cd to cd78da4 Compare April 16, 2021 01:45
@clalancette clalancette added this to In progress in Galactic via automation Apr 16, 2021
@SteveMacenski
Copy link
Collaborator

@clalancette I leave it to you if you think this bug patch should make it into Galactic, but it would be very nice if it did since it impacts a number of Nav2 users 😄

Love it, thanks for the patch @KavenYau! Hopefully this is the last-last race condition issue in the action server we'll need to fix 😄

@clalancette
Copy link
Contributor

So the good news is that as it stands, this patch is ABI compatible. So we could do it in a Galactic patch release.

That said, this would be a good fix to have (and is one of the reasons to do testing at this point). So I've put it on the Galactic project board to look at. That doesn't guarantee we'll get it in, but we'll definitely consider it.

@SteveMacenski
Copy link
Collaborator

Let me know if I can help speed things up - its overall a pretty simple change so I'd hope a maintainer here could review it shortly

Copy link
Member

@wjwwood wjwwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned that this will require all cancel callbacks for servers to check the is_active() state... which isn't ideal, imo.

Also, I'm concerned we could be allowing a goal to change states from SUCCEEDED or ABORTED to CANCELED erroneously.

I don't feel confident approving this change myself, I think we need more eyes on this.

Copy link
Member

@jacobperron jacobperron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change proposed in this PR looks okay to me. Correct me if I'm wrong, but I think the race related to transitioning to canceling and transitioning to a terminal state is not a new bug being introduced (it already exists).

@KavenYau
Copy link
Contributor Author

Is there any progress on this? 😁

Copy link
Collaborator

@fujitatomoya fujitatomoya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this fix itself looks good to me.

@fujitatomoya
Copy link
Collaborator

@jacobperron @wjwwood it would be probably better if either of you take a look again? and about exception problem, i think we can discuss that on #1599?

Copy link
Member

@wjwwood wjwwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changes lgtm

rclcpp_action/test/test_server.cpp Show resolved Hide resolved
Co-authored-by: William Woodall <william+github@osrfoundation.org>
Copy link
Member

@jacobperron jacobperron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

@jacobperron jacobperron added this to Proposed in Foxy Patch Release 5 via automation Apr 22, 2021
@fujitatomoya
Copy link
Collaborator

it seems windows job got stuck to clear out docker containers?

00:00:09.572 C:\J\workspace\ci_windows>powershell -Command "if ($(docker ps -q) -ne $null) { docker stop $(docker ps -q)}" 
...

@jacobperron
Copy link
Member

I've retriggered the Windows build.

@fujitatomoya
Copy link
Collaborator

@jacobperron thanks 👍

all green, i think this can be merged.

@clalancette
Copy link
Contributor

Should we consider backporting this one to Galactic?

@clalancette clalancette moved this from In progress to Done in Galactic Apr 26, 2021
@SteveMacenski
Copy link
Collaborator

SteveMacenski commented Apr 26, 2021

I think so 😄

Edit: Can this be backported to Foxy as well?

@KavenYau
Copy link
Contributor Author

KavenYau commented Apr 27, 2021

Should I backport this to Foxy / Galactic?

KavenYau added a commit to KavenYau/rclcpp that referenced this pull request Apr 27, 2021
@clalancette
Copy link
Contributor

Should I backport this to Foxy / Galactic?

Definitely Galactic. I haven't looked at the code in Foxy, but if it applies there and doesn't break ABI, then it can be backported to Foxy as well.

@jacobperron
Copy link
Member

I've already added this to the Foxy project board.

@KavenYau
Copy link
Contributor Author

I've already added this to the Foxy project board.

Dose it mean that I don't need to make a backporting PR for it because it's already in plan?

@clalancette
Copy link
Contributor

Dose it mean that I don't need to make a backporting PR for it because it's already in plan?

It still needs someone to do the work for both Galactic and Foxy, so if you are willing to do it, it would be appreciated.

@ivanpauno
Copy link
Member

@Mergifyio backport galactic

mergify bot pushed a commit that referenced this pull request Apr 29, 2021
…in CancelCallback (#1635)

* Fix deadlock issue that caused by other mutexes locked in CancelCallback

Signed-off-by: Kaven Yau <love29881460@qq.com>

* Add unit test for rclcpp action server deadlock

Signed-off-by: Kaven Yau <love29881460@qq.com>

* Update rclcpp_action/test/test_server.cpp

Co-authored-by: William Woodall <william+github@osrfoundation.org>

Co-authored-by: Kaven Yau <love29881460@qq.com>
Co-authored-by: Jacob Perron <jacob@openrobotics.org>
Co-authored-by: William Woodall <william+github@osrfoundation.org>
(cherry picked from commit fba080c)
@mergify
Copy link

mergify bot commented Apr 29, 2021

Command backport galactic: success

Backports have been created

ivanpauno pushed a commit that referenced this pull request Apr 30, 2021
…in CancelCallback (#1635) (#1646)

* Fix deadlock issue that caused by other mutexes locked in CancelCallback

Signed-off-by: Kaven Yau <love29881460@qq.com>

* Add unit test for rclcpp action server deadlock

Signed-off-by: Kaven Yau <love29881460@qq.com>

* Update rclcpp_action/test/test_server.cpp

Co-authored-by: William Woodall <william+github@osrfoundation.org>

Co-authored-by: Kaven Yau <love29881460@qq.com>
Co-authored-by: Jacob Perron <jacob@openrobotics.org>
Co-authored-by: William Woodall <william+github@osrfoundation.org>
(cherry picked from commit fba080c)

Co-authored-by: Kaven Yau <kavenyau@foxmail.com>
KavenYau added a commit to KavenYau/rclcpp that referenced this pull request May 2, 2021
…in CancelCallback (ros2#1635)

Signed-off-by: Kaven Yau <kavenyau@foxmail.com>
ivanpauno pushed a commit that referenced this pull request May 5, 2021
…in CancelCallback (#1635) (#1654)

Signed-off-by: Kaven Yau <kavenyau@foxmail.com>
@jacobperron jacobperron removed this from Proposed in Foxy Patch Release 5 Aug 31, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Galactic
  
Done
Development

Successfully merging this pull request may close these issues.

None yet

7 participants