Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix occasionally missing goal result caused by race condition #1677

Merged
merged 3 commits into from May 20, 2021

Conversation

KavenYau
Copy link
Contributor

Fix occasionally missing goal result when a action finished quickly (one thread enter to execute_result_request_received and the other enter to publish_result).

Signed-off-by: Kaven Yau <kavenyau@foxmail.com>
Copy link
Collaborator

@fujitatomoya fujitatomoya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this makes sense to me.

@@ -627,19 +627,19 @@ ServerBase::publish_result(const GoalUUID & uuid, std::shared_ptr<void> result_m
}

{
std::lock_guard<std::recursive_mutex> lock(pimpl_->unordered_map_mutex_);
std::lock_guard<std::recursive_mutex> unordered_map_lock(pimpl_->unordered_map_mutex_);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems fine to me, but it can introduce a ptential deadlock in the future.
I would rather:

  • Document in which order we're currently taking the unordered_map_mutex_, action_server_reentrant_mutex_ mutexes, to avoid someone doing it in the opposite order later.
  • Take action_server_reentrant_mutex_ out of the if/loop together with unordered_map_mutex_ using std::scoped_lock.

Copy link
Contributor Author

@KavenYau KavenYau May 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With using std::scoped_lock., then it could not be backported to foxy? Because foxy is compiled with C++14.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, but please add documentation explaining what's the current order we're taking the mutexes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a NOTE comment to this block scope.

rclcpp_action/src/server.cpp Outdated Show resolved Hide resolved
@@ -500,9 +500,13 @@ ServerBase::execute_result_request_received(std::shared_ptr<void> & data)
result_response = create_result_response(action_msgs::msg::GoalStatus::STATUS_UNKNOWN);
} else {
// Goal exists, check if a result is already available
std::lock_guard<std::recursive_mutex> lock(pimpl_->unordered_map_mutex_);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
std::lock_guard<std::recursive_mutex> lock(pimpl_->unordered_map_mutex_);
std::lock_guard lock(pimpl_->unordered_map_mutex_);

@@ -627,19 +627,19 @@ ServerBase::publish_result(const GoalUUID & uuid, std::shared_ptr<void> result_m
}

{
std::lock_guard<std::recursive_mutex> lock(pimpl_->unordered_map_mutex_);
std::lock_guard<std::recursive_mutex> unordered_map_lock(pimpl_->unordered_map_mutex_);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
std::lock_guard<std::recursive_mutex> unordered_map_lock(pimpl_->unordered_map_mutex_);
std::lock_guard unordered_map_lock(pimpl_->unordered_map_mutex_);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This two changes are inconsistent with other std::lock_guard in server.cpp.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before, we were using c++14 and this was not possible, so we're going to find inconsistencies.
The idea is to use the less verbose version in new code.

In this case, a backport PR will need to be modified to be compliant with c++14 again.
I think that's fine, but it's also fine if you want to keep the verbose version here.

Signed-off-by: Kaven Yau <kavenyau@foxmail.com>
Signed-off-by: Kaven Yau <kavenyau@foxmail.com>
@KavenYau KavenYau force-pushed the fix-occasionally-missing-goal-result branch from 6b7ef43 to 2492d7c Compare May 20, 2021 01:17
@fujitatomoya
Copy link
Collaborator

CI:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

@ivanpauno
Copy link
Member

Failures are unrelated

@ivanpauno ivanpauno merged commit 1c61751 into ros2:master May 20, 2021
@fujitatomoya
Copy link
Collaborator

@ivanpauno we would want this to backport to foxy and galactic? what you think?

@SteveMacenski
Copy link
Collaborator

I would like that 😄

@KavenYau KavenYau deleted the fix-occasionally-missing-goal-result branch May 21, 2021 01:11
fujitatomoya pushed a commit to fujitatomoya/rclcpp that referenced this pull request May 21, 2021
)

* Fix occasionally missing goal result caused by race condition

Signed-off-by: Kaven Yau <kavenyau@foxmail.com>

* Take action_server_reentrant_mutex_ out of the sending result loop

Signed-off-by: Kaven Yau <kavenyau@foxmail.com>

* add note for explaining the current locking order in server.cpp

Signed-off-by: Kaven Yau <kavenyau@foxmail.com>
fujitatomoya pushed a commit to fujitatomoya/rclcpp that referenced this pull request May 21, 2021
)

* Fix occasionally missing goal result caused by race condition

Signed-off-by: Kaven Yau <kavenyau@foxmail.com>

* Take action_server_reentrant_mutex_ out of the sending result loop

Signed-off-by: Kaven Yau <kavenyau@foxmail.com>

* add note for explaining the current locking order in server.cpp

Signed-off-by: Kaven Yau <kavenyau@foxmail.com>
fujitatomoya added a commit that referenced this pull request May 25, 2021
…#1683)

* Fix occasionally missing goal result caused by race condition

Signed-off-by: Kaven Yau <kavenyau@foxmail.com>

* Take action_server_reentrant_mutex_ out of the sending result loop

Signed-off-by: Kaven Yau <kavenyau@foxmail.com>

* add note for explaining the current locking order in server.cpp

Signed-off-by: Kaven Yau <kavenyau@foxmail.com>

Co-authored-by: Kaven Yau <kavenyau@foxmail.com>
jacobperron pushed a commit that referenced this pull request May 25, 2021
…#1682)

* Fix occasionally missing goal result caused by race condition

Signed-off-by: Kaven Yau <kavenyau@foxmail.com>

* Take action_server_reentrant_mutex_ out of the sending result loop

Signed-off-by: Kaven Yau <kavenyau@foxmail.com>

* add note for explaining the current locking order in server.cpp

Signed-off-by: Kaven Yau <kavenyau@foxmail.com>

Co-authored-by: Kaven Yau <kavenyau@foxmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants