Fix occasionally missing goal result caused by race condition #1677
Conversation
Signed-off-by: Kaven Yau <kavenyau@foxmail.com>
this makes sense to me.
@@ -627,19 +627,19 @@ ServerBase::publish_result(const GoalUUID & uuid, std::shared_ptr<void> result_m
   }

   {
-    std::lock_guard<std::recursive_mutex> lock(pimpl_->unordered_map_mutex_);
+    std::lock_guard<std::recursive_mutex> unordered_map_lock(pimpl_->unordered_map_mutex_);
This seems fine to me, but it can introduce a potential deadlock in the future.
I would rather:
- Document the order in which we're currently taking the unordered_map_mutex_ and action_server_reentrant_mutex_ mutexes, to avoid someone taking them in the opposite order later.
- Take action_server_reentrant_mutex_ out of the if/loop together with unordered_map_mutex_, using std::scoped_lock.
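The suggested pattern can be sketched as follows. This is a simplified, hypothetical stand-in (the `ServerSketch` struct, the `int` goal IDs, and the method bodies are illustrative, not rclcpp's actual code); only the mutex names mirror the diff quoted in this thread. `std::scoped_lock` (C++17) locks all the mutexes it is given with a deadlock-avoidance algorithm, so the listed order doesn't matter:

```cpp
#include <mutex>
#include <unordered_map>

// Hypothetical sketch: both mutexes are acquired together, before the
// lookup/send sequence rather than inside it, via std::scoped_lock.
struct ServerSketch {
  std::recursive_mutex unordered_map_mutex_;
  std::recursive_mutex action_server_reentrant_mutex_;
  std::unordered_map<int, int> results_;  // goal id -> result (simplified)

  void publish_result(int goal_id, int result) {
    // std::scoped_lock locks both mutexes atomically and deadlock-free.
    std::scoped_lock lock(unordered_map_mutex_, action_server_reentrant_mutex_);
    results_[goal_id] = result;
    // ... send the result to any clients already waiting for it ...
  }

  bool has_result(int goal_id) {
    std::scoped_lock lock(unordered_map_mutex_, action_server_reentrant_mutex_);
    return results_.count(goal_id) != 0;
  }
};
```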
If we use std::scoped_lock, wouldn't that prevent backporting to foxy? foxy is compiled with C++14.
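For reference, C++14 has an equivalent of `std::scoped_lock`'s multi-mutex behavior: `std::lock()` acquires several mutexes with the same deadlock-avoidance algorithm, and `std::lock_guard` with `std::adopt_lock` then provides the RAII unlock. A minimal sketch (the free-standing mutexes and the counter are illustrative, only the names follow the diff):

```cpp
#include <mutex>

std::recursive_mutex unordered_map_mutex_;
std::recursive_mutex action_server_reentrant_mutex_;
int critical_section_calls = 0;  // illustrative only, to observe the call

void locked_section() {
  // std::lock() takes both mutexes deadlock-free (C++11/14); the
  // adopt_lock guards assume ownership so both are released via RAII.
  std::lock(unordered_map_mutex_, action_server_reentrant_mutex_);
  std::lock_guard<std::recursive_mutex> map_lock(
    unordered_map_mutex_, std::adopt_lock);
  std::lock_guard<std::recursive_mutex> reentrant_lock(
    action_server_reentrant_mutex_, std::adopt_lock);
  ++critical_section_calls;
}
```

So a foxy backport could keep the same locking structure, just with the more verbose spelling.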
Sounds good, but please add documentation explaining the current order in which we're taking the mutexes.
I added a NOTE comment to this block scope.
@@ -500,9 +500,13 @@ ServerBase::execute_result_request_received(std::shared_ptr<void> & data)
     result_response = create_result_response(action_msgs::msg::GoalStatus::STATUS_UNKNOWN);
   } else {
     // Goal exists, check if a result is already available
+    std::lock_guard<std::recursive_mutex> lock(pimpl_->unordered_map_mutex_);
nit:
- std::lock_guard<std::recursive_mutex> lock(pimpl_->unordered_map_mutex_);
+ std::lock_guard lock(pimpl_->unordered_map_mutex_);
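This nit relies on C++17 class template argument deduction (CTAD): the template argument of `std::lock_guard` is deduced from the mutex passed to the constructor, so the two spellings produce exactly the same type. A standalone sketch (the free mutex `m` and function names are illustrative):

```cpp
#include <mutex>
#include <type_traits>

std::recursive_mutex m;

void demo_explicit() {
  // Verbose C++11/14 spelling: template argument written out.
  std::lock_guard<std::recursive_mutex> explicit_lock(m);
}

void demo_ctad() {
  // C++17 CTAD: the guarded mutex type is deduced from `m`.
  std::lock_guard deduced_lock(m);
  static_assert(
    std::is_same<decltype(deduced_lock),
                 std::lock_guard<std::recursive_mutex>>::value,
    "CTAD deduces the same type as the explicit spelling");
}
```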
@@ -627,19 +627,19 @@ ServerBase::publish_result(const GoalUUID & uuid, std::shared_ptr<void> result_m
   }

   {
-    std::lock_guard<std::recursive_mutex> lock(pimpl_->unordered_map_mutex_);
+    std::lock_guard<std::recursive_mutex> unordered_map_lock(pimpl_->unordered_map_mutex_);
nit:
- std::lock_guard<std::recursive_mutex> unordered_map_lock(pimpl_->unordered_map_mutex_);
+ std::lock_guard unordered_map_lock(pimpl_->unordered_map_mutex_);
These two changes are inconsistent with the other std::lock_guard usages in server.cpp.
Before, we were using C++14 and this was not possible, so we're going to find inconsistencies.
The idea is to use the less verbose version in new code.
In this case, a backport PR will need to be modified to be compliant with C++14 again.
I think that's fine, but it's also fine if you want to keep the verbose version here.
Signed-off-by: Kaven Yau <kavenyau@foxmail.com>
Force-pushed from 6b7ef43 to 2492d7c.
Failures are unrelated.
@ivanpauno would we want to backport this to foxy and galactic? What do you think?
I would like that 😄
* Fix occasionally missing goal result caused by race condition
* Take action_server_reentrant_mutex_ out of the sending result loop
* add note for explaining the current locking order in server.cpp

Signed-off-by: Kaven Yau <kavenyau@foxmail.com>
Backport PRs are open for foxy and galactic.
…#1683:
* Fix occasionally missing goal result caused by race condition
* Take action_server_reentrant_mutex_ out of the sending result loop
* add note for explaining the current locking order in server.cpp

Signed-off-by: Kaven Yau <kavenyau@foxmail.com>
Co-authored-by: Kaven Yau <kavenyau@foxmail.com>

…#1682:
* Fix occasionally missing goal result caused by race condition
* Take action_server_reentrant_mutex_ out of the sending result loop
* add note for explaining the current locking order in server.cpp

Signed-off-by: Kaven Yau <kavenyau@foxmail.com>
Co-authored-by: Kaven Yau <kavenyau@foxmail.com>
Fix occasionally missing goal result when an action finishes quickly (one thread enters execute_result_request_received while the other enters publish_result).
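The race can be sketched abstractly: without a shared lock around both paths, the request thread can check the result map just before the publishing thread stores the result, and the result is then never delivered. The fix has both paths serialize on the same mutex. A minimal, simplified stand-in for the server's result bookkeeping (the `ResultStore` struct, `int` goal IDs, and method names are all illustrative; the real code lives in rclcpp's server.cpp):

```cpp
#include <mutex>
#include <optional>
#include <unordered_map>

// Both the publish path and the result-request path take the same
// mutex, so a request can never observe the map mid-update.
struct ResultStore {
  std::mutex mutex_;
  std::unordered_map<int, int> results_;  // goal id -> result (simplified)

  // Stand-in for the publish_result path: store the result under the lock.
  void publish(int goal, int result) {
    std::lock_guard<std::mutex> lock(mutex_);
    results_[goal] = result;
  }

  // Stand-in for the execute_result_request_received path: check, under
  // the same lock, whether a result is already available.
  std::optional<int> try_get(int goal) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = results_.find(goal);
    if (it == results_.end()) {
      return std::nullopt;
    }
    return it->second;
  }
};
```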