protect rcl_node_init()/fini() with rcl logging mutex. #1124
Conversation
Signed-off-by: Tomoya.Fujita <Tomoya.Fujita@sony.com>
As described here #1042 (comment), this "partially" fixes the issue but does not fix it completely.
We first need some consensus on how we want to fix the issue before going ahead with a solution.
If the consensus is doing something like this, we can follow up later.
/// Return the global logging configure mutex.
RCLCPP_PUBLIC
std::shared_ptr<std::mutex>
get_rcl_logging_configure_mutex();
The shared_ptr in the return signature seems unnecessary. Returning by (const) reference would express intention better since:
- The return value should never be null
- The caller should not determine the lifetime of the returned object
This method should be marked const since it shouldn't mutate the Context
I chose to hold a shared_ptr to the mutex in #998 for the following reasons:
- We have the default context, with static lifetime.
- The global logging mutex also has static lifetime.
- According to the standard, function-local objects with static storage duration within a single translation unit are constructed on first use and destructed in the reverse order of their construction. No ordering is guaranteed between objects in different translation units.
Then imagine the following situation:
- A user has a program that uses two contexts: the default one and another one with static lifetime.
- One of the two contexts gets initialized first (it doesn't matter which one), get_global_logging_configure_mutex is called, and the mutex is initialized (on first use).
- The other context is created afterwards.
- As destruction ordering is not ensured between different translation units, a possible destruction order is:
- context1
- mutex
- context2
If shutdown wasn't explicitly called for both contexts, the Context destructor will call it, and that takes the logging mutex, which can segfault if the mutex was previously destroyed.
That's the long story of why using the shared pointer ...
Having said that, I should've added a comment in the code. The rationale is not obvious at all.
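The ownership argument above can be sketched in a few lines of C++. This is a minimal illustration under assumed names (`get_global_logging_configure_mutex`, `Context`), not the actual rclcpp implementation: the function-local static creates the mutex on first use, and each Context keeps its own copy of the shared_ptr, so the mutex survives until its last owner is destroyed regardless of static destruction order across translation units.

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Sketch: construct-on-first-use, shared ownership of the global mutex.
std::shared_ptr<std::mutex> get_global_logging_configure_mutex()
{
  static auto mutex = std::make_shared<std::mutex>();
  return mutex;
}

struct Context
{
  // Copying the shared_ptr extends the mutex's lifetime to this Context.
  std::shared_ptr<std::mutex> logging_mutex_{get_global_logging_configure_mutex()};

  ~Context()
  {
    // Safe even if the function-local static above was already torn down:
    // this Context's copy still owns the mutex.
    std::lock_guard<std::mutex> guard(*logging_mutex_);
  }
};
```

Because every Context owns a reference, the problematic destruction order (context1, mutex, context2) can no longer leave context2 with a dangling mutex.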
node_name.c_str(), namespace_.c_str(),
context_->get_rcl_context().get(), &rcl_node_options);
{
  std::lock_guard<std::mutex> guard(*(context_->get_rcl_logging_configure_mutex()));
There are two subtle bugs here due to using a shared_ptr:
- What if it's null?
- What if the object is deleted after the shared_ptr is dereferenced?
What if it's null?
We should add a check and handle that case correctly.
What if the object is deleted after the shared_ptr is dereferenced?
If the Node holds the shared pointer, that's avoided.
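Both concerns can be addressed together, as sketched below: a hypothetical Node copies the shared_ptr at construction (so the mutex cannot be deleted while the Node exists) and rejects a null pointer up front. This is an illustration under assumed names, not the actual rclcpp code.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <stdexcept>
#include <utility>

// Sketch: Node keeps its own shared_ptr copy and checks for null
// before ever dereferencing it.
class Node
{
public:
  explicit Node(std::shared_ptr<std::mutex> logging_mutex)
  : logging_mutex_(std::move(logging_mutex))
  {
    if (!logging_mutex_) {
      throw std::invalid_argument("logging mutex must not be null");
    }
    // The call into rcl_node_init() would run under this lock.
    std::lock_guard<std::mutex> guard(*logging_mutex_);
  }

private:
  // Holding a copy means the mutex cannot be deleted while this Node lives.
  std::shared_ptr<std::mutex> logging_mutex_;
};
```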
We decided to finally add mutual exclusion protection in rclcpp/rclpy instead of rcl; ros2/rcl#660 is a start on that path. We're near the release date and running out of time, so we will continue the fix proposal within the ros2 team to streamline the implementation and iterate on reviews faster. I will probably borrow some of your code @fujitatomoya, thanks for starting a fix!
thanks for catching up!!!
fix candidate for #1042