Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

protect rcl_node_init()/fini() with rcl logging mutex. #1124

Closed
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions rclcpp/include/rclcpp/context.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -320,6 +320,11 @@ class Context : public std::enable_shared_from_this<Context>
return sub_context;
}

/// Return the global logging configure mutex.
RCLCPP_PUBLIC
std::shared_ptr<std::mutex>
get_rcl_logging_configure_mutex();
Copy link
Contributor

@rotu rotu May 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The shared_ptr in the return signature seems unnecessary. Returning by (const) reference would express intention better since:

  • The return value should never be null
  • The caller should not determine the lifetime of the returned object

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method should be marked const since it shouldn't mutate the Context

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I choose to hold a shared_ptr to the mutex in #998 because of the following reason:

  • We have the default context, with static lifetime.
  • The global logging mutex also have static lifetime.
  • According to the standard, objects with static lifetime created in function scope in a single translation unit are created on first use, and destructed in the opposite order they well created. There's no ordering ensured for objects in different translation units.

Then imagine the following situation:

  • An user has a program that uses two contexts: the default one and another one with static lifetime.
  • One of both contexts get initialized first (doesn't matter which one), get_global_logging_configure_mutex is called and then the mutex is initialized (on first use).
  • The other context is created afterwards.
  • As destruction ordering is not ensured between different translation units, a possible destruction order is:
    • context1
    • mutex
    • context2

In case shutdown for both contexts wasn't explicitly called the program, the Context destructor will call it and that takes the logging mutex, which can segfault if the mutex was previously destroyed.

That's the long story of why using the shared pointer ...

Having say that, I should've added a comment in the code. The rationale is not obvious at all.


protected:
// Called by constructor and destructor to clean up by finalizing the
// shutdown rcl context and preparing for a new init cycle.
Expand Down
6 changes: 6 additions & 0 deletions rclcpp/src/rclcpp/context.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -364,6 +364,12 @@ Context::interrupt_all_wait_sets()
}
}

std::shared_ptr<std::mutex>
Context::get_rcl_logging_configure_mutex()
{
return logging_configure_mutex_;
}

void
Context::clean_up()
{
Expand Down
21 changes: 15 additions & 6 deletions rclcpp/src/rclcpp/node_interfaces/node_base.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -65,10 +65,13 @@ NodeBase::NodeBase(
// Create the rcl node and store it in a shared_ptr with a custom destructor.
std::unique_ptr<rcl_node_t> rcl_node(new rcl_node_t(rcl_get_zero_initialized_node()));

ret = rcl_node_init(
rcl_node.get(),
node_name.c_str(), namespace_.c_str(),
context_->get_rcl_context().get(), &rcl_node_options);
{
std::lock_guard<std::mutex> guard(*(context_->get_rcl_logging_configure_mutex()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two subtle bugs here due to using a shared_ptr:

  1. What if it's null?
  2. What if the object is deleted after the shared_ptr is deterrence?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if it's null?

We should add a check and handle that case correctly.

What if the object is deleted after the shared_ptr is deterrence?

If the Node holds the shared pointer, that's avoided.

ret = rcl_node_init(
rcl_node.get(),
node_name.c_str(), namespace_.c_str(),
context_->get_rcl_context().get(), &rcl_node_options);
}
if (ret != RCL_RET_OK) {
// Finalize the interrupt guard condition.
finalize_notify_guard_condition();
Expand Down Expand Up @@ -123,8 +126,14 @@ NodeBase::NodeBase(

node_handle_.reset(
rcl_node.release(),
[](rcl_node_t * node) -> void {
if (rcl_node_fini(node) != RCL_RET_OK) {
[this](rcl_node_t * node) -> void {
rcl_ret_t ret;
{
std::lock_guard<std::mutex> guard(
*(this->context_->get_rcl_logging_configure_mutex()));
ret = rcl_node_fini(node);
}
if (ret != RCL_RET_OK) {
RCUTILS_LOG_ERROR_NAMED(
"rclcpp",
"Error in destruction of rcl node handle: %s", rcl_get_error_string().str);
Expand Down