Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

the subscriber's destructor does not wait until currently processing callbacks are exited #2447

Closed
GoesM opened this issue Mar 11, 2024 · 8 comments

Comments

@GoesM
Copy link

GoesM commented Mar 11, 2024

Bug report

Required Info:

  • Operating System:
    • Ubuntu 22.04
  • Installation type:
    • binaries
  • Version or commit hash:
    • the latest
  • DDS implementation:
    • defaulted
  • Client library (if applicable):
    • rclcpp

Steps to reproduce issue

the subscriber's destructor does not wait until currently processing callbacks are exited, which may cause troubles like UAF, null-pointer bugs.

Details are shown in navigation2 stack: ros-navigation/navigation2#4166

Expected behavior

We believe that the destructor of subscriber should have mechanisms to ensure complete release of relevant resources and fully exit ongoing works.

Actual behavior

there's no such mechanism.

Additional information

If there is no wait mechanism implemented in the subscriber's destructor to ensure that currently processing callbacks are exited beforehand, it could introduce the possibility of the freed pointer being accessed by ongoing tasks within the subscriber.

Consequently, this scenario may lead to instances of Use-After-Free (UAF) and null-pointer bugs. One illustrative example can be found within the navigation2 stack, as documented in the following issue: ros-navigation/navigation2#4166

@alsora
Copy link
Collaborator

alsora commented Mar 11, 2024

This is just an initial thought, I haven't looked at the nav2 ticket.

Destructors in c++ must not be called while an object is in use.
The destructor shouldn't wait for callbacks to end, but rather it should not be invoked until the subscription is executing callbacks.

ROS already has mechanisms to ensure this.
Subscriptions are managed through shared pointers, executors have weak ownership and will lock the pointer (thus preventing the destructor to be called) while executing a callback.

@GoesM
Copy link
Author

GoesM commented Mar 11, 2024

So, when we actively call the reset() function of the shared pointer ( an example in nav2 ), if this subscription is executing a task at the same time, the subscription pointer would not be released immediately, but will only be released after the task is completely completed. May I understand it correctly?

@mjcarroll
Copy link
Member

Correct, while a subscription callback is executing, the rclcpp::Executor is holding a std::shared_ptr to the rclcpp::Subscription. The code in reference is here:

void
Executor::execute_any_executable(AnyExecutable & any_exec)
{
if (!spinning.load()) {
return;
}
if (any_exec.timer) {
TRACETOOLS_TRACEPOINT(
rclcpp_executor_execute,
static_cast<const void *>(any_exec.timer->get_timer_handle().get()));
execute_timer(any_exec.timer);
}
if (any_exec.subscription) {
TRACETOOLS_TRACEPOINT(
rclcpp_executor_execute,
static_cast<const void *>(any_exec.subscription->get_subscription_handle().get()));
execute_subscription(any_exec.subscription);
}

Where rclcpp::AnyExecutable holds the shared_ptr here:

rclcpp::SubscriptionBase::SharedPtr subscription;

The reset() function on a shared_ptr will only decrement the internal reference count, causing the destructor be called when the count reaches 0.

@GoesM
Copy link
Author

GoesM commented Mar 11, 2024

I need to completely release the shared pointer of the subscription (or just make sure the shared pointer has been released after reset() called). May I ask if rclcpp provides related API of the subscription that can be used to simultaneously block and wait for the completion of the ongoing task of the subscription, and finally release the shared pointer?

@mjcarroll
Copy link
Member

I think there may be something problematic in the architecture of the node that you are running under ASAN. I posted some observations on the original upstream issue: ros-navigation/navigation2#4166 (comment)

@GoesM
Copy link
Author

GoesM commented Mar 11, 2024

greatly thanks for your help ~

@mjcarroll
Copy link
Member

@GoesM since this is an upstream issue, I'm going to close it out. Feel free to reopen (or open a new issue) if you find something else related here.

@GoesM
Copy link
Author

GoesM commented Mar 21, 2024

Thanks a lot, for all of your help and patience~

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants