Can't call spin_until_future_complete inside a callback executed by an executor #773

Open
orduno opened this issue Jun 28, 2019 · 7 comments

orduno commented Jun 28, 2019

Feature request

Feature description

spin_until_future_complete does not work recursively:

// TODO(wjwwood): does not work recursively; can't call spin_node_until_future_complete

Could someone provide some background on why this is not allowed?

nuclearsandwich added the question (Further information is requested) label on Jul 11, 2019
orduno (Author) commented Aug 7, 2019

@wjwwood Could you provide some background on this? What's the root of the problem? We might be able to help.

nuclearsandwich (Member) commented

Re-ping @wjwwood 🙇‍♂️

wjwwood (Member) commented Aug 21, 2019

I'm going to have to assume some understanding of how the executors and callback groups work to keep this reasonably brief. There are a few explanations floating around the internet, but I think the best thing to do if you want to understand them better is to start with the two most commonly used executors and just trace where the code goes in their spin functions:

SingleThreadedExecutor::spin():

  void
  SingleThreadedExecutor::spin()
  {
    if (spinning.exchange(true)) {
      throw std::runtime_error("spin() called while already spinning");
    }
    RCLCPP_SCOPE_EXIT(this->spinning.store(false); );
    while (rclcpp::ok(this->context_) && spinning.load()) {
      rclcpp::executor::AnyExecutable any_executable;
      if (get_next_executable(any_executable)) {
        execute_any_executable(any_executable);
      }
    }
  }

MultiThreadedExecutor::spin() and run():

  void
  MultiThreadedExecutor::spin()
  {
    if (spinning.exchange(true)) {
      throw std::runtime_error("spin() called while already spinning");
    }
    RCLCPP_SCOPE_EXIT(this->spinning.store(false); );
    std::vector<std::thread> threads;
    size_t thread_id = 0;
    {
      std::lock_guard<std::mutex> wait_lock(wait_mutex_);
      for (; thread_id < number_of_threads_ - 1; ++thread_id) {
        auto func = std::bind(&MultiThreadedExecutor::run, this, thread_id);
        threads.emplace_back(func);
      }
    }
    run(thread_id);
    for (auto & thread : threads) {
      thread.join();
    }
  }

  size_t
  MultiThreadedExecutor::get_number_of_threads()
  {
    return number_of_threads_;
  }

  void
  MultiThreadedExecutor::run(size_t)
  {
    while (rclcpp::ok(this->context_) && spinning.load()) {
      executor::AnyExecutable any_exec;
      {
        std::lock_guard<std::mutex> wait_lock(wait_mutex_);
        if (!rclcpp::ok(this->context_) || !spinning.load()) {
          return;
        }
        if (!get_next_executable(any_exec)) {
          continue;
        }
        if (any_exec.timer) {
          // Guard against multiple threads getting the same timer.
          if (scheduled_timers_.count(any_exec.timer) != 0) {
            continue;
          }
          scheduled_timers_.insert(any_exec.timer);
        }
      }
      if (yield_before_execute_) {
        std::this_thread::yield();
      }
      execute_any_executable(any_exec);
      if (any_exec.timer) {
        std::lock_guard<std::mutex> wait_lock(wait_mutex_);
        auto it = scheduled_timers_.find(any_exec.timer);
        if (it != scheduled_timers_.end()) {
          scheduled_timers_.erase(it);
        }
      }
      // Clear the callback_group to prevent the AnyExecutable destructor from
      // resetting the callback group `can_be_taken_from`
      any_exec.callback_group.reset();
    }
  }

The problem with recursive spinning is different in those two executors.


For the single threaded executor, the issue is actually recursion; the flow looks something like this:

  • spin()
    • wait for work
    • match work with user callback
    • take data and call user callback
      • in user callback, call spin() <-- recursion happens here

This spin() method is not designed to be re-entrant.
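To make that concrete, here is a minimal, self-contained sketch of such a re-entrant call (the node, topic, and service names below are placeholders, not from this issue). Depending on the rclcpp version, the nested call either throws from a spinning guard like the one shown in the code above, or otherwise misbehaves in version-specific ways, which is exactly what the TODO is warning about:

#include <memory>
#include "rclcpp/rclcpp.hpp"
#include "std_msgs/msg/empty.hpp"
#include "example_interfaces/srv/add_two_ints.hpp"

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  rclcpp::executors::SingleThreadedExecutor exec;
  auto node = std::make_shared<rclcpp::Node>("recursion_demo");
  auto client = node->create_client<example_interfaces::srv::AddTwoInts>("add_two_ints");

  auto sub = node->create_subscription<std_msgs::msg::Empty>(
    "trigger", 10,
    [&](std_msgs::msg::Empty::SharedPtr) {
      auto request = std::make_shared<example_interfaces::srv::AddTwoInts::Request>();
      auto future = client->async_send_request(request);
      // We are already inside exec.spin(), so this re-enters the executor.
      exec.spin_until_future_complete(future);
    });

  exec.add_node(node);
  exec.spin();  // the outer spin that dispatches the callback above
  rclcpp::shutdown();
  return 0;
}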

Also, you can all too easily run into a deadlock using the single threaded executor, because if you did something like:

void my_callback(const SomeMessageType & message) {
  // ...
  auto future = service->async_send_request(request);
  spin_until_future_complete(future);
  // ...
}

And if the client (which handles the service response) and the above subscription callback are in the same mutually exclusive callback group, then even though you're spinning, the response would never be handled.


For the multi threaded executor, the issue is thread safety: only one thread may wait for work at a time, and while you're executing your callback another thread is likely already waiting for work to become ready. This means your thread may not be able to get any work. That isn't an insurmountable issue, but it would require the implementation to be restructured to allow for it.

However, it suffers from the same callback group deadlock as the single threaded executor.

Also, while waiting for work you're consuming the thread you're in, which could otherwise be used to execute other work (especially work in the same callback group).


The "right" solution to this, in my opinion, requires an async/await syntax in C++, which doesn't exist right now. You can look at asyncio in Python3 or the new async/wait syntax in rust for what I mean. But basically we need the callbacks to be coroutines which we suspend while waiting for the condition to be met (some future is completed) and after that return to the callback and finish it. Since this is a language feature we can't really rely on that, but it would be the ideal way.

So ignoring that as a solution, I'd have to really dig into the architecture of the executor in order to figure out what we need to do in order to properly support this.

cevans87 commented

The "right" solution to this, in my opinion, requires an async/await syntax in C++, which doesn't exist right now. You can look at asyncio in Python3 or the new async/wait syntax in rust for what I mean. But basically we need the callbacks to be coroutines which we suspend while waiting for the condition to be met (some future is completed) and after that return to the callback and finish it. Since this is a language feature we can't really rely on that, but it would be the ideal way.

Some other C++ executor/future frameworks I've seen have worked around this decently well by just improving existing futures a bit and requiring all functions executed in an executor to have a `Future` return type.

One option: require callbacks to return futures, along with a richer rclcpp::Future<T> that supports .then():

using CallbackReturn = rclcpp_lifecycle::node_interfaces::LifecycleNodeInterface::CallbackReturn;

rclcpp::Future<CallbackReturn>
on_activate(const rclcpp_lifecycle::State &)
{
  // Future chaining results in a rclcpp::Future<CallbackReturn>
  return client_->async_send_request(foo_request_)
    .then([](const FooReply & reply) {
      return reply.success ? CallbackReturn::SUCCESS : CallbackReturn::FAILURE;
    });
}

Alternatively, don't expect return types; instead, expect a callback to be invoked when the function is done. Note that there's no compile-time enforcement that someone actually calls the callback, so this isn't great. I think there are many other pitfalls with this one that I don't remember.

using CallbackReturn = rclcpp_lifecycle::node_interfaces::LifecycleNodeInterface::CallbackReturn;

void
on_activate(const rclcpp_lifecycle::State &, std::function<void(CallbackReturn)> done)
{
  client_->async_send_request(foo_request_, [done](const FooReply & reply) {
      if (reply.success) {
        done(CallbackReturn::SUCCESS);
      } else {
        done(CallbackReturn::FAILURE);
      }
    });
}

ros-discourse commented

This issue has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/code-smells-in-ros-code/19905/5

nnmm pushed a commit to ApexAI/rclcpp that referenced this issue Jul 9, 2022
ros-discourse commented

This issue has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/deferrable-canceleable-lifecycle-transitions/32318/1

tgroechel commented

Throwing out another way to "get around" this using async services + SharedFutureResponse: #1709 (comment)
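For reference, the general shape of that workaround is to pass a response callback to async_send_request and never block inside the calling callback at all. A rough sketch is below (placeholder service and names, not the exact code from the linked comment):

#include <memory>
#include "rclcpp/rclcpp.hpp"
#include "example_interfaces/srv/add_two_ints.hpp"

using AddTwoInts = example_interfaces::srv::AddTwoInts;

void send_request_without_blocking(const rclcpp::Client<AddTwoInts>::SharedPtr & client)
{
  auto request = std::make_shared<AddTwoInts::Request>();
  request->a = 1;
  request->b = 2;
  // The lambda is run later by whichever executor is spinning the node once
  // the response arrives; this function returns immediately, so the callback
  // that called it never blocks or re-enters the executor.
  client->async_send_request(
    request,
    [](rclcpp::Client<AddTwoInts>::SharedFuture future) {
      auto response = future.get();
      RCLCPP_INFO(
        rclcpp::get_logger("async_demo"), "sum: %lld",
        static_cast<long long>(response->sum));
    });
}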
