Can't call spin_until_future_complete inside a callback executed by an executor #773

Open
orduno opened this issue Jun 28, 2019 · 7 comments

orduno commented Jun 28, 2019

Feature request

Feature description

spin_until_future_complete does not work recursively:

// TODO(wjwwood): does not work recursively; can't call spin_node_until_future_complete

Could someone provide some background on why this is not allowed?

nuclearsandwich added the question (Further information is requested) label on Jul 11, 2019
orduno (Author) commented Aug 7, 2019

@wjwwood Could you provide some background on this? What's the root of the problem? We might be able to help.

nuclearsandwich (Member) commented

Re-ping @wjwwood 🙇‍♂️

wjwwood (Member) commented Aug 21, 2019

I'm going to have to assume some understanding of how the executors and callback groups work to keep this reasonably brief. There are a few explanations floating around the internet, but I think the best thing to do if you want to understand them better is to start with the two most commonly used executors and just trace where the code goes in their spin functions:

SingleThreadedExecutor::spin():

  void
  SingleThreadedExecutor::spin()
  {
    if (spinning.exchange(true)) {
      throw std::runtime_error("spin() called while already spinning");
    }
    RCLCPP_SCOPE_EXIT(this->spinning.store(false); );
    while (rclcpp::ok(this->context_) && spinning.load()) {
      rclcpp::executor::AnyExecutable any_executable;
      if (get_next_executable(any_executable)) {
        execute_any_executable(any_executable);
      }
    }
  }

MultiThreadedExecutor::spin() and run():

  void
  MultiThreadedExecutor::spin()
  {
    if (spinning.exchange(true)) {
      throw std::runtime_error("spin() called while already spinning");
    }
    RCLCPP_SCOPE_EXIT(this->spinning.store(false); );
    std::vector<std::thread> threads;
    size_t thread_id = 0;
    {
      std::lock_guard<std::mutex> wait_lock(wait_mutex_);
      for (; thread_id < number_of_threads_ - 1; ++thread_id) {
        auto func = std::bind(&MultiThreadedExecutor::run, this, thread_id);
        threads.emplace_back(func);
      }
    }
    run(thread_id);
    for (auto & thread : threads) {
      thread.join();
    }
  }

  size_t
  MultiThreadedExecutor::get_number_of_threads()
  {
    return number_of_threads_;
  }

  void
  MultiThreadedExecutor::run(size_t)
  {
    while (rclcpp::ok(this->context_) && spinning.load()) {
      executor::AnyExecutable any_exec;
      {
        std::lock_guard<std::mutex> wait_lock(wait_mutex_);
        if (!rclcpp::ok(this->context_) || !spinning.load()) {
          return;
        }
        if (!get_next_executable(any_exec)) {
          continue;
        }
        if (any_exec.timer) {
          // Guard against multiple threads getting the same timer.
          if (scheduled_timers_.count(any_exec.timer) != 0) {
            continue;
          }
          scheduled_timers_.insert(any_exec.timer);
        }
      }
      if (yield_before_execute_) {
        std::this_thread::yield();
      }
      execute_any_executable(any_exec);
      if (any_exec.timer) {
        std::lock_guard<std::mutex> wait_lock(wait_mutex_);
        auto it = scheduled_timers_.find(any_exec.timer);
        if (it != scheduled_timers_.end()) {
          scheduled_timers_.erase(it);
        }
      }
      // Clear the callback_group to prevent the AnyExecutable destructor from
      // resetting the callback group `can_be_taken_from`
      any_exec.callback_group.reset();
    }
  }

The problem with recursive spinning is different in those two executors.


For the single threaded executor, the issue is actually recursion; the flow looks something like this:

  • spin()
    • wait for work
    • match work with user callback
    • take data and call user callback
      • in user callback, call spin() <-- recursion happens here

This spin() method is not designed to be re-entrant.
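To make that concrete, here is a minimal, self-contained sketch of such a re-entrant call (the node, topic, and service names below are placeholders, not from this issue). Depending on the rclcpp version, the nested call either throws from a spinning guard like the one shown in the code above, or otherwise misbehaves in version-specific ways, which is exactly what the TODO is warning about:

#include <memory>
#include "rclcpp/rclcpp.hpp"
#include "std_msgs/msg/empty.hpp"
#include "example_interfaces/srv/add_two_ints.hpp"

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  rclcpp::executors::SingleThreadedExecutor exec;
  auto node = std::make_shared<rclcpp::Node>("recursion_demo");
  auto client = node->create_client<example_interfaces::srv::AddTwoInts>("add_two_ints");

  auto sub = node->create_subscription<std_msgs::msg::Empty>(
    "trigger", 10,
    [&](std_msgs::msg::Empty::SharedPtr) {
      auto request = std::make_shared<example_interfaces::srv::AddTwoInts::Request>();
      auto future = client->async_send_request(request);
      // We are already inside exec.spin(), so this re-enters the executor.
      exec.spin_until_future_complete(future);
    });

  exec.add_node(node);
  exec.spin();  // the outer spin that dispatches the callback above
  rclcpp::shutdown();
  return 0;
}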

Also, you can all too easily run into a deadlock using the single threaded executor, because if you did something like:

void my_callback(const SomeMessageType & message) {
  // ...
  auto future = service->async_send_request(request);
  spin_until_future_complete(future);
  // ...
}

And if the client (which handles the service response) and the above subscription callback are in the same mutually exclusive callback group, then even though you're spinning, the response would never be handled.


For the multi threaded executor, the issue is thread safety: only one thread may wait for work at a time, and while you're executing your callback another thread is likely already waiting for work to become ready. This means your thread may not be able to get any work. That isn't an insurmountable issue, but it would require the implementation to be restructured to allow for it.

However, it suffers from the same callback group deadlock as the single threaded executor.

Also, while waiting for work you're consuming the thread you're in, which could otherwise be used to execute other work (especially work in the same callback group).


The "right" solution to this, in my opinion, requires an async/await syntax in C++, which doesn't exist right now. You can look at asyncio in Python3 or the new async/wait syntax in rust for what I mean. But basically we need the callbacks to be coroutines which we suspend while waiting for the condition to be met (some future is completed) and after that return to the callback and finish it. Since this is a language feature we can't really rely on that, but it would be the ideal way.

So ignoring that as a solution, I'd have to really dig into the architecture of the executor in order to figure out what we need to do in order to properly support this.

cevans87 commented

The "right" solution to this, in my opinion, requires an async/await syntax in C++, which doesn't exist right now. You can look at asyncio in Python3 or the new async/wait syntax in rust for what I mean. But basically we need the callbacks to be coroutines which we suspend while waiting for the condition to be met (some future is completed) and after that return to the callback and finish it. Since this is a language feature we can't really rely on that, but it would be the ideal way.

Some other C++ executor/future frameworks I've seen have worked around this decently well by just improving existing futures a bit and requiring all functions executed in an executor to have a `Future` return type.

One option: require callbacks to return futures, along with a richer rclcpp::Future<T> that supports .then():

using CallbackReturn = rclcpp_lifecycle::node_interfaces::LifecycleNodeInterface::CallbackReturn;

rclcpp::Future<CallbackReturn>
on_activate(const rclcpp_lifecycle::State &)
{
  // Future chaining results in a rclcpp::Future<CallbackReturn>
  return client_->async_send_request(foo_request_)
    .then([](const FooReply & reply) {
      return reply.success ? CallbackReturn::SUCCESS : CallbackReturn::FAILURE;
    });
}

Alternatively, don't expect return types; instead, expect a callback to be invoked when the function is done. Note that there's no compile-time enforcement that someone actually calls the callback, so this isn't great. I think there are many other pitfalls with this one that I don't remember.

using CallbackReturn = rclcpp_lifecycle::node_interfaces::LifecycleNodeInterface::CallbackReturn;

void
on_activate(const rclcpp_lifecycle::State &, std::function<void(CallbackReturn)> done)
{
  client_->async_send_request(foo_request_, [done](const FooReply & reply) {
      if (reply.success) {
        done(CallbackReturn::SUCCESS);
      } else {
        done(CallbackReturn::FAILURE);
      }
    });
}

ros-discourse commented

This issue has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/code-smells-in-ros-code/19905/5

nnmm pushed a commit to ApexAI/rclcpp that referenced this issue Jul 9, 2022
ros-discourse commented

This issue has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/deferrable-canceleable-lifecycle-transitions/32318/1

tgroechel commented

Throwing out another way to "get around" this using async services + SharedFutureResponse: #1709 (comment)
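For reference, the general shape of that workaround is to pass a response callback to async_send_request and never block inside the calling callback at all. A rough sketch is below (placeholder service and names, not the exact code from the linked comment):

#include <memory>
#include "rclcpp/rclcpp.hpp"
#include "example_interfaces/srv/add_two_ints.hpp"

using AddTwoInts = example_interfaces::srv::AddTwoInts;

void send_request_without_blocking(const rclcpp::Client<AddTwoInts>::SharedPtr & client)
{
  auto request = std::make_shared<AddTwoInts::Request>();
  request->a = 1;
  request->b = 2;
  // The lambda is run later by whichever executor is spinning the node once
  // the response arrives; this function returns immediately, so the callback
  // that called it never blocks or re-enters the executor.
  client->async_send_request(
    request,
    [](rclcpp::Client<AddTwoInts>::SharedFuture future) {
      auto response = future.get();
      RCLCPP_INFO(
        rclcpp::get_logger("async_demo"), "sum: %lld",
        static_cast<long long>(response->sum));
    });
}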
