Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disabled result caching: Result gets erased before result callback is registered #2101

Open
benthie opened this issue Feb 10, 2023 · 1 comment
Labels
bug Something isn't working

Comments

@benthie
Copy link

benthie commented Feb 10, 2023

Bug report

Required Info:

  • Operating System:
    • Ubuntu 20.04
    • Ubuntu 22.04
  • Installation type:
    • binaries
  • Version or commit hash:
    • galactic, foxy, humble
  • DDS implementation:
    • CycloneDDS
  • Client library (if applicable):
    • rclcpp_action

Steps to reproduce issue

I used the Fibonacci ActionServer (C++) and ActionClient (Python) and extended them as follows to reproduce the bug.

In the server I passed in options that disable the result caching and reduced the execute() function to return quicker.

#include <functional>
#include <memory>
#include <thread>

#include "action_tutorials_interfaces/action/fibonacci.hpp"
#include "rclcpp/rclcpp.hpp"
#include "rclcpp_action/rclcpp_action.hpp"
#include "rclcpp_components/register_node_macro.hpp"

#include "action_tutorials_cpp/visibility_control.h"

namespace action_tutorials_cpp
{
class FibonacciActionServer : public rclcpp::Node
{
public:
  using Fibonacci = action_tutorials_interfaces::action::Fibonacci;
  using GoalHandleFibonacci = rclcpp_action::ServerGoalHandle<Fibonacci>;

  ACTION_TUTORIALS_CPP_PUBLIC
  explicit FibonacciActionServer(const rclcpp::NodeOptions & options = rclcpp::NodeOptions())
  : Node("fibonacci_action_server", options)
  {
    using namespace std::placeholders;

    auto serverOptions = rcl_action_server_get_default_options();
    serverOptions.result_timeout.nanoseconds = 0;

    this->action_server_ = rclcpp_action::create_server<Fibonacci>(
      this,
      "fibonacci",
      std::bind(&FibonacciActionServer::handle_goal, this, _1, _2),
      std::bind(&FibonacciActionServer::handle_cancel, this, _1),
      std::bind(&FibonacciActionServer::handle_accepted, this, _1),
      serverOptions);
  }

private:
  rclcpp_action::Server<Fibonacci>::SharedPtr action_server_;

  rclcpp_action::GoalResponse handle_goal(
    const rclcpp_action::GoalUUID & uuid,
    std::shared_ptr<const Fibonacci::Goal> goal)
  {
    RCLCPP_INFO(this->get_logger(), "Received goal request with order %d", goal->order);
    (void)uuid;
    return rclcpp_action::GoalResponse::ACCEPT_AND_EXECUTE;
  }

  rclcpp_action::CancelResponse handle_cancel(
    const std::shared_ptr<GoalHandleFibonacci> goal_handle)
  {
    RCLCPP_INFO(this->get_logger(), "Received request to cancel goal");
    (void)goal_handle;
    return rclcpp_action::CancelResponse::ACCEPT;
  }

  void handle_accepted(const std::shared_ptr<GoalHandleFibonacci> goal_handle)
  {
    using namespace std::placeholders;
    // this needs to return quickly to avoid blocking the executor, so spin up a new thread
    std::thread{std::bind(&FibonacciActionServer::execute, this, _1), goal_handle}.detach();
  }

  void execute(const std::shared_ptr<GoalHandleFibonacci> goal_handle)
  {
    RCLCPP_INFO(this->get_logger(), "Executing goal");
    const auto goal = goal_handle->get_goal();
    auto result = std::make_shared<Fibonacci::Result>();
    result->sequence.push_back(42);
    goal_handle->succeed(result);
    RCLCPP_INFO(this->get_logger(), "Executing done");
  }
};  // class FibonacciActionServer

}  // namespace action_tutorials_cpp

RCLCPP_COMPONENTS_REGISTER_NODE(action_tutorials_cpp::FibonacciActionServer)

In the client I added some more logging to both goal_response_callback and get_result_callback.

import rclpy
from rclpy.action import ActionClient
from rclpy.node import Node

from action_tutorials_interfaces.action import Fibonacci


class FibonacciActionClient(Node):

    def __init__(self):
        super().__init__('fibonacci_action_client')
        self._action_client = ActionClient(self, Fibonacci, 'fibonacci')

    def send_goal(self, order):
        goal_msg = Fibonacci.Goal()
        goal_msg.order = order

        self._action_client.wait_for_server()

        self._send_goal_future = self._action_client.send_goal_async(goal_msg)
        self._send_goal_future.add_done_callback(self.goal_response_callback)

    def goal_response_callback(self, future):
        goal_handle = future.result()
        if not goal_handle.accepted:
            self.get_logger().info('Goal rejected :(')
            return

        self.get_logger().info('Goal accepted :)')

        self.get_logger().info("goal_response_callback: get_result_async")
        self._get_result_future = goal_handle.get_result_async()
        self.get_logger().info("goal_response_callback: add_done_callback(result_callback)")
        self._get_result_future.add_done_callback(self.get_result_callback)
        self.get_logger().info("goal_response_callback: done")

    def get_result_callback(self, future):
        result = future.result()
        self.get_logger().info('Result: status {}'.format(result.status))
        self.get_logger().info('Result: {0}'.format(result.result.sequence))
        rclpy.shutdown()


def main(args=None):
    rclpy.init(args=args)

    action_client = FibonacciActionClient()

    action_client.send_goal(10)

    rclpy.spin(action_client)


if __name__ == '__main__':
    main()

Expected behavior

According to the ROS2 ActionServer design article I'd expect the server to discard the results "after responding to any pending result requests".
The client should log "result_callback: status 4", which is STATUS_SUCCEEDED.

Actual behavior

The server discards the result as soon as it detects it as expired (see server.cpp L500). In some cases the discarding happens before the client has registered the result callback and thus the result is lost and at the time the result_callback is executed the client logs "result_callback: status 0", which is STATUS_UNKNOWN.

Additional information

I also put the rclcpp_action code in my overlay and added and changed some debug messages to log_level WARN so that I could see when the result gets discarded and when STATUS_UNKNOWN is returned by the server. Here is the log in which one can see the failure.

------------------------------------------------------------------------------------------------------------------------
Client
------------------------------------------------------------------------------------------------------------------------
[INFO] [1676025199.686814735] [fibonacci_action_client]: Goal accepted :)
[INFO] [1676025199.687044809] [fibonacci_action_client]: goal_response_callback: get_result_async
[INFO] [1676025199.687553497] [fibonacci_action_client]: goal_response_callback: add_done_callback(result_callback)
[INFO] [1676025199.687765941] [fibonacci_action_client]: goal_response_callback: done
[INFO] [1676025199.688158300] [fibonacci_action_client]: Result: status 0
[INFO] [1676025199.693995159] [fibonacci_action_client]: Result: array('i')

------------------------------------------------------------------------------------------------------------------------
Server
------------------------------------------------------------------------------------------------------------------------
[INFO] [1676025199.677291824] [fibonacci_action_server]: Received goal request with order 10
[DEBUG] [1676025199.677356341] [rcl]: Sending service response
[WARN] [1676025199.677414186] [fibonacci_action_server.rclcpp_action]: Accepted goal 912881d7a81d4a17b8ac8d6e89d14b0
[DEBUG] [1676025199.677736839] [rcl]: Waiting without timeout
[DEBUG] [1676025199.677742997] [rcl]: Timeout calculated based on next scheduled timer: false
[DEBUG] [1676025199.677747903] [rcl]: Guard condition in wait set is ready
[DEBUG] [1676025199.677753235] [rcl]: Waiting without timeout
[DEBUG] [1676025199.677755049] [rcl]: Timeout calculated based on next scheduled timer: false
[INFO] [1676025199.677770234] [fibonacci_action_server]: Executing goal
[DEBUG] [1676025199.677945233] [rcl]: Timer successfully reset
[DEBUG] [1676025199.677950962] [rcl]: Updated timer period from '0ns' to '0ns'
[DEBUG] [1676025199.677958279] [rcl]: Waiting with timeout: 0s + 0ns
[DEBUG] [1676025199.677961209] [rcl]: Timeout calculated based on next scheduled timer: true
[INFO] [1676025199.678002260] [fibonacci_action_server]: Executing done
[DEBUG] [1676025199.679048075] [rcl]: Timer in wait set is ready
[DEBUG] [1676025199.679071412] [rcl]: Timer canceled
[WARN] [1676025199.679082202] [fibonacci_action_server.rclcpp_action]: Expired goal 912881d7a81d4a17b8ac8d6e89d14b0
[DEBUG] [1676025199.679193269] [rcl]: Timer canceled
[DEBUG] [1676025199.679202156] [rcl]: Waiting without timeout
[DEBUG] [1676025199.679204135] [rcl]: Timeout calculated based on next scheduled timer: false
[DEBUG] [1676025199.679207484] [rcl]: Guard condition in wait set is ready
[DEBUG] [1676025199.679211155] [rcl]: Waiting without timeout
[DEBUG] [1676025199.679212952] [rcl]: Timeout calculated based on next scheduled timer: false
[DEBUG] [1676025199.687154346] [rcl]: Service in wait set is ready
[DEBUG] [1676025199.687177283] [rcl]: Service server taking service request
[DEBUG] [1676025199.687183095] [rcl]: Service take request succeeded: true
[WARN] [1676025199.687225064] [fibonacci_action_server.rclcpp_action]: Goal does not exist 912881d7a81d4a17b8ac8d6e89d14b0
[WARN] [1676025199.687228270] [fibonacci_action_server.rclcpp_action]: Creating result response with STATUS_UNKNOWN

I think this problem only applies to the Python API, because in C++ the result callback is part of the SendGoalOptions? I understand that the server cannot predict that some Python ActionClient is going to request the result soon. And I am not sure whether this an actual bug or rather a design decision and known issue the client has to deal with. As far as I understand, the Python API does not offer the ActionClient a way to register interest in the result at any earlier point in time. If there should be a way to do so, I am happy to try it out.

We only saw the issue in our Python based tests were the server was executing a rather short execution callback.

@fujitatomoya
Copy link
Collaborator

@benthie thanks for creating issue.

we have the same problem with ros2/ros2@1f5bd8e rolling.

i think this is not expected behavior as designed.

@fujitatomoya fujitatomoya added the bug Something isn't working label Sep 24, 2023
fujitatomoya added a commit to fujitatomoya/ros2_test_prover that referenced this issue Sep 24, 2023
… registered

  ros2/rclcpp#2101

Signed-off-by: Tomoya Fujita <Tomoya.Fujita@sony.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants