rcl_action: result_timeout should be started on goal completion #1103
Comments
@SteveMacenski @clalancette @AlexeyMerzlyakov what do you think? |
One minute is better than 10 seconds, but it doesn't really address the underlying problem. We need a different mechanism for expiring goals that isn't based on request time (e.g. last-updated time? last-result-requested time?). But I'll take incremental improvements where I can get them. Nav2's workaround of increasing the time solves my immediate problems, but it still leaves every other user who isn't extremely well plugged into the goings-on of Nav2 / rclcpp development in the lurch. So for their sake, to help mask more of the problem in the meantime, I would be very supportive of a move up to 1 minute. +1 this is a good suggestion |
i see, maybe we can rely on server process or event that resets the timeout. (e.g the followings are PRs just change the default value. |
Thank Alexey! I’m just escalating his good sleuthing, that is his idea! Thanks so much for the time to help address this issue @fujitatomoya! |
@fujitatomoya, thank you for bringing attention to this issue! Yes, it will handle the immediate problem, so I would like to see these two PRs merged. In the long term, I agree that the timeout handling mechanism itself needs to change. It seems that the value was initially decreased in RCL as a workaround to reduce memory consumption by unused actions. From that point of view, however, the timeout would be better calculated from the latest feedback/status/any other event in the action server-client chain, rather than from the request start time (as is currently done). Anyway, this is also a solution for now, so I would be OK with applying it. |
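To make the difference between the two anchoring choices in the comment above concrete, here is a minimal sketch in plain Python. It does not use rcl_action or rclpy; the function names and the example timestamps are hypothetical and only illustrate how the expiry deadline moves depending on which time the timeout is counted from.

```python
def deadline_from_acceptance(accepted_at, timeout):
    """Deadline when the timeout is counted from the request/acceptance time."""
    return accepted_at + timeout


def deadline_from_last_event(accepted_at, event_times, timeout):
    """Deadline when the timeout is counted from the latest observed event.

    event_times holds the times of feedback/status/completion events
    (hypothetical bookkeeping, not an rcl_action data structure).
    """
    last_event = max([accepted_at, *event_times])
    return last_event + timeout


# Illustrative timeline: accepted at 0 s, feedback at 5 s and 12 s,
# completion at 20 s, with a 10 s timeout.
accepted_at = 0.0
events = [5.0, 12.0, 20.0]
timeout = 10.0

by_acceptance = deadline_from_acceptance(accepted_at, timeout)
by_last_event = deadline_from_last_event(accepted_at, events, timeout)
print(by_acceptance, by_last_event)  # 10.0 30.0
```

With these numbers, a deadline anchored at acceptance falls before the goal even completes, while a deadline anchored at the last event leaves 10 seconds after completion to fetch the result.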
This issue has been mentioned on ROS Discourse. There might be relevant details there: https://discourse.ros.org/t/ros-2-tsc-meeting-minutes-2023-09-21/33733/2 |
If I understand correctly I think there are two bugs. One here, and one in the Python API.
A little info on the design. The client that sent the goal is expected to ask for the result as soon as it learns the goal has been accepted by the server. The result should be kept for action completion time + timeout in case someone other than the original client wants to look at it.

What looks like it is working as intended is that the server only expires goals in a terminal state. See here where the expire logic skips active goals (accepted, executing, canceling). That means even with the expire timer issue any client can get the result as long as they request it before the action completes.

Second, the C++ action client properly requests the result as soon as the goal is accepted. I saw some comments about how this could be a problem for actions that take hours to run, but I think that case is fine. Instead I think the issue would arise when the timeout is very small and the goal completes faster than the client can request the result.

Maybe in the Python API we should require the user of the Python action client to give a callback to be called when the result is ready? Or maybe we should always request a result and give them a future they can keep or discard? Either way I think the get result service should be called here. |
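The expiry rule described in the comment above (skip active goals, keep results until completion time + timeout) can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the rcl_action implementation: the `Goal` class, `expired_goal_ids` function, and the timestamps are hypothetical; only the state names follow the ROS 2 action design.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class GoalState(Enum):
    ACCEPTED = auto()
    EXECUTING = auto()
    CANCELING = auto()
    SUCCEEDED = auto()
    CANCELED = auto()
    ABORTED = auto()


# Goals in these states are still active and must never be expired.
ACTIVE_STATES = {GoalState.ACCEPTED, GoalState.EXECUTING, GoalState.CANCELING}


@dataclass
class Goal:
    goal_id: str
    state: GoalState
    completed_at: Optional[float] = None  # set once a terminal state is reached


def expired_goal_ids(goals, now, result_timeout):
    """Return ids of goals whose cached result may be dropped at time `now`."""
    expired = []
    for goal in goals:
        if goal.state in ACTIVE_STATES:
            continue  # still running: skip regardless of how long ago it started
        if goal.completed_at is not None and now >= goal.completed_at + result_timeout:
            expired.append(goal.goal_id)
    return expired


goals = [
    Goal("long-running", GoalState.EXECUTING),
    Goal("just-finished", GoalState.SUCCEEDED, completed_at=95.0),
    Goal("old-result", GoalState.ABORTED, completed_at=50.0),
]
print(expired_goal_ids(goals, now=100.0, result_timeout=10.0))  # ['old-result']
```

Under this rule a goal that runs for hours is never expired while it is active; only the window after completion is bounded by the timeout.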
@sloretz thank you for sharing comments. I might be mistaken, so i need to check.
true, this is against https://design.ros2.org/articles/actions.html#result-caching
i think there seems to be 2 ways to call
ros2/rclcpp#2101 creates the action server with after all, i think what needs to be fixed is result_timeout should be applied against what do you think? |
@iuhilnehc-ynos @Barry-Xu-2018 could you check this when you have time? i would like to have the 2nd opinion. thanks in advance. |
Totally agree. This is an important point to fix. According to the design, it is possible to not get the result if the timeout is set to 0. |
@Barry-Xu-2018 @iuhilnehc-ynos you guys have bandwidth for this fix? that would be really appreciated. CC: @clalancette @sloretz |
Okay.
|
i am not really sure about this comment. https://design.ros2.org/articles/actions.html#result-caching explains well.
if the timeout is set to 0, once goal has been completed, server checks |
When goal processing time is very short (the server has only just sent the goal acceptance to the client), the client has not yet sent the result request to the service. |
Understood. there will be always racy condition between saying, for example,
so i think current design with timeout on server after goal completion makes sense. but i am open for more options and ideas 👍 thanks |
I agree with you that a goal result could be missed if the goal executed very fast and the result timeout was zero, but I think that's a case of a misconfigured goal timeout in the application rather than a bug that needs to be fixed here. |
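The race discussed in the comments above can be illustrated with a small plain-Python sketch. It assumes, as a simplification of the design article, that a result request arriving before completion is held by the server and answered when the goal finishes, and that a request arriving after completion is served only while the result is still cached. The function name and timestamps are hypothetical.

```python
def result_delivered(completed_at, requested_at, result_timeout):
    """Return True if a result request would be answered with the result.

    Assumption: requests that arrive before completion stay pending and are
    answered at completion; later requests succeed only within the cache window.
    """
    if requested_at <= completed_at:
        return True
    return requested_at < completed_at + result_timeout


# Goal completes at t = 1.0 s.
print(result_delivered(1.0, 0.5, 0.0))   # True: request was already pending
print(result_delivered(1.0, 1.2, 0.0))   # False: zero timeout, request came too late
print(result_delivered(1.0, 1.2, 10.0))  # True: result still cached
```

With a zero timeout, only a request that reaches the server before the goal completes is guaranteed the result, which is why a very fast goal combined with a zero timeout can lose it.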
btw,
i guess, we would take this as |
I've also considered the above solution, and as you described, there are some unavoidable issues. This is because it's impossible to determine when the service will receive the client's result request. How long to retain the goal result is the problem. I tend to agree with sloretz's point of view. This should be resolved by asking users to set an appropriate timeout. |
@Barry-Xu-2018 do we have any update on this issue? |
No update. |
Bug report
Required Info:
Description
If the user application uses the default rcl_action_server_options_t, goal_handle will be considered expired after 10 seconds. As described in open-navigation/navigation2#3765, it is quite likely to take more than 10 seconds after a goal is accepted to set its result, so the goal expires (after 10 seconds) before the result is set. open-navigation/navigation2#3765 (comment) analyzes this issue and works around it by setting 30 seconds via rcl_action_server_options_t.result_timeout.
Consideration / Proposal
rcl_action: increase the default timeout from 10 seconds to 1 minute. (15 minutes is too long, though.) Reduce result_timeout to 10 seconds. #1012 reduced the timeout to 10 seconds, but considering use cases such as Nav2, which relies on ROS 2 actions, 10 seconds is too short as a default. (backport required to Iron)
rclpy_action: the result_timeout default should be set to rcl's default accordingly. (currently this is set to 900 seconds.)
Related Issue
timeout == 0 does not work)