-
Notifications
You must be signed in to change notification settings - Fork 162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change the starting time of the goal expiration timeout #1121
Change the starting time of the goal expiration timeout #1121
Conversation
This patch doesn't introduce new interfaces. As a result, the current modification is not very clear for setting goal terminal timestamp. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is goal_handle->impl->goal_terminal_timestamp
set initially?
If my understanding of your code is correct (I don't live in rcl
) when calling _recalculate_expire_timer
, which I assume is periodically called during an action server's normal operations, we reset to now
the clock ticking down when its expired.
I don't understand the goal_terminal_timestamp == INT64_MAX
condition then --> shouldn't we always be resetting the clock if the action is still active? That would make the most sense to me; while the goal is still actively being processed, we stave off the timeout to remove it by another full timeout length. Then when the last update / sending of the result comes so we don't call it anymore, the full countdown is considered before removal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i have a few comments to confirm.
First, let me explain when _recalculate_expire_timer() is called.
I use scenario 1 to set goal_terminal_timestamp.
rcl/rcl_action/src/rcl_action/action_server.c Line 447 in a17bc95
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, i will try CI.
In this case, why is there any timer still set for expiring the goals? That seems like the root cause of the problem; our goals are being expired before they're done due to a terminal condition if they're long-running. In this design, that seems to me should be completely removed and this check in terminal states (success or cancel or abort) is all that should be required. This is where the timer to count down the expiration of the goal handle should be initiated. Alternatively, the timer for expiring the next goal should be adjusted during action server operations from |
@Barry-Xu-2018 can you rebase your development branch? it is failing build https://ci.ros2.org/job/ci_linux/19950/console since it does not include some changes in the mainline. Barry-Xu-2018/rcl@review/topic-use-goal-terminal-timestamp-to-expire-check...ros2:rcl:rolling |
62dd16f
to
0a87c6d
Compare
Rebase was done. @fujitatomoya |
Sorry, I'm not quite clear about the scenarios you described. Could you please describe them again? #1121 (comment)
|
There is a failure in windows test. |
@sloretz @iuhilnehc-ynos either of you, can you take a look at this? i would like to have another review on this. |
Your two situations this method is called:
I'll shorthand this as achieving a terminal state
I'll shorthand the timer for goal expiring as "the ticking clock" My assertion is that you found a clever place to make modifications to the goal handle's state: at achieving the terminal state. We know that once we achieve this state, the goal is done processing and anytime before this point, the goal is still processing (or was dropped on the floor from a users' crappy application, which no matter what is going to cause problems). If we have a place (e.g. here) we can modify a ticking clock, that calls into question the need for the ticking clock prior to this point at all. The issue we want to solve is that ticking clock expiring still-processing goals -- thus why even have the ticking clock have the ability to expire still-processing goals? I'd argue that it shouldn't even be possible. We should be waiting until the terminal state to trigger the clock to start its countdown until the handle / result are deleted. Of which, the default of 10s or what-have-you is totally appropriate and fully solves the problem. It also comes with the added benefit of simplifying the action logic since there (1) doesn't need to be a ticking clock checked while the goal is processing and (2) the modification you require don't need to reset it, it merely needs to start it |
Thank you for the further explanation.
Your current question revolves around "the ticking clock", but I haven't fully caught your idea yet. For "the ticking clock", while the timer is triggered (At 2 in above diagram), the next expire time need to be calculated.
There are currently two time points where "the ticking clock" is checked. One is when a goal enters the termination state, and the other is when the ticking clock timer is triggered. When you mentioned "the goal is processing," in what situation does that occur? If expire timer has been run, a goal enters terminal state to call _recalculate_expire_timer(). At this case, calculating and setting the timer is an unnecessary operation. |
That's a question you're most suited to answer than me. With the current behavior before this PR, where was the goal being expired resulting in the bug that this is meant to address (which was expiring still-processing goals causing issues)? That was happening which triggered this discussion. My suggestion is removing entirely where-ever that is happening if we're able to start the count down to removing a goal at a terminal state. You've shown the part of the code where that terminal state is found to be able to reset a timer. Instead, we could just start it here and remove the previous problematic countdown that previously caused us issues. I ack that perhaps I'm missing something and that my suggestion isn't fully informed with the ins-and-outs of rcl. What I'm trying to suggest at its core is this, regardless of the specific details:
To my eye, this PR resets some timeouts, but still leaves the original timeout in place that caused the problem in the past - which could still cause a problem moving forward as well if you have a timeout of 10s but an action length of 10 minutes (which, we have). |
Thank you for detailed explanation. This PR ensures that the current time is recorded as the start time for expiration only when the goal enters the terminal state. The expire timer is set based on this time. Therefore, with the current modifications, it is impossible for the goal to expire while being processed. In rcl_action_expire_goals(), it will get time of goal termination by calling rcl_action_goal_handle_get_goal_terminal_timestamp() to check if expiration occurs. |
OK, got it. That makes sense, you can ignore my comments then :-) This is a very important fix for us in Nav2 so I wanted to make sure beyond the shadow of my ignorance that this was going to fully resolve the problem 😉 I very much appreciate your time here! |
@Barry-Xu-2018 since you added the fix from last CI, i started the full CI. @SteveMacenski thank you for taking your time for the review! |
What's the good word? |
we would like to have 2nd review on this. |
Signed-off-by: Barry Xu <barry.xu@sony.com>
Signed-off-by: Barry Xu <barry.xu@sony.com>
Signed-off-by: Barry Xu <barry.xu@sony.com>
@jmachowinski Thanks a lot for your review. I've updated the code based on your comments. |
@jmachowinski @alsora thanks for your reviews, i will start the CI. |
I will check failed test cases. |
Commit a07e119 is wrong. |
This reverts commit a07e119. Signed-off-by: Barry Xu <barry.xu@sony.com>
@Barry-Xu-2018 please consider adding some comments around that code that we thought was not needed but broke the tests when removed, so that in the future we will immediately know why we need it. |
Signed-off-by: Barry Xu <barry.xu@sony.com>
Signed-off-by: Barry Xu <barry.xu@sony.com>
3ac0079
to
959f39b
Compare
The all unstable cases are related to this commit ros2/message_filters@5551518 For Windows, it is related to this PR.
|
All CI tests passed. |
Will this be backported to Jazzy/Humble? |
I believe adding new function in C does not break ABI compatibility, so i say we can backport this to jazzy, iron and humble. @clalancette @alsora what do you think? |
This is really great news, I know there's alot of interest in Nav2 users for the backport to remove that timeout! From my review as well, it looks backportable |
Address one issue mentioned #1103
The expire timeout is being calculated from the time the goal was accepted. According to the design https://design.ros2.org/articles/actions.html#result-caching, it should be calculated from the time the goal is completed.