Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

executor ignore canceled timers #220

Merged
merged 7 commits into from
Nov 29, 2021

Conversation

amfern
Copy link
Contributor

@amfern amfern commented Nov 24, 2021

This is similar behavior observed in rclcpp, canceled timers shouldn't be handled

https://github.com/ros2/rclcpp/blob/e03e98220d7c4a79bfc934d43d762e017b493e9c/rclcpp/include/rclcpp/timer.hpp#L213

this is similar behavior observed in rclcpp, canceled timers shouldn't be handled

Signed-off-by: Ilya Guterman <amfernusus@gmail.com>
@pablogs9
Copy link
Member

Could you provide some examples of how is a timer cancelled in RCLC? Is this reversible?

Thanks for the contribution @amfern

@pablogs9
Copy link
Member

@mergify backport galactic foxy

@mergify
Copy link
Contributor

mergify bot commented Nov 25, 2021

backport galactic foxy

🟠 Waiting for conditions to match

  • merged [:pushpin: backport requirement]

Hey, I reacted but my real name is @Mergifyio

@codecov-commenter
Copy link

codecov-commenter commented Nov 25, 2021

Codecov Report

Merging #220 (5bd205e) into master (f8de341) will decrease coverage by 0.08%.
The diff coverage is 0.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #220      +/-   ##
==========================================
- Coverage   63.58%   63.49%   -0.09%     
==========================================
  Files          16       16              
  Lines        2150     2153       +3     
  Branches      642      643       +1     
==========================================
  Hits         1367     1367              
- Misses        469      471       +2     
- Partials      314      315       +1     
Impacted Files Coverage Δ
rclc/src/rclc/executor.c 61.90% <0.00%> (-0.18%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f8de341...5bd205e. Read the comment docs.

Copy link
Contributor

@JanStaschulat JanStaschulat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution, @amfern.
It is more explicit, that we handle that case - but does not affect the behavior - in my understanding.

  • The timer is still handled.
  • the return code rc is returned as in the original source code.
  • the only difference is, that the error RCL_RET_TIMER_CANCELED is not treated as an error (no debug output).

Did I miss anything?

@pablogs9
Copy link
Member

Added regression test

Signed-off-by: Your Name <you@example.com>
@pablogs9 pablogs9 force-pushed the ignore-canceled-timer-in-executor branch from 497dc9c to b61f8ea Compare November 25, 2021 11:06
Signed-off-by: Your Name <you@example.com>
@pablogs9 pablogs9 force-pushed the ignore-canceled-timer-in-executor branch from 046f1dc to b72a102 Compare November 25, 2021 11:14
Signed-off-by: Your Name <you@example.com>
Signed-off-by: Pablo Garrido <pablogs9@gmail.com>
@amfern
Copy link
Contributor Author

amfern commented Nov 25, 2021

I use microros In my company and i had the need to halt publishers and resume at later point.
I found rcl has the functions, rcl_timer_cancel and rcl_timer_reset which fitted my needs perfectly. But when I would cancel any timer rclc_executor_spin_some(executor, 0); will exit prematurely with error RCL_RET_TIMER_CANCELED without handling publishers and subscribers. So it seems to me the cancel timer behavior is broken.

I run rclc_executor_spin_some() with 0 timeout, because the current OS we use internally isn't POSIX like and single threaded, so I try to block the main thread as little as possible.

Could you provide some examples of how is a timer cancelled in RCLC? Is this reversible?
@pablogs9 is this what you meant as an example?

@pablogs9
Copy link
Member

@JanStaschulat if you are ok with tests and the fix, we can merge and backport

@amfern
Copy link
Contributor Author

amfern commented Nov 25, 2021

@JanStaschulat @pablogs9 I appreciate the swift reply and the test.

Copy link
Contributor

@JanStaschulat JanStaschulat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amfern "But when I would cancel any timer rclc_executor_spin_some(executor, 0); will exit prematurely with error RCL_RET_TIMER_CANCELED without handling publishers and subscribers. So it seems to me the cancel timer behavior is broken."

Thanks for the explanation. With your fix, the error RCL_RET_TIMER_CANCELEDwill still be returned, in case the timer was cancelled. How does this solve your problem? Could you add a unit test in rclc/test directory, that shows a minimal example of this behavior?

Should the return value in this case be changed to RCL_RET_OK?

@amfern
Copy link
Contributor Author

amfern commented Nov 25, 2021

With this patch, the error will be ignored and the retuned value of _rclc_execute will be RCL_RET_OK.
So the for loop in the function _rclc_default_scheduling will continue handling the rest of the handlers and won't return prematurely here.

if (rc != RCL_RET_OK) {
.

I think the unit test @pablogs9 added pretty much covers this case. But I can add another unit test with subscriber and publisher and make sure they are getting called.

How do I run the tests? Is there a guide?

@JanStaschulat
Copy link
Contributor

You can just add a unit test, like Pablo did, and then call colcon build and colcon test --packages-select rclc (in case you have multiple packages currently in your ros workspace. This will run all unit tests of the rclc package.

@JanStaschulat
Copy link
Contributor

JanStaschulat commented Nov 25, 2021

"With this patch, the error will be ignored and the retuned value of _rclc_execute will be RCL_RET_OK."

Hmm, I can't see this. The return-code is assigned to RCL_RET_CANCELLED_TIMER by rcl_timer_call, then with break , the switch-case statement is exited and this rc is still returned in line

return rc;

Signed-off-by: Ilya Guterman <amfernusus@gmail.com>
@amfern amfern force-pushed the ignore-canceled-timer-in-executor branch from 5e2b33b to 2557d26 Compare November 25, 2021 14:52
rclc/src/rclc/executor.c Show resolved Hide resolved
Signed-off-by: Ilya Guterman <amfernusus@gmail.com>
@amfern amfern force-pushed the ignore-canceled-timer-in-executor branch from 35149ee to b6157e9 Compare November 25, 2021 21:30
@amfern
Copy link
Contributor Author

amfern commented Nov 25, 2021

if i run the test @pablogs9 added, the test executor_spin_timer_cancelled without patching the executor, I would expect it to fail, but it doesn't.

Why it doesn't fail? @JanStaschulat Should it fail?

@JanStaschulat
Copy link
Contributor

JanStaschulat commented Nov 26, 2021

My hypothesis:
When the timer is cancelled, the timer is not ready any more and, consequently, is not returned as ready timer in the wait_set.

handle->data_available = (NULL != wait_set->timers[handle->index]);

So data_availableflag of the cancelled timer will be false and the execution of the timer is omitted in _rclc_execute:

_rclc_check_handle_data_available(handle))

if (handle->data_available) {

Therefore, the switch/case statement of the TIMER (line 1433) in _rclc_execute is not evaluated any more. That means, that the error code RCL_RET_TIMER_CANCELLED is not returned for a cancelled timer.

This would explain that the unit test works without this fix.

Could you insert a simple printf in line 1436 (as suggest in my previous comment) and run the unit test with colcon testto check that the line rc = rcl_timer_call(handle->timer);is not executed after the timer is cancelled? Could you post the result here?

If this is the case, then I wonder, why it did not work in your application.

@amfern
Copy link
Contributor Author

amfern commented Nov 26, 2021

yes, i can add the printf, but later today.

So is it possible for the timer to become ready and canceled at the same time?

And because i execute one loop each time around, i cancled the timer when it was ready but not handled yet.

@JanStaschulat
Copy link
Contributor

JanStaschulat commented Nov 26, 2021

@amfern
"So is it possible for the timer to become ready and canceled at the same time?"
Are you cancelling the timer in the same thread or in different thread? I expect that in micro-XRCE dds client implementation is thread-safe, @pablogs9 . So that should be no issue.

But maybe you have the case, where the timer is ready, returned in the wait set (internally the executor marks it as data_available) and then timer is canclled in a different thread so when calling the timer, the return value is RCL_RET_TIMER_CANCELLED ... But this should only happen once.

@amfern
Copy link
Contributor Author

amfern commented Nov 26, 2021

everything is single threaded in my application

…the test

Signed-off-by: Ilya Guterman <amfernusus@gmail.com>
@amfern amfern force-pushed the ignore-canceled-timer-in-executor branch from 84d24ba to 5bd205e Compare November 26, 2021 21:40
@amfern
Copy link
Contributor Author

amfern commented Nov 26, 2021

I added printf("print after 'rc = rcl_timer_call(handle->timer);'"); to line 1433, and I don't see the print, so it works as you described.

1: [       OK ] TestDefaultExecutor.executor_remove_subscription (2 ms)
1: [ RUN      ] TestDefaultExecutor.executor_add_timer
1: [       OK ] TestDefaultExecutor.executor_add_timer (3 ms)
1: [ RUN      ] TestDefaultExecutor.executor_spin_timer
1: print after 'rc = rcl_timer_call(handle->timer);'print after 'rc = rcl_timer_call(handle->timer);'print after 'rc = rcl_timer_call(handle->timer);'print after 'rc = rcl_timer_call(handle->timer);'print after 'rc = rcl_timer_call(handle->timer);'[       OK ] TestDefaultExecutor.executor_spin_timer (504 ms)
1: [ RUN      ] TestDefaultExecutor.executor_spin_publisher_timer_cancelled
1: canceling timer[       OK ] TestDefaultExecutor.executor_spin_publisher_timer_cancelled (3 ms)
1: [ RUN      ] TestDefaultExecutor.executor_spin_timer_cancelled
1: print after 'rc = rcl_timer_call(handle->timer);'print after 'rc = rcl_timer_call(handle->timer);'print after 'rc = rcl_timer_call(handle->timer);'[       OK ] TestDefaultExecutor.executor_spin_timer_cancelled (505 ms)
1: [ RUN      ] TestDefaultExecutor.executor_remove_timer

Also, I removed this patch and recompiled arduino_micro_ros and this issue didn't reproduce anymore, so I guess recompiling was what solved the problem and not this patch.

I am ok with not merging this patch, but i leave it to your discretion.

@JanStaschulat
Copy link
Contributor

JanStaschulat commented Nov 29, 2021

Thanks for your test and feedback.
I still think it is good to have this patch in the rclc-executor. In a multi-threaded use-case, the rclc-executor could receive a ready timer while simultaniously, in a different thread the timer is cancelled between rcl_wait and _rclc_execute function. This this patch, the cancelled timer is detected and changes the return value to RCL_RET_OK, which is the appropriate behavior.

So I will close this pull request and merge.

@JanStaschulat JanStaschulat merged commit 27b4d4f into ros2:master Nov 29, 2021
mergify bot pushed a commit that referenced this pull request Nov 29, 2021
* executor ignore canceled timers

this is similar behavior observed in rclcpp, canceled timers shouldn't be handled

Signed-off-by: Ilya Guterman <amfernusus@gmail.com>

* Add regression test

Signed-off-by: Your Name <you@example.com>

* Uncrustify

Signed-off-by: Your Name <you@example.com>

* Fix warning unused

Signed-off-by: Your Name <you@example.com>
Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* return sucess on cancled timer

Signed-off-by: Ilya Guterman <amfernusus@gmail.com>

* validate cancled timer doesn't impact the execution of other handlers

Signed-off-by: Ilya Guterman <amfernusus@gmail.com>

* executor_spin_publisher_timer_cancelled cancel timer on beggining of the test

Signed-off-by: Ilya Guterman <amfernusus@gmail.com>

Co-authored-by: Pablo Garrido <pablogs9@gmail.com>
(cherry picked from commit 27b4d4f)
mergify bot pushed a commit that referenced this pull request Nov 29, 2021
* executor ignore canceled timers

this is similar behavior observed in rclcpp, canceled timers shouldn't be handled

Signed-off-by: Ilya Guterman <amfernusus@gmail.com>

* Add regression test

Signed-off-by: Your Name <you@example.com>

* Uncrustify

Signed-off-by: Your Name <you@example.com>

* Fix warning unused

Signed-off-by: Your Name <you@example.com>
Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* return sucess on cancled timer

Signed-off-by: Ilya Guterman <amfernusus@gmail.com>

* validate cancled timer doesn't impact the execution of other handlers

Signed-off-by: Ilya Guterman <amfernusus@gmail.com>

* executor_spin_publisher_timer_cancelled cancel timer on beggining of the test

Signed-off-by: Ilya Guterman <amfernusus@gmail.com>

Co-authored-by: Pablo Garrido <pablogs9@gmail.com>
(cherry picked from commit 27b4d4f)
@mergify
Copy link
Contributor

mergify bot commented Nov 29, 2021

backport galactic foxy

✅ Backports have been created

JanStaschulat pushed a commit that referenced this pull request Nov 29, 2021
* executor ignore canceled timers

this is similar behavior observed in rclcpp, canceled timers shouldn't be handled

Signed-off-by: Ilya Guterman <amfernusus@gmail.com>

* Add regression test

Signed-off-by: Your Name <you@example.com>

* Uncrustify

Signed-off-by: Your Name <you@example.com>

* Fix warning unused

Signed-off-by: Your Name <you@example.com>
Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* return sucess on cancled timer

Signed-off-by: Ilya Guterman <amfernusus@gmail.com>

* validate cancled timer doesn't impact the execution of other handlers

Signed-off-by: Ilya Guterman <amfernusus@gmail.com>

* executor_spin_publisher_timer_cancelled cancel timer on beggining of the test

Signed-off-by: Ilya Guterman <amfernusus@gmail.com>

Co-authored-by: Pablo Garrido <pablogs9@gmail.com>
(cherry picked from commit 27b4d4f)

Co-authored-by: Ilya guterman <amfernusus@gmail.com>
JanStaschulat pushed a commit that referenced this pull request Nov 29, 2021
* executor ignore canceled timers

this is similar behavior observed in rclcpp, canceled timers shouldn't be handled

Signed-off-by: Ilya Guterman <amfernusus@gmail.com>

* Add regression test

Signed-off-by: Your Name <you@example.com>

* Uncrustify

Signed-off-by: Your Name <you@example.com>

* Fix warning unused

Signed-off-by: Your Name <you@example.com>
Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* return sucess on cancled timer

Signed-off-by: Ilya Guterman <amfernusus@gmail.com>

* validate cancled timer doesn't impact the execution of other handlers

Signed-off-by: Ilya Guterman <amfernusus@gmail.com>

* executor_spin_publisher_timer_cancelled cancel timer on beggining of the test

Signed-off-by: Ilya Guterman <amfernusus@gmail.com>

Co-authored-by: Pablo Garrido <pablogs9@gmail.com>
(cherry picked from commit 27b4d4f)

Co-authored-by: Ilya guterman <amfernusus@gmail.com>
@amfern
Copy link
Contributor Author

amfern commented Nov 29, 2021

Thanks @JanStaschulat

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants