Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timer tests #163

Merged
merged 11 commits into from
Sep 8, 2017
Merged

Timer tests #163

merged 11 commits into from
Sep 8, 2017

Conversation

mikaelarguedas
Copy link
Member

@mikaelarguedas mikaelarguedas commented Aug 31, 2017

wip: opening this for visibility, replaces #160. Still need to write some tests for this and make sure it doesnt break anything

Note: following recommendation in #160 (comment) wait will wake up if a timer reached it's deadline even if it has been canceled. It's up to the client library to make sure to not add canceled timers to the waitset

@mikaelarguedas mikaelarguedas added the in progress Actively being worked on (Kanban column) label Aug 31, 2017
Copy link
Member

@wjwwood wjwwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense to me

@@ -0,0 +1,110 @@
// Copyright 2015 Open Source Robotics Foundation, Inc.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a new file and not a moved file? If so should this be 2017?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a new file and indeed should be updated 👍

this also fixes a bug where the wait would always return when all the
timers were canceled

fixes rclcpp#319
@wjwwood
Copy link
Member

wjwwood commented Sep 1, 2017

Offline @mikaelarguedas and I looked at this and came up with 370b7fe.

This should fix the issues in rclpy as well as in rclcpp (ros2/rclcpp#319).

It should also replace #160 and makes ros2/rclpy#108 not necessary anymore.

@mikaelarguedas
Copy link
Member Author

Added a few more tests, let me know if there is any use case that you think should be covered and are not currently covered by these few tests

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

@mikaelarguedas mikaelarguedas added in review Waiting for review (Kanban column) and removed in progress Actively being worked on (Kanban column) labels Sep 1, 2017
@mikaelarguedas
Copy link
Member Author

hmmm looks like we add more test exhibiting the timer inaccuracy on windows. I'll wait a bit to merge this to see if we can improve the windows situation

@dirk-thomas
Copy link
Member

I'll wait a bit to merge this to see if we can improve the windows situation

👍 to only merge this once it doesn't introduce new failing / flaky tests.

@mikaelarguedas
Copy link
Member Author

👍 to only merge this once it doesn't introduce new failing / flaky tests.

Not sure if I'll have time to investigate the windows timing issue that has been biting us for some time. The new tests are testing features that were not tested before. We can remove them from this PR and add them later when we figure out the windows timing issue. I do think that the code change should get in before beta3 though because it fixes existing bugs and provide a better behavior overall.

@dirk-thomas
Copy link
Member

Not sure if I'll have time to investigate the windows timing issue that has been biting us for some time. The new tests are testing features that were not tested before. We can remove them from this PR and add them later when we figure out the windows timing issue. I do think that the code change should get in before beta3 though because it fixes existing bugs and provide a better behavior overall.

I would suggest to split the PR into a part which can be merged now (with the same or less failing tests) and a second part which needs further work in order to not introduce new failing / flaky tests.

@mikaelarguedas
Copy link
Member Author

new (flaky) tests have been removed in 30a53cb and pushed to the timer_tests2 branch

}
}

bool using_provided_timeout_not_timer_timeout = true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With an inverted logic this variable could be named is_timer_timeout which sounds a bit more readable / intuitive.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah we could come up with a good name at the time of the writing. I'll update this to be clearer, thanks for the variable name suggestion

// Check for error.
if (ret != RMW_RET_OK) {
if (using_provided_timeout_not_timer_timeout) {
return RCL_RET_TIMEOUT;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I fully understand the current semantic correctly. In which cases does rmw_wait return RMW_RET_TIMEOUT? Only when the passed timeout has happened? Does it matter if any of the conditions has been triggered?

With this change the semantic of RMW_RET_TIMEOUT and RCL_RET_TIMEOUT seem to be slightly different which seems confusing to me. Could instead the behavior of the rmw interface be changed (if necessary)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I fully understand the current semantic correctly. In which cases does rmw_wait return RMW_RET_TIMEOUT? Only when the passed timeout has happened?

correct, if a condition has been triggered rmw_wait would return RMW_RET_OK and the condition being non-null.

With this change the semantic of RMW_RET_TIMEOUT and RCL_RET_TIMEOUT seem to be slightly different which seems confusing to me.

Before this PR:
rmw_wait returns RMW_RET_TIMEOUT only when the provided timeout has been reached without any condition being triggered.
rcl_wait returns RCL_RET_TIMEOUT when the provided timeout has been reached or any timer (even canceled ones) has reached its deadline.

After this PR:
rmw_wait returns RMW_RET_TIMEOUT only when the provided timeout has been reached without any condition being triggered.
rcl_wait returns RCL_RET_TIMEOUT only when the provided timeout has been reached. If any condition has been triggered including timers it will return RCL_RET_OK with the corresponding timer non-nullified

The new new implementation matches my understanding of how wait should behave. Could you give more details (or an example) to support the inconsistency of the new behavior, and what type of change in the rmw interface would make more sense here ?

Copy link
Member

@dirk-thomas dirk-thomas Sep 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we are clear that both functions should:

  • (1) return RET_TIMEOUT when the timeout expires and no other condition being triggered and
  • (2) return RET_OK when at least one condition has been triggered and the function returned before the given timeout.

The question is what should be returned when (3) the function only returns after a given timeout has expired and at the same time at least one condition has been triggered.

Without this patch there was a clear correlation: when rmw returned RMW_RET_TIMEOUT than rcl returned RCL_RET_TIMEOUT (https://github.com/ros2/rcl/pull/163/files#diff-8fda46206fc38bfdbcac0e5213bffddaL594).

With this patch the rmw API seems to return RMW_RET_TIMEOUT in the third case but the rcl API will return RCL_RET_OK (https://github.com/ros2/rcl/pull/163/files#diff-8fda46206fc38bfdbcac0e5213bffddaR626). Am I reading that correctly? If yes, the question is if that different semantic is intentional.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I have the same reading here.
(3) the function only returns after a given timeout has expired and at the same time at least one condition has been triggered.

In this case:
the "function" is rcl_wait
the "conditions" are "all rmw_waitset entities" + "rcl timers".
"a given timeout" is the user specified timeout

the function only returns after a given timeout has expired

so we reach the user specified timeout and not a timer, correct? In that case we do return RCL_RET_TIMEOUT regardless of if a condition has been met or not (same behavior as before)

So I don't think we changed anything in the existing semantic here.

Then if we use a timer timeout and not the one specified by the user, that means that the function will return before the user specified timeout, thus should return RCL_RET_OK regardless of the return code of rmw_wait.
Is that correct ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dirk-thomas @wjwwood can you confirm if my reading is correct here or if I need to change something in this PR to keep the correct semantic?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without this patch there was a clear correlation: when rmw returned RMW_RET_TIMEOUT than rcl returned RCL_RET_TIMEOUT (https://github.com/ros2/rcl/pull/163/files#diff-8fda46206fc38bfdbcac0e5213bffddaL594).

I think that was what was inconsistent with it. A condition may have been met (i.e. a timer was ready), but timeout was returned by rcl too despite the fact rcl had more conditions to consider. To me this makes rcl and rmw more consistent with each other, i.e. ret timeout only returned when the given timeout was exceeded and not when a condition was met instead.

@mikaelarguedas I think it looks right too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some offline discussion we were able to resolve the misunderstanding. rmw returns TIMEOUT when it wakes up by the timeout specified rather than any other condition. rcl might either pass the user provided timeout to rmw or the shortest timer - whichever is lower.

So before rcl was returning a TIMEOUT even though the timeout signaled by rmw was due to a timer and not due to the user provided timeout. With this patch rcl will not return TIMEOUT anymore when the timeout coming from rmw was due to a timer.

With this being said the change in the test_wait.cpp to expect OK rather than TIMEOUT in this case makes a lot more sense 😉

👍 ➕ 🚢

@mikaelarguedas mikaelarguedas merged commit 1fa4aca into master Sep 8, 2017
@mikaelarguedas mikaelarguedas deleted the timer_tests branch September 8, 2017 17:22
@mikaelarguedas mikaelarguedas removed the in review Waiting for review (Kanban column) label Sep 8, 2017
@mikaelarguedas mikaelarguedas mentioned this pull request Sep 8, 2017
Karsten1987 pushed a commit that referenced this pull request Nov 11, 2017
* dont return timeout errcode if we didnt hit user specified timeout

* update axisting test accordingly

* add timer test

* Revert "dont return timeout errcode if we didnt hit user specified timeout"

This reverts commit f44a93b.

* alternative fix to not return timeout when timer is ready

this also fixes a bug where the wait would always return when all the
timers were canceled

fixes rclcpp#319

* fixups

* more tests

* another one just for fun

* remove new tests

* clearer variable name

* update comment
ivanpauno pushed a commit that referenced this pull request Jan 2, 2020
* pass context to wait set

Signed-off-by: William Woodall <william@osrfoundation.org>

* add fini for context

Signed-off-by: William Woodall <william@osrfoundation.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants