Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Relax multithreaded test constraint #907

Merged
merged 1 commit into from
Oct 30, 2019
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 1 addition & 2 deletions rclcpp/test/executors/test_multi_threaded_executor.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -89,10 +89,9 @@ TEST_F(TestMultiThreadedExecutor, timer_over_take) {
double diff = std::abs((now - last).nanoseconds()) / 1.0e9;
last = now;

if (diff < PERIOD - TOLERANCE || diff > PERIOD + TOLERANCE) {
if (diff < PERIOD - TOLERANCE) {
executor.cancel();
ASSERT_GT(diff, PERIOD - TOLERANCE);
Copy link
Member

@ivanpauno ivanpauno May 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is first checking diff < UPPER_BOUND, and then asserting diff > UPPER_BOUND? (with UPPER_BOUND = PERIOD - TOLERANCE)
how the second condition can happen after the first check?

this is failing sometimes in CI: https://ci.ros2.org/view/nightly/job/nightly_linux_repeated/1858/testReport/junit/rclcpp/TestMultiThreadedExecutor/timer_over_take/.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand my previous comment (#907 (comment)), I think the intention is that if diff < UPPER_BOUND, then we want the test fail. I admit it is confusing, and we could probably clarify with a comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, so in that case we want the test to unconditionally fail, right?
I think that either adding a comment or using FAIL() would be more clear (now that we're checking only one of the conditions).

Copy link
Member

@jacobperron jacobperron May 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me.

Regarding the test failure, I'm still not sure if the bug lies in the test code or not. Having timing dependent tests are tricky.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the test failure, I'm still not sure if the bug relies in the test code or not. Having timing dependent tests are tricky.

I have taken a look, and it seems to be a flaky test (though the timer is triggered MUCH BEFORE than expected).
I've read the original PR #383 that motivated these changes, the only change I see in the multithreaded executor since then is #836, which doesn't seem to be related.

The test seems to be super flaky. I have checked the last few repeated jobs, and it has only failed once.
IMO, it sounds like we can ignore it.

@wjwwood do you think that any of the reset changes can be related to this? or is it just a flaky test?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I proposed a PR increasing period/tolerance by an order of magnitude, and it failed too #1105 (comment).

There seems to be a real bug, so it should be triagged/fixed.

Copy link
Member

@ivanpauno ivanpauno May 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be connected with this other failure too ros2/rcl#640.

(edit) Doesn't sound like it's really related.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I proposed a PR

Thanks!

it should be triagged/fixed.

Until it's fixed, the test should be either commented out or marked as xfail so it stops showing on CI results. What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if xfail is intended to be used only on flaky tests, or also in tests that are failing due to something being actually broken.

If that mark is also intended for the second case, then it's ok to do it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that mark is also intended for the second case, then it's ok to do it.

I believe so: ros2/ros2#900

ASSERT_LT(diff, PERIOD + TOLERANCE);
}
}
};
Expand Down