
retest-until-fail vs repeat-until-fail #19

Closed
mikaelarguedas opened this issue on Dec 12, 2016 · 7 comments
Labels: enhancement (New feature or request)

@mikaelarguedas (Member) commented on Dec 12, 2016:

While investigating flaky tests I noticed that the repeated nightlies use the --retest-until-fail <N> option, whereas when testing locally I run ament test with the option --ctest-args --repeat-until-fail <N> --

I'm wondering if --retest-until-fail <N> has the desired behavior for our repeated jobs. The difference in behavior between the two options is explained below:

  • ctest --repeat-until-fail <N>:
    Ensures that every test is run until it fails or reaches N runs
for test in tests_of_this_package:
    for i in range(N):
        rc = run_test(test)
        if rc:
            break  # move on to the next TEST
  • ament test --retest-until-fail <N>:
    Stops retesting a package as soon as a single test of this package fails
for i in range(N):
    rc = 0
    for test in tests_of_this_package:
        rc += run_test(test)
    if rc:
        break  # move on to the next PACKAGE

This could explain why the reported flaky tests are different every night, especially on packages with a lot of tests like test_communication.

Options we may want to consider:

  • change the default test argument for the repeated nightlies to use the ctest option repeat-until-fail
  • modify ament behavior to blacklist the failed tests but repeat the passed ones up to N times
  • do a run or two with the ctest --repeat-until-fail <N> argument to have an exhaustive list of the flaky tests

To get an idea of how long the nightlies would take with ctest --repeat-until-fail 20, I launched http://ci.ros2.org/job/ci_windows/2030/; on Windows it took 5h30min instead of 2h.

mikaelarguedas added the question (Further information is requested) label on Dec 12, 2016
@dirk-thomas (Member) commented:

The option you use locally only applies to CMake packages. It does not cover non-ctest tests like nose. So you are not checking if those are flaky.

The nightly job was designed to catch whether there are any flaky tests and "fail fast". Since the goal is to not have any flaky tests, there was no focus on enumerating all of them. As you have noticed, doing so requires a lot of time.

@mikaelarguedas (Member, Author) commented:

Good point about the non-CMake packages.

Thanks for the explanation; I was not sure if this was intentional behavior or an overlooked side effect.
I didn't know that the motivation was to find out whether there is a flaky test in a package, rather than repeating every test to see which ones are flaky.
I guess I'll tag this as wontfix then.

@dirk-thomas (Member) commented:

@mikaelarguedas what is the status of this ticket? Should it be closed?

@mikaelarguedas (Member, Author) commented:

If there is no solution at all it can be closed or marked as untargeted.
I still think that the current retesting logic is suboptimal and doesn't allow us to identify all the flaky tests, because it stops at the first one it encounters in a package.

@dirk-thomas (Member) commented:

@mikaelarguedas Can you update this issue into a todo (rather than a question) describing what should be done?

mikaelarguedas added the enhancement (New feature or request) label and removed the question (Further information is requested) label on Apr 12, 2017
@mikaelarguedas (Member, Author) commented:

TODO: Change the behaviour of repeat-until-fail to:

  • for each package
    • create an empty "skip list"
    • while the number of reruns is less than the specified N
      • for each test not in the skip list
        • run the test
        • if the test fails, mark it as failed and add it to the skip list

I am not sure how doable this is or how much work it will require, but it would be a more accurate way to identify all flaky tests.
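A minimal sketch of that loop, in the same pseudocode style as the snippets above (tests_of_this_package, run_test and N are placeholders, not actual ament API):

skip_list = set()  # tests that already failed once
for i in range(N):
    for test in tests_of_this_package:
        if test in skip_list:
            continue  # do not rerun a test that is already known to be flaky
        rc = run_test(test)
        if rc:
            skip_list.add(test)  # mark as failed, keep rerunning the remaining tests
# skip_list now contains every test that failed at least once within N reruns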

@mikaelarguedas (Member, Author) commented:

I believe this can be closed as it has been addressed by the switch to colcon. From my understanding, colcon runs and repeats tests one by one within a package (whereas ament_tools ran the entire test suite of a package N times), so tests in a package are no longer impacted by the result of other tests from the same package.
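In the same pseudocode style as above, the per-test behavior described here would look roughly like this (a sketch of my understanding, not colcon's actual implementation):

for test in tests_of_this_package:
    for i in range(N):
        rc = run_test(test)
        if rc:
            break  # this test is flaky; the other tests of the package still get their own N runs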

@dirk-thomas Feel free to comment or re-open if I missed something here.
