Conversation

@ekouts ekouts commented Jan 7, 2021

Fixes #941.

@ekouts ekouts added this to the ReFrame sprint 21.01 milestone Jan 7, 2021
@ekouts ekouts self-assigned this Jan 7, 2021
@vkarak vkarak requested review from teojgo and vkarak January 8, 2021 10:40

@vkarak vkarak left a comment

We also need a unit test + documentation.

@vkarak

vkarak commented Jan 11, 2021

@ekouts Is this still a draft?

@vkarak

vkarak commented Jan 11, 2021

Can you also fix the unused import errors (see the CI checks)?

@ekouts

ekouts commented Jan 12, 2021

@ekouts Is this still a draft?

Yes, because I still need to write the documentation and unit tests. I will fix them and mark the PR as ready.


@vkarak vkarak left a comment

We also need unit tests for the policies + documentation.


@vkarak vkarak left a comment

I played a bit with the feature and it's not exactly as I had imagined. I ran ./bin/reframe -c unittests/resources/checks/frontend_checks.py -r --maxfail=1 and the output is not intuitive. Let's start with the baseline:

[  FAILED  ] Ran 6 test case(s) from 6 check(s) (5 failure(s))

When running with --maxfail=1, I get

[  FAILED  ] Ran 6 test case(s) from 6 check(s) (6 failure(s))

This is wrong, because you mark all the other tests as failures, just like sending a keyboard interrupt does. Additionally, in the output I see 6 failures instead of just one, which defeats the purpose of this option, which is basically to avoid the "fail clutter." When --maxfail=N is passed, I expect to see

[  FAILED  ] Ran X test case(s) from Y check(s) (N failure(s))

And the summary must contain exactly N failure infos.
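
The summary line format under discussion can be mimicked with a tiny helper (a sketch for illustration only; `summary_line` is a hypothetical name, not ReFrame's actual code):

```python
def summary_line(num_cases, num_checks, num_failures):
    """Build a ReFrame-style failure summary line (hypothetical helper)."""
    return (f'[  FAILED  ] Ran {num_cases} test case(s) from '
            f'{num_checks} check(s) ({num_failures} failure(s))')
```

With --maxfail=N honoured, `num_failures` would be capped at N even if more test cases exist.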

The obvious question is why we treat KeyboardInterrupt differently. The answer is that it has simply always been that way, but now I think that we should probably handle the KeyboardInterrupt and perhaps all the abort reasons in the same way and avoid the "fail clutter" altogether. If I'm running 1000 test cases and I type Ctrl-C, I don't want to see 1000 failure infos!
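
The uniform handling proposed here can be sketched in a few lines (hypothetical names such as `FailureLimitError` and `run_cases`; this is not the actual ReFrame implementation):

```python
class FailureLimitError(Exception):
    """Hypothetical exception raised when the failure limit is exceeded."""

def run_cases(cases, max_failures):
    """Run test cases, aborting the whole session once the failure limit
    is reached, just like a KeyboardInterrupt would."""
    results = []
    try:
        for case in cases:
            try:
                case()
                results.append('pass')
            except KeyboardInterrupt:
                raise
            except Exception:
                results.append('fail')
                if results.count('fail') >= max_failures:
                    raise FailureLimitError
    except (KeyboardInterrupt, FailureLimitError):
        # Abort: pending cases are neither run nor counted as failures,
        # so the summary reports only the failures seen so far.
        pass
    return results
```

Because both exception types funnel into the same handler, the "fail clutter" is avoided for Ctrl-C and for the failure limit alike.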

@ekouts

ekouts commented Jan 14, 2021

The obvious question is why we treat KeyboardInterrupt differently. The answer is that it has simply always been that way, but now I think that we should probably handle the KeyboardInterrupt and perhaps all the abort reasons in the same way and avoid the "fail clutter" altogether. If I'm running 1000 test cases and I type Ctrl-C, I don't want to see 1000 failure infos!

I agree, but I have some more questions:

  1. When maxfail is reached, should tests be retried? For KeyboardInterrupt, I assume not.
  2. Do I include the aborted tests in the JSON report?

And the summary must contain exactly N failure infos.

  1. In the asynchronous policy, is it okay if the count is not always exact? We can make it exact, but that might not include tests that have actually finished in the meantime and failed for other reasons (e.g., a failed dependency)

@ekouts

ekouts commented Jan 14, 2021

@vkarak Some more questions:

When --maxfail=N is passed, I expect to see
[ FAILED ] Ran X test case(s) from Y check(s) (N failure(s))

I am not sure what the X number should be in this case: the test cases that actually finished, or all those that have at least been set up? In the asynchronous policy, all tests attempt their setup before the first failure message is printed, which can lead to confusing output.

Let's say we have --maxfail=1 and a test fails in setup; how should the output continue? As usual, with [ RUN ] Testcase on partition using env for the rest of the test cases? Or should we move directly to [----------] waiting for spawned checks to finish and print the successful and failed tests so far? In the first case, you might get extra failures and not end up with exactly N failures.

@vkarak

vkarak commented Jan 14, 2021

When maxfail is reached, should tests be retried? For KeyboardInterrupt, I assume not.

No need to retry them, I agree.

Do I include the aborted tests in the JSON report?

Good point. I would say no, but perhaps we need a way to identify, directly or indirectly, how many tests were aborted (e.g., we might need a num_cases_run property or similar).
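
A `num_cases_run` property of the kind suggested here might look roughly as follows (a sketch; the report structure shown is an assumption, not ReFrame's actual JSON schema):

```python
class RunReport:
    """Minimal stand-in for a run report (hypothetical structure)."""

    def __init__(self, testcases):
        # Each test case is a dict with at least a 'result' key:
        # 'success', 'failure' or 'aborted'
        self._testcases = testcases

    @property
    def num_cases(self):
        return len(self._testcases)

    @property
    def num_cases_run(self):
        # Aborted cases never ran, so they are excluded here; the
        # difference num_cases - num_cases_run gives the aborted count.
        return sum(1 for tc in self._testcases
                   if tc['result'] != 'aborted')
```

This way the aborted tests can stay out of the failure listing while the report still reveals, indirectly, how many were cut short.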

In the asynchronous policy, is it okay if the count is not always exact? We can make it exact, but that might not include tests that have actually finished in the meantime and failed for other reasons (e.g., a failed dependency)

I'm fine with not being exact in the async policy.

I am not sure what the X number should be in this case: the test cases that actually finished, or all those that have at least been set up? In the asynchronous policy, all tests attempt their setup before the first failure message is printed, which can lead to confusing output.

Those that have actually finished, I would say.

Let's say we have --maxfail=1 and a test fails in setup; how should the output continue? As usual, with [ RUN ] Testcase on partition using env for the rest of the test cases? Or should we move directly to [----------] waiting for spawned checks to finish and print the successful and failed tests so far? In the first case, you might get extra failures and not end up with exactly N failures.

None of this. As soon as we reach the maximum failure count, we abort everything (killing the pending tests) and report everything that has been finalized up to that point. This should behave exactly as with a keyboard interrupt. So I think this also answers the exactness question, right? If we abort immediately, we don't finalize any pending test, so the number of failures should be exact, shouldn't it?
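
These abort semantics for the asynchronous policy might be sketched like this (hypothetical `Task`/`AsyncPolicy` API; not ReFrame's actual policies module):

```python
class Task:
    """Hypothetical test-case task: pending -> finalized | aborted."""

    def __init__(self, name):
        self.name = name
        self.state = 'pending'

    def finalize(self, result):
        self.state = 'finalized'
        self.result = result

    def abort(self):
        # Only pending tasks can be aborted; finalized ones keep their result
        if self.state == 'pending':
            self.state = 'aborted'


class AsyncPolicy:
    """Hypothetical async execution policy honouring a failure limit."""

    def __init__(self, max_failures):
        self._max_failures = max_failures
        self._num_failures = 0
        self._tasks = []

    def add(self, task):
        self._tasks.append(task)

    def on_task_failure(self, task):
        task.finalize('fail')
        self._num_failures += 1
        if self._num_failures >= self._max_failures:
            # Kill everything still pending, exactly like Ctrl-C;
            # pending tasks are aborted, not counted as failures.
            for t in self._tasks:
                t.abort()

    def failures(self):
        return [t for t in self._tasks
                if t.state == 'finalized' and t.result == 'fail']
```

Since nothing pending gets finalized after the limit is hit, the reported failure count is exactly N.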

@vkarak vkarak marked this pull request as ready for review January 14, 2021 12:03
@pep8speaks

pep8speaks commented Jan 21, 2021

Hello @ekouts, Thank you for updating!

Cheers! There are no PEP8 issues in this Pull Request! Do see the ReFrame Coding Style Guide

Comment last updated at 2021-01-28 21:32:26 UTC

@codecov-io

codecov-io commented Jan 21, 2021

Codecov Report

Merging #1676 (40a02f4) into master (9d85b5f) will increase coverage by 0.19%.
The diff coverage is 94.11%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1676      +/-   ##
==========================================
+ Coverage   87.21%   87.40%   +0.19%     
==========================================
  Files          45       46       +1     
  Lines        7546     7688     +142     
==========================================
+ Hits         6581     6720     +139     
- Misses        965      968       +3     
Impacted Files Coverage Δ
reframe/frontend/statistics.py 95.31% <72.72%> (-1.43%) ⬇️
reframe/frontend/executors/__init__.py 98.19% <96.00%> (+0.09%) ⬆️
reframe/core/exceptions.py 94.20% <100.00%> (+0.04%) ⬆️
reframe/frontend/cli.py 77.28% <100.00%> (+1.26%) ⬆️
reframe/frontend/executors/policies.py 99.67% <100.00%> (+0.34%) ⬆️
reframe/frontend/runreport.py 92.55% <100.00%> (ø)
reframe/frontend/dependencies.py 96.50% <0.00%> (-1.99%) ⬇️
reframe/core/fields.py 98.71% <0.00%> (-0.28%) ⬇️
reframe/core/runtime.py 89.63% <0.00%> (-0.07%) ⬇️
reframe/core/schedulers/local.py 95.86% <0.00%> (-0.04%) ⬇️
... and 12 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9d85b5f...40a02f4. Read the comment docs.


@vkarak vkarak left a comment

The output is OK; I had a minor comment on how you log it. I also have some issues with the implementation, which I have commented on.

Vasileios Karakasis added 3 commits January 27, 2021 23:31
- Such requests simply propagate up to the CLI and are handled there based on
  their type. For getting the error message, the `what()` function is used, as
  in the failure summary, and stack traces are logged at different levels based
  on the exception type. Stack traces for exit requests (e.g., keyboard
  interrupt, forced exit due to SIGTERM and failure limit errors) are logged at
  the DEBUG2 level.

@vkarak vkarak left a comment

lgtm now

@vkarak vkarak changed the title [feat] Add maximum limit of failures in the cli [feat] Add new --maxfail option to stop execution after a certain number of failures Jan 27, 2021
@vkarak vkarak changed the title [feat] Add new --maxfail option to stop execution after a certain number of failures [feat] Add --maxfail option to stop execution after a certain number of failures Jan 27, 2021
@vkarak
Copy link
Contributor

vkarak commented Jan 27, 2021

The modules system unit tests fail for some reason. We need to have a look. Other than that the PR is fine.

Vasileios Karakasis added 4 commits January 28, 2021 10:10
If we have this, it prevents us from monkeypatching MODULEPATH when we need to, which can lead to random failures of the unit tests.
@vkarak vkarak merged commit fd068dc into reframe-hpc:master Jan 28, 2021
@ekouts ekouts deleted the feat/max-failures branch January 29, 2021 07:34

Successfully merging this pull request may close these issues.

Stop execution after a certain number of failures
