[feat] Add --maxfail option to stop execution after a certain number of failures
#1676
Conversation
vkarak
left a comment
We also need a unit test + documentation.
@ekouts Is this still a draft?
Can you also fix the unused import errors (see the CI checks)?
Yes, because I still need to write the documentation + unit tests. I will fix them and update it.
vkarak
left a comment
We also need unit tests for the policies + documentation.
vkarak
left a comment
I played a bit with the feature and it's not exactly as I had imagined it. I ran `./bin/reframe -c unittests/resources/checks/frontend_checks.py -r --maxfail=1` and the output is not intuitive. Let's start with the baseline:
[ FAILED ] Ran 6 test case(s) from 6 check(s) (5 failure(s))
When running with `--maxfail=1`, I get
[ FAILED ] Ran 6 test case(s) from 6 check(s) (6 failure(s))
This is wrong, because you mark all the other tests as failures, as if a keyboard interrupt had been sent. Additionally, the output shows 6 failures instead of just one, which defeats the purpose of this option, namely to avoid the "fail clutter." When `--maxfail=N` is passed, I expect to see
[ FAILED ] Ran X test case(s) from Y check(s) (N failure(s))
And the summary must contain exactly N failure infos.
The obvious thing to ask is why we treat KeyboardInterrupt differently. The answer is that it has just always been that way, but now I think we should probably handle the KeyboardInterrupt, and perhaps all abort reasons, in the same way and avoid "fail clutter" across the board. If I'm running 1000 test cases and I type Ctrl-C, I don't want to see 1000 failure infos!
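The behaviour argued for above (stop after N failures and report only the cases that actually ran) could be sketched as follows. This is illustrative Python with made-up names, not ReFrame's actual runner:

```python
# Illustrative sketch only: `run_all` and the callable test cases are
# hypothetical stand-ins, not ReFrame's actual API.

def run_all(testcases, maxfail):
    executed, failures = [], []
    for case in testcases:
        try:
            case()                       # each test case is a plain callable here
            executed.append(case)
        except Exception as err:
            executed.append(case)
            failures.append((case, err))
            if len(failures) >= maxfail:
                # Stop scheduling: pending cases never run, so they are
                # neither counted nor reported as failures.
                break

    return executed, failures


def ok():
    pass


def bad():
    raise RuntimeError('boom')


# 6 cases, 5 of which would fail; with maxfail=1 only the first failure
# is recorded and the remaining cases are skipped.
ran, failed = run_all([ok, bad, bad, bad, bad, bad], maxfail=1)
print(f'Ran {len(ran)} test case(s) ({len(failed)} failure(s))')
```

With this shape of loop, the failure summary would contain exactly one entry for `--maxfail=1`, matching the expected output above.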
I agree, but I have some more questions:
@vkarak Some more questions:
I am not sure what the X number should be in this case: the ones that actually finished, or the total number of those that have at least been set up? In the asynchronous policy, all tests try to do their setup before the first failure prints its message, which can lead to confusing output this way. Let's say we have
No need to retry them, I agree.
Good point. I would say not, but perhaps we need a way to identify directly or indirectly how many were aborted (e.g., we might need a
I'm fine with not being exact in the async policy.
Those that have actually finished, I would say.
None of these. As soon as we reach the maximum failure count, we abort everything (kill those pending) and report everything that has been finalized up to that point. This should be exactly as with a keyboard interrupt. I think this also answers the exactness question, right? If you abort immediately, you don't finalize any pending test, so the number of failures should be exact, shouldn't it?
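The abort-everything behaviour described here could look roughly like this. It is a hypothetical sequential sketch; ReFrame's real execution policies are asynchronous and more involved:

```python
# Hypothetical sketch: once the failure limit is hit, everything still
# pending is aborted, so the reported failure count stays exact.

def run_with_abort(testcases, maxfail):
    finalized = failures = aborted = 0
    pending = list(testcases)
    while pending:
        case = pending.pop(0)
        try:
            case()
        except Exception:
            failures += 1

        finalized += 1
        if failures >= maxfail:
            # Kill everything still pending; aborted cases are reported
            # separately, not as failures.
            aborted = len(pending)
            pending.clear()

    return finalized, failures, aborted


def ok():
    pass


def bad():
    raise RuntimeError('boom')


finalized, failures, aborted = run_with_abort(
    [ok, bad, ok, ok, ok, ok], maxfail=1
)
print(f'Ran {finalized} test case(s) '
      f'({failures} failure(s), {aborted} aborted)')
```

Because pending cases are never finalized, the failure count reported at the end equals the number of failures that actually happened, exactly as argued above.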
Hello @ekouts, thank you for updating! Cheers! There are no PEP8 issues in this Pull Request! Do see the ReFrame Coding Style Guide. Comment last updated at 2021-01-28 21:32:26 UTC
Codecov Report
@@            Coverage Diff             @@
##           master    #1676      +/-   ##
==========================================
+ Coverage   87.21%   87.40%   +0.19%
==========================================
  Files          45       46       +1
  Lines        7546     7688     +142
==========================================
+ Hits         6581     6720     +139
- Misses        965      968       +3
Continue to review full report at Codecov.
vkarak
left a comment
The output is OK; I had a minor comment on how you log it. I also have some issues with the implementation, which I have commented on.
- Such requests simply propagate up to the CLI and are handled there based on their type. The error message is obtained through the `what()` function, as in the failure summary, and stack traces are logged at different levels depending on the exception type. Stack traces for exit requests (e.g., keyboard interrupt, forced exit due to SIGTERM and failure limit errors) are logged at the DEBUG2 level.
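A minimal sketch of this dispatch might look as follows. The exception classes, logger setup, and `DEBUG2` constant are illustrative assumptions, not ReFrame's actual definitions:

```python
import logging

DEBUG2 = 5                                  # assumed custom level below DEBUG
logging.addLevelName(DEBUG2, 'DEBUG2')
logger = logging.getLogger('runner')


class RunRequest(Exception):
    '''Base class for requests that propagate up to the CLI (hypothetical).'''

    def what(self):
        return str(self)


class FailureLimitError(RunRequest):
    '''Raised when the --maxfail failure limit has been reached.'''


class ForcedExitError(RunRequest):
    '''Raised on a forced exit, e.g. due to SIGTERM.'''


def handle(exc):
    # The message is formatted via what(), as in the failure summary.
    msg = f'run aborted: {exc.what()}'
    if isinstance(exc, (FailureLimitError, ForcedExitError)):
        # Exit requests: the stack trace is mostly noise for the user,
        # so log it at DEBUG2 only.
        logger.log(DEBUG2, 'stack trace', exc_info=exc)
    else:
        # Genuine errors keep their stack trace at ERROR level.
        logger.error('stack trace', exc_info=exc)
    return msg


print(handle(FailureLimitError('maximum number of failures reached')))
```

The key design point matches the description above: the message always reaches the user, while the verbosity of the stack trace depends on whether the exception is an "exit request" or an actual error.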
vkarak
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm now
The modules system unit tests fail for some reason. We need to have a look. Other than that, the PR is fine.
If we have this, it forces us not to monkeypatch MODULEPATH when we need it, which can lead to random failures of the unit tests.
Fixes #941.