Conversation

@ekouts ekouts commented Jan 7, 2021

Fixes #941.

@ekouts ekouts added this to the ReFrame sprint 21.01 milestone Jan 7, 2021
@ekouts ekouts self-assigned this Jan 7, 2021
@vkarak vkarak requested review from teojgo and vkarak January 8, 2021 10:40

@vkarak vkarak left a comment

We also need a unit test + documentation.

@vkarak

vkarak commented Jan 11, 2021

@ekouts Is this still a draft?

@vkarak

vkarak commented Jan 11, 2021

Can you also fix the unused import errors (see the CI checks)?

@ekouts

ekouts commented Jan 12, 2021

@ekouts Is this still a draft?

Yes, because I still need to write the documentation and unit tests. I will fix them and mark the PR as ready.


@vkarak vkarak left a comment

We also need unit tests for the policies + documentation.


@vkarak vkarak left a comment

I played a bit with the feature and it's not exactly as I had imagined. I ran ./bin/reframe -c unittests/resources/checks/frontend_checks.py -r --maxfail=1 and the output is not intuitive. Let's start with the baseline:

[  FAILED  ] Ran 6 test case(s) from 6 check(s) (5 failure(s))

When running with --maxfail=1, I get

[  FAILED  ] Ran 6 test case(s) from 6 check(s) (6 failure(s))

This is wrong, because you mark all the other tests as failures, just like sending a keyboard interrupt does. Additionally, in the output I see 6 failures instead of just one, which defeats the purpose of this option, which is basically to avoid the "fail clutter." When --maxfail=N is passed, I expect to see

[  FAILED  ] Ran X test case(s) from Y check(s) (N failure(s))

And the summary must contain exactly N failure infos.
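
The summary line format under discussion can be mimicked with a tiny helper (a sketch for illustration only; `summary_line` is a hypothetical name, not ReFrame's actual code):

```python
def summary_line(num_cases, num_checks, num_failures):
    """Build a ReFrame-style failure summary line (hypothetical helper)."""
    return (f'[  FAILED  ] Ran {num_cases} test case(s) from '
            f'{num_checks} check(s) ({num_failures} failure(s))')
```

With --maxfail=N honoured, `num_failures` would be capped at N even if more test cases exist.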

The obvious question is why we treat KeyboardInterrupt differently. The answer is that it has simply always been that way, but now I think that we should probably handle the KeyboardInterrupt and perhaps all the abort reasons in the same way and avoid the "fail clutter" altogether. If I'm running 1000 test cases and I type Ctrl-C, I don't want to see 1000 failure infos!
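
The uniform handling proposed here can be sketched in a few lines (hypothetical names such as `FailureLimitError` and `run_cases`; this is not the actual ReFrame implementation):

```python
class FailureLimitError(Exception):
    """Hypothetical exception raised when the failure limit is exceeded."""

def run_cases(cases, max_failures):
    """Run test cases, aborting the whole session once the failure limit
    is reached, just like a KeyboardInterrupt would."""
    results = []
    try:
        for case in cases:
            try:
                case()
                results.append('pass')
            except KeyboardInterrupt:
                raise
            except Exception:
                results.append('fail')
                if results.count('fail') >= max_failures:
                    raise FailureLimitError
    except (KeyboardInterrupt, FailureLimitError):
        # Abort: pending cases are neither run nor counted as failures,
        # so the summary reports only the failures seen so far.
        pass
    return results
```

Because both exception types funnel into the same handler, the "fail clutter" is avoided for Ctrl-C and for the failure limit alike.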

@ekouts

ekouts commented Jan 14, 2021

The obvious question is why we treat KeyboardInterrupt differently. The answer is that it has simply always been that way, but now I think that we should probably handle the KeyboardInterrupt and perhaps all the abort reasons in the same way and avoid the "fail clutter" altogether. If I'm running 1000 test cases and I type Ctrl-C, I don't want to see 1000 failure infos!

I agree, but I have some more questions:

  1. When maxfail is reached, should tests be retried? For KeyboardInterrupt, I assume not.
  2. Do I include the aborted tests in the JSON report?

And the summary must contain exactly N failure infos.

  1. In the asynchronous policy, is it okay if the count is not always exact? We can make it exact, but that might not include tests that have actually finished in the meantime and failed for other reasons (e.g., a failed dependency)

@ekouts

ekouts commented Jan 14, 2021

@vkarak Some more questions:

When --maxfail=N is passed, I expect to see
[ FAILED ] Ran X test case(s) from Y check(s) (N failure(s))

I am not sure what the X number should be in this case: the test cases that actually finished, or all those that have at least been set up? In the asynchronous policy, all tests attempt their setup before the first failure message is printed, which can lead to confusing output.

Let's say we have --maxfail=1 and a test fails in setup; how should the output continue? As usual, with [ RUN ] Testcase on partition using env for the rest of the test cases? Or should we move directly to [----------] waiting for spawned checks to finish and print the successful and failed tests so far? In the first case, you might get extra failures and not end up with exactly N failures.

@vkarak

vkarak commented Jan 14, 2021

When maxfail is reached, should tests be retried? For KeyboardInterrupt, I assume not.

No need to retry them, I agree.

Do I include the aborted tests in the JSON report?

Good point. I would say no, but perhaps we need a way to identify, directly or indirectly, how many tests were aborted (e.g., we might need a num_cases_run property or similar).
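
A `num_cases_run` property of the kind suggested here might look roughly as follows (a sketch; the report structure shown is an assumption, not ReFrame's actual JSON schema):

```python
class RunReport:
    """Minimal stand-in for a run report (hypothetical structure)."""

    def __init__(self, testcases):
        # Each test case is a dict with at least a 'result' key:
        # 'success', 'failure' or 'aborted'
        self._testcases = testcases

    @property
    def num_cases(self):
        return len(self._testcases)

    @property
    def num_cases_run(self):
        # Aborted cases never ran, so they are excluded here; the
        # difference num_cases - num_cases_run gives the aborted count.
        return sum(1 for tc in self._testcases
                   if tc['result'] != 'aborted')
```

This way the aborted tests can stay out of the failure listing while the report still reveals, indirectly, how many were cut short.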

In the asynchronous policy, is it okay if the count is not always exact? We can make it exact, but that might not include tests that have actually finished in the meantime and failed for other reasons (e.g., a failed dependency)

I'm fine with not being exact in the async policy.

I am not sure what the X number should be in this case: the test cases that actually finished, or all those that have at least been set up? In the asynchronous policy, all tests attempt their setup before the first failure message is printed, which can lead to confusing output.

Those that have actually finished, I would say.

Let's say we have --maxfail=1 and a test fails in setup; how should the output continue? As usual, with [ RUN ] Testcase on partition using env for the rest of the test cases? Or should we move directly to [----------] waiting for spawned checks to finish and print the successful and failed tests so far? In the first case, you might get extra failures and not end up with exactly N failures.

None of this. As soon as we reach the maximum failure count, we abort everything (killing the pending tests) and report everything that has been finalized up to that point. This should behave exactly as with a keyboard interrupt. So I think this also answers the exactness question, right? If we abort immediately, we don't finalize any pending test, so the number of failures should be exact, shouldn't it?
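
These abort semantics for the asynchronous policy might be sketched like this (hypothetical `Task`/`AsyncPolicy` API; not ReFrame's actual policies module):

```python
class Task:
    """Hypothetical test-case task: pending -> finalized | aborted."""

    def __init__(self, name):
        self.name = name
        self.state = 'pending'

    def finalize(self, result):
        self.state = 'finalized'
        self.result = result

    def abort(self):
        # Only pending tasks can be aborted; finalized ones keep their result
        if self.state == 'pending':
            self.state = 'aborted'


class AsyncPolicy:
    """Hypothetical async execution policy honouring a failure limit."""

    def __init__(self, max_failures):
        self._max_failures = max_failures
        self._num_failures = 0
        self._tasks = []

    def add(self, task):
        self._tasks.append(task)

    def on_task_failure(self, task):
        task.finalize('fail')
        self._num_failures += 1
        if self._num_failures >= self._max_failures:
            # Kill everything still pending, exactly like Ctrl-C;
            # pending tasks are aborted, not counted as failures.
            for t in self._tasks:
                t.abort()

    def failures(self):
        return [t for t in self._tasks
                if t.state == 'finalized' and t.result == 'fail']
```

Since nothing pending gets finalized after the limit is hit, the reported failure count is exactly N.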

@vkarak vkarak marked this pull request as ready for review January 14, 2021 12:03
@pep8speaks

pep8speaks commented Jan 21, 2021

Hello @ekouts, Thank you for updating!

Cheers! There are no PEP8 issues in this Pull Request! Do see the ReFrame Coding Style Guide

Comment last updated at 2021-01-28 21:32:26 UTC

@codecov-io

codecov-io commented Jan 21, 2021

Codecov Report

Merging #1676 (40a02f4) into master (9d85b5f) will increase coverage by 0.19%.
The diff coverage is 94.11%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1676      +/-   ##
==========================================
+ Coverage   87.21%   87.40%   +0.19%     
==========================================
  Files          45       46       +1     
  Lines        7546     7688     +142     
==========================================
+ Hits         6581     6720     +139     
- Misses        965      968       +3     
Impacted Files Coverage Δ
reframe/frontend/statistics.py 95.31% <72.72%> (-1.43%) ⬇️
reframe/frontend/executors/__init__.py 98.19% <96.00%> (+0.09%) ⬆️
reframe/core/exceptions.py 94.20% <100.00%> (+0.04%) ⬆️
reframe/frontend/cli.py 77.28% <100.00%> (+1.26%) ⬆️
reframe/frontend/executors/policies.py 99.67% <100.00%> (+0.34%) ⬆️
reframe/frontend/runreport.py 92.55% <100.00%> (ø)
reframe/frontend/dependencies.py 96.50% <0.00%> (-1.99%) ⬇️
reframe/core/fields.py 98.71% <0.00%> (-0.28%) ⬇️
reframe/core/runtime.py 89.63% <0.00%> (-0.07%) ⬇️
reframe/core/schedulers/local.py 95.86% <0.00%> (-0.04%) ⬇️
... and 12 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9d85b5f...40a02f4. Read the comment docs.


@vkarak vkarak left a comment

The output is OK; I had a minor comment on how you log it. I also have some issues with the implementation, which I have commented on.

Vasileios Karakasis added 3 commits January 27, 2021 23:31
- Such requests simply propagate up to the CLI and are handled there based on
  their type. For getting the error message, the `what()` function is used, as
  in the failure summary, and stack traces are logged at different levels based
  on the exception type. Stack traces for exit requests (e.g., keyboard
  interrupt, forced exit due to SIGTERM and failure limit errors) are logged at
  the DEBUG2 level.

@vkarak vkarak left a comment

lgtm now

@vkarak vkarak changed the title [feat] Add maximum limit of failures in the cli [feat] Add new --maxfail option to stop execution after a certain number of failures Jan 27, 2021
@vkarak vkarak changed the title [feat] Add new --maxfail option to stop execution after a certain number of failures [feat] Add --maxfail option to stop execution after a certain number of failures Jan 27, 2021
@vkarak
Copy link
Contributor

vkarak commented Jan 27, 2021

The modules system unit tests fail for some reason. We need to have a look. Other than that the PR is fine.

Vasileios Karakasis added 4 commits January 28, 2021 10:10
If we have this, it prevents us from monkeypatching MODULEPATH when we need to, which can lead to random failures of the unit tests.
@vkarak vkarak merged commit fd068dc into reframe-hpc:master Jan 28, 2021
@ekouts ekouts deleted the feat/max-failures branch January 29, 2021 07:34

Successfully merging this pull request may close these issues.

Stop execution after a certain number of failures
