CI: fail slow tests (not `--full`) #20672
Conversation
[skip cirrus] [skip circle]
```yaml
- name: Build
  run: |
    python dev.py build --with-scipy-openblas

- name: Test
  run: |
    python dev.py test -j2
    python dev.py test -j2 -- --durations=0 --durations-min=0.25 --fail-slow=1.0
```
FWIW, I think the original experiment of adding the one CI job with `pytest-fail-slow` below, with a 5-second global threshold to catch truly egregious single test cases, was a good thing.
That said, I'm less keen on adding another CI job and a second timing threshold to be aware of. I think this is going to be subjective of course, but it seems to me that we'd just be discouraging one form of total test time creep, but there are many others like overparametrization that would add up as well.
At some point, the code reviewer will have to look the tests over on a case by case basis to see what makes sense and what doesn't. I think the 5 second cutoff for "you should take a look at that" makes sense, but an additional 1 second cutoff is starting to make the automatic catching a bit complex/picky for my taste. It may be preferable to have a single 1.1 second test vs. a parametrized test with 37 0.8 second cases of course, and I don't know if I'd want the CI weighing in on that this way.
That said, I don't care that much. In general, I'd probably opt for just putting the exception markers on anything that gets flagged in `spatial` rather than hiding the tests in the `full`/slow mode, which I often don't use when iterating locally.
The total time of the `spatial` tests on the list is only 10s, so that seems OK to me to make an exception and give them 5s.
I just re-ran the duration test after the DE tweaks I made. I was surprised to see no change. What's interesting is when the test ran: I expect the PR to be merged into the tip of scipy/main, currently 7e69656, before the tests run.
@sturlamolden I'm looking over long test durations and came across
@andyfaff I'll merge main manually tonight to see what effect that had.
Oops, guess I forgot to run this file locally.
[skip cirrus] [skip circle]
Something weird is going on with two of those Windows tests. The build step is taking twice as long (~24 mins) as the other (11 mins). The fast build uses
Hmm, I didn't change that part. This run is from April 29, before the first of these PRs with
Not using Pythran cuts the time from 24 mins to ~18. But I don't think it's the whole story.
It is to force the scheduler to release the remainder of the time slice and allow another thread to execute. Otherwise we are not testing concurrent access to cobyla.
It might be that `sleep(0)` is sufficient, but I think we needed `sleep(0.1)` to actually trigger the segfault with the original code.
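The role of the sleep call can be illustrated with a minimal, self-contained sketch (the helper below is hypothetical, not the actual SciPy test): each worker yields its time slice before calling the function under test, so the threads genuinely interleave rather than running one after another.

```python
import threading
import time

def run_concurrently(fn, n_threads=4):
    """Run fn on several threads at once to probe for race conditions."""
    errors = []

    def worker():
        try:
            # time.sleep(0) releases the remainder of this thread's time
            # slice so the scheduler can interleave the other workers; a
            # small positive sleep such as 0.1 forces even more overlap.
            time.sleep(0)
            fn()
        except Exception as exc:
            errors.append(exc)  # collect failures instead of losing them

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors

# A trivially thread-safe callable should produce no errors.
results = []
errors = run_concurrently(lambda: results.append(1))
```

Without the yield, short critical sections can complete before the next thread even starts, which is why the original test needed it to reproduce the crash.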
[skip cirrus] [skip circle]
@rgommers I modified tests and made exceptions mostly as recommended by maintainers of each subpackage, although sometimes I just gave tests an exception rather than marking them as slow. I think this accounts for most tests that take >0.5s, but the threshold for failure is 1s, so we should be pretty robust w.r.t. variation in execution time. The only question I have is why macOS tests / Conda & umfpack/scikit-sparse, fast, py3.11/npAny, dev.py (3.11) failed in the previous run. It looks like it failed due to
Why wouldn't it? The test reads:

```python
@pytest.mark.fail_slow(5)
def test_cobyla_threadsafe():
```

https://github.com/jwodder/pytest-fail-slow?tab=readme-ov-file#failing-slow-tests says: "To cause a specific test to fail if it takes too long to run, apply the
I see. I was thinking of it being activated with a command line option like
Indeed. I think we should not over-use
As a follow-up, I can look into options for using it only when a command line argument is passed or an environment variable is set so that it would work like I had envisioned. Or we could simply remove it from
This sounds fine to me for now, easiest to do that and merge this PR.
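One way to sketch the opt-in behavior described here: a `conftest.py` hook that appends pytest-fail-slow's `--fail-slow` option only when an environment variable is set. The variable name `SCIPY_FAIL_SLOW` is hypothetical, and this assumes pytest-fail-slow accepts a seconds value on the command line, as in the CI job above.

```python
# conftest.py sketch: enable a global fail-slow threshold only when a
# (hypothetical) SCIPY_FAIL_SLOW environment variable is set, e.g.
#   SCIPY_FAIL_SLOW=1.0 python -m pytest ...
import os

def pytest_load_initial_conftests(args, early_config, parser):
    threshold = os.environ.get("SCIPY_FAIL_SLOW")
    # Respect an explicit --fail-slow passed on the command line.
    if threshold and not any(a.startswith("--fail-slow") for a in args):
        args.append(f"--fail-slow={threshold}")
```

Since pytest calls this hook before collecting tests, the injected option behaves exactly as if it had been typed on the command line, and local runs without the variable are unaffected.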
We only really want pytest-fail-slow to be used in certain CI jobs; this is the simple way to get that. [skip circle] [skip cirrus]
OK. I thought about just commenting it out with a note and/or adding a tip in the developer documentation, but for now just removed it so as not to hold this up.
Oops @steppi, I didn't add an exception for… I'll mark it for now and maybe file an issue about the marked tests.
[skip cirrus]
Ok, that should do it.
@rgommers sounded like you might be interested in merging this one. If so, LMK when you'll have a chance for a last look and I'll re-run CI before then to make sure there are no new slow tests to adjust before merging.
Yes indeed - now or tomorrow should work.
Oops. I had applied an exception to
This now looks about as clean as it is going to get, and all the new instructions in the docs are clear and work as advertised in my testing.
I'm still a little worried that the amount of flaky failures is going to be on the high side, but there is only one way to find out - which is to start using this. So in it goes - thanks Matt!
Reference issue
Follow-up to gh-20480
What does this implement/fix?
This PR causes CI to report a failure if a test not marked `slow` or `xslow` takes over 1s to complete. Exceptions can be made using `@pytest.mark.fail_slow`.

Additional information

The intent is to distribute the job of keeping test times reasonable to all PR authors (rather than requiring maintainers to perform cleanups periodically).
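Under this policy, a test expected to exceed the threshold gets one of two treatments. A minimal sketch (function bodies are made up; `@pytest.mark.fail_slow` is pytest-fail-slow's marker, and `slow` is assumed here to be the marker name the project uses for its full test suite):

```python
import pytest

# Explicit exception: allow this test up to 5 seconds before failing it
# (the pytest-fail-slow marker discussed throughout this PR).
@pytest.mark.fail_slow(5)
def test_expensive_but_important():
    assert sum(range(10**6)) == 499999500000

# Alternative: defer the test to the slow/full suite so the 1 s global
# threshold never applies to it ("slow" marker name assumed).
@pytest.mark.slow
def test_exhaustive_sweep():
    assert all(i * i >= 0 for i in range(1000))
```

The trade-off discussed above is which of the two to prefer: an exception keeps the test in every CI run at a known cost, while the slow marker hides it from the default suite that contributors run locally.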