(Not a real PR) Diagnosing automated tests #993
@rwightman I'll come back to this, but just FYI, I've been able to rule out the FX tests as the culprit. See experiments 1 and 3, which confirm that the FX tests all pass when run either alone or first. In any case, it seems that regardless of whether the FX tests are disabled, this is going to become a problem as more models are added.
@alexander-soare interesting, thanks for the analysis. I was noticing as I made it further into the FX tests it was starting to fail on smaller and smaller models, so yeah, seems like there might be some memory fragmentation, GC, or circular-ref / leak issues. Maybe I'll try inserting some forced GC cleanup in an upcoming PR and see what happens...
@rwightman just note experiment 4 - I inserted
@alexander-soare running locally, I just tried installing pytest-xdist and running pytest with the --forked flag. It runs tests in separate processes and appears to prevent memory baggage from accumulating, regardless of what it is (fragmentation, etc.). I'll try this with my next PR.
@rwightman great, and in case that doesn't sort it out, my last experiment passed: running each test file separately (with the FX tests in their own file). It's kicking the can down the road though, as it doesn't fix the root cause. lmk
First, get rid of all `EXCLUDE_FX_FILTERS` (and keep it this way unless otherwise noted).
Experiment 1 - PASSED
Do only the 3 FX tests
Experiment 2 - CANCELED
Enable all tests
Ubuntu CANCELED in 2h 41m 40s ending at:
Interestingly, notice the time difference between the last passed test and the `[error]The operation was canceled.` line: about 23 mins. Why was it canceled?
Experiment 3 - OOM
Swap the positions of all 3 FX tests with their non FX counterparts. Everything enabled as usual.
Ubuntu OOM in 2h 13m 26s ending at:
Experiment 4 - OOM
All tests enabled and in original order. Do `gc.collect()` in all of them.
Ubuntu OOM in 1h 36m 33s at:
Experiment 5 - OOM
Remove `gc.collect()` from exp 4. Separate FX tests into their own file.
Ubuntu OOM (well, didn't really expect that to work...) in 1h 28m 40s at:
Experiment 6 - PASSED
Not only separate FX tests into their own file, but run each test file separately in the git workflow.
Ubuntu - PASSED
Mac -
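
For reference, the per-file approach from Experiment 6 could be sketched as a small driver script like the one below. This is only an illustration of the idea and not the workflow change used in this PR. The function name `run_each_file`, the `test_*.py` glob pattern, and the default `python -m pytest` runner are assumptions.

```python
import subprocess
import sys
from pathlib import Path


def run_each_file(test_dir, runner=(sys.executable, "-m", "pytest")):
    """Run every test_*.py file under test_dir in its own fresh process.

    Returns a dict mapping file name to the exit code of that run.
    Each file gets a new interpreter, so memory held by one file's
    tests is released before the next file starts.
    """
    results = {}
    for path in sorted(Path(test_dir).glob("test_*.py")):
        completed = subprocess.run([*runner, str(path)])
        results[path.name] = completed.returncode
    return results


if __name__ == "__main__":
    # Hypothetical layout: tests live in a "tests" directory.
    codes = run_each_file("tests")
    sys.exit(0 if all(code == 0 for code in codes.values()) else 1)
```

As noted in the conversation, this only sidesteps the accumulation between files. If a single file's tests grow past the runner's memory limit, the same OOM would be expected to reappear.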