-
-
Notifications
You must be signed in to change notification settings - Fork 101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Test for parallel qc crash #2628
Conversation
@cfhammill I am able to get it crashed using your test, see here. Note that it does not fail all the time (see below). I am running 64f99ea. Regardless the number of iterations and jobs, the test always crashes. E.g., I used that config and the test crashed: with multiprocessing.Pool(2) as p:
p.map(gen_qc, range(5)) However, when running the test another time, it always passes. It is only if I manually remove the I am running on an OSX Mojave. Are you an Linux? Maybe it is an Apple file system specific issue? |
@jcohenadad yes I was running on ubuntu 17.10, so maybe it is file system specific. I started at two cores parallelism and 100 trials and worked my way up, periodically clearing Looking at the travis results, there's a failure on Ubuntu 14.04 that seems independent of the problem (I think it's mad that I pool mapped a function with no return value), and failing on mac high sierra (although passing on mojave). Seems suggestive that this is a mac specific problem. I don't have a mac to test on unfortunately, but I can implement the fix suggested in #2330. We can see if the problem resolves on travis. EDIT: On a closer look I don't think it's the dummy return problem, but I've added one to test anyway. |
@cfhammill works on 18.04, no? if it is a stupid Mojave-specific problem, then maybe we could try to use another folder creation function, or play with the "ignore error" flags when creating folders... |
It fails on 18.04 in the next push: https://travis-ci.org/neuropoly/spinalcordtoolbox/jobs/659871081?utm_medium=notification&utm_source=github_status I don't know if I'm satisfied this test triggers the bug reliably enough, @jcohenadad what do you think? Should I invest more time making the test robust or just go ahead and try the fix? |
hum... i would say, try the fix. Even if the test fails with 20% probability, we do so many pushes/PRs that i'm sure within days it will fail again (if the fix does not work). EDIT: i've restarted the job on Travis to see if it fails again on 18.04 (to get a sense of the reliability of the test) EDIT 2020-03-09 21:31:56: After a restart job, the test passes: https://travis-ci.org/neuropoly/spinalcordtoolbox/jobs/659871081?utm_medium=notification&utm_source=github_status. Let me restart again. EDIT: hum, passed again. Although I am wondering if, by restarting a job, the |
Presently unable to reproduce the failure using multiprocess
See if the ubuntu error resolves and the mac error persists
Should resolve the error in #2330
37a4ebe
to
679fb6b
Compare
Presently unable to reproduce the failure in #2330 using multiprocess but I figured it would be useful to get some extra eyes on this.
Using
multiprocess
with 6 processes generating 300 reports fails to trip this bug.@jcohenadad any thoughts on how to increase our chances of encountering the bug? It is possible I'm generating the reports wrong, or perhaps two different reporters need to be used.