Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test for parallel qc crash #2628

Merged
merged 6 commits into from
Apr 5, 2020
Merged

Conversation

cfhammill
Copy link
Contributor

Presently unable to reproduce the failure in #2330 using multiprocess but I figured it would be useful to get some extra eyes on this.

Using multiprocess with 6 processes generating 300 reports fails to trip this bug.

@jcohenadad any thoughts on how to increase our chances of encountering the bug? It is possible I'm generating the reports wrong, or perhaps two different reporters need to be used.

@jcohenadad
Copy link
Member

jcohenadad commented Mar 7, 2020

@cfhammill I am able to get it crashed using your test, see here. Note that it does not fail all the time (see below). I am running 64f99ea.

Regardless the number of iterations and jobs, the test always crashes. E.g., I used that config and the test crashed:

    with multiprocessing.Pool(2) as p:
        p.map(gen_qc, range(5))

However, when running the test another time, it always passes. It is only if I manually remove the /tmp/qc folder that the test crashes again.

I am running on an OSX Mojave. Are you an Linux? Maybe it is an Apple file system specific issue?

@cfhammill
Copy link
Contributor Author

cfhammill commented Mar 8, 2020

@jcohenadad yes I was running on ubuntu 17.10, so maybe it is file system specific. I started at two cores parallelism and 100 trials and worked my way up, periodically clearing /tmp/qc. Also passes on my 18.04 running under windows subsystem linux.

Looking at the travis results, there's a failure on Ubuntu 14.04 that seems independent of the problem (I think it's mad that I pool mapped a function with no return value), and failing on mac high sierra (although passing on mojave). Seems suggestive that this is a mac specific problem.

I don't have a mac to test on unfortunately, but I can implement the fix suggested in #2330. We can see if the problem resolves on travis.

EDIT: On a closer look I don't think it's the dummy return problem, but I've added one to test anyway.
EDIT: Failing on 18.04 and mojave on travis now

@jcohenadad
Copy link
Member

@cfhammill works on 18.04, no?

if it is a stupid Mojave-specific problem, then maybe we could try to use another folder creation function, or play with the "ignore error" flags when creating folders...

@cfhammill
Copy link
Contributor Author

It fails on 18.04 in the next push: https://travis-ci.org/neuropoly/spinalcordtoolbox/jobs/659871081?utm_medium=notification&utm_source=github_status

I don't know if I'm satisfied this test triggers the bug reliably enough, @jcohenadad what do you think? Should I invest more time making the test robust or just go ahead and try the fix?

@jcohenadad
Copy link
Member

jcohenadad commented Mar 10, 2020

hum... i would say, try the fix. Even if the test fails with 20% probability, we do so many pushes/PRs that i'm sure within days it will fail again (if the fix does not work).

EDIT: i've restarted the job on Travis to see if it fails again on 18.04 (to get a sense of the reliability of the test)

EDIT 2020-03-09 21:31:56: After a restart job, the test passes: https://travis-ci.org/neuropoly/spinalcordtoolbox/jobs/659871081?utm_medium=notification&utm_source=github_status. Let me restart again.

EDIT: hum, passed again. Although I am wondering if, by restarting a job, the /tmp/qc folder is still present on the Travis machine...

Presently unable to reproduce the failure using multiprocess
See if the ubuntu error resolves and the mac error persists
Should resolve the error in #2330
@cfhammill cfhammill force-pushed the 2020-03-07_2330_parallel-qc-crash branch from 37a4ebe to 679fb6b Compare March 18, 2020 21:25
@cfhammill cfhammill changed the title (WIP) Test for parallel qc crash Test for parallel qc crash Mar 29, 2020
@cfhammill cfhammill added bug category: fixes an error in the code fix:minor sct_qc context: labels Mar 29, 2020
@cfhammill cfhammill added this to the 4.2.3 milestone Mar 29, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug category: fixes an error in the code sct_qc context:
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants