Test for parallel qc crash #2628

cfhammill · 2020-03-07T17:59:33Z

Presently unable to reproduce the failure in #2330 using multiprocessing, but I figured it would be useful to get some extra eyes on this.

Using multiprocessing with 6 processes to generate 300 reports fails to trip this bug.

@jcohenadad any thoughts on how to increase our chances of encountering the bug? It is possible I'm generating the reports wrong, or perhaps two different reporters need to be used.

jcohenadad · 2020-03-07T18:59:34Z

@cfhammill I am able to get it to crash using your test, see here. Note that it does not fail all the time (see below). I am running 64f99ea.

Regardless of the number of iterations and jobs, the test always crashes on the first run. E.g., I used this config and the test crashed:

    with multiprocessing.Pool(2) as p:
        p.map(gen_qc, range(5))

However, when I run the test again, it always passes. Only if I manually remove the /tmp/qc folder does the test crash again.

I am running on OSX Mojave. Are you on Linux? Maybe it is an Apple file system specific issue?
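The pattern described above (a crash only when /tmp/qc is absent, a pass once it exists) would be consistent with a check-then-create race between workers. The sketch below is only an illustration of that suspicion, not the actual qc.py code: `create_if_missing` and its `between` hook are hypothetical names, and the hook stands in for a second worker creating the folder between the existence check and the `os.makedirs` call.

```python
import os
import tempfile


def create_if_missing(path, between=None):
    """Check-then-create pattern (hypothetical; assumed, not taken from qc.py)."""
    if not os.path.exists(path):
        if between is not None:
            between()  # simulates another worker winning the race
        os.makedirs(path)  # raises FileExistsError if the folder appeared meanwhile


# Fresh parent directory, so the target folder does not exist yet
target = os.path.join(tempfile.mkdtemp(), "qc")

try:
    # First run: the folder is created "by someone else" right after the check
    create_if_missing(target, between=lambda: os.makedirs(target))
    outcome = "no error"
except FileExistsError:
    outcome = "FileExistsError"

# Second run: the folder already exists, so the check short-circuits and nothing fails
create_if_missing(target)
```

If this is what happens, it would also explain why the failure rate depends on timing rather than on the number of jobs or iterations.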

cfhammill · 2020-03-08T18:21:29Z

@jcohenadad yes, I was running on Ubuntu 17.10, so maybe it is file system specific. I started at two-core parallelism and 100 trials and worked my way up, periodically clearing /tmp/qc. It also passes on my 18.04 running under Windows Subsystem for Linux.

Looking at the Travis results, there's a failure on Ubuntu 14.04 that seems independent of the problem (I think it's mad that I pool-mapped a function with no return value), and a failure on Mac High Sierra (although it passes on Mojave). This suggests a Mac-specific problem.

I don't have a Mac to test on, unfortunately, but I can implement the fix suggested in #2330. We can see if the problem resolves on Travis.

EDIT: On a closer look I don't think it's the dummy return problem, but I've added one to test anyway.
EDIT: Failing on 18.04 and Mojave on Travis now.

jcohenadad · 2020-03-09T20:00:25Z

@cfhammill works on 18.04, no?

if it is a stupid Mojave-specific problem, then maybe we could try to use another folder creation function, or play with the "ignore error" flags when creating folders...
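One such "ignore error" flag is the `exist_ok` parameter of `os.makedirs` (Python 3.2+). Here is a minimal sketch of how it could be used from several workers at once; `ensure_qc_dir` and `run` are hypothetical names for illustration, and the temporary directory stands in for /tmp/qc.

```python
import multiprocessing
import os
import tempfile


def ensure_qc_dir(path):
    """Create `path` if needed; a folder already created by another worker is not an error."""
    os.makedirs(path, exist_ok=True)
    return path


def run(n_workers=2, n_tasks=5):
    # Fresh parent directory, so the target folder never exists beforehand
    target = os.path.join(tempfile.mkdtemp(), "qc")
    with multiprocessing.Pool(n_workers) as p:
        results = p.map(ensure_qc_dir, [target] * n_tasks)
    return target, results


if __name__ == "__main__":
    target, results = run()
    print(os.path.isdir(target), len(results))
```

Note that `exist_ok=True` only tolerates an existing directory: if the path exists as a regular file, `os.makedirs` still raises `FileExistsError`.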

cfhammill · 2020-03-10T00:56:53Z

It fails on 18.04 in the next push: https://travis-ci.org/neuropoly/spinalcordtoolbox/jobs/659871081?utm_medium=notification&utm_source=github_status

I don't know if I'm satisfied this test triggers the bug reliably enough, @jcohenadad what do you think? Should I invest more time making the test robust or just go ahead and try the fix?

jcohenadad · 2020-03-10T01:13:55Z

hum... i would say, try the fix. Even if the test fails with 20% probability, we do so many pushes/PRs that i'm sure within days it will fail again (if the fix does not work).

EDIT: i've restarted the job on Travis to see if it fails again on 18.04 (to get a sense of the reliability of the test)

EDIT 2020-03-09 21:31:56: After restarting the job, the test passes: https://travis-ci.org/neuropoly/spinalcordtoolbox/jobs/659871081?utm_medium=notification&utm_source=github_status. Let me restart again.

EDIT: hum, passed again. Although I am wondering if, by restarting a job, the /tmp/qc folder is still present on the Travis machine...
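One way to take a leftover /tmp/qc out of the equation would be to run the test against a folder that is guaranteed not to exist at the start of every run. This is only a sketch under that assumption: `run_with_fresh_qc_dir` and `example_body` are hypothetical helpers, and `example_body` is a placeholder for the real report generation.

```python
import os
import tempfile


def run_with_fresh_qc_dir(test_body):
    """Call test_body(qc_path) with a qc_path that does not exist yet."""
    with tempfile.TemporaryDirectory() as parent:
        qc_path = os.path.join(parent, "qc")
        assert not os.path.exists(qc_path)  # every run starts from a clean state
        return test_body(qc_path)


def example_body(qc_path):
    # Placeholder for the real report generation (hypothetical)
    os.makedirs(qc_path)
    return os.path.isdir(qc_path)
```

Calling `run_with_fresh_qc_dir(example_body)` twice in a row exercises the folder creation both times, because the parent directory is removed when the `with` block exits.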

cfhammill added 3 commits March 18, 2020 17:24

[WIP] Test for parallel qc crash

ba0ba88

Presently unable to reproduce the failure using multiprocess

(WIP) Lower parallelism and add dummy return

a0da3e6

See if the ubuntu error resolves and the mac error persists

qc.py: Set exist_ok in makedirs

679fb6b

Should resolve the error in #2330

cfhammill force-pushed the 2020-03-07_2330_parallel-qc-crash branch from 37a4ebe to 679fb6b March 18, 2020 21:25

cfhammill changed the title from "(WIP) Test for parallel qc crash" to "Test for parallel qc crash" Mar 29, 2020

cfhammill requested a review from jcohenadad March 29, 2020 19:16

cfhammill added the bug (category: fixes an error in the code), fix:minor, and sct_qc (context:) labels Mar 29, 2020

cfhammill added this to the 4.2.3 milestone Mar 29, 2020

jcohenadad approved these changes Apr 2, 2020

jcohenadad added 3 commits April 2, 2020 15:34

Merge branch 'master' into 2020-03-07_2330_parallel-qc-crash

63b94d4

Merge branch 'master' into 2020-03-07_2330_parallel-qc-crash

3330d54

Merge branch 'master' into 2020-03-07_2330_parallel-qc-crash

b0fe9ea

jcohenadad merged commit 637d0f2 into master Apr 5, 2020

jcohenadad deleted the 2020-03-07_2330_parallel-qc-crash branch April 5, 2020 01:06

cfhammill mentioned this pull request May 2, 2020

test_many_qc: json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0) #2683

Closed

jcohenadad mentioned this pull request Aug 25, 2020

Crash upon folder creation if already exists #2330

Closed
