tests with multiprocessing hang on Python 3.8 on macOS #11835
So much fun. I'll ping again on the CPython PR that seems to have introduced the issue.
@larsoner @rgommers, there's something fishy going on here.
If I do a
I tried to see if it was because a dummy Pool was being used, so I made it into a real Pool. Then I had a complaint about pickling the lambda, but when I made it into a regular function the cutdown
I can go into (this is with Python 3.8.2 on macOS). Also pinging @pv and @peterbell10, because I think we've added too many skips for tests with multiprocessing (thinking it's Python's/MapWrapper's/"using 'spawn' on macOS" fault), when I think the fault lies somewhere in
multiprocessing with the spawn method is compatible with pytest (I've just done it); it's just not how we test it.
From the Python issue it seems like if/when `del` is called can matter, which in turn can depend on garbage collection timing. So I'm not sure running an individual test necessarily rules that test in or out as the problem unless we check deletion states. Maybe one option is to `atexit.register` the closing of the pool to ensure the workers are dealt with properly.
And by "the pool" I mean the one created by `quad_vec` when `workers` is supplied. Basically we need to ensure that it's closed; relying on garbage collection alone is not safe enough.
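One way to implement the `atexit` idea above might look like the following sketch (the helper name and module-level cache are illustrative, not the actual `quad_vec` code):

```python
import atexit
import multiprocessing

_pool = None

def get_pool(workers):
    # Create the pool lazily and register deterministic teardown with
    # atexit, instead of relying on garbage collection to reclaim it
    # at some unpredictable point during interpreter shutdown.
    global _pool
    if _pool is None:
        _pool = multiprocessing.Pool(workers)
        atexit.register(_pool.terminate)
    return _pool
```

`terminate` is used here so a hung worker cannot stall interpreter shutdown; registering `close` followed by `join` would be the gentler alternative when all submitted work is expected to finish.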
I can confirm that my MWE still fails on my system (running
But modifying it with
So I think doing this sort of thing wherever we create a
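For pools that are created and used locally, the context-manager form gives the same deterministic cleanup without `atexit` (a sketch only; the function names here are made up):

```python
from multiprocessing import Pool

def _square(x):
    # Module-level function so it can be pickled under the spawn start
    # method (a lambda here would fail to pickle, as noted above).
    return x * x

def parallel_squares(values, workers=2):
    # The with-block tears the pool down on exit, so cleanup does not
    # depend on when (or whether) garbage collection runs.
    with Pool(workers) as pool:
        return pool.map(_square, values)
```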
Actually it looks like
I've just been trying to investigate this. In most of our test cases either
There is a test in
To check if it's the scipy
When I try to run this via
when the test is:
there is no hang when run via
So:
ergo it's how we use pytest in
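A standalone test of the kind being compared here, runnable directly with `pytest`, could look like this (the function names are invented for illustration, and the spawn context mirrors the macOS Python 3.8 default explicitly):

```python
import multiprocessing

def _identity(x):
    return x

def test_pool_map_spawn():
    # Force the spawn start method via a fresh context rather than
    # set_start_method(), so global state is left untouched.
    ctx = multiprocessing.get_context("spawn")
    with ctx.Pool(2) as pool:
        assert pool.map(_identity, [1, 2, 3]) == [1, 2, 3]
```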
Well...
That's very weird. Compare to the documentation:
Are you using the
It's not the first time, and it certainly won't be the last, that documentation doesn't keep up. The story goes as follows...

In a GUI app I'm developing, whenever I started multiprocessing the GUI would generate a cascade of replicas of the main window. On investigating further, the suggestion was that freeze_support was what was needed, because the script was doing something like recursively importing the main script, at which point another window would appear. When I was interrupting the hung runtests.py yesterday, at some point I saw a stack trace where there was an attempt to do a recursive import of the runtests.py script. The recursion rang a bell, so I tried using freeze_support, and then everything worked.

The Windows note in the Python documentation may be referring to programs that spawn rather than fork. macOS with Python < 3.8 used to use fork; on Python >= 3.8, spawn is the default. I'm not using
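The fix described above boils down to guarding the entry point, with `freeze_support()` right after the `__main__` check (a minimal sketch of the pattern, not the actual GUI app or runtests.py):

```python
import multiprocessing

def main():
    # All Pool work lives behind the __main__ guard below, so a child
    # process re-importing this script under spawn only sees function
    # definitions and does not start more workers (or more windows).
    with multiprocessing.Pool(2) as pool:
        return pool.map(len, ["a", "bb", "ccc"])

if __name__ == "__main__":
    # Under spawn (default on Windows, and on macOS since Python 3.8),
    # children re-import the main module; freeze_support() must come
    # first so frozen executables don't recurse into startup code.
    multiprocessing.freeze_support()
    print(main())
```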
Looking at the implementation for
Pure speculation, but is it possible that adding
@peterbell10, you're correct. Just tested it out on a local machine.
It's a bit crufty that just the presence of the import is enough...
* the `setup.py` MacOS Python `3.8` CI job is starting to fail with timeouts for `multiprocessing` `Pool`-related code in recent PRs like scipygh-17777 and scipygh-17829
* nothing stands out in the build logs when I do a side-by-side diff, and the GHA job history seems to suggest the failure doesn't happen every time the CI is flushed, but perhaps more often lately
* make a few cleanups here in an attempt to fix:
  * first off, we can simplify the Pythran config--there's only one job left in this file after the splitting with `meson`, and it is Python `3.8` (`3.8` support may be on the way out too, but I'm assuming we delay that a tiny bit more)
  * previous discussions like scipy#11835 with MacOS Python 3.8 `multiprocessing` `Pool` hangs were complicated, but sometimes using `runtests.py` (which is also on the way out, I think) contributed to hangs b/c of `multiprocessing` import orders. So, try to use `pytest` directly instead to see if it helps
* I suspect these changes are "just fine" to apply even if we're going to switch to 3.9 minimum sooner rather than later
* the `setup.py` MacOS Python `3.8` CI job is starting to fail with timeouts for `multiprocessing` `Pool`-related code in recent PRs like scipygh-17777 and scipygh-17829
* nothing stands out in the build logs when I do a side-by-side diff vs. succeeding versions of the job, and the GHA job history seems to suggest the failure doesn't happen every time the CI is flushed, but perhaps more often lately
* make a few cleanups here in an attempt to fix:
  * previous discussions like scipygh-11835 with MacOS Python 3.8 `multiprocessing` `Pool` hangs were complicated, but sometimes using `runtests.py` contributed to hangs b/c of `multiprocessing` import orders. So, try to use `pytest` directly instead to see if it helps.
  * we can simplify the Pythran config--there's only one job left in this file after the splitting with `meson`, and it is Python `3.8`
  * I've also extended the timeout limit a little bit, and since GHA MacOS runners have 3 cores I've tried using all 3 for the testsuite here, because I once had the job pass on my fork during testing, but it took 66 minutes: #65
* I think there's still something a bit weird here, so this is a combination of measures that seemed to help a bit on my fork; perhaps long-term the removal of Python 3.8 support along with sunsetting `runtests.py` means that this mess may disappear before we need to dig much deeper

[skip azp] [skip circle] [skip cirrus]
I'm running the whole test suite on macOS on Python 3.8 (via GitHub Actions on my local machine (Catalina)). I'm seeing several issues with multiprocessing, such as #11827.
They're related to https://bugs.python.org/issue38501. The tests that hang are:
integrate.tests.test__quad_vec.test_quad_vec_pool
I think the best thing to do at the moment is to use the `skipif` decorator (possibly also coupled to Darwin):
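The suggested guard might look like the following (the exact condition and reason string are my guesses at what's intended, and the test body is elided):

```python
import sys

import pytest

# Skip on the combination where the default start method changed from
# fork to spawn and the hangs are observed (see bpo-38501).
@pytest.mark.skipif(
    sys.platform == "darwin" and sys.version_info >= (3, 8),
    reason="multiprocessing Pool tests hang on macOS with Python >= 3.8",
)
def test_quad_vec_pool():
    # Body elided; only the proposed skip condition is shown here.
    ...
```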