tests with multiprocessing hang on Python 3.8 on macOS #11835

Closed
andyfaff opened this issue Apr 10, 2020 · 16 comments · Fixed by #11836
Labels
defect A clear bug or issue that prevents SciPy from being installed or used as expected

@andyfaff
Contributor

andyfaff commented Apr 10, 2020

I'm running the whole test suite on macOS with Python 3.8 (via GitHub Actions and on my local machine, which runs Catalina). I'm seeing several issues with multiprocessing, such as #11827.

They're related to https://bugs.python.org/issue38501. The tests that hang are:

I think the best thing to do at the moment is to use the skipif decorator (possibly also conditioned on the platform being Darwin):

    import multiprocessing

    import pytest

    @pytest.mark.skipif(multiprocessing.get_start_method() != 'fork',
                        reason=('multiprocessing with spawn method is not'
                                ' compatible with pytest.'))
@andyfaff andyfaff added the defect A clear bug or issue that prevents SciPy from being installed or used as expected label Apr 10, 2020
@rgommers
Member

So much fun. I'll ping again on the CPython PR that seems to have introduced the issue.

@andyfaff
Contributor Author

@larsoner @rgommers, there's something fishy going on here.
I copied the failing test into a separate file, test_vec.py:

import numpy as np
from numpy.testing import assert_allclose
from scipy.integrate import quad_vec


def _lorenzian(x):
    return 1 / (1 + x**2)


def test_quad_vec_pool():
    from multiprocessing.dummy import Pool

    f = _lorenzian
    res, err = quad_vec(f, -np.inf, np.inf, norm='max', epsabs=1e-4, workers=4)
    assert_allclose(res, np.pi, rtol=0, atol=1e-4)

    with Pool(10) as pool:
        f = lambda x: 1 / (1 + x**2)
        res, err = quad_vec(f, -np.inf, np.inf, norm='max', epsabs=1e-4, workers=pool.map)
        assert_allclose(res, np.pi, rtol=0, atol=1e-4)

If I run pytest test_vec.py, it passes. This is the same test for which I see a failure if I run the scipy tests via python runtests.py.

I tried to see if it was because a dummy Pool was being used, so I made it into a real Pool. Then I got a complaint about pickling the lambda, but when I made it into a regular function the cut-down test_vec.py still passed. Rather than the problem being in Python, I'm wondering if there's some other problem with our pytest and runtests.py setup.

@andyfaff
Contributor Author

I can go into optimize.tests.test__differential_evolution, remove the known fails (which are supposed to be 'due to cleaning up MapWrapper on Python 3.8'), and that individual test file (after moving it somewhere else) runs to completion without issue if I call it via pytest test__differential_evolution.py.

(this is with Python 3.8.2 on macOS).

Also pinging @pv and @peterbell10, because I think we've added too many skips for tests with multiprocessing (thinking it's the fault of Python / MapWrapper / "using 'spawn' on macOS"), when I think the fault lies somewhere in runtests.py or pytest.

@andyfaff
Contributor Author

test_mapwrapper_parallel in scipy._lib.test__util also works on macOS + Python 3.8, where 'spawn' is the default start method.

--> multiprocessing with the spawn method is compatible with pytest (I've just done it), just not with how we test it.
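For reference, a minimal sketch (not from the thread) of exercising a real process pool with the 'spawn' start method requested explicitly, which is what macOS + Python 3.8 does implicitly; `square` and `check_spawn_pool` are hypothetical names:

```python
import multiprocessing


def square(x):
    return x * x


def check_spawn_pool():
    # Explicitly request the 'spawn' start method instead of relying on the
    # platform default, mirroring what macOS + Python 3.8 does by default.
    ctx = multiprocessing.get_context('spawn')
    with ctx.Pool(2) as pool:
        return pool.map(square, [1, 2, 3])


if __name__ == '__main__':
    # Guarded so the module can safely be re-imported by spawned workers.
    print(check_spawn_pool())  # [1, 4, 9]
```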

@larsoner
Member

From the Python issue it seems like if/when `__del__` is called can matter, which in turn can depend on garbage-collection timing. So I'm not sure running an individual test necessarily rules that test in or out as the problem unless we check deletion states.

Maybe one option is to `atexit.register` the closing of the pool to ensure the threads are dealt with properly.

@larsoner
Member

And by "the pool" I mean the one created by quad_vec when "workers" is supplied. Basically we need to ensure that it's closed, and just relying on garbage collection is not safe enough

@larsoner
Member

I can confirm that my MWE still fails on my system (running python whatever.py just hangs instead of exiting):

from multiprocessing import Pool

class A(object):
    def __init__(self):
        self.pool = Pool(processes=2)

solver = A()

But modifying it with atexit fixes things (python whatever.py runs and exits without issue):

import atexit
from multiprocessing import Pool

class A(object):
    def __init__(self):
        self.pool = Pool(processes=2)
        atexit.register(self.pool.close)

solver = A()

So I think doing this sort of thing wherever we create a Pool (without having a context manager to automatically close it for us) is a good idea.

@larsoner
Member

Actually, it looks like MapWrapper, when used as a context manager, should already close the pool, and quad_vec uses it in this way (?). So maybe this isn't the same bug...
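For context, a minimal sketch of the context-manager usage being described. `MapWrapper` is a private scipy API whose import path reflects scipy at the time of this issue and may move between versions; `cube` is a hypothetical function:

```python
def cube(x):
    return x ** 3


try:
    # Private scipy API; the import path may change between versions.
    from scipy._lib._util import MapWrapper
except ImportError:  # scipy not installed, or the private path moved
    MapWrapper = None

if __name__ == '__main__' and MapWrapper is not None:
    # On exiting the with-block, MapWrapper closes and joins its underlying
    # Pool, which is the cleanup behaviour under discussion.
    with MapWrapper(pool=2) as mapper:
        print(list(mapper(cube, [1, 2, 3])))  # [1, 8, 27]
```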

@andyfaff
Contributor Author

I've just been trying to investigate this. In most of our test cases either MapWrapper or Pool is being used in a situation where the resources are being closed.

@andyfaff
Contributor Author

andyfaff commented Apr 11, 2020

There is a test in _lib.tests.test__util called test_mapwrapper_parallel. It currently has a skipif decorator on it. This is because the test hangs on Python 3.8 (macOS) if 'spawn' is used. Note that the MapWrapper implementation does attempt to clean up all resources.

To check whether it's the scipy MapWrapper implementation being faulty, I then created another test function that uses multiprocessing directly:

# place in test__util.py
def test_pool():
    with Pool(2) as p:
        p.map(np.sin, [1, 2, 3, 4])

When I try to run this via runtests.py the test stalls as well (i.e. the fault isn't with MapWrapper). When I run it in an interpreter (from scipy._lib.tests import test__util; test__util.test_pool()), or run it directly via pytest XXXX, there is no hang.

when the test is:

def test_pool():
    with Pool(2) as p:
        pass

there is no hang when run via runtests.py.

So:

  • the hang is in the map call of a Pool object, not in the creation or destruction of resources.
  • the hang only occurs if a test uses multiprocessing, the default start method is 'spawn' (Python 3.8 and darwin), and the test is started via runtests.py.
  • if runtests.py isn't used the tests pass.

Ergo, it's how we use pytest in runtests.py.

@andyfaff
Contributor Author

andyfaff commented Apr 11, 2020

Well...
It seems that if I add multiprocessing.freeze_support() to the __main__ block of runtests.py, then the multiprocessing hangs don't occur; see #11836
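For reference, a sketch of the guard pattern the multiprocessing docs describe for freeze_support; the helper names `double` and `main` are illustrative, not taken from runtests.py:

```python
import multiprocessing


def double(x):
    return 2 * x


def main():
    with multiprocessing.Pool(2) as pool:
        return pool.map(double, [1, 2, 3])


if __name__ == '__main__':
    # The docs say freeze_support() should be called directly after the
    # __main__ check, before any other multiprocessing use; on non-Windows
    # platforms it is documented as a no-op.
    multiprocessing.freeze_support()
    print(main())  # [2, 4, 6]
```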

@peterbell10
Member

peterbell10 commented Apr 11, 2020

It seems that if I add multiprocessing.freeze_support() in runtests.py.__main__, then the multiprocessing hangs don't occur

That's very weird. Compare to the documentation:

Calling freeze_support() has no effect when invoked on any operating system other than Windows.


it's how we use pytest in runtests.py

Are you using the -n flag with runtests.py? That runs the tests in subprocesses, so it could be relevant.

@andyfaff
Contributor Author

It's not the first time, and it certainly won't be the last, that documentation doesn't keep up.

The story goes as follows... In a GUI app I'm developing, whenever I started multiprocessing the GUI would generate a cascade of replicas of the main window. On investigating further, the suggestion was that freeze_support was what was needed, because the script was recursively re-importing the main script, at which point another window would appear. freeze_support fixed the issue.

When I was interrupting the hung runtests.py yesterday, at some point I saw a stack trace showing an attempt to recursively import the runtests.py script. The recursion rang a bell, so I tried using freeze_support and then everything worked.

The Windows thing in the Python documentation may be referring to programs that spawn rather than fork. macOS with Python < 3.8 used 'fork' by default; from Python 3.8 onwards 'spawn' is the default.

I'm not using -n.
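A quick way to check the active and available start methods (an illustrative snippet, not from the thread):

```python
import multiprocessing

# 'fork' was the macOS default before Python 3.8; from 3.8 onwards the
# default is 'spawn', because fork() is unsafe with some macOS frameworks.
print(multiprocessing.get_start_method())       # e.g. 'spawn' on macOS 3.8+
print(multiprocessing.get_all_start_methods())  # e.g. ['fork', 'spawn', 'forkserver']
```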

@peterbell10
Member

The Windows thing in the Python documentation may be referring to programs that spawn rather than fork. macOS with Python < 3.8 used 'fork' by default; from Python 3.8 onwards 'spawn' is the default

Looking at the implementation for freeze_support, it literally does nothing if sys.platform != 'win32':
https://github.com/python/cpython/blob/61511488cf4e7a1cb57a38efba7e0a84a387fe58/Lib/multiprocessing/context.py#L148

@peterbell10
Member

Pure speculation, but is it possible that adding import multiprocessing to runtests.py was what made the difference? It might change how the python module is initialised on the spawned process.

@andyfaff
Contributor Author

@peterbell10, you're correct. Just tested it on my local machine:

  • just the presence of import multiprocessing is enough to prevent the hanging (freeze_support is not required).
  • the tests also don't hang if there's no import of multiprocessing but runtests.py uses parallelisation (I used -j 4).

It's a bit crufty that just the presence of the import is enough...
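To illustrate why top-level code in the entry-point script matters under 'spawn' (a sketch; `work` is a hypothetical function):

```python
import multiprocessing


def work(x):
    return x + 1


# Under 'spawn', each worker starts a fresh interpreter and re-imports the
# parent's __main__ module, so any unguarded top-level code runs again in
# every child. Keeping pool creation behind the guard is what prevents the
# kind of recursive-import cascade seen with runtests.py.
if __name__ == '__main__':
    with multiprocessing.get_context('spawn').Pool(2) as pool:
        print(pool.map(work, [1, 2, 3]))  # [2, 3, 4]
```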

rht added a commit to rht/mesa that referenced this issue Mar 21, 2022
tylerjereddy added a commit to tylerjereddy/scipy that referenced this issue Jan 20, 2023
* the `setup.py` MacOS Python `3.8` CI job is starting
to fail with timeouts for multiprocessing `Pool`-related
code in recent PRs like scipygh-17777 and scipygh-17829

* nothing stands out in the build logs when I do a side-by-side
diff, and the GHA job history seems to suggest the failure doesn't
happen every time the CI is flushed, but perhaps more often lately

* make a few cleanups here in an attempt to fix

* first off, we can simplify the Pythran config--there's only one
job left in this file after the splitting with `meson`, and it is
Python `3.8` (`3.8` support may be on the way out too, but I'm assuming we
delay that a tiny bit more perhaps)

* previous discussions like scipy#11835 with
MacOS Python 3.8 multiprocessing Pool hangs were complicated, but
sometimes using `runtests.py` (which is also on the way out I think..)
contributed to hangs b/c of multiprocessing import orders. So, try to
use `pytest` directly instead to see if it helps

* I suspect these changes are "just fine" to apply even
if we're going to switch to 3.9 minimum sooner than later
tylerjereddy added a commit to tylerjereddy/scipy that referenced this issue Jan 21, 2023
* the `setup.py` MacOS Python `3.8` CI job is starting
to fail with timeouts for `multiprocessing` `Pool`-related
code in recent PRs like scipygh-17777 and scipygh-17829

* nothing stands out in the build logs when I do a side-by-side
diff vs. succeeding versions of the job, and the GHA job history
seems to suggest the failure doesn't happen every time the CI is flushed,
but perhaps more often lately

* make a few cleanups here in an attempt to fix
  - previous discussions like scipygh-11835 with MacOS Python 3.8
    `multiprocessing` `Pool` hangs were complicated, but
    sometimes using `runtests.py` contributed to hangs b/c
    of `multiprocessing` import orders. So, try to
    use `pytest` directly instead to see if it helps.
  - we can simplify the Pythran config--there's only one
    job left in this file after the splitting with `meson`, and it is
    Python `3.8`
  - I've also extended the timeout limit a little bit, and since
    GHA MacOS runners have 3 cores I've tried using all 3 for the
    testsuite here because I once had the job pass on my fork during
    testing, but it took 66 minutes: #65

* I think there's still something a bit weird here, so it is a
  combination of measures that seemed to help a bit on my fork
  but perhaps long-term the removal of Python 3.8 support along with
  sunsetting `runtests.py` means that this mess may disappear before
  we need to dig much deeper

[skip azp] [skip circle] [skip cirrus]
tylerjereddy added a commit to tylerjereddy/scipy that referenced this issue Feb 12, 2023