BUG: Test failed in macOS Monterey due to a bug in default multiprocessing module #22203

Closed
vxst opened this issue Sep 5, 2022 · 1 comment
vxst commented Sep 5, 2022

Describe the issue:

In a full test run (numpy.test('full')) on the Darwin kernel, using the default pthread model:

The test lib/tests/test_io.py::TestSaveTxt::test_large_zip fails with an AttributeError.

FAILED  - AttributeError: Can't pickle local object 'TestSaveTxt.test_large_zip.<locals>.check_large_zip'

The failing function is:

    def dump(obj, file, protocol=None):
        '''Replacement for pickle.dump() using ForkingPickler.'''
>       ForkingPickler(file, protocol).dump(obj)
E       AttributeError: Can't pickle local object 'TestSaveTxt.test_large_zip.<locals>.check_large_zip'

file       = <_io.BytesIO object at 0x16fc39ae0>
obj        = <Process name='Process-1' parent=1569 initial>
protocol   = None

/opt/homebrew/Cellar/python@3.9/3.9.13_4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/reduction.py:60: AttributeError
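
The error above can be reproduced without multiprocessing or NumPy. The following is a minimal sketch, not the NumPy test itself: `try_pickle_local_function` and `local_check` are hypothetical names standing in for the test and its local `check_large_zip`. It assumes only that the standard `pickle` module refuses functions defined inside another function; the exact exception type may differ between Python versions, so both likely types are caught.

```python
import pickle


def try_pickle_local_function():
    """Return the exception raised when pickling a locally defined function."""

    def local_check():
        # Hypothetical stand-in for the test's local check_large_zip.
        return None

    try:
        # pickle stores functions by qualified name, and a name containing
        # "<locals>" cannot be looked up again from the module.
        pickle.dumps(local_check)
    except (AttributeError, pickle.PicklingError) as exc:
        return exc
    return None


if __name__ == "__main__":
    print(repr(try_pickle_local_function()))
```

With the 'spawn' start method, multiprocessing pickles the Process object together with its target, which is how the same failure surfaces in reduction.dump.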

The stack trace inside multiprocessing is:

/opt/homebrew/Cellar/python@3.9/3.9.13_4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/process.py:121: in start
    self._popen = self._Popen(self)
        self       = <Process name='Process-1' parent=1569 initial>
/opt/homebrew/Cellar/python@3.9/3.9.13_4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/context.py:224: in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
        process_obj = <Process name='Process-1' parent=1569 initial>
/opt/homebrew/Cellar/python@3.9/3.9.13_4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/context.py:284: in _Popen
    return Popen(process_obj)
        Popen      = <class 'multiprocessing.popen_spawn_posix.Popen'>
        process_obj = <Process name='Process-1' parent=1569 initial>
/opt/homebrew/Cellar/python@3.9/3.9.13_4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_spawn_posix.py:32: in __init__
    super().__init__(process_obj)
        __class__  = <class 'multiprocessing.popen_spawn_posix.Popen'>
        process_obj = <Process name='Process-1' parent=1569 initial>
        self       = <multiprocessing.popen_spawn_posix.Popen object at 0x16fbbb070>
/opt/homebrew/Cellar/python@3.9/3.9.13_4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_fork.py:19: in __init__
    self._launch(process_obj)
        process_obj = <Process name='Process-1' parent=1569 initial>
        self       = <multiprocessing.popen_spawn_posix.Popen object at 0x16fbbb070>
/opt/homebrew/Cellar/python@3.9/3.9.13_4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_spawn_posix.py:47: in _launch
    reduction.dump(process_obj, fp)
        fp         = <_io.BytesIO object at 0x16fc39ae0>
        prep_data  = {'authkey': b'\xe7\xf6K\x8ceo\xe6se\x82\x11#q\xa9\xde\x19SX\x917\xf0M\x07\xa9-\xfd\x15\xa8~\xc5\x98\x14', 'dir': '/Use...s', 'init_main_from_path': '/Users/vxst/vxst_env_verified/py39_sci_test/tests/0_numpy.py', 'log_to_stderr': False, ...}
        process_obj = <Process name='Process-1' parent=1569 initial>
        resource_tracker = <module 'multiprocessing.resource_tracker' from '/opt/homebrew/Cellar/python@3.9/3.9.13_4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/resource_tracker.py'>
        self       = <multiprocessing.popen_spawn_posix.Popen object at 0x16fbbb070>
        tracker_fd = 19

Test hardware: an M1 Max with a 32-core GPU and 64 GB of memory, and an M1 with 16 GB of memory.

Python versions tested: CPython 3.8 and CPython 3.9.

All four configurations fail in the same way.

Reproduce the code example:

py.test numpy/lib/tests/test_io.py

Error message:

../../../Library/Python/3.9/lib/python/site-packages/numpy/lib/tests/test_io.py:598:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/opt/homebrew/Cellar/python@3.9/3.9.13_4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/process.py:121: in start
    self._popen = self._Popen(self)
/opt/homebrew/Cellar/python@3.9/3.9.13_4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/context.py:224: in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
/opt/homebrew/Cellar/python@3.9/3.9.13_4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/context.py:284: in _Popen
    return Popen(process_obj)
/opt/homebrew/Cellar/python@3.9/3.9.13_4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_spawn_posix.py:32: in __init__
    super().__init__(process_obj)
/opt/homebrew/Cellar/python@3.9/3.9.13_4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_fork.py:19: in __init__
    self._launch(process_obj)
/opt/homebrew/Cellar/python@3.9/3.9.13_4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_spawn_posix.py:47: in _launch
    reduction.dump(process_obj, fp)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

obj = <Process name='Process-1' parent=38364 initial>, file = <_io.BytesIO object at 0x1055a08b0>, protocol = None

    def dump(obj, file, protocol=None):
        '''Replacement for pickle.dump() using ForkingPickler.'''
>       ForkingPickler(file, protocol).dump(obj)
E       AttributeError: Can't pickle local object 'TestSaveTxt.test_large_zip.<locals>.check_large_zip'

/opt/homebrew/Cellar/python@3.9/3.9.13_4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/reduction.py:60: AttributeError

NumPy/Python version information:

NumPy 1.21.6 in both the Python 3.8 and Python 3.9 environments, built by Homebrew.

1.21.6 3.9.13 (main, Aug 28 2022, 18:56:22)
[Clang 13.1.6 (clang-1316.0.21.2.5)]

and

1.21.6 3.8.13 (default, Aug 28 2022, 18:53:35)
[Clang 13.1.6 (clang-1316.0.21.2.5)]

The system uname is:

Darwin Mac 21.6.0 Darwin Kernel Version 21.6.0: Wed Aug 10 14:28:23 PDT 2022; root:xnu-8020.141.5~2/RELEASE_ARM64_T6000 arm64 arm Darwin

Context for the issue:

It causes numpy.test('full') to fail on macOS.

@vxst vxst added the 00 - Bug label Sep 5, 2022
vxst added a commit to vxst/numpy that referenced this issue Sep 5, 2022

vxst commented Sep 5, 2022

The failure occurs because, since Python 3.8, the default multiprocessing start method on macOS has been 'spawn' rather than 'fork'; on Linux the default is still 'fork'. With 'spawn', the child process does not inherit the parent's memory, so the Process object and its target must be pickled, and the target in "test_large_zip" is a local function that cannot be pickled. The fix is to set the start method for this test's context back to 'fork', so that all platforms use the same memory-sharing model and the test passes.
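
A rough sketch of the idea behind the fix, not the actual patch: `run_local_target_with_fork` and `local_target` are hypothetical names, and the sketch assumes a platform where the 'fork' start method is available (Unix). It requests a 'fork' context through `multiprocessing.get_context` and starts a process whose target is a local function, which is the pattern that fails under 'spawn'.

```python
import multiprocessing


def run_local_target_with_fork():
    """Run a local function in a child process using the 'fork' start method.

    Returns the child's exit code, or None where 'fork' is unavailable.
    """
    if "fork" not in multiprocessing.get_all_start_methods():
        return None  # e.g. Windows, where only 'spawn' exists

    def local_target():
        # Hypothetical stand-in for the test's local check_large_zip.
        pass

    # A forked child inherits the parent's memory, so the Process object
    # and its local target never have to be pickled.
    ctx = multiprocessing.get_context("fork")
    proc = ctx.Process(target=local_target)
    proc.start()
    proc.join()
    return proc.exitcode


if __name__ == "__main__":
    print(run_local_target_with_fork())
```

An alternative would be to move the target to module level so that it can be pickled under 'spawn'; the fix described in this thread keeps the local function and selects 'fork' instead.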

vxst added a commit to vxst/numpy that referenced this issue Sep 5, 2022
@seberg seberg closed this as completed in 5e9ec76 Sep 7, 2022
charris pushed a commit to charris/numpy that referenced this issue Sep 7, 2022
Since Python 3.8, the default start method for multiprocessing on macOS has been spawn instead of fork.
The default start method is still fork on other Unix platforms [1], so the memory-sharing model is inconsistent across platforms.
This breaks test_large_zip on macOS, because spawn and fork differ in how memory is shared with the child process.

The fix

Change the start method back to fork within the context of this test case only.
Within this test case, the failure triggered by the spawn default on macOS no longer occurs.
The change is limited to this context, so it does not affect the default start method outside test_large_zip.
All platforms now use the same memory-sharing model for this test.
After the change, test_large_zip passes on macOS.

[1] https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods
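
The claim that the change is limited to the test's context can be checked with a small sketch. The function name `default_unchanged_by_get_context` is hypothetical, and the sketch assumes a Unix platform where 'fork' is available. It compares the process-wide start method before and after a 'fork' context is requested.

```python
import multiprocessing


def default_unchanged_by_get_context():
    """Check that requesting a 'fork' context leaves the global default alone."""
    # allow_none=True reports the current setting without fixing a default.
    before = multiprocessing.get_start_method(allow_none=True)
    ctx = multiprocessing.get_context("fork")
    after = multiprocessing.get_start_method(allow_none=True)
    return before == after, ctx.get_start_method()


if __name__ == "__main__":
    print(default_unchanged_by_get_context())
```

This is the difference from `multiprocessing.set_start_method`, which changes the start method for the whole process.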

Closes numpygh-22203