
Python crashes on macOS after fork with no exec #77906

Closed
kapilt mannequin opened this issue Jun 1, 2018 · 70 comments
Labels
3.8 OS-mac type-crash

Comments

@kapilt (Mannequin) commented Jun 1, 2018

BPO 33725
Nosy @ronaldoussoren, @ned-deily
PRs
  • #11043
  • #11044
  • #11045
  • #13603
  • #13626
  • #13841
  • #13849
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2020-05-29.18:09:22.220>
    created_at = <Date 2018-06-01.00:53:06.418>
    labels = ['OS-mac', '3.8', 'type-crash']
    title = 'Python crashes on macOS after fork with no exec'
    updated_at = <Date 2021-11-04.14:32:41.053>
    user = 'https://github.com/kapilt'

    bugs.python.org fields:

    activity = <Date 2021-11-04.14:32:41.053>
    actor = 'eryksun'
    assignee = 'none'
    closed = True
    closed_date = <Date 2020-05-29.18:09:22.220>
    closer = 'barry'
    components = ['macOS']
    creation = <Date 2018-06-01.00:53:06.418>
    creator = 'kapilt'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 33725
    keywords = ['patch']
    message_count = 70.0
    messages = ['318352', '318361', '318396', '318397', '318528', '318529', '318708', '329871', '329880', '329885', '329919', '329922', '329923', '329926', '329927', '329933', '329941', '331101', '331406', '331407', '331409', '331411', '331435', '331438', '331459', '331610', '331733', '331735', '337587', '337591', '337733', '338819', '338873', '341452', '341455', '341475', '342042', '342071', '342412', '343704', '343773', '343779', '343782', '343807', '343826', '343828', '343830', '343832', '343833', '343838', '343841', '343842', '343844', '343895', '343898', '344590', '344608', '344710', '344762', '344763', '345841', '365249', '365251', '365252', '365262', '365263', '365266', '365281', '370296', '370331']
    nosy_count = 2.0
    nosy_names = ['ronaldoussoren', 'ned.deily']
    pr_nums = ['11043', '11044', '11045', '13603', '13626', '13841', '13849']
    priority = 'critical'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue33725'
    versions = ['Python 3.8']

    @kapilt (Mannequin, Author) commented Jun 1, 2018

    This issue seems to have been reported a few times on various GitHub projects. I've also reproduced it using a brew install of Python 2.7.15. I haven't been able to reproduce it with Python 3.6. Note that this requires a framework build of Python.

    Background on the underlying cause, due to a change in High Sierra:
    http://sealiesoftware.com/blog/archive/2017/6/5/Objective-C_and_fork_in_macOS_1013.html
    A Ruby perspective on the same issue as it manifests for some apps:
    https://blog.phusion.nl/2017/10/13/why-ruby-app-servers-break-on-macos-high-sierra-and-what-can-be-done-about-it/

    The workaround seems to be setting the environment variable OBJC_DISABLE_INITIALIZE_FORK_SAFETY prior to executing Python.
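A minimal sketch of that workaround. Note the hedges discussed later in this thread: the variable must be exported before the interpreter starts, and it only silences the ObjC runtime's check rather than making fork safe.

```shell
# Export the variable *before* launching Python; setting it from inside an
# already-running parent process does not affect the runtime's fork check.
export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES
# Confirm the launched interpreter sees it:
python3 -c 'import os; print(os.environ.get("OBJC_DISABLE_INITIALIZE_FORK_SAFETY"))'
```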

    Other reports

    https://bugs.python.org/issue30837
    ansible/ansible#32499
    imWildCat/scylla#22
    elastic/beats-tester#73
    jhaals/ansible-vault#60

    @kapilt kapilt mannequin added the OS-mac label Jun 1, 2018
    @ronaldoussoren (Contributor) commented Jun 1, 2018

    A better solution is to avoid using fork mode for multiprocessing. The spawn and forkserver modes should work fine.

    The underlying problem is that macOS system frameworks (basically anything higher level than libc) are not safe with respect to fork(2), and fixing that appears to have no priority at all at Apple.
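That advice in runnable form; a minimal sketch (the worker function and pool size here are illustrative):

```python
import multiprocessing as mp

def square(x):
    # Under "spawn" this runs in a freshly started interpreter, so the child
    # never inherits post-fork state from macOS framework code in the parent.
    return x * x

if __name__ == "__main__":
    ctx = mp.get_context("spawn")   # "forkserver" also avoids plain fork
    with ctx.Pool(processes=2) as pool:
        print(pool.map(square, range(4)))
```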

    @ned-deily (Member) commented Jun 1, 2018

    (As a side note, the macOS Pythons provided by python.org installers should not behave differently on macOS 10.13 High Sierra since none of them are built with a 10.13 SDK.)

    @pitrou (Member) commented Jun 1, 2018

    I understand that Apple, with their limited resources, cannot spend expensive engineer manpower on improving POSIX support in macOS </snark>.

    In any case, I'm unsure this bug can be fixed at the Python level. If macOS APIs don't like fork(), they don't like fork(), full stop. As Ronald says, on 3.x you should use "forkserver" (for multiple reasons, not only this issue). On 2.7 you're stuck dealing with the issue by yourself.

    @ronaldoussoren (Contributor) commented Jun 3, 2018

    Antoine, the issue is not necessarily related to POSIX compliance; AFAIK strictly POSIX-compliant code should work just fine. The problem is in higher-level APIs (CoreFoundation, Foundation, AppKit, ...), and appears to be related to using multi-threading in those libraries without spending effort on pre/post-fork handlers to ensure that new processes are in a sane state after fork(). In older macOS versions this could result in hard-to-debug issues; in newer versions the APIs seem to guard against this by aborting when they detect that the pid changed.

    Anyway... I agree that we shouldn't try to work around this in CPython; there are bound to be more problems hidden behind the proposed workaround.

    ---

    <http://www.sealiesoftware.com/blog/archive/2017/6/5/Objective-C_and_fork_in_macOS_1013.html> describes what the environment variable does: it "just" changes the behavior of the ObjC runtime, and doesn't make using macOS system frameworks after a fork any safer.

    @ronaldoussoren (Contributor) commented Jun 3, 2018

    @ned: In the long run the macOS installers should be built using the latest SDK, primarily to get full API coverage and access to all system APIs.

    AFAIK building with the macOS 10.9 SDK still excludes a number of libSystem APIs that would be made available through the posix module when building with a newer SDK.

    That would require some effort, though, to ensure that the resulting binary still works on older versions of macOS (basically similar to the work I've done in the past to weak-link some other symbols in the posix module).

    @ned-deily (Member) commented Jun 4, 2018

    (Note: this is not particularly relevant to the issue here.)

    Ronald:

    In the long run the macOS installers should be built using the latest SDK [...] That's something that would require some effort though to ensure that the resulting binary still works on older versions of macOS

    I agree that being able to build with the latest SDK would be nice, but it's also true it would require effort on our part, both one-time and ongoing, at least for every new macOS SDK release and update, to test with each older system. It would also require that the third-party libraries we build for an installer also behave correctly. And to make full use of it, third-party Python packages with extension modules would also need to behave correctly.

    I see one of the primary use cases for the python.org macOS installers as being for Python app developers who want to provide apps that run on a range of macOS releases. It seems to me that the safest and simplest way to guarantee that python.org macOS Pythons fulfill that need is to continue to always build them on the oldest supported system. Yes, that means that users may miss out on a few features only supported on the more recent macOS releases, but I think that's the right trade-off until we have the resources to truly investigate and decide to support weak linking from current systems.

    @warsaw (Member) commented Nov 13, 2018

    bpo-35219 is where I've run into this problem. I'm still trying to figure out all the details in my own case, but I can confirm that setting the environment variable does not always help.

    @warsaw warsaw changed the title High Sierra hang when using multi-processing macOS crashes after fork with no exec Nov 13, 2018
    @warsaw warsaw changed the title macOS crashes after fork with no exec Pytho crashes on macOS after fork with no exec Nov 13, 2018
    @warsaw warsaw changed the title Pytho crashes on macOS after fork with no exec Python crashes on macOS after fork with no exec Nov 14, 2018
    @warsaw (Member) commented Nov 14, 2018

    Hoo boy. I'm not sure I have the full picture, but things are starting to come into focus. After much debugging, I've narrowed down at least one crash to urllib.request.getproxies(). On macOS (darwin), this ends up calling _scproxy.get_proxies() which calls into the SystemConfiguration framework. I'll bet dollars to donuts that that calls into the ObjC runtime. Thus it is unsafe to call between fork and exec. This certainly seems to be the case even if the environment variable is set.

    The problem is that I think requests.post() probably also ends up in here somehow (still untraced), because by removing our call to urllib.request.getproxies(), we just crash later on when requests.post() is called.

    I don't know what, if anything can be done in Python, except perhaps to document that anything that calls into the ObjC runtime between fork and exec can potentially crash the subprocess.

    @warsaw (Member) commented Nov 14, 2018

    A few other things I don't understand:

    • Why does setting OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES only seem to work when it's set in the shell before the parent process executes? AFAICT, it does *not* work if you set that in os.environ in the parent process before the os.fork().

    • Why does it only crash on the first invocation of our app? Does getproxies() cache the results somehow? There's too much internal application code in the way to know if we're doing something that prevents getproxies() from getting called in subsequent calls.

    • I can't seem to produce a smaller test case.

    @warsaw (Member) commented Nov 14, 2018

    FWIW, I suspect that setting the environment variable only helps if it's done before the process starts. You cannot set it before the fork and have it affect the child.

    @applio (Member) commented Nov 14, 2018

    Barry's effort as well as comments in other links seem to all suggest that OBJC_DISABLE_INITIALIZE_FORK_SAFETY is not comprehensive in its ability to make other threads "safe" before forking.

    "Objective-C classes defined by the OS frameworks remain fork-unsafe" (from @kapilt's first link) suggests we furthermore remain at risk using certain macOS system libraries prior to any call to fork.

    "To guarantee that forking is safe, the application must not be running any threads at the point of fork" (from @kapilt's second link) is an old truth that we continue to fight with even when we know very well that it's the truth.

    For newly developed code, we have the alternative to employ spawn instead of fork to avoid these problems in Python, C, Ruby, etc. For existing legacy code that employed fork and now surprises us by failing-fast on macOS 10.13 and 10.14, it seems we are forced to face a technical debt incurred back when the choice was first made to spin up threads and afterwards to use fork.

    If we didn't already have an "obvious" (zen of Python) way to avoid such problems with spawn versus fork, I would feel this was something to solve in Python. As to helping the poor unfortunate souls who must fight the good fight with legacy code, I am not sure what to do to help though I would like to be able to help.

    @pitrou (Member) commented Nov 14, 2018

    Legacy code is easy to migrate as long as it uses Python 3. Just call

      mp.set_start_method('forkserver')

    at the top of your code and you're done. Some use cases may fail (if sharing non-picklable types), but they're probably not very common.

    @ned-deily (Member) commented Nov 14, 2018

    _scproxy has been known to be problematic for some time, see for instance bpo-31818. That issue also gives a simple workaround: setting urllib's "no_proxy" environment variable to "*" will prevent the calls to the System Configuration framework.
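A sketch of that workaround, assuming urllib's behavior that a non-empty proxy configuration from the environment short-circuits the platform lookup (so `getproxies()` on macOS never falls through to `_scproxy`):

```python
import os

# Workaround from bpo-31818: with no_proxy set, urllib's environment-based
# proxy lookup returns a non-empty mapping, so the System Configuration
# framework is never consulted on macOS.
os.environ["no_proxy"] = "*"

import urllib.request
proxies = urllib.request.getproxies()
print(proxies)
```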

    @applio (Member) commented Nov 14, 2018

    Given the original post mentioned 2.7.15, I wonder if it is feasible to fork near the beginning of execution, then maintain and pass around a multiprocessing.Pool to be used when needed instead of dynamically forking? Working with legacy code is almost always more interesting than you want it to be.

    @warsaw (Member) commented Nov 14, 2018

    > Barry's effort as well as comments in other links seem to all suggest that OBJC_DISABLE_INITIALIZE_FORK_SAFETY is not comprehensive in its ability to make other threads "safe" before forking.

    Right. Setting the env var will definitely not make it thread safe. My understanding (please correct me if I’m wrong!) isn’t that this env var makes it safe, just that it prevents the ObjC runtime from core dumping. So it’s still up to the developer to know whether threads are involved or not. In our cases, these are single threaded applications. I’ve read elsewhere that ObjC doesn’t care if threads have actually been spun up or not.

    > "Objective-C classes defined by the OS frameworks remain fork-unsafe" (from @kapilt's first link) suggests we furthermore remain at risk using certain macOS system libraries prior to any call to fork.

    Actually, it’s unsafe to call anything between fork and exec. Note that this doesn’t just affect Python; this is a pretty common idiom in other scripting languages too, from what I can tell. It’s certainly very common in Python.

    Note too that urllib.request.getproxies() will end up calling into the ObjC runtime via _scproxy, so you can’t even use requests after a fork but before exec.

    What I am still experimenting with is to see if I can define a pthread_atfork handler that will initialize the ObjC runtime before fork is actually called. I saw a Ruby approach like this, but it’s made more difficult in Python because pthread_atfork isn’t exposed to Python. I’m trying to see if I can implement it in ctypes, before I write an extension.

    > "To guarantee that forking is safe, the application must not be running any threads at the point of fork" (from @kapilt's second link) is an old truth that we continue to fight with even when we know very well that it's the truth.

    True, but do realize this problem affects you even in single threaded applications.

    > For newly developed code, we have the alternative to employ spawn instead of fork to avoid these problems in Python, C, Ruby, etc. For existing legacy code that employed fork and now surprises us by failing-fast on macOS 10.13 and 10.14, it seems we are forced to face a technical debt incurred back when the choice was first made to spin up threads and afterwards to use fork.

    It’s tech debt you incur even if you don’t spin up threads. Just fork and do some work in the child before calling exec. If that work enters the ObjC runtime (as in the getproxies example), your child will coredump.

    > If we didn't already have an "obvious" (zen of Python) way to avoid such problems with spawn versus fork, I would feel this was something to solve in Python. As to helping the poor unfortunate souls who must fight the good fight with legacy code, I am not sure what to do to help though I would like to be able to help.

    *If* we can provide a hook to initialize the ObjC runtime in pthread_atfork, I think that’s something we could expose in Python. Then we can say legacy code can just invoke that, and at least you will avoid the worst outcome.
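As an aside on the pthread_atfork idea: since Python 3.7 the interpreter already exposes pthread_atfork-style hooks as `os.register_at_fork()`, which could host such a "warm up the runtime" handler without ctypes or an extension module. A sketch; the handler body here is a hypothetical stand-in, not actual ObjC initialization:

```python
import os

fork_events = []

def warm_up_runtime():
    # Hypothetical stand-in for "initialize the ObjC runtime before fork";
    # on macOS this is where the pre-fork work would go.
    fork_events.append(("before", os.getpid()))

# Registered "before" handlers run in the parent just prior to every fork().
os.register_at_fork(before=warm_up_runtime)

if hasattr(os, "fork"):  # POSIX only
    pid = os.fork()
    if pid == 0:
        os._exit(0)      # child exits immediately
    os.waitpid(pid, 0)
    print(fork_events)
```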

    @warsaw (Member) commented Nov 15, 2018

    I have a reliable way to call *something* in the pthread_atfork prepare handler, but I honestly don't know what to call to prevent the crash.

    The Ruby thread seemed to suggest that you could just dlopen /System/Library/Frameworks/Foundation.framework/Foundation, but that does not work for me. Neither does also loading the CoreFoundation and SystemConfiguration frameworks.

    If anybody has something that will reliably initialize the runtime, I can post my approach (there are a few subtleties). Short of that, I think there's nothing that can be done except ensure that exec is called right after fork.

    @ronaldoussoren (Contributor) commented Dec 5, 2018

    AFAIK there is nothing you can do after calling fork(2) to "reinitialise" the ObjC runtime. And I don't think that's the issue anyway: I suspect that the actual problem is that Apple's system frameworks use multithreading (in particular libdispatch) and don't have code to ensure a sane state after calling fork.

    In Python 3 there is another workaround to avoid problems using multiprocessing: use multiprocessing.set_start_method() to switch away from the "fork" start method to "spawn" or "forkserver" (the latter only works when set_start_method is called before any code that might call into Apple system frameworks).

    @ned-deily (Member) commented Dec 9, 2018

    New changeset ac218bc by Ned Deily in branch 'master':
    bpo-33725: skip test_multiprocessing_fork on macOS (GH-11043)
    ac218bc

    @miss-islington (Contributor) commented Dec 9, 2018

    New changeset d4bcf13 by Miss Islington (bot) in branch '3.7':
    bpo-33725: skip test_multiprocessing_fork on macOS (GH-11043)
    d4bcf13

    @miss-islington (Contributor) commented Dec 9, 2018

    New changeset df5d884 by Miss Islington (bot) in branch '3.6':
    bpo-33725: skip test_multiprocessing_fork on macOS (GH-11043)
    df5d884

    @ned-deily (Member) commented Dec 9, 2018

    Since it looks like multiprocessing_fork is not going to be fixable for macOS, the main issue remaining is how to help users avoid this trap (literally). Should we add a check and issue a warning or error at run time? Or is a doc change sufficient?

    In the meantime, I've merged changes to disable running test_multiprocessing_fork which will sometimes (but not always) segfault on 10.14 Mojave. I should apologize to Barry and others who have run into this. I did notice the occasional segfault when testing with Mojave just prior to its release but it wasn't always reproducible and I didn't follow up on it. Now that the change in 10.14 behavior makes this existing problem with fork no exec more obvious, it's clear that the test segfaults are another manifestation of this.

    @applio (Member) commented Dec 9, 2018

    Do we really need to disable the running of test_multiprocessing_fork entirely on MacOS?

    My understanding so far is that not *all* of the system libraries on the mac are spinning up threads and so we should expect that there are situations where fork alone may be permissible, but of course we don't yet know what those are. Pragmatically speaking, I have not yet seen a report of test_multiprocessing_fork tests triggering this problem but I would like to see/hear that when it is observed (that's my pitch for leaving the tests enabled).

    @applio (Member) commented Dec 9, 2018

    @ned.deily: Apologies, I misread what you wrote -- I would like to see the random segfaults that you were seeing on Mojave if you can still point me to a few.

    @vstinner (Member) commented May 29, 2019

    To be clear, what is unsafe on macOS (as of 10.13, but even more so on 10.14) is calling into the Objective-C runtime between fork and exec. The problem for Python is that it’s way too easy to do that implicitly, thus causing macOS to abort the subprocess in surprising ways.

    Do only a few Python modules use the Objective-C runtime? Or is it basically "everything"?

    If it's just a few, would it be possible to emit a warning or even an exception if they are called in a child process after fork?

    @ned-deily (Member) commented May 29, 2019

    > To be clear, what is unsafe on macOS (as of 10.13, but even more so on 10.14) is calling into the Objective-C runtime between fork and exec.

    No, it has *always* been unsafe. What's new as of 10.13/14 is that macOS tries much harder at runtime to detect such cases and more predictably cause an error rather than let the process run on and possibly fail nondeterministically.

    > Do only a few Python modules use the Objective-C runtime? Or is it basically "everything"?

    I don't think we should try to second-guess this. We now recognize that using fork like this on macOS has always been dangerous. For some programs it will be fine, for others it won't. People have had many macOS and Python releases to deal with this; if it works for their application, we shouldn't be changing the default for them. But let's make it easier for new users to do the right thing - first by documenting the pitfall, then, in 3.8, changing the default.

    @warsaw (Member) commented May 29, 2019

    > To be clear, what is unsafe on macOS (as of 10.13, but even more so on 10.14) is calling into the Objective-C runtime between fork and exec. The problem for Python is that it’s way too easy to do that implicitly, thus causing macOS to abort the subprocess in surprising ways.
    >
    > Do only a few Python modules use the Objective-C runtime? Or is it basically "everything"?
    >
    > If it's just a few, would it be possible to emit a warning or even an exception if called in a child process after fork?

    I think it’s hard to know, but I found it through a path that led from requests to _scproxy.c. Here’s everything I know about the subject:

    https://wefearchange.org/2018/11/forkmacos.rst.html

    So yes, it’s theoretically possible to do *some* work between fork and exec and not crash, and it’s of course perfectly safe to call exec pretty much right after fork. It’s just hard to know for sure, and there are surprising ways to get into the Objective-C runtime.

    I think we won’t be able to work around all of Apple’s choices here. Documentation is the best way to handle it in <=3.7, and changing the default makes sense to me for 3.8.

    @warsaw (Member) commented May 29, 2019

    > > To be clear, what is unsafe on macOS (as of 10.13, but even more so on 10.14) is calling into the Objective-C runtime between fork and exec.
    >
    > No, it has *always* been unsafe. What's new as of 10.13/14 is that macOS tries much harder at runtime to detect such cases and more predictably cause an error rather than let the process run on and possibly fail nondeterministically.

    Right, thanks for the additional nuance. I think what changed is that in 10.13, Apple added a warning output when this condition occurred, and in 10.14 they actually abort the subprocess.

    @vstinner (Member) commented Jun 4, 2019

    Ned Deily:

    > No, it has *always* been unsafe. What's new as of 10.13/14 is that macOS tries much harder at runtime to detect such cases and more predictably cause an error rather than let the process run on and possibly fail nondeterministically.

    Hum, in the doc, I wrote:

        .. versionchanged:: 3.8

        On macOS, *spawn* start method is now the default: *fork* start method is no
        longer reliable on macOS, see :issue:`33725`.

    Should we change this text? Any suggestions?

    @warsaw (Member) commented Jun 4, 2019

    > Ned Deily:
    > > No, it has *always* been unsafe. [...]
    >
    > Hum, in the doc, I wrote:
    >
    >     .. versionchanged:: 3.8
    >
    >     On macOS, *spawn* start method is now the default: *fork* start method is no
    >     longer reliable on macOS, see :issue:`33725`.
    >
    > Should we change this text? Any suggestions?

    Thanks Victor. I don’t think “reliable” is strong enough, since this will definitely lead to core dumps under certain conditions. What about:

    On macOS, the *spawn* start method is now the default. The *fork* start method should
    be considered unsafe as it can lead to crashes of the subprocess. See :issue:`33725`.

    @vstinner (Member) commented Jun 5, 2019

    > Thanks Victor. I don’t think “reliable” is strong enough, since this will definitely lead to core dumps under certain conditions. What about: (...)

    That sounds better: I wrote PR 13841.

    @vstinner (Member) commented Jun 5, 2019

    New changeset 1e77ab0 by Victor Stinner in branch 'master':
    bpo-33725, multiprocessing doc: rephase warning against fork on macOS (GH-13841)
    1e77ab0

    @miss-islington (Contributor) commented Jun 5, 2019

    New changeset d74438b by Miss Islington (bot) in branch '3.8':
    bpo-33725, multiprocessing doc: rephase warning against fork on macOS (GH-13841)
    d74438b

    @ned-deily (Member) commented Jun 17, 2019

    As far as I can tell, the only thing left to do for this issue is to add a documentation warning to the 3.7 docs similar to what was added for 3.8, but without the change in default. A PR would be nice.

    @mouse07410 (Mannequin) commented Mar 29, 2020

    The fix applied for this problem actually broke multiprocessing on macOS. The change of the default from 'fork' to the new 'spawn' causes programs to crash in spawn.py with FileNotFoundError: [Errno 2] No such file or directory.

    I've tested this on macOS Catalina 10.15.3 and 10.15.4, with Python-3.8.2 and Python-3.7.7.

    With Python-3.7.7 everything works as expected.

    Here's the output:

        $ python3.8 multi1.py
        Traceback (most recent call last):
          File "<string>", line 1, in <module>
          File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
            exitcode = _main(fd, parent_sentinel)
          File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 126, in _main
            self = reduction.pickle.load(from_parent)
          File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/synchronize.py", line 110, in __setstate__
            self._semlock = _multiprocessing.SemLock._rebuild(*state)
        FileNotFoundError: [Errno 2] No such file or directory

    Here's the program:

        #!/usr/bin/env python3

        # Test "multiprocessing" package included with Python-3.6+

        # Usage:
        # ./multi1.py [nElements [nProcesses [tSleep]]]

        # nElements  - total number of integers to put in the queue
        #              default: 100
        # nProcesses - total number of parallel processes/threads
        #              default: number of physical cores available
        # tSleep     - number of milliseconds for a thread to sleep
        #              after it retrieved an element from the queue
        #              default: 17

        # Algorithm:
        # 1. Creates a queue and adds nElements integers to it
        # 2. Creates nProcesses threads
        # 3. Each thread extracts an element from the queue and sleeps for tSleep milliseconds

        import sys, queue, time
        import multiprocessing as mp


        def getElements(q, tSleep, idx):
            l = []  # list of pulled numbers
            while True:
                try:
                    l.append(q.get(True, .001))
                    time.sleep(tSleep)
                except queue.Empty:
                    if q.empty():
                        print(f'worker {idx} done, got {len(l)} numbers')
                        return


        if __name__ == '__main__':
            nElements = int(sys.argv[1]) if len(sys.argv) > 1 else 100
            nProcesses = int(sys.argv[2]) if len(sys.argv) > 2 else mp.cpu_count()
            tSleep = float(sys.argv[3]) if len(sys.argv) > 3 else 17

            # Uncomment the following line to make it work with Python-3.8+
            # mp.set_start_method('fork')

            # Fill the queue with numbers from 0 to nElements
            q = mp.Queue()
            for k in range(nElements):
                q.put(k)

            # Start worker processes
            for m in range(nProcesses):
                p = mp.Process(target=getElements, args=(q, tSleep / 1000, m))
                p.start()

    @mouse07410 mouse07410 mannequin added the type-crash label Mar 29, 2020
    @mouse07410 (Mannequin) commented Mar 29, 2020

    Tried 'spawn', 'fork', 'forkserver'.

    • 'spawn' causes consistent FileNotFoundError: [Errno 2] No such file or directory;
    • 'fork' consistently works (tested on machines with 4 and 20 cores);
    • 'forkserver' causes roughly half of the processes to crash with FileNotFoundError, the other half succeeds (weird!).

    @mdickinson (Member) commented Mar 29, 2020

    @Mouse: see bpo-28965. The fix for the code you show is to join the child processes before the main process starts exiting.

    @mouse07410 (Mannequin) commented Mar 29, 2020

    @mark.dickinson, the issue you referred to did not show a working sample. Could you demonstrate on my example how it should be applied? Thanks!

    @mouse07410 (Mannequin) commented Mar 29, 2020

    Also, adding p.join() immediately after p.start() in my sample code showed this timing:

    $ time python3.8 multi1.py 
    worker 0 done, got 100 numbers
    worker 1 done, got 0 numbers
    worker 2 done, got 0 numbers
    worker 3 done, got 0 numbers
    
    real	0m2.342s
    user	0m0.227s
    sys	0m0.111s
    $ 
    

    Setting instead start to fork showed this timing:

    $ time python3.8 multi1.py 
    worker 2 done, got 25 numbers
    worker 0 done, got 25 numbers
    worker 1 done, got 25 numbers
    worker 3 done, got 25 numbers
    
    real	0m0.537s
    user	0m0.064s
    sys	0m0.040s
    $ 
    

    The proposed fix is roughly four times slower, compared to reverting start to fork.

    @mdickinson (Member) commented Mar 29, 2020

    @Mouse: replace the last block of your code with something like this:

        # Start worker processes
        workers = []
        for m in range(nProcesses):
            p = mp.Process(target=getElements, args=(q, tSleep / 1000, m))
            workers.append(p)
            p.start()
    
        # Wait for all workers to complete
        for p in workers:
            p.join()

    But I don't think this tracker issue is the right place to have this conversation. It's highly unlikely that this change will be reverted - there are strong reasons to avoid fork on macOS, and the issue you're reporting isn't directly related to this one: it's an issue with using "spawn" on any OS.

    However, there may be scope for improving the documentation so that fewer users fall into this trap. I'd suggest opening another issue for that, or continuing the conversation on bpo-28965.

    @mouse07410 (Mannequin) commented Mar 29, 2020

    @mark.dickinson, thank you. Following your suggestion, I've added a comment in bpo-28965, and created a new issue https://bugs.python.org/issue40106.

    @terryjreedy (Member) commented May 29, 2020

    Since the default has been different on different systems for as long as I remember, I see no reason to break code on *nix in the name of 'consistency'. Asyncio also works differently on different systems.

    Aside from that idea, is there anything else left for this issue? Especially at this time as opposed to some possible future when macOS changes?

    @warsaw (Member) commented May 29, 2020

    I don't think there's really anything more to do here. I'm closing the issue. Let's open a new one if needed at some future point.

    @warsaw warsaw closed this as completed May 29, 2020
    @ahmedsayeed1982 ahmedsayeed1982 mannequin added stdlib 3.7 and removed OS-mac 3.8 labels Nov 4, 2021
    @eryksun eryksun added OS-mac 3.8 and removed stdlib 3.7 labels Nov 4, 2021
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022