Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use os.posix_spawn in subprocess #79718

Closed
nanjekyejoannah opened this issue Dec 19, 2018 · 61 comments
Closed

use os.posix_spawn in subprocess #79718

nanjekyejoannah opened this issue Dec 19, 2018 · 61 comments
Assignees
Labels
3.8 (EOL) end of life stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@nanjekyejoannah
Copy link
Contributor

BPO 35537
Nosy @gpshead, @vstinner, @serhiy-storchaka, @koobs, @izbyshev, @pablogsal, @miss-islington, @nanjekyejoannah, @kevans91
PRs
  • bpo-35537: use os.posix_spawn in subprocess #11242
  • bpo-35537: subprocess uses os.posix_spawn in some cases #11452
  • bpo-35537: subprocess can use posix_spawn with pipes #11575
  • bpo-35537: subprocess can now use os.posix_spawnp #11579
  • Revert "bpo-35537: subprocess can now use os.posix_spawnp (GH-11579)" #11582
  • bpo-35537: Add setsid parameter to os.posix_spawn() and os.posix_spawnp(). #11608
  • bpo-35537: Document posix_spawn() change in subprocess #11668
  • bpo-35537: Fix docstring typos in _use_posix_spawn() #11684
  • subprocess: close pipes/fds by using ExitStack #11686
  • bpo-35537: Skip test_start_new_session() of posix_spawn #11718
  • [WIP] bpo-35537: Fix function name in os.posix_spawnp() errors #11719
  • bpo-35537: Rewrite setsid test for os.posix_spawn #11721
  • bpo-35537: Change subprocess to use posix_spawnp instead of posix_spawn #11917
  • [3.8] bpo-35537: Rewrite setsid test for os.posix_spawn (GH-11721) #14093
  • bpo-35537: Add a comment to _Py_RestoreSignals() #18792
  • Files
  • subprocess_bench.py
  • subprocess_bench_stdout.py
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/vstinner'
    closed_at = <Date 2019-06-05.08:27:39.280>
    created_at = <Date 2018-12-19.16:47:59.385>
    labels = ['3.8', 'type-feature', 'library']
    title = 'use os.posix_spawn in subprocess'
    updated_at = <Date 2020-03-05.16:41:09.872>
    user = 'https://github.com/nanjekyejoannah'

    bugs.python.org fields:

    activity = <Date 2020-03-05.16:41:09.872>
    actor = 'vstinner'
    assignee = 'vstinner'
    closed = True
    closed_date = <Date 2019-06-05.08:27:39.280>
    closer = 'vstinner'
    components = ['Library (Lib)']
    creation = <Date 2018-12-19.16:47:59.385>
    creator = 'nanjekyejoannah'
    dependencies = []
    files = ['48011', '48057']
    hgrepos = []
    issue_num = 35537
    keywords = ['patch', 'patch', 'patch']
    message_count = 61.0
    messages = ['332151', '332204', '332210', '332212', '332213', '332216', '332217', '332219', '332222', '332223', '332235', '332236', '332237', '332238', '332239', '332266', '332270', '332272', '332393', '332565', '333123', '333126', '333127', '333131', '333571', '333629', '333646', '333656', '333657', '333738', '333739', '333740', '333744', '333745', '333749', '333771', '333794', '333795', '333799', '333800', '333803', '333805', '333820', '333825', '333839', '333889', '333902', '333994', '334268', '334269', '334288', '334292', '334305', '334337', '334661', '334662', '334685', '340838', '344683', '345617', '345621']
    nosy_count = 9.0
    nosy_names = ['gregory.p.smith', 'vstinner', 'serhiy.storchaka', 'koobs', 'izbyshev', 'pablogsal', 'miss-islington', 'nanjekyejoannah', 'kevans']
    pr_nums = ['11242', '11452', '11575', '11579', '11582', '11608', '11668', '11684', '11686', '11718', '11719', '11721', '11917', '14093', '18792']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue35537'
    versions = ['Python 3.8']

    @nanjekyejoannah
    Copy link
    Contributor Author

    On Linux, posix_spawn() uses vfork() instead of fork() which blocks
    the parent process. The child process executes exec() early and so we
    don't pay the price of duplicating the memory pages (the thing which
    tracks memory pages of a process).

    On macOS, posix_spawn() is a system call, so the kernel is free to use
    fast-paths to optimize it as they want.

    posix_spawn() is faster but it's also safer: it allows us to do a lot of
    "actions" before exec(), before executing the new program. For
    example, you can close files and control signals. Again, on macOS,
    these actions are "atomic" since it's a system call. On Linux, glibc uses a very good implementation which has no race condition.

    Currently, Python uses a C extension _posixsubprocess which more or
    less reimplements posix_spawn(): close file descriptors, make some
    file descriptors inheritable or not, etc. It is very tricky to write
    correct code: code run around fork() is very fragile. In theory, the
    POSIX specification only allows to use "async-signal-safe" functions
    after fork()...

    So it would be great to avoid _posixsubprocess whenever possible for
    (1) speed (2) correctness.

    @nanjekyejoannah nanjekyejoannah added 3.8 (EOL) end of life stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Dec 19, 2018
    @vstinner vstinner self-assigned this Dec 20, 2018
    @vstinner
    Copy link
    Member

    "posix_spawn() is faster"

    This assumption needs a benchmark!

    I ran a benchmark on Linux (Fedora 29, Linux kernel 4.19.9-300.fc29.x86_64, glibc 2.28) using attached subprocess_bench.py.

    I rebased the PR 11242 on master and run the benchmark in virtual environment:

    git co pr/11242
    # current branch: PR 11242
    make distclean
    ./configure
    make
    ./python -m venv env
    env/bin/python -m pip install perf
    env/bin/python subprocess_bench.py -v -o posix_spawn.json
    git co master
    env/bin/python subprocess_bench.py -v -o fork_exec.json
    env/bin/python -m perf compare_to fork_exec.json posix_spawn.json

    The result is quite explicit: the PR makes subprocess.Popen 61x faster!!!

    $ env/bin/python -m perf compare_to fork_exec.json posix_spawn.json 
    Mean +- std dev: [fork_exec] 27.1 ms +- 0.4 ms -> [posix_spawn] 447 us +- 163 us: 60.55x faster (-98%)

    That's the best case:

    • The parent process (Python) allocated 2 GiB of memory: that's not uncommon for large application. On OpenStack for example, it's common that a process takes more than 1 GiB.
    • The child process has a short execution time and allocates few memory.

    On my config (Fedora 29, Linux kernel 4.19.9-300.fc29.x86_64, glibc 2.28), os.posix_spawn() uses vfork():

    $ strace -o trace ./python -c 'import subprocess; subprocess.run(["/bin/true"], close_fds=False, restore_signals=False)'
    $ grep clone trace
    clone(child_stack=0x7fab28ac0ff0, flags=CLONE_VM|CLONE_VFORK|SIGCHLD) = 23073

    I guess that the 61x speedup mostly comes from vfork().

    See also bpo-34663 for previous discussion about vfork() and os.posix_spawn(). In short, the glibc is smart and detects when vfork() can be used or not.

    @vstinner
    Copy link
    Member

    Benchmark on macOS Mojave (10.14.2), x86_64:

    $ ./python.exe -m perf compare_to fork_exec.json posix_spawn.json 
    Mean +- std dev: [fork_exec] 2.79 ms +- 0.06 ms -> [posix_spawn] 1.64 ms +- 0.03 ms: 1.70x faster (-41%)

    The speedup is "only" 1.7x faster, it's less impressive. Maybe fork()+exec() is more efficient on macOS than on Linux.

    @vstinner
    Copy link
    Member

    Benchmark on FreeBSD:

    $ env/bin/python -m perf compare_to fork_exec.json posix_spawn.json 
    Mean +- std dev: [fork_exec] 52.8 ms +- 4.7 ms -> [posix_spawn] 499 us +- 45 us: 105.91x faster (-99%)

    Wow! 106x faster!

    @vstinner
    Copy link
    Member

    See also:

    "How a fix in Go 1.9 sped up our Gitaly service by 30x" (hint: posix_spawn)
    https://about.gitlab.com/2018/01/23/how-a-fix-in-go-19-sped-up-our-gitaly-service-by-30x/

    fork/exec vs. posix_spawn benchmarks:
    https://github.com/rtomayko/posix-spawn#benchmarks

    @serhiy-storchaka
    Copy link
    Member

    We need to add tests for all corner cases. I.e. disabling any of conditions for posix_spawn should cause the failure.

    We need to do also something with PyOS_BeforeFork/PyOS_AfterFork_Child/PyOS_AfterFork_Parent.

    @vstinner
    Copy link
    Member

    We need to add tests for all corner cases. I.e. disabling any of conditions for posix_spawn should cause the failure.

    Do you mean running tests twice, one with posix_spawn(), one without?

    We need to do also something with PyOS_BeforeFork/PyOS_AfterFork_Child/PyOS_AfterFork_Parent.

    Can you please elaborate? posix_spawn() may or may not use fork() internally.

    @serhiy-storchaka
    Copy link
    Member

    Do you mean running tests twice, one with posix_spawn(), one without?

    I mean that after writing tests they can be tested manually by disabling conditions for posix_spawn one by one. I.e. some tests should fail if remove "stdout is None" and some tests should fail if remove "not close_fds", etc.

    Can you please elaborate? posix_spawn() may or may not use fork() internally.

    _posixsubprocess.fork_exec() calls PyOS_BeforeFork/PyOS_AfterFork_Child/PyOS_AfterFork_Parent. If use os.posix_spawn(), these calls will be omitted. This is a behavior change. We should either call these functions manually before/after os.posix_spawn() (but I do not know what to do with PyOS_AfterFork_Child), or disable using os.posix_spawn() if non-trivial callbacks were added, or use other methods for calling registered callbacks. But the stdlib already adds callbacks in the threading and random modules.

    @izbyshev
    Copy link
    Mannequin

    izbyshev mannequin commented Dec 20, 2018

    Serhiy, PyOS_* functions are called only if preexec_fn != None. But it will never be possible to implement support for preexec_fn (and some other subprocess features, e.g. close_fds) if processes are run via posix_spawn, so I don't see why anything should be done with those functions.

    @serhiy-storchaka
    Copy link
    Member

    Oh, nice! So this is not an obstacle for using os.posix_spawn().

    @izbyshev
    Copy link
    Mannequin

    izbyshev mannequin commented Dec 20, 2018

    Victor and Joannah, thanks for working on adding vfork() support to subprocess. Regarding speedups in the real world, I can share a personal anecdote. Back at the time when AOSP was built with make (I think it was AOSP 5) I've observed ~2x slowdown (or something close to that) if fork() is used instead of vfork() in make. This is the slowdown of the *whole* build (not just process creation time), and it's dramatic given the amount of pure computation (compilation of C++/Java code) involved in building AOSP. The underlying reason was that make merged all AOSP subproject makefiles into a single gigantic one, so make consumed more than 1 GB of RAM, and each shell invocation in recipes resulted in copying a large page table.

    That said, I'm not sure that the chosen approach of adding posix_spawn() "fast paths" for particular combinations of arguments is an optimal way to expose vfork() benefits to most users. For example, close_fds is True by default in subprocess, and I don't think there is a reliable way to use posix_spawn() in this case. So users would need to use "magic" combinations of arguments to subprocess.Popen to benefit from vfork(). Another example is "cwd" parameter: there is no standard posix_spawn file action to change the working directory in the child, so such a seemingly trivial operation would trigger the "slow" fork-exec path.

    Another approach would be to use vfork-exec instead fork-exec in _posixsubprocess. I know that previous discussions were resolved against vfork(), but I'm still not convinced and suggest to reevaluate those decisions again. Some arguments:

    1. While POSIX contain scare-wording forbidding to do pretty anything in a vfork-spawned child, real systems where posix_spawn() is not a system call call vfork() and then execute all specified actions in the child. Cases where vfork() is avoided seem to be a quality-of-implementation issue, not a fundamental issue (for example, until recently glibc used fork() in some cases because of heap memory allocations, but they could be avoided). In practice, there is no problem with calling at least async-signal-safe functions in vfork-children, and there is no technical reason why there would be any problem.

    2. _posixsubprocess is already very careful and calls only async-signal-safe functions in almost all cases (an obvious exception is preexec_fn, which is discouraged anyway).

    3. Our use case for vfork() is restricted, so some concerns don't apply to us. For example, the setuid() problem outlined in Rich Felker's post (https://ewontfix.com/7) doesn't seem to apply. The problem with signal handlers from the same post should be possible to avoid except for preexec_fn, but we'd have to fallback to fork() in this case anyway due to memory allocation.

    4. In the standard example of fork() unsafety in multithreaded processes (state of memory-based locks is copied to the child, and there could be nobody to unlock them), vfork() is *safer* than fork() because it still shares memory with the parent, and all threads other than the parent one are running. In particular, it should be perfectly safe to use any memory allocator that protects its state with something like futexes and atomics in a vfork-child even if other threads of the parent are using it concurrently.

    5. Other language runtimes are already using vfork(). Java has been doing it for ages, and Victor referenced Go.

    Some possible counter-arguments:

    1. We seem to be reimplementing posix_spawn(). In fact, due to (2) above, we can fully reuse the existing fork-exec code, with additional tweaks that doesn't seem hard at this point.

    2. My points above are based on Linux, but I don't know much about other Unix-likes, so in theory additional complications could exist. I can't, however, imagine any fundamental complication.

    In the end, I think that migrating subprocess to vfork-exec would have more impact for users than adding "fast paths" and have consistent performance regardless of subprocess.Popen arguments (with few exceptions). Please consider it. Thanks!

    @vstinner
    Copy link
    Member

    In the end, I think that migrating subprocess to vfork-exec would have more impact for users than adding "fast paths" and have consistent performance regardless of subprocess.Popen arguments (with few exceptions). Please consider it. Thanks!

    I'm open to experiment to use vfork() in _posixsubprocess, but I still want to use posix_spawn():

    • it's standard
    • it's safer
    • it can be more efficient

    On macOS, posix_spawn() is implemented as a syscall.

    Using vfork() can cause new issues: that's why there is a POSIX_SPAWN_USE_VFORK flag (the caller had to explicitly enable it). See also bpo-34663 the history of vfork in posix_spawn() in the glibc. I'm not against using it, but I'm not sure that vfork() can be used in all cases, when all subprocess options are used. Maybe you will end up with the same state: vfork() used or not depending on subprocess.Popen options...

    For example, close_fds is True by default in subprocess, and I don't think there is a reliable way to use posix_spawn() in this case.

    Since PEP-446 has been implemented in Python 3.4, I would be interested to revisit close_fds default value: don't close by default... But I'm not sure if it's *safe* to change the default... At least, we may promote close_fds=False in the doc explaining that it's faster and should be safer in the common case.

    Another example is "cwd" parameter: there is no standard posix_spawn file action to change the working directory in the child, so such a seemingly trivial operation would trigger the "slow" fork-exec path.

    I don't think that it's a bug that subprocess have different performances depending on the flags. It would be a bug if the behavior would be diffrent. I don't think that it's the case.

    @izbyshev
    Copy link
    Mannequin

    izbyshev mannequin commented Dec 20, 2018

    I'm open to experiment to use vfork() in _posixsubprocess
    Are you going to do experiments? If not, I can try to do some in early January.

    Using vfork() can cause new issues: that's why there is a POSIX_SPAWN_USE_VFORK flag (the caller had to explicitly enable it). See also bpo-34663 the history of vfork in posix_spawn() in the glibc.

    I've studied that, and that's what I referred to as "quality-of-implementation" problem. After glibc devs removed heap allocations and tweaked some other things, they could use vfork() in all cases. "musl" libc never had those problems and used vfork() from the beginning.

    @vstinner
    Copy link
    Member

    Are you going to do experiments? If not, I can try to do some in early January.

    I'm only interested to use posix_spawn(). If you want to experiment vfork() in _posixsubprocess: please go ahead!

    I've studied that, and that's what I referred to as "quality-of-implementation" problem. After glibc devs removed heap allocations and tweaked some other things, they could use vfork() in all cases. "musl" libc never had those problems and used vfork() from the beginning.

    _posixsubprocess shouldn't allocate memory on the heap *after* fork(), only before. If it does, it's a bug. Last time I checked _posixsubprocess, it looked to be written properly.

    But the subprocess module has the crazy 'preexec_fn' option which makes things complicated... Maybe we should deprecate (and then remove) it, it's dangerous and it can be avoided in many cases. Using a launcher program is safer than relying on preexec_fn. In OpenStack, I reimplemented prlimit in pure Python. It's an example of launcher to set an option in the child process:
    https://github.com/openstack/oslo.concurrency/blob/master/oslo_concurrency/prlimit.py

    @vstinner
    Copy link
    Member

    Serhiy Storchaka, Alexey Izbyshev: I read and understood your valid concerns.

    restore_signals=True will be quickly implemented, and so posix_spawn() code path will be tested by more tests of test_subprocess. If not, I will ensure that missing tests will be added.

    Enhance _posixsubprocess to use vfork() is an interesting project, but IMHO it's complementary and doesn't replace all advantages of posix_spawn().

    I am going to merge the PR tomorrow, except if someone sees a good reason to not merge it. I prefer to merge the PR early in the 3.8 development cycle to have more time to handle any issue if someone notice bugs. If something goes badly, we will be able to easily revert the change. Don't worry, I like to revert changes ;-)

    Again, I'm mentoring Joannah who is learning Python, so I prefer to move step by step (with small steps :-)). We will support more and more subprocess.Popen options.

    @vstinner
    Copy link
    Member

    I wasn't sure how posix_spawn() handles file descriptors of standard streams with O_CLOEXEC flag set.

    I wrote a test. Result: _posixsubprocess and os.posix_spawn() have the same behavior! If fd 2 (stderr) is marked with O_CLOEXEC, the fd 2 is closed in the child process, as expected. (There is no black magic to keep it open.)

    $ cat x.py 
    import subprocess, sys, os
    args = [sys.executable, 'y.py']
    os.set_inheritable(2, False)
    subprocess.run(args, close_fds=False, restore_signals=False)
    
    $ cat y.py 
    import os
    def fd_valid(fd):
        try:
            os.fstat(fd)
            return True
        except OSError:
            return False
    for fd in (0, 1, 2):
        print("fd %s valid? %s" % (fd, fd_valid(fd)))
    
    $ python3 x.py  # unpatched
    fd 0 valid? True
    fd 1 valid? True
    fd 2 valid? False
    
    $ ./python x.py # patched, use posix_spawn()
    fd 0 valid? True
    fd 1 valid? True
    fd 2 valid? False

    @vstinner
    Copy link
    Member

    Options that should be easy to support later:

    • relative executable: use shutil.which() or expose C posix_spawnp() function as os.posix_spawnp()

    • env: just pass it as the 3rd argument of os.posix_spawn() (trivial patch)

    • restore_signals: use setsigdef (Joannah has a patch)

    I don't see how to support following options:

    • preexec_fn: I really hate this feature...

    • cwd

    • start_new_session

    • close_fds: there is posix_spawn_file_actions_addclose(), but I'm not sure that we can create a list of file descriptor in a reliable way, since a different thread can open a FD in the meanwhile... Currently, _posixsubprocess iterates on /proc/self/fd/ *after* the fork, when there is a single thread.

    • pass_fds: there is not API to mark a fd as inheritable (clear O_CLOEXEC flag). We cannot just change the O_CLOEXEC temporarily since a different thread can spawn a child process "at the same time". The GIL doesn't protect C threads which don't use the GIL.

    For pipes like stdout=PIPE or stdout=fd, I didn't look yet how to implement that properly.

    Maybe one way to work around the pass_fds limitation in an applicaton is to mark the FDs to pass as inheritable (os.set_inheritable(fd, True)) and use close_fds=False:

    fd = ...
    subprocess.Popen(..., pass_fds={fd})
    os.close(fd)

    would become:

    fd = ...
    os.set_inheritable(fd, True)
    subprocess.Popen(..., close_fds=False)
    os.close(fd)

    But this pattern is not thread if the application has other threads. So that should be the responsibility of the developer, not of Python, to write "unsafe" code" which requires the knownledge of the application behavior.

    @vstinner
    Copy link
    Member

    I am going to merge the PR tomorrow, except if someone sees a good reason to not merge it.

    I changed my mind and I decided to wait until I'm back from holiday instead :-)
    #11242 (comment)

    The function is more tricky than what I expected!

    @izbyshev
    Copy link
    Mannequin

    izbyshev mannequin commented Dec 23, 2018

    • cwd

    posix_spawn_file_actions_addchdir_np() is scheduled for glibc 2.29 [1] and exists in Solaris [2] (though its suffix indicates that it's "non-portable" -- not in POSIX). POSIX also has a bug for this [7].

    • start_new_session

    posix_spawnattr_setflags() supports POSIX_SPAWN_SETSID in musl [3] and glibc 2.26 [4] and is scheduled to be added into POSIX. Solaris also has POSIX_SPAWN_SETSID_NP.

    • pass_fds: there is not API to mark a fd as inheritable (clear O_CLOEXEC flag)

    POSIX has a bug for this [5]. It's marked fixed, but the current POSIX docs doesn't reflect the changes. The idea is to make posix_spawn_file_actions_adddup2() clear O_CLOEXEC if the source and the destination are the same (unlike dup2(), which would do nothing). musl has already implemented the new behavior [6], but glibc hasn't.

    Another alternative (also described in the POSIX bug report) is to remove O_CLOEXEC via a side-effect of posix_spawn_file_actions_adddup2() to a dummy descriptor and then dup2() it back again. This is not correct in general case -- record locks (fcntl(F_SETLK)) associated with the file referred to by the descriptor are released on the second dup2() -- but in our case those locks are not preserved for the new process anyway, so this is not a problem. I'm not aware of other correctness problems with this, but one needs to check for platforms where a file descriptor may have associated flags other than O_CLOEXEC (such flags would be cleared by dup2).

    For pipes like stdout=PIPE or stdout=fd, I didn't look yet how to implement that properly

    You'd need to reimplement the corresponding logic from _posixsubprocess. Basically, posix_spawn_file_actions_adddup2() and posix_spawn_file_actions_addclose() do the job, but you need to avoid corner cases regarding low fds (0, 1, 2) that might have been reused for pipes.

    I don't know correct solutions for close_fds. It is already hard: it's implemented in async-safe way only for Linux now (and only if /proc is available) or if /proc is not available and the brute-force implementation is used.

    And of course, in theory you could implement any of the above by a helper binary as you did for prlimit.

    [1] https://sourceware.org/bugzilla/show_bug.cgi?id=17405
    [2] https://docs.oracle.com/cd/E88353_01/html/E37843/posix-spawn-file-actions-addchdir-np-3c.html
    [3] https://git.musl-libc.org/cgit/musl/commit/?id=bb439bb17108b67f3df9c9af824d3a607b5b059d
    [4] https://www.sourceware.org/ml/libc-alpha/2017-08/msg00010.html
    [5] http://austingroupbugs.net/view.php?id=411
    [6] https://git.musl-libc.org/cgit/musl/commit/?id=6fc6ca1a323bc0b6b9e9cdc8fa72221ae18fe206
    [7] http://austingroupbugs.net/view.php?id=1208

    @gpshead
    Copy link
    Member

    gpshead commented Dec 26, 2018

    Thanks for all your research and reference links on this! As a _posixsubprocess maintainer, I am not against either posix_spawn or vfork being used directly in the future when feasible.

    A challenge, especially with platform specific vfork, is making sure we understand exactly which platforms it can work properly on and checking for those both at compile time _and_ runtime (running kernel version and potentially the runtime libc version?) so that we can only use it in situations we are sure it is supposed to behave as desired in. My guiding philosophy: Be conservative on choosing when such a thing is safe to use.

    @vstinner
    Copy link
    Member

    vstinner commented Jan 6, 2019

    Gregory:

    Thanks for all your research and reference links on this! As a _posixsubprocess maintainer, I am not against either posix_spawn or vfork being used directly in the future when feasible.

    Are you against using posix_spawn() in subprocess? Or only in _posixsubprocess?

    --

    Alexey Izbyshev identified a difference between _posixsubprocess and posix_spawn() PR 11242: error handling depends a lot on the libc implementation.
    #11242 (comment)

    • On recent glibc, posix_spawn() fails (non-zero return value) and set errno to ENOENT if the executed program doesn't exist... but not all platforms have this behavior.
    • On FreeBSD, if setting posix_spawn() "attributes" or execute posix_spawn() "file actions" fails, posix_spawn() succeed but the child process exits immediately with exit code 127 without trying to call execv(). If execv() fails, posix_spawn() succeed, but the child process exit with exit code 127.
    • The worst seems to be: "In my test on Ubuntu 14.04, ./python -c "import subprocess; subprocess.call(['/xxx'], close_fds=False, restore_signals=False)" silently returns with zero exit code."

    execv() can fail for a lot of different reasons. Extract of Linux execve() manual page:
    ---
    E2BIG The total number of bytes in the environment (envp) and argument list (argv) is too large.

       EACCES Search permission is denied on a component of the path prefix of filename or the name of a script interpreter.  (See also
              path_resolution(7).)
    
       EACCES The file or a script interpreter is not a regular file.
    
       EACCES Execute permission is denied for the file or a script or ELF interpreter.
    
       EACCES The filesystem is mounted noexec.
    
       EAGAIN (since Linux 3.1)
              Having  changed  its  real  UID  using one of the set*uid() calls, the caller was—and is now still—above its RLIMIT_NPROC
              resource limit (see setrlimit(2)).  For a more detailed explanation of this error, see NOTES.
    
       EFAULT filename or one of the pointers in the vectors argv or envp points outside your accessible address space.
    
       EINVAL An ELF executable had more than one PT_INTERP segment (i.e., tried to name more than one interpreter).
    
       EIO    An I/O error occurred.
    
       EISDIR An ELF interpreter was a directory.
    
       ELIBBAD
              An ELF interpreter was not in a recognized format.
    
       ELOOP  Too many symbolic links were encountered in resolving filename or the name of a script or ELF interpreter.
    
       ELOOP  The maximum recursion limit was reached during  recursive  script  interpretation  (see  "Interpreter  scripts",  above).
              Before Linux 3.8, the error produced for this case was ENOEXEC.
    
       EMFILE The per-process limit on the number of open file descriptors has been reached.
    
       ENAMETOOLONG
              filename is too long.
    
       ENFILE The system-wide limit on the total number of open files has been reached.
    
       ENOENT The  file  filename or a script or ELF interpreter does not exist, or a shared library needed for the file or interpreter
              cannot be found.
    
       ENOEXEC
              An executable is not in a recognized format, is for the wrong architecture, or has some other format error that means  it
              cannot be executed.
    
       ENOMEM Insufficient kernel memory was available.
    
       ENOTDIR
              A component of the path prefix of filename or a script or ELF interpreter is not a directory.
    
       EPERM  The  filesystem  is  mounted  nosuid, the user is not the superuser, and the file has the set-user-ID or set-group-ID bit
              set.
    
       EPERM  The process is being traced, the user is not the superuser and the file has the set-user-ID or set-group-ID bit set.
    
       EPERM  A "capability-dumb" applications would not obtain the full set of permitted capabilities granted by the executable  file.
              See capabilities(7).
    
       ETXTBSY
              The specified executable was open for writing by one or more processes.
    

    ---

    Simply exit with exit code 127 (FreeBSD behavior) doesn't allow to know if the program have been executed or not.

    For example, "git bisect run" depends on the exit code: exit code 127 means "command not found". But with posix_spawn(), exit code 127 can means something else...

    I'm disappointed by the qualify of the posix_spawn() implementation on FreeBSD and old glibc...

    --

    I'm now confused.

    Should we still use posix_spawn() on some platforms? For example, posix_spawn() seems to have a well defined error reporting according to its manual page:
    https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man2/posix_spawn.2.html

    """
    RETURN VALUES
    If the pid argument is NULL, no pid is returned to the calling process;
    if it is non-NULL, then posix_spawn() and posix_spawnp() functions return
    the process ID of the child process into the pid_t variable pointed to by
    the pid argument and return a 0 on success. If an error occurs, they
    return a non-zero error code as the function return value, and no child
    process is created.

    ERRORS
    The posix_spawn() and posix_spawnp() functions will fail and return to
    the calling process if:

     [EINVAL]           The value specified by file_actions or attrp is
                        invalid.
    
     [E2BIG]            The number of bytes in the new process's argument list
                        is larger than the system-imposed limit.  This limit
                        is specified by the sysctl(3) MIB variable
                        KERN_ARGMAX.
    
     [EACCES]           Search permission is denied for a component of the
                        path prefix.
    
     [EACCES]           The new process file is not an ordinary file.
    
     [EACCES]           The new process file mode denies execute permission.
    
     [EACCES]           The new process file is on a filesystem mounted with
                        execution disabled (MNT_NOEXEC in <sys/mount.h>).
    
     [EFAULT]           The new process file is not as long as indicated by
                        the size values in its header.
    
     [EFAULT]           Path, argv, or envp point to an illegal address.
    
     [EIO]              An I/O error occurred while reading from the file sys-tem. system.
                        tem.
    
     [ELOOP]            Too many symbolic links were encountered in translat-ing translating
                        ing the pathname.  This is taken to be indicative of a
                        looping symbolic link.
    
     [ENAMETOOLONG]     A component of a pathname exceeded {NAME_MAX} charac-ters, characters,
                        ters, or an entire path name exceeded {PATH_MAX} char-acters. characters.
                        acters.
    
     [ENOENT]           The new process file does not exist.
    
     [ENOEXEC]          The new process file has the appropriate access per-mission, permission,
                        mission, but has an unrecognized format (e.g., an
                        invalid magic number in its header).
    
     [ENOMEM]           The new process requires more virtual memory than is
                        allowed by the imposed maximum (getrlimit(2)).
    
     [ENOTDIR]          A component of the path prefix is not a directory.
    
     [ETXTBSY]          The new process file is a pure procedure (shared text)
                        file that is currently open for writing or reading by
                        some process.
    

    """

    Would it be reasonable to use posix_spawn() but only on platforms where the implementation is known to be "good"?

    Good would mean that execv() and posix_spawn() errors (setting attributes or file actions failures) are properly reported to the parent process.

    For example, use posix_spawn() in subprocess on "recent" glibc (which minimum version?) and macOS (where it's a syscall)?

    @vstinner
    Copy link
    Member

    vstinner commented Jan 7, 2019

    I wrote PR 11452 which is based on PR 11242 but adds support for 'env' and 'restore_signals' parameters and checks the operating system and libc version to decide if posix_spawn() can be used by subprocess.

    In my implementation, posix_spawn() is only used on macOS and glibc 2.26 or newer (and glibc 2.24 and newer on Linux).

    According to Alexey Izbyshev, musl should be safe as well, but I don't know how to test musl on my Fedora, nor how to check if Python is linked to musl, nor what is the minimum musl version which is safe.

    @vstinner
    Copy link
    Member

    vstinner commented Jan 7, 2019

    "The native spawn(2) system introduced in Oracle Solaris 11.4 shares a lot of code with the forkx(2) and execve(2)."
    https://blogs.oracle.com/solaris/posix_spawn-as-an-actual-system-call

    @vstinner
    Copy link
    Member

    vstinner commented Jan 7, 2019

    I created bpo-35674: "Expose os.posix_spawnp()" to later support executable with no directory in subprocess.

    @izbyshev
    Copy link
    Mannequin

    izbyshev mannequin commented Jan 13, 2019

    • On FreeBSD, if setting posix_spawn() "attributes" or execute posix_spawn() "file actions" fails, posix_spawn() succeed but the child process exits immediately with exit code 127 without trying to call execv(). If execv() fails, posix_spawn() succeed, but the child process exit with exit code 127.
    • The worst seems to be: "In my test on Ubuntu 14.04, ./python -c "import subprocess; subprocess.call(['/xxx'], close_fds=False, restore_signals=False)" silently returns with zero exit code."

    To clarify, in the Ubuntu example only the parent process returns with zero exit code. The child exits with 127, so the behavior is the same as on FreeBSD, as demonstrated by check_call():

    $ ./python -c "import subprocess; subprocess.check_call(['/xxx'], close_fds=False, restore_signals=False)"
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/home/izbyshev/workspace/cpython/Lib/subprocess.py", line 348, in check_call
        raise CalledProcessError(retcode, cmd)
    subprocess.CalledProcessError: Command '['/xxx']' returned non-zero exit status 127.

    According to Alexey Izbyshev, musl should be safe as well, but I don't know how to test musl on my Fedora, nor how to check if Python is linked to musl, nor what is the minimum musl version which is safe.

    musl doesn't have an equivalent of __GLIBC__ macro by design as explained in its FAQ [1], and the same FAQ suggests to use something like "#ifndef __GLIBC__ portable_code() #else glibc_code()". So far musl-related fixes followed this advice (see PR 9288, PR 9224), but in this case it doesn't seem like a good idea to me. POSIX allows asynchronous error notification, so "portable" code shouldn't make assumptions about that, and technically old glibc and FreeBSD are conforming implementations, but not suitable as a replacement for _posixsubprocess.

    Maybe we could use configure-time musl detection instead? It seems like config.guess has some support for musl[2], but it uses "ldd" from PATH and hence appears to work out-of-the-box only on musl-based distros like Alpine. See also [3].

    Regarding the minimum musl version, in this case it's not important since the relevant change was committed in 2013[4], and given that musl is relatively young, such old versions shouldn't have users now.

    [1] https://wiki.musl-libc.org/faq.html
    [2]

    # If ldd exists, use it to detect musl libc.

    [3] RcppCore/Rcpp#449
    [4] https://git.musl-libc.org/cgit/musl/commit/?id=fb6b159d9ec7cf1e037daa974eeeacf3c8b3b3f1

    @vstinner
    Copy link
    Member

    https://wiki.musl-libc.org/faq.html

    """
    Q: Why is there no __MUSL__ macro?

    It’s a bug to assume a certain implementation has particular properties rather than testing. So far, every time somebody’s asked for this with a particular usage case in mind, the usage case was badly wrong, and would have broken support for the next release of musl. The official explanation: http://openwall.com/lists/musl/2013/03/29/13
    """

    IMHO that's wrong. A software like Python heavily rely on the *exact* implementation of a libc.

    https://github.com/python/cpython/pull/9224/files looks like a coarse heuristic to detect musl for example.

    Until muscl decides to provide an "#ifdef __MUSL__"-like or any way that it's musl, I propose to not support musl: don't use os.posix_spawn() but _posixsubprocess.

    @vstinner
    Copy link
    Member

    Oh... Using directly posix_spawnp() in subprocess to locate the executable in the PATH introduces subtle behavior changes...
    #11579

    @vstinner
    Copy link
    Member

    One of the issue that I have with using posix_spawn() is that the *exact* behavior of subprocess is not properly defined by test_subprocess. Should we more more tests, or document that the exact behavior is "an implementation detail"? I guess that the best for users is get the same behavior on all platforms, but can we really warranty that? Python rely on the operating system and the libc, but each platform has subtle behavior differneces. Handling time and date is a good example: strptime() and strftime() have big differences between platforms. posix_spawn() is another example where the implementation is very different depending on the platform.

    @pablogsal
    Copy link
    Member

    One of the issue that I have with using posix_spawn() is that the *exact* behavior of subprocess is not properly defined by test_subprocess. Should we more more tests, or document that the exact behavior is "an implementation detail"? I guess that the best for users is get the same behavior on all platforms, but can we really warranty that? Python rely on the operating system and the libc, but each platform has subtle behavior differneces. Handling time and date is a good example: strptime() and strftime() have big differences between platforms. posix_spawn() is another example where the implementation is very different depending on the platform.

    I don't think we can get the same behaviour in all platforms, but I would want to know what Gregory P. Smith thinks about this potential divergence in behaviour and what are the guarantees that posix_subprocess should maintain.

    @vstinner
    Copy link
    Member

    New changeset 8c34956 by Victor Stinner in branch 'master':
    Revert "bpo-35537: subprocess can now use os.posix_spawnp (GH-11579)" (GH-11582)
    8c34956

    @izbyshev
    Copy link
    Mannequin

    izbyshev mannequin commented Jan 16, 2019

    Hi,

    As a disclaimer, I'm a FreeBSD developer interested in making sure we're doing the right thing here. =)

    May I ask what the above assessment is based on, and specifically what we need to address?

    Hello, Kyle! That assessment is based on my quick and incorrect reading of posix_spawn source code. I didn't notice that "error" is visible in the parent due to address space sharing. Sorry for that, and thank you for the correction. A proper error notification is required for posix_spawn to be used in subprocess as a replacement for fork/exec, and FreeBSD does it right.

    While we're here, would you be kind to answer several questions about posix_spawn on FreeBSD?

    1. On Linux, without taking additional precautions, signals may be delivered to a child process created via vfork(). If a signal handler runs in such a child, it can leave traces of its work in the (shared) memory, potentially surprising the parent process. To avoid this, glibc[1] and musl[2] disable all signals (even signals used internally for thread cancellation) before creating a child, but FreeBSD calls vfork() right away. Is this not considered a problem, or is something hidden in vfork() implementation?

    2. Another problem with vfork() is potential sharing of address space between processes with different privileges (see Rich Felker's post[3] about this and the previous problem). Is this a valid concern on FreeBSD?

    3. Does FreeBSD kernel guarantee that when waitpid(WNOHANG) is called at [4], the child is ready for reaping? On Linux, it's not always the case[5].

    [1] https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/spawni.c;h=353bcf5b333457d191320e358d35775a2e9b319b;hb=HEAD#l372
    [2] http://git.musl-libc.org/cgit/musl/tree/src/process/posix_spawn.c?id=5ce3737931bb411a8d167356d4d0287b53b0cbdc#n171
    [3] https://ewontfix.com/7
    [4] https://svnweb.freebsd.org/base/head/lib/libc/gen/posix_spawn.c?view=markup#l229
    [5] https://sourceware.org/git/?p=glibc.git;a=commit;h=aa95a2414e4f664ca740ad5f4a72d9145abbd426

    @izbyshev
    Copy link
    Mannequin

    izbyshev mannequin commented Jan 17, 2019

    One of the issue that I have with using posix_spawn() is that the *exact* behavior of subprocess is not properly defined by test_subprocess. Should we more more tests, or document that the exact behavior is "an implementation detail"?

    Regarding using PATH from "env" instead of parent's environment, it may be considered documented because subprocess docs refer to os.execvp(), where it's explicitly documented:

    """
    The variants which include a “p” near the end (execlp(), execlpe(), execvp(), and execvpe()) will use the PATH environment variable to locate the program file. When the environment is being replaced (using one of the exec*e variants, discussed in the next paragraph), the new environment is used as the source of the PATH variable.
    """

    The problem is that it differs from execvp() in libc (and POSIX), so I would consider such behavior a bug if it wasn't so old to become a feature. Thanks to Victor for noticing that, I missed it.

    So, if we can't change os.execvp() and/or current subprocess behavior, posix_spawnp seems to be ruled out. (I don't consider temporarily changing the parent environment as a solution). A naive emulation of posix_spawnp would be repeatedly calling posix_spawn for each PATH entry, but that's prohibitively expensive. Could we use a following hybrid approach instead?

    Iterate over PATH entries and perform a quick check for common exec errors (ENOENT, ENOTDIR, EACCES) manually (e.g. by stat()). If the check fails, exec would also fail so just skip the entry. (It's important not to get false negatives here, but the child process would have the same privileges as the parent since we don't use POSIX_SPAWN_RESETIDS, so I can't think of one). Otherwise, call posix_spawn with the absolute path. If it fails, skip the entry.

    Looks ugly, but are there other problems? This would seem to work just as well as posix_spawnp in the common case, but in the worst (contrived) case it would be equivalent to calling posix_spawn for each PATH entry.

    @kevans91
    Copy link
    Mannequin

    kevans91 mannequin commented Jan 17, 2019

    On Wed, Jan 16, 2019 at 5:47 PM Alexey Izbyshev <report@bugs.python.org> wrote:

    > Hi,

    > As a disclaimer, I'm a FreeBSD developer interested in making sure we're doing the right thing here. =)

    > May I ask what the above assessment is based on, and specifically what we need to address?

    Hello, Kyle! That assessment is based on my quick and incorrect reading of posix_spawn source code. I didn't notice that "error" is visible in the parent due to address space sharing. Sorry for that, and thank you for the correction. A proper error notification is required for posix_spawn to be used in subprocess as a replacement for fork/exec, and FreeBSD does it right.

    Oh, good to hear. =) koobs had pointed this thread out in hopes that we can try to reconcile and figure out what work needs to be done here. =)

    While we're here, would you be kind to answer several questions about posix_spawn on FreeBSD?

    Most definitely, if I can!

    1. On Linux, without taking additional precautions, signals may be delivered to a child process created via vfork(). If a signal handler runs in such a child, it can leave traces of its work in the (shared) memory, potentially surprising the parent process. To avoid this, glibc[1] and musl[2] disable all signals (even signals used internally for thread cancellation) before creating a child, but FreeBSD calls vfork() right away. Is this not considered a problem, or is something hidden in vfork() implementation?

    Right, after glancing over our implementation details- this is indeed a problem here. Our manpage indicates that we only block SIGTTOU for SIGTTIN for children in vfork(), and some light testing seems to indicate that's the case. Signal handlers are inherited from the parent, so that's less than stellar.

    1. Another problem with vfork() is potential sharing of address space between processes with different privileges (see Rich Felker's post[3] about this and the previous problem). Is this a valid concern on FreeBSD?

    My initial read-through of the relevant bits seem to indicate that image activation will cause the child process to be allocated a new process space, so it's looking kind of like a 'no' -- I'm outsourcing the answer to this one to someone more familiar with those bits, though, so I'll get back to you.

    1. Does FreeBSD kernel guarantee that when waitpid(WNOHANG) is called at [4], the child is ready for reaping? On Linux, it's not always the case[5].

    I want to say vfork semantics guarantee it- we got to this point, so the child hit an error and _exit(127). I'm double-checking this one, though, to verify the child's properly marked a zombie before we could possibly reschedule the parent.

    @kevans91
    Copy link
    Mannequin

    kevans91 mannequin commented Jan 17, 2019

    I'll be preparing a patch for our posix_spawn's signal handling.

    My mistake in my setuid assessment was pointed out to me- it doesn't seem like a highly likely attack vector, but it is indeed possible.

    If the child errored out, they will indeed be properly reapable at that point due to how vfork is implemented -- any observation to the contrary is to be considered a bug.

    @vstinner
    Copy link
    Member

    Alexey Izbyshev

    So, if we can't change os.execvp() and/or current subprocess behavior, posix_spawnp seems to be ruled out.

    My main motivation for using posix_spawn() is performance. An optimization should not justify to introduce a backward incompatible change. I agree wth posix_spawnp() cannot be used directly in subprocess because of that. But os.posix_spawnp() addition in Python 3.8 remains useful because it allows to use it directly (avoid subprocess).

    A naive emulation of posix_spawnp would be repeatedly calling posix_spawn for each PATH entry, but that's prohibitively expensive.

    It should be compared to the current code. Currently, _posixsubprocess uses a loop calling execv(). I don't think that calling posix_spawn() in a loop until one doesn't fail is more inefficient.

    The worst case would be when applying process attributes and run file actions would dominate performances... I don't think that such case exists. Syscalls like dup2() and close() should be way faster than the final successful execv() in the overall posix_spawn() call. I'm talking about the case when the program exists.

    Iterate over PATH entries and perform a quick check for common exec errors (ENOENT, ENOTDIR, EACCES) manually (e.g. by stat()).

    I dislike this option. There is a high risk of race condition if the program is created, deleted or modified between the check and exec. It can cause subtle bugs, or worse: security vulnerabilities. IMHO the only valid check, to test if a program exists, is to call exec().

    Alexey: Do you want to work on a PR to reimplement the "executable_list" and loop used by subprocess with _posixsubproces? You should keep the latest exception to re-raise it if no posix_spawn() successed. Don't forget to clear the exception on success, to not create a reference cycle:

    commit acb9fa7
    Author: Victor Stinner <victor.stinner@gmail.com>
    Date: Wed Sep 13 10:10:10 2017 -0700

    bpo-31234, socket.create_connection(): Fix ref cycle (bpo-3546)
    

    @izbyshev
    Copy link
    Mannequin

    izbyshev mannequin commented Jan 17, 2019

    Thank you for the answers, Kyle!

    I'll be preparing a patch for our posix_spawn's signal handling.

    Great!

    My mistake in my setuid assessment was pointed out to me- it doesn't seem like a highly likely attack vector, but it is indeed possible.

    The specific scenario with non-synchronized posix_spawn/setuid is certainly not a good practice and could probably be considered an application bug (such application effectively spawns a child with "random" privileges -- depending on whether setuid() completed before or after vfork()).

    That said, in Linux C libraries protection from that scenario is usually achieved automatically if all signals are blocked before vfork(). This is the result of a Linux-specific detail: at syscall level, setuid() affects a single thread only, but setuid() library function must affect the process as a whole. This forces C libraries to signal all threads when setuid() is called and wait until all threads perform setuid() syscall. This waiting can't complete until vfork() returns (because signals are blocked), so setuid() is effectively postponed. I don't know how setuid() behaves on FreeBSD (so the above may be not applicable at all).

    If the child errored out, they will indeed be properly reapable at that point due to how vfork is implemented -- any observation to the contrary is to be considered a bug.

    Ah, that's good, thanks for the confirmation!

    @izbyshev
    Copy link
    Mannequin

    izbyshev mannequin commented Jan 17, 2019

    It should be compared to the current code. Currently, _posixsubprocess uses a loop calling execv(). I don't think that calling posix_spawn() in a loop until one doesn't fail is more inefficient.

    The worst case would be when applying process attributes and run file actions would dominate performances... I don't think that such case exists. Syscalls like dup2() and close() should be way faster than the final successful execv() in the overall posix_spawn() call. I'm talking about the case when the program exists.

    I had vfork() used internally by posix_spawn() in mind when I considered repeatedly calling it "prohibitively expensive". While vfork() is much cheaper than fork(), it doesn't mean that its performance is comparable to dup2() and close(). But on systems where posix_spawn is a syscall the overhead could indeed be lesser. Anyway, it should be measured.

    > Iterate over PATH entries and perform a quick check for common exec errors (ENOENT, ENOTDIR, EACCES) manually (e.g. by stat()).

    I dislike this option. There is a high risk of race condition if the program is created, deleted or modified between the check and exec. It can cause subtle bugs, or worse: security vulnerabilities. IMHO the only valid check, to test if a program exists, is to call exec().

    While exec() is of course cleaner, IMHO races don't matter in this case. If our stat()-like check fails, we could as well imagine that it is exec() that failed (while doing the same checks as our stat()), so it doesn't matter what happens with the tested entry afterwards. If the check succeeds, we simply call posix_spawn() that will perform the same checks again. If any race condition occurs, the problem will be detected by posix_spawn().

    Alexey: Do you want to work on a PR to reimplement the "executable_list" and loop used by subprocess with _posixsubproces?

    I'm currently focused on researching whether it's possible to use vfork()-like approach instead of fork() on Linux, so if anyone wants to implement the PR you suggested, feel free to do it.

    @izbyshev
    Copy link
    Mannequin

    izbyshev mannequin commented Jan 18, 2019

    > * pass_fds: there is not API to mark a fd as inheritable (clear O_CLOEXEC flag)

    POSIX has a bug for this [5]. It's marked fixed, but the current POSIX docs doesn't reflect the changes. The idea is to make posix_spawn_file_actions_adddup2() clear O_CLOEXEC if the source and the destination are the same (unlike dup2(), which would do nothing). musl has already implemented the new behavior [6], but glibc hasn't.

    Update: glibc has implemented it for 2.29 (https://sourceware.org/bugzilla/show_bug.cgi?id=23640).

    @vstinner
    Copy link
    Member

    New changeset f6243ac by Victor Stinner in branch 'master':
    bpo-35537: subprocess can use posix_spawn with pipes (GH-11575)
    f6243ac

    @izbyshev
    Copy link
    Mannequin

    izbyshev mannequin commented Jan 23, 2019

    Another problem with posix_spawn() on glibc: it doesn't report errors to the parent process when run under QEMU user-space emulation and Windows Subsystem for Linux. This is because starting with commit [1] (glibc 2.25) posix_spawn() relies on address space sharing semantics of clone(CLONE_VM) to report errors, but it's not implemented in QEMU and WSL, so clone(CLONE_VM) and vfork() behave like fork(). See also [2], [3].

    This can be easily demonstrated:
    $ cat test.c
    #include <spawn.h>
    
    int main(int argc, char *argv[], char *envp[]) {
        return posix_spawn(0, "non-existing", 0, 0, argv, envp);
    }
    $ gcc test.c
    $ ./a.out
    $ echo $?
    2
    $ aarch64-linux-gnu-gcc -static test.c
    $ qemu-aarch64 ./a.out
    $ echo $?
    0

    There is no problem with musl (it doesn't rely on address space sharing).

    [1] https://sourceware.org/git/?p=glibc.git;a=commit;h=4b4d4056bb154603f36c6f8845757c1012758158
    [2] https://bugs.launchpad.net/qemu/+bug/1673976
    [3] https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg04890.html

    @vstinner
    Copy link
    Member

    Another problem with posix_spawn() on glibc: it doesn't report errors to the parent process when run under QEMU user-space emulation and Windows Subsystem for Linux. This is because starting with commit [1] (glibc 2.25) posix_spawn() relies on address space sharing semantics of clone(CLONE_VM) to report errors, but it's not implemented in QEMU and WSL, so clone(CLONE_VM) and vfork() behave like fork(). See also [2], [3].

    Oh, you know a lot of stuff about vfork and posix_spawn, you are impressive :-)

    Is sys.platform equal to 'linux' on WSL? Sorry, I don't know WSL. If it's equal, is it possible to explicitly exclude WSL in the subprocess test, _use_posix_spawn()?

    For QEMU, I was very surprised that a full VM wouldn't implement vfork. So I tried Fedora Rawhide and I didn't notice the bug that you described.

    ---

    $ cat test.c
    #include <spawn.h>
    
    int main(int argc, char *argv[], char *envp[]) {
        return posix_spawn(0, "non-existing", 0, 0, argv, envp);
    }
    $ gcc test.c
    $ ./a.out
    $ echo $?
    2

    In the VM, I get exit status 2 as expected.

    You wrote:

    ---

    $ aarch64-linux-gnu-gcc -static test.c
    $ qemu-aarch64 ./a.out
    $ echo $?
    0

    So the bug only affects "QEMU User Space". I never used that before. I understand that it's a bug in the glibc and in "QEMU User Emulation" which doesn't implement vfork "properly".

    There is no problem with musl (it doesn't rely on address space sharing).

    I'm not interested to support musl at this point because I don't see any obvious way to detect that we are running musl nor get the musl version.

    The history of this issue shows the posix_spawn() only works in some very specific cases, so we have to be careful and only opt-in for specific libc, on specific platform with specific libc version (read at runtime if possible).

    @vstinner
    Copy link
    Member

    I discussed the following issue with Florian Weimer (who works on the glibc):
    ---

    $ cat test.c
    #include <spawn.h>
    
    int main(int argc, char *argv[], char *envp[]) {
        return posix_spawn(0, "non-existing", 0, 0, argv, envp);
    }
    $ aarch64-linux-gnu-gcc -static test.c
    $ qemu-aarch64 ./a.out
    $ echo $?
    0

    While the parent gets a "success", in subprocess we get the pid and later read the exit status of the child process. Even if posix_spawn() doesn't report a failure, waitpid() will still report a failure. It's a subtle behavior change. I'm not sure yet if it's acceptable for Python in term of semantics:

    • without posix_spawn(), subprocess.Popen constructor raises FileNotFoundError: the parent knows that the execution failed because the program doesn't exist
    • with posix_spawn() with the vfork bug, subprocess.Popen succeed but Popen.wait() returns something like exit code 127. The parent doesn't know if the child process explcitly used exit(127) or if execv() failed.

    Does the difference matters? The bug only occurs in some very specific cases:

    • WSL
    • QEMU User Emulation

    Are these use cases common enough to block the whole idea of using posix_spawn() on Linux. I don't think so. I really want to use posix_spawn() for best performances! Moreover, I expect that glibc implementation of posix_spawn() is safer than Python _posixsubprocess. The glibc has access to low-level stuff like it's internal signals, cancellation points, etc. _posixsubprocess is more generic and doesn't worry about such low-level stuff, whereas they might cause bad surprised.

    @vstinner
    Copy link
    Member

    I wrote PR 11668 to propose to *document* the subtle behavior change when posix_spawn() is used on QEMU User Emulation and WSL platforms, rather than trying to opt-out for posix_spawn() on these platforms.

    @izbyshev
    Copy link
    Mannequin

    izbyshev mannequin commented Jan 25, 2019

    Is sys.platform equal to 'linux' on WSL? Sorry, I don't know WSL. If it's equal, is it possible to explicitly exclude WSL in the subprocess test, _use_posix_spawn()?

    I don't have immediate access to WSL right now, but I'll try to get one and investigate what we have there wrt. posix_spawn() and vfork().

    So the bug only affects "QEMU User Space". I never used that before.

    Yes, I was specifically referring to QEMU user-space. One use case that heavily relies on it is Tizen. Its packages are built in a chroot jail containing the build environment with binaries native to the target architecture, making an illusion that packages are built on the target system and are not cross-compiled. So the binaries are run under QEMU user-space emulation. But in reality, because of unacceptable performance of binary translation, many frequently-used binaries, like coreutils and compilers, are replaced with host-native binaries in a way transparent to the build system (so it has no idea whether it runs host-native or target-native binaries).

    Does the difference matters? The bug only occurs in some very specific cases:

    • WSL
    • QEMU User Emulation

    Are these use cases common enough to block the whole idea of using posix_spawn() on Linux. I don't think so. I really want to use posix_spawn() for best performances! Moreover, I expect that glibc implementation of posix_spawn() is safer than Python _posixsubprocess. The glibc has access to low-level stuff like it's internal signals, cancellation points, etc. _posixsubprocess is more generic and doesn't worry about such low-level stuff, whereas they might cause bad surprised.

    It's true that a C library is in a better position to implement something like posix_spawn(), but I still think that vfork()/exec() is worth to consider at least on Linux. See bpo-35823, which should also work under QEMU user-mode and WSL (but needs testing).

    @vstinner
    Copy link
    Member

    vstinner commented Feb 1, 2019

    New changeset 80c5dfe by Victor Stinner (Joannah Nanjekye) in branch 'master':
    bpo-35537: Add setsid parameter to os.posix_spawn() and os.posix_spawnp() (GH-11608)
    80c5dfe

    @vstinner
    Copy link
    Member

    vstinner commented Feb 1, 2019

    New changeset 1e39b83 by Victor Stinner in branch 'master':
    bpo-35537: Skip test_start_new_session() of posix_spawn (GH-11718)
    1e39b83

    @vstinner
    Copy link
    Member

    vstinner commented Feb 1, 2019

    New changeset 325e4ba by Victor Stinner in branch 'master':
    bpo-35537: Fix function name in os.posix_spawnp() errors (GH-11719)
    325e4ba

    @vstinner
    Copy link
    Member

    New changeset d7befad by Victor Stinner in branch 'master':
    bpo-35537: Document posix_spawn() change in subprocess (GH-11668)
    d7befad

    @vstinner
    Copy link
    Member

    vstinner commented Jun 5, 2019

    Python 3.8 beta 1 is released with this feature. I documented the change. Thnaks everybody who was involved to make this possible. I close the issue. Open new issues for follow-up.

    @vstinner vstinner closed this as completed Jun 5, 2019
    @vstinner
    Copy link
    Member

    New changeset 5884043 by Victor Stinner in branch 'master':
    bpo-35537: Rewrite setsid test for os.posix_spawn (GH-11721)
    5884043

    @miss-islington
    Copy link
    Contributor

    New changeset e696b15 by Miss Islington (bot) in branch '3.8':
    bpo-35537: Rewrite setsid test for os.posix_spawn (GH-11721)
    e696b15

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 (EOL) end of life stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    6 participants