use os.posix_spawn in subprocess #79718

nanjekyejoannah · 2018-12-19T16:47:59Z

BPO	35537
Nosy	@gpshead, @vstinner, @serhiy-storchaka, @koobs, @izbyshev, @pablogsal, @miss-islington, @nanjekyejoannah, @kevans91
PRs	bpo-35537: use os.posix_spawn in subprocess #11242 bpo-35537: subprocess uses os.posix_spawn in some cases #11452 bpo-35537: subprocess can use posix_spawn with pipes #11575 bpo-35537: subprocess can now use os.posix_spawnp #11579 Revert "bpo-35537: subprocess can now use os.posix_spawnp (GH-11579)" #11582 bpo-35537: Add setsid parameter to os.posix_spawn() and os.posix_spawnp(). #11608 bpo-35537: Document posix_spawn() change in subprocess #11668 bpo-35537: Fix docstring typos in _use_posix_spawn() #11684 subprocess: close pipes/fds by using ExitStack #11686 bpo-35537: Skip test_start_new_session() of posix_spawn #11718 [WIP] bpo-35537: Fix function name in os.posix_spawnp() errors #11719 bpo-35537: Rewrite setsid test for os.posix_spawn #11721 bpo-35537: Change subprocess to use posix_spawnp instead of posix_spawn #11917 [3.8] bpo-35537: Rewrite setsid test for os.posix_spawn (GH-11721) #14093 bpo-35537: Add a comment to _Py_RestoreSignals() #18792
Files	subprocess_bench.py subprocess_bench_stdout.py

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = 'https://github.com/vstinner'
closed_at = <Date 2019-06-05.08:27:39.280>
created_at = <Date 2018-12-19.16:47:59.385>
labels = ['3.8', 'type-feature', 'library']
title = 'use os.posix_spawn in subprocess'
updated_at = <Date 2020-03-05.16:41:09.872>
user = 'https://github.com/nanjekyejoannah'

bugs.python.org fields:

activity = <Date 2020-03-05.16:41:09.872>
actor = 'vstinner'
assignee = 'vstinner'
closed = True
closed_date = <Date 2019-06-05.08:27:39.280>
closer = 'vstinner'
components = ['Library (Lib)']
creation = <Date 2018-12-19.16:47:59.385>
creator = 'nanjekyejoannah'
dependencies = []
files = ['48011', '48057']
hgrepos = []
issue_num = 35537
keywords = ['patch', 'patch', 'patch']
message_count = 61.0
messages = ['332151', '332204', '332210', '332212', '332213', '332216', '332217', '332219', '332222', '332223', '332235', '332236', '332237', '332238', '332239', '332266', '332270', '332272', '332393', '332565', '333123', '333126', '333127', '333131', '333571', '333629', '333646', '333656', '333657', '333738', '333739', '333740', '333744', '333745', '333749', '333771', '333794', '333795', '333799', '333800', '333803', '333805', '333820', '333825', '333839', '333889', '333902', '333994', '334268', '334269', '334288', '334292', '334305', '334337', '334661', '334662', '334685', '340838', '344683', '345617', '345621']
nosy_count = 9.0
nosy_names = ['gregory.p.smith', 'vstinner', 'serhiy.storchaka', 'koobs', 'izbyshev', 'pablogsal', 'miss-islington', 'nanjekyejoannah', 'kevans']
pr_nums = ['11242', '11452', '11575', '11579', '11582', '11608', '11668', '11684', '11686', '11718', '11719', '11721', '11917', '14093', '18792']
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue35537'
versions = ['Python 3.8']

nanjekyejoannah · 2018-12-19T16:47:59Z

On Linux, posix_spawn() uses vfork() instead of fork() which blocks
the parent process. The child process executes exec() early and so we
don't pay the price of duplicating the memory pages (the thing which
tracks memory pages of a process).

On macOS, posix_spawn() is a system call, so the kernel is free to use
fast-paths to optimize it as they want.

posix_spawn() is faster but it's also safer: it allows us to do a lot of
"actions" before exec(), before executing the new program. For
example, you can close files and control signals. Again, on macOS,
these actions are "atomic" since it's a system call. On Linux, glibc uses a very good implementation which has no race condition.

Currently, Python uses a C extension _posixsubprocess which more or
less reimplements posix_spawn(): close file descriptors, make some
file descriptors inheritable or not, etc. It is very tricky to write
correct code: code run around fork() is very fragile. In theory, the
POSIX specification only allows to use "async-signal-safe" functions
after fork()...

So it would be great to avoid _posixsubprocess whenever possible for
(1) speed (2) correctness.

vstinner · 2018-12-20T09:35:24Z

"posix_spawn() is faster"

This assumption needs a benchmark!

I ran a benchmark on Linux (Fedora 29, Linux kernel 4.19.9-300.fc29.x86_64, glibc 2.28) using attached subprocess_bench.py.

I rebased the PR 11242 on master and run the benchmark in virtual environment:

git co pr/11242
# current branch: PR 11242
make distclean
./configure
make
./python -m venv env
env/bin/python -m pip install perf
env/bin/python subprocess_bench.py -v -o posix_spawn.json
git co master
env/bin/python subprocess_bench.py -v -o fork_exec.json
env/bin/python -m perf compare_to fork_exec.json posix_spawn.json

The result is quite explicit: the PR makes subprocess.Popen 61x faster!!!

$ env/bin/python -m perf compare_to fork_exec.json posix_spawn.json 
Mean +- std dev: [fork_exec] 27.1 ms +- 0.4 ms -> [posix_spawn] 447 us +- 163 us: 60.55x faster (-98%)

That's the best case:

The parent process (Python) allocated 2 GiB of memory: that's not uncommon for large application. On OpenStack for example, it's common that a process takes more than 1 GiB.
The child process has a short execution time and allocates few memory.

On my config (Fedora 29, Linux kernel 4.19.9-300.fc29.x86_64, glibc 2.28), os.posix_spawn() uses vfork():

$ strace -o trace ./python -c 'import subprocess; subprocess.run(["/bin/true"], close_fds=False, restore_signals=False)'
$ grep clone trace
clone(child_stack=0x7fab28ac0ff0, flags=CLONE_VM|CLONE_VFORK|SIGCHLD) = 23073

I guess that the 61x speedup mostly comes from vfork().

See also bpo-34663 for previous discussion about vfork() and os.posix_spawn(). In short, the glibc is smart and detects when vfork() can be used or not.

vstinner · 2018-12-20T10:50:05Z

Benchmark on macOS Mojave (10.14.2), x86_64:

$ ./python.exe -m perf compare_to fork_exec.json posix_spawn.json 
Mean +- std dev: [fork_exec] 2.79 ms +- 0.06 ms -> [posix_spawn] 1.64 ms +- 0.03 ms: 1.70x faster (-41%)

The speedup is "only" 1.7x faster, it's less impressive. Maybe fork()+exec() is more efficient on macOS than on Linux.

vstinner · 2018-12-20T11:01:06Z

Benchmark on FreeBSD:

$ env/bin/python -m perf compare_to fork_exec.json posix_spawn.json 
Mean +- std dev: [fork_exec] 52.8 ms +- 4.7 ms -> [posix_spawn] 499 us +- 45 us: 105.91x faster (-99%)

Wow! 106x faster!

vstinner · 2018-12-20T11:05:05Z

See also:

"How a fix in Go 1.9 sped up our Gitaly service by 30x" (hint: posix_spawn)
https://about.gitlab.com/2018/01/23/how-a-fix-in-go-19-sped-up-our-gitaly-service-by-30x/

fork/exec vs. posix_spawn benchmarks:
https://github.com/rtomayko/posix-spawn#benchmarks

serhiy-storchaka · 2018-12-20T11:15:37Z

We need to add tests for all corner cases. I.e. disabling any of conditions for posix_spawn should cause the failure.

We need to do also something with PyOS_BeforeFork/PyOS_AfterFork_Child/PyOS_AfterFork_Parent.

vstinner · 2018-12-20T11:19:54Z

We need to add tests for all corner cases. I.e. disabling any of conditions for posix_spawn should cause the failure.

Do you mean running tests twice, one with posix_spawn(), one without?

We need to do also something with PyOS_BeforeFork/PyOS_AfterFork_Child/PyOS_AfterFork_Parent.

Can you please elaborate? posix_spawn() may or may not use fork() internally.

serhiy-storchaka · 2018-12-20T11:34:32Z

Do you mean running tests twice, one with posix_spawn(), one without?

I mean that after writing tests they can be tested manually by disabling conditions for posix_spawn one by one. I.e. some tests should fail if remove "stdout is None" and some tests should fail if remove "not close_fds", etc.

Can you please elaborate? posix_spawn() may or may not use fork() internally.

_posixsubprocess.fork_exec() calls PyOS_BeforeFork/PyOS_AfterFork_Child/PyOS_AfterFork_Parent. If use os.posix_spawn(), these calls will be omitted. This is a behavior change. We should either call these functions manually before/after os.posix_spawn() (but I do not know what to do with PyOS_AfterFork_Child), or disable using os.posix_spawn() if non-trivial callbacks were added, or use other methods for calling registered callbacks. But the stdlib already adds callbacks in the threading and random modules.

izbyshev · 2018-12-20T12:42:49Z

Serhiy, PyOS_* functions are called only if preexec_fn != None. But it will never be possible to implement support for preexec_fn (and some other subprocess features, e.g. close_fds) if processes are run via posix_spawn, so I don't see why anything should be done with those functions.

serhiy-storchaka · 2018-12-20T12:54:21Z

Oh, nice! So this is not an obstacle for using os.posix_spawn().

izbyshev · 2018-12-20T15:50:47Z

Victor and Joannah, thanks for working on adding vfork() support to subprocess. Regarding speedups in the real world, I can share a personal anecdote. Back at the time when AOSP was built with make (I think it was AOSP 5) I've observed ~2x slowdown (or something close to that) if fork() is used instead of vfork() in make. This is the slowdown of the *whole* build (not just process creation time), and it's dramatic given the amount of pure computation (compilation of C++/Java code) involved in building AOSP. The underlying reason was that make merged all AOSP subproject makefiles into a single gigantic one, so make consumed more than 1 GB of RAM, and each shell invocation in recipes resulted in copying a large page table.

That said, I'm not sure that the chosen approach of adding posix_spawn() "fast paths" for particular combinations of arguments is an optimal way to expose vfork() benefits to most users. For example, close_fds is True by default in subprocess, and I don't think there is a reliable way to use posix_spawn() in this case. So users would need to use "magic" combinations of arguments to subprocess.Popen to benefit from vfork(). Another example is "cwd" parameter: there is no standard posix_spawn file action to change the working directory in the child, so such a seemingly trivial operation would trigger the "slow" fork-exec path.

Another approach would be to use vfork-exec instead fork-exec in _posixsubprocess. I know that previous discussions were resolved against vfork(), but I'm still not convinced and suggest to reevaluate those decisions again. Some arguments:

While POSIX contain scare-wording forbidding to do pretty anything in a vfork-spawned child, real systems where posix_spawn() is not a system call call vfork() and then execute all specified actions in the child. Cases where vfork() is avoided seem to be a quality-of-implementation issue, not a fundamental issue (for example, until recently glibc used fork() in some cases because of heap memory allocations, but they could be avoided). In practice, there is no problem with calling at least async-signal-safe functions in vfork-children, and there is no technical reason why there would be any problem.
_posixsubprocess is already very careful and calls only async-signal-safe functions in almost all cases (an obvious exception is preexec_fn, which is discouraged anyway).
Our use case for vfork() is restricted, so some concerns don't apply to us. For example, the setuid() problem outlined in Rich Felker's post (https://ewontfix.com/7) doesn't seem to apply. The problem with signal handlers from the same post should be possible to avoid except for preexec_fn, but we'd have to fallback to fork() in this case anyway due to memory allocation.
In the standard example of fork() unsafety in multithreaded processes (state of memory-based locks is copied to the child, and there could be nobody to unlock them), vfork() is *safer* than fork() because it still shares memory with the parent, and all threads other than the parent one are running. In particular, it should be perfectly safe to use any memory allocator that protects its state with something like futexes and atomics in a vfork-child even if other threads of the parent are using it concurrently.
Other language runtimes are already using vfork(). Java has been doing it for ages, and Victor referenced Go.

Some possible counter-arguments:

We seem to be reimplementing posix_spawn(). In fact, due to (2) above, we can fully reuse the existing fork-exec code, with additional tweaks that doesn't seem hard at this point.
My points above are based on Linux, but I don't know much about other Unix-likes, so in theory additional complications could exist. I can't, however, imagine any fundamental complication.

In the end, I think that migrating subprocess to vfork-exec would have more impact for users than adding "fast paths" and have consistent performance regardless of subprocess.Popen arguments (with few exceptions). Please consider it. Thanks!

vstinner · 2018-12-20T16:05:57Z

In the end, I think that migrating subprocess to vfork-exec would have more impact for users than adding "fast paths" and have consistent performance regardless of subprocess.Popen arguments (with few exceptions). Please consider it. Thanks!

I'm open to experiment to use vfork() in _posixsubprocess, but I still want to use posix_spawn():

it's standard
it's safer
it can be more efficient

On macOS, posix_spawn() is implemented as a syscall.

Using vfork() can cause new issues: that's why there is a POSIX_SPAWN_USE_VFORK flag (the caller had to explicitly enable it). See also bpo-34663 the history of vfork in posix_spawn() in the glibc. I'm not against using it, but I'm not sure that vfork() can be used in all cases, when all subprocess options are used. Maybe you will end up with the same state: vfork() used or not depending on subprocess.Popen options...

For example, close_fds is True by default in subprocess, and I don't think there is a reliable way to use posix_spawn() in this case.

Since PEP-446 has been implemented in Python 3.4, I would be interested to revisit close_fds default value: don't close by default... But I'm not sure if it's *safe* to change the default... At least, we may promote close_fds=False in the doc explaining that it's faster and should be safer in the common case.

Another example is "cwd" parameter: there is no standard posix_spawn file action to change the working directory in the child, so such a seemingly trivial operation would trigger the "slow" fork-exec path.

I don't think that it's a bug that subprocess have different performances depending on the flags. It would be a bug if the behavior would be diffrent. I don't think that it's the case.

izbyshev · 2018-12-20T16:30:46Z

I'm open to experiment to use vfork() in _posixsubprocess
Are you going to do experiments? If not, I can try to do some in early January.

Using vfork() can cause new issues: that's why there is a POSIX_SPAWN_USE_VFORK flag (the caller had to explicitly enable it). See also bpo-34663 the history of vfork in posix_spawn() in the glibc.

I've studied that, and that's what I referred to as "quality-of-implementation" problem. After glibc devs removed heap allocations and tweaked some other things, they could use vfork() in all cases. "musl" libc never had those problems and used vfork() from the beginning.

vstinner · 2018-12-20T16:35:43Z

Are you going to do experiments? If not, I can try to do some in early January.

I'm only interested to use posix_spawn(). If you want to experiment vfork() in _posixsubprocess: please go ahead!

I've studied that, and that's what I referred to as "quality-of-implementation" problem. After glibc devs removed heap allocations and tweaked some other things, they could use vfork() in all cases. "musl" libc never had those problems and used vfork() from the beginning.

_posixsubprocess shouldn't allocate memory on the heap *after* fork(), only before. If it does, it's a bug. Last time I checked _posixsubprocess, it looked to be written properly.

But the subprocess module has the crazy 'preexec_fn' option which makes things complicated... Maybe we should deprecate (and then remove) it, it's dangerous and it can be avoided in many cases. Using a launcher program is safer than relying on preexec_fn. In OpenStack, I reimplemented prlimit in pure Python. It's an example of launcher to set an option in the child process:
https://github.com/openstack/oslo.concurrency/blob/master/oslo_concurrency/prlimit.py

vstinner · 2018-12-20T16:42:39Z

Serhiy Storchaka, Alexey Izbyshev: I read and understood your valid concerns.

restore_signals=True will be quickly implemented, and so posix_spawn() code path will be tested by more tests of test_subprocess. If not, I will ensure that missing tests will be added.

Enhance _posixsubprocess to use vfork() is an interesting project, but IMHO it's complementary and doesn't replace all advantages of posix_spawn().

I am going to merge the PR tomorrow, except if someone sees a good reason to not merge it. I prefer to merge the PR early in the 3.8 development cycle to have more time to handle any issue if someone notice bugs. If something goes badly, we will be able to easily revert the change. Don't worry, I like to revert changes ;-)

Again, I'm mentoring Joannah who is learning Python, so I prefer to move step by step (with small steps :-)). We will support more and more subprocess.Popen options.

vstinner · 2018-12-20T21:45:35Z

I wasn't sure how posix_spawn() handles file descriptors of standard streams with O_CLOEXEC flag set.

I wrote a test. Result: _posixsubprocess and os.posix_spawn() have the same behavior! If fd 2 (stderr) is marked with O_CLOEXEC, the fd 2 is closed in the child process, as expected. (There is no black magic to keep it open.)

$ cat x.py 
import subprocess, sys, os
args = [sys.executable, 'y.py']
os.set_inheritable(2, False)
subprocess.run(args, close_fds=False, restore_signals=False)

$ cat y.py 
import os
def fd_valid(fd):
    try:
        os.fstat(fd)
        return True
    except OSError:
        return False
for fd in (0, 1, 2):
    print("fd %s valid? %s" % (fd, fd_valid(fd)))

$ python3 x.py  # unpatched
fd 0 valid? True
fd 1 valid? True
fd 2 valid? False

$ ./python x.py # patched, use posix_spawn()
fd 0 valid? True
fd 1 valid? True
fd 2 valid? False

vstinner · 2018-12-20T21:58:46Z

Options that should be easy to support later:

relative executable: use shutil.which() or expose C posix_spawnp() function as os.posix_spawnp()
env: just pass it as the 3rd argument of os.posix_spawn() (trivial patch)
restore_signals: use setsigdef (Joannah has a patch)

I don't see how to support following options:

preexec_fn: I really hate this feature...
cwd
start_new_session
close_fds: there is posix_spawn_file_actions_addclose(), but I'm not sure that we can create a list of file descriptor in a reliable way, since a different thread can open a FD in the meanwhile... Currently, _posixsubprocess iterates on /proc/self/fd/ *after* the fork, when there is a single thread.
pass_fds: there is not API to mark a fd as inheritable (clear O_CLOEXEC flag). We cannot just change the O_CLOEXEC temporarily since a different thread can spawn a child process "at the same time". The GIL doesn't protect C threads which don't use the GIL.

For pipes like stdout=PIPE or stdout=fd, I didn't look yet how to implement that properly.

Maybe one way to work around the pass_fds limitation in an applicaton is to mark the FDs to pass as inheritable (os.set_inheritable(fd, True)) and use close_fds=False:

fd = ...
subprocess.Popen(..., pass_fds={fd})
os.close(fd)

would become:

fd = ...
os.set_inheritable(fd, True)
subprocess.Popen(..., close_fds=False)
os.close(fd)

But this pattern is not thread if the application has other threads. So that should be the responsibility of the developer, not of Python, to write "unsafe" code" which requires the knownledge of the application behavior.

vstinner · 2018-12-20T22:11:24Z

I am going to merge the PR tomorrow, except if someone sees a good reason to not merge it.

I changed my mind and I decided to wait until I'm back from holiday instead :-)
#11242 (comment)

The function is more tricky than what I expected!

izbyshev · 2018-12-23T19:12:46Z

cwd

posix_spawn_file_actions_addchdir_np() is scheduled for glibc 2.29 [1] and exists in Solaris [2] (though its suffix indicates that it's "non-portable" -- not in POSIX). POSIX also has a bug for this [7].

start_new_session

posix_spawnattr_setflags() supports POSIX_SPAWN_SETSID in musl [3] and glibc 2.26 [4] and is scheduled to be added into POSIX. Solaris also has POSIX_SPAWN_SETSID_NP.

pass_fds: there is not API to mark a fd as inheritable (clear O_CLOEXEC flag)

POSIX has a bug for this [5]. It's marked fixed, but the current POSIX docs doesn't reflect the changes. The idea is to make posix_spawn_file_actions_adddup2() clear O_CLOEXEC if the source and the destination are the same (unlike dup2(), which would do nothing). musl has already implemented the new behavior [6], but glibc hasn't.

Another alternative (also described in the POSIX bug report) is to remove O_CLOEXEC via a side-effect of posix_spawn_file_actions_adddup2() to a dummy descriptor and then dup2() it back again. This is not correct in general case -- record locks (fcntl(F_SETLK)) associated with the file referred to by the descriptor are released on the second dup2() -- but in our case those locks are not preserved for the new process anyway, so this is not a problem. I'm not aware of other correctness problems with this, but one needs to check for platforms where a file descriptor may have associated flags other than O_CLOEXEC (such flags would be cleared by dup2).

For pipes like stdout=PIPE or stdout=fd, I didn't look yet how to implement that properly

You'd need to reimplement the corresponding logic from _posixsubprocess. Basically, posix_spawn_file_actions_adddup2() and posix_spawn_file_actions_addclose() do the job, but you need to avoid corner cases regarding low fds (0, 1, 2) that might have been reused for pipes.

I don't know correct solutions for close_fds. It is already hard: it's implemented in async-safe way only for Linux now (and only if /proc is available) or if /proc is not available and the brute-force implementation is used.

And of course, in theory you could implement any of the above by a helper binary as you did for prlimit.

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=17405
[2] https://docs.oracle.com/cd/E88353_01/html/E37843/posix-spawn-file-actions-addchdir-np-3c.html
[3] https://git.musl-libc.org/cgit/musl/commit/?id=bb439bb17108b67f3df9c9af824d3a607b5b059d
[4] https://www.sourceware.org/ml/libc-alpha/2017-08/msg00010.html
[5] http://austingroupbugs.net/view.php?id=411
[6] https://git.musl-libc.org/cgit/musl/commit/?id=6fc6ca1a323bc0b6b9e9cdc8fa72221ae18fe206
[7] http://austingroupbugs.net/view.php?id=1208

gpshead · 2018-12-26T22:36:22Z

Thanks for all your research and reference links on this! As a _posixsubprocess maintainer, I am not against either posix_spawn or vfork being used directly in the future when feasible.

A challenge, especially with platform specific vfork, is making sure we understand exactly which platforms it can work properly on and checking for those both at compile time _and_ runtime (running kernel version and potentially the runtime libc version?) so that we can only use it in situations we are sure it is supposed to behave as desired in. My guiding philosophy: Be conservative on choosing when such a thing is safe to use.

vstinner · 2019-01-06T22:46:47Z

Gregory:

Thanks for all your research and reference links on this! As a _posixsubprocess maintainer, I am not against either posix_spawn or vfork being used directly in the future when feasible.

Are you against using posix_spawn() in subprocess? Or only in _posixsubprocess?

--

Alexey Izbyshev identified a difference between _posixsubprocess and posix_spawn() PR 11242: error handling depends a lot on the libc implementation.
#11242 (comment)

On recent glibc, posix_spawn() fails (non-zero return value) and set errno to ENOENT if the executed program doesn't exist... but not all platforms have this behavior.
On FreeBSD, if setting posix_spawn() "attributes" or execute posix_spawn() "file actions" fails, posix_spawn() succeed but the child process exits immediately with exit code 127 without trying to call execv(). If execv() fails, posix_spawn() succeed, but the child process exit with exit code 127.
The worst seems to be: "In my test on Ubuntu 14.04, ./python -c "import subprocess; subprocess.call(['/xxx'], close_fds=False, restore_signals=False)" silently returns with zero exit code."

execv() can fail for a lot of different reasons. Extract of Linux execve() manual page:
---
E2BIG The total number of bytes in the environment (envp) and argument list (argv) is too large.

   EACCES Search permission is denied on a component of the path prefix of filename or the name of a script interpreter.  (See also
          path_resolution(7).)

   EACCES The file or a script interpreter is not a regular file.

   EACCES Execute permission is denied for the file or a script or ELF interpreter.

   EACCES The filesystem is mounted noexec.

   EAGAIN (since Linux 3.1)
          Having  changed  its  real  UID  using one of the set*uid() calls, the caller was—and is now still—above its RLIMIT_NPROC
          resource limit (see setrlimit(2)).  For a more detailed explanation of this error, see NOTES.

   EFAULT filename or one of the pointers in the vectors argv or envp points outside your accessible address space.

   EINVAL An ELF executable had more than one PT_INTERP segment (i.e., tried to name more than one interpreter).

   EIO    An I/O error occurred.

   EISDIR An ELF interpreter was a directory.

   ELIBBAD
          An ELF interpreter was not in a recognized format.

   ELOOP  Too many symbolic links were encountered in resolving filename or the name of a script or ELF interpreter.

   ELOOP  The maximum recursion limit was reached during  recursive  script  interpretation  (see  "Interpreter  scripts",  above).
          Before Linux 3.8, the error produced for this case was ENOEXEC.

   EMFILE The per-process limit on the number of open file descriptors has been reached.

   ENAMETOOLONG
          filename is too long.

   ENFILE The system-wide limit on the total number of open files has been reached.

   ENOENT The  file  filename or a script or ELF interpreter does not exist, or a shared library needed for the file or interpreter
          cannot be found.

   ENOEXEC
          An executable is not in a recognized format, is for the wrong architecture, or has some other format error that means  it
          cannot be executed.

   ENOMEM Insufficient kernel memory was available.

   ENOTDIR
          A component of the path prefix of filename or a script or ELF interpreter is not a directory.

   EPERM  The  filesystem  is  mounted  nosuid, the user is not the superuser, and the file has the set-user-ID or set-group-ID bit
          set.

   EPERM  The process is being traced, the user is not the superuser and the file has the set-user-ID or set-group-ID bit set.

   EPERM  A "capability-dumb" applications would not obtain the full set of permitted capabilities granted by the executable  file.
          See capabilities(7).

   ETXTBSY
          The specified executable was open for writing by one or more processes.

---

Simply exit with exit code 127 (FreeBSD behavior) doesn't allow to know if the program have been executed or not.

For example, "git bisect run" depends on the exit code: exit code 127 means "command not found". But with posix_spawn(), exit code 127 can means something else...

I'm disappointed by the qualify of the posix_spawn() implementation on FreeBSD and old glibc...

--

I'm now confused.

Should we still use posix_spawn() on some platforms? For example, posix_spawn() seems to have a well defined error reporting according to its manual page:
https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man2/posix_spawn.2.html

"""
RETURN VALUES
If the pid argument is NULL, no pid is returned to the calling process;
if it is non-NULL, then posix_spawn() and posix_spawnp() functions return
the process ID of the child process into the pid_t variable pointed to by
the pid argument and return a 0 on success. If an error occurs, they
return a non-zero error code as the function return value, and no child
process is created.

ERRORS
The posix_spawn() and posix_spawnp() functions will fail and return to
the calling process if:

 [EINVAL]           The value specified by file_actions or attrp is
                    invalid.

 [E2BIG]            The number of bytes in the new process's argument list
                    is larger than the system-imposed limit.  This limit
                    is specified by the sysctl(3) MIB variable
                    KERN_ARGMAX.

 [EACCES]           Search permission is denied for a component of the
                    path prefix.

 [EACCES]           The new process file is not an ordinary file.

 [EACCES]           The new process file mode denies execute permission.

 [EACCES]           The new process file is on a filesystem mounted with
                    execution disabled (MNT_NOEXEC in <sys/mount.h>).

 [EFAULT]           The new process file is not as long as indicated by
                    the size values in its header.

 [EFAULT]           Path, argv, or envp point to an illegal address.

 [EIO]              An I/O error occurred while reading from the file sys-tem. system.
                    tem.

 [ELOOP]            Too many symbolic links were encountered in translat-ing translating
                    ing the pathname.  This is taken to be indicative of a
                    looping symbolic link.

 [ENAMETOOLONG]     A component of a pathname exceeded {NAME_MAX} charac-ters, characters,
                    ters, or an entire path name exceeded {PATH_MAX} char-acters. characters.
                    acters.

 [ENOENT]           The new process file does not exist.

 [ENOEXEC]          The new process file has the appropriate access per-mission, permission,
                    mission, but has an unrecognized format (e.g., an
                    invalid magic number in its header).

 [ENOMEM]           The new process requires more virtual memory than is
                    allowed by the imposed maximum (getrlimit(2)).

 [ENOTDIR]          A component of the path prefix is not a directory.

 [ETXTBSY]          The new process file is a pure procedure (shared text)
                    file that is currently open for writing or reading by
                    some process.

"""

Would it be reasonable to use posix_spawn() but only on platforms where the implementation is known to be "good"?

Good would mean that execv() and posix_spawn() errors (setting attributes or file actions failures) are properly reported to the parent process.

For example, use posix_spawn() in subprocess on "recent" glibc (which minimum version?) and macOS (where it's a syscall)?

vstinner · 2019-01-07T00:13:23Z

I wrote PR 11452 which is based on PR 11242 but adds support for 'env' and 'restore_signals' parameters and checks the operating system and libc version to decide if posix_spawn() can be used by subprocess.

In my implementation, posix_spawn() is only used on macOS and glibc 2.26 or newer (and glibc 2.24 and newer on Linux).

According to Alexey Izbyshev, musl should be safe as well, but I don't know how to test musl on my Fedora, nor how to check if Python is linked to musl, nor what is the minimum musl version which is safe.

vstinner · 2019-01-07T00:15:40Z

"The native spawn(2) system introduced in Oracle Solaris 11.4 shares a lot of code with the forkx(2) and execve(2)."
https://blogs.oracle.com/solaris/posix_spawn-as-an-actual-system-call

vstinner · 2019-01-07T00:49:49Z

I created bpo-35674: "Expose os.posix_spawnp()" to later support executable with no directory in subprocess.

izbyshev · 2019-01-13T21:50:38Z

On FreeBSD, if setting posix_spawn() "attributes" or execute posix_spawn() "file actions" fails, posix_spawn() succeed but the child process exits immediately with exit code 127 without trying to call execv(). If execv() fails, posix_spawn() succeed, but the child process exit with exit code 127.

The worst seems to be: "In my test on Ubuntu 14.04, ./python -c "import subprocess; subprocess.call(['/xxx'], close_fds=False, restore_signals=False)" silently returns with zero exit code."

To clarify, in the Ubuntu example only the parent process returns with zero exit code. The child exits with 127, so the behavior is the same as on FreeBSD, as demonstrated by check_call():

$ ./python -c "import subprocess; subprocess.check_call(['/xxx'], close_fds=False, restore_signals=False)"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/izbyshev/workspace/cpython/Lib/subprocess.py", line 348, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/xxx']' returned non-zero exit status 127.

According to Alexey Izbyshev, musl should be safe as well, but I don't know how to test musl on my Fedora, nor how to check if Python is linked to musl, nor what is the minimum musl version which is safe.

musl doesn't have an equivalent of __GLIBC__ macro by design as explained in its FAQ [1], and the same FAQ suggests to use something like "#ifndef __GLIBC__ portable_code() #else glibc_code()". So far musl-related fixes followed this advice (see PR 9288, PR 9224), but in this case it doesn't seem like a good idea to me. POSIX allows asynchronous error notification, so "portable" code shouldn't make assumptions about that, and technically old glibc and FreeBSD are conforming implementations, but not suitable as a replacement for _posixsubprocess.

Maybe we could use configure-time musl detection instead? It seems like config.guess has some support for musl[2], but it uses "ldd" from PATH and hence appears to work out-of-the-box only on musl-based distros like Alpine. See also [3].

Regarding the minimum musl version, in this case it's not important since the relevant change was committed in 2013[4], and given that musl is relatively young, such old versions shouldn't have users now.

[1] https://wiki.musl-libc.org/faq.html
[2]

cpython/config.guess

Line 154 in 5bb146a

# If ldd exists, use it to detect musl libc.

[3] RcppCore/Rcpp#449
[4] https://git.musl-libc.org/cgit/musl/commit/?id=fb6b159d9ec7cf1e037daa974eeeacf3c8b3b3f1

vstinner · 2019-01-14T16:50:54Z

https://wiki.musl-libc.org/faq.html

"""
Q: Why is there no __MUSL__ macro?

It’s a bug to assume a certain implementation has particular properties rather than testing. So far, every time somebody’s asked for this with a particular usage case in mind, the usage case was badly wrong, and would have broken support for the next release of musl. The official explanation: http://openwall.com/lists/musl/2013/03/29/13
"""

IMHO that's wrong. A software like Python heavily rely on the *exact* implementation of a libc.

https://github.com/python/cpython/pull/9224/files looks like a coarse heuristic to detect musl for example.

Until muscl decides to provide an "#ifdef __MUSL__"-like or any way that it's musl, I propose to not support musl: don't use os.posix_spawn() but _posixsubprocess.

vstinner · 2019-01-16T22:08:25Z

Oh... Using directly posix_spawnp() in subprocess to locate the executable in the PATH introduces subtle behavior changes...
#11579

vstinner · 2019-01-16T22:14:51Z

One of the issue that I have with using posix_spawn() is that the *exact* behavior of subprocess is not properly defined by test_subprocess. Should we more more tests, or document that the exact behavior is "an implementation detail"? I guess that the best for users is get the same behavior on all platforms, but can we really warranty that? Python rely on the operating system and the libc, but each platform has subtle behavior differneces. Handling time and date is a good example: strptime() and strftime() have big differences between platforms. posix_spawn() is another example where the implementation is very different depending on the platform.

pablogsal · 2019-01-16T22:35:31Z

One of the issue that I have with using posix_spawn() is that the *exact* behavior of subprocess is not properly defined by test_subprocess. Should we more more tests, or document that the exact behavior is "an implementation detail"? I guess that the best for users is get the same behavior on all platforms, but can we really warranty that? Python rely on the operating system and the libc, but each platform has subtle behavior differneces. Handling time and date is a good example: strptime() and strftime() have big differences between platforms. posix_spawn() is another example where the implementation is very different depending on the platform.

I don't think we can get the same behaviour in all platforms, but I would want to know what Gregory P. Smith thinks about this potential divergence in behaviour and what are the guarantees that posix_subprocess should maintain.

vstinner · 2019-01-16T22:38:12Z

New changeset 8c34956 by Victor Stinner in branch 'master':
Revert "bpo-35537: subprocess can now use os.posix_spawnp (GH-11579)" (GH-11582)
8c34956

izbyshev · 2019-01-16T23:47:27Z

Hi,

As a disclaimer, I'm a FreeBSD developer interested in making sure we're doing the right thing here. =)

May I ask what the above assessment is based on, and specifically what we need to address?

Hello, Kyle! That assessment is based on my quick and incorrect reading of posix_spawn source code. I didn't notice that "error" is visible in the parent due to address space sharing. Sorry for that, and thank you for the correction. A proper error notification is required for posix_spawn to be used in subprocess as a replacement for fork/exec, and FreeBSD does it right.

While we're here, would you be kind to answer several questions about posix_spawn on FreeBSD?

On Linux, without taking additional precautions, signals may be delivered to a child process created via vfork(). If a signal handler runs in such a child, it can leave traces of its work in the (shared) memory, potentially surprising the parent process. To avoid this, glibc[1] and musl[2] disable all signals (even signals used internally for thread cancellation) before creating a child, but FreeBSD calls vfork() right away. Is this not considered a problem, or is something hidden in vfork() implementation?
Another problem with vfork() is potential sharing of address space between processes with different privileges (see Rich Felker's post[3] about this and the previous problem). Is this a valid concern on FreeBSD?
Does FreeBSD kernel guarantee that when waitpid(WNOHANG) is called at [4], the child is ready for reaping? On Linux, it's not always the case[5].

[1] https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/spawni.c;h=353bcf5b333457d191320e358d35775a2e9b319b;hb=HEAD#l372
[2] http://git.musl-libc.org/cgit/musl/tree/src/process/posix_spawn.c?id=5ce3737931bb411a8d167356d4d0287b53b0cbdc#n171
[3] https://ewontfix.com/7
[4] https://svnweb.freebsd.org/base/head/lib/libc/gen/posix_spawn.c?view=markup#l229
[5] https://sourceware.org/git/?p=glibc.git;a=commit;h=aa95a2414e4f664ca740ad5f4a72d9145abbd426

izbyshev · 2019-01-17T00:38:47Z

One of the issue that I have with using posix_spawn() is that the *exact* behavior of subprocess is not properly defined by test_subprocess. Should we more more tests, or document that the exact behavior is "an implementation detail"?

Regarding using PATH from "env" instead of parent's environment, it may be considered documented because subprocess docs refer to os.execvp(), where it's explicitly documented:

"""
The variants which include a “p” near the end (execlp(), execlpe(), execvp(), and execvpe()) will use the PATH environment variable to locate the program file. When the environment is being replaced (using one of the exec*e variants, discussed in the next paragraph), the new environment is used as the source of the PATH variable.
"""

The problem is that it differs from execvp() in libc (and POSIX), so I would consider such behavior a bug if it wasn't so old to become a feature. Thanks to Victor for noticing that, I missed it.

So, if we can't change os.execvp() and/or current subprocess behavior, posix_spawnp seems to be ruled out. (I don't consider temporarily changing the parent environment as a solution). A naive emulation of posix_spawnp would be repeatedly calling posix_spawn for each PATH entry, but that's prohibitively expensive. Could we use a following hybrid approach instead?

Iterate over PATH entries and perform a quick check for common exec errors (ENOENT, ENOTDIR, EACCES) manually (e.g. by stat()). If the check fails, exec would also fail so just skip the entry. (It's important not to get false negatives here, but the child process would have the same privileges as the parent since we don't use POSIX_SPAWN_RESETIDS, so I can't think of one). Otherwise, call posix_spawn with the absolute path. If it fails, skip the entry.

Looks ugly, but are there other problems? This would seem to work just as well as posix_spawnp in the common case, but in the worst (contrived) case it would be equivalent to calling posix_spawn for each PATH entry.

kevans91 · 2019-01-17T04:54:30Z

On Wed, Jan 16, 2019 at 5:47 PM Alexey Izbyshev <report@bugs.python.org> wrote:

> Hi,

> As a disclaimer, I'm a FreeBSD developer interested in making sure we're doing the right thing here. =)

> May I ask what the above assessment is based on, and specifically what we need to address?

Hello, Kyle! That assessment is based on my quick and incorrect reading of posix_spawn source code. I didn't notice that "error" is visible in the parent due to address space sharing. Sorry for that, and thank you for the correction. A proper error notification is required for posix_spawn to be used in subprocess as a replacement for fork/exec, and FreeBSD does it right.

Oh, good to hear. =) koobs had pointed this thread out in hopes that we can try to reconcile and figure out what work needs to be done here. =)

While we're here, would you be kind to answer several questions about posix_spawn on FreeBSD?

Most definitely, if I can!

On Linux, without taking additional precautions, signals may be delivered to a child process created via vfork(). If a signal handler runs in such a child, it can leave traces of its work in the (shared) memory, potentially surprising the parent process. To avoid this, glibc[1] and musl[2] disable all signals (even signals used internally for thread cancellation) before creating a child, but FreeBSD calls vfork() right away. Is this not considered a problem, or is something hidden in vfork() implementation?

Right, after glancing over our implementation details- this is indeed a problem here. Our manpage indicates that we only block SIGTTOU for SIGTTIN for children in vfork(), and some light testing seems to indicate that's the case. Signal handlers are inherited from the parent, so that's less than stellar.

Another problem with vfork() is potential sharing of address space between processes with different privileges (see Rich Felker's post[3] about this and the previous problem). Is this a valid concern on FreeBSD?

My initial read-through of the relevant bits seem to indicate that image activation will cause the child process to be allocated a new process space, so it's looking kind of like a 'no' -- I'm outsourcing the answer to this one to someone more familiar with those bits, though, so I'll get back to you.

Does FreeBSD kernel guarantee that when waitpid(WNOHANG) is called at [4], the child is ready for reaping? On Linux, it's not always the case[5].

I want to say vfork semantics guarantee it- we got to this point, so the child hit an error and _exit(127). I'm double-checking this one, though, to verify the child's properly marked a zombie before we could possibly reschedule the parent.

kevans91 · 2019-01-17T06:42:39Z

I'll be preparing a patch for our posix_spawn's signal handling.

My mistake in my setuid assessment was pointed out to me- it doesn't seem like a highly likely attack vector, but it is indeed possible.

If the child errored out, they will indeed be properly reapable at that point due to how vfork is implemented -- any observation to the contrary is to be considered a bug.

vstinner · 2019-01-17T10:00:47Z

Alexey Izbyshev

So, if we can't change os.execvp() and/or current subprocess behavior, posix_spawnp seems to be ruled out.

My main motivation for using posix_spawn() is performance. An optimization should not justify to introduce a backward incompatible change. I agree wth posix_spawnp() cannot be used directly in subprocess because of that. But os.posix_spawnp() addition in Python 3.8 remains useful because it allows to use it directly (avoid subprocess).

A naive emulation of posix_spawnp would be repeatedly calling posix_spawn for each PATH entry, but that's prohibitively expensive.

It should be compared to the current code. Currently, _posixsubprocess uses a loop calling execv(). I don't think that calling posix_spawn() in a loop until one doesn't fail is more inefficient.

The worst case would be when applying process attributes and run file actions would dominate performances... I don't think that such case exists. Syscalls like dup2() and close() should be way faster than the final successful execv() in the overall posix_spawn() call. I'm talking about the case when the program exists.

Iterate over PATH entries and perform a quick check for common exec errors (ENOENT, ENOTDIR, EACCES) manually (e.g. by stat()).

I dislike this option. There is a high risk of race condition if the program is created, deleted or modified between the check and exec. It can cause subtle bugs, or worse: security vulnerabilities. IMHO the only valid check, to test if a program exists, is to call exec().

Alexey: Do you want to work on a PR to reimplement the "executable_list" and loop used by subprocess with _posixsubproces? You should keep the latest exception to re-raise it if no posix_spawn() successed. Don't forget to clear the exception on success, to not create a reference cycle:

commit acb9fa7
Author: Victor Stinner <victor.stinner@gmail.com>
Date: Wed Sep 13 10:10:10 2017 -0700

bpo-31234, socket.create_connection(): Fix ref cycle (bpo-3546)

izbyshev · 2019-01-17T18:09:35Z

Thank you for the answers, Kyle!

I'll be preparing a patch for our posix_spawn's signal handling.

Great!

My mistake in my setuid assessment was pointed out to me- it doesn't seem like a highly likely attack vector, but it is indeed possible.

The specific scenario with non-synchronized posix_spawn/setuid is certainly not a good practice and could probably be considered an application bug (such application effectively spawns a child with "random" privileges -- depending on whether setuid() completed before or after vfork()).

That said, in Linux C libraries protection from that scenario is usually achieved automatically if all signals are blocked before vfork(). This is the result of a Linux-specific detail: at syscall level, setuid() affects a single thread only, but setuid() library function must affect the process as a whole. This forces C libraries to signal all threads when setuid() is called and wait until all threads perform setuid() syscall. This waiting can't complete until vfork() returns (because signals are blocked), so setuid() is effectively postponed. I don't know how setuid() behaves on FreeBSD (so the above may be not applicable at all).

If the child errored out, they will indeed be properly reapable at that point due to how vfork is implemented -- any observation to the contrary is to be considered a bug.

Ah, that's good, thanks for the confirmation!

izbyshev · 2019-01-17T22:25:48Z

It should be compared to the current code. Currently, _posixsubprocess uses a loop calling execv(). I don't think that calling posix_spawn() in a loop until one doesn't fail is more inefficient.

The worst case would be when applying process attributes and run file actions would dominate performances... I don't think that such case exists. Syscalls like dup2() and close() should be way faster than the final successful execv() in the overall posix_spawn() call. I'm talking about the case when the program exists.

I had vfork() used internally by posix_spawn() in mind when I considered repeatedly calling it "prohibitively expensive". While vfork() is much cheaper than fork(), it doesn't mean that its performance is comparable to dup2() and close(). But on systems where posix_spawn is a syscall the overhead could indeed be lesser. Anyway, it should be measured.

> Iterate over PATH entries and perform a quick check for common exec errors (ENOENT, ENOTDIR, EACCES) manually (e.g. by stat()).

I dislike this option. There is a high risk of race condition if the program is created, deleted or modified between the check and exec. It can cause subtle bugs, or worse: security vulnerabilities. IMHO the only valid check, to test if a program exists, is to call exec().

While exec() is of course cleaner, IMHO races don't matter in this case. If our stat()-like check fails, we could as well imagine that it is exec() that failed (while doing the same checks as our stat()), so it doesn't matter what happens with the tested entry afterwards. If the check succeeds, we simply call posix_spawn() that will perform the same checks again. If any race condition occurs, the problem will be detected by posix_spawn().

Alexey: Do you want to work on a PR to reimplement the "executable_list" and loop used by subprocess with _posixsubproces?

I'm currently focused on researching whether it's possible to use vfork()-like approach instead of fork() on Linux, so if anyone wants to implement the PR you suggested, feel free to do it.

izbyshev · 2019-01-18T16:59:45Z

> * pass_fds: there is not API to mark a fd as inheritable (clear O_CLOEXEC flag)

POSIX has a bug for this [5]. It's marked fixed, but the current POSIX docs doesn't reflect the changes. The idea is to make posix_spawn_file_actions_adddup2() clear O_CLOEXEC if the source and the destination are the same (unlike dup2(), which would do nothing). musl has already implemented the new behavior [6], but glibc hasn't.

Update: glibc has implemented it for 2.29 (https://sourceware.org/bugzilla/show_bug.cgi?id=23640).

vstinner · 2019-01-23T18:00:44Z

New changeset f6243ac by Victor Stinner in branch 'master':
bpo-35537: subprocess can use posix_spawn with pipes (GH-11575)
f6243ac

izbyshev · 2019-01-23T18:29:44Z

Another problem with posix_spawn() on glibc: it doesn't report errors to the parent process when run under QEMU user-space emulation and Windows Subsystem for Linux. This is because starting with commit [1] (glibc 2.25) posix_spawn() relies on address space sharing semantics of clone(CLONE_VM) to report errors, but it's not implemented in QEMU and WSL, so clone(CLONE_VM) and vfork() behave like fork(). See also [2], [3].

This can be easily demonstrated:
$ cat test.c
#include <spawn.h>

int main(int argc, char *argv[], char *envp[]) {
    return posix_spawn(0, "non-existing", 0, 0, argv, envp);
}

$ gcc test.c
$ ./a.out
$ echo $?
2
$ aarch64-linux-gnu-gcc -static test.c
$ qemu-aarch64 ./a.out
$ echo $?
0

There is no problem with musl (it doesn't rely on address space sharing).

[1] https://sourceware.org/git/?p=glibc.git;a=commit;h=4b4d4056bb154603f36c6f8845757c1012758158
[2] https://bugs.launchpad.net/qemu/+bug/1673976
[3] https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg04890.html

vstinner · 2019-01-24T10:24:43Z

Another problem with posix_spawn() on glibc: it doesn't report errors to the parent process when run under QEMU user-space emulation and Windows Subsystem for Linux. This is because starting with commit [1] (glibc 2.25) posix_spawn() relies on address space sharing semantics of clone(CLONE_VM) to report errors, but it's not implemented in QEMU and WSL, so clone(CLONE_VM) and vfork() behave like fork(). See also [2], [3].

Oh, you know a lot of stuff about vfork and posix_spawn, you are impressive :-)

Is sys.platform equal to 'linux' on WSL? Sorry, I don't know WSL. If it's equal, is it possible to explicitly exclude WSL in the subprocess test, _use_posix_spawn()?

For QEMU, I was very surprised that a full VM wouldn't implement vfork. So I tried Fedora Rawhide and I didn't notice the bug that you described.

---

$ cat test.c
#include <spawn.h>

int main(int argc, char *argv[], char *envp[]) {
    return posix_spawn(0, "non-existing", 0, 0, argv, envp);
}

$ gcc test.c
$ ./a.out
$ echo $?
2

In the VM, I get exit status 2 as expected.

You wrote:

---

$ aarch64-linux-gnu-gcc -static test.c
$ qemu-aarch64 ./a.out
$ echo $?
0

So the bug only affects "QEMU User Space". I never used that before. I understand that it's a bug in the glibc and in "QEMU User Emulation" which doesn't implement vfork "properly".

There is no problem with musl (it doesn't rely on address space sharing).

I'm not interested to support musl at this point because I don't see any obvious way to detect that we are running musl nor get the musl version.

The history of this issue shows the posix_spawn() only works in some very specific cases, so we have to be careful and only opt-in for specific libc, on specific platform with specific libc version (read at runtime if possible).

vstinner · 2019-01-24T11:03:22Z

I discussed the following issue with Florian Weimer (who works on the glibc):
---

$ cat test.c
#include <spawn.h>

int main(int argc, char *argv[], char *envp[]) {
    return posix_spawn(0, "non-existing", 0, 0, argv, envp);
}

$ aarch64-linux-gnu-gcc -static test.c
$ qemu-aarch64 ./a.out
$ echo $?
0

While the parent gets a "success", in subprocess we get the pid and later read the exit status of the child process. Even if posix_spawn() doesn't report a failure, waitpid() will still report a failure. It's a subtle behavior change. I'm not sure yet if it's acceptable for Python in term of semantics:

without posix_spawn(), subprocess.Popen constructor raises FileNotFoundError: the parent knows that the execution failed because the program doesn't exist
with posix_spawn() with the vfork bug, subprocess.Popen succeed but Popen.wait() returns something like exit code 127. The parent doesn't know if the child process explcitly used exit(127) or if execv() failed.

Does the difference matters? The bug only occurs in some very specific cases:

WSL
QEMU User Emulation

Are these use cases common enough to block the whole idea of using posix_spawn() on Linux. I don't think so. I really want to use posix_spawn() for best performances! Moreover, I expect that glibc implementation of posix_spawn() is safer than Python _posixsubprocess. The glibc has access to low-level stuff like it's internal signals, cancellation points, etc. _posixsubprocess is more generic and doesn't worry about such low-level stuff, whereas they might cause bad surprised.

vstinner · 2019-01-24T16:51:30Z

I wrote PR 11668 to propose to *document* the subtle behavior change when posix_spawn() is used on QEMU User Emulation and WSL platforms, rather than trying to opt-out for posix_spawn() on these platforms.

izbyshev · 2019-01-25T02:37:20Z

Is sys.platform equal to 'linux' on WSL? Sorry, I don't know WSL. If it's equal, is it possible to explicitly exclude WSL in the subprocess test, _use_posix_spawn()?

I don't have immediate access to WSL right now, but I'll try to get one and investigate what we have there wrt. posix_spawn() and vfork().

So the bug only affects "QEMU User Space". I never used that before.

Yes, I was specifically referring to QEMU user-space. One use case that heavily relies on it is Tizen. Its packages are built in a chroot jail containing the build environment with binaries native to the target architecture, making an illusion that packages are built on the target system and are not cross-compiled. So the binaries are run under QEMU user-space emulation. But in reality, because of unacceptable performance of binary translation, many frequently-used binaries, like coreutils and compilers, are replaced with host-native binaries in a way transparent to the build system (so it has no idea whether it runs host-native or target-native binaries).

Does the difference matters? The bug only occurs in some very specific cases:

WSL

QEMU User Emulation

Are these use cases common enough to block the whole idea of using posix_spawn() on Linux. I don't think so. I really want to use posix_spawn() for best performances! Moreover, I expect that glibc implementation of posix_spawn() is safer than Python _posixsubprocess. The glibc has access to low-level stuff like it's internal signals, cancellation points, etc. _posixsubprocess is more generic and doesn't worry about such low-level stuff, whereas they might cause bad surprised.

It's true that a C library is in a better position to implement something like posix_spawn(), but I still think that vfork()/exec() is worth to consider at least on Linux. See bpo-35823, which should also work under QEMU user-mode and WSL (but needs testing).

vstinner · 2019-02-01T10:05:35Z

New changeset 80c5dfe by Victor Stinner (Joannah Nanjekye) in branch 'master':
bpo-35537: Add setsid parameter to os.posix_spawn() and os.posix_spawnp() (GH-11608)
80c5dfe

vstinner · 2019-02-01T10:40:29Z

New changeset 1e39b83 by Victor Stinner in branch 'master':
bpo-35537: Skip test_start_new_session() of posix_spawn (GH-11718)
1e39b83

vstinner · 2019-02-01T14:47:29Z

New changeset 325e4ba by Victor Stinner in branch 'master':
bpo-35537: Fix function name in os.posix_spawnp() errors (GH-11719)
325e4ba

vstinner · 2019-04-25T12:30:22Z

New changeset d7befad by Victor Stinner in branch 'master':
bpo-35537: Document posix_spawn() change in subprocess (GH-11668)
d7befad

vstinner · 2019-06-05T08:27:39Z

Python 3.8 beta 1 is released with this feature. I documented the change. Thnaks everybody who was involved to make this possible. I close the issue. Open new issues for follow-up.

vstinner · 2019-06-14T17:31:48Z

New changeset 5884043 by Victor Stinner in branch 'master':
bpo-35537: Rewrite setsid test for os.posix_spawn (GH-11721)
5884043

miss-islington · 2019-06-14T17:49:26Z

New changeset e696b15 by Miss Islington (bot) in branch '3.8':
bpo-35537: Rewrite setsid test for os.posix_spawn (GH-11721)
e696b15

nanjekyejoannah added 3.8 only security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Dec 19, 2018

vstinner self-assigned this Dec 20, 2018

vstinner closed this as completed Jun 5, 2019

ezio-melotti transferred this issue from another repository Apr 10, 2022

swaldhoer mentioned this issue Jun 21, 2023

Missing reference to issue in subprocess documentation #105975

Closed

mrpippy mentioned this issue Sep 8, 2023

Support the macOS-only POSIX_SPAWN_CLOEXEC_DEFAULT flag and posix_spawn_file_actions_addinherit_np() in os.posix_spawn #109154

Open

kulikjak mentioned this issue Dec 14, 2023

Add support for posix_spawn call in subprocess.Popen with close_fds=True #113117

Closed

use os.posix_spawn in subprocess #79718

use os.posix_spawn in subprocess #79718

Comments

nanjekyejoannah commented Dec 19, 2018

nanjekyejoannah commented Dec 19, 2018

vstinner commented Dec 20, 2018

vstinner commented Dec 20, 2018

vstinner commented Dec 20, 2018

vstinner commented Dec 20, 2018

serhiy-storchaka commented Dec 20, 2018

vstinner commented Dec 20, 2018

serhiy-storchaka commented Dec 20, 2018

izbyshev mannequin commented Dec 20, 2018

serhiy-storchaka commented Dec 20, 2018

izbyshev mannequin commented Dec 20, 2018

vstinner commented Dec 20, 2018

izbyshev mannequin commented Dec 20, 2018

vstinner commented Dec 20, 2018

vstinner commented Dec 20, 2018

vstinner commented Dec 20, 2018

vstinner commented Dec 20, 2018

vstinner commented Dec 20, 2018

izbyshev mannequin commented Dec 23, 2018

gpshead commented Dec 26, 2018

vstinner commented Jan 6, 2019

vstinner commented Jan 7, 2019

vstinner commented Jan 7, 2019

vstinner commented Jan 7, 2019

izbyshev mannequin commented Jan 13, 2019

vstinner commented Jan 14, 2019

vstinner commented Jan 16, 2019

vstinner commented Jan 16, 2019

pablogsal commented Jan 16, 2019

vstinner commented Jan 16, 2019

izbyshev mannequin commented Jan 16, 2019

izbyshev mannequin commented Jan 17, 2019

kevans91 mannequin commented Jan 17, 2019

kevans91 mannequin commented Jan 17, 2019

vstinner commented Jan 17, 2019

izbyshev mannequin commented Jan 17, 2019

izbyshev mannequin commented Jan 17, 2019

izbyshev mannequin commented Jan 18, 2019

vstinner commented Jan 23, 2019

izbyshev mannequin commented Jan 23, 2019

vstinner commented Jan 24, 2019

vstinner commented Jan 24, 2019

vstinner commented Jan 24, 2019

izbyshev mannequin commented Jan 25, 2019

vstinner commented Feb 1, 2019

vstinner commented Feb 1, 2019

vstinner commented Feb 1, 2019

vstinner commented Apr 25, 2019

vstinner commented Jun 5, 2019

vstinner commented Jun 14, 2019

miss-islington commented Jun 14, 2019