-
-
Notifications
You must be signed in to change notification settings - Fork 30.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
use os.posix_spawn in subprocess #79718
Comments
On Linux, posix_spawn() uses vfork() instead of fork() which blocks On macOS, posix_spawn() is a system call, so the kernel is free to use posix_spawn() is faster but it's also safer: it allows us to do a lot of Currently, Python uses a C extension _posixsubprocess which more or So it would be great to avoid _posixsubprocess whenever possible for |
"posix_spawn() is faster" This assumption needs a benchmark! I ran a benchmark on Linux (Fedora 29, Linux kernel 4.19.9-300.fc29.x86_64, glibc 2.28) using attached subprocess_bench.py. I rebased the PR 11242 on master and run the benchmark in virtual environment: git co pr/11242 The result is quite explicit: the PR makes subprocess.Popen 61x faster!!! $ env/bin/python -m perf compare_to fork_exec.json posix_spawn.json
Mean +- std dev: [fork_exec] 27.1 ms +- 0.4 ms -> [posix_spawn] 447 us +- 163 us: 60.55x faster (-98%) That's the best case:
On my config (Fedora 29, Linux kernel 4.19.9-300.fc29.x86_64, glibc 2.28), os.posix_spawn() uses vfork(): $ strace -o trace ./python -c 'import subprocess; subprocess.run(["/bin/true"], close_fds=False, restore_signals=False)'
$ grep clone trace
clone(child_stack=0x7fab28ac0ff0, flags=CLONE_VM|CLONE_VFORK|SIGCHLD) = 23073 I guess that the 61x speedup mostly comes from vfork(). See also bpo-34663 for previous discussion about vfork() and os.posix_spawn(). In short, the glibc is smart and detects when vfork() can be used or not. |
Benchmark on macOS Mojave (10.14.2), x86_64: $ ./python.exe -m perf compare_to fork_exec.json posix_spawn.json
Mean +- std dev: [fork_exec] 2.79 ms +- 0.06 ms -> [posix_spawn] 1.64 ms +- 0.03 ms: 1.70x faster (-41%) The speedup is "only" 1.7x faster, it's less impressive. Maybe fork()+exec() is more efficient on macOS than on Linux. |
Benchmark on FreeBSD: $ env/bin/python -m perf compare_to fork_exec.json posix_spawn.json
Mean +- std dev: [fork_exec] 52.8 ms +- 4.7 ms -> [posix_spawn] 499 us +- 45 us: 105.91x faster (-99%) Wow! 106x faster! |
See also: "How a fix in Go 1.9 sped up our Gitaly service by 30x" (hint: posix_spawn) fork/exec vs. posix_spawn benchmarks: |
We need to add tests for all corner cases. I.e. disabling any of conditions for posix_spawn should cause the failure. We need to do also something with PyOS_BeforeFork/PyOS_AfterFork_Child/PyOS_AfterFork_Parent. |
Do you mean running tests twice, one with posix_spawn(), one without?
Can you please elaborate? posix_spawn() may or may not use fork() internally. |
I mean that after writing tests they can be tested manually by disabling conditions for posix_spawn one by one. I.e. some tests should fail if remove "stdout is None" and some tests should fail if remove "not close_fds", etc.
_posixsubprocess.fork_exec() calls PyOS_BeforeFork/PyOS_AfterFork_Child/PyOS_AfterFork_Parent. If use os.posix_spawn(), these calls will be omitted. This is a behavior change. We should either call these functions manually before/after os.posix_spawn() (but I do not know what to do with PyOS_AfterFork_Child), or disable using os.posix_spawn() if non-trivial callbacks were added, or use other methods for calling registered callbacks. But the stdlib already adds callbacks in the threading and random modules. |
Serhiy, PyOS_* functions are called only if preexec_fn != None. But it will never be possible to implement support for preexec_fn (and some other subprocess features, e.g. close_fds) if processes are run via posix_spawn, so I don't see why anything should be done with those functions. |
Oh, nice! So this is not an obstacle for using os.posix_spawn(). |
Victor and Joannah, thanks for working on adding vfork() support to subprocess. Regarding speedups in the real world, I can share a personal anecdote. Back at the time when AOSP was built with make (I think it was AOSP 5) I've observed ~2x slowdown (or something close to that) if fork() is used instead of vfork() in make. This is the slowdown of the *whole* build (not just process creation time), and it's dramatic given the amount of pure computation (compilation of C++/Java code) involved in building AOSP. The underlying reason was that make merged all AOSP subproject makefiles into a single gigantic one, so make consumed more than 1 GB of RAM, and each shell invocation in recipes resulted in copying a large page table. That said, I'm not sure that the chosen approach of adding posix_spawn() "fast paths" for particular combinations of arguments is an optimal way to expose vfork() benefits to most users. For example, close_fds is True by default in subprocess, and I don't think there is a reliable way to use posix_spawn() in this case. So users would need to use "magic" combinations of arguments to subprocess.Popen to benefit from vfork(). Another example is "cwd" parameter: there is no standard posix_spawn file action to change the working directory in the child, so such a seemingly trivial operation would trigger the "slow" fork-exec path. Another approach would be to use vfork-exec instead fork-exec in _posixsubprocess. I know that previous discussions were resolved against vfork(), but I'm still not convinced and suggest to reevaluate those decisions again. Some arguments:
Some possible counter-arguments:
In the end, I think that migrating subprocess to vfork-exec would have more impact for users than adding "fast paths" and have consistent performance regardless of subprocess.Popen arguments (with few exceptions). Please consider it. Thanks! |
I'm open to experiment to use vfork() in _posixsubprocess, but I still want to use posix_spawn():
On macOS, posix_spawn() is implemented as a syscall. Using vfork() can cause new issues: that's why there is a POSIX_SPAWN_USE_VFORK flag (the caller had to explicitly enable it). See also bpo-34663 the history of vfork in posix_spawn() in the glibc. I'm not against using it, but I'm not sure that vfork() can be used in all cases, when all subprocess options are used. Maybe you will end up with the same state: vfork() used or not depending on subprocess.Popen options...
Since PEP-446 has been implemented in Python 3.4, I would be interested to revisit close_fds default value: don't close by default... But I'm not sure if it's *safe* to change the default... At least, we may promote close_fds=False in the doc explaining that it's faster and should be safer in the common case.
I don't think that it's a bug that subprocess have different performances depending on the flags. It would be a bug if the behavior would be diffrent. I don't think that it's the case. |
I've studied that, and that's what I referred to as "quality-of-implementation" problem. After glibc devs removed heap allocations and tweaked some other things, they could use vfork() in all cases. "musl" libc never had those problems and used vfork() from the beginning. |
I'm only interested to use posix_spawn(). If you want to experiment vfork() in _posixsubprocess: please go ahead!
_posixsubprocess shouldn't allocate memory on the heap *after* fork(), only before. If it does, it's a bug. Last time I checked _posixsubprocess, it looked to be written properly. But the subprocess module has the crazy 'preexec_fn' option which makes things complicated... Maybe we should deprecate (and then remove) it, it's dangerous and it can be avoided in many cases. Using a launcher program is safer than relying on preexec_fn. In OpenStack, I reimplemented prlimit in pure Python. It's an example of launcher to set an option in the child process: |
Serhiy Storchaka, Alexey Izbyshev: I read and understood your valid concerns. restore_signals=True will be quickly implemented, and so posix_spawn() code path will be tested by more tests of test_subprocess. If not, I will ensure that missing tests will be added. Enhance _posixsubprocess to use vfork() is an interesting project, but IMHO it's complementary and doesn't replace all advantages of posix_spawn(). I am going to merge the PR tomorrow, except if someone sees a good reason to not merge it. I prefer to merge the PR early in the 3.8 development cycle to have more time to handle any issue if someone notice bugs. If something goes badly, we will be able to easily revert the change. Don't worry, I like to revert changes ;-) Again, I'm mentoring Joannah who is learning Python, so I prefer to move step by step (with small steps :-)). We will support more and more subprocess.Popen options. |
I wasn't sure how posix_spawn() handles file descriptors of standard streams with O_CLOEXEC flag set. I wrote a test. Result: _posixsubprocess and os.posix_spawn() have the same behavior! If fd 2 (stderr) is marked with O_CLOEXEC, the fd 2 is closed in the child process, as expected. (There is no black magic to keep it open.) $ cat x.py
import subprocess, sys, os
args = [sys.executable, 'y.py']
os.set_inheritable(2, False)
subprocess.run(args, close_fds=False, restore_signals=False)
$ cat y.py
import os
def fd_valid(fd):
try:
os.fstat(fd)
return True
except OSError:
return False
for fd in (0, 1, 2):
print("fd %s valid? %s" % (fd, fd_valid(fd)))
$ python3 x.py # unpatched
fd 0 valid? True
fd 1 valid? True
fd 2 valid? False
$ ./python x.py # patched, use posix_spawn()
fd 0 valid? True
fd 1 valid? True
fd 2 valid? False |
Options that should be easy to support later:
I don't see how to support following options:
For pipes like stdout=PIPE or stdout=fd, I didn't look yet how to implement that properly. Maybe one way to work around the pass_fds limitation in an applicaton is to mark the FDs to pass as inheritable (os.set_inheritable(fd, True)) and use close_fds=False: fd = ... would become: fd = ... But this pattern is not thread if the application has other threads. So that should be the responsibility of the developer, not of Python, to write "unsafe" code" which requires the knownledge of the application behavior. |
I changed my mind and I decided to wait until I'm back from holiday instead :-) The function is more tricky than what I expected! |
posix_spawn_file_actions_addchdir_np() is scheduled for glibc 2.29 [1] and exists in Solaris [2] (though its suffix indicates that it's "non-portable" -- not in POSIX). POSIX also has a bug for this [7].
posix_spawnattr_setflags() supports POSIX_SPAWN_SETSID in musl [3] and glibc 2.26 [4] and is scheduled to be added into POSIX. Solaris also has POSIX_SPAWN_SETSID_NP.
POSIX has a bug for this [5]. It's marked fixed, but the current POSIX docs doesn't reflect the changes. The idea is to make posix_spawn_file_actions_adddup2() clear O_CLOEXEC if the source and the destination are the same (unlike dup2(), which would do nothing). musl has already implemented the new behavior [6], but glibc hasn't. Another alternative (also described in the POSIX bug report) is to remove O_CLOEXEC via a side-effect of posix_spawn_file_actions_adddup2() to a dummy descriptor and then dup2() it back again. This is not correct in general case -- record locks (fcntl(F_SETLK)) associated with the file referred to by the descriptor are released on the second dup2() -- but in our case those locks are not preserved for the new process anyway, so this is not a problem. I'm not aware of other correctness problems with this, but one needs to check for platforms where a file descriptor may have associated flags other than O_CLOEXEC (such flags would be cleared by dup2).
You'd need to reimplement the corresponding logic from _posixsubprocess. Basically, posix_spawn_file_actions_adddup2() and posix_spawn_file_actions_addclose() do the job, but you need to avoid corner cases regarding low fds (0, 1, 2) that might have been reused for pipes. I don't know correct solutions for close_fds. It is already hard: it's implemented in async-safe way only for Linux now (and only if /proc is available) or if /proc is not available and the brute-force implementation is used. And of course, in theory you could implement any of the above by a helper binary as you did for prlimit. [1] https://sourceware.org/bugzilla/show_bug.cgi?id=17405 |
Thanks for all your research and reference links on this! As a _posixsubprocess maintainer, I am not against either posix_spawn or vfork being used directly in the future when feasible. A challenge, especially with platform specific vfork, is making sure we understand exactly which platforms it can work properly on and checking for those both at compile time _and_ runtime (running kernel version and potentially the runtime libc version?) so that we can only use it in situations we are sure it is supposed to behave as desired in. My guiding philosophy: Be conservative on choosing when such a thing is safe to use. |
Gregory:
Are you against using posix_spawn() in subprocess? Or only in _posixsubprocess? -- Alexey Izbyshev identified a difference between _posixsubprocess and posix_spawn() PR 11242: error handling depends a lot on the libc implementation.
execv() can fail for a lot of different reasons. Extract of Linux execve() manual page:
--- Simply exit with exit code 127 (FreeBSD behavior) doesn't allow to know if the program have been executed or not. For example, "git bisect run" depends on the exit code: exit code 127 means "command not found". But with posix_spawn(), exit code 127 can means something else... I'm disappointed by the qualify of the posix_spawn() implementation on FreeBSD and old glibc... -- I'm now confused. Should we still use posix_spawn() on some platforms? For example, posix_spawn() seems to have a well defined error reporting according to its manual page: """ ERRORS
""" Would it be reasonable to use posix_spawn() but only on platforms where the implementation is known to be "good"? Good would mean that execv() and posix_spawn() errors (setting attributes or file actions failures) are properly reported to the parent process. For example, use posix_spawn() in subprocess on "recent" glibc (which minimum version?) and macOS (where it's a syscall)? |
I wrote PR 11452 which is based on PR 11242 but adds support for 'env' and 'restore_signals' parameters and checks the operating system and libc version to decide if posix_spawn() can be used by subprocess. In my implementation, posix_spawn() is only used on macOS and glibc 2.26 or newer (and glibc 2.24 and newer on Linux). According to Alexey Izbyshev, musl should be safe as well, but I don't know how to test musl on my Fedora, nor how to check if Python is linked to musl, nor what is the minimum musl version which is safe. |
"The native spawn(2) system introduced in Oracle Solaris 11.4 shares a lot of code with the forkx(2) and execve(2)." |
I created bpo-35674: "Expose os.posix_spawnp()" to later support executable with no directory in subprocess. |
To clarify, in the Ubuntu example only the parent process returns with zero exit code. The child exits with 127, so the behavior is the same as on FreeBSD, as demonstrated by check_call(): $ ./python -c "import subprocess; subprocess.check_call(['/xxx'], close_fds=False, restore_signals=False)"
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/izbyshev/workspace/cpython/Lib/subprocess.py", line 348, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/xxx']' returned non-zero exit status 127.
musl doesn't have an equivalent of __GLIBC__ macro by design as explained in its FAQ [1], and the same FAQ suggests to use something like "#ifndef __GLIBC__ portable_code() #else glibc_code()". So far musl-related fixes followed this advice (see PR 9288, PR 9224), but in this case it doesn't seem like a good idea to me. POSIX allows asynchronous error notification, so "portable" code shouldn't make assumptions about that, and technically old glibc and FreeBSD are conforming implementations, but not suitable as a replacement for _posixsubprocess. Maybe we could use configure-time musl detection instead? It seems like config.guess has some support for musl[2], but it uses "ldd" from PATH and hence appears to work out-of-the-box only on musl-based distros like Alpine. See also [3]. Regarding the minimum musl version, in this case it's not important since the relevant change was committed in 2013[4], and given that musl is relatively young, such old versions shouldn't have users now. [1] https://wiki.musl-libc.org/faq.html Line 154 in 5bb146a
[3] RcppCore/Rcpp#449 [4] https://git.musl-libc.org/cgit/musl/commit/?id=fb6b159d9ec7cf1e037daa974eeeacf3c8b3b3f1 |
""" It’s a bug to assume a certain implementation has particular properties rather than testing. So far, every time somebody’s asked for this with a particular usage case in mind, the usage case was badly wrong, and would have broken support for the next release of musl. The official explanation: http://openwall.com/lists/musl/2013/03/29/13 IMHO that's wrong. A software like Python heavily rely on the *exact* implementation of a libc. https://github.com/python/cpython/pull/9224/files looks like a coarse heuristic to detect musl for example. Until muscl decides to provide an "#ifdef __MUSL__"-like or any way that it's musl, I propose to not support musl: don't use os.posix_spawn() but _posixsubprocess. |
Oh... Using directly posix_spawnp() in subprocess to locate the executable in the PATH introduces subtle behavior changes... |
One of the issue that I have with using posix_spawn() is that the *exact* behavior of subprocess is not properly defined by test_subprocess. Should we more more tests, or document that the exact behavior is "an implementation detail"? I guess that the best for users is get the same behavior on all platforms, but can we really warranty that? Python rely on the operating system and the libc, but each platform has subtle behavior differneces. Handling time and date is a good example: strptime() and strftime() have big differences between platforms. posix_spawn() is another example where the implementation is very different depending on the platform. |
I don't think we can get the same behaviour in all platforms, but I would want to know what Gregory P. Smith thinks about this potential divergence in behaviour and what are the guarantees that posix_subprocess should maintain. |
Hello, Kyle! That assessment is based on my quick and incorrect reading of posix_spawn source code. I didn't notice that "error" is visible in the parent due to address space sharing. Sorry for that, and thank you for the correction. A proper error notification is required for posix_spawn to be used in subprocess as a replacement for fork/exec, and FreeBSD does it right. While we're here, would you be kind to answer several questions about posix_spawn on FreeBSD?
[1] https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/spawni.c;h=353bcf5b333457d191320e358d35775a2e9b319b;hb=HEAD#l372 |
Regarding using PATH from "env" instead of parent's environment, it may be considered documented because subprocess docs refer to os.execvp(), where it's explicitly documented: """ The problem is that it differs from execvp() in libc (and POSIX), so I would consider such behavior a bug if it wasn't so old to become a feature. Thanks to Victor for noticing that, I missed it. So, if we can't change os.execvp() and/or current subprocess behavior, posix_spawnp seems to be ruled out. (I don't consider temporarily changing the parent environment as a solution). A naive emulation of posix_spawnp would be repeatedly calling posix_spawn for each PATH entry, but that's prohibitively expensive. Could we use a following hybrid approach instead? Iterate over PATH entries and perform a quick check for common exec errors (ENOENT, ENOTDIR, EACCES) manually (e.g. by stat()). If the check fails, exec would also fail so just skip the entry. (It's important not to get false negatives here, but the child process would have the same privileges as the parent since we don't use POSIX_SPAWN_RESETIDS, so I can't think of one). Otherwise, call posix_spawn with the absolute path. If it fails, skip the entry. Looks ugly, but are there other problems? This would seem to work just as well as posix_spawnp in the common case, but in the worst (contrived) case it would be equivalent to calling posix_spawn for each PATH entry. |
On Wed, Jan 16, 2019 at 5:47 PM Alexey Izbyshev <report@bugs.python.org> wrote:
Oh, good to hear. =) koobs had pointed this thread out in hopes that we can try to reconcile and figure out what work needs to be done here. =)
Most definitely, if I can!
Right, after glancing over our implementation details- this is indeed a problem here. Our manpage indicates that we only block SIGTTOU for SIGTTIN for children in vfork(), and some light testing seems to indicate that's the case. Signal handlers are inherited from the parent, so that's less than stellar.
My initial read-through of the relevant bits seem to indicate that image activation will cause the child process to be allocated a new process space, so it's looking kind of like a 'no' -- I'm outsourcing the answer to this one to someone more familiar with those bits, though, so I'll get back to you.
I want to say vfork semantics guarantee it- we got to this point, so the child hit an error and _exit(127). I'm double-checking this one, though, to verify the child's properly marked a zombie before we could possibly reschedule the parent. |
I'll be preparing a patch for our posix_spawn's signal handling. My mistake in my setuid assessment was pointed out to me- it doesn't seem like a highly likely attack vector, but it is indeed possible. If the child errored out, they will indeed be properly reapable at that point due to how vfork is implemented -- any observation to the contrary is to be considered a bug. |
Alexey Izbyshev
My main motivation for using posix_spawn() is performance. An optimization should not justify to introduce a backward incompatible change. I agree wth posix_spawnp() cannot be used directly in subprocess because of that. But os.posix_spawnp() addition in Python 3.8 remains useful because it allows to use it directly (avoid subprocess).
It should be compared to the current code. Currently, _posixsubprocess uses a loop calling execv(). I don't think that calling posix_spawn() in a loop until one doesn't fail is more inefficient. The worst case would be when applying process attributes and run file actions would dominate performances... I don't think that such case exists. Syscalls like dup2() and close() should be way faster than the final successful execv() in the overall posix_spawn() call. I'm talking about the case when the program exists.
I dislike this option. There is a high risk of race condition if the program is created, deleted or modified between the check and exec. It can cause subtle bugs, or worse: security vulnerabilities. IMHO the only valid check, to test if a program exists, is to call exec(). Alexey: Do you want to work on a PR to reimplement the "executable_list" and loop used by subprocess with _posixsubproces? You should keep the latest exception to re-raise it if no posix_spawn() successed. Don't forget to clear the exception on success, to not create a reference cycle: commit acb9fa7
|
Thank you for the answers, Kyle!
Great!
The specific scenario with non-synchronized posix_spawn/setuid is certainly not a good practice and could probably be considered an application bug (such application effectively spawns a child with "random" privileges -- depending on whether setuid() completed before or after vfork()). That said, in Linux C libraries protection from that scenario is usually achieved automatically if all signals are blocked before vfork(). This is the result of a Linux-specific detail: at syscall level, setuid() affects a single thread only, but setuid() library function must affect the process as a whole. This forces C libraries to signal all threads when setuid() is called and wait until all threads perform setuid() syscall. This waiting can't complete until vfork() returns (because signals are blocked), so setuid() is effectively postponed. I don't know how setuid() behaves on FreeBSD (so the above may be not applicable at all).
Ah, that's good, thanks for the confirmation! |
I had vfork() used internally by posix_spawn() in mind when I considered repeatedly calling it "prohibitively expensive". While vfork() is much cheaper than fork(), it doesn't mean that its performance is comparable to dup2() and close(). But on systems where posix_spawn is a syscall the overhead could indeed be lesser. Anyway, it should be measured.
While exec() is of course cleaner, IMHO races don't matter in this case. If our stat()-like check fails, we could as well imagine that it is exec() that failed (while doing the same checks as our stat()), so it doesn't matter what happens with the tested entry afterwards. If the check succeeds, we simply call posix_spawn() that will perform the same checks again. If any race condition occurs, the problem will be detected by posix_spawn().
I'm currently focused on researching whether it's possible to use vfork()-like approach instead of fork() on Linux, so if anyone wants to implement the PR you suggested, feel free to do it. |
Update: glibc has implemented it for 2.29 (https://sourceware.org/bugzilla/show_bug.cgi?id=23640). |
Another problem with posix_spawn() on glibc: it doesn't report errors to the parent process when run under QEMU user-space emulation and Windows Subsystem for Linux. This is because starting with commit [1] (glibc 2.25) posix_spawn() relies on address space sharing semantics of clone(CLONE_VM) to report errors, but it's not implemented in QEMU and WSL, so clone(CLONE_VM) and vfork() behave like fork(). See also [2], [3]. This can be easily demonstrated:
$ cat test.c
#include <spawn.h>
int main(int argc, char *argv[], char *envp[]) {
return posix_spawn(0, "non-existing", 0, 0, argv, envp);
} $ gcc test.c
$ ./a.out
$ echo $?
2
$ aarch64-linux-gnu-gcc -static test.c
$ qemu-aarch64 ./a.out
$ echo $?
0 There is no problem with musl (it doesn't rely on address space sharing). [1] https://sourceware.org/git/?p=glibc.git;a=commit;h=4b4d4056bb154603f36c6f8845757c1012758158 |
Oh, you know a lot of stuff about vfork and posix_spawn, you are impressive :-) Is sys.platform equal to 'linux' on WSL? Sorry, I don't know WSL. If it's equal, is it possible to explicitly exclude WSL in the subprocess test, _use_posix_spawn()? For QEMU, I was very surprised that a full VM wouldn't implement vfork. So I tried Fedora Rawhide and I didn't notice the bug that you described. --- $ cat test.c
#include <spawn.h>
int main(int argc, char *argv[], char *envp[]) {
return posix_spawn(0, "non-existing", 0, 0, argv, envp);
} $ gcc test.c
$ ./a.out
$ echo $?
2 In the VM, I get exit status 2 as expected. You wrote: --- $ aarch64-linux-gnu-gcc -static test.c
$ qemu-aarch64 ./a.out
$ echo $?
0 So the bug only affects "QEMU User Space". I never used that before. I understand that it's a bug in the glibc and in "QEMU User Emulation" which doesn't implement vfork "properly".
I'm not interested to support musl at this point because I don't see any obvious way to detect that we are running musl nor get the musl version. The history of this issue shows the posix_spawn() only works in some very specific cases, so we have to be careful and only opt-in for specific libc, on specific platform with specific libc version (read at runtime if possible). |
I discussed the following issue with Florian Weimer (who works on the glibc): $ cat test.c
#include <spawn.h>
int main(int argc, char *argv[], char *envp[]) {
return posix_spawn(0, "non-existing", 0, 0, argv, envp);
} $ aarch64-linux-gnu-gcc -static test.c
$ qemu-aarch64 ./a.out
$ echo $?
0 While the parent gets a "success", in subprocess we get the pid and later read the exit status of the child process. Even if posix_spawn() doesn't report a failure, waitpid() will still report a failure. It's a subtle behavior change. I'm not sure yet if it's acceptable for Python in term of semantics:
Does the difference matters? The bug only occurs in some very specific cases:
Are these use cases common enough to block the whole idea of using posix_spawn() on Linux. I don't think so. I really want to use posix_spawn() for best performances! Moreover, I expect that glibc implementation of posix_spawn() is safer than Python _posixsubprocess. The glibc has access to low-level stuff like it's internal signals, cancellation points, etc. _posixsubprocess is more generic and doesn't worry about such low-level stuff, whereas they might cause bad surprised. |
I wrote PR 11668 to propose to *document* the subtle behavior change when posix_spawn() is used on QEMU User Emulation and WSL platforms, rather than trying to opt-out for posix_spawn() on these platforms. |
I don't have immediate access to WSL right now, but I'll try to get one and investigate what we have there wrt. posix_spawn() and vfork().
Yes, I was specifically referring to QEMU user-space. One use case that heavily relies on it is Tizen. Its packages are built in a chroot jail containing the build environment with binaries native to the target architecture, making an illusion that packages are built on the target system and are not cross-compiled. So the binaries are run under QEMU user-space emulation. But in reality, because of unacceptable performance of binary translation, many frequently-used binaries, like coreutils and compilers, are replaced with host-native binaries in a way transparent to the build system (so it has no idea whether it runs host-native or target-native binaries).
It's true that a C library is in a better position to implement something like posix_spawn(), but I still think that vfork()/exec() is worth to consider at least on Linux. See bpo-35823, which should also work under QEMU user-mode and WSL (but needs testing). |
Python 3.8 beta 1 is released with this feature. I documented the change. Thnaks everybody who was involved to make this possible. I close the issue. Open new issues for follow-up. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: