Start the deprecation cycle for subprocess preexec_fn #82616
subprocess's preexec_fn feature needs to enter PendingDeprecationWarning state in 3.9, to become a DeprecationWarning in 3.10, with a goal of removing it in 3.11. Rationale: We now live in a world full of threads, so it is entirely unsafe, per the POSIX specification, to call back into the Python interpreter within the forked child before exec. We've also already made preexec_fn no longer supported from CPython subinterpreters in 3.8. If your application uses preexec_fn for *specific* actions that do not yet have replacements or workarounds in subprocess, and no issue is already open for the feature you need, please open a feature request. (ex: calling umask is https://bugs.python.org/issue38417, and group, uid, gid setting has already landed in 3.9) |
What is the recommended way to replace preexec_fn? |
With task-specific arguments: cwd, start_new_session, group, extra_groups, etc. We cannot provide a general do-everything replacement and should not try. |
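To illustrate the task-specific arguments mentioned above, here is a minimal sketch (POSIX only) that uses `cwd=` and `start_new_session=` where older code might have called `os.chdir()` and `os.setsid()` from a preexec_fn. The helper name `run_in_new_session` and the child one-liner are made up for this example.

```python
import os
import subprocess
import sys
import tempfile

# Child program: print its working directory and whether it leads its own session.
CHILD = "import os; print(os.getcwd()); print(os.getsid(0) == os.getpid())"


def run_in_new_session(workdir):
    """Hypothetical helper: launch the child using Popen arguments, not preexec_fn."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        cwd=workdir,             # stands in for os.chdir() in a preexec_fn
        start_new_session=True,  # stands in for os.setsid() in a preexec_fn
        capture_output=True,
        text=True,
        check=True,
    )
    reported_cwd, is_leader = result.stdout.splitlines()[:2]
    return reported_cwd, is_leader == "True"
```

Both actions are performed by the C code in the child, so no Python code runs between fork and exec.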
Well, I proposed a solution at: I know that this solution has multiple flaws, but a bad solution may be better than no solution: breaking applications when upgrading to Python 3.11. |
calling setpgid() is a common post-fork task that would need to be an explicit parameter to Popen when preexec_fn goes away |
PR up to add setpgid support. From what I've come across, some setpgid() users can use setsid() already available via start_new_session= instead. But rather than getting into the differences between those, making both available makes sense to allow for anyone's case where setsid() isn't desired. |
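As far as I know, the setpgid() support discussed here is available as the `process_group=` argument from Python 3.11 onward. The sketch below (POSIX only) uses it when present and otherwise falls back to `start_new_session=True`, which also makes the child a process group leader but additionally creates a new session. The helper name is hypothetical.

```python
import subprocess
import sys

# Child program: report whether it is the leader of its own process group.
CHILD = "import os; print(os.getpgid(0) == os.getpid())"


def run_in_own_process_group():
    """Hypothetical helper: put the child in its own process group without preexec_fn."""
    kwargs = {}
    if sys.version_info >= (3, 11):
        kwargs["process_group"] = 0        # setpgid(0, 0) in the child
    else:
        kwargs["start_new_session"] = True  # setsid(); not identical, see note above
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True,
        text=True,
        check=True,
        **kwargs,
    )
    return result.stdout.strip() == "True"
```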
https://bugs.python.org/issue42736 filed to track adding Linux prctl() support. |
Another preexec_fn use to consider: resource.setrlimit(resource.RLIMIT_CORE, (XXX, XXX)) Using an intermediate shell script wrapper that changes the rlimit and exec's the actual process is also an alternative. |
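A rough sketch of the intermediate shell wrapper idea, assuming a `/bin/sh` whose `ulimit` builtin accepts `-c` (bash and dash do; strict POSIX only guarantees `-f`). The wrapper string and helper name are illustrative, and the limit of 0 is just an example value.

```python
import subprocess
import sys

# Shell wrapper: lower the core file size limit, then exec the real command.
WRAPPER = 'ulimit -c 0 && exec "$@"'

# Example child: print the soft RLIMIT_CORE it ended up with.
CHILD = "import resource; print(resource.getrlimit(resource.RLIMIT_CORE)[0])"


def run_without_core_dumps(argv):
    """Hypothetical helper: run argv with RLIMIT_CORE set to 0 via a shell wrapper."""
    # The extra "sh" fills $0, so argv becomes "$@" inside the wrapper.
    return subprocess.run(
        ["/bin/sh", "-c", WRAPPER, "sh", *argv],
        capture_output=True,
        text=True,
        check=True,
    )
```

Because the shell exec's the target, the final process keeps the pid that subprocess reports.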
I'm also seeing a lot of os.setpgrp() calls, though those are more likely able to use start_new_session to do setsid() as a dropin replacement. |
signal.signal use case: Calls to signal.signal(x, y) are sometimes used to set a non-SIG_DFL behavior on a signal, SIGINT -> SIG_IGN for example. I see a lot of legacy-looking code calling signal.signal in preexec_fn that appears to set SIG_DFL for the signals that Python otherwise modifies, which restore_signals=True should already be doing. |
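For the SIGINT -> SIG_IGN case, one possible workaround (not something proposed in this thread for signals specifically) is again an intermediate wrapper: an ignored signal disposition survives exec, so a tiny shell `trap` before `exec` has the same effect. This sketch assumes a POSIX `/bin/sh`; the names are illustrative.

```python
import subprocess
import sys

# Shell wrapper: ignore SIGINT, then exec the real command (ignored signals survive exec).
WRAPPER = "trap '' INT; exec \"$@\""

# Example child: report whether it inherited SIGINT as ignored.
CHILD = "import signal; print(signal.getsignal(signal.SIGINT) == signal.SIG_IGN)"


def run_ignoring_sigint(argv):
    """Hypothetical helper: run argv with SIGINT ignored, without preexec_fn."""
    return subprocess.run(
        ["/bin/sh", "-c", WRAPPER, "sh", *argv],
        capture_output=True,
        text=True,
        check=True,
    )
```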
Doing the code inspection of existing preexec_fn= users within our codebase at work is revealing. But those seem to be the bulk of uses. I expect this deprecation to take a while. Ex: if we mark it as PendingDeprecationWarning in 3.10, I'd still wait until _at least_ 3.13 to remove it. Code using it often has a long legacy and may be written to run on a wide variety of Python versions. It's painful to do so when features you need in order to stop using it are still only in modern versions. |
IMO using Python is more portable than relying on a shell. I dislike relying on a shell since shells are not really portable (they behave differently), unless you restrict yourself to a strict POSIX subset of the shell programming language. While '/bin/sh' is available on most Unix systems, Android uses '/system/bin/sh', and Windows and VxWorks have no shell (Windows provides cmd.exe, which uses the Batch programming language, and there are other scripting languages like VBS or PowerShell, so you need a completely different implementation for Windows). For the oslo.concurrency project, I wrote the Python script prlimit.py: a wrapper that calls resource.setrlimit() and then executes a new program. It's similar to the Unix prlimit program, but written in Python to be portable (the "prlimit" program is not available on all platforms). https://github.com/openstack/oslo.concurrency/blob/master/oslo_concurrency/prlimit.py I suggest not providing a builtin wrapper to replace preexec_fn, but instead suggesting replacements in the subprocess and What's New in Python 3.11 documentation (explaining how to port existing code). More generally, the whole preexec_fn feature could be reimplemented in a third-party project by spawning a *new* Python process, running the Python code, and *then* executing the final process. The main feature of preexec_fn is to give the ability to run a function of the current process, whereas what I'm discussing would be code written as a string. -- preexec_fn can be used for non-trivial issues like only sharing some Windows HANDLE, see: Note: This specific problem has been solved the proper way in Python by adding support for PROC_THREAD_ATTRIBUTE_HANDLE_LIST in subprocess.STARTUPINFO: lpAttributeList['handle_list'] parameter. |
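The "spawn a new Python process, run the code, then exec" idea can be sketched roughly as below. This is not the oslo.concurrency prlimit.py script, only a simplified illustration of the same pattern; `TRAMPOLINE` and `run_via_trampoline` are names invented here, and it assumes `sys.executable` points at a usable interpreter (which, as noted elsewhere in this thread, is not true of every deployment).

```python
import subprocess
import sys

# Code passed as a string to a fresh interpreter: adjust the limit,
# then replace that interpreter with the final program.
TRAMPOLINE = (
    "import os, resource, sys\n"
    "resource.setrlimit(resource.RLIMIT_CORE, (0, 0))\n"
    "os.execvp(sys.argv[1], sys.argv[1:])\n"
)

# Example final program: print the RLIMIT_CORE pair it inherited.
CHILD = "import resource; print(resource.getrlimit(resource.RLIMIT_CORE))"


def run_via_trampoline(argv):
    """Hypothetical helper: run argv after the trampoline has set RLIMIT_CORE to 0."""
    # With "python -c CODE a b", sys.argv is ['-c', 'a', 'b'] inside CODE.
    return subprocess.run(
        [sys.executable, "-c", TRAMPOLINE, *argv],
        capture_output=True,
        text=True,
        check=True,
    )
```

The cost is one extra interpreter startup per launch, which is the latency trade-off raised in the replies.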
I just created bpo-42738: "subprocess: don't close all file descriptors by default (close_fds=False)". |
Not in environments I use. :) There isn't an installed python interpreter that can be executed when deploying Python as an embedded interpreter such as anyone using pyoxidizer or similar. Plus "using python" means adding a Python startup time delay to anything that triggered such an action. That added latency isn't acceptable in some situations. When I suggest a workaround for something as involving an intermediate shell script, read that to mean "the user needs an intermediate program to do this complicated work for them - how is up to them - we aren't going to provide it from the stdlib". A shell script is merely one easy pretty-fast solution - in environments where that is possible. TL;DR - there's no one size fits all solution here. But third party libraries could indeed implement any/all of these options including abstracting how and what gets used when if someone wanted to do that. |
Would it not be more consistent with other parameters to name the new parameter "pgid" or "process_group"? And why not use None as default, like for user and group? |
A worthwhile general suggestion on a new path forward for the mess of things between (v)fork+exec from Victor is over in https://bugs.python.org/issue42736#msg383869 TL;DR creating a subprocess.Preexec() recording object with specific interfaces for things that can be done, an instance of which gets passed in and the recorded actions are done as appropriate. |
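To make the recording-object idea concrete, here is a purely hypothetical sketch. No `subprocess.Preexec` exists in the standard library; the class name `PreexecRecorder` and its methods are invented for illustration, and this version simply maps recorded actions onto Popen keyword arguments that already exist (`umask=` since Python 3.9).

```python
class PreexecRecorder:
    """Hypothetical: records child-setup requests instead of running code after fork()."""

    def __init__(self):
        self._kwargs = {}

    def new_session(self):
        # Recorded request for setsid() in the child.
        self._kwargs["start_new_session"] = True
        return self

    def chdir(self, path):
        # Recorded request for chdir(path) in the child.
        self._kwargs["cwd"] = path
        return self

    def umask(self, mask):
        # Recorded request for umask(mask) in the child.
        self._kwargs["umask"] = mask
        return self

    def popen_kwargs(self):
        """Return the recorded actions as keyword arguments for subprocess.Popen."""
        return dict(self._kwargs)
```

A caller would write something like `subprocess.Popen(argv, **PreexecRecorder().new_session().umask(0o077).popen_kwargs())`; a real design would also need to cover actions that have no existing keyword argument.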
Another use case someone had for preexec_fn popped up today: prctl(PR_SET_PDEATHSIG, SIGTERM) |
Another use case for preexec_fn: establishing a new controlling terminal, typically in conjunction with start_new_session=True. A preexec_fn may do something like
with discussion at https://chromium-review.googlesource.com/c/chromium/src/+/3524204/comments/59f03e7c_f103cd7e. |
One more thing that can help prevent people from using `preexec_fn`. Also adds conditional skips to two tests exposing ASAN flakiness on the Ubuntu 20.04 Address Sanitizer Github CI system. When that build is run on more modern systems the "problem" does not show up. It seems ASAN implementation related. Co-authored-by: Zackery Spytz <zspytz@gmail.com> Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Landed as |
_posixsubprocess: add a static assertion to ensure that the pid_t type is signed. Replace _Py_IntegralTypeSigned() with _Py_IS_TYPE_SIGNED().
Not very relevant for performance here, but preexec_fn is about to be deprecated, so let's just remove it while we're at it. python/cpython#82616
How would you refactor the following code without using
(untested) |
Creating your own intermediate executable should work well for things like that: it does the same thing as your preexec code and then calls exec to replace itself with the ultimate desired process, rather than expecting that work to be done in the restricted environment of the process between fork+exec. |
My work codebase has a fair amount of preexec_fn use, more than half of it is prctl PR_SET_PDEATHSIG |
Would it make sense to have a portable and generic way to:
Rather than We should design carefully "preexec" about signals, threads, file descriptor inheritance, but also report properly exec() errors to the parent process! The main difference is that |
Hi all. As the original subprocess module author, I'm a bit concerned over the expansion of the API. The number of arguments for the subprocess.Popen constructor is, to be honest, crazy. I am grateful for all the work and bugfixes that have been done over the years, but the API is now quite far from my original vision of a "small and clean API", which was the reason to "start from scratch" and thus create the subprocess module. Removing preexec_fn and then adding additional arguments for cases where preexec_fn was used will further contribute to the "bloat" of the API, if I may use that word. I do understand the technical difficulties, but as always there are multiple alternatives. Better documentation as well as #82616 (comment) are very interesting, as I see it. |
Major kudos for its creation @astrand! Your work largely succeeded! If it had not, it would've needed replacing again. =) The original overall The remaining conundrum: A library using the If told they cannot expect do either they will have no good options that satisfy their needs. Their mere existence of both within the same process is the source of conflict. Use of threading is far more important to the world than realistic In absence of ways to replace the functionality people use it for, Thus our dirty plethora of options added to replace many common things people were using preexec_fn= for. Those also greatly increase subprocess launching performance (done in C, vfork can be used, etc). We've provided for some common cases. More exist. There is no way for us to predict and provide everything. This is the largest detractor to "small and clean" that I hear users complain about today. It is true! But most people do not need these options in daily life. So that is perhaps best considered a documentation organization issue. |
Stepping back from a "What do users actually need?" point of view, nothing has provided that:
Idea from above: Launching an intermediate Python interpreter process to run the preexec_fn code. This is a good idea in terms of code compatibility. It would add a large Python interpreter startup time to the child process launch. But functionality wise it would work and the caveat can be documented. It also requires that a Brainstorming further Idea: A smaller binary to execute as the intermediate is one possibility. A microscopic tiny Python-ish interpreter with a restricted set of APIs (no import capability, presume |
@gpshead thanks for your comments. preexec_fn is just a way to execute code between fork and exec, so as far as I can tell, the problem is exactly the same if you manually code fork() and exec() and some code in between. I assume that neither fork() nor exec() will be removed from the Python API, so why should preexec_fn be removed? As I understand it, Python 3.12 already has a mechanism in place to raise a DeprecationWarning if threads are detected. Why not use the same approach for preexec_fn? |
The same warning can and should be added there (I'll get to it). |
I've been trying to figure out if I can change from a preexec_fn that does os.setuid "to irrevocably drop permissions", to the user= kwarg. It makes for fairly obtuse reading, since the subprocess module documentation somewhat vaguely says "the setreuid() system call will be made" but not what arguments the low-level description of syscalls will be called with, and my underlying goal was actually to just "irrevocably switch to that user" (a high-level goal). I think it is safe "depending on your version of POSIX conformance" assuming that cpython passes the user kwarg twice to setreuid, (rather than 3 times to setresuid ;)). Do I understand correctly? |
I asked on the forum instead. |