Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Start the deprecation cycle for subprocess preexec_fn #82616

Open
gpshead opened this issue Oct 10, 2019 · 29 comments
Open

Start the deprecation cycle for subprocess preexec_fn #82616

gpshead opened this issue Oct 10, 2019 · 29 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@gpshead
Copy link
Member

gpshead commented Oct 10, 2019

BPO 38435
Nosy @gpshead, @vstinner, @diekhans, @markmentovai, @serhiy-storchaka, @izbyshev
PRs
  • gh-82616: Add process_group support to subprocess.Popen #23930
  • bpo-38435: Detect preexec_fn=os.setsid as start_new_session=True #23936
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/gpshead'
    closed_at = None
    created_at = <Date 2019-10-10.18:18:17.382>
    labels = ['type-feature', 'library', '3.11']
    title = 'Start the deprecation cycle for subprocess preexec_fn'
    updated_at = <Date 2022-03-16.16:54:37.730>
    user = 'https://github.com/gpshead'

    bugs.python.org fields:

    activity = <Date 2022-03-16.16:54:37.730>
    actor = 'markmentovai'
    assignee = 'gregory.p.smith'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2019-10-10.18:18:17.382>
    creator = 'gregory.p.smith'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 38435
    keywords = ['patch']
    message_count = 18.0
    messages = ['354397', '354404', '354407', '354439', '383708', '383720', '383724', '383725', '383726', '383727', '383728', '383736', '383740', '383863', '383988', '400498', '403026', '415352']
    nosy_count = 6.0
    nosy_names = ['gregory.p.smith', 'vstinner', 'diekhans', 'markmentovai', 'serhiy.storchaka', 'izbyshev']
    pr_nums = ['23930', '23936']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue38435'
    versions = ['Python 3.11']

    @gpshead
    Copy link
    Member Author

    gpshead commented Oct 10, 2019

    subprocess's preexec_fn feature needs to enter PendingDeprecationWarning state in 3.9, to become a DeprecationWarning in 3.10, with a goal of removing it in 3.11.

    Rationale: We now live in a world full of threads, it is entirely unsafe to call back into the python interpreter within the forked child before exec per POSIX specification.

    We've also already made preexec_fn no longer supported from CPython subinterpreters in 3.8.

    If there are not already issues open for desired features of subprocess that do not yet have replacements or workarounds for *specific* actions that preexec_fn is being used for in your application, please open feature requests for those. (ex: calling umask is https://bugs.python.org/issue38417, and group, uid, gid setting has already landed in 3.9)

    @gpshead gpshead added the 3.9 only security fixes label Oct 10, 2019
    @gpshead gpshead self-assigned this Oct 10, 2019
    @gpshead gpshead added the type-feature A feature request or enhancement label Oct 10, 2019
    @vstinner
    Copy link
    Member

    What is the recommanded way to replace preexec_fn?

    @gpshead
    Copy link
    Member Author

    gpshead commented Oct 10, 2019

    With task specific arguments. cwd, start_new_session, group, extra_groups,
    user, etc..

    We cannot provide a general do everything replacement and should not try.
    It not possible.

    @vstinner
    Copy link
    Member

    We cannot provide a general do everything replacement and should not try. It not possible.

    Well, I proposed a solution at:
    https://bugs.python.org/issue38417#msg354242

    I know that this solution has multiple flaws, but a bad solution may be better than no solution: breaking applications when upgrading to Python 3.11.

    @diekhans
    Copy link
    Mannequin

    diekhans mannequin commented Dec 24, 2020

    calling setpgid() is a common post-fork task that would need to be an explicit parameter to Popen when preexec_fn goes away

    @gpshead gpshead added 3.10 only security fixes and removed 3.9 only security fixes labels Dec 25, 2020
    @gpshead
    Copy link
    Member Author

    gpshead commented Dec 25, 2020

    PR up to add setpgid support. From what I've come across, some setpgid() users can use setsid() already available via start_new_session= instead. But rather than getting into the differences between those, making both available makes sense to allow for anyone's case where setsid() isn't desired.

    @gpshead
    Copy link
    Member Author

    gpshead commented Dec 25, 2020

    https://bugs.python.org/issue42736 filed to track adding Linux prctl() support.

    @gpshead
    Copy link
    Member Author

    gpshead commented Dec 25, 2020

    Another preexec_fn use to consider:

     resource.setrlimit(resource.RLIMIT_CORE, (XXX, XXX))

    Using an intermediate shell script wrapper that changes the rlimit and exec's the actual process is also an alternative.

    @gpshead
    Copy link
    Member Author

    gpshead commented Dec 25, 2020

    I'm also seeing a lot of os.setpgrp() calls, though those are more likely able to use start_new_session to do setsid() as a dropin replacement.

    @gpshead
    Copy link
    Member Author

    gpshead commented Dec 25, 2020

    signal.signal use case:

    Calls to signal.signal(x, y) to sometimes to set a non SIG_DFL behavior on a signal. SIGINT -> SIG_IGN for example.

    I see a lot of legacy looking code calling signal.signal in prexec_fn that appears to set SIG_DFL for the signals that Python otherwise modifies. Which restore_signals=True should already be doing.

    @gpshead
    Copy link
    Member Author

    gpshead commented Dec 25, 2020

    Doing the code inspection of existing preexec_fn= users within our codebase at work is revealing. But those seem to be the bulk of uses.

    I expect this deprecation to take a while. Ex: if we mark it as PendingDeprecationWarning in 3.10, I'd still wait until _at least_ 3.13 to remove it.

    Code using it often has a long legacy and may be written to run on a wide variety of Python versions. It's painful to do so when features you need in order to stop using it are still only in modern versions.

    @gpshead gpshead added the stdlib Python modules in the Lib dir label Dec 25, 2020
    @vstinner
    Copy link
    Member

    Using an intermediate shell script wrapper that changes the rlimit and exec's the actual process is also an alternative.

    IMO using Python is more portable than relying on a shell.

    I dislike relying on a shell since shells are not really portable (behave differently), unless you restrict yourself to a strict POSIX subset of the shell programming language. While '/bin/sh' is available on most Unix, Android uses '/system/bin/sh', and Windows and VxWorks have no shell (Windows provides cmd.exe which uses Batch programming language, and there are other scripting languages like VBS or PowerShell: so you need a complete different implementation for Windows).

    For the oslo.concurrency project, I wrote the Python script prlimit.py: a wrapper calling resource.setrlimit() and then execute a new program. It's similar to the Unix prlimit program, but written in Python to be portable (the "prlimit" program is not available on all platforms).

    https://github.com/openstack/oslo.concurrency/blob/master/oslo_concurrency/prlimit.py

    I suggest to not provide a builtin wrapper to replace preexec_fn, but suggest replacements in the subprocess and What's New in Python 3.11 documentation (explain how to port existing code).

    More generally, the whole preeexc_fn feature could be reimplemented a third-party project by spawning a *new* Python process, run the Python code, and *then* execute the final process. The main feature of preexec_fn is to give the ability to run a function of the current process, whereas what I'm discussing would be code written as a string.

    --

    preexec_fn can be used for non-trivial issues like only sharing some Windows HANDLE, see:
    https://www.python.org/dev/peps/pep-0446/#only-inherit-some-handles-on-windows

    Note: This specific problem has been solved the proper way in Python by adding support for PROC_THREAD_ATTRIBUTE_HANDLE_LIST in subprocess.STARTUPINFO: lpAttributeList['handle_list'] parameter.

    @vstinner
    Copy link
    Member

    I just created bpo-42738: "subprocess: don't close all file descriptors by default (close_fds=False)".

    @gpshead
    Copy link
    Member Author

    gpshead commented Dec 27, 2020

    "using Python is more portable than relying on a shell."

    Not in environments I use. :) There isn't an installed python interpreter that can be executed when deploying Python as an embedded interpreter such as anyone using pyoxidizer or similar. Plus "using python" means adding a Python startup time delay to anything that triggered such an action. That added latency isn't acceptable in some situations.

    When I suggest a workaround for something as involving an intermediate shell script, read that to mean "the user needs an intermediate program to do this complicated work for them - how is up to them - we aren't going to provide it from the stdlib". A shell script is merely one easy pretty-fast solution - in environments where that is possible.

    TL;DR - there's no one size fits all solution here. But third party libraries could indeed implement any/all of these options including abstracting how and what gets used when if someone wanted to do that.

    @serhiy-storchaka
    Copy link
    Member

    Would not be more consistent with other parameters to name the new parameter "pgid" or "process_group"?

    And why not use None as default, like for user and group?

    @gpshead
    Copy link
    Member Author

    gpshead commented Aug 28, 2021

    A worthwhile general suggestion on a new path forward for the mess of things between (v)fork+exec from Victor is over in https://bugs.python.org/issue42736#msg383869

    TL;DR creating a subprocess.Preexec() recording object with specific interfaces for things that can be done, an instance of which gets passed in and the recorded actions are done as appropriate.

    @gpshead gpshead added 3.11 only security fixes and removed 3.10 only security fixes labels Aug 28, 2021
    @gpshead
    Copy link
    Member Author

    gpshead commented Oct 1, 2021

    Another use case someone had for preexec_fn popped up today:

     prctl(PR_SET_PDEATHSIG, SIGTERM)

    @markmentovai
    Copy link
    Mannequin

    markmentovai mannequin commented Mar 16, 2022

    Another use case for preexec_fn: establishing a new controlling terminal, typically in conjunction with start_new_session=True. A preexec_fn may do something like

    os.close(os.open(os.ttyname(sys.stdin.fileno()), os.O_RDWR))
    

    with discussion at https://chromium-review.googlesource.com/c/chromium/src/+/3524204/comments/59f03e7c_f103cd7e.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    gpshead added a commit that referenced this issue May 5, 2022
    One more thing that can help prevent people from using `preexec_fn`.
    
    Also adds conditional skips to two tests exposing ASAN flakiness on the Ubuntu 20.04 Address Sanitizer Github CI system. When that build is run on more modern systems the "problem" does not show up. It seems ASAN implementation related.
    
    Co-authored-by: Zackery Spytz <zspytz@gmail.com>
    Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
    @gpshead
    Copy link
    Member Author

    gpshead commented May 5, 2022

    calling setpgid() is a common post-fork task that would need to be an explicit parameter to Popen

    Landed as process_group= on Popen for 3.11.

    vstinner added a commit that referenced this issue May 27, 2022
    _posixsubprocess: add a static assertion to ensure that the pid_t
    type is signed.
    
    Replace _Py_IntegralTypeSigned() with _Py_IS_TYPE_SIGNED().
    @gpshead gpshead added 3.13 bugs and security fixes and removed 3.11 only security fixes labels Sep 22, 2023
    vain added a commit to bundlewrap/bundlewrap that referenced this issue Nov 10, 2023
    Not very relevant for performance here, but preexec_fn is about to be
    deprecated, so let's just remove it while we're at it.
    
    python/cpython#82616
    @charmoniumQ
    Copy link

    How would you refactor the following code without using preexec_fn:

    import os
    import subprocess
    import select
    
    read_from_parent, write_to_child = os.pipe()
    
    def preexec_fn():
        print(f"Child waiting for ready (select)")
        select.select([read_from_parent], [], [])
        print("Child waiting for ready (read)")
        os.read(read_from_parent)
        os.close(read_from_parent)
    
    proc = subprocess.Popen(
        ["pwd"],
        close_fds=True,
        pass_fds=(read_from_parent,),
        preexec_fn=preexec_fn,
    )
    
    print("Parent has control here")
    os.close(read_from_parent)
    
    # do setup actions (maybe start tracers)
    # these actions depend on proc.pid
    # so we start the proc in a "paused" state, get the pid, and then tell it to go when the parent is ready
    
    # Ok, we're ready
    os.write(write_to_child, b"ready")
    os.close(write_to_child)
    

    (untested)

    @gpshead
    Copy link
    Member Author

    gpshead commented Feb 14, 2024

    How would you refactor the following code without using preexec_fn:

    Creating your own intermediate executable that does the same thing as your preexec code before calling exec to replace itself with the ultimate desired process rather than expecting it to be done in the restricted environment of process between fork+exec should work well for things like that.

    @gpshead gpshead removed the 3.13 bugs and security fixes label Feb 15, 2024
    @hauntsaninja
    Copy link
    Contributor

    My work codebase has a fair amount of preexec_fn use, more than half of it is prctl PR_SET_PDEATHSIG

    @vstinner
    Copy link
    Member

    Would it make sense to have a portable and generic way to:

    • spawn a first Python "preexec" process which runs arbitrary code.
    • then this process exec() the target process.

    Rather than preexec_fn=<function>, it would be something like preexec_code: str.

    We should design carefully "preexec" about signals, threads, file descriptor inheritance, but also report properly exec() errors to the parent process!

    The main difference is that preexec_fn is called between fork() and exec(), rather than preexec_code would be called after fork()+exec("python").

    @astrand
    Copy link
    Contributor

    astrand commented Mar 26, 2024

    Hi all. As the original subprocess module author, I'm a bit concerned over the expansion of the API. The number of arguments for the subprocess.Popen constructor is, to be honest, crazy. I am grateful for all work and bugfixes that has been done over the years, but the API is now quite far from the my original vision of a "small and clean API", which was the reason to "start from scratch", thus create the subprocess module.

    Removing preexec_fn and then adding additional arguments for cases where preexec_fn was used will further contribute to the "bloat" of the API, if I may use that word.

    I do understand the technical difficulties, but as always there are multiple alternatives. Better documentation as well as #82616 (comment) are very interesting, as I see it.

    @gpshead
    Copy link
    Member Author

    gpshead commented Mar 26, 2024

    Major kudos for its creation @astrand! Your work largely succeeded! If it had not, it would've needed replacing again. =)

    The original overall subprocess public API design was sound with one major exception: preexec_fn= was a conflicted with the existence of threads, thus why I label it as a fundamentally flawed feature. Yet its existence also allowed for a lot of use cases that were otherwise unimplemented by subprocess and would've resulted in people coming up with even worse custom (read: buggy x1000) workarounds. So despite the problem, please don't consider this a criticism. It made the most sense at the time.

    The remaining conundrum:

    A library using the subprocess API with preexec_fn= is often not be owned by the same people owning code elsewhere in a broader application that uses threads. Both code owners believe they are right to do what they did. Afterall, they were just using documented public APIs with an expectation that they just work thanks to Python. Python programmers are often operating at a high level and not expected to need to know low level system programming details for what appear to them to be normal mundane tasks like spawning a process or using threads.

    If told they cannot expect do either they will have no good options that satisfy their needs. Their mere existence of both within the same process is the source of conflict. Use of threading is far more important to the world than realistic preexec_fn= use cases, thus nobody is advising "nobody should use threads, a minority of people won't be able to do these esoteric things with child processes easily!". 😄

    In absence of ways to replace the functionality people use it for, preexec_fn= continues to exist, just with the giant warning about the random deadlocks that will happen to end users of such unfortunate applications when the entire application stack is not tightly controlled.

    Thus our dirty plethora of options added to replace many common things people were using preexec_fn= for. Those also greatly increase subprocess launching performance (done in C, vfork can be used, etc). We've provided for some common cases. More exist. There is no way for us to predict and provide everything.

    This is the largest detractor to "small and clean" that I hear users complain about today. It is true! But most people do not need these options in daily life. So that is perhaps best considered a documentation organization issue.

    @gpshead
    Copy link
    Member Author

    gpshead commented Mar 26, 2024

    Stepping back from a "What do users actually need?" point of view, nothing has provided that:

    • Users need to be able to make subsets of a pretty long set of possible system calls with options before the ultimate child process is exec'd.
    • Users want doing that to perform well.
    • It is not possible to use arbitrary Python code and meet both of the above two constraints.

    Idea from above: Launching an intermediate Python interpreter process to run the preexec_fn code.

    This is a good idea in terms of code compatibility. It would add a large Python interpreter startup time to the child process launch. But functionality wise it would work and the caveat can be documented. It also requires that a sys.executable interpreter binary to launch exists (excluding users of embedded Python, and possibly users running under an emulator? A deliberate choice we could make).

    Brainstorming further Idea: A smaller binary to execute as the intermediate is one possibility. A microscopic tiny Python-ish interpreter with a restricted set of APIs (no import capability, presume os and sys exist, etc) sufficient to just allow the kinds of system calls people need to make? I don't like the complexity, but it'd be entertaining to see how feasible this is. People can already manually use things like /bin/sh for some fraction of pieces of this concept today (very system dependent).

    @astrand
    Copy link
    Contributor

    astrand commented Mar 31, 2024

    @gpshead thanks for your comments. preexec_fn is just a way to execute code between fork and exec, so as far as I can tell, the problem is exactly the same if you manually code fork() and exec() and some code in between. I assume that neither fork() nor exec() will be removed from the Python API, so why should preexec_fn be removed? As I understand it, Python 3.12 already has a mechanism in place to raise a DeprecationWarning if threads are detected. Why not use the same approach for preexec_fn?

    @gpshead
    Copy link
    Member Author

    gpshead commented Mar 31, 2024

    The same warning can and should be added there (I'll get to it).

    @eli-schwartz
    Copy link
    Contributor

    I've been trying to figure out if I can change from a preexec_fn that does os.setuid "to irrevocably drop permissions", to the user= kwarg. It makes for fairly obtuse reading, since the subprocess module documentation somewhat vaguely says "the setreuid() system call will be made" but not what arguments the low-level description of syscalls will be called with, and my underlying goal was actually to just "irrevocably switch to that user" (a high-level goal).

    I think it is safe "depending on your version of POSIX conformance" assuming that cpython passes the user kwarg twice to setreuid, (rather than 3 times to setresuid ;)). Do I understand correctly?

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    7 participants