run() - unified high-level interface for subprocess #67531

Closed
takluyver mannequin opened this issue Jan 28, 2015 · 29 comments
@takluyver
Mannequin

takluyver mannequin commented Jan 28, 2015

BPO 23342
Nosy @warsaw, @gpshead, @ncoghlan, @bitdancer, @ethanfurman, @takluyver, @berkerpeksag, @vadmium
Files
  • subprocess_run.patch
  • subprocess_run2.patch
  • subprocess_run3.patch
  • subprocess_run4.patch
  • process.py
  • subprocess_run5.patch
  • subprocess_run6a.patch
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = 'https://github.com/gpshead'
    closed_at = <Date 2015-04-26.05:13:16.179>
    created_at = <Date 2015-01-28.22:13:38.931>
    labels = ['type-feature', 'library']
    title = 'run() - unified high-level interface for subprocess'
    updated_at = <Date 2016-05-18.05:14:24.030>
    user = 'https://github.com/takluyver'

    bugs.python.org fields:

    activity = <Date 2016-05-18.05:14:24.030>
    actor = 'ncoghlan'
    assignee = 'gregory.p.smith'
    closed = True
    closed_date = <Date 2015-04-26.05:13:16.179>
    closer = 'gregory.p.smith'
    components = ['Library (Lib)']
    creation = <Date 2015-01-28.22:13:38.931>
    creator = 'takluyver'
    dependencies = []
    files = ['37897', '37899', '37991', '38072', '38075', '38574', '38997']
    hgrepos = []
    issue_num = 23342
    keywords = ['patch']
    message_count = 29.0
    messages = ['234918', '234922', '234923', '234924', '234925', '234927', '235093', '235133', '235134', '235300', '235653', '235654', '235656', '235659', '235663', '235726', '236189', '236319', '237079', '238589', '238591', '240274', '240276', '240277', '240961', '241053', '241054', '242032', '265807']
    nosy_count = 11.0
    nosy_names = ['barry', 'gregory.p.smith', 'ncoghlan', 'r.david.murray', 'cvrebert', 'ethan.furman', 'python-dev', 'takluyver', 'berker.peksag', 'martin.panter', 'Jeff.Hammel']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue23342'
    versions = ['Python 3.5']

    This follows on from the python-ideas thread starting here: https://mail.python.org/pipermail/python-ideas/2015-January/031479.html

    subprocess gains:

    • A CompletedProcess class representing a process that has finished, with attributes args, returncode, stdout and stderr
    • A run() function which runs a process to completion and returns a CompletedProcess instance, aiming to unify the functionality of call, check_call and check_output
    • CalledProcessError and TimeoutExpired now have a stderr attribute, to avoid throwing away potentially relevant information.
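
As a rough illustration of the proposed interface, here is a sketch written against the API as it eventually shipped in Python 3.5 rather than against the attached patch. The `run_python` helper is hypothetical and only exists to keep the example portable:

```python
import subprocess
import sys


def run_python(code):
    """Hypothetical helper: run a Python snippet in a child interpreter."""
    return subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE,  # nothing is captured unless requested
        stderr=subprocess.PIPE,
    )


result = run_python("import sys; print('out'); sys.exit(2)")

# A CompletedProcess bundles everything known about the finished child.
print(result.args)        # the command that was run
print(result.returncode)  # exit status of the child
print(result.stdout)      # captured stdout, as bytes
print(result.stderr)      # captured stderr, as bytes
```

Because nothing is checked by default, a non-zero exit status is reported through `returncode` and does not raise.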

    Things I'm not sure about:

    1. Should run() capture stdout/stderr by default? I opted not to, for consistency with Popen and with shells.
    2. I gave run() a check_returncode parameter, but it feels quite a long name for a parameter. Is 'check' clear enough to use as the parameter name?
    3. Popen has an 'args' attribute, while CalledProcessError and TimeoutExpired have 'cmd'. CompletedProcess sits between those cases, so which name should it use? For now, it's args.

    @takluyver added the stdlib and type-feature labels Jan 28, 2015
    @takluyver
    Mannequin Author

    takluyver mannequin commented Jan 28, 2015

    Another question: With this patch, CalledProcessError and TimeoutExpired exceptions now have attributes called output and stderr. It would seem less surprising for output to be called stdout, but we can't break existing code that relies on the output attribute.

    Using properties, either stdout or output could be made an alias for the other, so both names work. Is this desirable?
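
A minimal sketch of the aliasing idea, assuming stdout is made the alias and output stays the stored attribute. `ProcessErrorSketch` is a hypothetical stand-in used for illustration, not the class from the patch:

```python
class ProcessErrorSketch(Exception):
    """Hypothetical stand-in for CalledProcessError, for illustration only."""

    def __init__(self, returncode, cmd, output=None, stderr=None):
        super().__init__(returncode, cmd)
        self.returncode = returncode
        self.cmd = cmd
        self.output = output  # the existing attribute keeps working
        self.stderr = stderr

    @property
    def stdout(self):
        """Alias for output, so the name pairs naturally with stderr."""
        return self.output

    @stdout.setter
    def stdout(self, value):
        # Writes through the alias land in the original attribute.
        self.output = value


err = ProcessErrorSketch(1, ["false"], output=b"partial", stderr=b"boom")
print(err.stdout == err.output)

err.stdout = b"replaced"
print(err.output)
```

Existing code that reads `output` is unaffected, while new code can use `stdout` and `stderr` side by side.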

    @gpshead gpshead self-assigned this Jan 28, 2015
    @gpshead
    Member

    gpshead commented Jan 28, 2015

    A 1) Opting not to capture by default is good. Let people explicitly request that.

    A 2) "check" seems like a reasonable parameter name for the "should I raise if rc != 0" bool. I don't have any other good bikeshed name suggestions.

    A 3) Calling it args the same way Popen does is consistent. That the attribute on the exceptions is 'cmd' is a bit of an old wart but seems reasonable. Neither 'args' nor 'cmd' is actually a good name for any use in subprocess, as it is already an unfortunately multi-typed parameter: it can be either a string or a sequence of strings. The documentation is not clear about what type(s) 'cmd' may be.

    A Another) Now that they gain a stderr attribute, having a corresponding stdout one would make sense. Implement it as a property and document it with a versionadded 3.5 as usual.
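
To make the `check` semantics concrete, here is a small sketch written against the API as it later shipped in Python 3.5, not taken from the patch under review. The child command is an arbitrary example:

```python
import subprocess
import sys

failing = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('bad input'); sys.exit(3)",
]

# Without check, a non-zero exit status is simply reported.
result = subprocess.run(failing, stderr=subprocess.PIPE)
print(result.returncode)

# With check=True, the same failure raises CalledProcessError,
# which carries the captured stderr along with the exit status.
caught = None
try:
    subprocess.run(failing, stderr=subprocess.PIPE, check=True)
except subprocess.CalledProcessError as exc:
    caught = exc

print(caught.returncode, caught.cmd, caught.stderr)
```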

    @ethanfurman
    Member

    ethanfurman commented Jan 28, 2015

    I haven't checked the code, but do check_output and friends combine stdout and stderr when output=PIPE?

    @takluyver
    Mannequin Author

    takluyver mannequin commented Jan 28, 2015

    Updated patch following Gregory's suggestions:

    • The check_returncode parameter is now called check. The method on CompletedProcess is still check_returncode, though.
    • Clarified the docs about args
    • CalledProcessError and TimeoutExpired gain a stdout property as an alias of output

    Ethan: to combine stdout and stderr in check_output, you need to pass stderr=subprocess.STDOUT - it doesn't assume you want that.

    I did consider having a simplified interface so you could pass e.g. capture='combine', or capture='stdout', but I don't think the brevity is worth the loss of flexibility.
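
The explicit spelling for combining the two streams can be sketched as follows. This assumes the API as released in Python 3.5, and the child command is just an example that writes to both streams:

```python
import subprocess
import sys

child = [
    sys.executable,
    "-c",
    "import sys; print('to stdout'); print('to stderr', file=sys.stderr)",
]

# check_output() always captures stdout; stderr is merged in only on request.
combined = subprocess.check_output(child, stderr=subprocess.STDOUT)

# run() can capture the two streams separately...
separate = subprocess.run(child, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

# ...or merged, in which case everything lands in the stdout attribute.
merged = subprocess.run(child, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)

print(combined)
print(separate.stdout, separate.stderr)
print(merged.stdout, merged.stderr)
```

The relative order of the two messages in the merged output depends on buffering in the child, so it should not be relied upon.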

    @gpshead
    Member

    gpshead commented Jan 28, 2015

    Ethan: check_output combines them when stderr=subprocess.STDOUT is passed (https://docs.python.org/3.5/library/subprocess.html#subprocess.STDOUT). Never pass stdout=PIPE or stderr=PIPE to call() or the check*() functions, as that will lead to a deadlock when a pipe buffer fills up. check_output() won't even allow you to pass in stdout, as it needs to set that to PIPE internally, but you could still do the wrong thing and pass stderr=PIPE without it warning you.

    The documentation tells people not to do this. I don't recall why we haven't made it warn or raise when someone tries (but that should be a separate issue/change).

    On Wed Jan 28 2015 at 3:30:59 PM Ethan Furman <report@bugs.python.org>
    wrote:

    Ethan Furman added the comment:

    I haven't checked the code, but does check_output and friends combine
    stdout and stderr when ouput=PIPE?


    @vadmium
    Member

    vadmium commented Jan 31, 2015

    Maybe you don’t want to touch the implementation of the “older high-level API” for fear of subtly breaking something, but for clarification, and perhaps documentation, would the old functions now be equivalent to this?

    from subprocess import PIPE, run

    def call(*popenargs, **kwargs):
        # Verify PIPE not in (stdout, stderr) if needed
        return run(*popenargs, **kwargs).returncode

    def check_call(*popenargs, **kwargs):
        # Verify PIPE not in (stdout, stderr) if needed
        run(*popenargs, check=True, **kwargs)

    def check_output(*popenargs, **kwargs):
        # Verify stderr != PIPE if needed
        return run(*popenargs, check=True, stdout=PIPE, **kwargs)

    If they are largely equivalent, perhaps simplify the documentation of them in terms of run(), and move them closer to the run() documentation.

    Is it worth making the CalledProcessError exception a subclass of CompletedProcess? They seem to be basically storing the same information.

    @takluyver
    Mannequin Author

    takluyver mannequin commented Jan 31, 2015

    Yep, they are pretty much equivalent to those, except:

    • check_call has a 'return 0' if it succeeds
    • add '.stdout' to the end of the expression for check_output

    I'll work on documenting the trio in those terms.

    If people want, some/all of the trio could also be implemented on top of run(). check_output() would be the most likely candidate for this, since I copied that code to create run(). I'd probably leave call and check_call as separate implementations to avoid subtle bugs, though.
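
With those two corrections applied, the equivalents can be sketched as below. The `*_via_run` names are hypothetical, chosen so the sketch does not shadow the real functions, and the PIPE sanity checks are left out:

```python
import sys
from subprocess import PIPE, run


def call_via_run(*popenargs, **kwargs):
    """Rough equivalent of call(): return the exit status."""
    return run(*popenargs, **kwargs).returncode


def check_call_via_run(*popenargs, **kwargs):
    """Rough equivalent of check_call(): raise on failure, else return 0."""
    run(*popenargs, check=True, **kwargs)
    return 0


def check_output_via_run(*popenargs, **kwargs):
    """Rough equivalent of check_output(): raise on failure, else return stdout."""
    return run(*popenargs, check=True, stdout=PIPE, **kwargs).stdout


exit_three = [sys.executable, "-c", "import sys; sys.exit(3)"]
say_hi = [sys.executable, "-c", "print('hi')"]

print(call_via_run(exit_three))
print(check_call_via_run([sys.executable, "-c", "pass"]))
print(check_output_via_run(say_hi))
```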

    Sharing inheritance between CalledProcessError and CompletedProcess: That would mean that either CompletedProcess is an exception class, even though it's not used as such, or CalledProcessError uses multiple inheritance. I think duplicating a few attributes is preferable to having to think about multiple inheritance, especially since the names aren't all the same (cmd vs args, output vs stdout).

    @vadmium
    Member

    vadmium commented Jan 31, 2015

    It’s okay to leave them as independent classes, if you don’t want multiple inheritance. I was just putting the idea out there. It is a similar pattern to the HTTPError exception and HTTPResponse return value for urlopen().

    @takluyver
    Mannequin Author

    takluyver mannequin commented Feb 2, 2015

    Third version of the patch (subprocess_run3):

    • Simplifies the documentation of the trio (call, check_call, check_output) to describe them in terms of the equivalent run() call.
    • Removes a warning about using PIPE with check_output. I believe the warning was already incorrect: since check_output uses .communicate() internally, it shouldn't have deadlock issues.
    • Replaces the implementation of check_output() with a call to run().

    I didn't reimplement call or check_call - as previously discussed, they are more different from the code in run(), so subtly breaking things is more possible. They are also simpler.
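
The point about .communicate() can be illustrated with a child that writes much more than a typical OS pipe buffer holds. This is a sketch against the released Python 3.5 API, and the size is an arbitrary choice:

```python
import subprocess
import sys

SIZE = 1000000  # arbitrary; much larger than typical pipe buffers
noisy = [sys.executable, "-c", "import sys; sys.stdout.write('x' * %d)" % SIZE]

# run() gathers output through communicate(), which keeps draining the
# pipe while the child is still writing, so the child never blocks forever.
result = subprocess.run(noisy, stdout=subprocess.PIPE)
print(len(result.stdout))

# The same pattern written out by hand with Popen.
with subprocess.Popen(noisy, stdout=subprocess.PIPE) as proc:
    out, _ = proc.communicate()
print(len(out), proc.returncode)
```

By contrast, a function that only waits for the child without reading the pipe leaves the child blocked once the buffer is full, which is the deadlock the old warning was about.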

    @takluyver
    Mannequin Author

    takluyver mannequin commented Feb 10, 2015

    Would anyone like to do further review of this - or commit it ;-) ?

    I don't think anyone has objected to the concept since I brought it up on python-ideas, but if anyone is -1, please say so.

    @vadmium
    Member

    vadmium commented Feb 10, 2015

    Have you seen the code review comments on the Rietveld, <https://bugs.python.org/review/23342>? (Maybe check spam emails.) Many of the comments from the earlier patches still stand. In particular, I would like to see the “input” default value addressed, at least for the new run() function, if not the old check_output() function.

    @takluyver
    Mannequin Author

    takluyver mannequin commented Feb 10, 2015

    Aha, I hadn't seen any of those. They had indeed been caught by the spam filter. I'll look over them now.

    @takluyver
    Mannequin Author

    takluyver mannequin commented Feb 10, 2015

    Fourth version of patch, responding to review comments on Rietveld. The major changes are:

    • Eliminated the corner case when passing input=None to run() - now it's a real default parameter. Added a shim in check_output to keep it behaving the old way in case anything is relying on it, but I didn't document it.
    • The docstring of run() was shortened quite a bit by removing the examples.
    • Added a whatsnew entry

    I also made various minor fixes - thanks to everyone who found them.
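
A short sketch of the `input` parameter as it behaves in the released Python 3.5 API, where `input=None` is the same as not passing input at all. The child command is an arbitrary example that upper-cases whatever it reads:

```python
import subprocess
import sys

upper = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]

# Passing a string as input feeds it to the child's stdin.
result = subprocess.run(
    upper,
    input="hello",
    stdout=subprocess.PIPE,
    universal_newlines=True,  # work with str instead of bytes
)
print(result.stdout)

# With input=None, stdin is left to the caller; DEVNULL is used here so
# that the child sees end-of-file immediately instead of waiting.
quiet = subprocess.run(
    upper,
    input=None,
    stdin=subprocess.DEVNULL,
    stdout=subprocess.PIPE,
    universal_newlines=True,
)
print(repr(quiet.stdout))
```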

    @JeffHammel
    Mannequin

    JeffHammel mannequin commented Feb 10, 2015

    A few observations in passing. I beg your pardon for not commenting after a more in depth study of the issue, but as someone that's written and managed several subprocess module front-ends, my general observations seem applicable.

    subprocess needs easier and more robust ways of managing input and output streams

    subprocess should have easier ways of managing input: file streams are fine, but plain strings would also be nice

    For string commands, shell should always be true; for list/tuple commands, shell should be false. In fact, you'll get an error if you don't ensure this. Instead, just let the type of what is passed decide how it is executed. (For Windows, I have no idea; I'm lucky enough not to write Windows software these days.)

    subprocess should always terminate processes robustly on program exit (unless asked not to). I always have a hard time figuring out how to get processes to terminate, and how to keep them from terminating. I realize POSIX is black magic, to some degree.

    I'm attaching a far from perfect front end that I currently use for reference

    @takluyver
    Mannequin Author

    takluyver mannequin commented Feb 11, 2015

    Jeff: This makes it somewhat easier to handle input and output as strings instead of streams. Most of the functionality was already there, but this makes it more broadly useful. It doesn't especially address your other points, but I'm not aiming to completely overhaul subprocess.

    for string commands, shell should always be true. for list/tuple commands, shell should be false

    I wondered why this is not the case before, but on Windows a subprocess is actually launched by a string, not a list. And on POSIX, a string without shell=True is interpreted like a one-element list, so you can do e.g. Popen('ls') instead of Popen(['ls']). Changing that would probably break backwards compatibility in unexpected ways.
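
The POSIX behaviour described here can be seen with a small experiment. `try_launch` is a hypothetical helper, and the outcome of the string form is platform-dependent, as noted in the comments:

```python
import os
import subprocess
import sys


def try_launch(cmd):
    """Hypothetical helper: return the exit status, or None if launching failed."""
    try:
        return subprocess.run(cmd).returncode
    except OSError:
        return None


on_posix = os.name == "posix"

as_list = [sys.executable, "-c", "pass"]
as_string = sys.executable + " -c pass"

# A list is unambiguous: program name followed by its arguments.
print(try_launch(as_list))

# Without shell=True, POSIX treats the whole string as the program name,
# so this fails to launch there; on Windows the string is the command line.
print(on_posix, try_launch(as_string))
```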

    @bitdancer
    Member

    bitdancer commented Feb 18, 2015

    string vs list: see bpo-6760 for some background. Yes, I think it is an API bug, but there is no consensus for fixing it (it would require a deprecation period).

    Jeff: in general your points do not seem to be apropos to this particular proposed enhancement, but are instead addressing other aspects of subprocess and should be dealt with in other targeted issues.

    @takluyver
    Mannequin Author

    takluyver mannequin commented Feb 20, 2015

    Can I interest any of you in further review? I think I have responded to all comments so far. Thanks!

    @takluyver
    Mannequin Author

    takluyver mannequin commented Mar 2, 2015

    Is there anything further I should be doing for this?

    @vadmium
    Member

    vadmium commented Mar 20, 2015

    One thing that just popped into my mind that I don’t think has been discussed: The patch adds the new run() function to subprocess.__all__, but the CompletedProcess class is still missing. Was that an oversight or a conscious decision?

    @takluyver
    Mannequin Author

    takluyver mannequin commented Mar 20, 2015

    Thanks, that was an oversight. Patch 5 adds CompletedProcess to __all__.

    @takluyver
    Mannequin Author

    takluyver mannequin commented Apr 8, 2015

    I am still keen for this to move forwards. I am at PyCon if anyone wants to discuss it in person.

    @gpshead
    Member

    gpshead commented Apr 8, 2015

    I'm at pycon as well, we can get this taken care of here. :)

    @takluyver
    Mannequin Author

    takluyver mannequin commented Apr 8, 2015

    Great! I'm free after my IPython tutorial this afternoon, all of tomorrow, and I'm around for the sprints.

    @takluyver
    Mannequin Author

    takluyver mannequin commented Apr 14, 2015

    Patch 6a, following in-person review with Gregory:

    • Reapplied to the updated codebase.
    • Docs: mention the older functions near the top, because they'll still be important for some time.
    • Docs: Be explicit that combined stdout/stderr goes in stdout attribute.
    • Various improvements to code style

    @python-dev
    Mannequin

    python-dev mannequin commented Apr 14, 2015

    New changeset f0a00ee094ff by Gregory P. Smith in branch 'default':
    Add a subprocess.run() function than returns a CalledProcess instance for a
    https://hg.python.org/cpython/rev/f0a00ee094ff

    @gpshead
    Member

    gpshead commented Apr 14, 2015

    thanks! i'll close this later after some buildbot runs and any post-commit reviews.

    @takluyver
    Mannequin Author

    takluyver mannequin commented Apr 25, 2015

    I expect this can be closed now, unless there's some post-commit review somewhere that needs addressing?

    @gpshead gpshead closed this as completed Apr 26, 2015
    @ncoghlan
    Contributor

    ncoghlan commented May 18, 2016

    This change has made the subprocess docs intimidating and unapproachable again - this is a *LOWER* level swiss-army knife API than the 3 high level convenience functions.

    I've filed http://bugs.python.org/issue27050 to suggest changing the way this is documented to position run() as a mid-tier API that's more flexible than the high level API, but still more convenient than accessing subprocess.Popen directly.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022