Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

POpen bufsize=0 ignored with universal_newlines=True #85394

Open
ydirson mannequin opened this issue Jul 6, 2020 · 9 comments
Open

POpen bufsize=0 ignored with universal_newlines=True #85394

ydirson mannequin opened this issue Jul 6, 2020 · 9 comments
Labels
3.9 only security fixes 3.10 only security fixes 3.11 only security fixes topic-IO type-bug An unexpected behavior, bug, or error

Comments

@ydirson
Copy link
Mannequin

ydirson mannequin commented Jul 6, 2020

BPO 41222
Nosy @gpshead, @pitrou, @eryksun, @ydirson
PRs
  • gh-85394: new tests for demonstrating polling problem on subprocess with universal_newlines=True #25859
  • Files
  • testproc-unbuffered.py
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2020-07-06.19:54:02.188>
    labels = ['3.10', 'type-bug', '3.9', 'expert-IO', '3.11']
    title = 'POpen bufsize=0 ignored with universal_newlines=True'
    updated_at = <Date 2021-12-22.13:01:55.073>
    user = 'https://github.com/ydirson'

    bugs.python.org fields:

    activity = <Date 2021-12-22.13:01:55.073>
    actor = 'eryksun'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['IO']
    creation = <Date 2020-07-06.19:54:02.188>
    creator = 'yann'
    dependencies = []
    files = ['49299']
    hgrepos = []
    issue_num = 41222
    keywords = ['patch']
    message_count = 6.0
    messages = ['373165', '392714', '392795', '392840', '409016', '409031']
    nosy_count = 4.0
    nosy_names = ['gregory.p.smith', 'pitrou', 'eryksun', 'yann']
    pr_nums = ['25859']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue41222'
    versions = ['Python 3.9', 'Python 3.10', 'Python 3.11']

    @ydirson
    Copy link
    Mannequin Author

    ydirson mannequin commented Jul 6, 2020

    On a POpen object created with bufsize=0, stdout.readline() does a buffered reading with python3, whereas in 2.7 it did char-by-char reading. See attached example.

    As a result, a poll including the stdout object suffers a behaviour change when stdout is ready for writing and there is more than one line of data available. In both cases we get notified by poll() that data is available on the fd and we can stdout.readline() and get back to our polling loop. Then:

    • with python2 poll() then returns immediately and stdout.readline() will then return the next line

    • with python3 poll() now blocks

    Running the attached example under strace reveals the underlying difference:

     write(4, "go\n", 3)                     = 3
     poll([{fd=5, events=POLLIN|POLLERR|POLLHUP}], 1, -1) = 1 ([{fd=5, revents=POLLIN}])
    -read(5, "x", 1)                         = 1
    -read(5, "x", 1)                         = 1
    -read(5, "x", 1)                         = 1
    -read(5, "x", 1)                         = 1
    -read(5, "x", 1)                         = 1
    -read(5, "x", 1)                         = 1
    -read(5, "x", 1)                         = 1
    -read(5, "x", 1)                         = 1
    -read(5, "x", 1)                         = 1
    -read(5, "x", 1)                         = 1
    -read(5, "x", 1)                         = 1
    -read(5, "x", 1)                         = 1
    -read(5, "\n", 1)                        = 1
    -fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0x2), ...}) = 0
    +read(5, "xxxxxxxxxxxx\nyyyyyyyyyyyyyyy\naaa"..., 8192) = 74
     write(1, ">xxxxxxxxxxxx\n", 14)         = 14

    We can see a buffered read, which explains the behaviour difference.

    Changing to bufsize=1, strace does not show a difference here.

    This is especially troubling, as the first note in https://docs.python.org/3/library/io.html#class-hierarchy mentions that even in buffered mode there is an unoptimized readline() implementation.

    @ydirson ydirson mannequin added 3.7 (EOL) end of life 3.8 only security fixes topic-IO type-bug An unexpected behavior, bug, or error labels Jul 6, 2020
    @ydirson
    Copy link
    Mannequin Author

    ydirson mannequin commented May 2, 2021

    With upcoming 3.10 phasing out 2.7 compatibility I have to find a solution to this, so I'm back digging here.

    Even .read(1) on a subprocess pipe causes an underlying buffered read, so working around the problem by a loop of 1-byte reads has to do with os.read(), though its usage on file-like object is discouraged.

    It looks like one of those would be needed, depending on the expected semantics of POpen's bufsize parameter:

    • use the provided bufsize for the underlying buffering
    • provide a dummy pipe fd through fileno(), feeding it data as long as a read() call leaves data in the underlying buffer (indeed a simple conditional 1-byte read or write to the pipe before returning to caller should provide the correct semantics)

    @ydirson ydirson mannequin added 3.9 only security fixes labels May 2, 2021
    @ydirson
    Copy link
    Mannequin Author

    ydirson mannequin commented May 3, 2021

    Relevant commits include this one from v3.1.4:

    commit 877766d
    Author: Antoine Pitrou <solipsis@pitrou.net>
    Date: Sat Mar 19 17:00:37 2011 +0100

    Issue bpo-11459: A `bufsize` value of 0 in subprocess.Popen() really creates
    unbuffered pipes, such that select() works properly on them.
    

    I can't use that commit without cherry-picking this one from v3.2.2, though:

    commit e96ec68
    Author: Antoine Pitrou <solipsis@pitrou.net>
    Date: Sat Jul 23 21:46:35 2011 +0200

    Issue bpo-12591: Allow io.TextIOWrapper to work with raw IO objects (without
    a read1() method), and add an undocumented *write_through* parameter to
    mandate unbuffered writes.
    

    And my test script still shows the same behaviour, with poll.poll() or poll.select().

    The fact that my stdout object has no read1() and needs the above patch looks like a good lead for further investigation?

    @ydirson
    Copy link
    Mannequin Author

    ydirson mannequin commented May 3, 2021

    The fact that my stdout object has no read1() and needs the above patch looks like a good lead for further investigation?

    That's linked to universal_newlines, the bug only shows when that flag is set.

    Testcases provided in #25859

    @ydirson ydirson mannequin changed the title Undocumented behaviour change of POpen.stdout.readine with bufsize=0 or =1 POpen bufsize=0 ignored with universal_newlines=True May 3, 2021
    @ydirson ydirson mannequin changed the title Undocumented behaviour change of POpen.stdout.readine with bufsize=0 or =1 POpen bufsize=0 ignored with universal_newlines=True May 3, 2021
    @iritkatriel iritkatriel added 3.10 only security fixes 3.11 only security fixes and removed 3.7 (EOL) end of life 3.8 only security fixes labels Dec 12, 2021
    @pitrou
    Copy link
    Member

    pitrou commented Dec 22, 2021

    Hmm, sorry for not responding earlier.

    Buffering is necessary for implementing the universal_newlines behaviour (I don't know how we could do otherwise?). This has the unavoidable side effect that the Python buffered file object is not in sync with the underlying file descriptor, so that using p.stdout in a select call will give you inaccurate information.

    So it seems like this is perhaps a documentation issue. What do you think?

    @eryksun
    Copy link
    Contributor

    eryksun commented Dec 22, 2021

    Buffering is necessary for implementing the universal_newlines

    Why is that? I can see that it requires newline state tracking, and the allowance to make two read(fd, &c, 1) system calls for a single read(1) method call, in case a "\n" has to be ignored.

    testproc-unbuffered.py runs to completion in 3.11 if the following statement that changes the text wrapper's chunk size is added right after creating the Popen() instance:

        if sys.version_info[0] > 2:
            process.stdout._CHUNK_SIZE = 1

    The initial chunk size for a text wrapper is hard coded as 8192 bytes. For some reason the constructor has no parameter for it.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @ydirson
    Copy link

    ydirson commented Jun 10, 2022

    Where can we go from here ?

    Buffering is necessary for implementing the universal_newlines

    Why is that?

    I must say I'm puzzled by this statement as well, since readline did work together with universal newlines in python2.

    @ydirson
    Copy link

    ydirson commented Jun 17, 2022

    Let's summarize my understanding of the situation (don't hesitate to correct any misinterpretation):

    • Popen exposes file-like objects for which no limitation is documented against their use with .readline() and poll()
    • When the pipe is openned with universal_newlines=True the underlying read() syscall disregards the bufsize=0 setting, while with universal_newlines=False it is honored
    • setting _CHUNK_SIZE = 1 as @eryksun suggests works like a charm, and does not report a problem with universal_newlines

    @pitrou python2 does not have such a discrepancy, yet it seems it had no problem to implement the universal_newlines behaviour. Can you elaborate ? Do I just avoid any universal_newlines because I'm essentially just running on Linux ?

    @ydirson
    Copy link

    ydirson commented Nov 15, 2022

    The _CHUNK_SIZE = 1 approach shows the limits of using internals, when starting to type-check with mypy:

    error: Item "IO[str]" of "Optional[IO[str]]" has no attribute "_CHUNK_SIZE"  [union-attr]
    error: Item "None" of "Optional[IO[str]]" has no attribute "_CHUNK_SIZE"  [union-attr]
    

    This can be worked around by piling another kludge:

        assert process.stdout          # teach typechecker to rule out the `None` option
        process.stdout._CHUNK_SIZE = 1 # type: ignore[attr-defined]
    

    A longer-term solution would be nicer, obviously :)

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.9 only security fixes 3.10 only security fixes 3.11 only security fixes topic-IO type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants