TextIOWrapper Buffering Inconsistent Between _io and _pyio #52203

Open
amcnabb mannequin opened this issue Feb 18, 2010 · 6 comments
Labels
3.8 (only security fixes), 3.9 (only security fixes), 3.10 (only security fixes), docs (Documentation in the Doc dir), topic-IO, type-bug (An unexpected behavior, bug, or error)

Comments

@amcnabb
Mannequin

amcnabb mannequin commented Feb 18, 2010

BPO 7955
Nosy @birkenfeld, @amauryfa, @pitrou
Files
  • testpyio.py
  • testio.py
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2010-02-18.06:41:15.794>
    labels = ['type-bug', '3.8', '3.9', 'expert-IO', '3.10', 'docs']
    title = 'TextIOWrapper Buffering Inconsistent Between _io and _pyio'
    updated_at = <Date 2020-11-16.21:51:41.938>
    user = 'https://bugs.python.org/amcnabb'

    bugs.python.org fields:

    activity = <Date 2020-11-16.21:51:41.938>
    actor = 'iritkatriel'
    assignee = 'docs@python'
    closed = False
    closed_date = None
    closer = None
    components = ['Documentation', 'IO']
    creation = <Date 2010-02-18.06:41:15.794>
    creator = 'amcnabb'
    dependencies = []
    files = ['16248', '16249']
    hgrepos = []
    issue_num = 7955
    keywords = []
    message_count = 5.0
    messages = ['99496', '99497', '99519', '99520', '99521']
    nosy_count = 4.0
    nosy_names = ['georg.brandl', 'amaury.forgeotdarc', 'pitrou', 'amcnabb']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = None
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue7955'
    versions = ['Python 3.8', 'Python 3.9', 'Python 3.10']

    @amcnabb
    Mannequin Author

    amcnabb mannequin commented Feb 18, 2010

    The following snippet behaves differently in the C IO implementation than in the Python IO implementation:

      import sys
      sys.stdout.write('unicode ')
      sys.stdout.buffer.write(b'bytes ')

    To test this, I have created two scripts, testpyio.py (using _pyio) and testio.py (using _io). The output is as follows:

    % python3 testpyio.py
    unicode bytes
    % python3 testio.py
    bytes unicode
    %

    In my opinion, the behavior exhibited by _pyio is more correct. It appears that to get the C implementation to print the lines in the correct order, there must be a flush in between the statements. This extra flush would create a lot of overhead.

    I am attaching the two test scripts.

    The C implementation prints the output in the correct order if each write ends with a newline.
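The ordering problem described above can be reproduced without touching the real standard streams. The following is a minimal sketch, assuming an in-memory `io.BytesIO` stands in for `sys.stdout.buffer`; the helper name `interleave` is hypothetical and not part of the attached test scripts. Without the intermediate flush, the resulting order depends on the IO implementation in use, so only the flushed case has a guaranteed result.

```python
import io


def interleave(flush_between):
    """Write text, then bytes, to the same binary stream and return the result."""
    raw = io.BytesIO()  # stand-in for sys.stdout.buffer (an assumption of this sketch)
    text = io.TextIOWrapper(raw, encoding="utf-8")

    text.write("unicode ")
    if flush_between:
        # Push any text still pending in the TextIOWrapper down to the
        # binary layer before writing bytes directly.
        text.flush()
    raw.write(b"bytes ")

    text.flush()  # make sure nothing is left pending before reading back
    return raw.getvalue()


if __name__ == "__main__":
    print(interleave(flush_between=True))   # text first, then bytes
    print(interleave(flush_between=False))  # order depends on the implementation
```

The explicit flush is the portable way to fix the order, at the cost the report mentions: on a real stream it also flushes the underlying binary buffer.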

    @amcnabb amcnabb mannequin added topic-IO type-bug An unexpected behavior, bug, or error labels Feb 18, 2010
    @amauryfa
    Member

    This is by design: for performance, the C TextIOWrapper stores the encoded strings in a list and calls buffer.write() less often.
    You can try adding
    sys.stdout._CHUNK_SIZE = 1
    to get the _pyio behavior.
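As a rough illustration of the suggestion above, the sketch below sets the private `_CHUNK_SIZE` attribute on a wrapper around an in-memory stream. It assumes CPython's C implementation, where pending text is handed to the binary layer once it reaches the chunk size; `_CHUNK_SIZE` is undocumented and may behave differently elsewhere. The function name is hypothetical.

```python
import io


def write_with_chunk_size_one():
    raw = io.BytesIO()  # stand-in for sys.stdout.buffer (an assumption of this sketch)
    text = io.TextIOWrapper(raw, encoding="utf-8")

    # Private attribute and a CPython implementation detail: with a chunk
    # size of 1, every non-empty write reaches the chunk size and is
    # passed to the binary layer right away.
    text._CHUNK_SIZE = 1

    text.write("unicode ")
    raw.write(b"bytes ")

    text.flush()
    return raw.getvalue()


if __name__ == "__main__":
    print(write_with_chunk_size_one())
```

Because this relies on an internal attribute, it is better treated as a diagnostic aid than as something to ship.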

    @amcnabb
    Mannequin Author

    amcnabb mannequin commented Feb 18, 2010

    This seems like a common need (particularly for stdout and stderr), and setting stdout._CHUNK_SIZE = 1 is relying on an implementation detail.

    1. Can the documentation for TextIOWrapper be updated to clearly describe this extra buffering (how often buffer.write is called, etc.)?

    2. Can there be a flush-like method, say write_to_buffer() to force a buffer.write() without the overhead of a flush?

    @amcnabb amcnabb mannequin reopened this Feb 18, 2010
    @pitrou
    Member

    pitrou commented Feb 18, 2010

    I agree this deserves documentation. I'm not convinced it's a common need, though. Usually you use stdin/stdout either in binary mode or in text mode; interleaving the two is not very common.

    @pitrou pitrou added the docs Documentation in the Doc dir label Feb 18, 2010
    @pitrou pitrou removed the invalid label Feb 18, 2010
    @amcnabb
    Mannequin Author

    amcnabb mannequin commented Feb 18, 2010

    I would imagine that this would come up in most programs that read data from a pipe or from a socket (both of which carry binary data) and then write it to stdout or stderr. I ran across the problem in my first non-trivial port to Python 3, and it seems like a common case to me.

    But having the weird behavior documented is the most important thing.

    @admin admin mannequin assigned docspython and unassigned birkenfeld Oct 29, 2010
    @iritkatriel iritkatriel added 3.8 only security fixes 3.9 only security fixes 3.10 only security fixes labels Nov 16, 2020
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @serhiy-storchaka
    Member

    This issue was opened and last updated in February 2010. The write_through parameter for TextIOWrapper() was added in Python 3.3, released in September 2012. Specifying write_through=True makes the TextIOWrapper write immediately to the underlying binary stream.

    But the standard streams are opened with write_through=True only if Python is run in unbuffered mode. The correctness of a Python program should not depend on the options the interpreter was run with, and unbuffered mode may have a more significant impact on performance. Should the standard streams always be created with write_through=True? Or should that be the default, with write_through=False required explicitly to enable buffering in TextIOWrapper?
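A minimal sketch of the `write_through=True` behavior described above, again assuming an in-memory `io.BytesIO` in place of the real binary stream; the function name is hypothetical. With write-through enabled, text is handed to the binary layer on every write, so interleaved text and byte writes keep their program order without an explicit flush.

```python
import io


def interleave_write_through():
    raw = io.BytesIO()  # stand-in for sys.stdout.buffer (an assumption of this sketch)
    # write_through=True: the wrapper does not hold encoded text back;
    # each write() is passed straight to the underlying binary stream.
    text = io.TextIOWrapper(raw, encoding="utf-8", write_through=True)

    text.write("unicode ")
    raw.write(b"bytes ")

    return raw.getvalue()


if __name__ == "__main__":
    print(interleave_write_through())
```

For the real standard streams, `sys.stdout.reconfigure(write_through=True)` (available since Python 3.7) should have a similar effect, provided `sys.stdout` is an actual `TextIOWrapper` and has not been replaced by another object.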
