input() builtin always uses "strict" error handler #57551

Closed
stefanholek mannequin opened this issue Nov 4, 2011 · 15 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs)
type-bug (An unexpected behavior, bug, or error)

Comments

@stefanholek
Mannequin

stefanholek mannequin commented Nov 4, 2011

BPO 13342
Nosy @pitrou, @vstinner, @benjaminp, @ezio-melotti
Files
  • input_readline.patch
  Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2011-11-05.23:47:21.706>
    created_at = <Date 2011-11-04.14:37:14.334>
    labels = ['interpreter-core', 'type-bug']
    title = 'input() builtin always uses "strict" error handler'
    updated_at = <Date 2011-11-05.23:47:21.705>
    user = 'https://bugs.python.org/stefanholek'

    bugs.python.org fields:

    activity = <Date 2011-11-05.23:47:21.705>
    actor = 'pitrou'
    assignee = 'none'
    closed = True
    closed_date = <Date 2011-11-05.23:47:21.706>
    closer = 'pitrou'
    components = ['Interpreter Core']
    creation = <Date 2011-11-04.14:37:14.334>
    creator = 'stefanholek'
    dependencies = []
    files = ['23610']
    hgrepos = []
    issue_num = 13342
    keywords = ['patch']
    message_count = 15.0
    messages = ['147005', '147007', '147008', '147010', '147020', '147029', '147035', '147036', '147038', '147045', '147047', '147049', '147050', '147123', '147124']
    nosy_count = 7.0
    nosy_names = ['pitrou', 'vstinner', 'benjamin.peterson', 'ezio.melotti', 'stefanholek', 'neologix', 'python-dev']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue13342'
    versions = ['Python 3.2', 'Python 3.3']

    @stefanholek
    Mannequin Author

    stefanholek mannequin commented Nov 4, 2011

    The input builtin always uses "strict" error handling for Unicode conversions. This means that when I enter a latin-1 string in a utf-8 environment, input breaks with a UnicodeDecodeError. Now don't tell me not to do that, I have a valid use-case. ;-)

    While "strict" may be a good default choice, it is clearly not sufficient. I would like to propose an optional 'errors' argument to input, similar to the 'errors' argument the decode and encode methods have.

    I have in fact implemented such an input method for my own use:
    https://github.com/stefanholek/rl/blob/surrogate-input/rl/input.c

    While this solves my immediate needs, the fact that my implementation is basically just a copy of bltinmodule.input with one additional argument makes me think that this could be fixed in Python proper.

    There cannot be a reason input() should be confined to "strict", or can there? ;-)

    @stefanholek stefanholek mannequin added the topic-unicode and type-bug labels Nov 4, 2011
    @benjaminp
    Contributor

    There's no reason you couldn't write your own input() function in Python to do this.
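
For readers wondering what such a pure-Python replacement might look like, here is a minimal sketch. The name `input_with_errors` and its `stream` and `encoding` parameters are hypothetical; they are not part of Python or of the `rl` package linked above. The function reads raw bytes and decodes them with a caller-chosen error handler. It does not go through GNU readline, so line editing and history are not available.

```python
import io
import sys


def input_with_errors(prompt="", errors="strict", stream=None, encoding=None):
    """Read one line of bytes and decode it with the given error handler.

    Hypothetical helper: `stream` must be a binary stream and defaults to
    sys.stdin.buffer; `encoding` defaults to sys.stdin's encoding.
    """
    if stream is None:
        stream = sys.stdin.buffer
    if encoding is None:
        encoding = getattr(sys.stdin, "encoding", None) or "utf-8"
    if prompt:
        sys.stdout.write(prompt)
        sys.stdout.flush()
    raw = stream.readline()
    if not raw:
        # Mirror the builtin, which raises EOFError at end of input.
        raise EOFError("EOF when reading a line")
    line = raw.decode(encoding, errors)
    # Drop a single trailing newline, as input() does.
    if line.endswith("\n"):
        line = line[:-1]
    return line


# Latin-1 encoded "café" arriving in a UTF-8 environment.
latin1_line = io.BytesIO(b"caf\xe9\n")
result = input_with_errors(stream=latin1_line, encoding="utf-8",
                           errors="surrogateescape")
```

With "surrogateescape" the undecodable byte is kept as a lone surrogate and can be turned back into the original byte later. One caveat of this sketch: reading from `sys.stdin.buffer` while other code reads from `sys.stdin` can lose data that the text layer has already buffered.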

    @stefanholek
    Mannequin Author

    stefanholek mannequin commented Nov 4, 2011

    I am not quite sure how I would write a custom, readline-using input function in Python (access to PyOS_Readline seems required); that's why I did it in C. Do you have an example?

    @pitrou
    Member

    pitrou commented Nov 4, 2011

    > There cannot be a reason input() should be confined to "strict", or can
    > there? ;-)

    Actually, there's a good reason: in the non-interactive case, input() simply calls sys.stdin.readline(), which takes no encoding or errors arguments. To change the error handler, override sys.stdin with a stream that has the right one.

    However, there is a bug in input() in that it ignores sys.stdin's error handler in interactive mode (where it delegates to the readline library, if present):

    >>> import sys, io
    >>> sys.stdin = io.TextIOWrapper(sys.stdin.detach(), "ascii", "replace")
    >>> sys.stdin.read()
    héhé
    'h��h��\n'
    >>> input()
    héhé
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)

    If you don't mind losing GNU readline functionality, the immediate workaround for you is to use sys.stdin.read() directly.
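
The effect of the wrapper's error handler can be tried without a terminal. The following is a small sketch, not the code from the patch: it uses an in-memory `io.BytesIO` in place of the real standard input, and `read_line` is a hypothetical helper name. It feeds Latin-1 bytes to a UTF-8 text wrapper under three error handlers.

```python
import io

# "héhé" encoded as Latin-1: each é is the single byte 0xE9,
# which is not valid UTF-8 in this position.
latin1_bytes = "héhé\n".encode("latin-1")


def read_line(data, errors):
    """Wrap the bytes in a UTF-8 text stream and read one line."""
    stream = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8", errors=errors)
    return stream.readline()


# "replace" substitutes U+FFFD for each undecodable byte.
replaced = read_line(latin1_bytes, "replace")

# "surrogateescape" keeps each bad byte as a lone surrogate, so the
# original bytes can be recovered by encoding with the same handler.
escaped = read_line(latin1_bytes, "surrogateescape")
round_trip = escaped.encode("utf-8", "surrogateescape")

# "strict" (the default) raises instead.
try:
    read_line(latin1_bytes, "strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```

Replacing `sys.stdin` with a wrapper built this way around the real input buffer is the approach described above; whether input() honours it in interactive mode depends on the fix discussed in this issue.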

    @pitrou pitrou added the interpreter-core label and removed the topic-unicode label Nov 4, 2011
    @pitrou
    Member

    pitrou commented Nov 4, 2011

    Here is a patch. The bugfix itself is quite pedestrian, but the test is more interesting. I did what I could to fork a subprocess into a pseudoterminal so as to trigger the GNU readline code path. The only limitation I've found is that I'm unable to read further on the child's stdout after input() has been called. The test therefore uses a pipe to do the return checking.

    @neologix
    Mannequin

    neologix mannequin commented Nov 4, 2011

    > The bugfix itself is quite pedestrian, but the test is more interesting.

    Indeed. Looks good to me.

    @stefanholek
    Mannequin Author

    stefanholek mannequin commented Nov 4, 2011

    Thank you Antoine, this looks good.

    However when I try your example I get

    sys.stdin = io.TextIOWrapper(
        sys.stdin.detach(), 'ascii', 'replace')
    ValueError: underlying buffer has been detached

    </helpforum>

    @pitrou
    Member

    pitrou commented Nov 4, 2011

    > However when I try your example I get
    >
    > sys.stdin = io.TextIOWrapper(
    >     sys.stdin.detach(), 'ascii', 'replace')
    > ValueError: underlying buffer has been detached

    Which version of Python (and which OS)? It works fine here on the latest
    3.2 and 3.3 branches.

    @stefanholek
    Mannequin Author

    stefanholek mannequin commented Nov 4, 2011

    This is with Python 3.2.2 on Mac OS X 10.6 (SL). I have built Python from source with: ./configure; make; make install.

    @stefanholek
    Mannequin Author

    stefanholek mannequin commented Nov 4, 2011

    Python 3.2.2 (default, Nov  4 2011, 22:28:55) 
    [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import sys, io
    >>> w = io.TextIOWrapper(sys.stdin.detach(), 'ascii', 'replace')
    >>> input
    <built-in function input>
    >>> input()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: underlying buffer has been detached
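
The ValueError in this session is what a text wrapper raises once its buffer has been detached. The sketch below reproduces that behaviour with an in-memory stream standing in for standard input (an assumption made so it can run anywhere); the names `old` and `new` are illustrative only. In the session above the new wrapper was bound to `w` rather than to `sys.stdin`, so it seems likely that input() was still looking at the detached object.

```python
import io

# An in-memory byte stream stands in for the real stdin buffer.
raw = io.BytesIO("hé\n".encode("utf-8"))
old = io.TextIOWrapper(raw, encoding="utf-8")

# detach() hands the underlying buffer over and leaves `old` unusable.
new = io.TextIOWrapper(old.detach(), encoding="ascii", errors="replace")

try:
    old.readline()
    message = None
except ValueError as exc:
    message = str(exc)

# The new wrapper owns the buffer and decodes with its own settings:
# the two UTF-8 bytes of é each become U+FFFD under ascii/"replace".
line = new.readline()
```

Rebinding `sys.stdin` to the new wrapper, as in the earlier example in this thread, avoids leaving a detached object in place.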

    @stefanholek
    Mannequin Author

    stefanholek mannequin commented Nov 4, 2011

    Oops, the last one wasn't meant for the bug tracker. <blush>

    @stefanholek
    Mannequin Author

    stefanholek mannequin commented Nov 4, 2011

    I can make it work at the interpreter prompt with your patch applied. Sorry for cluttering up the ticket. ;-)

    @pitrou
    Member

    pitrou commented Nov 4, 2011

    > I can make it work at the interpreter prompt with your patch applied.
    > Sorry for cluttering up the ticket. ;-)

    That's ok, thanks a lot for testing.

    @python-dev
    Mannequin

    python-dev mannequin commented Nov 5, 2011

    New changeset 421c8e291221 by Antoine Pitrou in branch '3.2':
    Issue bpo-13342: input() used to ignore sys.stdin's and sys.stdout's unicode
    http://hg.python.org/cpython/rev/421c8e291221

    New changeset 992ba03d60a8 by Antoine Pitrou in branch 'default':
    Issue bpo-13342: input() used to ignore sys.stdin's and sys.stdout's unicode
    http://hg.python.org/cpython/rev/992ba03d60a8

    @pitrou
    Member

    pitrou commented Nov 5, 2011

    Committed. I hope the test won't disturb the buildbots.

    @pitrou pitrou closed this as completed Nov 5, 2011
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022