Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test_readline.test_nonascii fails on Android #73183

Closed
xdegaye mannequin opened this issue Dec 17, 2016 · 14 comments
Closed

test_readline.test_nonascii fails on Android #73183

xdegaye mannequin opened this issue Dec 17, 2016 · 14 comments
Labels
3.7 tests Tests in the Lib/test dir type-bug An unexpected behavior, bug, or error

Comments

@xdegaye
Copy link
Mannequin

xdegaye mannequin commented Dec 17, 2016

BPO 28997
Nosy @xdegaye, @vadmium
Files
  • tracing_the_completer.patch
  • tracing_the_completer_2.patch
  • readline_multibyte.patch
  • readline_multibyte.v2.patch
  • readline.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2017-11-08.14:16:57.799>
    created_at = <Date 2016-12-17.09:02:49.699>
    labels = ['3.7', 'type-bug', 'tests']
    title = 'test_readline.test_nonascii fails on Android'
    updated_at = <Date 2017-11-08.14:16:57.797>
    user = 'https://github.com/xdegaye'

    bugs.python.org fields:

    activity = <Date 2017-11-08.14:16:57.797>
    actor = 'xdegaye'
    assignee = 'none'
    closed = True
    closed_date = <Date 2017-11-08.14:16:57.799>
    closer = 'xdegaye'
    components = ['Tests']
    creation = <Date 2016-12-17.09:02:49.699>
    creator = 'xdegaye'
    dependencies = []
    files = ['45954', '45956', '45976', '46014', '46237']
    hgrepos = []
    issue_num = 28997
    keywords = ['patch']
    message_count = 14.0
    messages = ['283476', '283544', '283555', '283563', '283564', '283708', '283906', '283953', '284976', '285103', '285533', '285728', '285736', '305848']
    nosy_count = 2.0
    nosy_names = ['xdegaye', 'martin.panter']
    pr_nums = []
    priority = 'normal'
    resolution = 'out of date'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue28997'
    versions = ['Python 3.6', 'Python 3.7']

    @xdegaye
    Copy link
    Mannequin Author

    xdegaye mannequin commented Dec 17, 2016

    test_nonascii has been implemented in bpo-16182.

    The error message:
    ======================================================================
    FAIL: test_nonascii (test.test_readline.TestReadline)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/sdcard/org.bitbucket.pyona/lib/python3.7/test/test_readline.py", line 203, in test_nonascii
        self.assertIn(b"text 't\\xeb'\r\n", output)
    AssertionError: b"text 't\\xeb'\r\n" not found in bytearray(b"^A^B^B^B^B^B^B^B\t\tx\t\r\n[\\303\\257nserted]|t\x07\x08\x08\x08\x08\x08\x08\x08\x07\x07xrted]|t\x08\x08\x08\x08\x08\x08\x08\x07\r\nresult \'[\\xefnsexrted]|t\'\r\nhistory \'[\\xefnsexrted]|t\'\r\n")

    @xdegaye xdegaye mannequin self-assigned this Dec 17, 2016
    @xdegaye xdegaye mannequin added 3.7 tests Tests in the Lib/test dir type-bug An unexpected behavior, bug, or error labels Dec 17, 2016
    @vadmium
    Copy link
    Member

    vadmium commented Dec 18, 2016

    This is a breakdown of running the test script on my Linux setup (UTF-8 locale):

    ^A^B^B^B^B^B^B^B\t\tx\t\r\n
    Input echoed back (before Readline disables echo)
    [\xc3\xafnserted]
    Inserted by pre_input_hook()
    |t\xc3\xab[after]
    Inserted by the Ctrl+A macro
    \x08\x08\x08\x08\x08\x08\x08
    Response to moving the cursor left of [after]
    text \'t\\xeb\'\r\n
    Encoded completer(text=...) parameter for testing
    line \'[\\xefnserted]|t\\xeb[after]\'\r\n
    get_line_buffer() result for testing
    indexes 11 13\r\n
    get_begidx(), get_endidx()
    \x07
    Rings terminal bell due to multiple possible completions
    text . . . line . . . indexes . . .\r\n
    From second round of completer() calls due to second Tab input
    substitution \'t\\xeb\'\r\n
    matches [\'t\\xebnt\', \'t\\xebxt\']\r\n
    display() parameters
    x[after]\x08\x08\x08\x08\x08\x08\x08
    Response to inserting “x”
    t[after]\x08\x08\x08\x08\x08\x08\x08
    Completion of "t\xEBxt"
    \r\n
    Response to inputting Return
    result \'[\\xefnserted]|t\\xebxt[after]\'\r\n
    input() function call return value
    history \'[\\xefnserted]|t\\xebxt[after]\'\r\n
    Line as retrieved from history

    The problem is that the Ctrl+A macro seems to have only inserted the two ASCII characters "|t" and has stopped at the non-ASCII character "\xEB". Everthing else relies on the macro working properly to get the right cursor positioning and completion text.

    Xavier: If you run the following code in an interactive Python session, what does pressing $ print out?

    import readline
    readline.parse_and_bind('Control-a: "|t\xEB[after]"')
    readline.parse_and_bind('"$": dump-macros')

    For me, it prints this: (includes “te” with an umlaut)

    \C-a outputs |të[after]

    What locale encoding does Python use for you? The parse_and_bind() call should be encoding "\xEB" with PyUnicode_EncodeLocale(), and passing the resulting string to Readline. Then Readline is supposed to insert the string when we invoke the macro. Somewhere the string is getting truncated.

    @xdegaye
    Copy link
    Mannequin Author

    xdegaye mannequin commented Dec 18, 2016

    For me, it prints this: (includes “te” with an umlaut)
    I got this same result when testing on Android with the interactive interpreter.

    See the content of the 'completer' list below, when run with tracing_the_completer.patch:
    On linux:
    test_nonascii (test.test_readline.TestReadline) ... bytearray(b"^A^B^B^B^B^B^B^B\t\tx\t\r\n[\xc3\xafnserted]|t\xc3\xab[after]\x08\x08\x08\x08\x08\x08\x08text \'t\\xeb\'\r\nline \'[\\xefnserted]|t\\xeb[after]\'\r\nindexes 11 13\r\ntext \'t\\xeb\'\r\nline \'[\\xefnserted]|t\\xeb[after]\'\r\nindexes 1113\r\nsubstitution \'t\\xeb\'\r\nmatches [\'t\\xebnt\', \'t\\xebxt\']\r\nx[after]\x08\x08\x08\x08\x08\x08\x08t[after]\x08\x08\x08\x08\x08\x08\x08\r\nresult \'[\\xefnserted]|t\\xebxt[after]\'\r\nhistory \'[\\xefnserted]|t\\xebxt[after]\'\r\ncompleter [\'t\xc3\xab-0\', \'GOT 1\', \'t\xc3\xab-1\', \'GOT 2\', \'t\xc3\xab-2\', \'t\xc3\xab-0\', \'GOT 1\', \'t\xc3\xab-1\', \'GOT 2\', \'t\xc3\xab-2\', \'t\xc3\xabx-0\', \'GOT 3\', \'t\xc3\xabx-1\']\r\n")

    On Android:
    AssertionError: b"text 't\\xeb'\r\n" not found in bytearray(b"^A^B^B^B^B^B^B^B\t\tx\t\r\n[\\303\\257nserted]|t\x07\x08\x08\x08\x08\x08\x08\x08\x07\x07xrted]|t\x08\x08\x08\x08\x08\x08\x08\x07\r\nresult \'[\\xefnsexrted]|t\'\r\nhistory \'[\\xefnsexrted]|t\'\r\ncompleter [\'\xc3\xafnse-0\', \'\xc3\xafnse-0\', \'\xc3\xafnsex-0\']\r\n")

    b'\xc3\xab' is the UTF-8 code for '\xEB'
    b'\xc3\xaf' is the UTF-8 code for '\xEF'
    It seems that the completer is triggered by the wrong key.

    @xdegaye
    Copy link
    Mannequin Author

    xdegaye mannequin commented Dec 18, 2016

    FWIW on Android we have:

    >>> readline.__doc__ and "libedit" in readline.__doc__
    False
    >>> getattr(readline, "set_pre_input_hook", None)
    <built-in function set_pre_input_hook>

    @xdegaye
    Copy link
    Mannequin Author

    xdegaye mannequin commented Dec 18, 2016

    Checking that the the problem is indeed an encoding problem on Android with readline.set_pre_input_hook(): with tracing_the_completer_2.patch that replaces '\xEB' with 'A', the test now succeeds and prints:

    test_nonascii (test.test_readline.TestReadline) ... bytearray(b"^A^B^B^B^B^B^B^B\t\tx\t\r\n[\\303\\257nserted]|tA[after]\x08\x08\x08\x08\x08\x08\x08text \'tA\'\r\nline \'[\\xefnserted]|tA[after]\'\r\nindexes 11 13\r\n\x07text \'tA\'\r\nline \'[\\xefnserted]|tA[after]\'\r\nindexes 11 13\r\nsubstitution \'tA\'\r\nmatches [\'tAnt\', \'tAxt\']\r\nx[after]\x08\x08\x08\x08\x08\x08\x08t[after]\x08\x08\x08\x08\x08\x08\x08\r\nresult \'[\\xefnserted]|tAxt[after]\'\r\nhistory \'[\\xefnserted]|tAxt[after]\'\r\ncompleter [\'tA-0\', \'GOT 1\', \'tA-1\', \'GOT 2\', \'tA-2\', \'tA-0\', \'GOT 1\', \'tA-1\', \'GOT 2\', \'tA-2\', \'tAx-0\', \'GOT 3\', \'tAx-1\']\r\n")
    ok

    @xdegaye
    Copy link
    Mannequin Author

    xdegaye mannequin commented Dec 20, 2016

    The test is ok on Android when LANG=en_US.UTF-8.
    When LANG is not set, the check made by the first statement in test_nonascii() should skip the test but fails to skip it on Android as Py_EncodeLocale() always encode to utf8 whatever the locale (same as with darwin).
    This patch adds a second check to verify that readline does handle multi bytes characters and does skip the test if not. So on Android the test is skipped now when LANG is not set. Another solution would have been to set LANG=en_US.UTF-8 in the environment of the process forked by run_pty(), but it does not sound robust as there may be other setups where readline does not handle multi bytes characters.

    @vadmium
    Copy link
    Member

    vadmium commented Dec 23, 2016

    The basic idea of your patch may be reasonable, but something is not right. Imagine the locale is something other than UTF-8. The input code will now contain mojibake print("\xC3\xAB"), although the decode() call will translate the result back to the expected "\xEB".

    I suggest this 2nd version of the patch. I used io.TextIOWrapper to use the locale encoding, and incorporated the other test character "\xEF". I also changed the preliminary test to call input() instead of relying on the interactive interpreter, quiet mode, etc.

    Just to clarify, is the problem that Python (correctly) assumes UTF-8 encoding on Android, but Readline does not unless you tweak the environment variable? I.e. is Readline assuming ASCII or something and ignoring the test characters? If so, it sounds like this may be a more general problem with Gnu tools and libraries on Android.

    @xdegaye
    Copy link
    Mannequin Author

    xdegaye mannequin commented Dec 24, 2016

    With your patch Martin, test_nonascii is skipped on Android when LANG is not set and the issue is fixed.

    Just to clarify, is the problem that Python (correctly) assumes UTF-8 encoding on Android, but Readline does not unless you tweak the environment variable?

    Readline (correctly) uses the locale environment variables to set eight-bit mode (see function _rl_init_eightbit() in Readline nls.c) and when the LC_CTYPE category is C or POSIX, eight-bit characters are discarded on input and a bell character ('\x07') is echoed instead.

    @vadmium
    Copy link
    Member

    vadmium commented Jan 8, 2017

    .
    Thanks for the explanation. It sounds like the Readline library assumes an ASCII-only locale and sets its “convert-meta” variable to “on”. But Python assumes UTF-8 and inputs b"\xC3\xAB" to the terminal. Readline converts the input to two escape sequences: "\N{ESC}\x43" == "\N{ESC}C" (Alt + Capital C), which probably runs the “capitalize-word” command, and "\N{ESC}\x2B" == "\N{ESC}+" (Alt + Plus), which presumably generates the bell character.

    I don’t understand why you say Readline is “correctly” using the C or Posix locale (ASCII), while my understanding is Python on Android always uses UTF-8 as the locale encoding. It seems there is an inconsistency with the locale or encodings being used.

    Or is this just an obscure case that you choose not to support on Android, and therefore skip the test?

    @xdegaye
    Copy link
    Mannequin Author

    xdegaye mannequin commented Jan 10, 2017

    Trying to track the 'char * text' pointer that is returned to Python by readline in rlhandler() and using gdb 'set follow-fork-mode child' and 'set detach-on-fork off' to stop (with a breakpoint in rlhandler) in the child spawned by run_pty() in test_nonascii, unfortunately this fails with gdb crashing in a debug session on Android. So I use the attached readline.patch with good old printf instead. The results:

    (1) UTF-8 locale:
    command: LANG=en_US.UTF-8 python -m test -v -m test_nonascii test_readline
    stderr output:
    char *text: "[ïnserted]|tëxt[after]"
    (2) C locale
    command: python -m test -v -m test_nonascii test_readline
    stderr output:
    char *text: "[ïnsexrted]|t"
    (3) C locale and a change in 'script' to not use "\xEB" in 'macro':
    s/macro = "|t\xEB[after]"/macro = "|te[after]"/
    command: python -m test -v -m test_nonascii test_readline
    stderr output:
    char *text: "[ïnserted]|tex[after]"

    Case (2) confirms your comment in msg283544: "the Ctrl+A macro seems to have only inserted the two ASCII characters "|t" and has stopped at the non-ASCII character "\xEB".
    So readline with the C locale on Android:

    • Accepts eight-bits characters in rl_pre_input_hook().
    • Discards all the characters after the first eight-bits character in macro expansion.
    • Discards eight-bits characters on input.
      It seems reasonable to consider that readline is broken with the C locale on Android and to skip the test in that case whether using your last patch Martin, or when test.support.is_android is True.

    I don't know if this is relevant, but on archlinux (not on debian), when I run bash and therefore readline, with the C locale (on xterm or on the linux console, using my french keyboard) I have the following results:

    [xavier@bilboquet ~]$ LANG= bash
    [xavier@bilboquet ~]$ # first prompt
    bash: $'\303': command not found
    [xavier@bilboquet ~]$ A # second prompt
    bash: $'\303A\251': command not found
    [xavier@bilboquet # third prompt
    [xavier@bilboquet ~]$

    On the first prompt I type 'é' followed by the <backspace> key and the <return> key.
    On the second prompt I type 'é' followed by <left-arrow>, followed by 'A' and <return>.
    On the third prompt I type 4 times 'é' followed by 8 times <backspace> and <return>, note how the prompt has been eaten by the backspaces.

    @xdegaye xdegaye mannequin removed their assignment Jan 10, 2017
    @vadmium
    Copy link
    Member

    vadmium commented Jan 16, 2017

    So the problem seems to be that Python assumes Readline’s encoding is UTF-8, but Readline actually uses ASCII (depending on locale variables). The code at the start of the test is supposed to catch when add_history() calls PyUnicode_EncodeLocale() and fails.

    I don’t understand the details of UTF-8 vs locale on Android, but maybe we could adjust the encode() and decode() implementations in Modules/readline.c, to account for the Readline library’s idea of the locale encoding. Or maybe we could adjust the temporary setlocale() calls in Modules/readline.c.

    If you are happy to declare the Readline library is broken on Android, I now think I would prefer to skip the test based on support.is_android, rather than the previous patches. Otherwise, we risk masking genuine test failures on other platforms. Something like:

    @unittest.skipIf(is_android,
    "Gnu Readline disagrees about the locale encoding on Android")
    def test_nonascii(self):
    try:
    readline.add_history("\xEB\xEF")
    ...

    When you run “LANG= bash”, it is only Bash and Readline that gets the C locale; the terminal is unchanged. I presume the terminal inputs é as two UTF-8 bytes, but Readline with the C locale is not aware of UTF-8, and assumes the two bytes are two separate characters.

    @xdegaye
    Copy link
    Mannequin Author

    xdegaye mannequin commented Jan 18, 2017

    PEP-538 [1] coerces the C locale to UTF-8 by setting the locale environment variables (LC_ALL, LANG). The PEP has an implementation at bpo-28180 as pep538_coerce_legacy_c_locale_v3.diff, and the patch fixes test_nonascii when run on the Android emulators (api 21 and 24). This is as expected as now both Python and Readline are in agreement and use the same encoding.
    I think we should wait for the resolution of PEP-538 and of bpo-28684. If this does not happen, I agree we should skip the test with the message you are proposing Martin.

    [1] https://www.python.org/dev/peps/pep-0538

    @xdegaye
    Copy link
    Mannequin Author

    xdegaye mannequin commented Jan 18, 2017

    I think we should wait for the resolution of PEP-538 and of bpo-28684

    s/issue 28684/issue 28180/

    @xdegaye
    Copy link
    Mannequin Author

    xdegaye mannequin commented Nov 8, 2017

    PEP-538 is implemented now in the master branch.

    Closing this issue as out of date for the following reasons:

    a) PR 4334 adds locale coercion for Android to the implementation of PEP-538 (and test_nonascii succeeds with this PR).

    b) With the current implementation of PEP-538 and without PR 4334 (thus lacking locale coercion on Android), test_nonascii also does not fail on Android.

    Conclusion:
    Because of b) the reason that test_nonascii was failing is not a problem of environment variables since locale coercion is missing.
    The readline library does indeed look up the "LANG" and "LC_ALL" locale environment variables first and if one is set the library uses it to set LC_CTYPE [1]. But when none of these environment variables is found, then readline gets the locale by calling setlocale (LC_CTYPE, (char *)NULL). In the current implementation of PEP-538 test_nonascii succeeds now in b) because setlocale(LC_ALL, "C.UTF-8") is called now by Python upon starting up when __ANDROID__ is defined.

    [1] search for _rl_init_eightbit () and _rl_get_locale_var in git.savannah.gnu.org/cgit/readline.git/tree/nls.c

    @xdegaye xdegaye mannequin closed this as completed Nov 8, 2017
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 tests Tests in the Lib/test dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    1 participant