PyUnicode_AsEncodedString, PyUnicode_Decode: add fast-path for "us-ascii" encoding #72125
The "us-ascii" encoding is an alias of the Python ASCII encoding. The PyUnicode_AsEncodedString() and PyUnicode_Decode() functions have a fast path for the "ascii" string, but not for "us-ascii". The attached patch also uses the fast path for "us-ascii". It's a more generic change than issue bpo-27915. The "us-ascii" name is common in the email and xml.etree modules. The patch also makes other changes.
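From Python code, the alias relationship can be checked with the standard `codecs` and `encodings.aliases` modules. The minimal sketch below shows that both names resolve to the same codec and behave identically. A C fast path for "us-ascii" should therefore only skip the registry lookup and should not change results.

```python
import codecs
from encodings.aliases import aliases

# The codec registry resolves "us-ascii" to the same codec as "ascii".
assert codecs.lookup("us-ascii").name == "ascii"
# The alias table is keyed by normalized names ("-" replaced by "_").
assert aliases["us_ascii"] == "ascii"

text = "plain ASCII text"
assert text.encode("us-ascii") == text.encode("ascii")
assert text.encode("ascii").decode("us-ascii") == text

# Error handling is the same too: both names reject non-ASCII characters.
for name in ("ascii", "us-ascii"):
    try:
        "t\xeb".encode(name)
    except UnicodeEncodeError as exc:
        print(name, "->", exc.reason)
```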
See also get_standard_encoding() in Python/codecs.c. I suppose it is faster. UTF-32 is rarely used as an external encoding, but it is still used as an internal encoding in some programming languages and libraries (e.g. wchar_t* in C and std::wstring in C++ on Linux). The codec itself is very fast. I would add a fast path for all UTF encodings (except UTF-7).
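To illustrate the wchar_t point, here is a sketch that uses `ctypes` to dump the raw bytes of a `wchar_t` buffer and compare them with Python's codec output. It assumes `wchar_t` is either 4 bytes (UTF-32, as on Linux) or 2 bytes (UTF-16, as on Windows). It uses only BMP characters, so the comparison holds in both cases.

```python
import ctypes
import sys

def wchar_codec():
    """Return the codec matching this platform's wchar_t layout (assumption: 2 or 4 bytes)."""
    suffix = "le" if sys.byteorder == "little" else "be"
    size = ctypes.sizeof(ctypes.c_wchar)
    if size == 4:
        return "utf-32-" + suffix  # e.g. Linux
    if size == 2:
        return "utf-16-" + suffix  # e.g. Windows
    raise RuntimeError("unexpected wchar_t size: %d" % size)

text = "t\xeb\u20ac"                       # BMP-only sample text
buf = ctypes.create_unicode_buffer(text)  # wchar_t array plus trailing NUL
raw = ctypes.string_at(ctypes.addressof(buf), ctypes.sizeof(buf))

# The in-memory wchar_t string equals the codec output for the text plus
# its NUL terminator, in native byte order and without a BOM.
assert raw == (text + "\0").encode(wchar_codec())
print(wchar_codec(), raw.hex())
```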
New changeset 99818330b4c0 by Victor Stinner in branch 'default':
I understand that PyCodec_SurrogatePassErrors() is already called with a normalized encoding name. With my enhanced _Py_normalize_encoding(), unusual spellings like " utf 8 " also take the fast path.
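The normalization can be modelled in Python. This is a hypothetical re-implementation based on my understanding of `_Py_normalize_encoding()` after the change, not the C code itself. ASCII letters, digits and dots are kept and lowercased. Any run of other characters becomes a single underscore, but only between two kept characters, so leading and trailing spaces disappear.

```python
import string

# Characters the model keeps: ASCII letters, digits and ".".
_KEEP = frozenset(string.ascii_letters + string.digits + ".")

def normalize_encoding(name):
    """Hypothetical Python model of the C helper _Py_normalize_encoding().

    Kept characters are lowercased; a run of other characters becomes one
    "_", emitted only when another kept character follows and something
    has already been written, so leading/trailing junk is dropped.
    """
    out = []
    pending_sep = False
    for ch in name:
        if ch in _KEEP:
            if pending_sep and out:
                out.append("_")
            pending_sep = False
            out.append(ch.lower())
        else:
            pending_sep = True
    return "".join(out)

for spelling in (" utf 8 ", "UTF-8", "utf8", "US-ASCII", "Latin-1"):
    print(repr(spelling), "->", normalize_encoding(spelling))
```

Under this model, both " utf 8 " and "UTF-8" become "utf_8", and "US-ASCII" becomes "us_ascii". That is why such spellings can reach a fast path that compares against a few fixed names.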
OK, I used the same design as get_standard_encoding() to match the "utf" prefix, so adding a fast path for UTF-16 and UTF-32 doesn't add new strcmp() calls for "latin9". I pushed my change, so I'm closing the issue.
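The "utf" prefix design can be sketched as a hypothetical Python model of the dispatch in `PyUnicode_Decode()`. The real code is C in Objects/unicodeobject.c. This model takes an already-normalized name and returns a label for the specialized codec, or `None` when the generic codec lookup would be used. The labels, and the exact set of Latin-1 spellings, are my assumptions.

```python
def pick_fast_path(lower):
    """Hypothetical model: pick a fast path for an already-normalized name.

    Returns a label for the specialized codec, or None to fall back to
    the generic codec registry lookup.
    """
    if lower.startswith("utf"):
        # Only names starting with "utf" pay for the UTF-8/16/32 checks.
        rest = lower[3:]
        if rest.startswith("_"):
            rest = rest[1:]  # accept both "utf8" and "utf_8"
        # "utf_7", "utf_16_le", ... are not listed, so they fall back.
        return {"8": "utf-8", "16": "utf-16", "32": "utf-32"}.get(rest)
    # Non-UTF names such as "latin9" never reach the UTF comparisons.
    if lower in ("ascii", "us_ascii"):
        return "ascii"
    if lower in ("latin1", "latin_1", "iso_8859_1", "iso8859_1"):
        return "latin-1"
    return None

for name in ("utf_8", "utf16", "utf_32", "utf_7", "us_ascii", "latin9"):
    print(name, "->", pick_fast_path(name))
```

Branching once on the prefix keeps the cost for non-UTF names unchanged, which is the point of reusing the get_standard_encoding() approach.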
It seems this change is the cause of the FreeBSD buildbot failures. From memory, both failing cases involve sending or receiving non-ASCII bytes in child Python processes.

```
======================================================================
Traceback (most recent call last):
  File "/usr/home/buildbot/python/3.x.koobs-freebsd-current.nondebug/build/Lib/test/test_cmd_line_script.py", line 517, in test_non_ascii
    rc, stdout, stderr = assert_python_ok(script_name)
  File "/usr/home/buildbot/python/3.x.koobs-freebsd-current.nondebug/build/Lib/test/support/script_helper.py", line 139, in assert_python_ok
    return _assert_python(True, *args, **env_vars)
  File "/usr/home/buildbot/python/3.x.koobs-freebsd-current.nondebug/build/Lib/test/support/script_helper.py", line 125, in _assert_python
    err))
AssertionError: Process return code is 1
command line: ['/usr/home/buildbot/python/3.x.koobs-freebsd-current.nondebug/build/python', '-X', 'faulthandler', '-I', './@test_60885_tmp\udce7w\udcf0.py']
stdout:
---
stderr:
======================================================================
Traceback (most recent call last):
  File "/usr/home/buildbot/python/3.x.koobs-freebsd-current.nondebug/build/Lib/test/test_readline.py", line 203, in test_nonascii
    self.assertIn(b"text 't\\xeb'\r\n", output)
AssertionError: b"text 't\\xeb'\r\n" not found in bytearray(b"^A^B^B^B^B^B^B^B\t\tx\t\r\n[\x07\r\x07\x07\x07\x07\x07\x07\x07\x07x[\x08\x07\r\nresult \'x[\'\r\nhistory \'x[\'\r\n")
```
Reopening and assigning as a regression. Observed on all koobs-freebsd* buildbots (9/10/11) and all build types. The issue is in the default branch (adding version 3.7). First failing test run: http://buildbot.python.org/all/builders/AMD64%20FreeBSD%20CURRENT%20Non-Debug%203.x/builds/110
Koobs, if you can, it would be good to understand where the failure is. My guess is that Python fails when running a script with a non-ASCII filename. The following is hopefully a simplified version of the test_cmd_line_script test case:

```python
import os
import subprocess
import sys

# File name containing raw non-ASCII bytes (0xE7 and 0xF0)
script_name = os.fsdecode(b'./\xE7w\xF0.py')
with open(script_name, 'w', encoding='utf-8') as script_file:
    script_file.write('print(ascii(__file__))\n')

cmd_line = [sys.executable, '-X', 'faulthandler', '-I', script_name]
env = os.environ.copy()
env['TERM'] = ''
proc = subprocess.Popen(cmd_line, stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                        env=env)
out, err = proc.communicate()
print(proc.returncode)  # Should be 0, but is 1 on FreeBSD
print(repr(err))  # The error is about encoding 0xE7 with ASCII
print(repr(out))  # If the script ran, this would be its file name
```

Hopefully fixing the above problem will help with the test_readline failure. The readline test case performs Readline tab completion on non-ASCII text, and it seems that the Python completion routine is no longer being called.
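The `\udce7w\udcf0` in the failing command line is how Python represents filename bytes that cannot be decoded. The sketch below shows that round trip. It calls `bytes.decode(..., "surrogateescape")` explicitly so the result does not depend on the locale; `os.fsdecode()` instead uses the filesystem encoding, which this sketch assumes to be UTF-8.

```python
raw_name = b"./\xe7w\xf0.py"

# Bytes that are not valid UTF-8 become lone surrogates U+DC80..U+DCFF.
name = raw_name.decode("utf-8", "surrogateescape")
print(ascii(name))

# The same error handler turns the surrogates back into the original
# bytes, even when encoding with the ASCII codec.
assert name.encode("ascii", "surrogateescape") == raw_name

# A strict ASCII encoder cannot represent the surrogates at all.
try:
    name.encode("ascii")
except UnicodeEncodeError as exc:
    print("strict ascii failed at index", exc.start)
```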
Sorry, but I don't have enough information to fix the issue. I don't see how my change could break the two failing tests. Could you please try to collect more information manually?
Maybe the Windows buildbot failures are related:

```
======================================================================
Traceback (most recent call last):
  File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\test_io.py", line 3174, in test_create_at_shutdown_without_encoding
    self.assertIn(self.shutdown_error, err.decode())
AssertionError: 'LookupError: unknown encoding: ascii' not found in 'Exception ignored in: <bound method C.__del__ of <__main__.C object at 0x000000000123BF60>>\r\nTraceback (most recent call last):\r\n File "<string>", line 12, in __del__\r\n File "C:\\buildbot.python.org\\3.x.kloth-win64\\build\\lib\\_pyio.py", line 1934, in __init__\r\n File "C:\\buildbot.python.org\\3.x.kloth-win64\\build\\lib\\encodings\\__init__.py", line 158, in _alias_mbcs\r\nImportError: sys.meta_path is None, Python is likely shutting down'
```
The Windows buildbot failures are partly my fault and partly Ben's fault (I created a new error message; Ben added it to the wrong test), so I'll go and prevent that error message from appearing. No idea about the other issue. It doesn't reproduce for me, but since it seems to be related to FreeBSD's readline, that isn't a surprise.
Steve fixed it: the new search function now catches ImportError as expected.
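The pattern described here is a codec search function that treats an ImportError as "encoding not found". It can be sketched with `codecs.register()`. The codec name `hypothetical_alias` and the helper `_hypothetical_load_codec()` are invented for illustration. The helper always raises, mimicking the `sys.meta_path is None` error seen during shutdown. Returning `None` lets `codecs.lookup()` report a plain `LookupError` instead of leaking the ImportError.

```python
import codecs

def _hypothetical_load_codec():
    # Hypothetical stand-in for an import that fails late in shutdown.
    raise ImportError("sys.meta_path is None, Python is likely shutting down")

def guarded_search(name):
    """Codec search function: return a CodecInfo, or None if unknown."""
    if name != "hypothetical_alias":
        return None
    try:
        return _hypothetical_load_codec()
    except ImportError:
        # Treat "cannot import the codec" the same as "no such codec".
        return None

codecs.register(guarded_search)

try:
    codecs.lookup("hypothetical_alias")
except LookupError as exc:
    print("LookupError:", exc)
```

If `guarded_search()` let the ImportError escape, `codecs.lookup()` would propagate it to the caller. This matches the shape of the Windows traceback above.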
New changeset 3b185df3a3e2 by Victor Stinner in branch 'default':
@koobs: That's my tiny gift for your birthday. Happy Birthday! ;-) (It should fix the FreeBSD buildbots.)
Sorry for the little breakage of the FreeBSD buildbots; it seems to be OK now ;-)
@victor I was just checking this issue to copy the test command so I could send results to you both, when I saw the lovely surprise. Thank you :)