Faster UTF-16 encoding #59231
As a companion to bpo-14624, here is a patch that speeds up UTF-16 encoding several times over. In addition, it fixes an unsafe integer overflow check.

Here are the benchmark results. See the benchmark tools in the https://bitbucket.org/storchaka/cpython-stuff repository.

On 32-bit Linux, AMD Athlon 64 X2 4600+ @ 2.4GHz:

Py2.7          Py3.2          Py3.3          patched
457 (+575%)    458 (+573%)    1077 (+186%)   3083     encode utf-16le 'A'*10000
447 (+507%)    493 (+450%)    1086 (+150%)   2712     encode utf-16be 'A'*10000
Here are results under 64-bit Linux on a Core i5-2500K:

3.3            patched
3327 (+360%)   15304    encode utf-16le 'A'*10000
3237 (+562%)   21422    encode utf-16be 'A'*10000
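The opening comment mentions fixing an unsafe integer overflow check but does not show it. As a generic illustration only (not the code from the patch), here is a minimal sketch of one overflow-safe way to size a UTF-16 output buffer. The helper name `utf16_buffer_size` and its parameters are hypothetical. The idea is to compare against the limit by subtraction, so that no intermediate sum or product can wrap around before it is checked.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only.
   Computes the byte size of a UTF-16 buffer for `len` code points,
   `pairs` of which need a surrogate pair, plus an optional BOM.
   Returns 0 and stores the size in *out, or returns -1 if the size
   would exceed `max`. */
static int
utf16_buffer_size(size_t len, size_t pairs, int with_bom,
                  size_t max, size_t *out)
{
    size_t bom = with_bom ? 1 : 0;
    size_t limit = max / 2;   /* largest allowed count of 16-bit units */

    /* Each test subtracts from the limit instead of adding to the
       operands, so none of the intermediate values can overflow. */
    if (bom > limit || pairs > limit - bom || len > limit - bom - pairs)
        return -1;

    *out = 2 * (len + pairs + bom);
    return 0;
}
```

An unsafe variant would compute `2 * (len + pairs + bom)` first and test the result afterwards, by which point the value may already have wrapped.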
Thank you, Antoine.
It must be a fluctuation (-30-40%). For all UCS1 strings the same code
This is most likely the fluctuation too. Code for non-BMP characters is

On 32-bit Linux, Intel Atom N570 @ 1.66GHz:

Py2.7          Py3.2          Py3.3          patched
273 (+229%)    274 (+227%)    333 (+169%)    897      encode utf-16le 'A'*10000
274 (+152%)    275 (+151%)    334 (+107%)    690      encode utf-16be 'A'*10000
Serhiy, the tests crash here in debug mode:

$ ./python -m test -v test_unicode
== CPython 3.3.0a4+ (default:b17c8005e08a+, Jun 15 2012, 19:28:56) [GCC 4.5.2]
== Linux-2.6.38.8-desktop-10.mga-x86_64-with-mandrake-1-Official little-endian
== /home/antoine/cpython/default/build/test_python_2567
Testing with flags: sys.flags(debug=0, inspect=0, interactive=0, optimize=0, dont_write_bytecode=0, no_user_site=0, no_site=0, ignore_environment=0, verbose=0, bytes_warning=0, quiet=0, hash_randomization=1)
[1/1] test_unicode
test_formatter_field_name_split (test.test_unicode.StringModuleTest) ... ok
test_formatter_parser (test.test_unicode.StringModuleTest) ... ok
test___contains__ (test.test_unicode.UnicodeTest) ... ok
test_additional_rsplit (test.test_unicode.UnicodeTest) ... ok
test_additional_split (test.test_unicode.UnicodeTest) ... ok
test_ascii (test.test_unicode.UnicodeTest) ... ok
test_aswidechar (test.test_unicode.UnicodeTest) ... ok
test_aswidecharstring (test.test_unicode.UnicodeTest) ... ok
test_bug1001011 (test.test_unicode.UnicodeTest) ... ok
test_bytes_comparison (test.test_unicode.UnicodeTest) ... ok
test_capitalize (test.test_unicode.UnicodeTest) ... ok
test_casefold (test.test_unicode.UnicodeTest) ... ok
test_center (test.test_unicode.UnicodeTest) ... ok
test_codecs (test.test_unicode.UnicodeTest) ...
python: Objects/unicodeobject.c:5401: _PyUnicode_EncodeUTF16: Assertion `(Py_uintptr_t)(((((((((PyObject*)(v))->ob_type))->tp_flags & ((1L<<27))) != 0)) ? (void) (0) : __assert_fail ("((((((PyObject*)(v))->ob_type))->tp_flags & ((1L<<27))) != 0)", "Objects/unicodeobject.c", 5401, __PRETTY_FUNCTION__)), (((PyBytesObject *)(v))->ob_sval)) & 1 == 0' failed.
Fatal Python error: Aborted
Current thread 0x00007faa4980e700:
My fault. It's an operator precedence issue in the assert expression. Gcc
Objects/unicodeobject.c: In function ‘_PyUnicode_EncodeUTF16’:
Here is a fixed patch.
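To illustrate the precedence issue: the failed assertion above ends in `... & 1 == 0`. In C, `==` binds more tightly than `&`, so that expression is parsed as `x & (1 == 0)`, which is `x & 0` and therefore always false. The sketch below uses hypothetical helper names, not code from the patch, and writes out both the parse the compiler actually uses and the intended alignment test.

```c
#include <stdint.h>

/* How the compiler parses `(uintptr_t)p & 1 == 0`:
   `1 == 0` is evaluated first and yields 0, so the whole
   expression is `p & 0`, which is never true. */
static int
as_parsed_without_parens(const void *p)
{
    return ((uintptr_t)p & (1 == 0)) != 0;
}

/* Intended check: the address is 2-byte aligned when its
   lowest bit is clear. The parentheses force `&` to run first. */
static int
is_2byte_aligned(const void *p)
{
    return ((uintptr_t)p & 1) == 0;
}
```

With an aligned buffer, `is_2byte_aligned` returns 1, while `as_parsed_without_parens` returns 0 for every pointer, which matches an assertion that fails even on valid input.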
New changeset acca141fda80 by Antoine Pitrou in branch 'default':
Thank you for the quick turnaround! The patch is now pushed in 3.3.
It would be nice to mention the improvement in the What's New in Python 3.3 doc (Optimizations section).
New changeset 35667fc5f785 by Antoine Pitrou in branch 'default':
Thank you for pushing. :-) Are you interested in a faster UTF-32 codec?
Not much :) I know you posted issues on that, but I think UTF-32 is |