Amazingly faster UTF-8 decoding #58943
I propose a complex patch that significantly speeds up UTF-8 decoding. The decoder is now faster than even the decoder in 3.2 (except in a few unrealistic pathological cases). The decoder code is also reduced and simplified (formerly the decoding code was repeated in at least three places). As a side effect, ASCII decoding is now faster on some platforms (bpo-14419). Related issues: Here are the benchmark results (numbers are speeds in MB/s). On 32-bit Linux, AMD Athlon 64 X2 4600+ @ 2.4GHz:
utf-8  'A'*10000   1199 (+69%)    1721 (+18%)   2032
ascii  'A'*10000    233 (+971%)   1876 (+33%)   2496
On 32-bit Linux, Intel Atom N570 @ 1.66GHz:
utf-8  'A'*10000    345 (+81%)     596 (+5%)     623
ascii  'A'*10000    132 (+499%)    758 (+4%)     791
64-bit Linux, Intel Core i5 2500K:
utf-8  'A'*10000   2550 (+198%)   6828 (+11%)   7607
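For anyone who wants a rough feel for numbers like these without the benchmark tools, here is a minimal sketch of measuring decode throughput from Python. It is not the script used for the tables above: the function name `decode_speed_mb_s`, the repeat counts, and the choice of 1 MB = 10**6 bytes are all assumptions made for illustration.

```python
import timeit


def decode_speed_mb_s(data: bytes, encoding: str,
                      repeat: int = 5, number: int = 1000) -> float:
    """Return the best observed decoding throughput in MB/s.

    Hypothetical helper: assumes 1 MB = 10**6 bytes and takes the
    fastest of `repeat` timing runs to reduce noise.
    """
    timings = timeit.repeat(lambda: data.decode(encoding),
                            repeat=repeat, number=number)
    best = min(timings)  # seconds needed for `number` decodes
    return len(data) * number / best / 1e6


if __name__ == "__main__":
    sample = b"A" * 10000  # same shape as the 'A'*10000 rows above
    for enc in ("utf-8", "ascii"):
        speed = decode_speed_mb_s(sample, enc)
        print(f"{enc:6} 'A'*10000 {speed:8.0f} MB/s")
```

Absolute values depend heavily on the machine and the build, so only comparisons between interpreters on the same machine are meaningful.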
Thank you, Antoine. Finally, Intel Core is defeated! If someone wants to repeat the tests, see the benchmark tools in bpo-14624.
The patch has been updated in accordance with Antoine's cosmetic comments.
There's a Mac-specific portion in the patch; it would be nice if someone could check that it works.
It would be good if someone checked on a Mac that command-line arguments work, including invalid UTF-8. The difficulty is that you need to check on Macs with both 16-bit and 32-bit wchar_t.
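As a starting point for such a check, the sketch below reports the wchar_t width seen by the running interpreter and round-trips invalid UTF-8 bytes through the `surrogateescape` error handler (PEP 383), which is the handler Python 3 applies to command-line arguments. It only exercises the codec from Python code, not the Mac-specific C path in the patch, and the helper names `wchar_bits` and `roundtrip_surrogateescape` are made up for this example.

```python
import ctypes


def wchar_bits() -> int:
    """Width of the platform's wchar_t in bits, as reported by ctypes."""
    return ctypes.sizeof(ctypes.c_wchar) * 8


def roundtrip_surrogateescape(raw: bytes) -> bytes:
    """Decode bytes as UTF-8, escaping undecodable bytes, then re-encode.

    With surrogateescape each invalid byte 0xNN becomes the lone
    surrogate U+DCNN, so encoding again should restore the input.
    """
    text = raw.decode("utf-8", "surrogateescape")
    return text.encode("utf-8", "surrogateescape")


if __name__ == "__main__":
    print("wchar_t is", wchar_bits(), "bits")
    for raw in (b"abc", b"abc\xff\xfe", b"\xe2\x82", b"\xed\xb3\xbf"):
        assert roundtrip_surrogateescape(raw) == raw
    print("surrogateescape round-trips OK")
```

Running it on builds with different wchar_t widths would cover both configurations mentioned above.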
bpo-4388 is related to this Mac-specific portion of the patch.
Actually, it should be enough to run the test suite, since we should |
I hacked the code (commented out "#if __APPLE__" in |
I just ran the test suite ("python -m test") on OS X 10.6.8 with 'decode_utf8_5.patch' applied (64-bit --with-pydebug build of Python). No test failures.
Test header:
== CPython 3.3.0a3+ (default:840cb46d0395+, May 9 2012, 20:55:18) [GCC 4.2.1 (Apple Inc. build 5664)]
Fragment of configure output relevant to wchar:
checking wchar.h usability... yes
I don't think that the size of wchar_t is configurable: it should always be 32 bits on Mac OS X.
New changeset e08c3791f035 by Antoine Pitrou in branch 'default': |
The patch is now committed. Well done and thanks for your contribution.
Thanks, Martin, for the review, which allowed me to make a quality patch, and for encouraging further research. Thanks, Antoine, for the review, benchmarks, commit, and the original optimization, which served as the basis for my patch.
If the commit makes Python 3.3 faster than Python 3.2, it is an |