Faster utf-16 decoder #58829
I propose a patch that speeds up the UTF-16 decoder. With PEP-393 the UTF-16 decoder became 3-4x slower; this patch brings performance back to the level of Python 3.2 and even beyond it (+10-30% over 3.2). It also fixes a few bugs in the UTF-16 decoder. As a side effect, other decoders may become faster as well.
See also bpo-14625 for the UTF-32 decoder.
See also issue bpo-14579 for UTF-16 decoder bugs.
Serhiy: can you please submit a contributor form?
Here are the results of benchmarking (numbers in MB/s).
On 32-bit Linux, AMD Athlon 64 X2 4600+ @ 2.4GHz:
utf-16le 'A'*10000    504 (+282%)   1905 (+1%)    565 (+241%)   1927
utf-16be 'A'*10000    504 (+284%)   1553 (+24%)   469 (+312%)   1933
On 32-bit Linux, Intel Atom N570 @ 1.66GHz:
utf-16le 'A'*10000    136 (+417%)    584 (+20%)   184 (+282%)    703
utf-16be 'A'*10000    136 (+331%)    441 (+33%)   166 (+253%)    586
On 64-bit Linux, Intel Core i5-2500K @ 3.30GHz:
utf-16le 'A'*10000   1384 (+278%)   5233
utf-16be 'A'*10000   1268 (+313%)   5240
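For anyone who wants to reproduce numbers of this kind, here is a minimal sketch of a throughput measurement. It is not the benchmark script used above: the helper names `decode_throughput` and `speedup_percent` are made up for illustration, and it assumes that MB/s means 10^6 encoded bytes per second, which the thread does not state. The percentages in the tables appear to be the gain of the last column over each earlier column (for example 1927/504 is about 3.82, i.e. +282%), and the second helper reproduces that arithmetic.

```python
import timeit


def decode_throughput(encoding, text, repeat=5, number=1000):
    """Return decoding throughput in MB/s (assumed: 10**6 encoded bytes per second)."""
    data = text.encode(encoding)
    # Take the best of several runs to reduce noise from other processes.
    best = min(timeit.repeat(lambda: data.decode(encoding),
                             repeat=repeat, number=number))
    return len(data) * number / best / 1e6


def speedup_percent(baseline, patched):
    """Return the gain of `patched` over `baseline` as a rounded percentage."""
    return round((patched / baseline - 1) * 100)


if __name__ == "__main__":
    for enc in ("utf-16le", "utf-16be"):
        print(enc, "'A'*10000", round(decode_throughput(enc, "A" * 10000)))
    print(speedup_percent(504, 1927))
```

Absolute figures depend heavily on the machine and the interpreter build, so only ratios between builds measured on the same machine are meaningful.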
New changeset 830eeff4fe8f by Victor Stinner in branch 'default':
Here is an updated patch, taking into account that unicode_widen is already
The patch has been updated to match the style of the UTF-8 decoder. Decoding of UCS2 non-surrogate characters is also slightly faster (+15%).
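To illustrate the distinction between non-surrogate code units and surrogate pairs mentioned here, the following is a pure-Python sketch of UTF-16-LE decoding. It is not the C code from the patch; the function name `decode_utf16le` and the error messages are hypothetical. It only shows why the non-surrogate case is the cheap path: the code unit is the code point, while a surrogate pair needs a second read and some arithmetic.

```python
def decode_utf16le(data: bytes) -> str:
    """Decode UTF-16-LE bytes, raising ValueError on malformed input."""
    if len(data) % 2:
        raise ValueError("truncated data")
    out = []
    i, n = 0, len(data)
    while i < n:
        unit = data[i] | (data[i + 1] << 8)
        i += 2
        # Fast path: a non-surrogate code unit is the code point itself.
        if unit < 0xD800 or unit > 0xDFFF:
            out.append(chr(unit))
            continue
        # Slow path: a high surrogate must be followed by a low surrogate.
        if unit >= 0xDC00:
            raise ValueError("unexpected low surrogate")
        if i >= n:
            raise ValueError("truncated surrogate pair")
        low = data[i] | (data[i + 1] << 8)
        i += 2
        if not 0xDC00 <= low <= 0xDFFF:
            raise ValueError("illegal UTF-16 surrogate")
        out.append(chr(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00)))
    return "".join(out)
```

A quick check against the built-in codec is `decode_utf16le(s.encode("utf-16le")) == s` for any string `s` without lone surrogates.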
New performance figures on 64-bit Linux, Intel Core i5-2500K @ 3.30GHz:
utf-16le 'A'*10000   1411 (+290%)   5504
utf-16be 'A'*10000   1341 (+298%)   5342
The patch has been updated with slightly clarified code and added comments.
Here are two new patches. The check for out-of-range characters has been moved,
New changeset cdcc816dea85 by Antoine Pitrou in branch 'default':
Thank you Serhiy! Now committed.
Thank you, Antoine. Now only bpo-14625 is waiting for review.
In fact, UTF-16 decoding is now at most 25% faster than in Python 3.2 on my computers (and sometimes still a little slower). It is 2x to 4x faster than the previous Python 3.3, which had been slowed down by PEP-393.