Faster utf-8 decoding #49118
Comments
Here is a patch to speed up utf8 decoding. On a 64-bit build, the maximum
The patch may look disturbingly trivial, and I haven't studied the
[side note: utf8 encoding is still much faster than decoding, but it may
The same principle can probably be applied to the other decoding
(*) the benchmark I used is: ./python -m timeit -s "import
More complex input also gets a speedup, albeit a smaller one (~10%): ./python -m timeit -s "import

Can you please upload it to Rietveld?
I'm skeptical about changes that merely rely on the compiler's register

As I said, I don't think it's due to register allocation, but simply
I've opened a Rietveld issue here: http://codereview.appspot.com/11681

Ha, the patch makes things slower on MSVC. The patch can probably be
(and interestingly, MSVC produces 40% faster code than gcc on my
On 2009-01-07 16:25, Antoine Pitrou wrote:
I'm +1 on anything that makes codecs faster :-)
However, the patch should be checked with some other compilers

Reopening and attaching a more ambitious patch, based on the
The worst case (tight interleaving of ASCII and non-ASCII chars) shows a
(performance measured with gcc and MSVC)
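
The description of the patch is cut off above, but later comments in this thread mention bit masks, a `long` holding several characters, and a variable named `word` (later `data`). That suggests a fast path which tests several bytes per machine word. The following is a minimal sketch of that general technique under those assumptions, not the committed code; `ascii_prefix_length` and `HIGH_BITS_MASK` are hypothetical names chosen for illustration.

```c
#include <stddef.h>
#include <string.h>

/* 0x80 repeated in every byte of a size_t.  A set bit under this mask
   means at least one byte in the word is non-ASCII.  (Hypothetical name.) */
#define HIGH_BITS_MASK (((size_t)-1 / 0xFF) * 0x80)

/* Return the length of the leading run of pure-ASCII bytes in s[0..n). */
size_t
ascii_prefix_length(const unsigned char *s, size_t n)
{
    size_t i = 0;

    /* Fast path: check sizeof(size_t) bytes with a single AND. */
    while (n - i >= sizeof(size_t)) {
        size_t data;
        memcpy(&data, s + i, sizeof data);  /* alignment-safe load */
        if (data & HIGH_BITS_MASK)
            break;                          /* a non-ASCII byte is in this word */
        i += sizeof(size_t);
    }
    /* Slow path: finish (or locate the offending byte) one byte at a time. */
    while (i < n && s[i] < 0x80)
        i++;
    return i;
}
```

With input that tightly interleaves ASCII and non-ASCII characters, nearly every word contains a byte with its high bit set, so the fast path rarely advances and the byte-by-byte loop does most of the work. That is consistent with this being the worst case mentioned above.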
Very nice! It seems that you can get slightly faster by not copying the
Does this idea apply to the encode function as well?
Thanks!
Probably, although with less efficiency (a long can hold 1, 2 or 4

Attached patch adds acceleration for latin1 and utf16 decoding as well.
All three codecs (utf8, utf16, latin1) are now in the same ballpark
(unpatched, it is between 150 and 500 MB/s, depending on the codec)

(PS: performance measured on UCS-2 and UCS-4 builds with gcc, and under
Antoine Pitrou wrote:
A few style comments:
* please use indented #pre-processor directives whenever possible, e.g.
#if
# define
#else
# define
#endif
Please also add a comment somewhere to the bit masks explaining what
Thanks,
Marc-Andre Lemburg
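
As an illustration of the two style requests above (indented pre-processor directives and a comment explaining the bit masks), here is a short sketch. The macro name `ASCII_CHAR_MASK` and the helper `word_is_ascii` are assumptions made for this example and are not taken from the patch under review.

```c
#include <limits.h>

/*
 * ASCII_CHAR_MASK has the top bit of every byte set (0x80 repeated).
 * ASCII bytes are always below 0x80, so if (data & ASCII_CHAR_MASK) is
 * zero, every byte packed into data is an ASCII character.
 */
#if ULONG_MAX == 0xFFFFFFFFUL
# define ASCII_CHAR_MASK 0x80808080UL
#elif ULONG_MAX == 0xFFFFFFFFFFFFFFFFUL
# define ASCII_CHAR_MASK 0x8080808080808080UL
#else
# error "unsigned long is expected to be 4 or 8 bytes wide"
#endif

/* Return 1 if every byte packed into data is ASCII, 0 otherwise. */
int
word_is_ascii(unsigned long data)
{
    return (data & ASCII_CHAR_MASK) == 0;
}
```

The nested `# define` lines make it easy to see which branch of the conditional each definition belongs to, and the comment states what the mask tests for rather than leaving the constant unexplained.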
Marc-Andre, this patch should address your comments.
Antoine Pitrou wrote:
Thanks. Much better!
BTW: I'd also change the variable name "word" to something

I committed the patch with the last suggested change (word -> data) in