-
-
Notifications
You must be signed in to change notification settings - Fork 31.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[patch] improve unicode methods: split() rsplit() and replace() #51871
Comments
Content of the patch:
Benchmark coming soon... |
The "split.h" file is missing from your patch. |
You're right. Oups. |
The patch looks wrong for bytearrays. They are mutable, so you shouldn't return the original object as an optimization. Here is the current (unpatched) behaviour: >>> a = bytearray(b"abc")
>>> b, = a.split()
>>> b is a
False On the other hand, you aren't doing this optimization at all in the general case of stringlib_split() and stringlib_rsplit(), while it could be done. |
Mutable methods split() splitlines() and partition() fixed. |
added "Makefile.pre.in". |
There's some reference leaking somewhere... ~ $ ./python Lib/test/regrtest.py -R 2:3: test_unicode |
Refleak fixed in PyUnicode_Splitlines. |
A few comments on coding style:
count = countstring(self_s, self_len,
from_s, from_len,
0, self_len, FORWARD, maxcount); or /* helper macro to fixup start/end slice values */
+#define ADJUST_INDICES(start, end, len) \ and use similar formatting for the replacement function
#define LONG_BITMASK (LONG_BIT-1)
#define BLOOM(mask, ch) ((mask & (1 << ((ch) & LONG_BITMASK)))) LONG_BITMASK has a value of 0x1f (31) - that's a single byte, not When adjusting the value to be platform dependent, please check Note that you don't need to expose that value separately if
Py_ssize_t i, j, count=0;
PyObject *list = PyList_New(PREALLOC_SIZE(maxcount)), *sub; instead, write: Py_ssize_t i, j;
Py_ssize_t count=0;
PyObject *list = PyList_New(PREALLOC_SIZE(maxcount))
PyObject *sub;
+stringlib_split(
+stringlib_splitlines( instead use this style: -static
-{ |
Thank you for your remarks. I will update the patch accordingly.
Since the same value is used to build the mask, I assume it's better to keep the value around (or use (LONG_BIT-1) directly?). s/LONG_BITMASK/BLOOM_BITMASK/ is not confusing? |
I copied the style of "stringlib/partition.h" for this part. |
No, it's ok for stringlib to have its own consistent style and there's no reason to change it IMO. More interesting would be benchmark results showing how much this improves the various methods :-) |
Patch updated:
|
And now, the figures. There's no gain for the string methods. Most significant results: --- bench_slow.log Trunk string unicode ========== late match, 100 characters
-13.30 20.51 s="ABC"*33; ("E"+s+("D"+s)*500).rsplit("E"+s, 1) (*100)
-16.12 29.88 s="ABC"*33; ((s+"D")*500+s+"E").split(s+"E", 1) (*100)
+13.27 14.38 s="ABC"*33; ("E"+s+("D"+s)*500).rsplit("E"+s, 1) (*100)
+16.19 17.61 s="ABC"*33; ((s+"D")*500+s+"E").split(s+"E", 1) (*100) ========== quick replace multiple character match ========== quick replace single character match (full benchmark diff is attached) And we save 1000 lines of code cumulated |
I don't think you should remove such blocks: -/*
There probably are people relying on them :-) |
Florent Xicluna wrote:
I'd prefer if you change the coding style to what we use elsewhere See http://www.python.org/dev/peps/pep-0007/ for more C coding |
Eric Smith wrote:
For any new files added, PEP-7 should always be used. For PEP-7-ifying the existing code, we could open a separate ticket or just apply the change as separate patch. |
Fixed a problem with the splitlines optimization: use PyList_Append instead of PyList_SET_ITEM because there's no preallocated list |
The test case for the previous issue. |
This looks generally good. Can you produce a separate patch for py3k? stringobject.c has been replaced with bytesobject.c there. |
Slight update:
|
And the Py3k patch. (note: previous update v4b -> v4c minimize the differences |
Committed in r77461 (trunk), r77462 (py3k). Thank you very much! |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: