TODO in tailmatch(): it does not support backward in all cases #60485
Comments
Oh oh, it looks like the implementation of tailmatch() was not finished:
    /* If both are of the same kind, memcmp is sufficient */
    if (kind_self == kind_sub) {
        return ...;
    }
    /* otherwise we have to compare each character by first accessing it */
    else {
        /* We do not need to compare 0 and len(substring)-1 because
           the if statement above ensured already that they are equal
           when we end up here. */
        /* TODO: honor direction and do a forward or backwards search */
        for (i = 1; i < end_sub; ++i) {
            if (PyUnicode_READ(kind_self, data_self, offset + i) !=
                PyUnicode_READ(kind_sub, data_sub, i))
                return 0;
        }
        return 1;
    }
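For readers without the CPython tree at hand, here is a minimal standalone sketch of the mixed-kind comparison in the snippet above. It is not the CPython code: `read_char()` is a hypothetical stand-in for PyUnicode_READ(), `match_at()` is an invented name, and the `step` parameter is only there to show that scanning forward or backward visits the same positions and so yields the same answer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for PyUnicode_READ(): read character `i` from a
   buffer whose elements are `kind` bytes wide (1, 2 or 4). */
static uint32_t read_char(int kind, const void *data, size_t i)
{
    assert(kind == 1 || kind == 2 || kind == 4);
    if (kind == 1)
        return ((const uint8_t *)data)[i];
    if (kind == 2)
        return ((const uint16_t *)data)[i];
    return ((const uint32_t *)data)[i];
}

/* Compare sub[0..len) with self[offset..offset+len).
   step > 0 scans forward, anything else scans backward. */
static int match_at(int kind_self, const void *self, size_t offset,
                    int kind_sub, const void *sub, size_t len, int step)
{
    for (size_t n = 0; n < len; n++) {
        size_t i = (step > 0) ? n : len - 1 - n;
        if (read_char(kind_self, self, offset + i) !=
            read_char(kind_sub, sub, i))
            return 0;
    }
    return 1;
}
```

Both directions inspect every index in [0, len) in the worst case, so the scan order can only change how early a mismatch is found, not the return value.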
The result does not depend on the direction of comparison; the direction only affects speed. But who can say in which direction the comparison will be faster? Here I see one obvious opportunity for optimization:
After that, and after handling the case (kind_self == kind_sub), only 3 special cases are left: UCS1 in UCS2, UCS1 in UCS4, and UCS2 in UCS4. Getting rid of the slow PyUnicode_READ() for these cases will speed up the code. Also note that comparing the first and last characters before memcmp can be a slowdown (because PyUnicode_READ() is slow). Try comparing the first and last bytes instead.
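To illustrate what a specialised loop for one of those three cases might look like, here is a rough sketch for UCS1 in UCS2. The function name and signature are made up for this example and do not exist in CPython; the point is only that with both element types known at compile time, the per-character kind dispatch disappears.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: does the 1-byte-per-character string `sub` (length
   `len`) occur in the 2-byte-per-character string `self` at `offset`?
   The caller must guarantee that offset + len does not exceed the
   length of `self`. */
static int match_ucs1_in_ucs2(const uint16_t *self, size_t offset,
                              const uint8_t *sub, size_t len)
{
    assert(self != NULL && sub != NULL);
    const uint16_t *p = self + offset;
    for (size_t i = 0; i < len; i++) {
        /* Both operands are promoted to int, so the widths may differ. */
        if (p[i] != sub[i])
            return 0;
    }
    return 1;
}
```

The UCS1 in UCS4 and UCS2 in UCS4 cases would follow the same pattern with different pointer types. Whether this is measurably faster would need to be benchmarked.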
Oh, PyUnicode_Tailmatch() documentation doesn't mention that the function can fail.
But it does.
.. c:function:: int PyUnicode_Tailmatch(PyObject *str, PyObject *substr, \
Return 1 if substr matches
Oh, I read the "documentation" in unicodeobject.h:
/* Return 1 if substr matches str[start:end] at the given tail end, 0
The problem is that tailmatch() returns 0 if PyUnicode_READY() failed.
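A small sketch of why that matters, using a toy string type rather than real PyObject strings. Everything here is hypothetical: `struct str`, its `ready` flag (which models the PyUnicode_READY() step failing), and `suffix_match()`. It assumes -1 as the error value so that the caller can tell "no match" apart from "could not compare".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical string type; `ready` is 0 when preparing the string failed. */
struct str {
    const char *data;
    size_t len;
    int ready;
};

/* Return 1 on a suffix match, 0 on no match, and -1 if either string
   could not be prepared. */
static int suffix_match(const struct str *s, const struct str *sub)
{
    assert(s != NULL && sub != NULL);
    if (!s->ready || !sub->ready)
        return -1;              /* error, distinct from "no match" */
    if (sub->len > s->len)
        return 0;
    return memcmp(s->data + (s->len - sub->len), sub->data, sub->len) == 0;
}
```

If the failure case returned 0 instead, a caller would silently treat an error as a mismatch.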
New changeset 49eb2488145d by Victor Stinner in branch 'default':
"Here I see a one obvious opportunity for optimization: ..." @serhiy: Can you please open a new issue for this? I consider the issue fixed: I just removed the TODO (for the reason explained in the changeset).
Shouldn't this be applied to 3.3? As for optimization, I ran some benchmarks and didn't see any significant difference. This function is usually used to check short ASCII heads and tails, so any optimization will not be visible even under a microscope.
It's just a cleanup, it doesn't fix any real bug. I prefer to not
Ok, agreed.