-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Fixes Bug #73913 #2303
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes Bug #73913 #2303
Conversation
The strrpos and strripos-functions where handling a negative offset in a way that was different from strpos. Especially negative offsets that where smaller than the length of the needle caused illogical results. This commit changes the way the negative offset is handled. A negative offset will cause the haystack to be truncated of offset characters at the end and after that the last occurrence of needle is looked for. That is the bahaviour that is described in the manual and that is used in strpos. The fix caused several tests to break as they where testing for wrong values. That means that this change IS a BC-break. But as the previous results where not really consistent and in the most cases not what users would expect that should not stop us from merging this into the next release.
@heiglandreas The failing tests look related, can you check that. I haven't actually reviewed the patch, starting with green CI is good though ;) |
I'm already at it :) |
Found that not only strrpos and strripos where wrong but mb_strrpos (and presumably mb_strripos) also: https://3v4l.org/HhgQf Essence: with a negative offset a needle will be found when the offset is the start of the needle.
Or am I missing something? |
Please advise before I start to disassemble |
The offset handling for strrpos is pretty stupid in that it positive and negative offsets have a totally different meaning ... however, I don't think that the current handling of negative offsets is incorrect per-se: There are two ways the end offsets of a reverse search function may be interpreted: It can be the last position at which the needle may match (what strrpos in PHP currently implements) or it may be the position prior to which the entirety of the needle must be contained. If I'm reading spec/docs right, all of JS, C++, C#, Java use the same behavior. Python uses the second behavior (though they also have a different signature where this is clear). As such, I do not think anything should change here. |
Addresses Bug #73913
Thanks for the feedback @nikic. When I read your comment right, the negative offset should be the last position where the needle may match. But from https://3v4l.org/s1Obf it looks like currently (at least in
Which in my eyes means there needs something to change here. |
The mb_strrpos output looks exactly like what I would expect. You get a match while the (effective) offset is >= 13, which is the position the character occurs at. |
It's not what I'd expect: The search will start 8 characters from the end of the string searching backwards. "ßdämpfer" are 8 characters from the end of the string. Searching backwards means searching in the remainder of the string. IMO "ß" is not part of the remainder anymore. The thing becomes more interesting when you search for a needle with more than one character. Have a look at https://3v4l.org/cGsHK where the search should start 8 characters from the string. Let's assume that even includes the "ß" (which I'd say it's not), the result now is definitely wrong as the string "ßd" definitely is out of the scope of the search. Nevertheless it is accepted and returns the offset 13. |
I get what you mean, but this is simply not the behavior that PHP (and JS, C++, C#, Java) implement. As language is finicky, lets put it in code: For an effective offset of |
To phrase it another way, for an effective offset of |
Thanks for the clarification! That seems mathematically correct. But from a logical POV I'd expect something else. Have a look at https://3v4l.org/lH9uA to see the implications that solution has... But from what I've seen so far in Java the implementation is the same. For me that means that the description in the docs is highly misleading and needs updating. I'll adapt the related issue then. Thanks! |
The strrpos and strripos-functions where handling a negative offset in a way that was different from strpos.
Especially negative offsets that where smaller than the length of the needle caused illogical results.
This commit changes the way the negative offset is handled. A negative offset will cause the haystack to be truncated of offset characters at the end and after that the last occurrence of needle is looked for. That is the bahaviour that is described in the manual and that is used in strpos.
The fix caused several tests to break as they where testing for wrong values. That means that this change IS a BC-break. But as the previous results where not really consistent and in the most cases not what users would expect that should not stop us from merging this into the next release.
Fixes #73913