Skip to content

Conversation

heiglandreas
Copy link
Contributor

The strrpos and strripos-functions where handling a negative offset in a way that was different from strpos.

Especially negative offsets that where smaller than the length of the needle caused illogical results.

This commit changes the way the negative offset is handled. A negative offset will cause the haystack to be truncated of offset characters at the end and after that the last occurrence of needle is looked for. That is the bahaviour that is described in the manual and that is used in strpos.

The fix caused several tests to break as they where testing for wrong values. That means that this change IS a BC-break. But as the previous results where not really consistent and in the most cases not what users would expect that should not stop us from merging this into the next release.

Fixes #73913

The strrpos and strripos-functions where handling a negative offset in
a way that was different from strpos.

Especially negative offsets that where smaller than the length of the
needle caused illogical results.

This commit changes the way the negative offset is handled. A negative
offset will cause the haystack to be truncated of offset characters at
the end and after that the last occurrence of needle is looked for. That
is the bahaviour that is described in the manual and that is used in
strpos.

The fix caused several tests to break as they where testing for wrong
values. That means that this change IS a BC-break. But as the previous
results where not really consistent and in the most cases not what users
would expect that should not stop us from merging this into the next
release.
@krakjoe krakjoe added the Bug label Jan 13, 2017
@krakjoe
Copy link
Member

krakjoe commented Jan 13, 2017

@heiglandreas The failing tests look related, can you check that.

I haven't actually reviewed the patch, starting with green CI is good though ;)

@heiglandreas
Copy link
Contributor Author

I'm already at it :)

@heiglandreas
Copy link
Contributor Author

Found that not only strrpos and strripos where wrong but mb_strrpos (and presumably mb_strripos) also: https://3v4l.org/HhgQf

Essence: with a negative offset a needle will be found when the offset is the start of the needle.

(mb_)strrpos('testabc', 'a', 3); will return 4 instead of the expected false. Why is false expected? testabc with an offset of -3 will search within a substring of testa instead of test. Though 3 characters from the end would be abc and not just bc.

Or am I missing something?

@heiglandreas
Copy link
Contributor Author

Please advise before I start to disassemble mb_strrpos… 😄

@nikic
Copy link
Member

nikic commented Jan 13, 2017

The offset handling for strrpos is pretty stupid in that it positive and negative offsets have a totally different meaning ... however, I don't think that the current handling of negative offsets is incorrect per-se:

There are two ways the end offsets of a reverse search function may be interpreted: It can be the last position at which the needle may match (what strrpos in PHP currently implements) or it may be the position prior to which the entirety of the needle must be contained. If I'm reading spec/docs right, all of JS, C++, C#, Java use the same behavior. Python uses the second behavior (though they also have a different signature where this is clear).

As such, I do not think anything should change here.

Addresses Bug #73913
@heiglandreas
Copy link
Contributor Author

Thanks for the feedback @nikic.

When I read your comment right, the negative offset should be the last position where the needle may match. But from https://3v4l.org/s1Obf it looks like currently (at least in mb_strrpos) the last position where a needle may match is offset - length($needle). And the same behaviour was implemented in strrpos. That seems to be more like Python-behaviour and should be explicitly defined in the docs then. And there it's written

If the value is negative, search will instead start from that many characters from the end of the string, searching backwards.

Which in my eyes means there needs something to change here.

@nikic
Copy link
Member

nikic commented Jan 13, 2017

The mb_strrpos output looks exactly like what I would expect. You get a match while the (effective) offset is >= 13, which is the position the character occurs at.

@heiglandreas
Copy link
Contributor Author

It's not what I'd expect: The search will start 8 characters from the end of the string searching backwards. "ßdämpfer" are 8 characters from the end of the string. Searching backwards means searching in the remainder of the string. IMO "ß" is not part of the remainder anymore.

The thing becomes more interesting when you search for a needle with more than one character. Have a look at https://3v4l.org/cGsHK where the search should start 8 characters from the string. Let's assume that even includes the "ß" (which I'd say it's not), the result now is definitely wrong as the string "ßd" definitely is out of the scope of the search. Nevertheless it is accepted and returns the offset 13.

@nikic
Copy link
Member

nikic commented Jan 13, 2017

I get what you mean, but this is simply not the behavior that PHP (and JS, C++, C#, Java) implement. As language is finicky, lets put it in code: For an effective offset of $lastPos = strlen($haystack) - $offset strrpos returns the the largest $pos such that $pos <= $lastPos && substr($haystack, $pos, strlen($needle)) === $needle. What "search will instead start from that many characters from the end of the string, searching backwards" means is that it will try to match $needle at $lastPos, then try to match it at $lastPos-1, at $lastPos-2, ..., at 0.

@nikic
Copy link
Member

nikic commented Jan 13, 2017

To phrase it another way, for an effective offset of $lastPos = strlen($haystack) - $offset, the meaning is "the return value will be between 0 and $lastPos", and not "the needle will be fully contained in the slice [0..$lastPos] of the string".

@heiglandreas
Copy link
Contributor Author

Thanks for the clarification! That seems mathematically correct. But from a logical POV I'd expect something else. Have a look at https://3v4l.org/lH9uA to see the implications that solution has...

But from what I've seen so far in Java the implementation is the same. For me that means that the description in the docs is highly misleading and needs updating. I'll adapt the related issue then.

Thanks!

mariuswilms added a commit to UnionOfRAD/lithium that referenced this pull request Apr 28, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants