Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize stripos #7852

Closed
wants to merge 2 commits into from
Closed

Conversation

iluuu1994
Copy link
Member

@iluuu1994 iluuu1994 commented Dec 29, 2021

Closes GH-7847
Closes GH-7852

Previously stripos would lowercase both the haystack and needle to reuse
strpos. The approach in this PR is similar to strpos. memchr is highly
optimized so we're using it to search for the first character of the
needle in the haystack. If we find it we compare the remaining
characters of the needle manually.

The new implementation seems to perform about half as well as strpos (as
two memchr calls are necessary).

@cmb69 Thoughts?

Copy link
Member

@cmb69 cmb69 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

Zend/zend_operators.h Outdated Show resolved Hide resolved
Zend/zend_operators.h Outdated Show resolved Hide resolved
Copy link
Member

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this improves some cases, it will regress others. For large strings this makes the search quadratic, while the previous implementation is linear (if you will allow you me the imprecision).

Zend/zend_operators.h Outdated Show resolved Hide resolved
@iluuu1994
Copy link
Member Author

@nikic Yeah, right now it would be quadratic in the worst-case scenario (something like stripos('ab', 'aaaaaaaaaaaaaaaaaAb)). I think we can make it linear by caching the result of each memchr for the next iteration if we haven't passed it yet.

@iluuu1994
Copy link
Member Author

It should be linear now. Performance is still approximately the same for me. Let me know if you think the add complexity is worth it.

@KapitanOczywisty
Copy link

I might be late to the party, but I've tried to improve this with 2 concurrent searches, somewhat close to what you did in the last commit, but I might have slightly better performance overall, and it doesn't dip with modified test:

$haystack = str_repeat('A', 1e+6) . "BCB";
$haystack = str_repeat($haystack, 10) . 'BBB';
//...
stripos($haystack, 'bbb');

Take a look if you will: https://gist.github.com/KapitanOczywisty/5c8c053ceb3ba687cbab41266120dac6

Also I think that more complex algorithm should be used with haystack longer than 10KB or so.

@cmb69
Copy link
Member

cmb69 commented Dec 29, 2021

This is certainly an interesting approach (thanks, @iluuu1994!), but I still have doubts that the performance gain outweighs the added complexity. Is stripos() used often with long haystacks? And even if, that code still could use PCRE instead (i.e. document that in the manual). I also wonder about the performance of something like stripos(str_repeat("abc", 10000) . "abd", "abd"), what appears to ruin the potential optimizations of memchr() (such cases might be rare, though).

Anyhow, if we do this optimization, we also should consider doing it for stristr().

@iluuu1994
Copy link
Member Author

@cmb69 @KapitanOczywisty's version doesn't look too complicated. I'll check if the same can be applied to stristr. I'll also do some benchmarks for various cases (best-, average- and worst-case, real world cases) to see how it compares. I will try a If the difference for real-world cases is negligible I'm happy to drop this PR.

I also wonder about the performance of something like stripos(str_repeat("abc", 10000) . "abd", "abd"), what appears to ruin the potential optimizations of memchr() (such cases might be rare, though).

Yes. Although the same applies for strpos too. The worst case for the current algorithm would be something like stripos(str_repeat("aA", 10000) . "ab", "ab") as memchr but as you mentioned that's probably pretty rare.

@iluuu1994
Copy link
Member Author

iluuu1994 commented Dec 30, 2021

@KapitanOczywisty Thanks for your implementation. There was a small bug here (only p should be checked because not both upper and lower case need to be present for a match). Other than that your implementation was simpler so I mostly copied it 1:1. I just removed the macro to avoid finding the right place to add it / fleshing it out / naming it properly.

@cmb69 It works fine for stristr too, except that php_stristr as a side-effect lowers the haystack and needle passed to it, so I created a new method that does not do that. We could still modify the old method and document the change as this is in master.

I'll create the benchmarks next.

@KapitanOczywisty
Copy link

@iluuu1994 This was fun challenge, since I'm not really working with C and glad I could help.

There was a small bug here (only p should be checked because not both upper and lower case need to be present for a match).

Yeah, I wasn't sure about that part so I left it in the code for people smarter than me to clean up :)

@iluuu1994
Copy link
Member Author

Posted the results in the wrong issue. #7847 (comment)

Anyway, I think this is worth merging because all cases seem to be faster, and the implementation doesn't seem terribly complex.

iluuu1994 added a commit to iluuu1994/php-src that referenced this pull request Jan 3, 2022
Closes phpGH-7847
Closes phpGH-7852

Previously stripos/stristr would lowercase both the haystack and the
needle to reuse strpos. The approach in this PR is similar to strpos.
memchr is highly optimized so we're using it to search for the first
character of the needle in the haystack. If we find it we compare the
remaining characters of the needle manually.

The new implementation seems to perform about half as well as strpos (as
two memchr calls are necessary to find the next candidate).
@iluuu1994
Copy link
Member Author

Any objections to merging this?

Zend/zend_operators.h Outdated Show resolved Hide resolved
ext/standard/string.c Outdated Show resolved Hide resolved

const char first_lower = zend_tolower_ascii(*needle);
const char first_upper = zend_toupper_ascii(*needle);
const char *p_lower = (const char *)memchr(haystack, first_lower, end - haystack);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For future scope: I think there may be a way to optimize this with SSE for really long strings, but it may not be worth it, especially with stripos not being used that often.

(both checking if the first character exists case-insensitively, and checking if long needles match)

Definitely not something to add to this PR.

(*needle_byte ^ *haystack_byte) == *needle_mask, where needle_mask is 0xff or 0xcf (~0x20), since 'a' ^ 'A' is 0x20)

For a block of 8/16/32 bytes, that could be checked by:

  1. Computing the needle and mask and extending it to 8 bytes
  2. Bitwise xoring the needle block with the haystack mask
  3. Masking the xored block (0xff or 0xcf, depending on whether this was an alphabet character. 0xff ^ first_lower ^ first_upper)
  4. Checking for 0 bytes in the mask
  5. Combining to a word
  6. Checking if the combination was non-zero
  7. Searching in that block normally for the first character

This would make the worst case of case-insensitive search faster (stripos(str_repeat('A', 1000000), 'a')) by stopping at the first lowercase or uppercase byte . The memchr could also start at whatever that result was if needle_len > 1.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know nothing about SSE, but as you mentioned, the new implementation should already be much better for large strings, we can always improve it at a later point.

Closes phpGH-7847
Closes phpGH-7852

Previously stripos/stristr would lowercase both the haystack and the
needle to reuse strpos. The approach in this PR is similar to strpos.
memchr is highly optimized so we're using it to search for the first
character of the needle in the haystack. If we find it we compare the
remaining characters of the needle manually.

The new implementation seems to perform about half as well as strpos (as
two memchr calls are necessary to find the next candidate).
@iluuu1994
Copy link
Member Author

I decided to actually alter php_stristr instead of creating a new function to make sure the faster implementation is actually used. It's unlikely that this changes behavior of existing code as normally haystack and needle were copied exactly to avoid the tolower side effect. I documented the change in UPGRADING.INTERNALS. If there's no more feedback I'll merge this in a day or two.

Zend/zend_operators.h Show resolved Hide resolved
ext/standard/string.c Show resolved Hide resolved
ext/standard/string.c Show resolved Hide resolved
ext/libxml/libxml.c Show resolved Hide resolved
@iluuu1994 iluuu1994 closed this in 2f52956 Jan 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

stripos with large haystack has bad performance
5 participants