Implement NEON-accelerated version of BLOCKCONV for lowercasing and uppercasing strings #11161

nielsdos · 2023-04-29T12:58:34Z

Since lowercasing and uppercasing is a common operation for both internal purposes and userland purposes, it makes sense to implement a NEON accelerated version for this.

alexdowad · 2023-04-29T13:01:26Z

@nielsdos Thank you very much!

Two questions: Do all ARM64 hosts support the NEON instruction set? And second, can you benchmark this code and let us know how much it improves performance?

FYI @youkidearitai @easyaspi314

nielsdos · 2023-04-29T13:54:30Z

Do all ARM64 hosts support the NEON instruction set?

The AArch64 version of the NEON instruction set (which is used here, and which differs from the original NEON instruction set) is supported on all ARM64 hosts. See https://en.wikipedia.org/wiki/AArch64, which says:

ARMv8-A makes VFPv3/v4 and advanced SIMD (Neon) standard

ARMv8-A is also the version of ARM that introduced AArch64 mode, so NEON will always be available on ARM64 hosts.
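Because of that guarantee, a compile-time check is enough and no runtime CPU detection is needed. The following is only a minimal sketch of such a guard, not the code from the patch: the macro `EXAMPLE_HAVE_NEON` and the function `example_block_stride` are hypothetical names used for illustration.

```c
#include <stddef.h>

/* Hypothetical guard macro (an assumption, not a name from the patch).
 * __aarch64__ is predefined by GCC/Clang and _M_ARM64 by MSVC when
 * targeting AArch64, where Advanced SIMD (NEON) is always present. */
#if defined(__aarch64__) || defined(_M_ARM64)
# include <arm_neon.h>
# define EXAMPLE_HAVE_NEON 1
#else
# define EXAMPLE_HAVE_NEON 0
#endif

/* Number of bytes one conversion step covers: 16 when the NEON path is
 * compiled in, 1 for a byte-at-a-time fallback. */
static size_t example_block_stride(void)
{
#if EXAMPLE_HAVE_NEON
	return sizeof(uint8x16_t);
#else
	return 1;
#endif
}
```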

Benchmarks

And second, can you benchmark this code and let us know how much it improves performance?

Important: I don't own an ARM64 machine. This patch was developed and tested on an x86-64 host, using qemu-user to emulate an AArch64 build of PHP.
Because of the emulation, the benchmark results are only a rough indication. An additional problem is that I'm benchmarking on a laptop, so the confidence interval is rather large.
I only benchmarked strtolower() because the strtoupper() code is essentially the same, just with a different check condition, so their performance should be practically identical.
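To illustrate why the two directions share almost all of their code, here is a scalar sketch in which only the range check and the offset differ. The function names are hypothetical, and this is a byte-at-a-time illustration, not the vectorized code from the patch.

```c
#include <stddef.h>

/* Hypothetical helper: add delta to every byte in [start, end] and copy
 * all other bytes unchanged. */
static void example_convert_range(unsigned char *dst, const unsigned char *src,
                                  size_t len, unsigned char start,
                                  unsigned char end, int delta)
{
	for (size_t i = 0; i < len; i++) {
		unsigned char c = src[i];
		/* The range check is the only part that depends on the direction. */
		int in_range = (c >= start && c <= end);
		dst[i] = (unsigned char)(c + (in_range ? delta : 0));
	}
}

/* Lowercasing: match 'A'..'Z' and shift up by 'a' - 'A'. */
static void example_tolower(unsigned char *dst, const unsigned char *src, size_t len)
{
	example_convert_range(dst, src, len, 'A', 'Z', 'a' - 'A');
}

/* Uppercasing: match 'a'..'z' and shift down by the same distance. */
static void example_toupper(unsigned char *dst, const unsigned char *src, size_t len)
{
	example_convert_range(dst, src, len, 'a', 'z', 'A' - 'a');
}
```

Both wrappers run the same loop, which is consistent with the expectation that their performance is practically identical.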

Benchmark: strtolower() Strings that are already all lowercase (1000 iterations)

For a string of size 100: 1.01 ± 0.10 times faster than old approach
For a string of size 1000: 1.34 ± 0.20 times faster than old approach
For a string of size 10000: 2.27 ± 0.22 times faster than old approach

Benchmark: strtolower() Strings where the first half is lowercase and second half is uppercase (1000 iterations)

For a string of size 100: 1.18 ± 0.15 times faster than old approach
For a string of size 1000: 2.28 ± 0.26 times faster than old approach
For a string of size 10000: 6.90 ± 0.65 times faster than old approach

Benchmark: strtolower() Strings that are all uppercase (1000 iterations)

For a string of size 100: 1.36 ± 0.18 times faster than old approach
For a string of size 1000: 3.79 ± 0.50 times faster than old approach
For a string of size 10000: 13.42 ± 0.85 times faster than old approach

alexdowad · 2023-04-29T13:57:55Z

@nielsdos Looks good to me!

Any comments from others?

youkidearitai · 2023-04-29T15:25:03Z

I'm not very familiar with the Zend Engine or SSE (and I'm even less confident about NEON), but on NEON BLOCKCONV_FOUND returns the maximum value, because it uses vmaxvq_u8. SSE uses _mm_movemask_epi8, which Intel's documentation describes as follows:

Create mask from the most significant bit of each 8-bit element in a

Is this difference okay?

nielsdos · 2023-04-29T15:28:58Z

I'm not very familiar with the Zend Engine or SSE (and I'm even less confident about NEON), but on NEON BLOCKCONV_FOUND returns the maximum value, because it uses vmaxvq_u8. SSE uses _mm_movemask_epi8, which Intel's documentation describes as follows:

Create mask from the most significant bit of each 8-bit element in a

Is this difference okay?

The goal of BLOCKCONV_FOUND is to detect whether there was at least one character element for which the comparison was true.
Both the SSE and NEON versions use a "less than" instruction, which sets an element to all one bits if the comparison is true and to all zero bits if it is false. On SSE we use movemask to gather the highest-order bit of every element, which results in a value != 0 if at least one comparison was true. On NEON I take the maximum instead: if any comparison yielded all ones, the max will be all ones; otherwise it will be all zeros. So the two versions are equivalent.
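The equivalence can be sketched in a small standalone function. This is an illustration under stated assumptions, not the BLOCKCONV macros themselves: `example_block_has_upper`, `EX_OFFSET`, and `EX_THRESHOLD` are hypothetical names, and the offset trick (shifting 'A'..'Z' to the bottom of the signed range so that one signed "less than" covers the whole range) is one possible way to build the comparison mask. The intrinsics used are `vld1q_s8`, `vaddq_s8`, `vcltq_s8`, and `vmaxvq_u8` on AArch64, and `_mm_loadu_si128`, `_mm_add_epi8`, `_mm_cmplt_epi8`, and `_mm_movemask_epi8` on SSE2.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__aarch64__) || defined(_M_ARM64)
# include <arm_neon.h>
#elif defined(__SSE2__)
# include <emmintrin.h>
#endif

/* Hypothetical constants: adding EX_OFFSET (mod 256) maps 'A'..'Z' onto
 * INT8_MIN..INT8_MIN+25, so "shifted < EX_THRESHOLD" as a signed compare
 * is true exactly for uppercase ASCII letters. */
#define EX_OFFSET    ((int8_t)(INT8_MIN - 'A'))
#define EX_THRESHOLD ((int8_t)(INT8_MIN + ('Z' - 'A') + 1))

/* Returns 1 if any of the 16 bytes at p is in 'A'..'Z', else 0. */
static int example_block_has_upper(const unsigned char *p)
{
#if defined(__aarch64__) || defined(_M_ARM64)
	int8x16_t v = vld1q_s8((const int8_t *)p);
	/* Each lane becomes 0xFF where the comparison is true, 0x00 otherwise. */
	uint8x16_t mask = vcltq_s8(vaddq_s8(v, vdupq_n_s8(EX_OFFSET)),
	                           vdupq_n_s8(EX_THRESHOLD));
	/* Horizontal max: 0xFF if any lane matched, 0x00 if none did. */
	return vmaxvq_u8(mask) != 0;
#elif defined(__SSE2__)
	__m128i v = _mm_loadu_si128((const __m128i *)p);
	__m128i mask = _mm_cmplt_epi8(_mm_add_epi8(v, _mm_set1_epi8(EX_OFFSET)),
	                              _mm_set1_epi8(EX_THRESHOLD));
	/* One bit per lane, taken from the top bit: nonzero if any lane matched. */
	return _mm_movemask_epi8(mask) != 0;
#else
	/* Scalar fallback performing the same arithmetic one byte at a time. */
	for (size_t i = 0; i < 16; i++) {
		int8_t shifted = (int8_t)(uint8_t)(p[i] + (uint8_t)EX_OFFSET);
		if (shifted < EX_THRESHOLD) {
			return 1;
		}
	}
	return 0;
#endif
}
```

Whichever branch is compiled, the caller only tests the result against zero, which is why the different reduction instructions do not change behaviour.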

youkidearitai · 2023-04-29T15:36:21Z

@nielsdos Thank you very much for the response. I understand now. Looks good to me.

alexdowad · 2023-04-30T12:35:41Z

Waiting to hear from @iluuu1994, if he feels like commenting.

@nielsdos, if there are no comments after a few days, please ping me and I will merge this.

nielsdos · 2023-04-30T12:38:20Z

@alexdowad I have merge access, so I can merge this myself. I'll wait a few days for more comments and merge it if it's all good :)

nielsdos · 2023-05-04T17:51:05Z

Thanks for the reviews :)

Commit 664459c: Implement NEON-accelerated version of BLOCKCONV for lowercasing and uppercasing strings

nielsdos requested a review from alexdowad April 29, 2023 12:58

nielsdos requested a review from iluuu1994 as a code owner April 29, 2023 12:58

github-actions bot added the Category: Engine label Apr 29, 2023

nielsdos merged commit a65cdd9 into php:master May 4, 2023
12 of 13 checks passed

pronskiy mentioned this pull request Jun 6, 2023

Add Roundup #13 ThePHPF/thephp.foundation#90

Merged
