[aarch64] neon vectorization for base64 #4381

sebpop · 2019-07-08T15:10:51Z

A similar algorithm is used to vectorize on x86_64, with a good description in
https://arxiv.org/abs/1704.00605 . On AArch64 the implementation differs in that
instead of using multiplies to shift bits around, it uses the vld3+vst4 and
vld4+vst3 combinations to load and store interleaved data. This patch is based
on the NEON implementation of Wojciech Mula:
https://github.com/WojciechMula/base64simd/blob/master/encode/encode.neon.cpp
https://github.com/WojciechMula/base64simd/blob/master/encode/lookup.neon.cpp
and
https://github.com/WojciechMula/base64simd/blob/master/decode/decode.neon.cpp
https://github.com/WojciechMula/base64simd/blob/master/decode/lookup.neon.cpp
adapted to php/ext/standard/base64.c and vectorized with factor 16 instead of 8.
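
To illustrate the load/store idea described above, here is a minimal sketch in portable C, not the code from this patch. It emulates lane by lane what the de-interleaving loads and interleaving stores do (the comments name the `vld3q_u8`, `vst4q_u8`, `vld4q_u8` and `vst3q_u8` intrinsics from `arm_neon.h` as the assumed NEON counterparts). The function names `encode_block48`, `decode_block64` and `decode_char` are hypothetical, and the table lookups are done with plain scalar code rather than the vector lookups used in the referenced implementation.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 16 /* a 128-bit NEON register holds 16 bytes (vector factor 16) */

static const char b64_table[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encode exactly 48 input bytes into 64 base64 characters. */
static void encode_block48(const uint8_t in[48], char out[64])
{
    uint8_t a[LANES], b[LANES], c[LANES];
    uint8_t i0[LANES], i1[LANES], i2[LANES], i3[LANES];

    /* Emulates vld3q_u8: bytes 0,3,6,... go to a; 1,4,7,... to b; 2,5,8,... to c. */
    for (size_t i = 0; i < LANES; i++) {
        a[i] = in[3 * i];
        b[i] = in[3 * i + 1];
        c[i] = in[3 * i + 2];
    }

    /* Lane-wise shifts and masks turn 3 bytes into four 6-bit indices.
     * No multiplies are needed because the load already separated the bytes. */
    for (size_t i = 0; i < LANES; i++) {
        i0[i] = (uint8_t)(a[i] >> 2);
        i1[i] = (uint8_t)(((a[i] & 0x03) << 4) | (b[i] >> 4));
        i2[i] = (uint8_t)(((b[i] & 0x0f) << 2) | (c[i] >> 6));
        i3[i] = (uint8_t)(c[i] & 0x3f);
    }

    /* Emulates vst4q_u8: interleave the four translated "registers" on store. */
    for (size_t i = 0; i < LANES; i++) {
        out[4 * i]     = b64_table[i0[i]];
        out[4 * i + 1] = b64_table[i1[i]];
        out[4 * i + 2] = b64_table[i2[i]];
        out[4 * i + 3] = b64_table[i3[i]];
    }
}

/* Map one base64 character to its 6-bit value, or -1 if it is not in the alphabet. */
static int decode_char(char ch)
{
    if (ch >= 'A' && ch <= 'Z') return ch - 'A';
    if (ch >= 'a' && ch <= 'z') return ch - 'a' + 26;
    if (ch >= '0' && ch <= '9') return ch - '0' + 52;
    if (ch == '+') return 62;
    if (ch == '/') return 63;
    return -1;
}

/* Decode exactly 64 base64 characters (no padding) into 48 bytes.
 * Returns 0 on success and -1 if any character is outside the alphabet. */
static int decode_block64(const char in[64], uint8_t out[48])
{
    uint8_t v[4][LANES];

    /* Emulates vld4q_u8: de-interleave every fourth character into v[0..3]. */
    for (size_t i = 0; i < LANES; i++) {
        for (size_t k = 0; k < 4; k++) {
            int d = decode_char(in[4 * i + k]);
            if (d < 0)
                return -1;
            v[k][i] = (uint8_t)d;
        }
    }

    /* Pack four 6-bit values into 3 bytes per lane, then emulate vst3q_u8. */
    for (size_t i = 0; i < LANES; i++) {
        out[3 * i]     = (uint8_t)((v[0][i] << 2) | (v[1][i] >> 4));
        out[3 * i + 1] = (uint8_t)((v[1][i] << 4) | (v[2][i] >> 2));
        out[3 * i + 2] = (uint8_t)((v[2][i] << 6) | v[3][i]);
    }
    return 0;
}
```

In this sketch a caller would process the input in 48-byte (encode) or 64-byte (decode) blocks and hand any remaining tail, including `=` padding, to a scalar routine.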

On a Graviton A1 instance and on the synthetic benchmarks in
https://github.com/lemire/fastbase64 I see 175% speedup on base64 encoding and
60% speedup on base64 decode compared to the scalar implementation.

The patch passes `make test` regression testing on aarch64-linux.

dstogov · 2019-07-08T20:08:24Z

I can't test this, but I think this may be merged into 7.4 and master.


sebpop · 2019-07-10T16:26:20Z

In the last update 967e4c5 I fixed the size of the .mem field in the error union to match the vector factor 16 change.

@dstogov this patch is ready to be committed to master and php-7.4, thanks!

dstogov · 2019-07-11T09:06:07Z

Merged as 3b73c9f

lemire · 2019-07-16T20:44:08Z

+1

sebpop force-pushed the base64 branch from 1fe119b to 967e4c5 on July 10, 2019 01:39

dstogov closed this Jul 11, 2019

lemire mentioned this pull request Apr 17, 2020

base64 bio using avx when avaliable openssl/openssl#11460
