[aarch64] neon vectorization for base64 #4381

Closed

@sebpop
Contributor

commented Jul 8, 2019

A similar algorithm is used to vectorize on x86_64, with a good description in
https://arxiv.org/abs/1704.00605 . On AArch64 the implementation differs in that
instead of using multiplies to shift bits around, it uses the vld3+vst4 and
vld4+vst3 combinations to load and store interleaved data. This patch is based
on the NEON implementation of Wojciech Mula:
https://github.com/WojciechMula/base64simd/blob/master/encode/encode.neon.cpp
https://github.com/WojciechMula/base64simd/blob/master/encode/lookup.neon.cpp
adapted to php/ext/standard/base64.c and vectorized with factor 16 instead of 8.
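
To make the load/store idea concrete, here is a portable scalar sketch of the data flow described above. It is not the code from this patch: the function names `encode_block48`, `decode_block64` and `b64_value` are hypothetical, the table lookup is done with plain array indexing and `strchr` rather than vector lookups, and the loops only model what the de-interleaving loads (`vld3q_u8`, `vld4q_u8`) and interleaving stores (`vst4q_u8`, `vst3q_u8`) would do on 16 lanes.

```c
#include <stdint.h>
#include <string.h>

#define LANES 16 /* vector factor mentioned in the description */

static const char b64_table[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encode exactly 48 input bytes into 64 base64 characters (no terminator). */
static void encode_block48(const uint8_t *in, char *out)
{
    uint8_t a[LANES], b[LANES], c[LANES]; /* de-interleaved input lanes */
    uint8_t q[4][LANES];                  /* four lanes of 6-bit indices */

    /* Models a de-interleaving 3-way load (vld3q_u8 on NEON). */
    for (int i = 0; i < LANES; i++) {
        a[i] = in[3 * i];
        b[i] = in[3 * i + 1];
        c[i] = in[3 * i + 2];
    }
    /* Lane-wise shifts and masks split 3 x 8 bits into 4 x 6 bits. */
    for (int i = 0; i < LANES; i++) {
        q[0][i] = (uint8_t)(a[i] >> 2);
        q[1][i] = (uint8_t)(((a[i] & 0x03) << 4) | (b[i] >> 4));
        q[2][i] = (uint8_t)(((b[i] & 0x0f) << 2) | (c[i] >> 6));
        q[3][i] = (uint8_t)(c[i] & 0x3f);
    }
    /* Table lookup, then an interleaving 4-way store (vst4q_u8 on NEON). */
    for (int i = 0; i < LANES; i++)
        for (int k = 0; k < 4; k++)
            out[4 * i + k] = b64_table[q[k][i]];
}

/* Map a base64 character to its 6-bit value; 0xff marks an invalid one. */
static uint8_t b64_value(char ch)
{
    const char *p = (ch != '\0') ? strchr(b64_table, ch) : NULL;
    return p ? (uint8_t)(p - b64_table) : 0xff;
}

/* Decode exactly 64 unpadded base64 characters into 48 bytes.
 * Returns 0 on success, -1 if any character is outside the alphabet. */
static int decode_block64(const char *in, uint8_t *out)
{
    uint8_t q[4][LANES];
    int bad = 0;

    /* Models a de-interleaving 4-way load (vld4q_u8) plus reverse lookup. */
    for (int i = 0; i < LANES; i++)
        for (int k = 0; k < 4; k++) {
            q[k][i] = b64_value(in[4 * i + k]);
            bad |= (q[k][i] == 0xff);
        }
    if (bad)
        return -1;

    /* Pack 4 x 6 bits into 3 x 8 bits; the writes model vst3q_u8. */
    for (int i = 0; i < LANES; i++) {
        out[3 * i]     = (uint8_t)((q[0][i] << 2) | (q[1][i] >> 4));
        out[3 * i + 1] = (uint8_t)((q[1][i] << 4) | (q[2][i] >> 2));
        out[3 * i + 2] = (uint8_t)((q[2][i] << 6) | q[3][i]);
    }
    return 0;
}
```

Because the three input bytes of each group land in separate lanes after the load, the 6-bit fields can be extracted with plain per-lane shifts and masks, which is why no multiply-based bit shuffling is needed in this formulation. Padding and tails shorter than one block are left to a scalar path in this sketch.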

On a Graviton A1 instance, using the synthetic benchmarks in
https://github.com/lemire/fastbase64, I see a 175% speedup on base64 encoding and
a 60% speedup on base64 decoding compared to the scalar implementation.

The patch passes `make test` regression testing on aarch64-linux.

@dstogov

Member

commented Jul 8, 2019

I can't test this, but I think this may be merged into 7.4 and master.

@sebpop sebpop force-pushed the sebpop:base64 branch from 1fe119b to 967e4c5 Jul 10, 2019

@sebpop

Contributor Author

commented Jul 10, 2019

In the last update 967e4c5 I fixed the size of the .mem field in the error union to match the vector factor 16 change.
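
For readers wondering why the size matters, here is a rough sketch of the kind of union involved. Only the `.mem` field and the factor 16 come from the comment above; the type name `lane_error`, the `dw` member and the helper `any_lane_error` are assumptions made for illustration, and the `memcpy` stands in for a 16-byte vector store.

```c
#include <stdint.h>
#include <string.h>

#define LANES 16 /* vector factor after the change */

/* Hypothetical layout: a byte view that receives the per-lane error flags
 * and a 64-bit view that allows a cheap "any lane set?" test. */
typedef union {
    uint8_t  mem[LANES];     /* must hold one full vector: 16 bytes, not 8 */
    uint64_t dw[LANES / 8];
} lane_error;

/* Returns 1 if any of the 16 per-lane error bytes is nonzero. */
static int any_lane_error(const uint8_t lanes[LANES])
{
    lane_error e;
    memcpy(e.mem, lanes, sizeof e.mem); /* stands in for a 16-byte vector store */
    return (e.dw[0] | e.dw[1]) != 0;
}
```

With `mem` still sized for 8 lanes, a 16-byte store would write past the end of the union, and flags in the upper 8 lanes would not be covered by the check.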

@dstogov this patch is ready to be committed to master and php-7.4, thanks!

@dstogov

Member

commented Jul 11, 2019

Merged as 3b73c9f

@dstogov dstogov closed this Jul 11, 2019

@lemire

commented Jul 16, 2019

+1
