enable LZ4_FAST_DEC_LOOP build macro on aarch64 by default #707
Pull request #645 introduced the build macro LZ4_FAST_DEC_LOOP, which by default enables an optimization only on x86/x64.
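A default like this is usually chosen with a preprocessor guard keyed on compiler-predefined architecture macros. The sketch below shows such a guard with an aarch64 branch added, as this PR proposes. It is an illustration under assumptions, not the exact code in lz4.c: the macro names `__i386__`, `__x86_64__`, `__aarch64__` (GCC/Clang) and `_M_IX86`, `_M_X64`, `_M_ARM64` (MSVC) are standard predefined macros, and the helper `fast_dec_loop_enabled` is hypothetical.

```c
/* Sketch (assumed structure, not copied from lz4.c): pick a default for
 * LZ4_FAST_DEC_LOOP only if the user has not already set it, e.g. with
 * -DLZ4_FAST_DEC_LOOP=0 on the compiler command line. */
#ifndef LZ4_FAST_DEC_LOOP
#  if defined(__i386__) || defined(_M_IX86) || defined(__x86_64__) || defined(_M_X64)
#    define LZ4_FAST_DEC_LOOP 1   /* existing default: x86/x64 */
#  elif defined(__aarch64__) || defined(_M_ARM64)
#    define LZ4_FAST_DEC_LOOP 1   /* proposed addition: aarch64 */
#  else
#    define LZ4_FAST_DEC_LOOP 0   /* all other targets: optimization off */
#  endif
#endif

/* Hypothetical helper for illustration: reports the value chosen above. */
int fast_dec_loop_enabled(void) {
    return LZ4_FAST_DEC_LOOP;
}
```

Because the guard is wrapped in `#ifndef`, a build can still opt out on any architecture by passing `-DLZ4_FAST_DEC_LOOP=0`.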
I propose enabling this optimization on aarch64 as well. Here are the benchmark results for this pull request, run on an a1.4xlarge AWS EC2 instance. The final percentage shows how much faster this patch set is than the current dev branch.
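The post does not say exactly how the final percentage is computed. Assuming it compares decompression throughput in MB/s (the unit `lz4 -b` reports), where higher is better, a percentage speedup would be computed as in this hypothetical helper:

```c
/* Hypothetical helper: percentage speedup of a patched build over a
 * baseline, given throughputs in MB/s (higher is better).
 * Assumes baseline_mbps > 0. */
double speedup_percent(double baseline_mbps, double patched_mbps) {
    return (patched_mbps / baseline_mbps - 1.0) * 100.0;
}
```

For example, `speedup_percent(2000.0, 2500.0)` returns `25.0`: the patched build decompresses 25% faster. A negative result would mean the patch is slower on that input.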
Your results are in line with several of our observations.
However, the issue is,
In general, server-class
So I believe we need something more accurate than just