Enable fast decoding on Apple/AArch64 builds (18-25% faster decompression) #1040

Merged
merged 1 commit into from Nov 23, 2021

@zeux zeux commented Nov 22, 2021

This makes decoding significantly faster on M1; measured on compressed source
code across 8 hardware threads, decompressing 294 MB to 1301 MB takes 513 ms
of cumulative work (2.53 GB/s) before, and 406 ms (3.2 GB/s) after this change
on M1 Pro.
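
The throughput figures follow directly from the sizes and timings quoted above. A minimal sketch of that arithmetic, assuming decimal units (1 GB = 1000 MB); the function names are hypothetical and are not part of lz4:

```c
#include <stdio.h>

/* Throughput in GB/s for output_mb megabytes produced in elapsed_ms
 * milliseconds. With decimal units, 1 MB/ms equals 1 GB/s, so the
 * ratio needs no further scaling. */
static double throughput_gbps(double output_mb, double elapsed_ms)
{
    return output_mb / elapsed_ms;
}

/* Relative throughput gain in percent when the same output takes
 * after_ms instead of before_ms. */
static double speedup_percent(double before_ms, double after_ms)
{
    return (before_ms / after_ms - 1.0) * 100.0;
}

/* Prints the before/after figures for the M1 Pro measurement quoted above:
 * 1301 MB of decompressed output in 513 ms versus 406 ms. */
static void print_m1_pro_report(void)
{
    printf("before: %.2f GB/s\n", throughput_gbps(1301.0, 513.0));
    printf("after:  %.2f GB/s\n", throughput_gbps(1301.0, 406.0));
    printf("gain:   %.1f%%\n", speedup_percent(513.0, 406.0));
}
```

Calling `print_m1_pro_report()` from a `main` function prints the two throughput values and the relative gain. The gain is computed from the timings rather than from the rounded GB/s values, so it can differ slightly from a figure derived from 2.53 and 3.2.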

There's no way to check if the target architecture is M1 specifically but the
gains are likely to be similar on recent iterations of Apple processors, and
the original performance issue was probably more specific to Qualcomm.
@Cyan4973
Member

Well, hopefully, the condition defined(__aarch64__) && defined(__APPLE__) should make it more specific to Apple Silicon implementations of ARM64, and therefore avoid bringing in Qualcomm's (and possibly other vendors) issues.
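
A minimal sketch of how a compile-time switch could be keyed on that condition. `DEMO_FAST_DEC_LOOP` and `demo_fast_dec_loop_enabled` are hypothetical names chosen for illustration and are not taken from the lz4 sources; `__aarch64__` and `__APPLE__` are the compiler-predefined macros named in the condition. Because the check is resolved by the preprocessor, it selects by target platform rather than by CPU model, and `__APPLE__` is defined for Apple platform targets in general (macOS and iOS alike).

```c
/* DEMO_FAST_DEC_LOOP is a hypothetical macro used only for this sketch.
 * It can still be overridden from the command line, e.g. -DDEMO_FAST_DEC_LOOP=0. */
#ifndef DEMO_FAST_DEC_LOOP
#  if defined(__aarch64__) && defined(__APPLE__)
     /* AArch64 target built for an Apple platform: opt in to the fast path. */
#    define DEMO_FAST_DEC_LOOP 1
#  else
     /* Every other target keeps the conservative path in this sketch. */
#    define DEMO_FAST_DEC_LOOP 0
#  endif
#endif

/* Returns 1 if the fast path was selected at compile time, 0 otherwise. */
static int demo_fast_dec_loop_enabled(void)
{
    return DEMO_FAST_DEC_LOOP;
}
```

Wrapping the definition in `#ifndef` leaves room for a build to force the setting either way, which helps when measuring a suspected regression on a specific device.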

@terrelln
Contributor

I measured on my M1 MacBook Air and saw decompression speed go from 3.9 GB/s to 4.6 GB/s.

> Well, hopefully, the condition defined(__aarch64__) && defined(__APPLE__) should make it more specific to Apple Silicon implementations of ARM64, and therefore avoid bringing in Qualcomm's (and possibly other vendors) issues.

This will also enable it for older iPhones, which ran on Qualcomm chips. I'm not sure how much we care about that. Or even if older iPhones had the mentioned performance issues. It may have been Android devices.

@zeux
Contributor Author

zeux commented Nov 22, 2021

I wouldn't be too worried about this. I don't think Apple ever used Qualcomm CPUs, and the last Apple chip built on ARM Cortex cores was the Apple A5 (launched in 2011 and discontinued in 2016). It's hard to know for certain without measuring, but I wouldn't expect this to cause a regression on Apple's more recent hardware.

@zeux zeux changed the title from "Enable fast decoding on Apple/AArch64 builds (25% faster decompression)" to "Enable fast decoding on Apple/AArch64 builds (18-25% faster decompression)" Nov 22, 2021
@Cyan4973
Member

Cyan4973 commented Nov 23, 2021

Thanks @zeux, this looks like a good trade-off.

@Cyan4973 Cyan4973 merged commit db57809 into lz4:dev Nov 23, 2021
@zeux zeux deleted the m1-fastdec branch November 23, 2021 04:45
zeux added a commit to zeux/qgrep that referenced this pull request Nov 24, 2021
This includes the M1 optimization PR: lz4/lz4#1040
As a result, qgrep bruteforce queries run 10-15% faster on M1 Pro.