Implement ARMv8 NEON (AdvSIMD) acceleration #5

Merged
merged 3 commits into from
Sep 1, 2021

Commits on Aug 31, 2021

  1. Properly cfg() all x86-specific things to x86

    And use a wrapper function to allow cfg()ed early-return for dispatch
    valpackett committed Aug 31, 2021
    2e2eeea
  2. Add AArch64 NEON/AdvSIMD acceleration

    Currently using the "simple" movemask from
    https://branchfree.org/2019/04/01/fitting-my-head-through-the-arm-holes-or-two-sequences-to-substitute-for-the-missing-pmovmskb-instruction-on-arm-neon/
    but it should be possible to try the "interleaved" one later.
    
    Benchmark results: on Cortex-A72 with a 541MB file, the speedup is 2.58x (1.721s to 0.662s).
    valpackett committed Aug 31, 2021
    80e1543
  3. fixup! Add AArch64 NEON/AdvSIMD acceleration

    Minor update to fix documentation typo and to use a shorter version of
    the branchfree.org URL to prevent long lines (tested).
    mqudsi committed Aug 31, 2021
    099f836
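
The first commit message describes gating x86-only code behind `cfg()` and using a wrapper function so that dispatch can return early per architecture. A minimal sketch of that pattern follows. All function names here are hypothetical, the newline-counting task is only an illustration, and the per-architecture bodies are stand-ins that delegate to the scalar path; this is not the code from the PR.

```rust
/// Portable fallback: count newline bytes one at a time.
fn count_newlines_scalar(bytes: &[u8]) -> usize {
    bytes.iter().filter(|&&b| b == b'\n').count()
}

/// x86-only path (hypothetical stand-in; a real version would use SIMD intrinsics).
/// The cfg attribute keeps this function out of non-x86 builds entirely.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn count_newlines_x86(bytes: &[u8]) -> usize {
    count_newlines_scalar(bytes)
}

/// AArch64-only path (hypothetical stand-in for a NEON implementation).
#[cfg(target_arch = "aarch64")]
fn count_newlines_neon(bytes: &[u8]) -> usize {
    count_newlines_scalar(bytes)
}

/// Wrapper that returns `Some(result)` when an accelerated path ran.
/// Each cfg()ed block can return early; on any other target both blocks
/// compile away and the function simply yields `None`.
fn try_accelerated(bytes: &[u8]) -> Option<usize> {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            return Some(count_newlines_x86(bytes));
        }
    }
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            return Some(count_newlines_neon(bytes));
        }
    }
    let _ = bytes; // avoids an unused-variable warning on other targets
    None
}

/// Public entry point: accelerated when possible, scalar otherwise.
fn count_newlines(bytes: &[u8]) -> usize {
    try_accelerated(bytes).unwrap_or_else(|| count_newlines_scalar(bytes))
}
```

Calling `count_newlines(b"a\nb\nc")` yields 2 on every target. Keeping the early returns inside one wrapper means the caller needs no `cfg` attributes of its own.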
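
The second commit message refers to a "simple" movemask substitute, since NEON has no direct equivalent of x86's `pmovmskb`. One common way to build such a mask is sketched below: AND each lane with a per-lane bit weight, then apply three rounds of pairwise addition so that each 8-lane half collapses into one byte. This is an illustration under assumptions, not necessarily the exact sequence used in the PR or in the linked article. The names `movemask_model`, `movemask_neon`, `pairwise_add` and `WEIGHTS` are hypothetical. The portable model mirrors the NEON steps so the logic can be checked on any machine.

```rust
/// Bit weight for each lane: lane i contributes bit (i % 8) of its half.
const WEIGHTS: [u8; 16] = [1, 2, 4, 8, 16, 32, 64, 128, 1, 2, 4, 8, 16, 32, 64, 128];

/// Portable model of `vpaddq_u8(a, a)`: adjacent lanes are summed, and the
/// eight sums appear in both the low and the high half of the result.
fn pairwise_add(a: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..8 {
        let sum = a[2 * i].wrapping_add(a[2 * i + 1]);
        out[i] = sum;
        out[i + 8] = sum;
    }
    out
}

/// Scalar model: `lanes` holds 0x00 or 0xFF per lane (a comparison result).
/// Returns a 16-bit mask whose bit i is set when lane i is 0xFF.
fn movemask_model(lanes: [u8; 16]) -> u16 {
    let mut masked = [0u8; 16];
    for i in 0..16 {
        masked[i] = lanes[i] & WEIGHTS[i];
    }
    // Three rounds reduce each group of 8 lanes to a single byte.
    let t = pairwise_add(pairwise_add(pairwise_add(masked)));
    u16::from(t[0]) | (u16::from(t[1]) << 8)
}

/// The same steps with NEON intrinsics; compiled only on AArch64.
#[cfg(target_arch = "aarch64")]
fn movemask_neon(lanes: [u8; 16]) -> u16 {
    use std::arch::aarch64::*;
    // SAFETY: assumes a standard AArch64 target where NEON is a baseline
    // feature; each load reads exactly 16 bytes from a 16-byte array.
    unsafe {
        let v = vld1q_u8(lanes.as_ptr());
        let w = vld1q_u8(WEIGHTS.as_ptr());
        let m = vandq_u8(v, w);
        let t = vpaddq_u8(m, m);
        let t = vpaddq_u8(t, t);
        let t = vpaddq_u8(t, t);
        // Bytes 0 and 1 now hold the low and high mask bytes.
        vgetq_lane_u16::<0>(vreinterpretq_u16_u8(t))
    }
}
```

The sums cannot overflow because every lane in a half contributes a distinct bit, so each reduced byte is at most 255.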