forked from libjpeg-turbo/libjpeg-turbo
ARM64 NEON: Optimize final transpose in jsimd_idct_ifast_neon
Remove 4 redundant MOV instructions and replace 24 64-bit instructions with 12 128-bit instructions in the final transpose step. This should make the code faster on ARM cores with a wide NEON unit. Also interleave the scalar ARM instructions (which perform address calculations) with the NEON instructions to take advantage of dual issue on in-order ARM cores.
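For readers unfamiliar with the transpose step, here is a minimal portable C sketch of how an 8x8 byte block can be transposed with TRN1/TRN2-style lane interleaves. It is an illustration under assumptions, not the code from this commit: it assumes the final transpose operates on an 8x8 block of 8-bit samples, and the helper names `trn_pair` and `transpose8x8` are hypothetical. Each `trn_pair` call stands in for one TRN1 plus one TRN2 on 64-bit vectors, so the three stages cost 4 pairs x 2 instructions x 3 stages = 24 instructions, which is consistent with the count in the message above. Holding two rows in each 128-bit register would halve that to 12, though the exact instruction sequence used in the patch is not shown here.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a TRN1 + TRN2 pair on two 8-byte vectors.
 * w is the lane width in bytes (1, 2 or 4).
 * TRN1 gathers the even-numbered lanes of a and b, TRN2 the odd ones. */
static void trn_pair(uint8_t a[8], uint8_t b[8], int w)
{
    uint8_t t1[8], t2[8];
    for (int i = 0; i < 8; i += 2 * w) {
        memcpy(t1 + i,     a + i,     (size_t)w);  /* TRN1: even lane of a */
        memcpy(t1 + i + w, b + i,     (size_t)w);  /* TRN1: even lane of b */
        memcpy(t2 + i,     a + i + w, (size_t)w);  /* TRN2: odd lane of a  */
        memcpy(t2 + i + w, b + i + w, (size_t)w);  /* TRN2: odd lane of b  */
    }
    memcpy(a, t1, 8);
    memcpy(b, t2, 8);
}

/* Transpose an 8x8 byte matrix in place in three stages:
 * 8-bit lanes on rows (r, r+1), 16-bit lanes on rows (r, r+2),
 * 32-bit lanes on rows (r, r+4). */
static void transpose8x8(uint8_t m[8][8])
{
    for (int w = 1; w <= 4; w *= 2)
        for (int r = 0; r < 8; r++)
            if ((r & w) == 0)
                trn_pair(m[r], m[r + w], w);
}
```

The sketch only models data movement; it says nothing about instruction scheduling, which is the other half of the change described above.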
Showing 1 changed file with 59 additions and 71 deletions.