ARM-specific optimisations for inflate. #256

Closed
wants to merge 2 commits into from

ghost commented Apr 27, 2017

In inflate_fast() the output pointer always has plenty of room to write: inflate() only calls it when at least 258 bytes of output space (the longest possible match) are available. This means that, so long as the target supports them, wide unaligned loads and stores can be used to transfer several bytes at once. When the match distance is too short for a full-width copy, simply unroll the data a little to increase the distance.
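A minimal sketch of the idea in portable C follows. It is not the patch's code: `CHUNK`, `chunk_copy1()` and `chunk_copy_match()` are hypothetical names, the 8-byte width is an assumption (a NEON version might move 16 bytes), and it assumes the caller leaves at least `CHUNK - 1` spare bytes after each match, since the last wide store may overshoot. Short distances are handled by copying a few bytes singly until the distance can be widened to a multiple of `dist` that is at least `CHUNK`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy width in bytes; 8 is an assumption for this sketch (a NEON build
   might move 16 bytes at a time instead). */
#define CHUNK 8

/* One CHUNK-byte load followed by one CHUNK-byte store. A constant-size
   memcpy is the portable way to express an unaligned access; compilers
   typically turn it into a single wide load/store on targets that allow
   unaligned access. */
static void chunk_copy1(unsigned char *dst, const unsigned char *src)
{
    unsigned char tmp[CHUNK];
    memcpy(tmp, src, CHUNK);
    memcpy(dst, tmp, CHUNK);
}

/* Hypothetical helper: expand an LZ77 match of `len` bytes at distance
   `dist` into `out` and return the new output position.
   ASSUMPTION: at least CHUNK - 1 writable bytes follow out + len, because
   the final store may overshoot the end of the match. */
static unsigned char *chunk_copy_match(unsigned char *out, size_t dist,
                                       size_t len)
{
    size_t d = dist;

    assert(dist >= 1);
    if (d < CHUNK) {
        /* A wide load at this distance would read bytes not yet written.
           The output repeats with period dist, so copying from any multiple
           of dist back gives the same bytes. Pick the smallest multiple that
           is >= CHUNK and write the first (wide - dist) bytes one at a time,
           so that out - wide lands back on the original source. */
        const unsigned char *src = out - dist;
        size_t wide = ((CHUNK + dist - 1) / dist) * dist;
        size_t unroll = wide - dist;

        if (unroll > len)
            unroll = len;
        for (size_t i = 0; i < unroll; i++)
            out[i] = src[i];
        out += unroll;
        len -= unroll;
        if (len == 0)
            return out;
        d = wide;
    }

    /* Now d >= CHUNK: each CHUNK-byte source window ends at or before the
       current output position, so it only contains bytes already written. */
    const unsigned char *from = out - d;
    while (len > 0) {
        size_t step = len < CHUNK ? len : CHUNK;

        chunk_copy1(out, from);  /* may write up to CHUNK - step extra bytes */
        out += step;
        from += step;
        len -= step;
    }
    return out;
}
```

For example, with "abc" already in the buffer, `chunk_copy_match(buf + 3, 3, 10)` leaves "abcabcabcabca" in the first 13 bytes, the same result as a one-byte-at-a-time copy. Near the end of the output buffer, where the overshoot slack is not guaranteed, a bounded copy would be needed instead.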

For PNG decoding this comes out about 33% faster overall across a wide set of files. Small PNGs benefit the least because they never enter inflate_fast(), which is where the most straightforward assumptions can be made.
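For reference, zlib's inflate.c only switches to inflate_fast() when `have >= 6 && left >= 258`, i.e. at least 6 bytes of input and 258 bytes of output space are available. The helper below is hypothetical (it is not a zlib function) and only mirrors that check; if a decoder hands zlib output buffers smaller than 258 bytes (for instance, one short image row at a time), the fast path is never taken.

```c
#include <stdbool.h>

/* Thresholds from zlib's inflate(): 6 input bytes cover the longest
   length/distance code pair (48 bits), and 258 bytes is the longest match. */
#define FAST_MIN_IN  6u
#define FAST_MIN_OUT 258u

/* Hypothetical helper, not part of zlib: mirrors the
   `have >= 6 && left >= 258` test made before calling inflate_fast(). */
static bool fast_path_possible(unsigned have, unsigned left)
{
    return have >= FAST_MIN_IN && left >= FAST_MIN_OUT;
}
```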

Simon Hosie added 2 commits April 26, 2017 17:19
Change-Id: Id4cda552b39bfb39ab35ec499dbe122b43b6d1a1

Change-Id: I59854eb25d2b1e43561c8a2afaf9175bf10cf674
ghost (Author) commented May 4, 2017

@ProgramMax, more of the PNG optimisation here, FYI. Corresponding Chromium patch (now with green bots!) is here.

ProgramMax commented

Thank you for pinging me. :)

exploited.
*/
static inline unsigned char FAR *chunkcopy_safe(unsigned char FAR *out,
const unsigned char FAR * Z_RESTRICT from,

I'm not sure you can use Z_RESTRICT here. That may be valid if you came in via inflate.c, but maybe not if you came in via infback.c.

There's a longer discussion of that at https://chromium-review.googlesource.com/c/chromium/src/+/641575/4/third_party/zlib/contrib/arm/chunkcopy.h#230
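`Z_RESTRICT` presumably expands to C99 `restrict` (an assumption; its definition is not shown here), which promises the compiler that `out` and `from` never overlap. In an LZ77 match copy the source is often only a few bytes behind the destination, so that promise holds only if every caller can guarantee it. One hedged way to keep a `restrict`-qualified fast path without relying on that guarantee is to check for overlap first, as in this sketch with hypothetical helper names:

```c
#include <stddef.h>
#include <stdint.h>

/* Nonzero if [a, a + n) and [b, b + n) overlap. Compared as integers because
   relational operators on pointers into different objects are undefined. */
static int ranges_overlap(const unsigned char *a, const unsigned char *b,
                          size_t n)
{
    uintptr_t pa = (uintptr_t)a, pb = (uintptr_t)b;
    return pa < pb + n && pb < pa + n;
}

/* restrict lets the compiler assume dst and src never alias, so it may
   reorder or vectorise the loop freely. Callers MUST rule out overlap. */
static void copy_noalias(unsigned char *restrict dst,
                         const unsigned char *restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* No restrict here: safe whether or not the ranges overlap. When src is
   behind dst, the forward byte loop replicates the pattern, which is what
   an LZ77 match needs (memmove would give a different result). */
static void copy_match_safe(unsigned char *dst, const unsigned char *src,
                            size_t n)
{
    if (!ranges_overlap(dst, src, n))
        copy_noalias(dst, src, n);
    else
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i];
}
```

The `uintptr_t` comparison assumes a flat address space, which holds on the ARM targets discussed here.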

ghost (Author) replied

My inclination is to give infback.c and inflate.c different implementations, but it could still be argued that the no-aliasing assumption is too dangerous for some caller, somewhere out there, written in the last twenty-something years.

ghost (Author) commented Sep 28, 2017

This is past-life work now, and I'm not sure how I'm supposed to reconcile that with needing to fix it. So I won't.

ghost closed this Sep 28, 2017

Adenilson commented

I can fix it and add it to the Adler-32 + CRC32 pull request, #251.

jow- pushed a commit to lede-project/source that referenced this pull request Jan 2, 2018
This adds two optimizations for ARM:
NEON optimized Adler(-)32 checksum algorithm (ARMv7 and newer NEON CPUs)
ARM(v7+) specific optimization for inflate
I've also connected inflate optimization to the build using the following
source as template.
mirror/chromium@0397489#diff-a62ad2db6c83dbc205d34bb9a8884f16

Additional info:
https://codereview.chromium.org/2676493007/
https://codereview.chromium.org/2722063002/

Sources:
madler/zlib#251 (only the first commit)
madler/zlib#256

Signed-off-by: Daniel Engberg <daniel.engberg.lists@pyret.net>
SpiralP pushed a commit to SpiralP/lede-source that referenced this pull request Jan 2, 2018
jollaman999 pushed a commit to jollaman999/openwrt that referenced this pull request Jan 13, 2018
This pull request was closed.
3 participants