-
Notifications
You must be signed in to change notification settings - Fork 6.1k
8268276: Base64 Decoding optimization for x86 using AVX-512 #4368
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Reviewed-by: prappo
Reviewed-by: psandoz, vlivanov
…escription.java SIGSEGV in memmove_ssse3 Reviewed-by: lmesnik, sspitsyn
Co-authored-by: Wang Huang <whuang@openjdk.org> Co-authored-by: Ai Jiaming <aijiaming1@huawei.com> Reviewed-by: kvn, jiefu
Reviewed-by: prr, serb
Reviewed-by: sangheki, whuang
Reviewed-by: kvn, vlivanov, thartmann
Reviewed-by: vromero, jfranck
Reviewed-by: kbarrett, tschatzl
Reviewed-by: shade, lucy
Reviewed-by: ayang, tschatzl
… marking in progress Reviewed-by: shade
…leted Reviewed-by: chegar
…roc abnormal end" Reviewed-by: dfuchs
Reviewed-by: dfuchs
Reviewed-by: chegar, alanb, michaelm, iris
Reviewed-by: ccheung, iklam
Reviewed-by: hseigel, lfoltan, dholmes
Reviewed-by: alanb, lancea
1. Changed errorvec handling 2. Removed unnecessary register copies and aliasing 3. Streamlined mask generation
@asgibbons The patch looks good to me. @vnkozlov We need one more review for this patch. Could you please help? |
What testing has been done for this change? I do not see that the Github Actions have been run for this PR. Has this been tested on a range of x86 systems with differing AVX capabilities? Thanks, |
/test |
@asgibbons you need to get approval to run the tests in tier1 for commits up until 58461b8 |
Mailing list message from Gibbons, Scott on hotspot-dev: Hi, David. I don't have permissions to run tests in this repo. I have tested on several x86 platforms (ICX, SKL) with several options. I'll be running more tests today. Thanks, -----Original Message----- On Wed, 23 Jun 2021 00:31:55 GMT, Scott Gibbons <github.com+6704669+asgibbons at openjdk.org> wrote:
What testing has been done for this change? I do not see that the Github Actions have been run for this PR. Has this been tested on a range of x86 systems with differing AVX capabilities? Thanks, ------------- PR: https://git.openjdk.java.net/jdk/pull/4368 |
I will run our internal testing before approving this. |
I hit strange failure in compiler/intrinsics/base64/TestBase64.java test on Windows machine which have Intel 8167M cpu (AVX512).
|
The rest of testing hs-tier1-4 and xcomp is finished and clean. |
Hi, @vnkozlov. I just pushed a change that fixes a register overwrite. Can you please start the tests again? Thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Latest update fixed TestBase64.java test issue.
/integrate |
@asgibbons |
Thanks a lot @vnkozlov for the review and test. |
/sponsor |
Going to push as commit c37988d.
Your commit was automatically rebased without conflicts. |
@sviswa7 @asgibbons Pushed as commit c37988d. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
We found this optimization causes https://bugs.openjdk.org/browse/JDK-8321599 |
Add the Base64 Decode intrinsic for x86 to utilize AVX-512 for acceleration. Also allows for performance improvement for non-AVX-512 enabled platforms. Due to the nature of MIME-encoded inputs, modify the intrinsic signature to accept an additional parameter (isMIME) for fast-path MIME decoding.
A change was made to the signature of DecodeBlock in Base64.java to provide the intrinsic information as to whether MIME decoding was being done. This allows for the intrinsic to bypass the expensive setup of zmm registers from AVX tables, knowing there may be invalid Base64 characters every 76 characters or so. A change was also made here removing the restriction that the intrinsic must return an even multiple of 3 bytes decoded. This implementation handles the pad characters at the end of the string and will return the actual number of characters decoded.
The AVX portion of this code will decode in blocks of 256 bytes per loop iteration, then in chunks of 64 bytes, followed by end fixup decoding. The non-AVX code is an assembly-optimized version of the java DecodeBlock and behaves identically.
Running the Base64Decode benchmark, this change increases decode performance by an average of 2.6x with a maximum 19.7x for buffers > ~20k. The numbers are given in the table below.
Base Score is without intrinsic support, Optimized Score is using this intrinsic, and Gain is Base / Optimized.
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/4368/head:pull/4368
$ git checkout pull/4368
Update a local copy of the PR:
$ git checkout pull/4368
$ git pull https://git.openjdk.java.net/jdk pull/4368/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 4368
View PR using the GUI difftool:
$ git pr show -t 4368
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/4368.diff