-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8292587: AArch64: Support SVE fabd instruction #10011
Conversation
Scalar and NEON fabd instructions were initially supported in JDK-8256318. In this patch, we support SVE fabd instruction [1] and add one Jtreg test case as well. With this patch, two instructions `fsub + fabs` would be combined into one single `fabd` instruction. ``` fsub z16.s, z16.s, z17.s fabs z16.s, p7/m, z16.s --> fabd z16.s, p7/m, z16.s, z17.s ``` In the initial evaluation of JMH case, i.e. FloatingScalarVectorAbsDiff.java, we found the performance uplift done by this optimization was easily hidden by the heavy memory load/store instructions. To avoid that, we updated the JMH case a bit, adding one more group of subtraction and Math.abs operations in the loop body. Here shows the data with the new JMH case on one 256-bit SVE machine. We can observe about 39% and 35% improvements for the two functions respectively. ``` Benchmark Before After Units FloatingScalarVectorAbsDiff.testVectorAbsDiffDouble 260.468 160.965 ns/op FloatingScalarVectorAbsDiff.testVectorAbsDiffFloat 133.963 87.292 ns/op ``` [1] https://developer.arm.com/documentation/ddi0596/2021-12/SVE-Instructions/FABD--Floating-point-absolute-difference--predicated--
👋 Welcome back haosun! A progress list of the required criteria for merging this PR into |
Webrevs
|
Ping? Can anyone help to review this patch? Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
Hi @nick-arm, could you help to review this patch when you have spare time? Thanks in advance. |
As pointed out by Faye Gao, the test results are not fully verified due to incorrect loop limits. Updated it. Reran the test and no regression.
I tested this in our CI. All tests passed. |
@shqking This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 210 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@nsjian, @nick-arm) but any other Committer may sponsor as well. ➡️ To flag this PR as ready for integration with the above commit message, type |
Thanks for your code reviews, @nick-arm @nsjian and @fg1417 . I don't think the GHA test failure is related to this patch. /integrate |
/sponsor |
Going to push as commit cbee0bc.
Your commit was automatically rebased without conflicts. |
Scalar and NEON fabd instructions were initially supported in
JDK-8256318. In this patch, we support SVE fabd instruction [1] and add
one Jtreg test case as well.
With this patch, two instructions
fsub + fabs
would be combined intoone single
fabd
instruction.In the initial evaluation of JMH case, i.e.
FloatingScalarVectorAbsDiff.java, we found the performance uplift done
by this optimization was easily hidden by the heavy memory load/store
instructions. To avoid that, we updated the JMH case a bit, adding one
more group of subtraction and Math.abs operations in the loop body.
Here shows the data with the new JMH case on one 256-bit SVE machine. We
can observe about 39% and 35% improvements for the two functions
respectively.
Jtreg testing: tier1~3 passed on one NEON-only machine and one 256-bit SVE machine.
[1] https://developer.arm.com/documentation/ddi0596/2021-12/SVE-Instructions/FABD--Floating-point-absolute-difference--predicated--
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/10011/head:pull/10011
$ git checkout pull/10011
Update a local copy of the PR:
$ git checkout pull/10011
$ git pull https://git.openjdk.org/jdk pull/10011/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 10011
View PR using the GUI difftool:
$ git pr show -t 10011
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/10011.diff