-
Notifications
You must be signed in to change notification settings - Fork 6.1k
8280510: AArch64: Vectorize operations with loop induction variable #7491
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
AArch64 has SVE instruction of populating incrementing indices into an SVE vector register. With this we can vectorize some operations in loop with the induction variable operand, such as below. for (int i = 0; i < count; i++) { b[i] = a[i] * i; } This patch enables the vectorization of operations with loop induction variable by extending current scope of C2 superword vectorizable packs. Before this patch, any scalar input node in a vectorizable pack must be an out-of-loop invariant. This patch takes the induction variable input as consideration. It allows the input to be the iv phi node or phi plus its index offset, and creates a PopulateIndexNode to generate a vector filled with incrementing indices. On AArch64 SVE, final generated code for above loop expression is like below. add x12, x16, x10 add x12, x12, #0x10 ld1w {z16.s}, p7/z, [x12] index z17.s, w1, openjdk#1 mul z17.s, p7/m, z17.s, z16.s add x10, x17, x10 add x10, x10, #0x10 st1w {z17.s}, p7, [x10] As there is no populating index instruction on AArch64 NEON or other platforms like x86, a function named is_populate_index_supported() is created in the VectorNode class for the backend support check. Jtreg hotspot::hotspot_all_no_apps, jdk::tier1~3 and langtools::tier1 are tested and no issue is found. Hotspot jtreg has existing tests in compiler/c2/cr7192963/Test*Vect.java covering this kind of use cases so no new jtreg is created within this patch. A new JMH is created in this patch and tested on a 512-bit SVE machine. Below test result shows the performance can be significantly improved in some cases. Benchmark Performance IndexVector.exprWithIndex1 ~7.7x IndexVector.exprWithIndex2 ~13.3x IndexVector.indexArrayFill ~5.7x
👋 Welcome back pli! A progress list of the required criteria for merging this PR into |
Webrevs
|
Can anyone help review this? This optimization is currently for AArch64 but changes are mainly in C2 mid-end. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code looks ok to me modulo the cut-and-paste error I highlighted.
Have you any way of ensuring that the generated code is correct (other than the fact that it runs quicker).
Also have you checked this does not invalidate any other SVE tests. I would expect some failures thanks to the cut-and-paste error.
It would be a good idea if someone who knows the superword code better than me could comment on this extension to the range of cases covered by superword transformation. This looks like a very useful extension to the range of cases that can be optimized using superword analysis. However, I am not familiar enough with the code to know if this is the best way to proceed or if some other alternative has been considered and might provide a better alternative, especially wrt architectures other than aarch64. |
Thanks for your careful review. The cut-and-paste error has been fixed in my updated patch.
I've run almost full jtreg on an SVE featured machine and no issue is found. I noticed that existing hotspot jtregs cases in
I agree. I think people who know superword well are mostly in Oracle so I'd like to see if anyone in Oracle could help look at this. I've considered enabling this for x86. But as I'm not familiar with x86 instruction set enough, I haven't found any instruction(s) to implement such index vector. I would be very glad if someone from Intel can give me some suggestions. |
@vnkozlov @TobiHartmann Can you also help look at this? |
Please resolve the merge conflicts. |
Done. After merge, this is also tested with recent new jtreg cases in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I executed some testing, all passed.
Do we need to add the new node to MatchRule::is_expensive()
?
Do we need to add a declaration to vmStructs.cpp
?
Hi @TobiHartmann ,
I've fixed these in my latest commit. In that commit, I also align matching rule code with other rules in AArch64 ad file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still looks good to me
@pfustc This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 143 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me too. Have you considered adding a IR verification test?
Hi @TobiHartmann ,
Thanks for your review. And good question! You might have remembered that we contributed a vectorization test framework together with our post loop fix (#6828). Currently we are enabling IR tests in that framework. So far we have made some progress. So IR tests will be added for all vectorizable loops there (in |
/integrate |
Going to push as commit ea83b44.
Your commit was automatically rebased without conflicts. |
Okay, thanks for the heads up! |
AArch64 has SVE instruction of populating incrementing indices into an
SVE vector register. With this we can vectorize some operations in loop
with the induction variable operand, such as below.
This patch enables the vectorization of operations with loop induction
variable by extending current scope of C2 superword vectorizable packs.
Before this patch, any scalar input node in a vectorizable pack must be
an out-of-loop invariant. This patch takes the induction variable input
as consideration. It allows the input to be the iv phi node or phi plus
its index offset, and creates a
PopulateIndexNode
to generate a vectorfilled with incrementing indices. On AArch64 SVE, final generated code
for above loop expression is like below.
As there is no populating index instruction on AArch64 NEON or other
platforms like x86, a function named
is_populate_index_supported()
iscreated in the VectorNode class for the backend support check.
Jtreg hotspot::hotspot_all_no_apps, jdk::tier1~3 and langtools::tier1
are tested and no issue is found. Hotspot jtreg has existing tests in
compiler/c2/cr7192963/Test*Vect.java
covering this kind of use cases sono new jtreg is created within this patch. A new JMH is created in this
patch and tested on a 512-bit SVE machine. Below test result shows the
performance can be significantly improved in some cases.
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/7491/head:pull/7491
$ git checkout pull/7491
Update a local copy of the PR:
$ git checkout pull/7491
$ git pull https://git.openjdk.java.net/jdk pull/7491/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 7491
View PR using the GUI difftool:
$ git pr show -t 7491
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/7491.diff