-
Notifications
You must be signed in to change notification settings - Fork 5.9k
8297172: Fix some issues of auto-vectorization of Long.bitCount/numberOfTrailingZeros/numberOfLeadingZeros()
#11405
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…erOfTrailingZeros/numberOfLeadingZeros()` Background: Java API[1] for `Long.bitCount/numberOfTrailingZeros/ numberOfLeadingZeros()` returns int type, while Vector API[2] for them returns long type. Currently, to support auto-vectorization of Java API and Vector API at the same time, some vector platforms, namely aarch64 and x86, provides two types of vector nodes taking long type: One produces long vector type for vector API, and the other one produces int vector type by casting long-type result from the first one. We can move the casting work for auto-vectorization of Java API to the mid-end so that we can unify the vector implementation in the backend, reducing extra code. The patch does the refactoring and also fixes several issues below. 1. Refine the auto-vectorization of `Long.bitCount/numberOfTrailingZeros/numberOfLeadingZeros()` In the patch, during the stage of generating vector node for the candidate pack, to implement the complete behavior of these Java APIs, superword will make two consecutive vector nodes: the first one, the same as Vector API, does the real execution to produce long-type result, and the second one casts the result to int vector type. For those platforms, which have supported correctly vectorizing these java APIs before, the patch has no real impact on final generated assembly code and, consequently, has no performance regression. 2. Fix the IR check failure of `compiler/vectorization/TestPopCountVectorLong.java` on 128-bit sve platform These Java APIs take a long type and produce an int type, like conversion nodes between different data sizes do. In superword, the alignment of their input nodes is different from their own. It results in that these APIs can't be vectorized when `-XX:MaxVectorSize=16`. So, the IR check for vector nodes in `compiler/vectorization/TestPopCountVectorLong.java` would fail. To fix the issue of alignment, the patch corrects their related alignment, just like it did for conversion nodes between different data sizes. After the patch, these Java APIs can be vectorized on 128-bit platforms, as long as the auto- vectorization is profitable. 3. Fix the incorrect vectorization of `numberOfTrailingZeros/numberOfLeadingZeros()` in aarch64 platforms with more than 128 bits Although `Long.NumberOfLeadingZeros/NumberOfTrailingZeros()` can be vectorized on sve platforms when `-XX:MaxVectorSize=32` or `-XX:MaxVectorSize=64` even before the patch, aarch64 backend didn't provide special vector implementation for Java API and thus the generated code is not correct, like: ``` LOOP: sxtw x13, w12 add x14, x15, x13, uxtx openjdk#3 add x17, x14, #0x10 ld1d {z16.d}, p7/z, [x17] // Incorrectly use integer rbit/clz insn for long type vector *rbit z16.s, p7/m, z16.s *clz z16.s, p7/m, z16.s add x13, x16, x13, uxtx openjdk#2 str q16, [x13, openjdk#16] ... add w12, w12, #0x20 cmp w12, w3 b.lt LOOP ``` It causes a runtime failure of the testcase `compiler/vectorization/TestNumberOfContinuousZeros.java` added in the patch. After the refactoring, the testcase can pass and the code is corrected: ``` LOOP: sxtw x13, w12 add x14, x15, x13, uxtx openjdk#3 add x17, x14, #0x10 ld1d {z16.d}, p7/z, [x17] // Compute with long vector type and convert to int vector type *rbit z16.d, p7/m, z16.d *clz z16.d, p7/m, z16.d *mov z24.d, #0 *uzp1 z25.s, z16.s, z24.s add x13, x16, x13, uxtx openjdk#2 str q25, [x13, openjdk#16] ... add w12, w12, #0x20 cmp w12, w3 b.lt LOOP ``` 4. Fix an assertion failure on x86 avx2 platform Before, on x86 avx2 platform, there is an assertion failure when C2 tries to vectorize the loops like: ``` // long[] ia; // int[] ic; for (int i = 0; i < LENGTH; ++i) { ic[i] = Long.numberOfLeadingZeros(ia[i]); } ``` X86 backend supports vectorizing `numberOfLeadingZeros()` on avx2 platform, but it uses `evpmovqd()` to do casting for `CountLeadingZerosV`[3], which can only be used when `UseAVX > 2`[4]. After the refactoring, the failure can be fixed naturally. Tier 1~3 passed with no new failures on Linux AArch64/X86 platform. [1] https://docs.oracle.com/en/java/javase/19/docs/api/java.base/java/lang/Long.html#bitCount(long) https://docs.oracle.com/en/java/javase/19/docs/api/java.base/java/lang/Long.html#numberOfTrailingZeros(long) https://docs.oracle.com/en/java/javase/19/docs/api/java.base/java/lang/Long.html#numberOfLeadingZeros(long) [2] https://github.com/openjdk/jdk/blob/544e31722528d12fae0eb19271f85886680801a6/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/LongVector.java#L687 [3] https://github.com/openjdk/jdk/blob/544e31722528d12fae0eb19271f85886680801a6/src/hotspot/cpu/x86/x86.ad#L9418 [4] https://github.com/openjdk/jdk/blob/fc616588c1bf731150a9d9b80033bb589bcb231f/src/hotspot/cpu/x86/assembler_x86.cpp#L2239
👋 Welcome back fgao! A progress list of the required criteria for merging this PR into |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nicely done.
I suggest to wait approval from @TobiHartmann after his testing is finished. |
@fg1417 This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 169 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@vnkozlov, @TobiHartmann) but any other Committer may sponsor as well. ➡️ To flag this PR as ready for integration with the above commit message, type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me too. All tests passed.
Thanks for your review and test work, @vnkozlov @TobiHartmann. I'll integrate it. /integrate |
/sponsor |
Going to push as commit 4458de9.
Your commit was automatically rebased without conflicts. |
Background:
Java API[1] for
Long.bitCount/numberOfTrailingZeros/numberOfLeadingZeros()
returns int type, while Vector API[2] for them returns long type. Currently, to support auto-vectorization of Java API and Vector API at the same time, some vector platforms, namely aarch64 and x86, provides two types of vector nodes taking long type: One produces long vector type for vector API, and the other one produces int vector type by casting long-type result from the first one.We can move the casting work for auto-vectorization of Java API to the mid-end so that we can unify the vector implementation in the backend, reducing extra code. The patch does the refactoring and also fixes several issues below.
Long.bitCount/numberOfTrailingZeros/numberOfLeadingZeros()
In the patch, during the stage of generating vector node for the candidate pack, to implement the complete behavior of these Java APIs, superword will make two consecutive vector nodes: the first one, the same as Vector API, does the real execution to produce long-type result, and the second one casts the result to int vector type.
For those platforms, which have supported correctly vectorizing these java APIs before, the patch has no real impact on final generated assembly code and, consequently, has no performance regression.
compiler/vectorization/TestPopCountVectorLong.java
on 128-bit sve platformThese Java APIs take a long type and produce an int type, like conversion nodes between different data sizes do. In superword, the alignment of their input nodes is different from their own. It results in that these APIs can't be vectorized when
-XX:MaxVectorSize=16
. So, the IR check for vector nodes incompiler/vectorization/TestPopCountVectorLong.java
would fail. To fix the issue of alignment, the patch corrects their related alignment, just like it did for conversion nodes between different data sizes. After the patch, these Java APIs can be vectorized on 128-bit platforms, as long as the auto-vectorization is profitable.numberOfTrailingZeros/numberOfLeadingZeros()
in aarch64 platforms with more than 128 bitsAlthough
Long.NumberOfLeadingZeros/NumberOfTrailingZeros()
can be vectorized on sve platforms when-XX:MaxVectorSize=32
or-XX:MaxVectorSize=64
even before the patch, aarch64 backend didn't provide special vector implementation for Java API and thus the generated code is not correct, like:It causes a runtime failure of the testcase
compiler/vectorization/TestNumberOfContinuousZeros.java
added in the patch. After the refactoring, the testcase can pass and the code is corrected:Before, on x86 avx2 platform, there is an assertion failure when C2 tries to vectorize the loops like:
X86 backend supports vectorizing
numberOfLeadingZeros()
on avx2 platform, but it usesevpmovqd()
to do casting forCountLeadingZerosV
[3], which can only be used whenUseAVX > 2
[4]. After the refactoring, the failure can be fixed naturally.Tier 1~3 passed with no new failures on Linux AArch64/X86 platform.
[1] https://docs.oracle.com/en/java/javase/19/docs/api/java.base/java/lang/Long.html#bitCount(long)
https://docs.oracle.com/en/java/javase/19/docs/api/java.base/java/lang/Long.html#numberOfTrailingZeros(long)
https://docs.oracle.com/en/java/javase/19/docs/api/java.base/java/lang/Long.html#numberOfLeadingZeros(long)
[2]
jdk/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/LongVector.java
Line 687 in 544e317
[3]
jdk/src/hotspot/cpu/x86/x86.ad
Line 9418 in 544e317
[4]
jdk/src/hotspot/cpu/x86/assembler_x86.cpp
Line 2239 in fc61658
Progress
Issue
Long.bitCount/numberOfTrailingZeros/numberOfLeadingZeros()
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/11405/head:pull/11405
$ git checkout pull/11405
Update a local copy of the PR:
$ git checkout pull/11405
$ git pull https://git.openjdk.org/jdk pull/11405/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 11405
View PR using the GUI difftool:
$ git pr show -t 11405
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/11405.diff