8290249: Vectorize signum on AArch64 #9807
This patch auto-vectorizes Math.signum intrinsic for float and double types on aarch64 (Neon and SVE). On SVE supporting machines, if the MaxVectorSize <= 16 the Neon code would be emitted and if the MaxVectorSize > 16, the SVE code for the intrinsic would be emitted.

Following is the performance data for the micro test here - test/micro/org/openjdk/bench/vm/compiler/VectorSignum.java

Benchmark                   Size     A      B      C
VectorSignum.doubleSignum    256   1.79   1.70   3.18
VectorSignum.doubleSignum    512   1.86   1.73   3.69
VectorSignum.doubleSignum   1024   1.89   1.74   2.98
VectorSignum.doubleSignum   2048   1.92   1.75   3.04
VectorSignum.floatSignum     256   3.34   3.06   3.92
VectorSignum.floatSignum     512   3.63   3.22   5.27
VectorSignum.floatSignum    1024   3.76   3.35   4.77
VectorSignum.floatSignum    2048   3.85   3.47   5.59

A, B, C machine descriptions given below -
A : 128-bit Neon machine
B : 256-bit SVE machine
C : 512-bit SVE machine

The numbers in the table are the gain ratios between the runtime (ns/op) of the scalar, non-vectorized intrinsic code and the vectorized version of the intrinsic (this patch).
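The selection rule and the gain ratio described above can be sketched in a few lines of C++. This is only an illustration under stated assumptions: `Backend`, `backend_for` and `gain_ratio` are hypothetical names that do not appear in the patch, MaxVectorSize is taken to be in bytes, and the ratio is read as scalar time divided by vectorized time, so that values above 1 mean a speedup.

```cpp
// Hypothetical names, for illustration only; not taken from the patch.
enum class Backend { Neon, Sve };

// Mirrors the rule in the description: on an SVE-capable machine,
// MaxVectorSize <= 16 selects the Neon code and MaxVectorSize > 16
// selects the SVE code. A machine without SVE can only use Neon.
Backend backend_for(bool supports_sve, int max_vector_size_bytes) {
  if (!supports_sve) {
    return Backend::Neon;
  }
  return max_vector_size_bytes <= 16 ? Backend::Neon : Backend::Sve;
}

// Assumed reading of "gain ratio": scalar ns/op divided by vectorized ns/op.
double gain_ratio(double scalar_ns_per_op, double vector_ns_per_op) {
  return scalar_ns_per_op / vector_ns_per_op;
}
```

Under this reading, a table entry of 3.18 would mean the scalar loop takes 3.18 times as long per operation as the vectorized one.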
👋 Welcome back Bhavana-Kilambi! A progress list of the required criteria for merging this PR into |
@Bhavana-Kilambi The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
Please do not commit this until 9346 is in.
switch (T) {
case S:
sve_and(vtmp, T, 0x80000000); // Extract the sign bit of float value in every lane of src
sve_orr(vtmp, T, 0x3f800000); // OR it with +1 to make the final result +1 or -1 depending
- sve_orr(vtmp, T, 0x3f800000); // OR it with +1 to make the final result +1 or -1 depending
+ sve_orr(vtmp, T, jlong_cast(1.0)); // OR it with +1 to make the final result +1 or -1 depending
...everywhere
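The two immediate operations in the snippet under review can be illustrated per lane in plain C++. This is a minimal scalar sketch, not the patch's code: `float_bits`, `bits_float` and `signum_lane` are hypothetical names, and the final step that passes zeros and NaN through unchanged is assumed from the documented behaviour of Math.signum rather than shown in the excerpt. The sketch also shows why the hard-coded 0x3f800000 can be derived from 1.0, as the review suggests: that constant is the IEEE 754 single-precision bit pattern of 1.0f.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Reinterpret a float as its raw 32-bit pattern and back (hypothetical helpers).
uint32_t float_bits(float f) {
  uint32_t b;
  std::memcpy(&b, &f, sizeof b);
  return b;
}

float bits_float(uint32_t b) {
  float f;
  std::memcpy(&f, &b, sizeof f);
  return f;
}

// Scalar model of one float lane.
float signum_lane(float src) {
  const uint32_t sign_mask = 0x80000000u;      // sign bit of a float
  const uint32_t one_bits  = float_bits(1.0f); // same value as 0x3f800000

  uint32_t tmp = float_bits(src);
  tmp &= sign_mask; // extract the sign bit of src
  tmp |= one_bits;  // OR with +1.0f, giving +1.0f or -1.0f

  // Assumed selection step (not in the excerpt): Math.signum returns
  // its argument unchanged for +0.0, -0.0 and NaN.
  if (src == 0.0f || std::isnan(src)) {
    return src;
  }
  return bits_float(tmp);
}
```

Deriving the constant from 1.0f keeps the intent visible at the call site and avoids a mistyped hex literal.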
@@ -3193,6 +3194,18 @@ void mvnw(Register Rd, Register Rm,
INSN(sve_mls, 0b00000100, 0, 0b011); // multiply-subtract, writing addend: Zda = Zda + -Zn*Zm
#undef INSN

// SVE floating-point compare abs (predicated)
This should be handled by the "SVE Integer/Floating-Point Compare - Vectors" code.
Hello, thank you for reviewing my patch. I have made the changes as suggested and am waiting for the refactoring patch to be merged. I will then change my *ad files accordingly and post another patch for review in this PR.
The change to assembler.hpp is still not done.
Hi, I just pushed a new commit with the proposed changes (and a few others). Please review. Once the refactoring patch is merged, I will rebase/merge this patch accordingly. Thank you.
} \
int cond_op; \
switch(cond) { \
case EQ: cond_op = (op2 << 2) | 0b10; break; \
case NE: cond_op = (op2 << 2) | 0b11; break; \
case GE: cond_op = (op2 << 2) | 0b00; break; \
case GT: cond_op = (op2 << 2) | 0b01; break; \
case GE: cond_op = (op2 << 2) | ((op2 == 0b11) ? 0b01 : 0b00); break; \
would something like this be easier to understand?
bool is_absolute = op2 == 0b11;
....
case GE: cond_op = (op2 << 2) | (is_absolute ? 0b01 : 0b00); break; \
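The naming suggestion above can be tried out as a standalone function. This is a hedged sketch rather than the assembler's actual macro: `Cond`, `kAbsoluteOp2` and `cond_op_for` are hypothetical names, the EQ, NE and GT encodings are copied from the lines visible in the diff, and the GT encoding for the absolute compare is not shown in the excerpt, so the sketch refuses that case instead of guessing it.

```cpp
#include <cassert>

// Hypothetical stand-ins for the assembler's condition codes.
enum Cond { EQ, NE, GE, GT };

// Per the review comment, op2 == 0b11 marks the absolute-compare form.
constexpr int kAbsoluteOp2 = 0b11;

int cond_op_for(Cond cond, int op2) {
  const bool is_absolute = (op2 == kAbsoluteOp2);
  switch (cond) {
    case EQ: return (op2 << 2) | 0b10;
    case NE: return (op2 << 2) | 0b11;
    case GE: return (op2 << 2) | (is_absolute ? 0b01 : 0b00);
    case GT:
      // Only the non-absolute GT encoding appears in the excerpt.
      assert(!is_absolute && "absolute GT encoding is not shown in the excerpt");
      return (op2 << 2) | 0b01;
  }
  return -1; // unreachable for the four conditions above
}
```

Naming the test once keeps each case line readable, which is the point of the reviewer's question.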
} else { \
assert(T != B && T != Q, "invalid size"); \
assert(cond != HI && cond != HS, "invalid condition for fcm"); \
assert(T != Q, "invalid size"); \
Thank you for reviewing. Could you please clarify what exactly you mean by "Please wrap all of this in #ifdef ASSERT"? Do you mean squashing the if conditions with the asserts? The assert macro calls are already inside a "#define".
The builds on aarch64 have failed as I missed adding parentheses in the assembler.hpp file. Will update with the new patch shortly.
@Bhavana-Kilambi This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 19 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@theRealAph, @nick-arm) but any other Committer may sponsor as well. ➡️ To flag this PR as ready for integration with the above commit message, type |
/integrate
@Bhavana-Kilambi
/sponsor
Going to push as commit 07c7977.
Your commit was automatically rebased without conflicts.
@nick-arm @Bhavana-Kilambi Pushed as commit 07c7977. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/9807/head:pull/9807
$ git checkout pull/9807
Update a local copy of the PR:
$ git checkout pull/9807
$ git pull https://git.openjdk.org/jdk pull/9807/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 9807
View PR using the GUI difftool:
$ git pr show -t 9807
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/9807.diff