-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8276673: Optimize abs operations in C2 compiler #6755
Conversation
👋 Welcome back fgao! A progress list of the required criteria for merging this PR into |
The patch aims to help optimize Math.abs() mainly from these three parts: 1) Remove redundant instructions for abs with constant values 2) Remove redundant instructions for abs with char type 3) Convert some common abs operations to ideal forms 1. Remove redundant instructions for abs with constant values If we can decide the value of the input node for function Math.abs() at compile-time, we can substitute the Abs node with the absolute value of the constant and don't have to calculate it at runtime. For example, int[] a for (int i = 0; i < SIZE; i++) { a[i] = Math.abs(-38); } Before the patch, the generated code for the testcase above is: ... mov w10, #0xffffffda cmp w10, wzr cneg w17, w10, lt dup v16.8h, w17 ... After the patch, the generated code for the testcase above is : ... movi v16.4s, #0x26 ... 2. Remove redundant instructions for abs with char type In Java semantics, as the char type is always non-negative, we could actually remove the absI node in the C2 middle end. As for vectorization part, in current SLP, the vectorization of Math.abs() with char type is intentionally disabled after JDK-8261022 because it generates incorrect result before. After removing the AbsI node in the middle end, Math.abs(char) can be vectorized naturally. For example, char[] a; char[] b; for (int i = 0; i < SIZE; i++) { b[i] = (char) Math.abs(a[i]); } Before the patch, the generated assembly code for the testcase above is: B15: add x13, x21, w20, sxtw openjdk#1 ldrh w11, [x13, openjdk#16] cmp w11, wzr cneg w10, w11, lt strh w10, [x13, openjdk#16] ldrh w10, [x13, openjdk#18] cmp w10, wzr cneg w10, w10, lt strh w10, [x13, openjdk#18] ... add w20, w20, #0x1 cmp w20, w17 b.lt B15 After the patch, the generated assembly code is: B15: sbfiz x18, x19, openjdk#1, openjdk#32 add x0, x14, x18 ldr q16, [x0, openjdk#16] add x18, x21, x18 str q16, [x18, openjdk#16] ldr q16, [x0, openjdk#32] str q16, [x18, openjdk#32] ... add w19, w19, #0x40 cmp w19, w17 b.lt B15 3. Convert some common abs operations to ideal forms The patch overrides some virtual support functions for AbsNode so that optimization of gvn can work on it. Here are the optimizable forms: a) abs(0 - x) => abs(x) Before the patch: ... ldr w13, [x13, openjdk#16] neg w13, w13 cmp w13, wzr cneg w14, w13, lt ... After the patch: ... ldr w13, [x13, openjdk#16] cmp w13, wzr cneg w13, w13, lt ... b) abs(abs(x)) => abs(x) Before the patch: ... ldr w12, [x12, openjdk#16] cmp w12, wzr cneg w12, w12, lt cmp w12, wzr cneg w12, w12, lt ... After the patch: ... ldr w13, [x13, openjdk#16] cmp w13, wzr cneg w13, w13, lt ... Change-Id: I5434c01a225796caaf07ffbb19983f4fe2e206bd
Change-Id: I5e3898054b75f49653b8c3b37e4f5007675fa963
The PR optimizes abs operations in the C2 middle end. Can I have your review please? |
So what's the performance data before and after this patch? It would be better to provide a jmh micro benchmark. |
Change-Id: I71987594e9288a489a04de696e69a62f4ad19357
Change-Id: I64938d543126c2e3f9fad8ffc4a50e25e4473d8f
Thanks, @DamonFool . Yes, it's supposed to benefit all archs. Before the patch: After the patch: The jmh micro benchmark testcase has been added in the latest commit. |
Hi @fg1417 , Thanks for your update. Now I see that you are trying to optimize the following three abs() patterns:
But did you see these code patterns in real programs? |
src/hotspot/share/opto/subnode.cpp
Outdated
if (ti->is_con()) { | ||
// Special case for min_jint: Math.abs(min_jint) = min_jint. | ||
// Do not use C++ abs() for min_jint to avoid undefined behavior. | ||
return (ti->is_con(min_jint)) ? TypeInt::MIN : TypeInt::make(abs(ti->get_con())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return (ti->is_con(min_jint)) ? TypeInt::MIN : TypeInt::make(abs(ti->get_con())); | |
return TypeInt::make(uabs(ti->get_con()); |
We have uabs() for julong and unsigned int.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your review. Fixed.
Thanks for your review, @DamonFool . I really understand your concern. In terms of complexity, the change only involves AbsNode and doesn’t modify any other code part. I don’t think it will make C2 more complex. As for performance gain in the real world, as we all know, the ability of GVN to optimize a node often depends on the optimized result of its input nodes. For example, if the input node of one AbsNode is recognized as a constant after last round of GVN optimization, now, we can optimize abs(constant) to a simple constant value. Like C2 did in jdk/src/hotspot/share/opto/subnode.cpp Line 55 in 0bddd8a
Math.abs(-38) , (char) Math.abs((char) c) and Math.abs(0 - x) are just conformance testcases. As you said, maybe nobody writes these cases in the real world. These testcases are just simulating all possible scenarios that AbsNode may meet, to guarantee the correctness of the optimization. What do you think :) Thanks. |
Then, shall we also opt cases like |
Hi, @DamonFool . Actually, the cases you listed above, In C2, jdk/src/hotspot/share/opto/mulnode.cpp Line 50 in 0bddd8a
1*x to x. After that, 0 – x matched the pattern that AbsNode can recognize. Math.abs(x / (-1)) , too. As I mentioned before, the AbsNode optimization doesn’t work as a standalone pass and it’s combined very closely to its input nodes. Instruction sequence changes as GVN repeats and our ideal pattern may occur.
Your cases also help prove that these several patterns, like abs(0-x) and abs(positive_value), are very fundamental and common. That’s why we choose them. Thanks. |
I don't think so. How about improving your micro-bechmark to show the performance gain? |
Change-Id: Ie6f37ab159fb7092e1443b9af8d620562a45ae47
I discussed this opt with @theRealAph offline. To clarify from my point of view:
Thanks. |
Hi, @DamonFool I ran the jtreg test internally with some logging info to verify if the optimization works in real java program. The results shows that these patterns are hit in the following cases: • java/lang/StackWalker/LocalsAndOperands.java#id0 It’s not easy to identify these patterns from original java code by our eyes. Since the added code lines are hit, the patterns must occur after many rounds of optimization. Definitely, it benefits all platforms, whether x86 or aarch64. As for the current benchmark, it’s not to show the real performance gain but to illustrate that the opto benefits x86 as well in case you wonder. If you need a real java program, the case won’t be light-weight or straightforward. Maybe I can’t provide you with a satisfying micro benchmark. Thanks. |
Good news! But can you show us an example with more detailed analysis which pattern is applied in the test? |
public class TestAbs { | ||
private static int SIZE = 500; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Thanks.
Asserts.assertEquals(Long.MAX_VALUE, Math.abs(-Long.MAX_VALUE)); | ||
|
||
// Test abs(constant) optimization for float | ||
Asserts.assertEquals(Float.NaN, Math.abs(Float.NaN)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest something like:
assertTrue(Float.isNaN(Math.abs(Float.NaN)))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Thanks.
Asserts.assertEquals(Double.MIN_VALUE, Math.abs(-Double.MIN_VALUE)); | ||
} | ||
|
||
private static void testAbsTransformInt(int[] a) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to verify C2's transformation, probably we should use C2's IR test framework.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Thanks.
Hi, @DamonFool For example, I learnt the optimization technique from the patch of my colleague, #2776 (comment) Thanks. |
Very good! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me otherwise (expect the points about the test that @DamonFool already raised).
src/hotspot/share/opto/subnode.cpp
Outdated
set_req(1, in1->in(2)); | ||
PhaseIterGVN* igvn = phase->is_IterGVN(); | ||
if (igvn) { | ||
igvn->_worklist.push(in1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is that needed? Because in1
could become dead? You should use set_req_X
above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Thanks.
Change-Id: I8220d54a443a39e04353688143db4b61428be2ad
Change-Id: Ia2372ed06fcc7c88285461a1b013898d9327c18e
Thanks for your review, @DamonFool @TobiHartmann . I fixed all your points mentioned above. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
@fg1417 This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 25 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@TobiHartmann, @DamonFool) but any other Committer may sponsor as well. ➡️ To flag this PR as ready for integration with the above commit message, type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks for the update.
/integrate |
/sponsor |
Going to push as commit c619666.
Your commit was automatically rebased without conflicts. |
@DamonFool @fg1417 Pushed as commit c619666. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
The patch aims to help optimize Math.abs() mainly from these three parts:
If we can decide the value of the input node for function Math.abs()
at compile-time, we can substitute the Abs node with the absolute
value of the constant and don't have to calculate it at runtime.
For example,
int[] a
for (int i = 0; i < SIZE; i++) {
a[i] = Math.abs(-38);
}
Before the patch, the generated code for the testcase above is:
...
mov w10, #0xffffffda
cmp w10, wzr
cneg w17, w10, lt
dup v16.8h, w17
...
After the patch, the generated code for the testcase above is :
...
movi v16.4s, #0x26
...
In Java semantics, as the char type is always non-negative, we
could actually remove the absI node in the C2 middle end.
As for vectorization part, in current SLP, the vectorization of
Math.abs() with char type is intentionally disabled after
JDK-8261022 because it generates incorrect result before. After
removing the AbsI node in the middle end, Math.abs(char) can be
vectorized naturally.
For example,
char[] a;
char[] b;
for (int i = 0; i < SIZE; i++) {
b[i] = (char) Math.abs(a[i]);
}
Before the patch, the generated assembly code for the testcase
above is:
B15:
add x13, x21, w20, sxtw #1
ldrh w11, [x13, #16]
cmp w11, wzr
cneg w10, w11, lt
strh w10, [x13, #16]
ldrh w10, [x13, #18]
cmp w10, wzr
cneg w10, w10, lt
strh w10, [x13, #18]
...
add w20, w20, #0x1
cmp w20, w17
b.lt B15
After the patch, the generated assembly code is:
B15:
sbfiz x18, x19, #1, #32
add x0, x14, x18
ldr q16, [x0, #16]
add x18, x21, x18
str q16, [x18, #16]
ldr q16, [x0, #32]
str q16, [x18, #32]
...
add w19, w19, #0x40
cmp w19, w17
b.lt B15
The patch overrides some virtual support functions for AbsNode
so that optimization of gvn can work on it. Here are the optimizable
forms:
a) abs(0 - x) => abs(x)
Before the patch:
...
ldr w13, [x13, #16]
neg w13, w13
cmp w13, wzr
cneg w14, w13, lt
...
After the patch:
...
ldr w13, [x13, #16]
cmp w13, wzr
cneg w13, w13, lt
...
b) abs(abs(x)) => abs(x)
Before the patch:
...
ldr w12, [x12, #16]
cmp w12, wzr
cneg w12, w12, lt
cmp w12, wzr
cneg w12, w12, lt
...
After the patch:
...
ldr w13, [x13, #16]
cmp w13, wzr
cneg w13, w13, lt
...
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/6755/head:pull/6755
$ git checkout pull/6755
Update a local copy of the PR:
$ git checkout pull/6755
$ git pull https://git.openjdk.java.net/jdk pull/6755/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 6755
View PR using the GUI difftool:
$ git pr show -t 6755
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/6755.diff