8308340: C2: Idealize Fma nodes #14576
Some platforms, like aarch64, ppc, and riscv, support fusing `Math.fma(-a, b, c)` or `Math.fma(a, -b, c)` by generating partially symmetric match rules like:

```
match(Set dst (FmaF src3 (Binary (NegF src1) src2)));
match(Set dst (FmaF src3 (Binary src1 (NegF src2))));
```

Since `Fma` is partially commutative, the patch converts `Math.fma(-a, b, c)` to `Math.fma(b, -a, c)` in the GVN phase, making node patterns canonical. Then we can remove the redundant rules.

Also, we should guarantee that C2 generates `Fma` nodes only on platforms supporting `Fma` instructions before the matcher, so we can remove `predicate(UseFMA)` from all `Fma` rules.

After the patch, the code size of libjvm.so on the aarch64 platform decreased by 63.4k.

The patch passed all tier 1 - 3 tests on aarch64 and x86 platforms.
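The "partially commutative" property the description relies on can be checked from plain Java. The following is a minimal sketch, not part of the patch: the class name `FmaCommuteDemo` and the sample values are illustrative assumptions. It only shows that swapping the two multiplicands of `Math.fma` does not change the result, which is what makes the canonicalization safe for scalar nodes.

```java
public class FmaCommuteDemo {
    // True when both operand orders give the same float (NaNs compare equal
    // because floatToIntBits collapses them to one canonical bit pattern).
    static boolean sameResult(float a, float b, float c) {
        float negatedFirst = Math.fma(-a, b, c);   // shape "(-a)*b+c"
        float negatedSecond = Math.fma(b, -a, c);  // canonical shape "b*(-a)+c"
        return Float.floatToIntBits(negatedFirst) == Float.floatToIntBits(negatedSecond);
    }

    public static void main(String[] args) {
        // Illustrative sample values, including zeros, tiny/huge values and infinity.
        float[] samples = {0.0f, -0.0f, 1.5f, -3.25f, 1.0e-30f, 3.0e38f, Float.POSITIVE_INFINITY};
        boolean allEqual = true;
        for (float a : samples) {
            for (float b : samples) {
                for (float c : samples) {
                    allEqual &= sameResult(a, b, c);
                }
            }
        }
        System.out.println("swap preserves result: " + allEqual);
    }
}
```

The third operand (the addend) is not interchangeable with the other two, which is why the description says "partially" commutative.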
The check for `UseFMA` should be moved from `c2compiler.cpp` to `Matcher::match_rule_supported` in the `.ad` files.
I see we have such a check for Fma vectors in `x86.ad` but not for scalars.
Similar issues exist for other platforms.
@vnkozlov Thanks for your review! I updated it in the latest commit.
Looks good to me.
You need second review.
Thanks for your review @vnkozlov. I would appreciate it very much if some expert on ppc or riscv could help review it! Perhaps @RealFYang @reinrich
Can I get a second review, please? Thanks.
@@ -1875,6 +1875,17 @@ Node* VectorLongToMaskNode::Ideal(PhaseGVN* phase, bool can_reshape) {
   return nullptr;
 }

+Node* FmaVNode::Ideal(PhaseGVN* phase, bool can_reshape) {
+  // We canonicalize the node by converting "(-a)*b+c" into "b*(-a)+c"
Could you explain a little bit more please?
Thanks for your review!
For Vector API masked operations, like `av.neg().lanewise(VectorOperators.FMA, bv, cv, mask)`, the inactive lanes of the output should keep the first input of the node, so the inactive lanes of the output should be equal to the lane values in `av.neg()`. If we exchange the inputs, the inactive lanes will be equal to `bv`, which is incorrect. So we shouldn't swap edges for masked nodes. The newly added test cases in `jdk/test/hotspot/jtreg/compiler/vectorapi/VectorFusedMultiplyAddSubTest.java` cover this. Fortunately, there is no such constraint for non-masked vector nodes.
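To make the masked-lane argument concrete, here is a small sketch that models the behaviour with plain arrays instead of the Vector API (so it runs without incubator modules). The class `MaskedFmaModel`, its helper methods and the sample values are assumptions made for illustration; the only rule taken from the explanation above is that inactive lanes keep the first input.

```java
import java.util.Arrays;

public class MaskedFmaModel {
    // Plain-array model of a masked lanewise FMA: active lanes compute
    // fma(first, second, third); inactive lanes keep the first input.
    static float[] maskedFma(float[] first, float[] second, float[] third, boolean[] mask) {
        float[] out = new float[first.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = mask[i] ? Math.fma(first[i], second[i], third[i]) : first[i];
        }
        return out;
    }

    // Lane-wise negation, standing in for av.neg().
    static float[] negate(float[] v) {
        float[] out = new float[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = -v[i];
        }
        return out;
    }

    public static void main(String[] args) {
        float[] av = {1f, 2f, 3f, 4f};
        float[] bv = {10f, 20f, 30f, 40f};
        float[] cv = {0.5f, 0.5f, 0.5f, 0.5f};
        boolean[] mask = {true, false, true, false};

        // Original operand order: inactive lanes hold -a.
        System.out.println(Arrays.toString(maskedFma(negate(av), bv, cv, mask)));
        // Swapped operand order: inactive lanes now hold b, which differs.
        System.out.println(Arrays.toString(maskedFma(bv, negate(av), cv, mask)));
    }
}
```

With these inputs the active lanes (0 and 2) agree in both orders, while the inactive lanes (1 and 3) hold `-2.0` and `-4.0` in the original order but `20.0` and `40.0` after the swap.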
Thanks for the explanation. I think I understood it to some degree.
What happens with the subgraphs that are not canonicalized? They will have extra vector operations, right?
Yes. For `av.neg().lanewise(VectorOperators.FMA, bv, cv, mask)`, the subgraph is like:
`match (Set dst (FmaV (Binary (NegV src1) src2) (Binary src3 pg)));`
Almost no platform supports fusing it directly, so it should be split into two vector operations: `NegV` + `FmaV`. I suppose the `NegV` is what you called "the extra vector operation", right?
Yes, that's what I meant. Thanks. Now on PPC, my understanding would be that with the symmetrical match rules (removed with this PR) the `NegV` wouldn't be generated. Is my understanding correct?
Thanks. How are `FmaV` nodes with mask handled then? Are they transformed into equivalent nodes without mask?
Actually, there is no handling of `FmaV` nodes with mask in this patch, whether in the C2 mid-end or the codegen backend. The GVN transformation just skips them. And I suppose `FmaV` nodes with mask can't be transformed into nodes without mask, unless C2 can guarantee that the mask is all true (this transformation is not supported by current C2). Thanks.
I guess I would have to dig deeper into the vector api implementation to really understand how it works.
Thanks for your patience.
@fg1417 I only understood the comment with the help of your explanations in this thread. I think you should improve the comment. I would not mention the vectorapi. We may generate `FmaV` through an auto-vectorizer. Though I guess that is unlikely, since the scalar version `Fma::Ideal` would already reshape things.
Suggestion:
// We canonicalize the node by converting "(-a)*b+c" into "b*(-a)+c"
// This reduces the number of rules in the matcher, as we only need to check
// for negations on the second argument, and not the symmetric case where
// the first argument is negated.
// We cannot do this if the FmaV is masked. The inactive lanes have to return
// the first input (ie "-a"). If we were to swap the inputs, the inactive lanes would
// incorrectly return "b".
Thanks! Done.
I added the patch to our nightly testing about 10 days ago and forgot about it (sorry).
So it passed several iterations of tier1-4 of hotspot and jdk, all of langtools and jaxp, and the renaissance benchmarks as functional tests. All testing was done with fastdebug and release builds on the main platforms and also on Linux/PPC64le.
The PPC changes do look good to me. I'm not the greatest C2 expert though, so I'd suggest getting another review.
Thanks a lot for your review and test work @reinrich! Can I get a review from @TobiHartmann @chhagedorn @eme64 for the C2 part? Thanks!
Hello, the RISC-V part looks fine from what this PR is supposed to do. And this has passed tier1-3 tests on the linux-riscv64 platform. Note that I didn't check the shared code changes.
@RealFYang Thanks for your review and test work!
@fg1417 This looks like a reasonable refactoring.
We should probably verify that the canonicalization happened if the normal `fma` matcher rule is chosen. We should add asserts that the first argument is not a negation (you could check the second argument also, just in case). What do you think?
@@ -60,7 +60,7 @@ public class VectorFusedMultiplyAddSubTest {
     private static final VectorSpecies<Long> L_SPECIES = LongVector.SPECIES_MAX;
     private static final VectorSpecies<Short> S_SPECIES = ShortVector.SPECIES_MAX;

-    private static int LENGTH = 1024;
+    private static int LENGTH = 128;
What is the reason for the reduction? Speed?
Yes, it's for speeding up.
//=============================================================================
//------------------------------Ideal------------------------------------------
Node* FmaNode::Ideal(PhaseGVN* phase, bool can_reshape) {
  // We canonicalize the node by converting "(-a)*b+c" into "b*(-a)+c"
Add motivation to comment
// This reduces the number of rules in the matcher, as we only need to check
// for negations on the second argument, and not the symmetric case where
// the first argument is negated.
Thanks! Done.
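The canonicalization described in that comment can be sketched on a toy expression tree. This is a hypothetical miniature IR written in Java, not HotSpot's `Node` hierarchy or the code from the patch; the class names, the `ideal()` method and the guard that skips the swap when both multiplicands are negated are assumptions of this sketch (the guard keeps repeated application from swapping back and forth).

```java
public class FmaIdealModel {
    // Hypothetical miniature IR node; not HotSpot's Node class.
    static class Node {
        final String name;
        Node(String name) { this.name = name; }
        boolean isNeg() { return false; }
        @Override public String toString() { return name; }
    }

    // Negation of a single input.
    static final class Neg extends Node {
        final Node in;
        Neg(Node in) { super("neg"); this.in = in; }
        @Override boolean isNeg() { return true; }
        @Override public String toString() { return "(-" + in + ")"; }
    }

    // Fused multiply-add: in1 * in2 + in3.
    static final class Fma extends Node {
        Node in1;
        Node in2;
        final Node in3;
        Fma(Node in1, Node in2, Node in3) {
            super("fma");
            this.in1 = in1;
            this.in2 = in2;
            this.in3 = in3;
        }

        // Moves a negated first multiplicand into second position.
        // Returns true if the node was reshaped, false if it was already canonical.
        boolean ideal() {
            if (in1.isNeg() && !in2.isNeg()) {
                Node tmp = in1;
                in1 = in2;
                in2 = tmp;
                return true;
            }
            return false;
        }

        @Override public String toString() { return in1 + "*" + in2 + "+" + in3; }
    }

    public static void main(String[] args) {
        Fma fma = new Fma(new Neg(new Node("a")), new Node("b"), new Node("c"));
        System.out.println("before: " + fma);  // (-a)*b+c
        fma.ideal();
        System.out.println("after:  " + fma);  // b*(-a)+c
    }
}
```

After the swap, a backend in this model would only need a rule for a negation in second position, which is the motivation stated in the suggested comment.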
@@ -3972,7 +3973,6 @@ instruct fmaD_reg(regD a, regD b, regD c) %{

 // a * b + c
 instruct fmaF_reg(regF a, regF b, regF c) %{
-  predicate(UseFMA);
You could add an assert to the encoding code. Just to ensure that we do not generate bad code, even if it is never executed during testing.
Yes, thanks for your suggestion! Updated in the new commit.
Hi @eme64, thanks for your review! The check may be more complex than expected. Matcher can fuse two instructions into one, only when there is no other use for the inputs. It means that we can do the fusion for the case like:
But we can't fuse them for the case like:
For the second case, we still match normal |
@fg1417 fair enough. the checks may be too complex. I'll approve it now :)
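The single-use constraint mentioned in this exchange (the matcher can fold a negation into a fused instruction only when the negated value has no other use) can be pictured with a hypothetical pair of Java methods. These are illustrative assumptions, not the examples from the original comment, and whether C2 actually fuses either shape depends on the platform and its match rules.

```java
public class NegUseDemo {
    // The negated value feeds only the fma, so a matcher could in principle
    // fold the negation into a single fused negate-multiply-add instruction.
    static float singleUse(float a, float b, float c) {
        return Math.fma(a, -b, c);
    }

    // The negated value is also consumed by the addition, so the negation
    // still has to be computed on its own and the plain fma shape remains.
    static float sharedUse(float a, float b, float c) {
        float negB = -b;
        return Math.fma(a, negB, c) + negB;
    }

    public static void main(String[] args) {
        System.out.println(singleUse(2.0f, 3.0f, 1.0f));
        System.out.println(sharedUse(2.0f, 3.0f, 1.0f));
    }
}
```

In the second shape a negated second argument legitimately reaches the ordinary `fma` rule, which is why a simple "second argument is not a negation" assert in that rule would be too strict.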
Thanks a lot for all your kind reviews and test work, @RealFYang @vnkozlov @eme64 @reinrich. I'll integrate it.
/integrate
Going to push as commit 37c6b23.
Your commit was automatically rebased without conflicts.