8308340: C2: Idealize Fma nodes by fg1417 · Pull Request #14576 · openjdk/jdk

fg1417 · 2023-06-21T03:26:38Z

Some platforms, like aarch64, ppc, and riscv, support fusing Math.fma(-a, b, c) or Math.fma(a, -b, c) by generating partially symmetric match rules like:

  match(Set dst (FmaF src3 (Binary (NegF src1) src2)));
  match(Set dst (FmaF src3 (Binary src1 (NegF src2))));

Since Fma is partially commutative, the patch is to convert Math.fma(-a, b, c) to Math.fma(b, -a, c) in gvn phase, making node patterns canonical. Then we can remove redundant rules.

Also, we should guarantee that C2 generates Fma nodes only on platforms supporting Fma instructions before matcher, so we can remove all predicate(UseFMA) for all Fma rules.

After the patch, the code size of libjvm.so on aarch64 platform decreased by 63.4k.

The patch passed all tier 1 - 3 on aarch64 and x86 platforms.

Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Commit message must refer to an issue

Issue

JDK-8308340: C2: Idealize Fma nodes (Enhancement - P4)

Reviewers

Vladimir Kozlov (@vnkozlov - Reviewer) ⚠️ Review applies to 06162d88
Emanuel Peter (@eme64 - Reviewer)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/14576/head:pull/14576
$ git checkout pull/14576

Update a local copy of the PR:
$ git checkout pull/14576
$ git pull https://git.openjdk.org/jdk.git pull/14576/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 14576

View PR using the GUI difftool:
$ git pr show -t 14576

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/14576.diff

Webrev

Link to Webrev Comment

Some platforms, like aarch64, ppc, and riscv, support fusing `Math.fma(-a, b, c)` or `Math.fma(a, -b, c)` by generating partially symmetric match rules like: ``` match(Set dst (FmaF src3 (Binary (NegF src1) src2))); match(Set dst (FmaF src3 (Binary src1 (NegF src2)))); ``` Since `Fma` is partially communitive, the patch is to convert `Math.fma(-a, b, c)` to `Math.fma(b, -a, c)` in gvn phase, making node patterns canonical. Then we can remove redundant rules. Also, we should guarantee that C2 generates `Fma` nodes only on platforms supporting `Fma` instructions before matcher, so we can remove all `predicate(UseFMA)` for all `Fma` rules. After the patch, the code size of libjvm.so on aarch64 platform decreased by 63.4k. The patch passed all tier 1 - 3 on aarch64 and x86 platforms.

bridgekeeper · 2023-06-21T03:27:50Z

👋 Welcome back fgao! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2023-06-21T03:29:35Z

@fg1417 The following label will be automatically applied to this pull request:

hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

mlbridge · 2023-06-21T03:33:22Z

Webrevs

vnkozlov

Check for UseFMA should be moved from c2compiler.cpp to Matcher::match_rule_supported in .ad files.
I see we have such check for Fma vectors in x86.ad but not for scalars.
Similar issue exist for other platforms.

…orted in .ad files

fg1417 · 2023-06-27T03:17:45Z

Check for UseFMA should be moved from c2compiler.cpp to Matcher::match_rule_supported in .ad files. I see we have such check for Fma vectors in x86.ad but not for scalars. Similar issue exist for other platforms.

@vnkozlov Thanks for your review! I updated it in the latest commit.

vnkozlov

Looks good to me.
You need second review.

openjdk · 2023-06-27T16:08:43Z

@fg1417 This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8308340: C2: Idealize Fma nodes

Reviewed-by: kvn, epeter

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 11 new commits pushed to the master branch:

207bd00: 8313756: [BACKOUT] 8308682: Enhance AES performance
823f5b9: 8308850: Change JVM options with small ranges that get -Wconversion warnings to 32 bits
5bfb82e: 8314119: G1: Fix -Wconversion warnings in G1CardSetInlinePtr::card_pos_for
06aa3c5: 8314118: Update JMH devkit to 1.37
4164693: 8313372: [JVMCI] Export vmIntrinsics::is_intrinsic_available results to JVMCI compilers.
049b55f: 8314019: Add gc logging to jdk/jfr/event/gc/detailed/TestZAllocationStallEvent.java
a39ed10: 8314116: C2: assert(false) failed: malformed control flow after JDK-8305636
1de5bf1: 8314106: C2: assert(is_valid()) failed: must be valid after JDK-8305636
5c91622: 8314117: RISC-V: Incorrect VMReg encoding in RISCV64Frame.java
6bbcef5: 8313948: Remove unnecessary static fields defaultUpper/defaultLower in sun.net.PortConfig
... and 1 more: https://git.openjdk.org/jdk/compare/ec0cc6300a02dd92b25d9072b8b3859dab583bbd...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

fg1417 · 2023-06-28T08:18:34Z

Thanks for your review @vnkozlov .

I would appreciate it very much if some expert on ppc or riscv could help review it! Perhaps @RealFYang @reinrich

fg1417 · 2023-07-05T06:36:13Z

Can I get a second review, please? Thanks.

reinrich · 2023-07-17T14:28:16Z

src/hotspot/share/opto/vectornode.cpp

 }

+Node* FmaVNode::Ideal(PhaseGVN* phase, bool can_reshape) {
+  // We canonicalize the node by converting "(-a)*b+c" into "b*(-a)+c"


Could you explain a little bit more please?

Thanks for your review!

For vectorapi masked operations, like av.neg().lanewise(VectorOperators.FMA, bv, cv, mask), the inactive lanes of the output should save the first input of the node, so the inactive lanes of the output should be equal to lane values in av.neg(). If we exchange the inputs, the inactive lanes will be equal to bv, which is incorrect. So we shouldn't swap edges for masked nodes. The newly added testcases in jdk/test/hotspot/jtreg/compiler/vectorapi/VectorFusedMultiplyAddSubTest.java can cover this. Fortunately, there is no such constraint for non-masked vector nodes.

Thanks for the explanation. I think I understood it to some degree.
What happens with the subgraphs that are not canonicalized? They will have extra vector operations, right?

Yes. For av.neg().lanewise(VectorOperators.FMA, bv, cv, mask), the subgraph is like:
match (Set dst (FmaV (Binary (NegV src1) src2) (Binary src3 pg)));, almost no platform supports fusing it directly, so it should be split into two vector operations: NegV + FmaV. I suppose the NegV is what you called as "the extra vector operation", right?

Yes that's what I meant. Thanks. Now on PPC, my understanding would be that with the symmetrical match-rules (removed with this pr) the NegV wouldn't be generated. Is my understanding correct?

Thanks. How are FmaV nodes with mask handled then? Are they transformed into equivalent nodes without mask?

Actually, there is no handling on FmaV nodes with mask in this patch, whether in the C2 mid-end or codegen backend. The gvn transformation just skips them. And I suppose FmaV nodes with mask can't be transformed into nodes without mask, except that C2 can guarantee that the mask is all true (this transformation has not been supported by current C2). Thanks.

I guess I would have to dig deeper into the vector api implementation to really understand how it works.
Thanks for your patience.

@fg1417 I only understood the comment with the help of your explanations in this thread. I think you should improve the comment. I would not mention the vectorapi. We may generate FmaV through an auto-vectorizer. Though I guess that is unlikely, since the scalar version Fma::Ideal would already reshape things.

Suggestion:

// We canonicalize the node by converting "(-a)*b+c" into "b*(-a)+c" // This reduces the number of rules in the matcher, as we only need to check // for negations on the second argument, and not the symmetric case where // the first argument is negated. // We cannot do this if he FmaV is masked. the inactive lanes have to return // the first input (ie "-a"). If we were to swap the inputs, the inactive lanes would // incorrectly return "b".

Thanks! Done.

reinrich

I've added the patch to our nightly testing about 10 days ago and forgot about it (sorry).
So it passed several iterations of tier1-4 of hotspot and jdk, all of langtools and jaxp, renaissance benchmarks as functional tests. All testing with fastdebug and release builds on the main platforms and also on Linux/PPC64le.

PPC changes do look good to me. I'm not the greatest C2 expert though. So I'd suggest to get another review.

fg1417 · 2023-07-18T01:39:41Z

I've added the patch to our nightly testing about 10 days ago and forgot about it (sorry). So it passed several iterations of tier1-4 of hotspot and jdk, all of langtools and jaxp, renaissance benchmarks as functional tests. All testing with fastdebug and release builds on the main platforms and also on Linux/PPC64le.

PPC changes do look good to me. I'm not the greatest C2 expert though. So I'd suggest to get another review.

Thanks a lot for your review and test work @reinrich!

Can I get a review from @TobiHartmann @chhagedorn @eme64 for C2 part? Thanks!

RealFYang · 2023-07-20T09:42:36Z

Thanks for your review @vnkozlov .

I would appreciate it very much if some expert on ppc or riscv could help review it! Perhaps @RealFYang @reinrich

Hello, the RISC-V part looks fine from what this PR is supposed to do. And this has passed tier1-3 tests on linux-riscv64 platform. Note that I didn't check the shared code changes.

fg1417 · 2023-07-20T09:43:56Z

Hello, the RISC-V part looks fine from what this PR is supposed to do. And this has passed tier1-3 tests on linux-riscv64 platform. I didn't check the shared code changes.

@RealFYang Thanks for your review and test work!

eme64

@fg1417 This looks like a reasonable refactoring.

We should probably do the verification that the canonicalization happened, if the normal fma matcher rule is chosen. We should add asserts that the first argument is not a negation (you could check the second argument also, just in case). What do you think?

eme64 · 2023-08-10T12:50:17Z

test/hotspot/jtreg/compiler/vectorapi/VectorFusedMultiplyAddSubTest.java

    private static final VectorSpecies<Short> S_SPECIES = ShortVector.SPECIES_MAX;

-    private static int LENGTH = 1024;
+    private static int LENGTH = 128;


What is the reason for the reduction? Speed?

Yes, it's for speeding up.

eme64 · 2023-08-10T13:13:18Z

src/hotspot/share/opto/vectornode.cpp

 }

+Node* FmaVNode::Ideal(PhaseGVN* phase, bool can_reshape) {
+  // We canonicalize the node by converting "(-a)*b+c" into "b*(-a)+c"


@fg1417 I only understood the comment with the help of your explanations in this thread. I think you should improve the comment. I would not mention the vectorapi. We may generate FmaV through an auto-vectorizer. Though I guess that is unlikely, since the scalar version Fma::Ideal would already reshape things.

Suggestion:

// We canonicalize the node by converting "(-a)*b+c" into "b*(-a)+c" // This reduces the number of rules in the matcher, as we only need to check // for negations on the second argument, and not the symmetric case where // the first argument is negated. // We cannot do this if he FmaV is masked. the inactive lanes have to return // the first input (ie "-a"). If we were to swap the inputs, the inactive lanes would // incorrectly return "b".

eme64 · 2023-08-10T13:16:04Z

src/hotspot/share/opto/mulnode.cpp

+//=============================================================================
+//------------------------------Ideal------------------------------------------
+Node* FmaNode::Ideal(PhaseGVN* phase, bool can_reshape) {
+  // We canonicalize the node by converting "(-a)*b+c" into "b*(-a)+c"


Add motivation to comment

// This reduces the number of rules in the matcher, as we only need to check // for negations on the second argument, and not the symmetric case where // the first argument is negated.

Thanks! Done.

eme64 · 2023-08-10T13:19:06Z

src/hotspot/cpu/x86/x86.ad


 // a * b + c
 instruct fmaF_reg(regF a, regF b, regF c) %{
-  predicate(UseFMA);


You could add an assert to the encoding code. Just to ensure that we do not generate bad code, even if it is never executed during testing.

Yes, thanks for your suggestion! Updated in the new commit.

fg1417 · 2023-08-14T07:46:58Z

We should probably do the verification that the canonicalization happened, if the normal fma matcher rule is chosen. We should add asserts that the first argument is not a negation (you could check the second argument also, just in case). What do you think?

Hi @eme64, thanks for your review!

The check may be more complex than expected. Matcher can fuse two instructions into one, only when there is no other use for the inputs. It means that we can do the fusion for the case like:

return Math.fma(-a, b, c);

But we can't fuse them for the case like:

float tmp = -a;
return Math.fma(tmp, b, c) + (-a);

For the second case, we still match normal neg + fma rules separately, instead of these combined rules. So, we can't simply guarantee that the first argument is not a negation when the normal fma matcher rule is chosen. If considering the def-use while doing the instruction selection, the check may be complex. WDYT? Thanks.

eme64

@fg1417 fair enough. the checks may be too complex. I'll approve it now :)

fg1417 · 2023-08-14T14:15:35Z

Thanks a lot for all your kind reviews and test work, @RealFYang @vnkozlov @eme64 @reinrich.

I'll integrate it.

fg1417 · 2023-08-15T01:02:57Z

/integrate

openjdk · 2023-08-15T01:04:29Z

Going to push as commit 37c6b23.
Since your change was applied there have been 23 commits pushed to the master branch:

583cb75: 8313406: nep_invoker_blob can be simplified more
0074b48: 8312597: Convert TraceTypeProfile to UL
1f1c5c6: 8314241: Add test/jdk/sun/security/pkcs/pkcs7/SignerOrder.java to ProblemList
f142470: 8311981: Test gc/stringdedup/TestStringDeduplicationAgeThreshold.java#ZGenerational timed out
595fdd3: 8314059: Remove PKCS7.verify()
49b2984: 8313854: Some tests in serviceability area fail on localized Windows platform
c132176: 8114830: (fs) Files.copy fails due to interference from something else changing the file system
e56d3bc: 8313657: com.sun.jndi.ldap.Connection.cleanup does not close connections on SocketTimeoutErrors
4b2703a: 8313678: SymbolTable can leak Symbols during cleanup
f41c267: 8314045: ArithmeticException in GaloisCounterMode
... and 13 more: https://git.openjdk.org/jdk/compare/ec0cc6300a02dd92b25d9072b8b3859dab583bbd...master

Your commit was automatically rebased without conflicts.

openjdk · 2023-08-15T01:04:38Z

@fg1417 Pushed as commit 37c6b23.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

openjdk bot added the rfr Pull request is ready for review label Jun 21, 2023

openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Jun 21, 2023

vnkozlov reviewed Jun 21, 2023

View reviewed changes

Fei Gao added 2 commits June 27, 2023 03:03

Merge branch 'master' into fg8308340

972e345

Move check for UseFMA from c2compiler.cpp to Matcher::match_rule_supp…

a22814d

…orted in .ad files

vnkozlov approved these changes Jun 27, 2023

View reviewed changes

openjdk bot added the ready Pull request is ready to be integrated label Jun 27, 2023

Merge branch 'master' into fg8308340

06162d8

Merge branch 'master' into fg8308340

8d6d98e

reinrich reviewed Jul 17, 2023

View reviewed changes

Merge branch 'master' into fg8308340

33e7fb8

eme64 reviewed Aug 10, 2023

View reviewed changes

Fei Gao added 2 commits August 14, 2023 06:19

Merge branch 'master' into fg8308340

72a8002

Improve comments and add assertions

7b6c928

eme64 approved these changes Aug 14, 2023

View reviewed changes

openjdk bot added the integrated Pull request has been integrated label Aug 15, 2023

openjdk bot closed this Aug 15, 2023

openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Aug 15, 2023

Conversation

fg1417 commented Jun 21, 2023 • edited by openjdk bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Progress

Issue

Reviewers

Reviewing

Webrev

Uh oh!

bridgekeeper bot commented Jun 21, 2023

Uh oh!

openjdk bot commented Jun 21, 2023

Uh oh!

mlbridge bot commented Jun 21, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Webrevs

Uh oh!

vnkozlov left a comment

Choose a reason for hiding this comment

Uh oh!

fg1417 commented Jun 27, 2023

Uh oh!

vnkozlov left a comment

Choose a reason for hiding this comment

Uh oh!

openjdk bot commented Jun 27, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

fg1417 commented Jun 28, 2023

Uh oh!

fg1417 commented Jul 5, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

fg1417 Jul 19, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

reinrich left a comment

Choose a reason for hiding this comment

Uh oh!

fg1417 commented Jul 18, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

RealFYang commented Jul 20, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

fg1417 commented Jul 20, 2023

Uh oh!

eme64 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

fg1417 commented Jun 21, 2023 •

edited by openjdk bot

Loading

mlbridge bot commented Jun 21, 2023 •

edited

Loading

openjdk bot commented Jun 27, 2023 •

edited

Loading

fg1417 commented Jul 5, 2023 •

edited

Loading

fg1417 Jul 19, 2023 •

edited

Loading

fg1417 commented Jul 18, 2023 •

edited

Loading

RealFYang commented Jul 20, 2023 •

edited

Loading