8321308: AArch64: Fix matching predication for cbz/cbnz #16989

Closed
wants to merge 3 commits into from

@fg1417 fg1417 commented Dec 6, 2023

For array length check like:

  if (a.length > 0) {
    [Block 1]
  } else {
    [Block 2]
  }

Since a.length is never negative, C2 treats the comparison as unsigned, so it is semantically equivalent to:

  if (a.length != 0) {
    [Block 1]
  } else {
    [Block 2]
  }
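
A minimal C++ sketch of this equivalence, assuming the array length is modeled as an unsigned 32-bit value. The helper names are hypothetical and are not part of HotSpot.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers: the array length is modeled as an unsigned 32-bit value.
bool greater_than_zero(uint32_t len) { return len > 0u; }
bool not_equal_zero(uint32_t len)    { return len != 0u; }

// Returns true when both forms select the same branch for the given length.
bool same_branch(uint32_t len) {
  return greater_than_zero(len) == not_equal_zero(len);
}

// Spot-checks the boundary values of the unsigned range.
void check_boundaries() {
  assert(same_branch(0u));
  assert(same_branch(1u));
  assert(same_branch(UINT32_MAX));
}
```

Because zero is the smallest unsigned value, "greater than zero" and "not equal to zero" can only disagree on a value below zero, which does not exist.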

On the AArch64 port, C2 can perform this conversion during instruction matching for certain unsigned integral comparisons.

For example,

cmpw  w11, #0 # unsigned
bls   label   # unsigned
[Block 1]

label:
[Block 2]

can be converted to:

cbz  w11, label
[Block 1]

label:
[Block 2]

Currently, we have some matching rules that perform this conversion [1]. However, the operand predicate [2] matches the wrong BoolTest masks, so these rules never match and the conversion does not happen. I believe this is a typo introduced in JDK-8160006. This patch fixes it.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8321308: AArch64: Fix matching predication for cbz/cbnz (Enhancement - P4)

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/16989/head:pull/16989
$ git checkout pull/16989

Update a local copy of the PR:
$ git checkout pull/16989
$ git pull https://git.openjdk.org/jdk.git pull/16989/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 16989

View PR using the GUI difftool:
$ git pr show -t 16989

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/16989.diff

[1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64.ad#L16179
[2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64.ad#L6140

bridgekeeper bot commented Dec 6, 2023

👋 Welcome back fgao! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Dec 6, 2023

openjdk bot commented Dec 6, 2023

@fg1417 The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Dec 6, 2023

mlbridge bot commented Dec 6, 2023

Webrevs


fg1417 commented Dec 6, 2023

I suppose the GHA failure of java/util/stream/GathererTest on linux-x86 is not caused by this patch :)


bridgekeeper bot commented Jan 3, 2024

@fg1417 This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!


openjdk bot commented Jan 26, 2024

@fg1417 This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8321308: AArch64: Fix matching predication for cbz/cbnz

Reviewed-by: fyang, adinn, aph

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 92 new commits pushed to the master branch:

  • ef101f1: 8332920: C2: Partial Peeling is wrongly applied for CmpU with negative limit
  • 2843745: 8333972: Parallel: Remove unused methods in PSOldGen
  • 93f3918: 8333954: Parallel: Remove unused arguments of type ParCompactionManager*
  • 788b876: 8333917: G1: Refactor G1CollectedHeap::register_old_region_with_region_attr
  • 0e4d4a0: 8320725: AArch64: C2: Add "requires_strict_order" flag for floating-point add and mul reduction
  • badf1cb: 8331675: gtest CollectorPolicy.young_min_ergo_vm fails after 8272364
  • 4d6064a: 8333649: Allow different NativeCall encodings
  • fe9c63c: 8333931: Problemlist serviceability/jvmti/vthread/CarrierThreadEventNotification
  • 41c88bc: 8333756: java/lang/instrument/NativeMethodPrefixApp.java failed due to missing intrinsic
  • 3a01b47: 8330205: Initial troff manpage generation for JDK 24
  • ... and 82 more: https://git.openjdk.org/jdk/compare/3cbdf8d4d4604c92d3760ba4e069216564306bcf...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jan 26, 2024

@RealFYang RealFYang left a comment

LGTM. Now the predicate is the same as riscv's cmpOpUEqNeLeGt operand.

@TobiHartmann

I'll let this run through our testing and report back once it has passed.

@TobiHartmann

All tests passed.

@@ -16185,15 +16185,15 @@ instruct cmpUI_imm0_branch(cmpOpUEqNeLtGe cmp, iRegIorL2I op1, immI0 op2, label
   ins_encode %{
     Label* L = $labl$$label;
     Assembler::Condition cond = (Assembler::Condition)$cmp$$cmpcode;
-    if (cond == Assembler::EQ || cond == Assembler::LS)
+    if (cond == Assembler::EQ || cond == Assembler::LE)

Contributor

So this is an explicitly unsigned comparison CmpU yet it has been converted into an explicitly signed Assembler::LE? Is that right?

Member

I agree, this is confusing. Why not fix cmpOpUEqNeLeGt so that it uses the unsigned condition code values?

Contributor

@adinn adinn Feb 9, 2024

@dean-long I am not sure what you mean by 'so that it uses the unsigned condition code values'. Are you suggesting that it would be inappropriate for a CmpU node to employ an le condition? That is certainly a possibility that arises with the current opto code.

n.b. Field '$cmp$$cmpcode' of the CmpUNode is actually a value of the enum type BoolTest. So, strictly the cast on line 16187 ought to perform the conversion using

Assembler::Condition cond = to_assembler_cond($cmp$$cmpcode);

but the result will be the same since the values in enum Assembler::Condition mirror those in enum BoolTest.

In this specific case the compiler installs BoolTest::gt into the CmpUNode during bytecode parsing because that is the test specified in the example code provided by @fg1417. This condition will be flipped to BoolTest::le as part of normalization of the resulting IfNode (the true and false branches attached to the if are switched as part of this change). So, after parsing and normalization we end up with a node that looks like CmpU[le](LoadRange, IntConstant(0)) and this is what is embedded as an operand of the If node that gets considered as a match candidate by the current rule.

So, while there is nothing to stop an unsigned comparison passed through to the back end from employing a BoolTest::le comparison against 0, by contrast the compiler should never pass through a BoolTest::lt unsigned comparison against 0 because it will always evaluate to false. I am not certain but I believe from my brief look at the code that this case gets detected as part of the IfNode normalization and the if is eliminated as a dead node.

Granted then that the back end can see CmpU[le](op1, IntConstant(0)), the current operand and rule do not cover that case. Instead of LE they actually test for an LT against 0 -- which we are never going to match. If we change the rule to match for LE then it will match and insert an appropriate compare and branch. With the current match failure the CmpU and If get translated independently leading to the inefficiency @fg1417 describes.
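
The normalization described above can be sketched in C++. This is an illustrative model only: the mask values are the ones listed in this thread, the `negate` helper assumes that negation flips bit 2 of the mask (which pairs eq/ne, gt/le and lt/ge), and `test_unsigned_vs_zero` is a hypothetical helper, not HotSpot code.

```cpp
#include <cassert>

namespace normalize_sketch {

// BoolTest mask values as listed in this thread.
enum Mask { eq = 0, gt = 1, overflow = 2, lt = 3, ne = 4, le = 5, no_overflow = 6, ge = 7 };

// Assumption: negating a test flips bit 2, so gt (1) becomes le (5) and so on.
Mask negate(Mask m) { return static_cast<Mask>(m ^ 4); }

// Evaluates an unsigned comparison of x against zero under mask m.
// The overflow masks are not modeled here.
bool test_unsigned_vs_zero(Mask m, unsigned x) {
  switch (m) {
    case eq: return x == 0u;
    case ne: return x != 0u;
    case lt: return false;    // x <u 0 never holds
    case ge: return true;     // x >=u 0 always holds
    case le: return x == 0u;  // x <=u 0 holds only for zero
    case gt: return x != 0u;
    default: assert(false && "mask not modeled"); return false;
  }
}

}  // namespace normalize_sketch
```

Under this model, `le` against zero behaves like `eq` and `gt` behaves like `ne`, which is why eq, ne, le and gt are the four masks a compare-and-branch-on-zero rule can usefully accept, while `lt` is never taken.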

Member

No, sorry for the confusion, that's not what I meant. I'm OK with the backend using BoolTest values, but if it is going to use Assembler::Condition values, why not use LO/LS/HI/HS like operand cmpOpU does? To be specific, I'm talking about the interface(COND_INTER) table.

Contributor

@adinn adinn Feb 12, 2024

@dean-long Ah, apologies for missing your point. Yes, I looked into how the generated ad code uses the interface definitions to translate the BoolTest codes and what you say makes sense.

Operand cmpOpUEqNeLeGt really ought to be translating the BoolTest values using the same translation scheme as cmpOpU, i.e.

0 == BoolTest::eq          --> equal         --> 0 == Assembler::Condition::EQ
1 == BoolTest::gt          --> greater       --> 8 == Assembler::Condition::HI
3 == BoolTest::lt          --> less          --> 3 == Assembler::Condition::LO
4 == BoolTest::ne          --> not_equal     --> 1 == Assembler::Condition::NE
5 == BoolTest::le          --> less_equal    --> 9 == Assembler::Condition::LS
7 == BoolTest::ge          --> greater_equal --> 2 == Assembler::Condition::HS
2 == BoolTest::overflow    --> overflow      --> 6 == Assembler::Condition::VS
6 == BoolTest::no_overflow --> no_overflow   --> 7 == Assembler::Condition::VC

These are the correct codes for an unsigned comparison whether or not it is comparing two values or one value against zero.
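
As a rough illustration, the unsigned translation can be modeled as a lookup in C++, with the signed translation alongside for contrast. The enum and function names are hypothetical; the numeric values are the BoolTest and Assembler::Condition codes quoted in this thread.

```cpp
#include <cassert>

namespace cond_sketch {

// BoolTest mask values as quoted in this thread.
enum BoolTestMask {
  bt_eq = 0, bt_gt = 1, bt_overflow = 2, bt_lt = 3,
  bt_ne = 4, bt_le = 5, bt_no_overflow = 6, bt_ge = 7
};

// AArch64 condition codes as quoted in this thread.
enum Cond {
  EQ = 0, NE = 1, HS = 2, LO = 3, VS = 6, VC = 7,
  HI = 8, LS = 9, GE = 10, LT = 11, GT = 12, LE = 13
};

// Translation for unsigned comparisons (the scheme used by cmpOpU).
Cond to_unsigned_cond(BoolTestMask m) {
  switch (m) {
    case bt_eq:          return EQ;
    case bt_ne:          return NE;
    case bt_lt:          return LO;
    case bt_ge:          return HS;
    case bt_le:          return LS;
    case bt_gt:          return HI;
    case bt_overflow:    return VS;
    case bt_no_overflow: return VC;
  }
  assert(false && "unknown mask");
  return EQ;
}

// Translation for signed comparisons, shown only for contrast.
Cond to_signed_cond(BoolTestMask m) {
  switch (m) {
    case bt_eq:          return EQ;
    case bt_ne:          return NE;
    case bt_lt:          return LT;
    case bt_ge:          return GE;
    case bt_le:          return LE;
    case bt_gt:          return GT;
    case bt_overflow:    return VS;
    case bt_no_overflow: return VC;
  }
  assert(false && "unknown mask");
  return EQ;
}

}  // namespace cond_sketch
```

The two schemes agree on eq, ne and the overflow masks and differ only on the four ordering tests, so a rule that compares against `LS` or `HI` only works when the operand uses the unsigned scheme.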

The current (revised) definition of cmpOpUEqNeLeGt is translating the BoolTest values as follows:

0 == BoolTest::eq          --> equal         -->  0 == Assembler::Condition::EQ
1 == BoolTest::gt          --> greater       --> 12 == Assembler::Condition::GT
3 == BoolTest::lt          --> less          --> 11 == Assembler::Condition::LT
4 == BoolTest::ne          --> not_equal     -->  1 == Assembler::Condition::NE
5 == BoolTest::le          --> less_equal    --> 13 == Assembler::Condition::LE
7 == BoolTest::ge          --> greater_equal --> 10 == Assembler::Condition::GE
2 == BoolTest::overflow    --> overflow      -->  6 == Assembler::Condition::VS
6 == BoolTest::no_overflow --> no_overflow   -->  7 == Assembler::Condition::VC

The amended rules propose that when we have EQ or LE we generate cbz and otherwise (when we have NE or GT) we generate cbnz. However, the rules do not really need changing.

If we amend the interface(COND_INTER) definition for cmpOpUEqNeLeGt so it is the same as the one provided for cmpOpU then the BoolTest conditions get translated into the correct Assembler conditions for an unsigned comparison.

operand cmpOpUEqNeLeGt()
%{
  match(Bool);
  op_cost(0);

  predicate(n->as_Bool()->_test._test == BoolTest::eq
            || n->as_Bool()->_test._test == BoolTest::ne
            || n->as_Bool()->_test._test == BoolTest::le
            || n->as_Bool()->_test._test == BoolTest::gt);

  format %{ "" %}
  interface(COND_INTER) %{
    equal(0x0, "eq");
    not_equal(0x1, "ne");
    less(0x3, "lo");
    greater_equal(0x2, "hs");
    less_equal(0x9, "ls");
    greater(0x8, "hi");
    overflow(0x6, "vs");
    no_overflow(0x7, "vc");
  %}
%}

The above change means both the current rules that perform unsigned compare against zero are correct. When we find EQ or LS we generate cbz and otherwise (meaning we have NE or HI) we would generate cbnz e.g.

instruct cmpUI_imm0_branch(cmpOpUEqNeLeGt cmp, iRegIorL2I op1, immI0 op2, label labl, rFlagsRegU cr) %{
  match(If cmp (CmpU op1 op2));
  effect(USE labl);

  ins_cost(BRANCH_COST);
  format %{ "cbw$cmp   $op1, $labl" %}
  ins_encode %{
    Label* L = $labl$$label;
    Assembler::Condition cond = (Assembler::Condition)$cmp$$cmpcode;
    if (cond == Assembler::EQ || cond == Assembler::LS) {
      __ cbzw($op1$$Register, *L);
    } else {
      // Braces keep cbnzw inside the else branch now that the assert is added.
      assert(cond == Assembler::NE || cond == Assembler::HI, "unexpected condition");
      __ cbnzw($op1$$Register, *L);
    }
  %}
  ins_pipe(pipe_cmp_branch);
%}

So, neither of the rules needs 'fixing'. They would only benefit from one change, the addition of an assert for the else case (as shown above).
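
The selection logic in the rule can be modeled outside the ADL file as a small C++ function. This is a sketch only: `select_branch` and the reduced `Cond` enum are hypothetical stand-ins for the encoding block, and the function returns a mnemonic string instead of emitting an instruction.

```cpp
#include <cassert>
#include <string>

namespace branch_sketch {

// The four condition codes the operand can deliver under the unsigned scheme.
enum Cond { EQ = 0, NE = 1, HI = 8, LS = 9 };

// Picks the compare-and-branch mnemonic for an unsigned compare against zero.
std::string select_branch(Cond cond) {
  if (cond == EQ || cond == LS) {
    return "cbzw";   // x == 0 and x <=u 0 both mean "x is zero"
  } else {
    // Only NE and HI should reach this point.
    assert((cond == NE || cond == HI) && "unexpected condition");
    return "cbnzw";  // x != 0 and x >u 0 both mean "x is non-zero"
  }
}

}  // namespace branch_sketch
```

Note that the else branch is braced: with an assert added in front of the branch instruction, an unbraced `else` would guard only the assert and the second instruction would be emitted unconditionally.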


adinn commented Feb 8, 2024

@fg1417 I agree that this change is correct. Clearly, when we have an unsigned compare against zero, an LT test is never going to be true and ought never to be generated. So this rule is actually missing the relevant case of an LE test.

As you correctly note, the error was introduced by the change I made as part of JDK-8160006, where the original in-rule occurrences of the predicate tested for BoolTest::le. This was mistranscribed into the operand name as Lt and into the operand predicate as a check against the constant BoolTest::lt. The BoolTest::lt constant was later replaced by the equivalent value Assembler::LS.

@adinn adinn left a comment

The interface definition needs revising to translate the input BoolTest codes to the same Assembler Condition codes as are used for cmpOpU.


adinn commented Feb 12, 2024

@fg1417 Regarding the rework, see my response to @dean-long which explains how the interface for cmpOpUEqNeLeGt should be redefined (also how the rules can be retained as currently defined).


bridgekeeper bot commented Mar 12, 2024

@fg1417 This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@bridgekeeper bridgekeeper bot added the oca Needs verification of OCA signatory status label Apr 17, 2024
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Apr 17, 2024

bridgekeeper bot commented May 15, 2024

@fg1417 This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open pull request command.

@bridgekeeper bridgekeeper bot closed this May 15, 2024
@bridgekeeper bridgekeeper bot removed the oca Needs verification of OCA signatory status label May 15, 2024

fg1417 commented May 15, 2024

/open

@openjdk openjdk bot reopened this May 15, 2024

openjdk bot commented May 15, 2024

@fg1417 This pull request is now open

@openjdk openjdk bot added ready Pull request is ready to be integrated rfr Pull request is ready for review labels May 15, 2024

fg1417 commented Jun 6, 2024

Thanks for all your comments.

> @fg1417 Regarding the rework, see my response to @dean-long which explains how the interface for cmpOpUEqNeLeGt should be redefined (also how the rules can be retained as currently defined).

In the new commit, I redefined the interface for cmpOpUEqNeLeGt and kept the rules as they were, apart from adding assertion lines. Thanks @adinn.

@adinn adinn left a comment

Thanks, Fei. Looks good.


fg1417 commented Jun 11, 2024

> Thanks, Fei. Looks good.

Thanks @adinn .

Can I have a second review for the latest commit please? @dean-long @theRealAph @RealFYang @TobiHartmann


fg1417 commented Jun 11, 2024

Thanks for all your reviews and comments.

I'll integrate it.


fg1417 commented Jun 12, 2024

/integrate


openjdk bot commented Jun 12, 2024

Going to push as commit 2c9185e.
Since your change was applied there have been 108 commits pushed to the master branch:

  • 5a8a9fd: 8333382: [s390x] Enhance popcnt Instruction to use Z15 facilities
  • 81083a0: 8299487: Test java/net/httpclient/whitebox/SSLTubeTestDriver.java timed out
  • 81ca0ec: 8334028: HttpClient: NPE thrown from assert statement
  • bd750b6: 8319933: Disable tests for JDK-8280481 on Graal
  • c80e2eb: 8333886: Explicitly specify that asSlice and reinterpret return a memory segment backed by the same region of memory.
  • a0318bc: 8334077: Fix problem list entries for compiler tests
  • a7e4ab9: 8333730: ubsan: FieldIndices/libFieldIndicesTest.cpp:276:11: runtime error: null pointer passed as argument 2, which is declared to never be null
  • abbf45b: 8332699: ubsan: jfrEventSetting.inline.hpp:31:43: runtime error: index 163 out of bounds for type 'jfrNativeEventSetting [162]'
  • bd046d9: 8222884: ConcurrentClassDescLookup.java times out intermittently
  • 1c80ddb: 8333940: Ensure javax/swing/TestUngrab.java run on all platforms
  • ... and 98 more: https://git.openjdk.org/jdk/compare/3cbdf8d4d4604c92d3760ba4e069216564306bcf...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jun 12, 2024
@openjdk openjdk bot closed this Jun 12, 2024
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jun 12, 2024

openjdk bot commented Jun 12, 2024

@fg1417 Pushed as commit 2c9185e.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Labels
hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated
6 participants