-
Notifications
You must be signed in to change notification settings - Fork 6.1k
8321308: AArch64: Fix matching predication for cbz/cbnz #16989
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
For array length check like: ``` if (a.length > 0) { [Block 1] } else { [Block 2] } ``` Since `a.length` is unsigned, it's semantically equivalent to: ``` if (a.length != 0) { [Block 1] } else { [Block 2] } ``` On aarch64 port, we can do the conversion like above, during c2 compiler instruction matching, for certain unsigned integral comparisons. For example, ``` cmpw w11, #0 # unsigned bls label # unsigned [Block 1] label: [Block 2] ``` can be converted to: ``` cbz w11, label [Block 1] label: [Block 2] ``` Currently, we have some matching rules to do the conversion[1]. But the predicate here[2] matches wrong `BoolTest` masks, so these rules fail to convert. I guess it's a typo introduced in JDK-8160006. The patch fixes it. [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64.ad#L16179 [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64.ad#L6140
👋 Welcome back fgao! A progress list of the required criteria for merging this PR into |
Webrevs
|
Suppose the GHA failure of |
@fg1417 This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration! |
@fg1417 This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 92 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Now the predicate the same as riscv's cmpOpUEqNeLeGt
operand.
I'll let this run through our testing and report back once it passed. |
All tests passed. |
src/hotspot/cpu/aarch64/aarch64.ad
Outdated
@@ -16185,15 +16185,15 @@ instruct cmpUI_imm0_branch(cmpOpUEqNeLtGe cmp, iRegIorL2I op1, immI0 op2, label | |||
ins_encode %{ | |||
Label* L = $labl$$label; | |||
Assembler::Condition cond = (Assembler::Condition)$cmp$$cmpcode; | |||
if (cond == Assembler::EQ || cond == Assembler::LS) | |||
if (cond == Assembler::EQ || cond == Assembler::LE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is an explicitly unsigned comparison CmpU
yet it has been converted into an explicitly signed Assembler::LE
? Is that right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, this is confusing. Why not fix cmpOpUEqNeLeGt so that it uses the unsigned condition code values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dean-long I am not sure what you mean by 'so that it uses the unsigned condition code values'. Are you suggesting that it would be inappropriate for a CmpU
node to employ an le
condition? That is certainly a possibility that arises with the current opto code.
n.b. Field '$cmp$$cmpcode' of the CmpUNode is actually a value of the enum type BoolTest
. So, strictly the cast on line 16187 ought to perform the conversion using
Assembler::Condition cond = to_assembler_cond(cmp$$cmpcode);
but the result will be the same since the values in enum Assembler::Condition
mirror those in enum BoolTest
.
In this specific case the compiler installs BoolTest::gt
into the CmpUNode
during bytecode parsing because that is the test specified in the example code provided by @fg1417. This condition will be flipped to BoolTest::le
as part of normalization of the resulting IfNode
(the true and false branches attached to the if are switched as part of this change). So, after parsing and normalization we end up with a node that looks like CmpU[le](LoadRange, IntConstant(0))
and this is what is embedded as an operand of the If
node that gets considered as a match candidate by the current rule.
So, while there is nothing to stop an unsigned comparison passed through to the back end from employing a BoolTest::le
comparison against 0, by contrast the compiler should never pass through a BoolTest::lt
unsigned comparison against 0 because it will always evaluate to false. I am not certain but I believe from my brief look at the code that this case gets detected as part of the IfNode normalization and the if is eliminated as a dead node.
Granted then that the back end can see CmpU[le](op1, IntConstant(0))
, the current operand and rule do not cover that case. Instead of LE
they actually test for an LT
against 0 -- which we are never going to match. If we change the rule to match for LE
then it will match and insert an appropriate compare and branch. With the current match failure the CmpU
and If
get translated independently leading to the inefficiency @fg1417 describes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, sorry for the confusion, that's not what I meant. I'm OK with the backend using BoolTest values, but if it is going to use Assembler::Condition values, why not use LO/LS/HI/HS like operand cmpOpU does? To be specific, I'm talking about the interface(COND_INTER) table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dean-long Ah, apologies for missing your point. Yes, I looked into how the generated ad code uses the interface definitions to translate the BoolTest codes and what you say makes sense.
Operand CmpOpUEqNeLeGt
really ought to be translating the BoolTest
values using the same translation scheme as CmpOpU
i.e.
0 == BoolTest::eq --> equal --> 0 == Assembler::Condition::EQ
1 == BoolTest::gt --> greater --> 8 == Assembler::Condition::HI
3 == BoolTest::lt --> less --> 3 == Assembler::Condition::LO
4 == BoolTest::ne --> not_equal --> 1 == Assembler::Condition::NE
5 == BoolTest::le --> less_equal --> 9 == Assembler::Condition::LS
7 == BoolTest::ge --> greater_equal --> 2 == Assembler::Condition::HS
2 == BoolTest::overflow --> overflow --> 6 == Assembler::Condition::VS
6 == BoolTest::no_overflow --> no_overflow --> 7 == Assembler::Condition::VC
These are the correct codes for an unsigned comparison whether or not it is comparing two values or one value against zero.
The current (revised) definition of CmpOpUEqNeLeGt
is translating the BoolTest
values as follows:
0 == BoolTest::eq --> equal --> 0 == Assembler::Condition::EQ
1 == BoolTest::gt --> greater --> 12 == Assembler::Condition::GT
3 == BoolTest::lt --> less --> 11 == Assembler::Condition::LT
4 == BoolTest::ne --> not_equal --> 1 == Assembler::Condition::NE
5 == BoolTest::le --> less_equal --> 13 == Assembler::Condition::LE
7 == BoolTest::ge --> greater_equal --> 10 == Assembler::Condition::GE
2 == BoolTest::overflow --> overflow --> 6 == Assembler::Condition::VS
6 == BoolTest::no_overflow --> no_overflow --> 7 == Assembler::Condition::VC
The amended rules propose that when we have EQ
or LE
we generate cbz
and otherwise (when we have NE
or GT
) we generate cbnz
. However, the rules do not really need changing.
If we amend the interface(COND_ITER)
definition for CmpOpUEqNeLeGt
so it is the same as the one provided for CmpOpU
then the BoolTest
conditions get translated into the correct Assembler conditions for an unsigned comparison.
operand cmpOpUEqNeLeGt()
%{
match(Bool);
op_cost(0);
predicate(n->as_Bool()->_test._test == BoolTest::eq
|| n->as_Bool()->_test._test == BoolTest::ne
|| n->as_Bool()->_test._test == BoolTest::le
|| n->as_Bool()->_test._test == BoolTest::gt);
format %{ "" %}
interface(COND_INTER) %{
equal(0x0, "eq");
not_equal(0x1, "ne");
less(0x3, "lo");
greater_equal(0x2, "hs");
less_equal(0x9, "ls");
greater(0x8, "hi");
overflow(0x6, "vs");
no_overflow(0x7, "vc");
%}
%}
The above change means both the current rules that perform unsigned compare against zero are correct. When we find EQ
or LS
we generate cbz
and otherwise (meaning we have NE
or HI
) we would generate cbnz
e.g.
instruct cmpUI_imm0_branch(cmpOpUEqNeLtGe cmp, iRegIorL2I op1, immI0 op2, label labl, rFlagsRegU cr) %{
match(If cmp (CmpU op1 op2));
effect(USE labl);
ins_cost(BRANCH_COST);
format %{ "cbw$cmp $op1, $labl" %}
ins_encode %{
Label* L = $labl$$label;
Assembler::Condition cond = (Assembler::Condition)$cmp$$cmpcode;
if (cond == Assembler::EQ || cond == Assembler::LS)
__ cbzw($op1$$Register, *L);
else
assert (cond == Assembler::NE || cond == Assembler::HI);
__ cbnzw($op1$$Register, *L);
%}
ins_pipe(pipe_cmp_branch);
%}
So, neither of the rules needs 'fixing'. They would only benefit from one change, the addition of an assert for the else case (as shown above).
@fg1417 I agree that this change is correct. Clearly when we have an unsigned compare an As you correctly note, the error was introduced by the change I made as part of JDK-8160006 where the original in-rule occurrences of the predicate tested for |
src/hotspot/cpu/aarch64/aarch64.ad
Outdated
@@ -16185,15 +16185,15 @@ instruct cmpUI_imm0_branch(cmpOpUEqNeLtGe cmp, iRegIorL2I op1, immI0 op2, label | |||
ins_encode %{ | |||
Label* L = $labl$$label; | |||
Assembler::Condition cond = (Assembler::Condition)$cmp$$cmpcode; | |||
if (cond == Assembler::EQ || cond == Assembler::LS) | |||
if (cond == Assembler::EQ || cond == Assembler::LE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, this is confusing. Why not fix cmpOpUEqNeLeGt so that it uses the unsigned condition code values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The interface definition needs revising to translate the input BoolTest codes to the same Assembler Condition codes as are used for OpCmpU.
@fg1417 Regarding the rework, see my response to @dean-long which explains how the interface for |
@fg1417 This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration! |
@fg1417 This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the |
/open |
@fg1417 This pull request is now open |
Thanks for all your comments.
In the new commit, I redefined the interface for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, Fei. Looks good.
Thanks @adinn . Can I have a second review for the latest commit please? @dean-long @theRealAph @RealFYang @TobiHartmann |
Thanks for all your reviews and comments. I'll integrate it. |
/integrate |
Going to push as commit 2c9185e.
Your commit was automatically rebased without conflicts. |
For array length check like:
Since
a.length
is unsigned, it's semantically equivalent to:On aarch64 port, we can do the conversion like above, during c2 compiler instruction matching, for certain unsigned integral comparisons.
For example,
can be converted to:
Currently, we have some matching rules to do the conversion [1]. But the predicate here [2] matches wrong
BoolTest
masks, so these rules fail to convert. I guess it's a typo introduced in JDK-8160006. The patch fixes it.Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/16989/head:pull/16989
$ git checkout pull/16989
Update a local copy of the PR:
$ git checkout pull/16989
$ git pull https://git.openjdk.org/jdk.git pull/16989/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 16989
View PR using the GUI difftool:
$ git pr show -t 16989
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/16989.diff
Webrev
Link to Webrev Comment