-
Notifications
You must be signed in to change notification settings - Fork 5.8k
8301739: AArch64: Add optimized rules for vector compare with immediate for SVE #13200
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…te for SVE We can use SVE compare-with-integer-immediate instructions like cmpgt(immediate)[1] to avoid the extra scalar2vector operations. The following instruction sequence ``` movi v17.16b, openjdk#12 cmpgt p0.b, p7/z, z16.b, z17.b ``` can be optimized to: ``` cmpgt p0.b, p7/z, z16.b, openjdk#12 ``` This patch does the following: 1. Add SVE compare-with-7bit-unsigned-immediate instructions to C2's backend. SVE cmp<cc>(immediate) instructions can support vector comparing with 7bit unsigned integer immediate (range from 0 to 127) or 5bit signed integer immediate (range from -16 to 15). 2. Add optimized match rules to generate the compare-with-immediate instructions. [1]: https://developer.arm.com/documentation/ddi0596/2021-12/SVE-Instructions/CMP-cc---immediate---Compare-vector-to-immediate- TEST_LABEL: v1 || n2, aarch64&&ubuntu&&conformance-enabled JDK_SCOPE: hotspot:compiler/vectorapi, jdk:jdk/incubator/vector/ Jira: ENTLLT-5294 Change-Id: I6b915864308faf9a8ec6e35ca1b4948666d75dca
👋 Welcome back changpeng1997! A progress list of the required criteria for merging this PR into |
@changpeng1997 The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
Webrevs
|
@theRealAph |
src/hotspot/cpu/aarch64/aarch64.ad
Outdated
// BoolTest condition for unsigned compare | ||
operand immI_cmpU_cond() | ||
%{ | ||
predicate(n->get_int() > (int)(BoolTest::unsigned_compare)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is in x86.ad. I suggest you move it to shared code, then use it here.
static inline bool is_unsigned_booltest_pred(int bt) {
return ((bt & BoolTest::unsigned_compare) == BoolTest::unsigned_compare);
}
src/hotspot/cpu/aarch64/aarch64.ad
Outdated
// 5 bit signed integer | ||
operand immI5() | ||
%{ | ||
predicate(((-(1 << 4)) <= n->get_int()) && (n->get_int() < (1 << 4))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At some point someone must realize that
((-(1 << size)) <= n->get_int()) && (n->get_int() < (1 << size))
could be a function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At some point someone must realize that
((-(1 << size)) <= n->get_int()) && (n->get_int() < (1 << size))
could be a function.
I think this function can make this predicate more clearly, but I found there are many predicates to modify in aarch64.ad.
I suggest using an extra patch to add this function and modify all similar predicates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have AbstractAssembler::is_simm(int64_t, uint)
which does exactly this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have
AbstractAssembler::is_simm(int64_t, uint)
which does exactly this.
Thanks.
if (cond == HI || cond == HS || cond == LO || cond == LS) { | ||
guarantee(0 <= imm && imm <= 127, "invalid immediate"); | ||
switch(cond) { | ||
case HI: cond_op = 0b01; break; | ||
case HS: cond_op = 0b00; break; | ||
case LO: cond_op = 0b10; break; | ||
case LS: cond_op = 0b11; break; | ||
default: | ||
ShouldNotReachHere(); | ||
} | ||
f(0b00100100, 31, 24), f(T, 23, 22), f(0b1, 21), f(imm, 20, 14), | ||
f((cond_op >> 1) & 0x1, 13), pgrf(Pg, 10), rf(Zn, 5); | ||
f(cond_op & 0x1, 4), prf(Pd, 0); | ||
} else { | ||
guarantee(-16 <= imm && imm <= 15, "invalid immediate"); | ||
switch(cond) { | ||
case EQ: cond_op = 0b1000; break; | ||
case NE: cond_op = 0b1001; break; | ||
case GE: cond_op = 0b0000; break; | ||
case GT: cond_op = 0b0001; break; | ||
case LE: cond_op = 0b0011; break; | ||
case LT: cond_op = 0b0010; break; | ||
default: | ||
ShouldNotReachHere(); | ||
} | ||
f(0b00100101, 31, 24), f(T, 23, 22), f(0b0, 21), sf(imm, 20, 16), | ||
f((cond_op >> 1) & 0x7, 15, 13), pgrf(Pg, 10), rf(Zn, 5); | ||
f(cond_op & 0x1, 4), prf(Pd, 0); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This entire block of code needs extensive refactoring.
Please write a function from Condition -> int. Use a simple boolean is_unsigned. Extract the common code from the two arms of this if statement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This entire block of code needs extensive refactoring.
Please write a function from Condition -> int. Use a simple boolean is_unsigned. Extract the common code from the two arms of this if statement.
Ok, thanks.
@@ -4318,7 +4318,18 @@ operand immI_positive() | |||
// BoolTest condition for signed compare | |||
operand immI_cmp_cond() | |||
%{ | |||
predicate(n->get_int() < (int)(BoolTest::unsigned_compare)); | |||
predicate(!Matcher::is_unsigned_booltest_pred(n->get_int())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
predicate(!Matcher::is_unsigned_booltest_pred(n->get_int())); | |
predicate(! Matcher::is_unsigned_booltest_pred(n->get_int())); |
%} | ||
ins_pipe(pipe_slow); | ||
%}')dnl | ||
VMASKCMP_SVE_IMM_I(immI5, cmp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is tricky to review because the two macros here seems to be almost, but not exactly, the same. Why is that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This patch adds rules for vector comparing with immediate. These immediate may have different types and this manifests as different ConNodes in middle-end (ConI and ConL). ConI and ConL have different methods to get the value, i.e., get_int() for ConI and get_long() for ConL. We should use this value in predicate, so I set two macros, one for integer (byte, short and int) and another one for long.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please show me where these differences appear. I can't see them.
%} | ||
ins_pipe(pipe_slow); | ||
%}')dnl | ||
VMASKCMP_SVE_IMM_I(immI5, cmp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the first marco, imm is ConINode.
VMASKCMP_SVE_IMM_I(immI5, cmp) | ||
VMASKCMP_SVE_IMM_L(immL5, cmp) | ||
VMASKCMP_SVE_IMM_I(immIU7, cmpU) | ||
VMASKCMP_SVE_IMM_L(immLU7, cmpU) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And ConLNode for the second marco.
define(`VMASKCMP_SVE_IMM_I', ` | ||
instruct vmask$2_immI_sve(pReg dst, vReg src, $1 imm, immI_$2_cond cond, rFlagsReg cr) %{ | ||
predicate(UseSVE > 0); | ||
match(Set dst (VectorMaskCmp (Binary src (ReplicateB imm)) cond)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@theRealAph
The ReplicateXNodes used in match rules are also different in these two marcos.
I think we needn't to merge these two marcos since this will introduce some if-else statements which will reduce the readability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@theRealAph The ReplicateXNodes used in match rules are also different in these two marcos. I think we needn't to merge these two marcos since this will introduce some if-else statements which will reduce the readability.
It won't introduce if-else, surely. Not if you make the parts that are different into parameters. Then a reviewer can immediately see which parts of the macros are actually different, rather than needing to do a deep study.;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for confusion. I will try to rewrite the marcos according to your comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we merge these marcos like following, there will be more generated matching rules.
But I think it is trivial since the number of class in ad_aarch64.hpp will not change.
dnl VMASKCMP_SVE_IMM($1 , $2 , $3 , $4 )
dnl VMASKCMP_SVE_IMM(element_size, element_type, type_imm, type_condition)
define(`VMASKCMP_SVE_IMM', `
instruct vmask$4_imm$2_sve(pReg dst, vReg src, $3 imm, immI_$4_cond cond, rFlagsReg cr) %{
predicate(UseSVE > 0);
match(Set dst (VectorMaskCmp (Binary src (Replicate$2 imm)) cond));
effect(KILL cr);
format %{ "vmask$4_imm$2_sve $dst, $src, $imm, $cond\t# KILL cr" %}
ins_encode %{
Assembler::Condition condition = to_assembler_cond((BoolTest::mask)$cond$$constant);
uint length_in_bytes = Matcher::vector_length_in_bytes(this);
assert(length_in_bytes == MaxVectorSize, "invalid vector length");
__ sve_cmp(condition, $dst$$PRegister, __ $1, ptrue, $src$$FloatRegister, (int)$imm$$constant);
%}
ins_pipe(pipe_slow);
%}')dnl
VMASKCMP_SVE_IMM(B, B, immI5, cmp)
VMASKCMP_SVE_IMM(B, B, immIU7, cmpU)
VMASKCMP_SVE_IMM(H, S, immI5, cmp)
VMASKCMP_SVE_IMM(H, S, immIU7, cmpU)
VMASKCMP_SVE_IMM(S, I, immI5, cmp)
VMASKCMP_SVE_IMM(S, I, immIU7, cmpU)
VMASKCMP_SVE_IMM(D, L, immL5, cmp)
VMASKCMP_SVE_IMM(D, L, immLU7, cmpU)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK! Sorry for the delay.
|
@changpeng1997 This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 21 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@theRealAph, @e1iu) but any other Committer may sponsor as well. ➡️ To flag this PR as ready for integration with the above commit message, type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
/integrate |
@changpeng1997 |
/sponsor |
Going to push as commit 0dca573.
Your commit was automatically rebased without conflicts. |
@XiaohongGong @changpeng1997 Pushed as commit 0dca573. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
We can use SVE compare-with-integer-immediate instructions like cmpgt(immediate)1 to avoid the extra scalar2vector operations.
The following instruction sequence
can be optimized to:
This patch does the following:
Add SVE compare-with-7bit-unsigned-immediate instructions to C2's backend.
SVE cmp(immediate) instructions can support vector comparing with 7bit unsigned integer immediate (range from 0 to
127)or 5bit signed integer immediate (range from -16 to 15).
Add optimized match rules to generate the compare-with-immediate instructions.
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/13200/head:pull/13200
$ git checkout pull/13200
Update a local copy of the PR:
$ git checkout pull/13200
$ git pull https://git.openjdk.org/jdk.git pull/13200/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 13200
View PR using the GUI difftool:
$ git pr show -t 13200
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/13200.diff
Webrev
Link to Webrev Comment