Skip to content

8301739: AArch64: Add optimized rules for vector compare with immediate for SVE #13200

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from

Conversation

changpeng1997
Copy link
Contributor

@changpeng1997 changpeng1997 commented Mar 28, 2023

We can use SVE compare-with-integer-immediate instructions like cmpgt(immediate)1 to avoid the extra scalar2vector operations.

The following instruction sequence

movi    v17.16b, #12
cmpgt   p0.b, p7/z, z16.b, z17.b

can be optimized to:

cmpgt   p0.b, p7/z, z16.b, #12

This patch does the following:

  1. Add SVE compare-with-7bit-unsigned-immediate instructions to C2's backend.
    SVE cmp(immediate) instructions can support vector comparing with 7bit unsigned integer immediate (range from 0 to
    127)or 5bit signed integer immediate (range from -16 to 15).

  2. Add optimized match rules to generate the compare-with-immediate instructions.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8301739: AArch64: Add optimized rules for vector compare with immediate for SVE

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/13200/head:pull/13200
$ git checkout pull/13200

Update a local copy of the PR:
$ git checkout pull/13200
$ git pull https://git.openjdk.org/jdk.git pull/13200/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 13200

View PR using the GUI difftool:
$ git pr show -t 13200

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/13200.diff

Webrev

Link to Webrev Comment

…te for SVE

We can use SVE compare-with-integer-immediate instructions like
cmpgt(immediate)[1] to avoid the extra scalar2vector
operations.

The following instruction sequence

```
movi    v17.16b, openjdk#12
cmpgt   p0.b, p7/z, z16.b, z17.b
```

can be optimized to:

```
cmpgt   p0.b, p7/z, z16.b, openjdk#12
```

This patch does the following:
1. Add SVE compare-with-7bit-unsigned-immediate instructions to C2's backend.

SVE cmp<cc>(immediate) instructions can support vector comparing with
7bit unsigned integer immediate (range from 0 to 127) or 5bit signed
integer immediate (range from -16 to 15).

2. Add optimized match rules to generate the compare-with-immediate
instructions.

[1]: https://developer.arm.com/documentation/ddi0596/2021-12/SVE-Instructions/CMP-cc---immediate---Compare-vector-to-immediate-

TEST_LABEL: v1 || n2, aarch64&&ubuntu&&conformance-enabled
JDK_SCOPE: hotspot:compiler/vectorapi, jdk:jdk/incubator/vector/

Jira: ENTLLT-5294
Change-Id: I6b915864308faf9a8ec6e35ca1b4948666d75dca
@bridgekeeper
Copy link

bridgekeeper bot commented Mar 28, 2023

👋 Welcome back changpeng1997! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Mar 28, 2023
@openjdk
Copy link

openjdk bot commented Mar 28, 2023

@changpeng1997 The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Mar 28, 2023
@mlbridge
Copy link

mlbridge bot commented Mar 28, 2023

Webrevs

@changpeng1997
Copy link
Contributor Author

@theRealAph
Hello. Could you please help to review this patch ? Thanks.

// BoolTest condition for unsigned compare
operand immI_cmpU_cond()
%{
predicate(n->get_int() > (int)(BoolTest::unsigned_compare));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is in x86.ad. I suggest you move it to shared code, then use it here.

static inline bool is_unsigned_booltest_pred(int bt) {
  return  ((bt & BoolTest::unsigned_compare) == BoolTest::unsigned_compare);
}

// 5 bit signed integer
operand immI5()
%{
predicate(((-(1 << 4)) <= n->get_int()) && (n->get_int() < (1 << 4)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point someone must realize that

((-(1 << size)) <= n->get_int()) && (n->get_int() < (1 << size))

could be a function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point someone must realize that

((-(1 << size)) <= n->get_int()) && (n->get_int() < (1 << size))

could be a function.

I think this function can make this predicate more clearly, but I found there are many predicates to modify in aarch64.ad.
I suggest using an extra patch to add this function and modify all similar predicates.

Copy link
Member

@merykitty merykitty Apr 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have AbstractAssembler::is_simm(int64_t, uint) which does exactly this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have AbstractAssembler::is_simm(int64_t, uint) which does exactly this.

Thanks.

Comment on lines 3795 to 3823
if (cond == HI || cond == HS || cond == LO || cond == LS) {
guarantee(0 <= imm && imm <= 127, "invalid immediate");
switch(cond) {
case HI: cond_op = 0b01; break;
case HS: cond_op = 0b00; break;
case LO: cond_op = 0b10; break;
case LS: cond_op = 0b11; break;
default:
ShouldNotReachHere();
}
f(0b00100100, 31, 24), f(T, 23, 22), f(0b1, 21), f(imm, 20, 14),
f((cond_op >> 1) & 0x1, 13), pgrf(Pg, 10), rf(Zn, 5);
f(cond_op & 0x1, 4), prf(Pd, 0);
} else {
guarantee(-16 <= imm && imm <= 15, "invalid immediate");
switch(cond) {
case EQ: cond_op = 0b1000; break;
case NE: cond_op = 0b1001; break;
case GE: cond_op = 0b0000; break;
case GT: cond_op = 0b0001; break;
case LE: cond_op = 0b0011; break;
case LT: cond_op = 0b0010; break;
default:
ShouldNotReachHere();
}
f(0b00100101, 31, 24), f(T, 23, 22), f(0b0, 21), sf(imm, 20, 16),
f((cond_op >> 1) & 0x7, 15, 13), pgrf(Pg, 10), rf(Zn, 5);
f(cond_op & 0x1, 4), prf(Pd, 0);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This entire block of code needs extensive refactoring.

Please write a function from Condition -> int. Use a simple boolean is_unsigned. Extract the common code from the two arms of this if statement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This entire block of code needs extensive refactoring.

Please write a function from Condition -> int. Use a simple boolean is_unsigned. Extract the common code from the two arms of this if statement.

Ok, thanks.

@@ -4318,7 +4318,18 @@ operand immI_positive()
// BoolTest condition for signed compare
operand immI_cmp_cond()
%{
predicate(n->get_int() < (int)(BoolTest::unsigned_compare));
predicate(!Matcher::is_unsigned_booltest_pred(n->get_int()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
predicate(!Matcher::is_unsigned_booltest_pred(n->get_int()));
predicate(! Matcher::is_unsigned_booltest_pred(n->get_int()));

%}
ins_pipe(pipe_slow);
%}')dnl
VMASKCMP_SVE_IMM_I(immI5, cmp)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is tricky to review because the two macros here seems to be almost, but not exactly, the same. Why is that?

Copy link
Contributor Author

@changpeng1997 changpeng1997 Apr 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This patch adds rules for vector comparing with immediate. These immediate may have different types and this manifests as different ConNodes in middle-end (ConI and ConL). ConI and ConL have different methods to get the value, i.e., get_int() for ConI and get_long() for ConL. We should use this value in predicate, so I set two macros, one for integer (byte, short and int) and another one for long.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please show me where these differences appear. I can't see them.

%}
ins_pipe(pipe_slow);
%}')dnl
VMASKCMP_SVE_IMM_I(immI5, cmp)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the first marco, imm is ConINode.

VMASKCMP_SVE_IMM_I(immI5, cmp)
VMASKCMP_SVE_IMM_L(immL5, cmp)
VMASKCMP_SVE_IMM_I(immIU7, cmpU)
VMASKCMP_SVE_IMM_L(immLU7, cmpU)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And ConLNode for the second marco.

define(`VMASKCMP_SVE_IMM_I', `
instruct vmask$2_immI_sve(pReg dst, vReg src, $1 imm, immI_$2_cond cond, rFlagsReg cr) %{
predicate(UseSVE > 0);
match(Set dst (VectorMaskCmp (Binary src (ReplicateB imm)) cond));
Copy link
Contributor Author

@changpeng1997 changpeng1997 Apr 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@theRealAph
The ReplicateXNodes used in match rules are also different in these two marcos.
I think we needn't to merge these two marcos since this will introduce some if-else statements which will reduce the readability.

Copy link
Contributor

@theRealAph theRealAph Apr 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@theRealAph The ReplicateXNodes used in match rules are also different in these two marcos. I think we needn't to merge these two marcos since this will introduce some if-else statements which will reduce the readability.

It won't introduce if-else, surely. Not if you make the parts that are different into parameters. Then a reviewer can immediately see which parts of the macros are actually different, rather than needing to do a deep study.;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for confusion. I will try to rewrite the marcos according to your comments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we merge these marcos like following, there will be more generated matching rules.
But I think it is trivial since the number of class in ad_aarch64.hpp will not change.

dnl VMASKCMP_SVE_IMM($1          , $2          , $3      , $4            )
dnl VMASKCMP_SVE_IMM(element_size, element_type, type_imm, type_condition)
define(`VMASKCMP_SVE_IMM', `
instruct vmask$4_imm$2_sve(pReg dst, vReg src, $3 imm, immI_$4_cond cond, rFlagsReg cr) %{
  predicate(UseSVE > 0);
  match(Set dst (VectorMaskCmp (Binary src (Replicate$2 imm)) cond));
  effect(KILL cr);
  format %{ "vmask$4_imm$2_sve $dst, $src, $imm, $cond\t# KILL cr" %}
  ins_encode %{
    Assembler::Condition condition = to_assembler_cond((BoolTest::mask)$cond$$constant);
    uint length_in_bytes = Matcher::vector_length_in_bytes(this);
    assert(length_in_bytes == MaxVectorSize, "invalid vector length");
    __ sve_cmp(condition, $dst$$PRegister, __ $1, ptrue, $src$$FloatRegister, (int)$imm$$constant);
  %}
  ins_pipe(pipe_slow);
%}')dnl
VMASKCMP_SVE_IMM(B, B, immI5, cmp)
VMASKCMP_SVE_IMM(B, B, immIU7, cmpU)
VMASKCMP_SVE_IMM(H, S, immI5, cmp)
VMASKCMP_SVE_IMM(H, S, immIU7, cmpU)
VMASKCMP_SVE_IMM(S, I, immI5, cmp)
VMASKCMP_SVE_IMM(S, I, immIU7, cmpU)
VMASKCMP_SVE_IMM(D, L, immL5, cmp)
VMASKCMP_SVE_IMM(D, L, immLU7, cmpU)

Copy link
Contributor

@theRealAph theRealAph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK! Sorry for the delay.

@openjdk
Copy link

openjdk bot commented May 5, 2023

⚠️ @changpeng1997 the full name on your profile does not match the author name in this pull requests' HEAD commit. If this pull request gets integrated then the author name from this pull requests' HEAD commit will be used for the resulting commit. If you wish to push a new commit with a different author name, then please run the following commands in a local repository of your personal fork:

$ git checkout add_sve_cmpU
$ git commit --author='Preferred Full Name <you@example.com>' --allow-empty -m 'Update full name'
$ git push

@openjdk
Copy link

openjdk bot commented May 5, 2023

@changpeng1997 This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8301739: AArch64: Add optimized rules for vector compare with immediate for SVE

Reviewed-by: aph, eliu

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 21 new commits pushed to the master branch:

  • 3d3eaed: 8306941: Open source several datatransfer and dnd AWT tests
  • 1f57ce0: 8307446: RISC-V: Improve performance of floating point to integer conversion
  • 4e4828e: 8307553: Remove dead code MetaspaceClosure::push_method_entry
  • 7d58978: 8280031: Deprecate GTK2 for removal
  • b5922c3: 8305846: Support compilation in Proc test utility
  • 73ac710: 8307425: Socket input stream read burns CPU cycles with back-to-back poll(0) calls
  • e2b1013: 8306326: [BACKOUT] 8277573: VmObjectAlloc is not generated by intrinsics methods which allocate objects
  • 4386d42: 8307381: Open Source JFrame, JIF related Swing Tests
  • 27764e6: 8306583: Add JVM crash check in CDSTestUtils.executeAndLog
  • 6c71859: 6176679: Application freezes when copying an animated gif image to the system clipboard
  • ... and 11 more: https://git.openjdk.org/jdk/compare/3b430b9f732bc94674bf598c28162e2f5e62bae6...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@theRealAph, @e1iu) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the ready Pull request is ready to be integrated label May 5, 2023
Copy link
Member

@e1iu e1iu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@changpeng1997
Copy link
Contributor Author

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label May 6, 2023
@openjdk
Copy link

openjdk bot commented May 6, 2023

@changpeng1997
Your change (at version 1490562) is now ready to be sponsored by a Committer.

@XiaohongGong
Copy link

/sponsor

@openjdk
Copy link

openjdk bot commented May 6, 2023

Going to push as commit 0dca573.
Since your change was applied there have been 21 commits pushed to the master branch:

  • 3d3eaed: 8306941: Open source several datatransfer and dnd AWT tests
  • 1f57ce0: 8307446: RISC-V: Improve performance of floating point to integer conversion
  • 4e4828e: 8307553: Remove dead code MetaspaceClosure::push_method_entry
  • 7d58978: 8280031: Deprecate GTK2 for removal
  • b5922c3: 8305846: Support compilation in Proc test utility
  • 73ac710: 8307425: Socket input stream read burns CPU cycles with back-to-back poll(0) calls
  • e2b1013: 8306326: [BACKOUT] 8277573: VmObjectAlloc is not generated by intrinsics methods which allocate objects
  • 4386d42: 8307381: Open Source JFrame, JIF related Swing Tests
  • 27764e6: 8306583: Add JVM crash check in CDSTestUtils.executeAndLog
  • 6c71859: 6176679: Application freezes when copying an animated gif image to the system clipboard
  • ... and 11 more: https://git.openjdk.org/jdk/compare/3b430b9f732bc94674bf598c28162e2f5e62bae6...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label May 6, 2023
@openjdk openjdk bot closed this May 6, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels May 6, 2023
@openjdk
Copy link

openjdk bot commented May 6, 2023

@XiaohongGong @changpeng1997 Pushed as commit 0dca573.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

5 participants