Skip to content

Conversation

@changpeng1997
Copy link
Contributor

@changpeng1997 changpeng1997 commented Jan 3, 2023

We can use the compare-with-zero instructions like cmgt(zero)1 immediately to avoid the extra scalar2vector operations.

The following instruction sequence

movi  v16.4s, #0x0
cmgt  v16.4s, v17.4s, v16.4s

can be optimized to:

cmgt v16.4s, v17.4s, #0x0

This patch does the following:

  1. Add NEON floating-point compare-with-zero instructions.
  2. Add optimized match rules to generate the compare-with-zero instructions.

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8297753: AArch64: Add optimized rules for vector compare with zero on NEON

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/11822/head:pull/11822
$ git checkout pull/11822

Update a local copy of the PR:
$ git checkout pull/11822
$ git pull https://git.openjdk.org/jdk pull/11822/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 11822

View PR using the GUI difftool:
$ git pr show -t 11822

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/11822.diff

… NEON

We can use the compare-with-zero instructions like cmgt(zero)[1]
immediately to avoid the extra scalar2vector operations.

The following instruction sequence
```
movi  v16.4s, #0x0
cmgt  v16.4s, v17.4s, v16.4s
```
can be optimized to:
```
cmgt v16.4s, v17.4s, #0x0
```
This patch does the following:
1. Add NEON floating-point compare-with-zero instructions.
2. Add optimized match rules to generate the compare-with-zero
instructions.

[1]: https://developer.arm.com/documentation/ddi0602/2022-06/SIMD-FP-Instructions/CMGT--zero---Compare-signed-Greater-than-zero--vector--

Change-Id: If026b477a0cad809bd201feafbfc9ab301a1b569
@bridgekeeper bridgekeeper bot added the oca Needs verification of OCA signatory status label Jan 3, 2023
@bridgekeeper
Copy link

bridgekeeper bot commented Jan 3, 2023

Hi @changpeng1997, welcome to this OpenJDK project and thanks for contributing!

We do not recognize you as Contributor and need to ensure you have signed the Oracle Contributor Agreement (OCA). If you have not signed the OCA, please follow the instructions. Please fill in your GitHub username in the "Username" field of the application. Once you have signed the OCA, please let us know by writing /signed in a comment in this pull request.

If you already are an OpenJDK Author, Committer or Reviewer, please click here to open a new issue so that we can record that fact. Please use "Add GitHub user changpeng1997" as summary for the issue.

If you are contributing this work on behalf of your employer and your employer has signed the OCA, please let us know by writing /covered in a comment in this pull request.

@openjdk
Copy link

openjdk bot commented Jan 3, 2023

@changpeng1997 The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Jan 3, 2023
@changpeng1997
Copy link
Contributor Author

/covered

@bridgekeeper bridgekeeper bot added the oca-verify Needs verification of OCA signatory status label Jan 3, 2023
@bridgekeeper
Copy link

bridgekeeper bot commented Jan 3, 2023

Thank you! Please allow for a few business days to verify that your employer has signed the OCA. Also, please note that pull requests that are pending an OCA check will not usually be evaluated, so your patience is appreciated!

@bridgekeeper bridgekeeper bot removed oca Needs verification of OCA signatory status oca-verify Needs verification of OCA signatory status labels Jan 27, 2023
@openjdk
Copy link

openjdk bot commented Jan 27, 2023

@changpeng1997 this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout add_cmp0_neon
git fetch https://git.openjdk.org/jdk master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

@openjdk openjdk bot added merge-conflict Pull request has merge conflict with target branch rfr Pull request is ready for review labels Jan 27, 2023
@mlbridge
Copy link

mlbridge bot commented Jan 27, 2023

Webrevs

…mtest.out.h

Change-Id: I896b879c8b7097a99e35fc1e53abab646240281a
@openjdk openjdk bot removed the merge-conflict Pull request has merge conflict with target branch label Jan 28, 2023
INSN(fcvtas, 0, 0b00, 0b01, 0b11100);
INSN(fcvtzs, 0, 0b10, 0b01, 0b11011);
INSN(fcvtms, 0, 0b00, 0b01, 0b11011);
INSN(fcmgt, 0, 0b10, 0b01, 0b01100); // Floating-point compare greater than zero (vector)
Copy link
Contributor

@theRealAph theRealAph Jan 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you were to make this fcm(Condition cond, ... rather than having separate definitions for each condition it might make the code simpler and shorter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I think this would make the code much more simpler. But I was wondering if the function name in assembler_aarch64.hpp should align with ISA definition. 

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I think this would make the code much more simpler. But I was wondering if the function name in assembler_aarch64.hpp should align with ISA definition.

I take your point, but we've never been tied to the ISA definition. And here, as you note, it'd clean up a lot of stuff.

Comment on lines 973 to 976
case BoolTest::ge: fcm(Assembler::GE, dst, size, src); break;
case BoolTest::gt: fcm(Assembler::GT, dst, size, src); break;
case BoolTest::le: fcm(Assembler::LE, dst, size, src); break;
case BoolTest::lt: fcm(Assembler::LT, dst, size, src); break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The key to this problem of endless switch statements is a function from BoolTest cond to Assembler::Condition.

Such a function is cmpOpOper(BoolTest::overflow).ccode() .

Please use it everywhere a BoolTest needs to be converted to a Condition.

Copy link
Contributor Author

@changpeng1997 changpeng1997 Feb 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this usage, I didn't know that before. I found that cmpOpOper is generated by cmpOp operand defined in aarch64.ad, and it is able to convert input BoolTest condition and Assembler condition. However, the vector compare operand in VectorMaskCmp is ConINode, while cmpOp operand can only match BoolNode. How about adding a new standalone function like x86

static inline Assembler::ComparisonPredicate booltest_pred_to_comparison_pred(int bt) {
to convert from BoolTest to Assembler::Condition?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can use cmpOpOper by following way in aarch64_vector.ad:

Assembler::Condition condition = (Assembler::Condition)(cmpOpOper((BoolTest::mask) (int)($cond$$constant)).ccode());

but I think this code style is a little ugly and cmpOpOper should be used as a operand but not a utility.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not helpful to replicate the logic in cmpOpOper::ccode. Neither is it at all helpful that the BoolTest::mask is passed as an int. I guess it'd be OK to create a function, and add a comment that it replicates the logic in cmpOpOper::ccode.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For clarity, I agree that where there's existing code to do this in x86 we should copy its structure.

@changpeng1997 changpeng1997 marked this pull request as draft February 20, 2023 09:40
@openjdk openjdk bot removed the rfr Pull request is ready for review label Feb 20, 2023
@changpeng1997 changpeng1997 marked this pull request as ready for review February 20, 2023 10:23
@openjdk openjdk bot added the rfr Pull request is ready for review label Feb 20, 2023
@changpeng1997
Copy link
Contributor Author

@theRealAph Could you please help to review this patch? Thanks.

Comment on lines 973 to 983
switch (cond) {
case Assembler::EQ: cmeq(dst, size, src); break;
case Assembler::NE: {
cmeq(dst, size, src);
notr(dst, isQ ? T16B : T8B, dst);
break;
}
case Assembler::GE: cmge(dst, size, src); break;
case Assembler::GT: cmgt(dst, size, src); break;
case Assembler::LE: cmle(dst, size, src); break;
case Assembler::LT: cmlt(dst, size, src); break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
switch (cond) {
case Assembler::EQ: cmeq(dst, size, src); break;
case Assembler::NE: {
cmeq(dst, size, src);
notr(dst, isQ ? T16B : T8B, dst);
break;
}
case Assembler::GE: cmge(dst, size, src); break;
case Assembler::GT: cmgt(dst, size, src); break;
case Assembler::LE: cmle(dst, size, src); break;
case Assembler::LT: cmlt(dst, size, src); break;
switch (cond) {
case Assembler::NE: {
cm(EQ, dst, size, src);
notr(dst, isQ ? T16B : T8B, dst);
break;
}
case Assembler::EQ:
case Assembler::GE:
case Assembler::GT:
case Assembler::LE:
case Assembler::LT: cm(cond, dst, size, src); break;
``` ...etc.


// Convert BootTest condition to Assembler condition.
// Replicate the logic of cmpOpOper::ccode() and cmpOpUOper::ccode().
Assembler::Condition booltest_cond_to_assembler_cond(BoolTest::mask cond);
Copy link
Contributor

@theRealAph theRealAph Feb 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Assembler::Condition booltest_cond_to_assembler_cond(BoolTest::mask cond);
Assembler::Condition to_assembler_cond(BoolTest::mask cond);
``` ...because we already know the type of the arg.

Comment on lines 2535 to 2538
// Replicate the logic of cmpOpOper::ccode() and cmpOpUOper::ccode().
Assembler::Condition booltest_cond_to_assembler_cond(BoolTest::mask cond) {
switch(cond) {
case BoolTest::eq:
Copy link
Contributor

@theRealAph theRealAph Feb 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd assert(cmpOpOper(cond).ccode() == result) here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I think this assert will guarantee the correctness of this function.

I found that cmpOpUOper convert signed BoolTest conditions (gt, ge, lt, le) to unsigned Assembler conditions, but unsigned vector comparison in vector API will produce unsigned BoolTest conditions(uge, ugt, ult, ule), and these conditons cannot be passed like following:

assert(cmpOpUOper(unsigned_cond).ccode() == result, "Invalid conversion");

Maybe we will meet some issues when taking unsigned vector comparison.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. That sounds like a bug, but OK.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following is the code of ccode() of cmpOpUOper:

virtual int ccode() const { 
    switch (_c0) {
    case  BoolTest::eq : return equal();
    case  BoolTest::gt : return greater();
    case  BoolTest::lt : return less();
    case  BoolTest::ne : return not_equal();
    case  BoolTest::le : return less_equal();
    case  BoolTest::ge : return greater_equal();
    case  BoolTest::overflow : return overflow();
    case  BoolTest::no_overflow: return no_overflow();
    default : ShouldNotReachHere(); return 0;
    }
  };

I have another patch working on enabling SVE vector unsigned comparison, if we use assert(cmpOpUOper(unsigned_cond).ccode() == result, "Invalid conversion");, the code will enter
ShouldNotReachHere().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, but all you have to do then is something like

if (cond & BoolTest::unsigned_compare)
  assert( cmpOpUOper(cond & something).ccode() == result)
else
  assert( cmpOpOper(cond).ccode() == result)

surely?

@changpeng1997 changpeng1997 requested a review from theRealAph March 3, 2023 02:54
Copy link
Contributor

@theRealAph theRealAph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright! That is beautiful.

I felt a bit bad about pushing you so hard on this, but I think the quality of the result justifies the effort. I hope you agree.

Thank you.

@openjdk
Copy link

openjdk bot commented Mar 3, 2023

⚠️ @changpeng1997 the full name on your profile does not match the author name in this pull requests' HEAD commit. If this pull request gets integrated then the author name from this pull requests' HEAD commit will be used for the resulting commit. If you wish to push a new commit with a different author name, then please run the following commands in a local repository of your personal fork:

$ git checkout add_cmp0_neon
$ git commit --author='Preferred Full Name <you@example.com>' --allow-empty -m 'Update full name'
$ git push

@openjdk
Copy link

openjdk bot commented Mar 3, 2023

@changpeng1997 This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8297753: AArch64: Add optimized rules for vector compare with zero on NEON

Reviewed-by: aph

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 1 new commit pushed to the master branch:

  • e1745bc: 8303473: Add implied {@code} in java.lang.invoke.MethodHandles

Please see this link for an up-to-date comparison between the source branch of this pull request and the master branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@theRealAph) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Mar 3, 2023
@changpeng1997
Copy link
Contributor Author

Alright! That is beautiful.

I felt a bit bad about pushing you so hard on this, but I think the quality of the result justifies the effort. I hope you agree.

Thank you.

Thanks for your review.

@changpeng1997
Copy link
Contributor Author

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Mar 3, 2023
@openjdk
Copy link

openjdk bot commented Mar 3, 2023

@changpeng1997
Your change (at version 1d1458c) is now ready to be sponsored by a Committer.

@adinn
Copy link
Contributor

adinn commented Mar 3, 2023

/sponsor

@openjdk
Copy link

openjdk bot commented Mar 3, 2023

Going to push as commit d23a8bf.
Since your change was applied there have been 2 commits pushed to the master branch:

  • 339ca88: 8303539: javadoc typos in Attr
  • e1745bc: 8303473: Add implied {@code} in java.lang.invoke.MethodHandles

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Mar 3, 2023
@openjdk openjdk bot closed this Mar 3, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Mar 3, 2023
@openjdk
Copy link

openjdk bot commented Mar 3, 2023

@adinn @changpeng1997 Pushed as commit d23a8bf.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated

Development

Successfully merging this pull request may close these issues.

3 participants