8297753: AArch64: Add optimized rules for vector compare with zero on NEON #11822
We can use compare-with-zero instructions such as cmgt (zero)[1] directly, avoiding the extra scalar-to-vector operation.

The following instruction sequence

```
movi v16.4s, #0x0
cmgt v16.4s, v17.4s, v16.4s
```

can be optimized to:

```
cmgt v16.4s, v17.4s, #0x0
```

This patch does the following:

1. Add NEON floating-point compare-with-zero instructions.
2. Add optimized match rules to generate the compare-with-zero instructions.

[1]: https://developer.arm.com/documentation/ddi0602/2022-06/SIMD-FP-Instructions/CMGT--zero---Compare-signed-Greater-than-zero--vector--

Change-Id: If026b477a0cad809bd201feafbfc9ab301a1b569
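To make the equivalence concrete, here is a small scalar model of the lane semantics, offered as a sketch only: each 32-bit lane of a CMGT result is all ones when the comparison holds and all zeros otherwise, so comparing against a register that MOVI has zeroed gives the same mask as the immediate-zero form. The function names are hypothetical, and only the .4S arrangement is modelled.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model of a 128-bit NEON register viewed as 4 x 32-bit lanes (.4S).
using V4S = std::array<std::int32_t, 4>;
using Mask4S = std::array<std::uint32_t, 4>;

// Model of "movi vd.4s, #0": every lane becomes zero.
V4S movi_zero() { return V4S{0, 0, 0, 0}; }

// Model of "cmgt vd.4s, vn.4s, vm.4s": signed per-lane compare that
// yields an all-ones lane when n > m and an all-zeros lane otherwise.
Mask4S cmgt_reg(const V4S& n, const V4S& m) {
  Mask4S d{};
  for (int i = 0; i < 4; i++) {
    d[i] = (n[i] > m[i]) ? 0xFFFFFFFFu : 0u;
  }
  return d;
}

// Model of "cmgt vd.4s, vn.4s, #0": the same compare against an implicit
// zero, so no zero vector has to be materialized first.
Mask4S cmgt_zero(const V4S& n) {
  Mask4S d{};
  for (int i = 0; i < 4; i++) {
    d[i] = (n[i] > 0) ? 0xFFFFFFFFu : 0u;
  }
  return d;
}

// Usage check: the two-instruction sequence and the single instruction agree.
void check_equivalent(const V4S& v) {
  assert(cmgt_reg(v, movi_zero()) == cmgt_zero(v));
}
```

The saving in the real code is one instruction and one vector register per comparison; the model only shows that the result is unchanged.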
Hi @changpeng1997, welcome to this OpenJDK project and thanks for contributing! We do not recognize you as Contributor and need to ensure you have signed the Oracle Contributor Agreement (OCA). If you have not signed the OCA, please follow the instructions. Please fill in your GitHub username in the "Username" field of the application. Once you have signed the OCA, please let us know by writing If you already are an OpenJDK Author, Committer or Reviewer, please click here to open a new issue so that we can record that fact. Please use "Add GitHub user changpeng1997" as summary for the issue. If you are contributing this work on behalf of your employer and your employer has signed the OCA, please let us know by writing
@changpeng1997 The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.
/covered
Thank you! Please allow for a few business days to verify that your employer has signed the OCA. Also, please note that pull requests that are pending an OCA check will not usually be evaluated, so your patience is appreciated!
@changpeng1997 this pull request can not be integrated into
git checkout add_cmp0_neon
git fetch https://git.openjdk.org/jdk master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push
…mtest.out.h Change-Id: I896b879c8b7097a99e35fc1e53abab646240281a
INSN(fcvtas, 0, 0b00, 0b01, 0b11100);
INSN(fcvtzs, 0, 0b10, 0b01, 0b11011);
INSN(fcvtms, 0, 0b00, 0b01, 0b11011);
INSN(fcmgt, 0, 0b10, 0b01, 0b01100); // Floating-point compare greater than zero (vector)
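The compare-with-zero forms belong to the Advanced SIMD "two-register miscellaneous" encoding class in the Arm ARM [1]. The sketch below assembles a few of them from their fields so the opcode values in entries like INSN(fcmgt, ...) can be checked by hand. The helper names and parameter order are hypothetical and do not mirror the actual macros in assembler_aarch64.hpp.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: encode an Advanced SIMD two-register-misc instruction.
//   31  30  29  28..24  23..22  21..17  16..12  11..10  9..5  4..0
//    0   Q   U   01110   size    10000  opcode    10     Rn    Rd
std::uint32_t simd_two_reg_misc(unsigned q, unsigned u, unsigned size,
                                unsigned opcode, unsigned rn, unsigned rd) {
  assert(q <= 1 && u <= 1 && size <= 3 && opcode <= 31 && rn <= 31 && rd <= 31);
  return (q << 30) | (u << 29) | (0b01110u << 24) | (size << 22) |
         (0b10000u << 17) | (opcode << 12) | (0b10u << 10) | (rn << 5) | rd;
}

// Integer compare-with-zero; size selects the element width (0b10 = 32-bit).
std::uint32_t cmgt_zero(unsigned q, unsigned size, unsigned rd, unsigned rn) {
  return simd_two_reg_misc(q, /*U=*/0, size, /*opcode=*/0b01000, rn, rd);
}
std::uint32_t cmeq_zero(unsigned q, unsigned size, unsigned rd, unsigned rn) {
  return simd_two_reg_misc(q, /*U=*/0, size, /*opcode=*/0b01001, rn, rd);
}

// Floating-point compare-with-zero: the size field is 1:sz, where
// sz = 0 selects single precision and sz = 1 double precision.
std::uint32_t fcmgt_zero(unsigned q, unsigned sz, unsigned rd, unsigned rn) {
  return simd_two_reg_misc(q, /*U=*/0, 0b10u | sz, /*opcode=*/0b01100, rn, rd);
}
```

For example, cmgt_zero(1, 0b10, 16, 17) encodes the single instruction from the PR description, cmgt v16.4s, v17.4s, #0.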
If you were to make this fcm(Condition cond, ... rather than having separate definitions for each condition, it might make the code simpler and shorter.
Thanks, I think this would make the code much simpler. But I was wondering whether the function names in assembler_aarch64.hpp should align with the ISA definition.
I take your point, but we've never been tied to the ISA definition. And here, as you note, it'd clean up a lot of stuff.
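To illustrate the suggestion, here is a sketch of a single condition-keyed entry point for the floating-point compare-with-zero group. The condition only selects the U bit and the opcode field, so one function can stand in for five near-identical definitions. Cond, FcmZeroFields and fcm_zero_fields are hypothetical names; the field values follow the Arm two-register-misc encodings (opcode 0b01100 with U = 0 is FCMGT, matching the INSN(fcmgt, ...) entry above).

```cpp
#include <cassert>

// Hypothetical subset of conditions that have a compare-with-zero instruction.
enum class Cond { EQ, GE, GT, LE, LT };

// The U bit and opcode field that pick one FP compare-with-zero instruction.
struct FcmZeroFields {
  unsigned u;
  unsigned opcode;
};

// One entry point keyed by condition instead of separate fcmeq/fcmge/...
// definitions. NE is deliberately absent: there is no compare-not-equal
// instruction, so it has to be built from EQ followed by a bitwise NOT.
FcmZeroFields fcm_zero_fields(Cond cond) {
  switch (cond) {
    case Cond::EQ: return {0, 0b01101};  // FCMEQ (zero)
    case Cond::GE: return {1, 0b01100};  // FCMGE (zero)
    case Cond::GT: return {0, 0b01100};  // FCMGT (zero)
    case Cond::LE: return {1, 0b01101};  // FCMLE (zero)
    case Cond::LT: return {0, 0b01110};  // FCMLT (zero)
  }
  assert(false && "unsupported condition");
  return {0, 0};
}
```

The same shape works for the integer group (CMEQ/CMGE/CMGT/CMLE/CMLT), where only the opcode values differ.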
case BoolTest::ge: fcm(Assembler::GE, dst, size, src); break;
case BoolTest::gt: fcm(Assembler::GT, dst, size, src); break;
case BoolTest::le: fcm(Assembler::LE, dst, size, src); break;
case BoolTest::lt: fcm(Assembler::LT, dst, size, src); break;
The key to this problem of endless switch statements is a function from BoolTest cond to Assembler::Condition.
Such a function is cmpOpOper(BoolTest::overflow).ccode().
Please use it everywhere a BoolTest needs to be converted to a Condition.
Thanks for pointing out this usage; I didn't know about it before. I found that cmpOpOper is generated from the cmpOp operand defined in aarch64.ad, and it can convert an input BoolTest condition to an Assembler condition. However, the comparison condition of VectorMaskCmp is a ConINode, while the cmpOp operand can only match a BoolNode. How about adding a new standalone function like the one in x86:
jdk/src/hotspot/cpu/x86/x86.ad, line 2498 in e245620:
static inline Assembler::ComparisonPredicate booltest_pred_to_comparison_pred(int bt) {
Maybe we can use cmpOpOper in aarch64_vector.ad as follows:
Assembler::Condition condition = (Assembler::Condition)(cmpOpOper((BoolTest::mask) (int)($cond$$constant)).ccode());
but I think this style is a little ugly, and cmpOpOper is meant to be used as an operand, not as a utility.
It's not helpful to replicate the logic in cmpOpOper::ccode. Neither is it at all helpful that the BoolTest::mask is passed as an int. I guess it'd be OK to create a function, and add a comment that it replicates the logic in cmpOpOper::ccode.
For clarity, I agree that where there's existing code to do this in x86 we should copy its structure.
This reverts commit d899238.
unsigned comparison.
@theRealAph Could you please help review this patch? Thanks.
switch (cond) {
  case Assembler::EQ: cmeq(dst, size, src); break;
  case Assembler::NE: {
    cmeq(dst, size, src);
    notr(dst, isQ ? T16B : T8B, dst);
    break;
  }
  case Assembler::GE: cmge(dst, size, src); break;
  case Assembler::GT: cmgt(dst, size, src); break;
  case Assembler::LE: cmle(dst, size, src); break;
  case Assembler::LT: cmlt(dst, size, src); break;
```
switch (cond) {
  case Assembler::NE: {
    cm(EQ, dst, size, src);
    notr(dst, isQ ? T16B : T8B, dst);
    break;
  }
  case Assembler::EQ:
  case Assembler::GE:
  case Assembler::GT:
  case Assembler::LE:
  case Assembler::LT: cm(cond, dst, size, src); break;
```
...etc.
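The NE case works because there is no compare-not-equal instruction: the code computes EQ and then inverts every bit of the mask with NOT (the notr call, on T16B or T8B depending on isQ). A scalar sketch of why that is correct on lane masks, using hypothetical names and modelling 16 byte lanes:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V16B = std::array<std::int8_t, 16>;
using Mask16B = std::array<std::uint8_t, 16>;

// Model of "cmeq vd.16b, vn.16b, #0": all-ones lane where the element is zero.
Mask16B cmeq_zero(const V16B& n) {
  Mask16B d{};
  for (std::size_t i = 0; i < n.size(); i++) d[i] = (n[i] == 0) ? 0xFF : 0x00;
  return d;
}

// Model of "not vd.16b, vn.16b": bitwise complement of every byte.
Mask16B not16b(const Mask16B& m) {
  Mask16B d{};
  for (std::size_t i = 0; i < m.size(); i++) d[i] = static_cast<std::uint8_t>(~m[i]);
  return d;
}

// NE composed as NOT(EQ): a lane is all ones exactly when the element is nonzero.
Mask16B cmne_zero(const V16B& n) { return not16b(cmeq_zero(n)); }

// Usage check: in every lane, the NE and EQ masks are exact complements.
void check_ne_is_not_eq(const V16B& n) {
  Mask16B eq = cmeq_zero(n), ne = cmne_zero(n);
  for (std::size_t i = 0; i < n.size(); i++) assert((eq[i] ^ ne[i]) == 0xFF);
}
```

Because each lane of a compare result is either all ones or all zeros, a full bitwise NOT is enough; no per-lane logic is needed.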
src/hotspot/cpu/aarch64/aarch64.ad
// Convert BoolTest condition to Assembler condition.
// Replicate the logic of cmpOpOper::ccode() and cmpOpUOper::ccode().
Assembler::Condition booltest_cond_to_assembler_cond(BoolTest::mask cond);
```
Assembler::Condition to_assembler_cond(BoolTest::mask cond);
```
...because we already know the type of the arg.
// Replicate the logic of cmpOpOper::ccode() and cmpOpUOper::ccode().
Assembler::Condition booltest_cond_to_assembler_cond(BoolTest::mask cond) {
  switch(cond) {
    case BoolTest::eq:
I'd assert(cmpOpOper(cond).ccode() == result) here.
Thanks, I think this assert will guarantee the correctness of this function.
I found that cmpOpUOper converts signed BoolTest conditions (gt, ge, lt, le) to unsigned Assembler conditions, but unsigned vector comparisons in the Vector API produce unsigned BoolTest conditions (uge, ugt, ult, ule), and these conditions cannot be passed as follows:
assert(cmpOpUOper(unsigned_cond).ccode() == result, "Invalid conversion");
Maybe we will run into issues when handling unsigned vector comparisons.
I see. That sounds like a bug, but OK.
The following is the ccode() method of cmpOpUOper:
  virtual int ccode() const {
    switch (_c0) {
    case BoolTest::eq : return equal();
    case BoolTest::gt : return greater();
    case BoolTest::lt : return less();
    case BoolTest::ne : return not_equal();
    case BoolTest::le : return less_equal();
    case BoolTest::ge : return greater_equal();
    case BoolTest::overflow : return overflow();
    case BoolTest::no_overflow: return no_overflow();
    default : ShouldNotReachHere(); return 0;
    }
  };
I have another patch in progress that enables SVE unsigned vector comparison. If we use assert(cmpOpUOper(unsigned_cond).ccode() == result, "Invalid conversion"); there, execution will reach ShouldNotReachHere().
I see, but all you have to do then is something like
if (cond & BoolTest::unsigned_compare)
assert( cmpOpUOper(cond & something).ccode() == result)
else
assert( cmpOpOper(cond).ccode() == result)
surely?
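Putting the thread together, here is a sketch of a BoolTest-to-condition conversion that checks itself against reference mappings in the way suggested above: for unsigned masks, the unsigned_compare bit is stripped before consulting the unsigned reference. Everything here is a hypothetical, self-contained model. Mask and Cond stand in for BoolTest::mask and Assembler::Condition, and signed_ccode/unsigned_ccode stand in for cmpOpOper::ccode() and cmpOpUOper::ccode(). The Mask values are illustrative; the Cond values are the AArch64 condition-code encodings.

```cpp
#include <cassert>

// Hypothetical stand-in for BoolTest::mask (values are illustrative).
enum Mask : int {
  eq = 0, gt = 1, lt = 3, ne = 4, le = 5, ge = 7,
  unsigned_compare = 16,
  ugt = gt | unsigned_compare, ult = lt | unsigned_compare,
  ule = le | unsigned_compare, uge = ge | unsigned_compare
};

// Hypothetical stand-in for Assembler::Condition (AArch64 encodings).
enum Cond : int {
  EQ = 0x0, NE = 0x1, HS = 0x2, LO = 0x3, HI = 0x8, LS = 0x9,
  GE = 0xa, LT = 0xb, GT = 0xc, LE = 0xd
};

// Model of cmpOpOper::ccode(): signed mask -> signed condition.
Cond signed_ccode(Mask m) {
  switch (m) {
    case eq: return EQ;  case ne: return NE;
    case gt: return GT;  case ge: return GE;
    case lt: return LT;  case le: return LE;
    default: assert(false && "unexpected mask"); return EQ;
  }
}

// Model of cmpOpUOper::ccode(): it accepts only the *signed* masks and maps
// them to unsigned conditions, so passing ugt/uge/ult/ule hits the default.
Cond unsigned_ccode(Mask m) {
  switch (m) {
    case eq: return EQ;  case ne: return NE;
    case gt: return HI;  case ge: return HS;
    case lt: return LO;  case le: return LS;
    default: assert(false && "unexpected mask"); return EQ;
  }
}

// Sketch of the conversion with the cross-check (active only without NDEBUG).
Cond to_assembler_cond(Mask cond) {
  Cond result;
  switch (cond) {
    case eq:  result = EQ; break;
    case ne:  result = NE; break;
    case gt:  result = GT; break;
    case ge:  result = GE; break;
    case lt:  result = LT; break;
    case le:  result = LE; break;
    case ugt: result = HI; break;
    case uge: result = HS; break;
    case ult: result = LO; break;
    case ule: result = LS; break;
    default:  assert(false && "unsupported condition"); return EQ;
  }
  if (cond & unsigned_compare) {
    // Strip the unsigned flag so the unsigned reference sees a signed mask.
    Mask base = static_cast<Mask>(cond & ~unsigned_compare);
    assert(unsigned_ccode(base) == result && "Invalid conversion");
  } else {
    assert(signed_ccode(cond) == result && "Invalid conversion");
  }
  return result;
}
```

Stripping the flag, rather than adding unsigned cases to the reference mapping, keeps the check honest: the conversion is verified against the existing operand logic instead of against a second copy of itself.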
theRealAph
left a comment
Alright! That is beautiful.
I felt a bit bad about pushing you so hard on this, but I think the quality of the result justifies the effort. I hope you agree.
Thank you.
@changpeng1997 This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 1 new commit pushed to the Please see this link for an up-to-date comparison between the source branch of this pull request and the As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@theRealAph) but any other Committer may sponsor as well. ➡️ To flag this PR as ready for integration with the above commit message, type
Thanks for your review.
/integrate
@changpeng1997
/sponsor
@adinn @changpeng1997 Pushed as commit d23a8bf. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/11822/head:pull/11822
$ git checkout pull/11822
Update a local copy of the PR:
$ git checkout pull/11822
$ git pull https://git.openjdk.org/jdk pull/11822/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 11822
View PR using the GUI difftool:
$ git pr show -t 11822
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/11822.diff