-
Notifications
You must be signed in to change notification settings - Fork 6.2k
8349582: APX NDD code generation for OpenJDK #23501
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
👋 Welcome back sparasa! A progress list of the required criteria for merging this PR into |
|
@vamsi-parasa This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 110 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@jatin-bhateja, @sviswa7, @eme64) but any other Committer may sponsor as well. ➡️ To flag this PR as ready for integration with the above commit message, type |
|
@vamsi-parasa The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
Webrevs
|
jatin-bhateja
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some initial comments.
| format %{ "eaddq $dst, $src1, $src2\t# long ndd" %} | ||
| ins_encode %{ | ||
| __ eaddq($dst$$Register, $src1$$Register, $src2$$Register, false); | ||
| %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your current scheme favors the emission of NDD instruction on APX-enabled targets, even if destination and source registers belong legacy GPR set. We should extend the assembler layer to demote EEVEX to REX encoding if dst matches with source operands.
FYI, the processor backend also uses Move Elimination to prevent the dispatch of GPR to GPR moves to execution ports.
This can be used to further break NDD patterns with all different legacy register operands, GPR to GPR moves should consume no more than 3 bytes encoding 1 byte for REX prefix, 1 for opcode, and 1 for ModRM byte, still less than 4 byte prefix.
By biasing the allocation of definition towards non-intersecting source operands, we can enhance the chances of EEVEX to REX demotions.
| predicate(UseAPX); | ||
| match(Set dst (AddL src1 src2)); | ||
| effect(KILL cr); | ||
|
|
||
| format %{ "eaddq $dst, $src1, $src2\t# long ndd" %} | ||
| ins_encode %{ | ||
| __ eaddq($dst$$Register, $src1$$Register, $src2$$constant, false); | ||
| %} | ||
| ins_pipe( ialu_reg ); | ||
| %} | ||
|
|
||
| instruct addL_rReg_mem_imm_ndd(rRegL dst, memory src1, immL32 src2, rFlagsReg cr) | ||
| %{ | ||
| predicate(UseAPX); | ||
| match(Set dst (AddL (LoadL src1) src2)); | ||
| effect(KILL cr); | ||
|
|
||
| format %{ "eaddq $dst, $src1, $src2\t# long ndd" %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you kindly share the impact of adding all these new patterns on libjvm.so size ?
I am curious to know the amount of ADLC generated code for all these new patterns.
src/hotspot/cpu/x86/x86_64.ad
Outdated
| int reg = ra_->get_encode(this); | ||
| if (reg > 15) { | ||
| return (offset < 0x80) ? 6 : 9; // REX2 | ||
| } else { | ||
| return (offset < 0x80) ? 5 : 8; // REX | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please move this out to a separate patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please create the PR as you identified and fixed this issue? I will remove this code block once your updated BoxLock node fix is integrated into the mainline.
src/hotspot/cpu/x86/x86_64.ad
Outdated
| match(Set dst (CountTrailingZerosL (LoadL src))); | ||
| effect(KILL cr); | ||
|
|
||
| ins_cost(175); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please let me know why you're using ins_cost here and not in above pattern?
src/hotspot/cpu/x86/x86_64.ad
Outdated
| "eshrl $dst, $mem, markWord::klass_shift_at_offset\t# compressed klass ptr, shifted (ndd)\n\t" | ||
| %} | ||
| ins_encode %{ | ||
| __ eshrl($dst$$Register, $mem$$Address, markWord::klass_shift_at_offset, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change could be done as part of loadNKlassCompactHeaders instruct itself as there is no additional register needed.
Something like below:
if (UseAPX_ {
__ eshrl(
} else {
__ movl(
__ shrl(
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see this suggestion incorporated in the updated code.
src/hotspot/cpu/x86/x86_64.ad
Outdated
| instruct countLeadingZerosI_nf(rRegI dst, rRegI src) %{ | ||
| predicate(UseAPX && UseCountLeadingZerosInstruction); | ||
| match(Set dst (CountLeadingZerosI src)); | ||
|
|
||
| format %{ "elzcntl $dst, $src\t# count leading zeros (int nf)" %} | ||
| ins_encode %{ | ||
| __ elzcntl($dst$$Register, $src$$Register, true); | ||
| %} | ||
| ins_pipe(ialu_reg); | ||
| %} | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This instruct could be removed as this is already an unary operation with separate destination, Likewise other unary operator instructs could also be removed where the destination is already separate from source.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ndd instructions epopcnt, elzcnt and etzcnt were removed as they have a separate destination. Please see the update code.
src/hotspot/cpu/x86/x86_64.ad
Outdated
| @@ -5551,6 +5583,17 @@ instruct countLeadingZerosI_mem(rRegI dst, memory src, rFlagsReg cr) %{ | |||
| ins_pipe(ialu_reg_mem); | |||
| %} | |||
|
|
|||
| instruct countLeadingZerosI_mem_nf(rRegI dst, memory src) %{ | |||
| predicate(UseAPX && UseCountLeadingZerosInstruction); | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This instruct could be removed as this is already an unary operation with separate destination,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see this suggestion incorporated in the updated code.
| # additional tests with rax as destination | ||
| if RegOp in [RegRegImmNddInstruction]: | ||
| test_reg1 = 'rax' | ||
| test_reg2 = random.choice(test_regs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to generate another test_reg2, we could use the same from above. Thereby, we will not modify the existing tests and only have the new test instructions added in asmtest.out.h.
src/hotspot/cpu/x86/x86_64.ad
Outdated
| ins_pipe(ialu_reg); | ||
| %} | ||
|
|
||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A nit pick, unnecessary extra blank lines :).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see the additional blank lines removed in the updated code.
| %} | ||
|
|
||
|
|
||
| instruct cmovI_regUCF2_ne(cmpOpUCF2 cop, rFlagsRegUCF cr, rRegI dst, rRegI src) %{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cmovI_regUCF2_ne, cmovl_regUCF2_eq, cmovP_regUCF2_ne, cmovP_regUCF2_eq, cmovL_regUCF2_ne, cmovL_regUCF2_eq instructs could also use the ecmovl() instructions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see the added NDD versions for cmovI_regUCF2_ne, cmovl_regUCF2_eq, cmovP_regUCF2_ne, cmovP_regUCF2_eq, cmovL_regUCF2_ne, cmovL_regUCF2_eq.
| %{ | ||
| predicate(UseAPX); | ||
| match(Set dst (AddI src1 src2)); | ||
| effect(KILL cr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also bring in the corresponding flag(PD::...); line from instruct addI_rReg in this and other rules where applicable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see the updated code with flags(PD::...)
src/hotspot/cpu/x86/x86_64.ad
Outdated
| ins_pipe(ialu_reg); | ||
| %} | ||
|
|
||
| instruct negI_rReg_ndd(rRegI src, rRegI dst, immI_0 zero, rFlagsReg cr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A nit pick in many of the new negI/negL instructs, we usually list the dst first in instruct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see the order of dst and src fixed.
src/hotspot/cpu/x86/x86_64.ad
Outdated
| %} | ||
|
|
||
| // Arithmetic Shift Right by variable | ||
| instruct sarI_rReg_CL_ndd(rRegI dst, rRegI src, rcx_RegI shift, rFlagsReg cr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new instructs sarI_rReg_CL_ndd, shrI_rReg_CL_ndd, salL_rReg_CL_ndd, sarL_rReg_CL_ndd, shrL_rReg_CL_ndd could be removed and the original !bmi2 versions could be kept. We dont need to optimize with APX instructions for non bmi2 platforms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see the updated code which removed NDD support for non bmi2 platforms.
| ins_pipe(ialu_reg_mem); | ||
| %} | ||
|
|
||
| instruct andL_rReg_rReg_mem_ndd(rRegL dst, rRegL src1, memory src2, rFlagsReg cr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks to me that we only need one of andL_rReg_rReg_mem_ndd & andL_rReg_mem_rReg_ndd as and is a commutative operator. Likewise for other commutative operators like or, add, xor, mul, etc.
| // Or Instructions | ||
| // Or Register with Register | ||
| instruct orI_rReg(rRegI dst, rRegI src, rFlagsReg cr) | ||
| instruct orI_rReg_imm_rReg_ndd(rRegI dst, immI src1, rRegI src2, rFlagsReg cr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks to me that we only need one of orI_rReg_rReg_imm_ndd or orI_rReg_imm_rReg_ndd as orI is a commutative operator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After doing a quick test, it was noticed that both the rules (orI_rReg_rReg_imm and orI_rReg_imm_rReg) are needed to generate NDD instruction depending on the position of the immediate value.
src/hotspot/cpu/x86/x86_64.ad
Outdated
| instruct rorI_immI8_legacy(rRegI dst, immI8 shift, rFlagsReg cr) | ||
| %{ | ||
| predicate(!VM_Version::supports_bmi2() && n->bottom_type()->basic_type() == T_INT); | ||
| predicate(!UseAPX && !VM_Version::supports_bmi2() && n->bottom_type()->basic_type() == T_INT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't need to change anything for non bmi2 platforms. The original predicate can be kept as is. This applies to all rorI, rolI, rorL, rolL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see the updated code removing rorI, rolI, rorL, rolL for non bmi2 platforms.
sviswa7
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
|
Hi Vladimir (@vnkozlov), Could you please review this PR? Thanks, |
eme64
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a quick look over this. It's a bit hard to review for me, because it is basically about specific APX instructions. We probably have to heavily rely on testing. But APX hardware is not yet available, right?
How can be best test this? Is there any way to emulate, maybe using SDE? What testing did you run for this?
src/hotspot/cpu/x86/x86_64.ad
Outdated
| predicate(UseAPX); | ||
| match(Set dst (CMoveI (Binary cop cr) (Binary src1 src2))); | ||
|
|
||
| ins_cost(200); // XXX |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does the XXX stand for?
|
|
||
| // Conditional move | ||
| instruct cmovI_rReg_rReg_mem_ndd(rRegI dst, cmpOp cop, rFlagsReg cr, rRegI src1, memory src2) | ||
| %{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this bracket usually not on the previous line?
|
@vamsi-parasa I tried to launch testing, but my script fails because of some merge issue. Would you mind merging from master? |
|
@vamsi-parasa Testing looks good / no related test failures. |
| registers_mapping = { | ||
| # skip rax, rsi, rdi, rsp, rbp as they have special encodings | ||
| # 'rax': {64: 'rax', 32: 'eax', 16: 'ax', 8: 'al'}, | ||
| 'rax': {64: 'rax', 32: 'eax', 16: 'ax', 8: 'al'}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should probably update the copyright year, right?
Also: is this script ever run in testing? I didn't know we had python files in the repository 😅
eme64
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot really review he content, nor can I really test the APX features.
But I also cannot directly see any issues with it.
Therefore, I'll approve.
Once the hardware is available, we will probably discover some issues and have to fix them then.
Thanks @vamsi-parasa for the work!
Thank you very much Emanuel! (@eme64) :) |
Thank You Emanuel(@eme64) for doing the review, testing and approving the PR! |
|
/integrate |
|
@vamsi-parasa This pull request has not yet been marked as ready for integration. |
Co-authored-by: Emanuel Peter <emanuel.peter@oracle.com>
eme64
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-approved :)
Thank you, Emanuel! :) |
|
/sponsor |
|
@sviswa7 The change author (@vamsi-parasa) must issue an |
|
/integrate |
|
@vamsi-parasa |
|
/sponsor |
|
Going to push as commit c87e1be.
Your commit was automatically rebased without conflicts. |
|
@sviswa7 @vamsi-parasa Pushed as commit c87e1be. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
|
This caused a regression: JDK-8356281 @vamsi-parasa Could you please have a look? |
Hi Tobias, thank you for letting us know. I have been working on a fix for this issue over the last few days. The root cause has been figured out and it will be addressed soon. |
The goal of this PR is to generate x86 code using Intel Advanced Performance Extensions (APX) instructions which doubles the number of general-purpose registers, from 16 to 32. Intel APX adds nondestructive destination (NDD) and no flags (NF) flavor for the scalar instructions through EVEX encoding.
For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html.
Progress
Issue
Reviewers
Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/23501/head:pull/23501$ git checkout pull/23501Update a local copy of the PR:
$ git checkout pull/23501$ git pull https://git.openjdk.org/jdk.git pull/23501/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 23501View PR using the GUI difftool:
$ git pr show -t 23501Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/23501.diff
Using Webrev
Link to Webrev Comment