Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8328998: Encoding support for Intel APX extended general-purpose registers #18476

Closed
wants to merge 35 commits into from

Conversation

steveatgh
Copy link
Contributor

@steveatgh steveatgh commented Mar 25, 2024

Add instruction encoding support for Intel APX extended general-purpose registers:

Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html.

By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8328998: Encoding support for Intel APX extended general-purpose registers (Sub-task - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/18476/head:pull/18476
$ git checkout pull/18476

Update a local copy of the PR:
$ git checkout pull/18476
$ git pull https://git.openjdk.org/jdk.git pull/18476/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 18476

View PR using the GUI difftool:
$ git pr show -t 18476

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/18476.diff

Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Mar 25, 2024

👋 Welcome back steveatgh! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Mar 25, 2024

@steveatgh This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8328998: Encoding support for Intel APX extended general-purpose registers

Reviewed-by: kvn, sviswanathan, jbhateja

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 247 new commits pushed to the master branch:

  • ddd73b4: 8332082: Shenandoah: Use consistent tests to determine when pre-write barrier is active
  • 0a9d1f8: 8332749: Broken link in MemorySegment.Scope.html
  • c9a7b97: 8332829: [BACKOUT] C2: crash in compiled code because of dependency on removed range check CastIIs
  • 7fd9d6c: 8332340: Add JavacBench as a test case for CDS
  • 417d174: 8331348: Some incremental builds deposit files in the make directory
  • 303ac9f: 8332671: Logging for pretouching thread stacks shows wrong memory range
  • 90758f6: 8332808: Always set java.io.tmpdir to a suitable value in the build
  • e19a421: 8332720: ubsan: instanceKlass.cpp:3550:76: runtime error: member call on null pointer of type 'struct Array'
  • 2581935: 8332528: Generate code in SwitchBootstraps.generateTypeSwitch that require fewer adaptations
  • b890336: 8328083: degrade virtual thread support for GetObjectMonitorUsage
  • ... and 237 more: https://git.openjdk.org/jdk/compare/6f98d8f58f98827ae454c7ce4839de4071d95767...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@vnkozlov, @sviswa7, @jatin-bhateja, @eme64) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk
Copy link

openjdk bot commented Mar 25, 2024

@steveatgh The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Mar 25, 2024
@jatin-bhateja
Copy link
Member

jatin-bhateja commented Mar 26, 2024

Hi @steveatgh ,
Please use newly created JBS entry JDK-8328998 for this draft PR.

@steveatgh steveatgh changed the title Add instruction encoding support for APX extended general-purpose registers 8328998: Encoding support for Intel APX extended general-purpose registers Mar 26, 2024
- updates for 32-bit build
- make emit_opcode_prefix_and_encoding functions active for 32-bit and 64-bit builds
@steveatgh steveatgh marked this pull request as ready for review April 2, 2024 01:02
@openjdk openjdk bot added the rfr Pull request is ready for review label Apr 2, 2024
@dean-long
Copy link
Member

How can we be confident that the encoding is correct? Would it be possible to write tests for this? Maybe one that disassembles it and compares the result to a 3rd party disassembler offline or in-process hsdis?

@eme64
Copy link
Contributor

eme64 commented Apr 15, 2024

Also: what if the UseAPX is enabled, but the hardware does not support the feature? Don't we usually then automatically disable the flag, if the feature is not present? We do that with UseAVX for example.

@eme64
Copy link
Contributor

eme64 commented Apr 15, 2024

And I agree with @dean-long : you need to have some test for this. At least some test should have the flag enabled. Something that stresses the registers, and verifies the results could be an idea.

Also: I suspect you would want to put this flag into the IR-framework whitelist, since it has no effect on the IR, right?

@eme64
Copy link
Contributor

eme64 commented Apr 15, 2024

Can the APX features be simulated, maybe even with SDE?

Now you made the flag EXPERIMENTAL and by default false. What is the roadmap with this? It is generally not great to have default false flags, because the code underneath will just slowly rot and become broken. Is there a plan to eventually make it default true? What stops us from doing that already now?

Copy link
Contributor

@vnkozlov vnkozlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have few comments.

Comment on lines +668 to +669
assert(((base_enc & 0x7) != 4), "illegal addressing mode");
if (disp == 0 && no_relocation && ((base_enc & 0x7) != 5)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We loost information with this change. Can it be done as is_r13_encoding(base_enc) and is_r12_enxoding(base_enc)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment. The "& 0x7" style was suggested to me by @sviswa7 as a efficient way to check for r12, r20, r28 in the assert, and for r13, r21, r29 in the if statement. I originally was comparing against each new APX register encoding. The style in the PR is concise but it can be done either way. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it - it is for few registers check now. What is common between rsp, r12, r20, r28 registers (except encoding)?
R12 is used for heap base in compressed oops and RSP is RSP. What are r20 and r28? Why they can't be used in this addressing mode?

Please add comments for all lines where you replaced checks for r*->encoding() to say for which registers you do a check and why.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason registers 12, 20, and 28 are asserted out of that else at line 668 is they are handled earlier in an else around line 646.

} else if ((base_enc & 0x7) == 4) { // [rsp + disp]

I added a comment to this effect and have also added comments in the 3 other places in the function where the replacement was done, indicating the registers involved.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good.

InstructionAttr *attributes);

int vex_prefix_and_encode(int dst_enc, int nds_enc, int src_enc,
VexSimdPrefix pre, VexOpcode opc,
InstructionAttr *attributes);
InstructionAttr *attributes, bool src_is_gpr = false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw a lot of usage of this method with src_is_gpr is true. What is actual most common case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @vnkozlov. I believe false is the most common case. If I counted right, I see 400+ calls total and 37 calls with src_is_gpr set to true.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good. Thank you for checking.


void simd_prefix(XMMRegister xreg, XMMRegister nds, Address adr, VexSimdPrefix pre,
VexOpcode opc, InstructionAttr *attributes);

int simd_prefix_and_encode(XMMRegister dst, XMMRegister nds, XMMRegister src, VexSimdPrefix pre,
VexOpcode opc, InstructionAttr *attributes);
VexOpcode opc, InstructionAttr *attributes, bool src_is_gpr = false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question as for vex_prefix_and_encode

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @vnkozlov. I believe false is the most common case here too. Again, if my counts are close, I see ~180 calls total and 12 calls with src_is_gpr = true.


"mitigations for the Intel JCC erratum") \
\
product(bool, UseAPX, false, EXPERIMENTAL, \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spacing to \

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@steveatgh, do you plan to add code to vm_version_x86.* to check for presence of AVX and setting this flag accordingly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @vnkozlov.
Spacing fixed locally.
Regarding the UseAPX flag, yes, a subsequent PR (see JDK-8329030) will tie the logic of the flag in with querying the hardware features.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay.

@steveatgh
Copy link
Contributor Author

steveatgh commented Apr 16, 2024

Thank you @dean-long for the comment. I agree, tests are needed. Up to this point we have not had a separate formal tool to test encoding of x86. I did a lot of manual testing by adding loops that used r0-r31in different addressing patterns. I put these in a stub file that would be compiled by hotspot but not executed. I manually compared the disassembly of that against the output of similar assembly included in a small C program and run on the SDE. This worked pretty well for debugging but the manual aspect of it makes it error-prone and it takes a lot of time, too much time if iterating an implementation.

Subsequent pull requests will add encoding support for additional APX instructions (e.g. those using New Data Destination). Maybe one of these PRs can include a tool for testing instruction encoding for APX features. What do you think?

@steveatgh
Copy link
Contributor Author

Thank you @eme64 for the comments. The functionality of the UseAPX flag is, as you point out, incomplete in this pull request. A subsequent PR (see JDK-8329030) will tie the logic of the flag in with a query of the hardware features. It was added in this PR thinking it could be useful for testing or debugging the encoding functionality.

void Assembler::prefix(Address adr, Register reg, bool byteinst) {
void Assembler::prefix_rex2(Address adr, bool is_map1) {
int bits = is_map1 ? REX2BIT_M0 : 0;
bits |= get_base_prefix_bits(adr.base()->encoding());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
bits |= get_base_prefix_bits(adr.base()->encoding());
bits |= get_base_prefix_bits(adr.base());

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per section 3.7.5 of Intel SDM (Index ∗ Scale) + Displacement is a valid addressing mode. Thus we should set the bits corresponding to extended base register encoding only if its a valid register.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jatin-bhateja. I've made the change.


void Assembler::prefix_rex2(Address adr, Register reg, bool byteinst, bool is_map1) {
int bits = is_map1 ? REX2BIT_M0 : 0;
bits |= get_base_prefix_bits(adr.base()->encoding());
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also needs fix to:
bits |= get_base_prefix_bits(adr.base());

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @sviswa7, made the change.

@@ -635,7 +644,8 @@ void Assembler::emit_operand_helper(int reg_enc, int base_enc, int index_enc,
scale, index_enc, base_enc);
emit_data(disp, rspec, disp32_operand);
}
} else if (base_enc == rsp->encoding() LP64_ONLY(|| base_enc == r12->encoding())) {
} else if ((base_enc & 0x7) == 4) {
// rbp | r12 | r20 | r28
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment should be:
// rsp | r12 | r20 | r28

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, done.

assert(base_enc != rsp->encoding() LP64_ONLY(&& base_enc != r12->encoding()), "illegal addressing mode");
if (disp == 0 && no_relocation &&
base_enc != rbp->encoding() LP64_ONLY(&& base_enc != r13->encoding())) {
// !(rbp | r12 | r20 | r28) were handled above
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment should be:
// (rsp | r12 | r20 | r28) were handled above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, done.

@@ -3550,28 +3617,28 @@ void Assembler::movq(Register dst, XMMRegister src) {
NOT_LP64(assert(VM_Version::supports_sse2(), ""));
InstructionAttr attributes(AVX_128bit, /* rex_w */ true, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false);
// swap src/dst to get correct prefix
int encode = simd_prefix_and_encode(src, xnoreg, as_XMMRegister(dst->encoding()), VEX_SIMD_66, VEX_OPCODE_0F, &attributes);
int encode = simd_prefix_and_encode(src, xnoreg, as_XMMRegister(dst->encoding()), VEX_SIMD_66, VEX_OPCODE_0F, &attributes, true);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here the last argument to simd_prefix_and_encode shouldn't be true as src is not gpr?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of the swap of src and dst args in this case, I think src is actually a gpr here.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are correct. Thanks for the clarification.

Comment on lines 12895 to 12897
if (adr.index_needs_rex2()) {
assert(false, "prefix(Register dst, Address adr) does not support handling of an X");
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could be written as:
assert(!adr.index_needs_rex2(), "prefix(Register dst, Address adr) does not support handling of an X");

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice. Thank you, done.

emit_int24(get_prefixq(src, dst),
0x0F,
(unsigned char)0xBE);
int prefix = get_prefixq(src, dst, true /* page1 */);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are not consistent in the comment is_map1, M0, page1 all refer to the same thing. Also some places there is no comment that the true is for is_map1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. Naming is now consistent with APX spec naming (is_map1 for the bool argument and M0 for the bit name). I added inline comment /* is_map1 */ for calls that pass the boolean argument.

@@ -14179,4 +14476,4 @@ void InstructionAttr::set_address_attributes(int tuple_type, int input_size_in_b
_tuple_type = tuple_type;
_input_size_in_bits = input_size_in_bits;
}
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New line missing at the end of file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, done.

int encode = prefixq_and_encode(dst->encoding(), src->encoding());
emit_int24(0x0F, (unsigned char)0xB8, (0xC0 | encode));
int encode = prefixq_and_encode(dst->encoding(), src->encoding(), true);
emit_opcode_prefix_and_encoding((unsigned char)0xB8, 0xC0, encode);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

void Assembler::popcntq(Register dst, Address src) also need to be handled for rex2 generation. get_prefixq() will return a 16 bit entity and so call to emit_int32 directly is not correct.
emit_int32((unsigned char)0xF3,
get_prefixq(src, dst),
0x0F,
(unsigned char)0xB8);

Likewise void Assembler::cvttsd2siq(Register dst, Address src) also needs to be updated to handle extended gprs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, missed these. Updated popcntq(Register dst, Address src) and cvttsd2siq(Register dst, Address src) for egpr support.

Comment on lines 13258 to 13260
} else {
emit_int24((prefix & 0xFF00) >> 8, prefix & 0x00FF, b1);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a check for UseAPX > 0 here.

Copy link
Contributor Author

@steveatgh steveatgh Apr 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sviswa7, sorry can you clarify what check is needed here. Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I understand now. Have added an assert to require UseAPX if prefix is WREX2.

@steveatgh
Copy link
Contributor Author

Hi @vnkozlov,

Is there anything else you would like to see for this PR. If not, I would like to check it in this week.

@eme64
Copy link
Contributor

eme64 commented May 21, 2024

@steveatgh I just saw that we from Oracle did not run any tests for this. Can you please hold off for a day or two until we have the testing completed? I'm sure you did exhaustive testing - but still I'd like to make sure it runs fine on all the x64 machines we have.

@eme64
Copy link
Contributor

eme64 commented May 21, 2024

@steveatgh I think it could make sense to add a simple "hello world" JTREG test that enables the UseAPX flag, just to test if it is handled correctly, even on platforms that do not have the feature enabled.

@@ -1603,83 +1665,91 @@ void Assembler::andl(Register dst, Register src) {

void Assembler::andnl(Register dst, Register src1, Register src2) {
assert(VM_Version::supports_bmi1(), "bit manipulation instructions not supported");
InstructionAttr attributes(AVX_128bit, /* rex_w */ false, /* legacy_mode */ true, /* no_mask_reg */ true, /* uses_vl */ false);
int encode = vex_prefix_and_encode(dst->encoding(), src1->encoding(), src2->encoding(), VEX_SIMD_NONE, VEX_OPCODE_0F_38, &attributes);
assert(!needs_eevex(dst, src1, src2) || UseAPX, "extended gpr use requires UseAPX and UseAVX > 2");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technical detail: UseAPX and UseAVX > 2 sounds wrong. Did you mean to say "or"? Because UseAPX is only enabled when UseAVX >= 3.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, see comment below.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the confusion comes from the "or" in the code, and the "and" in the assert description.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I removed the "and UseAVX > 2" here. Thank you.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assert is repeated a lot. Instead of it I think UseAPX assert check should be done in needs_rex2() and needs_eevex() when they return true.

int8_t w = 0x01;
Prefix p = Prefix_EMPTY;
if (needs_eevex(crc, adr.base(), adr.index())) {
assert(UseAPX, "extended gpr use requires UseAPX and UseAVX > 2");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe here the "and" makes sense, but not sure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, "and" is technically correct. APX includes instructions that require AVX 3 (evex) encoding for extended gpr use, together with +UseAPX.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to say "and UseAVX >2" because you switch off UseAPX in vm_version_x86.cpp when UseAVX < 3.

// APX support code currently requires UseAPX > 2. This may change in the future.
if (UseAPX && (UseAVX < 3)) {
if (!FLAG_IS_DEFAULT(UseAPX)) {
warning("UseAPX is only available when UseAVX > 2");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
warning("UseAPX is only available when UseAVX > 2");
warning("UseAPX is only available when UseAVX > 2. Disabling UseAPX.");

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would tell the user what we are doing, just like with the UseAVX flag.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, done.

@eme64
Copy link
Contributor

eme64 commented May 21, 2024

Thank you @eme64 for the comments. The functionality of the UseAPX flag is, as you point out, incomplete in this pull request. A subsequent PR (see JDK-8329030) will tie the logic of the flag in with a query of the hardware features. It was added in this PR thinking it could be useful for testing or debugging the encoding functionality.

Wait. Does this mean that if I enable the UseAPX flag on my AVX512 machine with UseAVX=3, that we will start encoding instructions using APX? Can that lead to wrong results?

@steveatgh
Copy link
Contributor Author

We will not start APX encoding instructions with this PR. APX encoding only comes into play when extended GPRs are used and the register allocator hasn't been extended to allocate extended GPRs yet. That will come in follow-up patches.

Copy link
Contributor

@vnkozlov vnkozlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few comments.

@@ -1002,6 +1002,13 @@ void VM_Version::get_processor_features() {
}
}

if (UseAPX && (UseAVX < 3)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it enough to have AVX512F present for APX? What about Knight CPUs which have limited AVX512 features?

@@ -1002,6 +1002,13 @@ void VM_Version::get_processor_features() {
}
}

if (UseAPX && (UseAVX < 3)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should add code which checks CPUID features bit to set UseAPX. Or set it to false unconditionally in this PR regardless UseAVX value with comment "APX is not supported on this CPU". Otherwise someone will switch it on command line on avx512 machine.

Or we should push #18562 first. Which I prefer.

int8_t w = 0x01;
Prefix p = Prefix_EMPTY;
if (needs_eevex(crc, adr.base(), adr.index())) {
assert(UseAPX, "extended gpr use requires UseAPX and UseAVX > 2");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to say "and UseAVX >2" because you switch off UseAPX in vm_version_x86.cpp when UseAVX < 3.

@@ -1603,83 +1665,91 @@ void Assembler::andl(Register dst, Register src) {

void Assembler::andnl(Register dst, Register src1, Register src2) {
assert(VM_Version::supports_bmi1(), "bit manipulation instructions not supported");
InstructionAttr attributes(AVX_128bit, /* rex_w */ false, /* legacy_mode */ true, /* no_mask_reg */ true, /* uses_vl */ false);
int encode = vex_prefix_and_encode(dst->encoding(), src1->encoding(), src2->encoding(), VEX_SIMD_NONE, VEX_OPCODE_0F_38, &attributes);
assert(!needs_eevex(dst, src1, src2) || UseAPX, "extended gpr use requires UseAPX and UseAVX > 2");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assert is repeated a lot. Instead of it I think UseAPX assert check should be done in needs_rex2() and needs_eevex() when they return true.

@steveatgh
Copy link
Contributor Author

Thanks @vnkozlov for the comments.

  • UseAPX is disabled for now, using the comment you suggested.
  • The asserts are now added to ::needs_rex2 and ::needs_eevex, and removed from the relevant instruction functions.

void Assembler::stmxcsr( Address dst) {
if (UseAVX > 0 ) {
void Assembler::stmxcsr(Address dst) {
if (UseAVX > 0 && !UseAPX ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New && !UseAPX check is strange here. If UseAPX is true we will execute } else { part of code which was executed only for SSE (UseAVX == 0) before. Is this intentional? This needs comment explaining why we do that if it is intentional.

I see in other place you have adr.base_needs_rex2() || adr.index_needs_rex2() check. Do we need it here too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The !UseAPX test was added because if UseAPX is enabled we want to support extended register use via rex2 encoding in the else clause. The existing vex encoding remains when UseAPX is not enabled. There is a needs_rex2 check of address registers in the call to ::vex_prefix, asserting if UseAPX not enabled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment on this to the stmxcsr/ldmxcsr functions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay

void Assembler::stmxcsr( Address dst) {
if (UseAVX > 0 ) {
void Assembler::stmxcsr(Address dst) {
if (UseAVX > 0 && !UseAPX ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay

@@ -12858,9 +13027,21 @@ void Assembler::prefix(Address adr) {
prefix(REX_X);
}
}
if (is_map1) emit_int8(0x0F);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • First. What is_map1 means? There is no explanation for this name. May be add comment somewhere in assembler_x86.hpp file or use more meaningful name.

  • Second. You added one more byte 0x0F for instructions even when extended registers are not used and APX is not enabled. Why? You added it in several prefix() and prefixq() methods. It can lead to regression since code size will increase.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The is_map1 bool indicates an x86 map1 instruction which, when
legacy encoded, uses a 0x0F opcode prefix. By specification, the
opcode prefix is omitted when using rex2 encoding in support
of APX extended GPRs.

I added this comment before the relevant prefix functions in assembler_x86.hpp. Is this sufficient?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment is good.

Actually you should have point me to code which shows that code generation did not change, we generated 0x0F before:

  - prefix(src);
  - emit_int16(0x0F, 0x18);
  + prefix(src, true /* is_map1 */);
  + emit_int8(0x18);

Copy link
Contributor

@vnkozlov vnkozlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.
Please wait result of @eme64 testing before integration.

@steveatgh
Copy link
Contributor Author

Hi @eme64, just wanted to check on the status of testing. Thanks.

@vnkozlov
Copy link
Contributor

I looked and the testing passed. No new failures. You also addressed his comments.
I think you can push.

@steveatgh
Copy link
Contributor Author

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label May 23, 2024
@openjdk
Copy link

openjdk bot commented May 23, 2024

@steveatgh
Your change (at version 47a0cd7) is now ready to be sponsored by a Committer.

@sviswa7
Copy link

sviswa7 commented May 23, 2024

/sponsor

@openjdk
Copy link

openjdk bot commented May 23, 2024

Going to push as commit f8a3e4e.
Since your change was applied there have been 247 commits pushed to the master branch:

  • ddd73b4: 8332082: Shenandoah: Use consistent tests to determine when pre-write barrier is active
  • 0a9d1f8: 8332749: Broken link in MemorySegment.Scope.html
  • c9a7b97: 8332829: [BACKOUT] C2: crash in compiled code because of dependency on removed range check CastIIs
  • 7fd9d6c: 8332340: Add JavacBench as a test case for CDS
  • 417d174: 8331348: Some incremental builds deposit files in the make directory
  • 303ac9f: 8332671: Logging for pretouching thread stacks shows wrong memory range
  • 90758f6: 8332808: Always set java.io.tmpdir to a suitable value in the build
  • e19a421: 8332720: ubsan: instanceKlass.cpp:3550:76: runtime error: member call on null pointer of type 'struct Array'
  • 2581935: 8332528: Generate code in SwitchBootstraps.generateTypeSwitch that require fewer adaptations
  • b890336: 8328083: degrade virtual thread support for GetObjectMonitorUsage
  • ... and 237 more: https://git.openjdk.org/jdk/compare/6f98d8f58f98827ae454c7ce4839de4071d95767...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label May 23, 2024
@openjdk openjdk bot closed this May 23, 2024
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels May 23, 2024
@openjdk
Copy link

openjdk bot commented May 23, 2024

@sviswa7 @steveatgh Pushed as commit f8a3e4e.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated
7 participants