Skip to content

Conversation

@jatin-bhateja
Copy link
Member

@jatin-bhateja jatin-bhateja commented Apr 28, 2025

This is a follow-up PR that fixes the crashes seen after the integration of PR #24664

ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding to save costly comparisons against global state [1]. While most of the relocation records the patching offsets from the end of the instruction, SHL/R instructions used for pointer coloring/uncoloring, compute the patching offset from the starting address of the instruction. This was done to prevent accidental sharing of relocation information with subsequent relocatable instructions, e.g., static call. [2]

In case the destination register operand of SHL/R instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in the patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte, resulting in an ILLEGAL instruction exception.

This patch fixes reported failures by computing the relocation offset of the SHL/R instruction from the end of the instruction, thereby making the patch offset agnostic to the REX/REX2 prefix. To be safe, we emit a NOP instruction between the SHL/R and the subsequent relocatable instruction.

Please review and share your feedback.

Best Regards,
Jatin

[1] https://openjdk.org/jeps/439#:~:text=we%20reduce%20this,changes%20phase%3B
[2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L1873

PS: Validations were performed using the latest Intel Software Development Emulator after modifying the static register allocation order in x86_64.ad file giving preference to EGPRs.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding (Bug - P3)

Reviewers

Contributors

  • Axel Boldt-Christmas <aboldtch@openjdk.org>

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/24919/head:pull/24919
$ git checkout pull/24919

Update a local copy of the PR:
$ git checkout pull/24919
$ git pull https://git.openjdk.org/jdk.git pull/24919/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 24919

View PR using the GUI difftool:
$ git pr show -t 24919

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/24919.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Apr 28, 2025

👋 Welcome back jbhateja! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Apr 28, 2025

@jatin-bhateja This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding

Co-authored-by: Axel Boldt-Christmas <aboldtch@openjdk.org>
Reviewed-by: aboldtch, sviswanathan

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 212 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk
Copy link

openjdk bot commented Apr 28, 2025

@jatin-bhateja The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label Apr 28, 2025
@jatin-bhateja
Copy link
Member Author

jatin-bhateja commented Apr 29, 2025

Please refer to following comments in relocInfo, which warns against recording relocation against exact patch site as it may pose problems in querying / iterating over relocations corresponding to particular instruction starting address.
https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/relocInfo.hpp#L85

@TobiHartmann confirmed that the patch fixed crashes.
https://bugs.openjdk.org/browse/JDK-8355363#:~:text=Sounds%20reasonable%2C%20maybe%20mention%20that%20in%20the%20PR%20as%20well.%20All%20testing%20passed.

@jatin-bhateja jatin-bhateja marked this pull request as ready for review April 29, 2025 18:45
@openjdk openjdk bot added the rfr Pull request is ready for review label Apr 29, 2025
@jatin-bhateja
Copy link
Member Author

/label add hotspot-compiler-dev

@jatin-bhateja
Copy link
Member Author

/label add hotspot-gc-dev

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Apr 29, 2025
@openjdk
Copy link

openjdk bot commented Apr 29, 2025

@jatin-bhateja
The hotspot-compiler label was successfully added.

@openjdk openjdk bot added the hotspot-gc hotspot-gc-dev@openjdk.org label Apr 29, 2025
@openjdk
Copy link

openjdk bot commented Apr 29, 2025

@jatin-bhateja
The hotspot-gc label was successfully added.

@mlbridge
Copy link

mlbridge bot commented Apr 29, 2025

Webrevs

@merykitty
Copy link
Member

I think it is more future-proof to enhance the relocation information with the offset of the exact relocation patch from the instruction start instead. I also don't agree with adding nop to the fast path, especially uncolor is used in the load fast path IIUC.

@dean-long
Copy link
Member

An alternative fix would be to change CompiledDirectCall::find_stub_for() so that it ignores relocInfo::barrier_type. Adding a nop for ZBarrierRelocationFormatLoadGoodAfterShX but not other relocations, like ZBarrierRelocationFormatStoreGoodAfterOr, seems less robust.

@jatin-bhateja
Copy link
Member Author

jatin-bhateja commented Apr 30, 2025

I think it is more future-proof to enhance the relocation information with the offset of the exact relocation patch from the instruction start instead. I also don't agree with adding nop to the fast path, especially uncolor is used in the load fast path IIUC.

Thanks for supporting this idea, specializing barrier relocation is an alternative we already discussed, but it may not be able shield against false mapping with subsequent relocatable instruction which is what is causing crash currently.

https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2025-April/088895.html

@dean-long's suggestion to map a relocation to exact patch site was a fullproof solution overcome any such limitation, but it may pose problems while querying/iterating over relocation set against starting address of instruction and is a bigger change which we plan to address after evaluation and considering alternative scheme with https://bugs.openjdk.org/browse/JDK-8355341

Current scheme of adding relocation from end of instruction is not robust either to prevent incorrect mapping with subsequent relocatable instruction, NOP is not dispatched to execution unit by add additional byte to code cache but is full proof.

I am inclined towards dean longs suggestion to skip over barrier relocation in offending code, though it's a localised fix and will not prevent the core issue in future code or existing code in some other flows

@merykitty
Copy link
Member

What I meant is that we should map a relocation to BOTH the instruction start and the patch site. APX has not even released yet so I think it is more efficient to make a better fix than to make a quicker one.

@xmas92
Copy link
Member

xmas92 commented May 5, 2025

I think @merykitty solution with two different relocations based on wether we support APX or not. And only emit the after and nop when VM_Version::supports_apx_f() is true.

On the other hand maybe we can solve this with a minimal change by simply looking for the REX2 prefix when we patch the code. Something along the line of:

diff --git a/src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp b/src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp
index 9cdf0b229c0..4a956b450bd 100644
--- a/src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp
+++ b/src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp
@@ -1328,7 +1328,13 @@ void ZBarrierSetAssembler::patch_barrier_relocation(address addr, int format) {
   const uint16_t value = patch_barrier_relocation_value(format);
   uint8_t* const patch_addr = (uint8_t*)addr + offset;
   if (format == ZBarrierRelocationFormatLoadGoodBeforeShl) {
-    *patch_addr = (uint8_t)value;
+    if (VM_Version::supports_apx_f()) {
+      NativeInstruction* instruction = nativeInstruction_at(addr);
+      uint8_t* const rex2_patch_addr = patch_addr + (instruction->has_rex2_prefix() ? 1 : 0);
+      *rex2_patch_addr = (uint8_t)value;
+    } else {
+      *patch_addr = (uint8_t)value;
+    }
   } else {
     *(uint16_t*)patch_addr = value;
   }

As for the solution to have the relocation point at the entry. While they were not designed to be used this way, It looks like it works. (At least from a barrier patching point of view, as we only want to iterate over all relocations, never map a PC to an relocation). But changing invariants are scary. And is probably better to evaluate as a part of the JDK-8355341 RFE.

@jatin-bhateja
Copy link
Member Author

Member

Hi @xmas92, Your suggestion looks good to me for this bugfix. I think we can improve upon the existing implementation as part of JDK-8355341 since its a bigger change and also include graal byein.

There is still a possibility of incorrect relocation sharing with subsequent relocatable instructions in other cases, e.g. OR instruction for which we bookkeep the relocation address from the end of the instruction, and it's the last instruction in the pointer coloring primitive. For this bug fix, your suggestion looks fine to me.

@jatin-bhateja
Copy link
Member Author

/contributor add @xmas92

@openjdk
Copy link

openjdk bot commented May 6, 2025

@jatin-bhateja Please do not rebase or force-push to an active PR as it invalidates existing review comments. Note for future reference, the bots always squash all changes into a single commit automatically as part of the integration. See OpenJDK Developers’ Guide for more information.

@openjdk
Copy link

openjdk bot commented May 6, 2025

@jatin-bhateja
Contributor Axel Boldt-Christmas <aboldtch@openjdk.org> successfully added.

Copy link
Member

@xmas92 xmas92 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I cannot test this on APX enabled hardware, I will leave the testing and verifying that this approach works up to you.

But the change looks good, and it maintains the original behaviour for none APX enabled hardware.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label May 7, 2025
@jatin-bhateja
Copy link
Member Author

Hi @TobiHartmann , @eme64 , can you kindly run this version through your test infra. This is an APX-specific issue. I have verified its correctness using SDE, both following tests are now passing.
https://github.com/openjdk/jdk/tree/master/test/hotspot/jtreg/compiler/c2/irTests/gc

@TobiHartmann
Copy link
Member

Sure, I'll run it through testing and report back.

@TobiHartmann
Copy link
Member

All tests passed.

Copy link

@sviswa7 sviswa7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me as well.

@jatin-bhateja
Copy link
Member Author

/integrate

@openjdk
Copy link

openjdk bot commented May 9, 2025

Going to push as commit 53ad4b2.
Since your change was applied there have been 214 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label May 9, 2025
@openjdk openjdk bot closed this May 9, 2025
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels May 9, 2025
@openjdk
Copy link

openjdk bot commented May 9, 2025

@jatin-bhateja Pushed as commit 53ad4b2.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

hotspot hotspot-dev@openjdk.org hotspot-compiler hotspot-compiler-dev@openjdk.org hotspot-gc hotspot-gc-dev@openjdk.org integrated Pull request has been integrated

Development

Successfully merging this pull request may close these issues.

6 participants