8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding #24919
👋 Welcome back jbhateja! A progress list of the required criteria for merging this PR into |
|
@jatin-bhateja This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 212 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
|
@jatin-bhateja The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
|
Please refer to the following comments in relocInfo, which warn against recording a relocation against the exact patch site, as it may pose problems when querying or iterating over relocations corresponding to a particular instruction's starting address. @TobiHartmann confirmed that the patch fixed the crashes. |
|
/label add hotspot-compiler-dev |
|
/label add hotspot-gc-dev |
|
I think it is more future-proof to enhance the relocation information with the offset of the exact relocation patch from the instruction start instead. I also don't agree with adding |
|
An alternative fix would be to change CompiledDirectCall::find_stub_for() so that it ignores relocInfo::barrier_type. Adding a nop for ZBarrierRelocationFormatLoadGoodAfterShX but not other relocations, like ZBarrierRelocationFormatStoreGoodAfterOr, seems less robust. |
Thanks for supporting this idea. Specializing the barrier relocation is an alternative we already discussed, but it may not be able to shield against false mapping with a subsequent relocatable instruction, which is what is causing the crash currently. https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2025-April/088895.html @dean-long's suggestion to map a relocation to the exact patch site was a foolproof solution to overcome any such limitation, but it may pose problems when querying or iterating over the relocation set against the starting address of an instruction, and it is a bigger change, which we plan to address after evaluating and considering alternative schemes with https://bugs.openjdk.org/browse/JDK-8355341 The current scheme of adding the relocation from the end of the instruction is not robust either in preventing incorrect mapping with a subsequent relocatable instruction. A NOP is not dispatched to an execution unit; it adds an additional byte to the code cache but is foolproof. I am inclined towards Dean Long's suggestion to skip over the barrier relocation in the offending code, though it is a localised fix and will not prevent the core issue in future code or in existing code in some other flows. |
|
What I meant is that we should map a relocation to BOTH the instruction start and the patch site. APX has not even been released yet, so I think it is more efficient to make a better fix than a quicker one. |
|
I think @merykitty solution with two different relocations based on whether we support APX or not. And only emit the after and nop when
On the other hand, maybe we can solve this with a minimal change by simply looking for the REX2 prefix when we patch the code. Something along the lines of:
diff --git a/src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp b/src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp
index 9cdf0b229c0..4a956b450bd 100644
--- a/src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp
+++ b/src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp
@@ -1328,7 +1328,13 @@ void ZBarrierSetAssembler::patch_barrier_relocation(address addr, int format) {
   const uint16_t value = patch_barrier_relocation_value(format);
   uint8_t* const patch_addr = (uint8_t*)addr + offset;
   if (format == ZBarrierRelocationFormatLoadGoodBeforeShl) {
-    *patch_addr = (uint8_t)value;
+    if (VM_Version::supports_apx_f()) {
+      NativeInstruction* instruction = nativeInstruction_at(addr);
+      uint8_t* const rex2_patch_addr = patch_addr + (instruction->has_rex2_prefix() ? 1 : 0);
+      *rex2_patch_addr = (uint8_t)value;
+    } else {
+      *patch_addr = (uint8_t)value;
+    }
   } else {
     *(uint16_t*)patch_addr = value;
   }
As for the solution to have the relocation point at the entry: while they were not designed to be used this way, it looks like it works (at least from a barrier patching point of view, as we only want to iterate over all relocations, never map a PC to a relocation). But changing invariants is scary, and it is probably better to evaluate this as part of the JDK-8355341 RFE. |
Hi @xmas92, your suggestion looks good to me for this bugfix. I think we can improve upon the existing implementation as part of JDK-8355341, since it's a bigger change and would also need Graal buy-in. There is still a possibility of incorrect relocation sharing with subsequent relocatable instructions in other cases, e.g. the OR instruction, for which we bookkeep the relocation address from the end of the instruction and which is the last instruction in the pointer coloring primitive. For this bug fix, your suggestion looks fine to me. |
|
/contributor add @xmas92 |
|
@jatin-bhateja Please do not rebase or force-push to an active PR as it invalidates existing review comments. Note for future reference, the bots always squash all changes into a single commit automatically as part of the integration. See OpenJDK Developers’ Guide for more information. |
xmas92
left a comment
As I cannot test this on APX-enabled hardware, I will leave testing and verifying that this approach works up to you.
But the change looks good, and it maintains the original behaviour for non-APX-enabled hardware.
|
Hi @TobiHartmann, @eme64, could you kindly run this version through your test infrastructure? This is an APX-specific issue. I have verified its correctness using SDE; both of the following tests now pass. |
|
Sure, I'll run it through testing and report back. |
|
All tests passed. |
sviswa7
left a comment
Looks good to me as well.
|
/integrate |
|
Going to push as commit 53ad4b2.
Your commit was automatically rebased without conflicts. |
|
@jatin-bhateja Pushed as commit 53ad4b2. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
This is a follow-up PR that fixes the crashes seen after the integration of PR #24664
ZGC bookkeeps multiple placeholders in barrier code snippets through relocations; these are later used to patch appropriate contents (mostly immediate values) into the instruction encoding to save costly comparisons against global state [1]. While most relocations record the patching offset from the end of the instruction, the SHL/R instructions used for pointer coloring/uncoloring compute the patching offset from the starting address of the instruction. This was done to prevent accidental sharing of relocation information with subsequent relocatable instructions, e.g., a static call. [2]
In case the destination register operand of the SHL/R instruction is an extended GPR, we miss accounting for the additional REX2 prefix byte in the patch offset, thereby corrupting the encoding, since the runtime patches the primary opcode byte, resulting in an ILLEGAL instruction exception.
This patch fixes the reported failures by computing the relocation offset of the SHL/R instruction from the end of the instruction, thereby making the patch offset agnostic to the REX/REX2 prefix. To be safe, we emit a NOP instruction between the SHL/R and the subsequent relocatable instruction.
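To see why an end-relative offset is insensitive to the prefix length, consider this minimal sketch. It is a model on raw byte arrays, not HotSpot code: the helper names and constants are hypothetical, and it assumes that the imm8 is the last byte of the SHL encoding and that a REX2 prefix is one byte longer than a REX prefix.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: imm8 is the final byte of "SHL r64, imm8", so measured
// from the end of the instruction it is always at offset -1, whatever
// prefix precedes the opcode.
constexpr std::ptrdiff_t kImm8OffsetFromEnd = -1;

// Assumption: a start-relative offset tuned for a one-byte REX prefix
// (REX, opcode, ModRM, imm8).
constexpr std::ptrdiff_t kImm8OffsetFromStart = 3;

// End-relative lookup: independent of how many prefix bytes there are.
inline std::uint8_t* imm8_from_end(std::uint8_t* insn_end) {
  assert(insn_end != nullptr);
  return insn_end + kImm8OffsetFromEnd;
}

// Start-relative lookup: correct only if the prefix is exactly one
// byte; with a two-byte REX2 prefix it lands one byte short of imm8.
inline std::uint8_t* imm8_from_start(std::uint8_t* insn_start) {
  assert(insn_start != nullptr);
  return insn_start + kImm8OffsetFromStart;
}
```

For a 4-byte REX-prefixed encoding both helpers return the same address. For a 5-byte REX2-prefixed encoding only `imm8_from_end` reaches the immediate; `imm8_from_start` stops one byte early, which is the kind of mispatch described above.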
Please review and share your feedback.
Best Regards,
Jatin
[1] https://openjdk.org/jeps/439#:~:text=we%20reduce%20this,changes%20phase%3B
[2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L1873
PS: Validation was performed using the latest Intel Software Development Emulator after modifying the static register allocation order in the x86_64.ad file to give preference to EGPRs.
Contributors
<aboldtch@openjdk.org>
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/24919/head:pull/24919
$ git checkout pull/24919
Update a local copy of the PR:
$ git checkout pull/24919
$ git pull https://git.openjdk.org/jdk.git pull/24919/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 24919
View PR using the GUI difftool:
$ git pr show -t 24919
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/24919.diff
Using Webrev
Link to Webrev Comment