8368787: Error reporting: hs_err files should show instructions when referencing code in nmethods #27530

TheRealMDoerr · 2025-09-26T16:12:14Z

We'd like to have a little more information in hs_err files in the following scenario: The VM crashes in code which does something with an nmethod. We often have a register pointing into code of the nmethod, but the nmethod is not disassembled in the hs_err file because the crash happens outside of it.

We can disassemble some instructions around the address inside the nmethod code. This is tricky on platforms which have variable length instructions (like x86). We need to find correct instruction start addresses. I'm proposing to use relocations for this purpose. There are usually enough of them distributed over the nmethod and they point to instruction start addresses.
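
The idea of bracketing an address by its neighboring relocations can be sketched in isolation. This is a minimal sketch under stated assumptions, not the HotSpot implementation: `CodeRegion` and `snippet_bounds` are hypothetical names, and relocation addresses are modeled as a sorted vector instead of being walked with a `RelocIterator`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for an nmethod's code section: the half-open range
// [code_begin, code_end) plus the addresses that relocations point to.
struct CodeRegion {
  std::uintptr_t code_begin;
  std::uintptr_t code_end;
  std::vector<std::uintptr_t> reloc_addrs;  // sorted ascending
};

// Returns [start, end) around addr, where start is the last relocation
// strictly before addr (or code_begin if there is none) and end is the first
// relocation strictly after addr (or code_end if there is none).
std::pair<std::uintptr_t, std::uintptr_t>
snippet_bounds(const CodeRegion& r, std::uintptr_t addr) {
  assert(r.code_begin <= addr && addr < r.code_end);
  std::uintptr_t start = r.code_begin;
  std::uintptr_t end = r.code_end;

  // First relocation at or after addr; the one before it is the start.
  auto first_ge = std::lower_bound(r.reloc_addrs.begin(), r.reloc_addrs.end(), addr);
  if (first_ge != r.reloc_addrs.begin()) {
    start = *(first_ge - 1);
  }

  // First relocation strictly after addr bounds the snippet.
  auto first_gt = std::upper_bound(first_ge, r.reloc_addrs.end(), addr);
  if (first_gt != r.reloc_addrs.end()) {
    end = *first_gt;
  }
  return {start, end};
}
```

Both bounds land on addresses that relocations point to, so a disassembler that starts at `start` begins at an instruction boundary whenever the relocation itself does.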

I've tested this proposal with the following code on x86_64:

diff --git a/src/hotspot/cpu/x86/interp_masm_x86.cpp b/src/hotspot/cpu/x86/interp_masm_x86.cpp
index a6b4efbe4f2..d715e69c850 100644
--- a/src/hotspot/cpu/x86/interp_masm_x86.cpp
+++ b/src/hotspot/cpu/x86/interp_masm_x86.cpp
@@ -646,6 +646,18 @@ void InterpreterMacroAssembler::prepare_to_jump_from_interpreted() {
 void InterpreterMacroAssembler::jump_from_interpreted(Register method, Register temp) {
   prepare_to_jump_from_interpreted();
 
+  if (UseNewCode) {
+    Label ok;
+    movptr(temp, Address(method, Method::from_interpreted_offset()));
+    cmpptr(temp, Address(method, Method::interpreter_entry_offset()));
+    je(ok);
+    movptr(rax, Address(method, Method::from_compiled_offset()));
+    movptr(rbx, rax);
+    addptr(rbx, 128);
+    hlt();
+    bind(ok);
+  }
+
   if (JvmtiExport::can_post_interpreter_events()) {
     Label run_compiled_code;
     // JVMTI events, such as single-stepping, are implemented partly by avoiding running

The output is (requires hsdis library, otherwise we only get the hex dump):

RAX=0x00007f1a75000100 is at entry_point+0 in (nmethod*)0x00007f1a75000008
Compiled method (c1) 2504    1       3       java.lang.Byte::toUnsignedInt (6 bytes)
 total in heap  [0x00007f1a75000008,0x00007f1a750001f8] = 496
 main code      [0x00007f1a75000100,0x00007f1a750001b8] = 184
 stub code      [0x00007f1a750001b8,0x00007f1a750001f8] = 64
 mutable data [0x00007f1a1001e0b0,0x00007f1a1001e0e0] = 48
 relocation     [0x00007f1a1001e0b0,0x00007f1a1001e0d8] = 40
 metadata       [0x00007f1a1001e0d8,0x00007f1a1001e0e0] = 8
 immutable data [0x00007f1a1001dcd0,0x00007f1a1001dd30] = 96
 dependencies   [0x00007f1a1001dcd0,0x00007f1a1001dcd8] = 8
 scopes pcs     [0x00007f1a1001dcd8,0x00007f1a1001dd18] = 64
 scopes data    [0x00007f1a1001dd18,0x00007f1a1001dd30] = 24
0x00007f1a75000100:   89 84 24 00 80 fe ff 55 48 83 ec 20 41 81 7f 20
0x00007f1a75000110:   01 00 00 00 74 05 
--------------------------------------------------------------------------------
  0x00007f1a75000100:   mov    %eax,-0x18000(%rsp)
  0x00007f1a75000107:   push   %rbp
  0x00007f1a75000108:   sub    $0x20,%rsp
  0x00007f1a7500010c:   cmpl   $0x1,0x20(%r15)
  0x00007f1a75000114:   je     0x00007f1a7500011b
--------------------------------------------------------------------------------
RBX=0x00007f1a75000180 is at entry_point+128 in (nmethod*)0x00007f1a75000008
Compiled method (c1) 2505    1       3       java.lang.Byte::toUnsignedInt (6 bytes)
 total in heap  [0x00007f1a75000008,0x00007f1a750001f8] = 496
 main code      [0x00007f1a75000100,0x00007f1a750001b8] = 184
 stub code      [0x00007f1a750001b8,0x00007f1a750001f8] = 64
 mutable data [0x00007f1a1001e0b0,0x00007f1a1001e0e0] = 48
 relocation     [0x00007f1a1001e0b0,0x00007f1a1001e0d8] = 40
 metadata       [0x00007f1a1001e0d8,0x00007f1a1001e0e0] = 8
 immutable data [0x00007f1a1001dcd0,0x00007f1a1001dd30] = 96
 dependencies   [0x00007f1a1001dcd0,0x00007f1a1001dcd8] = 8
 scopes pcs     [0x00007f1a1001dcd8,0x00007f1a1001dd18] = 64
 scopes data    [0x00007f1a1001dd18,0x00007f1a1001dd30] = 24
0x00007f1a7500011b:   48 b8 d8 98 20 38 1a 7f 00 00 8b b8 98 00 00 00
0x00007f1a7500012b:   83 c7 02 89 b8 98 00 00 00 81 e7 fe 07 00 00 85
0x00007f1a7500013b:   ff 0f 84 19 00 00 00 81 e6 ff 00 00 00 48 8b c6
0x00007f1a7500014b:   48 83 c4 20 5d 49 3b 67 28 0f 87 1f 00 00 00 c3
0x00007f1a7500015b:   49 ba 28 bd 15 38 1a 7f 00 00 4c 89 54 24 08 48
0x00007f1a7500016b:   c7 04 24 ff ff ff ff e8 69 a2 47 07 eb c9 49 ba
0x00007f1a7500017b:   50 01 00 75 1a 7f 00 00 4d 89 97 90 05 00 00 
--------------------------------------------------------------------------------
  0x00007f1a75000179:   movabs $0x7f1a75000150,%r10
  0x00007f1a75000183:   mov    %r10,0x590(%r15)
--------------------------------------------------------------------------------

Feedback and further improvement suggestions are welcome.

Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Commit message must refer to an issue

Issue

JDK-8368787: Error reporting: hs_err files should show instructions when referencing code in nmethods (Enhancement - P4)

Reviewers

Andrew Haley (@theRealAph - Reviewer) Review applies to 81dd1c8e
Matthias Baesken (@MBaesken - Reviewer) Review applies to 81dd1c8e
Aleksey Shipilev (@shipilev - Reviewer) Review applies to 81dd1c8e
Thomas Stuefe (@tstuefe - Reviewer)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/27530/head:pull/27530
$ git checkout pull/27530

Update a local copy of the PR:
$ git checkout pull/27530
$ git pull https://git.openjdk.org/jdk.git pull/27530/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 27530

View PR using the GUI difftool:
$ git pr show -t 27530

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/27530.diff

Using Webrev

Link to Webrev Comment

bridgekeeper · 2025-09-26T16:13:27Z

👋 Welcome back mdoerr! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2025-09-26T16:15:01Z

@TheRealMDoerr This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8368787: Error reporting: hs_err files should show instructions when referencing code in nmethods

Reviewed-by: stuefe, aph, mbaesken, shade

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been no new commits pushed to the master branch. If another commit should be pushed before you perform the /integrate command, your PR will be automatically rebased. If you prefer to avoid any potential automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

openjdk · 2025-09-26T16:15:40Z

@TheRealMDoerr The following label will be automatically applied to this pull request:

hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

mlbridge · 2025-09-26T16:19:51Z

Webrevs

shipilev · 2025-10-06T07:33:57Z

Looks like a useful diagnostic tool. At the very least we should dump the raw instruction stream around that pc, like we do for the Instructions: block. Does Hotspot do that already?

The relocation trick is cute, but it hinges on the assumption that relocations always point at an instruction boundary. I looked around, and while most relocs are that way, I think there are some relocs that do not follow this rule. For example:

// Store Null Pointer
instruct zStorePNull(memory mem, immP0 zero, rRegP tmp, rFlagsReg cr)
%{
  predicate(UseZGC && n->as_Store()->barrier_data() != 0);
  match(Set mem (StoreP mem zero));
  effect(TEMP tmp, KILL cr);

  ins_cost(125); // XXX
  format %{ "movq    $mem, 0\t# ptr" %}
  ins_encode %{
    z_store_barrier(masm, this, $mem$$Address, noreg, $tmp$$Register, false /* is_atomic */);
    // Store a colored null - barrier code above does not need to color
    __ movq($mem$$Address, barrier_Relocation::unpatched);
    // The relocation cant be fully after the mov, as that is the beginning of a random subsequent
    // instruction, which violates assumptions made by unrelated code. Hence the end() - 1
    __ code_section()->relocate(__ code_section()->end() - 1, barrier_Relocation::spec(), ZBarrierRelocationFormatStoreGoodAfterMov);
  %}
  ins_pipe(ialu_mem_reg);
%}

I am guessing it is still fine to attempt to disassemble in this case.

TheRealMDoerr · 2025-10-06T09:30:40Z

Thanks for looking at this PR!
Hotspot currently dumps code (hex or disassembled) when the nmethod is on the stack of the crashing thread. That is completely missing when it's not on the stack.

Should we print both, hex dump and disassembly?

Interesting. I haven't tried with ZGC. Did you find more relocations which don't point to an instruction start?
We could ignore relocations with format ZBarrierRelocationFormatStoreGoodAfterMov on x86. (Or find the correct start in this case.) E.g. we could fix it like this:

diff --git a/src/hotspot/share/code/codeBlob.cpp b/src/hotspot/share/code/codeBlob.cpp
index 6511b4689ed..2e4d49a81c1 100644
--- a/src/hotspot/share/code/codeBlob.cpp
+++ b/src/hotspot/share/code/codeBlob.cpp
@@ -52,6 +52,9 @@
 #ifdef COMPILER1
 #include "c1/c1_Runtime1.hpp"
 #endif
+#if defined(AMD64) && INCLUDE_ZGC
+#include "gc/z/zBarrierSetAssembler.hpp"
+#endif
 
 #include <type_traits>
 
@@ -919,6 +922,10 @@ void CodeBlob::dump_for_addr(address addr, outputStream* st, bool verbose) const
         // disassemble correctly at instruction start addresses.)
         RelocIterator iter(nm, start);
         while (iter.next() && iter.addr() < addr) { // find relocation before addr
+#if defined(AMD64) && INCLUDE_ZGC
+          // There's a relocation which doesn't point to an instruction start:
+          if ((iter.type() != relocInfo::barrier_type) || (iter.format() != ZBarrierRelocationFormatStoreGoodAfterMov))
+#endif
           start = iter.addr();
         }
         if (iter.has_current()) {

shipilev · 2025-10-06T13:18:01Z

> Hotspot currently dumps code (hex or disassembled) when the nmethod is on the stack of the crashing thread. That is completely missing when it's not on the stack. [...] Should we print both, hex dump and disassembly?

Yes, I think if we know the location is within an nmethod, it makes sense to dump around the location.

I think the hex dump is the most bullet-proof option, as we can always disassemble it offline at different offsets. I don't think we want to specialize for reloc types; it does not gain us much. Also, relocs solve the variable-sized encoding only if you are lucky to hit the reloc right at the location you are decoding, right? Anything in between relocs is still pretty foggy. I suspect the current patch would work in 99% of the cases, as it is hard to imagine e.g. a value in a register that points into an nmethod and does not have some sort of reloc.

Then I also suspect that disassemblers are actually able to figure out the instruction boundaries pretty well? Because I don't quite see how our usual printout of decode(pc - 64, pc + 64) would otherwise work: pc - 64 starts at an arbitrary boundary. You might want to check if this whole reloc thing is even needed. What happens if we just do Disassembler::decode(MAX2(nm->entry_point(), addr - 64), MIN2(nm->code_end(), addr + 64))?
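
The fixed-window alternative asked about above amounts to clamping `addr - 64` and `addr + 64` to the code bounds. A minimal sketch, assuming plain integer addresses; `clamped_window` is a hypothetical helper, not a HotSpot function:

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Returns [start, end) covering up to `radius` bytes on each side of addr,
// clamped to the code range [code_begin, code_end).
std::pair<std::uintptr_t, std::uintptr_t>
clamped_window(std::uintptr_t code_begin, std::uintptr_t code_end,
               std::uintptr_t addr, std::uintptr_t radius = 64) {
  assert(code_begin <= addr && addr < code_end);
  // Compare distances rather than computing addr - radius first, so the
  // unsigned subtraction cannot wrap around for small addresses.
  std::uintptr_t start = (addr - code_begin > radius) ? addr - radius : code_begin;
  std::uintptr_t end = (code_end - addr > radius) ? addr + radius : code_end;
  return {start, end};
}
```

Nothing in this computation guarantees that `start` is an instruction boundary, which is the open question for variable-length encodings such as x86.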

TheRealMDoerr · 2025-10-06T21:01:35Z

> I think the hex dump is the most bullet-proof option, as we can always disassemble it offline at different offsets.

Right. The disassembler produces garbage if it starts disassembling somewhere besides the correct instruction start (on x86).
If that happens, we can play with the offset in the hex dump until the sequence looks feasible.
So, I think printing some hex dump around the address is always a good thing.
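
For reference, a hex dump in the layout of the example output (address, colon, 16 bytes per line) can be produced with a few lines of code. This is a self-contained sketch, not the VM's own printing routine; `hex_dump` and its parameters are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Formats `bytes` as lines of "0x<16-digit address>:   b0 b1 ...", with 16
// bytes per line. `base` is the address of the first byte.
std::string hex_dump(const std::vector<std::uint8_t>& bytes, std::uint64_t base) {
  std::string out;
  char buf[32];
  for (std::size_t i = 0; i < bytes.size(); ++i) {
    if (i % 16 == 0) {
      if (i != 0) out += '\n';
      // Line header: zero-padded address followed by a colon.
      std::snprintf(buf, sizeof buf, "0x%016llx:  ",
                    static_cast<unsigned long long>(base + i));
      out += buf;
    }
    std::snprintf(buf, sizeof buf, " %02x", static_cast<unsigned>(bytes[i]));
    out += buf;
  }
  if (!bytes.empty()) out += '\n';
  return out;
}
```

Because every byte is printed with its address, a reader can feed any sub-range of the dump to an offline disassembler and shift the start offset until the decoded sequence makes sense.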

> I don't think we want to specialize for reloc types; it does not gain us much. Also, relocs solve the variable-sized encoding only if you are lucky to hit the reloc right at the location you are decoding, right? Anything in between relocs is still pretty foggy. I suspect the current patch would work in 99% of the cases, as it is hard to imagine e.g. a value in a register that points into an nmethod and does not have some sort of reloc.

Yeah, nmethods don't contain much data other than valid instructions. So, most of the time, disassembly works as proposed here, but we may still produce garbage in rare cases. That's not a big problem if we have the hex dump.

> Then I also suspect that disassemblers are actually able to figure out the instruction boundaries pretty well? Because I don't quite see how our usual printout of decode(pc - 64, pc + 64) would otherwise work: pc - 64 starts at an arbitrary boundary. You might want to check if this whole reloc thing is even needed. What happens if we just do Disassembler::decode(MAX2(nm->entry_point(), addr - 64), MIN2(nm->code_end(), addr + 64))?

decode(pc - 64, pc + 64) works fine on platforms like aarch64 and PPC64. I hope we don't have such code for x86. I got complete garbage when trying a wrong offset on x86.

TheRealMDoerr · 2025-10-08T10:41:55Z

@shipilev: I have made some improvements after your feedback. Please take another look! Thanks!

TheRealMDoerr · 2025-10-22T14:07:34Z

Updated example output above. We could potentially dump more code if ExtensiveErrorReports is enabled, but I'd like to leave that topic open for future enhancements and go ahead with this initial proposal. It has already proven helpful for analyzing a bug, since we are testing it together with other changes.

MBaesken · 2025-10-22T14:16:19Z

Could the new 'nmethod::print_code_snippet' code crash under some bad circumstances? If so, we might need a way to disable it easily.

TheRealMDoerr · 2025-10-22T14:22:46Z

> Could the new 'nmethod::print_code_snippet' code crash under some bad circumstances? If so, we might need a way to disable it easily.

Unlikely, and if it does, we still have the raw values and REATTEMPT_STEP_IF in VMError::report. So, it doesn't look more dangerous to me than other things we are doing there. Do you have any specific concern?

theRealAph

OK. This looks useful.

shipilev

Looks fine. I see no clear point in referencing ZBarrierRelocationFormatStoreGoodAfterMov in the comments, but I have no strong opinion either.

tstuefe · 2025-10-23T08:58:22Z

src/hotspot/share/code/nmethod.cpp

+      if (iter.addr() == addr) iter.next(); // find relocation after addr
+      if (iter.has_current()) end = iter.addr();
+    }
+


IIUC, the size of the printout is somewhat random. In the extreme cases, this may be either (close to) start-of-method to end-of-method, so almost the whole method. Or, it may be from an address very close to the address, so a very small snippet.

Tying the end address to a relocation is not strictly necessary, no? We could just print to `MIN2(code end, addr + 64)`? Disassembler should be fine if the printout stops in the middle of an instruction, as long as instruction addresses are correct?

And could we start printing at the relocation preceding-or-at addr - 64 instead, to ensure we have at least 64 bytes of printout before the crash address?
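
The "relocation preceding-or-at addr - 64" suggestion can be sketched as a search over sorted relocation addresses. This only illustrates the suggestion under stated assumptions and is not the code that was committed; `start_with_lookback` and its parameters are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Returns the last relocation address that is <= addr - lookback, falling
// back to code_begin when no such relocation exists. `reloc_addrs` must be
// sorted ascending and lie within the code range.
std::uintptr_t start_with_lookback(const std::vector<std::uintptr_t>& reloc_addrs,
                                   std::uintptr_t code_begin,
                                   std::uintptr_t addr,
                                   std::uintptr_t lookback = 64) {
  assert(code_begin <= addr);
  // Fewer than `lookback` bytes of code precede addr: start at the beginning.
  if (addr - code_begin < lookback) return code_begin;

  const std::uintptr_t target = addr - lookback;
  // First relocation strictly after target; its predecessor is at or before it.
  auto first_gt = std::upper_bound(reloc_addrs.begin(), reloc_addrs.end(), target);
  if (first_gt == reloc_addrs.begin()) return code_begin;
  return std::max(code_begin, *(first_gt - 1));
}
```

Starting at a relocation at or before `addr - 64` keeps the start on a likely instruction boundary while guaranteeing at least 64 bytes of context whenever that much code precedes `addr`.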

TheRealMDoerr

Right, the size is somewhat random. Relocations seem to be the most fine-grained information we currently have. In addition, they typically point to some meaningful points in the code. This PR disassembles the smallest possible snippet around the given address using relocations as start and end.

Right, having a relocation as the end address is technically not strictly required. However, I've seen the disassembler on x86 produce garbage as well when the end is not an instruction boundary.

I agree with you that we usually want at least 64 bytes ahead. On the other hand, some people don't want too much, either. See JDK-8274986.
So, I changed only the hex dump for which we can afford printing more without bloating the hs_err file too much. Please take a look at my new commit.

Btw. addr is typically not a crash address, but one which is referenced by a register and somehow related to a crash.

tstuefe

Looks good to me. Thanks for taking my comment into account.

TheRealMDoerr · 2025-10-23T11:54:53Z

Thanks for the reviews!

TheRealMDoerr · 2025-10-24T08:25:28Z

/integrate

openjdk · 2025-10-24T08:26:24Z

Going to push as commit b31bbfc.
Since your change was applied there have been 14 commits pushed to the master branch:

26eed3b: 8068293: [TEST_BUG] Test closed/com/sun/java/swing/plaf/motif/InternalFrame/4150591/bug4150591.java fails with GTKLookAndFeel
87645af: 8370389: JavaFrameAnchor on s390 has unnecessary barriers
5862358: 8370013: Refactor Double.toHexString to eliminate regex and StringBuilder
... and 11 more: https://git.openjdk.org/jdk/compare/da968dc645db498b4315e4c8926e7aeb21cc533a...master

Your commit was automatically rebased without conflicts.

openjdk · 2025-10-24T08:26:32Z

@TheRealMDoerr Pushed as commit b31bbfc.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

8368787: Error reporting: hs_err files should print instructions when…

660a388

… referencing code in nemthods

openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Sep 26, 2025

openjdk bot added the rfr Pull request is ready for review label Sep 26, 2025

TheRealMDoerr added 3 commits October 7, 2025 00:03

Always print hex dump. Plus disassembly when hsdis loaded.

003680d

Move printing code to nmethod.cpp.

4a05d40

Use frame_complete_offset for better start address computation. Impro…

81dd1c8

…ve comments.

TheRealMDoerr mentioned this pull request Oct 22, 2025

8365047: Remove exception handler stub code in C2 #26678

theRealAph approved these changes Oct 22, 2025

openjdk bot added the ready Pull request is ready to be integrated label Oct 22, 2025

MBaesken approved these changes Oct 23, 2025

shipilev approved these changes Oct 23, 2025

tstuefe reviewed Oct 23, 2025

TheRealMDoerr added 2 commits October 23, 2025 12:02

Ensure to print at least 64 Bytes ahead in hex dump.

db4c64a

Merge remote-tracking branch 'origin' into 8368787_hs_err_nmethod_code

99c248f

openjdk bot removed the ready Pull request is ready to be integrated label Oct 23, 2025

tstuefe approved these changes Oct 23, 2025

openjdk bot added the ready Pull request is ready to be integrated label Oct 23, 2025

openjdk bot added the integrated Pull request has been integrated label Oct 24, 2025

openjdk bot closed this Oct 24, 2025

openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Oct 24, 2025

TheRealMDoerr deleted the 8368787_hs_err_nmethod_code branch October 24, 2025 08:26
