@TheRealMDoerr
Contributor

@TheRealMDoerr TheRealMDoerr commented Sep 26, 2025

We'd like to have a little more information in hs_err files in the following scenario: The VM crashes in code which does something with an nmethod. We often have a register pointing into code of the nmethod, but the nmethod is not disassembled in the hs_err file because the crash happens outside of it.

We can disassemble some instructions around the address inside the nmethod code. This is tricky on platforms with variable-length instructions (like x86), because we need to find correct instruction start addresses. I'm proposing to use relocations for this purpose. There are usually enough of them distributed over the nmethod, and they point to instruction start addresses.
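As an illustration of the idea described above, here is a minimal, self-contained sketch of picking a snippet range from known instruction starts. It does not use HotSpot's `RelocIterator`; the `Anchors` type, the function name `snippet_bounds`, and the treatment of an anchor located exactly at the address are assumptions made for this example only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for the relocation addresses of one nmethod:
// sorted offsets (relative to the code begin) that are known instruction starts.
using Anchors = std::vector<std::size_t>;

// Returns the range [start, end) around `addr`:
//   start = last anchor <= addr (or 0 if there is none),
//   end   = first anchor > addr (or code_size if there is none).
std::pair<std::size_t, std::size_t> snippet_bounds(const Anchors& anchors,
                                                   std::size_t code_size,
                                                   std::size_t addr) {
  assert(addr < code_size);
  assert(std::is_sorted(anchors.begin(), anchors.end()));
  // First anchor strictly greater than addr.
  auto after = std::upper_bound(anchors.begin(), anchors.end(), addr);
  std::size_t start = (after == anchors.begin()) ? 0 : *(after - 1);
  std::size_t end = (after == anchors.end()) ? code_size
                                             : std::min(*after, code_size);
  return {start, end};
}
```

With made-up anchors at offsets 0x14, 0x56, 0x79 and 0x90 in 0xb8 bytes of code, an address at offset 0x80 yields the range [0x79, 0x90). Both ends are trusted instruction starts, so a disassembler can be started at the lower bound.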

I've tested this proposal with the following code on x86_64:

diff --git a/src/hotspot/cpu/x86/interp_masm_x86.cpp b/src/hotspot/cpu/x86/interp_masm_x86.cpp
index a6b4efbe4f2..d715e69c850 100644
--- a/src/hotspot/cpu/x86/interp_masm_x86.cpp
+++ b/src/hotspot/cpu/x86/interp_masm_x86.cpp
@@ -646,6 +646,18 @@ void InterpreterMacroAssembler::prepare_to_jump_from_interpreted() {
 void InterpreterMacroAssembler::jump_from_interpreted(Register method, Register temp) {
   prepare_to_jump_from_interpreted();
 
+  if (UseNewCode) {
+    Label ok;
+    movptr(temp, Address(method, Method::from_interpreted_offset()));
+    cmpptr(temp, Address(method, Method::interpreter_entry_offset()));
+    je(ok);
+    movptr(rax, Address(method, Method::from_compiled_offset()));
+    movptr(rbx, rax);
+    addptr(rbx, 128);
+    hlt();
+    bind(ok);
+  }
+
   if (JvmtiExport::can_post_interpreter_events()) {
     Label run_compiled_code;
     // JVMTI events, such as single-stepping, are implemented partly by avoiding running

The output is as follows (this requires the hsdis library; otherwise we only get the hex dump):

RAX=0x00007f1a75000100 is at entry_point+0 in (nmethod*)0x00007f1a75000008
Compiled method (c1) 2504    1       3       java.lang.Byte::toUnsignedInt (6 bytes)
 total in heap  [0x00007f1a75000008,0x00007f1a750001f8] = 496
 main code      [0x00007f1a75000100,0x00007f1a750001b8] = 184
 stub code      [0x00007f1a750001b8,0x00007f1a750001f8] = 64
 mutable data [0x00007f1a1001e0b0,0x00007f1a1001e0e0] = 48
 relocation     [0x00007f1a1001e0b0,0x00007f1a1001e0d8] = 40
 metadata       [0x00007f1a1001e0d8,0x00007f1a1001e0e0] = 8
 immutable data [0x00007f1a1001dcd0,0x00007f1a1001dd30] = 96
 dependencies   [0x00007f1a1001dcd0,0x00007f1a1001dcd8] = 8
 scopes pcs     [0x00007f1a1001dcd8,0x00007f1a1001dd18] = 64
 scopes data    [0x00007f1a1001dd18,0x00007f1a1001dd30] = 24
0x00007f1a75000100:   89 84 24 00 80 fe ff 55 48 83 ec 20 41 81 7f 20
0x00007f1a75000110:   01 00 00 00 74 05 
--------------------------------------------------------------------------------
  0x00007f1a75000100:   mov    %eax,-0x18000(%rsp)
  0x00007f1a75000107:   push   %rbp
  0x00007f1a75000108:   sub    $0x20,%rsp
  0x00007f1a7500010c:   cmpl   $0x1,0x20(%r15)
  0x00007f1a75000114:   je     0x00007f1a7500011b
--------------------------------------------------------------------------------
RBX=0x00007f1a75000180 is at entry_point+128 in (nmethod*)0x00007f1a75000008
Compiled method (c1) 2505    1       3       java.lang.Byte::toUnsignedInt (6 bytes)
 total in heap  [0x00007f1a75000008,0x00007f1a750001f8] = 496
 main code      [0x00007f1a75000100,0x00007f1a750001b8] = 184
 stub code      [0x00007f1a750001b8,0x00007f1a750001f8] = 64
 mutable data [0x00007f1a1001e0b0,0x00007f1a1001e0e0] = 48
 relocation     [0x00007f1a1001e0b0,0x00007f1a1001e0d8] = 40
 metadata       [0x00007f1a1001e0d8,0x00007f1a1001e0e0] = 8
 immutable data [0x00007f1a1001dcd0,0x00007f1a1001dd30] = 96
 dependencies   [0x00007f1a1001dcd0,0x00007f1a1001dcd8] = 8
 scopes pcs     [0x00007f1a1001dcd8,0x00007f1a1001dd18] = 64
 scopes data    [0x00007f1a1001dd18,0x00007f1a1001dd30] = 24
0x00007f1a7500011b:   48 b8 d8 98 20 38 1a 7f 00 00 8b b8 98 00 00 00
0x00007f1a7500012b:   83 c7 02 89 b8 98 00 00 00 81 e7 fe 07 00 00 85
0x00007f1a7500013b:   ff 0f 84 19 00 00 00 81 e6 ff 00 00 00 48 8b c6
0x00007f1a7500014b:   48 83 c4 20 5d 49 3b 67 28 0f 87 1f 00 00 00 c3
0x00007f1a7500015b:   49 ba 28 bd 15 38 1a 7f 00 00 4c 89 54 24 08 48
0x00007f1a7500016b:   c7 04 24 ff ff ff ff e8 69 a2 47 07 eb c9 49 ba
0x00007f1a7500017b:   50 01 00 75 1a 7f 00 00 4d 89 97 90 05 00 00 
--------------------------------------------------------------------------------
  0x00007f1a75000179:   movabs $0x7f1a75000150,%r10
  0x00007f1a75000183:   mov    %r10,0x590(%r15)
--------------------------------------------------------------------------------

Feedback and further improvement suggestions are welcome.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8368787: Error reporting: hs_err files should show instructions when referencing code in nmethods (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/27530/head:pull/27530
$ git checkout pull/27530

Update a local copy of the PR:
$ git checkout pull/27530
$ git pull https://git.openjdk.org/jdk.git pull/27530/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 27530

View PR using the GUI difftool:
$ git pr show -t 27530

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/27530.diff

@bridgekeeper

bridgekeeper bot commented Sep 26, 2025

👋 Welcome back mdoerr! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Sep 26, 2025

@TheRealMDoerr This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8368787: Error reporting: hs_err files should show instructions when referencing code in nmethods

Reviewed-by: stuefe, aph, mbaesken, shade

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been no new commits pushed to the master branch. If another commit should be pushed before you perform the /integrate command, your PR will be automatically rebased. If you prefer to avoid any potential automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Sep 26, 2025
@openjdk

openjdk bot commented Sep 26, 2025

@TheRealMDoerr The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the rfr Pull request is ready for review label Sep 26, 2025
@mlbridge

mlbridge bot commented Sep 26, 2025

Webrevs

@shipilev
Member

shipilev commented Oct 6, 2025

Looks like a useful diagnostic tool. At the very least, we should dump the raw instruction stream around that pc, as we do for the Instructions: block. Does HotSpot do that already?

The relocation trick is cute, but it hinges on the assumption that relocations always point at an instruction boundary. I looked around, and while most relocs are that way, some do not follow this rule. For example:

// Store Null Pointer
instruct zStorePNull(memory mem, immP0 zero, rRegP tmp, rFlagsReg cr)
%{
  predicate(UseZGC && n->as_Store()->barrier_data() != 0);
  match(Set mem (StoreP mem zero));
  effect(TEMP tmp, KILL cr);

  ins_cost(125); // XXX
  format %{ "movq    $mem, 0\t# ptr" %}
  ins_encode %{
    z_store_barrier(masm, this, $mem$$Address, noreg, $tmp$$Register, false /* is_atomic */);
    // Store a colored null - barrier code above does not need to color
    __ movq($mem$$Address, barrier_Relocation::unpatched);
    // The relocation cant be fully after the mov, as that is the beginning of a random subsequent
    // instruction, which violates assumptions made by unrelated code. Hence the end() - 1
    __ code_section()->relocate(__ code_section()->end() - 1, barrier_Relocation::spec(), ZBarrierRelocationFormatStoreGoodAfterMov);
  %}
  ins_pipe(ialu_mem_reg);
%}

I am guessing it is still fine to attempt to disassemble in this case.

@TheRealMDoerr
Contributor Author

TheRealMDoerr commented Oct 6, 2025

Thanks for looking at this PR!
HotSpot currently dumps code (hex or disassembled) when the nmethod is on the stack of the crashing thread. That is completely missing when it's not on the stack.

Should we print both, hex dump and disassembly?

Interesting. I haven't tried with ZGC. Did you find more relocations which don't point to an instruction start?
We could ignore relocations with format ZBarrierRelocationFormatStoreGoodAfterMov on x86. (Or find the correct start in this case.) E.g. we could fix it like this:

diff --git a/src/hotspot/share/code/codeBlob.cpp b/src/hotspot/share/code/codeBlob.cpp
index 6511b4689ed..2e4d49a81c1 100644
--- a/src/hotspot/share/code/codeBlob.cpp
+++ b/src/hotspot/share/code/codeBlob.cpp
@@ -52,6 +52,9 @@
 #ifdef COMPILER1
 #include "c1/c1_Runtime1.hpp"
 #endif
+#if defined(AMD64) && INCLUDE_ZGC
+#include "gc/z/zBarrierSetAssembler.hpp"
+#endif
 
 #include <type_traits>
 
@@ -919,6 +922,10 @@ void CodeBlob::dump_for_addr(address addr, outputStream* st, bool verbose) const
         // disassemble correctly at instruction start addresses.)
         RelocIterator iter(nm, start);
         while (iter.next() && iter.addr() < addr) { // find relocation before addr
+#if defined(AMD64) && INCLUDE_ZGC
+          // There's a relocation which doesn't point to an instruction start:
+          if ((iter.type() != relocInfo::barrier_type) || (iter.format() != ZBarrierRelocationFormatStoreGoodAfterMov))
+#endif
           start = iter.addr();
         }
         if (iter.has_current()) {
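
The filtering idea in the diff above can be sketched in isolation. This is a simplified model, not HotSpot code: the `Anchor` struct, its `at_insn_start` flag, and the function `start_before` are hypothetical names introduced for this example. It mimics the loop that advances `start` only for anchors that are trusted to be instruction starts.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one relocation entry.
struct Anchor {
  std::size_t offset;   // offset into the code
  bool at_insn_start;   // false for an anchor placed inside an instruction,
                        // e.g. one recorded at "end() - 1"
};

// Returns the last trusted anchor strictly before `addr`,
// or `fallback` (e.g. the code begin) if there is none.
std::size_t start_before(const std::vector<Anchor>& sorted_anchors,
                         std::size_t addr,
                         std::size_t fallback) {
  assert(fallback <= addr);
  std::size_t start = fallback;
  for (const Anchor& a : sorted_anchors) {
    if (a.offset >= addr) break;      // only anchors before addr qualify
    if (!a.at_insn_start) continue;   // skip anchors inside an instruction
    start = a.offset;
  }
  return start;
}
```

Skipping an untrusted anchor only makes the snippet slightly longer, because the search falls back to the previous trusted anchor.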

@shipilev
Member

shipilev commented Oct 6, 2025

> HotSpot currently dumps code (hex or disassembled) when the nmethod is on the stack of the crashing thread. That is completely missing when it's not on the stack. [...] Should we print both, hex dump and disassembly?

Yes, I think if we know the location is within nmethod, it makes sense to dump around the location.

I think the hex dump is the most bullet-proof, as we can always disassemble it offline at different offsets. I don't think we want to specialize for reloc types; it does not gain us much. Also, relocs solve the variable-sized encoding problem only if you are lucky enough to hit a reloc right at the location you are decoding, right? Anything in between relocs is still pretty foggy. I suspect the current patch would work in 99% of the cases, as it is hard to imagine, e.g., a value in a register that points into an nmethod and does not have some sort of reloc.

Then I also suspect that disassemblers are actually able to figure out the instruction boundaries pretty well? Because I don't quite see how our usual printout of decode(pc - 64, pc + 64) would otherwise work: pc - 64 starts at an arbitrary boundary. You might want to check if this whole reloc thing is even needed. What happens if we just do Disassembler::decode(MAX2(nm->entry_point(), addr - 64), MIN2(nm->code_end(), addr + 64))?
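
The clamped window suggested in the last sentence can be written out as a small standalone function. This is only a sketch: `clamped_window` and its parameters are names made up for this example, and plain integers stand in for HotSpot's `address` type and the `MAX2`/`MIN2` helpers.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Returns [lo, hi): up to `radius` bytes on each side of `addr`,
// clamped to the code range [code_begin, code_end).
std::pair<std::uintptr_t, std::uintptr_t> clamped_window(std::uintptr_t code_begin,
                                                         std::uintptr_t code_end,
                                                         std::uintptr_t addr,
                                                         std::uintptr_t radius = 64) {
  assert(code_begin <= addr && addr < code_end);
  // Compare distances instead of computing addr - radius directly,
  // so the unsigned arithmetic cannot wrap around.
  std::uintptr_t lo = (addr - code_begin > radius) ? addr - radius : code_begin;
  std::uintptr_t hi = (code_end - addr > radius) ? addr + radius : code_end;
  return {lo, hi};
}
```

Such a window has a predictable size, but neither end is guaranteed to fall on an instruction boundary, which is the concern on variable-length instruction sets.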

@TheRealMDoerr
Contributor Author

> I think the hex dump is the most bullet-proof, as we can always disassemble it offline at different offsets.

Right. The disassembler produces garbage if it starts disassembling somewhere besides the correct instruction start (on x86).
If that happens, we can play with the offset in the hex dump until the sequence looks feasible.
So, I think printing some hex dump around the address is always a good thing.

> I don't think we want to specialize for reloc types; it does not gain us much. Also, relocs solve the variable-sized encoding problem only if you are lucky enough to hit a reloc right at the location you are decoding, right? Anything in between relocs is still pretty foggy. I suspect the current patch would work in 99% of the cases, as it is hard to imagine, e.g., a value in a register that points into an nmethod and does not have some sort of reloc.

Yeah, nmethods don't contain much data other than valid instructions. So, most of the time, disassembly works as proposed here, but we may still produce garbage in rare cases. That's not a big problem if we have the hex dump.

> Then I also suspect that disassemblers are actually able to figure out the instruction boundaries pretty well? Because I don't quite see how our usual printout of decode(pc - 64, pc + 64) would otherwise work: pc - 64 starts at an arbitrary boundary. You might want to check if this whole reloc thing is even needed. What happens if we just do Disassembler::decode(MAX2(nm->entry_point(), addr - 64), MIN2(nm->code_end(), addr + 64))?

decode(pc - 64, pc + 64) works fine on platforms like aarch64 and PPC64. I hope we don't have such code for x86. I got complete garbage when trying a wrong offset on x86.
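
Why a wrong start offset matters for variable-length encodings can be shown with a toy example. The encoding below is invented for illustration and is not x86: the top two bits of an instruction's first byte give its length minus one. The function name `decode_starts` is likewise hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy variable-length encoding (NOT x86): an instruction is 1 to 4 bytes long,
// and the top two bits of its first byte hold (length - 1).
// Returns the offsets at which a linear decoder starting at `from`
// believes instructions begin.
std::vector<std::size_t> decode_starts(const std::vector<std::uint8_t>& code,
                                       std::size_t from) {
  assert(from <= code.size());
  std::vector<std::size_t> starts;
  for (std::size_t pos = from; pos < code.size(); ) {
    starts.push_back(pos);
    pos += 1 + static_cast<std::size_t>(code[pos] >> 6);
  }
  return starts;
}
```

For the bytes `80 c1 02 00 40 c0 00 00`, decoding from offset 0 finds instruction starts at 0, 3, 4, 6 and 7, while decoding from offset 1 finds 1 and 5, neither of which is a real start. A decoder started inside an instruction treats operand bytes as opcodes, so its output is unrelated to the real instruction stream until it happens to resynchronize.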

@TheRealMDoerr
Contributor Author

@shipilev: I have made some improvements after your feedback. Please take another look! Thanks!

@TheRealMDoerr
Contributor Author

Updated example output above. We could potentially dump more code if ExtensiveErrorReports is enabled, but I'd like to leave that topic open for future enhancements and go ahead with this initial proposal. It has already proven helpful in analyzing a bug, since we are testing it together with other changes.

@MBaesken
Member

Could the new 'nmethod::print_code_snippet' code crash under some bad circumstances? If so, we might need a way to disable it easily.

@TheRealMDoerr
Contributor Author

> Could the new 'nmethod::print_code_snippet' code crash under some bad circumstances? If so, we might need a way to disable it easily.

Unlikely, and if it does, we still have the raw values and REATTEMPT_STEP_IF in VMError::report. So, it doesn't look more dangerous to me than other things we are doing there. Do you have any specific concern?
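
The general point, that a failure in one reporting step need not cost the rest of the report, can be illustrated with a generic sketch. It uses C++ exceptions purely as a stand-in; the step handling in VMError::report works differently and is not reproduced here. `Step` and `run_report` are hypothetical names.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// One named reporting step that produces a piece of the report.
struct Step {
  std::string name;
  std::function<std::string()> run;
};

// Runs all steps in order. A step that fails is replaced by an error marker,
// and the remaining steps still run.
std::vector<std::string> run_report(const std::vector<Step>& steps) {
  std::vector<std::string> out;
  for (const Step& s : steps) {
    assert(s.run && "every step needs a callable");
    try {
      out.push_back(s.run());
    } catch (const std::exception& e) {
      out.push_back("[error in step '" + s.name + "': " + e.what() + "]");
    }
  }
  return out;
}
```

In this model, a failing "code snippet" step would leave the register values printed by an earlier step and the output of all later steps intact.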

Contributor

@theRealAph theRealAph left a comment

OK. This looks useful.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Oct 22, 2025
Member

@shipilev shipilev left a comment

Looks fine. I see no clear point in referencing ZBarrierRelocationFormatStoreGoodAfterMov in the comments, but I have no strong opinion either.

if (iter.addr() == addr) iter.next(); // find relocation after addr
if (iter.has_current()) end = iter.addr();
}

Member

IIUC, the size of the printout is somewhat random. In the extreme cases, this may be either (close to) start-of-method to end-of-method, so almost the whole method. Or, it may be from an address very close to the address, so a very small snippet.

Tying the end address to a relocation is not strictly necessary, no? We could just print to `MIN2(code end, addr + 64)`? The disassembler should be fine if the printout stops in the middle of an instruction, as long as instruction addresses are correct?

And could we start printing at the relocation preceding-or-at addr - 64 instead, to ensure we have at least 64 bytes of printout before the crash address?

Contributor Author

@TheRealMDoerr TheRealMDoerr Oct 23, 2025

Right, the size is somewhat random. Relocations seem to be the most fine-grained information we currently have. In addition, they typically point to some meaningful points in the code. This PR disassembles the smallest possible snippet around the given address using relocations as start and end.

Right, having a relocation as end address is technically not strictly required. However, I've seen that the disassembler on x86 produced garbage as well when the end is not an instruction boundary.

I agree with you that we usually want at least 64 bytes before the address. On the other hand, some people don't want too much either. See JDK-8274986.
So, I changed only the hex dump for which we can afford printing more without bloating the hs_err file too much. Please take a look at my new commit.

Btw. addr is typically not a crash address, but one which is referenced by a register and somehow related to a crash.
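
For readers who want to reproduce the hex dump layout shown in the example output (an address followed by up to 16 bytes per line), here is a minimal standalone sketch. It is not the HotSpot implementation; `hex_dump` and its parameters are names made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Formats `bytes` as lines of the form
//   0x00007f1a75000110:   01 00 00 00 74 05
// where `base` is the address of the first byte.
std::string hex_dump(std::uint64_t base,
                     const std::vector<std::uint8_t>& bytes,
                     std::size_t per_line = 16) {
  assert(per_line > 0);
  std::string out;
  char buf[32];
  for (std::size_t i = 0; i < bytes.size(); ++i) {
    if (i % per_line == 0) {
      if (i != 0) out += '\n';
      // Line header: zero-padded 64-bit address.
      std::snprintf(buf, sizeof buf, "0x%016llx:  ",
                    static_cast<unsigned long long>(base + i));
      out += buf;
    }
    std::snprintf(buf, sizeof buf, " %02x", static_cast<unsigned>(bytes[i]));
    out += buf;
  }
  return out;
}
```

Because a hex dump does not depend on instruction boundaries, it can cover a wider range than the disassembly and be decoded offline at whatever offset turns out to be correct.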

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Oct 23, 2025
Member

@tstuefe tstuefe left a comment

Looks good to me. Thanks for taking my comment into account.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Oct 23, 2025
@TheRealMDoerr
Contributor Author

Thanks for the reviews!

@TheRealMDoerr
Contributor Author

/integrate

@openjdk

openjdk bot commented Oct 24, 2025

Going to push as commit b31bbfc.
Since your change was applied there have been 14 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Oct 24, 2025
@openjdk openjdk bot closed this Oct 24, 2025
@openjdk openjdk bot removed the ready (Pull request is ready to be integrated) and rfr (Pull request is ready for review) labels Oct 24, 2025
@openjdk

openjdk bot commented Oct 24, 2025

@TheRealMDoerr Pushed as commit b31bbfc.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@TheRealMDoerr TheRealMDoerr deleted the 8368787_hs_err_nmethod_code branch October 24, 2025 08:26
Labels

hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated

5 participants