Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8272586: emit abstract machine code in hs-err logs #5446

Closed
wants to merge 8 commits into from

Conversation

dougxc
Copy link
Member

@dougxc dougxc commented Sep 9, 2021

This enhances hs-err logs to:

  • Show the abstract machine code of the crashing frame if a disassembler is not installed.
  • Show machine code for the top frames on the native stack.

An interpreter or stub frame is only shown if it is the crashing frame.

A sample of the enhanced hs-err log can be seen here.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8272586: emit abstract machine code in hs-err logs

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/5446/head:pull/5446
$ git checkout pull/5446

Update a local copy of the PR:
$ git checkout pull/5446
$ git pull https://git.openjdk.java.net/jdk pull/5446/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 5446

View PR using the GUI difftool:
$ git pr show -t 5446

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/5446.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Sep 9, 2021

👋 Welcome back dnsimon! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Sep 9, 2021

@dougxc The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label Sep 9, 2021
}
// The first 10 unique code units on the stack should be sufficient
// for investigating crashes.
int printed_len = 10;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe 10 is too big?

…e abstract assembly if a disassembler is not installed
@dougxc dougxc marked this pull request as ready for review October 6, 2021 14:25
@openjdk openjdk bot added the rfr Pull request is ready for review label Oct 6, 2021
@mlbridge
Copy link

mlbridge bot commented Oct 6, 2021

Webrevs

@rkennke
Copy link
Contributor

rkennke commented Oct 6, 2021

This is going to be very useful! I used to copy+paste the instructions hex block into disassembler.io, having relevant assembly in hs_err will make debugging such situations much easier! This is not a review, just a big thumbs-up on the feature as such! :-) Thanks, Roman

Comment on lines +305 to +307
ResourceMark rm(thread);
Disassembler::decode(cb, st);
st->cr();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why RM used only here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that you copied existing code. Okay.

Copy link
Contributor

@vnkozlov vnkozlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look good in general. I have few small comments.

// Compiled code may use EBP register on x86 so it looks like
// non-walkable C frame. Use frame.sender() for java frames.
frame invalid;
if (t && t->is_Java_thread()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use (t != nullptr) explicitly

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.
BTW, I've never seen nullptr before - where is it defined?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (os::is_first_C_frame(&fr)) break;
fr = os::get_sender_for_C_frame(&fr);
fr = next_frame(fr, t);
if (!fr.pc()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use == nullptr

memset(printed, 0, sizeof(address) * printed_len);

frame fr = os::fetch_frame_from_context(_context);
for (int count = 0; count < printed_len && fr.pc(); ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use (fr.pc() != nullptr)

Comment on lines 245 to 247
* Determines if the code unit denoted by `owner` should be printed.
* If this method returns true, then `owner` has been added to `printed`.
* If `owner` was already in `printed`, this methods returns false.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it because of recursive calls?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - no point in printing the same code more than once.

Copy link
Contributor

@vnkozlov vnkozlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good.
Someone from runtime group have to look on this too.

@openjdk
Copy link

openjdk bot commented Oct 6, 2021

@dougxc This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8272586: emit abstract machine code in hs-err logs

Reviewed-by: kvn, dholmes

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 25 new commits pushed to the master branch:

  • 8de7763: 8273152: Refactor CDS FileMapHeader loading code
  • 920e707: 8274920: ProblemList 2 VectorAPI tests failing due to "assert(!vbox->is_Phi()) failed"
  • 7de2cf8: 8273910: Redundant condition and assignment in java.net.URI
  • 8ca0846: 8274407: (tz) Update Timezone Data to 2021c
  • 4ab274a: 8274858: Remove unused dictionary_classes_do functions
  • 03a8d34: 8273826: Correct Manifest file name and NPE checks
  • 722d639: 8274838: runtime/cds/appcds/TestSerialGCWithCDS.java fails on Windows
  • 340c715: 8273929: Remove GzipRandomAccess in heap dump test
  • 8319836: 8274546: Shenandoah: Remove unused ShenandoahUpdateRootsTask copy
  • d5ccfa2: 8274755: Replace 'while' cycles with iterator with enhanced-for in jdk.jdi
  • ... and 15 more: https://git.openjdk.java.net/jdk/compare/cdf89304eafce58e66127d4e277f5fcaf07c9954...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Oct 6, 2021
Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Doug,

Seems okay from a runtime perspective. A few minor code nits below - and the test needs a change.

Thanks,
David

src/hotspot/share/utilities/vmError.cpp Outdated Show resolved Hide resolved
return false;
}
if (printed[i] == 0) {
printed[i] = owner;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems odd to have this side-effect in what appears to be a query - perhaps add_to_printed would be more apt: adds owner to the list of printed owners, unless already present. Returns true if owner was added, and false it if was already present, or else we've exceeded the printing limit. ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good suggestion. I've changed it to be add_if_absent and made it more generic.

* @param printed_len the length of `printed`
*/
static bool print_code(outputStream* st, Thread* thread, address pc, bool is_crash_pc,
address* printed, int printed_len) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Please align the parameters with the line above

if (is_crash_pc) {
// The interpreter CodeBlob is very large so try to print the codelet instead.
InterpreterCodelet* codelet = Interpreter::codelet_containing(pc);
if (codelet != NULL) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use nullptr in new code.

Comment on lines 105 to 107
"-Xcomp",
"-XX:CompileCommand=compileonly,MachCodeFramesInErrorFile$Crasher.*",
"-XX:CompileCommand=dontinline,MachCodeFramesInErrorFile$Crasher.*",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it requires C2, so there should be an @requires for that in case the test is run under Zero or a MinimalVM build.

changed should_print_code to add_if_absent
made printed_capacity a const value
handle platforms where native stack cannot be walked directly by VMError
improved Crasher such that the crash happens in a JIT compiled frame
@dougxc
Copy link
Member Author

dougxc commented Oct 7, 2021

I have pushed further changes to this PR to address test failures:

  • omit timestamp from nmethod decoding and printing (0621eb9)
  • support printing the crashing frame on platforms such as Windows where stack walking cannot be done without platform support
  • changed MachCodeFramesInErrorFile to pass if Crasher fails to crash in the expected frame (driving C2 with VM options is not 100% guaranteed to give the expected outcome)

I would appreciate a review of these changes.

st->print("%7d ", (int)tty->time_stamp().milliseconds());
if (timestamp) {
// Print current time
st->print("%7d ", (int)tty->time_stamp().milliseconds());
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why a timestamp should be included when printing the contents of an nmethod outside the context of a CompileTask. The timestamp was causing the DisassembleCodeBlobTest to fail as it expects disassembling a given nmethod twice to produce the same result.

@dougxc
Copy link
Member Author

dougxc commented Oct 7, 2021

The MachCodeFramesInErrorFile test is crashing on macosx-aarch64. I've isolated the problem in lldb:

Process 43544 stopped
* thread #3, stop reason = signal SIGBUS
  * frame #0: 0x00000001064cc8b4 libjvm.dylib`PcDescCache::add_pc_desc(this=0x00000001186083b0, pc_desc=0x0000000118608770) at nmethod.cpp:395:18
    frame #1: 0x00000001064d4588 libjvm.dylib`PcDescContainer::find_pc_desc_internal(this=0x00000001186083b0, pc=" \U00000003\xd5\xe9s@\xd1?\U00000001", approximate=true, search=0x000000017008d0f0) at nmethod.cpp:2220:20
    frame #2: 0x0000000105bd23a8 libjvm.dylib`PcDescContainer::find_pc_desc(this=0x00000001186083b0, pc=" \U00000003\xd5\xe9s@\xd1?\U00000001", approximate=true, search=0x000000017008d0f0) at compiledMethod.hpp:136:12
    frame #3: 0x0000000105bd230c libjvm.dylib`CompiledMethod::find_pc_desc(this=0x0000000118608310, pc=" \U00000003\xd5\xe9s@\xd1?\U00000001", approximate=true) at compiledMethod.hpp:404:31
    frame #4: 0x0000000105d326fc libjvm.dylib`CompiledMethod::pc_desc_near(this=0x0000000118608310, pc=" \U00000003\xd5\xe9s@\xd1?\U00000001") at compiledMethod.hpp:232:45
    frame #5: 0x00000001064d7c8c libjvm.dylib`nmethod::scope_desc_in(this=0x0000000118608310, begin="\U0000001f \U00000003\xd5\xe9s@\xd1?\U00000001", end="\xe9s@\xd1?\U00000001") at nmethod.cpp:3129:15
    frame #6: 0x00000001064d6c6c libjvm.dylib`nmethod::has_code_comment(this=0x0000000118608310, begin="\U0000001f \U00000003\xd5\xe9s@\xd1?\U00000001", end="\xe9s@\xd1?\U00000001") at nmethod.cpp:3255:20
    frame #7: 0x00000001064cf9fc libjvm.dylib`nmethod::decode2(this=0x0000000118608310, ost=0x000000017008df38) const at nmethod.cpp:2983:79
    frame #8: 0x0000000105e3f244 libjvm.dylib`Disassembler::decode(cb=0x0000000118608310, st=0x000000017008df38) at disassembler.cpp:875:21
    frame #9: 0x00000001067dd568 libjvm.dylib`print_code(st=0x000000017008df38, thread=0x000000010080bc20, pc="J\U00000001@\xf9\x8b\xcfA\xf9\U0000007f\U00000001@\xb9\x88\xc3N9\xc8", is_crash_pc=true, printed=0x000000017008dc50, printed_capacity=5) at vmError.cpp:307:9
    frame #10: 0x00000001067dbe18 libjvm.dylib`VMError::report(st=0x000000017008df38, _verbose=true) at vmError.cpp:928:16

This looks like a pre-existing bug.
@vnkozlov would be able to help me with this?

@vnkozlov
Copy link
Contributor

vnkozlov commented Oct 7, 2021

@dougxc I looked on issues with SIGBUS(0xa) and may be related:
https://bugs.openjdk.java.net/browse/JDK-8266453
https://bugs.openjdk.java.net/browse/JDK-8262894
https://bugs.openjdk.java.net/browse/JDK-8262896

Fixes enable WX. May be we need something like this here.

@dougxc
Copy link
Member Author

dougxc commented Oct 7, 2021

Thanks, you're right. I placed it in nmethod::decode2: 5e00264

BTW, I noticed that often nmethod info is written to tty during crash reporting. I tracked this down to this change where the st argument is ignored. @TheRealMDoerr do you think that should be fixed?

@vnkozlov
Copy link
Contributor

vnkozlov commented Oct 7, 2021

Update looks good.

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Doug,

Updates to non-compiler code look fine to me.

Thanks,
David

@dougxc
Copy link
Member Author

dougxc commented Oct 8, 2021

Thanks @vnkozlov and @dholmes-ora for the input and reviews.

@dougxc
Copy link
Member Author

dougxc commented Oct 8, 2021

/integrate

@openjdk
Copy link

openjdk bot commented Oct 8, 2021

Going to push as commit b60837a.
Since your change was applied there have been 26 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot closed this Oct 8, 2021
@openjdk openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Oct 8, 2021
@openjdk
Copy link

openjdk bot commented Oct 8, 2021

@dougxc Pushed as commit b60837a.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@TheRealMDoerr
Copy link
Contributor

Thanks, you're right. I placed it in nmethod::decode2: 5e00264

BTW, I noticed that often nmethod info is written to tty during crash reporting. I tracked this down to this change where the st argument is ignored. @TheRealMDoerr do you think that should be fixed?

nm->print_nmethod(verbose); was already used before my change. I had only moved it out of os::print_location. But yes, would be good to fix it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot hotspot-dev@openjdk.org integrated Pull request has been integrated
5 participants