Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8242181: [Linux] Show source information when printing native stack traces in hs_err files #7126

Closed
wants to merge 72 commits into from

Conversation

chhagedorn
Copy link
Member

@chhagedorn chhagedorn commented Jan 18, 2022

When printing the native stack trace on Linux (mostly done for hs_err files), it only prints the method with its parameters and a relative offset in the method:

Stack: [0x00007f6e01739000,0x00007f6e0183a000],  sp=0x00007f6e01838110,  free space=1020k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x620d86]  Compilation::~Compilation()+0x64
V  [libjvm.so+0x624b92]  Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0xec
V  [libjvm.so+0x8303ef]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0x899
V  [libjvm.so+0x82f067]  CompileBroker::compiler_thread_loop()+0x3df
V  [libjvm.so+0x84f0d1]  CompilerThread::thread_entry(JavaThread*, JavaThread*)+0x69
V  [libjvm.so+0x1209329]  JavaThread::thread_main_inner()+0x15d
V  [libjvm.so+0x12091c9]  JavaThread::run()+0x167
V  [libjvm.so+0x1206ada]  Thread::call_run()+0x180
V  [libjvm.so+0x1012e55]  thread_native_entry(Thread*)+0x18f

This makes it sometimes difficult to see where exactly the methods were called from and sometimes almost impossible when there are multiple invocations of the same method within one method.

This patch improves this by providing source information (filename + line number) to the native stack traces on Linux similar to what's already done on Windows (see JDK-8185712):

Stack: [0x00007f34fca18000,0x00007f34fcb19000],  sp=0x00007f34fcb17110,  free space=1020k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x620d86]  Compilation::~Compilation()+0x64  (c1_Compilation.cpp:607)
V  [libjvm.so+0x624b92]  Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0xec  (c1_Compiler.cpp:250)
V  [libjvm.so+0x8303ef]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0x899  (compileBroker.cpp:2291)
V  [libjvm.so+0x82f067]  CompileBroker::compiler_thread_loop()+0x3df  (compileBroker.cpp:1966)
V  [libjvm.so+0x84f0d1]  CompilerThread::thread_entry(JavaThread*, JavaThread*)+0x69  (compilerThread.cpp:59)
V  [libjvm.so+0x1209329]  JavaThread::thread_main_inner()+0x15d  (thread.cpp:1297)
V  [libjvm.so+0x12091c9]  JavaThread::run()+0x167  (thread.cpp:1280)
V  [libjvm.so+0x1206ada]  Thread::call_run()+0x180  (thread.cpp:358)
V  [libjvm.so+0x1012e55]  thread_native_entry(Thread*)+0x18f  (os_linux.cpp:705)

For Linux, we need to parse the debug symbols which are generated by GCC in DWARF - a standardized debugging format. This patch adds support for DWARF 4, the default of GCC 10.x, for 32 and 64 bit architectures (tested with x86_32, x86_64 and AArch64). DWARF 5 is not supported as it was still experimental and not generated for HotSpot. However, newer GCC version may soon generate DWARF 5 by default in which case this parser either needs to be extended or the build of HotSpot configured to only emit DWARF 4.

The code follows the parsing steps described in the official DWARF 4 spec: https://dwarfstd.org/doc/DWARF4.pdf
I added references to the corresponding sections throughout the code. However, I tried to explain the steps from the DWARF spec directly in the code (method names, comments etc.). This allows to follow the code without the need to actually deep dive into the spec.

The comments at the Dwarf class in the elf.hpp file explain in more detail how a DWARF file is structured and how the parsing algorithm works to get to the filename and line number information. There are more class comments throughout the elf.hpp file about how different DWARF sections are structured and how the parsing algorithm needs to fetch the required information. Therefore, I will not repeat the exact workings of the algorithm here but refer to the code comments. I've tried to add as much information as possible to improve the readability.

Generally, I've tried to stay away from adding any assertions as this code is almost always executed when already processing a VM error. Instead, the DWARF parser aims to just exit gracefully and possibly omit source information for a stack frame instead of risking to stop writing the hs_err file when an assertion would have failed. To debug failures, -Xlog:dwarf can be used with info, debug or trace which provides logging messages throughout parsing.

Testing:
Apart from manual testing, I've added two kinds of tests:

  • A JTreg test: Spawns new VMs to let them crash in various ways. The test reads the created hs_err files to check if the DWARF parsing could correctly find the filename and line number. For normal HotSpot files, I could not check against hardcoded filenames and line numbers as they are subject to change (especially line number can quickly become different). I therefore just added some sanity checks in the form of "found a non-empty file" and "found a non-zero line number". On top of that, I added tests that let the VM crash in custom C files (which will not change). This enables an additional verification of hardcoded filenames and line numbers.
  • Gtests: Directly calling the get_source() method which initiates DWARF parsing. Tested some special cases, for example, having a buffer that is not big enough to store the filename.

On top of that, there are also existing JTreg tests that call -XX:NativeMemoryTracking=detail which will print a native stack trace with the new source information. These tests were also run as part of the standard tier testing and can be considered as sanity tests for this implementation.

To make tests work in our infrastructure or if some other setups want to have debug symbols at different locations, I've added support for an additional _JVM_DWARF_PATH environment variable. This variable can specify a path from which the DWARF symbol file should be read by the parser if the default locations do not contain debug symbols (required some make changes). This is similar to what's done on Windows with _NT_SYMBOL_PATH. The JTreg test, however, also works if there are no symbols available. In that case, the test just skips all the assertion checks for the filename and line number.

I haven't run any specific performance testing as this new code is mainly executed when an error will exit the VM and only if symbol files are available (which is normally not the case when using Java release builds as a user).

Special thanks to @tschatzl for giving me some pointers to start based on his knowledge from a DWARF 2 parser he once wrote in Pascal and for discussing approaches on how to retrieve the source information and to @erikj79 for providing help for the changes required for make!

Thanks,
Christian


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed (2 reviews required, with at least 1 Reviewer, 1 Author)

Issue

  • JDK-8242181: [Linux] Show source information when printing native stack traces in hs_err files

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/7126/head:pull/7126
$ git checkout pull/7126

Update a local copy of the PR:
$ git checkout pull/7126
$ git pull https://git.openjdk.org/jdk pull/7126/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 7126

View PR using the GUI difftool:
$ git pr show -t 7126

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/7126.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Jan 18, 2022

👋 Welcome back chagedorn! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Jan 18, 2022

@chhagedorn The following labels will be automatically applied to this pull request:

  • build
  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added build build-dev@openjdk.org hotspot hotspot-dev@openjdk.org labels Jan 18, 2022
@chhagedorn chhagedorn changed the title 8242181: [Linux] Show source information in stack trace of VM crash in hs_err file 8242181: [Linux] Show source information when printing native stack traces in hs_err files Jan 19, 2022
@chhagedorn
Copy link
Member Author

/label remove build

@openjdk openjdk bot removed the build build-dev@openjdk.org label Jan 20, 2022
@openjdk
Copy link

openjdk bot commented Jan 20, 2022

@chhagedorn
The build label was successfully removed.

…atrix row, using existing source information for internal errors in first frame.
…more flexible and pass if no libjvm.so debug symbols are found which might not always be there
@chhagedorn
Copy link
Member Author

I've filed JDK-8287021 to later add support for DWARF 5.

To bring up again some general points that are left to decide upon/fix:

  • UL: If not used, we might want to introduce a special flag like -XX:TraceDwarf={1,2,3} to enable/disable different log levels. I'm fine with both approaches. I think we should keep some kind of logging around, especially when later adding DWARF 5 support.

We see test errors on Linux ppcle and x64 in gtests:

[ RUN      ] os_linux.decoder_get_source_info_valid_vm
/sapmnt/sapjvm_work/openjdk/nb/linuxppc64le/jdk-dev/test/hotspot/gtest/runtime/test_os_linux.cpp:442: Failure
Value of: Decoder::get_source_info(valid_function_pointer, buf, sizeof(buf), &line)
  Actual: false
Expected: true
[  FAILED  ] os_linux.decoder_get_source_info_valid_vm (93 ms)
[ RUN      ] os_linux.decoder_get_source_info_valid_truncated_vm
/sapmnt/sapjvm_work/openjdk/nb/linuxppc64le/jdk-dev/test/hotspot/gtest/runtime/test_os_linux.cpp:454: Failure
Value of: Decoder::get_source_info(valid_function_pointer, buf, 7, &line)
  Actual: false
Expected: true

We also see Problems in runtime/ErrorHandling and in jfr/jvm/TestDumpOnCrash. Mostly, these tests now have much longer runtimes (about factor 2). With TestDumpOnCrash, both the error file writer and the test itself timeouted on some of our slower machines.

@tstuefe: Is this still an issue with the latest commit which added -gdwarf-4?

I'm currently running some testing with a merge of latest JDK which is looking good so far.

@tstuefe
Copy link
Member

tstuefe commented May 20, 2022

@tstuefe: Is this still an issue with the latest commit which added -gdwarf-4?

Issues seem to have disappeared. I'll schedule some more tests next week to be sure.

Cheers, Thomas

@chhagedorn
Copy link
Member Author

Thanks Thomas for doing the testing!

@tstuefe
Copy link
Member

tstuefe commented May 25, 2022

Thanks Thomas for doing the testing!

Hi Christian,

I can see no problems on ppcle attributable to your test. However, I may overlook something since tests are still failing all over the place because of loom.

I see test errors in TestDwarf on ppcle and x64, however. On x64:

----------System.out:(52/5198)----------
Command line: [/priv/jvmtests/output_openjdk19_dev_dbgU_linuxx86_64/sapjvm_19/bin/java -cp /priv/jvmtests/output_openjdk19_dev_dbgU_linuxx86_64/jtreg_hotspot_tier1_work/JTwork/classes/runtime/ErrorHandling/TestDwarf.d:/priv/jvmtests/output_openjdk19_dev_dbgU_linuxx86_64/grmpf/testdata/jtreg/jtreg_test_19/test/hotspot/jtreg/runtime/ErrorHandling:/priv/jvmtests/output_openjdk19_dev_dbgU_linuxx86_64/jtreg_hotspot_tier1_work/JTwork/classes:/priv/jvmtests/output_openjdk19_dev_dbgU_linuxx86_64/grmpf/testdata/jtreg/jtreg_test_19/test/hotspot/jtreg:/priv/jvmtests/output_openjdk19_dev_dbgU_linuxx86_64/jtreg_hotspot_tier1_work/JTwork/classes/test/lib:/priv/jvmtests/output_openjdk19_dev_dbgU_linuxx86_64/grmpf/testdata/jtreg/jtreg_test_19/test/lib:/net/sapmnt.sapjvm_work/openjdk/tools/jtreg-6+2/lib/javatest.jar:/net/sapmnt.sapjvm_work/openjdk/tools/jtreg-6+2/lib/jtreg.jar -Xmx768m -Djava.awt.headless=true -Djava.util.prefs.userRoot=/priv/jvmtests/output_openjdk19_dev_dbgU_linuxx86_64/jtreg_hotspot_tier1_work/tmp -Djava.io.tmpdir=/priv/jvmtests/output_openjdk19_dev_dbgU_linuxx86_64/jtreg_hotspot_tier1_work/tmp -Dtest.getfreeport.max.tries=40 -ea -esa -Xlog:dwarf=info -XX:-CreateCoredumpOnCrash -Xcomp -XX:CICrashAt=1 --version ]
[2022-05-24T18:37:00.974121691Z] Gathering output for process 44490
[2022-05-24T18:37:02.872100892Z] Waiting for completion for process 44490
[2022-05-24T18:37:02.872338192Z] Waiting for completion finished for process 44490
Output and diagnostic info for process 44490 was saved into 'pid-44490-output.log'
[2022-05-24T18:37:02.889455876Z] Waiting for completion for process 44490
[2022-05-24T18:37:02.889604189Z] Waiting for completion finished for process 44490
# To suppress the following error report, specify this argument
# after -XX: or in .hotspotrc:  SuppressErrorAt=/c1_Compilation.cpp:616
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/sapmnt/sapjvm_work/openjdk/nb/linuxx86_64/jdk-dev/src/hotspot/share/c1/c1_Compilation.cpp:616), pid=44490, tid=44505
#  assert(CICrashAt < 0 || (uintx)_env->compile_id() != (uintx)CICrashAt) failed: just as planned
#
# JRE version: OpenJDK Runtime Environment (19.0) (fastdebug build 19-internal-adhoc.openjdk.jdk-dev)
# Java VM: OpenJDK 64-Bit Server VM (fastdebug 19-internal-adhoc.openjdk.jdk-dev, compiled mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x73ca34]  Compilation::~Compilation()+0x84
#
# CreateCoredumpOnCrash turned off, no core file dumped
#
# An error report file with more information is saved as:
# /priv/jvmtests/output_openjdk19_dev_dbgU_linuxx86_64/jtreg_hotspot_tier1_work/JTwork/scratch/hs_err_pid44490.log
[1.835s][info][dwarf] Open DWARF file: /priv/jvmtests/output_openjdk19_dev_dbgU_linuxx86_64/sapjvm_19/lib/server/libjvm.debuginfo
[1.842s][info][dwarf] Failed to parse the line number program header correctly.
[1.842s][info][dwarf] Failed to process the line number program correctly.
[1.842s][info][dwarf] Failed to retrieve file and line number information for /priv/jvmtests/output_openjdk19_dev_dbgU_linuxx86_64/sapjvm_19/lib/server/libjvm.so at offset: 0x0074176a
[1.843s][info][dwarf] Failed to parse the line number program header correctly.
[1.843s][info][dwarf] Failed to process the line number program correctly.
[1.843s][info][dwarf] Failed to retrieve file and line number information for /priv/jvmtests/output_openjdk19_dev_dbgU_linuxx86_64/sapjvm_19/lib/server/libjvm.so at offset: 0x00a05011
[1.845s][info][dwarf] Failed to parse the line number program header correctly.
[1.845s][info][dwarf] Failed to process the line number program correctly.
[1.845s][info][dwarf] Failed to retrieve file and line number information for /priv/jvmtests/output_openjdk19_dev_dbgU_linuxx86_64/sapjvm_19/lib/server/libjvm.so at offset: 0x00a05b78
[1.849s][info][dwarf] Failed to parse the line number program header correctly.
[1.849s][info][dwarf] Failed to process the line number program correctly.
[1.849s][info][dwarf] Failed to retrieve file and line number information for /priv/jvmtests/output_openjdk19_dev_dbgU_linuxx86_64/sapjvm_19/lib/server/libjvm.so at offset: 0x018d49d3
[1.852s][info][dwarf] Failed to parse the line number program header correctly.
[1.852s][info][dwarf] Failed to process the line number program correctly.
[1.852s][info][dwarf] Failed to retrieve file and line number information for /priv/jvmtests/output_openjdk19_dev_dbgU_linuxx86_64/sapjvm_19/lib/server/libjvm.so at offset: 0x018de546
[1.855s][info][dwarf] Failed to parse the line number program header correctly.
[1.855s][info][dwarf] Failed to process the line number program correctly.
[1.855s][info][dwarf] Failed to retrieve file and line number information for /priv/jvmtests/output_openjdk19_dev_dbgU_linuxx86_64/sapjvm_19/lib/server/libjvm.so at offset: 0x014d86e9
#
# Compiler replay data is saved as:
# /priv/jvmtests/output_openjdk19_dev_dbgU_linuxx86_64/jtreg_hotspot_tier1_work/JTwork/scratch/replay_pid44490.log
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#

hs_err_file: hs_err_pid44490.log
----------System.err:(15/1242)----------
java.lang.RuntimeException: Could not find filename or line number in "V  [libjvm.so+0x74176a]  Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x1aa": expected true, was false
	at jdk.test.lib.Asserts.fail(Asserts.java:594)
	at jdk.test.lib.Asserts.assertTrue(Asserts.java:486)
	at TestDwarf.runAndCheck(TestDwarf.java:163)
	at TestDwarf.test(TestDwarf.java:99)
	at TestDwarf.main(TestDwarf.java:90)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
	at java.base/java.lang.reflect.Method.invoke(Method.java:578)
	at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127)
	at java.base/java.lang.Thread.run(Thread.java:1585)

@chhagedorn
Copy link
Member Author

After getting some more logs from @tstuefe, it turns out that they are using GCC 8 in the testing which emits the .debug_line section in DWARF 2 format which we do not allow and we bail out during parsing. This lets TestDwarf fail.

The .debug_line section is already handled differently by the parser compared to the other sections. GCC 10 emits DWARF 3 for some reason (even though it should emit DWARF 4 as for the other sections) while GCC 11 emits DWARF 4. Therefore, the parser can handle both DWARF 3 and 4 for .debug_line. Having a look at the DWARF 2 spec, it turns out that we can reuse the code for parsing DWARF 3. This allows us to update the version bailout of .debug_line to allow DWARF 2 as valid version. I've tested this change by building HotSpot with GCC 8.2 and playing around with it.

In this process, I've also changed TestDwarf.java to skip all its tests if there was a bailout of the parser due to an unsupported DWARF format. I've added some more error logging.

I've also decided to get rid of UL as discussed before and replaced it by tty logging in combination with a new TraceDwarfLevel develop flag (for the different log levels).

@openjdk-notifier
Copy link

@chhagedorn Please do not rebase or force-push to an active PR as it invalidates existing review comments. All changes will be squashed into a single commit automatically when integrating. See OpenJDK Developers’ Guide for more information.

@bridgekeeper
Copy link

bridgekeeper bot commented Aug 3, 2022

@chhagedorn This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

Copy link
Member

@tstuefe tstuefe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me too.

We are a bit strapped for review- and test-cycles atm, therefore please go ahead and push it. This is a really useful change. If problems show up on some of our weird architectures or old Linux distros, we will deal with it then.

Cheers, Thomas

@chhagedorn
Copy link
Member Author

chhagedorn commented Aug 16, 2022

Thanks Thomas for your review and assessment! That sounds good. I will give it another spin in our testing. Now that UL was replaced by TraceDwarfLevel, I think there are no other outstanding issues to resolve in this PR. If there are no objections, we can go ahead and integrate this - maybe at the start of next week to give some more time for feedback.

Thanks,
Christian

@chhagedorn
Copy link
Member Author

Testing in our CI looked good. I've also tested it again locally on x86_32.

I will integrate this tomorrow morning.

@chhagedorn
Copy link
Member Author

Thanks everybody for your feedback, comments, and reviews!

/integrate

@openjdk
Copy link

openjdk bot commented Aug 24, 2022

Going to push as commit 13c0369.
Since your change was applied there have been 102 commits pushed to the master branch:

  • fe0544f: 8292790: Remove hashtable.hpp from other files
  • 926380d: 8292640: C2: Remove unused scratch register usages on x86
  • f3be673: 8292638: x86: Improve scratch register handling in VM stubs
  • d24b7b7: 8276651: java/lang/ProcessHandle tests fail with "RuntimeException: Input/output error" in java.lang.ProcessHandleImpl$Info.info0
  • 8a45abd: 8292654: G1 remembered set memory footprint regression after JDK-8286115
  • cc28783: 8292777: Remove hashtable.hpp from dependencies.hpp
  • cf00677: 8292368: [ppc64] internal error g1BarrierSet.inline.hpp assert(oopDesc::is_oop(pre_val, true)) failed: Error
  • d92e00a: 8292763: JDK-8292716 breaks configure without jtreg
  • 486f90c: 8292716: Configure should check that jtreg is of the required version
  • b4e1aa8: 8289658: Avoid redundant LinkedHashMap.get call in TagletManager.addNewSimpleCustomTag
  • ... and 92 more: https://git.openjdk.org/jdk/compare/d1edda8ff52e172a85d102d7d5062b9cc401beea...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Aug 24, 2022
@openjdk openjdk bot closed this Aug 24, 2022
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Aug 24, 2022
@openjdk
Copy link

openjdk bot commented Aug 24, 2022

@chhagedorn Pushed as commit 13c0369.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build build-dev@openjdk.org hotspot hotspot-dev@openjdk.org integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

9 participants