Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8274986: max code printed in hs-err logs should be configurable #5875

Closed
wants to merge 6 commits into from

Conversation

dougxc
Copy link
Member

@dougxc dougxc commented Oct 8, 2021

This PR adds a ErrorLogPrintCodeLimit (develop) option for configuring the amount of code printed in a hs-err log file. There's a hard limit of 10 so that the buffer used to avoid duplicates in VMError::print_code is stack allocated.
In addition, the Java stack is also scanned when considering code to print as the native stack may not have any Java compiled frames. For example, a transition into the VM through a RuntimeStub can prevent the native stack walk from seeing the frames above the stub.
The MachCodeFramesInErrorFile test has been made more robust in terms of validating its expectations of how C2 intrinsifies methods.

There's one other minor change to address this comment.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8274986: max code printed in hs-err logs should be configurable

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/5875/head:pull/5875
$ git checkout pull/5875

Update a local copy of the PR:
$ git checkout pull/5875
$ git pull https://git.openjdk.java.net/jdk pull/5875/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 5875

View PR using the GUI difftool:
$ git pr show -t 5875

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/5875.diff

@bridgekeeper
Copy link

@bridgekeeper bridgekeeper bot commented Oct 8, 2021

👋 Welcome back dnsimon! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

@openjdk openjdk bot commented Oct 8, 2021

@dougxc The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot label Oct 8, 2021
@dougxc dougxc force-pushed the JDK-8274986 branch 4 times, most recently from b69792c to a72e8bc Compare Oct 10, 2021
@@ -1336,6 +1336,10 @@ const intx ObjectAlignmentInBytes = 8;
develop(intx, StackPrintLimit, 100, \
"number of stack frames to print in VM-level stack dump") \
\
develop(int, ErrorLogPrintCodeLimit, 3, \
Copy link
Contributor

@tkrodriguez tkrodriguez Oct 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be at least diagnostic? develop flags are fairly useless.

Copy link
Member Author

@dougxc dougxc Oct 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose develop to be consistent with StackPrintLimit.
What's the process for adding a diagnostic flag @dholmes-ora ? Should we change both StackPrintLimit and ErrorLogPrintCodeLimit to be diagnostic? I agree with Tom that having these as develop flags is of limited use.

Copy link
Member

@dholmes-ora dholmes-ora Oct 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea what the history of StackPrintLimit is but there doesn't seem to be any call to have it be a non-develop flag. I'm assuming 100 is plenty and noone ever needs to change it.

For the new flag, develop is pretty useless if the issue is that the default value may not show enough information. Making it diagnostic seems reasonable. If people hit an issue and the default value doesn't show enough information again then they can change the value and hopefully diagnose the problem further.

David

Copy link
Member Author

@dougxc dougxc Oct 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, let's just make ErrorLogPrintCodeLimit be diagnostic. Is there more process for this apart from simply making the change in this PR?

Copy link
Member

@dholmes-ora dholmes-ora Oct 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only the change in the PR is needed - no CSR request for diagnostic flags.

nm->print_nmethod(verbose);
if (st == tty) {
nm->print_nmethod(verbose);
}
Copy link
Contributor

@tkrodriguez tkrodriguez Oct 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be something like this:

     ResourceMark rm;
     st->print(INTPTR_FORMAT " is at entry_point+%d in (nmethod*)" INTPTR_FORMAT,
               p2i(addr), (int)(addr - nm->entry_point()), p2i(nm));
-    if (verbose) {
-      st->print(" for ");
-      nm->method()->print_value_on(st);
-    }
     st->cr();
-    nm->print_nmethod(verbose);
+    if (verbose && st == tty) {
+      nm->print_nmethod(verbose);
+    } else {
+      nm->print_on(st);
+    }
     return;
   }
   st->print_cr(INTPTR_FORMAT " is at code_begin+%d in ", p2i(addr), (int)(addr - code_begin()));

Copy link
Member Author

@dougxc dougxc Oct 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. As far as I can see, the intent is to have a single line of output for an nmethod followed by an optional " for ..." suffix when verbose == true.
The nm->print_nmethod(verbose) call then prints extra multi-line detail for the nmethod with the number of lines printed governed by verbose.
This code seems like it went from being hard coded to always go to tty and was then parameterized to go to an arbitrary stream but the evolution accidentally overlooked some code that still goes to tty.
I don't want to make extensive changes here as there really should be a single effort to rationalize all dumping to ensure it's parameterized.

Copy link
Contributor

@tkrodriguez tkrodriguez Oct 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree you might not want to bite this off in this PR, but this piece of code is reason you commonly see random nmethods appearing on the tty just before a crash. verbose is only ever true when called from findpc in debug.cpp. All the other non-verbose work print_nmethod does is useless, like writing to the xtty, and otherwise boils down to nm->print_on(tty). But all these printing paths could use a rework so I'm fine if you skip it.

Copy link
Member Author

@dougxc dougxc Oct 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you convinced me in a side conversation, your suggestion has the advantage that we now at least get this output in a hs_err file instead of loosing it altogether.

@dougxc dougxc marked this pull request as ready for review Oct 11, 2021
@openjdk openjdk bot added the rfr label Oct 11, 2021
@mlbridge
Copy link

@mlbridge mlbridge bot commented Oct 11, 2021

Webrevs

@openjdk
Copy link

@openjdk openjdk bot commented Oct 11, 2021

@dougxc This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8274986: max code printed in hs-err logs should be configurable

Reviewed-by: never, dholmes

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 1 new commit pushed to the master branch:

  • cfe7471: 8177814: jdk/editpad is not in jdk TEST.groups

Please see this link for an up-to-date comparison between the source branch of this pull request and the master branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready label Oct 11, 2021
Copy link
Member

@dholmes-ora dholmes-ora left a comment

Hi Doug,

A couple of minor comments/queries but otherwise this seems okay.

Thanks,
David

@@ -34,6 +34,7 @@
#include "gc/shared/referenceProcessor.hpp"
#include "oops/markWord.hpp"
#include "runtime/task.hpp"
#include "utilities/vmError.hpp"
Copy link
Member

@dholmes-ora dholmes-ora Oct 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed?

Copy link
Member Author

@dougxc dougxc Oct 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the definition of VMError::max_error_log_print_code to be available just like task.hpp is included to make PeriodicTask::min_interval (used in defining the range of PerfDataSamplingInterval) available.

Copy link
Member

@dholmes-ora dholmes-ora Oct 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing there is some macro weirdness involved when defining the flag in globals.hpp.

// Even though ErrorLogPrintCodeLimit is ranged checked
// during argument parsing, there's no way to prevent it
// being set to a value outside the range.
Copy link
Member

@dholmes-ora dholmes-ora Oct 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what you mean here. Do we not abort the VM if the flag value is out-of-range?

Copy link
Member Author

@dougxc dougxc Oct 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ErrorLogPrintCodeLimit is a writable global variable so this is just being extra defensive should anything update ErrorLogPrintCodeLimit (e.g. ErrorLogPrintCodeLimit == 100;) after argument parsing.

Copy link
Member Author

@dougxc dougxc Oct 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I clarified the comment: 29060fc

if (limit > 0) {
// Scan the native stack
if (!_print_native_stack_used) {
// Only try print code of the crashing frame since
Copy link
Member

@dholmes-ora dholmes-ora Oct 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Existing typo: s/try/try to/

Copy link
Member Author

@dougxc dougxc Oct 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@dougxc
Copy link
Member Author

@dougxc dougxc commented Oct 12, 2021

/integrate

@openjdk
Copy link

@openjdk openjdk bot commented Oct 12, 2021

Going to push as commit 33050f8.
Since your change was applied there have been 3 commits pushed to the master branch:

  • 8de2636: 8274615: Support relaxed atomic add for linux-aarch64
  • 7d2633f: 8275002: Remove unused AbstractStringBuilder.MAX_ARRAY_SIZE
  • cfe7471: 8177814: jdk/editpad is not in jdk TEST.groups

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot closed this Oct 12, 2021
@openjdk openjdk bot added integrated and removed ready rfr labels Oct 12, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Oct 12, 2021

@dougxc Pushed as commit 33050f8.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

// Even though ErrorLogPrintCodeLimit is ranged checked
// during argument parsing, there's no way to prevent it
// subsequently (i.e., after parsing) being set to a
// value outside the range.
Copy link
Member

@dholmes-ora dholmes-ora Oct 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is overly defensive IMO. Flags should never be touched after initialization is complete and we assume we can trust ourselves.

Copy link
Member Author

@dougxc dougxc Oct 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I agree but in error reporting code I get extra defensive plus the defensive code is small and not on a hot path.
That said, I won't object to it being undone in a subsequent PR.

Copy link
Member

@tstuefe tstuefe Oct 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the point of MIN2 here is to handle ErrorLogPrintCodeLimit < max. IMHO ErrorLogPrintCodeLimit > max would be just an assert-worthy error, since as David wrote flag values should not be changed after initialization.

Copy link
Member Author

@dougxc dougxc Oct 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point is to ensure that we don't run off the end of the stack allocated printed array in the (granted, unlikely) case that ErrorLogPrintCodeLimit is (accidentally) updated after arg parsing .
I'm not sure an assert is the best thing as it would cause error reporting to recurse.
Maybe I was being too defensive but I figured the overhead is negligible so why not be ultra-safe.

Copy link
Member

@tstuefe tstuefe Oct 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recursive asserts would be caught by secondary error handling and show up as "Error occurred during error reporting" printout. Not ideal, but at least won't endanger the rest of the printing.

Copy link
Member Author

@dougxc dougxc Oct 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes but in a production scenario (which is where robust error reporting is critical), the assert is ignored and we end up with potential buffer overflow.

Copy link
Member

@tstuefe tstuefe Oct 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be a guarantee then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot integrated
4 participants