New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8178287: AsyncGetCallTrace fails to traverse valid Java stacks #4436
Conversation
|
Webrevs
|
Hi Ludovic, |
Whether and how a frame is setup is controlled by the code generator for the specific CodeBlock. The CodeBlock is then in the best place to know how to parse the sender's frame from the current frame in the given CodeBlock. This refactoring proposes to extract this parsing out of `frame` and into a `CodeBlock::FrameParser`. This FrameParser is then specialized in the relevant inherited children of CodeBlock. This change is to largely facilitate adding new supported cases for JDK-8252417 like runtime stubs.
When the signal sent for AsyncGetCallTrace or JFR would land on a stub (like arraycopy), it wouldn't be able to detect the sender (caller) frame because `_cb->frame_size() == 0`. Because we fully control how the prolog and epilog of stub code is generated, we know there are two cases: 1. A stack frame is allocated via macroAssembler->enter(), and consists in `push rbp; mov rsp, rbp;`. 2. No stack frames are allocated and rbp is left unchanged and rsp is decremented with the `call` instruction that push the return `pc` on the stack. For case 1., we can easily know the sender frame by simply looking at rbp, especially since we know that all stubs preserve the frame pointer (on x86 at least). For case 2., we end up returning the sender's sender, but that already gives us more information than what we have today. The results are as follows: - Baseline: Flat Profile (by method): (t 99.4,s 99.4) AGCT::Unknown Java[ERR=-5] (t 0.5,s 0.2) Prof1::main (t 0.2,s 0.2) java.lang.AbstractStringBuilder::append (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3] (t 0.0,s 0.0) java.lang.AbstractStringBuilder::ensureCapacityInternal (t 0.0,s 0.0) java.lang.AbstractStringBuilder::shift (t 0.0,s 0.0) java.lang.String::getBytes (t 0.0,s 0.0) java.lang.AbstractStringBuilder::putStringAt (t 0.0,s 0.0) java.lang.StringBuilder::delete (t 0.2,s 0.0) java.lang.StringBuilder::append (t 0.0,s 0.0) java.lang.AbstractStringBuilder::delete (t 0.0,s 0.0) java.lang.AbstractStringBuilder::putStringAt - With StubRoutinesBlob::FrameParser Flat Profile (by method): (t 98.7,s 98.7) java.lang.AbstractStringBuilder::ensureCapacityInternal (t 0.9,s 0.9) java.lang.AbstractStringBuilder::delete (t 99.8,s 0.2) Prof1::main (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3] (t 0.0,s 0.0) AGCT::Unknown Java[ERR=-5] (t 98.8,s 0.0) java.lang.AbstractStringBuilder::append (t 98.8,s 0.0) java.lang.StringBuilder::append (t 0.9,s 0.0) java.lang.StringBuilder::delete The program is as follows: ``` public class Prof1 { public static void main(String[] args) { StringBuilder sb = new StringBuilder(); for (int i = 0; i < 1000000; i++) { sb.append("ab"); sb.delete(0, 1); } System.out.println(sb.length()); } } ``` We now account for the arraycopy stub which is called by AbstractStringBuilder::ensureCapacityInternal. It was previously ignored because it would not know how to parse the frame for the arraycopy stub and would fall in the AGCT::Unknown Java[ERR=-5] section. However, it still isn't perfect since it doesn't point to the arraycopy stub directly.
When sampling hits the prolog of a method, Hotspot assumes it's unable to parse the frame. This change allows to parse such frame on x86 by specializing which instruction it's hitting in the prolog. The results are as follows: - Baseline: Flat Profile (by method): (t 60.7,s 60.7) AGCT::Unknown Java[ERR=-5] (t 39.2,s 35.2) Prof2::main (t 1.4,s 1.4) Prof2::lambda$main$3 (t 1.0,s 1.0) Prof2::lambda$main$2 (t 0.9,s 0.9) Prof2::lambda$main$1 (t 0.7,s 0.7) Prof2::lambda$main$0 (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3] (t 0.0,s 0.0) java.lang.Thread::exit (t 0.9,s 0.0) Prof2$$Lambda$2.0x0000000800c00c28::get (t 1.0,s 0.0) Prof2$$Lambda$3.0x0000000800c01000::get (t 1.4,s 0.0) Prof2$$Lambda$4.0x0000000800c01220::get (t 0.7,s 0.0) Prof2$$Lambda$1.0x0000000800c00a08::get - With incomplete frame parsing: Flat Profile (by method): (t 39.3,s 39.3) AGCT::Unknown Java[ERR=-5] (t 40.3,s 36.1) Prof2::main (t 6.4,s 5.3) Prof2$$Lambda$28.0x0000000800081000::get (t 6.1,s 5.1) Prof2$$Lambda$29.0x0000000800081220::get (t 6.0,s 5.0) Prof2$$Lambda$27.0x0000000800080c28::get (t 6.1,s 5.0) Prof2$$Lambda$26.0x0000000800080a08::get (t 1.1,s 1.1) Prof2::lambda$main$2 (t 1.1,s 1.1) Prof2::lambda$main$0 (t 1.0,s 1.0) Prof2::lambda$main$1 (t 0.9,s 0.9) Prof2::lambda$main$3 (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3] (t 0.0,s 0.0) java.util.Locale::getInstance (t 0.0,s 0.0) AGCT::Not walkable Java[ERR=-6] (t 0.0,s 0.0) jdk.internal.loader.BuiltinClassLoader::loadClassOrNull (t 0.0,s 0.0) java.lang.ClassLoader::loadClass (t 0.0,s 0.0) sun.net.util.URLUtil::urlNoFragString (t 0.0,s 0.0) java.lang.Class::forName0 (t 0.0,s 0.0) java.util.Locale::initDefault (t 0.0,s 0.0) jdk.internal.loader.BuiltinClassLoader::loadClass (t 0.0,s 0.0) jdk.internal.loader.URLClassPath::getLoader (t 0.0,s 0.0) jdk.internal.loader.URLClassPath::getResource (t 0.0,s 0.0) java.lang.String::toLowerCase (t 0.0,s 0.0) sun.launcher.LauncherHelper::loadMainClass (t 0.0,s 0.0) sun.launcher.LauncherHelper::checkAndLoadMain (t 0.0,s 0.0) java.util.Locale::<clinit> (t 0.0,s 0.0) jdk.internal.loader.BuiltinClassLoader::findClassOnClassPathOrNull (t 0.0,s 0.0) jdk.internal.loader.ClassLoaders$AppClassLoader::loadClass (t 0.0,s 0.0) java.lang.Class::forName The program is as follows: ``` import java.util.function.Supplier; public class Prof2 { public static void main(String[] args) { var rand = new java.util.Random(0); Supplier[] suppliers = { () -> 0, () -> 1, () -> 2, () -> 3, }; long sum = 0; for (int i = 0; i >= 0; i++) { sum += (int)suppliers[i % suppliers.length].get(); } } } ``` We see that the results are particularely useful in this case as the methods are very short (it only returns an integer), and the probability of hitting the prolog is then very high.
The program is the following: ``` import java.util.function.Supplier; public class Prof2 { public static void main(String[] args) { var rand = new java.util.Random(0); Supplier[] suppliers = { () -> 0, () -> 1, () -> 2, () -> 3, }; long sum = 0; for (int i = 0; i >= 0; i++) { sum += (int)suppliers[i % suppliers.length].get(); } } } ``` The results are as follows: - Baseline (from previous commit): Flat Profile (by method): (t 39.3,s 39.3) AGCT::Unknown Java[ERR=-5] (t 40.3,s 36.1) Prof2::main (t 6.4,s 5.3) Prof2$$Lambda$28.0x0000000800081000::get (t 6.1,s 5.1) Prof2$$Lambda$29.0x0000000800081220::get (t 6.0,s 5.0) Prof2$$Lambda$27.0x0000000800080c28::get (t 6.1,s 5.0) Prof2$$Lambda$26.0x0000000800080a08::get (t 1.1,s 1.1) Prof2::lambda$main$2 (t 1.1,s 1.1) Prof2::lambda$main$0 (t 1.0,s 1.0) Prof2::lambda$main$1 (t 0.9,s 0.9) Prof2::lambda$main$3 (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3] - With unwind from vtable stub Flat Profile (by method): (t 74.1,s 70.3) Prof2::main (t 6.5,s 5.5) Prof2$$Lambda$29.0x0000000800081220::get (t 6.6,s 5.4) Prof2$$Lambda$28.0x0000000800081000::get (t 5.7,s 5.0) Prof2$$Lambda$26.0x0000000800080a08::get (t 5.9,s 5.0) Prof2$$Lambda$27.0x0000000800080c28::get (t 4.9,s 4.9) AGCT::Unknown Java[ERR=-5] (t 1.2,s 1.2) Prof2::lambda$main$2 (t 0.9,s 0.9) Prof2::lambda$main$3 (t 0.9,s 0.9) Prof2::lambda$main$1 (t 0.7,s 0.7) Prof2::lambda$main$0 (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3] We attribute the vtable stub to the caller and not the callee, which is already an improvement from the existing case.
@sspitsyn thank you for the reminder. It's perfectly fine for this change to land in JDK 18. We'll see in the future if there is a demand to backport it to JDK 17 and we'll do accordingly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have done a sanity check to make sure that the code was not functionally modified while moving around.
No problems found there.
/label add hotspot-jfr |
@luhenry |
@sspitsyn how could we take this change forward? Thank you |
Hi Ludovic, |
#include "interpreter/interpreter.hpp" | ||
#include "runtime/frame.hpp" | ||
|
||
bool CodeBlob::FrameParser::sender_frame(JavaThread *thread, bool check, address pc, intptr_t* sp, intptr_t* unextended_sp, intptr_t* fp, bool fp_safe, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @tivrfoa, thanks for making a comment in an OpenJDK project!
All comments and discussions in the OpenJDK Community must be made available under the OpenJDK Terms of Use. If you already are an OpenJDK Author, Committer or Reviewer, please click here to open a new issue so that we can record that fact. Please Use "Add GitHub user tivrfoa for the summary.
If you are not an OpenJDK Author, Committer or Reviewer, simply check the box below to accept the OpenJDK Terms of Use for your comments.
- I agree to the OpenJDK Terms of Use for all comments I make in a project in the OpenJDK GitHub organization.
Your comment will be automatically restored once you have accepted the OpenJDK Terms of Use.
Hi Ludovic, Thank you for working on this long-standing bug. The issue is not just theoretical: when I ran JDK with this patch with async-profiler, I immediately got the following deadlock:
|
@apangin Thanks for pointing that out! I'm updating it right now and should be pushing an update very soon. I'll also add examples on how it impacts JFR. |
The need for the allocation would be it non async-safe. However, AsyncGetCallTrace is async-safe and thus can't allow for allocations.
@luhenry This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration! |
Hi @sspitsyn! Will you be the one reviewing this? |
Hi Markus! I have a lack of expertize in this area and can't be the primary reviewer. |
Ok. Do you know who should be the primary reviewer? Rickard Bäckman? Nils Eliasson? |
@dcubed-ojdk , @dholmes-ora would you be willing/able to take a look at this change, pretty please? |
Sorry @jbachorik but frames and code-blobs are not an area I know. David |
@dholmes-ora, do you know anyone who could be the primary reviewer for this? |
Hi @luhenry,
For example, here are the results of profiling spring-petclinic application startup.
My biggest concern is the large number (>3%) of truncated stack traces. The probem is, such stack traces are indistinguishable (by the tools) from normal stack traces, and therefore the tools will display misleading information to a user, without even knowing that something went wrong. In this sense I'd prefer to return an error code rather than an invalid stack trace. The above table illustrates the problem in practice. With the proposed patch, async-profiler actually performs worse than without it, because async-profiler can recover from most AGCT failures on its own, but not when AGCT returns an invalid stack trace instead of an error code. In view of the aforementioned, I'll ask not to commit this patch before the problem of truncated stack traces is fixed. Thank you. |
Markus, |
Mailing list message from Ludovic Henry on hotspot-jfr-dev: Hello, As it's been quite a while already, I wanted to nudge this RFR forward. Who Thank you! On Wed, Aug 18, 2021 at 9:21 PM Marcus Hirt <hirt at openjdk.java.net> wrote: |
Mailing list message from Ludovic Henry on hotspot-jfr-dev: [Sending again now with the email address properly registered to the Hello, As it's been quite a while already, I wanted to nudge this RFR forward. Who Thank you! On Fri, Aug 20, 2021 at 8:49 AM Ludovic Henry <ludovic at datadoghq.com> wrote:
|
Mailing list message from David Holmes on hotspot-jfr-dev: On 24/08/2021 9:40 am, Marcus Hirt wrote:
I don't know sorry. David |
Closing it for now until we figure out all the raised points. |
Hi @cowwoc, thanks for making a comment in an OpenJDK project! All comments and discussions in the OpenJDK Community must be made available under the OpenJDK Terms of Use. If you already are an OpenJDK Author, Committer or Reviewer, please click here to open a new issue so that we can record that fact. Please Use "Add GitHub user cowwoc for the summary. If you are not an OpenJDK Author, Committer or Reviewer, simply check the box below to accept the OpenJDK Terms of Use for your comments.
Your comment will be automatically restored once you have accepted the OpenJDK Terms of Use. |
When the signal sent for AsyncGetCallTrace or JFR would land on a runtime stub (like arraycopy), a vtable stub, or the prolog of a compiled method, it wouldn't be able to detect the sender (caller) frame for multiple reasons. This patch fixes these cases through adding CodeBlob-specific frame parser which are in the best position to know how a frame is setup.
The following examples have been profiled with honest-profiler which uses
AsyncGetCallTrace
.Prof1
StubRoutinesBlob::FrameParser
:Prof2
VtableBlob::FrameParser
andnmethod::FrameParser
:Progress
Issue
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/4436/head:pull/4436
$ git checkout pull/4436
Update a local copy of the PR:
$ git checkout pull/4436
$ git pull https://git.openjdk.java.net/jdk pull/4436/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 4436
View PR using the GUI difftool:
$ git pr show -t 4436
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/4436.diff