-
Notifications
You must be signed in to change notification settings - Fork 5.8k
JDK-8305416: runtime/Thread/TestAlwaysPreTouchStacks.java failed with "Did not find expected NMT output" #13295
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
👋 Welcome back stuefe! A progress list of the required criteria for merging this PR into |
892a127
to
ce2a7f2
Compare
ce2a7f2
to
ae23ee2
Compare
@tstuefe - At this point, the failure has also been seen on linux-aarch64 |
@dcubed-ojdk Okay, I problemlisted the test to give us some time. My patch should work, but no need to put pressure on the review. Could you give me a quick thumbs up please for #13301 ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the issue of daemon versus non-daemon as that only affects when the VM can exit, not how long individual threads "stick around" ???
You are right, that's badly phrased. The old version of this test fired up threads that printed something, then finished. The JVM waited for them to exit, then exited itself, printing the NMT report in the process. That report did not contain the thread stacks for the already exited test threads. The new test makes sure the threads never exit. They have to be daemonized in order to not hold up the JVM exit. Now, the NMT reported stack sized contain the test thread stack sizes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few minor comments and nits. I'm a little concerned about reliability aspects of this test across a range of platforms. I will run some tests here.
You need to update the ProblemList too.
Thanks.
try { | ||
// report aliveness, then sleep until VM death | ||
gate.await(); | ||
for(;;) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: space after for
for (int s: stackSizes) { | ||
threads.add(createTestThread(s)); | ||
// Add a bunch of large-stacked threads to make a significant imprint on combined thread stack size | ||
for (int i = 0; i < numThreads; i ++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: no space before ++
@@ -81,12 +101,10 @@ public static void main(String[] args) throws InterruptedException, IOException | |||
|
|||
output.shouldHaveExitValue(0); | |||
|
|||
for (int s: stackSizes) { | |||
output.shouldContain("Alive: " + Integer.toString(s)); | |||
for (int i = 0; i < numThreads; i ++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: no space before ++
// of their reserved size. Requiring them to be committed for over 3/4th shows that pretouch is | ||
// really working. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How many platforms have you tested this on?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested it on all our platforms and it passed okay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I manually tested on Windows x64, Linux x64, x86, aarch64 and arm. Arm with the patch based on an older version because HEAD is broken on Arm. For the rest I relied on GHAs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested it on all our platforms and it passed okay.
Thank you for that!
long max_reserved = memoryCeilingMB * 3 * MB; | ||
long min_reserved = memoryCeilingMB * MB; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How were these boundaries determined? What variance do you see in the reported value when the test runs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How were these boundaries determined?
There is a vagueness about reserved total stack size of course - it contains other thread's stack sizes too, and any overhead that the VM may add atop of user-specified thread stack sizes, mainly guard pages.
Min is the logical lower end and the actual number should be greater.
Max is chosen very generously to be much larger. Maybe too large, but I did not want to play whack-the-mole for false positives on platforms I have not tested manually (e.g. larger paged ones), while still excluding obviously insane cases.
On the platforms I tested on variance was not large at all; my 128MB reserved stack size for the test threads gives me:
linux x86: 133MB
linux aarch64 (4k pages): 154MB
linux x64: 144MB
(The difference between aarch64 and x64 is interesting but I did not dive deeper into that; I assume number of hotspot threads may differ for threads started based on number of cores, but my x64 machine is way beefier than my aarch64-box so the ratio should be reversed)..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think all my concerns have been addressed. Thanks.
@tstuefe This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been no new commits pushed to the ➡️ To integrate this PR with the above commit message to the |
Friendly ping |
1 similar comment
Friendly ping |
// report, so their stacks should dominate the NMT-reported total stack size. | ||
long max_reserved = memoryCeilingMB * 3 * MB; | ||
long min_reserved = memoryCeilingMB * MB; | ||
if (reserved >= max_reserved || reserved < min_reserved) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is reserved >= max_reserved
a concern? Would we really want/need to look into this if it did happen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so.
If we start x Threads, with stack size y, I expect to see not much more than x*y+(very generous fudge factor owed to unknown threads running). If I see more than that, NMT may be showing the wrong number for thread stacks, or scale the number wrong, see #13665.
test/hotspot/jtreg/ProblemList.txt
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be simpler to rewrite this as a gtest and allocate using VM os::alloc() once then check that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what you mean? This test tests that if I start a VM with -XX:+AlwaysPreTouchThreadStacks
the VM will touch all - or most - of the thread stacks upon thread start. It exploits that NMT actually probes the thread stacks for committed pages, and checks what NMT says about thread stacks.
I would not know how to do this in gtest, espectially since forking is involved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what you mean? This test tests that if I start a VM with
-XX:+AlwaysPreTouchThreadStacks
the VM will touch all - or most - of the thread stacks upon thread start. It exploits that NMT actually probes the thread stacks for committed pages, and checks what NMT says about thread stacks.I would not know how to do this in gtest, espectially since forking is involved.
Sorry, I saw AlwaysPreTouchThreadStacks
, but read AlwaysPreTouch
.
Looks good to me.
@tstuefe this pull request can not be integrated into git checkout JDK-8305416-runtime-Thread-TestAlwaysPreTouchStacks-fails
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push |
…aysPreTouchStacks-fails
/integrate |
Going to push as commit 8ac7186.
Your commit was automatically rebased without conflicts. |
Fix test. Several issues caused the test to fail intermittently (only on Windows for some reason).
The test starts a a child process with -XX:+AlwaysPreTouchStacks. In this child process, it creates n threads with given stack sizes, then expects them to come up successfully, then checks the final NMT output for the expected stack committed size (which should ideally be very close to the stack reserved size, in contrast to a VM started without -AlwaysPreTouchStacks).
The test had several errors:
The new, fixed test:
Tests: manually ran tests on Windows x64 and Linux x64.
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/13295/head:pull/13295
$ git checkout pull/13295
Update a local copy of the PR:
$ git checkout pull/13295
$ git pull https://git.openjdk.org/jdk.git pull/13295/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 13295
View PR using the GUI difftool:
$ git pr show -t 13295
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/13295.diff
Webrev
Link to Webrev Comment