Skip to content

8335167: Test runtime/Thread/TestAlwaysPreTouchStacks.java failed with Expected a higher ratio between stack committed and reserved #20531

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

afshin-zafari
Copy link
Contributor

@afshin-zafari afshin-zafari commented Aug 9, 2024

The test program runs once with PreTouch and another without PreTouch stacks. For each run the amounts of reserved and committed stack regions are held and then compared the ratio of with and without PreTouch. The ratio of PreTouch test should be greater than the other (because there will be higher committed regions due to pre-touching the pages).

In other words, the old version of this test ran two tests: 1) preTouchTest and 2) noPreTouchTest and extracted the amount of reserved and committed memory for stack. The ratio of the committed to reserved for preTouchTest ($ratio = \frac{committed}{reserved}$) is expected to be high (>75%) and for the noPreTouchTest is expected to be small (<50%). These expected amounts are not robust for all cases and resulted in 8335167.

In this PR, only one test is run with two sets of options for preTouch and noPreTouch stack memory. The amount of reserved and committed are stored after each run and the ratio of two runs are compared. It is expected that the preTouch has a greater ratio than the noPreTouch (preTouch.ratio > noPreTouch.ratio).

Tests

The specific test is run on linux-x64, windows-x64 and macosx-aarch64 in tiers1-5 (100+ times).


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8335167: Test runtime/Thread/TestAlwaysPreTouchStacks.java failed with Expected a higher ratio between stack committed and reserved (Bug - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/20531/head:pull/20531
$ git checkout pull/20531

Update a local copy of the PR:
$ git checkout pull/20531
$ git pull https://git.openjdk.org/jdk.git pull/20531/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 20531

View PR using the GUI difftool:
$ git pr show -t 20531

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/20531.diff

Webrev

Link to Webrev Comment

…h Expected a higher ratio between stack committed and reserved
…h Expected a higher ratio between stack committed and reserved
@bridgekeeper
Copy link

bridgekeeper bot commented Aug 9, 2024

👋 Welcome back azafari! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Aug 9, 2024

@afshin-zafari This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8335167: Test runtime/Thread/TestAlwaysPreTouchStacks.java failed with Expected a higher ratio between stack committed and reserved

Reviewed-by: stuefe, dholmes, gziemski

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 271 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the rfr Pull request is ready for review label Aug 9, 2024
@openjdk
Copy link

openjdk bot commented Aug 9, 2024

@afshin-zafari The following label will be automatically applied to this pull request:

  • hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-runtime hotspot-runtime-dev@openjdk.org label Aug 9, 2024
@mlbridge
Copy link

mlbridge bot commented Aug 9, 2024

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't discern what the core changes are here such that the now passes more reliably. Why did you change from distinct @test cases to running the test in each mode within a single @test? The refactoring makes it very difficult to see what has actually changed and you haven't described it.

Also what testing did you do for this? What was the old failure rate?

Thanks

Collections.addAll(vmArgs, "TestAlwaysPreTouchStacks", "test");
ProcessBuilder pb = ProcessTools.createLimitedTestJavaProcessBuilder(vmArgs);
OutputAnalyzer output = new OutputAnalyzer(pb.start());
output.reportDiagnosticSummary();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pre-existing, but this should go at the end, otherwise if a test fails you will get two summaries printed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reportDiagnosticSummary should be called to get the outputs to stdout and then look for some specific patterns inside the stdout of the OutputAnalyzer object. The call is not for showing the error case of the tests. The calls to shouldContain , shouldNotContain and more, check the expected pattern to be present in the output.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If any of the checks fail they will automatically call reportDiagnosticSuummary so you only need to call it directly if you want the output in the log when the test passes. Hence it should go at the end.

@afshin-zafari
Copy link
Contributor Author

I can't discern what the core changes are here such that the now passes more reliably. Why did you change from distinct @test cases to running the test in each mode within a single @test? The refactoring makes it very difficult to see what has actually changed and you haven't described it.

Also what testing did you do for this? What was the old failure rate?

Thanks

I have updated the PR description with more details to make the change more clear.
This single test has run on all main platforms in repeated (100+) times on mach5.

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying. If @tstuefe is okay with this proposed change in the logic then I am okay with it.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Sep 10, 2024
@gerard-ziemski
Copy link

Trying to review the changes here, after the cleanup that you did, is tricky, but it helped to view the files side by side in an IDE, and scroll to compare the relevant sections.

I think I am following your fix and see that you are doing exactly what you describe.

The failure occurred only on macosx-aarch64, but the simplification affects all the platforms, though I guess we could expect the same failure eventually to show up on other platforms.

I think the fix is good.

Thank you!

@gerard-ziemski
Copy link

Can you please re-run the checks to make sure the failure on macosx was a temp hiccup?

Copy link

@gerard-ziemski gerard-ziemski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Sep 12, 2024
@@ -96,7 +96,6 @@ private static ReservedCommitted runPreTouchTest(boolean preTouch) throws Except
Collections.addAll(vmArgs, "TestAlwaysPreTouchStacks", "test");
ProcessBuilder pb = ProcessTools.createLimitedTestJavaProcessBuilder(vmArgs);
OutputAnalyzer output = new OutputAnalyzer(pb.start());
output.reportDiagnosticSummary();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC by moving this into the exception throwing paths, it means that if main detects a failure and throws, there will not be any output reported. All you needed to do with this was move it after line 118:

   output.shouldMatch("- *Thread.*reserved.*committed");

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the misunderstanding and thank you for the precise correction.
Now it is done.

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

Thanks

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Sep 13, 2024
Copy link

@gerard-ziemski gerard-ziemski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@tstuefe tstuefe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@afshin-zafari This is good, and a lot more predictable.

But I would have liked a stronger check.

Lets see if I understand this. NMT tells us the total amount of all thread stacks (test threads and other threads) as "Thread Stacks Reserved" and the touched portions of these threads as "Thread Stacks Committed" (mislabeled as committed, but that is a problem for another day). Right so far?

We create test threads with stacks with a total of 128MB, each thread is 8MB. So, 16 test threads.

We (a) don't pretouch and (b) pretouch. (a) should use very little memory, otherwise something is wrong. (b) should use most of the stack, otherwise the Pretouch does not work.

A very safe bet is that the untouched stack (a) uses less than 1MB per stack. After all, the test threads don't do much and don't build up a deep stack. 1MB is very generous, it should be a lot less - if we were actually using 1MB, we would be in trouble.

For (b), if pretouching stacks works, I would expect at least ~6MB of the 8MB of the thread stacks pretouched. I would expect more, actually, but lets go with 6MB.

16 Threads. 16 * (6MB-1MB) = 80MB. So, I would expect an 80MB delta between no-pretouch and pretouch. If we don't see that, I would be curious as to why, because then pretouch may not work as expected.

@afshin-zafari
Copy link
Contributor Author

@tstuefe, two cases added. Are they those you wanted?

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Sep 18, 2024
Copy link
Member

@tstuefe tstuefe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @afshin-zafari , sorry for the sluggish response. Comments inline.

Comment on lines 168 to 179
if (pretouch_result.reserved == 0 || no_pretouch_result.reserved == 0) {
throw new RuntimeException("Could not run with PreTouch flag.");
}
double ratio_with = ((double)pretouch_result.committed) / pretouch_result.reserved;
double ratio_without = ((double)no_pretouch_result.committed) / no_pretouch_result.reserved;
System.out.println("ratio with PreTouch: " + ratio_with + " w/out: " + ratio_without);
if (ratio_without > 0.50) {
throw new RuntimeException("Expected a lower ratio between stack committed and reserved.");
}
if (ratio_with < ratio_without) {
throw new RuntimeException("Expected a higher ratio between stack committed and reserved.");
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can simplify. You improved the test, so we don't really need to look at "reserved" save for some basic sanity tests. The reserved-committed ratio is of no interest save to ensure that committed <= reserved (and we do this in other NMT tests).

For both runs, "reserved" should be near identical: It is the combined stack size for all threads running at VM stop. All test threads should still be alive, so the only thing that can vary randomly is whether or not other threads managed to finish before the NMT report is printed. So, I think, you can simplify to just test that "Reserved", in both runs, is >= combined size of test thread stacks (128MB).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't save the Reserved and just look at it for each run independently, we get back to the original test which has problems in macos-aarch64 environments. We decided/agreed to have two tests and save the amounts and then compare them to check if pretouch was done or not.
Maybe I misunderstood you.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@afshin-zafari I just mean that since you compare the committed values of two runs, one with and one without pretouch, you don't really need to compare the reserved values or calculate the reserved to committed ratio. In fact, in both cases, reserved should be near identical.

All you really need to do is to make sure that committed (with pretouch) is > (committed (without pretouch) + expected delta). So, committed with pretouch should be at least 80MB larger than committed without pretouch.

Am I making a thinking error here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for confusion. It is clear to me now. The changes are pushed.

throw new RuntimeException("Expected a higher ratio between stack committed and reserved.");
}
double estimated_stack_usage = 1 * MB;
double expected_stack_use_with_pretouch = 0.75 * threadStackSizeMB * MB;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you hard-code both values up where the other constants are?

I also would name them clearly for what they are. "estimated_stack_usage" sounds wrong, since we don't really estimate that, the real number should be far lower (for no pretouch) and far higher (for pretouch).

Proposal: max_stack_usage_with_pretouch and min_stack_usage_with_pretouch

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}
double expected_delta = numThreads * (expected_stack_use_with_pretouch - estimated_stack_usage);
if ((pretouch_result.committed - no_pretouch_result.committed) < expected_delta) {
throw new RuntimeException("Expected a higher delta between stack committed of with and without pretouch.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you print out the expected values, please?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -82,6 +72,76 @@ static private final Thread createTestThread(int num) {
t.setDaemon(true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you are here, could you add a comment here?

Suggested change
t.setDaemon(true);
// Let test threads run as daemons to ensure that they are still running and
// that their stacks are still allocated when the JVM shuts down and the final
// NMT report is printed.
t.setDaemon(true);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor Author

@afshin-zafari afshin-zafari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if I understood you correctly, in not saving the Reserved amount.

Copy link
Member

@tstuefe tstuefe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good!

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Sep 23, 2024
@afshin-zafari
Copy link
Contributor Author

Thanks for reviews @dholmes-ora, @tstuefe and @gerard-ziemski.
The PR description is changed according to modified approach( not using reserved amount, and only use the committed).
The test passes 100+ times in linux-x64, macosx-aarch64 and windows-x64. (GHA failure is unrelated)

/integrate

@openjdk
Copy link

openjdk bot commented Sep 24, 2024

Going to push as commit e60e882.
Since your change was applied there have been 291 commits pushed to the master branch:

  • 9176f68: 8340623: Remove outdated PROCESSOR_ARCHITECTURE_IA64 from Windows coding
  • 88801ca: 8340590: RISC-V: C2: Small improvement to vector gather load and scatter store
  • 1dd60b6: 8323688: C2: Fix UB of jlong overflow in PhaseIdealLoop::is_counted_loop()
  • 4098acc: 8340146: ZGC: TestAllocateHeapAt.java should not run with UseLargePages
  • 6c91a16: 8340367: Opensource few AWT image tests
  • 865d99f: 8340596: Remove dead code from RequiresSetenv function in java.base/unix/native/libjli/java_md.c
  • 3411f9d: 8339995: Open source several AWT focus tests - series 6
  • 40cde00: 8340114: Remove outdated SelectVersion() function from the launcher and update the code comments explaining the code flow
  • c8ae848: 8340707: ProblemList applications/ctw/modules/java_base.java due to JDK-8340683
  • cd796e0: 8338918: Remove non translated file name from WinResources resource bundle
  • ... and 281 more: https://git.openjdk.org/jdk/compare/9d7d85a6aa20ed95166f5f2f951597bca1fde841...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Sep 24, 2024
@openjdk openjdk bot closed this Sep 24, 2024
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Sep 24, 2024
@openjdk
Copy link

openjdk bot commented Sep 24, 2024

@afshin-zafari Pushed as commit e60e882.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@afshin-zafari afshin-zafari deleted the _8335167_higher_ratio_commit branch June 13, 2025 10:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-runtime hotspot-runtime-dev@openjdk.org integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

4 participants