-
Notifications
You must be signed in to change notification settings - Fork 6.2k
8373022: serviceability/sa/ClhsdbScanOops.java assumes no GC should occur #28637
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
8373022: serviceability/sa/ClhsdbScanOops.java assumes no GC should occur #28637
Conversation
|
👋 Welcome back jsikstro! A progress list of the required criteria for merging this PR into |
|
@jsikstro This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been no new commits pushed to the ➡️ To integrate this PR with the above commit message to the |
Webrevs
|
|
Most of the SA tests probably have this requirement. What is it that made these two tests be noticed? Why was a small initial heap size being used? |
We've seen that these tests fail when changing the value of InitialRAMPercentage to 0, which results in a very small initial heap size. I'm about to add a few more tests which also needs to be run with a "non-small" initial heap size. |
|
It's probably just the timing of the GC that determines whether the initial small heap is a problem or not. If you want the SA tests to be reliable with something like InitialRAMPercentage=0, probably all of the tests should be updated. However, personally I don't think this type of fix should be necessary unless you feel testing in the manner is something we want to support. There are plenty of tests that start failing when non-standard command line options are used. |
| * @run build TestScaffold VMConnection TargetListener TargetAdapter | ||
| * @run compile -g MethodInvokeWithTraceOnTest.java | ||
| * @run driver MethodInvokeWithTraceOnTest | ||
| * @run driver MethodInvokeWithTraceOnTest -XX:InitialHeapSize=100M |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you seeing failures due to an ObjectCollectionException? If so, avoiding GC is not the proper way to fix it. Even with a larger initial heap, there can still be an object collected with ZGC. We shouldn't have any debugger tests that rely on a GC not happening in the debugee. The proper fix is usually calling ObjectReference.disableCollection(), although sometimes even that is not enough (the call can happen too late if the debugee is not suspended).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we're seeing a failure due to ObjectCollectionException. This is because the heap size is now much smaller, causing more frequent GCs. "Reverting" to a larger initial heap size is a straightforward fix so that we can continue running this test.
I think a more "robust" approach is better, which we could do in a follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you tell me how to run this test with a small GC that reproduces the ObjectCollectionException?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've only seen this on windows-x64-debug so far, it should fail on mainline right now, given it is removed from the ProblemList it's in. I haven't updated this PR to remove the ProblemList just yet, but will do.
If you want to be explicit you can run it with -XX:InitialRAMPercentage=0, but this is the new default in mainline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a JDI bug, not a test bug. I'll fix this with #28666.
FYI we just integrated a change that sets InitialRAMPercentage=0 for JDK 26 that we've been working on (see #28641). We've run up to Oracle's tier8 twice now, and apart from the tests that are included in this PR, we've not seen any other SA failures. Of course there might be other intermittent failures in the future, in which case I see two approaches moving forward: problem listing or bumping the initial heap size for the affected tests, or going over all SA tests and making sure that they all run with a "large" initial heap size (like 100MB). Unless we start seing many (for some definition of many) test failures from now, a pragmatic compromise is to selectively bump the initial heap size of such tests, like I do in this PR. Of course, the optimal approach would be to make any affected SA tests more robst to GC timings. But, since I'm not sure how much time we want to invest in improving SA tests, bumping the heap size is likely a good compromise here. |
For the most part the SA tests are fine if there is a GC. The way they usually work is to launch the debuggee, wait for it to reach a stable point (all threads idle), and then start to query the debuggee. If a GC happens before reaching stability, that should be fine, and after stability we wouldn't expect any GCs no matter what the heap size is. There are some SA tests that run on active processes where GCs can happen, but they are written to allow for errors. |
| // This test assumes that no GC should happen during its execution. | ||
| // Setting the initial heap size to a reasonably high number avoids | ||
| // running a GC. | ||
| theApp = LingeredApp.startApp(gc, "-XX:InitialHeapSize=100M"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test tries to scan a section of the java heap. When a GC happens, this section becomes empty, so the expected output is not present. Your fix seems reasonable.
|
Thank you @plummercj for analysing the tests. I've removed the explicit InitialHeapSize from test/jdk/com/sun/jdi/MethodInvokeWithTraceOnTest.java in favor of #28666. I've also removed test/hotspot/jtreg/serviceability/sa/ClhsdbScanOops.java from the ProblemList. The two sets of tests that are associated with this issue but not addressed in this PR are about to be solved in #28666 and #28655. I'm holding off on this PR until those changes are integrated. I've updated both the issue and this PR to reflect the new changes. |
|
After merging with master and considering the change in this PR, no more tests referencing this issue are problem listed. |
Hello,
If the initial heap size is set too low in serviceability/sa/ClhsdbScanOops.java, a GC migh run, which will interfere with the test and might cause it to fail.
The test is scanning the oops in a region of the heap, and after a GC that region appears to be empty, so the output that the test expects is not present. Running the test with a larger explicit InitialHeapSize gives enough headroom to not run a GC.
Testing:
-XX:InitialRAMPercentage=0(which is the new default). We now explicitly set-XX:InitialHeapSize=100M. I've rerun the test 10 times with Serial and Parallel for each test and they all pass.Progress
Issue
Reviewers
Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/28637/head:pull/28637$ git checkout pull/28637Update a local copy of the PR:
$ git checkout pull/28637$ git pull https://git.openjdk.org/jdk.git pull/28637/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 28637View PR using the GUI difftool:
$ git pr show -t 28637Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/28637.diff
Using Webrev
Link to Webrev Comment