8307958: Metaspace verification is slow causing extreme class unloading times#14084
xmas92 wants to merge 4 commits into openjdk:master
This reverts commit 5991e03.
Yes, that is much better than changing the test helper.
coleenp left a comment
This seems fine. To be clear, was most of the time spent in the ChunkManager::get_chunk() and return_chunk(), and maybe purge() calls, and not much in the split and merge chunks? I agree with adding SOMETIMES to all for consistency, but I hope we're not doing more of these operations that I assume should be infrequent.
A regression test would be good, since Metaspace will evolve and so will verifications, and I would like to know what performance costs are acceptable.
Maybe it is overly defensive to add it to the split and get_chunk paths. The observed slowdown was for
Oracle's internal testing will most likely provoke regressions in this area, similar to how this issue was brought to light. Seeing the really degenerate behaviour required constant loading and unloading of many class loaders with very few classes over hours, with infrequent class unloading (due to generational ZGC not performing many major collections and low old heap pressure). While it is probably possible to reduce the variables and create a simpler test which solely stresses metaspace debug timings and little else, it seems like a rather complex task. Any product performance regression is caught by our normal performance testing, and any relevant and degenerate debug regression would be caught by Oracle internal tests. Hopefully this is enough for now?
Sure, if you guys are okay with defusing my future too-eager verifications :-) ? I'd be happy with something simple and pragmatic. It does not have to be automatic. Tell me a test X and a time Y it should not overstep on reasonable hardware. Can be very fuzzy. Then I can test this myself instead of causing work for you.
Ok, thanks for the info. That makes sense. I thought there were more GC tests for class unloading but I can't find them. If you create one test with loading/unloading a class 100 times with the SOMETIMES verification on, I think that would run fast enough and exercise the code paths that won't be exercised regularly. You can put it in runtime/ClassUnload with some other tests. Or if you can find tests already in metaspace that do this, that would be sufficient.
I'll look at creating a more fuzzy stress test which uses
There's a ClassUnloadCommon.java class for help with writing class loading/unloading tests.
Thanks for the review. I will integrate this now to improve the CI testing for tests that may stumble on this.
Going to push as commit 8d8153e.
Your commit was automatically rebased without conflicts.
From JBS:
The approach here is to resolve this by putting the verification behind the SOMETIMES macro. It is then possible to turn it off completely with -XX:VerifyMetaspaceInterval=0 while still benefiting from some stochastic verification.
The goal here is to leave the calls to Verify unchanged, so that all explicit gtest verify and Universe::verify calls are unaffected.
The verification was also left unchanged when committing the memory.
All SOMETIMES uses share the same frequency, VerifyMetaspaceInterval. I looked at the possibility of adding weights to some of the call sites, but it did not seem to be worth the complexity.
Testing: GHA and Oracle CI (tier1-3 ongoing), various long running stress tests.
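To make the idea of gating verification behind an interval concrete, here is a minimal standalone sketch. It is not HotSpot's actual SOMETIMES definition: the macro SOMETIMES_SKETCH, the global verify_interval (standing in for the VerifyMetaspaceInterval flag), and the helper functions are all hypothetical names, and the per-call-site counter is an assumption chosen for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for -XX:VerifyMetaspaceInterval (assumption: a plain
// global here, not a real VM flag).
static int verify_interval = 4;

// Counts how many times the expensive check actually ran.
static int verifications_run = 0;

// Hypothetical stand-in for an expensive verification routine.
static void verify_expensive() { ++verifications_run; }

// Run `code` on every Nth execution of this call site. An interval of 0 (or
// less) disables the check entirely. Each expansion of the macro gets its own
// static counter, so call sites are throttled independently.
#define SOMETIMES_SKETCH(code)                 \
  do {                                         \
    static int counter_ = 0;                   \
    if (verify_interval > 0 &&                 \
        ++counter_ >= verify_interval) {       \
      counter_ = 0;                            \
      code                                     \
    }                                          \
  } while (0)

// Hypothetical hot path, e.g. returning a chunk to a free list.
static void hot_path_operation() {
  // ... the real work would happen here ...
  SOMETIMES_SKETCH(verify_expensive(););
}

// Calls the hot path `calls` times with the given interval and returns how
// many verifications ran. The call-site counter persists between invocations.
static int run_hot_path(int calls, int interval) {
  assert(calls >= 0);
  verify_interval = interval;
  verifications_run = 0;
  for (int i = 0; i < calls; ++i) {
    hot_path_operation();
  }
  return verifications_run;
}
```

In this sketch, run_hot_path(100, 4) performs 25 verifications instead of 100, and run_hot_path(100, 0) performs none, which mirrors the effect described above of setting the interval to 0. Explicit verify calls that do not go through the macro would be unaffected.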
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/14084/head:pull/14084
$ git checkout pull/14084
Update a local copy of the PR:
$ git checkout pull/14084
$ git pull https://git.openjdk.org/jdk.git pull/14084/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 14084
View PR using the GUI difftool:
$ git pr show -t 14084
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/14084.diff