-
Notifications
You must be signed in to change notification settings - Fork 6.1k
JDK-8256844: Make NMT late-initializable #4874
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
👋 Welcome back stuefe! A progress list of the required criteria for merging this PR into |
79b152b
to
56563b1
Compare
56563b1
to
b568d55
Compare
b568d55
to
42f2ca2
Compare
Webrevs
|
Mailing list message from David Holmes on hotspot-dev: Hi Thomas, On 26/07/2021 10:15 pm, Thomas Stuefe wrote:
Before looking at this, have you checked the startup performance impact? Thanks, |
Hi David, performance should not be a problem. The potentially costly thing is the underlying hashmap. But we keep it operating with a very small load factor. More details: Adding entries is O(1). Since during pre-init phase almost only adds happen, startup time is not affected. Still, to make sure this is true, I did a bunch of tests:
The expensive thing is lookup since we potentially need to walk a very full hashmap. Lookup affects post-init more than pre-init. To get an idea of the cost of a too-full preinit lookup table, I modified the VM to do a configurable number of pre-init test-allocations, with the intent of artificially inflating the lookup table. Then, after NMT initialization, I measured the cost of lookup. The short story, I was not able to measure anything, even with a million pre-init allocations. Of course, with more allocations lookup table got fuller and the VM got slower, but the time increase was caused by the cost of the malloc calls themselves, not the table lookup. Finally, I did an isolated test for the lookup table, testing pure adding and retrieval cost with artificial values. There, I could see costs for add were static (as expected), and lookup cost increased with table population. On my machine:
As you can see, if lookup table population goes beyond 1 mio entries, lookup time starts being noticeable over background noise. But with these numbers, I am not worried. Standard lookup population should be around 300-500, with very long command lines resulting in table populations of ~1000. We should never seen 10000 entries, let alone millions of them. Still, I added a jtreg test to verify the expected hash table population. To catch errors like an unforeseen mass of pre-init allocations (lets say a leak or badly written code sneaked in), or if the hash algorithm suddenly is not good anymore. Two more points
The latter I had implemented already but removed it again to keep complexity down, and because I saw no need.
|
Mailing list message from David Holmes on hotspot-dev: On 28/07/2021 12:17 am, Thomas Stuefe wrote:
Thanks Thomas! I appreciate the detailed investigation. Cheers, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Thomas,
I had a look through this and it seems reasonable, but I'm not familiar enough with the details to approve at this stage.
A few nits below.
Thanks,
David
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an interesting and it seems a better way to solve this problem. Where were you all those years ago?? I hope @zhengyu123 has a chance to review it.
Also interesting is that we were wondering how we could return malloc'd memory on JVM creation failure, and this might partially help with that larger problem.
@tstuefe This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 78 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
Thank you! I was here, but we were not yet doing much upstream :) To be fair, this problem got quite involved and did cost me some cycles and false starts. I fully understand that the first solution uses the environment variable approach. I spend some time investigating different ideas with this one; at first I did not use a hash-table but a static pre-allocated buffer from which I fed early allocations. But the code got too complex, and Kim's suggestion with the side table turned out just to be a lot simpler.
Yes, this would be trivial now, to return that memory. Though I am afraid it would be a small part only. But NMT may be instrumental in releasing all memory, since it knows everything. Only, its not always enabled. Is that a real-life problem? Are there cases where the launcher would want to live on if the JVM failed to load? |
I did not expect a quick review for this one, so thanks for looking at this! All your suggestions make sense, I'll incorporate them. ..Thomas |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for late review.
Did a quick scan and have a few questions, will do more detail reading later.
Thanks a lot, I appreciate your feedback! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good in general.
Hi Reviewers, thanks for all the feedback! New version:
Thanks for your review work! ..Thomas |
There are a lot of other reasons why the launcher couldn't live on if the JVM fails to load. This was only one of them. We used to think about this problem once but don't really think it's solveable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good. Thanks for fixing the mysterious (to me) cast.
}; | ||
static TestAllocations g_test_allocations; // make this an automatic object to let ctor run during in C++ dynamic initialization. | ||
#endif // ASSERT | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, good. I was hoping this didn't need to be an in-jvm test.
Thank you, Coleen! |
Nightlies at SAP showed no problems for several runs now. The failed GHA test (StringDeduplication) seems to have nothing to do with my patch. @zhengyu123 are you fine with the latest version of this patch? |
Still good. |
Thanks @coleenp and @zhengyu123 ! /integrate |
Going to push as commit eec64f5.
Your commit was automatically rebased without conflicts. |
Short: this patch makes NMT available in custom-launcher scenarios and during gtests. It simplifies NMT initialization. It adds a lot of NMT-specific testing, cleans them up and makes them sideeffect-free.
NMT continues to be an extremely useful tool for SAP to tackle memory problems in the JVM.
However, NMT is of limited use due to the following restrictions:
gtestlauncher
.JAVA_TOOL_OPTIONS
and-XX:Flags=<file>
.The reason for all this is that NMT initialization happens very early, on the first call to
os::malloc()
. And those calls happen already during dynamic C++ initialization - a long time before the VM gets around parsing arguments. So, regular VM argument parsing is too late to parse NMT arguments.The current solution is to pass NMT arguments via a specially prepared environment variable:
NMT_LEVEL_<PID>=<NMT arguments>
. That environment variable has to be set by the embedding launcher, before it loads the libjvm. Since its name contains the PID, we cannot even set that variable in the shell before starting the launcher.All that means that every launcher needs to especially parse and process the NMT arguments given at the command line (or via whatever method) and prepare the environment variable.
java
itself does this. This only works before the libjvm.so is loaded, before its dynamic C++ initialization. For that reason, it does not work if the launcher links statically against the hotspot, since in that case C++ initialization of the launcher and hotspot are folded into one phase with no possibility of executing code beforehand.And since it bypasses argument handling in the VM, it bypasses a number of argument processing ways, e.g.,
JAVA_TOOL_OPTIONS
.This patch fixes these shortcomings by making NMT late-initializable: it can now be initialized after normal VM argument parsing, like all other parts of the VM. This greatly simplifies NMT initialization and makes it work automagically for every third party launcher, as well as within our gtests.
The glaring problem with late-initializing NMT is the NMT malloc headers. If we rule out just always having them (unacceptable in terms of memory overhead), there is no safe way to determine, in os::free(), if an allocation came from before or after NMT initialization ran, and therefore what to do with its malloc headers. For a more extensive explanation, please see the comment block
nmtPreInit.hpp
and the discussion with @kimbarrett and @zhengyu123 in the JBS comment section.The heart of this patch is a new way to track early, pre-NMT-init allocations. These are tracked via a lookup table. This was a suggestion by Kim and it worked out well.
Changes in detail:
pre-NMT-init handling:
nmtPreInit.hpp/cpp
take case of NMT pre-init handling. They contain a small global lookup table managing C-heap blocks allocated in the pre-NMT-init phase.os::malloc()/os::realloc()/os::free()
defer to this code before doing anything else.nmtPreinit.hpp
explaining the details.Changes to NMT:
initialize()
andlate_initialize()
. Those were merged into one and simplified - there is only one initialization now which happens after argument parsing.NMT_TrackingLevel
enum - to simplify code, I changed NMT_unknown to be numerically 0. A new comment block innmtCommon.hpp
now clearly specifies what's what, including allowed level state transitions.NMTUtil
Gtests:
gtest/nmt/test_nmtpreinitmap.cpp
.jtreg:
runtime/NMT/NMTInitializationTest.java
, testing NMT initialization in the face of many many VM arguments.Tests:
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/4874/head:pull/4874
$ git checkout pull/4874
Update a local copy of the PR:
$ git checkout pull/4874
$ git pull https://git.openjdk.java.net/jdk pull/4874/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 4874
View PR using the GUI difftool:
$ git pr show -t 4874
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/4874.diff