JDK-8293313: NMT: Rework MallocLimit #11371
Conversation
👋 Welcome back stuefe! A progress list of the required criteria for merging this PR into
Force-pushed from 486e0a4 to 2899eef
/label hotspot
@tstuefe
Force-pushed from 2899eef to 3d6f477
Force-pushed from 3d6f477 to a3622a3
/issue JDK-8293292
@tstuefe
@tstuefe This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!
Just FYI I'm taking a look at this...
This looks good. I've added some nits for style and a couple of comments concerning style which aren't nits. Logic-wise, the code is easy to follow and straightforward; I could find nothing to complain about.
Thanks @jdksjolen, I massaged in your feedback. Cheers, Thomas
Still reviewing, just wanted to share this potential issue right away...
src/hotspot/share/runtime/os.cpp (outdated):

```cpp
const size_t old_size = MallocTracker::malloc_header(memblock)->size();

// Observe MallocLimit
if (size > old_size && MemTracker::check_exceeds_limit(size - old_size, memflags)) {
```
Can we make it a bit easier to parse by rewriting it as:

```cpp
if ((size > old_size) && MemTracker::check_exceeds_limit(size - old_size, memflags)) {
```

To me personally that is easier to read.
I see a bigger issue here, however. In the worst-case scenario, where the OS is unable to expand the memory chunk in place, it will need to allocate `size` bytes in a new region, so the total potential resource usage from the OS point of view is `size + old_size` bytes for that particular allocation. Shouldn't we assume this worst case and test for that? I.e.

```cpp
if ((size > old_size) && MemTracker::check_exceeds_limit(size, memflags)) {
```

We would need a comment here explaining this choice if we make this change.

I'd rather get some false positives than miss a single true positive...
> Can we make it a bit easier to parse by rewriting it as:
> `if ((size > old_size) && MemTracker::check_exceeds_limit(size - old_size, memflags)) {`
> To me personally that is easier to read.

Sure

> I see a bigger issue here, however. In the worst-case scenario, where the OS is unable to expand the memory chunk in place, it will need to allocate `size` bytes in a new region, so the total potential resource usage from the OS point of view is `size + old_size` bytes for that particular allocation. Shouldn't we assume this worst case and test for that? I.e.
> `if ((size > old_size) && MemTracker::check_exceeds_limit(size, memflags)) {`
> We would need a comment here explaining this choice if we make this change.
> I'd rather get some false positives than miss a single true positive...
Nah, no need to overthink this.
If you try to predict how much memory your malloc causes your process to allocate, it gets very fuzzy very quickly:
Below us are at least two allocating layers (the libc and the kernel's VM manager). Both of them balance memory footprint, CPU usage and allocation speed, and no two libcs or kernels behave the same. Each layer below you may return cached or partly cached memory and apply an often sizable overhead to what you are allocating. So our mallocs and frees have only a very indirect effect on RSS. And we don't even see all mallocs here, just the ones from libjvm.
Therefore it is best to limit the meaning of MallocLimit to "a limit on how much hotspot is allowed to allocate". That is also the most useful semantics. For limiting the memory of a process, different OS-side tools exist (cgroup limits, ulimit, etc.).
The entire point of this effort is to catch elusive OOM errors. You even moved the check for the memory limits before we actually allocate the memory, from the current way where we check the limit only after we acquire new memory.
Isn't assuming the worst-case scenario, where realloc() might allocate new memory first, in the same vein as that? We are just being cautious, to catch those hard-to-reproduce bugs.
Basically we are saying that if the OS had to allocate new memory on top of the old one, and doing so would exceed the limit, we would like to know about it. Will it happen often? No. Will it happen during the lifetime of a memory-intensive process? Most likely yes.
These cases might not occur very often, granted, but why not catch them if we can?
Aren't there any platforms where we run Java where realloc() is implemented in terms of malloc() and memcpy()?
> The entire point of this effort is to catch elusive OOM errors. You even moved the check for the memory limits before we actually allocate the memory, from the current way where we check the limit only after we acquire new memory.
>
> Isn't assuming the worst-case scenario, where realloc() might allocate new memory first, in the same vein as that? We are just being cautious, to catch those hard-to-reproduce bugs.
>
> Basically we are saying that if the OS had to allocate new memory on top of the old one, and doing so would exceed the limit, we would like to know about it. Will it happen often? No. Will it happen during the lifetime of a memory-intensive process? Most likely yes.
>
> These cases might not occur very often, granted, but why not catch them if we can?
>
> Aren't there any platforms where we run Java where realloc() is implemented in terms of malloc() and memcpy()?
I think it's just wrong. False positives are not helpful.
-
It's semantically wrong. The malloc limit shall trigger when our global or category-specific malloc load exceeds a given threshold. Be it either the malloc that just happened (old version) or the malloc that we are about to do (new version). But I don't want the mechanism to second-guess what I'm telling it and proactively trigger alerts. Example: I set 10m as the limit and have a 5 MB buffer. Expanding this buffer to 5.5 MB should not trigger any alert. I don't want false positives; they are not helpful.
-
You are trying to predict an OOM kill caused by a malloc event, right? For that to work, the prediction must be at least somewhat correct. But you only have very limited knowledge. Some examples:
- The allocation request is fulfilled from cached memory. No new memory needed at all. The smaller the allocation the more likely this is, and if the libc uses sbrk exclusively, this is very likely.
- The libc always has overhead per allocation. E.g. glibc adds, at the very least, ~40ish bytes. Should I account for this? With many fine-grained allocations this can have an enormous impact. But overhead depends on a number of factors, and every libc is different.
- The allocation request is large and the libc decides to mmap it (not all do). So we allocate in page granularity. Should I now adjust my prediction for page size? What page size? The libc may still reuse already mapped memory though. Or the underlying kernel vm may do that.
- The realloc is done by a new thread, and the libc uses thread-specific memory pools. A new memory pool is created. Is this memory committed immediately (e.g. older glibcs) or committed on demand? Depending on the answer, we may see a large bump in RSS or none at all. Depends on libc and libc version.
- The realloc is large and cannot be enlarged in place. But I could think of several ways of enlarging such an allocation out-of-place without needing memory * 2. If even I can think of those, libc devs probably do too.
So, these examples show that an allocation may not move the needle at all or may move the needle much farther than you think. Trying to guess this just makes the code complex and causes false positives that are difficult to argue about. All without having any real benefit.
Thank you for taking the time to explain.
I'm worried about those hard-to-pin-down mysterious issues, of which we have plenty, that could possibly be related to memory pressure. But I understand your point of view and will defer to you (even if 5% of this continues to bother me).
I understand. This is not the tool to solve all that, but it can help.
There are other tools. At SAP, we added something called "High Memory Reports", the ability to automatically generate customizable reports when certain memory thresholds are reached [1]. In addition to that, we have our SapMachine Vitals [2], which are awesome - basically like a little inbuilt pidstat but with a lot more information. For instance, it tells you the history of c-heap allocations (both from the JVM view and from the libc's view).
I would love to see this in OpenJDK, but have zero hope of doing so; I tried to get the Vitals in years ago but hit a wall of resistance from JFR people.
[1] https://github.com/SAP/SapMachine/wiki/SapMachine-High-Memory-Reports
[2] https://github.com/SAP/SapMachine/wiki/SapMachine-Vitals
Made more progress on the review today; tomorrow I have the parser and the Java tests to review...
"If non-zero, maximum number of words that malloc/realloc can " \ | ||
"allocate (for testing only)") \ | ||
range(0, max_uintx) \ | ||
\ | ||
product(ccstr, MallocLimit, nullptr, DIAGNOSTIC, \ |
Pre-existing, but I wish this flag was named `NMTMallocLimit`, not `MallocLimit`.
I prefer MallocLimit, short and snappy. And the fact that NMT is supervising the limit is an implementation detail. It used to be different with MallocMaxTestWords and may be different in the future.
Still, you named the test file `test/hotspot/gtest/nmt/test_nmt_malloclimit.cpp`, not `test/hotspot/gtest/nmt/test_malloclimit.cpp` :-)
I'm not perfect :-) Honestly, I just like short command-line names that are easy to remember and to talk about. I dislike the JVM tradition of long camel-cased names a bit.
#include "utilities/globalDefinitions.hpp" | ||
#include "utilities/vmError.hpp" | ||
|
||
static inline bool suppress_limit_handling() { |
Why did we bother to wrap `VMError::is_error_reported()` into `suppress_limit_handling()`? Are you anticipating more exclusions here in the future?
> Why did we bother to wrap `VMError::is_error_reported()` into `suppress_limit_handling()`?

Because during error handling, code may malloc() too (bad practice, but it can happen). If it does, I don't want circular assertions to fire; I want a clean, complete hs-err file.

> Are you anticipating more exclusions here in the future?

None I can think of.
That explains why we want to call `VMError::is_error_reported()`, sure, but not why we wrapped it in the `suppress_limit_handling()` API, unless I am missing something?
Oh okay. This was mostly a documentation thing. I can remove the wrapper.
```cpp
if (l->sz > 0) {
  size_t so_far = as_snapshot()->total();
  if ((so_far + s) > l->sz) { // hit the limit
    if (!suppress_limit_handling()) {
```
I would prefer to see `suppress_limit_handling()` checked inside `total_limit_reached()`, same for `category_limit_reached()`. This way we would force those APIs to always obey the suppression without having to bother to add `suppress_limit_handling()`; they could call `VMError::is_error_reported()` directly.
> I would prefer to see `suppress_limit_handling()` checked inside `total_limit_reached()`, same for `category_limit_reached()`. This way we would force those APIs to always obey the suppression without having to bother to add `suppress_limit_handling()`; they could call `VMError::is_error_reported()` directly.

Note that `total_limit_reached()` and `category_limit_reached()` are private APIs.

If we move `suppress_limit_handling()` into `total_limit_reached()`, then `total_limit_reached()` would need to report the suppression back to `MallocMemorySummary::check_exceeds_limit()`, since it's processed there too; it affects the return code. So `total_limit_reached()` cannot be void anymore, and gets somewhat more complex. I prefer it this way, tbh.
I'm not following.

If we get rid of the `suppress_limit_handling()` API and do:
```cpp
bool MallocMemorySummary::total_limit_reached(size_t s, size_t so_far, const malloclimit* limit) {
  if (VMError::is_error_reported()) {
    return false;
  }
#define FORMATTED \
  "MallocLimit: reached global limit (triggering allocation size: " PROPERFMT ", allocated so far: " PROPERFMT ", limit: " PROPERFMT ") ", \
  PROPERFMTARGS(s), PROPERFMTARGS(so_far), PROPERFMTARGS(limit->sz)
  if (limit->mode == MallocLimitMode::trigger_fatal) {
    fatal(FORMATTED);
  } else {
    log_warning(nmt)(FORMATTED);
  }
#undef FORMATTED
  return true;
}

bool MallocMemorySummary::category_limit_reached(MEMFLAGS f, size_t s, size_t so_far, const malloclimit* limit) {
  if (VMError::is_error_reported()) {
    return false;
  }
#define FORMATTED \
  "MallocLimit: reached category \"%s\" limit (triggering allocation size: " PROPERFMT ", allocated so far: " PROPERFMT ", limit: " PROPERFMT ") ", \
  NMTUtil::flag_to_enum_name(f), PROPERFMTARGS(s), PROPERFMTARGS(so_far), PROPERFMTARGS(limit->sz)
  if (limit->mode == MallocLimitMode::trigger_fatal) {
    fatal(FORMATTED);
  } else {
    log_warning(nmt)(FORMATTED);
  }
#undef FORMATTED
  return true;
}
```
then we can simply have:
```cpp
// Returns true if allocating s bytes on f would trigger either the global or the category limit
inline bool MallocMemorySummary::check_exceeds_limit(size_t s, MEMFLAGS f) {
  // Note: checks are ordered to have as little impact as possible on the standard code path,
  // when MallocLimit is unset, resp. it is set but we have reached no limit yet.
  // Somewhat expensive are:
  // - as_snapshot()->total(), total malloc load (requires iteration over arena types)
  // - VMError::is_error_reported() is a load from a volatile.
  if (MallocLimitHandler::have_limit()) {
    // Global limit?
    const malloclimit* l = MallocLimitHandler::global_limit();
    if (l->sz > 0) {
      size_t so_far = as_snapshot()->total();
      if ((so_far + s) > l->sz) { // hit the limit
        if (total_limit_reached(s, so_far, l)) {
          return true;
        }
      }
    } else {
      // Category limit?
      l = MallocLimitHandler::category_limit(f);
      if (l->sz > 0) {
        const MallocMemory* mm = as_snapshot()->by_type(f);
        size_t so_far = mm->malloc_size() + mm->arena_size();
        if ((so_far + s) > l->sz) {
          if (category_limit_reached(f, s, so_far, l)) {
            return true;
          }
        }
      }
    }
  }
  return false;
}
```
Doesn't that work?
Yes, you are right, it's better. We can remove the vmError.hpp dependency from the header too.
Thanks for your review work! I'll wait for your full review before continuing.
Sorry, could you please convert the NULLs into nullptr :)?
Done. Thanks for the review; I'll wait for the final GHAs, then I'll push.
/integrate
@tstuefe This pull request has not yet been marked as ready for integration.
/integrate
@tstuefe This pull request has not yet been marked as ready for integration.
/integrate
@tstuefe This pull request has not yet been marked as ready for integration.
@jdksjolen @gerard-ziemski could you please re-approve? For some reason I cannot integrate and I suspect it's because the reviews are stale.
/integrate
@tstuefe This pull request has not yet been marked as ready for integration.
Re-approving!
LGTM.
Thanks for the work and relentless discussion.
@tstuefe This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 22 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the
LGTM.
Will there be a release note documenting the changed/removed -XX flags? Maybe there should be one.
Thanks guys!
That is not necessary, since this was a diagnostic switch. Also, the new format is backward compatible with the old one (so no behavioral change if someone had set it). Cheers, Thomas

/integrate
Going to push as commit 90e0922.
Your commit was automatically rebased without conflicts.
Mailing list message from Thomas Stüfe on hotspot-dev: "Danke Jungs!" (Thanks, guys!), in reply to Matthias Baesken <mbaesken at openjdk.org>, Thu, Feb 16, 2023 at 3:13 PM.
@tstuefe With regards to a release note: while the flag is now diagnostic, that is a change made in JDK 21. So in terms of users of JDK 20 migrating to JDK 21, a release note may be needed to explain both that the flag is now diagnostic (and subject to change without notice), and that "none" is no longer a valid value.
I assume you mean JDK-8302129, right? The MetaspaceReclaimPolicy=none removal? Makes sense, I will do that. I opened https://bugs.openjdk.org/browse/JDK-8302708 to track that.
Sorry @tstuefe - no idea how I got the two issues mixed up. :( |
Rework NMT `MallocLimit` to make it more useful for native OOM analysis. Also remove `MallocMaxTestWords`, which had been mostly redundant with NMT (for background, see JDK-8293292).

Background

Some months ago we had JDK-8291919, a compiler regression that caused compiler arenas to grow boundlessly. The process was usually OOM-killed before we could take a look, so this was difficult to analyze.

To improve analysis options, we added NMT malloc limits with JDK-8291878. Malloc limits let us set one or multiple limits on the malloc load. Surpassing a limit causes the VM to stop with a fatal error in the allocation function, giving us an hs-err file and a core right at the point that (probably) causes the leak. This makes error analysis a lot simpler, and is also valuable for regression-testing footprint usage.

Some people wished for a way to not end with a fatal error but to mimic a native OOM (causing os::malloc to return NULL). This would allow us to test resilience in the face of native OOMs, among other things.

Patch

This patch:

- changes the `MallocLimit` option syntax, allowing for an "oom mode" that mimics an OOM
- moves parsing of `-XX:MallocLimit` out of arguments.cpp into `mallocLimit.hpp/cpp`, and rewrites it
- changes when limits are checked. Before, the limits were checked after the allocation that caused the threshold overflow. Now we check beforehand.
- removes `MallocMaxTestWords`, replacing its uses with `MallocLimit`
- adds new gtests and new jtreg tests
- removes a bunch of jtreg tests which are now better served by the new gtests
- gives us more precise error messages upon reaching a limit. For example:

before:

now:
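For illustration, the reworked option syntax looks roughly like the following. These invocations are sketches: the category name, sizes and the `oom` mode keyword reflect my reading of the PR discussion, and the authoritative grammar lives in `mallocLimit.cpp`.

```
# Global limit: fatal error once malloc load reaches 2 GB (assumed example)
java -XX:NativeMemoryTracking=summary -XX:MallocLimit=2g ...

# Global limit in "oom mode": os::malloc returns NULL instead of aborting
java -XX:NativeMemoryTracking=summary -XX:MallocLimit=2g:oom ...

# Category-specific limit, e.g. for compiler allocations (hypothetical values)
java -XX:NativeMemoryTracking=summary -XX:MallocLimit=compiler:200m:oom ...
```

Note that the limit is enforced by NMT, so `-XX:NativeMemoryTracking` must be enabled for it to take effect.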
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/11371/head:pull/11371
$ git checkout pull/11371
Update a local copy of the PR:
$ git checkout pull/11371
$ git pull https://git.openjdk.org/jdk pull/11371/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 11371
View PR using the GUI difftool:
$ git pr show -t 11371
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/11371.diff