-
Notifications
You must be signed in to change notification settings - Fork 6.1k
JDK-8277822: Remove debug-only heap overrun checks in os::malloc and friends #6554
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
👋 Welcome back stuefe! A progress list of the required criteria for merging this PR into |
44ed71c
to
9f0e0d7
Compare
e6163f8
to
f0bca7a
Compare
f0bca7a
to
a571f6d
Compare
Webrevs
|
@tstuefe This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration! |
…-checks-in-os-malloc-and-friends
…-checks-in-os-malloc-and-friends
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a very nice cleanup, although I have a couple of questions.
} | ||
static void break_if_ptr_caught(void* ptr) { | ||
if (p2i(ptr) == (intptr_t)MallocCatchPtr) { | ||
log_warning(malloc, free)("ptr caught: " PTR_FORMAT, p2i(ptr)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is the same as MallocCatchPtr, doesn't this output simply print MallocCatchPtr?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what this does. ie. please tell what it does in a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is the same as MallocCatchPtr, doesn't this output simply print MallocCatchPtr?
Yes
I don't understand what this does. ie. please tell what it does in a comment.
Oh, I did not invent it, I just unified the various places where before we checked MallocCatchPtr. E.g. here:
jdk/src/hotspot/share/runtime/os.cpp
Lines 700 to 703 in fb8fdc0
if ((intptr_t)ptr == (intptr_t)MallocCatchPtr) { | |
log_warning(malloc, free)("os::malloc caught, " SIZE_FORMAT " bytes --> " PTR_FORMAT, size, p2i(ptr)); | |
breakpoint(); | |
} |
The coding is old, I believe it predates the OpenJDK. I would love to get rid of MallocCatchPtr, if nobody uses it anymore.
@@ -38,7 +38,6 @@ | |||
#include "logging/log.hpp" | |||
#include "logging/logStream.hpp" | |||
#include "memory/allocation.inline.hpp" | |||
#include "memory/guardedMemory.hpp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this file now be deleted if not used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see it's used in jniCheck.cpp - was it double-guarded ? It's ok to leave this maybe to investigate for another time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
jniCheck.cpp was double guarded, yes. It's also horrible since it does a complete copy of the user buffer, including payload. I'd love to throw it away, have actually planned for it. But people get suspicious if I remove too much in a single RFE, therefore I wanted to keep this for a separate discussion. The discussion, in this case, is that we would have to bind NMT=enabled to -XcheckJni to get NMT to take care of overflow checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, yes, I see maybe it's not a good idea since NMT is enabled by default only on debug and Xcheck:jni is a product level feature. Maybe Xcheck:jni can turn on NMT though. I suppose that would be ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had the same thought, but it would be good to have this discussion separately.
|
||
const NMT_TrackingLevel level = MemTracker::tracking_level(); | ||
|
||
// If NMT is enabled, this checks for heap overwrites, then de-accounts the old block. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you provide a link where NMT checks for heap overwrites to save me looking for it? Thanks.
@tstuefe This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 25 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
Hi Coleen, thanks for looking at this! I already had given up hope :)
Sure. This came all with https://bugs.openjdk.java.net/browse/JDK-8275320. Upon free (either one of os::free or os::realloc) we call, if NMT is enabled, jdk/src/hotspot/share/services/mallocTracker.cpp Lines 160 to 220 in fb8fdc0
I also added an ASCII picture of the NMT headers: jdk/src/hotspot/share/services/mallocTracker.hpp Lines 240 to 292 in fb8fdc0
that may clarify things. We have two canaries, one in the header directly preceding the user payload, one following the user payload. |
Lots of large reviews seem to fallen through the cracks for me. I'd just noticed it. I'd also missed the NMT change. I think this is a nice cleanup. Hopefully @zhengyu123 can review this also. |
Thank you. It is not that urgent, we missed jdk18 already. OTOH, larger cleanups like this are better placed at the start of a new release anyway, so no harm done. |
Yes, much better as a JDK 19 change. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -543,7 +543,7 @@ const intx ObjectAlignmentInBytes = 8; | |||
"compression. Otherwise the level must be between 1 and 9.") \ | |||
range(0, 9) \ | |||
\ | |||
product(ccstr, NativeMemoryTracking, "off", \ | |||
product(ccstr, NativeMemoryTracking, DEBUG_ONLY("summary") NOT_DEBUG("off"), \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Thomas,
I'm a bit concerned about always enabling part of NMT in debug as that affects the bulk of our hotspot testing. What is the potential overhead from enabling NMT this way? We would need to run this patch through our CI to ensure it doesn't introduce any issues.
Cheers,
David
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi David,
I understand your concern. I have had this patch in our internal tests enabled for several months now, but am fine with you guys doing more tests.
My arguments for this:
- using NMT in debug builds in place of the old hard wired fences should be less overhead, not more. NMT is quite frugal. NMT=summary does:
- increase the counter per NMT category (basically atomic-incing a number in a static array)
- add and populate a malloc header and a footer
That's it. We did the same thing before. The old code atomically increased two global counters (inc_stat_counter
), then added a header and a footer. Only the old header/footer had been much larger than what NMT does. So we use less memory, not more, in debug.
- Another argument for this is that NMT is important. It gets used by support, typically in stressful situations. I rather have it thoroughly tested as part of our internal tests. I am not sure that we have ironed out every single NMT issue. NMT feels very stable to me, but there may be lingering issues. But then we should fix them.
All that said, I'd be happy for you to give this patch a spin in your CI. I can hold off the push for a bit.
Cheers, Thomas
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the additional info Thomas. I will run it through our CI and report back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm seeing a timeout with langtools tier1 test tools/javac/failover/CheckAttributedTree.java
on macosx-aarch64. It timed out after 9m. I'll try a few re-runs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
10 re-runs all passed fine - maximum time 3m01s. A re-run on the same machine as previously timed-out was only 47s. So I think we are generally good to go here. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this David.
Thanks @coleenp, @zhengyu123 and @dholmes-ora Tests at SAP went through smoothly too. /integrate |
Going to push as commit 39b1d75.
Your commit was automatically rebased without conflicts. |
After integrating C heap overrun checks into NMT (https://bugs.openjdk.java.net/browse/JDK-8275320), we can now remove debug-only guarding memory for os::malloc() and friends.
This will:
os::malloc
,os::free
,os::realloc
(especially that one since realloc is tricky)Currently, we have always-on heap overrun checks in debug VMs. In order to preserve this feature after the patch, NMT is switched always on per default in debug builds (but can be switched off if needed, as opposed to the baked-in debug-only guards today).
We use NMT level summary here, which is very cheap since we don't track stacks, we just count some numbers in NMT categories. Memory overhead will still be a lot less than what we paid before with the quite generous debug-only guards.
Finally, this would also be a good coverage test for NMT.
For more information, please refer to the umbrella JBS (https://bugs.openjdk.java.net/browse/JDK-8275301).
Memory stomp reports are better now since NMT buffer overrun reporting (see JDK-8275320) is better.
Before:
Old reporting was susceptible to C-heap corruption: a sufficiently corrupted C-heap caused the libc to abort before reporting was completed:
The new reports, done by NMT, print out a hex dump of the corrupted area, before calling into libc. So this printout is something we always get, even if C-heap is corrupted beyond help:
At first glance it seems that "nof_mallocs" etc are missing from this output. This is true, I removed them. They are not needed since the hs-err file contains an NMT summary report, which has more and finer grained information than that.
Tests:
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/6554/head:pull/6554
$ git checkout pull/6554
Update a local copy of the PR:
$ git checkout pull/6554
$ git pull https://git.openjdk.java.net/jdk pull/6554/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 6554
View PR using the GUI difftool:
$ git pr show -t 6554
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/6554.diff