Skip to content

Conversation

@jdksjolen
Copy link
Contributor

@jdksjolen jdksjolen commented Oct 17, 2024

Today NMT has two canaries: A header and a footer canary. These enable mainly two things:

  1. For NMT to aid in describing a pointer
  2. A basic form of out-of-bounds protection

With the introduction of UBSan and Asan into OpenJDK we have gained stronger tools for this sort of analysis, without requiring NMT to be activated. Therefore, I believe that point 2 is no longer something that NMT needs to support. For point number one, we will unfortunately be losing this ability.

I want to delete these canaries to open up a few free bytes. These can allow us to have "practically unlimited" (4 bytes) of memory tags.

tier1-tier2 tests succeeded.

I am awaiting discussion on the Hotspot-dev mailing list, but keeping this PR open for review.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8342504: Remove NMT header and footer canaries (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/21560/head:pull/21560
$ git checkout pull/21560

Update a local copy of the PR:
$ git checkout pull/21560
$ git pull https://git.openjdk.org/jdk.git pull/21560/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 21560

View PR using the GUI difftool:
$ git pr show -t 21560

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/21560.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Oct 17, 2024

👋 Welcome back jsjolen! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Oct 17, 2024

@jdksjolen This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8342504: Remove NMT header and footer canaries

Reviewed-by: gziemski, azafari, stuefe

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 90 new commits pushed to the master branch:

  • 3e46480: 8350770: [BACKOUT] Protection zone for easier detection of accidental zero-nKlass use
  • bd112c4: 8350443: GHA: Split static-libs-bundles into a separate job
  • 2731712: 8287749: Re-enable javadoc -serialwarn option
  • 0f82268: 8345598: Upgrade NSS binaries for interop tests
  • ea2c923: 8323807: Async UL: Add a stalling mode to async UL
  • e7d4b36: 8350667: Remove startThread_lock() and _startThread_lock on AIX
  • 1e18fff: 8328473: StringTable and SymbolTable statistics delay time to safepoint
  • a0dd565: 8350643: G1: Make loop iteration variable type correspond to limit in G1SurvRateGroup
  • aac9cb4: 8349906: G1: Improve initial survivor rate for newly used young regions
  • a431046: 8350518: org.openjdk.bench.java.util.TreeMapUpdate.compute fails with "java.lang.IllegalArgumentException: key out of range"
  • ... and 80 more: https://git.openjdk.org/jdk/compare/0ef1c40991e703592fc79325bda1a6d2fc6caf4e...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot changed the title 8342504 8342504: Remove NMT header and footer canaries Oct 17, 2024
@openjdk
Copy link

openjdk bot commented Oct 17, 2024

@jdksjolen The following label will be automatically applied to this pull request:

  • hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-runtime hotspot-runtime-dev@openjdk.org label Oct 17, 2024
@jdksjolen jdksjolen marked this pull request as ready for review October 17, 2024 13:03
@openjdk openjdk bot added the rfr Pull request is ready for review label Oct 17, 2024
@mlbridge
Copy link

mlbridge bot commented Oct 17, 2024

Webrevs

@gerard-ziemski
Copy link

Nice code removal!

I wonder if there is any measurable performance impact?

Not a review yet, but I took a superficial peek and I like it.

Real review coming up...

Copy link

@gerard-ziemski gerard-ziemski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see code removal, tweaks to malloc header size, tests fixes, so all looks good, except for my question about MallocTracker::print_pointer_information()

// accidentally like a valid malloc block header (canaries and all) so this is not
// totally failproof and may give a wrong answer. It is safe in that it will never
// crash, even when encountering unmapped memory.
bool MallocTracker::print_pointer_information(const void* p, outputStream* st) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the only thing that had to go was candidate->looks_valid()

Isn't it useful to keep the rest of this code though?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've also removed the is_live() call. So now we really can't make a reasonable guess whether a potential pointer really points to a MallocHeader. Since, in my opinion, we can't find a reasonable candidate (not sufficient clues), I decided to remove the code.

* 8 9 10 11 12 13 14 15 16 ++
* +--------+--------+--------+--------+--------+--------+--------+--------+ ------------------------
* ... | malloc site table marker | flags | unused | canary | ... User payload ....
* ... | malloc site table marker | flags | unused | ... User payload ....
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flags -> Memory Tags

* 8 9 10 11 12 13 14 15 16 ++
* +--------+--------+--------+--------+--------+--------+--------+--------+ ------------------------
* ... | malloc site table marker | flags | unused | canary | ... User payload ....
* ... | malloc site table marker | flags | unused | ... User payload ....
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flags -> Memory Tags


// realloc is the trickiest of the bunch. Test that realloc works and correctly takes over
// NMT header and footer to the resized block. We just test that nothing crashes - if the
// header/footer get corrupted, NMT heap corruption checker will trigger alert on os::free()).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After removing canary, can NMT still check memory corruption?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, not anymore. The canaries are how we perform memory corruption checks. As mentioned in the PR description, I believe that since we have introduced UBSan and ASan it is no longer necessary for NMT to be able to detect memory corruption.

Copy link
Contributor

@TheShermanTanker TheShermanTanker Nov 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to caution that this would trash the ability to check for corruption for platforms that don't support ubsan and asan. Windows, with the VC compiler, comes to mind (Actually, even the Windows/gcc Port cannot use ubsan and asan either, since the sanitizers haven't been properly ported for Windows in general)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the input, Julian. It seems like both ubsan and asan are available on Clang for Windows (8.1+ for asan), so we'd have a way of getting memory corruption detection on those platforms also.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang and its many variants (clang-cl, clang in MinGW mode, etc) are not supported as compilers for the Windows JDK. The only compiler that the build system allows for Windows is VC. I do have a port that supports gcc (Which unfortunately also does not have access to ubsan or asan), but have not started work on clang because it was a bit above my capabilities

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. I'm personally not aware of why using a different C++-compiler on a particular platform would be a problem.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since ASAN is supported by VC on Windows platforms, we can still be safe to remove the memory corruption checking by NMT. ASAN is already available in JDK build, so available for the Windows platform.

Copy link
Contributor

@afshin-zafari afshin-zafari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this work.
Some flags vs MemTag found.

@jdksjolen
Copy link
Contributor Author

Thanks! I switched the name to 'tags' to make sure that the name fits within the description.

Copy link

@gerard-ziemski gerard-ziemski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Nov 18, 2024
@kimbarrett
Copy link

Not a review, just a comment in passing. Removing the NMT header/footer canaries is probably needed to give us a
path to C++17 dynamic allocation of overaligned types. See https://bugs.openjdk.org/browse/JDK-8318721.

@jdksjolen
Copy link
Contributor Author

jdksjolen commented Nov 20, 2024

Not a review, just a comment in passing. Removing the NMT header/footer canaries is probably needed to give us a path to C++17 dynamic allocation of overaligned types. See https://bugs.openjdk.org/browse/JDK-8318721.

I don't think that it'll help too much, we will still have a header in NMT, same size. It's just not used for canaries any more.

Edit: Scratch that. I just used the fact that we no longer have headers to propose a way of solving the problem.

Copy link
Contributor

@afshin-zafari afshin-zafari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment line saying that NMT can detect memory corruptions is not valid any more and is still in the code.

@tstuefe
Copy link
Member

tstuefe commented Dec 2, 2024

Not a review, just a comment in passing. Removing the NMT header/footer canaries is probably needed to give us a path to C++17 dynamic allocation of overaligned types. See https://bugs.openjdk.org/browse/JDK-8318721.

I disagree. See my comment in 8318721: https://bugs.openjdk.org/browse/JDK-8318721?focusedId=14727326&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14727326

@tstuefe
Copy link
Member

tstuefe commented Dec 2, 2024

As I said in preparation of this work, I don't oppose it, but I am not happy.

ASAN is not a replacement. ASAN is a special build, slow, needs tons of additional memory, stops at the first (often false) positive, and is often bitrotted since, to my knowledge, no vendor builds ASAN-enabled JVMs regularly. More importantly, if you have a problem in the field, it is easy to convince a customer to switch on NMT. You will not convince a customer to switch their production JVMs against an ASAN-enabled one.

But okay, let's remove it. I hope the capabilities this will enable are worth the loss of this capability.

Copy link
Member

@tstuefe tstuefe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay

@bridgekeeper
Copy link

bridgekeeper bot commented Dec 30, 2024

@jdksjolen This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@jdksjolen
Copy link
Contributor Author

We're now in JDK-25 so I'm planning to integrate this ASAP.

@bridgekeeper
Copy link

bridgekeeper bot commented Jan 31, 2025

@jdksjolen This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@openjdk
Copy link

openjdk bot commented Feb 12, 2025

@jdksjolen this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout delete-canaries
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

@openjdk openjdk bot added merge-conflict Pull request has merge conflict with target branch and removed ready Pull request is ready to be integrated labels Feb 12, 2025
@openjdk openjdk bot removed the merge-conflict Pull request has merge conflict with target branch label Feb 20, 2025
@gerard-ziemski
Copy link

As I said in preparation of this work, I don't oppose it, but I am not happy.

ASAN is not a replacement. ASAN is a special build, slow, needs tons of additional memory, stops at the first (often false) positive, and is often bitrotted since, to my knowledge, no vendor builds ASAN-enabled JVMs regularly. More importantly, if you have a problem in the field, it is easy to convince a customer to switch on NMT. You will not convince a customer to switch their production JVMs against an ASAN-enabled one.

But okay, let's remove it. I hope the capabilities this will enable are worth the loss of this capability.

Do you have any recent examples of NMT canaries helping out "in the field"? I do feel a little uneasy about loosing this feature, assuming it still helps out in the field.

Copy link

@gerard-ziemski gerard-ziemski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still LGTM.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Feb 26, 2025
@tstuefe
Copy link
Member

tstuefe commented Feb 26, 2025

As I said in preparation of this work, I don't oppose it, but I am not happy.
ASAN is not a replacement. ASAN is a special build, slow, needs tons of additional memory, stops at the first (often false) positive, and is often bitrotted since, to my knowledge, no vendor builds ASAN-enabled JVMs regularly. More importantly, if you have a problem in the field, it is easy to convince a customer to switch on NMT. You will not convince a customer to switch their production JVMs against an ASAN-enabled one.
But okay, let's remove it. I hope the capabilities this will enable are worth the loss of this capability.

Do you have any recent examples of NMT canaries helping out "in the field"? I do feel a little uneasy about loosing this feature, assuming it still helps out in the field.

Nothing I can share, but I had two customers using Unsafe.AllocateMemory (which atm are the only allocations from outside the JVM that are tracked), and they overwrote the boundaries. Took a while to find this, but the canaries added by NMT (mainly the hex dump outputs) were helpful.

I wonder, do we have to completely throw this feature away? Even a single byte as canary would be better than nothing?

@gerard-ziemski
Copy link

As I said in preparation of this work, I don't oppose it, but I am not happy.
ASAN is not a replacement. ASAN is a special build, slow, needs tons of additional memory, stops at the first (often false) positive, and is often bitrotted since, to my knowledge, no vendor builds ASAN-enabled JVMs regularly. More importantly, if you have a problem in the field, it is easy to convince a customer to switch on NMT. You will not convince a customer to switch their production JVMs against an ASAN-enabled one.
But okay, let's remove it. I hope the capabilities this will enable are worth the loss of this capability.

Do you have any recent examples of NMT canaries helping out "in the field"? I do feel a little uneasy about loosing this feature, assuming it still helps out in the field.

Nothing I can share, but I had two customers using Unsafe.AllocateMemory (which atm are the only allocations from outside the JVM that are tracked), and they overwrote the boundaries. Took a while to find this, but the canaries added by NMT (mainly the hex dump outputs) were helpful.

I wonder, do we have to completely throw this feature away? Even a single byte as canary would be better than nothing?

Or make the feature optional for those devs, who are willing to pay the price?

I would be OK with this being a dynamic, optional feature that can be turned OFF/ON.

@tstuefe
Copy link
Member

tstuefe commented Feb 26, 2025

As I said in preparation of this work, I don't oppose it, but I am not happy.

ASAN is not a replacement. ASAN is a special build, slow, needs tons of additional memory, stops at the first (often false) positive, and is often bitrotted since, to my knowledge, no vendor builds ASAN-enabled JVMs regularly. More importantly, if you have a problem in the field, it is easy to convince a customer to switch on NMT. You will not convince a customer to switch their production JVMs against an ASAN-enabled one.

But okay, let's remove it. I hope the capabilities this will enable are worth the loss of this capability.

Do you have any recent examples of NMT canaries helping out "in the field"? I do feel a little uneasy about loosing this feature, assuming it still helps out in the field.

Nothing I can share, but I had two customers using Unsafe.AllocateMemory (which atm are the only allocations from outside the JVM that are tracked), and they overwrote the boundaries. Took a while to find this, but the canaries added by NMT (mainly the hex dump outputs) were helpful.

I wonder, do we have to completely throw this feature away? Even a single byte as canary would be better than nothing?

Or make the feature optional for those devs, who are willing to pay the price?

I would be OK with this being a dynamic, optional feature that can be turned OFF/ON.

increases complexity, though. malloc header is 16 byte, we could use eg the most significant byte of the size field. Heck, we could use the topmost four bytes, since probably Nobody ever allocates more than 4g with malloc. But certainly the topmost byte. 56 bit are larger than most addrss spaces.

@jdksjolen
Copy link
Contributor Author

Hi all,

As you might suspect, there is a reason that this PR is still open. I am also a bit unsure about removing this. The goal is to replace the two canary bytes in the header with 2 bytes for MemTags, and using the unused byte to gain a total of 4 bytes. One thing we can do is to replace only one of the canary bytes, to gain a total of 3 bytes for MemTags. A 3 byte MemTag would absolutely be sufficient for what we want.

@tstuefe
Copy link
Member

tstuefe commented Feb 28, 2025

As I said in preparation of this work, I don't oppose it, but I am not happy.
ASAN is not a replacement. ASAN is a special build, slow, needs tons of additional memory, stops at the first (often false) positive, and is often bitrotted since, to my knowledge, no vendor builds ASAN-enabled JVMs regularly. More importantly, if you have a problem in the field, it is easy to convince a customer to switch on NMT. You will not convince a customer to switch their production JVMs against an ASAN-enabled one.
But okay, let's remove it. I hope the capabilities this will enable are worth the loss of this capability.

Do you have any recent examples of NMT canaries helping out "in the field"? I do feel a little uneasy about loosing this feature, assuming it still helps out in the field.

Thinking about this, I actually rely on this feature a lot more often:

A) Customer: "Look at my hs-err file"

B) Me: (see crashes in somewhere in libc, or something that looks like corrupted C-heap) "Switch on NMT please"

C) Customer: "ok" - does it, no change.

D) Now I know it is not a simple double free or overwrite of memory allocated by the hotspot or direct byte buffers. I can exclude, for now, us (the VM vendor) as a culprit and take a closer look at whatever third-party JNI libraries are running in the process. In the end, it may still turn out we broke something, but for now a misuse of C-heap from non-JDK code is more likely. So, NMT is a useful first-responder tool in these cases.

This is not that rare. And this feature will be a lot more useful once we have full-process - or at least, with Johan's plans - full JDK integration of NMT. Because then, the surface of instrumented code is larger.

If, at step (B), I would have asked the customer: "Can you please switch out the JVM against one I give you that has ASAN enabled", that runs into tons of problems: they may not be in a position to switch out binaries (maybe deployed out of reach somewhere), they may not be allowed to, ASAN build may be too slow, ASAN crashes may be unacceptable in production (and these crashes will be much more likely than NMT finding some problem).

For that reason, I propose to leave the footer canary as it is. I cannot see what purpose removing it serves (other than "oh it is simpler now"), and arguably it's the more important of the two canaries for catching end-of-buffer overruns. The leading canary let's shrink to one byte. Ideally placed right in front of the user payload.

@jdksjolen
Copy link
Contributor Author

As I said in preparation of this work, I don't oppose it, but I am not happy.
ASAN is not a replacement. ASAN is a special build, slow, needs tons of additional memory, stops at the first (often false) positive, and is often bitrotted since, to my knowledge, no vendor builds ASAN-enabled JVMs regularly. More importantly, if you have a problem in the field, it is easy to convince a customer to switch on NMT. You will not convince a customer to switch their production JVMs against an ASAN-enabled one.
But okay, let's remove it. I hope the capabilities this will enable are worth the loss of this capability.

Do you have any recent examples of NMT canaries helping out "in the field"? I do feel a little uneasy about loosing this feature, assuming it still helps out in the field.

Thinking about this, I actually rely on this feature a lot more often:

A) Customer: "Look at my hs-err file"

B) Me: (see crashes in somewhere in libc, or something that looks like corrupted C-heap) "Switch on NMT please"

C) Customer: "ok" - does it, no change.

D) Now I know it is not a simple double free or overwrite of memory allocated by the hotspot or direct byte buffers. I can exclude, for now, us (the VM vendor) as a culprit and take a closer look at whatever third-party JNI libraries are running in the process. In the end, it may still turn out we broke something, but for now a misuse of C-heap from non-JDK code is more likely. So, NMT is a useful first-responder tool in these cases.

This is not that rare. And this feature will be a lot more useful once we have full-process - or at least, with Johan's plans - full JDK integration of NMT. Because then, the surface of instrumented code is larger.

If, at step (B), I would have asked the customer: "Can you please switch out the JVM against one I give you that has ASAN enabled", that runs into tons of problems: they may not be in a position to switch out binaries (maybe deployed out of reach somewhere), they may not be allowed to, ASAN build may be too slow, ASAN crashes may be unacceptable in production (and these crashes will be much more likely than NMT finding some problem).

For that reason, I propose to leave the footer canary as it is. I cannot see what purpose removing it serves (other than "oh it is simpler now"), and arguably it's the more important of the two canaries for catching end-of-buffer overruns. The leading canary let's shrink to one byte. Ideally placed right in front of the user payload.

This sounds like a good middle-ground.

@jdksjolen jdksjolen closed this Mar 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

hotspot-runtime hotspot-runtime-dev@openjdk.org ready Pull request is ready to be integrated rfr Pull request is ready for review

Development

Successfully merging this pull request may close these issues.

6 participants