JDK-8268893: jcmd to trim the glibc heap #4510

Closed
wants to merge 7 commits into from

tstuefe
Member

@tstuefe tstuefe commented Jun 16, 2021

Proposal to add a Linux+glibc-only jcmd to manually induce malloc_trim(3) in the VM process.

glibc is somewhat notorious for retaining released C-heap memory: calling free(3) returns memory to the C library, and most libc variants return at least a portion of it to the operating system, but glibc often does not.

This depends on the granularity of the allocations and a number of other factors, but we found that many small allocations in particular can bloat the process heap segment (and hence RSS). As a result, the VM may not recover from C-heap usage spikes.
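The small-allocation spike described above can be reproduced outside the VM. What follows is a minimal sketch, not code from this PR. It allocates many small blocks, frees them all, and reads RSS from the `VmRSS:` line of `/proc/self/status`. The helper names `rss_kib` and `small_allocation_spike` are hypothetical. How much memory glibc retains afterwards depends on the glibc version, the number of arenas and the allocation pattern, so the sketch only makes the effect observable; it does not guarantee it.

```cpp
#include <cassert>
#include <cstdlib>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Returns the resident set size of the calling process in KiB, parsed from
// the "VmRSS:" line of /proc/self/status (Linux only). Returns -1 on failure.
long rss_kib() {
  std::ifstream status("/proc/self/status");
  std::string line;
  while (std::getline(status, line)) {
    if (line.rfind("VmRSS:", 0) == 0) {
      std::istringstream fields(line.substr(6));
      long kib = -1;
      fields >> kib;  // skips the tab/space padding before the number
      return kib;
    }
  }
  return -1;
}

// Simulates a short-lived C-heap spike: many small malloc() calls that are
// all freed again right away. Nothing leaks, but glibc may keep the freed
// chunks cached in its arenas instead of returning them to the kernel.
void small_allocation_spike(std::size_t count, std::size_t size) {
  std::vector<void*> blocks;
  blocks.reserve(count);
  for (std::size_t i = 0; i < count; ++i) {
    void* p = std::malloc(size);
    if (p == nullptr) break;
    static_cast<char*>(p)[0] = 1;  // touch the block so its page becomes resident
    blocks.push_back(p);
  }
  for (void* p : blocks) {
    std::free(p);
  }
}
```

To see the effect, compare `rss_kib()` before and after a call such as `small_allocation_spike(1000000, 64)`. If RSS stays elevated after the frees, glibc is holding on to the memory.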

glibc offers an API, malloc_trim(3), which asks glibc to return freed memory to the operating system.
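The call itself is a single function. Here is a minimal, hedged sketch of a guarded wrapper; the wrapper name `trim_glibc_heap` is hypothetical. malloc_trim(3) is declared in glibc's `<malloc.h>`. It takes a `pad` argument, the number of bytes of free space to leave untrimmed at the top of the heap, and returns 1 if memory was actually released, 0 otherwise. The `__GLIBC__` guard keeps the sketch compiling on other C libraries, where it does nothing.

```cpp
#include <cassert>
#include <cstdlib>  // on glibc this pulls in <features.h>, which defines __GLIBC__
#if defined(__GLIBC__)
#include <malloc.h>  // glibc-specific: declares malloc_trim()
#endif

// Asks glibc to give free heap memory back to the operating system.
// `pad` is the amount of free space (in bytes) to keep at the top of the
// main heap; 0 trims as much as possible. Returns true if glibc reports
// that some memory was released. Without glibc this is a no-op.
bool trim_glibc_heap(std::size_t pad = 0) {
#if defined(__GLIBC__)
  return malloc_trim(pad) == 1;
#else
  (void)pad;
  return false;
#endif
}
```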

Trimming may cost performance, however, so I hesitate to call malloc_trim automatically. That may be an idea for another day.
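To get a feel for the direct cost of a trim, one can time a single call. The sketch below is illustrative only and `time_malloc_trim_us` is a hypothetical helper. It measures only the call itself. It does not capture any follow-on cost, such as page faults when the released memory is allocated and touched again later. Results vary with heap size and the number of glibc arenas.

```cpp
#include <cassert>
#include <chrono>
#include <cstdlib>  // defines __GLIBC__ on glibc via <features.h>
#if defined(__GLIBC__)
#include <malloc.h>
#endif

// Measures how long one malloc_trim(0) call takes, in microseconds.
// Returns -1 when not built against glibc.
long long time_malloc_trim_us() {
#if defined(__GLIBC__)
  const auto start = std::chrono::steady_clock::now();
  malloc_trim(0);
  const auto end = std::chrono::steady_clock::now();
  return std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
#else
  return -1;
#endif
}
```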

Instead of an automatic trim, I propose adding a jcmd that lets the user trigger a libc heap trim manually. Such a command would serve two purposes:

  • when analyzing cases of high memory footprint, it helps distinguish "real" footprint, e.g. leaks, from cases where glibc just holds on to freed memory
  • as a stopgap measure, it can relieve memory pressure in a high-footprint scenario.

Note that this command also helps with analyzing C-heap peaks that have nothing to do with the VM, e.g. peaks created by customer code that just happens to share the process with the VM. Such memory need not show up in NMT at all, since NMT only tracks allocations made by the VM itself.

I propose to introduce this command for Linux only. Other OSes (except perhaps AIX) do not seem to have this problem, but Linux is arguably important enough on its own to justify a Linux-specific jcmd.
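Strictly speaking, the command is glibc-specific rather than Linux-specific. Linux distributions built on musl (e.g. Alpine) do not define `__GLIBC__`, so a glibc-only command would not apply there. A small, hedged sketch of how a build can tell the two apart follows; `libc_description` is a hypothetical helper. `gnu_get_libc_version()` from `<gnu/libc-version.h>` reports the glibc actually loaded at runtime, which may be newer than the headers the code was compiled against.

```cpp
#include <cassert>
#include <cstdlib>  // any glibc header defines __GLIBC__ via <features.h>
#include <string>
#if defined(__GLIBC__)
#include <gnu/libc-version.h>  // gnu_get_libc_version()
#endif

// Reports which C library the process runs on.
std::string libc_description() {
#if defined(__GLIBC__)
  // Compile-time version from the headers, runtime version from the loaded libc.
  return "glibc " + std::to_string(__GLIBC__) + "." + std::to_string(__GLIBC_MINOR__) +
         " (runtime " + gnu_get_libc_version() + ")";
#else
  return "not glibc";
#endif
}
```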

CSR for this command: https://bugs.openjdk.java.net/browse/JDK-8269345

An alternative to a Linux-only jcmd would be a command that trims the C heap on all platforms, with per-platform implementations filled in later.
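A rough sketch of that all-platforms alternative follows: one entry point whose body is selected per platform at compile time. `TrimResult`, `trim_c_heap` and `describe` are hypothetical names, not HotSpot code. Only the glibc branch does real work here; every other platform reports that trimming is unsupported until someone fills in an implementation.

```cpp
#include <cassert>
#include <cstdlib>  // defines __GLIBC__ on glibc via <features.h>
#if defined(__linux__) && defined(__GLIBC__)
#include <malloc.h>
#endif

// Outcome of a platform-neutral "trim the C heap" request.
enum class TrimResult { Trimmed, NothingToTrim, Unsupported };

// Single entry point for all platforms; each platform fills in what it can.
TrimResult trim_c_heap() {
#if defined(__linux__) && defined(__GLIBC__)
  return malloc_trim(0) == 1 ? TrimResult::Trimmed : TrimResult::NothingToTrim;
#else
  return TrimResult::Unsupported;  // placeholder until a platform implementation exists
#endif
}

// Human-readable text for the command's output.
const char* describe(TrimResult r) {
  switch (r) {
    case TrimResult::Trimmed:       return "trimmed";
    case TrimResult::NothingToTrim: return "nothing to trim";
    case TrimResult::Unsupported:   return "not supported on this platform";
  }
  return "unknown";
}
```

The trade-off discussed later in this thread is visible here. On most platforms this command would report "not supported", and on glibc the user still needs to know that it is really malloc_trim underneath.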

=========

This patch:

  • introduces a new jcmd, "VM.trim_libc_heap", which takes no arguments and trims the glibc heap on glibc platforms.
  • includes a (rather basic) test
  • the command calls malloc_trim(3) and additionally prints its effect (changes in virtual size, RSS and swap usage)
  • refactors some code in os_linux.cpp to factor out the scanning of /proc/self/status for kernel memory information.
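For readers unfamiliar with `/proc/self/status`: it is a text file of `Key:  value kB` lines. The next example is a self-contained sketch of reading the three counters the command reports. It is not the refactored os_linux.cpp code, whose names and structure are not shown here, and `MemInfo` and `read_proc_self_status` are hypothetical. `VmSize`, `VmRSS` and `VmSwap` are the kernel's field names; all values are in KiB.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Subset of the kernel's per-process memory counters, all in KiB.
// -1 means the field was not found (e.g. VmSwap is missing on old kernels).
struct MemInfo {
  long vm_size_kib = -1;  // "VmSize:" virtual size
  long vm_rss_kib  = -1;  // "VmRSS:"  resident set size
  long vm_swap_kib = -1;  // "VmSwap:" swapped-out anonymous memory
};

// Parses "Key:   <value> kB" lines from /proc/self/status (Linux only).
// Returns true if at least the virtual size and RSS were found.
bool read_proc_self_status(MemInfo& info) {
  std::ifstream status("/proc/self/status");
  if (!status) return false;
  std::string line;
  while (std::getline(status, line)) {
    std::istringstream fields(line);
    std::string key;
    long value = -1;
    if (!(fields >> key >> value)) continue;  // skip lines without a numeric value
    if (key == "VmSize:")      info.vm_size_kib = value;
    else if (key == "VmRSS:")  info.vm_rss_kib  = value;
    else if (key == "VmSwap:") info.vm_swap_kib = value;
  }
  return info.vm_size_kib >= 0 && info.vm_rss_kib >= 0;
}
```

Taking one `MemInfo` snapshot before the trim and another after it gives the before/after numbers the command prints.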

=========

Example:

A program causes a temporary peak in C-heap usage (in this case, triggered via Unsafe.allocateMemory) and frees the memory again right away, so it does not leak. The RSS peak was ~8G, even though the user allocation was much smaller (glibc has a lot of overhead). The effects of this peak linger even after the memory has been returned to glibc:

thomas@starfish:~$ jjjcmd AllocCHeap VM.info | grep Resident
Resident Set Size: 8685896K (peak: 8685896K) (anon: 8648680K, file: 37216K, shmem: 0K)
                   ^^^^^^^^

We execute the new trim command via jcmd:

thomas@starfish:~$ jjjcmd AllocCHeap VM.trim_libc_heap
18770:
Attempting trim...
Done.
Virtual size before: 28849744k, after: 28849724k, (-20k)
RSS before: 8685896k, after: 920740k, (-7765156k)  <<<<
Swap before: 0k, after: 0k, (0k)

It prints the reduction in virtual size, RSS and swap. The virtual size barely changed (-20k), since glibc unmapped (almost) nothing. However, glibc released most of the free memory inside the process heap, resulting in a large drop in RSS (8.5G->900M) and freeing >7G of memory:
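Each report line above pairs a before value, an after value and their difference. The helper below is hypothetical and only mimics the visible output format; the patch's actual format string is not shown in this thread. The delta is computed as `after - before`, so a shrink prints as a negative number.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Formats one "before/after/delta" line in the style of the output above,
// e.g. "RSS before: 8685896k, after: 920740k, (-7765156k)". Values in KiB.
std::string format_delta(const char* label, long before_kib, long after_kib) {
  char buf[128];
  std::snprintf(buf, sizeof(buf), "%s before: %ldk, after: %ldk, (%ldk)",
                label, before_kib, after_kib, after_kib - before_kib);
  return buf;
}
```

Feeding in the numbers from the example reproduces the three report lines exactly, e.g. `format_delta("RSS", 8685896, 920740)`.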

thomas@starfish:~$ jjjcmd AllocCHeap VM.info | grep Resident
Resident Set Size: 920740K (peak: 8686004K) (anon: 883460K, file: 37280K, shmem: 0K)
                   ^^^^^^^

When the VM is started with -Xlog:os, this is also logged:

[139,068s][info][os] malloc_trim:
[139,068s][info][os] Virtual size before: 28849744k, after: 28849724k, (-20k)
RSS before: 8685896k, after: 920740k, (-7765156k)
Swap before: 0k, after: 0k, (0k)

Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/4510/head:pull/4510
$ git checkout pull/4510

Update a local copy of the PR:
$ git checkout pull/4510
$ git pull https://git.openjdk.java.net/jdk pull/4510/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 4510

View PR using the GUI difftool:
$ git pr show -t 4510

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/4510.diff

@bridgekeeper

bridgekeeper bot commented Jun 16, 2021

👋 Welcome back stuefe! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Jun 16, 2021

@tstuefe The following labels will be automatically applied to this pull request:

  • hotspot-runtime
  • serviceability

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added serviceability serviceability-dev@openjdk.org hotspot-runtime hotspot-runtime-dev@openjdk.org labels Jun 16, 2021
@tstuefe tstuefe changed the title from "JDK-8268893: jcmd command to trim the glibc heap" to "JDK-8268893: jcmd to trim the glibc heap" Jun 17, 2021
@tstuefe tstuefe marked this pull request as ready for review June 17, 2021 06:36
@openjdk openjdk bot added the rfr Pull request is ready for review label Jun 17, 2021
@mlbridge

mlbridge bot commented Jun 17, 2021

Webrevs

@mlbridge

mlbridge bot commented Jun 21, 2021

Mailing list message from David Holmes on hotspot-runtime-dev:

Hi Thomas,

On 17/06/2021 4:41 pm, Thomas Stuefe wrote:

The glibc is somewhat notorious for retaining released C Heap memory: calling free(3) returns memory to the glibc, and most libc variants will return at least a portion of it back to the Operating System, but the glibc often does not.

This depends on the granularity of the allocations and a number of other factors, but we found that many small allocations in particular may cause the process heap segment (hence RSS) to get bloaty. This can cause the VM to not recover from C-heap usage spikes.

The glibc offers an API, "malloc_trim", which can be used to cause the glibc to return free'd memory back to the Operating System.

This may cost performance, however, and therefore I hesitate to call malloc_trim automatically. That may be an idea for another day.

Instead of an automatic trim I propose to add a jcmd which allows to manually trigger a libc heap trim. Such a command would have two purposes:
- when analyzing cases of high memory footprint, it allows to distinguish "real" footprint, e.g. leaks, from a cases where the glibc just holds on to memory
- as a stop gap measure it allows to release pressure from a high footprint scenario.

Note that this command also helps with analyzing libc peaks which had nothing to do with the VM - e.g. peaks created by customer code which just happens to share the same process as the VM. Such memory does not even have to show up in NMT.

I propose to introduce this command for Linux only. Other OSes (apart maybe AIX) do not seem to have this problem, but Linux is arguably important enough in itself to justify a Linux specific jcmd.

Is it perhaps worthwhile trying to generalize this to a jcmd to request
an attempt to release system resources and then each platform can do
whatever may be available to assist in that - including doing nothing,
or in this case trimming the glibc heap ?

Thanks,
David
-----

@tstuefe
Member Author

tstuefe commented Jun 21, 2021

Hi David,

thanks for looking into this!

Is it perhaps worthwhile trying to generalize this to a jcmd to request
an attempt to release system resources and then each platform can do
whatever may be available to assist in that - including doing nothing,
or in this case trimming the glibc heap ?

I thought about this too. It would certainly make the jcmd look nicer.

But I think this is one of the cases where whoever does the support work needs to know what the command does under the hood, so nothing is gained by hiding the platform-specific details behind a generic command. You need to know what happens to gauge its impact on the system (e.g. performance loss from dropping caches) and to analyze the precise after-effects.

I am undecided on this. I will ask around offline.

..Thomas

Member

@simonis simonis left a comment

I like this little new diagnostic command which I think can be quite useful in specific situations.

However, in contrast to other reviewers, I'd rather keep this simple and Glibc specific instead of extending it to a more general but mostly empty command.

I'd therefore propose to rename this command to glibc_trim_heap to make it evident from the command name already that it is Glibc-specific.

Besides that, just cosmetic changes and suggestions.

src/hotspot/os/linux/os_linux.cpp (outdated, resolved)
src/hotspot/os/linux/trimCHeapDCmd.cpp (resolved)
src/hotspot/share/services/diagnosticCommand.cpp (outdated, resolved)
src/hotspot/os/linux/trimCHeapDCmd.cpp (resolved)
@tstuefe
Member Author

tstuefe commented Jun 21, 2021

I like this little new diagnostic command which I think can be quite useful in specific situations.

Thanks a lot Volker!

However, in contrast to other reviewers, I'd rather keep this simple and Glibc specific instead of extending it to a more general but mostly empty command.

Yes, that was my thought too. Let's wait for @dholmes-ora to chime in, whether we can all agree on a glibc-specific variant. I also preferred that one.

I'd therefore propose to rename this command to glibc_trim_heap to make it evident from the command name already that it is Glibc-specific.

Besides that, just cosmetic changes and suggestions.

All of which make sense, I'll work them in.

Thanks!

..Thomas

@mlbridge

mlbridge bot commented Jun 21, 2021

Mailing list message from David Holmes on hotspot-runtime-dev:

On 21/06/2021 7:58 pm, Thomas Stuefe wrote:

On Mon, 21 Jun 2021 09:15:05 GMT, Volker Simonis <simonis at openjdk.org> wrote:

I like this little new diagnostic command which I think can be quite useful in specific situations.

Thanks a lot Volker!

However, in contrast to other reviewers, I'd rather keep this simple and Glibc specific instead of extending it to a more general but mostly empty command.

Yes, that was my thought too. Lets wait for @dholmes-ora to chime in, whether we can all agree on a glibc specific variant. I also preferred that one.

I don't totally oppose the specialised variant, but it certainly isn't
my ideal solution. Not sure if we already have platform specific dcmds?
This will need a CSR request anyway.

Cheers,
David

@tstuefe
Member Author

tstuefe commented Jun 21, 2021

Mailing list message from David Holmes on hotspot-runtime-dev:

On 21/06/2021 7:58 pm, Thomas Stuefe wrote:

On Mon, 21 Jun 2021 09:15:05 GMT, Volker Simonis wrote:

I like this little new diagnostic command which I think can be quite useful in specific situations.

Thanks a lot Volker!

However, in contrast to other reviewers, I'd rather keep this simple and Glibc specific instead of extending it to a more general but mostly empty command.

Yes, that was my thought too. Lets wait for @dholmes-ora to chime in, whether we can all agree on a glibc specific variant. I also preferred that one.

I don't totally oppose the specialised variant, but it certainly isn't
my ideal solution. Not sure if we already have platform specific dcmds?
This will need a CSR request anyway.

Cheers,
David

I'll prepare the CSR and bring it to review.

Cheers, Thomas

@simonis
Member

simonis commented Jun 21, 2021

Mailing list message from David Holmes on hotspot-runtime-dev:

On 21/06/2021 7:58 pm, Thomas Stuefe wrote:

On Mon, 21 Jun 2021 09:15:05 GMT, Volker Simonis wrote:

I like this little new diagnostic command which I think can be quite useful in specific situations.

Thanks a lot Volker!

However, in contrast to other reviewers, I'd rather keep this simple and Glibc specific instead of extending it to a more general but mostly empty command.

Yes, that was my thought too. Lets wait for @dholmes-ora to chime in, whether we can all agree on a glibc specific variant. I also preferred that one.

I don't totally oppose the specialised variant, but it certainly isn't
my ideal solution. Not sure if we already have platform specific dcmds?

Yes, we have. E.g.:

#ifdef LINUX
class PerfMapDCmd : public DCmd {
public:
  PerfMapDCmd(outputStream* output, bool heap) : DCmd(output, heap) {}
  static const char* name() {
    return "Compiler.perfmap";
  }
  static const char* description() {
    return "Write map file for Linux perf tool.";
  }
...

This will need a CSR request anyway.

Cheers,
David
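The existing `#ifdef LINUX` wrapper around `PerfMapDCmd` shows the pattern for a platform-specific command: the class is simply not compiled on other platforms. The sketch below imitates that pattern in standalone form. `DiagnosticCommand` and `TrimGlibcHeapCommand` are hypothetical stand-ins. The real HotSpot `DCmd` base class takes an `outputStream*` and has more machinery (argument parsing, permissions, registration), none of which is modeled here. The command name is the one originally proposed in this PR.

```cpp
#include <cassert>
#include <cstdlib>  // defines __GLIBC__ on glibc via <features.h>
#include <ostream>
#include <sstream>
#if defined(__GLIBC__)
#include <malloc.h>
#endif

// Hypothetical stand-in for HotSpot's DCmd base class.
class DiagnosticCommand {
 public:
  explicit DiagnosticCommand(std::ostream& out) : _output(out) {}
  virtual ~DiagnosticCommand() = default;
  virtual void execute() = 0;
 protected:
  std::ostream& _output;
};

#if defined(__GLIBC__)
// Compiled only on glibc builds, analogous to PerfMapDCmd being wrapped in
// #ifdef LINUX: on other platforms the command does not exist at all.
class TrimGlibcHeapCommand : public DiagnosticCommand {
 public:
  using DiagnosticCommand::DiagnosticCommand;
  static const char* name() { return "VM.trim_libc_heap"; }  // name as originally proposed
  static const char* description() { return "Ask glibc to return free heap memory to the OS."; }
  void execute() override {
    _output << "Attempting trim...\n";
    const int released = malloc_trim(0);
    _output << (released == 1 ? "Done (memory released).\n" : "Done (nothing released).\n");
  }
};
#endif
```

Writing to a `std::ostringstream` instead of a real output stream makes the command easy to exercise in isolation.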

}
_output->print_raw(ss_report.base());
log_info(os)("malloc_trim: ");
log_info(os)("%s", ss_report.base());
Member

Might also make sense to put this into a single log line like log_info(os)("malloc_trim:\n%s", ss_report.base()); to get something like:

[139,068s][info][os] malloc_trim:
Virtual size before: 28849744k, after: 28849724k, (-20k)
RSS before: 8685896k, after: 920740k, (-7765156k)
Swap before: 0k, after: 0k, (0k)

instead of:

[139,068s][info][os] malloc_trim:
[139,068s][info][os] Virtual size before: 28849744k, after: 28849724k, (-20k)
RSS before: 8685896k, after: 920740k, (-7765156k)
Swap before: 0k, after: 0k, (0k)
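The log excerpts above suggest that the decorations (`[uptime][level][tags]`) are printed once per log call, not once per embedded newline. That is why two calls produce two decorated lines, while a single call with `\n` in the message decorates only the first line. The toy logger below models only that behavior. It is not HotSpot's unified logging API, and `log_call` is a hypothetical name.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Toy logger: prepends the decorations once per call, regardless of how many
// newlines the message contains. The decoration string is passed in here.
void log_call(std::ostream& out, const std::string& decorations, const std::string& msg) {
  out << decorations << ' ' << msg << '\n';
}
```

Calling `log_call` twice yields the "instead of" variant above. One call with `"malloc_trim:\n" + report` yields the suggested single-entry variant.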

@tstuefe
Member Author

tstuefe commented Jun 25, 2021

Hi,

I created the CSR for this command: https://bugs.openjdk.java.net/browse/JDK-8269345

When you have time, please take a look. Thank you.

..Thomas

Member

@simonis simonis left a comment

Thanks for the cleanups. Looks good to me now.

@openjdk

openjdk bot commented Jun 25, 2021

@tstuefe This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8268893: jcmd to trim the glibc heap

Reviewed-by: simonis, dholmes

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 92 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jun 25, 2021
@jerboaa
Contributor

jerboaa commented Jun 25, 2021

/csr

@openjdk openjdk bot added the csr Pull request needs approved CSR before integration label Jun 25, 2021
@openjdk

openjdk bot commented Jun 25, 2021

@jerboaa this pull request will not be integrated until the CSR request JDK-8269345 for issue JDK-8268893 has been approved.

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Jun 25, 2021
@tstuefe
Member Author

tstuefe commented Jul 9, 2021

I renamed the command as agreed upon in the CSR discussion.

@tstuefe
Member Author

tstuefe commented Jul 14, 2021

Gentle ping. Need a second review. @dholmes-ora ?

Member

@dholmes-ora dholmes-ora left a comment

Hi Thomas,

This seems okay to me.

Thanks,
David

@tstuefe
Member Author

tstuefe commented Jul 14, 2021

Hi Thomas,

This seems okay to me.

Thanks,
David

Thank you David and Volker!

@tstuefe
Member Author

tstuefe commented Jul 14, 2021

/integrate

@openjdk

openjdk bot commented Jul 14, 2021

@tstuefe This PR has not yet been marked as ready for integration.

@jerboaa
Contributor

jerboaa commented Jul 14, 2021

The CSR https://bugs.openjdk.java.net/browse/JDK-8269345 is in Provisional. Once it's approved it should become ready for integration.

@tstuefe
Member Author

tstuefe commented Jul 14, 2021

The CSR https://bugs.openjdk.java.net/browse/JDK-8269345 is in Provisional. Once it's approved it should become ready for integration.

Oh, you are right. Good that Skara caught that.

@tstuefe
Member Author

tstuefe commented Jul 21, 2021

Not sure what to do here honestly.

The CSR seems stuck in "provisional" without any indication if any action is required from my side. This PR has gathered enough reviews meanwhile, but I cannot integrate it without the CSR moving forward.

Someone from Oracle, please advise?

@simonis
Member

simonis commented Jul 21, 2021

You have to move the CSR to "Proposed" or "Finalized" otherwise it probably won't appear on Joe's work list.

I've moved it to "Finalized" now because I think all the issues have been resolved. Please feel free to change the state again if you think that's not appropriate.

@tstuefe
Member Author

tstuefe commented Jul 21, 2021

You have to move the CSR to "Proposed" or "Finalized" otherwise it probably won't appear on Joe's work list.

I've moved it to "Finalized" now because I think all the issues have been resolved. Please feel free to change the state again if you think that's not appropriate.

Thanks a lot Volker!

@openjdk openjdk bot added ready Pull request is ready to be integrated and removed csr Pull request needs approved CSR before integration labels Jul 22, 2021
@tstuefe
Member Author

tstuefe commented Jul 22, 2021

/integrate

@openjdk

openjdk bot commented Jul 22, 2021

Going to push as commit 6096dd9.
Since your change was applied there have been 92 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot closed this Jul 22, 2021
@openjdk openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jul 22, 2021
@openjdk

openjdk bot commented Jul 22, 2021

@tstuefe Pushed as commit 6096dd9.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@tstuefe tstuefe deleted the jcmd-for-malloc-trim branch August 23, 2021 12:30
Labels
hotspot-runtime hotspot-runtime-dev@openjdk.org integrated Pull request has been integrated serviceability serviceability-dev@openjdk.org
4 participants