
JDK-8293114: GC should trim the native heap #10085

Closed
wants to merge 41 commits

Conversation

tstuefe
Member

@tstuefe tstuefe commented Aug 30, 2022

(Updated 2023-07-05 to reflect the current state of the patch)

This RFE adds an option to auto-trim the glibc heap as part of the GC cycle. If the VM process has suffered high temporary malloc spikes (whether from JVM or user code), this can recover significant amounts of memory.

We discussed this a year ago [1], but the item got pushed to the bottom of my work pile, so it took longer than I expected.

Motivation

Glibc is more reluctant than other allocators to return memory to the OS. Temporary malloc spikes often carry over as permanent RSS increases. Note that C-heap retention is difficult to observe: since it is freed memory, it does not show up in NMT; it is just part of RSS.

This is, effectively, caching, and a deliberate performance tradeoff by glibc. It makes a lot of sense for applications that cause heavy traffic on the C-heap (the typical native application). The JVM, however, clusters allocations and, for many use cases, rolls its own memory management via mmap. An application's malloc load can fluctuate wildly, with temporary spikes and long idle periods.

To help, Glibc exports an API to trim the C-heap: malloc_trim(3). With JDK 18 [2], SAP contributed a new jcmd command to manually trim the C-heap on Linux. This RFE adds a complementary way to trim automatically.

Is this even a problem?

Yes.

The JVM clusters most native memory allocations and satisfies them with mmap. But enough C-heap allocations remain to cause malloc spikes that are subject to memory retention. One example is HotSpot's arenas themselves.

But I have seen many cases of high memory retention in glibc caused by third-party JNI code: libraries allocate large temporary buffers via malloc. In fact, since we introduced the jcmd "System.trim_native_heap" command, some of our customers have started calling it periodically from scripts to counter these issues.

How trimming works

Trimming is done via malloc_trim(3). malloc_trim iterates over all arenas and trims each one in turn. While doing so, it locks the arena, which may cause some (but not all) concurrent operations on the same arena to block. glibc also trims automatically on free, but only in limited circumstances (see #10085 (comment) for details).

malloc_trim offers almost no way to control its behavior; in particular, there is no way to limit its runtime. Its runtime depends on the amount of reclaimed memory: reclaiming nothing is very fast (sub-millisecond), while reclaiming very large memory sections (many GB) may take considerably longer.

When to trim?

We cannot use the ServiceThread to trim, since the time a trim takes is unknown. Therefore we need to do this fully concurrently in a thread of its own.

We trim at regular intervals, but pause trimming during "important" phases: STW GC phases, or when doing bulk heap operations. Note that "pausing" here means we delay the start of the next trim until after the pause. If a trim is already running, there is no way to stop it.

How it works:

The patch adds new options (experimental for now, and shared among all GCs):

-XX:+GCTrimNativeHeap
-XX:GCTrimNativeHeapInterval=<seconds> (defaults to 60)

GCTrimNativeHeap is off by default. If enabled, it causes the VM to trim the native heap on full GCs as well as periodically. The period is defined by GCTrimNativeHeapInterval.

Examples:

This is an artificial test that causes two high malloc spikes with long idle periods. Observe how RSS recovers with trim but stays up without it. The trim interval was set to 15 seconds for this test, and no GC was invoked; what you see is purely periodic trimming.

alloc-test

(See the run script for parameters.)

Spring Pet Clinic boots up, then idles. Once with and once without trim, with the trim interval at the 60-second default. Of course, if the application were actually doing something instead of idling, the trim effect would be smaller. But the point of trimming is to recover memory during idle periods.

petclinic bootup

(See the run script for parameters.)

Tests

Tested with an older glibc (2.31) and a newer glibc (2.35) (mallinfo() vs. mallinfo2()) on Linux x64.

The rest of the tests will be done by GHA and in our SAP nightlies.

Remarks

How about other allocators?

I have seen this retention problem mainly with glibc and the AIX libc. musl returns memory to the OS more eagerly. I also tested with jemalloc and found that it, too, reclaims more aggressively, so I don't think macOS or BSD are affected much by retention either.

Trim costs?

Trim-native is a tradeoff between memory and performance. We pay:

  • the cost of the trim itself, which depends on how much is trimmed. On my machine, times range from under 1 ms for no-op trims to ~800 ms for 32 GB trims;
  • the cost of re-acquiring the memory, should it be needed again.

glibc.malloc.trim_threshold?

glibc has a tunable, "trim_threshold", that looks like it should influence glibc's willingness to return memory to the OS. In practice, I could not get it to do anything useful.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8293114: GC should trim the native heap (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/10085/head:pull/10085
$ git checkout pull/10085

Update a local copy of the PR:
$ git checkout pull/10085
$ git pull https://git.openjdk.org/jdk.git pull/10085/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 10085

View PR using the GUI difftool:
$ git pr show -t 10085

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/10085.diff

Webrev

Link to Webrev Comment

@tstuefe tstuefe changed the title trim-native JDK-8293114: Let GC trim the native heap Aug 30, 2022
@bridgekeeper

bridgekeeper bot commented Aug 30, 2022

👋 Welcome back stuefe! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Aug 30, 2022

@tstuefe The following labels will be automatically applied to this pull request:

  • hotspot
  • serviceability
  • shenandoah

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added serviceability serviceability-dev@openjdk.org hotspot hotspot-dev@openjdk.org shenandoah shenandoah-dev@openjdk.org labels Aug 30, 2022
@tstuefe tstuefe changed the title JDK-8293114: Let GC trim the native heap JDK-8293114: GC should trim the native heap Aug 31, 2022
@tstuefe tstuefe marked this pull request as ready for review August 31, 2022 08:23
@tstuefe
Member Author

tstuefe commented Aug 31, 2022

/label remove shenandoah
/label remove serviceability

@openjdk openjdk bot added rfr Pull request is ready for review and removed shenandoah shenandoah-dev@openjdk.org labels Aug 31, 2022
@openjdk

openjdk bot commented Aug 31, 2022

@tstuefe
The shenandoah label was successfully removed.

@openjdk openjdk bot removed the serviceability serviceability-dev@openjdk.org label Aug 31, 2022
@openjdk

openjdk bot commented Aug 31, 2022

@tstuefe
The serviceability label was successfully removed.

@mlbridge

mlbridge bot commented Aug 31, 2022

Webrevs

@zhengyu123
Contributor

@tstuefe Nice work!

I also looked into memory usage recently and found that os::print_os_info() is very useful. I would like to propose adding a diagnostic flag, e.g. PrintProcessInfoAtExit, to capture some insights into the process, such as RSS, peak RSS, C-heap outstanding and C-heap retained.

Here are some numbers I captured

Benchmark            C-Heap Outstanding (K)  C-Heap Retained (K)
Compiler                             106387               271692
Sunflow                              103882               173929
Compress                              37794                11605
CryptoAes                             37579                30580
CryptoRsa                             38318                73001
CryptoSignVerify                      37790                38769
Derby                                 55389               186954
MpegAudio                             40280                71743
ScimarkFFT.large                      36698                 8945
ScimarkFFT.small                      37071                10824
ScimarkLU.large                       38018                 8529
ScimarkLU.small                       39198                 8585
ScimarkMonteCarlo                     36946                 6529
ScimarkSOR.large                      36648                 8787
ScimarkSOR.small                      36569                 9078
ScimarkSparse.large                   36745                 9182
ScimarkSparse.small                   37253                 8574
Serial                                38652               127231
Sunflow.test                          45450                56657
XmlTransform                          83116               289431
XmlValidation                         59883                59883
I captured the numbers at JVM exit, so they might not reflect runtime characteristics. But some numbers are quite interesting; e.g., Derby's retained set is about 3x its outstanding set.

@tstuefe
Member Author

tstuefe commented Sep 1, 2022

Hi Zhengyu,

@tstuefe Nice work!

Thanks :)

I also looked into memory usage recently and found that os::print_os_info() is very useful. I would like to propose adding a diagnostic flag, e.g. PrintProcessInfoAtExit, to capture some insights into the process, such as RSS, peak RSS, C-heap outstanding and C-heap retained.

A small problem with "C-Heap Retained" is that it also includes blocks that have either been trimmed already or never been (fully) paged in, which means the number may not relate 1:1 to RSS. glibc observability is really annoying.

About the "AtExit", in our SAP JVM I added a simple "DumpInfoAtExit", which prints out what "jcmd VM.info" would give you. Basically, a hs-err file without the err part. That contains a whole lot of information, including NMT and process memory information. That may be a better way than adding individual XXAtExit flags, what do you think?

Another thing I would like to do is to enable Peak usage numbers in NMT for release builds too. That way one can see at VM exit how much memory each subsystem used.

Here are some numbers I captured

[…]
I captured the numbers at JVM exit, so they might not reflect runtime characteristics. But some numbers are quite interesting; e.g., Derby's retained set is about 3x its outstanding set.

Interesting, since it means malloc peaks are more common than we think. All the more reason to have a feature like this to trim the C-heap.

Cheers, Thomas

@tstuefe
Member Author

tstuefe commented Sep 1, 2022

/label hotspot-gc

@openjdk openjdk bot added the hotspot-gc hotspot-gc-dev@openjdk.org label Sep 1, 2022
@openjdk

openjdk bot commented Sep 1, 2022

@tstuefe
The hotspot-gc label was successfully added.

@zhengyu123
Contributor


About the "AtExit", in our SAP JVM I added a simple "DumpInfoAtExit", which prints out what "jcmd VM.info" would give you. Basically, a hs-err file without the err part. That contains a whole lot of information, including NMT and process memory information. That may be a better way than adding individual XXAtExit flags, what do you think?

I am not sure, because it overlaps many other XXAtExit options?


@zhengyu123
Contributor

Another question: does it have to create another thread? Can it piggyback on, e.g., the ServiceThread?

@tstuefe
Member Author

tstuefe commented Sep 3, 2022

Another question: does it have to create another thread? Can it piggyback on, e.g., the ServiceThread?

It does so on Shenandoah, but the problem is the time trimming takes. As I wrote, it can take from under 1 ms up to almost a second, and you would block the ServiceThread for that time. So even on Shenandoah I am not sure I shouldn't use a different thread.

@zhengyu123
Contributor

Another question: does it have to create another thread? Can it piggyback on, e.g., the ServiceThread?

It does so on Shenandoah, but the problem is the time trimming takes. As I wrote, it can take from under 1 ms up to almost a second, and you would block the ServiceThread for that time. So even on Shenandoah I am not sure I shouldn't use a different thread.

Yeah, you probably should not use the ShenandoahControlThread; it may delay GCs from happening.

@openjdk openjdk bot added the merge-conflict Pull request has merge conflict with target branch label Jun 15, 2023
@tstuefe
Member Author

tstuefe commented Jun 23, 2023

Down Bot. Down.

@shipilev
Member

Do you need help moving this forward, @tstuefe?

@robehn
Contributor

robehn commented Jun 30, 2023

Do you need help moving this forward, @tstuefe?

Agreed, I think you should open a PR.

@tstuefe
Member Author

tstuefe commented Jun 30, 2023

Thanks @robehn and @shipilev !

This keeps getting pushed down the pile. I'll dust this off next week and post a PR.

@shipilev
Member

Okay, cool. The reason I am asking is that glibc "memory leaks" are not uncommon in production cases. There are quite a few libraries that churn native memory (looks at Netty), even with internal pooling. Having something that is backportable to 21u and 17u would be a plus.

@tstuefe
Member Author

tstuefe commented Jun 30, 2023

Okay, cool. The reason I am asking is that glibc "memory leaks" are not uncommon in production cases. There are quite a few libraries that churn native memory (looks at Netty), even with internal pooling. Having something that is backportable to 21u and 17u would be a plus.

Thanks, nice to see a confirmation that this is useful.

This patch started out simple, and then I fear I seriously over-engineered the GC part of it. I'll give it a look next week and see if I can simplify it.

@simonis
Member

simonis commented Jul 3, 2023

My main concern with this change is increased latency. You wrote "..concurrent malloc/frees are usually not blocked while trimming if they are satisfied from the local arena..". Not sure what "usually" means here and how many mallocs are satisfied from a local arena. But introducing pauses up to a second seems significant for some applications.

The other question is that I still don't understand if glibc-malloc will ever call malloc_trim() automatically (and in that case introduce the latency anyway). The manpage says that malloc_trim() "..is automatically called by free(3) in certain circumstances; see the discussion of M_TOP_PAD and M_TRIM_THRESHOLD in mallopt(3).." but you reported that you couldn't observe any cleanup effect when playing around with M_TRIM_THRESHOLD. In the end, calling malloc_trim() periodically might even help to decrease latency if this prevents seldom, but longer automatic invocations of malloc_trim() by glibc itself.

@robehn
Contributor

robehn commented Jul 3, 2023

My main concern with this change is increased latency. You wrote "..concurrent malloc/frees are usually not blocked while trimming if they are satisfied from the local arena..". Not sure what "usually" means here and how many mallocs are satisfied from a local arena. But introducing pauses up to a second seems significant for some applications.

The other question is that I still don't understand if glibc-malloc will ever call malloc_trim() automatically (and in that case introduce the latency anyway). The manpage says that malloc_trim() "..is automatically called by free(3) in certain circumstances; see the discussion of M_TOP_PAD and M_TRIM_THRESHOLD in mallopt(3).." but you reported that you couldn't observe any cleanup effect when playing around with M_TRIM_THRESHOLD. In the end, calling malloc_trim() periodically might even help to decrease latency if this prevents seldom, but longer automatic invocations of malloc_trim() by glibc itself.

The trim performed automatically on some free() calls only happens in the 'chunk' you were freeing in, while the explicit call visits all 'chunks'. @jdksjolen can explain this more deeply.

I share your concern.
But as this is opt-in, and for a certain set of workloads the benefits outweigh the risk of latency increases, I'm for this change.

@tstuefe
Member Author

tstuefe commented Jul 4, 2023

@simonis @robehn Thanks for thinking this through.

My main concern with this change is increased latency. You wrote "..concurrent malloc/frees are usually not blocked while trimming if they are satisfied from the local arena..". Not sure what "usually" means here and how many mallocs are satisfied from a local arena. But introducing pauses up to a second seems significant for some applications.

From looking at the sources (glibc 2.31), I see that malloc_trim iterates over all arenas and locks each arena while trimming it. I also see this lock being taken when:

  • creating, re-assigning arenas
  • on statistics (malloc_stats, mallinfo2, malloc_info)
  • on arena_get_retry(), which AFAICS "steals" from a neighboring arena when the current arena is full
  • on the free path, when adding chunks to the fast bin, but guarded by __builtin_expect(.., 0), so probably a very rare path
  • on realloc'ing non-mmapped chunks.

So, malloc_trim will inconvenience concurrent reallocs and, rarely, frees or allocations that cause arena stealing or new-arena allocation. I may have missed some cases, but it makes sense that glibc avoids locking as much as possible.

About the "up to a second" - this was measured on my machine with ~32GB of reclaimable memory. Having that much floating garbage in the C-heap would hopefully be rare.

The other question is that I still don't understand if glibc-malloc will ever call malloc_trim() automatically (and in that case introduce the latency anyway). The manpage says that malloc_trim() "..is automatically called by free(3) in certain circumstances; see the discussion of M_TOP_PAD and M_TRIM_THRESHOLD in mallopt(3).." but you reported that you couldn't observe any cleanup effect when playing around with M_TRIM_THRESHOLD. In the end, calling malloc_trim() periodically might even help to decrease latency if this prevents seldom, but longer automatic invocations of malloc_trim() by glibc itself.

From looking at the sources, glibc trims on free like this:

  • The returned chunk may go into the thread-local cache (tcache) or into the fastbin. In both cases, the chunk still counts as used and nothing happens.
  • Otherwise, the returned chunk gets merged with its immediate neighbors (once, not recursively). If the resulting size is larger than 64 KB, glibc trims, but only the arena that contains the chunk.

As you can see, trimming only happens sometimes. I experimented with mallocing, then freeing, 64 KB blocks 30000 times:

  1. Done from 10 threads, this leaves me with a remainder of 300-500 MB of unreclaimed RSS after everything is freed.
  2. Done from one thread, or with MALLOC_ARENA_MAX=1, most of the memory ends up being reclaimed.

Unfortunately, most C-heap allocations are a lot finer-grained than 64 KB.

Update

I see we also take the lock on the malloc path if we don't pull the chunk from the tcache.

@openjdk openjdk bot removed the merge-conflict Pull request has merge conflict with target branch label Jul 5, 2023
@shipilev
Member

shipilev commented Jul 5, 2023

So I see that malloc_trim actually takes a parameter, pad:

       The pad argument specifies the amount of free space to leave
       untrimmed at the top of the heap.  If this argument is 0, only
       the minimum amount of memory is maintained at the top of the heap
       (i.e., one page or less).  A nonzero argument can be used to
       maintain some trailing space at the top of the heap in order to
       allow future allocations to be made without having to extend the
       heap with sbrk(2).

Does current glibc honor that argument at all? Can we use it to control the incrementality of the trim? Since malloc_trim's return value also indicates whether trimming was successful, we could probably come up with an adaptive scheme that guesses which pad to use, so that we don't trim the entirety of the memory but still trim something.

@tstuefe
Member Author

tstuefe commented Jul 5, 2023

So I see that malloc_trim actually takes a parameter, pad. […] Does current glibc honor that argument at all? Can we use that to control the incrementality of the trim?

This was one of the first things I tried last year. It has very little impact. IIRC it only applies to the main arena (the one using sbrk) and limits the amount by which the break is lowered. I may be remembering the details wrong, but my tests also showed very little effect.

The bulk of memory reclamation comes from glibc MADV_DONTNEED'ing free chunks. I think it does that without any bookkeeping, so for subsequent trims it has no idea how much memory of that range is paged in. So I think there is no way to implement a limiting trim via the pad parameter.

I thought about it, and I think the only way to implement a limited trim would be to add a new API, since you cannot use the existing one without breaking compatibility. I have always meant to ask Florian about this; I will tomorrow.

In any case, this would only be a solution for future glibcs, not for the ones that are around.

@tstuefe tstuefe marked this pull request as ready for review July 5, 2023 17:21
@openjdk openjdk bot added the rfr Pull request is ready for review label Jul 5, 2023
@tstuefe
Member Author

tstuefe commented Jul 5, 2023

@robehn @zhengyu123 @shipilev @simonis
Thank you all for your support and input. I dusted off the patch and simplified it:

  • removed the adaptive step-down logic, since that was overly involved and in my tests did not work that well
  • removed the expedite-trim logic.
  • Pauses now stack.

So, in very few words, this patch

  • adds an optional thread that executes trims at periodic intervals
  • the trimmer can be temporarily paused.
  • I guarded sections that are vulnerable to concurrent trimming (GC STW phases) or that perform bulk C-heap operations (e.g. monitor bulk deletion, stringtable cleanups, arena cleanups, etc.) with pauses.

I'll do some more benchmarks over the next few days, but I honestly don't expect this to rise above background noise. If I have time, I will also simulate heavy C-heap activity to give the trim something to do.

Member

@shipilev shipilev left a comment

Cursory review follows.

Generally, does the issue synopsis reflect what is going on correctly? This is not about "GC should trim" anymore, as we have a perfectly separate thread for this. In fact, we very specifically do not trim during GC :)

A related question: do we want to use the gc, trim tags for this, or just trim?

@@ -2986,3 +2986,8 @@ bool os::supports_map_sync() {
}

void os::print_memory_mappings(char* addr, size_t bytes, outputStream* st) {}

// stubbed-out trim-native support
Member

I think these comments should be more succinct. Example: "Native heap trimming is not implemented yet." (this signals it can be implemented in the future)

int fordblks;
int keepcost;
};
typedef struct glibc_mallinfo (*mallinfo_func_t)(void);
Member

Should be os::Linux::glibc_mallinfo for consistency?

Comment on lines 5431 to 5440
out->arena = (int) mi.arena;
out->ordblks = (int) mi.ordblks;
out->smblks = (int) mi.smblks;
out->hblks = (int) mi.hblks;
out->hblkhd = (int) mi.hblkhd;
out->usmblks = (int) mi.usmblks;
out->fsmblks = (int) mi.fsmblks;
out->uordblks = (int) mi.uordblks;
out->fordblks = (int) mi.fordblks;
out->keepcost = (int) mi.keepcost;
Member

Style: please indent it so that = are in the same column?

#ifdef __GLIBC__
return true;
#else
return false; // musl
Member

Let's avoid comments like // musl -- it might mislead if we ever go for e.g. uClibc and friends?

//
// The mode is set as argument to GCTrimNative::initialize().

class NativeTrimmer : public ConcurrentGCThread {
Member

It is a bit ugly it pretends to be ConcurrentGCThread. Can it be just NamedThread?


class NativeTrimmerThread : public ConcurrentGCThread {

Monitor* _lock;
Member

const?

@@ -174,6 +174,7 @@ class outputStream;
f(full_gc_heapdump_post, " Post Heap Dump") \
\
f(conc_uncommit, "Concurrent Uncommit") \
f(conc_trim, "Concurrent Trim") \
Member

Looks like a leftover.

Comment on lines +99 to +101
STATIC_ASSERT(_num_pools == 4);
return !_pools[0].empty() || !_pools[1].empty() ||
!_pools[2].empty() || !_pools[3].empty();
Member

Suggested change
STATIC_ASSERT(_num_pools == 4);
return !_pools[0].empty() || !_pools[1].empty() ||
!_pools[2].empty() || !_pools[3].empty();
for (int i = 0; i < _num_pools; i++) {
  if (!_pools[i].empty()) return true;
}
return false;

static void clean() {
for (int i = 0; i < _num_pools; i++) {
_pools[i].prune();
ThreadCritical tc;
Member

Why do you need ThreadCritical here? I would have thought PauseMark handles everything, right?

Member Author

Unrelated to the pause. I introduced an empty check on all pools, but that has to happen under lock protection too. So I moved the four ThreadCriticals up from the prune functions into this function.

The point is to avoid pausing if nothing is to be done, which is most of the time. Also, instead of four ThreadCritical acquisitions, just one.

// Execute the native trim, log results.
void execute_trim_and_log() const {
assert(os::can_trim_native_heap(), "Unexpected");
const int64_t tnow = now();
Member

This line looks redundant.

@shipilev
Member

shipilev commented Jul 5, 2023

Maybe it would be cleaner to close this PR and open a new one, now that the feature has taken another turn.
I just realized my last review included comments I had not committed as a review since last year! Oops.

@tstuefe
Member Author

tstuefe commented Jul 6, 2023

Closing this PR in favour of a new, slightly renamed one, as Aleksey suggested.

@xtf2009

xtf2009 commented Jan 9, 2024

Any chance this feature can be backported to JDK 11 and 17?

@tstuefe
Member Author

tstuefe commented Jan 9, 2024

Any chance this feature can be backported to JDK 11 and 17?

This feature (in a different PR, see #14781) has already been backported to JDK 17. For 11 it is possible, but it's not a priority and would need to be negotiated with the maintainers.
