Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JDK-8297958: NMT: Display peak values #11497

Closed

Conversation

tstuefe
Copy link
Member

@tstuefe tstuefe commented Dec 3, 2022

Historical peak values can be very useful in analyzing memory footprint.

For example, want to know how much memory the compiler needs during warmup? You have to get an NMT report at the exact time, with compile arenas at their largest combined extensions. But if we had peak values, you'd see how much the compiler needed by just looking at its arena peak size.

We already collect peak values in debug builds, but never actually display them. Since we already pay for them, we might just as well print them.

There is also a small issue that peak size and count are updated separately. It makes not much sense to treat these as independent values. Therefore this patch changes the meaning and implementation of peak count from today's "highest count" to "count at the point peak size was reached".

How this looks like:

-                        GC (reserved=425748KB, committed=94868KB)
                            (malloc=38552KB #1086) (peak=38622KB #1409)   <<<<
                            (mmap: reserved=387196KB, committed=56316KB) 
 
-                 GCCardSet (reserved=29KB, committed=29KB)
                            (malloc=29KB #387) 
 
-                  Compiler (reserved=200KB, committed=200KB)
                            (malloc=36KB #59) (peak=37KB #74)    <<<<
                            (arena=165KB #5) (peak=6192KB #18)  <<<<
[0x00007f7e6b0866b0] Arena::grow(unsigned long, AllocFailStrategy::AllocFailEnum)+0x40
[0x00007f7e6c117a29] OopMap::OopMap(int, int)+0x69
[0x00007f7e6b2857a5] LinearScan::compute_oop_map(IntervalWalker*, LIR_Op*, CodeEmitInfo*, bool)+0x85
[0x00007f7e6b285f2a] LinearScan::compute_oop_map(IntervalWalker*, LIR_OpVisitState const&, LIR_Op*)+0x5a
                             (malloc=32KB type=Arena Chunk #1) (peak=288KB #9) <<<<

Notes:

  • This RFE just adds peak to the debug VM. In debug, we already have all values, its just a matter of printing them. I would love to see peak values in release builds too, but collecting them does cost one or more CAS per malloc and therefore we must analyze performance before enabling. Having them in release would also remove the remaining #ifdef ASSERT.

  • I omit printing peak values when we are at peak. So if "peak" is missing, current peak is implied.

  • I only print peak values in summary and detail mode, not in any of the diff modes, to keep code complexity low and because diff modes are more about baseline compare.

  • In detail mode, there is a small display issue that call sites will be omitted that have no current allocations. A hypothetical call site that allocated a zillion byte, then freed them all, will not be shown even though its peak value would be interesting. That is an issue for another RFE.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/11497/head:pull/11497
$ git checkout pull/11497

Update a local copy of the PR:
$ git checkout pull/11497
$ git pull https://git.openjdk.org/jdk pull/11497/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 11497

View PR using the GUI difftool:
$ git pr show -t 11497

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/11497.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Dec 3, 2022

👋 Welcome back stuefe! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Dec 3, 2022

@tstuefe The following label will be automatically applied to this pull request:

  • hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-runtime hotspot-runtime-dev@openjdk.org label Dec 3, 2022
@tstuefe tstuefe force-pushed the JDK-8297958-NMT-show-peak-values branch from 19d57bb to 40a4dfd Compare December 3, 2022 11:58
@tstuefe tstuefe marked this pull request as ready for review December 3, 2022 12:00
@openjdk openjdk bot added the rfr Pull request is ready for review label Dec 3, 2022
@mlbridge
Copy link

mlbridge bot commented Dec 3, 2022

Webrevs

@tstuefe tstuefe mentioned this pull request Dec 5, 2022
3 tasks
@ashu-mehra
Copy link
Contributor

Important addition to the memory stats in my opinion. I agree having it in release build would be great.

I omit printing peak values when we are at peak. So if "peak" is missing, current peak is implied.

It can be confusing for others not aware of this behavior, and may think that the "peak" value is missing in the output. I believe it would be less confusing if "peak" is "always" displayed.

@tstuefe
Copy link
Member Author

tstuefe commented Dec 6, 2022

Important addition to the memory stats in my opinion. I agree having it in release build would be great.

I omit printing peak values when we are at peak. So if "peak" is missing, current peak is implied.

It can be confusing for others not aware of this behavior, and may think that the "peak" value is missing in the output. I believe it would be less confusing if "peak" is "always" displayed.

Thank you, @ashu-mehra!

The first version had "(peak)" written after every line if at current peak. I found the output cluttered, but let's see what others say. I'm fine with reintroducing it again.

Cheers, Thomas

@tstuefe
Copy link
Member Author

tstuefe commented Dec 6, 2022

Example output when all peak values are indicated with "(peak)":

-                  Internal (reserved=1032KB, committed=1032KB)
                            (malloc=996KB #27910) (peak) 
                            (mmap: reserved=36KB, committed=36KB) 
 
-                     Other (reserved=2KB, committed=2KB)
                            (malloc=2KB #1) (peak=524290KB #67108865) 
 
-                    Symbol (reserved=1416KB, committed=1416KB)
                            (malloc=1056KB #10614) (peak) 
                            (arena=360KB #1) (peak)
 
-    Native Memory Tracking (reserved=1253KB, committed=1253KB)
                            (malloc=10KB #91) (peak=10KB #95) 
                            (tracking overhead=1244KB)
 
-        Shared class space (reserved=16384KB, committed=13280KB)
                            (mmap: reserved=16384KB, committed=13280KB) 
 
-               Arena Chunk (reserved=185KB, committed=185KB)
                            (malloc=185KB #93) (peak=6156KB #232) 
 
-                    Module (reserved=67KB, committed=67KB)
                            (malloc=67KB #1233) (peak) 
 
-                 Safepoint (reserved=8KB, committed=8KB)
                            (mmap: reserved=8KB, committed=8KB) 
 
-           Synchronization (reserved=223KB, committed=223KB)
                            (malloc=223KB #1856) (peak=223KB #1858) 
 
-            Serviceability (reserved=17KB, committed=17KB)
                            (malloc=17KB #9) (peak) 
 
-                 Metaspace (reserved=65558KB, committed=2582KB)
                            (malloc=22KB #14) (peak) 
                            (mmap: reserved=65536KB, committed=2560KB) 
 
-      String Deduplication (reserved=1KB, committed=1KB)
                            (malloc=1KB #8) (peak) 

Seems a bit verbose, but I'm fine with it if others are.

@jdksjolen
Copy link
Contributor

Important addition to the memory stats in my opinion. I agree having it in release build would be great.

I omit printing peak values when we are at peak. So if "peak" is missing, current peak is implied.

It can be confusing for others not aware of this behavior, and may think that the "peak" value is missing in the output. I believe it would be less confusing if "peak" is "always" displayed.

Thank you, @ashu-mehra!

The first version had "(peak)" written after every line if at current peak. I found the output cluttered, but let's see what others say. I'm fine with reintroducing it again.

Cheers, Thomas

If we are at peak, then instead of writing (malloc=XKB #Y) (peak=XKB #Y) (this is what it'd look like, right?) you can write (malloc=XKB #Y) at peak. It makes the peaking events visually distinct for experienced users without having to match numbers (also easy regexing) and serves as an indication to new users as to what's going on. It could be misinterpreted, but I think the context of a full log makes it obvious what's going on.

Copy link
Contributor

@jdksjolen jdksjolen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Thomas!

I believe that this patch is in shape to be merged, so I will approve. I would however appreciate if you consider my nits and comments for this PR, especially regarding the peak values that Ashutosh also mentioned.

Cheers,
Johan


size_t MemoryCounter::peak_size() const {
return Atomic::load(&_peak_size);
}
#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit and pre-existing: Can we add in // ASSERT at end of line here?

Comment on lines -132 to +140
DEBUG_ONLY(inline const MemoryCounter& malloc_counter() const { return _malloc; })
DEBUG_ONLY(inline const MemoryCounter& arena_counter() const { return _arena; })
const MemoryCounter* malloc_counter() const { return &_malloc; }
const MemoryCounter* arena_counter() const { return &_arena; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me like we can keep these as returning references, if the printers take them in as references instead. Am I missing something?

I'm fine with keeping them as pointers, but I do want to make sure that I'm not missing some reasoning here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(My first answer got lost somehow)

You understand correctly. I have a slight preference for pointers since one can tell them easier apart in coding, but ultimately just a coin toss here.

@tstuefe
Copy link
Member Author

tstuefe commented Dec 6, 2022

If we are at peak, then instead of writing (malloc=XKB #Y) (peak=XKB #Y) (this is what it'd look like, right?) you can write (malloc=XKB #Y) at peak. It makes the peaking events visually distinct for experienced users without having to match numbers (also easy regexing) and serves as an indication to new users as to what's going on. It could be misinterpreted, but I think the context of a full log makes it obvious what's going on.

Okay, "at peak" it is :-)

@tstuefe
Copy link
Member Author

tstuefe commented Dec 6, 2022

Hi @jdksjolen,

thanks a lot for the quick review. You and @ashu-mehra both wanted a visual indication for the current peak, I added that:

-                  Compiler (reserved=200KB, committed=200KB)
                            (malloc=35KB #57) (peak=50KB #62) 
                            (arena=165KB #5) (peak=6512KB #18)
 
-                  Internal (reserved=1042KB, committed=1042KB)
                            (malloc=1006KB #28240) (at peak) 
                            (mmap: reserved=36KB, committed=36KB) 
 
-                     Other (reserved=2KB, committed=2KB)
                            (malloc=2KB #1) (peak=524290KB #67108865) 
 
-                    Symbol (reserved=1416KB, committed=1416KB)
                            (malloc=1056KB #10614) (at peak) 
                            (arena=360KB #1) (at peak)

Cheers, Thomas

Copy link
Contributor

@jdksjolen jdksjolen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

Copy link
Contributor

@kstefanj kstefanj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A very useful change Thomas and it looks good in general. Just two small things to clean up.

Regarding having it on in release build, I think that would make sense as long as the perf impact isn't to big. I would not expect it to be so bad, but certainly good to measure it closely before doing it.

@@ -58,11 +61,21 @@ void MemReporterBase::print_malloc(size_t amount, size_t count, MEMFLAGS flag) c
amount_in_current_scale(amount), scale);
}

// blends out mtChunk count number
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before we explicitly passed in 0 as the count for mtChunk, but this is not true anymore since we pass in the MemoryCounter. So if we want to omit mtChunk count we need to reset the count here or handle it some other way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What it the reason for omitting the mtChunk count?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh good catch.

I think this has to do with this ugly hack in MallocMemorySnapshot::make_adjustment():

So, we account chunks with mtChunk. Easy. But we also account for arenas. Arenas consists of chunks. To not report memory twice (once under malloced-with-mtChunk, once as "arena") we decrease the malloc size of mtChunk by the combined arena size of all arenas. What's left should be malloc size of free chunks in the free pool.

But that adjustment messes up the mtChunk malloc counter, since it is implemented as one gigantic free. So the counter is garbage.

I'll think of something.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation. I guess the easy fix is just:

const size_t count = (flag == mtChunk) ? 0 : c->count();

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Christ this is complex. :-(

Both detail reporter and summary reporter share this printer code, but via MEMFLAG the print output is subtly changed. MEMFLAG=mtNone is taken as "print in summary report format" and != mtNone is "print in detail report format". This is all fixable but makes the already convoluted report not prettier.

The other option would be to, in MallocMemorySnapshot::make_adjustment(), set the count to zero. But I don't dare this because afterwards the counter would assert if it were to be decreased further.

See this mess? MallocMemorySnapshot is not really usable anymore as counter holder after the adjustment is made, because mtChunk counter got modified and now may assert for negative overflow. I believe MallocMemorySnapshot::make_adjustment() is only valid if you don't modify the counters anymore afterwards.

I decided to leave it as it is. I don't think that anyone cares. As I wrote, I have plans to simplify this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine by me =)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this comment should go as well, right?

Copy link
Contributor

@kstefanj kstefanj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@openjdk
Copy link

openjdk bot commented Dec 6, 2022

@tstuefe This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8297958: NMT: Display peak values

Reviewed-by: jsjolen, sjohanss

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 70 new commits pushed to the master branch:

  • 1e46832: 8297602: Compiler crash with type annotation and generic record during pattern matching
  • b0e5432: 8297687: new URI(S,S,S,S) throws exception with incorrect index position reported in the error message
  • 2243646: 8298145: Remove ContiguousSpace::capacity
  • 84b927a: 8296024: Usage of DirectBuffer::address should be guarded
  • a9e6c62: 8297186: G1 triggers unnecessary full GCs when heap utilization is low
  • 4458de9: 8297172: Fix some issues of auto-vectorization of Long.bitCount/numberOfTrailingZeros/numberOfLeadingZeros()
  • a613998: 8297689: Fix incorrect result of Short.reverseBytes() call in loops
  • f8f4630: 8297963: Partially fix string expansion issues in UTIL_DEFUN_NAMED and related macros
  • 2a243a3: 8267617: Certificate's IP x509 NameConstraints raises ArrayIndexOutOfBoundsException
  • 923c746: 8298057: (fs) Remove PollingWatchService.POLLING_INIT_DELAY
  • ... and 60 more: https://git.openjdk.org/jdk/compare/11ba7591dfd3f7ca58e2e1ac6d1b3e81391f5bfb...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Dec 6, 2022
@tstuefe
Copy link
Member Author

tstuefe commented Dec 6, 2022

Thanks @jdksjolen and @kstefanj for reviews!

I'll wait for the GHAs to complete, then integrate.

@tstuefe
Copy link
Member Author

tstuefe commented Dec 6, 2022

/integrate

@openjdk
Copy link

openjdk bot commented Dec 6, 2022

Going to push as commit 336d230.
Since your change was applied there have been 72 commits pushed to the master branch:

  • 0d2a9ee: 8298142: Update internal comment on language features in SourceVersion
  • 203251f: 8297379: Enable the ByteBuffer path of Poly1305 optimizations
  • 1e46832: 8297602: Compiler crash with type annotation and generic record during pattern matching
  • b0e5432: 8297687: new URI(S,S,S,S) throws exception with incorrect index position reported in the error message
  • 2243646: 8298145: Remove ContiguousSpace::capacity
  • 84b927a: 8296024: Usage of DirectBuffer::address should be guarded
  • a9e6c62: 8297186: G1 triggers unnecessary full GCs when heap utilization is low
  • 4458de9: 8297172: Fix some issues of auto-vectorization of Long.bitCount/numberOfTrailingZeros/numberOfLeadingZeros()
  • a613998: 8297689: Fix incorrect result of Short.reverseBytes() call in loops
  • f8f4630: 8297963: Partially fix string expansion issues in UTIL_DEFUN_NAMED and related macros
  • ... and 62 more: https://git.openjdk.org/jdk/compare/11ba7591dfd3f7ca58e2e1ac6d1b3e81391f5bfb...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Dec 6, 2022
@openjdk openjdk bot closed this Dec 6, 2022
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Dec 6, 2022
@openjdk
Copy link

openjdk bot commented Dec 6, 2022

@tstuefe Pushed as commit 336d230.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@tstuefe tstuefe deleted the JDK-8297958-NMT-show-peak-values branch February 15, 2023 06:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-runtime hotspot-runtime-dev@openjdk.org integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

4 participants