Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JDK-8289524: Add JFR JIT restart event #9334

Closed
wants to merge 9 commits into from

Conversation

MBaesken
Copy link
Member

@MBaesken MBaesken commented Jun 30, 2022

The JIT compiler restarts (see restart_compiler in NMethodSweeper::sweep_code_cache) would be a helpful addition to the JFR events. Currently we log the JIT stop operations in JFR (EventCodeCacheFull) but no restart.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/9334/head:pull/9334
$ git checkout pull/9334

Update a local copy of the PR:
$ git checkout pull/9334
$ git pull https://git.openjdk.org/jdk pull/9334/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 9334

View PR using the GUI difftool:
$ git pr show -t 9334

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/9334.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Jun 30, 2022

👋 Welcome back mbaesken! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Jun 30, 2022
@openjdk
Copy link

openjdk bot commented Jun 30, 2022

@MBaesken The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label Jun 30, 2022
@mlbridge
Copy link

mlbridge bot commented Jun 30, 2022

Copy link
Contributor

@parttimenerd parttimenerd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good and simple

@MBaesken
Copy link
Member Author

MBaesken commented Jul 4, 2022

Looks good and simple

Hi, thanks for the review ! May I have a second review ?

@mgronlun
Copy link

mgronlun commented Jul 4, 2022

Hi @MBaesken, perhaps we should take a larger view of this functionality and incorporate it into the SweepCodeCache event. I don't see any information provided that would reflect on overall CodeCache memory before vs after in relation to the sweeper? There are only counts. What if we extend the event SweepCodeCache with fields to reflect "memory before sweep", "memory after sweep" and a boolean "compiler restart". In addition, there are metadata aspects that need to be addressed, like for example, this is not a durational event, and since it is issued only by the Sweeper thread it will not have a stack trace etc.

@egahlin
Copy link
Member

egahlin commented Jul 4, 2022

A memory before or after field should have the contentType="bytes" and we might as well use ulong as data type. There is no additional cost since data is stored using compressed integers.

@mgronlun
Copy link

mgronlun commented Jul 4, 2022

Thanks for clarifying Erik, I forgot to mention the contentType.

@MBaesken
Copy link
Member Author

MBaesken commented Jul 5, 2022

Hi Markus, I think it would be possible and probably make sense to instead enhance the existing EventSweepCodeCache.
Would it be okay to move down the EventSweepCodeCache to the end of NMethodSweeper::sweep_code_cache() ?
I think post_sweep_event would get 2 more parameters, a boolean for JIT-restart and an integer freed_memory.
Are you fine with this ?
Are there some compatibility concerns when enhancing an existing event EventSweepCodeCache ?

@mgronlun
Copy link

mgronlun commented Jul 5, 2022

Thanks for considering. It is fine to move the EventSweepCodeCache event around, you can see that the constructor takes the UNTIMED value. This means that timestamping is handled explicitly, by setting the set_starttime() and set_endtime() fields For compatibility, we can think of it like we are creating a subclass with extended fields. Just append the additional fields to the event. As for the memory field declarations, you can take a peek at other events in metadata.xml, for example, this field declaration is from CodeCacheFull:

<Field type="ulong" contentType="bytes" name="unallocatedCapacity" label="Unallocated" /> 

@MBaesken
Copy link
Member Author

MBaesken commented Jul 5, 2022

Hi Markus, another question that came up while looking into this - the current SweepCodeCache event has a threshold of 100 ms set in both default.jfc and profile.jfc . This was probably fine for existing usages. But would we loose the JIT restart events sometimes in case we incorporate the Jit restart and frred memory into the current SweepCodeCache event ?

@mgronlun
Copy link

mgronlun commented Jul 5, 2022

That is a good reflection. Yes, if the duration of the restartable sweep is below the threshold, then no event will be sent.

@mgronlun
Copy link

mgronlun commented Jul 5, 2022

Can you explain a bit more about what "JIT restart" actually means?

@MBaesken
Copy link
Member Author

MBaesken commented Jul 5, 2022

Can you explain a bit more about what "JIT restart" actually means?

The comment at line 433 of sweeper.cpp and following is explaining it.
The JIT compilation had been stopped before and now, after sweeping freed potentially some memory,
JIT compilation is re-enabled.
That's why also the log.debug("restart compiler"); in the code.

@mgronlun
Copy link

mgronlun commented Jul 5, 2022

Ok, so there is a corresponding "compiler stopped", implicitly noted by firing EventCodeCacheFull?

@MBaesken
Copy link
Member Author

MBaesken commented Jul 5, 2022

Ok, so there is a corresponding "compiler stopped", implicitly noted by firing EventCodeCacheFull?

Yes I think the EventCodeCacheFull (see CodeCache::report_codemem_full ) covers the JIT compiler stop pretty well.
That's why I did not attempt to add a JIT stop JFR event because we have this already.

Copy link
Contributor

@vnkozlov vnkozlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@openjdk
Copy link

openjdk bot commented Jul 5, 2022

@MBaesken This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8289524: Add JFR JIT restart event

Reviewed-by: kvn, lucy

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 138 new commits pushed to the master branch:

  • 6882f0e: 8290013: serviceability/jvmti/GetLocalVariable/GetLocalWithoutSuspendTest.java failed "assert(!in_vm) failed: Undersized StackShadowPages"
  • 92067e2: 8290137: riscv: small refactoring for add_memory_int32/64
  • 87340fd: 8288883: C2: assert(allow_address || t != T_ADDRESS) failed after JDK-8283091
  • bc7a1ea: 8288948: Few J2DBench tests indicate lower primitive drawing performance with metal rendering pipeline
  • 84f2314: 8286030: Avoid JVM crash when containers share the same /tmp dir
  • 4dd236b: 8290280: riscv: Clean up stack and register handling in interpreter
  • 522b657: Merge
  • 15d3329: 8281969: Bad result for the snippet @link tag if substring/regex consists of whitespace
  • c8e0315: 8290250: Shenandoah: disable Loom for iu mode
  • fb27ddc: 8290252: Add TEST.properties to java/nio/channels/FileChannel and move tests out of largeMemory sub-dir
  • ... and 128 more: https://git.openjdk.org/jdk/compare/e05b2f2c3b9b0276099766bc38a55ff835c989e1...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jul 5, 2022
@mgronlun
Copy link

mgronlun commented Jul 6, 2022

Hi again Mathias, based on your observation that this does not really align well with the threshold parameter for SweepCodeCache and is a complement event to EventCodeCacheFull, I believe your original suggestion is better (an instant event, only for "jit restart"). I was slightly tripped up with the concept of a "JIT restart", because at this point the JIT is already stopped, as implied by EventCodeCacheFull. So "JIT restart" here contextually means, "JIT start", as in "start JITting code again". Do you know if we have an event that describes the CodeCache settings in terms of memory sizes so that the "freed memory" can be interpreted relative to it?

@mgronlun
Copy link

mgronlun commented Jul 6, 2022

So the timestamp of this "JIT restart" event minus the timestamp of the previous "CodeCacheFull" event is the duration where JIT compilation is disabled, the reason being no memory available to accommodate new code. That's a good data point.

@egahlin
Copy link
Member

egahlin commented Jul 6, 2022

It would be good to add a sanity check. See:

The test in on the ProblemList.txt due to timeouts, but probably works most of the time if you run it.

Also remove "JitRestart" from TestLookForUntestedEvents.java

@MBaesken
Copy link
Member Author

MBaesken commented Jul 6, 2022

Hi again Mathias, based on your observation that this does not really align well with the threshold parameter for SweepCodeCache and is a complement event to EventCodeCacheFull, I believe your original suggestion is better (an instant event, only for "jit restart"). I was slightly tripped up with the concept of a "JIT restart", because at this point the JIT is already stopped, as implied by EventCodeCacheFull. So "JIT restart" here contextually means, "JIT start", as in "start JITting code again". Do you know if we have an event that describes the CodeCache settings in terms of memory sizes so that the "freed memory" can be interpreted relative to it?

Hi Markus, at least we store a couple of addresses in the EventCodeCacheFull event. Those could potentially be used for interpretation.
EventCodeCacheFull event;
event.set_startAddress((u8)heap->low_boundary());
event.set_commitedTopAddress((u8)heap->high());
event.set_reservedTopAddress((u8)heap->high_boundary());

@mgronlun
Copy link

mgronlun commented Jul 6, 2022

Yes, thank you, I took a look at those fields too. Unfortunately, there is no value that bears directly on "current in-use". The reserved and committed will not be updated. If there was a value that could reflect the current usage, even if it means adding it to both EventCodeCacheFull and the JIT restart, that would be perfect. EventCodeCacheFull because in-use is x, JIT restart because in-use is now y. There are some statistics-related properties in the CodeHeap, like for example heap->unallocated_capacity(); et al. Could some of those expose this running value perhaps?

@MBaesken
Copy link
Member Author

MBaesken commented Jul 6, 2022

Yes, thank you, I took a look at those fields too. Unfortunately, there is no value that bears directly on "current in-use". The reserved and committed will not be updated. If there was a value that could reflect the current usage, even if it means adding it to both EventCodeCacheFull and the JIT restart, that would be perfect. EventCodeCacheFull because in-use is x, JIT restart because in-use is now y. There are some statistics-related properties in the CodeHeap, like for example heap->unallocated_capacity(); et al. Could some of those expose this running value perhaps?

Probably we would need CodeCache::unallocated_capacity() (contains a number of code heaps as far as I know) to compare freed_memory to, because from what I see we iterate the whole CodeCache in NMethodSweeper::sweep_code_cache() .
But maybe someone with better knowledge about the CodeCache should comment here.

@RealLucy
Copy link
Contributor

RealLucy commented Jul 6, 2022

To get an idea of how much memory was freed in relation to what's actually available, you could evaluate CodeCache:max_capacity(). This function returns the overall size of the code heap (across all segments). The performance impact is minimal and no locks are acquired.

Copy link
Contributor

@RealLucy RealLucy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good to me.

@MBaesken
Copy link
Member Author

MBaesken commented Jul 7, 2022

So is the common opinion to get back to a separate JIT start event (I think that naming is prefered over JIT restart, am I correct) ?
Additionally we add CodeCache:max_capacity() at both JIT start and EventCodeCacheFull ?

@RealLucy
Copy link
Contributor

RealLucy commented Jul 7, 2022

So is the common opinion to get back to a separate JIT start event (I think that naming is prefered over JIT restart, am I correct) ?
Additionally we add CodeCache:max_capacity() at both JIT start and EventCodeCacheFull ?

I would be in favour of that.

@mgronlun
Copy link

mgronlun commented Jul 7, 2022

So is the common opinion to get back to a separate JIT start event (I think that naming is prefered over JIT restart, am I correct) ? Additionally we add CodeCache:max_capacity() at both JIT start and EventCodeCacheFull ?

I think it is fine to use the term "JIT restart" because it is in use both in the code as well as in the output of log statements. Another reason would be that the first "JIT start" event would always be missing. It was only the meaning that got me a bit confused. Yes, a separate event is preferrable having no duration (startTime=false), no stack trace (stackTrace=false).

The .jfc configs only need one element:

<setting name="enabled" control="compiler-enabled">true</setting>

Copy link
Contributor

@RealLucy RealLucy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good.
Now you can relate the freed memory to what's available in total.

@@ -558,6 +558,11 @@
<Field type="ulong" contentType="bytes" name="used" label="Used" />
</Event>

<Event name="JitRestart" category="Java Virtual Machine, Compiler" label="JIT restart" stackTrace="false" startTime="false" thread="true">
<Field type="int" name="freedMemory" label="Freed Memory" />
<Field type="ulong" name="codeCacheMaxCapacity" label="CodeCache maximum capacity" />

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is codeCacheMaxCapacity the current in-use size of the CodeCache, in bytes? It should have the contentType="bytes" in that case. Also "freedMemory" should have the same contentType.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, codeCacheMaxCapacity is given in bytes. It specifies the maximum size, not the current in-use size, of the CodeCache. It is the sum over all CodeHeap segments.

@@ -612,6 +617,7 @@
<Field type="int" name="adaptorCount" label="Adaptors" />
<Field type="ulong" contentType="bytes" name="unallocatedCapacity" label="Unallocated" />
<Field type="int" name="fullCount" label="Full Count" />
<Field type="ulong" name="codeCacheMaxCapacity" label="CodeCache maximum capacity" />

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

contentType="bytes"

@@ -54,7 +54,7 @@ public class TestLookForUntestedEvents {

private static final Set<String> hardToTestEvents = new HashSet<>(
Arrays.asList(
"DataLoss", "IntFlag", "ReservedStackActivation",
"DataLoss", "IntFlag", "ReservedStackActivation", "JitRestart",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we create a test? There is an existing test for CodeCacheFull, maybe derive from it to create a test also for the JitRestart event?

@@ -558,6 +558,11 @@
<Field type="ulong" contentType="bytes" name="used" label="Used" />
</Event>

<Event name="JitRestart" category="Java Virtual Machine, Compiler" label="JIT restart" stackTrace="false" startTime="false" thread="true">
<Field type="int" name="freedMemory" label="Freed Memory" />
<Field type="ulong" name="codeCacheMaxCapacity" label="CodeCache maximum capacity" />
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be "Code Cache Maximum Capacity"

@@ -612,6 +617,7 @@
<Field type="int" name="adaptorCount" label="Adaptors" />
<Field type="ulong" contentType="bytes" name="unallocatedCapacity" label="Unallocated" />
<Field type="int" name="fullCount" label="Full Count" />
<Field type="ulong" name="codeCacheMaxCapacity" label="CodeCache maximum capacity" />
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be "Code Cache Maximum Capacity"

@@ -558,6 +558,11 @@
<Field type="ulong" contentType="bytes" name="used" label="Used" />
</Event>

<Event name="JitRestart" category="Java Virtual Machine, Compiler" label="JIT restart" stackTrace="false" startTime="false" thread="true">
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"JITRestart" . The convention for other events have been to use capital laters for acronyms.

"JIT restart" -> "JIT Restart"

@@ -1366,6 +1366,7 @@ void CodeCache::report_codemem_full(int code_blob_type, bool print) {
event.set_adaptorCount(heap->adapter_count());
event.set_unallocatedCapacity(heap->unallocated_capacity());
event.set_fullCount(heap->full_count());
event.set_codeCacheMaxCapacity(CodeCache::max_capacity());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add test of field in TestCodeCacheFull

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -442,6 +437,15 @@ void NMethodSweeper::sweep_code_cache() {
CompileBroker::set_should_compile_new_jobs(CompileBroker::run_compilation);
log.debug("restart compiler");
log_sweep("restart_compiler");
EventJitRestart event;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to have a sanity test of the event.

If the event can't be provoked reliably, retry or accept as OK.

@MBaesken
Copy link
Member Author

Hi , I adjusted the contentType as suggested, adjusted the label and TestCodeCacheFull.
Regarding triggering the JitRestart event , the suggested TestCodeCacheFull looks like a good start . So I added to allocateCodeBlob (this triggers the CodeCacheFull event) a corresponding freeCodeBlob call .
Unfortunately this did not work in practise and doesn't show the JitRestart event. There could be some delay to the restarting unfortunately.

@MBaesken
Copy link
Member Author

I added a test after some playing around with the WhiteBox functionality.

import jdk.test.lib.Asserts;
import jdk.test.lib.jfr.EventNames;
import jdk.test.lib.jfr.Events;
import sun.hotspot.WhiteBox;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This package has been removed. Please use jdk.test.whitebox.WhiteBox.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Coleen thanks for the advice. After switching to the new package jdk.test.whitebox.WhiteBox I get
java.lang.UnsatisfiedLinkError: 'void sun.hotspot.WhiteBox.registerNatives()'
Do I need to do more than just renaming the package ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had to change BlobType import too, this one is also in a new package.

Copy link
Contributor

@RealLucy RealLucy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With all the fine-tuning, changes look even better now. And there is a test now so we get alerted if anything breaks. Thanks!

@MBaesken
Copy link
Member Author

/integrate

@openjdk
Copy link

openjdk bot commented Jul 19, 2022

Going to push as commit dfbc691.
Since your change was applied there have been 153 commits pushed to the master branch:

  • 4e6cd67: 8290000: Bump macOS GitHub actions to macOS 11
  • af86cd3: 8290463: Fix several comment typos in sun.security.ec
  • 6cd1c0c: Merge
  • 2677dd6: 8289954: C2: Assert failed in PhaseCFG::verify() after JDK-8183390
  • 4f3f74c: 8289127: Apache Lucene triggers: DEBUG MESSAGE: duplicated predicate failed which is impossible
  • 4a4d8ed: 8289801: [IR Framework] Add flags to whitelist which can be used to simulate a specific machine setup like UseAVX
  • 5a96a5d: 8289612: Change hotspot/jtreg tests to not use Thread.stop
  • b65f7ec: 4887998: Use Integer.rotateLeft() and rotateRight() in crypto implementations
  • 6c8d0e6: 8282526: Default icon is not painted properly
  • e72742e: 8286172: Create an automated test for JDK-4516019
  • ... and 143 more: https://git.openjdk.org/jdk/compare/e05b2f2c3b9b0276099766bc38a55ff835c989e1...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jul 19, 2022
@openjdk openjdk bot closed this Jul 19, 2022
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jul 19, 2022
@openjdk
Copy link

openjdk bot commented Jul 19, 2022

@MBaesken Pushed as commit dfbc691.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot hotspot-dev@openjdk.org integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

7 participants