Skip to content

Conversation

@shipilev
Copy link
Member

@shipilev shipilev commented Oct 7, 2025

During the performance investigations in safepoint machinery (notably JDK-8350324), I found the trace logging lacking. It would be good to improve it in order to finely profile various microscopic things like thread list walks, the blocking/resuming of Java threads, etc. For JDK-8350324, for example, I want to be able to measure the Java thread delays if they assist in avalanche wakeups.

This leans heavily on unified logging and invariant clocks to do the right thing, but I think it is a fair compromise for simplicity. The configuration I found most useful for testing is:

-Xlog:async -Xlog:safepoint=trace:file=safepoint.log:uptimenanos

My tentative plan is to visualize this more comprehensively with a little tool that digests that log.

I am open for bikeshedding on logging wording. The log example is in the comment below.

Additional testing:

  • Ad-hoc log peeking
  • Linux x86_64 server fastdebug, tier1
  • Linux x86_64 server fastdebug, all

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8369283: Improve trace logs in safepoint machinery (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/27673/head:pull/27673
$ git checkout pull/27673

Update a local copy of the PR:
$ git checkout pull/27673
$ git pull https://git.openjdk.org/jdk.git pull/27673/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 27673

View PR using the GUI difftool:
$ git pr show -t 27673

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/27673.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Oct 7, 2025

👋 Welcome back shade! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@shipilev
Copy link
Member Author

shipilev commented Oct 7, 2025

Example log, plus some comments:

$ ... -Xlog:async -Xlog:safepoint=trace:file=safepoint.log:uptimenanos ...

$ sort safepoint.log | less
...
[2559694438ns] Safepoint synchronization initiated 
[2559699197ns] Suspending GC threads
[2559699838ns] Blocking threads from starting/exiting
[2559700219ns] Arming safepoint using futex wait barrier
[2559700860ns] Setting thread local yield flag for threads
[2559706020ns] 18 total threads, waiting for 5 threads to block
[2559706251ns] Checking thread status
[2559707804ns] Thread 0x00007e5b245755d0 [907800] is still running
[2559709337ns] Thread 0x00007e5b24576580 [907801] is still running
[2559709707ns] Thread 0x00007e5b24577890 [907802] is still running
[2559709868ns] Thread 0x00007e5b24578ba0 [907803] is still running
[2559710048ns] Thread 0x00007e5b2457cf00 [907806] is still running
[2559710318ns] Waiting for 5 threads to block
[2559737019ns] Polling page exception: thread = 0x00007e5b24576580 [907801], pc = 0x00007e5b13d40269 (loop), stub = 0x00007e5b137851e0
[2559737430ns] Polling page exception: thread = 0x00007e5b24578ba0 [907803], pc = 0x00007e5b13d40269 (loop), stub = 0x00007e5b137851e0
[2559737170ns] Polling page exception: thread = 0x00007e5b24577890 [907802], pc = 0x00007e5b13d40269 (loop), stub = 0x00007e5b137851e0
[2559738813ns] Polling page exception: thread = 0x00007e5b245755d0 [907800], pc = 0x00007e5b13d40269 (loop), stub = 0x00007e5b137851e0
[2559746978ns] Blocking thread 0x00007e5b245755d0 [907800] 
[2559746908ns] Blocking thread 0x00007e5b24576580 [907801] 
[2559737170ns] Polling page exception: thread = 0x00007e5b2457cf00 [907806], pc = 0x00007e5b13d431e7 (loop), stub = 0x00007e5b137851e0
[2559749403ns] Blocking thread 0x00007e5b24577890 [907802]
[2559755314ns] Blocking thread 0x00007e5b2457cf00 [907806]
[2559747028ns] Blocking thread 0x00007e5b24578ba0 [907803]
[2559778188ns] Checking thread status
[2559778568ns] Thread 0x00007e5b245755d0 [907800] is now blocked
[2559778759ns] Thread 0x00007e5b24576580 [907801] is now blocked
[2559779009ns] Thread 0x00007e5b24577890 [907802] is now blocked
[2559779230ns] Thread 0x00007e5b24578ba0 [907803] is now blocked
[2559779450ns] Thread 0x00007e5b2457cf00 [907806] is now blocked
[2559779761ns] Safepoint synchronization complete
[2561270499ns] Leaving safepoint
[2561272543ns] Disarming safepoint
[2561277833ns] Resuming GC threads
[2561280719ns] Unblocking thread 0x00007e5b24576580 [907801]
[2561280568ns] Unblocking thread 0x00007e5b245755d0 [907800]
[2561281450ns] Unblocking thread 0x00007e5b24577890 [907802]
[2561282031ns] Safepoint "G1CollectForAllocation", Time since last: 295523819 ns, Reaching safepoint: 86084 ns, At safepoint: 1490589 ns, Leaving safepoint: 10740 ns, Total: 1587413 ns, Threads: 5 runnable, 18 total
[2561282592ns] Unblocking thread 0x00007e5b24578ba0 [907803]
[2561283444ns] Safepoint complete
[2561284586ns] Unblocking thread 0x00007e5b2457cf00 [907806]

@openjdk
Copy link

openjdk bot commented Oct 7, 2025

@shipilev This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8369283: Improve trace logs in safepoint machinery

Reviewed-by: fbredberg, dholmes, rehn

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 7 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label Oct 7, 2025
@openjdk
Copy link

openjdk bot commented Oct 7, 2025

@shipilev The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the rfr Pull request is ready for review label Oct 7, 2025
@mlbridge
Copy link

mlbridge bot commented Oct 7, 2025

@shipilev
Copy link
Member Author

shipilev commented Oct 7, 2025

Oops, cannot touch JavaThread::name() on this safepoint path, because it can allocate. Back to thread pointers only then.

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally seems okay but a few queries.

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still one query on the removed part, but okay, seems reasonable.

I seemed to recall that once-upon-a-time we had TraceTimers(?) to gather the interesting elapsed time and other stats for safepoints.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Oct 9, 2025
@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Oct 9, 2025
@shipilev
Copy link
Member Author

shipilev commented Oct 9, 2025

Hey @robehn -- you improved/used this kind of logging the last time. How does it this PR look to you?

@robehn
Copy link
Contributor

robehn commented Oct 9, 2025

Hey @robehn -- you improved/used this kind of logging the last time. How does it this PR look to you?

Yea, I get what you are doing, it looks good.

But two comments, a bit OT:
1: As you have located the interesting sections, it would be nice if we could make use of them as general trace points.
Meaning it would be nice if e.g. JFR could share these locations.
2: We log threads a bit different, it would be nice if there was thread formatter, so all threads comes out identically.
And for a real user TID (threads PID in linux) is the identifier which is useful, as you can e.g. attach perf to that tid/pid.
The logging of pointers to Thread is only useful to seperate them.

Hence please log TID (linux thread pid), so one can do perf record --pid "TID" or nice or what you need.
Now if that should be done in a completely different PR, maybe addressing 2 or here I'm unsure of, so I'll approve as is and let you guys think about it or ignore :)

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Oct 9, 2025
@shipilev
Copy link
Member Author

shipilev commented Oct 9, 2025

Hence please log TID (linux thread pid), so one can do perf record --pid "TID" or nice or what you need.

Right! In the initial version of the patch, I printed thread names as external identifiers. Then I realized you cannot touch the names, because they can allocate, so I removed that part. But that also loses any external ident for threads. I now pushed the commit that prints the thread TIDs. Updated log snippet here: #27673 (comment)

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Oct 9, 2025
Copy link
Contributor

@fbredber fbredber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Oct 9, 2025
@shipilev
Copy link
Member Author

shipilev commented Oct 9, 2025

For your amusement, this is how a render for some events looks like, using this logging as input. With Shenandoah, so the actual VM work is very small.

Nearly perfect GC pause:
example-nearly-perfect

TTSP hiccup:
example-ttsp-lag

@shipilev
Copy link
Member Author

shipilev commented Oct 9, 2025

Thanks for reviews, all! The crude parsing tool I now have tells me the logging is fairly complete for the safepoint investigation purposes. The re-run of make test TEST=all finished without issues. We can do more stuff as followups, if needed. I am integrating this meanwhile.

/integrate

@openjdk
Copy link

openjdk bot commented Oct 9, 2025

Going to push as commit 1992b69.
Since your change was applied there have been 13 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Oct 9, 2025
@openjdk openjdk bot closed this Oct 9, 2025
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Oct 9, 2025
@openjdk
Copy link

openjdk bot commented Oct 9, 2025

@shipilev Pushed as commit 1992b69.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good update on the thread_id(). I had considered that earlier but it seemed we were using the address elsewhere so I didn't raise it. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

hotspot hotspot-dev@openjdk.org integrated Pull request has been integrated

Development

Successfully merging this pull request may close these issues.

4 participants