8299426: Heap dump does not contain virtual Thread stack references #16665

Closed
alexmenkov wants to merge 3 commits

@alexmenkov alexmenkov commented Nov 14, 2023

The change implements dumping of unmounted virtual thread data (stack traces and stack references).
Unmounted vthreads can be detected only by iterating over the heap, but hprof stack trace records (HPROF_FRAME/HPROF_TRACE) must be written before HPROF_HEAP_DUMP/HPROF_HEAP_DUMP_SEGMENT.
HeapDumper supports segmented dump (parallel dump to separate files, with the files merged afterwards outside of the safepoint). The fix switches HeapDumper to always use segmented dump: the 1st segment contains only non-heap data, and the other segments are used for dumping heap objects. A serial dump is performed single-threaded, but 2 segments are created anyway.
When HeapObjectDumper detects an unmounted virtual thread, it writes HPROF_FRAME/HPROF_TRACE records to the 1st segment ("global writer"), and writes the thread object (HPROF_GC_ROOT_THREAD_OBJ) and stack references (HPROF_GC_ROOT_JAVA_FRAME/HPROF_GC_ROOT_JNI_LOCAL) to the HeapObjectDumper segment.
As parallel dumpers may write HPROF_FRAME/HPROF_TRACE concurrently, and VMDumper needs to write non-heap data before heap object dumpers can write virtual thread data, writes to the global writer are protected by DumperController::_global_writer_lock.
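
The record routing described above can be sketched roughly as follows. This is a standalone illustration and not the HotSpot code: `SegmentedDump`, `RecordKind`, `segment_for` and `kGlobalSegment` are hypothetical names, `std::mutex` stands in for DumperController::_global_writer_lock, plain strings stand in for hprof records, and the sketch assumes each heap segment is written by exactly one dumper.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical record categories; the names mirror the hprof tags in the description.
enum class RecordKind { Frame, Trace, ThreadObj, JavaFrameRoot, JniLocalRoot };

// Segment 0 plays the role of the "global writer" (non-heap data only).
constexpr std::size_t kGlobalSegment = 0;

// Stack trace records go to the global segment; heap sub-records stay in the
// segment owned by the dumper that found the virtual thread.
constexpr std::size_t segment_for(RecordKind kind, std::size_t dumper_segment) {
  return (kind == RecordKind::Frame || kind == RecordKind::Trace)
             ? kGlobalSegment
             : dumper_segment;
}

// Toy model of a segmented dump: one vector of records per segment.
class SegmentedDump {
 public:
  // One global segment plus one segment per heap dumper (at least 2 in total).
  explicit SegmentedDump(std::size_t heap_dumpers) : segments_(heap_dumpers + 1) {}

  void write(RecordKind kind, std::size_t dumper_segment, const std::string& record) {
    // Heap dumpers own segments 1..N in this sketch.
    assert(dumper_segment > kGlobalSegment && dumper_segment < segments_.size());
    const std::size_t target = segment_for(kind, dumper_segment);
    if (target == kGlobalSegment) {
      // Several dumpers may append stack traces concurrently, so serialize them.
      std::lock_guard<std::mutex> guard(global_writer_lock_);
      segments_[target].push_back(record);
    } else {
      // Assumption: a heap segment has a single writer, so no lock is taken.
      segments_[target].push_back(record);
    }
  }

  // Merging concatenates segments in order, so trace records precede heap data.
  std::vector<std::string> merge() const {
    std::vector<std::string> out;
    for (const auto& segment : segments_) {
      out.insert(out.end(), segment.begin(), segment.end());
    }
    return out;
  }

 private:
  std::vector<std::vector<std::string>> segments_;
  std::mutex global_writer_lock_;
};
```

In this model, a dumper that finds an unmounted virtual thread calls `write` twice or more: trace records land in segment 0 under the lock, while the thread object and root records land in the dumper's own segment, and `merge()` restores the required ordering.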

Testing: run tests which perform heap dump (in different scenarios):

  • test/hotspot/jtreg/serviceability
  • test/hotspot/jtreg/runtime/ErrorHandling
  • test/hotspot/jtreg/gc/epsilon
  • test/jdk/sun/tools/jhsdb

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8299426: Heap dump does not contain virtual Thread stack references (Bug - P3)(⚠️ The fixVersion in this issue is [22] but the fixVersion in .jcheck/conf is 23, a new backport will be created when this pr is integrated.)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/16665/head:pull/16665
$ git checkout pull/16665

Update a local copy of the PR:
$ git checkout pull/16665
$ git pull https://git.openjdk.org/jdk.git pull/16665/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 16665

View PR using the GUI difftool:
$ git pr show -t 16665

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/16665.diff

bridgekeeper bot commented Nov 14, 2023

👋 Welcome back amenkov! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk bot commented Nov 14, 2023

@alexmenkov The following labels will be automatically applied to this pull request:

  • hotspot-runtime
  • serviceability

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the serviceability (serviceability-dev@openjdk.org) and hotspot-runtime (hotspot-runtime-dev@openjdk.org) labels Nov 14, 2023
@alexmenkov alexmenkov changed the title from "vthreads in heapdump" to "8299426: Heap dump does not contain virtual Thread stack references" Nov 14, 2023
@openjdk openjdk bot added the rfr (Pull request is ready for review) label Nov 14, 2023
mlbridge bot commented Nov 14, 2023

Webrevs

Contributor

@plummercj plummercj left a comment

I think you are dumping all virtual threads found in the heap, even if they are unreachable. Although this would be true for platform threads also, there is a high likelihood of a lot of unreachable virtual threads in the heap, but not so much for platform threads.

Also, have you tested scalability? Not just a large number of live virtual threads (like a million), but also combined with a large number of unreachable virtual threads.

Contributor

sspitsyn commented Nov 29, 2023

@plummercj said:

I think you are dumping all virtual threads found in the heap, even if they are unreachable.

There is a check for virtual thread liveness which has to be good enough in general:

  static bool should_dump_vthread(oop vt) {
    return java_lang_VirtualThread::state(vt) != java_lang_VirtualThread::NEW
        && java_lang_VirtualThread::state(vt) != java_lang_VirtualThread::TERMINATED;
  }
. . .
    if (java_lang_VirtualThread::is_instance(o) && ThreadDumper::should_dump_vthread(o)) {
      _vthread_dumper->dump_vthread(o, writer());
    }
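
To make the behavior of this check concrete, here is a minimal standalone sketch. It is not the HotSpot code: `VThreadState` is a hypothetical, reduced set of states, and `count_dumped` is an illustrative helper that models a heap walk as an array of states.

```cpp
#include <cstddef>

// Hypothetical subset of virtual thread states; the real VM has more states.
enum class VThreadState { New, Running, Parked, Terminated };

// Mirrors the quoted check: skip threads that never started or already finished.
constexpr bool should_dump_vthread(VThreadState state) {
  return state != VThreadState::New && state != VThreadState::Terminated;
}

// Counts how many of the virtual threads found during a heap walk would be dumped.
template <std::size_t N>
constexpr std::size_t count_dumped(const VThreadState (&found_in_heap)[N]) {
  std::size_t dumped = 0;
  for (std::size_t i = 0; i < N; ++i) {
    if (should_dump_vthread(found_in_heap[i])) ++dumped;
  }
  return dumped;
}
```

Note that the filter looks only at the state, not at reachability: in this model a parked thread is always counted, whether or not anything still references it.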

writer()->flush();

// At this point, all fragments of the heapdump have been written to separate files.
// We need to merge them into a complete heapdump and write HPROF_HEAP_DUMP_END at that time.
Contributor

I like how the code in the VM_HeapDumper::work function was rearranged: it removes some duplication and makes more lines common rather than conditional.

Contributor

@sspitsyn sspitsyn left a comment

This looks good. Nice piece of work!

  static bool is_vm_dumper(int dumper_id) { return dumper_id == VMDumperId; }
  // the 1st dumper calling get_next_dumper_id becomes VM dumper
  int get_next_dumper_id() {
    return Atomic::fetch_then_add(&_dump_seq, 1);
  }
Contributor

Nit: This can be used instead:
    return Atomic::inc(&_dump_seq);
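
When weighing the two forms, what matters is which value the first caller receives. The sketch below uses standard `std::atomic` rather than HotSpot's `Atomic` class, so it says nothing about the signature of `Atomic::inc`; `DumperIds`, `kVMDumperId` and both method names are hypothetical, and the sketch assumes the VM dumper id is 0 and the counter starts at 0.

```cpp
#include <atomic>
#include <cassert>

// Assumption for this sketch: the VM dumper id equals the initial counter value.
constexpr int kVMDumperId = 0;

constexpr bool is_vm_dumper(int dumper_id) { return dumper_id == kVMDumperId; }

class DumperIds {
 public:
  // fetch_add returns the value held *before* the increment,
  // so the first caller receives 0 and becomes the VM dumper.
  int next_fetch_then_add() {
    const int id = seq_.fetch_add(1);
    assert(id >= 0);  // guards against counter overflow in this toy model
    return id;
  }

  // Pre-increment returns the value *after* the increment,
  // so the first caller receives 1 and no caller ever gets id 0.
  int next_add_then_fetch() { return ++seq_; }

 private:
  std::atomic<int> seq_{0};
};
```

With these assumptions, only the fetch-then-add form hands out the VM dumper id, so any replacement would need to return the pre-increment value to preserve the "1st dumper becomes VM dumper" behavior.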

openjdk bot commented Nov 30, 2023

@alexmenkov This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8299426: Heap dump does not contain virtual Thread stack references

Reviewed-by: cjplummer, sspitsyn, lmesnik

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 373 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready (Pull request is ready to be integrated) label Nov 30, 2023
@plummercj
Contributor

There is a check for virtual thread liveness which has to be good enough in general:

  static bool should_dump_vthread(oop vt) {
    return java_lang_VirtualThread::state(vt) != java_lang_VirtualThread::NEW
        && java_lang_VirtualThread::state(vt) != java_lang_VirtualThread::TERMINATED;
  }
. . .
    if (java_lang_VirtualThread::is_instance(o) && ThreadDumper::should_dump_vthread(o)) {
      _vthread_dumper->dump_vthread(o, writer());
    }

That should in general take care of most unreachable virtual threads, but technically I don't think a virtual thread has to reach the TERMINATED state in order to become unreachable. However, it will never get scheduled.

@sspitsyn
Contributor

That should in general take care of most unreachable virtual threads, but technically I don't think a virtual thread has to reach the TERMINATED state in order to become unreachable. However, it will never get scheduled.

Agreed.

@alexmenkov
Author

There is a check for virtual thread liveness which has to be good enough in general:

  static bool should_dump_vthread(oop vt) {
    return java_lang_VirtualThread::state(vt) != java_lang_VirtualThread::NEW
        && java_lang_VirtualThread::state(vt) != java_lang_VirtualThread::TERMINATED;
  }
. . .
    if (java_lang_VirtualThread::is_instance(o) && ThreadDumper::should_dump_vthread(o)) {
      _vthread_dumper->dump_vthread(o, writer());
    }

That should in general take care of most unreachable virtual threads, but technically I don't think a virtual thread has to reach the TERMINATED state in order to become unreachable. However, it will never get scheduled.

I'm not sure I understand the scenario. The state is set to TERMINATED after the thread completes its execution.
So a virtual thread was scheduled, mounted, did some work (as state != NEW), and then the scheduler unmounts it, decides not to schedule it again, and just "loses" it?
This does not look like a real scenario to me, but in any case I think it's fine to report such unreachable virtual threads until GC collects the objects.

@plummercj
Contributor

I'm not sure I understand the scenario. The state is set to TERMINATED after the thread completes its execution.
So a virtual thread was scheduled, mounted, did some work (as state != NEW), and then the scheduler unmounts it, decides not to schedule it again, and just "loses" it?
This does not look like a real scenario to me, but in any case I think it's fine to report such unreachable virtual threads until GC collects the objects.

I wasn't thinking in terms of the scheduler somehow no longer referencing the virtual thread, but rather in terms of the program no longer referencing the scheduler (and also not referencing the virtual thread).

@alexmenkov
Author

I wasn't thinking in terms of the scheduler somehow no longer referencing the virtual thread, but rather in terms of the program no longer referencing the scheduler (and also not referencing the virtual thread).

AFAIU unfinished unmounted virtual threads are referenced from other objects (the ones they are parked on), so they can't be unreachable even if the application references neither them nor the scheduler.

Contributor

@plummercj plummercj left a comment

One concern I have is that when there is a large number of virtual threads, the dump may take too long and the hprof file may get bloated. My concern mostly comes from the large number of virtual thread stack traces that will be present. Dumping all these hprof records related to unmounted virtual threads could do more harm than good in some instances, and we may want a way for the user to disable it.

It would be nice if there could be some data sharing between threads with identical stack traces, but I don't see a way to do that with the current hprof spec.

@alexmenkov
Author

One concern I have is that when there is a large number of virtual threads, the dump may take too long and the hprof file may get bloated. My concern mostly comes from the large number of virtual thread stack traces that will be present. Dumping all these hprof records related to unmounted virtual threads could do more harm than good in some instances, and we may want a way for the user to disable it.

My understanding is that information about references is one of the most important things for dump analysis (and that's what the issue is about). So we cannot avoid stack unwinding for unmounted virtual threads.
As for heap dump file size, each stack trace adds 21 + 53 * frame_number bytes on a 64-bit system (uncompressed data).
So for 10 frames it adds ~550 bytes, for 20 frames ~1.1KB.
I'm not sure if stack traces are important for analysis; maybe it makes sense to add an option to not include them in the heap dump (for both platform and virtual threads).
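
The arithmetic above can be checked with a few lines of code. The constants 21 and 53 come from the comment itself; the function names are hypothetical, and `total_trace_bytes` makes the simplifying assumption that every thread has the same stack depth.

```cpp
#include <cstdint>

// Per-trace overhead quoted in the comment above (64-bit, uncompressed).
constexpr std::uint64_t kTraceBaseBytes = 21;
constexpr std::uint64_t kBytesPerFrame = 53;

// Bytes added to the dump by one stack trace with the given number of frames.
constexpr std::uint64_t stack_trace_bytes(std::uint64_t frames) {
  return kTraceBaseBytes + kBytesPerFrame * frames;
}

// Total for many threads that all have the same stack depth (an assumption).
constexpr std::uint64_t total_trace_bytes(std::uint64_t threads, std::uint64_t frames) {
  return threads * stack_trace_bytes(frames);
}

static_assert(stack_trace_bytes(10) == 551, "10 frames: 21 + 53 * 10");
static_assert(stack_trace_bytes(20) == 1081, "20 frames: 21 + 53 * 20");
```

Under these numbers, one million unmounted virtual threads with 10 frames each would add 551,000,000 bytes of uncompressed stack trace data.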

It would be nice if there could be some data sharing between threads with identical stack traces, but I don't see a way to do that with the current hprof spec.

The hprof spec says nothing about a 1:1 relation between threads and stack traces, so theoretically several HPROF_GC_ROOT_THREAD_OBJ subrecords may refer to the same stack trace, but searching for identical stack traces may be expensive.
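
One way such sharing could look is an interning table keyed by the frame sequence. This is only a sketch of the idea, not something the PR implements: `TraceTable`, `StackTrace` and `intern` are hypothetical names, frames are modeled as plain integer ids, and an ordered `std::map` is used for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// A stack trace modeled as a sequence of frame ids (hypothetical representation).
using StackTrace = std::vector<std::uint64_t>;

class TraceTable {
 public:
  // Returns the serial number for `trace`, reusing the existing one when an
  // identical frame sequence has been seen before.
  std::uint32_t intern(const StackTrace& trace) {
    const auto it = serials_.find(trace);
    if (it != serials_.end()) return it->second;
    assert(next_serial_ != 0 && "serial numbers exhausted");
    const std::uint32_t serial = next_serial_++;
    serials_.emplace(trace, serial);
    return serial;
  }

  // Number of distinct traces that would have to be written to the dump.
  std::size_t unique_traces() const { return serials_.size(); }

 private:
  // Each lookup compares whole frame sequences, which is the cost noted above.
  std::map<StackTrace, std::uint32_t> serials_;
  std::uint32_t next_serial_ = 1;
};
```

Every lookup costs a number of sequence comparisons that grows with the table size, and the table itself must stay in memory for the whole dump, which illustrates why the search may be expensive.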

mlbridge bot commented Dec 4, 2023

Mailing list message from David Holmes on serviceability-dev:

On 1/12/2023 2:08 pm, Alex Menkov wrote:

On Thu, 30 Nov 2023 21:11:08 GMT, Chris Plummer <cjplummer at openjdk.org> wrote:

I wasn't thinking in terms of the scheduler somehow no longer referencing the virtual thread, but rather in terms of the program no longer referencing the scheduler (and also not referencing the virtual thread).

AFAIU unfinished unmounted virtual threads are referenced from other objects (the ones they are parked on), so they can't be unreachable even if the application references neither them nor the scheduler.

There is (or was - there may be a property that affects this:
trackAllThreads?) a scenario where a VT might park on a synchronization
object which is not referenced from any other thread. The VT can never
be unparked, and the sync object and the VT are reachable only from
each other and so both can be GC'd.

David
-----

mlbridge bot commented Dec 4, 2023

Mailing list message from Alan Bateman on serviceability-dev:

On 04/12/2023 12:41, David Holmes wrote:

On 1/12/2023 2:08 pm, Alex Menkov wrote:

On Thu, 30 Nov 2023 21:11:08 GMT, Chris Plummer
<cjplummer at openjdk.org> wrote:

I wasn't thinking in terms of the scheduler somehow no longer
referencing the virtual thread, but rather in terms of the program no
longer referencing the scheduler (and also not referencing the virtual
thread).

AFAIU unfinished unmounted virtual threads are referenced from other
objects (the ones they are parked on), so they can't be unreachable
even if the application references neither them nor the scheduler.

There is (or was - there may be a property that affects this:
trackAllThreads?) a scenario where a VT might park on a
synchronization object which is not referenced from any other thread.
The VT can never be unparked, and the sync object and the VT are
reachable only from each other and so both can be GC'd.

That's right, the door is not closed to introducing ephemeral threads in
the future. Right now, virtual threads created directly with the Thread
API remain strongly reachable once started until they terminate.
Virtual threads created in other containers (e.g. a thread-per-task
ExecutorService) are kept reachable by the container.

-Alan

Contributor

plummercj commented Dec 4, 2023

My understanding is that information about references is one of the most important things for dump analysis (and that's what the issue is about). So we cannot avoid stack unwinding for unmounted virtual threads.
As for heap dump file size, each stack trace adds 21 + 53 * frame_number bytes on a 64-bit system (uncompressed data).
So for 10 frames it adds ~550 bytes, for 20 frames ~1.1KB.
I'm not sure if stack traces are important for analysis; maybe it makes sense to add an option to not include them in the heap dump (for both platform and virtual threads).

My concern was with the memory usage of the stack traces. Yes I agree that including all referenced objects in the dump is important. An option just to leave out the stack traces seems like a good idea.

The hprof spec says nothing about a 1:1 relation between threads and stack traces, so theoretically several HPROF_GC_ROOT_THREAD_OBJ subrecords may refer to the same stack trace, but searching for identical stack traces may be expensive.

I was thinking initially that you couldn't do this because each stack has its own unique set of locals that are referenced, and the locals were part of the stack trace, but they are not. There is instead a HPROF_GC_ROOT_JAVA_FRAME record for each local reference. It does include the ThreadID, and we could probably get away with multiple Thread records referring to the same stack trace.

@alexmenkov
Author

My understanding is that information about references is one of the most important things for dump analysis (and that's what the issue is about). So we cannot avoid stack unwinding for unmounted virtual threads.
As for heap dump file size, each stack trace adds 21 + 53 * frame_number bytes on a 64-bit system (uncompressed data).
So for 10 frames it adds ~550 bytes, for 20 frames ~1.1KB.
I'm not sure if stack traces are important for analysis; maybe it makes sense to add an option to not include them in the heap dump (for both platform and virtual threads).

My concern was with the memory usage of the stack traces. Yes I agree that including all referenced objects in the dump is important. An option just to leave out the stack traces seems like a good idea.

VM_HeapDumper caches stack traces for platform/carrier and mounted virtual threads only.
For unmounted virtual threads ThreadDumper objects are created on the stack (see VM_HeapDumper::dump_vthread), so I don't see problems with memory usage even with a huge number of unmounted vthreads.
I think an option to exclude stack traces from heap dump is a separate task.

The hprof spec says nothing about a 1:1 relation between threads and stack traces, so theoretically several HPROF_GC_ROOT_THREAD_OBJ subrecords may refer to the same stack trace, but searching for identical stack traces may be expensive.

I was thinking initially that you couldn't do this because each stack has its own unique set of locals that are referenced, and the locals were part of the stack trace, but they are not. There is instead a HPROF_GC_ROOT_JAVA_FRAME record for each local reference. It does include the ThreadID, and we could probably get away with multiple Thread records referring to the same stack trace.

I think this possible improvement is out of scope for this PR

@plummercj
Contributor

VM_HeapDumper caches stack traces for platform/carrier and mounted virtual threads only.
For unmounted virtual threads ThreadDumper objects are created on the stack (see VM_HeapDumper::dump_vthread), so I don't see problems with memory usage even with a huge number of unmounted vthreads.
I think an option to exclude stack traces from heap dump is a separate task.

I was actually referring to the footprint of the hprof file, not the in-process memory usage while producing it.

My concern with not doing the option to exclude stack traces now is that it could result in some unusable or unmanageably large heap dumps, or tools simply being overwhelmed by the number of threads. For example, I just looked at the VisualVM threads view, and it just produces a scrollable list of all threads. What happens if there are suddenly tens of thousands if not millions of threads? If we are lucky it doesn't choke on them and the platform threads are first in the list, but this is the type of thing I'd like to see tested before pushing this change.

mlbridge bot commented Dec 5, 2023

Mailing list message from Chris Plummer on serviceability-dev:

On 12/4/23 5:20 AM, Alan Bateman wrote:

On 04/12/2023 12:41, David Holmes wrote:

On 1/12/2023 2:08 pm, Alex Menkov wrote:

On Thu, 30 Nov 2023 21:11:08 GMT, Chris Plummer
<cjplummer at openjdk.org> wrote:

I wasn't thinking in terms of the scheduler somehow no longer
referencing the virtual thread, but rather in terms of the program no
longer referencing the scheduler (and also not referencing the virtual
thread).

AFAIU unfinished unmounted virtual threads are referenced from other
objects (the ones they are parked on), so they can't be unreachable
even if the application references neither them nor the scheduler.

There is (or was - there may be a property that affects this:
trackAllThreads?) a scenario where a VT might park on a
synchronization object which is not referenced from any other thread.
The VT can never be unparked, and the sync object and the VT are
reachable only from each other and so both can be GC'd.

That's right, the door is not closed to introducing ephemeral threads
in the future. Right now, virtual threads created directly with the
Thread API remain strongly reachable once started until they
terminate. Virtual threads created in other containers (e.g. a
thread-per-task ExecutorService) are kept reachable by the container.

-Alan

So does this mean if the application is no longer referencing the
ExecutorService, then we can have unreachable virtual threads that have
not completed? This is really the point I've been getting at.

Chris

mlbridge bot commented Dec 5, 2023

Mailing list message from Alan Bateman on serviceability-dev:

On 04/12/2023 18:59, Chris Plummer wrote:

So does this mean if the application is no longer referencing the
ExecutorService, then we can have unreachable virtual threads that
have not completed? This is really the point I've been getting at.

Yes, this is possible. It would be an unusual scenario of course.

-Alan

Author

alexmenkov commented Dec 7, 2023

I was actually referring to the footprint of the hprof file, not the in-process memory usage while producing it.

My concern with not doing the option to exclude stack traces now is that it could result in some unusable or unmanageably large heap dumps, or tools simply being overwhelmed by the number of threads. For example, I just looked at the VisualVM threads view, and it just produces a scrollable list of all threads. What happens if there are suddenly tens of thousands if not millions of threads? If we are lucky it doesn't choke on them and the platform threads are first in the list, but this is the type of thing I'd like to see tested before pushing this change.

Heap dumps are usually big. I think stack traces would not add much (compared to the size of the heap dump itself).
Also, the heap dumper supports compression. It works perfectly fine for identical stack traces.
I did some experiments with VisualVM and Eclipse MAT using a heap dump which contains 5K virtual threads.
VisualVM has a bug which causes a failure when populating the thread list for virtual threads (I filed a bug for it). I fixed the bug locally and VisualVM was able to generate the list.
VisualVM is not ready to work with a large number of threads: it generates the whole list of threads with stack traces and locals before showing it (as table rows or as HTML), and the generation takes a long time. I'd say this is a VisualVM UI issue. I generated a heap dump without stack traces, and it doesn't help much.
Eclipse MAT handles 5K vthreads with no problem (no noticeable lags with or without stack traces).

So in my opinion the option to exclude stack traces doesn't make much difference. Tools should be ready to handle a large number of threads.

@alexmenkov
Author

/integrate

openjdk bot commented Dec 7, 2023

Going to push as commit 354ea4c.
Since your change was applied there have been 373 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated (Pull request has been integrated) label Dec 7, 2023
@openjdk openjdk bot closed this Dec 7, 2023
@openjdk openjdk bot removed the ready (Pull request is ready to be integrated) and rfr (Pull request is ready for review) labels Dec 7, 2023
openjdk bot commented Dec 7, 2023

@alexmenkov Pushed as commit 354ea4c.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
