
Timeline visualization tool #1022

Merged
merged 6 commits into mmtk:master from feature/timeline-tool
Jan 9, 2024

Conversation

@wks (Collaborator) commented Nov 13, 2023

This PR adds eBPF-based scripts for recording the start and end times of work packets and formatting the log for visualization with Perfetto UI.

Co-authored-by: Claire Huang claire.x.huang@gmail.com
Co-authored-by: Zixian Cai u5937495@anu.edu.au
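
For context, the bpftrace scripts attach to USDT tracepoints fired from the Rust side via the probe! macro (mentioned later in this conversation). Below is a minimal sketch of the idea; the provider name, probe names, argument layout, and surrounding function are illustrative assumptions, not the PR's actual probes.

```rust
// Illustrative sketch only: provider/probe names and argument layout are assumptions.
use probe::probe;

fn run_packet<W>(work: &mut W, execute: impl FnOnce(&mut W)) {
    // Pass the type name's address and length so a bpftrace script can decode
    // the packet name; the probe's timestamp marks the start/end of the packet.
    let name = std::any::type_name::<W>();
    probe!(mmtk, work_packet_begin, name.as_ptr() as usize, name.len());
    execute(work);
    probe!(mmtk, work_packet_end, name.as_ptr() as usize, name.len());
}
```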

@wks wks marked this pull request as ready for review January 8, 2024 08:52
@wks wks requested review from caizixian and qinsoon January 8, 2024 08:52
@wks (Collaborator, Author) commented Jan 8, 2024

There is a known issue: the scripts sometimes fail to capture the work packet names when running the OpenJDK binding. The bpftrace script reads all zero bytes from the memory location that holds the type names. However, if the packet name strings are first read by the Rust program (not the bpftrace script), the bpftrace script is then able to see them. According to the Rust documentation, the current implementation of std::any::type_name::<T>() reads the type name from debug information. I think it may be related to the exact timing of when such type info is mmap-ed into the address space of the process: if it is mmap-ed too late, bpftrace will not be able to see it.

@caizixian (Member)

@clairexhuang might also be interested

@caizixian (Member) commented Jan 8, 2024

There is a known issue: the scripts sometimes fail to capture the work packet names when running the OpenJDK binding. The bpftrace script reads all zero bytes from the memory location that holds the type names. However, if the packet name strings are first read by the Rust program (not the bpftrace script), the bpftrace script is then able to see them. According to the Rust documentation, the current implementation of std::any::type_name::<T>() reads the type name from debug information. I think it may be related to the exact timing of when such type info is mmap-ed into the address space of the process: if it is mmap-ed too late, bpftrace will not be able to see it.

when such type info is mmap-ed into the address space of the process

Yes, it could be a race condition of on-demand paging vs the bpf program reading that address.

I remember you had a workaround where, in probe!, you read the string to force the mapping. Does the workaround not work? And is this why you have @gc_count >= 2, to make sure that the type names are mapped? (That wouldn't be a reliable workaround, because, for example, a nursery GC might have different kinds of work packets than a full GC.)

Two more potential workarounds:

  • Find out the section where the static strings live, and pretouch that section at boot time.
  • In the bpftrace code, check whether the str is empty; if so, don't set @decoded, and we can try again the next time the packet is executed. This is on the slow path, so it shouldn't add much overhead.

I also suggest that when we merge, we add Co-authored-by: since this originally came from Claire's MPLR paper.

@caizixian (Member)

Also, should we consider adding a mode where we visualize the relationships between work packets as well (which packet spawns which)? If collecting work packet IDs has too much overhead, we can always make it optional at compile time.

I remember @wks spent a lot of time getting that to work and figuring out the correct JSON output for the arrows. It would be a shame if we lost it.

@caizixian (Member)

Overall, the cleanup of the scripts looks good.

@wks (Collaborator, Author) commented Jan 9, 2024

There is a known issue: the scripts sometimes fail to capture the work packet names when running the OpenJDK binding. The bpftrace script reads all zero bytes from the memory location that holds the type names. However, if the packet name strings are first read by the Rust program (not the bpftrace script), the bpftrace script is then able to see them. According to the Rust documentation, the current implementation of std::any::type_name::<T>() reads the type name from debug information. I think it may be related to the exact timing of when such type info is mmap-ed into the address space of the process: if it is mmap-ed too late, bpftrace will not be able to see it.

when such type info is mmap-ed into the address space of the process

Yes, it could be a race condition of on-demand mapping vs the bpf program reading that address.

I remember you had a workaround where, in probe!, you read the string to force the mapping. Does the workaround not work?

It works, but it adds unnecessary overhead to mmtk-core. I have added this workaround and guarded it behind a Cargo feature, just in case somebody needs it.
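
Roughly, the feature-gated workaround amounts to the sketch below; the feature name and helper function are hypothetical, not the actual code added in this PR.

```rust
// Hypothetical sketch: force-read the type-name string before the probe fires,
// so the page holding it is mapped in and bpftrace can read it. The feature
// name "force_read_type_name" is illustrative only.
#[cfg(feature = "force_read_type_name")]
fn touch_type_name(name: &str) {
    // Summing the bytes touches every page the string spans; black_box keeps
    // the read from being optimized away.
    let sum: u64 = name.as_bytes().iter().map(|&b| u64::from(b)).sum();
    std::hint::black_box(sum);
}
```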

And is this why you have @gc_count >= 2, to make sure that the type names are mapped? (That wouldn't be a reliable workaround, because, for example, a nursery GC might have different kinds of work packets than a full GC.)

Yes, that was my intention, but it didn't work, and I forgot to clean it up. I'll remove the @gc_count >= 2 expression.

Two more potential workarounds:

* Find out the section where the static strings live, and pretouch that section at boot time.

Here is a log:

@type_name[140272533677491]: mmtk::scheduler::gc_work::PrepareCollector
@type_name[140272533679341]: mmtk::scheduler::gc_work::VMProcessWeakRefs<mmtk::scheduler::gc
@type_name[140272533680046]: mmtk::scheduler::gc_work::ScheduleCollection
@type_name[140272533680090]: mmtk::scheduler::gc_work::ScanVMSpecificRoots<mmtk::plan::immix
@type_name[140272533682789]: mmtk::scheduler::gc_work::Prepare<mmtk::plan::immix::gc_work::I
@type_name[140272533683189]: mmtk::scheduler::gc_work::StopMutators<mmtk::plan::immix::gc_wo
@type_name[140272533689536]: mmtk::scheduler::gc_work::VMPostForwarding<mmtk_openjdk::OpenJD
@type_name[140272533691961]: mmtk::scheduler::gc_work::ReleaseMutator<mmtk_openjdk::OpenJDK<
@type_name[140272533693082]: mmtk::scheduler::gc_work::ReleaseCollector
@type_name[140272533693471]: mmtk::scheduler::gc_work::ScanMutatorRoots<mmtk::plan::immix::g
@type_name[140272533693893]: mmtk::scheduler::gc_work::VMProcessWeakRefs<mmtk::scheduler::gc
@type_name[140272533695337]: mmtk::scheduler::gc_work::Release<mmtk::plan::immix::gc_work::I
@type_name[140272533697112]: mmtk::scheduler::gc_work::Release<mmtk::plan::immix::gc_work::I
@type_name[140272533697621]: mmtk::scheduler::gc_work::ScanMutatorRoots<mmtk::plan::immix::g
@type_name[140272533699641]: mmtk::scheduler::gc_work::PlanProcessEdges<mmtk_openjdk::OpenJD
@type_name[140272533700441]: mmtk::scheduler::gc_work::StopMutators<mmtk::plan::immix::gc_wo
@type_name[140272533701952]: mmtk::scheduler::gc_work::ScanVMSpecificRoots<mmtk::plan::immix
@type_name[140272533704295]: mmtk::scheduler::gc_work::PlanProcessEdges<mmtk_openjdk::OpenJD
@type_name[140272533704985]: mmtk::scheduler::gc_work::Prepare<mmtk::plan::immix::gc_work::I
@type_name[140272533725749]: mmtk::policy::immix::immixspace::PrepareBlockState<mmtk_openjdk
@type_name[140272533725901]: mmtk::policy::immix::immixspace::SweepChunk<mmtk_openjdk::OpenJ
@type_name[140272533728339]: mmtk_openjdk::gc_work::ScanClassLoaderDataGraphRoots<mmtk_openj
@type_name[140272533743174]: mmtk_openjdk::gc_work::ScanStringTableRoots<mmtk_openjdk::OpenJ
@type_name[140272533751448]: mmtk_openjdk::gc_work::ScanStringTableRoots<mmtk_openjdk::OpenJ
@type_name[140272533766535]: 
@type_name[140272533766983]: 
@type_name[140272533767830]: 
@type_name[140272533769535]: 
@type_name[140272533769973]: 
@type_name[140272533776336]: 
@type_name[140272533794799]: 
@type_name[140272533819627]: 
@type_name[140272533836434]: mmtk_openjdk::gc_work::ScanVMThreadRoots<mmtk_openjdk::OpenJDK<
@type_name[140272533845875]: mmtk_openjdk::gc_work::ScanVMThreadRoots<mmtk_openjdk::OpenJDK<
@type_name[140272533847104]: mmtk_openjdk::gc_work::ScanUniverseRoots<mmtk_openjdk::OpenJDK<
@type_name[140272533851788]: mmtk_openjdk::gc_work::ScanCodeCacheRoots<true, mmtk::scheduler
@type_name[140272533852204]: mmtk_openjdk::gc_work::ScanObjectSynchronizerRoots<mmtk_openjdk
@type_name[140272533863388]: mmtk_openjdk::gc_work::ScanCodeCacheRoots<true, mmtk::scheduler
@type_name[140272533865153]: mmtk_openjdk::gc_work::ScanSystemDictionaryRoots<mmtk_openjdk::
@type_name[140272533867703]: mmtk_openjdk::gc_work::ScanJNIHandlesRoots<mmtk_openjdk::OpenJD
@type_name[140272533869470]: mmtk_openjdk::gc_work::ScanWeakProcessorRoots<mmtk_openjdk::Ope
@type_name[140272533873490]: mmtk_openjdk::gc_work::ScanWeakProcessorRoots<mmtk_openjdk::Ope
@type_name[140272533875252]: mmtk_openjdk::gc_work::ScanAOTLoaderRoots<mmtk_openjdk::OpenJDK
@type_name[140272533878628]: mmtk_openjdk::gc_work::ScanJvmtiExportRoots<mmtk_openjdk::OpenJ
@type_name[140272533879069]: mmtk_openjdk::gc_work::ScanJNIHandlesRoots<mmtk_openjdk::OpenJD
@type_name[140272533885083]: mmtk::util::finalizable_processor::Finalization<mmtk::scheduler
@type_name[140272533887515]: mmtk::util::finalizable_processor::Finalization<mmtk::scheduler

The missing names are all OpenJDK-specific packets (I compared against a log from an mmtk-core build with the workaround applied), and their addresses are part of the contiguous region that contains the other strings for mmtk_openjdk::... packets. They span an almost 64KB region, so I don't think the lazy mapping happens at section granularity.

* In the bpftrace code, check whether the `str` is empty; if so, don't set `@decoded`, and we can try again the next time the packet is executed. This is on the slow path, so it shouldn't add much overhead.

This workaround doesn't work. In one benchmark, some work packet names were read 382-398 times and still came back as "\0", by which time the benchmark had ended.

I also suggest that when we merge, we add Co-authored-by: since this originally came from Claire's MPLR paper.

Yes. I'll do it.

@wks (Collaborator, Author) commented Jan 9, 2024

Also, should we consider adding a mode where we visualize the relationships between work packets as well (which packet spawns which)? If collecting work packet IDs has too much overhead, we can always make it optional at compile time.

I remember @wks spent a lot of time getting that to work and figuring out the correct JSON output for the arrows. It would be a shame if we lost it.

Yes. I implemented that. However, to achieve that, I made non-trivial changes to the scheduler and work buckets so that every work packet is assigned a unique ID when it is added to a bucket. The overhead will be significant. But I think it is still useful, even if we have to guard it behind a Cargo feature. I'll tidy up that code and make another PR for it.
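
For the record, the core of that change is just handing each packet a unique ID as it enters a bucket, so the begin/end/spawn probes can refer to packets by ID and the post-processing script can draw the arrows. A minimal sketch with hypothetical names (not the scheduler's real types):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical names for illustration; the real scheduler and bucket types differ.
static NEXT_PACKET_ID: AtomicU64 = AtomicU64::new(1);

struct TrackedPacket<W> {
    id: u64,
    work: W,
}

fn add_to_bucket<W>(work: W, bucket: &mut Vec<TrackedPacket<W>>) -> u64 {
    // Assign the ID at insertion time so parent/child relationships can be
    // recorded when one packet spawns another.
    let id = NEXT_PACKET_ID.fetch_add(1, Ordering::Relaxed);
    bucket.push(TrackedPacket { id, work });
    id
}
```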

@caizixian (Member)

This workaround doesn't work. In one benchmark, some work packet names were read 382-398 times and still came back as "\0", by which time the benchmark had ended.

Ah, OK. In that case, I agree with the current workaround as of 7e9b1c0.

But I think it is still useful, even if we have to guard it behind a Cargo feature. I'll tidy up that code and make another PR for it.

Sounds good.

@wks wks added this pull request to the merge queue Jan 9, 2024
Merged via the queue into mmtk:master with commit b66fa35 Jan 9, 2024
19 checks passed
@wks wks deleted the feature/timeline-tool branch January 9, 2024 11:44