runtime: diagnostics improvements tracking issue #57175
2022-12-07 Sync. Attendees: @mknyszek @aclements @prattmic @felixge @nsrip-dd @rhysh
I'll miss the Dec 22nd meetup because I'm traveling for the holidays. That being said, if I find time I might also look into #57159. Getting a proof of concept for Perfetto UI integration (ideally using their protocol buffer format) is probably more important than the gentraceback refactoring at this point. I just tried to work with a 300 MB trace (15 s of prod activity) yesterday, and it was a real eye opener to how much the current UI struggles.
I don't know if it's relevant (probably nothing new for the folks on this thread), but I had similar problems with the
Would the pprof labels also show up in goroutine traces?
I'm working on a PoC that improves native stack unwinding on Windows by adding additional information to the PE file. This will help debugging with WinDbg and profiling with Windows Performance Analyzer. Would this work fit into the effort tracked by this issue?
@thediveo I think that might be a good question for #56295, or you could file another issue. Off the top of my head, that doesn't sound like it would be too difficult to do.

@qmuntal Oh neat! That's awesome. I think it's a little tangential to the work we're proposing here, unless you also plan to do anything with the runtime's unwinder (i.e.
I still have to prepare the proposal; I plan to submit it next week.

Not for now, but once I finish this I want to investigate how feasible it is to unwind native code and merge it with the Go unwinding, in case the exception happens in a non-Go module.
I do now: #57302 😄
Change https://go.dev/cl/459095 mentions this issue:
2022-12-22 Sync. Attendees: @mknyszek @aclements @prattmic @bboreham @rhysh @dominikh

2023-01-05 Sync. Attendees: @aclements @felixge @nsrip-dd @rhysh @bboreham vnedkov @dashpole

2023-01-19 Sync. Attendees: @aclements @felixge @nsrip-dd @rhysh @bboreham @mknyszek @prattmic @dominikh @dashpole

2023-02-02 Sync. Attendees: @aclements @felixge @nsrip-dd @thepudds @bboreham @dashpole @mknyszek @prattmic
FYI: #57302 is hitting this as well, since I'm implementing SEH unwinding using the frame pointer. Whatever the fix for that turns out to be, it would be good to take SEH into account as well.
This change does a lot at once, but it's mostly refactoring. First, it moves most of the profile abstraction out of benchmarks/internal/driver and into a new shared package called diagnostics. It also renames profiles to diagnostics to better capture the breadth of what this mechanism collects. Then, it adds support for turning on diagnostics from configuration files. Next, it adds support for generating additional configurations to capture the overhead of collecting diagnostics, starting with CPU profiling. Lastly, it adds support for the new Trace diagnostic. (This change also fixes a bug in go-build where Linux perf flags weren't being propagated.) In the future, core dumps could easily be folded into this new diagnostics abstraction.

For golang/go#57175.

Change-Id: I999773e8be28c46fb5d4f6a79a94d542491e3754
Reviewed-on: https://go-review.googlesource.com/c/benchmarks/+/459095
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-02-16 Sync. Attendees: @mknyszek @aclements @felixge @nsrip-dd @prattmic @dominikh @thepudds @pmbauer @dashpole @rhysh

2023-03-02 Sync. Attendees: @mknyszek @prattmic @felixge @nsrip-dd @aclements @thepudds @rhysh @bboreham
Does this include the current state of all (relevant) goroutines? The current parser is essentially a state machine and we need to see all previous events to reconstruct a global timeline. I don't see that going away with the new format.
I'd encourage you to take a look at https://github.com/dominikh/gotraceui/blob/04107aeaa72e30c50bb6d10e9f2b6ca384fafc3d/trace/parser.go#L18-L77 for the data layout I've chosen in gotraceui. It's nothing groundbreaking, but it highlights the need to avoid the use of pointers.
It does not. It only cares about the initial state of all Ms (including goroutines running on them), and generally only mentions goroutines that actually emit events. For goroutines that aren't running, there are only two cases where we actually care about the initial state of a goroutine: whether it was blocked, or whether it was waiting. In both cases it's straightforward to infer the state of the goroutine from the events that must happen to transition goroutines out of these states: unblocking and starting to run.

The trace still needs to indicate whether a goroutine (and M) is in a syscall or running. In the new design, this information is emitted at the first call into the tracer by that M for that partition, and the timestamp needs to be back-dated to the start of the partition. There's some imprecision with this back-dating, but it's only relevant at the very start of a trace. The worst case is that a goroutine may appear to have been running or in a syscall at the start of a trace for longer than it actually was. The amount of imprecision here is bounded by the time delta between the global (serialized) declaration of a new partition and when an M has its buffer flushed and/or is notified (via an atomic) that tracing has started, which I expect in general to be very short and non-blocking. (We can also explicitly bound the time by telling the M what time it was contacted for a new partition.)

Note that the details above imply that when a new partition starts, a running M may have been in a tight loop and so hasn't emitted any events for the last partition, in which case we need to preempt it to have it dump its initial state. Generally, though, moving partitions forward doesn't have to involve preemption at all.
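To make that inference concrete, here is a minimal, hypothetical sketch of the consumer-side logic described above. The event kinds and types are invented for illustration and are not the actual trace encoding: the first event observed for a goroutine in a partition tells you what state it must have been in at the partition's start.

```go
package main

import "fmt"

// Hypothetical event kinds; the real trace format differs.
type EventKind int

const (
	EvGoUnblock EventKind = iota // goroutine transitions out of "blocked"
	EvGoStart                    // goroutine transitions out of "waiting" and starts running
)

type Event struct {
	Kind EventKind
	Goro uint64
}

// inferInitialStates derives the state each goroutine must have been in at the
// start of a partition from the first event observed for it: an unblock implies
// it was blocked; a start implies it was waiting (runnable).
func inferInitialStates(events []Event) map[uint64]string {
	initial := make(map[uint64]string)
	for _, ev := range events {
		if _, seen := initial[ev.Goro]; seen {
			continue // only the first event per goroutine matters here
		}
		switch ev.Kind {
		case EvGoUnblock:
			initial[ev.Goro] = "blocked since partition start"
		case EvGoStart:
			initial[ev.Goro] = "waiting since partition start"
		}
	}
	return initial
}

func main() {
	events := []Event{{EvGoUnblock, 7}, {EvGoStart, 7}, {EvGoStart, 12}}
	fmt.Println(inferInitialStates(events))
}
```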
That seems useful for the current trace format, thanks. For the new format, I don't expect to expand the trace events out of their encoded form at all, but rather to decode them lazily (either copying them out wholesale or just pointing into the encoded trace data in the input buffer, both of which are cheap from the perspective of the GC).
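As a rough illustration of the lazy-decoding idea (a hypothetical layout, not the actual encoding): an event can be just an offset into the shared encoded buffer, with arguments varint-decoded only when an accessor is called.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Event is a lazy view into an encoded trace buffer: it holds only the backing
// slice and an offset, so constructing one allocates nothing and adds no GC
// scanning work beyond the buffer itself.
type Event struct {
	buf []byte
	off int
}

// Arg decodes the i-th varint argument on demand instead of eagerly expanding
// every event into a struct.
func (e Event) Arg(i int) uint64 {
	off := e.off
	for ; ; i-- {
		v, n := binary.Uvarint(e.buf[off:])
		if i == 0 {
			return v
		}
		off += n
	}
}

func main() {
	// Two varint-encoded arguments: 300 and 7.
	buf := binary.AppendUvarint(nil, 300)
	buf = binary.AppendUvarint(buf, 7)
	ev := Event{buf: buf}
	fmt.Println(ev.Arg(0), ev.Arg(1)) // 300 7
}
```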
That has two implications, however:
I realize that with self-contained partitions it isn't feasible to include the state of all goroutines in all partitions, but maybe it should optionally be possible to dump the complete state in the first partition, for users who want a complete view? However, that wouldn't really fit into an M-centric format…
I feel like the current parser + its types and the new approach you describe are at two different layers of abstraction. The current parser isn't exposing raw events. Instead, it is doing a fair bit of processing of arguments, and it populates
Both of those things are good points. Dumping the state of the world at the start is one option, but I'm also reluctant to do anything around this because it adds a lot of overhead. Interrogating every goroutine can take a while, and the world needs to be effectively stopped while it happens (or the synchronization will get really complicated). At the end of the day, my gut feeling is that the execution trace should focus solely on what's necessary for tracing execution, not what could execute.

However, I can definitely see that getting the information you describe has utility, and we don't want to lose that. In the last meeting we discussed how goroutine profiles could be used to fill this gap. As a baseline, it should be fairly straightforward to correlate a goroutine profile's STW timestamp with a STW event in the trace. Taking that one step further, we could explicitly mention in the trace that the STW was for a goroutine profile. (In theory we could also dump the goroutine profile into the trace, like we do with CPU samples. I'm not opposed to this, but I probably wouldn't do it to start with.) You should be able to get a close approximation of the current behavior by starting a trace and then immediately grabbing a goroutine profile. Does that sound reasonable? Perhaps there's some use-case I'm missing entirely.

FTR, I fully recognize that we're losing something here in the trace, but I'd argue the net benefit is worth that cost. Also, I just want to disclaim the design details in the last paragraph: they're subject to change in the first document draft. :) That's just where my head's at right now. It may turn out that the per-M synchronization I have in mind is too complex.
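For reference, a minimal sketch of the "start a trace, then immediately grab a goroutine profile" workaround using the existing runtime/trace and runtime/pprof APIs (file names are illustrative):

```go
package main

import (
	"os"
	"runtime/pprof"
	"runtime/trace"
)

func main() {
	// Start the execution trace first.
	tf, err := os.Create("trace.out")
	if err != nil {
		panic(err)
	}
	defer tf.Close()
	if err := trace.Start(tf); err != nil {
		panic(err)
	}
	defer trace.Stop()

	// Immediately capture a goroutine profile; the stop-the-world it involves
	// can then be correlated with the STW event near the start of the trace.
	gf, err := os.Create("goroutines.out")
	if err != nil {
		panic(err)
	}
	defer gf.Close()
	if err := pprof.Lookup("goroutine").WriteTo(gf, 0); err != nil {
		panic(err)
	}

	// ... the rest of the program runs under tracing ...
}
```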
I think it works fine if, as I mentioned above, we're willing to give a little bit of leeway. Maybe you don't have a snapshot of the state of all goroutines at the moment the trace starts, but you have one from very soon after the trace starts, which is probably good enough?
That's another good point. To be clear, I do plan to have an API with some level of abstraction, and not quite just []byte-to-type. :) Events will be opaque and fields will be accessed through methods, so we have a lot of wiggle room. However, something like the

My general hope and expectation is that the vast majority of users should never have to look at the API at all, and instead rely on tools built with it. And those that do use the API don't need to understand the file format, just the execution model it presents (which I think is somewhat unavoidable).
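A hypothetical sketch of what such an opaque, method-based event API could look like (all names here are invented for illustration; the eventual parsing API may look quite different):

```go
// Package traceapi is a sketch of an opaque event API: callers never see the
// wire encoding, only accessor methods, which leaves the underlying trace
// format free to change without breaking consumers.
package traceapi

import "time"

// Kind classifies an event; the concrete set of kinds is illustrative only.
type Kind int

const (
	KindGoCreate Kind = iota
	KindGoStart
	KindGoBlock
)

// Event deliberately exports no fields.
type Event struct {
	k  Kind
	at time.Duration // offset from trace start
	g  uint64
}

func (e Event) Kind() Kind          { return e.k }
func (e Event) Time() time.Duration { return e.at }
func (e Event) Goroutine() uint64   { return e.g }
```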
I think not having to STW and enumerate all goroutines was one of the design goals, as it didn't scale well. I take it the ragged barrier approach didn't pan out?
One use case for looking at execution traces as they are now is debugging synchronization issues. Imagine having an N:M producer/consumer model using goroutines and channels, and we're debugging why producers are blocking. The reason might be that all of the consumers are stuck, which is only evident if we can see them be stuck. If they're already stuck at the beginning of the trace, they would be invisible in the new implementation. More generally, a lot of users aren't interested in the per-P or per-M views and instead want to see what each goroutine is doing (see also the per-goroutine timelines in gotraceui). It turns out that per-G views are useful for debugging correctness and performance issues in user code, and that traces aren't only useful for debugging the runtime.
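A contrived example of the kind of program being described, where producers appear blocked only because every consumer is stuck (hypothetical code, purely to illustrate why seeing already-blocked goroutines in a trace matters):

```go
package main

import (
	"sync"
	"time"
)

func main() {
	work := make(chan int) // unbuffered: a send blocks until some consumer receives
	var hang sync.Mutex
	hang.Lock()

	// Consumers: each one blocks on an unrelated lock before ever receiving,
	// so after this point they emit no further scheduling events.
	for i := 0; i < 4; i++ {
		go func() {
			hang.Lock() // stuck forever
			for range work {
			}
		}()
	}

	// Producers: in a trace these show up as blocked on a channel send, but the
	// root cause (the stuck consumers) is only visible if goroutines that were
	// already blocked before tracing started can still be seen.
	for i := 0; i < 4; i++ {
		go func(v int) {
			work <- v // blocks forever: no consumer ever receives
		}(i)
	}

	time.Sleep(time.Second) // stand-in for "collect an execution trace here"
}
```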
In theory that sounds fine, assuming goroutine profiles are proper STW snapshots? Otherwise it would probably be difficult to synchronize the trace and the profile. At least this would give people the choice of whether they want to tolerate a STW for more detailed traces.
Probably, yeah.
It's not quite that it didn't pan out, and more that it just doesn't work with a per-M approach given other design constraints. The ragged barrier I mentioned in an earlier design sketch is the

A per-M approach can side-step a lot of that complexity, but it means we need a way to synchronize all Ms that doesn't involve waiting until the M gets back into the scheduler. What I wrote above is a rough sketch of a proposed lightweight synchronization mechanism that most of the time doesn't require preemption. I think that in general we can't require preemption in a per-M approach if we want to be able to simplify the no-P edge cases and also get events out of e.g.

(In effect, I am proposing to shift the
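To make the "lightweight synchronization that usually avoids preemption" idea concrete, here is a generic, hypothetical sketch (not runtime code, and the names are invented): each writer checks an atomic epoch when it emits an event and flushes its buffer when it notices the epoch has advanced, so only writers that never emit events would need to be interrupted.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// epoch is the current trace partition, advanced by a coordinator.
var epoch atomic.Uint64

// writer stands in for a per-M trace buffer.
type writer struct {
	seenEpoch uint64
	buf       []string
}

// emit appends an event, first checking whether a new partition has started.
// If it has, the writer flushes its buffer for the old partition and adopts the
// new one; no preemption is needed because the check happens on the writer's
// own next event.
func (w *writer) emit(ev string) {
	if e := epoch.Load(); e != w.seenEpoch {
		fmt.Printf("flushing %d events for epoch %d, starting epoch %d\n", len(w.buf), w.seenEpoch, e)
		w.buf = w.buf[:0]
		w.seenEpoch = e
	}
	w.buf = append(w.buf, ev)
}

func main() {
	w := &writer{}
	w.emit("goroutine start")
	epoch.Add(1)               // coordinator declares a new partition
	w.emit("goroutine block")  // writer notices the new epoch on its next event
}
```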
An aside that might steer you closer to a per-M approach: I tried adding per-M timelines to gotraceui using the current format and found it impossible due to the current event sorting logic. I ran into scenarios where a P would start on an M while the M was still blocked in a syscall.
2023-12-07 Sync. Attendees: @rhysh @mknyszek @prattmic @bboreham @nsrip-dd @felixge

2024-01-04 Sync. Attendees: @nsrip-dd @felixge @bboreham @prattmic @dominikh @rhysh @thepudds @mknyszek, Daniel Schwartz-Narbonne

2024-01-18 Sync. Attendees: @prattmic @mknyszek @rhysh @bboreham @felixge @nsrip-dd @thepudds
I don't have ideas regarding the overall design yet. However, since there'll be no way around reading old traces entirely into memory to sort them, I'd like to suggest using the trace parser of Gotraceui. It started out as a copy of

It's at https://github.com/dominikh/gotraceui/tree/8fbc7cfaeb3cebed8890efbc030a62a7f1ff3f81/trace. Since it started as a direct copy of Go's parser, the git log for the folder should show all relevant changes, and it includes some benchmarks.

There are some changes that might have to be reverted if we want to support very old traces. My parser dropped support for Go 1.10 and older, and it doesn't handle EvFutileWakeup, which hasn't been a thing since Nov 2015. The only change that might not be great is 2c5675443eebc969ee07cd4f2063c2d7476f7b9b, which removes support for trace formats older than Go 1.11. That change could be reverted if need be, but traces that old wouldn't benefit from my improvements to batch merging.

Edit: I've nearly completed an implementation of the conversion.
2024-02-01 Sync. Attendees: @prattmic @felixge @dominikh @nsrip-dd @rhysh @bboreham @mknyszek

2024-02-15 Sync. Attendees: @nsrip-dd @rhysh @mknyszek @bboreham @felixge @prattmic @dashpole
2024-02-29 Sync. Attendees: @mknyszek @prattmic @bboreham @rhysh @nsrip-dd @dashpole, Arun (from DataDog)

2024-03-14 Sync. Attendees: @mknyszek @prattmic @felixge @dashpole @nsrip-dd @rhysh

2024-03-28 Sync. Attendees: @felixge @mknyszek @prattmic @nsrip-dd @rhysh @thepudds

2024-04-11 Sync. Attendees: @nsrip-dd @rhysh @bboreham @cagedmantis @dominikh @felixge @mknyszek

2024-04-25 Sync. Attendees: @mknyszek @felixge @dashpole @rhysh @prattmic @nsrip-dd

2024-05-09 Sync. Attendees: @prattmic @rhysh, Jon B (from DataDog), @nsrip-dd @bboreham @cagedmantis

2024-05-23 Sync. Attendees: @jba, Jon B (from DataDog), @prattmic @rhysh @felixge @nsrip-dd @mknyszek, Milind (from Uber)

2024-06-06 Sync. Attendees: @rhysh @felixge @nsrip-dd @bboreham @cagedmantis, Milind (from DataDog), @dashpole @aclements

2024-06-20 Sync. Attendees: @rhysh @bboreham @cagedmantis @dominikh @mknyszek @prattmic

2024-07-18 Sync. Attendees: @mknyszek @felixge @rhysh @nsrip-dd @bboreham @cagedmantis

2024-08-01 Sync. Attendees: @mknyszek @prattmic @bboreham @felixge @nsrip-dd @rhysh, Milind (from Uber)

2024-08-15 Sync. Attendees: @mknyszek @rhysh @felixge @nsrip-dd @cagedmantis

2024-08-29 Sync. Attendees: @prattmic @bboreham @felixge @nsrip-dd

2024-09-12 Sync. Attendees: @mknyszek @prattmic @felixge @nsrip-dd @chabbimilind

2024-09-26 Sync. Attendees: @mknyszek @prattmic @cagedmantis @nsrip-dd @rhysh @thepudds @bboreham @chabbimilind
As the Go user base grows, more and more Go developers are seeking to understand the performance of their programs and reduce resource costs. However, they are locked into the relatively limited diagnostic tools we provide today. Some teams build their own tools, but right now that requires a large investment. This issue extends to the Go team as well, where we often put significant effort into ad-hoc performance tooling to analyze the performance of Go itself.
This issue is a tracking issue for improving the state of Go runtime diagnostics and its tooling, focusing primarily on runtime/trace traces and heap analysis tooling.

To do this work, we the Go team are collaborating with @felixge and @nsrip-dd, with input from others in the Go community. We currently have a virtual sync every 2 weeks (starting 2022-12-07), Thursdays at 11 AM NYC time. Please ping me at mknyszek -- at -- golang.org for an invite if you're interested in attending. This issue will be updated regularly with meeting notes from those meetings.

Below is what we currently plan to work on and explore, organized by broader effort and roughly prioritized. Note that this will almost certainly change as work progresses, and more may be added.
Runtime tracing
- Tracing usability
- Tracing performance
- gentraceback refactoring (runtime: rewrite gentraceback as an iterator API #54466) (CC @felixge, @aclements)

Heap analysis (see #57447)
- Update viewcore's internal core file libraries (gocore and core) to work with Go at tip.
- Ensure gocore and core are well-tested, and tested at tip.
- Clean up and document the gocore and core externally-visible APIs, allowing Go developers to build on top of it.

CC @aclements @prattmic @felixge @nsrip-dd @rhysh @dominikh