runtime/trace: Add traceEvCPUProfileRate Event #60701
This will improve our ability to analyze OS scheduling latency. See golang/go#60701 for a better approach in the long run. For PROF-7728
OS scheduler latency is frustrating, and I agree that better tools for tracking it down would be nice.

First, on the problem of "how to detect OS scheduler latency". When I've seen it in execution traces (from quota exhaustion), it appears as if a goroutine is running on a P with no interruptions for 50+ milliseconds. (But maybe it shows up in ways that I'm not currently able to see? Can you say more on how it appears in apps you support?) Historically speaking, and IIUC, the sysmon goroutine has been written to interrupt a G after it's been on a P for a full 10-millisecond time quantum of its own. A G might use the remainder of another goroutine's time quantum "for free" and then get a full time slice of its own, so in my experience CPU-bound goroutines often run for 9.99 + 10 = 19.99 ms in a row. It's possible this behavior has changed recently. But if not, that may be a somewhat reliable way to detect latency from the OS scheduler (though not guaranteed to be stable, including for operators who might tweak the sysmon settings).

Second, on another hurdle to using CPU profile events for this purpose: for your use case of knowing whether there's OS scheduler latency, if the P+M hasn't taken an exclusive lock on a G (by putting it in the _Grunnable state), then maybe it doesn't matter much if some particular M isn't scheduled by the OS?

Third and fourth, what's on my mind is that the time a P spends in _Prunning and the total value of the CPU profile samples its M collects can diverge (on small scales) for other reasons too.

I'm not sure how much coordination can be expected between the code that asks for a CPU profile and the code that asks for an execution trace. It sounds like you're preparing for there to be little to none. I wonder if there's a path forward that involves accessing more runtime internals via the runtime/metrics package. That wouldn't allow the execution-trace-collecting code to be notified of changes to the CPU profiler config, but it could allow it to ask about things like "is the sysmon interval still 10ms". And then identify OS scheduler latency by looking for goroutines that appear to run for more than 20ms. As I understand it, Go's 10ms scheduling quanta are small relative to the OS's scheduling quanta (which for the case of CPU quota might be 100ms or 250ms), and that could be an advantage in detection.
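To make the >20ms heuristic concrete, here is a minimal sketch of how a tool might flag suspicious run intervals. It assumes the per-goroutine _Grunning intervals have already been extracted from a parsed execution trace; the runInterval type, the suspectOSLatency helper, and the 20ms threshold are illustrative and not part of any real API.

```go
package main

import (
	"fmt"
	"time"
)

// runInterval is a hypothetical record of one uninterrupted span a goroutine
// spent in _Grunning, extracted beforehand from a parsed execution trace.
type runInterval struct {
	GoID  uint64
	Start time.Duration // offset from trace start
	End   time.Duration
}

// suspectOSLatency returns the intervals longer than threshold. Per the
// discussion above, a CPU-bound goroutine can legitimately run for up to
// ~19.99ms under current sysmon behavior, so spans well beyond ~20ms may
// point at the OS (rather than the Go scheduler) withholding CPU time. This
// heuristic is not guaranteed to stay valid across runtime releases.
func suspectOSLatency(intervals []runInterval, threshold time.Duration) []runInterval {
	var suspects []runInterval
	for _, iv := range intervals {
		if iv.End-iv.Start > threshold {
			suspects = append(suspects, iv)
		}
	}
	return suspects
}

func main() {
	intervals := []runInterval{
		{GoID: 7, Start: 0, End: 18 * time.Millisecond}, // within normal bounds
		{GoID: 9, Start: 0, End: 55 * time.Millisecond}, // likely OS-level delay
	}
	for _, iv := range suspectOSLatency(intervals, 20*time.Millisecond) {
		fmt.Printf("goroutine %d ran uninterrupted for %v\n", iv.GoID, iv.End-iv.Start)
	}
}
```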
@rhysh thanks for your thoughtful analysis. In addition to my comments below, I've also added another use case to the issue description that is hopefully simpler: Extract CPU Profile.
I just started looking into this, so I haven't had a chance to analyze any real-world data yet. That being said, I suspect it will look similar to what you're describing.
Last time I looked at this, I saw goroutines run for up to 25ms before being preempted on darwin, but that 5ms mismatch is likely due to usleep accuracy on that platform. On a platform with accurate timers I'd also expect goroutines running for > 20ms in a row to be a sign of OS scheduling latency. That being said, relying on such a heuristic feels brittle going forward, see #60693.
Yeah. I'm not trying to infer the OS scheduling latency experienced by an M. I'm trying to understand the impact on Gs that are in _Grunning.
Yup.
Good point. It seems like all of these events would still have a
Interesting. But during this usleep the P would not have a G that is in the _Grunning state itself? I'm trying to sum up _Grunning time rather than _Prunning time right now.
I expect the code that starts the trace to also be the code that controls the CPU profile rate. I'm just worried that I can't guarantee this right now.
I'm definitely open to exploring alternative solutions here, but as mentioned above, I'm worried that the concept of a scheduling quantum might go away when there are idle Ps, see #60693.
Regarding OS scheduler latency, this could be a good opportunity to combine the Go execution tracer with external data; notably, https://github.com/google/schedviz/blob/master/doc/walkthrough.md is an example of a tool for visualizing this kind of data. With this data, one could imagine the trace viewer having "M descheduled" events in a per-M view, as well as adding a per-CPU view.
Given that there may be better ways to identify OS scheduler latency, I think we should consider this in the context of a more general effort to make execution traces a superset of CPU profiles. In that context, it might not make sense to have an event for the rate and it might be better to attach this information to a broader "CPUProfileStart" event. (A "CPUProfileStop" event would also be a fairly natural place to drop in some of the other information, for example an encoded form of the memory mapping.)
I don't think this needs to go through the proposal process, so removing the proposal label. I think this is up to the runtime team. |
Use Case 1: Extract CPU Profile
If an execution trace contains CPU sample events, it would be useful to extract a cpu/nanoseconds CPU profile similar to the one produced by runtime/pprof.StartCPUProfile.

This can be useful when building tools, e.g. a "trace to CPU profile" tool, or perhaps a tool for explaining the _Grunning time of a goroutine using the CPU samples for that goroutine. A naive solution would give equal weight to each collected CPU sample and stretch it over the sum of the _Grunning time of the goroutine, but that could be misleading in case of scheduler latency, see below.
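As a rough illustration of the "trace to CPU profile" idea, here is a sketch that aggregates already-decoded CPU sample events into per-stack CPU time. The cpuSample type and samplesToProfile helper are hypothetical, and the sampling rate (hz) has to be supplied from outside the trace, which is exactly the gap discussed further below.

```go
package main

import (
	"fmt"
	"time"
)

// cpuSample is a hypothetical, already-decoded traceEvCPUSample event: the
// goroutine it was attributed to and a flattened call stack.
type cpuSample struct {
	GoID  uint64
	Stack string
}

// samplesToProfile aggregates CPU sample events into per-stack CPU time by
// weighting each sample with the sampling period (1/hz). A real tool would
// emit a pprof profile; a map keeps the sketch short. Note that hz is not
// available from the trace itself today.
func samplesToProfile(samples []cpuSample, hz int) map[string]time.Duration {
	period := time.Second / time.Duration(hz) // 10ms at the default 100 Hz
	prof := make(map[string]time.Duration)
	for _, s := range samples {
		prof[s.Stack] += period
	}
	return prof
}

func main() {
	samples := []cpuSample{
		{GoID: 1, Stack: "main.work;main.main"},
		{GoID: 1, Stack: "main.work;main.main"},
		{GoID: 2, Stack: "main.idle;main.main"},
	}
	for stack, d := range samplesToProfile(samples, 100) {
		fmt.Printf("%-25s %v\n", stack, d)
	}
}
```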
Use Case 2: Understand OS Scheduler Latency
A Go application might experience two types of scheduler latency: OS Scheduler Latency and Go Scheduler Latency. The latter can easily be analyzed using the execution tracer.
Detecting OS scheduler latency is a bit more tricky, but possible. Over a long enough time period, the cumulative time goroutines spend in the running state should converge to the cumulative number of traceEvCPUSample events multiplied by their duration (default 10ms). If there are significantly fewer traceEvCPUSample events than expected, that's a strong indicator that the application is not getting enough scheduling time from the OS. That's a common problem for some setups, so it'd be nice to use tracing data to detect it.

(There are some dragons here when it comes to CPU samples received during syscalls/cgo ... but I think that deserves a separate discussion)
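Here is a hedged sketch of that convergence check, assuming the cumulative _Grunning time and the sample count have already been extracted from the trace and the sampling rate is known from elsewhere. The 0.8 cutoff and the helper names are made up for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// expectedSamples is how many traceEvCPUSample events we would expect if all
// _Grunning time had been sampled at hz samples per second per CPU.
func expectedSamples(totalRunning time.Duration, hz int) float64 {
	return totalRunning.Seconds() * float64(hz)
}

// osLatencySuspected reports whether the observed count falls far enough below
// the expectation to suggest the OS withheld CPU time. The 0.8 ratio is an
// arbitrary illustrative cutoff, not a tuned value.
func osLatencySuspected(totalRunning time.Duration, hz, observed int) bool {
	return float64(observed) < 0.8*expectedSamples(totalRunning, hz)
}

func main() {
	totalRunning := 2 * time.Second // sum of _Grunning time across all goroutines
	hz := 100                       // default CPU profile rate (10ms period)
	observed := 120                 // CPU sample events actually present in the trace

	fmt.Printf("expected ~%.0f samples, saw %d, OS latency suspected: %v\n",
		expectedSamples(totalRunning, hz), observed,
		osLatencySuspected(totalRunning, hz, observed))
}
```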
Problem:
The traceEvCPUSample event does not include a value indicating how much CPU time it represents:

go/src/runtime/trace.go, line 74 (at 39effbc)

One could assume that it's always 10ms, but that won't work if the user calls runtime.SetCPUProfileRate. Unfortunately the execution trace does not record this value, and it's not possible to get the currently active value from user land either. Unlike SetMutexProfileFraction, SetCPUProfileRate does not return a value, and there is no GetCPUProfileRate method either.

Additionally, it's currently not possible to calculate the expected number of traceEvCPUSample events if the CPU profiler is not enabled for the entire duration of the trace.
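Until the rate is recorded in the trace (or exposed via a getter), about the only workaround is for the application to funnel every rate change through its own wrapper, as in the sketch below. The profrate package and its Set/Get helpers are hypothetical, and this falls apart if any other code calls runtime.SetCPUProfileRate or runtime/pprof's StartCPUProfile directly.

```go
// Package profrate is a sketch of tracking the CPU profile rate in user land,
// since runtime.SetCPUProfileRate neither returns the previous value nor has a
// corresponding getter.
package profrate

import (
	"runtime"
	"sync"
)

var (
	mu        sync.Mutex
	currentHz int // 0 means the profiler is off; pprof.StartCPUProfile uses 100
)

// Set updates the runtime's CPU profile rate and remembers the value we asked
// for. Note that the runtime may reject the change (e.g. while a profile is
// already running) without reporting that back to us.
func Set(hz int) {
	mu.Lock()
	defer mu.Unlock()
	runtime.SetCPUProfileRate(hz)
	currentHz = hz
}

// Get returns the rate most recently requested through this package. It can
// be wrong if other code changes the rate without going through Set.
func Get() int {
	mu.Lock()
	defer mu.Unlock()
	return currentHz
}
```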
Suggestion:

Add a new traceEvCPUProfileRate event that is recorded whenever the CPU profile rate changes (including when the profiler is disabled by setting the rate to 0).

Alternatively we could also have a start/stop event for the CPU profiler.
cc @mknyszek @prattmic @nsrip-dd @rhysh