runtime: CPU profiles should attribute time spent in runtime.morestack to the function call that triggered it #25943
What did you do?
Details can be found in #18138
What did you expect to see?
I expected CPU profiles to attribute time spent in runtime.morestack to the function that triggered stack growth.
What did you see instead?
Time spent in runtime.morestack is measured properly, but is not tied to any particular stack or series of function calls, making it difficult to determine which code path is experiencing excessive goroutine stack growth.
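For anyone who wants to see the symptom locally, here is a minimal sketch of a workload that forces repeated stack growth while a CPU profile is running. It is not the reproduction from #18138; `growStack`, `profileStackGrowth`, and the 256-byte padding are hypothetical names and values chosen only for illustration.

```go
package main

import (
	"bytes"
	"runtime/pprof"
	"sync"
)

// growStack recurses depth times and returns depth. The padding array is a
// hypothetical way to enlarge each frame so that a fresh goroutine outgrows
// its small initial stack sooner. depth must be non-negative.
//
//go:noinline
func growStack(depth int) int {
	var pad [256]byte
	pad[depth%len(pad)] = 1
	if depth == 0 {
		return 0
	}
	return growStack(depth-1) + int(pad[depth%len(pad)])
}

// profileStackGrowth starts a CPU profile, runs the given number of
// goroutines that each recurse depth frames deep, stops the profile, and
// returns the raw profile bytes (gzip-compressed protobuf).
func profileStackGrowth(goroutines, depth int) ([]byte, error) {
	var buf bytes.Buffer
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return nil, err
	}
	var wg sync.WaitGroup
	for i := 0; i < goroutines; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			growStack(depth) // each new goroutine starts on a small stack
		}()
	}
	wg.Wait()
	pprof.StopCPUProfile()
	return buf.Bytes(), nil
}
```

Calling `profileStackGrowth` in a loop from `main`, writing the returned bytes to a file, and opening that file with `go tool pprof` should show whether the stack-growth samples are attached to `growStack` or appear detached from any caller.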
Does this issue reproduce with the latest release (go1.10.3)?
Should we make the existing CPU profile show the morestack call stack stitched together with the goroutine stack that caused the stack growth (which I guess means changing the behavior of runtime.gentraceback)?
Or do we want to create a new built-in profile for stack size changes (per @josharian's comment in issue #18138)?
Just back from a Go NYC meetup, where @richardartoul went through the backstory here, so I took a second look at this bug. I think I see at least the first problem, but I haven't actually tested it. Notes for whoever works on this next:
Profiling (on Linux) is driven by the
I think I'd start by changing that test to check against a list of functions that switch to the system stack, not just
(All of this is for CPU profiling. @hyangah looked at memory profiling and IIRC concluded that there was a hardcoded test in there somewhere that just cut off the stack trace for runtime functions, but I don't have the link handy.)
gentraceback handles system stack transitions, but only when they're done by systemstack(). Handle morestack() too.

I tried to do this generically but systemstack and morestack are actually *very* different functions. Most notably, systemstack returns "normally", just messes with $sp along the way. morestack never returns at all -- it calls newstack, and newstack then jumps both stacks and functions back to whoever called morestack. I couldn't think of a way to handle both of them generically. So don't.

The current implementation does not include systemstack on the generated traceback. That's partly because I don't know how to find its stack frame reliably, and partly because the current structure of the code wants to do the transition before the call, not after. If we're willing to assume that morestack's stack frame is 0 size, this could probably be fixed.

For posterity's sake, alternatives tried:

- Have morestack put a dummy function into schedbuf, like systemstack does. This actually worked (see patchset 1) but more by a series of coincidences than by deliberate design. The biggest coincidence was that because morestack_switch was a RET and its stack was 0 size, it actually worked to jump back to it at the end of newstack -- it would just return to the caller of morestack. Way too subtle for me, and also a little slower than just jumping directly.
- Put morestack's PC and SP into schedbuf, so that gentraceback could treat it like a normal function except for the changing SP. This was a terrible idea and caused newstack to reenter morestack in a completely unreasonable state.

To make testing possible I did a small redesign of testCPUProfile to take a callback that defines how to check if the conditions passed to it are satisfied. This seemed better than making the syntax of the "need" strings even more complicated.

Updates #25943

Change-Id: I9271a30a976f80a093a3d4d1c7e9ec226faf74b4
Reviewed-on: https://go-review.googlesource.com/126795
Run-TryBot: Heschi Kreinick <email@example.com>
TryBot-Result: Gobot Gobot <firstname.lastname@example.org>
Reviewed-by: Austin Clements <email@example.com>
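To illustrate the callback idea from the last paragraph of the commit message, here is a rough sketch of a profile check that delegates the matching logic to a caller-supplied function. This is not the actual testCPUProfile code; `sample`, `matchFunc`, `stackHasSequence`, and `missingSpecs` are hypothetical names, and a real profile sample carries more than a list of function names.

```go
package main

import "strings"

// sample is a hypothetical, simplified stand-in for one profile sample:
// a hit count plus the function names on the stack, leaf first.
type sample struct {
	count int
	stack []string
}

// matchFunc reports whether a sample satisfies the requirement spec.
// Passing this in as a callback keeps the spec syntax itself simple.
type matchFunc func(spec string, s sample) bool

// stackHasSequence treats spec as a comma-separated list of function names
// and matches when they appear as consecutive frames, leaf first.
func stackHasSequence(spec string, s sample) bool {
	want := strings.Split(spec, ",")
	for i := 0; i+len(want) <= len(s.stack); i++ {
		ok := true
		for j, name := range want {
			if s.stack[i+j] != name {
				ok = false
				break
			}
		}
		if ok {
			return true
		}
	}
	return false
}

// missingSpecs returns every spec in need that no sample with a positive
// count satisfies, according to the supplied matcher.
func missingSpecs(samples []sample, need []string, matches matchFunc) []string {
	var missing []string
	for _, spec := range need {
		found := false
		for _, s := range samples {
			if s.count > 0 && matches(spec, s) {
				found = true
				break
			}
		}
		if !found {
			missing = append(missing, spec)
		}
	}
	return missing
}
```

With a matcher like `stackHasSequence`, a test can require that a stack-growth frame is directly followed by the function that triggered it, instead of only checking that each name appears somewhere in the profile.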
This should now be fixed for CPU profiles. Note that due to implementation issues
Memory profiling of runtime functions is a separate issue which I haven't looked at closely. The change above probably helped, but it doesn't matter due to this policy in the protobuf writer:
which accounts all memory allocation by runtime functions to the user code that called them. This is probably to hide stuff like string-to-byte conversions, which don't show up explicitly in user code.
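As a rough illustration of the policy described above (a sketch over plain function names, not the protobuf writer's actual code), hiding runtime frames amounts to dropping the leaf-side frames until the first non-runtime function is reached. `trimLeadingRuntime` is a hypothetical helper, and returning an all-runtime stack unchanged is an assumption made here.

```go
package main

import "strings"

// trimLeadingRuntime drops the leaf-side frames whose names start with
// "runtime." so that the allocation is charged to the first user function.
// The stack is ordered leaf first. If every frame is a runtime function,
// the stack is returned unchanged (an assumption of this sketch).
func trimLeadingRuntime(stack []string) []string {
	for i, fn := range stack {
		if !strings.HasPrefix(fn, "runtime.") {
			return stack[i:]
		}
	}
	return stack
}
```

For a stack such as `runtime.mallocgc`, `runtime.stringtoslicebyte`, `main.convert`, `main.main`, the helper leaves `main.convert` as the leaf, which matches the effect described: the conversion's allocation shows up under the user code that caused it.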
I'm going to close this issue since the original comment requested CPU only. We can reopen if necessary.
I agree, but since the morestack call is never returned to, putting it in the backtrace would require a level of specific hackery that I was uncomfortable with. I tried making newstack return normally in the common case but it proved more difficult than I think it's worth.
I don't think newstack is that much less clear than morestack. If it actually causes someone trouble we can revisit.