runtime: average stack size is not a fit for every application #61052
Comments
The Go project has explicitly avoided providing knobs like this. It adds complexity that we have to live with forever. I don't think we'd want to change that stance to get just a few percent performance improvement.
This might be possible with the new And given above, I don't see a compelling need to provide this data given that we don't intend to provide a way to use it.
The assumption that a good global average size exists appears to be false for many applications. Allowing the developer to control the global value doesn't improve the situation. It would be better to automatically track/set the initial stack size per
The average is not a very representative metric. Other statistical measures, e.g., the median or the mode, deserve some more exploration.
You could also imagine incorporating this into PGO somehow.
Thanks for the response @randall77! (disclaimer, I work in the same corporation as @cdvr1993, but on a different domain)
This was bad phrasing on our part. We would also strongly prefer not to have knobs and for things to work (mostly) okay. The goal was to report the situation, list what we tried/thought about, and ask for advice. To answer your points 1 by 1:
As it stands, we'll likely end up using the unsafe/linkname tricks in at least the worst affected applications.
@randall77 could I maybe ask you to reconsider the stat emission? Or is that an absolute strong no? Even without more knobs, we can very well start a "goroutine ballast" already today. It would be much easier to do if we had stats to base it on.
I don't have any objection to exposing such a stat by itself. I just think it would take a fair amount of work to implement - it isn't just exporting something the runtime is already keeping track of. Given the extra work required and limited payoff I don't think it is worth doing.
Instead of exposing this via runtime/metrics, would it be cheaper to expose it as part of pprof or some other explicit call? E.g., include the stack size for each goroutine in the pprof goroutines output.
I was actually able to get this info from CPU profiles. I'm not sure it is the correct approach, but at least I have some data. I focus on copystack in one profile, then disassemble the binary to get the stack usage of each stack trace, and order the traces by total size. With that I was able to determine that the best starting stack size for the application I mentioned previously was 32KB, which reduced copystack from 9.5% to 0.8%. Eliminating the remainder would require a fairly big jump in stack size, so I decided to stay at 32KB.
Go 1.19 introduced creating new stacks based on the historical average stack size. Even though this has improved the landscape for our applications, we still experience issues with some of them.
To give one real example, take an application running on 15k cores. Originally the profiles showed close to 10% of CPU being consumed by stack growth. Translated to cores, that's ~1.5k cores.
The stack usage is very low (we never exceed 16M):
After we started publishing the runtime/metrics value that exposes the initial stack size, we could observe that 99% of the time the stack size is set to 2KB.
By using go:linkname we were able to expose the global variable holding the initial stack size. We then enabled "gcshrinkstackoff" and disabled "adaptivestackstart", so that stacks are not shrunk and the runtime does not override our value.
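For readers who want to reproduce the observation above, here is a minimal sketch of reading the starting stack size through runtime/metrics. It assumes the metric name `/gc/stack/starting-size:bytes` is the one meant in this report (the report does not name it); the helper `startingStackSize` is hypothetical, and the sketch falls back gracefully if the running Go version does not support the metric.

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// startingStackMetric is assumed to be the metric the report refers to.
const startingStackMetric = "/gc/stack/starting-size:bytes"

// startingStackSize returns the stack size currently used for new
// goroutines. The boolean is false when the runtime does not support
// the metric (its value kind is then metrics.KindBad).
func startingStackSize() (uint64, bool) {
	samples := []metrics.Sample{{Name: startingStackMetric}}
	metrics.Read(samples)
	if samples[0].Value.Kind() != metrics.KindUint64 {
		return 0, false
	}
	return samples[0].Value.Uint64(), true
}

func main() {
	size, ok := startingStackSize()
	if !ok {
		fmt.Println("starting stack size metric not supported by this runtime")
		return
	}
	fmt.Printf("new goroutines start with %d-byte stacks\n", size)
}
```

Sampling this value periodically and exporting it to a metrics system would show how often the runtime moves away from the minimum.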
Finally we ran several experiments injecting different stack sizes:
4KB
8KB
16KB
We decided to stop at 16KB because the gains were approaching 0, but as you can see we were able to reduce copystack by 7%.
Memory went up from 16MB to 50MB, which is still reasonable.
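One way to watch the memory cost of larger stacks is the runtime/metrics value `/memory/classes/heap/stacks:bytes`. The sketch below is only an illustration: it is an assumption that this metric corresponds to the memory figure quoted above, and the helper `stackMemoryBytes` and the goroutine count are hypothetical. It reads the metric before and after parking a batch of goroutines.

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// stackMemoryBytes reports heap memory currently reserved for goroutine
// stacks. The boolean is false if the metric is not supported.
func stackMemoryBytes() (uint64, bool) {
	samples := []metrics.Sample{{Name: "/memory/classes/heap/stacks:bytes"}}
	metrics.Read(samples)
	if samples[0].Value.Kind() != metrics.KindUint64 {
		return 0, false
	}
	return samples[0].Value.Uint64(), true
}

func main() {
	before, ok := stackMemoryBytes()
	if !ok {
		fmt.Println("stack memory metric not supported by this runtime")
		return
	}

	const n = 10000 // hypothetical number of goroutines for the experiment
	started := make(chan struct{})
	release := make(chan struct{})
	for i := 0; i < n; i++ {
		go func() {
			started <- struct{}{} // signal that this goroutine is running
			<-release             // stay parked so its stack remains allocated
		}()
	}
	for i := 0; i < n; i++ {
		<-started
	}

	after, _ := stackMemoryBytes()
	fmt.Printf("stack memory before: %d bytes, with %d parked goroutines: %d bytes\n",
		before, n, after)
	close(release)
}
```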
With all that said, would it be possible for the Go runtime to expose:
Or do you have any idea how to implement this more safely (without requiring private linking)?