proposal: runtime: add SchedStats API #15490
MemStats provides a way to monitor allocation and garbage collection.
We need a similar facility to monitor the scheduler.
@minux, while true, runtime/trace seems like a fairly high-overhead way to collect what amounts to a small amount of information. It's certainly low overhead for what it does, but what it does is much more than what's needed here. The metrics @deft-code wants are primarily intended for continuous monitoring (based on offline conversations), so collecting them needs to be cheap.
Here are the notes on the desired metrics I had from our meeting a while ago:
- Ring buffer of sampled durations between a goroutine entering and exiting the runnable state
- Four global stats
- Maybe the current number of running goroutines
Sorry, I'd lost track of the fact that there was a concrete proposal doc for this: https://github.com/deft-code/proposal/blob/master/design/15490-schedstats.md
@deft-code, could you mail a CL to add this to the go-proposal repository and, once submitted, edit your first post to link to it? Thanks.
Update: Some teams inside Google tried CL 38180 for monitoring CPU load and performing load shedding. However, they found that the stats provided in the CL aren't a good indicator of CPU load. In particular, because goroutines are so cheap (compared to, say, using runnable threads as a load indicator), the runnable goroutine count often fluctuated dramatically, even when the system was under normal load. It was common for a large number of goroutines to be started or woken at once, only to exit or block again almost immediately after running. If the load shedder happened to sample the SchedStats during one of these spikes, it would conclude that the system was overloaded.
There may be some other stat that is a more robust indicator of load. For example, smoothing might help: the runtime could provide the time-integral of the runnable count, from which an application could compute the average runnable count over any desired time window. Or it could expose something similar for the running goroutines to give a measure of idle time (though once a system is overloaded, this could not indicate how overloaded it is).