Skip to content

proposal: runtime: add per-goroutine CPU stats #41554

Open
@asubiotto

Description

@asubiotto

Per-process CPU stats can currently be obtained via third-party packages like https://github.com/elastic/gosigar. However, I believe that there exists a need for a certain type of applications to be able to understand CPU usage at a finer granularity.

Example

At a high level in CockroachDB, whenever an application sends a query to the database, we spawn one or more goroutines to handle the request. If more queries are sent to the database, they each get an independent set of goroutines. Currently, we have no way of showing the database operator how much CPU is used per query. This is useful for operators in order to understand which queries are using more CPU and measure that against their expectations in order to do things like cancel a query that's using too many resources (e.g. an accidental overly intensive analytical query). If we had per-goroutine CPU stats, we could implement this by aggregating CPU usage across all goroutines that were spawned for that query.

Fundamentally, I think this is similar to bringing up a task manager when you feel like your computer is slow, figuring out which process on your computer is using more resources than expected, and killing that process.

Proposal

Add a function to the runtime package that does something like:

type CPUStats struct {
    user time.Duration
    system time.Duration
    ...
}

// ReadGoroutineCPUStats writes the active goroutine's CPU stats into
// CPUStats.
func ReadGoroutineCPUStats(stats *CPUStats) 

Alternatives

From a correctness level, an alternative to offering these stats is to LockOSThread the active goroutine for exclusive thread access and then get coarser-grained thread-level cpu usage by calling Getrusage for the current thread. The performance impact is unclear.

Additional notes

Obtaining execution statistics during runtime at a fine-grained goroutine level is essential for an application like a database. I'd like to focus this conversation on CPU usage specifically, but the same idea applies to goroutine memory usage. We'd like to be able to tell how much live memory a single goroutine has allocated to be able to decide whether this goroutine should spill a memory-intensive computation to disk, for example. This is reminiscent of #29696 but at a finer-grained level without a feedback mechanism.

I think that offering per-goroutine stats like this is useful even if it's just from an obervability standpoint. Any application that divides work into independent sets of goroutines and would like to track resource usage of a single group should benefit.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    Status

    Todo

    Status

    Incoming

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions