Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: runtime: API for unstable metrics #37112

Open
mknyszek opened this issue Feb 7, 2020 · 9 comments
Open

proposal: runtime: API for unstable metrics #37112

mknyszek opened this issue Feb 7, 2020 · 9 comments

Comments

@mknyszek
Copy link
Contributor

@mknyszek mknyszek commented Feb 7, 2020

Proposal: API for unstable runtime metrics

Background & Motivation

Today runtime metrics are exposed in two ways.

The first way is via the struct-based sampling APIs runtime.ReadMemStats and runtime/debug.GCStats. These functions accept a pointer to a struct and then populate the struct with data from the runtime.

The problems with this type of API are:

  • Removing/renaming old metrics from the structs is impossible.
    • For example, MemStats.BySize is hard-coded to 61 size classes when there are currently 83. We cannot ever change BySize.
  • Adding implementation-specific metrics to the structs is discouraged, because it pollutes the API when inevitably they'll be deprecated.
  • runtime.ReadMemStats has a global effect on the application because it forces a STW. This has a direct effect on latency. Being able to tease apart which metrics actually need gives users more control over performance.

The good things about this type of API are:

  • Protected by the Go 1 compatibility promise.
  • Easy for applications to ingest, use for their own purposes, or push to a metrics collection service or log.

The second is via GODEBUG flags which emit strings containing metrics to standard error (e.g. gctrace, gcpacertrace, scavtrace).

The problems with this type of API are:

  • Difficult for an application to ingest because it must be parsed.
  • Format of the output is not protected by the Go 1 backwards compatibility promise.

The good things about this type of API are:

  • We can freely change it and add implementation-specific metrics.
  • We never have to live with bad decisions.

I would like to propose a new API which takes the best of both approaches.

Requirements

  • The API should be easily extendable with new metrics.
  • The API should be easily retractable, to deprecate old metrics.
    • Removing a metric should not break any Go applications as per the Go 1 compatibility promise.
  • The API should be discoverable, to obtain a list of currently relevant metrics.
  • The API should be rich, allowing a variety of metrics (e.g. distributions).
  • The API implementation should minimize CPU/memory usage, such that it does not appreciably
    affect any of the metrics being measured.
  • The API should include useful existing metrics already exposed by the runtime.

Goals

Given the requirements, I suggest we prioritize the following concerns when designing the API in the following order.

  1. Extensibility.
    • Metrics are “unstable” and therefore it should always be compatible to add or remove metrics.
    • Since metrics will tend to be implementation-specific, this feature is critical.
  2. Discoverability.
    • Because these metrics are “unstable,” there must be a way for the application, and for the human writing the application, to discover the set of usable metrics and be able to do something useful with that information (e.g. log the metric).
    • The API should enable collecting a subset of metrics programmatically. For example, one might want to “collect all memory-related metrics” or “collect all metrics which are efficient to collect”.
  3. Performance.
    • Must have a minimized effect on the metrics it returns in the steady-state.
    • Should scale up to 100s of metrics, an amount that a human might consider “a lot.”
      • Note that picking the right types to expose can limit the amount of metrics we need to expose. For example, a distribution type would significantly reduce the number of metrics.
  4. Ergonomics.
    • The API should be as easy to use as it can be, given the above.

Design

See full design document at https://golang.org/design/37112-unstable-runtime-metrics.

Highlights:

  • Expose a new sampling-based API in a new package, the runtime/metrics package.
  • Use string keys for each metric which include the unit of the metric in an easily-parseable format.
  • Expose a discovery API which provides metadata about each metric at runtime, such as whether it requires a STW and whether it's cumulative (counter as opposed to a gauge).
  • Add a Histogram interface to the package which represents a distribution.
  • Support for event-based metrics is discussed and left open, but considered outside the scope of this proposal.

Backwards Compatibility

Note that although the set of metrics the runtime exposes will not be stable across Go versions, the API to discover and access those metrics will be.

Therefore, this proposal strictly increases the API surface of the Go standard library without changing any existing functionality and is therefore Go 1 compatible.

@gopherbot gopherbot added this to the Proposal milestone Feb 7, 2020
@gopherbot gopherbot added the Proposal label Feb 7, 2020
@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented Feb 8, 2020

Change https://golang.org/cl/218677 mentions this issue: design/37112-unstable-runtime-metrics.md: add proposal

@mknyszek mknyszek added this to Incoming in Proposals Feb 27, 2020
@rsc rsc changed the title proposal: API for unstable runtime metrics proposal: runtime: API for unstable metrics Mar 4, 2020
@rsc rsc moved this from Incoming to Active in Proposals Mar 4, 2020
@rsc

This comment has been minimized.

Copy link
Contributor

@rsc rsc commented Mar 11, 2020

@mknyszek, what's the status of this? Is the doc ready for review and discussion (if so please check it in)?

@mknyszek

This comment has been minimized.

Copy link
Contributor Author

@mknyszek mknyszek commented Mar 11, 2020

The doc is ready for review and discussion, I just don't have +2 or a review to land it yet. I'll try to ping more people to look at the CL.

gopherbot pushed a commit to golang/proposal that referenced this issue Mar 18, 2020
This change adds a proposal and design document for addition of a
runtime metrics package which is designed to support adding and removing
runtime metrics without breaking the Go 1 compatibility promise.

Updates golang/go#37112.

Change-Id: I550740970ad74c5c71d712735e4984808bdbf463
Reviewed-on: https://go-review.googlesource.com/c/proposal/+/218677
Reviewed-by: Austin Clements <austin@google.com>
@bcmills

This comment has been minimized.

Copy link
Member

@bcmills bcmills commented Mar 19, 2020

A couple of high-level comments:

  1. The Histogram interface seems too abstract. The user will need to know the space of possible concrete types anyway, and won't be able to use a given Histogram without knowing its concrete element type. Instead, I would be inclined to provide concrete Float64Histogram and TimeHistogram types — and then you don't need to worry about whether you can add methods to the interface in the future.
@bcmills

This comment has been minimized.

Copy link
Member

@bcmills bcmills commented Mar 19, 2020

  1. Should Metric type include type information about the metric values? That would allow the user to skip reading metrics that they won't be able to interpret anyway.

Actually, perhaps the Metric should include an allocation function..? That would make the allocation strategy a bit less implicit, and might reduce the risk of aliasing bugs in user code.

@mknyszek

This comment has been minimized.

Copy link
Contributor Author

@mknyszek mknyszek commented Mar 19, 2020

A couple of high-level comments:

  1. The Histogram interface seems too abstract. The user will need to know the space of possible concrete types anyway, and won't be able to use a given Histogram without knowing its concrete element type. Instead, I would be inclined to provide concrete Float64Histogram and TimeHistogram types — and then you don't need to worry about whether you can add methods to the interface in the future.

Given that we probably need to copy data out of the runtime anyway, I think you're right that it makes sense to just have concrete types.

  1. Should Metric type include type information about the metric values? That would allow the user to skip reading metrics that they won't be able to interpret anyway.

Actually, perhaps the Metric should include an allocation function..? That would make the allocation strategy a bit less implicit, and might reduce the risk of aliasing bugs in user code.

I considered including type information in the Metric in order to let users filter out metrics they don't understand, and I'm certainly not opposed. An allocation function is also a neat idea. One less potential footgun, but it is more work the user has to do to set this all up.

@bcmills

This comment has been minimized.

Copy link
Member

@bcmills bcmills commented Mar 19, 2020

It may also be worthwhile to have Read report an explicit error if a requested metric does not exist.

(I could imagine that someone might try to call Read with a hard-coded list of metrics, and then the user code may produce an obscure panic when trying to read the unpopulated value.)

@rsc

This comment has been minimized.

Copy link
Contributor

@rsc rsc commented Apr 1, 2020

I know there are minor details still being worked out, like whether Read returns an error in some cases. Other than those, it seems like people are generally in favor of this proposal.

Am I reading this wrong? Does anyone object to accepting this proposal?

@rsc

This comment has been minimized.

Copy link
Contributor

@rsc rsc commented Apr 8, 2020

OK, then based on the discussion above this seems like a likely accept.

@rsc rsc moved this from Active to Likely Accept in Proposals Apr 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Proposals
Likely Accept
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
4 participants
You can’t perform that action at this time.