Specify MeterProvider configurable cardinality limits (#2960)
Fixes #1891.

**EDIT: Updated to specify cardinality limits at the View/Instrument
level with a Reader-level default. Updated to use a hard limit**

## Changes

Adds optional support for a maximum cardinality limit.

The recommended default is 2000, based on this comment by @jack-berg:
#1891 (comment)

~The Prometheus-WG SIG discussed this on Nov 9, 2022 and reached this
recommended solution to the problem outlined in #1891. The consequence
of exceeding these limits is in line with the current Prometheus server
behavior, which drops targets that misbehave. The discussion was
summarized here:
#1891 (comment)
jmacd committed May 8, 2023
1 parent 88e0e7e commit 1997dd1
Showing 2 changed files with 69 additions and 0 deletions.
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -15,6 +15,9 @@ release.

- Add experimental histogram advice API.
([#3216](https://github.com/open-telemetry/opentelemetry-specification/pull/3216))
- Recommended cardinality limits to protect metrics pipelines against
excessive data production from a single instrument.
([#2960](https://github.com/open-telemetry/opentelemetry-specification/pull/2960))
- Specify second unit (`s`) and advice bucket boundaries of `[]`
for `process.runtime.jvm.gc.duration`.
([#3458](https://github.com/open-telemetry/opentelemetry-specification/pull/3458))
66 changes: 66 additions & 0 deletions specification/metrics/sdk.md
@@ -30,6 +30,9 @@ linkTitle: SDK
* [Use the maximum scale for single measurements](#use-the-maximum-scale-for-single-measurements)
* [Maintain the ideal scale](#maintain-the-ideal-scale)
* [Observations inside asynchronous callbacks](#observations-inside-asynchronous-callbacks)
* [Cardinality limits](#cardinality-limits)
+ [Synchronous instrument cardinality limits](#synchronous-instrument-cardinality-limits)
+ [Asynchronous instrument cardinality limits](#asynchronous-instrument-cardinality-limits)
- [Meter](#meter)
* [Duplicate instrument registration](#duplicate-instrument-registration)
* [Instrument name](#instrument-name)
@@ -235,6 +238,12 @@ are the inputs:
`exemplar_reservoir` (optional) to use for storing exemplars. This should be
a factory or callback similar to aggregation which allows different
reservoirs to be chosen by the aggregation.
* **Status**: [Experimental](../document-status.md) - the
`aggregation_cardinality_limit` (optional) associated with the view. This
should be a positive integer to be taken as a hard limit on the
number of data points that will be emitted during a single
collection by a single instrument. See [cardinality limits](#cardinality-limits),
below.

In order to avoid conflicts, views which specify a name SHOULD have an
instrument selector that selects at most one instrument. For the registration
@@ -582,6 +591,62 @@ execution.
The implementation MUST complete the execution of all callbacks for a
given instrument before starting a subsequent round of collection.

### Cardinality limits

**Status**: [Experimental](../document-status.md)

Views SHOULD support being configured with a cardinality limit to be
applied to all aggregators not configured by a specific view, specified
via `MetricReader` configuration.

View configuration SHOULD support applying per-aggregation cardinality limits.

The cardinality limit is taken as an exact, hard limit on the number
of data points that can be written per collection, per aggregation.
Each view-configured aggregation MUST NOT output more than the
configured `aggregation_cardinality_limit` number of data points per
collection period.

The RECOMMENDED default aggregation cardinality limit is 2000.

An overflow attribute set is defined, containing a single attribute
`otel.metric.overflow` having (boolean) value `true`, which is used to
report a synthetic aggregation of the metric events that could not be
independently aggregated because of the limit.

The SDK MUST create an Aggregator with the overflow attribute set
prior to reaching the cardinality limit and use it to aggregate events
for which the correct Aggregator could not be created. The maximum
number of distinct, non-overflow attributes is one less than the
limit, as a result.
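
The overflow mechanism described above can be sketched as follows. This is a minimal illustration, not any SDK's implementation: `LimitedSumStorage` and its methods are hypothetical names, the aggregation is assumed to be a simple sum, attribute values are assumed to be hashable, and the overflow Aggregator is created lazily on the first event that does not fit, which still keeps the total number of data points within the limit.

```python
from typing import Dict, FrozenSet, Tuple

AttributeSet = FrozenSet[Tuple[str, object]]

# The overflow attribute set defined by the text above.
OVERFLOW_ATTRIBUTES: AttributeSet = frozenset({("otel.metric.overflow", True)})


class LimitedSumStorage:
    """Hypothetical per-view storage for a sum aggregation with a hard limit."""

    def __init__(self, cardinality_limit: int = 2000) -> None:
        if cardinality_limit < 1:
            raise ValueError("cardinality limit must be a positive integer")
        self._limit = cardinality_limit
        self._sums: Dict[AttributeSet, float] = {}

    def record(self, value: float, attributes: Dict[str, object]) -> None:
        key = frozenset(attributes.items())
        # At most (limit - 1) distinct non-overflow attribute sets are kept;
        # the remaining slot is reserved for the overflow attribute set.
        if key not in self._sums and len(self._sums) >= self._limit - 1:
            key = OVERFLOW_ATTRIBUTES
        self._sums[key] = self._sums.get(key, 0.0) + value

    def collect(self) -> Dict[AttributeSet, float]:
        # One entry per data point; never more than the configured limit.
        return dict(self._sums)
```

With a limit of 3, recording events for four distinct attribute sets yields two regular data points plus one overflow data point that carries the sum of the remaining events.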

#### Synchronous instrument cardinality limits

Views of synchronous instruments with cumulative aggregation
temporality MUST continue to export all attribute sets that were
observed prior to the beginning of overflow. Metric events
corresponding with attribute sets that were not observed prior to the
overflow will be reflected in a single data point described by (only)
the overflow attribute.

Views of synchronous instruments with delta aggregation temporality
MAY choose an arbitrary subset of attribute sets to output to maintain
the stated cardinality limit.

Regardless of aggregation temporality, the SDK MUST ensure that every
metric event is reflected in exactly one Aggregator, which is either
an Aggregator associated with the correct attribute set or an
Aggregator associated with the overflow attribute set.

Events MUST NOT be double-counted or dropped during an
overflow.
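
To illustrate how the two temporalities differ under these rules, here is a hedged sketch of a synchronous counter. `SyncCounterStorage` is a hypothetical name, and resetting state on every delta collection is only one of the choices the text permits. In both modes each event lands in exactly one Aggregator, so the exported totals always equal the sum of recorded values.

```python
from typing import Dict, FrozenSet, Tuple

AttributeSet = FrozenSet[Tuple[str, object]]
OVERFLOW: AttributeSet = frozenset({("otel.metric.overflow", True)})


class SyncCounterStorage:
    """Hypothetical storage for one view of a synchronous counter."""

    def __init__(self, limit: int, cumulative: bool) -> None:
        self._limit = limit
        self._cumulative = cumulative
        self._sums: Dict[AttributeSet, float] = {}

    def add(self, value: float, attributes: Dict[str, object]) -> None:
        key = frozenset(attributes.items())
        if key not in self._sums and len(self._sums) >= self._limit - 1:
            # Unseen attribute set and no free slot: count it under overflow.
            key = OVERFLOW
        self._sums[key] = self._sums.get(key, 0.0) + value

    def collect(self) -> Dict[AttributeSet, float]:
        points = dict(self._sums)
        if not self._cumulative:
            # Delta: forget all attribute sets, so the next period may keep
            # a different subset. Cumulative keeps the first-seen sets.
            self._sums = {}
        return points
```

In cumulative mode the attribute sets seen before the overflow keep their own data points in every later collection. In delta mode the kept subset is chosen afresh each period.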

#### Asynchronous instrument cardinality limits

Views of asynchronous instruments SHOULD prefer the first-observed
attributes in the callback when limiting cardinality, regardless of
aggregation temporality.
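
A rough sketch of the first-observed preference for a single callback run follows. The function name `limit_observations` is hypothetical. The sketch assumes an asynchronous counter, so overflowing observations are summed, and it assumes each attribute set is observed at most once per callback.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

AttributeSet = FrozenSet[Tuple[str, object]]
OVERFLOW: AttributeSet = frozenset({("otel.metric.overflow", True)})


def limit_observations(
    observations: Iterable[Tuple[float, Dict[str, object]]],
    limit: int,
) -> Dict[AttributeSet, float]:
    """Keep the first (limit - 1) attribute sets in callback order."""
    kept: Dict[AttributeSet, float] = {}
    for value, attributes in observations:
        key = frozenset(attributes.items())
        if key in kept or len(kept) < limit - 1:
            # Observed early enough to get its own data point.
            kept[key] = value
        else:
            # Later observations are folded into the overflow data point.
            kept[OVERFLOW] = kept.get(OVERFLOW, 0) + value
    return kept
```

For a gauge-style instrument, a different overflow rule (for example, keeping the last value) would be needed; the text above does not prescribe one.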

## Meter

Distinct meters MUST be treated as separate namespaces for the purposes of detecting
@@ -862,6 +927,7 @@ SHOULD provide at least the following:
* The `exporter` to use, which is a `MetricExporter` instance.
* The default output `aggregation` (optional), a function of instrument kind. If not configured, the [default aggregation](#default-aggregation) SHOULD be used.
* The default output `temporality` (optional), a function of instrument kind. If not configured, the Cumulative temporality SHOULD be used.
* The default aggregation cardinality limit to use, a function of instrument kind. If not configured, a default value of 2000 SHOULD be used.
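
One way to read the precedence implied by this change is: a view's `aggregation_cardinality_limit` wins, then the reader's default for the instrument kind, then 2000. The sketch below encodes that reading. `InstrumentKind`, `resolve_cardinality_limit`, and the parameter names are illustrative assumptions, not an SDK API.

```python
from enum import Enum
from typing import Callable, Optional


class InstrumentKind(Enum):
    """Hypothetical subset of instrument kinds, for illustration only."""
    COUNTER = "counter"
    HISTOGRAM = "histogram"
    OBSERVABLE_GAUGE = "observable_gauge"


# Recommended default from the text above.
DEFAULT_CARDINALITY_LIMIT = 2000


def resolve_cardinality_limit(
    kind: InstrumentKind,
    view_limit: Optional[int] = None,
    reader_default: Optional[Callable[[InstrumentKind], int]] = None,
) -> int:
    """Pick the effective limit for one aggregation."""
    if view_limit is not None:
        # A limit set on the view takes precedence.
        return view_limit
    if reader_default is not None:
        # Otherwise use the reader's default, a function of instrument kind.
        return reader_default(kind)
    return DEFAULT_CARDINALITY_LIMIT
```

For example, a reader could return a smaller limit for histograms, whose data points are larger, while leaving other kinds at the recommended default.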

The [MetricReader.Collect](#collect) method allows general-purpose
`MetricExporter` instances to explicitly initiate collection, commonly
