Metric SDK specification OUTLINE #347

Merged
merged 36 commits into from
Aug 20, 2020
c10ba0e
WIP: Metric SDK specification
jmacd Nov 12, 2019
bfda24a
Updates following Tigran's feedback
jmacd Nov 12, 2019
b04e927
More rewording
jmacd Nov 12, 2019
401292b
Strengthen requirements for aggregators
jmacd Nov 13, 2019
6990089
Fix typos noted by MrAlias
jmacd Nov 13, 2019
077141b
Address some of freeformzSFDC's feedback
jmacd Nov 14, 2019
1542a43
Upstream
jmacd Dec 13, 2019
3cb05a4
Capitalization
jmacd Dec 16, 2019
7e785de
Respond to feedback
jmacd Dec 17, 2019
35dff70
Respond to feedback
jmacd Dec 17, 2019
9d40730
Handle->Bound instrument
jmacd Dec 18, 2019
3ec4e00
New img
jmacd Dec 21, 2019
6f3fbdb
New img ref
jmacd Dec 21, 2019
8692b24
Update image
jmacd Dec 23, 2019
2b75442
Rename to Differentiator/Integrator
jmacd Dec 23, 2019
7be3138
Remove reference to defaultkeys batcher
jmacd Apr 14, 2020
f9866aa
Batcher->Integrator
jmacd Apr 14, 2020
931bd8c
Differentiator->Accumulator
jmacd Apr 14, 2020
535fc0e
Upstrema
jmacd May 26, 2020
4405845
Upstream
jmacd May 26, 2020
f358adf
Move the image
jmacd May 26, 2020
1fc3ec1
Update image
jmacd May 26, 2020
64bbb0c
Simplify the diagram
jmacd May 27, 2020
7063963
Upstream
jmacd Aug 11, 2020
8929123
Remove much dead code
jmacd Aug 19, 2020
b36917a
Remove more dead code
jmacd Aug 19, 2020
f698d3c
Lint
jmacd Aug 19, 2020
16a317b
Upstream
jmacd Aug 19, 2020
53d4418
Ignore
jmacd Aug 19, 2020
4857098
Rename metrics SDK
jmacd Aug 19, 2020
023838d
Bold
jmacd Aug 19, 2020
80d0b9a
CheckpointSet -> ExportRecordSet
jmacd Aug 19, 2020
80c16b2
Update diagram
jmacd Aug 20, 2020
d5a7162
Update diagram (png)
jmacd Aug 20, 2020
5c1700a
Editing
jmacd Aug 20, 2020
6aa3683
Undo
jmacd Aug 20, 2020
270 changes: 270 additions & 0 deletions specification/sdk-metric.md
@@ -0,0 +1,270 @@
# Metric SDK

_This document is derived from the Golang Metrics SDK prototype. See
the currently open PRs:_
1. [Pipeline and stdout exporter](https://github.com/open-telemetry/opentelemetry-go/pull/265)
1. [Dogstatsd exporter](https://github.com/jmacd/opentelemetry-go/pull/7)
Member

I think we did not make any commitment to support a vendor proprietary protocol in OpenTelemetry. We should probably consider removing this.

Member

We also already have Stackdriver in the go OTel tracing directories... iirc the decision was that it was okay for now but would need to migrate out of otel before 1.0?

I personally would lobby for sooner, but there are examples of proprietary exporters already in there...

Contributor Author

I anticipated this sort of remark, and wrote the following in the original PR:

open-telemetry/opentelemetry-go#326 (comment)

If an exporter were produced that called a vendor's library (or used a vendor's types) directly, I would not try to put this in the otel repo, but do see the dogstats variation on statsd as being widely supported outside of datadog (e.g., in Veneur).

Contributor Author

I will say that I'm only weakly committed to keeping this code in the otel repo. I'm aware of one effort in Go to create a direct export using the Datadog-Go client library, that's certainly not welcome in the otel repo.

Contributor Author

(FYI @jbarciauskas) My position is that DataDog ought to publish a specification and declare Dogstatsd an open protocol, which would let us resolve this question.

1. [Prometheus exporter](https://github.com/open-telemetry/opentelemetry-go/pull/296)

## Glossary

__Metric update__: The term _metric update_ refers to any single
operation on a metric instrument; each handle-oriented or direct call
implies a single metric update, whereas each RecordBatch operation
implies a batch of metric updates. See the user-facing API
specification for definitions of the three [calling
conventions](api-metrics-user.md).

__Aggregator__: The term _aggregator_ refers to an implementation that
can combine multiple metric updates into a single, combined state.
For example, a Sum aggregator combines multiple `Add()` updates into
a single sum. Aggregators must support concurrent updates. Aggregators
support a `Checkpoint()` operation, which saves a snapshot of the
current aggregate state for collection, and a `Merge()` operation,
which combines state from two aggregators into one.

__Dimensionality reduction__: The user-facing metric API allows users
to supply LabelSets containing an unlimited number of labels for any
metric update. Some metric exporters will reduce the set of labels
when exporting metric data, either to reduce cost or because of
system-imposed requirements. A _dimensionality reduction_ maps input
LabelSets with (potentially) a large number of labels into a smaller
LabelSet containing only labels for an explicit set of label keys.
Performing dimensionality reduction in a metrics export pipeline
generally means merging Aggregators computed for original LabelSets
into a single combined Aggregator for the reduced-dimension LabelSet.
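
As a minimal sketch of this idea, the following assumes a Sum-style aggregation whose state is a single integer, so that merging is plain addition. The `Labels`, `point`, `reduce`, `encode`, and `reduceSums` names are hypothetical and chosen for illustration only; they are not part of any SDK API.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Labels is a hypothetical stand-in for a LabelSet.
type Labels map[string]string

// point pairs an original LabelSet with its aggregated sum.
type point struct {
	labels Labels
	sum    int64
}

// reduce keeps only the labels whose keys appear in keep.
func reduce(in Labels, keep []string) Labels {
	out := Labels{}
	for _, k := range keep {
		if v, ok := in[k]; ok {
			out[k] = v
		}
	}
	return out
}

// encode builds a deterministic map key for a label set.
func encode(l Labels) string {
	keys := make([]string, 0, len(l))
	for k := range l {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, k+"="+l[k])
	}
	return strings.Join(parts, ",")
}

// reduceSums merges the sums of all points that share a reduced LabelSet.
func reduceSums(points []point, keep []string) map[string]int64 {
	out := map[string]int64{}
	for _, p := range points {
		out[encode(reduce(p.labels, keep))] += p.sum
	}
	return out
}

func main() {
	points := []point{
		{Labels{"host": "a", "path": "/x"}, 1},
		{Labels{"host": "a", "path": "/y"}, 2},
		{Labels{"host": "b", "path": "/x"}, 4},
	}
	// Keep only the "host" dimension; the "path" label is dropped.
	fmt.Println(reduceSums(points, []string{"host"}))
}
```

Here the two `host=a` inputs collapse into one reduced-dimension entry, which is the merging step described above.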

__Export record__: The _Export record_ is an exporter-independent
in-memory representation combining the metric instrument, the LabelSet
for export, and the associated (checkpointed) Aggregator containing
its state. Metric instruments are described by a metric descriptor.

__Metric descriptor__: A _metric descriptor_ is an in-memory
Member

the descriptors go further though and contain suggested dimensionality reduction that should be applied to them, right? e.g. if I specify in the descriptor that I use labels A and B, then all other labels will be dropped when aggregating and combining on the remaining unique values for A and B... so I feel this needs to talk specifically about how it relates to dimensionality.

representation of the metric instrument, including all the information
provided when it was defined.

## Meter implementation

The Meter API provides methods to create metric instruments, metric
instrument handles, and label sets. This document describes the
standard Meter implementation and supporting packages used to build
a complete metric export pipeline.

The Meter implementation stands at the start of the export pipeline,
where it interfaces with the user-facing API and receives metric
updates. The Meter's primary job is to maintain active state about
pending metric updates. The most important requirement placed on the
Meter implementation is that it be able to "forget" state about metric
updates after they are collected.

The Meter implementation SHOULD ensure that operations on instrument
handles are fast. Metric updates made via an instrument handle, when
used with an aggregator defined by simple atomic operations, should
follow a very short code path.

The Meter implementation provides a `Collect()` method to initiate
collection. Batcher and Exporter implementations are written with the
assumption that collection is single-threaded; therefore, the Meter
implementation MUST prevent concurrent `Collect()` calls. During the
collection pass, the Meter implementation checkpoints each active
Aggregator and passes it to the Batcher for processing.
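
One way to meet the single-threaded collection requirement is to guard `Collect()` with a mutex while leaving the update path lock-free. The sketch below is an illustration under assumptions, not the prototype's code: the `meter`, `sumAggregator`, `batcher`, and `sumBatcher` types are hypothetical, and the aggregator is assumed to be a simple atomic sum.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// sumAggregator is a hypothetical aggregator: current receives updates,
// checkpoint holds the snapshot taken by the last Checkpoint call.
type sumAggregator struct {
	current    int64
	checkpoint int64
}

func (a *sumAggregator) Update(v int64) { atomic.AddInt64(&a.current, v) }

// Checkpoint atomically moves the current sum into the snapshot and resets
// it, so a concurrent Update lands in either this period or the next.
func (a *sumAggregator) Checkpoint() { a.checkpoint = atomic.SwapInt64(&a.current, 0) }

// batcher receives each checkpointed aggregator during collection.
type batcher interface {
	Process(name string, agg *sumAggregator)
}

// sumBatcher is a trivial batcher that accumulates checkpointed sums by name.
type sumBatcher struct{ totals map[string]int64 }

func (b *sumBatcher) Process(name string, agg *sumAggregator) {
	b.totals[name] += agg.checkpoint
}

type meter struct {
	collectLock sync.Mutex // serializes Collect calls
	records     sync.Map   // instrument name -> *sumAggregator
	batcher     batcher
}

// Add records one metric update; it may run concurrently with Collect.
func (m *meter) Add(name string, v int64) {
	agg, _ := m.records.LoadOrStore(name, &sumAggregator{})
	agg.(*sumAggregator).Update(v)
}

// Collect checkpoints every active aggregator and hands it to the batcher.
func (m *meter) Collect() {
	m.collectLock.Lock()
	defer m.collectLock.Unlock()
	m.records.Range(func(k, v interface{}) bool {
		agg := v.(*sumAggregator)
		agg.Checkpoint()
		m.batcher.Process(k.(string), agg)
		return true
	})
}

func main() {
	b := &sumBatcher{totals: map[string]int64{}}
	m := &meter{batcher: b}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				m.Add("requests", 1)
			}
		}()
	}
	wg.Wait()
	m.Collect()
	fmt.Println(b.totals["requests"])
}
```

Because the batcher is only touched while `collectLock` is held, it needs no synchronization of its own in this sketch.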

This document does not specify how to coordinate synchronization
between user-facing metric updates and metric collection activity;
however, Meter implementations SHOULD make efforts to avoid lock
contention by holding locks only briefly or using lock-free
techniques. Meter implementations MUST ensure that there are no lost
updates.

### Meter aggregation preserves LabelSet dimensions

The Meter acts as a short-term store for aggregating metric updates
within a collection period. The Meter implementation maintains
Aggregators for active metric instruments according to the complete,
original LabelSet. This ensures a relatively simple code path for
entering metric updates into the Meter implementation.

Reducing dimensions for export is the responsibility of the Batcher.
As a consequence, the cost and complexity of dimensionality reduction
affects only the collection pass.

### Recommended implementation

The Meter implementation supports all three metric [calling
conventions](api-metrics-user.md): handle-oriented calls, direct
calls, and RecordBatch calls. Although not a requirement, we
recommend the following approach for organizing the Meter
implementation.

Of the three calling conventions, direct calls and RecordBatch calls
can be easily converted into handle-oriented calls using short-lived
handles. For example, a direct call can be implemented by acquiring a
handle, operating on the handle, and immediately releasing the handle.

```golang
// RecordOne converts a direct call into a handle-oriented call by allocating
// a short-lived handle.
func (inst *instrument) RecordOne(ctx context.Context, number core.Number, labelSet api.LabelSet) {
	h := inst.AcquireHandle(labelSet)
	defer h.Release()
	h.RecordOne(ctx, number)
}
```

The Meter implementation tracks an internal set of records, where
every record either (1) has a current, un-released handle pinning it
in memory, (2) has pending updates that have not been collected, or (3)
is a candidate for removal from memory. The Meter maintains a
mapping from the pair (Instrument, LabelSet) to an active record.
Each active record contains an Aggregator implementation, which is
responsible for incorporating a series of metric updates into the
current state.

Because of short-lived handles, the SDK may accumulate records that
are not associated with a user-held handle. After these records are
collected they may be removed from the (Instrument, LabelSet) map of
active records.
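
The record lifecycle can be sketched as follows. This is a deliberately simple illustration that guards everything with one mutex, which a real implementation would likely avoid on the update path; the `key`, `record`, and `meter` types and their methods are hypothetical, and the record's Aggregator is reduced to an integer sum.

```go
package main

import (
	"fmt"
	"sync"
)

// key identifies a record by the pair (Instrument, LabelSet); both are
// represented as strings here for brevity.
type key struct{ instrument, labels string }

type record struct {
	handles int   // un-released handles pinning the record
	pending bool  // true if updated since the last collection
	sum     int64 // stand-in for the record's Aggregator
}

type meter struct {
	mu      sync.Mutex
	records map[key]*record
}

// acquire returns the record for k, creating it if needed, and pins it.
func (m *meter) acquire(k key) *record {
	m.mu.Lock()
	defer m.mu.Unlock()
	r, ok := m.records[k]
	if !ok {
		r = &record{}
		m.records[k] = r
	}
	r.handles++
	return r
}

func (m *meter) update(r *record, v int64) {
	m.mu.Lock()
	defer m.mu.Unlock()
	r.sum += v
	r.pending = true
}

func (m *meter) release(r *record) {
	m.mu.Lock()
	defer m.mu.Unlock()
	r.handles--
}

// recordOne is a direct call built from a short-lived handle.
func (m *meter) recordOne(k key, v int64) {
	r := m.acquire(k)
	defer m.release(r)
	m.update(r, v)
}

// collect emits pending sums, then drops records that no handle pins.
func (m *meter) collect() map[key]int64 {
	m.mu.Lock()
	defer m.mu.Unlock()
	out := map[key]int64{}
	for k, r := range m.records {
		if r.pending {
			out[k] = r.sum
			r.sum, r.pending = 0, false
		}
		if r.handles == 0 {
			delete(m.records, k)
		}
	}
	return out
}

func main() {
	m := &meter{records: map[key]*record{}}
	held := m.acquire(key{"latency", "host=a"}) // long-lived handle
	m.update(held, 5)
	m.recordOne(key{"requests", "host=a"}, 1) // direct call
	collected := m.collect()
	// Both records are collected; only the pinned one stays in the map.
	fmt.Println(len(collected), len(m.records))
}
```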

## Aggregator implementations

The Aggregator interface supports combining multiple metric events
into a single aggregated state. Different concrete Aggregator types
provide different functionality and levels of concurrent performance.

Aggregators support `Update()`, `Checkpoint()`, and `Merge()`.
`Update()` is called directly from the Meter in response to a metric
event, and may be called concurrently. `Update()` is also passed the
user's telemetry context, which allows it to access the current trace
context and distributed correlations; however, none of the built-in
aggregators use this information.

The `Checkpoint()` operation is called to atomically save a snapshot
Member

Notably, in some cases, Checkpoint() will clear the previous value, whereas for a Gauge aggregation, it preserves the value.

Contributor Author

Yes, I need to add more discussion about this point. (It's irritating!)

of the Aggregator, since `Checkpoint()` may be called concurrently
with `Update()`. The `Merge()` operation supports dimensionality
reduction by combining state from multiple Aggregators into a single
Aggregator state.
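
The three operations might look like the sketch below. The exact signatures are assumptions made for illustration (integer values only, an `error` result from `Merge()`), as are the `Counter` type and the `mergedSum` helper. Whether `Checkpoint()` resets the current state is aggregator-specific; this sketch assumes a counter that resets.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync/atomic"
)

// Aggregator is a hypothetical rendering of the interface described above.
type Aggregator interface {
	Update(ctx context.Context, value int64)
	Checkpoint()
	Merge(other Aggregator) error
}

// Counter keeps a running sum and the snapshot taken at the last checkpoint.
type Counter struct {
	current    int64
	checkpoint int64
}

var _ Aggregator = (*Counter)(nil)

// Update may be called concurrently; the context is accepted but unused.
func (c *Counter) Update(_ context.Context, value int64) {
	atomic.AddInt64(&c.current, value)
}

// Checkpoint atomically snapshots and resets the running sum.
func (c *Counter) Checkpoint() {
	c.checkpoint = atomic.SwapInt64(&c.current, 0)
}

// Merge adds the checkpointed state of other into this Counter's checkpoint.
func (c *Counter) Merge(other Aggregator) error {
	o, ok := other.(*Counter)
	if !ok {
		return errors.New("cannot merge aggregators of different types")
	}
	c.checkpoint += o.checkpoint
	return nil
}

// Sum returns the checkpointed value.
func (c *Counter) Sum() int64 { return c.checkpoint }

// mergedSum updates one Counter per value, checkpoints each, and merges
// them all into a single Counter, as dimensionality reduction would.
func mergedSum(values ...int64) (int64, error) {
	ctx := context.Background()
	total := &Counter{}
	for _, v := range values {
		c := &Counter{}
		c.Update(ctx, v)
		c.Checkpoint()
		if err := total.Merge(c); err != nil {
			return 0, err
		}
	}
	return total.Sum(), nil
}

func main() {
	sum, err := mergedSum(2, 3)
	if err != nil {
		fmt.Println("merge failed:", err)
		return
	}
	fmt.Println(sum)
}
```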

The Metric SDK comes with six built-in Aggregator types, two of which
are standard for use with counters and gauges.

1. Counter: This aggregator maintains a Sum using only a single word of memory.
1. Gauge: This aggregator maintains a pair containing the last value and its timestamp.

Four aggregators are intended for use with [Measure metric instruments](api-metrics.md#measure).

1. MinMaxSumCount: This aggregator computes the min, max, sum, and count using only four words of memory.
1. Sketch: This aggregator computes an approximate data structure that can estimate quantiles. Example algorithms include GK-Sketch, Q-Digest, T-Digest, DDSketch, and HDR-Histogram. The choice of algorithm should be made based on available libraries in each language.
1. Histogram: This aggregator computes a histogram with pre-determined boundaries. This may be used to estimate quantiles, but is generally intended for cases where a histogram will be exported directly.
1. Exact: This aggregator computes an array of all values, supporting exact quantile computations in the exporter.
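
As an illustration of the simplest of these, a MinMaxSumCount aggregator could be sketched as below. This version assumes floating-point values and uses a mutex for clarity, so it occupies more than the four words mentioned above; the type layout and method names are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// MinMaxSumCount tracks four summary values for a Measure instrument.
type MinMaxSumCount struct {
	mu    sync.Mutex
	min   float64
	max   float64
	sum   float64
	count int64
}

// Update folds one measurement into the state; safe for concurrent use.
func (a *MinMaxSumCount) Update(v float64) {
	a.mu.Lock()
	defer a.mu.Unlock()
	if a.count == 0 || v < a.min {
		a.min = v
	}
	if a.count == 0 || v > a.max {
		a.max = v
	}
	a.sum += v
	a.count++
}

// Merge folds the state of other into a.
func (a *MinMaxSumCount) Merge(other *MinMaxSumCount) {
	other.mu.Lock()
	omin, omax, osum, ocount := other.min, other.max, other.sum, other.count
	other.mu.Unlock()
	if ocount == 0 {
		return // nothing to merge
	}
	a.mu.Lock()
	defer a.mu.Unlock()
	if a.count == 0 || omin < a.min {
		a.min = omin
	}
	if a.count == 0 || omax > a.max {
		a.max = omax
	}
	a.sum += osum
	a.count += ocount
}

func main() {
	a := &MinMaxSumCount{}
	for _, v := range []float64{3, 1, 2} {
		a.Update(v)
	}
	b := &MinMaxSumCount{}
	b.Update(10)
	a.Merge(b)
	fmt.Println(a.min, a.max, a.sum, a.count)
}
```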

## Batcher implementation

The Batcher acts as the primary source of configuration for exporting
metrics from the SDK. The two kinds of configuration are:

1. Given a metric instrument, choose which concrete Aggregator type to apply for in-process aggregation.
1. Given a metric instrument, choose which dimensions to export by (i.e., the "grouping" function).

The first choice--which concrete Aggregator type to apply--is made
whenever the Meter implementation encounters a new (Instrument,
LabelSet) pair. Each concrete type of Aggregator will perform a
different function. Aggregators for counter and gauge instruments are
relatively straightforward, but many concrete Aggregators are possible
for measure metric instruments. The Batcher has an opportunity to
disable instruments at this point simply by returning a `nil`
Aggregator.
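
A hedged sketch of this selection step follows. The `AggregatorFor` method name, the `Kind` constants, and the choice of MinMaxSumCount as the default for measure instruments are assumptions made for illustration; aggregators are reduced to a type name so that only the selection logic is shown.

```go
package main

import "fmt"

// Kind is a hypothetical enumeration of instrument kinds.
type Kind int

const (
	CounterKind Kind = iota
	GaugeKind
	MeasureKind
)

// Descriptor is a pared-down metric descriptor.
type Descriptor struct {
	Name string
	Kind Kind
}

// Aggregator is reduced to a type name here.
type Aggregator interface{ TypeName() string }

type named string

func (n named) TypeName() string { return string(n) }

// selector is a hypothetical Batcher component that picks an aggregator.
type selector struct {
	disabled map[string]bool // instrument names to drop entirely
}

// AggregatorFor returns nil to disable an instrument.
func (s selector) AggregatorFor(d *Descriptor) Aggregator {
	if s.disabled[d.Name] {
		return nil
	}
	switch d.Kind {
	case CounterKind:
		return named("Counter")
	case GaugeKind:
		return named("Gauge")
	default:
		return named("MinMaxSumCount")
	}
}

func main() {
	s := selector{disabled: map[string]bool{"debug.events": true}}
	for _, d := range []*Descriptor{
		{"requests", CounterKind},
		{"latency", MeasureKind},
		{"debug.events", CounterKind},
	} {
		if agg := s.AggregatorFor(d); agg != nil {
			fmt.Println(d.Name, "->", agg.TypeName())
		} else {
			fmt.Println(d.Name, "-> disabled")
		}
	}
}
```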

The second choice--which dimensions to export by--affects how the
Batcher processes records emitted by the Meter implementation during
collection. During collection, the Meter implementation emits to the
Batcher an export record for each metric instrument with pending
updates.

During the collection pass, the Batcher receives a full set of
checkpointed Aggregators corresponding to each (Instrument, LabelSet)
pair with an active record managed by the Meter implementation.
According to its own configuration, the Batcher at this point
determines which dimensions to aggregate for export; it computes a
checkpoint of (possibly) reduced-dimension export records ready for
export.

Batcher implementations may be stateless or stateful. Stateless
Batchers compute checkpoints which describe the updates of a single
collection period (i.e., deltas). Stateful Batchers compute
checkpoints covering the entire process lifetime; these may
be useful for simple exporters but are prone to consuming a large and
ever-growing amount of memory, depending on LabelSet cardinality.
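
The difference can be illustrated with a toy Batcher whose state is a map of sums. The `batcher` type, its `Process`, `Checkpoint`, and `FinishedCollection` methods, and the `runTwoPeriods` helper are hypothetical names used only for this sketch.

```go
package main

import "fmt"

// batcher is a hypothetical Batcher holding one sum per export key.
type batcher struct {
	stateful bool
	sums     map[string]int64
}

func newBatcher(stateful bool) *batcher {
	return &batcher{stateful: stateful, sums: map[string]int64{}}
}

// Process merges one checkpointed delta into the batcher state.
func (b *batcher) Process(key string, delta int64) { b.sums[key] += delta }

// Checkpoint returns a copy of the records ready for export.
func (b *batcher) Checkpoint() map[string]int64 {
	out := make(map[string]int64, len(b.sums))
	for k, v := range b.sums {
		out[k] = v
	}
	return out
}

// FinishedCollection ends a collection period. A stateless batcher forgets
// everything; a stateful batcher keeps accumulating.
func (b *batcher) FinishedCollection() {
	if !b.stateful {
		b.sums = map[string]int64{}
	}
}

// runTwoPeriods feeds a delta of 3 and then a delta of 4, returning the
// exported value for each period.
func runTwoPeriods(b *batcher) (first, second int64) {
	b.Process("requests", 3)
	first = b.Checkpoint()["requests"]
	b.FinishedCollection()

	b.Process("requests", 4)
	second = b.Checkpoint()["requests"]
	b.FinishedCollection()
	return first, second
}

func main() {
	d1, d2 := runTwoPeriods(newBatcher(false))
	fmt.Println("stateless:", d1, d2)
	c1, c2 := runTwoPeriods(newBatcher(true))
	fmt.Println("stateful:", c1, c2)
}
```

The stateful variant never deletes keys, which is the source of the memory growth noted above.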

Two standard Batcher implementations are provided.

1. The "defaultkeys" Batcher reduces the export dimensions of each
metric instrument to the Recommended keys declared with the
instrument.
1. The "ungrouped" Batcher exports metric instruments at full
dimensionality; each LabelSet is exported without reducing dimensions.

## Controller implementation

A controller is needed to coordinate the decision to begin collection.
Controllers generally are responsible for binding the Meter
implementation, the Batcher, and the Exporter.

Once the decision has been made, the controller's job is to call
`Collect()` on the Meter implementation, then read the checkpoint from
the Batcher, then invoke the Exporter.

One standard "push" controller is provided, which triggers collection
using a fixed period. The controller is responsible for flushing
metric events prior to shutting down the process.
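
A push controller of this kind could be sketched as below. The `pipeline` struct of three functions stands in for the Meter, Batcher, and Exporter, and all names here (`pushController`, `demoPeriod`, and so on) are hypothetical. The sketch assumes that stopping the controller performs one final collection to flush pending updates.

```go
package main

import (
	"fmt"
	"time"
)

// demoPeriod is long enough that only the final flush runs in this demo.
const demoPeriod = time.Hour

// pipeline stands in for the three components the controller binds.
type pipeline struct {
	collect    func()                  // Meter: Collect()
	checkpoint func() map[string]int64 // Batcher: read the checkpoint
	export     func(map[string]int64)  // Exporter: send the records
}

type pushController struct {
	p      pipeline
	period time.Duration
	stop   chan struct{}
	done   chan struct{}
}

func newPushController(p pipeline, period time.Duration) *pushController {
	return &pushController{
		p:      p,
		period: period,
		stop:   make(chan struct{}),
		done:   make(chan struct{}),
	}
}

// tick runs one collect, checkpoint, export cycle.
func (c *pushController) tick() {
	c.p.collect()
	c.p.export(c.p.checkpoint())
}

// Start launches the periodic collection loop.
func (c *pushController) Start() {
	go func() {
		defer close(c.done)
		ticker := time.NewTicker(c.period)
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C:
				c.tick()
			case <-c.stop:
				c.tick() // final flush before shutdown
				return
			}
		}
	}()
}

// Stop requests shutdown and waits for the final flush to finish.
func (c *pushController) Stop() {
	close(c.stop)
	<-c.done
}

func main() {
	pending := int64(7)
	checkpointed := map[string]int64{}
	exports := 0
	p := pipeline{
		collect:    func() { checkpointed["requests"] = pending; pending = 0 },
		checkpoint: func() map[string]int64 { return checkpointed },
		export:     func(m map[string]int64) { exports++; fmt.Println("export:", m) },
	}
	c := newPushController(p, demoPeriod)
	c.Start()
	c.Stop()
	fmt.Println("exports:", exports)
}
```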

Metric exporters that wish to pull metric updates are likely to
integrate a controller directly into the exporter itself.

## Exporter implementations

The exporter is called with a checkpoint of finished export records.
Most configuration decisions have been made before the exporter is
invoked, including which instruments are enabled, which concrete
aggregator types to use, and which dimensions to aggregate by.

There is very little left for the exporter to do other than format the
metric updates into the desired format and send them on their way.

## Multiple exporter support

The metric export pipeline specified here does not include explicit
support for multiple export pipelines. In principle, any one of the
interfaces here could be satisfied by a multiplexing implementation,
but in practice, it will be costly to run multiple Batchers or
Aggregators in parallel.

If multiple exporters are required, therefore, it is best if they can
share a single Batcher configuration.

## LabelEncoder optimizations

The Meter implementation and some Batcher implementations are required
to compute a unique key corresponding to a LabelSet, for the purposes
of locating an Aggregator to use for metric updates. Where possible,
Exporters can avoid a duplicate computation by providing a
LabelEncoder to the Meter implementation.

This optimization applies for any Exporter that will internally
compute a unique encoding for a set of labels, whether using a text or
a binary encoding. For example, a dogstatsd Exporter will benefit by
providing its specific LabelEncoder implementation to the Meter
implementation; consequently, the export records it sees will be
accompanied by a pre-computed encoding of the export LabelSet.
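
The following sketch shows the shape of this optimization. The `LabelEncoder` interface signature, the `tagEncoder` type, and the toy `meter` are assumptions for illustration; the encoder produces a dogstatsd-style `key:value` tag list, which the Meter then reuses as its map key.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Label is a single key-value pair of a LabelSet.
type Label struct{ Key, Value string }

// LabelEncoder is a hypothetical form of the interface an Exporter provides.
type LabelEncoder interface {
	Encode(labels []Label) string
}

// tagEncoder encodes labels as sorted, comma-separated key:value tags.
type tagEncoder struct{}

func (tagEncoder) Encode(labels []Label) string {
	// Sort a copy so the encoding is unique for a given set of labels.
	sorted := append([]Label(nil), labels...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Key < sorted[j].Key })
	parts := make([]string, len(sorted))
	for i, l := range sorted {
		parts[i] = l.Key + ":" + l.Value
	}
	return strings.Join(parts, ",")
}

// meter uses the exporter's encoding as the unique key for each LabelSet.
type meter struct {
	encoder LabelEncoder
	sums    map[string]int64 // keyed by the encoded LabelSet
}

func (m *meter) Add(labels []Label, v int64) {
	m.sums[m.encoder.Encode(labels)] += v
}

func main() {
	m := &meter{encoder: tagEncoder{}, sums: map[string]int64{}}
	m.Add([]Label{{"host", "a"}, {"env", "prod"}}, 1)
	m.Add([]Label{{"env", "prod"}, {"host", "a"}}, 2)
	for encoded, sum := range m.sums {
		// The exporter can emit the pre-computed encoding as-is.
		fmt.Printf("requests:%d|c|#%s\n", sum, encoded)
	}
}
```

Both `Add` calls map to the same record because the encoding is order-independent, and the exporter never has to encode the labels a second time.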

## Metric descriptors

The metric descriptor contains a complete description of the metric
instrument, including the kind of metric (Counter, Gauge, or Measure)
and all arguments passed to the instrument's constructor.

Exporters MUST have a mechanism to look up internal state based on the
metric descriptor. This requirement could be satisfied by exposing
descriptors as reference objects (i.e., their memory address is
unique and can be used to look up Exporter-specific state). Another way
to meet this requirement is to give each distinct metric instrument a
unique identifier that is included in the export record.
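
The reference-object approach can be sketched in Go by using the descriptor's pointer as a map key. The `Descriptor` fields, the `exporterState` type, and the `stateFor` helper are hypothetical and exist only to illustrate the lookup.

```go
package main

import "fmt"

// Descriptor is a pared-down metric descriptor.
type Descriptor struct {
	Name string
	Kind string
}

// exporterState is whatever the exporter tracks per instrument.
type exporterState struct{ exports int }

// exporter keys its state by descriptor address, not by descriptor contents.
type exporter struct {
	state map[*Descriptor]*exporterState
}

// stateFor returns the state for d, creating it on first use.
func (e *exporter) stateFor(d *Descriptor) *exporterState {
	s, ok := e.state[d]
	if !ok {
		s = &exporterState{}
		e.state[d] = s
	}
	return s
}

func main() {
	e := &exporter{state: map[*Descriptor]*exporterState{}}
	requests := &Descriptor{Name: "requests", Kind: "Counter"}
	// A distinct instrument, even with identical fields, has its own state.
	other := &Descriptor{Name: "requests", Kind: "Counter"}

	e.stateFor(requests).exports++
	e.stateFor(requests).exports++
	e.stateFor(other).exports++
	fmt.Println(e.stateFor(requests).exports, e.stateFor(other).exports)
}
```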