Set attribute on exemplar spans #3723

samwright · 2023-10-17T12:00:38Z

Proof of Concept for a way to support metric exemplars and tail-based sampling at the same time. Hopefully this will spur discussion on how we want to support this use-case.

Motivation

(from #2922)

Exemplars are used to navigate from metrics to example traces. For example, the screenshot shows latencies from the http.server.duration metric as visualized by Grafana. Each little dot represents an Exemplar. If you click on an Exemplar, you can navigate to the corresponding trace.

However, this does not work well if traces are sampled in the collector: In many cases the metric library will choose a Span as an exemplar that will be discarded by the collector during sampling.

It would be good if metric libraries could add a standard Span attribute to indicate that a Span has been selected as an Exemplar. This attribute could be considered by the collector's sampling algorithm to ensure that Exemplars aren't discarded.

A simple way to implement this is adding a boolean attribute, like exemplar="true".

Changes

HistogramExemplarReservoir and RandomFixedSizeExemplarReservoir now collect the first instead of the last exemplar sample they see. This lets us know whether or not a span will be used as an exemplar while it is still current.
Suggest otel.exemplar as a potential new attribute that, when set to true means the sample is being used as an exemplar.

Related issues
- PoC implementation in java: POC: More simply set attribute on exemplar spans opentelemetry-java#5922
- Relevant issue: Attribute to indicate whether a Span is used as an Exemplar #2922

MrAlias · 2023-10-17T15:24:37Z

specification/metrics/sdk.md

+If a span is to be used as an exemplar, the ExemplarReservoir MUST set the
+span attribute `"otel.exemplar"` to the value `true`, otherwise it MUST
+not change the attribute.


This will only work if we have a static set of exemplars. If an exemplar is sampled and later it is dropped from the reservoir due to other exemplars being sampled (simple-fixed-size-reservoir) there isn't a way to return the span to its original state currently because there is no API to drop an attribute.

The attribute could be changed to indicate that the span was considered/initially-stored in an ExemplarReservoir, and the can be used by tail-sampler as an additional input when making sampling decision by prioritizing spans with this attribute. Not perfect, but still an improvement.

MrAlias · 2023-10-17T15:26:52Z

specification/metrics/sdk.md

@@ -1028,7 +1032,7 @@ algorithm](https://en.wikipedia.org/wiki/Reservoir_sampling) can be used:

  ```
  bucket = random_integer(0, num_measurements_seen)
-  if bucket < num_buckets then
+  if bucket < num_buckets && reservoir[bucket] == null then


This will drastically change the default sampling algorithm. It would no longer be a "uniformly-weighted sampling algorithm based on the number of samples the reservoir has seen so far to determine if the offered measurements should be sampled."

I don't think this is an appropriate change for all use cases. It seems like a good motivation for a custom reservoir to be supported instead.

Very good point - I hadn't realised that. Being able to provide my own (or switch to a provided "tail-sampling friendly") reservoir would be great.

Alternatively, we could keep the algorithm as-is but still add the otel.exemplar=true attribute to all spans that made it into a bucket, even if they are later discarded. The number of spans with the attribute that didn't actually become exemplars would be small (the chance of the nth span making it into a bucket is 1/n, so for N spans the total that make it into a bucket is log(N) - please correct my maths if that's wrong!)

Being able to provide my own (or switch to a provided "tail-sampling friendly") reservoir would be great.

That is already allowed in the spec!

That is already allowed in the spec!

I'll go give it a try!

I've updated the PR to undo the changes to the fixed-size reservoir. I think this approach would be better long-term, even if it means keeping more spans than is needed. We could add some disclaimer explaining the trade-off between this and the histogram reservoir in terms of tail-sampling?

I've changed the new rule in the spec to align with this approach. A little more maths suggests the ratio between the number of exemplars (b) and the number of spans that get the attribute (b ln(n), where n is the number of spans) is 1/ln(n). So for b=2 (for some reason, the default size in java is the number of processor cores 🤯) :

spans annotated spans annotated spans that are exemplars

100 9.2 22%

1000 13.8 14%

10000 18.4 11%

I'd personally be fine keeping that many spans unnecessarily. If it became a problem I'd switch to the histogram-based reservoir. I've updated the java PR with this approach.

cijothomas · 2023-10-17T15:30:32Z

specification/metrics/sdk.md

@@ -1028,7 +1032,7 @@ algorithm](https://en.wikipedia.org/wiki/Reservoir_sampling) can be used:

  ```
  bucket = random_integer(0, num_measurements_seen)
-  if bucket < num_buckets then
+  if bucket < num_buckets && reservoir[bucket] == null then


This is probably contradicting the "uniformely-weigted sampling algorithm based on the number of samples seen so far", as once the buckets are filled, all subsequent measurements will be never be stored into reservoir.!

(nit perf, probably an implementation optizmiation: the cpu cycles spent on generating random is not required as well once the buckets are filled)...

Just saw this is already called out in #3723 (comment)

github-actions · 2023-10-26T03:16:24Z

This PR was marked stale due to lack of activity. It will be closed in 7 days.

jsuereth

I'm opposed to this specific change, but I think it calls out the need for allowing custom exemplar reservoir implementations.

jsuereth · 2023-10-27T13:59:41Z

specification/metrics/sdk.md

@@ -1044,14 +1052,15 @@ SHOULD be used.
 #### AlignedHistogramBucketExemplarReservoir

 This Exemplar reservoir MUST take a configuration parameter that is the
-configuration of a Histogram.  This implementation MUST keep the last seen
+configuration of a Histogram.  This implementation MUST keep the first seen


This reservoir is designed to match Prometheus behavior of today.

Rather than changing this specification, I'd prefer instead to expose a mechanism for users to specify their own custom Exemplar Reservoirs. This is the right way going forward to allow users to better sample the exemplars for their own use case or do more intelligent per-application sampling (similarly to how we expose Samplers for tracing).

jsuereth · 2023-10-27T14:02:28Z

specification/metrics/sdk.md

@@ -993,6 +993,13 @@ span context and baggage can be inspected at this point.
 The "offer" method does not need to store all measurements it is given and
 MAY further sample beyond the `ExemplarFilter`.

+If a span might be used as an exemplar, the ExemplarReservoir MUST set the
+span attribute `otel.exemplar` to the value `true`, otherwise it MUST


This only makes sense if you put a restriction to guarantee that an exemplar is preserved once added to a reservoir (which is not true at all today).

This could have poor interaction w/ cumulative streams.

I'd rather see this function exposed via a custom Exemplar Reservoir hook. Imagine this:

As Instrument Advice or as View configuration, you can provide an ExemplarReservoir factory for your metric.

You should be able to provide an ExemplarReservoir implementation that guarantees sampling once something hits the reservoir and would be able to add this flag.

All of that should be able to be done without official specification.

The only thing you're missing is the ability to provide custom reservoirs, which is something we should absoutely be adding to the exemplar specification.

github-actions · 2023-11-04T03:16:27Z

Closed as inactive. Feel free to reopen if this PR is still being worked on.

samwright added 2 commits October 17, 2023 13:11

Set attribute on exemplar spans

e2b6efc

update algorithms

5517978

arminru marked this pull request as ready for review October 17, 2023 15:12

arminru requested review from a team as code owners October 17, 2023 15:12

github-actions bot assigned jack-berg Oct 17, 2023

arminru marked this pull request as draft October 17, 2023 15:13

MrAlias reviewed Oct 17, 2023

View reviewed changes

cijothomas reviewed Oct 17, 2023

View reviewed changes

samwright added 2 commits October 17, 2023 17:50

undo changes to fixed-size reservoir

de4d465

make use of otel.exemplar best-effort instead of exact

a2d3af2

github-actions bot added the Stale label Oct 26, 2023

jsuereth reviewed Oct 27, 2023

View reviewed changes

github-actions bot closed this Nov 4, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Set attribute on exemplar spans #3723

Set attribute on exemplar spans #3723

samwright commented Oct 17, 2023 •

edited

MrAlias Oct 17, 2023

cijothomas Oct 17, 2023

MrAlias Oct 17, 2023

samwright Oct 17, 2023

cijothomas Oct 17, 2023

samwright Oct 17, 2023

samwright Oct 18, 2023

cijothomas Oct 17, 2023

cijothomas Oct 17, 2023

github-actions bot commented Oct 26, 2023

jsuereth left a comment

jsuereth Oct 27, 2023

jsuereth Oct 27, 2023

github-actions bot commented Nov 4, 2023

Set attribute on exemplar spans #3723

Set attribute on exemplar spans #3723

Conversation

samwright commented Oct 17, 2023 • edited

Motivation

Changes

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

github-actions bot commented Oct 26, 2023

jsuereth left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

github-actions bot commented Nov 4, 2023

samwright commented Oct 17, 2023 •

edited