Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set attribute on exemplar spans #3723

Closed
Closed
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 7 additions & 3 deletions specification/metrics/sdk.md
Original file line number Diff line number Diff line change
Expand Up @@ -993,6 +993,10 @@ span context and baggage can be inspected at this point.
The "offer" method does not need to store all measurements it is given and
MAY further sample beyond the `ExemplarFilter`.

If a span is to be used as an exemplar, the ExemplarReservoir MUST set the
span attribute `"otel.exemplar"` to the value `true`, otherwise it MUST
not change the attribute.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will only work if we have a static set of exemplars. If an exemplar is sampled and later it is dropped from the reservoir due to other exemplars being sampled (simple-fixed-size-reservoir) there isn't a way to return the span to its original state currently because there is no API to drop an attribute.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The attribute could be changed to indicate that the span was considered/initially-stored in an ExemplarReservoir, and the can be used by tail-sampler as an additional input when making sampling decision by prioritizing spans with this attribute. Not perfect, but still an improvement.


The "collect" method MUST return accumulated `Exemplar`s. Exemplars are expected
to abide by the `AggregationTemporality` of any metric point they are recorded
with. In other words, Exemplars reported against a metric data point SHOULD have
Expand Down Expand Up @@ -1028,7 +1032,7 @@ algorithm](https://en.wikipedia.org/wiki/Reservoir_sampling) can be used:

```
bucket = random_integer(0, num_measurements_seen)
if bucket < num_buckets then
if bucket < num_buckets && reservoir[bucket] == null then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will drastically change the default sampling algorithm. It would no longer be a "uniformly-weighted sampling algorithm based on the number of samples the reservoir has seen so far to determine if the offered measurements should be sampled."

I don't think this is an appropriate change for all use cases. It seems like a good motivation for a custom reservoir to be supported instead.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good point - I hadn't realised that. Being able to provide my own (or switch to a provided "tail-sampling friendly") reservoir would be great.

Alternatively, we could keep the algorithm as-is but still add the otel.exemplar=true attribute to all spans that made it into a bucket, even if they are later discarded. The number of spans with the attribute that didn't actually become exemplars would be small (the chance of the nth span making it into a bucket is 1/n, so for N spans the total that make it into a bucket is log(N) - please correct my maths if that's wrong!)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Being able to provide my own (or switch to a provided "tail-sampling friendly") reservoir would be great.

That is already allowed in the spec!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is already allowed in the spec!

I'll go give it a try!

I've updated the PR to undo the changes to the fixed-size reservoir. I think this approach would be better long-term, even if it means keeping more spans than is needed. We could add some disclaimer explaining the trade-off between this and the histogram reservoir in terms of tail-sampling?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed the new rule in the spec to align with this approach. A little more maths suggests the ratio between the number of exemplars (b) and the number of spans that get the attribute (b ln(n), where n is the number of spans) is 1/ln(n). So for b=2 (for some reason, the default size in java is the number of processor cores 🤯) :

spans annotated spans annotated spans that are exemplars
100 9.2 22%
1000 13.8 14%
10000 18.4 11%

I'd personally be fine keeping that many spans unnecessarily. If it became a problem I'd switch to the histogram-based reservoir. I've updated the java PR with this approach.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably contradicting the "uniformely-weigted sampling algorithm based on the number of samples seen so far", as once the buckets are filled, all subsequent measurements will be never be stored into reservoir.!

(nit perf, probably an implementation optizmiation: the cpu cycles spent on generating random is not required as well once the buckets are filled)...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just saw this is already called out in #3723 (comment)

reservoir[bucket] = measurement
end
```
Expand All @@ -1044,13 +1048,13 @@ SHOULD be used.
#### AlignedHistogramBucketExemplarReservoir

This Exemplar reservoir MUST take a configuration parameter that is the
configuration of a Histogram. This implementation MUST keep the last seen
configuration of a Histogram. This implementation MUST keep the first seen
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reservoir is designed to match Prometheus behavior of today.

Rather than changing this specification, I'd prefer instead to expose a mechanism for users to specify their own custom Exemplar Reservoirs. This is the right way going forward to allow users to better sample the exemplars for their own use case or do more intelligent per-application sampling (similarly to how we expose Samplers for tracing).

measurement that falls within a histogram bucket. The reservoir will accept
measurements using the equivalent of the following naive algorithm:

```
bucket = find_histogram_bucket(measurement)
if bucket < num_buckets then
if bucket < num_buckets && reservoir[bucket] == null then
reservoir[bucket] = measurement
end

Expand Down
Loading