Add the requirement for the probability sampler #570

jkwatson · 2020-04-20T20:25:58Z

It must include a span attribute describing its probability.

…ribute describing its probability.

yurishkuro · 2020-04-20T23:03:55Z

specification/trace/sdk.md

@@ -119,6 +119,9 @@ These are the default samplers implemented in the OpenTelemetry SDK:
  that are root spans (no parent) and Spans with remote parent. However there
  should be configuration to change this to "root spans only", or "all spans".
  * Description MUST be `ProbabilitySampler{0.000100}`.
+  * If the probability is applied (as opposed to using the parent's `SampledFlag` value), the set of span Attributes
+  returned in the result MUST be a single attribute with the key `sampling.probability` and a value of the probability
+  being used.


In Jaeger clients the samplers return two tags (at minimum), indicating the type of sampler used, and the parameter value (like probability, or rate for rate limiting sampler). It's functionally equivalent, but makes it easier to determine the type of sampler used than matching on potentially different tag names (e.g. Jaeger collector adds sampler type as a label on the metric of how many traces it receives).

I'd be all in favor of having the Decision include the sampler name, as well. I think that's a different PR than this one, though. :)

Isn't this very similar to the component=http attribute, which we decided against?

It seems a little different to me. This is an internal detail of the sampling itself, rather than something that is coming from outside the sampler (like the component).

arminru

Since this is a new MUST requirement some reasoning would be nice. Would the probability be used by the backend to apply a multiplicity to sampled spans?

specification/trace/sdk.md

Co-Authored-By: Armin Ruech <armin.ruech@gmail.com>

jkwatson · 2020-04-21T15:06:31Z

Since this is a new MUST requirement some reasoning would be nice. Would the probability be used by the backend to apply a multiplicity to sampled spans?

I agree. @pavolloffay raised this issue in the java project, so I'll let him add additional context: open-telemetry/opentelemetry-java#1087

Oberon00 · 2020-04-21T15:40:08Z

Regarding the line length: #564

c24t · 2020-04-21T18:02:36Z

specification/trace/sdk.md

@@ -119,6 +119,10 @@ These are the default samplers implemented in the OpenTelemetry SDK:
  that are root spans (no parent) and Spans with remote parent. However there
  should be configuration to change this to "root spans only", or "all spans".
  * Description MUST be `ProbabilitySampler{0.000100}`.
+  * If the probability is applied (as opposed to using the parent's


It's surprising that this is a MUST since downstream tracers don't typically need to know the parent's sampling procedure, just its decision. This looks more like a SHOULD-level semantic convention to me.

IMO, we want to have the Probability Sampler work the same everywhere. So, if we want to go forward with this change (and I'm relying on @pavolloffay for the justification/need), then I think that we should require them to all behave the same way. Hence, the use of MUST.

@c24t this has no effect on downstream tracers, since the probability is merely recorded as span attribute, out of band.

yurishkuro · 2020-04-21T20:00:18Z

Justification for capturing probability and type of sampler: Jaeger backend implements an adaptive control loop (think "PID controller") that continuously recalculates desired sampling probabilities for ingress endpoints and pushes them back to the SDKs. The calculation needs to know what kind of sampler was used for any given trace, because only probabilistically sampled traffic should be counted, whereas SDKs also have other sampling mechanisms, like "each endpoint at least once a minute". For this reason, Jaeger SDKs add sampler.type and sampler.param tags to the root span.

lizthegrey · 2020-04-28T18:09:03Z

Is it clearer to use the sample probability, or inverse-probability e.g. representing the number of spans/traces that this span represents? Some discussion recently: https://gitter.im/open-telemetry/opentelemetry-specification?at=5ea6fe381eb8bd3978fb91a0

lizthegrey

In addition to propagating in the span data, we also need to propagate in the trace headers so that subsequent sampling can be done (e.g. two probabilitysamplers each running with probability 0.2 should produce spans with probability 0.04 in net at the leaf, rather than 0.2 being applied as a fixed number)

jkwatson · 2020-04-28T18:14:41Z

In addition to propagating in the span data, we also need to propagate in the trace headers so that subsequent sampling can be done (e.g. two probabilitysamplers each running with probability 0.2 should produce spans with probability 0.04 in net at the leaf, rather than 0.2 being applied as a fixed number)

It feels like this is a bigger change that this PR was intending to address, since this particular part of the spec is only related to the attributes being provided by the Sampler, and not how these get propagated in headers.

william-tran · 2020-05-19T22:04:59Z

@yurishkuro: Justification for capturing probability and type of sampler: ... For this reason, Jaeger SDKs add sampler.type and sampler.param tags to the root span.

@lizthegrey: Is it clearer to use the sample probability, or inverse-probability e.g. representing the number of spans/traces that this span represents?

I'd like to generalize this further using inverse-probability, or just call it sample.weight which makes sense for all conceivable samplers, not just probabilistic ones. Its value is the number of actual events this span represents.

For example, let's use Jaeger's probabilistic and rate limiting samplers applied at the start of the trace. If Jaeger's probabilistic sampler was applied with sampler.param=0.1 , you can work your way back to sample.weight by taking the inverse of sampler.param=0.1, giving you 10. With the rate limiting sampler, the param is the max traces per second to capture, but what leaked from that leaky bucket isn't expressed anywhere, so you can't work out the sample.weight from that.

If we just use sample.weight, the type of sampler becomes an implementation detail that I can't see the need to be passed around, but maybe I'm missing some backstory. I get that this requires some upkeep for rate limiting samplers or subsequent sampling decisions to count what they throw away, but its not a lot of extra work, and it allows any sampler to produce things that can be subjected to further sampling decisions downstream and then re-inflated later.

jmacd · 2020-05-19T22:35:50Z

sample.weight is approximately unambiguous in this context. In academic papers on weighted sampling algorithms, sometimes the term "adjusted weight" is used. When used in this context, the sample.weight is the inverse probability times the input weight, which is 1.

A slightly-more unambiguous term that I like is "adjusted count" because it implies the input weight is 1, so I might propose sample.count instead of weight. If you're using a weighted sampling algorithm, you might compute a sample where the input weights were not 1. Then, the term "adjusted count" still applies--it's the ratio of adjusted weight to input weight.

So this is a vote for sample.count.

Kanwaldeep · 2020-05-28T07:29:22Z

specification/trace/sdk.md

+  * If the probability is applied (as opposed to using the parent's
+  `SampledFlag` value), the set of span Attributes returned in the result MUST
+  include an attribute with the key `sampling.probability` and a value of the
+  probability being used.


sampling.probability would have the value that is calculated by the algorithm? It is not very clear what is value of the probability being used.

I don't understand the concern. A probability-based sampler has a probability that it is configured to use. This is the sampling probability.

yurishkuro · 2020-07-17T18:24:38Z

@jmacd personally, I don't fully understand all the use cases where you can try to extrapolate other statistics based on the sampling probability, and that makes me reluctant making it a first-order field in the span. Probabilistic sampling is just one form of sampling, other forms of sampling cannot be expressed as a probability or even an effective weight/count. For example a token-bucket-based rate limiting sampler can be modified to count how many traces it didn't sample, but that count is completely wrong to use as a weight of the positive sampling decision (e.g. a traffic burst that exhausts all tokens for a while would make the next positive sampling decision disproportionately weighted, skewing any distribution metrics you may want to extrapolate from the sampled trace).

lizthegrey · 2020-07-17T18:49:07Z

@jmacd personally, I don't fully understand all the use cases where you can try to extrapolate other statistics based on the sampling probability, and that makes me reluctant making it a first-order field in the span. Probabilistic sampling is just one form of sampling, other forms of sampling cannot be expressed as a probability or even an effective weight/count. For example a token-bucket-based rate limiting sampler can be modified to count how many traces it didn't sample, but that count is completely wrong to use as a weight of the positive sampling decision (e.g. a traffic burst that exhausts all tokens for a while would make the next positive sampling decision disproportionately weighted, skewing any distribution metrics you may want to extrapolate from the sampled trace).

That's fair that a token bucket based sampler wouldn't be taking a statistical sample, and thus the sample rate would be inapplicable.

However, for probability based sampling, even where you may have multiple probabilities depending upon subpopulation (e.g. trying to sample the tails 1:1 but the body 1:50 or 1:1000), it is possible to reconstruct the distribution from re-weighting the received events. See https://youtu.be/QFx63Ukn4QM?t=1505 for more information (and not from Lightstep or Honeycomb! but from SignalFx!)

yurishkuro · 2020-07-17T19:54:14Z

@lizthegrey yeah, I saw that talk before, but the leftovers of a scientist in me were disappointed, because it showed neither math nor any literature references to prove the claim, only empirical observations from synthetic simulations. I would love to see something more robust.

That aside, I do not dispute that capturing the probability is useful, Jaeger SDK always did that, but out-of-band.

jmacd · 2020-07-21T07:05:43Z

Sampling as a topic has come up for Metrics in both open-telemetry/opentelemetry-proto#159 and #625.

To be fair, the talk slides list some ways to compute samples:
https://youtu.be/QFx63Ukn4QM?t=1401

Sampling of traces--of the kind discussed in this issue--is the same as sampling from GROUPING metric data points, where each event (that may become a sample item) has a real count of 1 (but may end up with representative sample_count > 1). I admit it is unsatisfying not to have an algorithm for computing this information in the open!

Computing sampling probabilities for ADDING metric data points is a simpler problem than for GROUPING metric data points, generally. I like to start with a weighted reserver sampling algorithm, here are two:

https://dl.acm.org/doi/10.1145/1314690.1314696
https://dl.acm.org/doi/10.5555/1496770.1496906

It's not directly obvious how to get from a solution to sampling ADDING points to sampling GROUPING points, but there are a bunch of ways. I want to highlight the simplest. Using a weighted sampling algorithm such as these requires batching data (spans or metric events), for the sampling weight or probability isn't known until a complete frame of data has been processed. Once you have batch of data, take two passes over the data:

Build a histogram of latencies having N buckets, let's say, and compute inverse bucket probability
Run the same data through an unbiased weighted sampling algorithm such as the two above, using inverse probability as the input weight

The output weight divided by the input weight equals sample_count (metric RawValue), and sampling.probability is sample_count / total count.

This procedure outputs a fixed size sample with information that allows computing sample_count as in open-telemetry/opentelemetry-proto#159 for GROUPING data points or sample.probability as in this PR for sample Span data points.

In the OpenCensus metric protocol, Histogram buckets contain one non-statistical exemplar each. Using the procedure above, we can compute instead a statistical sample of raw data points. Using the same procedure, an exporter can compute sample.probability in a tail-sampling scheme. Using this sample, we can estimate sub-population frequencies for GROUPING metric data points and span data. This supports the claim in #657 that we can synthesize raw metric values (with sample_count) from span data (with sample.probability), too.

If you read all this and still just want to see an open-source implementation, something is bound to happen. :-)

If you read all this, hopefully you also agree we should have a SpanData field named sample_count to match the field of the same name in open-telemetry/opentelemetry-proto#159.

lizthegrey · 2020-07-21T18:26:21Z

I think the SpanData discussion pertains to open-telemetry/oteps#107 in which we discuss out of process communication of the sample rate/priority via SpanData; the in-process communication of that information could be done with an Attribute (but as per above is really a metadata property best sent in Data rather than Attribute). So Data addresses a superset of the Attribute case, but is more disruptive because it involves changes to fields that we propagate.

jmacd · 2020-07-21T18:35:17Z

@lizthegrey I thought it was the opposite. I'm not trying to say anything about propagation in-band (which open-telemetry/oteps#107 does), only about what gets recorded in the span (which is I think the point of this PR).

yurishkuro · 2020-07-21T18:36:45Z

To be fair, the talk slides list some ways to compute samples: https://youtu.be/QFx63Ukn4QM?t=1401

To be fair, those references are algorithms for doing approximate distribution/quantile computations without storing all the data. None of them, to my knowledge, discuss the possibility of doing the same on already sampled data. The sampling discussion starts right after that, and is only backed by empirical experiments, not math. He even jokes "it's all good in practice, but how does it work in theory?"

github-actions · 2020-08-12T03:18:22Z

This PR was marked stale due to lack of activity. It will be closed in 7 days.

github-actions · 2020-08-19T03:19:10Z

Closed as inactive. Feel free to reopen if this PR is still being worked on.

Oberon00 · 2020-11-24T16:40:48Z

Java already implements this, so I think we should make a decision here other than letting the PR go stale.

CC @pmm-sumo

pmm-sumo · 2020-11-24T19:27:32Z

Java already implements this, so I think we should make a decision here other than letting the PR go stale.

CC @pmm-sumo

Great, thank you @Oberon00 for bringing this up

github-actions · 2020-12-03T03:25:09Z

This PR was marked stale due to lack of activity. It will be closed in 7 days.

github-actions · 2020-12-10T03:28:05Z

Closed as inactive. Feel free to reopen if this PR is still being worked on.

Add the requirement for the probability sampler to include a span att…

0f84c55

…ribute describing its probability.

jkwatson requested review from arminru, bogdandrutu, c24t, carlosalberto, iredelmeier, jmacd, reyang, SergeyKanzhelev, tedsuo, tigrannajaryan and yurishkuro as code owners April 20, 2020 20:25

jkwatson mentioned this pull request Apr 20, 2020

Add probability to decision when appropriate open-telemetry/opentelemetry-java#1116

Merged

yurishkuro reviewed Apr 20, 2020

View reviewed changes

arminru reviewed Apr 21, 2020

View reviewed changes

specification/trace/sdk.md Outdated Show resolved Hide resolved

Update specification/trace/sdk.md

e833c3c

Co-Authored-By: Armin Ruech <armin.ruech@gmail.com>

shorten line lengths to 80

efb7376

c24t reviewed Apr 21, 2020

View reviewed changes

lizthegrey self-requested a review April 28, 2020 18:07

lizthegrey suggested changes Apr 28, 2020

View reviewed changes

Oberon00 mentioned this pull request May 19, 2020

Add probability sampler details #331

Closed

garthk mentioned this pull request May 23, 2020

Provide interim sampling capability in exporter garthk/opentelemetry_honeycomb#5

Open

Kanwaldeep reviewed May 28, 2020

View reviewed changes

github-actions bot added the Stale label Aug 12, 2020

github-actions bot closed this Aug 19, 2020

jmacd mentioned this pull request Aug 21, 2020

About RateLimitSampler and GuaranteedThroughputProbabilisticSampler open-telemetry/opentelemetry-go#1074

Closed

Oberon00 reopened this Nov 24, 2020

Oberon00 approved these changes Nov 24, 2020

View reviewed changes

Oberon00 removed the release:required-for-ga Must be resolved before GA release, or nice to have before GA label Nov 24, 2020

github-actions bot removed the Stale label Nov 25, 2020

pmm-sumo mentioned this pull request Nov 25, 2020

Store sampling.probability in sampled span attributes open-telemetry/opentelemetry-collector#2215

Closed

github-actions bot added the Stale label Dec 3, 2020

github-actions bot closed this Dec 10, 2020

This was referenced Jan 4, 2021

Delegate constant traceidratio samplers to the constant implementations. open-telemetry/opentelemetry-java#2417

Closed

Remove off-spec sampling.probability from TraceIdRatioBasedSampler attribute (for now) open-telemetry/opentelemetry-java#2420

Closed

william-tran mentioned this pull request Mar 11, 2021

honeycomb: adjust the sample rate using an attribute on the span open-telemetry/opentelemetry-collector-contrib#1162

Merged

jmacd mentioned this pull request Oct 20, 2021

Probability sampling in tracestate specification #2047

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add the requirement for the probability sampler #570

Add the requirement for the probability sampler #570

jkwatson commented Apr 20, 2020

yurishkuro Apr 20, 2020

jkwatson Apr 20, 2020

Oberon00 Apr 21, 2020

jkwatson Apr 21, 2020

arminru left a comment

jkwatson commented Apr 21, 2020

Oberon00 commented Apr 21, 2020

c24t Apr 21, 2020

jkwatson Apr 21, 2020

yurishkuro Jul 17, 2020

yurishkuro commented Apr 21, 2020

lizthegrey commented Apr 28, 2020

lizthegrey left a comment

jkwatson commented Apr 28, 2020

william-tran commented May 19, 2020 •

edited

Loading

jmacd commented May 19, 2020

Kanwaldeep May 28, 2020

jkwatson May 28, 2020

yurishkuro commented Jul 17, 2020

lizthegrey commented Jul 17, 2020 •

edited

Loading

yurishkuro commented Jul 17, 2020

jmacd commented Jul 21, 2020

lizthegrey commented Jul 21, 2020

jmacd commented Jul 21, 2020

yurishkuro commented Jul 21, 2020 •

edited

Loading

github-actions bot commented Aug 12, 2020

github-actions bot commented Aug 19, 2020

Oberon00 commented Nov 24, 2020 •

edited

Loading

pmm-sumo commented Nov 24, 2020

github-actions bot commented Dec 3, 2020

github-actions bot commented Dec 10, 2020

Add the requirement for the probability sampler #570

Add the requirement for the probability sampler #570

Conversation

jkwatson commented Apr 20, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

arminru left a comment

Choose a reason for hiding this comment

jkwatson commented Apr 21, 2020

Oberon00 commented Apr 21, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

yurishkuro commented Apr 21, 2020

lizthegrey commented Apr 28, 2020

lizthegrey left a comment

Choose a reason for hiding this comment

jkwatson commented Apr 28, 2020

william-tran commented May 19, 2020 • edited Loading

jmacd commented May 19, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

yurishkuro commented Jul 17, 2020

lizthegrey commented Jul 17, 2020 • edited Loading

yurishkuro commented Jul 17, 2020

jmacd commented Jul 21, 2020

lizthegrey commented Jul 21, 2020

jmacd commented Jul 21, 2020

yurishkuro commented Jul 21, 2020 • edited Loading

github-actions bot commented Aug 12, 2020

github-actions bot commented Aug 19, 2020

Oberon00 commented Nov 24, 2020 • edited Loading

pmm-sumo commented Nov 24, 2020

github-actions bot commented Dec 3, 2020

github-actions bot commented Dec 10, 2020

william-tran commented May 19, 2020 •

edited

Loading

lizthegrey commented Jul 17, 2020 •

edited

Loading

yurishkuro commented Jul 21, 2020 •

edited

Loading

Oberon00 commented Nov 24, 2020 •

edited

Loading