Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Probability Samplers based on W3C Trace Context Level 2 #3910

Closed
wants to merge 6 commits into from

Conversation

@jmacd
Copy link
Contributor Author

jmacd commented Feb 27, 2024

@kentquirk @oertl @PeterF778 @kalyanaj Please take a look.
I believe a bit more prototyping will be required, so I leave this open as a draft PR.
I am looking to prototype an OTel-Go SDK change to the TraceIdRatioBased sampler, and I will be continuing to revive open-telemetry/opentelemetry-collector-contrib#24811 in the coming week. Cheers!

@jmacd
Copy link
Contributor Author

jmacd commented Feb 27, 2024

@kentquirk @oertl @PeterF778 @kalyanaj
TL;DR I found about half of the text in the old, experimental specification to be overkill and removed a lot of it. If you see something you'll greatly miss, we can bring it back. @kentquirk if you want to throw out the old, experimental specification and start from scratch, I'm open to that. I don't think we need to extend the specification with all the pseudocode from the OTEP, readers can refer back to that document for details.

`traceparent` and `tracestate` contexts fields specified in [W3C Trace
Context Level 2](https://www.w3.org/TR/trace-context-2/).

When injecting and extracting trace context to ro from a carrier, the
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
When injecting and extracting trace context to ro from a carrier, the
When injecting and extracting trace context to or from a carrier, the

@@ -312,21 +333,46 @@ The default sampler is `ParentBased(root=AlwaysOn)`.

#### TraceIdRatioBased

* The `TraceIdRatioBased` MUST ignore the parent `SampledFlag`. To respect the
The `TraceIdRatioBased` MUST ignore the parent `SampledFlag`. To respect the
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The `TraceIdRatioBased` MUST ignore the parent `SampledFlag`. To respect the
The `TraceIdRatioBased` sampler MUST ignore the parent `SampledFlag`. To respect the

* A rejection threshold is calculated, expressing as an integer how
many out of 2**56 trace IDs should be sampled.
* The threshold is encoded as a "T-value", expressing the threshold
using up to 5 hexadecimal digits of precision.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
using up to 5 hexadecimal digits of precision.
using up to 14 hexadecimal digits of precision.

Samplers SHOULD unset `p` when the invariant between the `sampled`,
`p`, and `r` values is violated before using the `tracestate` to make
a sampling decision or interpret adjusted count.
Samplers SHOULD unset T-value (by erasing the `tv` field) when the
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Samplers SHOULD unset T-value (by erasing the `tv` field) when the
Samplers SHOULD unset T-value (by erasing the `th` field) when the


```
base10(r) = 3
base10(p) = 2
tracestate: ot=tv:c
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
tracestate: ot=tv:c
tracestate: ot=th:c

Comment on lines +368 to +369
return a string of the form `"TraceIdRatioBased{RATIO;tv:TVALUE}"`
with `RATIO` the configured probability and `TVALUE` replaced by the
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return a string of the form `"TraceIdRatioBased{RATIO;tv:TVALUE}"`
with `RATIO` the configured probability and `TVALUE` replaced by the
return a string of the form `"TraceIdRatioBased{RATIO;th:THRESHOLD}"`
with `RATIO` the configured probability and `THRESHOLD` replaced by the

Comment on lines +530 to +531
Here, R-value ce929d0e0e4736 indicates that a consistent probability
sampler configured with probability greater than approximately 19.3%,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Here, R-value ce929d0e0e4736 indicates that a consistent probability
sampler configured with probability greater than approximately 19.3%,
The set sampled flag together with the R-value being ce929d0e0e4736 indicate that the parent used a consistent probability sampler configured with probability greater than approximately 19.3%,

Comment on lines +546 to +548
T-value is not set, consistent with an unsampled context. In this
case, the parent context could have been sampled at less than 19.3%
probability or used the `AlwaysOff` sampler.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case we know that the parent was not consistently sampled. If any other sampling mechanism is used, the probability could be any value. Mentioning 19.3% here, is therefore a little bit confusing.

Comment on lines +634 to +637
To combine a consistent probability Sampler decision with a
non-probability Sampler decision, T-value should be set according to
the probability sampler(s). If no probability sampler decides to
sample, no T-value should be set.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
To combine a consistent probability Sampler decision with a
non-probability Sampler decision, T-value should be set according to
the probability sampler(s). If no probability sampler decides to
sample, no T-value should be set.
If the sampling decision is the result of multiple samplers that are not all consistent samplers, no T-value should be set.

Comment on lines +670 to +675
The `TraceIdRatioBased` and `ParentBased` samplers can be used as
delegates of another sampler, for conditioning the choice of sampler
on span and other fixed attributes. However, for adjusted counts to
be trustworthy, the choice of non-root sampler cannot be conditioned
on the parent's sampled trace flag or the OpenTelemetry tracestate
R-value or T-value, as these decisions would lead to incorrect
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The `TraceIdRatioBased` and `ParentBased` samplers can be used as
delegates of another sampler, for conditioning the choice of sampler
on span and other fixed attributes. However, for adjusted counts to
be trustworthy, the choice of non-root sampler cannot be conditioned
on the parent's sampled trace flag or the OpenTelemetry tracestate
R-value or T-value, as these decisions would lead to incorrect
The `TraceIdRatioBased` and `ParentBased` samplers can be used as
delegates of another sampler, for conditioning the choice of sampler
on span and other fixed attributes. However, for adjusted counts to
be trustworthy, the choice of non-root sampler cannot be conditioned
on the parent's sampled trace flag or the OpenTelemetry tracestate
R-value, as these decisions would lead to incorrect

Choosing the T-value based on the parent T-value is fine. This is exactly what is done for parent-based sampling.

- TraceFlags (8 bits)
- TraceState (string)

Propagators MUST NOT assume that bits 2-7 (6 most significant bits) will be zero.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For clarity, let's say "bits 2-7 of TraceFlags".


* Ratio values are restricted to the range `2**-56` through 1
* A rejection threshold is calculated, expressing as an integer how
many out of 2**56 trace IDs should be sampled.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"sampled" => "dropped"

and r-value are independent settings, each can be meaningfully set
without the other present.
The [Sampler API](sdk.md#sampler) is responsible for setting the
`sampled` and `random` flags and the `tracestate`.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is problematic. A sampler should not change the random flag at all. This flag must be set when the TraceID is created which is ahead of making the sampling decision for the root span.

bits.
Head Samplers (i.e., trace SDKs) SHOULD by default generate the 56
random bits specified for TraceIDs by the W3C Trace Context Level 2,
and set the Random trace flag, when generating new contexts.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the SDK specification, the TraceID is already generated even when sampling ROOT spans. This needs more discussions.

Copy link
Member

@kentquirk kentquirk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a fan of this. I think you've done a good job here and based on what I see, I don't think we need to be rewriting it from scratch. Our past history leads me to believe that might be a long process. Let's push forward!

### W3C Trace Context Requirements

A W3C Trace Context propagator is expected to implement the
`traceparent` and `tracestate` contexts fields specified in [W3C Trace
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
`traceparent` and `tracestate` contexts fields specified in [W3C Trace
`traceparent` and `tracestate` context fields specified in [W3C Trace

- TraceFlags (8 bits)
- TraceState (string)

Propagators MUST NOT assume that bits 2-7 (6 most significant bits) will be zero.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Propagators MUST NOT assume that bits 2-7 (6 most significant bits) will be zero.
Propagators MUST NOT assume that bits 2-7 of TraceFlags (the 6 most significant bits) will be zero.

Comment on lines +226 to +227
- [Sampled](https://www.w3.org/TR/trace-context/#sampled-flag)
- [Random](https://www.w3.org/TR/trace-context-2/#random-trace-id-flag)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- [Sampled](https://www.w3.org/TR/trace-context/#sampled-flag)
- [Random](https://www.w3.org/TR/trace-context-2/#random-trace-id-flag)
- [Sampled (value 0x1)](https://www.w3.org/TR/trace-context/#sampled-flag)
- [Random (value 0x2)](https://www.w3.org/TR/trace-context-2/#random-trace-id-flag)

I think it would help to know which bits we're talking about.

Comment on lines 352 to 353
* A rejection threshold is calculated, expressing as an integer how
many out of 2**56 trace IDs should be sampled.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* A rejection threshold is calculated, expressing as an integer how
many out of 2**56 trace IDs should be sampled.
* A rejection threshold is calculated, expressing as an integer the number
out of 2**56 trace IDs that should be dropped.

Comment on lines +517 to +519
This states that the SDK should fill least significant 7 bytes (i.e., 56
bits) of the TraceID are genuinely random or pseudo-random., so they
can be used for probability sampling.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
This states that the SDK should fill least significant 7 bytes (i.e., 56
bits) of the TraceID are genuinely random or pseudo-random., so they
can be used for probability sampling.
This states that the SDK should fill the least significant 7 bytes (i.e., 56
bits) of the TraceID with bits that are genuinely random or pseudo-random,
so that they can be used for probability sampling.

decimal value is greater than 63 before using the `tracestate` to make
a sampling decision or interpret adjusted count.
These points, taken together, means there will be a long delay before
the Random trace flag functions as intended in ther OpenTelemetry
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
the Random trace flag functions as intended in ther OpenTelemetry
the Random trace flag functions as intended in the OpenTelemetry


##### Example: Probability sampled context
Tail Samplers SHOULD ignore the W3C Trace Context Level 2 Random flag
when interpreting span data. This is a compromise
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
when interpreting span data. This is a compromise
when interpreting span data. This is a compromise.


Consider a trace context with the following headers:
When there is an explicit R-value set, the Sampler will anyway
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
When there is an explicit R-value set, the Sampler will anyway
When there is an explicit R-value set, the Sampler will

* Sampler decisions are made by comparing the Trace ID randomness
against the rejection threshold.
* When Sampled, T-value is included in the [OpenTelemetry TraceState
header](./tracestate-handling.md), identified by sub-key `th`,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we update the tracestate-handling.md to include the examples for the sub-key 'th'? Right now, it seems to be referring to the earlier p values and r value examples.

using up to 5 hexadecimal digits of precision.
* Sampler decisions are made by comparing the Trace ID randomness
against the rejection threshold.
* When Sampled, T-value is included in the [OpenTelemetry TraceState
Copy link

@kalyanaj kalyanaj Feb 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be good to rephrase this to an active voice of what implementations must do. For example, something like: "When sampled, an implementation MUST update the tracestate header to specify the T-value. <+ any additional details>"

Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Mar 12, 2024
Copy link

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Mar 20, 2024
@jmacd jmacd removed the Stale label Mar 28, 2024
@jmacd jmacd reopened this Mar 28, 2024
Copy link

github-actions bot commented Apr 5, 2024

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Apr 5, 2024
Copy link

Closed as inactive. Feel free to reopen if this PR is still being worked on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
5 participants