Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Propose sampling API changes #24

Merged
merged 13 commits into from
Aug 26, 2019
Merged

Conversation

bogdandrutu
Copy link
Member

@bogdandrutu bogdandrutu commented Aug 16, 2019

This RFC is a collaboration between @lizthegrey and @bogdandrutu. Also reviewed by @c24t.

If accepted, this document will serve as a rough draft for the new Sampling API in the specification.

Copy link
Member

@c24t c24t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM to move to proposed.

text/0006-sampling.md Outdated Show resolved Hide resolved
text/0006-sampling.md Show resolved Hide resolved
text/0006-sampling.md Outdated Show resolved Hide resolved
text/0006-sampling.md Outdated Show resolved Hide resolved
Copy link
Member

@reyang reyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good in general, I have some minor comments.

text/0006-sampling.md Outdated Show resolved Hide resolved
text/0006-sampling.md Outdated Show resolved Hide resolved
@bogdandrutu
Copy link
Member Author

@lizthegrey @c24t @reyang please re-read because I made some large changes and things may have changed.

@reyang reyang self-requested a review August 19, 2019 01:29
@lizthegrey
Copy link
Member

We don't need a separate RFC to resolve the question of propagating the sampleRate if @tedsuo's hypothesis from open-telemetry/opentelemetry-specification#198 (comment) is true; instead, we can just incorporate that sampleRate, if it exists, will be read from the tracestate, multiplied by the current sampleRate of the current Sampler, and passed into the newly created tracestate using the internal SDK method.

Awaiting feedback from @iredelmeier there and here as a result

bogdandrutu and others added 2 commits August 19, 2019 17:03
Co-Authored-By: Yang Song <songy23@users.noreply.github.com>
text/0006-sampling.md Outdated Show resolved Hide resolved
text/0006-sampling.md Show resolved Hide resolved
text/0006-sampling.md Show resolved Hide resolved
It is safe to assume that users of the API should only access the `RecordEvents` property when
instrumenting code and never access `SampledFlag` unless used in context propagators.

### SamplingHint
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest adding clarification / examples when the user code would want to provide one of the 4 values.

text/0006-sampling.md Outdated Show resolved Hide resolved
text/0006-sampling.md Outdated Show resolved Hide resolved
Co-Authored-By: Tristan Sloughter <t@crashfast.com>
Copy link
Member

@lizthegrey lizthegrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Continuing to approve. suggest that we go ahead and merge this as a proposed RFC and then open a separate PR to cover moving from proposed to approved now that most of the substantive feedback is addressed.

@tsloughter
Copy link
Member

I just realized I don't think it is made clear. My take away is that sampling happens on every span creation. If this is the case it should be called out because it differs from opencensus which only runs on the root span or one with a remote parent.

@tsloughter
Copy link
Member

Also, at first, since RecordEvents doesn't get used during a child's sampling decision I wasn't concerned that it was an attribute on a span instead of the span context. But thinking about it more, if it is to be used to know whether to skip adding events/attributes/etc to a span then it would be much better for us if it was kept in the span context.

My guess is the reason it isn't in the span context is because it isn't meant to be propagated? The issue for us is checking if a span is sampled is a very efficient operation because it simply checks the context, and so isn't an issue that it may get checked a lot if a lot of events are recorded. While if it is in the span then the table of spans has to be searched each time to find the value.

@bogdandrutu
Copy link
Member Author

My guess is the reason it isn't in the span context is because it isn't meant to be propagated? The issue for us is checking if a span is sampled is a very efficient operation because it simply checks the context, and so isn't an issue that it may get checked a lot if a lot of events are recorded. While if it is in the span then the table of spans has to be searched each time to find the value.

@tsloughter I think you refer by "us" to erlang. Conceptually this property is a span level property and it does not get propagated to the child span (only the sampled-flag is propagated).

I just realized I don't think it is made clear. My take away is that sampling happens on every span creation. If this is the case it should be called out because it differs from opencensus which only runs on the root span or one with a remote parent.

Yes I think we should allow the sampler to be more configurable, we can always call the sampler and an implementation for example the probability sampler by default applies the sampling logic only if root or remote parent.

@tsloughter
Copy link
Member

@bogdandrutu yes, "us" referring to Erlang/Elixir implementation. I get that it isn't propagated so it makes sense to not be in the context, but maybe it should be propagated and simply not taken into account for anything when it gets propagated.

I suppose the Erlang impl could also include it in the span context itself so that we can make the recording decisions efficiently. Since doing that won't change what is propagated, only what is in the context record for the Erlang process.

As for when sampling is run, the SDK must define when it is to run, right? Is that simply outside the scope of this RFC? If not then it would be good to define that in the SDK default implementation that sampling is run on all span starts.

@bogdandrutu
Copy link
Member Author

@tsloughter I think the comment in the rfc does mention this:

* Allows users to configure if probability applies only for "root spans", "root spans and remote 
   parent", or "all spans".
     * Default is to apply only for "root spans and remote parent".
     * Remote parent property should be added to the SpanContext see specs [PR/216][specs-pr-216]

Do you want to add this in the TLDR section?

@tsloughter
Copy link
Member

@bogdandrutu it implies it but it doesn't say it. I'm suggesting it be a clear requirement that in the SDK the default tracer's StartSpan must run the the sampler any time it is called.

@bogdandrutu
Copy link
Member Author

@tsloughter added a clarification in the when does sampling happen section.

@tsloughter
Copy link
Member

@bogdandrutu perfect, thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants