Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sampling decison is too late to gain much performance #620

Open
Oberon00 opened this issue May 26, 2020 · 10 comments
Open

Sampling decison is too late to gain much performance #620

Oberon00 opened this issue May 26, 2020 · 10 comments
Labels
area:sampling Related to trace sampling release:after-ga Not required before GA release, and not going to work on before GA spec:trace Related to the specification/trace directory

Comments

@Oberon00
Copy link
Member

Oberon00 commented May 26, 2020

Problem description

Instrumentations are encouraged to provide as many attributes as possible already when starting the span, to allow the sampler to make a better decision. However, collecting these attributes can already be expensive. There is one prominent case where samplers won't even look at these attributes, namely if the sampler respects parent decisions.

Proposal

I think the spec should contain a mechanism for instrumentations to ask the sampler earlier, before even starting to collect expensive attributes. Two ideas:

  1. Languages that use two-step span creation (via a SpanBuilder) could provide a "isRecordable" already on the SpanBuilder that calls the sampler with the information collected so far. A problem with this is that the span name may also be expensive to compute, but that is fixable by making it optional and providing a setName on the SpanBuilder.
  2. All languages could provide a isRecordable method on the tracer with some arguments, probably only the parent SpanContext.

Discussion

A difference to current calls to the sampler with both of these is that the new calls are (and should be!) made before a span ID is computed (whether the trace ID should be filled in from the parent context in cases where there is one is debatable). Nevertheless, I think the existing sampler interface is enough to support these use cases if samplers are written a bit defensively (the probability sampler which does look at the trace ID might be an exception, it would have to return true for invalid trace IDs since it cannot know yet).

EDIT: One issue with calling Sampler.shouldSample twice is that it may lose performance in the sampled case, especially if the sampler adds / computes attributes for its decision. This could be fixed by adding a new method to the sampler so the tracer knows any attributes won't be used. This can also be done backwards-compatible, as the method can be easily substituted with "return true" or "return shouldSample().isSampled()" for samplers that don't support it.

Another advantage of these methods would be that they could also support explicit tracing suppression (as per #530).

@Oberon00 Oberon00 changed the title Normal head-based sampling decison is too late to gain much performance Sampling decison is too late to gain much performance May 26, 2020
@jmacd jmacd added the area:sampling Related to trace sampling label May 29, 2020
@Oberon00
Copy link
Member Author

Oberon00 commented Jun 2, 2020

Note that this feature may probably be possible to bolt-on in a way that is API-compatible (on the instrumentation side at least), but resolving this before 1.0 may lead to a cleaner design. For example, we could have completely separate interfaces for purely parent-based sampling and "root sampling". #610 already does this mostly but behind the same interface.

@cijothomas
Copy link
Member

cijothomas commented Jun 4, 2020

Instrumentations are encouraged to provide as many attributes as possible already when starting the span, to allow the sampler to make a better decision.

Where is this suggested? I couldn't find this in specs - My assumption was that, its entirely up to the libraries to decide how much attributes (or none at all) to pass initially (available for Samplers), and then check span.IsRecording before adding more and more (expensive) attributes .

@arminru
Copy link
Member

arminru commented Jun 5, 2020

@cijothomas

It's stated in the tracing API spec in the span creation section:

Whenever possible, users SHOULD set any already known attributes at span creation
instead of calling `SetAttribute` later.

The assumption here, I suppose, is that the more fundamental attributes relevant for sampling decisions are readily available right from the start and that additional, more expensive attributes would be retrieved after a positive sampling decision.

@Oberon00
Copy link
Member Author

Oberon00 commented Jun 5, 2020

The most expensive attribute for http are often the URL parts, and you definitively want to know them when sampling a root span.

@cijothomas
Copy link
Member

@cijothomas

It's stated in the tracing API spec in the span creation section:

Whenever possible, users SHOULD set any already known attributes at span creation
instead of calling `SetAttribute` later.

The assumption here, I suppose, is that the more fundamental attributes relevant for sampling decisions are readily available right from the start and that additional, more expensive attributes would be retrieved after a positive sampling decision.

Thanks for clarifying. By "already known attributes" - I took it as "already available attributes" - so the cost of obtaining them is already paid anyway, so use it in SpanCreation.

@bogdandrutu bogdandrutu added the spec:trace Related to the specification/trace directory label Jun 12, 2020
@carlosalberto carlosalberto added the release:required-for-ga Must be resolved before GA release, or nice to have before GA label Jul 10, 2020
@andrewhsu andrewhsu added this to Required for GA, needs action. Add label:release:required-for-ga to issues and PRs when moving them to this column. in GA Spec Burndown Jul 13, 2020
@andrewhsu andrewhsu added the priority:p2 Medium priority level label Jul 17, 2020
@tigrannajaryan
Copy link
Member

Improving sampling performance appears to be a nice-to-have capability but not a mandatory one. The proposal for isRecordable also seems to be an additive change, not breaking. I suggest that we remove the release:required-for-ga label.

@Oberon00 Oberon00 added release:after-ga Not required before GA release, and not going to work on before GA and removed release:required-for-ga Must be resolved before GA release, or nice to have before GA labels Sep 9, 2020
@andrewhsu andrewhsu removed the priority:p2 Medium priority level label Sep 9, 2020
@andrewhsu andrewhsu removed this from Required for GA, needs action. Add label:release:required-for-ga to issues and PRs when moving them to this column. in GA Spec Burndown Sep 10, 2020
@lmolkova
Copy link
Contributor

Adding a new use-case:

  • Native instrumentations don't even know if SDK is present and configured

this implies a hint for instrumentation to know if the tracer is operational at all, regardless of sampler.

@lmolkova
Copy link
Contributor

lmolkova commented Sep 23, 2021

And another one open-telemetry/oteps#172:

  • Span suppression for auto-instrumentation duplicates or verbose layers (e.g. http physical)

@jmacd
Copy link
Contributor

jmacd commented Sep 23, 2021

It looks like this would need a prototype and a draft spec change to move forward.

In the linked issue #1916 I mentioned a preference for a lazy-value mechanism to support the intended performance gains. Only a Sampler that actually tries to use attributes will incur the cost of evaluating attributes before the Sampler decision.

I would begin by asking in maintainers in each language if there is an existing instrumentation framework with a precedent for lazy values. In OpenTracing-Go there was a lazy log-field for example, https://github.com/opentracing/opentracing-go/blob/e2cb74943cd81f680d00b6be69fc4a2557525727/log/field.go#L162, inspired iirc by the uber-go/zap implementation.

@yurishkuro
Copy link
Member

In Go capturing lazy-eval attributes incurs allocations which could cause more overhead than the savings from lazy eval.

Jaeger samplers (Go, Node.js) support deferred sampling decision where the span is recording until enough attributes are collected to definitely decide not to sample, at which point span becomes no-op. It's definitely more expensive than blind probabilistic sampling.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:sampling Related to trace sampling release:after-ga Not required before GA release, and not going to work on before GA spec:trace Related to the specification/trace directory
Projects
None yet
Development

No branches or pull requests

10 participants