Skip to content
This repository has been archived by the owner on May 23, 2024. It is now read-only.

Span functions (SetTag, LogKV, etc.) only operate on spans initially marked as sampled #391

Closed
deuill opened this issue Apr 29, 2019 · 14 comments

Comments

@deuill
Copy link

deuill commented Apr 29, 2019

Requirement - what kind of business use case are you trying to solve?

Span values (tags, logs, etc.) can only be set on spans marked as "sampled", a state generally decided on StartSpan via the configured Sampler.

However, it's possible to "force" span collection via the sampling.priority tag, which sets the "sampled" state to true after the fact, with any tags set before this being ignored (perhaps counter-intuitively). In our case, this is used to force span collection on 5xx errors, which are rare but important enough to collect regardless of Sampler configuration.

Problem - what in Jaeger blocks you from solving the requirement?

Tags and logs collected before collection is forced are essentially lost, having never been attached in the first place. This is supposedly done on purpose as an optimization.

Proposal - what do you suggest to solve the problem or improve the existing situation?

Calls to Span.SetOperationName, Span.SetTag, Span.LogKV, etc should not check s.context.IsSampled(), as the result can change after initialization, and having these operations never take place is counter-intuitive, and prevents "sample-on-error" scenarios as described above.

@deuill
Copy link
Author

deuill commented Apr 29, 2019

This probably extends to quite a few other places as well, e.g. here, and I'm wondering if forcing span collection via the sampling.priority isn't technically misuse of the library (or at least, using against the intentions).

AFAICT the OpenTracing spec doesn't actually say what the tracers should do in cases where collection is forced, and simply states the following:

If greater than 0, a hint to the Tracer to do its best to capture the trace. If 0, a hint to the trace to not-capture the trace. If absent, the Tracer should use its default sampling mechanism.

However, even if setting the sampling priority is only intended for debugging purposes, why wouldn't one want tags, logs, child spans, etc. assigned as per normal?

@yurishkuro
Copy link
Member

Using sampling.priority to sample on error will only allow you to sample the current span, and any of its future children (which are unlikely since the span is already logically "failed"). So while your observation about logs/tags is accurate, I am not sure it's what you really want.

@deuill
Copy link
Author

deuill commented Apr 30, 2019

It's the only way I can see being able to decide whether a span should be sampled after initialization, which is a requirement for this use-case. From what I can tell, OpenTracing doesn't support this specifically, but doesn't call for implementations to short-circuit operations on as-of-yet unsampled spans (which seems to be the issue here), but I could be wrong.

I'm willing to do the work here provisionally, but I'm also guessing the operations are made to short-circuit for performance reasons, and you might not be willing to incur the penalty for only a small number of users, though the overhead might be negligible.

@yurishkuro
Copy link
Member

Again, I just want to confirm that you understand that with this approach you will capture exactly one span out of the whole trace. Is that your intention?

@deuill
Copy link
Author

deuill commented May 1, 2019

Ah, apologies. No, my intention is that I'm able to start a span, attach tags, logs, and child spans as per usual, but defer the decision of whether the span should be sampled or not until some time before Finish() is called.

I realize this isn't possible in the current state of the code, and my understanding is that sampling.priority isn't necessarily designed to fulfil this use-case, but I'm wondering if functions short-circuit as a performance optimization or whether the spec requires this. I initially set out to create a custom Sampler, but this too is hampered by the fact that the sampling decision is copied into the SpanContext's flags during initialization.

As stated above, I'm willing to explore a correct solution and do the work for implementing it, assuming the business case is strong enough.

Otherwise, the only (extremely hacky) workaround I can think of is having the initial decision made by a ConstSampler, then setting sampling.priority to 0 if the request completed successfully, and the original sampler (in our case, a ProbabilitySampler) would've not sampled the span in the first place.

@yurishkuro
Copy link
Member

I understand the issue, but not your objective. If you have a trace with 100 spans and one of them failed, do you want to see the whole trace in Jaeger or just that single failed span?

@deuill
Copy link
Author

deuill commented May 1, 2019

Just the span that was "forced" plus any of its child spans, though in our case this would be the root span (there are no other sibling spans).

What we essentially do is start a span for the incoming HTTP request, then have underlying handlers start child spans for blocks of business logic (attaching tags and logs along the way). Once the HTTP handler is about return, we check the code returned by the underlying handlers, and set sampling.priority = 1 if the result is in the 5xx range.

@yurishkuro
Copy link
Member

Just the span that was "forced" plus any of its child spans

It won't work for children spans as they would have been already finished by the time the main span knows it wants to be sampled.

@deuill
Copy link
Author

deuill commented May 3, 2019

Then the use-case here is perhaps impossible under OpenTracing. I don't mean to waste any more of your time, feel free to close this.

@yurishkuro
Copy link
Member

It is possible, but not through client work alone. It's called tail-based sampling, which is partially already supported in OpenCensus Collector.

@deuill
Copy link
Author

deuill commented May 22, 2019

That makes sense, thanks a million. We use Jaeger server-side, so I'll need to dig deeper.

@yurishkuro
Copy link
Member

Note that the recently added delayed sampling (#449) allows you to add a sampler that will do this.

@deuill
Copy link
Author

deuill commented Jan 24, 2020

Wow, thanks! I'll run some tests and try and get this in, thanks for the ping and your work!

@yurishkuro
Copy link
Member

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants