-
You are right. This is a known side effect, and while I know I have talked about it in the past, we might not have it explicitly called out in the docs. Unfortunately, there's no way for the collector to know whether a trace is complete, given that a trace is just a collection of spans that carry trace IDs and references to their parents. Once a trace has been discarded as "not interesting", there's nothing we can do when a new span appears that would have made it an interesting trace, other than record the fact in a metric. As an operator, seeing this metric increase means that I might need to adjust the "decision wait time", depending on the cost vs. benefit of holding all "sane" spans in memory for a longer period of time.
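To illustrate the bookkeeping, here is a minimal Go sketch of a toy collector that counts spans arriving after a decision. It is not the real processor code: the `Collector` type, its methods, and the `lateSpans` counter are names made up for this example, and the "metric" is just an integer field.

```go
package main

import "fmt"

// Span is a simplified stand-in for a real span: it only carries IDs.
// Nothing in it says "this is the last span of the trace".
type Span struct {
	TraceID  string
	SpanID   string
	ParentID string // empty for a root span
}

// Decision is a hypothetical per-trace sampling outcome.
type Decision int

const (
	Pending Decision = iota
	Sampled
	NotSampled
)

// Collector is a toy model for illustration, not the actual processor.
type Collector struct {
	decisions map[string]Decision
	buffered  map[string][]Span
	lateSpans int // plays the role of a "late span after decision" metric
}

func NewCollector() *Collector {
	return &Collector{
		decisions: map[string]Decision{},
		buffered:  map[string][]Span{},
	}
}

// Add buffers a span, or counts it as late if its trace was already decided.
func (c *Collector) Add(s Span) {
	if c.decisions[s.TraceID] != Pending {
		c.lateSpans++
		return
	}
	c.buffered[s.TraceID] = append(c.buffered[s.TraceID], s)
}

// Decide records the outcome once the wait time expires and frees the buffer.
func (c *Collector) Decide(traceID string, d Decision) {
	c.decisions[traceID] = d
	delete(c.buffered, traceID)
}

func main() {
	c := NewCollector()
	c.Add(Span{TraceID: "t1", SpanID: "b", ParentID: "a"}) // child arrives first
	c.Decide("t1", NotSampled)                             // wait time expires
	c.Add(Span{TraceID: "t1", SpanID: "a"})                // root arrives too late
	fmt.Println("late spans:", c.lateSpans)
}
```

In this toy model the only remedy for a rising counter is the same as in practice: wait longer before calling `Decide`, at the cost of a larger buffer.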
-
Agreed, if the trace has already been discarded. I wonder if the cost of keeping traces longer (memory ain't cheap!) could be reduced by keeping traces on local disk. We could keep them longer there without running up RAM (but perhaps at the cost of performance).
-
Perhaps I'm missing something. Hoping someone(s) can set me straight: When doing tail-based sampling using a latency policy, there's a good chance that a really slow trace is missed and mis-categorized as 'not slow'.
The way the latency sampler works is that it checks each span and calculates the duration based on just that span's start and end times. It does this after the decision_wait time has expired. This means that any span that took longer than decision_wait will not be considered during decision processing: the span hasn't arrived yet, because it's likely still waiting for the request to finish!
When the slow span finally comes in, the trace was already categorized as `sampling.NotSampled`, and late-spans will be discarded (and recorded in the `statLateSpanArrivalAfterDecision` metric). So the traces that should be of special importance (we're using a latency policy, so clearly we're interested in slow traces!) will be missed and not-sampled.
Am I right? Or am I missing something about how spans work?