Incomplete recent traces with Elasticsearch backend #2231
Comments
This was indeed raised a few times. I don't think it can be solved purely by limiting the search to [now-lookback, now-1min], because once any span falls into this range the whole trace will be returned, even if it's still in progress. An alternative might be implementing a "window" parameter in the query service so that after the search is executed and traces are loaded in memory, those that have any spans within [now-window, now] are discarded as potentially incomplete. They could also be marked as potentially incomplete in the UI. We're currently working on a tail-based sampling implementation that requires pre-aggregation of spans in memory, which naturally introduces a window of inactivity before the whole trace is saved & indexed, so that would also address this problem.
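To make the "window" idea concrete, here is a minimal sketch of the post-search filter. It is not Jaeger code: the `Span` type and `drop_possibly_incomplete` function are hypothetical, and it assumes a span counts as "within the window" when its end time (start plus duration) falls at or after now-window.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Span:
    # Hypothetical, simplified span model used only for this illustration.
    span_id: str
    start_time: datetime
    duration: timedelta


def drop_possibly_incomplete(traces, window, now=None):
    """Keep only traces in which every span ended before now - window.

    `traces` is a list of traces, each a list of Span objects that has
    already been loaded into memory by the search.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - window
    return [
        trace
        for trace in traces
        # A single recent span marks the whole trace as potentially incomplete.
        if all(span.start_time + span.duration < cutoff for span in trace)
    ]
```

The same predicate could instead be used to flag traces rather than drop them, which would match the suggestion of marking them as potentially incomplete in the UI.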
This is related to jaegertracing/jaeger-ui#462. The UI could signal missing spans in certain situations (e.g. missing spans in the middle) where it's clear that those spans haven't yet been reported/persisted. If there are missing spans UI signalize
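One way to detect "missing spans in the middle" is to look for parent references that point at spans not present in the trace. A rough sketch, assuming each span is reduced to a hypothetical (span_id, parent_id) pair; this is not the jaeger-ui implementation:

```python
def missing_parent_ids(spans):
    """Return parent span IDs that are referenced but absent from the trace.

    `spans` is a list of (span_id, parent_id) pairs; parent_id is None
    for a root span.
    """
    present = {span_id for span_id, _ in spans}
    return {
        parent_id
        for _, parent_id in spans
        if parent_id is not None and parent_id not in present
    }
```

A non-empty result means at least one span has not been reported or persisted yet, so the trace could be flagged as incomplete.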
Yeah this could even be a UI concern which makes it very prominent that we're displaying an incomplete trace. Anything to save the user confusion!
I was investigating on these
Hey,
This isn't a problem with Jaeger per se, more of an issue when you're using Jaeger with the Elasticsearch backend.
Because spans come in asynchronously (and batched from collectors), and the index is then refreshed on an interval (say, 10s), it's quite typical for the most recent traces to appear broken:
Then when you refresh 10-15s later, it's all nice and complete:
This leads to an unbelievable number of "tracing is broken" comments to the platform team, so I'm trying to think of ways to handle it.
There is no requirement for "near real time" traces, so ideally I'm trying to find a way to ensure that traces don't become visible in search for, say, 60s after the first span has come in (giving time for all spans to come in).
The only way I can think of doing this would be if Elasticsearch exposed a time offset configuration for this storage backend, which effectively never returned results newer than 1 minute old when using the "Last Hour" time default.
Or perhaps have a config option to suppress results which are "trace-without-root-span"?
I'm open to ideas!
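For illustration, the two ideas above could look roughly like this. Both helpers are hypothetical (neither is an existing Jaeger or Elasticsearch option), and spans are again assumed to be (span_id, parent_id) pairs with parent_id set to None for a root span.

```python
from datetime import datetime, timedelta, timezone


def clamp_search_end(requested_end, offset, now=None):
    """Never let a search range end later than now - offset."""
    now = now or datetime.now(timezone.utc)
    return min(requested_end, now - offset)


def has_root_span(spans):
    """True if at least one span in the trace has no parent."""
    return any(parent_id is None for _, parent_id in spans)
```

As pointed out in the comments, clamping the range alone does not hide a trace that already has an older span inside the range, so the root-span check (or a similar completeness check) would still be needed on the loaded traces.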