Incomplete recent traces with elasticsearch backend #2231

Stono · 2020-05-07T21:51:39Z

Hey,
This isn't a problem with Jaeger per se, more of an issue when you're using Jaeger with the Elasticsearch backend.

Because spans come in asynchronously (and batched from collectors), and the index is then refreshed on an interval (say, 10s), it's quite typical for the most recent traces to appear broken.

Then when you refresh 10-15s later, it's all nice and complete.

This leads to an unbelievable number of "tracing is broken" comments to the platform team, so I'm trying to think of ways to handle it.

There is no requirement for "near real time" traces, so ideally I'm trying to find a way to ensure that traces don't become visible in search until, say, 60s after the first span has come in (giving time for all spans to arrive).

The only way I can think of doing this would be if the Elasticsearch storage backend exposed a time offset configuration, which effectively never returned results newer than 1 minute old when using the Last Hour time default.

Or perhaps have a config option to suppress results which are "trace-without-root-span"?

I'm open to ideas!
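To make the "trace-without-root-span" idea above concrete, here is a minimal sketch of such a filter applied to search results after they are loaded. The `Span` model and the `suppress_rootless` helper are hypothetical names for illustration, not Jaeger's actual schema or API, and the sketch assumes a root span is one with no parent reference.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    # Hypothetical minimal span model, not Jaeger's storage schema.
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None marks a root span (assumption)


def has_root_span(spans: list[Span]) -> bool:
    """Return True if at least one span in the trace has no parent."""
    return any(s.parent_id is None for s in spans)


def suppress_rootless(traces: dict[str, list[Span]]) -> dict[str, list[Span]]:
    """Drop traces whose root span is not (yet) among the loaded spans."""
    return {tid: spans for tid, spans in traces.items() if has_root_span(spans)}
```

Note that this only catches one kind of incompleteness: a trace whose root has arrived but whose children are still in flight would pass the filter.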


yurishkuro · 2020-05-07T22:42:06Z

This was raised a few times indeed. I don't think it can be solved purely by limiting the search to [now-lookback, now-1min], because once any span falls into this range the whole trace will be returned, even if it's still in-progress. An alternative might be implementing a "window" parameter in the query service so that after the search is executed and traces are loaded in memory, those that have any spans within [now-window, now] are discarded as potentially incomplete. They could also be marked as potentially incomplete in the UI.
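A rough sketch of the "window" idea, applied after traces are loaded in memory. The `Span` fields and the `drop_recent_traces` function are hypothetical, and treating a span as "within the window" when its end time falls at or after `now - window` is an assumption; the query service does not expose such a parameter as far as this thread states.

```python
from dataclasses import dataclass


@dataclass
class Span:
    # Hypothetical minimal span model for illustration only.
    trace_id: str
    start_time: float  # seconds since epoch
    duration: float    # seconds


def drop_recent_traces(traces, now, window):
    """Keep only traces with no span activity inside [now - window, now]."""
    cutoff = now - window
    return {
        tid: spans
        for tid, spans in traces.items()
        # A single span ending at or after the cutoff discards the whole trace.
        if all(s.start_time + s.duration < cutoff for s in spans)
    }
```

This also shows why clamping the search range alone is not enough: a trace with one span at `now - 90s` and another at `now - 5s` still matches a search over [now-lookback, now-1min] through its older span, but the post-load filter discards it because of the newer one.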

We're currently working on tail-based sampling implementation that requires pre-aggregation of spans in memory, which naturally introduces a window of inactivity before the whole trace is saved & indexed, so that would also address this problem.
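To illustrate how in-memory pre-aggregation introduces a window of inactivity, here is a simplified sketch. It is not the tail-based sampling implementation mentioned above; `TraceBuffer`, `idle_seconds`, and the explicit `now` arguments are all hypothetical choices made for the example.

```python
class TraceBuffer:
    """Holds spans in memory and releases a trace after a quiet period."""

    def __init__(self, idle_seconds):
        self.idle_seconds = idle_seconds
        self._spans = {}      # trace_id -> list of buffered spans
        self._last_seen = {}  # trace_id -> arrival time of the newest span

    def add(self, trace_id, span, now):
        """Buffer a span and reset the trace's inactivity timer."""
        self._spans.setdefault(trace_id, []).append(span)
        self._last_seen[trace_id] = now

    def flush(self, now):
        """Return and forget traces that received no spans for idle_seconds."""
        ready = [
            tid for tid, seen in self._last_seen.items()
            if now - seen >= self.idle_seconds
        ]
        released = {}
        for tid in ready:
            released[tid] = self._spans.pop(tid)
            del self._last_seen[tid]
        return released
```

Only traces returned by `flush` would be written to storage and indexed, so a search never sees a trace that was still receiving spans within the last `idle_seconds`.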

pavolloffay · 2020-05-11T08:49:11Z

This is related to jaegertracing/jaeger-ui#462. The UI could signal missing spans in certain situations (e.g. missing spans in the middle) where it's clear that those spans haven't been reported/persisted yet.

If there are missing spans, the UI reports that it is skipping clock skew adjustment, which is confusing for users.
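One way a client could detect "missing spans in the middle" is to look for parent IDs that no loaded span carries. The sketch below assumes spans are plain dictionaries with hypothetical `span_id` and `parent_id` keys; it is not how jaeger-ui represents spans.

```python
def missing_parent_ids(spans):
    """Return parent IDs that are referenced by spans but absent from the trace."""
    present = {s["span_id"] for s in spans}
    return {
        s["parent_id"]
        for s in spans
        if s.get("parent_id") and s["parent_id"] not in present
    }
```

A non-empty result means at least one span points at a parent that has not been reported or persisted yet, so the trace could be flagged as potentially incomplete rather than shown with only a clock skew warning.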

Stono · 2020-05-11T12:26:12Z

Yeah this could even be a UI concern which makes it very prominent that we're displaying an incomplete trace. Anything to save the user confusion!

badihi · 2023-05-06T13:45:31Z

I spent almost a week investigating these "invalid parent span IDs, skipping clock skew adjustment" errors! Now I've found out that I just have to wait for all of the spans to be collected.
Maybe I'm stupid! But stupid people are not rare nowadays! There must be some instructions to prevent the confusion in the UI.

ghost added the needs-triage label May 7, 2020

yurishkuro added the help wanted label and removed the needs-triage label May 7, 2020
