Comparing traces by latency #513

yurishkuro · 2020-01-27T22:00:49Z

@tiffon you previously showed this diff mode at KubeCon'18, if I recall. What's the status of this, is the code somewhere in a branch? It would be great to merge it.

(Asked in internal SO post 43701)

yurishkuro · 2020-01-29T15:09:24Z

Perhaps this is the branch: https://github.com/jaegertracing/jaeger-ui/tree/diff-on-latency

tiffon · 2020-01-30T23:35:41Z

@yurishkuro yes, that's the branch. A lot has changed, but I'll check it out this weekend.

yurishkuro · 2020-01-30T23:45:18Z

Thanks, Joe. This feels like it should be relatively small change on top of the existing diffs, but people do want it, even as experimental.

tiffon · 2020-02-04T02:16:26Z

@yurishkuro I didn't get a chance to work on the trace durations this weekend. I will try to take a look at it this week.

The old code is ~650 commits behind master and dates back to when we used flow. It was also a prototype for troubleshooting issues during a data-center turn-up. So, realistically, merging it is isn't going to work; it needs to be implemented from scratch.

I've also been thinking it would be nice to be able to represent the impact of nodes in the DAG that aren't common between the two traces. For instance, all of the common nodes could have nearly the same durations and the difference in trace durations is accounted for by nodes only in one or the other trace. It might be possible to just compare the average duration of the spans that are present against 0. That would only work for non-relative comparisons but relative comparisons were too noisy, anyway. Either way, it will be easier to explore this once something is up and running, but I wanted to mention it in case you have any suggestions.

yurishkuro · 2020-02-04T03:06:33Z

iirc, the graph already shows missing/added nodes in black & white (as opposed to red/green for comparable nodes).

everett980 · 2020-02-20T22:01:26Z

@tiffon Idr the formula for creating dense nodes out of multiple spans, but if spans a:0, a:1, b:0 are present in trace A three times, and present in trace B four times and result in one node (data.a.members.length === 9 and data.b.members.length === 12) what durations would be compared?
Is it total duration of data.a.members vs total duration of data.b.members?
Does it make sense to instead/toggle compare the average duration?
What about comparing the max duration of a:0, a:1, b:0 for each? This probably requires a different schema than what is in PR#521

yurishkuro · 2020-02-21T01:08:20Z

I think we shouldn’t be making any assumptions. One option is to not aggregate sibling paths at all, thus removing the problem. If user chooses to aggregate identical sibling paths, then they should specify the aggregation function to use with comparison, from min, max, avg.

tiffon · 2020-02-21T03:39:16Z

@everett980 the formula was average duration, and the only way data.a.members.length would be nine is if a:0 a:1 and b:0 were all grouped in the same node in the diff. In our case, that only happens if the spans are the same service, operation; all ancestors are the same service, operation; and all either have children or are leaves. We went with average instead of various percentiles, histograms, cumulative distribution whatever (maybe it was CDR but I forget the name)... the generally low number of spans I each group makes average the most attractive aggregation (according to Bill's research and my own chat with some of the stats folks.

@yurishkuro it's fine to change that, but I think it should be a separate ticket. This ticket and the PR that lays the ground work for implementing it should keep the grouping consistent with structural trace diffs and trace graphs. Or, if we want to prioritize changes to the structure / grouping of spans then we should add latency comparisons after that (IMO). I completely agree that would be a useful change (we already have a ticket for it) but I don't think we should combine it with duration diffs.

Also, grouping related siblings has upsides in terms of simplifications, mainly regarding ordering (IIRC) bc order then matters if siblings aren't grouped. And, with that, comparing traces requires additional heuristics for matching, which is a high cost for the benefit of not grouping. Not intractable, but not free either.

everett980 · 2020-02-25T21:21:24Z

the formula was average duration

I think there is value in having a toggle for total duration comparisons. If the number of grouped nodes in traceB has increased but the average time is unchanged (or even slightly reduced) then it should optionally reflect an increase in time spent on that node.
Or, due to potential overlap in a grouping (and this relates to the Trace Statistics view as well as the Trace Graph view) I think it would be beneficial to have a concept of "non-concurrent time" and "non-concurrent self time" when presenting a group of spans. 20 concurrent nodes with (self) time of 30ms each is very different than 20 sequential nodes with (self) time of 30ms each.

and all either have children or are leaves.

For both the existing functionality and the duration functionality it would be nice to toggle this distinction (i.e.: DDG's "Show paths to the focal node" vs "Separate internal from external nodes"), but as mentioned this could be part of a separate issue.

tiffon mentioned this issue Feb 17, 2020

Refactor model/trace-dag to prep for latency diffs #521

Merged

jkowall added the enhancement label Oct 28, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comparing traces by latency #513

Comparing traces by latency #513

yurishkuro commented Jan 27, 2020 •

edited

Loading

yurishkuro commented Jan 29, 2020

tiffon commented Jan 30, 2020

yurishkuro commented Jan 30, 2020

tiffon commented Feb 4, 2020

yurishkuro commented Feb 4, 2020

everett980 commented Feb 20, 2020 •

edited

Loading

yurishkuro commented Feb 21, 2020

tiffon commented Feb 21, 2020 •

edited

Loading

everett980 commented Feb 25, 2020 •

edited

Loading

Comparing traces by latency #513

Comparing traces by latency #513

Comments

yurishkuro commented Jan 27, 2020 • edited Loading

yurishkuro commented Jan 29, 2020

tiffon commented Jan 30, 2020

yurishkuro commented Jan 30, 2020

tiffon commented Feb 4, 2020

yurishkuro commented Feb 4, 2020

everett980 commented Feb 20, 2020 • edited Loading

yurishkuro commented Feb 21, 2020

tiffon commented Feb 21, 2020 • edited Loading

everett980 commented Feb 25, 2020 • edited Loading

yurishkuro commented Jan 27, 2020 •

edited

Loading

everett980 commented Feb 20, 2020 •

edited

Loading

tiffon commented Feb 21, 2020 •

edited

Loading

everett980 commented Feb 25, 2020 •

edited

Loading