Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Comparing traces by latency #513

Open
yurishkuro opened this issue Jan 27, 2020 · 9 comments
Open

Comparing traces by latency #513

yurishkuro opened this issue Jan 27, 2020 · 9 comments

Comments

@yurishkuro
Copy link
Member

yurishkuro commented Jan 27, 2020

@tiffon you previously showed this diff mode at KubeCon'18, if I recall. What's the status of this, is the code somewhere in a branch? It would be great to merge it.

image

(Asked in internal SO post 43701)

@yurishkuro
Copy link
Member Author

Perhaps this is the branch: https://github.com/jaegertracing/jaeger-ui/tree/diff-on-latency

@tiffon
Copy link
Member

tiffon commented Jan 30, 2020

@yurishkuro yes, that's the branch. A lot has changed, but I'll check it out this weekend.

@yurishkuro
Copy link
Member Author

Thanks, Joe. This feels like it should be relatively small change on top of the existing diffs, but people do want it, even as experimental.

@tiffon
Copy link
Member

tiffon commented Feb 4, 2020

@yurishkuro I didn't get a chance to work on the trace durations this weekend. I will try to take a look at it this week.

The old code is ~650 commits behind master and dates back to when we used flow. It was also a prototype for troubleshooting issues during a data-center turn-up. So, realistically, merging it is isn't going to work; it needs to be implemented from scratch.

I've also been thinking it would be nice to be able to represent the impact of nodes in the DAG that aren't common between the two traces. For instance, all of the common nodes could have nearly the same durations and the difference in trace durations is accounted for by nodes only in one or the other trace. It might be possible to just compare the average duration of the spans that are present against 0. That would only work for non-relative comparisons but relative comparisons were too noisy, anyway. Either way, it will be easier to explore this once something is up and running, but I wanted to mention it in case you have any suggestions.

@yurishkuro
Copy link
Member Author

iirc, the graph already shows missing/added nodes in black & white (as opposed to red/green for comparable nodes).

@everett980
Copy link
Collaborator

everett980 commented Feb 20, 2020

@tiffon Idr the formula for creating dense nodes out of multiple spans, but if spans a:0, a:1, b:0 are present in trace A three times, and present in trace B four times and result in one node (data.a.members.length === 9 and data.b.members.length === 12) what durations would be compared?
Is it total duration of data.a.members vs total duration of data.b.members?
Does it make sense to instead/toggle compare the average duration?
What about comparing the max duration of a:0, a:1, b:0 for each? This probably requires a different schema than what is in PR#521

@yurishkuro
Copy link
Member Author

I think we shouldn’t be making any assumptions. One option is to not aggregate sibling paths at all, thus removing the problem. If user chooses to aggregate identical sibling paths, then they should specify the aggregation function to use with comparison, from min, max, avg.

@tiffon
Copy link
Member

tiffon commented Feb 21, 2020

@everett980 the formula was average duration, and the only way data.a.members.length would be nine is if a:0 a:1 and b:0 were all grouped in the same node in the diff. In our case, that only happens if the spans are the same service, operation; all ancestors are the same service, operation; and all either have children or are leaves. We went with average instead of various percentiles, histograms, cumulative distribution whatever (maybe it was CDR but I forget the name)... the generally low number of spans I each group makes average the most attractive aggregation (according to Bill's research and my own chat with some of the stats folks.

@yurishkuro it's fine to change that, but I think it should be a separate ticket. This ticket and the PR that lays the ground work for implementing it should keep the grouping consistent with structural trace diffs and trace graphs. Or, if we want to prioritize changes to the structure / grouping of spans then we should add latency comparisons after that (IMO). I completely agree that would be a useful change (we already have a ticket for it) but I don't think we should combine it with duration diffs.

Also, grouping related siblings has upsides in terms of simplifications, mainly regarding ordering (IIRC) bc order then matters if siblings aren't grouped. And, with that, comparing traces requires additional heuristics for matching, which is a high cost for the benefit of not grouping. Not intractable, but not free either.

@everett980
Copy link
Collaborator

everett980 commented Feb 25, 2020

the formula was average duration

I think there is value in having a toggle for total duration comparisons. If the number of grouped nodes in traceB has increased but the average time is unchanged (or even slightly reduced) then it should optionally reflect an increase in time spent on that node.
Or, due to potential overlap in a grouping (and this relates to the Trace Statistics view as well as the Trace Graph view) I think it would be beneficial to have a concept of "non-concurrent time" and "non-concurrent self time" when presenting a group of spans. 20 concurrent nodes with (self) time of 30ms each is very different than 20 sequential nodes with (self) time of 30ms each.

and all either have children or are leaves.

For both the existing functionality and the duration functionality it would be nice to toggle this distinction (i.e.: DDG's "Show paths to the focal node" vs "Separate internal from external nodes"), but as mentioned this could be part of a separate issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants