Currently, in the dependency graph, a link between two services is drawn bolder the more sampled traces contain that link. This is a bit misleading, since the number depends on the sampling we apply.
Between two services, what matters (IMO) is the number of calls made from service A to service B within a single request. Instead of showing the total call count from A to B, what if we showed a histogram of the number of A-to-B calls per request? Instead of a number that varies with the sampling, we would get an accurate idea of what's happening between A and B.
The whole system is composed of three traces:
Today, the dependency graph would show:
A -> B with the count 5
The idea would be that instead of seeing 5, we would see a histogram showing 0, 2 and 3 (the number of times A called B in each request).
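To make the difference concrete, here is a minimal sketch of both aggregations. The three traces are an assumption reconstructed from the numbers above (0, 2 and 3 calls from A to B); the trace representation and the function names are hypothetical and are not Zipkin's API.

```python
from collections import Counter

# Hypothetical sampled traces: each trace is the list of (caller, callee)
# calls observed while serving one request.
traces = [
    [],                                    # A never calls B
    [("A", "B"), ("A", "B")],              # A calls B twice
    [("A", "B"), ("A", "B"), ("A", "B")],  # A calls B three times
]


def total_count(traces, parent, child):
    """What the graph shows today: all parent->child calls summed over traces."""
    return sum(calls.count((parent, child)) for calls in traces)


def calls_per_trace_histogram(traces, parent, child):
    """Proposed view: maps 'calls in one trace' to 'number of such traces'."""
    return Counter(calls.count((parent, child)) for calls in traces)


print(total_count(traces, "A", "B"))                              # 5
print(sorted(calls_per_trace_histogram(traces, "A", "B").items()))  # [(0, 1), (2, 1), (3, 1)]
```

If the same three kinds of request were sampled twice as often, the total would double to 10, while the histogram would keep the same shape (two traces in each bucket).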
Do you think this would be more useful information?
Should we keep or remove the totalCount?
Thanks for writing this up @fedj. I think if the goal is to keep the entire value space in an aggregatable form, something like HdrHistogram might do the trick: http://psy-lob-saw.blogspot.my/2015/02/hdrhistogram-better-latency-capture.html It might not end up with the exact values, but it will be... well... high resolution :P
In this issue you are collecting a histogram of links per trace, so linksPerTrace might end up as the JSON field holding the histogram's representation. You might want to add traceCount rather than remove callCount... not sure.
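As a rough sketch of what such a link record could look like, and why it stays aggregatable: the snippet below uses an exact count map rather than HdrHistogram, and the JSON shape, the make_link and merge_links helpers, and the sample values are assumptions for illustration only. traceCount and linksPerTrace are the names floated above, not existing fields.

```python
import json
from collections import Counter


def make_link(parent, child, calls_per_trace):
    """Build a hypothetical link record from per-trace call counts."""
    # JSON object keys must be strings, so bucket values are stored as strings.
    histogram = Counter(str(n) for n in calls_per_trace)
    return {
        "parent": parent,
        "child": child,
        "callCount": sum(calls_per_trace),
        "traceCount": len(calls_per_trace),
        "linksPerTrace": dict(histogram),
    }


def merge_links(a, b):
    """Combine two records for the same link, e.g. from two time buckets."""
    if (a["parent"], a["child"]) != (b["parent"], b["child"]):
        raise ValueError("links must share parent and child")
    histogram = Counter(a["linksPerTrace"]) + Counter(b["linksPerTrace"])
    return {
        "parent": a["parent"],
        "child": a["child"],
        "callCount": a["callCount"] + b["callCount"],
        "traceCount": a["traceCount"] + b["traceCount"],
        "linksPerTrace": dict(histogram),
    }


first = make_link("A", "B", [0, 2, 3])
second = make_link("A", "B", [2, 2])
merged = merge_links(first, second)
print(json.dumps(merged, sort_keys=True))
```

Because merging only adds bucket counts, records from different collectors or time windows can be combined in any order. An exact map grows with the number of distinct values, which is the cost a fixed-resolution structure like HdrHistogram is meant to avoid.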
PS: something more commonly requested has been latency across the link (which HdrHistogram could do as well).
One catch is that the value needs to be serialized (both in Thrift and in JSON). I haven't researched how people read HdrHistogram data in the browser: for example, whether they convert it to percentiles or render it directly.
cc @eirslett for more ideas