Adds advice about bounding spans #35
Over the last year I've noticed a problem where instrumentors accidentally send too much data. This tends to surface as transport errors such as frame size exceeded (e.g. 64MiB). While a lot of systems can handle the load, an error like this usually means something is wrong: for example, someone replicating the entire HTTP message in a trace, or storing all headers. At the moment, only Finagle's tracer bounds the size of a span before placing it on the transport. For example, it makes sure a scribe message flushes at 5MiB. I've noticed others mention they alert when their instrumentation reports very large spans. Transport errors are the worst place to start helping folks. I figured adding some docs here might obviate this sort of problem, or at least give folks a good chance to start with better practice.
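To make the idea of bounding a span before it reaches the transport concrete, here is a minimal sketch in Python. It is not Finagle's implementation or any Zipkin API: the dict-shaped span, the `tags` key, the `encode_bounded` helper, and the 64KiB budget are all assumptions chosen for illustration. It drops the largest tags first until the encoded span fits, and gives up (returns `None`) if it still cannot fit.

```python
import json

# Hypothetical per-span budget; not a constant defined by Zipkin.
MAX_SPAN_BYTES = 64 * 1024


def _encode(span):
    """Serialize a span dict to compact JSON bytes."""
    return json.dumps(span, separators=(",", ":")).encode("utf-8")


def encode_bounded(span, max_bytes=MAX_SPAN_BYTES):
    """Return encoded bytes no larger than max_bytes, or None if impossible.

    Assumes the span is a plain dict with an optional "tags" dict.
    Tags are dropped largest-value-first until the span fits.
    """
    encoded = _encode(span)
    if len(encoded) <= max_bytes:
        return encoded

    trimmed = dict(span)
    tags = dict(span.get("tags", {}))
    for key in sorted(tags, key=lambda k: len(str(tags[k])), reverse=True):
        del tags[key]
        trimmed["tags"] = tags
        encoded = _encode(trimmed)
        if len(encoded) <= max_bytes:
            return encoded

    # Still too large without any tags: let the caller drop and count it.
    return None
```

A caller would typically increment a "spans dropped" or "tags trimmed" counter when the result differs from the input, so that oversized spans show up in instrumentation metrics rather than as transport errors.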
cc'ing a bunch of tracer authors (noting I'm sure I'm missing many):
Spans contain identifying information such as traceId, spanId, parentId, and
RPC name.

Spans are usually small. For example, the serialized form is often measured in
How about "spans are expected to be typically small" instead?
or maybe "Zipkin's design assumes spans are small (on the order of kilobytes or less)"
ps @mosesn is this a fair statement?
The main idea I'm trying to relay is that being space efficient was a constant in a lot of Zipkin's design, e.g. how IP addresses are serialized into numbers, how Finagle's tracer only picks a couple of fields so that it ends up with quite small (hundreds of bytes) spans, etc. If higher orders (like MiB) were intended, people wouldn't bother with optimizations like these.
Yes, I think that's right. Even with all of these optimizations, the write throughput is often enormous.
Thinking maybe I could also add a counter-advice here: "Zipkin integrates with general purpose logging, as opposed to replacing it." The details below need work (cc @abesto @anuraaga @clehene @klingerf particularly for help tidying this part); I suppose I could pull it into a separate PR.
Zipkin instrumentation routinely offers integration with logging systems, usually adding trace identifiers to the logging context. This allows free-form logging to be correlated with a trace, and enables users to leverage the greater sophistication of log queries. Zipkin (v1)'s query and indexing system was designed to help pinpoint traces based on known values and categories. For example, you can search by duration, or tags like http path, but not fuzzy queries like regular expressions.
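As an illustration of "adding trace identifiers to the logging context", here is a small sketch using Python's standard `logging` module. It does not show any particular Zipkin instrumentation library: `TraceContextFilter`, `make_logger`, and the `get_trace_id` callable are hypothetical names, and a real tracer would supply its own way to look up the current trace ID.

```python
import logging


class TraceContextFilter(logging.Filter):
    """Attach the current trace ID to every log record.

    get_trace_id is a hypothetical zero-argument callable standing in
    for whatever the tracer uses to expose the current trace context.
    """

    def __init__(self, get_trace_id):
        super().__init__()
        self._get_trace_id = get_trace_id

    def filter(self, record):
        # "-" marks log lines emitted outside of any trace.
        record.trace_id = self._get_trace_id() or "-"
        return True


def make_logger(name, get_trace_id, stream=None):
    """Build a logger whose output lines carry the trace ID."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(levelname)s [trace=%(trace_id)s] %(message)s")
    )
    handler.addFilter(TraceContextFilter(get_trace_id))

    logger = logging.getLogger(name)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

With this in place, large or free-form details (full headers, message bodies) can go to the log system, and the span only needs to stay small enough to carry the identifiers that tie the two together.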
sorry if this overloads this issue, so yay|nay on inclusion, or tell me to punt it. The below is a "day 2" problem that relates to this topic. cc'ing one of our ops advocates @gena01
One less obvious thing about span size is that consistent span size leads to easier operations. For example, I've often seen spans/minute or similar used as the health signal of the tracing system. While some instrumentation reports span size metrics (histograms etc.), count over time is simpler. In other words, the system will perform differently if span sizes range over several orders of magnitude, and span/time metrics would certainly be less effective. Moreover, anything that builds on this is more complex, for example scaling or capacity planning.
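One rough way to check whether a spans/minute count is a trustworthy health signal is to also track how many orders of magnitude the observed span sizes cover. The sketch below is only an illustration under that assumption; `SpanSizeStats` and its methods are hypothetical names, not part of any Zipkin reporter.

```python
from collections import Counter


class SpanSizeStats:
    """Count spans and bucket their encoded sizes by decimal order of magnitude."""

    def __init__(self):
        self.count = 0
        self.buckets = Counter()

    def record(self, size_bytes):
        """Record one span; bucket 2 means hundreds of bytes, 6 means MB-scale."""
        self.count += 1
        magnitude = len(str(max(size_bytes, 1))) - 1
        self.buckets[magnitude] += 1

    def magnitude_spread(self):
        """Orders of magnitude between the smallest and largest spans seen."""
        if not self.buckets:
            return 0
        return max(self.buckets) - min(self.buckets)
```

A spread of zero or one suggests that span count over time is a reasonable proxy for load; a spread of several orders of magnitude suggests the count alone will hide what the system is actually doing.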
I think it is a good idea. Small spans are beneficial all over
thx for the feedback, folks
Never considered span sizes as a potential problem before, this is an important addition. Thank you.
the URI of the call will help with later analysis of requests coming into the
service.

The primary use case of binary annotations is exact match search. That said, it is
I don't think this is an accurate statement. I would even say the opposite: the majority of binary annotations are never used for search, they are used to add contextual data to the span.
Will reword this, though I wouldn't go so far as to say it is the opposite, even if in your practice search isn't used.
On 12 Sep 2016 00:40, "Yuri Shkuro" notifications@github.com wrote:
> In pages/instrumenting.md, #35 (comment):
> @@ -58,12 +58,31 @@ information about the RPC. For instance when calling an HTTP service, providing
> the URI of the call will help with later analysis of requests coming into the
> service.
> +The primary use case of binary annotations is exact match search. That said, it is
>
> don't think this is accurate statement. I would even say the opposite, the
> majority of binary annotations are never used for search, they are used to
> add contextual data to the span.
NP!