Converge discussion around B3 and TraceId/SpanId #31

Open
codefromthecrypt opened this issue May 11, 2016 · 13 comments

Comments

@codefromthecrypt
Member

codefromthecrypt commented May 11, 2016

http://zipkin.io/pages/instrumenting.html discusses propagation in terms of http and thrift, as well as parent id vs span id, etc.

There are several aspects around propagation that should be highlighted independently before being bound to a specific propagation carrier, such as a binary field or http headers.

For example, the following fields are used in propagation, even if not all are stored. Particularly things like 'debug' vs 'flags' are hard to understand.

Here are some useful things discovered and documented while making Brave's binary form match Finagle's.

SpanId

Key fields propagate together, even if they are sent in http as separate headers. It is useful to think of them as a unit named SpanId or TraceId, regardless of whether the propagation is in-process or not:

  • spanId - Unique 8-byte identifier of this span within a trace.
  • parentId - The parent's spanId, or null if this is the root span in a trace.
  • traceId - Unique 8-byte identifier for a trace, set on all spans within it.
  • flags - Bit flags such as sampled or debug.

Not necessarily obvious uses of SpanId

Efficient and consistent logging key

Both Finagle and Brave have very efficient toString forms of this, which can make log searches easier. The format is $traceId.$spanId<:$parentId, and ends up looking like this: 0000000000000001.0000000000000003<:0000000000000002
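
A minimal sketch of how such a string could be produced, assuming each ID is a 64-bit value rendered as 16 lowercase hex characters. SpanIdFormat and its method are hypothetical names for illustration, not the Finagle or Brave API, and this is not tuned for efficiency the way those implementations are.

```java
final class SpanIdFormat {
  private SpanIdFormat() {}

  /** Renders $traceId.$spanId<:$parentId with each ID as 16 lowercase hex digits. */
  static String format(long traceId, long spanId, long parentId) {
    // %016x zero-pads to 16 digits and prints negative longs as unsigned hex.
    return String.format("%016x.%016x<:%016x", traceId, spanId, parentId);
  }

  public static void main(String[] args) {
    // prints 0000000000000001.0000000000000003<:0000000000000002
    System.out.println(format(1L, 3L, 2L));
  }
}
```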

Alternative to "passing a span around"

The above compound key can be used as an alternative to "passing a span around". For example, in Finagle, this is used as a key in a map that has a mutable span. Instrumentation adds to this map until the span is converted into a transport object for reporting.
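
As a rough illustration of that pattern (not Finagle's actual code), the sketch below keeps pending span data in a map keyed by the compound key. PendingSpans, annotate and finish are hypothetical names, and a plain list of strings stands in for the mutable span.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class PendingSpans {
  // Compound key (for example the $traceId.$spanId<:$parentId string) -> data recorded so far.
  private final ConcurrentMap<String, List<String>> pending = new ConcurrentHashMap<>();

  /** Instrumentation calls this to add data to the span identified by the key. */
  void annotate(String key, String annotation) {
    // compute() runs atomically per key, so concurrent adds to one span do not race.
    pending.compute(key, (k, existing) -> {
      List<String> list = existing == null ? new ArrayList<String>() : existing;
      list.add(annotation);
      return list;
    });
  }

  /** Removes and returns the span's data so it can be converted for reporting. */
  List<String> finish(String key) {
    List<String> done = pending.remove(key);
    return done == null ? new ArrayList<String>() : done;
  }
}
```

Instrumentation only needs the key, not a reference to the span object, to contribute data.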

The Debug Flag

In all known propagation formats (e.g. both http and binary), flag bit 0 is the debug flag. For example, a flag value of 1 means this trace should pass any sampling, whether on the instrumentation or the collection side.
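
A minimal sketch of reading the debug flag, assuming flags are carried as a 64-bit value. The class name DebugFlag is hypothetical. The check masks bit 0 rather than comparing the whole value to 1, so it still works when other bits are set.

```java
final class DebugFlag {
  /** Bit 0 of the flags field. */
  static final long FLAG_DEBUG = 1L << 0;

  private DebugFlag() {}

  /** True when bit 0 is set, whatever the other bits are. */
  static boolean isDebug(long flags) {
    return (flags & FLAG_DEBUG) != 0;
  }
}
```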

Special Cases in Binary Encoding

Binary encoding is fixed-width 32 bytes

The binary structure of the above fields is 32 bytes. Because the width is fixed, some encoding tricks are needed to tell the difference between 0 and unset or null.

Most importantly, you can't just read the flags as 0 or 1 for a debug decision! For example, 3 is also debug, because in both cases bit 0 (FLAG_DEBUG) is set.
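
To make the fixed width concrete, here is a sketch that packs four 8-byte values into 32 bytes. The field order (spanId, parentId, traceId, flags) and big-endian byte order are assumptions for illustration only, and BinarySpanId is a hypothetical name. Check the Finagle or Brave source for the actual layout.

```java
import java.nio.ByteBuffer;

final class BinarySpanId {
  /** Four 8-byte fields. */
  static final int LENGTH = 32;

  private BinarySpanId() {}

  // ASSUMPTION: the field order below is illustrative, not taken from a spec.
  static byte[] encode(long spanId, long parentId, long traceId, long flags) {
    return ByteBuffer.allocate(LENGTH) // ByteBuffer is big-endian by default
        .putLong(spanId)
        .putLong(parentId)
        .putLong(traceId)
        .putLong(flags)
        .array();
  }

  /** Returns {spanId, parentId, traceId, flags}. */
  static long[] decode(byte[] bytes) {
    if (bytes.length != LENGTH) {
      throw new IllegalArgumentException("expected " + LENGTH + " bytes, got " + bytes.length);
    }
    ByteBuffer buf = ByteBuffer.wrap(bytes);
    return new long[] {buf.getLong(), buf.getLong(), buf.getLong(), buf.getLong()};
  }
}
```

Every field always occupies its 8 bytes, so a missing parentId cannot simply be left out. That is why the encoding has to signal "unset" some other way.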

Root Span

  • In systems like Finagle, where the trace id is always a span id, spanId = parentId = traceId means this is the root span.
  • In systems where a trace id is not a span id, a separate flag is used to ignore the value of the parent id: bit 3 of flags indicates that this is a root span, so the parent id should be ignored.
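
The two root-span conventions above can be sketched as follows. RootSpan, its method names and the constant name FLAG_IS_ROOT are hypothetical. Only the bit position comes from the description above.

```java
final class RootSpan {
  /** Bit 3 of the flags field. The constant name is an assumption. */
  static final long FLAG_IS_ROOT = 1L << 3;

  private RootSpan() {}

  /** Convention where the trace id doubles as the root's span id. */
  static boolean isRootWhenTraceIdIsSpanId(long traceId, long spanId, long parentId) {
    return spanId == traceId && parentId == traceId;
  }

  /** Convention where a flag says the parent id should be ignored. */
  static boolean isRootByFlag(long flags) {
    return (flags & FLAG_IS_ROOT) != 0;
  }
}
```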

Sampled Flag

Flags are bits that can either be zero or one. However, a sampling decision has three possible values: Sampled, Don't Sample, or Don't Know. The last is not a well-documented option, but it does exist. To tell the difference between yes, no and don't know, we need 2 flags.

  • Flag bit 1 (FLAG_SAMPLING_SET) indicates whether flag bit 2 (FLAG_SAMPLED) should be interpreted as a sampled decision. If bit 1 is 0, then you don't read bit 2.
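
A sketch of decoding the three-valued sampling decision from those two bits. SampledFlag is a hypothetical class name, and an empty Optional stands in for "don't know".

```java
import java.util.Optional;

final class SampledFlag {
  static final long FLAG_SAMPLING_SET = 1L << 1;
  static final long FLAG_SAMPLED = 1L << 2;

  private SampledFlag() {}

  /** Empty means no decision yet; otherwise true is sampled and false is not sampled. */
  static Optional<Boolean> sampled(long flags) {
    if ((flags & FLAG_SAMPLING_SET) == 0) {
      return Optional.empty(); // bit 2 is not meaningful, so it is not read
    }
    return Optional.of((flags & FLAG_SAMPLED) != 0);
  }
}
```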

@yurishkuro

Q: what is the interpretation of Don't Know value for Sampled, where it exists? Given that Sampled is usually used to decide whether to store/not store the trace, a 3rd value is odd.

@codefromthecrypt
Member Author

codefromthecrypt commented May 11, 2016 via email

@codefromthecrypt
Member Author

codefromthecrypt commented May 11, 2016 via email

@mosesn

mosesn commented May 11, 2016

Hmm, I'm not 100% sure, but I think it's because when we first create a TraceId, we haven't made a decision on whether to sample or not yet:

https://github.com/twitter/finagle/blob/develop/finagle-core/src/main/scala/com/twitter/finagle/tracing/Trace.scala#L141-L152

We might be able to make that decision sooner, and I don't think we need the third state in the wire protocol, imo.

@mosesn

mosesn commented May 11, 2016

Actually, do we need to encode it at all? If we use a protocol with optional headers, we can simply not encode the header to signal "off", and encode it to signal "on".

@codefromthecrypt
Member Author

codefromthecrypt commented May 11, 2016 via email

@mosesn

mosesn commented May 11, 2016

Yeah, migration would be a pain in the ass.

@codefromthecrypt
Member Author

codefromthecrypt commented May 11, 2016 via email

@mosesn

mosesn commented May 11, 2016

Yeah, seems reasonable. So to make sure we're on the same page:

Some(true) // always sample / debug
Some(false) // never sample
None // implementation can choose whether to sample or not

That seems about right?

@codefromthecrypt
Member Author

codefromthecrypt commented May 11, 2016 via email

@mikewrighton

Am I right in thinking that flags are only used by logic in the instrumentation code, not anywhere in the Zipkin backend?

@codefromthecrypt
Member Author

codefromthecrypt commented May 11, 2016 via email
