Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

encoding of context flags in otlp messages #382

Open
blumamir opened this issue Apr 19, 2022 · 14 comments
Open

encoding of context flags in otlp messages #382

blumamir opened this issue Apr 19, 2022 · 14 comments

Comments

@blumamir
Copy link
Member

The spec for tracing includes a trace flags in the context:

TraceFlags contain details about the trace. Unlike TraceState values, TraceFlags are present in all traces. The current version of the specification only supports a single flag called sampled.

However, the otlp proto message has no appropriate proto field to store the flags for message Span and message Link.

I think the flags are important to be encoded in the otlp format for the following reasons:

  • It is handy that when holding a link, to be able to tell if it points on a sampled or non sampled span. For example in messaging specification, a consumer may link to a producer span, but it is possible and common that the consumer is sampled and the producer is not. Currently, consumer links just store trace_id, span_id and trace_state. If someone tries to follow the link and cannot find the source span, there is no way to tell if it was not sampled (valid) or simply missing (lost - invalid). It is can also be expensive to follow a link (query to db) - and by knowing if a link is sampled or not one can skip this operation for processing and visualization purposes. There can also be UI value in presenting a link when it is not sampled vs sampled link.
  • OTLP can be handy for storing spans, not only for transmitting them. For transmitting spans from SDKs to collector, we do not anticipate any non-sampled spans. However, someone might need to serialize and deserialize non-sampled spans (for example - to debug) in which case this data will currently be lost. I think protocol should support this kind of use-cases
  • Even though flags currently only support single "sampled" flag, they can be extended in the future to hold new values. We cannot anticipate the use of these future values, and having them in the otlp model will allow using it always.

Does it make sense to add "trace_flags" field to Span and Link messages in proto? or are there any good reasons why no to include them?

@blumamir
Copy link
Member Author

Another argument by @dyladan :

If you can't encode/decode and get the same object back out then something is missing

@blumamir
Copy link
Member Author

related issue: #166

@tigrannajaryan
Copy link
Member

This sounds reasonable to me. Can you bring this up in the next Specification SIG meeting?

@open-telemetry/specs-trace-approvers what do you think?

If we decide to add this we need to follow the same approach we have for logs:

@blumamir
Copy link
Member Author

Thanks @tigrannajaryan , I will bring it up in the next SIG.

Why use fixed32? isn't the flags 8 bit (1 byte) value by definition?

Another issue, since proto 3 has no optional, if the user is using an exporter with an old version of the proto, the field will not be sent which will default to 0 in the receiver (collector) which will incorrectly interpret it as "not sampled".
The proto "version" discussed in the SIG could help differentiate between the 2 options in the receiver if required. Not sure if it's really a problem or not.

@blumamir
Copy link
Member Author

blumamir commented Apr 20, 2022

Ok, I see there is no fixed8 type in proto. Still, if the value is expected to be 8 bits at most, It will be more efficient to use varints I suppose.

@dyladan
Copy link
Member

dyladan commented Apr 20, 2022

Maybe slightly smaller over the wire (2 bytes instead of 4) but it requires more processing for encode/decode since it requires dropping the msb of each byte and doing some shifting to get the final value. Not sure which concern is more important in general but I would argue for making it consistent with the logs.

@tigrannajaryan
Copy link
Member

Another issue, since proto 3 has no optional, if the user is using an exporter with an old version of the proto, the field will not be sent which will default to 0 in the receiver (collector) which will incorrectly interpret it as "not sampled".
The proto "version" discussed in the SIG could help differentiate between the 2 options in the receiver if required. Not sure if it's really a problem or not.

Proto 3 recently added the notion of presence and I think we already have optional fields elsewhere: #366

@blumamir
Copy link
Member Author

Maybe slightly smaller over the wire (2 bytes instead of 4) but it requires more processing for encode/decode since it requires dropping the msb of each byte and doing some shifting to get the final value. Not sure which concern is more important in general but I would argue for making it consistent with the logs.

You are right. I am not sure what is better.
My intuition is that since the value is currently very small (1 bit) it will also fit into 1 byte which will not require any shifting. We will only use the second varint byte when the 8th and last flag will be used which will probably take many years.

Anyhow, if we already use fixed32 for flags then that's good for me.

@blumamir
Copy link
Member Author

Another issue, since proto 3 has no optional, if the user is using an exporter with an old version of the proto, the field will not be sent which will default to 0 in the receiver (collector) which will incorrectly interpret it as "not sampled".
The proto "version" discussed in the SIG could help differentiate between the 2 options in the receiver if required. Not sure if it's really a problem or not.

Proto 3 recently added the notion of presence and I think we already have optional fields elsewhere: #366

Amazing, I wasn't aware of it.
Do you think we should mark the flags as optional to support old otlp exporters? Or transformations from something else to OTLP which cannot fill this value but still want to communicate it is missing and not equal 0

@tigrannajaryan
Copy link
Member

Do you think we should mark the flags as optional to support old otlp exporters? Or transformations from something else to OTLP which cannot fill this value but still want to communicate it is missing and not equal 0

I haven't thought through all the implication, but making it optional sounds right. Whoever is is making this change in the proto need to do the usual interoperability exercise:

  • How old receivers work with new exporters
  • How new receivers work with old exporters
  • How new receivers work with new exporters

For each of the cases with new exporters consider sampled flag set and unset and show that the receivers work as expected.

@blumamir
Copy link
Member Author

Whoever is is making this change in the proto need to do the usual interoperability exercise:

I'll try to answer according to my understanding:

  • How old receivers work with new exporters

Since this is a new proto field with a new number, old receivers should ignore it, which also means they will not have this information available to the processing pipeline which is fine I suppose. If this information is important to the consumer, a new receiver version should be deployeded with support to the flag.

  • How new receivers work with old exporters

Since this field is optional, new receivers should be able to detect that it is not set, and update the relevant internal data structures with an UNSET value. This should be implemented idiomatically to the language. In JS for example, one can set undefined to the value which is different from 0 - flags exist and set to the value 0 (non sampled)

  • How new receivers work with new exporters

New receivers will have this field populated with a valid value sent from new exporters. They should be able to decode it, extract the flags value, and propagate it via internal data structures to processors.
They should be able to send it as Optional<int> (depends on the language) which processors can consume dowstream. the value 0 means that the flags exists and set to "not sampled" (with the current single flag)

@jmacd
Copy link
Contributor

jmacd commented Apr 26, 2022

When W3C adds a randomness flag to the trace flags, OTel spans will immediately want to know whether the flag was set, e.g., w3c/trace-context#468

We need this information. My opinion is that because we haven't had this information, allowing the default value of 0 is acceptable; I see this also as another case supporting the introduction of auxiliary version information to the protocol.

@blumamir
Copy link
Member Author

I see this also as another case supporting the introduction of auxiliary version information to the protocol

I support. I think this will be the cleanest way to work around the issues and still keep the protocol consistent and correct

@tigrannajaryan
Copy link
Member

I don't mind having a version, especially if it means the flags field declaration in the Span no longer needs to be option and will be consistent with the declaration in LogRecord.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants