Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a skeleton of the trace data model #2012

Closed
wants to merge 2 commits into from

Conversation

jmacd
Copy link
Contributor

@jmacd jmacd commented Oct 12, 2021

Part of #1929.

Changes

Adds text specifying how to interpret Span data in the OpenTelemetry trace data model.

Related PR #1899

Prior to incorporating the changes of OTEP 168, we need a foundation.

@jmacd jmacd marked this pull request as ready for review October 13, 2021 16:33
@jmacd jmacd requested review from a team as code owners October 13, 2021 16:33
Comment on lines +70 to +71
Span context is the portion of the OpenTelemetry Context that makes up
the tracing data model. This is specified by reference to the [W3C
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that makes up the tracing data model.

This is not quite clear. Can you expand/reword?


## Span fields

### TraceID
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use the same TraceId capitalization as the log data model?

@@ -0,0 +1,169 @@
# Trace Data Model

**Status**: [Mixed](../document-status.md)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this Mixed? Do we have anything unstable in the trace data model?

Comment on lines +38 to +39
The OpenTelemetry data model for tracing consists of a protocol
specification for encoding spans, which represent an individual unit
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

protocol specification for encoding spans

This combination sounds very confusing to me. I think you're defining a logical data model that includes entities, their attributes, and relationships between entities. So I don't understand what to make of the words "protocol", "specification", and "encoding" in this context.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another question is: whose data model are you trying to describe, the API's or the SDK's? They have different shapes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to write a document that helps someone interpret the OTLP Span protocol without reading a .proto file, where I can add further detail about how to interpret the OTel tracestate field used for probability sampling. Right now the .proto file says this of tracestate:

  // trace_state conveys information about request position in multiple distributed tracing graphs.
  // It is a trace_state in w3c-trace-context format: https://www.w3.org/TR/trace-context/#tracestate-header
  // See also https://github.com/w3c/distributed-tracing for more details about this field.

Tangentially, I'm actually not sure how multi-tenancy is expected to work in W3C and it's not a part of the SDK or the API's model. See #1852 (comment)

In any case, I think there should be one model--what differences between the API and the SDK "shape" should be reflected in this kind of document?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each layer can have a different logical model of the data it operates on. The API data model would be the smallest surface, SDK's data model could be larger (e.g. in OpenTracing API there was nothing related to sampling, but in Jaeger SDK implementing that API there was is_sampled state on the span context, as well as other fields that were implementation details of the SDKs exposed to its internal interfaces). OTLP as a wire format is another layer which might have yet another data model (e.g. it has things like ResourceSpans, which are not a concept in the API).

So you may be trying to write a document that describes the logical data model of OTLP (aka physical data model since there's no more abstractions left), but such description would be ~90% consisting of the logical data model of the SDK, which in turn is only a small extension of the API data model.

a document that helps someone interpret the OTLP Span protocol without reading a .proto file

For the record, can't say I sympathize with this goal, what's wrong with reading the specification of the protocol to understand the protocol? It's another thing that .proto should not need to explain what a Span is or what Span name requirements are, that should come from the other parts of the spec.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The metrics and logs data model files are written without reference to the SDK and API specifications, and I'm not sure why someone should have to understand either to interpret trace data. The reason this happened in the metrics specification is that there are complicated relationships between fields that can't be documented in a single field. An example in trace is "root span". We need to use this term to explain a bunch of things, but it doesn't have a field in the proto. In metrics, we decided that datamodel.md should be the source of truth, not the .proto.

In practice, I haven't followed the trace specification as closely as I have metrics, and now that I'm invested in adding probability sampling to the specification, I think this is sorely needed. As an example, I've just read through the current api.md and sdk.md to try to understand how tracestate is specified. It's incomplete, and the ground is already unstable because W3C refers to "tenants" and we have no equivalent concepts in OTel.

  • There are APIs given for manipulating tracestate, but no explanation of what it is or why you'd use it (WHY WOULD A USER USE THIS API?)
  • There is no mention of tracestate in a Span Link
  • You have to read between the lines to determine that the Span data contains the tracestate returned by the Sampler, but even then it rests on assumptions. E.g., "A Tracestate that will be associated with the Span through the new
    SpanContext." doesn't actually say that the recorded span object will contain the associated tracestate, but again if you go to the .proto all it says is:
  // trace_state conveys information about request position in multiple distributed tracing graphs.
  // It is a trace_state in w3c-trace-context format: https://www.w3.org/TR/trace-context/#tracestate-header
  // See also https://github.com/w3c/distributed-tracing for more details about this field.

So, I'll try again.

corresponding to a named operation, representing its part in the
overall, distributed unit of work.

### Root span
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this actually part of the model? Is there an attribute or trait that tags a span as root span? I think this belongs to the glossary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This is in the Glossary section.)

This is why I bring up the question about multi-tenancy, which I don't see as a part of the OpenTelemetry data model at this time (despite @Oberon00's remarks).


**Status**: [Stable](../document-status.md)

The OpenTelemetry Span encodes the `tracestate` that was computed when
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is kind of a circular definition. Can we give a semantic definition of what it represents rather than how it's populated?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know that there is a semantic definition for tracestate, I see it as a thing W3C gives vendors to use as they like. We're already in questionable territory, IMO, because we're calling OpenTelemetry a "vendor" when we declare a use of tracestate for ourselves.

See also #1852 (comment) where it seems there are at least two semantic definitions possible ("universal" and "per-tenant")

**Status**: [Stable](../document-status.md)

The OpenTelemetry Span name is a short, human-readable description of
the work performed within the span's context.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
the work performed within the span's context.
the work performed by the application and represented by the span.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am pretty sure we already have the definition of things like span name elsewhere in the spec. I think we should think about eliminating such duplication and referencing other parts of the spec. To me the reasonable hierarchy would be: start with logical data model that describes what the model entities/attributes represent, which then refer to it in the API spec to describe the operations on those entities (but no longer describing the meaning of entities/attributes).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'll have to refactor this PR.
There is one sentence in api.md that belongs here, and instead that document can link to this one.

Comment on lines +109 to +110
The OpenTelemetry SpanID is defined to identify the span that is the
parent of a new trace context, equivalent to the W3C trace context
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

span that is the parent of a new trace context

This sentence is confusing. What is trace context? How can span be a parent of it? Does every SpanId identify a parent?

@arminru arminru added area:data-model For issues related to data model spec:trace Related to the specification/trace directory labels Oct 14, 2021
@jmacd
Copy link
Contributor Author

jmacd commented Oct 14, 2021

I will incorporate the feedback here and re-open the PR when it is ready for review again.

@yurishkuro I am most interested in your reply to #2012 (comment) and would be glad for you to join the discussion in #1852 while I continue to edit this document.

@jmacd jmacd closed this Oct 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:data-model For issues related to data model spec:trace Related to the specification/trace directory
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants