Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change Processor and Exporter interface to pass resource #508

Open
rghetia opened this issue Mar 10, 2020 · 13 comments
Open

Change Processor and Exporter interface to pass resource #508

rghetia opened this issue Mar 10, 2020 · 13 comments
Assignees
Labels
area:sdk Related to the SDK release:after-ga Not required before GA release, and not going to work on before GA spec:resource Related to the specification/resource directory spec:trace Related to the specification/trace directory

Comments

@rghetia
Copy link
Contributor

rghetia commented Mar 10, 2020

In current resources specification, resource is associated with a Trace Provider. Semantically, that makes sense as well. However, there is problem retrieving resource in a SpanProcessor. SpanProcessor has no reference to TraceProvider to which it is registered. The problem is only in async path.

In Go (and most likely other languages as well), when span ends, the data flow is as follows.
Sync Path: Tracer (ctx1) -->Simple Span Processor (ctx1, SpanData) --> Exporter (ctx1, SpanData).
Async Path: Tracer (ctx1)-->Batched Span Processor (ctx1, SpanData) --> Exporter (ctx2, []SpanData)

In sync path, there are no issues. Resources can be added to SpanData and Exporter can send the resource to its backend after remapping the resource if necessary.

In async path, however, it is not efficient to duplicate the same resource with all the SpanData. After all we have common resource for a batch of spans in proto for a reason. How do we achieve the efficiency?

  • Group all SpanData on BSP by resources. It requires
    • OnEnd(SpanData) interface to change to onEnd(Resource, SpanData) to contain reference to the Resource. Batcher can simply keep one reference for all the accumulated SpanData.
    • Export(Batch) must be changed to Export(Resourc, Batch) to contain reference to the Resource.
    • Restrict BSP to be associated only with one Trace Provider (current spec doesn't restrict associating BSP with multiple Trace Provider).
      • If it must be allowed to associate with Multiple Provider then use reference to Resource (assuming same resource object will be provided with each onEnd() call) to group SpanData by Resource. Then call Export(Resource, []SpanData) for each group.

Alternatively,

  • Create resource at Exporter.
@bogdandrutu
Copy link
Member

In Java SpanData contains the Resource, but that can change if we decide to implement this. Also same discussion should apply for InstrumentationLibrary.

@SergeyKanzhelev
Copy link
Member

SergeyKanzhelev commented Mar 10, 2020

discussed today at tech sync and the question is whether having Resource reference from SpanData is a possibility. This is how many languages implement it today.

@Oberon00
Copy link
Member

Relatedly, we don't have a "readable span" or "span data" interface in the spec at all currently, so the object that is passed to the exporter is left to the language SIGs to design. See #158.

@tsloughter
Copy link
Member

Languages without the ability to use references to a shared object, like Erlang/Elixir, have to copy the Resource and instrumentation library info to every Span. I'm in the process of figuring out a better way for us, involving a single copy in the tracer provider.

Still working on the interface (like should the batch processor or exporter request the resource from the tracer provider on every export so that the Resource can be updated? Mainly thinking about how the Resource is setup during boot and not having to wait to initialize the processor until the Resource is fully initialized)

@jmacd
Copy link
Contributor

jmacd commented Mar 18, 2020

Have we decided to coalesce resources into span attributes? I believe we have.
I believe that the attributes are logically equivalent whether specified as a resource or a span attribute, and I don't see why we should maintain that distinction for the exporters.

@tsloughter
Copy link
Member

The protos currently keep them separate. The requests to export spans with the Otel protocol requires Resource as a field along with a repeated field of InstrumentationLibrarySpans which is a message containing the a field for instrumentation_library and a repeated field of Spans.

@Oberon00
Copy link
Member

Oberon00 commented Mar 18, 2020

why we should maintain that distinction for the exporters

To save tons of bandwidth that would be used up by duplicated resource information? (Or alternatively, lots of CPU cycles that would be used for compression). EDIT: Also, resources are a prime candidate for caching on the backend due to being immutable, see #515.

EDIT: Or I might be misunderstanding you here?

@Oberon00
Copy link
Member

Have we decided to coalesce resources into span attributes? I believe we have.

I believe we talked about that on a SIG call and everyone there seemed to be in favor IIRC, but it's not documented anywhere.

@jkwatson
Copy link
Contributor

why we should maintain that distinction for the exporters

To save tons of bandwidth that would be used up by duplicated resource information? (Or alternatively, lots of CPU cycles that would be used for compression)

Yes, I 100% agree with this. Also, some vendors probably want to map these attributes to different ones that are conventional on their platforms, and having it able to be done once is very preferable.

@jmacd
Copy link
Contributor

jmacd commented Mar 18, 2020

None of the existing OSS export protocols have this support built in, so I'm skeptical that we'll save resources by this decision alone. Also this makes distributed processing of span data more difficult. Compression is cheap.

Surely resources will be repetitive, but span attributes and metric labels will be too. If we had API support for resource scopes, then I can see how we would cheaply be able to avoid repetition for all kinds of attribute. open-telemetry/oteps#78

@Oberon00
Copy link
Member

It already helps for forwarding to the OTel collector. Also at Dynatrace we plan to support this kind of "deduplication".

@mwear
Copy link
Member

mwear commented Aug 25, 2020

This issue comes up regularly, so I thought I'd at least write down what I know about it in one place, so I can refer to it later. If this comment, or some version of it, would be useful elsewhere, let me know.

The underlying reason why resources need to be attached to the spans and metrics they generate, is that OpenTelemetry allows multiple resources per process. This allows a process to have multiple "logical applications". A consequence of this design, is that exporters need to be aware that telemetry can come from multiple resources, and handle the data accordingly. In practice this means grouping spans or metrics by resource, and exporting them in batches. This choice prevents an exporter from being tied to a single resource.

The mechanism by which this is currently possible is by having multiple tracer providers. Each provider can have a different resource, but share the same export pipeline.

There are proposals, such as @jmacd's Resource Scopes, where multiple Resources can be managed via Context: open-telemetry/oteps#78. This shares the same requirements for telemetry and the export pipeline as the multiple tracer provider use case.

If we decided that Resources mapped 1-1 to processes, we could simplify the export pipeline by passing a Resource to an exporter directly and avoid having to attach a resource to every span or metric recorded.

@Oberon00
Copy link
Member

This allows a process to have multiple "logical applications". A consequence of this design, is that exporters need to be aware that telemetry can come from multiple resources
...
The mechanism by which this is currently possible is by having multiple tracer providers. Each provider can have a different resource, but share the same export pipeline.

My opinion on this is: Any Exporter MUST NOT be attached to more than one SpanProcessor unless it explicitly states that it supports it. Any SpanProcessor MUST NOT be attached to more than one TracerProvider unless it (and any attached exporters) explicitly support it.

I think we implicitly have the first requirement (only one SpanProcessor per Exporter) already when we say at https://github.com/open-telemetry/opentelemetry-specification/blob/da9f4593af21961683606be7a1e80d2cf4cd0242/specification/trace/sdk.md#exportbatch:

Export() will never be called concurrently for the same exporter instance.

This is nigh impossible to implement if an exporter can be attached to more than one SpanProcessor at the same time. (NB: Someone pointed out the defect that SimpleSpanProcessor implementations typically will also break this rule somewhere).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:sdk Related to the SDK release:after-ga Not required before GA release, and not going to work on before GA spec:resource Related to the specification/resource directory spec:trace Related to the specification/trace directory
Projects
None yet
Development

No branches or pull requests

9 participants