OTel bridge spec #516

SylvainJuge · 2021-09-29T12:56:14Z

Add specification for OpenTelemetry API agent bridges.

Approval from agents is optional here, only Java agent will use this in the short term to provide an OpenTelemetry bridge.

Implementation on APM server : elastic/apm-server#6259

Things to clarify

clarify: should we define all the vendor-specific attributes that we could support ? For example using attributes with OTel API user.id or span.type could allow to access vendor fields in Elastic agent protocol without direct API access. This mapping could also be handled on the server.
properly document unsupported features: events, span links, bagage, ...
Capturing events not supported, but exceptions reporting should be supported if possible OTel bridge spec #516 (comment)
mapping between OTel status and Elastic outcome: mapping at agent level, in apm-server for protocol-specific inference
add to spec that the feature should be configurable, disabled by default until proven mature and allows users to opt-in.
Should we implement an automatic fallback with labels for apm-server < 7.16 ? Yes, see comment for details.

ghost · 2021-09-29T12:59:45Z

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS

Expand to view the summary

Build stats

Start Time: 2022-03-14T05:52:00.851+0000
Duration: 7 min 31 sec

specs/agents/tracing-api-otel.md

specs/agents/tracing-api.md

Co-authored-by: Benjamin Wohlwend <bw@piquadrat.ch>

specs/agents/tracing-api-otel.md

specs/agents/tracing-api-ot.md

specs/agents/tracing-api-otel.md

specs/agents/tracing-api-ot.md

Co-authored-by: Felix Barnsteiner <felixbarny@users.noreply.github.com>

specs/agents/tracing-api-otel.md

Co-authored-by: Felix Barnsteiner <felixbarny@users.noreply.github.com>

specs/agents/tracing-api-otel.md

Co-authored-by: Trent Mick <trentm@gmail.com>

specs/agents/tracing-api-otel.md

astorm · 2022-01-24T20:12:44Z

specs/agents/tracing-api-otel.md

+- Java Agent: Elastic active context is implemented as a thread-local stack
+- Java OTel API: active context is implemented as a key-value map propagated through thread local
+
+In order to avoid potentially complex and tedious synchronization issues between OTel and our existing agent


I ran into a little trickiness when implementing the "if there's not a parent span start a transaction" behavior when working on the Node.js Bridge related to the context objects. I believe I have a path forward for the Node.js agent -- I'm mainly sharing this as background in case other folks run into it (and also forcing myself to say everything out loud in case there's some corner I'm not considering ;))

First -- there's no span builder object in any of the Node.js API or SDK code. Second, neither the startSpan or the startActiveSpan methods of the Node.js tracer allow you to pass in the span that should be a parent. Instead, this information is stored on the passed in open telemetry context object. Specifically, the intended pattern here is that you can fetch the span that should be the parent (i.e. the span that's active) by using the context's getSpanContext method.

You can see an example of this in the default Node.js SDK/tracer:

The becomes a bit complicated because, currently, this span context is initially set by whatever tracecontext propagator is currently being used

https://github.com/open-telemetry/opentelemetry-js/blob/04f9edd12fb2f42898bee7086691e47b3ab9b629/packages/opentelemetry-core/src/trace/W3CTraceContextPropagator.ts#L120

https://github.com/open-telemetry/opentelemetry-js/blob/04f9edd12fb2f42898bee7086691e47b3ab9b629/packages/opentelemetry-propagator-b3/src/B3SinglePropagator.ts#L86

https://github.com/open-telemetry/opentelemetry-js/blob/04f9edd12fb2f42898bee7086691e47b3ab9b629/packages/opentelemetry-propagator-b3/src/B3MultiPropagator.ts#L137

https://github.com/open-telemetry/opentelemetry-js/blob/04f9edd12fb2f42898bee7086691e47b3ab9b629/packages/opentelemetry-propagator-jaeger/src/JaegerPropagator.ts#L111

The trickiness came from the fact that we're not using these W3C propagators in the Node.js agent bridge -- trace context needs to be propagated when creating a transaction. This means that in cases where we're starting a transaction and assigning it to the ElasticOtelSpan object, we also need to set that span context on the open telemetry context object.

This means that in cases where we're starting a transaction and assigning it to the ElasticOtelSpan object, we also need to set that span context on the open telemetry context object.

The idea for the bridges is that there's just a single context storage - the internal one of the agent.

The case you mentioned is handled by this part of the spec:

After activating an Elastic span via the agent's API, the [Context] returned via the [get current context API] should contain that Elastic span

What it means is that when ContextManager::active is called, the bridge implementation of the interface would create a Context on-the-fly which contains the Elastic span.

The trickiness came from the fact that we're not using these W3C propagators in the Node.js agent bridge -- trace context needs to be propagated when creating a transaction.

I don't think there's a need to implement a custom W3C propagators in the Node.js agent bridge. You can just rely on the provided implementation.

The becomes a bit complicated because, currently, this span context is initially set by whatever tracecontext propagator is currently being used

The trace context propagators don't alter the context storage. It's a bit confusing but trace.setSpanContext, which is called by the propagators, only creates an immutable copy of the provided Context and sets the span. In the case of W3CTraceContextPropagator, the span is a "fake" NonRecordingSpan. The whole purpose of this is to populate a Context object that has the trace ids from the traceparent header.

Personally, I find that part of the OTel API a bit bloated and a source of unnecessary ceremony and allocation.

Happy to jump on a zoom if that was unclear.

basepi · 2022-01-26T16:52:31Z

Have there been any thoughts on tracking uptake of these bridges in telemetry? Obviously it would have to wait for telemetry v2, but should we be setting an otel attribute or something so that we can know if a span/transaction was created via the bridge?

Edit: the Span Kind section assumes that the APM Server can tell that a Transaction/Span is from the otel bridge (as it is expected to infer the span_kind if it's not provided). How would the APM Server know, if there are no (optional) otel attributes on the event?

specs/agents/tracing-api-otel.md

stuartnelson3 · 2022-01-27T16:45:37Z

specs/agents/tracing-api-otel.md

+if (otel_span.remote_contex != null) {
+    span_or_transaction = createTransactionWithParent(otel_span.remote_context);
+} else if (otel_span.parent == null) {
+    span_or_transaction = createRootTransaction();


The go otel library allows users to set NewRoot as an option when creating a new span (https://pkg.go.dev/go.opentelemetry.io/otel/trace#WithNewRoot). According to the documentation:

Any existing parent span context will be ignored when defining the Span's trace identifiers.

I'm assuming this would definitely create a new root transaction, even if otel_span.remote_contex != null

That option seems to be specific to Go and I'm not intimately familiar with it. But I suppose you're right.

felixbarny · 2022-02-01T07:14:39Z

How would the APM Server know, if there are no (optional) otel attributes on the event?

APM Server would only know if at least one attribute is set.

basepi · 2022-02-01T18:21:11Z

APM Server would only know if at least one attribute is set.

In that case, does it make sense to set an elastic-namespaced attribute so we always know if a span was created with opentelemetry? I think it would be really useful to be able to track this via telemetry v2 and I have no idea whether the average user would add any attributes or not, but I'd rather avoid that confounding variable altogether.

Edit: Seems like we can use the fact that otel spans will always have otel.span_kind set -- if the APM Server sees this field (prior to auto-inferring this field for non-otel spans), then it could set a field for telemetry to pick up later.

specs/agents/tracing-api-otel.md

basepi · 2022-02-08T18:14:57Z

In the Python otel agent, Span.record_exception() has an attributes argument. Should we be putting these attributes on the Elastic exception object (presumably at otel.attributes)? Putting them in the exception context? Ignoring them?

Currently in the python implementation I'm just putting them in the context dictionary. But if we want them to be on the exception object at otel.attributes I need to add some more logic in the agent's internals.

Edit: chatted with @felixbarny, seems like there's no otel.attributes on the error intake. We should put them in labels instead.

specs/agents/tracing-api-otel.md

Co-authored-by: Colton Myers <colton.myers@gmail.com>

eyalkoren · 2022-03-09T09:11:24Z

specs/agents/tracing-api-otel.md

+OTel relies on key-value pairs for span attributes.
+Keys and values are protocol-specific and are defined in [semantic convention](https://github.com/open-telemetry/opentelemetry-specification/tree/main/specification/trace/semantic_conventions) specification.
+
+In order to minimize the mapping complexity in agents, most of the mapping between OTel attributes and agent protocol will be delegated to APM server:


Makes sense, but looking at the inference algorithm (which is an additional one to one we already have), and result/outcome inference - it seems reasonable to reconsider moving more login from agents to the server.
Not blocking this PR of course, but I think something to reconsider

I agree with you, being able to delegate all of the mapping and only implement it once would be great.

The inference algorithm on the agent is only required for the "essential" fields as they are reused by other agent features.

type (and subtype) are required for breakdown metrics: making this feature otel-compliant seems challenging.

outcome is required as we map it from otel status and don't store it as-is. (in hindsight maybe copying this one to otel.status next to otel.attributes would have been a good idea.

SylvainJuge · 2022-03-14T08:55:14Z

As discussed in last week agents meeting, it is now time to merge this PR.
If there is no comment or vote against this in the next couple of days I'll merge it as-is.

SylvainJuge added 2 commits September 29, 2021 14:52

add OpenTracing spec from apm#32

08688fa

add otel bridge spec

4932d5a

SylvainJuge requested review from a team as code owners September 29, 2021 12:56

SylvainJuge requested a review from felixbarny September 29, 2021 13:03

SylvainJuge added 2 commits September 29, 2021 15:08

cleanup + add label fallback

5831f7b

add some clarification on server-side mapping

b94c05b

beniwohli reviewed Sep 29, 2021

View reviewed changes

specs/agents/tracing-api-otel.md Outdated Show resolved Hide resolved

specs/agents/tracing-api.md Outdated Show resolved Hide resolved

fix wording specs/agents/tracing-api.md

071253e

Co-authored-by: Benjamin Wohlwend <bw@piquadrat.ch>

SylvainJuge mentioned this pull request Sep 29, 2021

OpenTelemetry Span attributes mapping for agent bridges elastic/apm-server#6259

Closed

4 tasks

felixbarny reviewed Sep 29, 2021

View reviewed changes

specs/agents/tracing-api-otel.md Outdated Show resolved Hide resolved

specs/agents/tracing-api-otel.md Outdated Show resolved Hide resolved

specs/agents/tracing-api-ot.md Outdated Show resolved Hide resolved

simitt reviewed Sep 29, 2021

View reviewed changes

specs/agents/tracing-api-otel.md Outdated Show resolved Hide resolved

extend spec with fallbacks + context activations

27d79c1

felixbarny reviewed Sep 30, 2021

View reviewed changes

SylvainJuge and others added 2 commits September 30, 2021 11:16

Apply suggestions from code review

e72ad6d

Co-authored-by: Felix Barnsteiner <felixbarny@users.noreply.github.com>

remove opentracing from spec

1a59b14

felixbarny reviewed Sep 30, 2021

View reviewed changes

specs/agents/tracing-api-otel.md Outdated Show resolved Hide resolved

specs/agents/tracing-api-otel.md Outdated Show resolved Hide resolved

Apply suggestions from code review

e8e008b

Co-authored-by: Felix Barnsteiner <felixbarny@users.noreply.github.com>

trentm reviewed Oct 1, 2021

View reviewed changes

specs/agents/tracing-api-otel.md Outdated Show resolved Hide resolved

Fix typo specs/agents/tracing-api-otel.md

bb3b8ae

Co-authored-by: Trent Mick <trentm@gmail.com>

SylvainJuge marked this pull request as draft October 4, 2021 11:59

stuartnelson3 reviewed Oct 11, 2021

View reviewed changes

specs/agents/tracing-api-otel.md Outdated Show resolved Hide resolved

astorm reviewed Jan 24, 2022

View reviewed changes

stuartnelson3 reviewed Jan 27, 2022

View reviewed changes

specs/agents/tracing-api-otel.md Show resolved Hide resolved

stuartnelson3 reviewed Jan 27, 2022

View reviewed changes

basepi reviewed Feb 3, 2022

View reviewed changes

specs/agents/tracing-api-otel.md Outdated Show resolved Hide resolved

SylvainJuge added 4 commits February 7, 2022 15:26

add a few clarifications

a376d5e

Merge branch 'main' of github.com:elastic/apm into otel-bridge-spec

d1ac141

clarify user-experience

da7447a

clarify bridge limitations

7b9e59b

MAY use labels for server < 7.16

a476547

basepi reviewed Feb 9, 2022

View reviewed changes

specs/agents/tracing-api-otel.md Outdated Show resolved Hide resolved

Update specs/agents/tracing-api-otel.md

c9e3004

Co-authored-by: Colton Myers <colton.myers@gmail.com>

trentm mentioned this pull request Feb 14, 2022

Spec adding a span links API #594

Closed

AlexanderWert mentioned this pull request Feb 15, 2022

[OpenTelemetry] Semantic convention mappings: thread.*, db.*... elastic/apm-server#7284

Open

clarify error capture + supported features

b195414

SylvainJuge marked this pull request as ready for review February 24, 2022 15:01

eyalkoren reviewed Mar 9, 2022

View reviewed changes

basepi mentioned this pull request Mar 14, 2022

[Telemetry] OpenTelemetry usage telemetry elastic/apm-server#7532

Closed

SylvainJuge merged commit 48565ba into elastic:main Mar 16, 2022

SylvainJuge deleted the otel-bridge-spec branch March 16, 2022 08:50

OTel bridge spec #516

OTel bridge spec #516

Uh oh!

Conversation

SylvainJuge commented Sep 29, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Things to clarify

Uh oh!

ghost commented Sep 29, 2021 • edited by ghost Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

💚 Build Succeeded

Build stats

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

astorm Jan 24, 2022

Choose a reason for hiding this comment

Uh oh!

felixbarny Jan 25, 2022

Choose a reason for hiding this comment

Uh oh!

basepi commented Jan 26, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

stuartnelson3 Jan 27, 2022

Choose a reason for hiding this comment

Uh oh!

felixbarny Feb 1, 2022

Choose a reason for hiding this comment

Uh oh!

felixbarny commented Feb 1, 2022

Uh oh!

basepi commented Feb 1, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

basepi commented Feb 8, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

eyalkoren Mar 9, 2022

Choose a reason for hiding this comment

Uh oh!

SylvainJuge Mar 9, 2022

Choose a reason for hiding this comment

Uh oh!

SylvainJuge commented Mar 14, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

11 participants

SylvainJuge commented Sep 29, 2021 •

edited

Loading

ghost commented Sep 29, 2021 •

edited by ghost

Loading

basepi commented Jan 26, 2022 •

edited

Loading

basepi commented Feb 1, 2022 •

edited

Loading

basepi commented Feb 8, 2022 •

edited

Loading