-
Notifications
You must be signed in to change notification settings - Fork 124
OTel bridge spec #516
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OTel bridge spec #516
Conversation
Co-authored-by: Benjamin Wohlwend <bw@piquadrat.ch>
Co-authored-by: Felix Barnsteiner <felixbarny@users.noreply.github.com>
Co-authored-by: Felix Barnsteiner <felixbarny@users.noreply.github.com>
Co-authored-by: Trent Mick <trentm@gmail.com>
| - Java Agent: Elastic active context is implemented as a thread-local stack | ||
| - Java OTel API: active context is implemented as a key-value map propagated through thread local | ||
|
|
||
| In order to avoid potentially complex and tedious synchronization issues between OTel and our existing agent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran into a little trickiness when implementing the "if there's not a parent span start a transaction" behavior when working on the Node.js Bridge related to the context objects. I believe I have a path forward for the Node.js agent -- I'm mainly sharing this as background in case other folks run into it (and also forcing myself to say everything out loud in case there's some corner I'm not considering ;))
First -- there's no span builder object in any of the Node.js API or SDK code. Second, neither the startSpan or the startActiveSpan methods of the Node.js tracer allow you to pass in the span that should be a parent. Instead, this information is stored on the passed in open telemetry context object. Specifically, the intended pattern here is that you can fetch the span that should be the parent (i.e. the span that's active) by using the context's getSpanContext method.
You can see an example of this in the default Node.js SDK/tracer:
The becomes a bit complicated because, currently, this span context is initially set by whatever tracecontext propagator is currently being used
The trickiness came from the fact that we're not using these W3C propagators in the Node.js agent bridge -- trace context needs to be propagated when creating a transaction. This means that in cases where we're starting a transaction and assigning it to the ElasticOtelSpan object, we also need to set that span context on the open telemetry context object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means that in cases where we're starting a transaction and assigning it to the ElasticOtelSpan object, we also need to set that span context on the open telemetry context object.
The idea for the bridges is that there's just a single context storage - the internal one of the agent.
The case you mentioned is handled by this part of the spec:
After activating an Elastic span via the agent's API, the [
Context] returned via the [get current context API] should contain that Elastic span
What it means is that when ContextManager::active is called, the bridge implementation of the interface would create a Context on-the-fly which contains the Elastic span.
The trickiness came from the fact that we're not using these W3C propagators in the Node.js agent bridge -- trace context needs to be propagated when creating a transaction.
I don't think there's a need to implement a custom W3C propagators in the Node.js agent bridge. You can just rely on the provided implementation.
The becomes a bit complicated because, currently, this span context is initially set by whatever tracecontext propagator is currently being used
The trace context propagators don't alter the context storage. It's a bit confusing but trace.setSpanContext, which is called by the propagators, only creates an immutable copy of the provided Context and sets the span. In the case of W3CTraceContextPropagator, the span is a "fake" NonRecordingSpan. The whole purpose of this is to populate a Context object that has the trace ids from the traceparent header.
Personally, I find that part of the OTel API a bit bloated and a source of unnecessary ceremony and allocation.
Happy to jump on a zoom if that was unclear.
|
Have there been any thoughts on tracking uptake of these bridges in telemetry? Obviously it would have to wait for telemetry v2, but should we be setting an otel attribute or something so that we can know if a span/transaction was created via the bridge? Edit: the |
| if (otel_span.remote_contex != null) { | ||
| span_or_transaction = createTransactionWithParent(otel_span.remote_context); | ||
| } else if (otel_span.parent == null) { | ||
| span_or_transaction = createRootTransaction(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The go otel library allows users to set NewRoot as an option when creating a new span (https://pkg.go.dev/go.opentelemetry.io/otel/trace#WithNewRoot). According to the documentation:
Any existing parent span context will be ignored when defining the Span's trace identifiers.
I'm assuming this would definitely create a new root transaction, even if otel_span.remote_contex != null
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That option seems to be specific to Go and I'm not intimately familiar with it. But I suppose you're right.
APM Server would only know if at least one attribute is set. |
In that case, does it make sense to set an elastic-namespaced attribute so we always know if a span was created with opentelemetry? I think it would be really useful to be able to track this via telemetry v2 and I have no idea whether the average user would add any attributes or not, but I'd rather avoid that confounding variable altogether. Edit: Seems like we can use the fact that otel spans will always have |
|
In the Python otel agent, Currently in the python implementation I'm just putting them in the context dictionary. But if we want them to be on the exception object at Edit: chatted with @felixbarny, seems like there's no |
Co-authored-by: Colton Myers <colton.myers@gmail.com>
| OTel relies on key-value pairs for span attributes. | ||
| Keys and values are protocol-specific and are defined in [semantic convention](https://github.com/open-telemetry/opentelemetry-specification/tree/main/specification/trace/semantic_conventions) specification. | ||
|
|
||
| In order to minimize the mapping complexity in agents, most of the mapping between OTel attributes and agent protocol will be delegated to APM server: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, but looking at the inference algorithm (which is an additional one to one we already have), and result/outcome inference - it seems reasonable to reconsider moving more login from agents to the server.
Not blocking this PR of course, but I think something to reconsider
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with you, being able to delegate all of the mapping and only implement it once would be great.
The inference algorithm on the agent is only required for the "essential" fields as they are reused by other agent features.
- type (and subtype) are required for breakdown metrics: making this feature otel-compliant seems challenging.
- outcome is required as we map it from otel status and don't store it as-is. (in hindsight maybe copying this one to
otel.statusnext tootel.attributeswould have been a good idea.
|
As discussed in last week agents meeting, it is now time to merge this PR. |
Add specification for OpenTelemetry API agent bridges.
Approval from agents is optional here, only Java agent will use this in the short term to provide an OpenTelemetry bridge.
Implementation on APM server : elastic/apm-server#6259
Things to clarify
user.idorspan.typecould allow to access vendor fields in Elastic agent protocol without direct API access. This mapping could also be handled on the server.