Provide tracing implementation using OpenTelemetry + APM agent #88443
Remove unused vars, fix forbidden APIs.
@original-brownbear The test
I can un-break the test by reverting the change to For background about what's going on in this PR, see
Instead of wrapping `RequestHandlerRegistry#processMessageReceived(...)` in a new trace context, perform the wrapping where that method is called. This allows the handling of an exception thrown by `processMessageReceived` to use the same context.
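As a rough illustration of why the call site is the better place for the wrapping, here is a minimal sketch. Everything in it (`CallSiteTracingSketch`, `Scope`, `newTraceContext`, the string-valued `currentContext`) is a hypothetical stand-in, not the Elasticsearch code. Note that Java closes a try-with-resources resource before any `catch` on the same `try` runs, so the failure handling is nested inside the scope.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names come from Elasticsearch.
final class CallSiteTracingSketch {
    // Stand-in for "the trace context currently attached to this thread".
    static String currentContext = "outer";
    static final List<String> log = new ArrayList<>();

    interface Scope extends AutoCloseable {
        @Override
        void close(); // narrowed so callers need not handle checked exceptions
    }

    // Switch to a new context; closing the scope restores the previous one.
    static Scope newTraceContext() {
        String previous = currentContext;
        currentContext = "request";
        return () -> currentContext = previous;
    }

    // Stand-in for the method being called; it knows nothing about tracing.
    static void processMessageReceived(boolean fail) {
        log.add("process in " + currentContext);
        if (fail) {
            throw new IllegalStateException("simulated failure");
        }
    }

    // The caller owns the scope, so the failure path sees the same context.
    static void handleRequest(boolean fail) {
        try (Scope ignored = newTraceContext()) {
            try {
                processMessageReceived(fail);
            } catch (RuntimeException e) {
                log.add("failure handled in " + currentContext);
            }
        }
    }
}
```

If the scope were opened inside `processMessageReceived` instead, it would already be closed by the time the caller's exception handling ran.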
Instead of messing around with supporting tracing methods directly on a `RestChannel`, construct a `RestController` with a `Tracer` and trace requests the usual way. As a result, strip out added code from `RestChannel` and `HttpTracer`. Also add Javadoc to `Tracer`.
I left one important question for now (and two nits :)).
distribution/tools/server-cli/src/main/java/org/elasticsearch/server/cli/APMJvmOptions.java
modules/apm/src/main/java/org/elasticsearch/tracing/apm/APMTracer.java
Part of #84369. Split out from #88443. This PR wraps parts of the logic in `InternalExecutePolicyAction` in a new tracing context. This is necessary so that a tracing implementation can use the thread context to propagate tracing headers, but without the code attempting to set the same key twice in the thread context, which is illegal.
Part of #84369. Split out from #88443. This PR wraps parts of the logic in `AsyncTaskManagementService` in a new tracing context. This is necessary so that a tracing implementation can use the thread context to propagate tracing headers, but without the code attempting to set the same key twice in the thread context, which is illegal.
Part of #84369. Split out from #88443. This PR wraps parts of the logic in `TransportSubmitAsyncSearchAction` in a new tracing context. This is necessary so that a tracing implementation can use the thread context to propagate tracing headers, but without the code attempting to set the same key twice in the thread context, which is illegal.
Part of #84369. Split out from #88443. This PR wraps parts of the code in a new tracing context. This is necessary so that a tracing implementation can use the thread context to propagate tracing headers, but without the code attempting to set the same key twice in the thread context, which is illegal. In order to avoid future diff noise, the wrapped code has mostly been refactored into methods. Note that in some places we actually clear the tracing context completely. This is done where the operation to be performed should have no association with the current trace context. For example, when creating a new index via a REST request, the resulting background tasks for the index should not be associated with the REST request in perpetuity.
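To make the "same key twice" constraint concrete, here is a minimal sketch of a context that rejects duplicate keys, plus a scope that frees the tracing key for a child span and another that drops the trace association entirely. `SketchThreadContext` and all of its methods are hypothetical and much simpler than Elasticsearch's real thread context; keeping the parent value under a `parent_`-prefixed key is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a thread context; not the Elasticsearch class.
final class SketchThreadContext {
    private Map<String, String> headers = new HashMap<>();

    interface Scope extends AutoCloseable {
        @Override
        void close(); // no checked exception
    }

    // Mirrors the constraint described above: a key may be set only once.
    void putHeader(String key, String value) {
        if (headers.putIfAbsent(key, value) != null) {
            throw new IllegalArgumentException("value for key [" + key + "] already present");
        }
    }

    String getHeader(String key) {
        return headers.get(key);
    }

    // New tracing context: the tracing key becomes free to set again, and the
    // old value is kept as the parent (an assumption made by this sketch).
    Scope newTraceContext(String traceKey) {
        Map<String, String> saved = headers;
        Map<String, String> fresh = new HashMap<>(saved);
        String parent = fresh.remove(traceKey);
        if (parent != null) {
            fresh.put("parent_" + traceKey, parent);
        }
        headers = fresh;
        return () -> headers = saved;
    }

    // Cleared tracing context: work started here has no link to the caller's trace.
    Scope clearTraceContext(String traceKey) {
        Map<String, String> saved = headers;
        Map<String, String> fresh = new HashMap<>(saved);
        fresh.remove(traceKey);
        fresh.remove("parent_" + traceKey);
        headers = fresh;
        return () -> headers = saved;
    }
}
```

With this shape, a tracer can call `putHeader` for a child span inside `newTraceContext` without tripping the duplicate-key check, and the previous headers come back when the scope closes.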
sources][agent-config], with a hierarchy that means, for example, that options
set in the config file cannot be overridden via system properties.

Now, in order to send tracing data to the APM server, ES needs to configured with
I think you meant "needs to be configured"?
Wonderful! Should we add the
I'm honestly not sure. We haven't written externally-facing docs yet, as we're focussed on using it for ourselves right now. I'll leave that up to the Core/Infra team.
Part of #84369. Split out from #87696. Implement the `Tracer` interface by providing an X-Pack module that uses OpenTelemetry, along with Elastic's APM agent for Java.
See the file `TRACING.md` for background on the changes and the reasoning for some of the implementation decisions.
The configuration mechanism is the most fiddly part of this PR. The Security Manager permissions required by the APM Java agent make it prohibitive to start an agent from within Elasticsearch programmatically, so it must be configured when the ES JVM starts. That means that the startup CLI needs to assemble the required JVM options.
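For illustration only, the option assembly could look roughly like the sketch below. `ApmJvmOptionsSketch` and `jvmOptions` are hypothetical names, not the code in this PR. The sketch assumes the agent is attached with the standard `-javaagent` flag, that its settings are passed as system properties with an `elastic.apm.` prefix, and that a `config_file` setting points at the config file; the setting names in the usage are placeholders.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of assembling agent-related JVM options at startup.
final class ApmJvmOptionsSketch {
    static List<String> jvmOptions(Path agentJar, Path configFile, Map<String, String> settings) {
        List<String> options = new ArrayList<>();
        // Attach the agent when the JVM starts rather than programmatically.
        options.add("-javaagent:" + agentJar);
        // Assumed setting name for the location of the agent's config file.
        options.add("-Delastic.apm.config_file=" + configFile);
        // Sort the keys so the generated command line is deterministic.
        for (Map.Entry<String, String> e : new TreeMap<>(settings).entrySet()) {
            options.add("-Delastic.apm." + e.getKey() + "=" + e.getValue());
        }
        return options;
    }
}
```

A call such as `jvmOptions(Path.of("agent.jar"), Path.of("apm.tmp"), Map.of("recording", "true"))` would yield three options: the `-javaagent` flag, the config file location, and one system property.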
To complicate matters further, the APM agent needs a secret token in order to ship traces to the APM server. We can't use Java system properties to configure this, since otherwise the secret will be readable to all code in Elasticsearch. It therefore has to be configured in a dedicated config file. This in itself is awkward, since we don't want to leave secrets in config files. Therefore, we pull the APM secret token from the keystore, write it to a config file, then delete the config file after ES starts.
There's a further issue with the config file. Any options we set in the APM agent config file cannot later be reconfigured via system properties, so we need to make sure that only "static" configuration goes into the config file.
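The write-then-delete flow for the secret could be sketched as follows. This is an illustration under assumptions, not the PR's implementation: `ApmTempConfigSketch`, its methods, and the temp-file prefix are hypothetical, the keystore lookup is left out (the token is simply passed in), and `secret_token` is assumed to be the agent's property name for the token.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch of the temporary agent config file lifecycle.
final class ApmTempConfigSketch {
    // Write the secret plus any static settings to a fresh temp file.
    // Only settings that never need to change later belong here, because
    // file-based settings cannot be overridden via system properties.
    static Path writeConfig(Path dir, String secretToken, Map<String, String> staticSettings) throws IOException {
        Properties props = new Properties();
        props.putAll(staticSettings);
        props.setProperty("secret_token", secretToken); // assumed property name
        Path file = Files.createTempFile(dir, ".apm-sketch.", ".tmp");
        try (Writer writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            props.store(writer, "temporary file, removed once the node has started");
        }
        return file;
    }

    // Called once startup has completed, so the secret does not stay on disk.
    static void deleteAfterStartup(Path file) throws IOException {
        Files.deleteIfExists(file);
    }
}
```

The caller would pass the returned path to the JVM options, start Elasticsearch, and then call `deleteAfterStartup` once the agent has read the file.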
I generated most of the files under `qa/apm` using an APM test utility (I can't remember which one now, unfortunately). The goal is to set up a complete system so that traces can be captured in APM server, and the results inspected in Elasticsearch.