Provide tracing implementation using OpenTelemetry + APM agent #88443

pugnascotia · 2022-07-11T15:10:06Z

Part of #84369. Split out from #87696. Implement the Tracer interface by providing an X-Pack module that uses OpenTelemetry, along with Elastic's APM agent for Java.

See the file TRACING.md for background on the changes and the reasoning for some of the implementation decisions.

The configuration mechanism is the most fiddly part of this PR. The Security Manager permissions required by the APM Java agent make it prohibitive to start an agent from within Elasticsearch programmatically, so it must be configured when the ES JVM starts. That means that the startup CLI needs to assemble the required JVM options.

To complicate matters further, the APM agent needs a secret token in order to ship traces to the APM server. We can't use Java system properties to configure this, since otherwise the secret will be readable to all code in Elasticsearch. It therefore has to be configured in a dedicated config file. This in itself is awkward, since we don't want to leave secrets in config files. Therefore, we pull the APM secret token from the keystore, write it to a config file, then delete the config file after ES starts.
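As a rough illustration of the write-then-delete flow described above, here is a minimal sketch. It is not the code in this PR: the class and method names are hypothetical, the token is a placeholder rather than a keystore lookup, and pointing the agent at the file via the `elastic.apm.config_file` system property is an assumption about how the agent is told where its config lives.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper, for illustration only.
public class ApmTempConfig {

    /** Writes the given agent settings to a fresh temporary properties file and returns its path. */
    public static Path writeTemporaryConfig(Path dir, Map<String, String> settings) throws IOException {
        Properties props = new Properties();
        props.putAll(settings);
        // A randomly named file avoids clobbering anything that already exists in the directory.
        Path file = Files.createTempFile(dir, ".apm-agent-", ".tmp");
        try (Writer writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            props.store(writer, "Temporary APM agent config; deleted after startup");
        }
        return file;
    }

    /** Deletes the temporary config file; intended to be called once the node has started. */
    public static void deleteTemporaryConfig(Path file) throws IOException {
        Files.deleteIfExists(file);
    }

    public static void main(String[] args) throws IOException {
        Path configDir = Files.createTempDirectory("es-config");
        // Placeholder value: in the PR the secret token is read from the keystore.
        Path config = writeTemporaryConfig(configDir, Map.of("secret_token", "not-a-real-token"));
        // Assumed JVM option telling the agent where to find its config file.
        System.out.println("-Delastic.apm.config_file=" + config);
        deleteTemporaryConfig(config);
    }
}
```

The secret still exists on disk in plain text between the write and the delete, which is the trade-off the description calls awkward.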

There's a further issue with the config file. Any options we set in the APM agent config file cannot later be reconfigured via system properties, so we need to make sure that only "static" configuration goes into the config file.
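One way to picture the static/dynamic split is sketched below. This is an assumption-laden illustration rather than what `APMJvmOptions` actually does: the class name, the choice of which keys count as "static", and the idea of passing everything else as `-Delastic.apm.*` options are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper, for illustration only.
public class ApmOptionSplit {

    // Assumed set of keys that never need to change after startup.
    static final Set<String> STATIC_KEYS = Set.of("secret_token", "service_name");

    /** Settings that go into the config file and are therefore fixed for the life of the JVM. */
    public static Map<String, String> staticSettings(Map<String, String> all) {
        Map<String, String> result = new TreeMap<>();
        all.forEach((key, value) -> {
            if (STATIC_KEYS.contains(key)) {
                result.put(key, value);
            }
        });
        return result;
    }

    /** Everything else becomes a system property, so it can still be changed later. */
    public static List<String> jvmOptions(Map<String, String> all) {
        List<String> options = new ArrayList<>();
        // Sort the keys so the generated option list is deterministic.
        new TreeMap<>(all).forEach((key, value) -> {
            if (STATIC_KEYS.contains(key) == false) {
                options.add("-Delastic.apm." + key + "=" + value);
            }
        });
        return options;
    }
}
```

Keeping the file limited to the static keys means nothing in it shadows a system property that might need to change at runtime.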

I generated most of the files under qa/apm using an APM test utility (I can't remember which one now, unfortunately). The goal is to set up a complete system so that traces can be captured in APM server, and the results inspected in Elasticsearch.

pugnascotia · 2022-07-13T14:29:14Z

@original-brownbear The test SimpleSecurityNetty4ServerTransportTests.testResponseHeadersArePreserved is failing; here's a repro line:

./gradlew ':x-pack:plugin:security:test' --tests "org.elasticsearch.xpack.security.transport.netty4.SimpleSecurityNetty4ServerTransportTests.testResponseHeadersArePreserved" -Dtests.seed=5ACB6BA0E11D5128 -Dtests.locale=sl -Dtests.timezone=ACT -Druntime.java=17

I can un-break the test by reverting the change to RequestHandlerRegistry, but I needed that change to avoid the problem of setting the same key in a given thread context.

For background about what's going on in this PR, see TRACING.md. That'll give you some useful context.

Instead of wrapping `RequestHandlerRegistry#processMessageReceived(...)` in a new trace context, perform the wrapping where that method is called. This allows the handling of an exception thrown by `processMessageReceived` to use the same context.

Instead of messing around with supporting tracing methods directly on a `RestChannel`, construct a `RestController` with a `Tracer` and trace requests the usual way. As a result, strip out added code from `RestChannel` and `HttpTracer`. Also add Javadoc to `Tracer`.

original-brownbear

I left one important question for now (and two nits :)).

distribution/tools/server-cli/src/main/java/org/elasticsearch/server/cli/APMJvmOptions.java

modules/apm/src/main/java/org/elasticsearch/tracing/apm/APMTracer.java

Part of #84369. Split out from #88443. This PR wraps parts of the logic in `InternalExecutePolicyAction` in a new tracing context. This is necessary so that a tracing implementation can use the thread context to propagate tracing headers, but without the code attempting to set the same key twice in the thread context, which is illegal.

Part of #84369. Split out from #88443. This PR wraps parts of the logic in `AsyncTaskManagementService` in a new tracing context. This is necessary so that a tracing implementation can use the thread context to propagate tracing headers, but without the code attempting to set the same key twice in the thread context, which is illegal.

Part of #84369. Split out from #88443. This PR wraps parts of the logic in `TransportSubmitAsyncSearchAction` in a new tracing context. This is necessary so that a tracing implementation can use the thread context to propagate tracing headers, but without the code attempting to set the same key twice in the thread context, which is illegal.

Part of #84369. Split out from #88443. This PR wraps parts of the code in a new tracing context. This is necessary so that a tracing implementation can use the thread context to propagate tracing headers, but without the code attempting to set the same key twice in the thread context, which is illegal. In order to avoid future diff noise, the wrapped code has mostly been refactored into methods. Note that in some places we actually clear the tracing context completely. This is done where the operation to be performed should have no association with the current trace context. For example, when creating a new index via a REST request, the resulting background tasks for the index should not be associated with the REST request in perpetuity.
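To make the "same key twice" problem concrete, here is a self-contained toy model. It is a sketch only: `MiniThreadContext`, `Restorable`, and the `traceparent` / `parent_traceparent` header names are stand-ins chosen for illustration and are not the Elasticsearch `ThreadContext` API.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model, for illustration only.
public class TraceContextSketch {

    /** A resource whose close() restores the previous headers without throwing a checked exception. */
    public interface Restorable extends AutoCloseable {
        @Override
        void close();
    }

    /** Stand-in for a thread context in which each header may be set only once. */
    public static final class MiniThreadContext {
        private Map<String, String> headers = new HashMap<>();

        public void putHeader(String key, String value) {
            if (headers.putIfAbsent(key, value) != null) {
                throw new IllegalArgumentException("value for key [" + key + "] already present");
            }
        }

        public String getHeader(String key) {
            return headers.get(key);
        }

        /** Moves the current trace header aside so a child span can set its own. */
        public Restorable newTraceContext() {
            Map<String, String> saved = headers;
            headers = new HashMap<>(saved);
            String parent = headers.remove("traceparent");
            if (parent != null) {
                headers.put("parent_traceparent", parent);
            }
            return () -> headers = saved;
        }

        /** Drops all trace headers, for work that should not be tied to the current trace. */
        public Restorable clearTraceContext() {
            Map<String, String> saved = headers;
            headers = new HashMap<>(saved);
            headers.remove("traceparent");
            headers.remove("parent_traceparent");
            return () -> headers = saved;
        }
    }

    public static void main(String[] args) {
        MiniThreadContext ctx = new MiniThreadContext();
        ctx.putHeader("traceparent", "rest-request-span");
        try (Restorable ignored = ctx.newTraceContext()) {
            // Without the new context this second put would throw.
            ctx.putHeader("traceparent", "child-task-span");
        }
        // The original header is back once the wrapped code has finished.
        System.out.println(ctx.getHeader("traceparent"));
    }
}
```

In this model, `clearTraceContext()` corresponds to the case mentioned above where background work, such as tasks spawned by index creation, should not stay attached to the originating REST request.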

jpountz · 2022-08-03T21:01:19Z

TRACING.md

+sources][agent-config], with a hierarchy that means, for example, that options
+set in the config file cannot be overridden via system properties.
+
+Now, in order to send tracing data to the APM server, ES needs to configured with


I think you meant "needs to be configured"?

jpountz · 2022-08-03T21:02:32Z

Wonderful! Should we add the release highlight label?

pugnascotia · 2022-08-08T08:27:34Z

> Wonderful! Should we add the release highlight label?

I'm honestly not sure. We haven't written externally-facing docs yet, as we're focussed on using it for ourselves right now. I'll leave that up to the Core/Infra team.

pugnascotia added 7 commits July 7, 2022 12:56

Introduce tracer interfaces

8f3f5f1

Tweaks

c99f313

Add APM agent and tracer

5790c2e

Update APM libs

5066b3b

Open keystore just once, plus refactoring

b9c291c

Remove unused vars, fix forbidden APIs.

Tweaks to tracing docs

58c7bef

Fixes and refactorings

6bda3e1

pugnascotia added >feature, WIP, :Core/Infra/Core, v8.4.0 labels Jul 11, 2022

pugnascotia mentioned this pull request Jul 11, 2022

Instrument Elasticsearch with APM #84369

Closed

21 tasks

pugnascotia added 2 commits July 11, 2022 16:23

Imports

c95b643

Fixes

6288541

pugnascotia mentioned this pull request Jul 14, 2022

Introduce tracing interfaces #87921

Merged

pugnascotia added 13 commits July 14, 2022 13:01

Remove withScope from DefaultRestChannel

c0f9eeb

Merge remote-tracking branch 'upstream/master' into apm-integration-part-2

e384c71

Formatting

585f200

Revert changes to HttpTracer

eb55854

More Javadoc

d6ec348

Merge remote-tracking branch 'upstream/master' into apm-integration-part-2

a677db8

Drop Traceable interface, shift work to call sites

f0f0d77

Actually shift work to call sites

00d4b50

Imports

c1ad5ea

Fix test

adbc7c6

Damn you, spotless

98099e9

Cleanup up DefaultRestChannel

fbfa576

pugnascotia added 3 commits July 29, 2022 16:41

Imports

091d3b3

Explicitly resolve apm module path to diagnose CI failure

9e852f1

Be lenient about missing apm module in snapshot builds

bb7d41b

original-brownbear reviewed Aug 1, 2022

pugnascotia added 3 commits August 1, 2022 11:19

s/LOGGER/logger/g

1b0c815

Just use immutable map classes in APMJvmOptions

5d68db8

Test fix

bebb115

pugnascotia mentioned this pull request Aug 2, 2022

Wrap enrich execute action in new tracing context #89021

Merged

Fixes to qa test

41dd85f

pugnascotia mentioned this pull request Aug 2, 2022

Wrap ML model loading task in new tracing context #89024

Merged

Merge remote-tracking branch 'upstream/main' into apm-integration-part-3

760738f

pugnascotia mentioned this pull request Aug 2, 2022

Wrap async QL task execution in new tracing context #89029

Merged

pugnascotia added 2 commits August 2, 2022 14:46

Revert changed to TrainedModelAssignmentNodeService

182810d

Merge remote-tracking branch 'upstream/main' into apm-integration-part-3

ed02388

pugnascotia added 5 commits August 3, 2022 11:22

Merge remote-tracking branch 'upstream/main' into apm-integration-part-3

cc2310c

Update TRACING.md

859981e

Move keystore from ServerArgs

17a183d

Merge remote-tracking branch 'upstream/main' into apm-integration-part-3

c4da4ff

Formatting

3868a31

pugnascotia merged commit 512bfeb into elastic:main Aug 3, 2022

pugnascotia deleted the apm-integration-part-3 branch August 3, 2022 13:13

jpountz reviewed Aug 3, 2022

pugnascotia mentioned this pull request Aug 17, 2022

APM secret token briefly exists in plain text #89439

Closed

lizozom mentioned this pull request Nov 30, 2022

ES APM integration in performance \ functional tests elastic/kibana#146604

Draft
