Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide tracing implementation using OpenTelemetry + APM agent #88443

Merged
merged 69 commits into from Aug 3, 2022

Conversation

pugnascotia
Copy link
Contributor

@pugnascotia pugnascotia commented Jul 11, 2022

Part of #84369. Split out from #87696. Implement the Tracer interface by providing an X-Pack module that uses OpenTelemetry, along with Elastic's APM agent for Java.

See the file TRACING.md for background on the changes and the reasoning for some of the implementation decisions.

The configuration mechanism is the most fiddly part of this PR. The Security Manager permissions required by the APM Java agent make it prohibitive to start an agent from within Elasticsearch programmatically, so it must be configured when the ES JVM starts. That means that the startup CLI needs to assemble the required JVM options.

To complicate matters further, the APM agent needs a secret token in order to ship traces to the APM server. We can't use Java system properties to configure this, since otherwise the secret will be readable to all code in Elasticsearch. It therefore has to be configured in a dedicated config file. This in itself is awkward, since we don't want to leave secrets in config files. Therefore, we pull the APM secret token from the keystore, write it to a config file, then delete the config file after ES starts.

There's a further issue with the config file. Any options we set in the APM agent config file cannot later be reconfigured via system properties, so we need to make sure that only "static" configuration goes into the config file.

I generated most of the files under qa/apm using an APM test utility (I can't remember which one now, unfortunately). The goal is to setup up a complete system so that traces can be captured in APM server, and the results in Elasticsearch inspected.

@pugnascotia pugnascotia added >feature WIP :Core/Infra/Core Core issues without another label v8.4.0 labels Jul 11, 2022
@pugnascotia
Copy link
Contributor Author

@original-brownbear The test SimpleSecurityNetty4ServerTransportTests.testResponseHeadersArePreserved is failing, here a repro line:

./gradlew ':x-pack:plugin:security:test' --tests "org.elasticsearch.xpack.security.transport.netty4.SimpleSecurityNetty4ServerTransportTests.testResponseHeadersArePreserved" -Dtests.seed=5ACB6BA0E11D5128 -Dtests.locale=sl -Dtests.timezone=ACT -Druntime.java=17

I can un-break the test by reverting the change to RequestHandlerRegistry, but I needed that change to avoid the problem of setting the same key in a given thread context.

For background about what's going in in this PR, see TRACING.md. That'll give you some useful context.

Instead of wrapping RequestHandlerRegistry#processMessageReceived(...)
in a new trace context, perform the wrapping where that method is
calling. This allows handling when `processMessageReceived` throws an
exception to use the same context.
Copy link
Member

@original-brownbear original-brownbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left one important question for now (and two nits :)).

pugnascotia added a commit that referenced this pull request Aug 2, 2022
Part of #84369. Split out from #88443. This PR wraps parts logic in
`InternalExecutePolicyAction` in a new tracing context. This is
necessary so that a tracing implementation can use the thread context
to propagate tracing headers, but without the code attempting to set the
same key twice in the thread context, which is illegal.
pugnascotia added a commit that referenced this pull request Aug 2, 2022
Part of #84369. Split out from #88443. This PR wraps parts logic in
`AsyncTaskManagementService` in a new tracing context. This is
necessary so that a tracing implementation can use the thread context
to propagate tracing headers, but without the code attempting to set the
same key twice in the thread context, which is illegal.
pugnascotia added a commit that referenced this pull request Aug 2, 2022
Part of #84369. Split out from #88443. This PR wraps parts logic in
`TransportSubmitAsyncSearchAction` in a new tracing context. This is
necessary so that a tracing implementation can use the thread context
to propagate tracing headers, but without the code attempting to set the
same key twice in the thread context, which is illegal.
pugnascotia added a commit that referenced this pull request Aug 3, 2022
Part of #84369. Split out from #88443. This PR wraps parts of the code
in a new tracing context. This is necessary so that a tracing
implementation can use the thread context to propagate tracing headers,
but without the code attempting to set the same key twice in the thread
context, which is illegal. In order to avoid future diff noise, the wrapped
code has mostly been refactored into methods.

Note that in some places we actually clear the tracing context
completely. This is done where the operation to be performed should have
no association with the current trace context. For example, when
creating a new index via a REST request, the resulting background tasks
for the index should not be associated with the REST request in
perpetuity.
@pugnascotia pugnascotia merged commit 512bfeb into elastic:main Aug 3, 2022
@pugnascotia pugnascotia deleted the apm-integration-part-3 branch August 3, 2022 13:13
sources][agent-config], with a hierarchy that means, for example, that options
set in the config file cannot be overridden via system properties.

Now, in order to send tracing data to the APM server, ES needs to configured with
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you meant "needs to be configured"?

@jpountz
Copy link
Contributor

jpountz commented Aug 3, 2022

Wonderful! Should we add the release highlight label?

@pugnascotia
Copy link
Contributor Author

Wonderful! Should we add the release highlight label?

I'm not honestly sure. We haven't written externally-facing docs yet, as we're focussed on using it for ourselves right now. I'll leave that up to the Core/Infra team.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Core Core issues without another label >feature Team:Core/Infra Meta label for core/infra team v8.5.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants