New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make sure AWS SDK spans suppress client spans all the time. #1837
Make sure AWS SDK spans suppress client spans all the time. #1837
Conversation
public static class ScopeHolder { | ||
public static final ThreadLocal<Scope> CURRENT = new ThreadLocal<>(); | ||
} | ||
|
||
private final ExecutionInterceptor delegate; | ||
|
||
public TracingExecutionInterceptor() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think after #1643 I may try to remove this class, instead just service loading the shaded library instrumentation.
1c618e4
to
61a5f3a
Compare
...java/io/opentelemetry/javaagent/instrumentation/netty/v4_1/client/NettyHttpClientTracer.java
Outdated
Show resolved
Hide resolved
…ntelemetry/javaagent/instrumentation/netty/v4_1/client/NettyHttpClientTracer.java Co-authored-by: Trask Stalnaker <trask.stalnaker@gmail.com>
// The AWS SDK uses Netty for asynchronous clients but constructs a request signature before | ||
// beginning transport. This means we MUST suppress Netty spans we would normally create or | ||
// they will inject their own trace header, which does not match what was present when the | ||
// signature was computed, breaking the SDK request completely We have not found how to | ||
// cleanly propagate context from the SDK instrumentation, which executes on an application | ||
// thread, to Netty instrumentation, which executes on event loops. If it's possible, it may | ||
// require instrumenting internal classes. Using a header which is more or less guaranteed to | ||
// always exist is arguably more stable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
OpenTelemetry.getGlobalPropagators() | ||
.getTextMapPropagator() | ||
.inject(otelContext, builder, AwsSdkInjectAdapter.INSTANCE); | ||
AwsXRayPropagator.getInstance().inject(otelContext, builder, AwsSdkInjectAdapter.INSTANCE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if you are sure that AWS SDK calls are always terminal and that nobody uses some alternative implementations of AWS services (like local S3 server?), we at least have to document this behaviour very clearly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got a bit of extra README out of it too :)
…emetry#1837) * Make sure AWS SDK spans suppress client spans all the time. * Make more consistent with other instrumentation * Update instrumentation/netty/netty-4.1/javaagent/src/main/java/io/opentelemetry/javaagent/instrumentation/netty/v4_1/client/NettyHttpClientTracer.java Co-authored-by: Trask Stalnaker <trask.stalnaker@gmail.com> * more dragons * Grammar * README Co-authored-by: Trask Stalnaker <trask.stalnaker@gmail.com>
Also explicitly populate aws header instead of global propagators.
A lot to process here.
Currently, AWS SDK instrumentation always breaks the signature because transport spans aren't suppressed.
When separating out library / javaagent instrumentation ages ago, I was still very green in this repo and don't think I understood the point of the
testImplementation
dependencies on downstream instrumentation and didn't port them over. So for a while we haven't verified suppression of transport spans.Sync: This was working in 0.10.0 and stopped working in 0.11.0. I haven't dug into why but presume some refactoring around client tracers is related. We would have caught the issue if I hadn't broken the tests in the first place.
Async: I suspect this has been broken for a long time, but none of our (e.g., aws-otel-java) integration tests use async clients yet. The docs from age-old code in the instrumentation seemed to indicate context is propagated to Netty properly, though extremely hackily, so I removed the even more hacky
amz-sdk-invocation-id
header check that we used to have to prevent signature breakage. But after adding the downstream instrumentation to the tests, I found that this wasn't the case. Using an IntelliJ debugger, I could confirmbeforeTransmission
is run on the application thread, not Netty thread, so the complicated instrumentation hack was having no effect (our executor instrumentation does not apply to netty event loops so I wouldn't expect those to ever have been helping here. Anyways, it's too deep to rely on IMO.). I went ahead and removed the hacky instrumentation and moved the scope management to the interceptor for use in synchronous clients only. For the netty spans, I restored the funamz-sdk-invocation-id
as no alternative seems clearly better.I also went ahead and had the SDK explicitly use aws propagation instead of the global ones. Unlike lambda, there is no use case for any other header, probably not for the next several years really, so this will allow a user's app to be more efficient, they don't have to enable aws globally (meaning on their own ingestion and extraction), and still get the benefit it provides on AWS SDK calls. This makes the integration test confirming that headers haven't been added (breaking signature, the worst thing our instrumentation could ever do) more robust in the process.