Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

propagating trace context headers with aws-sdk HTTP requests breaks AWS4 auth when there are retries #2134

Closed
1 of 2 tasks
trentm opened this issue Jun 29, 2021 · 1 comment · Fixed by #2135
Closed
1 of 2 tasks
Assignees
Labels
agent-nodejs Make available for APM Agents project planning.

Comments

@trentm
Copy link
Member

trentm commented Jun 29, 2021

Background

Currently the Node.js APM agent specifically avoids adding trace context headers (traceparent et al) on outgoing HTTP requests that look to be using a AWS signature version 4 "authorization" header -- which is the default for AWS client libraries (like aws-sdk). The history on that was:

Why am I here?

From recent APM agents team discussion (and motivated initially by this Elasticsearch clients issue: https://github.com/elastic/clients-team/issues/424) we want to change the APM agents spec to prefer to propagate trace context headers for all HTTP requests by default. This special case in the Node.js agent for "AWS4" requests is the only exception/argument that was brought up where adding those headers could be harmful. What is going on, then? Any HTTP auth mechanism should be fine coping with added HTTP headers, so what is going on.

tl;dr

The understanding in the above PRs was incomplete. The fix to "exclude trace context headers from AWS4 requests" fixes the issue, but isn't necessary. There was another bug going on where the Node.js agent's modification of the aws-sdk AWS.Requests "headers" object broke request retries if necessary. Details below.

How does it break on retry?

The aws-sdk package supports retries if a request fails and is retryable. Some errors are retryable by default, for example when talking to an S3 bucket in us-west-1 with a client that is configured to (or defaults to) talk to a different AWS region, say, us-east-1. (Note: Not all AWS client SDKs support retrying in this case.)

Scenario: We will be doing "HeadObject" on a S3 object with key aDir/aFile.txt in bucket trentm-play-s3-bukkit2 that lives in us-west-1. We will use a client that doesn't specify the region (which means it defaults to us-east-1):

const apm = require('elastic-apm-node').start({ /* ... */ })
const AWS = require('aws-sdk')
const s3 = new AWS.S3({ apiVersion: '2006-03-01' })

const BUKKIT = 'trentm-play-s3-bukkit2'
const KEY = 'aDir/aFile.txt'
s3.headObject({ Bucket: BUKKIT, Key: KEY }, function (err, data) {
    // ...
})

request 1: HEAD /aDir/aFile.txt to us-east-1

HEAD trentm-play-s3-bukkit2.s3.amazonaws.com/aDir/aFile.txt
Authorization: AWS4-HMAC-SHA256 Credential=ASI.../20210629/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date;x-amz-security-token, Signature=ccc060a33e839aa9b43de9ed5a1415ffe17d7a45fdb84fc1a87f1a7d6e0af72d
traceparent: 00-5516bb78f27c7ac32b9751945e204f2c-318dd10f671bb2f1-01
tracestate: es=s:1
...

response:

HTTP/1.1 400 Bad Request
x-amz-request-id: Y1SY641471AMCVDM
server: AmazonS3
...

request 2: HEAD / to get "x-amz-bucket-region"

That "400 Bad Request" was because we had the wrong region, so the client library
reverts to calling the HEAD / endpoint that returns a "x-amz-bucket-region"
header telling the caller what region the bucket lives in.

HEAD trentm-play-s3-bukkit2.s3.amazonaws.com/
Authorization: AWS4-HMAC-SHA256 Credential=ASI.../20210629/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date;x-amz-security-token, Signature=0c72eb43f8ea4c8e33eb2190a45d77ca32d0722f0dc669548114853795227f13
traceparent: 00-5516bb78f27c7ac32b9751945e204f2c-ead5115e70038cf4-01
tracestate: es=s:1
...

response:

HTTP/1.1 400 Bad Request
x-amz-bucket-region: us-west-1
x-amz-request-id: Y1SZ3FP4YVXEFM5C
server: AmazonS3
...

request 3: HEAD /aDir/aFile.txt with correct auth region now

Now that we know the region we can sign with the appropriate credentials.
Notice how above we had Credential=.../.../us-east-1/s3/aws4_request,
but in the next request we have the proper region .../.../us-west-1/...:

HEAD trentm-play-s3-bukkit2.s3.amazonaws.com/aDir/aFile.txt
Authorization: AWS4-HMAC-SHA256 Credential=ASI.../20210629/us-west-1/s3/aws4_request, SignedHeaders=elastic-apm-traceparent;host;traceparent;tracestate;x-amz-content-sha256;x-amz-date;x-amz-security-token, Signature=b8a1f9247e3a0f2ee5bffe729899b132383b55ab97f83a7f1837247e320b365a
traceparent: 00-5516bb78f27c7ac32b9751945e204f2c-feb42473b7cba90e-01
tracestate: es=s:1
...

response:

HTTP/1.1 403 Forbidden
x-amz-request-id: CBD76CTWSNKBES8Q
server: AmazonS3

Oops. What happened here? In the previous two requests the headers included
in the auth signature were SignedHeaders=host;x-amz-content-sha256;x-amz-date;x-amz-security-token
(headers that existed in the request but I've elided for brevity). However,
in this 3rd request we have:
SignedHeaders=elastic-apm-traceparent;host;traceparent;tracestate;x-amz-content-sha256;x-amz-date;x-amz-security-token.
The extra agent-added headers are being included in the signature. This should
be fine, except the order of operations here is:

  1. aws-sdk selects the headers from its request object, and creates a signature for this request
    (this is the Signature=b8a1f9247... above)
  2. aws-sdk calls Node.js's core https.request(...) to make an HTTP request.
  3. The Node.js APM agent's instrumentation of the https module notices that
    call and adds traceparent, tracestate, elastic-apm-traceparent with
    their current value.

That "current value" is different for this request than for the last one.
On this retried (3rd) request, the AWS.Request "headers" object has the
traceparent et al headers from the 1st request. In step 1, aws-sdk is
using the traceparent from request 1 to sign, but the request headers are then
modified to have the traceparent for request 3 before being sent.

That breaks the signature.

The bug is that the Node.js APM agent's instrumentation of https.request()
is modifying the request headers object in-place instead of creating a new
object. This breaks re-use of that headers object by the 'aws-sdk' lib.

Proposal

  • Have a Node.js APM agent release with our in-progress AWS service instrumentations. (Update 20210706: v3.17.0 of the agent has this.)
  • Fix the in-place headers modification; remove the special-casing of "AWS4" requests; and release that in a subsequent release. The reason for two releases here is that if we find removing the special-casing to result in some problem (I don't anticipate it will), then users can temporarily back up a release.
@trentm trentm self-assigned this Jun 29, 2021
@github-actions github-actions bot added the agent-nodejs Make available for APM Agents project planning. label Jun 29, 2021
@trentm
Copy link
Member Author

trentm commented Jun 29, 2021

A starter fix for this is:

diff --git a/lib/instrumentation/http-shared.js b/lib/instrumentation/http-shared.js
index 9ef99862..424807f8 100644
--- a/lib/instrumentation/http-shared.js
+++ b/lib/instrumentation/http-shared.js
@@ -153,15 +153,17 @@ exports.traceOutgoingRequest = function (agent, moduleName, method) {
       // to indicate requested services should not be sampled.
       // Use the transaction context as the parent, in this case.
       var parent = span || agent.currentTransaction
-      if (parent && parent._context && shouldPropagateTraceContext(options)) {
+      if (parent && parent._context) {
         const headerValue = parent._context.toTraceParentString()
         const traceStateValue = parent._context.toTraceStateString()

-        options.headers.traceparent = headerValue
-        options.headers.tracestate = traceStateValue
+        const newHeaders = Object.assign({}, options.headers)
+        newHeaders.traceparent = headerValue
+        newHeaders.tracestate = traceStateValue
         if (agent._conf.useElasticTraceparentHeader) {
-          options.headers['elastic-apm-traceparent'] = headerValue
+          newHeaders['elastic-apm-traceparent'] = headerValue
         }
+        options.headers = newHeaders
       }

       var req = orig.apply(this, newArgs)
@@ -249,15 +251,6 @@ exports.traceOutgoingRequest = function (agent, moduleName, method) {
   }
 }

-function shouldPropagateTraceContext (opts) {
-  return !isAWSSigned(opts)
-}
-
-function isAWSSigned (opts) {
-  const auth = opts.headers && (opts.headers.Authorization || opts.headers.authorization)
-  return typeof auth === 'string' ? auth.startsWith('AWS4-') : false
-}
-
 // Creates a sanitized URL suitable for the span's HTTP context
 //
 // This function reconstructs a URL using the request object's properties

trentm added a commit that referenced this issue Jun 29, 2021
…ct in-place

This is a bug that broke propagation of trace-context headers with the
aws-sdk package, AWS4 signature auth, and request *retries*. Fixing this
allows us to propagate those headers now and remove the special case.

Fixes: #2134
trentm added a commit that referenced this issue Jul 8, 2021
…ct in-place (#2135)

This is a bug that broke propagation of trace-context headers with the
aws-sdk package, AWS4 signature auth, and request *retries*. Fixing this
allows us to propagate those headers now and remove the special case.

Fixes: #2134
dgieselaar pushed a commit to dgieselaar/apm-agent-nodejs that referenced this issue Sep 10, 2021
…ct in-place (elastic#2135)

This is a bug that broke propagation of trace-context headers with the
aws-sdk package, AWS4 signature auth, and request *retries*. Fixing this
allows us to propagate those headers now and remove the special case.

Fixes: elastic#2134
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
agent-nodejs Make available for APM Agents project planning.
Projects
None yet
1 participant