Conversation

@DaveCTurner
Contributor

@DaveCTurner DaveCTurner commented Nov 24, 2025

AWS S3 uses the ETag header to identify the object contents in various
API responses. S3HttpHandler doesn't today return this header from its
GetObject API, and its APIs which do return an ETag do not properly
conform to its spec (particularly, they are not surrounded by "
characters). This commit adds the missing response header to the
GetObject API, fixes its format, and uses SHA256 rather than MD5 to
compute the result.

AWS S3 uses the `ETag` header to identify the object contents in various
API responses. `S3HttpHandler` returns this header on some paths, but
not very many, and the returned header does not conform to the spec
(particularly, it is not surrounded by `"` characters). This commit adds
the missing response header to the `GetObject` API, fixes its format,
and uses SHA256 rather than MD5 to compute the result.
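
For illustration, a quoted SHA-256 based ETag along these lines could be computed as in the sketch below. This is a minimal sketch under stated assumptions, not the code from this PR: the class name `EtagSketch` and method name `etagFromContents` are hypothetical, a plain `byte[]` stands in for `BytesReference`, and the exact encoding of the digest in the real fixture may differ. The prefix value is the one visible in the diff under review.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical stand-in for the fixture's ETag helper; not the actual implementation.
final class EtagSketch {
    // Descriptive prefix as shown in the diff; callers treat the whole ETag as opaque.
    private static final String SHA_256_ETAG_PREFIX = "es-test-sha-256-";

    private EtagSketch() {}

    /** Returns an entity tag for the contents, surrounded by double quotes as HTTP requires. */
    static String etagFromContents(byte[] contents) {
        final byte[] hash;
        try {
            hash = MessageDigest.getInstance("SHA-256").digest(contents);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
        // Assumption: lowercase hex encoding of the digest.
        StringBuilder hex = new StringBuilder(hash.length * 2);
        for (byte b : hash) {
            hex.append(Character.forDigit((b >> 4) & 0xF, 16));
            hex.append(Character.forDigit(b & 0xF, 16));
        }
        return "\"" + SHA_256_ETAG_PREFIX + hex + "\"";
    }
}
```

Identical contents always yield the same tag, and the surrounding `"` characters are part of the header value itself.
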
@DaveCTurner added the `>test` and `:Distributed Coordination/Snapshot/Restore` labels Nov 24, 2025
@elasticsearchmachine added the `Team:Distributed Coordination` label Nov 24, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

Contributor

@joshua-adams-1 joshua-adams-1 left a comment

Looks good, just had a few small points

return;
}

exchange.getResponseHeaders().add("ETag", getEtagFromContents(blob));
Contributor

Could we have a brief comment here explaining what an ETag is?

Contributor Author

I added a comment in da4e70e. Not sure where to put it really, ETag is a pretty standard piece of HTTP.

Contributor

AFAIK S3 returns an ETag on creation (PutObject, MPU) too, but our fixture does not return an ETag unless there is a precondition failure. Should we add the ETag to all operations that are supposed to have one?

We could also compute the ETag once on creation and store it together with the blob:

record Blob (BytesReference bytes, String etag) {}
private final ConcurrentMap<String, Blob> blobs = new ConcurrentHashMap<>();

Contributor Author

We have no plans to use the ETag returned by the PutObject and CompleteMultipartUpload APIs so I'm hesitant to add them now. It'd be easy enough to add later if needed.

I did consider tracking etags alongside the object contents but it's a bigger change and not really necessary. This is purely test fixture code, and indeed a bit of latency here and there is helpful for test coverage.
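
For reference, the store-the-ETag-with-the-blob idea discussed in this thread might look roughly like the sketch below. It was not adopted in this PR, and everything beyond the `Blob` record shape and the `ConcurrentMap` field is an assumption: the class name `BlobStoreSketch`, the `put`/`get` methods, the injected `etagFunction`, and the use of `byte[]` in place of `BytesReference` are all hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical sketch of computing the ETag once at write time; not part of the fixture.
final class BlobStoreSketch {
    /** Blob contents together with the ETag computed when the blob was stored. */
    record Blob(byte[] bytes, String etag) {}

    private final ConcurrentMap<String, Blob> blobs = new ConcurrentHashMap<>();

    // Assumption: the ETag computation is injected, so any hashing scheme can be plugged in.
    private final Function<byte[], String> etagFunction;

    BlobStoreSketch(Function<byte[], String> etagFunction) {
        this.etagFunction = etagFunction;
    }

    /** Stores a copy of the contents and computes the ETag exactly once. */
    void put(String key, byte[] contents) {
        byte[] copy = contents.clone();
        blobs.put(key, new Blob(copy, etagFunction.apply(copy)));
    }

    /** Returns the stored blob with its precomputed ETag, or null if the key is absent. */
    Blob get(String key) {
        return blobs.get(key);
    }
}
```

The trade-off is the one described above: reads avoid rehashing, at the cost of a wider change to the fixture's storage type.
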

}
}

public static String getEtagFromContents(BytesReference blobContents) {
Contributor

Is there any value adding a specific unit test for this method?

Contributor Author

Eh possibly, I meant to at least. Added in 6e98292.

*/
private static final Set<String> METHODS_HAVING_NO_REQUEST_BODY = Set.of("GET", "HEAD", "DELETE");

private static final String SHA_256_ETAG_PREFIX = "es-test-sha-256-";
Contributor

Is this string going to be visible?

Contributor Author

No, this is only in the test fixture, I'm using something descriptive here just to aid future troubleshooting. It's treated as an opaque string everywhere else.

@joshua-adams-1
Contributor

> S3HttpHandler returns this header on some paths, but not very many

Where else should this be added?

> S3HttpHandler returns this header on some paths, but not very many

Is there an obvious reason for this?

@DaveCTurner
Contributor Author

> Is there an obvious reason for this?

Really just necessity - we haven't needed it so it hasn't been included.

@DaveCTurner DaveCTurner changed the title from "Return ETag header from S3 fixture" to "Return ETag from S3 fixture GetObject API" Nov 24, 2025
Contributor

@mhl-b mhl-b left a comment

LGTM

@DaveCTurner DaveCTurner merged commit 0e5dc16 into elastic:main Nov 24, 2025
34 checks passed
@DaveCTurner DaveCTurner deleted the 2025/11/24/s3-fixture-etag branch November 24, 2025 19:56
afoucret pushed a commit to afoucret/elasticsearch that referenced this pull request Nov 26, 2025
AWS S3 uses the `ETag` header to identify the object contents in various
API responses. `S3HttpHandler` doesn't today return this header from its
`GetObject` API, and its APIs which do return an `ETag` do not properly
conform to its spec (particularly, they are not surrounded by `"`
characters). This commit adds the missing response header to the
`GetObject` API, fixes its format, and uses SHA256 rather than MD5 to
compute the result.
ncordon pushed a commit to ncordon/elasticsearch that referenced this pull request Nov 26, 2025
Labels

`:Distributed Coordination/Snapshot/Restore` (Anything directly related to the `_snapshot/*` APIs)
`Team:Distributed Coordination` (Meta label for Distributed Coordination team)
`>test` (Issues or PRs that are addressing/adding tests)
`v9.3.0`

4 participants