Implement failIfAlreadyExists in S3 repositories #133030
Conversation
Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)
Looks like a good approach to me. I left a few small comments.
public void testFailIfAlreadyExists() throws IOException {
Excellent, so nice to see this test being added at last.
Could we make it slightly stronger and instead start by performing two writes concurrently, asserting that exactly one of them succeeds, and then follow up with the check that we can overwrite blobs?
I updated the test to include the changes you requested, but now it fails in `HdfsRepositoryTests`: both writes succeed. I also tried using `blobStore.writeBlobAtomic(...)`, without luck. From what I can tell, `TestingFs` might not provide atomic renaming. Could you advise on how best to proceed?
Ohh HDFS how we love thee. I'd suggest suppressing the test thusly:
diff --git a/plugins/repository-hdfs/src/test/java/org/elasticsearch/repositories/hdfs/HdfsRepositoryTests.java b/plugins/repository-hdfs/src/test/java/org/elasticsearch/repositories/hdfs/HdfsRepositoryTests.java
index 7961ca0257be..3d75d9915bf7 100644
--- a/plugins/repository-hdfs/src/test/java/org/elasticsearch/repositories/hdfs/HdfsRepositoryTests.java
+++ b/plugins/repository-hdfs/src/test/java/org/elasticsearch/repositories/hdfs/HdfsRepositoryTests.java
@@ -62,4 +62,9 @@ public class HdfsRepositoryTests extends AbstractThirdPartyRepositoryTestCase {
assertThat(response.result().blobs(), equalTo(0L));
}
}
+
+ @Override
+ public void testFailIfAlreadyExists() {
+ // HDFS does not implement failIfAlreadyExists correctly
+ }
}
+ "</Key>\n"
+ "</CompleteMultipartUploadResult>").getBytes(StandardCharsets.UTF_8);

if (isProtectOverwrite(exchange) && blobs.containsKey(request.path())) {
Checking `containsKey` before the `put` is racy. I think we need a `putIfAbsent` if `If-None-Match: *` is specified, and therefore also a test which tries to catch a race here.
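The race being described can be sketched in isolation. This is a minimal, self-contained illustration — the `blobs` map and method names here are hypothetical stand-ins for the test fixture's state, not the actual `S3HttpHandler` code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PutIfAbsentSketch {
    // Hypothetical stand-in for the handler's blob map
    static final Map<String, byte[]> blobs = new ConcurrentHashMap<>();

    // Racy: two concurrent writers can both observe containsKey() == false,
    // then both put(), so both appear to "win" the conditional write.
    static boolean racyConditionalPut(String path, byte[] data) {
        if (blobs.containsKey(path)) {
            return false; // would answer 412 Precondition Failed
        }
        blobs.put(path, data);
        return true;
    }

    // Atomic: putIfAbsent returns null for exactly one writer per key,
    // so exactly one conditional write can succeed.
    static boolean atomicConditionalPut(String path, byte[] data) {
        return blobs.putIfAbsent(path, data) == null;
    }
}
```

With `putIfAbsent` the check and the insert happen as one atomic map operation, so no interleaving of two writers can let both succeed.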
// a copy request is a put request with an X-amz-copy-source header
final var copySource = copySourceName(exchange);
if (copySource != null) {
    if (isProtectOverwrite(exchange) && blobs.containsKey(request.path())) {
Likewise here, we need to use `putIfAbsent` to avoid the race on this path.
// initial write blob
writeBlob(container, blobName, new BytesArray(data), true);
This would be a good spot to check for potential races in `S3HttpHandler`, if we initially wrote two blobs concurrently and verified that exactly one of those writes succeeded. Could we also sometimes write a much larger blob to trigger the multipart upload path?
public void testPreventObjectOverwrite() {
Actually I'd like us to test for races here too. Could we concurrently issue one or two `PutObject` requests, and one or two `CompleteMultipartUpload` requests, and verify that exactly one of them succeeds? And ideally that doing a `GetObject` afterwards returns the contents of the object which succeeded?
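One way to structure such a race, as a standalone sketch: line all writers up behind a barrier, fire them at once, and count the winners. The class and method names here are hypothetical — the real test would issue actual `PutObject`/`CompleteMultipartUpload` requests against the handler rather than writing into a map:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ExactlyOneWinsSketch {
    static final Map<String, String> store = new ConcurrentHashMap<>();

    // Fires all writers at once and counts how many win the conditional write.
    static int race(String key, List<String> payloads) throws InterruptedException {
        var successes = new AtomicInteger();
        var barrier = new CyclicBarrier(payloads.size());
        var executor = Executors.newFixedThreadPool(payloads.size());
        for (String payload : payloads) {
            executor.submit(() -> {
                barrier.await(); // release all writers together to maximise the race
                if (store.putIfAbsent(key, payload) == null) {
                    successes.incrementAndGet();
                }
                return null;
            });
        }
        executor.shutdown();
        executor.awaitTermination(30, TimeUnit.SECONDS);
        return successes.get();
    }
}
```

A follow-up `GetObject`-style check would then assert that the stored value matches the single winning payload.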
… S3 repository test
… and concurrent write
> This PR solves #128565

Not quite, we also need to verify the `failIfAlreadyExists` behaviour in repo analysis before we can close this issue. But I'm ok to merge this PR to implement the flag even without the repo analysis changes.
I left a few small comments but structurally this all looks good now.
     final String destinationBlobName,
-    final long blobSize
+    final long blobSize,
+    final boolean failIfAlreadyExists
This parameter is always `false` (in production code anyway) and the `true` case isn't really tested - I'd rather we inlined it, passing the literal `false` to `executeMultipart`, instead.
exchange.sendResponseHeaders(RestStatus.OK.getStatus(), response.length);
exchange.getResponseBody().write(response);
boolean preconditionFailed = false;
if (isProtectOverwrite(exchange)) {
I don't think there's a way for this to be `true` with a request from Elasticsearch as it stands today, nor does it look like this is covered by unit tests, so this is effectively dead code. I'd rather leave this area alone for now, except perhaps to throw something (`AssertionError` would be fine) to document that `If-None-Match: *` isn't supported here if ever we change the callers in future.
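A guard along those lines might look like the following sketch. The names are hypothetical — the real handler inspects the exchange's request headers rather than a plain map:

```java
import java.util.Map;

public class CopyObjectGuardSketch {
    // Hypothetical stand-in for the handler's header check
    static boolean isProtectOverwrite(Map<String, String> headers) {
        return "*".equals(headers.get("If-None-Match"));
    }

    static void handleCopyObject(Map<String, String> headers) {
        // Document that the option is unsupported rather than keeping
        // untested precondition handling on the CopyObject path.
        if (isProtectOverwrite(headers)) {
            throw new AssertionError("If-None-Match: * is not supported for CopyObject in this test fixture");
        }
        // ... normal copy handling ...
    }
}
```

The point of the `AssertionError` is purely documentary: it fails loudly in tests if a future caller starts sending the header, instead of silently exercising an unimplemented code path.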
Just to make sure I understand, are you suggesting I revert the changes to the `request.isPutObjectRequest()` case? That would also mean updating `S3HttpHandlerTests.testPreventObjectOverwrite()`.
No, the `PutObject` request handling is doing what we want and is adequately tested. It's just the `CopyObject` API here where we're not actually using or even testing the `If-None-Match: *` option.
TestWriteTask(Consumer<TestWriteTask> consumer, Consumer<TestWriteTask> prepare) {
    this(consumer);
    prepare.accept(this);
Could we just call the prepare steps directly in the caller? It's kinda hard to follow the flow as written, it looks as if we're doing both the prepare and complete steps concurrently (and in the wrong order too!)
try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
    tasks.forEach(task -> executor.submit(task.consumer));
    executor.shutdown();
    var done = executor.awaitTermination(1, TimeUnit.SECONDS);
`1s` is too short to be a reliable timeout in tests (we push our CI workers pretty hard sometimes):

- var done = executor.awaitTermination(1, TimeUnit.SECONDS);
+ var done = executor.awaitTermination(SAFE_AWAIT_TIMEOUT.seconds(), TimeUnit.SECONDS);
    ex2 = e;
}

assertTrue("Exactly one of the writes must fail", ex1 != null ^ ex2 != null);
Technically it's that exactly one must succeed, it just happens that that's equivalent to exactly one failing when there's only two requests (and `^` is a little unfriendly to readers, I'd recommend `!=`):

- assertTrue("Exactly one of the writes must fail", ex1 != null ^ ex2 != null);
+ assertTrue("Exactly one of the writes must succeed", (ex1 == null) != (ex2 == null));
…dition check logic
LGTM (assuming CI is happy, I'll take care of that) - thanks for your work on this @kvanerum
@elasticmachine test this please
Today, S3-backed repositories ignore the `failIfAlreadyExists` flag and may therefore overwrite a blob which already exists, potentially corrupting a repository subject to concurrent writes, rather than failing the second write. AWS S3 now supports writes conditional on the non-existence of an object via the `If-None-Match: *` HTTP header. This commit adjusts the S3-backed repository implementation to respect the `failIfAlreadyExists` flag using these conditional writes, eliminating the possibility of overwriting blobs which should not be overwritten. Relates elastic#128565
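The server-side semantics the description relies on can be sketched as a simplified model. This is not the repository code, just an illustration of how S3 treats the header, using its documented 200 / 412 status codes:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConditionalWriteSketch {
    static final Map<String, byte[]> objects = new ConcurrentHashMap<>();

    // Models how S3 treats a PutObject carrying "If-None-Match: *":
    // the write succeeds only if no object exists at the key, otherwise
    // S3 answers 412 Precondition Failed and leaves the object untouched.
    static int putObject(String key, byte[] body, Map<String, String> headers) {
        if ("*".equals(headers.get("If-None-Match"))) {
            return objects.putIfAbsent(key, body) == null ? 200 : 412;
        }
        objects.put(key, body); // unconditional overwrite
        return 200;
    }
}
```

Mapping `failIfAlreadyExists == true` onto this header pushes the existence check into S3 itself, so it holds even when two Elasticsearch nodes write the same blob concurrently.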