Minio s3 not compliant with ListMultipartUploads #13246
Comments
We do not implement the full ListMultipartUploads API on purpose. There are no real uses for it; any application that genuinely needs it should ask for actual object names instead, which we support. Hierarchies inside ListMultipartUploads are not going to be implemented, as there are no real reasons to have such an API.
Read the explanation here on why what you are thinking of as resumable is meaningless in AWS S3 and has the potential for corruption.
Thanks for letting me know. This is interesting because even 5 years later (in reference to aws/aws-sdk-go#1518 (comment)), I don't experience any consistency issues with the List Parts API. At least from the testing/scripts that I have used, it has always consistently persisted the last part up until the min chunk size. Also, what do you mean by corruptions? Each part upload in S3 (and I assume minio) is MD5 checksummed and is part of the etag.
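For readers unfamiliar with how part checksums feed into the final etag, here is a minimal sketch. It assumes the commonly observed scheme for uploads without KMS encryption: the MD5 of the concatenated binary MD5 digests of the parts, followed by a dash and the part count. This is observed behavior rather than a documented contract, and the function name `multipart_etag` is hypothetical.

```python
import hashlib


def multipart_etag(parts):
    """Compute the etag commonly observed for a multipart upload.

    `parts` is a list of part payloads (bytes) in part-number order.
    Assumption: unencrypted or SSE-S3 uploads; other modes may differ.
    """
    # MD5 each part on its own; the hex form is what UploadPart returns as the part etag.
    part_digests = [hashlib.md5(p).digest() for p in parts]
    # Hash the concatenated binary digests, then append "-<number of parts>".
    combined = hashlib.md5(b"".join(part_digests)).hexdigest()
    return f"{combined}-{len(parts)}"


# Two illustrative 1 KiB parts (real parts other than the last must meet the minimum part size).
example = multipart_etag([b"a" * 1024, b"b" * 1024])
```

Note that this only verifies the bytes of each part. It says nothing about whether the list of parts passed to the completion call is the list the client intended, which is the concern raised in the reply that follows.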
You are not reading it properly: there is no locking between multiple ListParts calls, so concurrent callers can see invalid part details, leading to incorrectly completed multipart uploads. Relying solely on a resume mechanism based on these APIs will never work. The state is meant to be persisted purely on the client when completing a multipart upload. For us, the added benefit is that we don't have to comply for the sake of compliance. There is no real point in an API that provides hierarchical output in this case. We just implement what is necessary, that is, allowing listing when the object name is exact. AWS S3 overcomplicated this API and we have no interest in unnecessary compliance for no benefit.
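A minimal sketch of what persisting the state on the client could look like, using only the Python standard library. The names `UploadState`, `save_state` and `load_state` are hypothetical and not part of any SDK. The idea is that the client records the uploadId and each part's etag as parts succeed, so a resume never needs to ask the server what was uploaded.

```python
import json
import os
import tempfile
from dataclasses import dataclass, field, asdict


@dataclass
class UploadState:
    bucket: str
    key: str
    upload_id: str
    # part number -> etag returned by each successful part upload
    parts: dict = field(default_factory=dict)

    def record_part(self, part_number, etag):
        self.parts[int(part_number)] = etag

    def completion_parts(self):
        # Parts in ascending part-number order, as the completion request expects.
        return [{"PartNumber": n, "ETag": self.parts[n]} for n in sorted(self.parts)]


def save_state(state, path):
    # Write to a temporary file and rename it, so a crash mid-write
    # does not leave a truncated state file at `path`.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(asdict(state), f)
    os.replace(tmp, path)


def load_state(path):
    with open(path) as f:
        raw = json.load(f)
    # JSON object keys are strings; restore integer part numbers.
    raw["parts"] = {int(n): etag for n, etag in raw["parts"].items()}
    return UploadState(**raw)
```

On resume, the client would call `load_state`, upload whichever part numbers are missing from `parts`, and pass `completion_parts()` to the completion call.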
Yeah in my use case there was locking on the client so no concurrent requests would have happened and it also would have only occurred after the multipart upload would have been halted (see https://github.com/mdedetrich/alpakka/blob/add-listuploadmultipart/s3/src/test/scala/akka/stream/alpakka/s3/scaladsl/S3IntegrationSpec.scala#L463-L465). I assume by concurrent requests you mean having multiple ListPart requests happen at the same time while the file is still being uploaded? The idea was to get the state of the parts only when you know that no one is currently uploading to the specific
I understand this now, although it does greatly complicate things, because I wanted to implement pause/resume functionality without needing to keep track of state after the pause, relying on the server as the source of truth for state. Anyway, as you said, there is no reason to push this change into minio.
This issue is a follow on from #5613
Expected Behavior
The S3 list multipart uploads API (see https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListMultipartUploads.html) does not work on Minio
Current Behavior
The current behavior just returns an empty list when a prefix is not supplied, and this also appears to be intentionally unsupported (see #5613 (comment))
Possible Solution
Implement S3's list multipart upload API so that it behaves like the real S3
Steps to Reproduce (for bugs)
I am currently working on a PR to add list multipart upload/list part API functionality to alpakka, which is where I came across this issue. The relevant test is at https://github.com/mdedetrich/alpakka/blob/add-listuploadmultipart/s3/src/test/scala/akka/stream/alpakka/s3/scaladsl/S3IntegrationSpec.scala#L449-L495
sbt s3/testOnly *.MinioS3IntegrationSpec
If you don't have sbt installed, follow the instructions at https://www.scala-sbt.org/download.html (sbt should already be available in package managers for most Linux distros and in Homebrew for Mac).
Note that this test works fine against S3 but fails against minio because the list multipart uploads request returns an empty JSON list.
Context
This is a follow-on from #5613; however, I disagree with the reasoning stated at #5613 (comment), i.e.
Billing is one reason for listing multipart uploads, but it's not the only reason. The other reason (which has nothing to do with billing) is to give the ability to resume previously aborted multipart uploads in a stateless manner, see https://stackoverflow.com/questions/53764876/resume-s3-multipart-upload-partetag.
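As a rough sketch of such a stateless lookup, the two helpers below take any client object exposing `list_multipart_uploads` and `list_parts` with the request and response shapes used by boto3's S3 client. The helper names `find_upload_id` and `uploaded_parts` are hypothetical. The key is passed as the prefix, which narrows the listing on S3 and presumably also matches the restricted form the minio maintainers describe in the comments (listing when the object name is exact).

```python
def find_upload_id(client, bucket, key):
    """Return the uploadId of the most recently initiated unfinished upload for `key`, or None."""
    matches = []
    kwargs = {"Bucket": bucket, "Prefix": key}
    while True:
        page = client.list_multipart_uploads(**kwargs)
        # A prefix also matches longer keys, so filter on the exact key.
        matches += [u for u in page.get("Uploads", []) if u["Key"] == key]
        if not page.get("IsTruncated"):
            break
        kwargs["KeyMarker"] = page["NextKeyMarker"]
        kwargs["UploadIdMarker"] = page["NextUploadIdMarker"]
    if not matches:
        return None
    return max(matches, key=lambda u: u["Initiated"])["UploadId"]


def uploaded_parts(client, bucket, key, upload_id):
    """Return [{'PartNumber': ..., 'ETag': ...}] for every part the server reports."""
    parts = []
    kwargs = {"Bucket": bucket, "Key": key, "UploadId": upload_id}
    while True:
        page = client.list_parts(**kwargs)
        parts += [
            {"PartNumber": p["PartNumber"], "ETag": p["ETag"]}
            for p in page.get("Parts", [])
        ]
        if not page.get("IsTruncated"):
            break
        kwargs["PartNumberMarker"] = page["NextPartNumberMarker"]
    return parts
```

With boto3 the client would be something like `boto3.client("s3", endpoint_url=...)`; the endpoint and credentials are left to the reader. As the maintainers point out in the comments, the result is only trustworthy if nothing else is uploading parts to the same uploadId while these calls run.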
More concretely, if you abort a multipart upload while it's uploading and you want to resume/complete the upload later on and you don't have the uploadId (i.e. because you are stateless), using list multipart uploads is the only way to retrieve that uploadId so that you can complete the upload. This is typically done by listing all aborted multipart uploads and then filtering them by key so you can then retrieve the uploadId (you can then proceed to use https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListParts.html to find the latest part for that uploadId
so you can get the etag/part number in order to complete the upload).
Your Environment
Version used (minio --version): https://hub.docker.com/layers/minio/minio/latest/images/sha256-3346321023fc6d6a198328a41c2a7b210c6a51e6e3a842a1679b7752bb352f40?context=explore
Server setup and configuration: docker run -e MINIO_ACCESS_KEY=TESTKEY -e MINIO_SECRET_KEY=TESTSECRET -e MINIO_DOMAIN=s3minio.alpakka -p 9000:9000 minio/minio server /data
Operating System and version (uname -a): Linux mdedetrich-aiven 5.14.2-1-MANJARO #1 SMP PREEMPT Wed Sep 8 14:11:01 UTC 2021 x86_64 GNU/Linux