
Minio s3 not compliant with ListMultipartUploads #13246

Closed

mdedetrich opened this issue Sep 18, 2021 · 5 comments

Comments
@mdedetrich commented Sep 18, 2021

This issue is a follow on from #5613

Expected Behavior

The S3 ListMultipartUploads API (see https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListMultipartUploads.html) should work on Minio the same way it does on AWS S3.

Current Behavior

Minio currently returns an empty list when a prefix is not supplied, and support for this appears to be intentionally omitted (see #5613 (comment)).

Possible Solution

Implement S3's ListMultipartUploads API so that it behaves like the real S3.

Steps to Reproduce (for bugs)

I am currently working on a PR that adds ListMultipartUploads/ListParts API functionality to alpakka, and I came across this issue while doing so. The relevant test is at https://github.com/mdedetrich/alpakka/blob/add-listuploadmultipart/s3/src/test/scala/akka/stream/alpakka/s3/scaladsl/S3IntegrationSpec.scala#L449-L495

  1. Clone the PR "S3: Add MultiPartUpload and ListParts APIs" (akka/alpakka#2730), specifically the branch the PR is based on
  2. Follow these instructions to start a minio docker https://github.com/mdedetrich/alpakka/blob/add-listuploadmultipart/s3/src/test/scala/akka/stream/alpakka/s3/scaladsl/S3IntegrationSpec.scala#L719-L721
  3. Comment out https://github.com/mdedetrich/alpakka/blob/add-listuploadmultipart/s3/src/test/scala/akka/stream/alpakka/s3/scaladsl/S3IntegrationSpec.scala#L452 to enable the test to run on minio
  4. Run the minio tests using `sbt "s3/testOnly *.MinioS3IntegrationSpec"`. If you don't have sbt installed, follow the instructions at https://www.scala-sbt.org/download.html (sbt is available in the package managers of most Linux distros and in Homebrew for macOS)

Note that this test works fine against AWS S3 but fails against minio, because the ListMultipartUploads request returns an empty list of uploads.

Context

This is a follow-on from #5613; however, I disagree with the reasoning stated at #5613 (comment), i.e.

> Reason for ListMultipartUploads to exist in AWS-S3 is for clients to list multipart uploads and remove them so that AWS do not bill for them. In our case we auto purge old multipart uploads and not support ListMultipartUploads.

Billing is one reason for listing multipart uploads, but it's not the only one. The other reason (which has nothing to do with billing) is the ability to resume previously aborted multipart uploads in a stateless manner, i.e. https://stackoverflow.com/questions/53764876/resume-s3-multipart-upload-partetag.

More concretely: if you abort a multipart upload while it's uploading, later want to resume/complete it, and don't have the uploadId (i.e. because you are stateless), ListMultipartUploads is the only way to retrieve that uploadId so that you can complete the upload. This is typically done by listing all aborted multipart uploads and filtering them by key to retrieve the uploadId; you can then use https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListParts.html to find the latest part for that uploadId, which gives you the etag/part number needed to complete the upload.
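The stateless resume flow above can be sketched with pure helper functions over boto3-style response shapes. No network calls are made, and `find_upload_id`/`last_completed_part` are hypothetical helper names, not SDK functions:

```python
def find_upload_id(list_uploads_response, key):
    """Pick the UploadId for `key` out of a ListMultipartUploads response."""
    for upload in list_uploads_response.get("Uploads", []):
        if upload["Key"] == key:
            return upload["UploadId"]
    return None

def last_completed_part(list_parts_response):
    """Return (PartNumber, ETag) of the highest-numbered stored part, or None."""
    parts = list_parts_response.get("Parts", [])
    if not parts:
        return None
    last = max(parts, key=lambda p: p["PartNumber"])
    return last["PartNumber"], last["ETag"]

# Response shapes mirroring the S3 API documentation:
uploads = {"Uploads": [{"Key": "video.mp4", "UploadId": "abc123"}]}
parts = {"Parts": [{"PartNumber": 1, "ETag": '"e1"'},
                   {"PartNumber": 2, "ETag": '"e2"'}]}

upload_id = find_upload_id(uploads, "video.mp4")  # "abc123"
resume_at = last_completed_part(parts)            # (2, '"e2"')
```

With an empty ListMultipartUploads response (as minio returns here), `find_upload_id` can never recover the uploadId, which is the crux of this issue.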

Your Environment

@harshavardhana (Member) commented:

We do not implement the full ListMultipartUploads API on purpose.

There are no real uses for it; any application that has a real use for it should ask for actual object names instead, which we support.

Hierarchies inside ListMultipartUploads are not going to be implemented, as there are no real reasons to have such an API.

@harshavardhana (Member) commented:

Read the explanation here on why: what you are thinking of as resumable is meaningless in AWS S3 and has the potential for corruption.

#11686 (comment)

@mdedetrich (Author) commented Sep 18, 2021

Thanks for letting me know. This is interesting because even 5 years later (in reference to aws/aws-sdk-go#1518 (comment)), I don't experience any consistency issues with the List Parts API.

At least from the testing/scripts that I have used, it has always consistently persisted the last part up until the min chunk size.

Also what do you mean by corruptions? Each part upload in S3 (and I assume minio) is md5 checksummed and is part of the etag.
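For context on the ETag point: for typical (unencrypted) multipart uploads, the S3-style multipart ETag is the MD5 of the concatenated per-part MD5 digests, suffixed with the part count. A small sketch of that construction, not MinIO's actual implementation:

```python
import hashlib

def multipart_etag(parts):
    """S3-style multipart ETag: MD5 over the concatenation of each
    part's MD5 digest, suffixed with "-<part count>"."""
    digests = b"".join(hashlib.md5(p).digest() for p in parts)
    return '"{}-{}"'.format(hashlib.md5(digests).hexdigest(), len(parts))
```

A corrupted part changes its own MD5 and therefore the final ETag, which is the per-part integrity the comment refers to.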

@harshavardhana (Member) commented:

> Thanks for letting me know. This is interesting because even 5 years later (in reference to aws/aws-sdk-go#1518 (comment)), I don't experience any consistency issues with the List Parts API.
>
> At least from the testing/scripts that I have used, it has always consistently persisted the last part up until the min chunk size.
>
> Also what do you mean by corruptions? Each part upload in S3 (and I assume minio) is md5 checksummed and is part of the etag.

You are not reading it properly: there is no locking between multiple ListParts calls, so concurrent callers can see invalid part details, leading to incorrect CompleteMultipartUpload calls. Solely relying on a resume mechanism based on these APIs will never work.

The state is meant to be persisted purely on the client when completing a multipart upload.

For us, the added benefit of this is that we don't have to comply for the sake of compliance. There is no real point to an API that provides hierarchical output in this case.

We just implement what is necessary, that is, allowing listing when the object name is exact.

AWS S3 overcomplicated this API, and we have no interest in unnecessary compliance for no benefit.
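The client-side bookkeeping described above ("state persisted purely on the client") could look something like this minimal sketch: record the UploadId and each part's ETag locally as parts succeed, so completing the upload never depends on server-side listing. `UploadJournal` is a hypothetical name, not part of any SDK:

```python
import json

class UploadJournal:
    def __init__(self, path):
        self.path = path
        self.state = {"upload_id": None, "parts": {}}

    def start(self, upload_id):
        self.state["upload_id"] = upload_id
        self._flush()

    def record_part(self, part_number, etag):
        # Called after each successful UploadPart response.
        self.state["parts"][str(part_number)] = etag
        self._flush()

    def completed_parts(self):
        # The Parts list shape expected by CompleteMultipartUpload.
        return [{"PartNumber": int(n), "ETag": e}
                for n, e in sorted(self.state["parts"].items(),
                                   key=lambda kv: int(kv[0]))]

    def _flush(self):
        # Persist after every mutation so a crash loses at most one part.
        with open(self.path, "w") as f:
            json.dump(self.state, f)
```

This is the trade-off mdedetrich raises below: it works, but the client must keep this journal across pause/resume instead of treating the server as the source of truth.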

@mdedetrich (Author) commented Sep 18, 2021

> You are not reading it properly, there is no locking between multiple listParts calls, concurrent callers can have invalid part details leading to incorrect complete multipart uploads. So solely relying on some resuming mechanism based on these APIs will never work.

Yeah, in my use case there was locking on the client, so no concurrent requests would have happened, and it would only have occurred after the multipart upload had been halted (see https://github.com/mdedetrich/alpakka/blob/add-listuploadmultipart/s3/src/test/scala/akka/stream/alpakka/s3/scaladsl/S3IntegrationSpec.scala#L463-L465). I assume by concurrent requests you mean having multiple ListParts requests happen at the same time while the file is still being uploaded?

The idea was to get the state of the parts only when you know that no one is currently uploading to the specific uploadId, so concurrency wouldn't have been an issue.

> The state is meant to be purely persisted on client when completing a multipart upload.

I understand this now, although it does greatly complicate things, because I wanted to implement pause/resume functionality without needing to keep track of state after the pause, relying instead on the server being the source of truth.

Anyway, as you said, there is no reason to push this change into minio.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 19, 2022