calculate expected MD5sums of multipart-uploaded files #520

Open
mdomsch opened this issue Apr 1, 2015 · 5 comments

mdomsch (Contributor) commented Apr 1, 2015

http://stackoverflow.com/questions/12186993/what-is-the-algorithm-to-compute-the-amazon-s3-etag-for-a-file-larger-than-5gb/19896823#19896823
describes the method to determine the ETag of a multipart-uploaded file. You have to know the multipart chunk size used to upload the file though.

Can we use this somehow?
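
For reference, a minimal sketch of that algorithm in Python (assuming the chunk size is known; the function name here is just illustrative):

```python
import hashlib

def expected_multipart_etag(path, chunk_size):
    """Sketch: the ETag S3 reports for a multipart upload of `path`, assuming
    every part except possibly the last is exactly `chunk_size` bytes."""
    part_digests = []
    with open(path, "rb") as f:
        while True:
            data = f.read(chunk_size)
            if not data:
                break
            part_digests.append(hashlib.md5(data).digest())
    if len(part_digests) == 1:
        # A single-part upload gets a plain MD5 ETag with no "-N" suffix.
        return part_digests[0].hex()
    # Multipart: MD5 of the concatenated binary part digests, then "-<number of parts>".
    combined = hashlib.md5(b"".join(part_digests)).hexdigest()
    return "%s-%d" % (combined, len(part_digests))
```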

sambrightman commented

I presume you know both the chunk size and the total size when uploading, since you control them? Then you can reproduce the hashing algorithm you linked to.

When downloading, you know the ContentLength and the hyphenated suffix of the ETag (if any). Isn't doing this division enough to verify that the content matches the ETag?
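
As a sketch of the division being suggested (helper name hypothetical):

```python
import math

def guess_chunk_size(content_length, etag):
    """Hypothetical helper: estimate the part size from ContentLength and the
    "-N" suffix of a multipart ETag.  Returns None for single-part ETags."""
    if "-" not in etag:
        return None
    parts = int(etag.rsplit("-", 1)[1])
    # Ceiling division gives only a lower bound on the real chunk size,
    # because the last part is usually shorter than the others.
    return math.ceil(content_length / parts)
```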

fviard (Contributor) commented Nov 23, 2016

The issue is that you cannot be sure that the "chunk size" used to upload the existing files to S3 is the same as the one configured in the s3cmd you are running now.

For example, someone may have uploaded the files using the web interface or some other tool, and now you try to retrieve or update them with s3cmd.

sambrightman commented

Is there something specifically wrong with dividing the ContentLength by the number of parts (from the ETag suffix)?

fviard (Contributor) commented Nov 23, 2016

The point is that you assume we know the "chunk size". But that is only true if s3cmd with the same chunk size was used for both the upload and the download.
When s3cmd comes along and lists the S3 storage, it can't know the chunk size.

And you can't calculate the chunk size from the number of parts and the ContentLength, because of the last part. The last part can have any size between 1 byte and the chunk size, so it is not as if there were exactly N parts of exactly the same size.
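
To illustrate the ambiguity with a hypothetical helper: a 100 MiB object whose ETag ends in "-7" could have been uploaded with either a 15 MiB or a 16 MiB part size, since both give exactly 7 parts.

```python
MIB = 1024 * 1024

def matching_chunk_sizes(content_length, parts, candidates_mib=(5, 8, 15, 16, 64, 128)):
    """Hypothetical helper: which of a few common part sizes would split an
    object of `content_length` bytes into exactly `parts` parts?"""
    matches = []
    for mib in candidates_mib:
        size = mib * MIB
        if -(-content_length // size) == parts:  # ceiling division
            matches.append(mib)
    return matches

print(matching_chunk_sizes(100 * MIB, 7))  # [15, 16] -- the metadata alone can't tell them apart
```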

sambrightman commented

I didn't intend to assume knowledge of the chunk size. But you're right, of course: rounding works for me (manually) because the chunk size is so small compared to the file I'm checking, and people don't tend to pick odd-sized chunks. It doesn't work in general because of the last-chunk problem.
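
For what it's worth, the manual rounding described here could look like the sketch below (a heuristic only, not something s3cmd does):

```python
import math

MIB = 1024 * 1024

def rounded_chunk_guess(content_length, parts):
    """Heuristic sketch: divide by the part count and round up to the next whole MiB.
    Tends to work when the file is much larger than the chunk size and the uploader
    picked a whole-MiB chunk; the short last part can still make the guess wrong."""
    return math.ceil(content_length / parts / MIB) * MIB
```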
