calculate expected MD5sums of multipart-uploaded files #520
Comments
I presume you know both the chunk size and total size when uploading, since you control it? Then reproduce the hashing algorithm you linked to. When downloading, you know the ContentLength and the hyphenated suffix of the ETag (if any) - isn't doing this division enough to verify the content matches the ETag?
The issue is that you cannot be sure the chunk size used to upload the existing files to S3 matches the configuration of the s3cmd you are using now. For example, someone may have uploaded the files through the web interface or some other tool, and then you try to retrieve or update them with s3cmd.
Is there something specifically wrong with dividing the ContentLength by the number of parts (from the ETag suffix)?
The point is that you assume we know the chunk size. But that is only true if s3cmd with the same chunk size was used for both upload and download. And you cannot calculate the chunk size from the number of parts and the ContentLength because of the last part: it can be any size between 1 byte and the chunk size, so it is not the case that there are exactly N parts of exactly the same size.
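To make the ambiguity concrete (this example is not from the thread): given a total size and a part count N, any chunk size c satisfying (N-1)·c < total ≤ N·c is consistent with the observed ETag suffix. A minimal sketch of that range, with hypothetical values:

```python
import math

def chunk_size_range(total_size, num_parts):
    """Inclusive range [c_min, c_max] of chunk sizes consistent with
    num_parts parts for a file of total_size bytes, where the last
    part may be anywhere from 1 byte up to the chunk size."""
    if num_parts == 1:
        # Any chunk size >= the file size yields a single part.
        return total_size, None
    # Need (num_parts - 1) * c < total_size <= num_parts * c
    c_min = math.ceil(total_size / num_parts)
    c_max = (total_size - 1) // (num_parts - 1)
    return c_min, c_max

# A 100 MB file reported as 7 parts admits millions of candidate
# chunk sizes, so the division heuristic cannot pin one down.
print(chunk_size_range(100_000_000, 7))  # (14285715, 16666666)
```

Only when the range collapses to a single value (e.g. a 10-byte file in 3 parts forces a 4-byte chunk) could the chunk size be recovered; in practice it almost never does.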
I didn't intend to assume knowledge of the chunk size. But you're right, of course - rounding works for me (manually) only because the chunk size is so small compared to the file I'm checking, and people don't tend to pick odd-sized chunks. It doesn't work in general because of the last-chunk problem.
http://stackoverflow.com/questions/12186993/what-is-the-algorithm-to-compute-the-amazon-s3-etag-for-a-file-larger-than-5gb/19896823#19896823
describes the method to determine the ETag of a multipart-uploaded file. You have to know the multipart chunk size used to upload the file though.
Can we use this somehow?