calculate expected MD5sums of multipart-uploaded files #520

Open
mdomsch opened this issue Apr 1, 2015 · 5 comments

mdomsch (Contributor) commented Apr 1, 2015

http://stackoverflow.com/questions/12186993/what-is-the-algorithm-to-compute-the-amazon-s3-etag-for-a-file-larger-than-5gb/19896823#19896823
describes the method to determine the ETag of a multipart-uploaded file. You have to know the multipart chunk size used to upload the file though.

Can we use this somehow?
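
For reference, a minimal sketch of that algorithm in Python (assuming the chunk size is known; the function name here is just illustrative):

```python
import hashlib

def expected_multipart_etag(path, chunk_size):
    """Sketch: the ETag S3 reports for a multipart upload of `path`, assuming
    every part except possibly the last is exactly `chunk_size` bytes."""
    part_digests = []
    with open(path, "rb") as f:
        while True:
            data = f.read(chunk_size)
            if not data:
                break
            part_digests.append(hashlib.md5(data).digest())
    if len(part_digests) == 1:
        # A single-part upload gets a plain MD5 ETag with no "-N" suffix.
        return part_digests[0].hex()
    # Multipart: MD5 of the concatenated binary part digests, then "-<number of parts>".
    combined = hashlib.md5(b"".join(part_digests)).hexdigest()
    return "%s-%d" % (combined, len(part_digests))
```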

sambrightman commented

I presume you know both the chunk size and the total size when uploading, since you control them? Then you can reproduce the hashing algorithm you linked to.

When downloading, you know the ContentLength and the hyphenated suffix of the ETag (if any). Isn't doing this division enough to verify that the content matches the ETag?
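
As a sketch of the division being suggested (helper name hypothetical):

```python
import math

def guess_chunk_size(content_length, etag):
    """Hypothetical helper: estimate the part size from ContentLength and the
    "-N" suffix of a multipart ETag.  Returns None for single-part ETags."""
    if "-" not in etag:
        return None
    parts = int(etag.rsplit("-", 1)[1])
    # Ceiling division gives only a lower bound on the real chunk size,
    # because the last part is usually shorter than the others.
    return math.ceil(content_length / parts)
```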

fviard (Contributor) commented Nov 23, 2016

The issue is that you cannot be sure that the "chunk size" used to upload the existing files to S3 is the same as the one configured in the s3cmd you are running now.

For example, someone may have uploaded the files using the web interface or some other tool, and now you try to retrieve or update them with s3cmd.

sambrightman commented

Is there something specifically wrong with dividing the ContentLength by the number of parts (from the ETag suffix)?

fviard (Contributor) commented Nov 23, 2016

The point is that you assume we know the "chunk size". But that is only true if s3cmd with the same chunk size was used for both the upload and the download.
When s3cmd comes along and lists the S3 storage, it can't know the chunk size.

And you can't calculate the chunk size from the number of parts and the ContentLength, because of the last part. The last part can have any size between 1 byte and the chunk size, so it is not as if there were exactly N parts of exactly the same size.
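
To illustrate the ambiguity with a hypothetical helper: a 100 MiB object whose ETag ends in "-7" could have been uploaded with either a 15 MiB or a 16 MiB part size, since both give exactly 7 parts.

```python
MIB = 1024 * 1024

def matching_chunk_sizes(content_length, parts, candidates_mib=(5, 8, 15, 16, 64, 128)):
    """Hypothetical helper: which of a few common part sizes would split an
    object of `content_length` bytes into exactly `parts` parts?"""
    matches = []
    for mib in candidates_mib:
        size = mib * MIB
        if -(-content_length // size) == parts:  # ceiling division
            matches.append(mib)
    return matches

print(matching_chunk_sizes(100 * MIB, 7))  # [15, 16] -- the metadata alone can't tell them apart
```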

sambrightman commented

I didn't intend to assume knowledge of the chunk size. But you're right, of course: rounding works for me (manually) because the chunk size is so small compared to the file I'm checking, and people don't tend to pick odd-sized chunks. It doesn't work in general because of the last-chunk problem.
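
For what it's worth, the manual rounding described here could look like the sketch below (a heuristic only, not something s3cmd does):

```python
import math

MIB = 1024 * 1024

def rounded_chunk_guess(content_length, parts):
    """Heuristic sketch: divide by the part count and round up to the next whole MiB.
    Tends to work when the file is much larger than the chunk size and the uploader
    picked a whole-MiB chunk; the short last part can still make the guess wrong."""
    return math.ceil(content_length / parts / MIB) * MIB
```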
