Perform MD5 hash calculation during upload #5186

Closed
cyberduck opened this issue Sep 13, 2010 · 4 comments
Labels: enhancement, fixed, low priority, s3

@cyberduck cyberduck commented Sep 13, 2010

4cddcf2 created the issue

Currently, an MD5 hash of every upload to S3 is calculated before the upload starts. This can take a long time, and no progress bar can be shown during that operation, so the upload time estimate is useless.

I suggest calculating the MD5 hash during the upload, while reading from the stream. For an example, see: http://stackoverflow.com/questions/304268/using-java-to-get-a-files-md5-checksum
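A minimal sketch of hashing while the data is being read, using the standard `java.security.DigestInputStream`. The class name `StreamingMd5`, the method `copyWithMd5`, and the plain `OutputStream` standing in for the upload connection are assumptions for illustration, not Cyberduck's actual code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: copies a stream to the upload target and hashes it in the same pass.
class StreamingMd5 {

    // Returns the lowercase hex MD5 of everything that was read from source.
    static String copyWithMd5(InputStream source, OutputStream upload) throws IOException {
        MessageDigest md5;
        try {
            md5 = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
        // Every byte read through the wrapper also updates the digest.
        try (DigestInputStream in = new DigestInputStream(source, md5)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                upload.write(buffer, 0, read);
            }
        }
        return toHex(md5.digest());
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }
}
```

Because the digest is updated inside the same read loop that feeds the upload, the file is read only once and the existing transfer progress reporting keeps working.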

With this approach, S3 will no longer return an error for a corrupted upload, since it has no hash to compare against. Instead, the ETag returned by S3 has to be used to verify that the upload was successful: http://docs.amazonwebservices.com/AmazonS3/latest/API/RESTObjectPOST.html
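The ETag check could look roughly like the sketch below. It assumes the ETag of a simple (non-multipart) upload is the hex MD5 of the object wrapped in double quotes; multipart ETags contain a `-` and are not a plain MD5, so the sketch treats them as not verifiable. The class and method names are hypothetical.

```java
import java.util.Locale;

// Hypothetical helper: compares a locally computed MD5 with the ETag from the response.
class EtagCheck {

    // Strips the surrounding double quotes that S3 puts around ETag values.
    static String normalize(String etag) {
        String value = etag.trim();
        if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
            value = value.substring(1, value.length() - 1);
        }
        return value.toLowerCase(Locale.ROOT);
    }

    // True only when the ETag is a plain 32-character hex MD5 equal to the local hash.
    static boolean matches(String etag, String localMd5Hex) {
        String value = normalize(etag);
        if (!value.matches("[0-9a-f]{32}")) {
            // For example a multipart ETag such as "<hex>-<parts>": cannot be compared directly.
            return false;
        }
        return value.equals(localMd5Hex.toLowerCase(Locale.ROOT));
    }
}
```

A caller would fail the transfer, or retry it, when `matches` returns false for an ETag that is a plain MD5.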

Alternatively, it would be good to have at least an option to disable the hash computation, since there are cases where the overhead is not justified.

@cyberduck cyberduck commented Sep 13, 2010

@dkocher commented

I agree. Calculating the hash on the fly would be an improvement. The only downside is that we need a second request if we still want to store the MD5 value in the file's metadata, as we currently do (see md5-hash in metadata).

@cyberduck cyberduck commented Sep 13, 2010

4cddcf2 commented

Agreed, and I was thinking about it. However, I think it is obsolete; one could use the ETag throughout instead.

See:
http://docs.amazonwebservices.com/AmazonS3/latest/API/RESTObjectGET.html
http://docs.amazonwebservices.com/AmazonS3/latest/API/RESTObjectHEAD.html

@cyberduck cyberduck commented Nov 19, 2010

@dkocher commented

If the property s3.upload.metadata.md5 is set to true (the default is false), we set the Content-MD5 header and let S3 check the integrity of the upload. Otherwise, we calculate the MD5 on the fly during the upload and compare it to the ETag returned for the upload.
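For the first branch, the Content-MD5 header carries the Base64 encoding of the raw 16-byte digest, not the hex string used in the ETag. A minimal sketch of building that value follows; the class name `ContentMd5` and the in-memory `byte[]` input are assumptions for illustration, and a real client would hash the file before sending the request.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hypothetical helper: builds the value for the Content-MD5 request header.
class ContentMd5 {

    // Base64 of the raw MD5 digest, which has to be known before the request is sent.
    static String headerValue(byte[] data) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return Base64.getEncoder().encodeToString(md5.digest(data));
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

Since the header must be sent before the body, this branch still needs a full pass over the data up front, which is presumably why it is off by default.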

In 82239e1.

@cyberduck cyberduck commented Nov 19, 2010

@dkocher commented

Same fix for Rackspace Cloudfiles in 52ab01c. Addendum in d7f153a.

@iterate-ch iterate-ch locked as resolved and limited conversation to collaborators Nov 26, 2021