
Minor clarification of GCS data validation docs; change default hash algo? #1749

Closed
zbjornson opened this issue Oct 24, 2016 · 3 comments · Fixed by #1750
Labels
api: storage (Issues related to the Cloud Storage API)
type: question (Request for information or clarification. Not an issue.)

Comments

@zbjornson
Contributor

The docs for createReadStream and createWriteStream say

"By default, data integrity is validated with an MD5 checksum for maximum reliability, falling back to CRC32c when an MD5 hash wasn't returned from the API."

It appears that both are actually calculated, and both are verified if present (although the MD5 check overrules the CRC32c check), so I think it would be more accurate to say

"By default, checksums are calculated using both MD5 and CRC32c on the fly; MD5 is used for the final validation for maximum reliability, falling back to CRC32c when an MD5 hash wasn't returned from the API (i.e. in the case of composite objects)."

But using a cryptographic hash seems like overkill as a default. What do you think of calculating only CRC32c by default, and not calculating MD5 unless the validation method is "md5"?

(Also, if you specify "md5" then it does not appear to fall back to CRC32c if there was no MD5 returned -- ideally it would fall back instead of doing zero validation, I think?)
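For reference, a minimal sketch of the usage being discussed, assuming the `google-cloud` Node.js client of this era, where `createReadStream()`/`createWriteStream()` accept a `validation` option of `'md5'`, `'crc32c'`, or `false`:

```js
// Sketch only; assumes the `google-cloud` Node.js client, where the
// streaming methods accept a `validation` option of 'md5', 'crc32c', or false.
var fs = require('fs');
var gcs = require('google-cloud').storage();
var file = gcs.bucket('my-bucket').file('my-file');

// Default behavior discussed above: both hashes are computed on the fly,
// with MD5 preferred for the final check.
file.createReadStream()
  .on('error', console.error)
  .pipe(fs.createWriteStream('/tmp/my-file'));

// Explicitly request CRC32c-only validation (the proposed default).
fs.createReadStream('/tmp/my-file')
  .pipe(file.createWriteStream({ validation: 'crc32c' }))
  .on('error', console.error);
```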

@zbjornson zbjornson changed the title Minor clarification of GCS data validation; change default hash algo? Minor clarification of GCS data validation docs; change default hash algo? Oct 24, 2016
@stephenplusplus
Contributor

I'm fine with the docs change. As for disabling the MD5 check by default, that would mean validating with the less reliable method, by our own words. If the cost of computing it is an issue for a user, they have the option to disable it, which I think is good enough.

(Also, if you specify "md5" then it does not appear to fall back to CRC32c if there was no MD5 returned -- ideally it would fall back instead of doing zero validation, I think?)

This seems ideal. If the user wanted MD5 validation specifically and we couldn't do it, that should be a failure.
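A hypothetical sketch of the behavior agreed on here (not the library's actual implementation): validate with CRC32c by default, but fail loudly when MD5 was explicitly requested and the API returned no MD5 hash. The function and parameter names below are made up for illustration.

```js
// Hypothetical sketch -- not the library's code. `validation` is the
// user-supplied option; local* are hashes computed while streaming,
// server* are what the API returned in the object metadata.
function validate(validation, localMd5, localCrc32c, serverMd5, serverCrc32c) {
  if (validation === 'md5') {
    if (!serverMd5) {
      // The user asked for MD5 specifically and we can't honor it:
      // fail rather than silently doing zero validation.
      throw new Error('MD5 verification was requested, but the API did not return an MD5 hash.');
    }
    return localMd5 === serverMd5;
  }

  // Default (and 'crc32c'): validate with CRC32c, which the API returns
  // for all objects, including composite objects.
  return localCrc32c === serverCrc32c;
}
```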

@stephenplusplus stephenplusplus added type: question Request for information or clarification. Not an issue. docs api: storage Issues related to the Cloud Storage API. labels Oct 24, 2016
@zbjornson
Contributor Author

Thanks as always for the prompt reply!

validating using the less-reliable method, by our own words

Sure, although I think those words might be google-cloud-node's? The GCS docs advise CRC32C "for all cases":

Google recommends that customers use CRC32C for all cases, as described in the Validation section below. Customers that prefer MD5 can use that hash, but that hash is not supported for composite objects.

Because it's configurable ❤️, I totally don't feel strongly about changing it, but I think most people would say MD5 is overkill for data integrity checks. (The risk of undetected corruption with CRC32c is 1/2^32, i.e. a 0.99999999977 detection rate.)

(Re: last point, makes sense.)
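For completeness, the detection figure quoted above checks out (a quick Node one-liner, nothing more):

```js
// 1 - 1/2^32: probability that a random corruption is caught by CRC32c.
console.log((1 - Math.pow(2, -32)).toFixed(11)); // '0.99999999977'
```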

@stephenplusplus
Contributor

Oh jeez, if we just made that up, then I'm fine rethinking this. We can follow that doc's advice and do crc32c by default and md5 if asked. Any interest in a PR? No pressure; if not, I'll get to it soon :)

Thanks for calling this out!
