Cloudflare R2: WARNING: MD5 Sums don't match! #1273

Open
Lusitaniae opened this issue Aug 18, 2022 · 4 comments

Comments

@Lusitaniae

Lusitaniae commented Aug 18, 2022

This happens when uploading large (>5 GB) files that require a multipart upload to Cloudflare R2.

s3cmd put -d -v my-file.tar.zst s3://my-bucket/ 
DEBUG: Canonical Request:
PUT
/my-file.tar.zst
partNumber=1&uploadId=[redacted]
content-length:15728640
host:[redacted].r2.cloudflarestorage.com
x-amz-content-sha256:22c5bf1bd95afe12f8cd6e13ae5db4299a9defcb6df2cfc69285488e2deb5c09
x-amz-date:20220818T052008Z

content-length;host;x-amz-content-sha256;x-amz-date
22c5bf1bd95afe12f8cd6e13ae5db4299a9defcb6df2cfc69285488e2deb5c09
----------------------
DEBUG: signature-v4 headers: {'content-length': '15728640', 'x-amz-date': '20220818T052008Z', 'Authorization': '[redacted]', 'x-amz-content-sha256': '22c5bf1bd95afe12f8cd6e13ae5db4299a9defcb6df2cfc69285488e2deb5c09'}
DEBUG: get_hostname([redacted]): [redacted].r2.cloudflarestorage.com
DEBUG: ConnMan.get(): re-using connection: https://[redacted].r2.cloudflarestorage.com#6
DEBUG: format_uri(): /my-file.tar.zst?partNumber=1&uploadId=[redacted]
    65536 of 15728640     0% in    0s     4.97 MB/sDEBUG: ConnMan.put(): connection put back to pool (https://[redacted].r2.cloudflarestorage.com#7)
DEBUG: Response:
{'data': b'',
 'headers': {'cf-ray': '73c832d5dd7e15cb-EWR',
             'connection': 'keep-alive',
             'content-length': '0',
             'date': 'Thu, 18 Aug 2022 05:20:13 GMT',
             'etag': '"ABwcsNIEIx/3TXD+37wkhu1YIh8AgUg/++I5bsBm9MiQotlGsTOkpQhTeRkj/p5IFx2PSa/ouG94ghv+Mniyltsnj6QDUb9omfJfRLd0hJVqTPReu9NfKcBp0Z9NTBHcwf83xI3u49eLDXsDH9rS/EDF9ALqJ6Y6HmUCfB4g6bwZSeAgly77Amaqib1kkH+uta/NcIfe1ot1he0iaLC5ZIwruHOrG+F5gsZkmJ1qZXpWrYLBVUhyFPZ6Yo1LlKjSJw=="',
             'expect-ct': 'max-age=604800, '
                          'report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"',
             'server': 'cloudflare',
             'vary': 'Accept-Encoding'},
 'reason': 'OK',
 'size': 15728640,
 'status': 200}
 15728640 of 15728640   100% in    4s     3.34 MB/s  done
DEBUG: MD5 sums: computed=e7df577f795e45df5535f558c9931973, received=ABwcsNIEIx/3TXD+37wkhu1YIh8AgUg/++I5bsBm9MiQotlGsTOkpQhTeRkj/p5IFx2PSa/ouG94ghv+Mniyltsnj6QDUb9omfJfRLd0hJVqTPReu9NfKcBp0Z9NTBHcwf83xI3u49eLDXsDH9rS/EDF9ALqJ6Y6HmUCfB4g6bwZSeAgly77Amaqib1kkH+uta/NcIfe1ot1he0iaLC5ZIwruHOrG+F5gsZkmJ1qZXpWrYLBVUhyFPZ6Yo1LlKjSJw==
WARNING: MD5 Sums don't match!
WARNING: Too many failures. Giving up on 'my-file.tar.zst'
s3cmd --version
s3cmd version 2.2.0

Maybe Cloudflare is missing some API compatibility? The docs look OK to me:

https://developers.cloudflare.com/r2/platform/s3-compatibility/api/#object-level-operations

Same issue reported in the Cloudflare Community forums: https://community.cloudflare.com/t/multi-part-uploads-from-s3cmd-broken/412143
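In the log above, the line `MD5 sums: computed=..., received=...` reflects a per-part comparison: the client hashes the bytes it sent for the part (15728640 bytes here) and compares the hex digest with the ETag returned by UploadPart. Below is a minimal sketch of that comparison, assuming the ETag only needs its surrounding quotes stripped. `part_etag_matches` is a hypothetical helper, not s3cmd's actual code.

```python
import hashlib


def part_etag_matches(part_data: bytes, etag_header: str) -> bool:
    """Hypothetical helper: does an UploadPart ETag equal the MD5 of the part?

    Assumes the server follows the AWS convention of returning the hex MD5
    of the part body, wrapped in double quotes.
    """
    computed = hashlib.md5(part_data).hexdigest()  # 32 lowercase hex chars
    received = etag_header.strip('"')              # drop surrounding quotes
    return computed == received


# An AWS-style part ETag is the quoted hex MD5 of the body.
print(part_etag_matches(b"abc", '"900150983cd24fb0d6963f7d28e17f72"'))  # True

# A truncated copy of the base64-looking R2 ETag from the log never matches.
print(part_etag_matches(b"abc", '"ABwcsNIEIx/3TXD+37wkhu1Y"'))  # False
```

Because R2 intentionally returns an opaque value for parts, this comparison can never succeed against it, whatever the data.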

@fviard
Contributor

fviard commented Aug 22, 2022

Sadly, there is not much we can do at the moment unless Cloudflare fixes their API.

I think you can still use s3cmd with the "--no-check-md5" flag. MD5 sums will then not be checked for "sync", but if you only use "put", that should not change much.

Also, if you are willing to try a hack to the s3cmd source code, you can do the following:
in S3/S3.py, replace all occurrences of:
'-' not in md5_from_s3
with:
'-' not in md5_from_s3 and len(md5_from_s3) < 50

For example here:

s3cmd/S3/S3.py

Line 1844 in b7520e5

if ('-' not in md5_from_s3) and (md5_from_s3 != md5_hash.hexdigest()) and response["headers"].get("x-amz-server-side-encryption") != 'aws:kms':
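Pulled out of S3.py, the condition on the line quoted above, with the suggested length guard applied, might look like the sketch below. This is a standalone reconstruction under the assumption that `md5_from_s3` is the ETag with quotes already removed; the function name `is_md5_mismatch` and its parameters are hypothetical.

```python
def is_md5_mismatch(md5_from_s3: str, computed_hex: str, sse_header=None) -> bool:
    """Hypothetical standalone version of the check, with the length guard added.

    A hex MD5 is 32 characters, so an ETag of 50+ characters (like R2's
    base64-looking part ETags) is treated as "not an MD5" and skipped,
    just like AWS multipart ETags that contain a '-'.
    """
    looks_like_md5 = '-' not in md5_from_s3 and len(md5_from_s3) < 50
    return (looks_like_md5
            and md5_from_s3 != computed_hex
            and sse_header != 'aws:kms')


computed = "e7df577f795e45df5535f558c9931973"  # "computed" value from the log
# R2 ETag from the log, truncated here; only its length matters for the guard.
r2_etag = "ABwcsNIEIx/3TXD+37wkhu1YIh8AgUg/++I5bsBm9MiQotlGsTOkpQhTeRkj/p5IFx2PSa"
print(is_md5_mismatch(r2_etag, computed))  # False: skipped, no warning
```

The threshold of 50 is simply a value comfortably above 32; any value between the two would behave the same for these inputs.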

Currently, the code has some detection for ETags that are not a hash, and to work around those we use our own custom header. But we detect them by looking for the character "-" in the value, because on AWS the ETag of a multipart upload contains a "-" followed by the number of parts.
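For context on the "-" convention: the ETag AWS returns for a completed multipart upload is widely observed (though not formally specified by AWS) to be the MD5 of the concatenated binary MD5 digests of the parts, followed by "-" and the part count. A minimal sketch under that assumption; `multipart_etag` is a hypothetical name.

```python
import hashlib
from typing import Iterable


def multipart_etag(parts: Iterable[bytes]) -> str:
    """Hypothetical helper reproducing the commonly observed AWS multipart ETag.

    ETag = hex(MD5(md5(part1) || md5(part2) || ...)) + "-" + number_of_parts
    """
    digests = [hashlib.md5(p).digest() for p in parts]  # 16 raw bytes each
    combined = hashlib.md5(b"".join(digests)).hexdigest()
    return f"{combined}-{len(digests)}"


etag = multipart_etag([b"a" * 10, b"b" * 10, b"c"])
print(etag.endswith("-3"))  # True
print('-' in etag)          # True, so the current '-' check skips it
```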

If the modification works, you could try asking Cloudflare whether they would be willing to change their ETag to a format that matches what is expected.
Some possibilities:

  • PREFIX-.....current_etag.....
  • ....current_etag......-1
  • ....current_etag......-0

@vlovich

vlovich commented Sep 21, 2022

Is this about the ETag for UploadPart or for the completed upload? For UploadPart we're not going to return the MD5; that's an intentional deviation. If that's the case, s3cmd is so far the only client I've heard of having an issue. For a completed multipart upload we return <hash>-<nparts> as the ETag, but "hash" is not computed the same way S3 computes it.
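Given the `<hash>-<nparts>` format described above for R2's completed-upload ETag, a client can still extract the part count even though the hash portion cannot be reproduced locally. A small sketch; `split_multipart_etag` is a hypothetical helper and assumes the part count is the decimal suffix after the last "-".

```python
from typing import Optional, Tuple


def split_multipart_etag(etag: str) -> Optional[Tuple[str, int]]:
    """Hypothetical helper: split a '<hash>-<nparts>' ETag into its two halves.

    Returns None when the value does not end in '-<digits>'.
    Only the part count is comparable across providers; the hash is not.
    """
    value = etag.strip('"')
    head, sep, tail = value.rpartition('-')
    if not sep or not head or not tail.isdecimal():
        return None
    return head, int(tail)


print(split_multipart_etag('"d41d8cd98f00b204e9800998ecf8427e-3"'))
# ('d41d8cd98f00b204e9800998ecf8427e', 3)
print(split_multipart_etag('e7df577f795e45df5535f558c9931973'))  # None
```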

@fviard
Contributor

fviard commented Sep 21, 2022

@vlovich Yes, here we are speaking about the ETag of the UploadPart.
For the completed multipart upload, it is expected that the ETag may differ and be provider-specific.

But for the upload of a single part, there is no reason for the ETag to behave differently from that of a simple non-multipart file.
As far as I know, Cloudflare is the only S3-(not-)compatible implementation that does not return the MD5 as the ETag of a part.

We might try to detect that it is not an MD5 based on its length, but it is a bit sad to need a hack just because of Cloudflare's implementation...
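As an alternative to the length heuristic discussed here, a stricter test would treat the ETag as an MD5 only when it is exactly 32 hexadecimal characters. This is a sketch of that idea, not s3cmd code; `looks_like_md5` is a hypothetical name. It would also reject AWS multipart ETags because of their "-N" suffix.

```python
import re

# A hex-encoded MD5 digest is exactly 32 hexadecimal characters.
_MD5_HEX = re.compile(r"[0-9a-fA-F]{32}")


def looks_like_md5(etag: str) -> bool:
    """Hypothetical check: is this ETag (quotes optional) a plain hex MD5?"""
    return _MD5_HEX.fullmatch(etag.strip('"')) is not None


print(looks_like_md5('"e7df577f795e45df5535f558c9931973"'))    # True
print(looks_like_md5("e7df577f795e45df5535f558c9931973-2"))    # False
print(looks_like_md5("ABwcsNIEIx/3TXD+37wkhu1YIh8AgUg/++I5"))  # False
```

Unlike a size threshold, this rejects any value containing non-hex characters (such as the "/", "+" and "=" in R2's ETags) regardless of length.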

@anuaimi

anuaimi commented Jun 23, 2023

Does anyone know if this issue has been fixed? I'm looking to upload large files to R2.
