Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rclone fails to move small files to s3 buckets with default encryption enabled #1824

Open
ccoakley opened this issue Nov 14, 2017 · 9 comments

Comments

@ccoakley
Copy link

commented Nov 14, 2017

Scripts to reproduce this bug can be found here:
https://github.com/ccoakley/rclone-kms-s3-test

We wanted to make sure it wasn't tied to our particular environment, so the above scripts can be run on a fresh aws account from an admin IAM user (or just follow along and do the steps manually). Note that a very current botocore is required to run the scripts, as this is a rather new feature.

We noticed that once we enabled default encryption on s3 buckets, small files failed to move with rclone. Once a file is large enough to transfer via multipart uploads, the problem goes away. Note that the etag for uploaded files is not stable (this can be seen in the debug output for the 3 retries). Tested with current beta.

To reproduce, create an s3 bucket and a kms key. Enable default encryption on the bucket using the new key. Use rclone to move a small (less than 5MB) file to the s3 bucket.

What is your rclone version (eg output from rclone -V)

rclone v1.38-095-g413faa99β

  • go version: go1.9.2

Which OS you are using and how many bits (eg Windows 7, 64 bit)

  • os/arch: darwin/amd64

Which cloud storage system are you using? (eg Google Drive)

s3

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone move

A log from the command with the -vv flag (eg output from rclone -vv copy /tmp remote:tmp)

2017/11/14 13:22:55 DEBUG : Using config file from "/Users/ccoakley/development/rclone_bug/rclone.conf"
2017/11/14 13:22:55 DEBUG : rclone: Version "v1.38-095-g413faa99β" starting with parameters ["rclone" "-vv" "--config" "./rclone.conf" "move" "copy_small.txt" "s3:bucket_ec041f5862e945e8a41126f7c6f9256f"]
2017/11/14 13:22:55 INFO  : S3 bucket bucket_ec041f5862e945e8a41126f7c6f9256f: Modify window is 1s
2017/11/14 13:22:55 DEBUG : .git: Excluded from sync (and deletion)
2017/11/14 13:22:55 DEBUG : .gitignore: Excluded from sync (and deletion)
2017/11/14 13:22:55 DEBUG : cleanup.sh: Excluded from sync (and deletion)
2017/11/14 13:22:55 DEBUG : exercise_bug.sh: Excluded from sync (and deletion)
2017/11/14 13:22:55 DEBUG : finish_setup.py: Excluded from sync (and deletion)
2017/11/14 13:22:55 DEBUG : rclone.conf: Excluded from sync (and deletion)
2017/11/14 13:22:55 DEBUG : README.md: Excluded from sync (and deletion)
2017/11/14 13:22:55 DEBUG : requirements.txt: Excluded from sync (and deletion)
2017/11/14 13:22:55 DEBUG : settings.sh: Excluded from sync (and deletion)
2017/11/14 13:22:55 DEBUG : setup.sh: Excluded from sync (and deletion)
2017/11/14 13:22:55 DEBUG : venv: Excluded from sync (and deletion)
2017/11/14 13:22:55 INFO  : S3 bucket bucket_ec041f5862e945e8a41126f7c6f9256f: Waiting for checks to finish
2017/11/14 13:22:55 INFO  : S3 bucket bucket_ec041f5862e945e8a41126f7c6f9256f: Waiting for transfers to finish
2017/11/14 13:22:55 ERROR : copy_small.txt: corrupted on transfer: MD5 hash differ "d1a661a9218bd8c35ced78e3cc31db77" vs "9bfb495fdac16b31bde2a790d7d98326"
2017/11/14 13:22:55 INFO  : copy_small.txt: Removing failed copy
2017/11/14 13:22:56 ERROR : copy_small.txt: Not deleting source as copy failed: corrupted on transfer: MD5 hash differ "d1a661a9218bd8c35ced78e3cc31db77" vs "9bfb495fdac16b31bde2a790d7d98326"
2017/11/14 13:22:56 DEBUG : Local file system at /Users/ccoakley/development/rclone_bug: Removing directory
2017/11/14 13:22:56 DEBUG : Local file system at /Users/ccoakley/development/rclone_bug: Failed to Rmdir: remove /Users/ccoakley/development/rclone_bug: directory not empty
2017/11/14 13:22:56 ERROR : Attempt 1/3 failed with 1 errors and: corrupted on transfer: MD5 hash differ "d1a661a9218bd8c35ced78e3cc31db77" vs "9bfb495fdac16b31bde2a790d7d98326"
2017/11/14 13:22:56 DEBUG : .git: Excluded from sync (and deletion)
2017/11/14 13:22:56 DEBUG : .gitignore: Excluded from sync (and deletion)
2017/11/14 13:22:56 DEBUG : cleanup.sh: Excluded from sync (and deletion)
2017/11/14 13:22:56 DEBUG : exercise_bug.sh: Excluded from sync (and deletion)
2017/11/14 13:22:56 DEBUG : finish_setup.py: Excluded from sync (and deletion)
2017/11/14 13:22:56 DEBUG : rclone.conf: Excluded from sync (and deletion)
2017/11/14 13:22:56 DEBUG : README.md: Excluded from sync (and deletion)
2017/11/14 13:22:56 DEBUG : requirements.txt: Excluded from sync (and deletion)
2017/11/14 13:22:56 DEBUG : settings.sh: Excluded from sync (and deletion)
2017/11/14 13:22:56 DEBUG : setup.sh: Excluded from sync (and deletion)
2017/11/14 13:22:56 DEBUG : venv: Excluded from sync (and deletion)
2017/11/14 13:22:56 INFO  : S3 bucket bucket_ec041f5862e945e8a41126f7c6f9256f: Waiting for checks to finish
2017/11/14 13:22:56 INFO  : S3 bucket bucket_ec041f5862e945e8a41126f7c6f9256f: Waiting for transfers to finish
2017/11/14 13:22:56 ERROR : copy_small.txt: corrupted on transfer: MD5 hash differ "d1a661a9218bd8c35ced78e3cc31db77" vs "0721aafabcc75dfd2e8477d00bf556e2"
2017/11/14 13:22:56 INFO  : copy_small.txt: Removing failed copy
2017/11/14 13:22:56 ERROR : copy_small.txt: Not deleting source as copy failed: corrupted on transfer: MD5 hash differ "d1a661a9218bd8c35ced78e3cc31db77" vs "0721aafabcc75dfd2e8477d00bf556e2"
2017/11/14 13:22:56 DEBUG : Local file system at /Users/ccoakley/development/rclone_bug: Removing directory
2017/11/14 13:22:56 DEBUG : Local file system at /Users/ccoakley/development/rclone_bug: Failed to Rmdir: remove /Users/ccoakley/development/rclone_bug: directory not empty
2017/11/14 13:22:56 ERROR : Attempt 2/3 failed with 1 errors and: corrupted on transfer: MD5 hash differ "d1a661a9218bd8c35ced78e3cc31db77" vs "0721aafabcc75dfd2e8477d00bf556e2"
2017/11/14 13:22:56 DEBUG : .git: Excluded from sync (and deletion)
2017/11/14 13:22:56 DEBUG : .gitignore: Excluded from sync (and deletion)
2017/11/14 13:22:56 DEBUG : cleanup.sh: Excluded from sync (and deletion)
2017/11/14 13:22:56 DEBUG : exercise_bug.sh: Excluded from sync (and deletion)
2017/11/14 13:22:56 DEBUG : finish_setup.py: Excluded from sync (and deletion)
2017/11/14 13:22:56 DEBUG : rclone.conf: Excluded from sync (and deletion)
2017/11/14 13:22:56 DEBUG : README.md: Excluded from sync (and deletion)
2017/11/14 13:22:56 DEBUG : requirements.txt: Excluded from sync (and deletion)
2017/11/14 13:22:56 DEBUG : settings.sh: Excluded from sync (and deletion)
2017/11/14 13:22:56 DEBUG : setup.sh: Excluded from sync (and deletion)
2017/11/14 13:22:56 DEBUG : venv: Excluded from sync (and deletion)
2017/11/14 13:22:56 INFO  : S3 bucket bucket_ec041f5862e945e8a41126f7c6f9256f: Waiting for checks to finish
2017/11/14 13:22:56 INFO  : S3 bucket bucket_ec041f5862e945e8a41126f7c6f9256f: Waiting for transfers to finish
2017/11/14 13:22:56 ERROR : copy_small.txt: corrupted on transfer: MD5 hash differ "d1a661a9218bd8c35ced78e3cc31db77" vs "05c0d3de94e16831b9922aba416018f4"
2017/11/14 13:22:56 INFO  : copy_small.txt: Removing failed copy
2017/11/14 13:22:57 ERROR : copy_small.txt: Not deleting source as copy failed: corrupted on transfer: MD5 hash differ "d1a661a9218bd8c35ced78e3cc31db77" vs "05c0d3de94e16831b9922aba416018f4"
2017/11/14 13:22:57 DEBUG : Local file system at /Users/ccoakley/development/rclone_bug: Removing directory
2017/11/14 13:22:57 DEBUG : Local file system at /Users/ccoakley/development/rclone_bug: Failed to Rmdir: remove /Users/ccoakley/development/rclone_bug: directory not empty
2017/11/14 13:22:57 ERROR : Attempt 3/3 failed with 1 errors and: corrupted on transfer: MD5 hash differ "d1a661a9218bd8c35ced78e3cc31db77" vs "05c0d3de94e16831b9922aba416018f4"
2017/11/14 13:22:57 Failed to move: corrupted on transfer: MD5 hash differ "d1a661a9218bd8c35ced78e3cc31db77" vs "05c0d3de94e16831b9922aba416018f4"
@ncw

This comment has been minimized.

Copy link
Collaborator

commented Nov 15, 2017

Thanks for the writeup and discussion of the problem.

So in the s3 docs it states:

Objects created by the PUT Object, POST Object, or Copy operation, or through the AWS Management Console, and are encrypted by SSE-C or SSE-KMS, have ETags that are not an MD5 digest of their object data.

Now rclone assumes that all Etags are MD5 hashes. The reason why it works for multipart upload is that rclone knows that these Etags aren't MD5 hashes and ignores them.

If you use this flag

  --ignore-checksum                   Skip post copy check of checksums.

then I expect the uploads will work just fine.

This isn't an entirely satisfactory workaround though as rclone should be able to work out that the Etags aren't MD5SUMs..

Is there anything returned by amazon for rclone to know that the objects are encrypted? Or something rclone could query?

Alternatively rclone could take another config parameter which isn't ideal since some buckets may be encrypted and some not.

At minimum something in the docs about using --ignore-checksum when using KMS would be a good idea.

@ncw ncw added bug doc fix labels Nov 15, 2017

@ncw ncw added this to the v1.40 milestone Nov 15, 2017

ncw added a commit that referenced this issue Mar 17, 2018

@ncw

This comment has been minimized.

Copy link
Collaborator

commented Mar 17, 2018

I've done the update the docs fix, however this could do with a better fix.

If there was some header rclone could check then rclone could automatically ignore the ETag for encrypted items.

If you fetch a small textual item with rclone cat --ignore-checksum -vv --dump bodies s3:path/to/file.txt then rclone will dump the headers - can you paste them here?

@ncw ncw added Remote: S3 and removed doc fix labels Mar 17, 2018

@ncw ncw modified the milestones: v1.40, v1.41 Mar 17, 2018

@ncw ncw modified the milestones: v1.41, v1.42 Apr 28, 2018

@ncw ncw modified the milestones: v1.42, v1.43 Jun 17, 2018

@wgrrrr

This comment has been minimized.

Copy link

commented Jun 30, 2018

Hi there... this issue is affecting me as well, so I've taken the liberty of dumping the headers you wanted to see. I hope this is helpful. I've sanitized the output a bit, but I think it's still useful for your purposes.

2018/06/30 14:34:22 DEBUG : HTTP REQUEST (req 0xc420111800)
2018/06/30 14:34:22 DEBUG : HEAD /bucket-name/path/to/small-file.key HTTP/1.1
Host: s3.us-west-2.amazonaws.com
User-Agent: rclone/v1.42
Authorization: XXXX
X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
X-Amz-Date: 20180630T143422Z

2018/06/30 14:34:22 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2018/06/30 14:34:22 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2018/06/30 14:34:22 DEBUG : HTTP RESPONSE (req 0xc420111800)
2018/06/30 14:34:22 DEBUG : HTTP/1.1 200 OK
Content-Length: 1706
Accept-Ranges: bytes
Content-Type:
Date: Sat, 30 Jun 2018 14:34:23 GMT
Etag: "96c9b2988c9c388935236b5a719374c7"
Last-Modified: Fri, 29 Jun 2018 20:04:29 GMT
Server: AmazonS3
X-Amz-Id-2: AFgUQcJa2OlXhGNijiHPdE3Jzk8O0bL4mkQQnNANB3l4bh4hKO7jPfWZ2ZPTyY1O+kPZKfBUmMg=
X-Amz-Request-Id: D841ED0CDDE41FA2
X-Amz-Server-Side-Encryption: aws:kms
X-Amz-Server-Side-Encryption-Aws-Kms-Key-Id: arn:aws:kms:region:account-id:key/key-uuid

2018/06/30 14:34:22 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2018/06/30 14:34:22 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2018/06/30 14:34:22 DEBUG : HTTP REQUEST (req 0xc4201f6400)
2018/06/30 14:34:22 DEBUG : GET /bucket-name?delimiter=%2F&max-keys=1000&prefix=path%2Fto%2F HTTP/1.1
Host: s3.us-west-2.amazonaws.com
User-Agent: rclone/v1.42
Authorization: XXXX
X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
X-Amz-Date: 20180630T143422Z
Accept-Encoding: gzip

2018/06/30 14:34:22 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2018/06/30 14:34:22 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2018/06/30 14:34:22 DEBUG : HTTP RESPONSE (req 0xc4201f6400)
2018/06/30 14:34:22 DEBUG : HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: application/xml
Date: Sat, 30 Jun 2018 14:34:23 GMT
Server: AmazonS3
X-Amz-Bucket-Region: region
X-Amz-Id-2: Cn87BtiG8KsRqdKb7tLA9IG3IRc8YS8KL0soP1QmixjZnGNB/dGW3/oTEByekAGOTdt8zW65pXg=
X-Amz-Request-Id: 7744C35851D3CAB6

334
<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Name>bucket-name</Name><Prefix>path/to/</Prefix><Marker></Marker><MaxKeys>1000</MaxKeys><Delimiter>/</Delimiter><IsTruncated>false</IsTruncated><Contents><Key>path/to/small-file.crt</Key><LastModified>2018-06-29T20:04:29.000Z</LastModified><ETag>&quot;9eb9691c8a3f0d803e6ae2b729680687&quot;</ETag><Size>6478</Size><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>path/to/small-file.key</Key><LastModified>2018-06-29T20:04:29.000Z</LastModified><ETag>&quot;96c9b2988c9c388935236b5a719374c7&quot;</ETag><Size>1706</Size><StorageClass>STANDARD</StorageClass></Contents></ListBucketResult>
0

2018/06/30 14:34:22 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2018/06/30 14:34:22 DEBUG : small-file.crt: Excluded from sync (and deletion)
2018/06/30 14:34:22 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2018/06/30 14:34:22 DEBUG : HTTP REQUEST (req 0xc4202b4100)
2018/06/30 14:34:22 DEBUG : GET /bucket-name/path/to/small-file.key HTTP/1.1
Host: s3.us-west-2.amazonaws.com
User-Agent: rclone/v1.42
Authorization: XXXX
X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
X-Amz-Date: 20180630T143422Z
Accept-Encoding: gzip

2018/06/30 14:34:22 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2018/06/30 14:34:22 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2018/06/30 14:34:22 DEBUG : HTTP RESPONSE (req 0xc4202b4100)
2018/06/30 14:34:22 DEBUG : HTTP/1.1 200 OK
Content-Length: 1706
Accept-Ranges: bytes
Content-Type:
Date: Sat, 30 Jun 2018 14:34:23 GMT
Etag: "96c9b2988c9c388935236b5a719374c7"
Last-Modified: Fri, 29 Jun 2018 20:04:29 GMT
Server: AmazonS3
X-Amz-Id-2: WaGbSgU30kBSnjgS+vWXfDVRNJbmczXIt6n5vHe/neDesNst4h3H1Ur5W08TqutLyWOQPmutoZE=
X-Amz-Request-Id: 1BFC4C04D0D894B1
X-Amz-Server-Side-Encryption: aws:kms
X-Amz-Server-Side-Encryption-Aws-Kms-Key-Id: arn:aws:kms:region:account-id:key/key-uuid
@ncw

This comment has been minimized.

Copy link
Collaborator

commented Jun 30, 2018

@wgrrrr thank you for that - very useful.

It looks like this one is the header X-Amz-Server-Side-Encryption: aws:kms

I've found the docs on that and it is a bit vague as to what the possible values can be!

I'm unsure I should skip the etag check if the X-Amz-Server-Side-Encryption header is present, or only if it is set to aws:kms..

I managed to find some fairly definitive docs:

https://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html

The entity tag is a hash of the object. The ETag reflects changes only to the contents of an object, not its metadata. The ETag may or may not be an MD5 digest of the object data. Whether or not it is depends on how the object was created and how it is encrypted as described below:

  • Objects created by the PUT Object, POST Object, or Copy operation, or through the AWS Management Console, and are encrypted by SSE-S3 or plaintext, have ETags that are an MD5 digest of their object data.
  • Objects created by the PUT Object, POST Object, or Copy operation, or through the AWS Management Console, and are encrypted by SSE-C or SSE-KMS, have ETags that are not an MD5 digest of their object data.
  • If an object is created by either the Multipart Upload or Part Copy operation, the ETag is not an MD5 digest, regardless of the method of encryption.

So it looks like if there is no X-Amz-Server-Side-Encryption or if it matches sse-s3 then the Etag is an MD5. I'm not quite sure what amazon will put in that header though for sse-s3 any ideas?

@wgrrrr

This comment has been minimized.

Copy link

commented Jun 30, 2018

This page seems to have some more detail on potential values of the X-Amz-Server-Side-Encryption header.

According to that page, the values could be NULL (i.e. no value), AES256 or aws:kms.

@ncw

This comment has been minimized.

Copy link
Collaborator

commented Jun 30, 2018

So if I have the metadata for an object then I can determine whether X-Amz-Server-Side-Encryption is present and if it not present or has the value AES256 then the Etag is invalid.

However reading the metadata needs a whole other transaction with S3 as it doesn't come in the ListObjects call.

So to implement this check it would need rclone to do a metadata read (a HEAD on the object) for all objects. At the moment it only does it for objects which were uploaded with multipart upload.

Can objects be individually encrypted or are all objects in a bucket encrypted? Maybe looking at the bucket metadata would be a better idea? I looked at the docs and it doesn't look that useful...

Another idea would be to make this a flag, say --s3-use-md5-metadata which would tell rclone to make extra md5 metadata for each object and only use that metadata and not the ETag header.

@wgrrrr

This comment has been minimized.

Copy link

commented Jul 1, 2018

S3 objects can be encrypted via bucket default encryption or via the REST API on a variety of operations.

It would appear to me that each object will need to be evaluated individually to determine its encryption status.

@ncw

This comment has been minimized.

Copy link
Collaborator

commented Jul 2, 2018

Thanks for looking that up. Doing a metadata read for every checksum read will certainly slow down S3 for everyone and cost more transactions :-(

Maybe it will just have to be a flag for kms users. You'll be able to set this in the config shortly so it isn't too inconvenient.

What do you think?

@ncw

This comment has been minimized.

Copy link
Collaborator

commented Sep 2, 2018

I think what needs to be done is that if the config parameter server_side_encryption is set to aws:kms the rclone should just be ignoring the Etag and only using the one rclone provides.

Does anyone want to have a go at this?

@ncw ncw modified the milestones: v1.43, v1.44 Sep 2, 2018

@ncw ncw modified the milestones: v1.44, v1.45 Oct 19, 2018

@ncw ncw modified the milestones: v1.45, v1.46 Nov 24, 2018

@ncw ncw modified the milestones: v1.46, v1.47 Feb 9, 2019

@ncw ncw removed this from the v1.47 milestone Apr 15, 2019

@ncw ncw added this to the v1.48 milestone Apr 15, 2019

@ncw ncw modified the milestones: v1.48, v1.49 Jun 19, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.