rclone fails to move small files to s3 buckets with default encryption enabled #1824
Scripts to reproduce this bug can be found here:
We wanted to make sure it wasn't tied to our particular environment, so the above scripts can be run on a fresh AWS account from an admin IAM user (or just follow along and do the steps manually). Note that a recent botocore is required to run the scripts, as default bucket encryption is a rather new feature.
We noticed that once we enabled default encryption on S3 buckets, small files failed to move with rclone. Once a file is large enough to be transferred via multipart upload, the problem goes away. Note that the ETag for uploaded files is not stable (this can be seen in the debug output for the 3 retries). Tested with the current beta.
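The failure mode can be sketched as follows. This is a minimal illustration (not rclone's actual code), assuming rclone's post-upload check compares the MD5 it computed locally against the ETag returned by S3; the KMS-style ETag value below is made up for illustration:

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Hex MD5 digest -- the value rclone computes locally before upload."""
    return hashlib.md5(data).hexdigest()

data = b"hello world\n"

# Plain (unencrypted) PUT: the returned ETag equals the MD5 of the body,
# so a post-upload MD5-vs-ETag verification passes.
plain_etag = md5_hex(data)
assert plain_etag == md5_hex(data)

# With default SSE-KMS encryption the returned ETag is an opaque hash
# (hypothetical value below, NOT a real response), so the same comparison
# fails, and the uploader retries.
kms_etag = "a54357aff0632cce46d942af68356b38"  # hypothetical, != md5_hex(data)
assert kms_etag != md5_hex(data)
```

This is consistent with the ETag not being stable across the 3 retries: each retry gets a fresh opaque value rather than the file's MD5.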
To reproduce, create an S3 bucket and a KMS key. Enable default encryption on the bucket using the new key. Then use rclone to move a small (less than 5 MB) file to the bucket.
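A rough sketch of those steps with the AWS CLI (all names and paths are placeholders, and `remote:` stands for whatever your rclone S3 remote is called):

```shell
# Placeholder bucket name.
BUCKET=my-test-bucket

# Create the bucket and a KMS key.
aws s3api create-bucket --bucket "$BUCKET"
KEY_ID=$(aws kms create-key --query KeyMetadata.KeyId --output text)

# Enable default encryption on the bucket with the new key
# (requires a recent AWS CLI/botocore, as noted above).
aws s3api put-bucket-encryption --bucket "$BUCKET" \
  --server-side-encryption-configuration "{\"Rules\":[{\"ApplyServerSideEncryptionByDefault\":{\"SSEAlgorithm\":\"aws:kms\",\"KMSMasterKeyID\":\"$KEY_ID\"}}]}"

# Move a small (<5 MB) file; rclone should fail the post-upload check and retry.
mkdir -p /tmp/repro && head -c 1048576 /dev/urandom > /tmp/repro/small.bin
rclone -vv move /tmp/repro remote:"$BUCKET"
```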
Thanks for the writeup and discussion of the problem.
So in the S3 docs it states:
Now rclone assumes that all ETags are MD5 hashes. The reason it works for multipart uploads is that rclone knows those ETags aren't MD5 hashes and ignores them.
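The multipart case is detectable from the ETag's shape alone: S3 gives multipart objects an ETag of the form `<hex>-<part count>`, which can never be a plain 32-hex-digit MD5. A sketch of that recognition (function names are mine, not rclone's):

```python
import re

def etag_is_multipart(etag: str) -> bool:
    """Multipart ETags look like '3858f622...c63f-2': 32 hex digits,
    a dash, then the part count. Such a value cannot be an MD5."""
    return bool(re.fullmatch(r'"?[0-9a-f]{32}-\d+"?', etag))

def etag_looks_like_md5(etag: str) -> bool:
    """A bare 32-hex-digit ETag is *shaped* like an MD5 digest."""
    return bool(re.fullmatch(r'"?[0-9a-f]{32}"?', etag))
```

The catch is that an SSE-KMS ETag is still 32 hex digits, so it passes `etag_looks_like_md5` despite not being the object's MD5. Shape alone can't distinguish it, which is why something returned by the server would be needed.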
If you use this flag
then I expect the uploads will work just fine.
This isn't an entirely satisfactory workaround, though, as rclone should be able to work out that the ETags aren't MD5 sums.
Is there anything returned by Amazon that would let rclone know the objects are encrypted? Or something rclone could query?
Alternatively rclone could take another config parameter, which isn't ideal since some buckets may be encrypted and some not.
At minimum there should be something in the docs about using
I've done the docs update, but this could do with a better fix.
If there was some header rclone could check then rclone could automatically ignore the ETag for encrypted items.
If you fetch a small textual item with
Hi there... this issue is affecting me as well, so I've taken the liberty of dumping the headers you wanted to see. I hope this is helpful. I've sanitized the output a bit, but I think it's still useful for your purposes.
@wgrrrr thank you for that - very useful.
It looks like this one is the header
I've found the docs on that and it is a bit vague as to what the possible values can be!
I'm unsure whether I should skip the ETag check if the
I managed to find some fairly definitive docs:
The entity tag is a hash of the object. The ETag reflects changes only to the contents of an object, not its metadata. The ETag may or may not be an MD5 digest of the object data. Whether or not it is depends on how the object was created and how it is encrypted as described below:
So it looks like if there is no
So if I have the metadata for an object then I can determine whether
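Based on the docs quoted above, the decision could be sketched like this. The dict shape mimics an S3 HeadObject response; the function and all values are illustrative, not rclone's actual implementation:

```python
def etag_is_md5(head: dict) -> bool:
    """Decide from HEAD-object metadata whether the ETag can be trusted
    as an MD5, per the S3 docs quoted above: multipart, SSE-KMS, and
    SSE-C objects do not get an MD5 ETag."""
    etag = head.get("ETag", "").strip('"')
    if "-" in etag:                       # multipart upload: never an MD5
        return False
    if head.get("ServerSideEncryption") == "aws:kms":
        return False                      # SSE-KMS: ETag is not the MD5
    if head.get("SSECustomerAlgorithm"):
        return False                      # SSE-C: ETag is not the MD5
    return True                           # plaintext or SSE-S3 (AES256)

# Illustrative examples:
assert etag_is_md5({"ETag": '"6f5902ac237024bdd0c176cb93063dc4"'})
assert etag_is_md5({"ETag": '"6f5902ac237024bdd0c176cb93063dc4"',
                    "ServerSideEncryption": "AES256"})
assert not etag_is_md5({"ETag": '"6f5902ac237024bdd0c176cb93063dc4"',
                        "ServerSideEncryption": "aws:kms"})
```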
However, reading the metadata needs a whole other transaction with S3, as it doesn't come back in the ListObjects call.
So to implement this check it would need rclone to do a metadata read (a HEAD on the object) for all objects. At the moment it only does it for objects which were uploaded with multipart upload.
Can objects be individually encrypted or are all objects in a bucket encrypted? Maybe looking at the bucket metadata would be a better idea? I looked at the docs and it doesn't look that useful...
Another idea would be to make this a flag, say
Thanks for looking that up. Doing a metadata read for every checksum read would certainly slow down S3 for everyone and cost more transactions :-(
Maybe it will just have to be a flag for KMS users. You'll be able to set this in the config shortly, so it isn't too inconvenient.
What do you think?