rclone fails to move small files to s3 buckets with default encryption enabled #1824
Thanks for the writeup and discussion of the problem. So in the s3 docs it states:
Now rclone assumes that all ETags are MD5 hashes. The reason it works for multipart uploads is that rclone knows those ETags aren't MD5 hashes and ignores them. If you use the --ignore-checksum flag then I expect the uploads will work just fine.
This isn't an entirely satisfactory workaround though, as rclone should be able to work out that the ETags aren't MD5 sums. Is there anything returned by Amazon that would let rclone know the objects are encrypted? Or something rclone could query? Alternatively rclone could take another config parameter, which isn't ideal since some buckets may be encrypted and some not. At minimum there should be something in the docs about using --ignore-checksum.
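The multipart heuristic mentioned above relies on the shape of the ETag: multipart-upload ETags carry a trailing "-&lt;part count&gt;", so they can never be a bare 32-hex-character MD5. A minimal sketch of that check (a hypothetical helper, not rclone's actual code):

```python
import re

# A plain S3 ETag that is an MD5 sum is exactly 32 hex characters
# (S3 returns it wrapped in double quotes). Multipart-upload ETags
# look like "<md5-of-part-md5s>-<part-count>", so the "-N" suffix
# lets a client rule them out as MD5 hashes.
MD5_RE = re.compile(r'^[0-9a-f]{32}$')

def etag_is_md5(etag: str) -> bool:
    """Return True if the ETag could be a plain MD5 of the object body."""
    return bool(MD5_RE.match(etag.strip('"').lower()))
```

The problem in this issue is the converse case: an SSE-KMS object's ETag passes this shape test but is not actually the MD5 of the plaintext, so the check above alone is not enough.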
I've done the "update the docs" fix, however this could do with a better fix. If there was some header rclone could check then rclone could automatically ignore the ETag for encrypted items. Could someone fetch a small textual item from an encrypted bucket and post the response headers?
Hi there... this issue is affecting me as well, so I've taken the liberty of dumping the headers you wanted to see. I hope this is helpful. I've sanitized the output a bit, but I think it's still useful for your purposes.
@wgrrrr thank you for that - very useful. It looks like x-amz-server-side-encryption is the header. I've found the docs on that and it is a bit vague as to what the possible values can be! I'm unsure whether I should skip the ETag check whenever the header is present, or only for certain values.

I managed to find some fairly definitive docs: https://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html

The entity tag is a hash of the object. The ETag reflects changes only to the contents of an object, not its metadata. The ETag may or may not be an MD5 digest of the object data. Whether or not it is depends on how the object was created and how it is encrypted as described below:
So it looks like if there is no x-amz-server-side-encryption header then the ETag should be an MD5 hash.
This page seems to have some more detail on potential values of the x-amz-server-side-encryption header. According to that page, the values could be AES256 or aws:kms.
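Putting the header and values above together, a client could decide per object whether its ETag is usable as an MD5. A sketch of that decision, assuming lowercased response-header keys (the function and its name are hypothetical, not rclone's code):

```python
def etag_usable_as_md5(headers: dict) -> bool:
    """Decide whether an object's ETag can be trusted as an MD5 sum,
    from the response headers of a HEAD/GET on the object.
    Per the AWS docs quoted above: plaintext and SSE-S3 (AES256)
    objects have MD5 ETags; SSE-KMS and SSE-C objects do not."""
    sse = headers.get('x-amz-server-side-encryption')   # None, 'AES256' or 'aws:kms'
    sse_c = headers.get('x-amz-server-side-encryption-customer-algorithm')
    if sse_c is not None:        # SSE-C: ETag is not an MD5
        return False
    if sse == 'aws:kms':         # SSE-KMS: ETag is not an MD5
        return False
    return True                  # plaintext or SSE-S3
```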
So if I have the metadata for an object then I can determine whether the ETag is a valid MD5 hash. However, reading the metadata needs a whole other transaction with S3, as it doesn't come back in the ListObjects call. So to implement this check rclone would need to do a metadata read (a HEAD on the object) for all objects. At the moment it only does that for objects which were uploaded with multipart upload.

Can objects be individually encrypted, or are all objects in a bucket encrypted? Maybe looking at the bucket metadata would be a better idea? I looked at the docs and it doesn't look that useful...

Another idea would be to make this a flag.
S3 objects can be encrypted via bucket default encryption or via the REST API on a variety of operations. It would appear to me that each object will need to be evaluated individually to determine its encryption status.
Thanks for looking that up. Doing a metadata read for every checksum read will certainly slow down S3 for everyone and cost more transactions :-( Maybe it will just have to be a flag for KMS users. You'll be able to set this in the config shortly so it isn't too inconvenient. What do you think?
I think what needs to be done is that if the config parameter server_side_encryption is set, then rclone should stop treating the ETags returned on objects as MD5 hashes.

Does anyone want to have a go at this?
Hey @ncw, does turning off the checksum verification with --ignore-checksum not mean we lose the ability to detect data corruption in transfers? Seems like quite an important feature to lose.
@ncw, okay so that seems to be a thing. TCP packets are indeed verified for integrity. Assuming that HTTP is used for data transfer (rclone uses the S3 REST API, right?) there is protection from data corruption in transit (HTTP sits on top of TCP). Then what does rclone use MD5 hashes for?
For doing whole file data integrity checks. |
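The whole-file check referred to here works by hashing the entire local file and comparing it with the MD5 recorded for the remote object; TCP checksums only protect individual packets on each hop, not the end-to-end copy through disks, proxies, and application buffers. A minimal sketch of such a check (hypothetical helper, not rclone's code):

```python
import hashlib

def file_md5(path, chunk_size=1 << 20):
    """Stream a file through MD5 in 1 MiB chunks, so whole-file
    integrity checks work even on files larger than memory."""
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()
```

Comparing `file_md5(local_path)` against the remote's MD5 is exactly what breaks when the ETag of an SSE-KMS object is mistaken for an MD5 sum.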
Excuse my ignorance, but I am not sure I understood it properly. If I use the --ignore-checksum flag for my S3 sync job, rclone will NOT validate the integrity of my files, is that right? In other words, I will NOT know if the files transferred properly and I won't be able to trust those backups, right? Note: I use S3 with server-side encryption enabled.
@manipulator01 with that flag, yes, rclone won't do an MD5 check.
Hi all, any fix in the pipeline to handle this natively? At the moment I'm using --ignore-checksum as a patch.
If rclone is configured for server side encryption - either aws:kms or sse-c (but not sse-s3) then don't treat the ETags returned on objects as MD5 hashes. This fixes being able to upload small files. Fixes #1824
I've had a go at fixing this. Note that you will need to set the server_side_encryption config parameter for it to take effect. Testing appreciated :-)

v1.54.0-beta.4905.cbd93519c.fix-s3-sse on branch fix-s3-sse (uploaded in 15-30 mins)
Update which stores the MD5 as metadata on SSE uploaded objects (like multipart uploads have).

v1.54.0-beta.4907.78e3ba830.fix-s3-sse on branch fix-s3-sse (uploaded in 15-30 mins)
I've merged this to master now which means it will be in the latest beta in 15-30 mins and released in v1.54 |
This enables us to set the md5 to cache it. See: rclone#1824 rclone#2827
Before this change, small objects uploaded with SSE-AWS/SSE-C would not have MD5 sums. This change adds metadata for these objects in the same way that the metadata is stored for multipart uploaded objects. See: rclone#1824 rclone#2827
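The commit above gives SSE objects an MD5 the same way multipart uploads get one: by storing the sum as user metadata at upload time and reading it back instead of the ETag. A sketch of that round trip; the `md5chksum` key name and base64 binary-digest encoding are assumptions modelled on the Content-MD5 convention, not necessarily rclone's exact format:

```python
import base64
import hashlib

def md5_metadata(data: bytes) -> dict:
    """Build user metadata carrying the body's MD5, for objects whose
    ETag won't be usable (SSE-KMS, SSE-C, multipart). Key name assumed."""
    digest = hashlib.md5(data).digest()
    return {'md5chksum': base64.b64encode(digest).decode('ascii')}

def md5_from_metadata(meta: dict) -> str:
    """Recover the hex MD5 from the stored base64 metadata value."""
    return base64.b64decode(meta['md5chksum']).hex()
```

On S3 this metadata travels as an x-amz-meta-* header, so it comes back with a HEAD or GET on the object and survives server-side encryption untouched.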
* Workaround can be replaced after rclone v1.54 release with option server_side_encryption: aws:kms * rclone/rclone#1824
This still happens with version 1.56.1 and iDrive buckets with default encryption on.
@evanthomas have you told rclone you are using server side encryption? See https://rclone.org/s3/#key-management-system-kms
My bad - that option is not exposed on the UI frontend I'm using. |
I am still facing this issue when I try to transfer a file (the bucket is encrypted with a KMS key ID):
My rclone config is as shown below:
Scripts to reproduce this bug can be found here:
https://github.com/ccoakley/rclone-kms-s3-test
We wanted to make sure it wasn't tied to our particular environment, so the above scripts can be run on a fresh aws account from an admin IAM user (or just follow along and do the steps manually). Note that a very current botocore is required to run the scripts, as this is a rather new feature.
We noticed that once we enabled default encryption on s3 buckets, small files failed to move with rclone. Once a file is large enough to transfer via multipart uploads, the problem goes away. Note that the etag for uploaded files is not stable (this can be seen in the debug output for the 3 retries). Tested with current beta.
To reproduce, create an s3 bucket and a kms key. Enable default encryption on the bucket using the new key. Use rclone to move a small (less than 5MB) file to the s3 bucket.
rclone version: rclone v1.38-095-g413faa99β
Storage system: s3
Command: rclone move