Possibility to ignore size and checksum when copying to avoid errors - --ignore-checksum flag #793

Closed
paulocoghi opened this issue Oct 15, 2016 · 12 comments

3 participants
@paulocoghi

commented Oct 15, 2016

See also #863 and #981

Description:
Copying from Google Cloud Storage to Minio (S3) using --size-only, as suggested for Minio, since Minio doesn't support MD5 checksums (ETags) or metadata.

rclone copy -v --size-only google-core-staging:trustvox-core-staging-assets trustvox:trustvox-core-staging-assets

Result: corrupted on transfer: sizes differ

Many files are copied successfully, but many small files (like this 5 KB one) fail to copy, giving:

2016/10/15 01:58:12 assets/tours-8d9f2fb6ef33cda2651a9b90a64bd225.js: Read MimeType as "text/javascript"
2016/10/15 01:58:12 assets/tours-8d9f2fb6ef33cda2651a9b90a64bd225.js: corrupted on transfer: sizes differ 5120 vs 20563
2016/10/15 01:58:12 assets/tours-8d9f2fb6ef33cda2651a9b90a64bd225.js: Removing failed copy

Attempt 2: with --ignore-size, and with --ignore-size --checksum
I tried both cases, especially --checksum, to avoid the MD5 check.

rclone copy -v --ignore-size --checksum google-core-staging:trustvox-core-staging-assets trustvox:trustvox-core-staging-assets

Result: --checksum has no effect, and MD5 is still checked.

2016/10/15 02:19:28 assets/tours-8d9f2fb6ef33cda2651a9b90a64bd225.js: corrupted on transfer: MD5 hash differ "e0ee57f91d539c82db12621db9337b65" vs "1571304f9e5d98af2d50f5c3902823c9"
2016/10/15 02:19:28 assets/tours-8d9f2fb6ef33cda2651a9b90a64bd225.js: Removing failed copy

My suggestion
Since it is currently not possible to use --size-only and --ignore-size together, I would suggest something like --ignore-size and --ignore-checksum, at least for copy.


Rclone version:

rclone v1.33-64-gbc414b6β

OS:

Ubuntu Server 16.04.1 (fully upgraded)

Storage Systems:

Origin: Google Cloud Storage
Dest: Minio (S3)

The command:

rclone copy -v --size-only google-core-staging:trustvox-core-staging-assets trustvox:trustvox-core-staging-assets

A log from the command with the -v flag:

2016/10/15 03:03:38 assets/widgets-f53a2d84b27ce6b9aba1eec0f2007559.js: Read MimeType as "text/javascript"
2016/10/15 03:03:38 assets/widgets-f53a2d84b27ce6b9aba1eec0f2007559.js: corrupted on transfer: sizes differ 6036 vs 17556
2016/10/15 03:03:38 assets/widgets-f53a2d84b27ce6b9aba1eec0f2007559.js: Removing failed copy
2016/10/15 03:03:38 Attempt 3/3 failed with 59 errors and: corrupted on transfer: sizes differ 6036 vs 17556
2016/10/15 03:03:38 Failed to copy: corrupted on transfer: sizes differ 6036 vs 17556

@paulocoghi paulocoghi changed the title Possibility to ignore size and checksum when copying to avoid "sizes differ" error Possibility to ignore size and checksum when copying to avoid errors Oct 15, 2016

@danzig666

commented Oct 15, 2016

Perhaps implementing a --modtime-only flag would help too.

@ncw

Owner

commented Oct 17, 2016

I would say you shouldn't be seeing the corrupted on transfer: sizes differ 6036 vs 17556 messages at all. It might be worth adding the flag --no-gzip-encoding to see if that makes a difference.

Can you copy the files to a local directory without the error message? And how about from the local directory to minio?

@paulocoghi

Author

commented Oct 18, 2016

@ncw I will do this test right now

@paulocoghi

Author

commented Oct 18, 2016

@ncw Same results (even when adding --no-gzip-encoding).

I will remove and add all remotes and test again.

@paulocoghi

Author

commented Oct 18, 2016

I found that only this bucket (assets) shows this error, and as luck would have it, I picked exactly this one to start with.

All the others are going perfectly fine. 👍

@ncw

Owner

commented Oct 25, 2016

> I identified that only this bucket (assets) presents this error, and to my luck I pick exactly this one to start with.

Is there a problem with that bucket do you think? Or a problem with rclone?

@paulocoghi

Author

commented Oct 25, 2016

> Is there a problem with that bucket do you think? Or a problem with rclone?

There is a problem with this bucket, not with rclone.

The customer's app that populates this specific bucket was pre-compressing the assets with gzip in order to use the Google Cloud Storage transcoding feature.
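To illustrate why pre-compressed objects trip the post-copy checks, here is a minimal Python sketch. It assumes the object is stored gzip-compressed (so its reported size and MD5 describe the compressed bytes) while the client ends up receiving the decompressed bytes; the file contents are made up. The sizes in the log above (5120 vs 20563) are consistent with this kind of mismatch.

```python
import gzip
import hashlib

# Made-up stand-in for a minified JavaScript asset.
original = b"function tour(){return 42;}\n" * 700

# The app pre-compresses with gzip; the stored object's size and MD5
# describe these compressed bytes.
stored = gzip.compress(original)

# With decompressive transcoding, the client can receive the
# decompressed bytes instead of the stored ones.
served = gzip.decompress(stored)

print("stored size:", len(stored), "md5:", hashlib.md5(stored).hexdigest())
print("served size:", len(served), "md5:", hashlib.md5(served).hexdigest())
```

Both the size and the MD5 of what arrives differ from what the source reported, so any check comparing the two will flag the copy as corrupted even though the content is intact.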

@ncw ncw added the enhancement label Nov 3, 2016

@ncw

Owner

commented Nov 3, 2016

Ah I see!

An --ignore-checksum flag that disables the post-copy checksum check would fix this for you.
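A minimal Python sketch of the kind of post-copy verification being discussed, with switches that skip the size and hash checks. This is a hypothetical illustration, not rclone's code (rclone is written in Go): `verify_copy`, `CheckOptions` and `CorruptedOnTransfer` are invented names, and the error strings only mimic the log lines earlier in this thread. It mirrors what was reported above: skipping the size check alone just moves the failure to the MD5 check.

```python
import hashlib
from dataclasses import dataclass


class CorruptedOnTransfer(Exception):
    """Raised when the copied data does not match the source's metadata."""


@dataclass
class CheckOptions:
    # Hypothetical switches modelled on the idea of --ignore-size / --ignore-checksum.
    ignore_size: bool = False
    ignore_checksum: bool = False


def verify_copy(src_size: int, src_md5: str, received: bytes,
                opts: CheckOptions) -> None:
    """Compare the bytes actually transferred with what the source reported."""
    # Size check: skipped when ignore_size is set.
    if not opts.ignore_size and len(received) != src_size:
        raise CorruptedOnTransfer(f"sizes differ {src_size} vs {len(received)}")
    # Hash check: skipped when ignore_checksum is set.
    if not opts.ignore_checksum:
        got = hashlib.md5(received).hexdigest()
        if got != src_md5:
            raise CorruptedOnTransfer(f'MD5 hash differ "{src_md5}" vs "{got}"')


if __name__ == "__main__":
    src = b"pretend these are gzip bytes"  # what the source reported on
    received = b"pretend these are the decompressed bytes"  # what arrived
    src_md5 = hashlib.md5(src).hexdigest()
    for opts in (CheckOptions(),
                 CheckOptions(ignore_size=True),
                 CheckOptions(ignore_size=True, ignore_checksum=True)):
        try:
            verify_copy(len(src), src_md5, received, opts)
            print(opts, "-> ok")
        except CorruptedOnTransfer as err:
            print(opts, "->", err)
```

Only with both checks disabled does the copy of a transcoded object get accepted; which checks the real flag skips should be confirmed against the rclone documentation.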

@ncw ncw changed the title Possibility to ignore size and checksum when copying to avoid errors Possibility to ignore size and checksum when copying to avoid errors - --ignore-checksum flag Nov 3, 2016

@paulocoghi paulocoghi closed this Nov 4, 2016

@ncw

Owner

commented Nov 5, 2016

I'm going to reopen this so that I actually implement the feature!

@ncw

Owner

commented Jan 5, 2017

Can you try this build, which implements the --ignore-checksum flag?

http://pub.rclone.org/v1.35-15-g82742f5-ignore-checksum%CE%B2/

@paulocoghi

Author

commented Jan 6, 2017

The customer whose buckets I migrated from Google Cloud no longer has them there (on Google).

But I'll certainly test your fix as soon as I do a similar migration!

@ncw

Owner

commented Feb 3, 2017

I've merged this to master for the 1.36 release

Here is the beta - http://beta.rclone.org/v1.35-62-g9d331ce/ (uploaded in 15-30 mins)

@ncw ncw closed this in 9d331ce Feb 3, 2017
