mc mirror overwrite currently broken #3060

Closed
sebschlue opened this issue Jan 28, 2020 · 12 comments

Comments

@sebschlue

Expected behavior

mc mirror --overwrite should detect changed files

Actual behavior

It seems it currently doesn't.

Steps to reproduce the behavior

$ mc mb myminio/mybucket 
Bucket created successfully `myminio/mybucket`.

$ echo one > testdir/testfile.txt

$ cat testdir/testfile.txt 
one

$ mc mirror --overwrite testdir myminio/mybucket 
...estfile.txt:  4 B / 4 B ┃▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓┃ 227 B/s 0s

$ mc cat myminio/mybucket/testfile.txt
one

$ echo two > testdir/testfile.txt

$ cat testdir/testfile.txt 
two

$ mc mirror --overwrite testdir myminio/mybucket 
 0 B / ? ┃░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░▓┃ 0s

$ mc cat myminio/mybucket/testfile.txt
one

mc --version

mc version RELEASE.2020-01-25T03-02-19Z

System information

Client and Server: Fedora 31 with XFS as filesystem
minio version 2020-01-25T02:50:51Z

@sebschlue
Author

With --overwrite and --preserve:

$ mc mb myminio/mybucket
Bucket created successfully `myminio/mybucket`.

$ echo one > testdir/testfile.txt

$ cat testdir/testfile.txt 
one

$ mc mirror --overwrite --preserve testdir myminio/mybucket 
...estfile.txt:  4 B / 4 B ┃▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓┃ 283 B/s 0s

$ mc cat myminio/mybucket/testfile.txt
one

$ echo two > testdir/testfile.txt

$ cat testdir/testfile.txt 
two

$ mc mirror --overwrite --preserve testdir myminio/mybucket 
 0 B / ? ┃░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░▓┃ 0s

$ mc cat myminio/mybucket/testfile.txt
one

@vadmeste
Member

@sebschlue this is actually known and expected. mc mirror does not detect changes in a file if its size does not change; here, one and two have the same length.
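
To illustrate the behaviour described here, the following is a minimal Python sketch of a size-only change check. The function name `needs_upload_by_size` is hypothetical and this is not mc's actual code; the only assumption is that the comparison looks at the object size and nothing else.

```python
import os
import tempfile


def needs_upload_by_size(local_path: str, remote_size: int) -> bool:
    """Hypothetical size-only check: report a change only when sizes differ."""
    return os.path.getsize(local_path) != remote_size


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "testfile.txt")

    # First sync: the file contains "one\n" (4 bytes), as in the report.
    with open(path, "w", newline="\n") as f:
        f.write("one\n")
    # Pretend this is the size the server recorded for the uploaded object.
    remote_size = os.path.getsize(path)

    # Local edit: "two\n" is also 4 bytes.
    with open(path, "w", newline="\n") as f:
        f.write("two\n")

    changed = needs_upload_by_size(path, remote_size)
    print(remote_size, changed)  # prints "4 False": the edit goes unnoticed
```

Any edit that changes the byte count (for example `three`) would be picked up by the same check.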

@klauspost
Contributor

@vadmeste What is the limitation that causes this? It seems inconvenient at best.

@vadmeste
Member

> @vadmeste What is the limitation that causes this? It seems inconvenient at best.

No checksum is stored on the server side (the ETag is not equal to the md5sum of the object in some cases).
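
One case where the ETag differs from the md5sum is a multipart upload. The Python sketch below assumes the commonly observed S3-style convention (MD5 over the concatenated binary MD5 digests of the parts, followed by `-<part count>`); that convention, the helper names, and the 5 MiB part size are assumptions for illustration, not something confirmed in this thread.

```python
import hashlib


def single_part_etag(data: bytes) -> str:
    """Plain single-part upload: the ETag is the hex MD5 of the content."""
    return hashlib.md5(data).hexdigest()


def multipart_etag(data: bytes, part_size: int) -> str:
    """Assumed S3-style multipart ETag: MD5 of the parts' binary MD5 digests,
    suffixed with the number of parts."""
    parts = [data[i:i + part_size] for i in range(0, len(data), part_size)]
    digests = b"".join(hashlib.md5(p).digest() for p in parts)
    return f"{hashlib.md5(digests).hexdigest()}-{len(parts)}"


data = b"x" * (12 * 1024 * 1024)                 # 12 MiB of sample content
plain = single_part_etag(data)
multi = multipart_etag(data, 5 * 1024 * 1024)    # 5 MiB parts -> 3 parts

print(plain)
print(multi)  # ends in "-3" and does not match the plain MD5
```

Under this assumption a client cannot compare a local md5sum against the ETag unless it also knows the part size that was used for the upload.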

@sebschlue
Author

On the Slack channel, some people confirmed that it should work when using --preserve.

@seqizz

seqizz commented Apr 20, 2020

> > @vadmeste What is the limitation that causes this? It seems inconvenient at best.
>
> No checksum is stored on the server side (the ETag is not equal to the md5sum of the object in some cases).

Ouch. That means for snapshotting certain stuff we'd need to rely on rsync.
Is there a way to append/change some harmless metadata which is checked to force this? Or to ensure the ETag is equal to the hash?

@harshavardhana
Member

harshavardhana commented Apr 20, 2020

> > No checksum is stored on the server side (the ETag is not equal to the md5sum of the object in some cases).
>
> Ouch. That means for snapshotting certain stuff we'd need to rely on rsync.
> Is there a way to append/change some harmless metadata which is checked to force this? Or to ensure the ETag is equal to the hash?

@seqizz, for that use rclone, which calculates a checksum of the entire content. The ETag is not always the md5sum (see SSE-C, multipart uploads, etc.), and md5sum is not reliable either: many objects out there can simply match the same md5sum (https://www.mscs.dal.ca/~selinger/md5collision/), and apparently that is quite common at scale.

Unless, of course, we calculate a checksum of the entire object using something like blake2b. But we would need to calculate it before uploading the content, which slows the upload down significantly.

rsync is meant for local-disk-to-remote-disk transfers using a delta protocol that reads both ends to compute checksums. That would be unexpected for object storage because of cloud costs.
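
As a rough illustration of the cost being discussed, here is a minimal Python sketch of a full-content comparison with BLAKE2b. The helper names `blake2b_hex` and `content_differs` are hypothetical, and the in-memory streams stand in for a local file and a remote object; in a real setup the remote read would be a full download.

```python
import hashlib
import io
from typing import BinaryIO


def blake2b_hex(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Stream the whole input through BLAKE2b and return the hex digest."""
    h = hashlib.blake2b()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()


def content_differs(local: BinaryIO, remote: BinaryIO) -> bool:
    """Full-content comparison: both sides are read end to end."""
    return blake2b_hex(local) != blake2b_hex(remote)


local = io.BytesIO(b"two\n")    # the edited local file
remote = io.BytesIO(b"one\n")   # what the bucket still holds
differs = content_differs(local, remote)
print(differs)  # True: same size, different content
```

The check catches the same-size edit, but only by reading every byte on both ends, which is the traffic and time cost described above.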

@seqizz

seqizz commented Apr 20, 2020

Ah, of course. I'm just thinking out loud, since I'm currently not bound by "cloud traffic costs" :) I'll check out rclone. Thanks.

Just curious: would it even be possible for minio to add another header like ETag, but containing a hash (set on create/modify), without breaking compatibility?

@harshavardhana
Member

> Just curious: would it even be possible for minio to add another header like ETag, but containing a hash (set on create/modify), without breaking compatibility?

It is definitely possible, @seqizz, but it is going to be very mc-specific: we have no control over your storage backend anyway, so any state change there wouldn't be properly understood by mc.

This can lead to issues like double copies. It was left out on purpose because we couldn't figure out a cost-effective way to do it properly for all general use cases.
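
A minimal Python sketch of the idea and of the double-copy concern follows. Everything in it is hypothetical: the in-memory `bucket` dictionary stands in for an object store, and the `content-blake2b` metadata key is made up for illustration; it is not an mc or MinIO convention.

```python
import hashlib
from typing import Dict, Optional, Tuple

# Hypothetical in-memory stand-in for a bucket: key -> (content, user metadata).
Bucket = Dict[str, Tuple[bytes, Dict[str, str]]]
HASH_KEY = "content-blake2b"  # made-up metadata key


def digest(data: bytes) -> str:
    return hashlib.blake2b(data).hexdigest()


def put_with_hash(bucket: Bucket, key: str, data: bytes) -> None:
    """Upload through the hash-aware client: content plus its digest as metadata."""
    bucket[key] = (data, {HASH_KEY: digest(data)})


def needs_upload(bucket: Bucket, key: str, local: bytes) -> bool:
    """Skip the upload only when a stored digest exists and matches."""
    if key not in bucket:
        return True
    _, meta = bucket[key]
    stored: Optional[str] = meta.get(HASH_KEY)
    return stored != digest(local)


bucket: Bucket = {}
put_with_hash(bucket, "testfile.txt", b"one\n")
same = needs_upload(bucket, "testfile.txt", b"one\n")     # digest matches
edited = needs_upload(bucket, "testfile.txt", b"two\n")   # same size, new digest

# Another tool writes identical content without the metadata key.
bucket["testfile.txt"] = (b"one\n", {})
foreign = needs_upload(bucket, "testfile.txt", b"one\n")  # copied again anyway

print(same, edited, foreign)  # False True True
```

The last case shows the limitation: as soon as the object is written by anything that does not maintain the extra metadata, the client can no longer tell that the content is already in place.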

@sebschlue
Author

Can this issue be closed, then?

@seqizz

seqizz commented Jun 10, 2020

IMHO this needs to be documented more clearly, preferably directly in the mirror section of the mc documentation.
But yeah, if this is how minio works, it doesn't sound like a bug. 👍

@stale

stale bot commented Sep 9, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

stale bot added the stale label Sep 9, 2020
stale bot closed this as completed Oct 9, 2020