bisync: optimize --resync #5681

Closed
ivandeex opened this issue Oct 8, 2021 · 15 comments · Fixed by #7529

Comments

@ivandeex
Member

ivandeex commented Oct 8, 2021

Synopsis

Clearly explain in documentation whether --resync is asymmetrical.
(@cjnaz could you comment?)

Investigate whether --resync can be re-implemented by the equivalent of:

rclone copy --update Path1 Path2
rclone copy --update Path2 Path1
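
To make the proposal concrete, here is a minimal Python sketch of what the two passes would do. It is a toy model, not rclone code: each side is a dict mapping a file name to a `(modtime, content)` tuple, and `copy_update` is a hypothetical helper that mimics the documented effect of `--update` (skip files that are newer on the destination). As a simplification, files with equal modtimes are treated as identical.

```python
def copy_update(src, dst):
    """Toy model of `rclone copy --update src dst`: never deletes, and only
    overwrites a destination file when the source copy is strictly newer."""
    for name, (mtime, content) in src.items():
        if name not in dst or mtime > dst[name][0]:
            dst[name] = (mtime, content)


# Each side maps file name -> (modtime, content); all values are invented.
path1 = {"a.txt": (100, "old a"), "b.txt": (300, "new b"), "only1.txt": (50, "x")}
path2 = {"a.txt": (200, "new a"), "b.txt": (150, "old b"), "only2.txt": (60, "y")}

copy_update(path1, path2)  # rclone copy --update Path1 Path2
copy_update(path2, path1)  # rclone copy --update Path2 Path1

print(path1 == path2)      # both sides agree after the two passes
print(sorted(path1))       # union of the files from both sides
```

In this model the newest version of each file survives on both sides, and files unique to either side are kept, regardless of which path is listed first.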

Prior discussions

optimize --resync

cjnaz/rclonesync-V2#66 (eric-void) -> (cjnaz)

Looking at the code, there is a lot of work to copy Path2 files not in Path1, and then do an "rclone sync" from Path1 to Path2. Why don't you just do an "rclone copy" from Path1 to Path2, and then an "rclone copy" from Path2 to Path1?
This is much simpler: there is no need to read the LSL files twice and to loop over missing Path2 files, and the result should be the same. Am I missing something?

The first-sync optimization is deferred for now since it's not broken and should be rarely run.

#5164 (comment) (ncw)

So --resync is asymmetrical?

Perhaps it would be better implemented by the equivalent of

rclone copy --update Path1 Path2
rclone copy --update Path2 Path1

How to use GitHub

  • Please use the 👍 reaction to show that you are affected by the same issue.
  • Please don't comment if you have no relevant information to add. It's just extra noise for everyone subscribed to this issue.
  • Subscribe to receive notifications on status change and new comments.
@ivandeex ivandeex self-assigned this Oct 8, 2021
@ivandeex ivandeex added this to To do in bisync Oct 8, 2021
@cjnaz
Contributor

cjnaz commented Oct 8, 2021

I think the core question here is whether rclonesync's asymmetrical --first-sync solves some subtle problem. I think the answer is no, none that I can recall. When I implemented the current --first-sync method I think I was so embedded in the new algorithm (building file lists for raw copies/deletes) that I missed the simpler picture. I'd say go for this method...

rclone copy --update Path1 Path2
rclone copy --update Path2 Path1

@ivandeex
Member Author

ivandeex commented Oct 8, 2021

Unit tests and documentation will be updated accordingly.

nielash added a commit to nielash/rclone that referenced this issue Apr 23, 2023
…and equality check

bisync: bug fixes and new features including --create-empty-src-dirs and equality check

* Fixed an issue causing dry runs to inadvertently commit filter changes
* Fixed an issue causing --resync to erroneously delete empty folders and duplicate files unique to Path2
* --check-access is now enforced during --resync, preventing data loss in certain user error scenarios
* Fixed an issue causing bisync to consider more files than necessary due to overbroad filters during delete operations
* Improved detection of false positive change conflicts (identical files are now left alone instead of renamed)
* Added support for --create-empty-src-dirs
* Added experimental --resilient mode to allow recovery from self-correctable errors
* Added new --ignore-listing-checksum flag to distinguish from --ignore-checksum
* Performance improvements for large remotes
* Documentation and testing improvements

Fixes rclone#6109

Also addresses: rclone#6841 rclone#5683 rclone#5681 rclone#5676 rclone#5675 rclone#5674

See also: https://forum.rclone.org/t/bisync-bugs-and-feature-requests/37636
@nielash
Collaborator

nielash commented Aug 21, 2023

I had the same thought recently:

// This preserves the original resync order for backward compatibility. It is essentially:
// rclone copy Path2 Path1 --ignore-existing
// rclone copy Path1 Path2 --create-empty-src-dirs
// rclone copy Path2 Path1 --create-empty-src-dirs
// although if we were starting from scratch, it might be cleaner and faster to just do:
// rclone copy Path2 Path1 --create-empty-src-dirs
// rclone copy Path1 Path2 --create-empty-src-dirs

I tend to think it would be better to change it, as in addition to being simpler and more symmetrical, it would also make --resync substantially faster for users of --create-empty-src-dirs. However, it would be a breaking change, and I'm not sure how concerned we should be about that. It's possible that some users have come to depend on the asymmetry in some way. For instance, imagine a scenario where you're bisyncing one central server with multiple devices, and the server is the source-of-truth. The asymmetry allows you to declare that the server should take priority in the event of a --resync, even if the corresponding file on the other path is newer.

Perhaps the best compromise is to make the behavior configurable with a flag?

@crocinsocks

@nielash I've moved my comments over from #7332 regarding bisync.

I agree with both @ivandeex @cjnaz that the correct implementation for bisync should be:

rclone copy --update Path1 Path2 (--create-empty-src-dirs)
rclone copy --update Path2 Path1 (--create-empty-src-dirs)

This is what would happen if I was using Dropbox, OneDrive, GDrive, etc (e.g. the newest file will always survive on init and where the local directory already has data).

As bisync is still experimental, I see no harm in changing this to be the default behavior. Also, I believe this would be in line with the expectations of any new users of bisync and would also benefit existing users.

This also makes a --resync far less destructive. Currently, Path1 will always take precedence, which is the last thing I believe anyone would want from a bisync (existing or new users); rather, they are stuck with the current implementation. If there are current users who would like this functionality, perhaps they could either use a --legacy flag or simply run a sync manually to get the desired effect, based on your comments:

rclone copy Path2 Path1 --ignore-existing 
rclone copy Path1 Path2 (--create-empty-src-dirs)
rclone copy Path2 Path1 (--create-empty-src-dirs) 

Thanks 🙏

@nielash
Collaborator

nielash commented Sep 30, 2023

I agree with both @ivandeex @cjnaz that the correct implementation for bisync should be:

rclone copy --update Path1 Path2 (--create-empty-src-dirs)
rclone copy --update Path2 Path1 (--create-empty-src-dirs)

I generally agree as well, although I have a slight concern about the use of --update for the same reasons articulated in our other thread, about the potential dangers of picking winners and losers at the file level instead of the filesystem level. But I think this could be solved by just letting the user choose their desired behavior with a flag.

This also makes a --resync far less destructive. Currently, Path1 will always take precedence which is the last thing I believe anyone would want from a bisync

I can think of a few reasons a user might want the asymmetry -- I described one of them above. Imagine, for example, that I pull my dusty old laptop out of the closet which hasn't been synced in a long time, and I want to bring it up to date with my Google Drive which I know is totally up-to-date. In this case, I do in fact want the trusted drive side to take precedence over the untrusted local side -- even if certain local files happen to be newer (intentional reversions do sometimes happen -- maybe I restored something from a backup recently on the drive side, and now it's about to get overwritten.) An asymmetrical --resync lets you declare one side the source-of-truth; while --update prioritizes the newness of individual files, possibly at the expense of consistency at the directory-level.
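
The laptop scenario can be sketched with a small Python toy model. Nothing here is rclone code: the three `copy_*` helpers are hypothetical stand-ins for plain `rclone copy`, `--ignore-existing`, and `--update`, each side is a dict of file name to `(modtime, content)`, and all file names and values are invented.

```python
def copy_ignore_existing(src, dst):
    """Toy `rclone copy --ignore-existing`: only add files missing from dst."""
    for name, entry in src.items():
        dst.setdefault(name, entry)


def copy_plain(src, dst):
    """Toy plain `rclone copy`: transfer every file that differs on dst."""
    for name, entry in src.items():
        if dst.get(name) != entry:
            dst[name] = entry


def copy_update(src, dst):
    """Toy `rclone copy --update`: overwrite only when src is strictly newer."""
    for name, (mtime, content) in src.items():
        if name not in dst or mtime > dst[name][0]:
            dst[name] = (mtime, content)


def resync_path1_priority(path1, path2):
    """Asymmetrical order: Path1 wins every disagreement."""
    copy_ignore_existing(path2, path1)
    copy_plain(path1, path2)


def resync_newer(path1, path2):
    """Symmetrical order: the newest modtime wins, file by file."""
    copy_update(path1, path2)
    copy_update(path2, path1)


def make_sides():
    # The drive copy was restored from a backup, so its modtime is older.
    drive = {"thesis.txt": (100, "restored on drive")}
    laptop = {"thesis.txt": (200, "laptop edit"), "notes.txt": (50, "laptop only")}
    return drive, laptop


drive, laptop = make_sides()
resync_path1_priority(drive, laptop)       # drive is Path1, laptop is Path2
asymmetric_result = drive["thesis.txt"][1]

drive2, laptop2 = make_sides()
resync_newer(drive2, laptop2)
newer_result = drive2["thesis.txt"][1]

print(asymmetric_result)  # restored on drive
print(newer_result)       # laptop edit
```

In both cases the laptop-only file is preserved on both sides; the two orders differ only in which version of the conflicting file survives.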

Another use case (which I've used myself) is safer preservation of metadata that rclone doesn't currently support, such as directory metadata, macOS xattrs, and permissions. An asymmetrical sync allows you to avoid overwriting that stuff on the trusted side, when possible.

Additionally, it should be noted that bisync's current approximation of --ignore-existing does have a performance benefit, because in 1 of the 2 directions it can skip the modtime (or checksum) check for some of the files. (But that performance benefit is negated if using --create-empty-src-dirs, which requires a third operation -- unless we change that.)
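
The performance point can be illustrated with a toy count of comparisons. This is only a sketch under simplifying assumptions: `run_pass` is a hypothetical helper, one "comparison" stands for one modtime or checksum check, and the file counts are invented. It does not measure real rclone behavior.

```python
def run_pass(src, dst, ignore_existing):
    """Copy src into dst and return how many comparisons the pass needed."""
    comparisons = 0
    for name, entry in src.items():
        if name not in dst:
            dst[name] = entry        # missing on destination: always copied
        elif ignore_existing:
            continue                 # present on destination: skipped unchecked
        else:
            comparisons += 1         # present on both sides: must be compared
            if dst[name] != entry:
                dst[name] = entry
    return comparisons


def make_sides():
    # 100 files exist on both sides with different content, plus one unique file each.
    path1 = {f"common{i}": "v1" for i in range(100)}
    path1["only1"] = "x"
    path2 = {f"common{i}": "v2" for i in range(100)}
    path2["only2"] = "y"
    return path1, path2


# Order with an --ignore-existing first pass (Path2 -> Path1, then Path1 -> Path2).
p1, p2 = make_sides()
with_ignore_existing = run_pass(p2, p1, True) + run_pass(p1, p2, False)

# Same order, but with two full comparing passes.
p1, p2 = make_sides()
two_full_passes = run_pass(p2, p1, False) + run_pass(p1, p2, False)

print(with_ignore_existing, two_full_passes)  # 101 201
```

In this model the `--ignore-existing` pass performs no comparisons at all, so roughly half of the checks are avoided when most files exist on both sides.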

I tend to think that rclone's default (which is still slightly asymmetrical) should be bisync's default:

rclone copy Path2 Path1 (--create-empty-src-dirs)
rclone copy Path1 Path2 (--create-empty-src-dirs)

but that this should be overridable with flags for users who think this is madness 😄

@ncw I think this is ultimately a question for you -- what is your guiding principle for breaking changes such as this, where it's not exactly a bug, but there does seem to be consensus that a different way would be better? Is it ok to change the behavior, provided it's documented clearly? (And is the calculation different because bisync is still in beta?)

@crocinsocks

crocinsocks commented Sep 30, 2023

The problem here is you are using the wrong tool. Bisync should be bisync not sometimes bisync. Let me explain.

For the below standard means: Dropbox, OneDrive, GDrive, Unison, etc...

  • Prior State Available

    • standard - run as normal
    • rclone - run as normal
  • Prior State NOT Available

    • standard - remove dir and run
    • rclone
      • remove dir and run --resync
      • rclone sync (yes sync not bisync) then run --resync

So in your example you have two options: remove the data, or run an rclone sync beforehand.

With regards to directory metadata, there is already a ticket raised (#6685) to handle this, and again I would not conflate these two issues, as this will be solved once metadata is added to the directory where possible or, where not, by the use of a .dirmeta file or similar. Also, using rclone sync as above would still keep existing metadata.

As we are talking about a --resync here I do not think a slight increase in time is a matter of contention.

Ultimately, there should be one way for bisync to work which is using --update flag, as that would make it bidirectional and in line with expectations, i.e. the same as every other bidirectional syncing tool. If someone wants to do another type of sync they should use copy or sync. Don't take this the wrong way, I mean well, but in my opinion the general approach taken by other tools seems logical and well thought out, and I have yet to see any good reason to change it.

Maybe @ncw could give his input on this matter too alongside confirming whether such a change can take place.

@nielash
Collaborator

nielash commented Oct 1, 2023

For the below standard means: Dropbox, OneDrive, GDrive, Unison, etc...

I think it is an oversimplification to say that there is one "standard" way of doing this. For example, Unison has the -prefer xxx flag which allows root, newer, and older.

Also, the "standard - remove dir and run" that you cite for Dropbox and GDrive would be more equivalent to:

rclone sync remote local (--create-empty-src-dirs)

which is still different from the copy --update that you are suggesting:

rclone copy --update Path1 Path2 (--create-empty-src-dirs)
rclone copy --update Path2 Path1 (--create-empty-src-dirs)

So it actually seems that you are suggesting a different default than what the above tools use (closest would be Unison with -prefer newer, which is not the default.) (And FWIW: I think a different default is ok, if you can make the case that yours is better.)

You are correct that in my example about the laptop, I could have first run rclone sync drive local to make the subsequent --resync give the desired result. I think the question is: should I have to? (Maybe the answer is yes -- I'm ok with that. I'm just clarifying what the question is.) Alternatively, it would be easy to redesign --resync such that --resync --update produces your desired result (symmetrical) while keeping the default asymmetrical.

Fair points about the metadata and speed.

Bisync should be bisync not sometimes bisync.

--resync is already a departure from the normal bisync rules (for example: --resync never deletes), so I think bisync is already "sometimes bisync" in a sense. It's a design choice that bisync made at the very beginning (and I personally think it's a very good one.)

Ultimately, there should be one way for bisync to work which is using --update flag

I'm generally wary of forcing a specific flag on users without giving them a way to change it. If we go with --update as the default for --resync, I think we should at least offer a --resync --prefer Path1 option. (If ncw thinks it is changeable at all, at this point.)

@crocinsocks

I disagree: there is definitely a default way of doing bidirectional file sync, and it is symmetrical, based on the tools that most people use day-to-day and that sysadmins maintain, e.g. the main three (Dropbox, Google Drive, OneDrive) and others such as Amazon Drive, NextCloud, Seafile, OwnCloud, etc.

Though you are correct, I was wrong to put Unison in this category; it's been a while since I last used it. It is hard to compare, though, as --prefer is a conflict flag and Unison does not use --resync, which means Unison is always asymmetrical by default unless using --prefer newer, which makes it always symmetrical. I must also add that Unison does not have at its disposal copy and sync as with rclone so options to change this behavior make a lot more sense. Further, Unison does have a --force flag, which is arguably closer to what we are talking about here, and for which I say rclone has the equivalent in copy and sync.

Also, the "standard - remove dir and run" that you cite for Dropbox and GDrive would be more equivalent to:

This is untrue. Whilst it may give the illusion of a sync command because you are starting with an empty directory, in essence you are still doing a copy --update. If I were to add files or change existing files, the newest would be kept.

But then there is the argument why use --resync at all? 😂 If the tool is always symmetrical (or vice versa) then there seems to be little need. Maybe this is the missing flag? And --resync should be removed all together?

So to conclude, my original point still stands: bidirectional sync should be symmetrical by default if the goal is to be in line with common understanding. Further, I think it's extremely bad practice for a tool to be by default asymmetrical then symmetrical. Unison by default is asymmetrical at all times, not a combination of the two.

@nielash
Collaborator

nielash commented Oct 3, 2023

@crocinsocks I took some time today refreshing my memory on how Dropbox handles this scenario. It is documented here: Why does Dropbox need to “sync my files again”?, and it matches the behavior I observed when I tried it myself. In relevant part:

To keep your files safe during this process, Dropbox renames your Dropbox folder in File Explorer (Windows) or Finder (Mac) from “Dropbox ([account name])” to “Dropbox (Old).” Then, Dropbox creates a new folder in the same location on your computer and syncs your files there as well.

How does this work?
The result is two folders. This process is just a precaution to ensure that copies of your files are safe on your computer while your files are re-syncing. It’s recommended that you don’t touch the “Dropbox (Old)” folder while this process is happening.

So, it looks like what Dropbox actually does is equivalent to this:

rclone move "Dropbox" "Dropbox (Old)"
rclone sync dropbox: "Dropbox" --copy-dest "Dropbox (Old)"
[rclone bisync dropbox: "Dropbox"]

The --copy-dest part is so that it doesn't have to re-download any files that are already on your computer with a hash that matches the cloud version. Hence the instruction "don’t touch the “Dropbox (Old)” folder while this process is happening".

In other words, Dropbox's equivalent to --resync is totally asymmetrical, in favor of the cloud side. It does not attempt to merge or reconcile the two sides in any way -- it simply moves the local side to "Dropbox (Old)", where you can then do whatever you want to with it.

I also tried signing out and doing a fresh install to a location that already had existing files, to see what it would do. Every time, it refused with the following error messages:

[Screenshots: Dropbox Error1, Dropbox Error2]

So, not only did it not give precedence to the newer file on my machine, it also didn't even keep files that were only on my machine.

I don't doubt that Dropbox does behave symmetrically after it is fully set-up (as does bisync), but as we are only discussing --resync here, I am only looking at its equivalent of --resync. Dropbox did not give both sides equal weight -- it always considered the cloud side to be superior (which I guess kind of makes sense, since it is first and foremost a cloud storage service.) This is much closer to the "remove dir and run" that you described earlier, not copy --update. If you think I've missed something in my test, please feel free to show me a test of your own where you are able to get dropbox to overwrite a cloud-side file with a local-side file during a resync scenario (not just during normal everyday use.)

I also continue to think that what Dropbox does is a little bit beside the point, because we are not obligated to follow exactly what Dropbox does. I use rclone and bisync instead of alternative tools because I like how it works better 🙂

But then there is the argument why use --resync at all? 😂 If the tool is always symmetrical (or vice versa) then there seems to be little need. Maybe this is the missing flag? And --resync should be removed all together?

To back up just slightly: --resync performs 3 different important roles for bisync:

  1. It brings disagreeing sides into agreement on the first sync, so that each subsequent "normal" (non-resync) bisync run can assume that both sides matched after the prior run
  2. It provides a mechanism of ensuring user permission to proceed with destructive changes (such as changes to the --filters-file)
  3. It provides a way of regenerating a trusted state when the prior state is unknown or untrusted (for example, after a critical error during a bisync run, or a --check-sync failure.)

There are various bisync alternatives out there that don't have a --resync equivalent, but I find that they all take unacceptable (in my opinion) shortcuts on at least one of these points, in exchange. Usually, it is a tradeoff of safety for convenience. Bidirectional sync can never truly be stateless, so any tool that does it without knowing the prior state is guessing in some way. Often that guess is right so much of the time that the convenience tradeoff is worth it -- but it is still a guess. A particularly common strategy is "if we don't know or don't trust the prior state, then we consider everything to be new".

Some tools also have some method of making an educated guess to resolve conflicts between two file versions (such as guessing that the version with the newest modtime is "best"), but a file conflict is not actually required for trouble to arise. Consider the following thought experiment: you're a bidirectional sync tool and you come across a folder you've never seen before. On Side1, it contains files a and b. On Side2, it contains only a. Which of these statements describes the most recent action?:

  • b was added to Side1
  • b was deleted from Side2

The answer is that you can't know this with certainty -- you can only guess. Most tools that engage in this type of guessing will guess that b was added to Side1, so as to err on the side of preserving rather than deleting a file. But this guess could indeed be wrong -- perhaps the user wanted b deleted, and now it has suddenly reappeared?

Bisync, on the other hand, just refuses to guess in such a scenario. It instead requires the user to make the decision, by refusing to run again until the user manually runs --resync. This is why it's a tradeoff of convenience for safety. It is inconvenient to the user to have to run --resync, but it is the safer option, because the user has assurance that bisync will never make guesses when it's less than 100% sure.
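
The thought experiment can be written out as a short Python sketch. It is purely illustrative: `snapshot_after` and `explain` are hypothetical helpers, and the "prior listing" is reduced to a set of file names that existed on both sides after the last successful run.

```python
def snapshot_after(history):
    """Replay (side, action, name) events on two initially empty sides."""
    sides = {"Side1": set(), "Side2": set()}
    for side, action, name in history:
        if action == "add":
            sides[side].add(name)
        else:  # "delete"
            sides[side].discard(name)
    return sides


# History A: both sides had only `a`, then b was added to Side1.
history_a = [("Side1", "add", "a"), ("Side2", "add", "a"), ("Side1", "add", "b")]
# History B: both sides had `a` and `b`, then b was deleted from Side2.
history_b = [("Side1", "add", "a"), ("Side2", "add", "a"),
             ("Side1", "add", "b"), ("Side2", "add", "b"),
             ("Side2", "delete", "b")]

# The two histories leave identical snapshots, so a tool that sees only
# the current state cannot tell them apart.
same_snapshot = snapshot_after(history_a) == snapshot_after(history_b)


def explain(name, prior, side1, side2):
    """With a trusted prior listing, the same question has a definite answer."""
    if name in side1 and name not in side2:
        if name in prior:
            return f"{name} was deleted from Side2"
        return f"{name} was added to Side1"
    return "nothing to decide in this toy case"


now = snapshot_after(history_a)
print(same_snapshot)
print(explain("b", {"a"}, now["Side1"], now["Side2"]))       # prior state of history A
print(explain("b", {"a", "b"}, now["Side1"], now["Side2"]))  # prior state of history B
```

Without the prior listing the only options are to guess or to ask the user, which is the role --resync plays here.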

Personally, I love this about bisync, and it's the only reason I feel comfortable trusting my important files to it. The day that --resync is removed from bisync is probably the day that I stop using bisync.

All of that said, I recognize that different users have different use cases and preferences, and some users may prefer convenience over safety. This could be sensible if, for example, you are just using bisync to sync your movie collection, which you could just re-download from the internet should you ever need to. This user's preference and standard for perfection will naturally be different than the user bisyncing the only surviving photos of his grandmother, or the novel he's been working on for 5 years, or other priceless data. (Yes, they should be making backups too, but the point still stands.) So, while I think bisync should choose "safe" by default, I would not be opposed to adding some optional flags to enable more of the "convenient" behavior. It actually would not be that difficult to add a -force newer mode that essentially never needs a --resync, and I'm not opposed to doing so. But I personally would never use it. And I'd want to make sure the docs include a big warning like the one Unison includes:

This preference should be used only if you are sure you know what you are doing!

There are also some tools that get around this issue by watching filesystem events -- something that rclone cannot yet do for the local backend. And even for such apps, they can still end up missing events that happen while they're not running, and so end up having to still do some small amount of "guessing".

So to conclude, my original point still stands: bidirectional sync should be symmetrical by default if the goal is to be in line with common understanding.

I think the goal is to provide the best bidirectional sync tool, and the one that is most consistent with rclone's design principles more generally -- and then make sure that any departures from "common understanding" are clearly articulated in the docs. (I also don't think that there is one standard "common understanding" on this particular issue. As I've pointed out, neither Dropbox nor Unison actually use something like copy --update for their default in a situation equivalent to --resync.)

Further, I think it's extremely bad practice for a tool to be by default asymmetrical then symmetrical.

I think we just see this one differently. I love that bisync has two different modes for different purposes -- I think it's one of the genius things about its design, and part of why I prefer it to other bidirectional sync tools. I also think that Dropbox, as far as I can tell, is first asymmetrical and then symmetrical after that. I see no problem with that choice, as long as it's documented clearly.

I must also add that Unison does not have at its disposal copy and sync as with rclone so options to change this behavior make a lot more sense.

If I'm understanding correctly, I think what you are saying is that because rclone copy exists, therefore I should have to run it as an extra step before --resync if I want that behavior. But wouldn't the reverse argument be just as true? i.e.: because rclone copy --update exists, therefore you should have to run it as an extra step before --resync if you want that behavior? (And to be clear -- that's not what I'm proposing. What I'm proposing is that --update should be supported in --resync as an optional non-default flag, the same as it is in rclone sync. I also remain open to making it the default, as long as there is a way to disable it, such as --prefer Path1.)

To summarize, I think what the decision comes down to is:

  • --update should be default if symmetry is the most important factor
  • --update should be non-default if consistency with other rclone commands is the most important factor
  • --update is best if your priority is newness at the individual file level
  • No --update is best if your priority is newness or atomicity at the filesystem level, or if one whole side is more trusted than the other
  • Either result could be achieved by running a combination of different rclone commands first, before running --resync
  • Both choices are defensible if documented clearly to users
  • IMO, whichever one is chosen as the default, the other should be supported with a flag. (Assuming the default is changeable at all, at this point.)

@crocinsocks

crocinsocks commented Oct 3, 2023

Dropbox (+ other sync tools)

Hi @nielash, you are missing why Dropbox does this in the first place. Dropbox created this feature as an add-on specifically for macOS and Windows users, as they consider these users to be less technically knowledgeable, which in general is true. The feature was added to protect someone who does not understand that Dropbox is always symmetrical, and that when syncing an old Dropbox folder, anything that may have been deleted will be re-added and any local change that has since been superseded by a newer one (i.e. changed, but not the newest) will be lost. As you can see, this does not apply to Linux, as they expect power users to understand this.

There is a simple workaround that many power users and sysadmins use on Windows and Mac with all these types of sync clients, as many make the same assumptions as Dropbox: rename the folder or choose another location for "live", and then move the files from "old" into the "live" Dropbox folder. As you can see, the newer files from the "old" folder or newer files from the cloud will always be chosen, and unchanged files left untouched. This is because Dropbox's (init) sync is equivalent to copy --update. Also, if you were to follow the Dropbox procedure you could still edit the "live" Dropbox folder with new and existing (e.g. made at the same path) files, so it's definitely not a sync command.

Hopefully that clears up why Dropbox does this, and that the (init) sync of their client is always copy --update (symmetrical), as with all others in this category. If someone were to restore a folder that is, for example, older than a few days, this could create too many unwanted items, but not because the sync is asymmetrical.

Resync

Let's take the most common use case for bisync (unless you disagree?): multiple devices/users are working on a central drive. Without knowing the prior state, one has to assume the central drive is always correct unless you have a new or newer file locally. Of course, for bisync to make more intelligent decisions it needs prior state. I'm struggling to find an example of bisync used in production where you would not want this to be the case. Otherwise, by nature you are making a destructive sync and assuming one side is more important, and if so my point is that one should not use bisync but another rclone command. At the end of the day, most people would be okay with some data they had deleted coming back (non-destructive) but not with newer files being overwritten with older ones (destructive) by default. Sure, your argument would be that conflicted files would not get created on first sync but rather lost, and I agree. But are most users actually going to sit there and go through every file that is different and manually handle conflicts (if they even exist) before syncing, or will they simply run --resync or another directional rclone command? Realistically, in a multi-user environment this is not an approachable task, and perhaps not even in a single-user environment. Bisync will never remove the need for archives regardless of approach, nor should it try.

I also want to comment on the safety of data: currently bisync's approach makes me s**t my pants at how destructive it is, and I have also seen others comment on this in the forums. Currently, safe is the last word I would use for bisync with regards to --resync.

Perhaps --resync 'newer' | 'path1' | 'path2' and then an --auto-resync flag? I'd be careful about repeating flags from Unison with a different meaning, unless you change -prefer to also affect conflict resolution in future syncs, and force to always mean force, which is why I suggest a parameter value for resync above.

In my view the only reason for --resync should be if prior state is lost. Filter changes, crashed syncs, etc. should not require this, though I appreciate work would have to be done to make this work. I can see the point that if you remove a filter rather than add one, perhaps a resync would be required, but only if you already have said files on both sides, a decision has to be made, and --auto-resync isn't present. Ideally this should only --resync the new paths (if any), so the last state for all the other files is still valid.

I think the above answers a lot of your statements as it again relies on the approach that most bisync tools are not symmetrical by default and as I've shown this is not the case; they are symmetrical by default and from first sync. I still would be concerned about a tool having both asymmetrical and symmetrical defaults. In almost all cases the odds that newer is not the "correct" way to handle an initial bidirectional sync is low. It seems unlikely most would want a directional and destructive sync from bisync and why I keep referring to sync or copy in these cases. However, if you believe that people really need a destructive sync under bisync (e.g. destructive because one-side is preferred in what should be a bidirectional sync) then I appreciate we will likely never agree on this and is why I have suggested --resync 'newer' | 'path1' | 'path2' to satisfy all needs.

@cjnaz
Contributor

cjnaz commented Oct 3, 2023

It's good to see some expansive and passionate discussion on this issue. (And there is a diminishing return on words.) I've followed this thread at a cursory level, so I'm not prepared to weigh in on the merits and accuracy of the analysis.

As a user, I want a minimum amount of damage. That's my guiding principle. So no guessing. Damage is either loss of files or (sometimes massive) reappearance of previously deleted files.

Are there any problems with the originally proposed change?

  rclone copy --update Path1 Path2
  rclone copy --update Path2 Path1

An alternate, good-guess resync method could be added with an option switch.

Whether the resync favors one path or the other is not a critical issue for me. What matters is to reestablish coherence with minimum damage.

Is the debate about which method should be the default? My vote is always to alert the user and make it her responsibility to sort it out when there is a problem, so don't change the default resync behavior.

@crocinsocks

crocinsocks commented Oct 3, 2023

@cjnaz I agree I should be better at shortening my responses (you should see the other thread between @nielash and myself - it's almost biblical 😂 but I believe in a good way, and it has definitely helped me, as I hope it has @nielash).

The original proposed change makes the most sense to me, as by default bidirectional sync should not have a preferred side, i.e. it should be symmetrical (the only exception would be when there is no mod time available, which has yet to be discussed more generally in this thread).

@nielash thinks that the default should be directional, which to me only makes sense under limited circumstances where one can instinctively declare that one's local copy is always best and all changes made on other devices should be ignored. The problem here is this would also take precedence over newer files, which is where I get worried for myself and other users of this tool, and which I believe would be "unusual" and fundamentally destructive.

Currently, --resync works in this manner, i.e. asymmetrically, which is the reason this ticket was opened, i.e. to move to symmetrical.

As a user, I want a minimum amount of damage. That's my guiding principle. So no guessing. Damage is either loss of files or (sometimes massive) reappearance of previously deleted files.

Regardless, both approaches can choose the wrong files (from the user's perspective) and recreate already-deleted directories, though this risk should be minimal unless one is --resync'ing a directory that is out of date and, more importantly, contains many files/folders that have already been deleted.

In my view the more destructive tool is the one that deliberately ignores newer files in favor of older files, and therefore --update should be used. That said, simply allowing the user to choose with something like --resync 'newer' | 'path1' | 'path2' would keep everybody happy and allow for all options, including where there is no modtime. I'd still put forward that 'newer' should be the default, which would be the same as the original suggestion, i.e.:

rclone copy --update Path1 Path2
rclone copy --update Path2 Path1

And where there is no modtime it would fail without a path being specified.

Collaborator

nielash commented Oct 4, 2023

Hi @cjnaz! Great to finally meet you, and thank you for all of your great work on rclonesync! I sometimes feel like I spend a lot of time in your brain these days, as I work on bisync 😂

Bisync does tend to elicit some passionate opinions. 😄 To make sure we're all on the same page: --resync is currently asymmetrical in favor of Path1. While cjnaz and ivandeex discussed changing this in 2021, it never actually got changed. The --resync code that was actually released is roughly equivalent to:

rclone copy Path2 Path1 --ignore-existing
rclone copy Path1 Path2
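
To make the asymmetry concrete, here is a toy Python model of those two steps. It is only a sketch, not bisync's actual code: each side is assumed to be a dict of name -> (modtime, content), and `copy` is a hypothetical stand-in for `rclone copy` that ignores filters, hashes, and modify windows.

```python
def copy(src, dst, ignore_existing=False):
    """Hypothetical stand-in for `rclone copy src dst`.

    Each side is a dict of name -> (modtime, content). Returns the new
    destination; like `rclone copy`, it never deletes anything from dst.
    """
    result = dict(dst)
    for name, version in src.items():
        if name in result and ignore_existing:
            continue  # --ignore-existing: leave files already on dst alone
        result[name] = version  # new or differing file: src overwrites dst
    return result


def resync_path1(path1, path2):
    """Model of the current --resync order, which favors Path1."""
    # rclone copy Path2 Path1 --ignore-existing
    path1 = copy(path2, path1, ignore_existing=True)
    # rclone copy Path1 Path2
    path2 = copy(path1, path2)
    return path1, path2


# Made-up example data: a.txt differs on both sides, the other files are one-sided.
path1 = {"a.txt": (1, "old a"), "only1.txt": (5, "x")}
path2 = {"a.txt": (9, "new a"), "only2.txt": (5, "y")}
new1, new2 = resync_path1(path1, path2)
print(new1["a.txt"])  # Path1's version survives even though Path2's is newer
```

In this model both sides end up identical, files that existed on only one side are copied to the other, and for files present on both sides the Path1 version always wins regardless of modtime.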

The questions before us now are:

  1. Should --resync be changed at all at this point, given that it has been one way for 2 years and changing it would be a breaking change?
  • My opinion: yes it should (if ncw gives his blessing). Bisync is still in beta and there seems to be consensus that the change is worth it.
  2. If so, what should be the new default behavior?
  • My opinion explained below, though I don't feel as strongly about this one, given the good workarounds.
  3. Should --resync be removed entirely, as proposed recently by @crocinsocks?
  • I feel strongly that no, --resync should not be removed. Instead, we should consider adding a non-default --auto-resync mode with lots of warnings.

As a user, I want a minimum amount of damage. That's my guiding principle. So no guessing. Damage is either loss of files or (sometimes massive) reappearance of previously deleted files.

100% agree on this.

Are there any problems with the originally proposed change?

 rclone copy --update Path1 Path2
 rclone copy --update Path2 Path1

From my perspective, there is a slight problem with it, but since there are good workarounds it's ultimately something I could live with. The problem, from my perspective, is that it evaluates each file independently at the file level, without considering the impact on the directory or filesystem level. It could easily cause "damage" for the kind of directory where atomicity matters -- like a source code repo or an audio/video project folder. For users that work with these kinds of directories, it's probably unlikely they'd want a result that is a "merged" directory with some files from both sides -- much more likely they'd want one whole directory from one side to overwrite the whole directory on the other.
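
As a rough illustration of that file-level behavior (a sketch under simplifying assumptions, not rclone's implementation), the toy model below treats each side as a dict of name -> (modtime, content). `copy_update` is a hypothetical stand-in for `rclone copy --update` that only looks at modtime, and the file names and contents are made up.

```python
def copy_update(src, dst):
    """Hypothetical stand-in for `rclone copy --update src dst`.

    Skips any file whose destination copy is newer; otherwise the source
    version is copied. Size and checksum comparisons are left out.
    """
    result = dict(dst)
    for name, (mtime, data) in src.items():
        if name in result and result[name][0] > mtime:
            continue  # destination is newer: skipped
        result[name] = (mtime, data)
    return result


def resync_newer(path1, path2):
    """Model of the proposed symmetrical resync."""
    path2 = copy_update(path1, path2)  # rclone copy --update Path1 Path2
    path1 = copy_update(path2, path1)  # rclone copy --update Path2 Path1
    return path1, path2


# Two internally consistent project states, A and B.
side_a = {"main.py": (20, "A main"), "lib.py": (10, "A lib")}
side_b = {"main.py": (15, "B main"), "lib.py": (18, "B lib")}
new_a, new_b = resync_newer(side_a, side_b)
print(new_a)  # main.py comes from A, lib.py from B
```

Both sides converge, but on a combination (A's main.py with B's lib.py) that neither side ever contained, which is the "merged" directory described above.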

In my personal experience of using bisync, I've found that in a scenario where a --resync is required, it's uncommon that both sides are partially right. It's much more common that one whole side is right and the other is wrong. A symmetrical --resync in this scenario would do more harm than good, because it would pollute the right side with things from the wrong side. (For context: my main use case is keeping my desktop and laptop both in sync with Google Drive as a single-user. I am usually working on only one side at a time, which is why there's usually an entire side that is more "up to date" at any given time.)

Suppose that I've been working on my desktop for a while, and something goes wrong that requires a --resync, and my desktop is "more correct" than drive but there are still a few files in drive that haven't been synced down yet. @crocinsocks is suggesting that in this scenario, what I should have to do is:

rclone copy local: drive: --filter-from /path/to/filters.txt
rclone copy drive: local: --filter-from /path/to/filters.txt
rclone bisync local: drive: --filters-file /path/to/filters.txt --resync

This works, but it's more steps, especially with making sure to get the filters right, and the --resync step would just be a formality (since the sides would already match.)

It also occurs to me that in more of a "star topology" setup, an asymmetrical --resync allows the hub to take priority over the spokes. Whereas with symmetrical, any one spoke could end up being given undue weight over all the other spokes.

I don't think it is as simple as "asymmetrical = destructive", because it depends on context. I've just described some scenarios where the symmetrical option would be the more destructive. I likewise find "newer = better" too simplistic, as it fails to consider intentional rollbacks to older versions, and the damage it could be causing to the repo as a whole by prioritizing the newness of individual files, without regard for their interdependence.

I also don't subscribe to the view that there is anything inherently "unsafe" about an asymmetrical tool, so long as it is clearly documented. Of course, any tool is "unsafe" if used without a full understanding of how it works. 🙂

All of that said, if there is consensus around the --resync newer option as the default, I am fine with that, especially if --resync path1 and --resync path2 also exist. (I would also propose that --resync older should probably exist too, although I suppose @crocinsocks will think that's heresy 😆 )

Ultimately, all of these concerns are substantially mitigated by the fact that --resync is never automatic, and the user always has the ability to perform manual cleanup first to override whatever we decide should be its default. That is the beauty of that design choice (and why I feel strongly that --resync should stay.)


crocinsocks commented Oct 4, 2023

Agreed @cjnaz thank you for providing such a brilliant tool which @nielash is now excellently managing/adding to!

Bisync does tend to elicit some passionate opinions. 😄
(I would also propose that --resync older should probably exist too, although I suppose @crocinsocks will think that's heresy 😆 )

I agree and you're not wrong but I've added to the below anyways 😆

Points in this thread (heavily generalised)

  1. So --resync can offer all options it would seem nielash and I are in agreement with the structure

--resync 'newer' | 'path1' | 'path2' | 'older'

I agree that for project directories (code, music, video, etc..), and perhaps some single-user environments, a 'path1'/'path2' resync can make a lot of sense otherwise for non-project/multi-user directories 'newer' does.

  2. That in an experimental/beta application breaking changes are valid. In theory, this could be implemented without breaking changes (mostly) by keeping the default as --resync 'path1' which perhaps at this stage is the preferred option. My reasoning for 'newer' as the default is I believe most coming to this tool will be coming from the usual suspects (Dropbox, OneDrive, etc..). I also agree, that as long as the option exists for 'newer' and is documented this covers all bases.

  3. I agree, that having --resync is useful and shouldn't be removed from a power-user tool such as this. The reason I mentioned --auto-resync 'newer' | ... / removal is bisync can be a little sensitive and requiring a --resync is always a little daunting. Short of the prior listing not being present this should be minimised ideally to 0. However, this is perhaps better spoken about in another issue? Things such as filter changes shouldn't require a resync when a filter is added and when removed should try to sync without requiring a resync or perhaps a --resync-filters flag so one does not have to run the risk of a complete --resync. This can also be seen with a crashed sync - though nielash and I have spoken about this in another issue. Another reason is a user is just going to run a resync with their preferred method anyways as I can't see someone sitting and going through every diff file and manually syncing/merging each one to get back in sync. Ultimately, if --resync can get to a place where it is seldom used then an --auto-resync flag seems unnecessary so perhaps should be ignored for now.

Decisions

It would seem everybody agrees that bisync could benefit from being optimised to use existing rclone copy. So ignoring point 3 for now the only thing I believe left to decide is should the default:

  • continue as asymmetric, i.e. --resync path1
rclone copy Path2 Path1 --ignore-existing
rclone copy Path1 Path2
  • change to symmetric, i.e. --resync newer
rclone copy --update Path1 Path2
rclone copy --update Path2 Path1

And are these changes something @ncw (and everyone else) would be happy with.

Thanks 🙏

EDIT: to show the commands of asymmetric and symmetric syncs

Collaborator

nielash commented Oct 5, 2023

  1. So --resync can offer all options it would seem nielash and I are in agreement with the structure

--resync 'newer' | 'path1' | 'path2' | 'older'

I agree that for project directories (code, music, video, etc..), and perhaps some single-user environments, a 'path1'/'path2' resync can make a lot of sense otherwise for non-project/multi-user directories 'newer' does.

I agree -- this is a sensible approach, and allows each user to choose the best option for their own use case.

Short of the prior listing not being present this should be minimised ideally to 0. However, this is perhaps better spoken about in another issue? Things such as filter changes shouldn't require a resync when a filter is added and when removed should try to sync without requiring a resync or perhaps a --resync-filters flag so one does not have to run the risk of a complete --resync.

I think I understand what you're getting at -- which is that a --resync is technically superfluous for certain kinds of session changes, and that requiring a --resync when superfluous poses dangers of its own. So, I agree it could be beneficial to consider decoupling the "user permission" pillar from the other two --resync pillars, when possible (perhaps similar to how --force currently serves that role for the --max-delete protection.) However, some filter changes really do require a --resync, if the change makes the filters more inclusive rather than less. Detecting this could be tricky, as a --filters-file can contain a combination of both include and exclude rules. I think we should consider this part of the Sessions project (#5678) and continue discussion about it there, independently from this ticket.

It would seem everybody agrees that bisync could benefit from being optimised to use existing rclone copy.

Just to clarify: it already uses the same underlying function as rclone copy, but it does so after doing some unnecessary work to implement its own --ignore-existing logic, which also causes problems with unicode normalization and --create-empty-src-dirs, instead of just setting ci.IgnoreExisting. There is a way to fix that part of it without breaking anything, so I think that should be done regardless of whatever else we decide here.

So ignoring point 3 for now the only thing I believe left to decide is should the default

Agree.

nielash added a commit to nielash/rclone that referenced this issue Nov 6, 2023


Before this change, --resync was handled in three steps, and needed to do a lot
of unnecessary work to implement its own --ignore-existing logic, which also
caused problems with unicode normalization, in addition to being pretty slow.
After this change, it is refactored to produce the same result much more
efficiently, by reducing the three steps to two and letting ci.IgnoreExisting
do the work instead of reinventing the wheel.

The behavior and sync order remain unchanged for now -- just faster (but see
the ongoing lively discussions about potential future changes in rclone#5681!)
nielash added a commit to nielash/rclone that referenced this issue Nov 8, 2023


nielash added a commit to nielash/rclone that referenced this issue Nov 9, 2023


nielash added a commit to nielash/rclone that referenced this issue Nov 11, 2023


nielash added a commit to nielash/rclone that referenced this issue Nov 11, 2023


nielash added a commit to nielash/rclone that referenced this issue Nov 12, 2023


nielash added a commit to nielash/rclone that referenced this issue Nov 12, 2023


nielash added a commit to nielash/rclone that referenced this issue Nov 15, 2023


nielash added a commit to nielash/rclone that referenced this issue Nov 22, 2023


nielash added a commit to nielash/rclone that referenced this issue Nov 22, 2023


nielash added a commit to nielash/rclone that referenced this issue Nov 22, 2023


nielash added a commit to nielash/rclone that referenced this issue Nov 23, 2023


nielash added a commit to nielash/rclone that referenced this issue Nov 30, 2023


nielash added a commit to nielash/rclone that referenced this issue Dec 4, 2023


nielash added a commit to nielash/rclone that referenced this issue Dec 8, 2023


nielash added a commit to nielash/rclone that referenced this issue Dec 8, 2023


nielash added a commit to nielash/rclone that referenced this issue Dec 24, 2023
Before this change, the path1 version of a file always prevailed during
--resync, and many users requested options to automatically select the winner
based on characteristics such as newer, older, larger, and smaller. This change
adds support for such options.

Note that ideally this feature would have been implemented by allowing the
existing `--resync` flag to optionally accept string values such as `--resync
newer`. However, this would have been a breaking change, as the existing flag
is a `bool` and it does not seem to be possible to have a `string` flag that
accepts both `--resync newer` and `--resync` (with no argument.) (`NoOptDefVal`
does not work for this, as it would force an `=` like `--resync=newer`.) So
instead, the best compromise to avoid a breaking change was to add a new
`--resync-mode CHOICE` flag that implies `--resync`, while maintaining the
existing behavior of `--resync` (which implies `--resync-mode path1`), i.e. both
flags are now valid, and either can be used without the other.

--resync-mode CHOICE

In the event that a file differs on both sides during a `--resync`,
`--resync-mode` controls which version will overwrite the other. The supported
options are similar to `--conflict-resolve`. For all of the following options,
the version that is kept is referred to as the "winner", and the version that
is overwritten (deleted) is referred to as the "loser". The options are named
after the "winner":

- `path1` - (the default) - the version from Path1 is unconditionally
considered the winner (regardless of `modtime` and `size`, if any). This can be
useful if one side is more trusted or up-to-date than the other, at the time of
the `--resync`.
- `path2` - same as `path1`, except the path2 version is considered the winner.
- `newer` - the newer file (by `modtime`) is considered the winner, regardless
of which side it came from. This may result in having a mix of some winners
from Path1, and some winners from Path2. (The implementation is analogous to
running `rclone copy --update` in both directions.)
- `older` - same as `newer`, except the older file is considered the winner,
and the newer file is considered the loser.
- `larger` - the larger file (by `size`) is considered the winner (regardless
of `modtime`, if any). This can be a useful option for remotes without
`modtime` support, or with the kinds of files (such as logs) that tend to grow
but not shrink, over time.
- `smaller` - the smaller file (by `size`) is considered the winner (regardless
of `modtime`, if any).

For all of the above options, note the following:
- If either of the underlying remotes lacks support for the chosen method, it
will be ignored and will fall back to the default of `path1`. (For example, if
`--resync-mode newer` is set, but one of the paths uses a remote that doesn't
support `modtime`.)
- If a winner can't be determined because the chosen method's attribute is
missing or equal, it will be ignored, and bisync will instead try to determine
whether the files differ by looking at the other `--compare` methods in effect.
(For example, if `--resync-mode newer` is set, but the Path1 and Path2 modtimes
are identical, bisync will compare the sizes.) If bisync concludes that they
differ, preference is given to whichever is the "source" at that moment. (In
practice, this gives a slight advantage to Path2, as the 2to1 copy comes before
the 1to2 copy.) If the files _do not_ differ, nothing is copied (as both sides
are already correct).
- These options apply only to files that exist on both sides (with the same
name and relative path). Files that exist *only* on one side and not the other
are *always* copied to the other, during `--resync` (this is one of the main
differences between resync and non-resync runs).
- `--conflict-resolve`, `--conflict-loser`, and `--conflict-suffix` do not
apply during `--resync`, and unlike these flags, nothing is renamed during
`--resync`. When a file differs on both sides during `--resync`, one version
always overwrites the other (much like in `rclone copy`.) (Consider using
`--backup-dir` to retain a backup of the losing version.)
- Unlike for `--conflict-resolve`, `--resync-mode none` is not a valid option
(or rather, it will be interpreted as "no resync", unless `--resync` has also
been specified, in which case it will be ignored.)
- Winners and losers are decided at the individual file-level only (there is
not currently an option to pick an entire winning directory atomically,
although the `path1` and `path2` options typically produce a similar result.)
- To maintain backward-compatibility, the `--resync` flag implies
`--resync-mode path1` unless a different `--resync-mode` is explicitly
specified. Similarly, all `--resync-mode` options (except `none`) imply
`--resync`, so it is not necessary to use both the `--resync` and
`--resync-mode` flags simultaneously -- either one is sufficient without the
other.
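
As a rough illustration of the rules above (a sketch only, not the code in this commit), the flag implications and the per-file winner choice could be modeled as follows. `Version`, `effective_mode`, and `pick_winner` are hypothetical names. The fallback to other `--compare` methods and the per-remote capability checks are deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    """One side's copy of a file; None means the attribute is unavailable."""
    modtime: Optional[float]
    size: Optional[int]


def effective_mode(resync: bool, resync_mode: Optional[str]) -> Optional[str]:
    """Return the resync mode in effect, or None when no resync is requested."""
    if resync_mode is None or resync_mode == "none":
        # --resync alone implies path1; `none` means "no resync" unless
        # --resync was also given, in which case `none` is ignored.
        return "path1" if resync else None
    return resync_mode  # any other --resync-mode implies --resync


def pick_winner(mode: str, v1: Version, v2: Version) -> Optional[str]:
    """Return 'path1' or 'path2', or None when the attribute cannot decide."""
    if mode in ("path1", "path2"):
        return mode  # unconditional, regardless of modtime and size
    if mode in ("newer", "older"):
        a, b = v1.modtime, v2.modtime
    elif mode in ("larger", "smaller"):
        a, b = v1.size, v2.size
    else:
        raise ValueError(f"unknown resync mode: {mode}")
    if a is None or b is None or a == b:
        return None  # missing or equal: bisync would try other compare methods
    path1_is_greater = a > b
    prefer_greater = mode in ("newer", "larger")
    return "path1" if path1_is_greater == prefer_greater else "path2"


mode = effective_mode(resync=False, resync_mode="newer")
print(mode, pick_winner(mode, Version(10.0, 3), Version(5.0, 7)))
```

Returning None for the undecided case keeps the sketch honest about what it does not model: per the notes above, bisync then compares the files by the other methods in effect and, if they differ, the current "source" prevails.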
@nielash nielash linked a pull request Dec 24, 2023 that will close this issue
5 tasks
nielash added a commit to nielash/rclone that referenced this issue Dec 24, 2023
Before this change, the path1 version of a file always prevailed during
--resync, and many users requested options to automatically select the winner
based on characteristics such as newer, older, larger, and smaller. This change
adds support for such options.

Note that ideally this feature would have been implemented by allowing the
existing `--resync` flag to optionally accept string values such as `--resync
newer`. However, this would have been a breaking change, as the existing flag
is a `bool` and it does not seem to be possible to have a `string` flag that
accepts both `--resync newer` and `--resync` (with no argument.) (`NoOptDefVal`
does not work for this, as it would force an `=` like `--resync=newer`.) So
instead, the best compromise to avoid a breaking change was to add a new
`--resync-mode CHOICE` flag that implies `--resync`, while maintaining the
existing behavior of `--resync` (which implies `--resync-mode path1`. i.e. both
flags are now valid, and either can be used without the other.

--resync-mode CHOICE

In the event that a file differs on both sides during a `--resync`,
`--resync-mode` controls which version will overwrite the other. The supported
options are similar to `--conflict-resolve`. For all of the following options,
the version that is kept is referred to as the "winner", and the version that
is overwritten (deleted) is referred to as the "loser". The options are named
after the "winner":

- `path1` - (the default) - the version from Path1 is unconditionally
considered the winner (regardless of `modtime` and `size`, if any). This can be
useful if one side is more trusted or up-to-date than the other, at the time of
the `--resync`.
- `path2` - same as `path1`, except the path2 version is considered the winner.
- `newer` - the newer file (by `modtime`) is considered the winner, regardless
of which side it came from. This may result in having a mix of some winners
from Path1, and some winners from Path2. (The implementation is analagous to
running `rclone copy --update` in both directions.)
- `older` - same as `newer`, except the older file is considered the winner,
and the newer file is considered the loser.
- `larger` - the larger file (by `size`) is considered the winner (regardless
of `modtime`, if any). This can be a useful option for remotes without
`modtime` support, or with the kinds of files (such as logs) that tend to grow
but not shrink, over time.
- `smaller` - the smaller file (by `size`) is considered the winner (regardless
of `modtime`, if any).

For all of the above options, note the following:
- If either of the underlying remotes lacks support for the chosen method, it
will be ignored and will fall back to the default of `path1`. (For example, if
`--resync-mode newer` is set, but one of the paths uses a remote that doesn't
support `modtime`.)
- If a winner can't be determined because the chosen method's attribute is
missing or equal, it will be ignored, and bisync will instead try to determine
whether the files differ by looking at the other `--compare` methods in effect.
(For example, if `--resync-mode newer` is set, but the Path1 and Path2 modtimes
are identical, bisync will compare the sizes.) If bisync concludes that they
differ, preference is given to whichever is the "source" at that moment. (In
practice, this gives a slight advantage to Path2, as the 2to1 copy comes before
the 1to2 copy.) If the files _do not_ differ, nothing is copied (as both sides
are already correct).
- These options apply only to files that exist on both sides (with the same
name and relative path). Files that exist *only* on one side and not the other
are *always* copied to the other, during `--resync` (this is one of the main
differences between resync and non-resync runs.).
- `--conflict-resolve`, `--conflict-loser`, and `--conflict-suffix` do not
apply during `--resync`, and unlike these flags, nothing is renamed during
`--resync`. When a file differs on both sides during `--resync`, one version
always overwrites the other (much like in `rclone copy`.) (Consider using
`--backup-dir` to retain a backup of the losing version.)
- Unlike for `--conflict-resolve`, `--resync-mode none` is not a valid option
(or rather, it will be interpreted as "no resync", unless `--resync` has also
been specified, in which case it will be ignored.)
- Winners and losers are decided at the individual file-level only (there is
not currently an option to pick an entire winning directory atomically,
although the `path1` and `path2` options typically produce a similar result.)
- To maintain backward-compatibility, the `--resync` flag implies
`--resync-mode path1` unless a different `--resync-mode` is explicitly
specified. Similarly, all `--resync-mode` options (except `none`) imply
`--resync`, so it is not necessary to use both the `--resync` and
`--resync-mode` flags simultaneously -- either one is sufficient without the
other.
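The way the two flags imply each other can be sketched as a small Go function. This is only an illustration of the rules listed above, not rclone's actual code: `resolveResync` and its parameters are hypothetical names, an empty string stands for "`--resync-mode` not given", and treating an ignored `none` as the `path1` default is an assumption based on the backward-compatibility note.

```go
package main

import "fmt"

// resolveResync is a hypothetical helper (not part of rclone) modelling how
// --resync and --resync-mode imply each other.
// resyncFlag is true when --resync was passed; mode is the --resync-mode value,
// or "" when that flag was not given.
func resolveResync(resyncFlag bool, mode string) (doResync bool, effectiveMode string) {
	if mode == "" || mode == "none" {
		// No usable mode was chosen: resync only happens if --resync was given,
		// in which case "none" is ignored and the default (path1) is assumed.
		if resyncFlag {
			return true, "path1"
		}
		return false, ""
	}
	// Any other mode implies --resync, whether or not --resync was passed.
	return true, mode
}

func main() {
	cases := []struct {
		resync bool
		mode   string
	}{
		{true, ""},
		{false, "newer"},
		{false, "none"},
		{true, "none"},
	}
	for _, c := range cases {
		do, m := resolveResync(c.resync, c.mode)
		fmt.Printf("--resync=%v --resync-mode=%q -> resync=%v mode=%q\n", c.resync, c.mode, do, m)
	}
}
```

Keeping the resolution in one place like this makes it easy to see that either flag on its own is enough to trigger a resync.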
@nielash nielash moved this from To do to In progress in bisync Dec 24, 2023
nielash added a commit that referenced this issue Jan 20, 2024
Before this change, --resync was handled in three steps, and needed to do a lot
of unnecessary work to implement its own --ignore-existing logic, which also
caused problems with Unicode normalization, in addition to being pretty slow.
After this change, it is refactored to produce the same result much more
efficiently, by reducing the three steps to two and letting ci.IgnoreExisting
do the work instead of reinventing the wheel.

The behavior and sync order remain unchanged for now -- just faster (but see
the ongoing lively discussions about potential future changes in #5681!)
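
As a rough illustration of the two-step idea, here is a toy model in Go that uses in-memory maps in place of remotes. It is a sketch under assumptions, not the real implementation: `copyFiles` and `resyncPath1Wins` are hypothetical names, and the step order (a Path2 to Path1 copy that skips existing files, then a Path1 to Path2 copy) is inferred from the description in this thread.

```go
package main

import "fmt"

// copyFiles is a toy stand-in for a one-way copy between two in-memory
// "remotes" (file name -> content). With ignoreExisting set, files that
// already exist at the destination are left untouched.
func copyFiles(src, dst map[string]string, ignoreExisting bool) {
	for name, content := range src {
		if _, exists := dst[name]; exists && ignoreExisting {
			continue
		}
		dst[name] = content
	}
}

// resyncPath1Wins models the two steps (hypothetical helper):
//  1. copy Path2 -> Path1, ignoring files that already exist on Path1;
//  2. copy Path1 -> Path2, overwriting files that differ.
// The net effect is that both sides hold the union of files, and the Path1
// version prevails wherever a file exists on both sides.
func resyncPath1Wins(path1, path2 map[string]string) {
	copyFiles(path2, path1, true)
	copyFiles(path1, path2, false)
}

func main() {
	path1 := map[string]string{"a.txt": "one", "both.txt": "from1"}
	path2 := map[string]string{"b.txt": "two", "both.txt": "from2"}
	resyncPath1Wins(path1, path2)
	fmt.Println(path1)
	fmt.Println(path2)
}
```

Letting the generic "ignore existing" option do the filtering in step 1 is what removes the need for a separate pass that works out which Path2 files are missing on Path1.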
nielash added a commit to nielash/rclone that referenced this issue Jan 20, 2024
Before this change, the path1 version of a file always prevailed during
--resync, and many users requested options to automatically select the winner
based on characteristics such as newer, older, larger, and smaller. This change
adds support for such options.

Note that ideally this feature would have been implemented by allowing the
existing `--resync` flag to optionally accept string values such as `--resync
newer`. However, this would have been a breaking change, as the existing flag
is a `bool` and it does not seem to be possible to have a `string` flag that
accepts both `--resync newer` and `--resync` (with no argument.) (`NoOptDefVal`
does not work for this, as it would force an `=` like `--resync=newer`.) So
instead, the best compromise to avoid a breaking change was to add a new
`--resync-mode CHOICE` flag that implies `--resync`, while maintaining the
existing behavior of `--resync` (which implies `--resync-mode path1`), i.e. both
flags are now valid, and either can be used without the other.

--resync-mode CHOICE

In the event that a file differs on both sides during a `--resync`,
`--resync-mode` controls which version will overwrite the other. The supported
options are similar to `--conflict-resolve`. For all of the following options,
the version that is kept is referred to as the "winner", and the version that
is overwritten (deleted) is referred to as the "loser". The options are named
after the "winner":

- `path1` - (the default) - the version from Path1 is unconditionally
considered the winner (regardless of `modtime` and `size`, if any). This can be
useful if one side is more trusted or up-to-date than the other, at the time of
the `--resync`.
- `path2` - same as `path1`, except the Path2 version is considered the winner.
- `newer` - the newer file (by `modtime`) is considered the winner, regardless
of which side it came from. This may result in having a mix of some winners
from Path1, and some winners from Path2. (The implementation is analogous to
running `rclone copy --update` in both directions.)
- `older` - same as `newer`, except the older file is considered the winner,
and the newer file is considered the loser.
- `larger` - the larger file (by `size`) is considered the winner (regardless
of `modtime`, if any). This can be a useful option for remotes without
`modtime` support, or with the kinds of files (such as logs) that tend to grow
but not shrink, over time.
- `smaller` - the smaller file (by `size`) is considered the winner (regardless
of `modtime`, if any).

For all of the above options, note the following:
- If either of the underlying remotes lacks support for the chosen method, it
will be ignored and will fall back to the default of `path1`. (For example, if
`--resync-mode newer` is set, but one of the paths uses a remote that doesn't
support `modtime`.)
- If a winner can't be determined because the chosen method's attribute is
missing or equal, it will be ignored, and bisync will instead try to determine
whether the files differ by looking at the other `--compare` methods in effect.
(For example, if `--resync-mode newer` is set, but the Path1 and Path2 modtimes
are identical, bisync will compare the sizes.) If bisync concludes that they
differ, preference is given to whichever is the "source" at that moment. (In
practice, this gives a slight advantage to Path2, as the 2to1 copy comes before
the 1to2 copy.) If the files _do not_ differ, nothing is copied (as both sides
are already correct).
- These options apply only to files that exist on both sides (with the same
name and relative path). Files that exist *only* on one side and not the other
are *always* copied to the other, during `--resync` (this is one of the main
differences between resync and non-resync runs).
- `--conflict-resolve`, `--conflict-loser`, and `--conflict-suffix` do not
apply during `--resync`, and unlike these flags, nothing is renamed during
`--resync`. When a file differs on both sides during `--resync`, one version
always overwrites the other (much like in `rclone copy`.) (Consider using
`--backup-dir` to retain a backup of the losing version.)
- Unlike for `--conflict-resolve`, `--resync-mode none` is not a valid option
(or rather, it will be interpreted as "no resync", unless `--resync` has also
been specified, in which case it will be ignored.)
- Winners and losers are decided at the individual file-level only (there is
not currently an option to pick an entire winning directory atomically,
although the `path1` and `path2` options typically produce a similar result.)
- To maintain backward-compatibility, the `--resync` flag implies
`--resync-mode path1` unless a different `--resync-mode` is explicitly
specified. Similarly, all `--resync-mode` options (except `none`) imply
`--resync`, so it is not necessary to use both the `--resync` and
`--resync-mode` flags simultaneously -- either one is sufficient without the
other.
nielash added a commit that referenced this issue Jan 20, 2024
@nielash nielash moved this from In progress to Done in bisync Jan 20, 2024
miku added a commit to internetarchive/rclone that referenced this issue Jan 23, 2024
* master: (86 commits)
  fs: add more detailed logging for file includes/excludes
  bisync: add --resync-mode for customizing --resync - fixes rclone#5681
  bisync: fix --colors flag
  bisync: factor resync to separate file
  bisync: skip empty test case dirs
  bisync: add options to auto-resolve conflicts - fixes rclone#7471
  bisync: check for syntax errors in path args - fixes rclone#7511
  bisync: add overlapping paths check
  bisync: allow lock file expiration/renewal with --max-lock - rclone#7470
  bisync: Graceful Shutdown, --recover from interruptions without --resync - fixes rclone#7470
  bisync: full support for comparing checksum, size, modtime - fixes rclone#5679 fixes rclone#5683 fixes rclone#5684 fixes rclone#5675
  bisync: document beta status more clearly - fixes rclone#6082
  bisync: normalize session name to non-canonical - fixes rclone#7423
  bisync: update version number in docs
  bisync: account for differences in backend features on integration tests - see rclone#5679
  operations: fix renaming a file on macOS
  bisync: fallback to cryptcheck or --download when can't check hash
  local: fix cleanRootPath on Windows after go1.21.4 stdlib update
  bisync: support two --backup-dir paths on different remotes
  bisync: support files with unknown length, including Google Docs - fixes rclone#5696
  ...
WuTofu pushed a commit to WuTofu/rclone that referenced this issue Feb 24, 2024


WuTofu pushed a commit to WuTofu/rclone that referenced this issue Feb 24, 2024