bisync: optimize --resync #5681
I think the core question here is whether rclonesync's asymmetrical --first-sync solves some subtle problem. I think the answer is no, none that I can recall. When I implemented the current --first-sync method I think I was too embedded in the new algorithm (building file lists for raw copies/deletes) that I missed the simpler picture. I'd say go for this method...
Unit tests and documentation will be updated accordingly.
bisync: bug fixes and new features including --create-empty-src-dirs and equality check

* Fixed an issue causing dry runs to inadvertently commit filter changes
* Fixed an issue causing --resync to erroneously delete empty folders and duplicate files unique to Path2
* --check-access is now enforced during --resync, preventing data loss in certain user error scenarios
* Fixed an issue causing bisync to consider more files than necessary due to overbroad filters during delete operations
* Improved detection of false positive change conflicts (identical files are now left alone instead of renamed)
* Added support for --create-empty-src-dirs
* Added experimental --resilient mode to allow recovery from self-correctable errors
* Added new --ignore-listing-checksum flag to distinguish from --ignore-checksum
* Performance improvements for large remotes
* Documentation and testing improvements

Fixes rclone#6109
Also addresses: rclone#6841 rclone#5683 rclone#5681 rclone#5676 rclone#5675 rclone#5674
See also: https://forum.rclone.org/t/bisync-bugs-and-feature-requests/37636
I had the same thought recently: rclone/cmd/bisync/operations.go Lines 426 to 433 in 156c372
I tend to think it would be better to change it, as in addition to being simpler and more symmetrical, it would also make Perhaps the best compromise is to make the behavior configurable with a flag? |
@nielash I've moved my comments over from #7332 regarding bisync.. I agree with both @ivandeex @cjnaz that the correct implementation for bisync should be:
This is what would happen if I were using Dropbox, OneDrive, GDrive, etc. (i.e. the newest file always survives on init, including where the local directory already has data). As bisync is still experimental, I see no harm in making this the default behavior. I also believe this would be in line with the expectations of new bisync users and would benefit existing users. It also makes a --resync far less destructive. Currently, Path1 always takes precedence, which I believe is the last thing anyone (existing or new users) would want from a bisync; rather, they are stuck with the current implementation. If there are current users who would like to keep this functionality, perhaps there could be a --legacy flag, or they could simply run a sync manually to get the desired effect, based on your comments:
Thanks 🙏 |
I generally agree as well, although I have a slight concern about the use of
I can think of a few reasons a user might want the asymmetry -- I described one of them above. Imagine, for example, that I pull my dusty old laptop out of the closet which hasn't been synced in a long time, and I want to bring it up to date with my Google Drive which I know is totally up-to-date. In this case, I do in fact want the trusted Another use case (which I've used myself) is safer preservation of metadata that rclone doesn't currently support, such as directory metadata, macOS xattrs, and permissions. An asymmetrical sync allows you to avoid overwriting that stuff on the trusted side, when possible. Additionally, it should be noted that bisync's current approximation of I tend to think that rclone's default (which is still slightly asymmetrical) should be bisync's default:
but that this should be overridable with flags for users who think this is madness 😄 @ncw I think this is ultimately a question for you -- what is your guiding principle for breaking changes such as this, where it's not exactly a bug, but there does seem to be consensus that a different way would be better? Is it ok to change the behavior, provided it's documented clearly? (And is the calculation different because bisync is still in beta?) |
The problem here is you are using the wrong tool. Bisync should be bisync not sometimes bisync. Let me explain. For the below standard means: Dropbox, OneDrive, GDrive, Unison, etc...
So in your example you have two options, remove the data or run a With regards to directory metadata there is already a ticket raised #6685 to handle this and again would not conflate these two issues as this will be solved once metadata is added to the directory where possible or where not the use of a As we are talking about a Ultimately, there should be one way for bisync to work which is using --update flag as that would make it bidirectional and inline with expectations e.g. the same as every other bidirectional syncing tool. If someone wants to do another type of sync they should use Maybe @ncw could give his input on this matter too alongside confirming whether such a change can take place. |
I think it is an oversimplification to say that there is one "standard" way of doing this. For example, Unison has the Also, the "standard - remove dir and run" that you cite for Dropbox and GDrive would be more equivalent to:
which is still different from the
So it actually seems that you are suggesting a different default than what the above tools use (closest would be Unison with You are correct that in my example about the laptop, I could have first run Fair points about the metadata and speed.
I'm generally wary of forcing a specific flag on users without giving them a way to change it. If we go with |
I disagree there is definitely a default way of doing bidirectional file sync and it is symmetrical based on the tools that most people use day-to-day and in which sys admins maintain e.g. the main three Dropbox, Google Drive, OneDrive then others such as Amazon Drive, NextCloud, Seafile, OwnCloud, etc.. Though you are correct I was wrong to put Unison in this category; it's been a while since I last used. Though it is hard to compare as
This is untrue. Whilst it may give the illusion of a sync command because you are starting with an empty directory in essence you are still doing a But then there is the argument why use So to conclude my original point still stands bidirectional sync should be symmetrical by default if the goal is to be inline with common understanding. Further, I think it's extremely bad practice for a tool to be by default asymmetrical then symmetrical. Unison by default is asymmetrical at all times not a combination of the two. |
@crocinsocks I took some time today refreshing my memory on how Dropbox handles this scenario. It is documented here: Why does Dropbox need to “sync my files again”?, and it matches the behavior I observed when I tried it myself. In relevant part:
So, it looks like what Dropbox actually does is equivalent to this:
The In other words, Dropbox's equivalent to I also tried signing out and doing a fresh install to a location that already had existing files, to see what it would do. Every time, it refused with the following error messages: So, not only did it not give precedence to the newer file on my machine, it also didn't even keep files that were only on my machine. I don't doubt that Dropbox does behave symmetrically after it is fully set-up (as does bisync), but as we are only discussing I also continue to think that what Dropbox does is a little bit beside the point, because we are not obligated to follow exactly what Dropbox does. I use rclone and bisync instead of alternative tools because I like how it works better 🙂
To back up just slightly:
There are various bisync alternatives out there that don't have a Some tools also have some method of making an educated guess to resolve conflicts between two file versions (such as guessing that the version with the newest modtime is "best"), but a file conflict is not actually required for trouble to arise. Consider the following thought experiment: you're a bidirectional sync tool and you come across a folder you've never seen before. On
The answer is that you can't know this with certainty -- you can only guess. Most tools that engage in this type of guessing will guess that Bisync, on the other hand, just refuses to guess in such a scenario. It instead requires the user to make the decision, by refusing to run again until the user manually runs Personally, I love this about bisync, and it's the only reason I feel comfortable trusting my important files to it. The day that All of that said, I recognize that different users have different use cases and preferences, and some users may prefer convenience over safety. This could be sensible if, for example, you are just using bisync to sync your movie collection, which you could just re-download from the internet should you ever need to. This user's preference and standard for perfection will naturally be different than the user bisyncing the only surviving photos of his grandmother, or the novel he's been working on for 5 years, or other priceless data. (Yes, they should be making backups too, but the point still stands.) So, while I think bisync should choose "safe" by default, I would not be opposed to adding some optional flags to enable more of the "convenient" behavior. It actually would not be that difficult to add a
There are also some tools that get around this issue by watching filesystem events -- something that rclone cannot yet do for the
I think the goal is to provide the best bidirectional sync tool, and the one that is most consistent with rclone's design principles more generally -- and then make sure that any departures from "common understanding" are clearly articulated in the docs. (I also don't think that there is one standard "common understanding" on this particular issue. As I've pointed out, neither Dropbox nor Unison actually use something like
I think we just see this one differently. I love that bisync has two different modes for different purposes -- I think it's one of the genius things about its design, and part of why I prefer it to other bidirectional sync tools. I also think that Dropbox, as far as I can tell, is first asymmetrical and then symmetrical after that. I see no problem with that choice, as long as it's documented clearly.
If I'm understanding correctly, I think what you are saying is that because To summarize, I think what the decision comes down to is:
Dropbox (+ other sync tools)

Hi @nielash, you are missing why Dropbox does this in the first place. Dropbox created this feature as an add-on specifically for macOS and Windows users, as they consider these users to be less technically knowledgeable, which in general is true. The feature was added to protect someone who does not understand that Dropbox is always symmetrical, and that when syncing an old Dropbox folder, stuff that may have been deleted will be re-added, and any changes where later changes have been made elsewhere (e.g. the file was changed, but is not the newest) will be lost. As you can see, this does not apply to Linux, as they expect power users to understand this. There is a simple workaround which many power users and sysadmins use on Windows and Mac with all these types of sync clients, as many make the same assumptions as Dropbox: rename the folder or choose another location for "live", and then move the files from "old" into the "live" Dropbox folder. As you can see, the newer files from the "old" folder or newer files from the cloud will always be chosen, and unchanged files left untouched. This is because dropbox (init) sync is equivalent to

Resync

Let's take the most common use case for bisync (unless you disagree?). Multiple devices/users are working on a central drive; without knowing prior state, one has to assume the central drive is always correct unless you have a "new" or newer file locally. Of course, for bisync to make more intelligent decisions it needs prior state. I'm struggling to find an example of bisync used in production where you would not want this to be the case. Otherwise, by nature you are making a destructive sync and assuming one side is more important, and if so, my point is that one should not use bisync but another rclone command in this case. At the end of the day, most people would be okay with some data they had deleted coming back (non-destructive) but not with newer files being overwritten by older ones (destructive) by default.
Sure, your argument would be that conflicted files would not get created on first sync rather lost and I agree. But are most users actually going to sit there and go through every file that is different and manually handle conflicts (if they even exist) before sync or simply run with I also want to comment on the safety of data, currently bisyncs approach makes me s**t my pants at how destructive it is and I have also seen others comment on this in the forums. Currently, safe is the last word I would use for bisync with regards to --resync. Perhaps In my view the only reason for --resync should be if prior state is lost. Filter changes, crashed sync's etc should not require this though appreciate work would have to be done to make this work. Though, I can see the point that if you remove a filter rather than add perhaps a resync would be required but only if you already have said files on both sides and decision has to made and I think the above answers a lot of your statements as it again relies on the approach that most bisync tools are not symmetrical by default and as I've shown this is not the case; they are symmetrical by default and from first sync. I still would be concerned about a tool having both asymmetrical and symmetrical defaults. In almost all cases the odds that newer is not the "correct" way to handle an initial bidirectional sync is low. It seems unlikely most would want a directional and destructive sync from bisync and why I keep referring to sync or copy in these cases. However, if you believe that people really need a destructive sync under bisync (e.g. destructive because one-side is preferred in what should be a bidirectional sync) then I appreciate we will likely never agree on this and is why I have suggested |
It's good to see some expansive and passionate discussion on this issue. (And there is a diminishing return on words.) I've followed this thread at a cursory level, so I'm not prepared to weigh in on the merits and accuracy of the analysis. As a user, I want a minimum amount of damage. That's my guiding principle. So no guessing. Damage is either loss of files or (sometimes massive) reappearance of previously deleted files. Are there any problems with the originally proposed change?
An alternate, good-guess resync method could be added with an option switch. Whether the resync favors one path or the other is not a critical issue for me. What matters is to reestablish coherence with minimum damage. Is the debate about which method should be the default? My vote is always to alert the user and make it her responsibility to sort it out when there is a problem, so don't change the default resync behavior.
@cjnaz I agree I should be better at shortening my responses (you should see the other thread between @nielash and myself; it's almost biblical 😂, but I believe in a good way, and it has definitely helped me, as I hope it has @nielash). The originally proposed change makes the most sense to me, as by default a bidirectional sync should not have a preferred side, i.e. it should be symmetrical (the only exception would be when there is no modtime available, which has yet to be discussed more generally in this thread). @nielash thinks that the default should be directional, which to me only makes sense under limited circumstances where one can instinctively declare that the local copy is always best and all changes made on other devices should be ignored. The problem here is that this would also take precedence over newer files, which is where I get worried for myself and other users of this tool, and which I believe would be "unusual" and fundamentally destructive. Currently, --resync works in this manner (i.e. asymmetrically), and that is the reason this ticket was opened: to move to symmetrical
Regardless, both have the chance of choosing the wrong files (from the user's perspective) and re-creating already deleted directories, though this should be minimized if you are not --resync'ing a directory that's out-of-date and, more importantly, contains many files/folders that have already been deleted. In my view the more destructive tool is the one that deliberately ignores newer files in favor of older files and therefore
And where there is no modtime it would fail without a path being specified. |
Hi @cjnaz! Great to finally meet you, and thank you for all of your great work on Bisync does tend to elicit some passionate opinions. 😄 To make sure we're all on the same page:
The questions before us now are:
100% agree on this.
From my perspective, there is a slight problem with it, but since there are good workarounds it's ultimately something I could live with. The problem, from my perspective, is that it evaluates each file independently at the file level, without considering the impact on the directory or filesystem level. It could easily cause "damage" for the kind of directory where atomicity matters -- like a source code repo or an audio/video project folder. For users that work with these kinds of directories, it's probably unlikely they'd want a result that is a "merged" directory with some files from both sides -- much more likely they'd want one whole directory from one side to overwrite the whole directory on the other. In my personal experience of using bisync, I've found that in a scenario where a Suppose that I've been working on my desktop for awhile, and something goes wrong that requires a
This works, but it's more steps, especially with making sure to get the filters right, and the It also occurs to me that in more of a "star topology" setup, an asymmetrical I don't think it is as simple as "asymmetrical = destructive", because it depends on context. I've just described some scenarios where the symmetrical option would be the more destructive. I likewise find "newer = better" too simplistic, as it fails to consider intentional rollbacks to older versions, and the damage it could be causing to the repo as a whole by prioritizing the newness of individual files, without regard for their interdependence. I also don't subscribe to the view that there is anything inherently "unsafe" about an asymmetrical tool, so long as it is clearly documented. Of course, any tool is "unsafe" if used without a full understanding of how it works. 🙂 All of that said, if there is consensus around the Ultimately, all of these concerns are substantially mitigated by the fact that |
Agreed @cjnaz thank you for providing such a brilliant tool which @nielash is now excellently managing/adding to!
I agree and you're not wrong but I've added to the below anyways 😆 Points in this thread (heavily generalised)
I agree that for project directories (code, music, video, etc..), and perhaps some single-user environments, a 'path1'/'path2' resync can make a lot of sense otherwise for non-project/multi-user directories 'newer' does.
Decisions It would seem everybody agrees that bisync could benefit from being optimised to use existing
And are these changes something @ncw (and everyone else) would be happy with. Thanks 🙏 EDIT: to show the commands of asymmetric and symmetric syncs |
I agree -- this is a sensible approach, and allows each user to choose the best option for their own use case.
I think I understand what you're getting at -- which is that a
Just to clarify: it already uses the same underlying function as
Agree. |
Before this change, --resync was handled in three steps, and needed to do a lot of unnecessary work to implement its own --ignore-existing logic, which also caused problems with unicode normalization, in addition to being pretty slow. After this change, it is refactored to produce the same result much more efficiently, by reducing the three steps to two and letting ci.IgnoreExisting do the work instead of reinventing the wheel. The behavior and sync order remain unchanged for now -- just faster (but see the ongoing lively discussions about potential future changes in rclone#5681!)
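As a rough illustration of the two-step flow this commit message describes, here is a minimal sketch that models each path as an in-memory dict of filename to content. The function names `copy` and `resync` are hypothetical and are not rclone's API; the step order (Path2 to Path1 with ignore-existing first, then Path1 to Path2) is an assumption based on the later note in this thread that the 2to1 copy comes before the 1to2 copy.

```python
def copy(src, dst, ignore_existing=False):
    """Return a new dict: dst after copying every file from src into it.

    With ignore_existing=True, files already present in dst are skipped,
    which is the behavior the commit delegates to the ignore-existing option.
    """
    result = dict(dst)
    for name, content in src.items():
        if ignore_existing and name in result:
            continue  # leave the destination's version untouched
        result[name] = content
    return result


def resync(path1, path2):
    """Hypothetical two-step resync in which Path1 prevails on differences."""
    # Step 1 (assumed order): Path2 -> Path1, skipping anything Path1 has.
    new_path1 = copy(path2, path1, ignore_existing=True)
    # Step 2: Path1 -> Path2, overwriting so Path1's versions win.
    new_path2 = copy(new_path1, path2)
    return new_path1, new_path2


if __name__ == "__main__":
    p1 = {"a.txt": "one", "shared.txt": "from-path1"}
    p2 = {"b.txt": "two", "shared.txt": "from-path2"}
    print(resync(p1, p2))
```

In this model, files unique to either side end up on both sides, and a file that differs keeps its Path1 content, which matches the "behavior remains unchanged" claim only to the extent that the assumed step order is right.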
Before this change, the path1 version of a file always prevailed during --resync, and many users requested options to automatically select the winner based on characteristics such as newer, older, larger, and smaller. This change adds support for such options.

Note that ideally this feature would have been implemented by allowing the existing `--resync` flag to optionally accept string values such as `--resync newer`. However, this would have been a breaking change, as the existing flag is a `bool` and it does not seem to be possible to have a `string` flag that accepts both `--resync newer` and `--resync` (with no argument). (`NoOptDefVal` does not work for this, as it would force an `=` like `--resync=newer`.) So instead, the best compromise to avoid a breaking change was to add a new `--resync-mode CHOICE` flag that implies `--resync`, while maintaining the existing behavior of `--resync` (which implies `--resync-mode path1`), i.e. both flags are now valid, and either can be used without the other.

--resync-mode CHOICE

In the event that a file differs on both sides during a `--resync`, `--resync-mode` controls which version will overwrite the other. The supported options are similar to `--conflict-resolve`. For all of the following options, the version that is kept is referred to as the "winner", and the version that is overwritten (deleted) is referred to as the "loser". The options are named after the "winner":

- `path1` - (the default) - the version from Path1 is unconditionally considered the winner (regardless of `modtime` and `size`, if any). This can be useful if one side is more trusted or up-to-date than the other, at the time of the `--resync`.
- `path2` - same as `path1`, except the path2 version is considered the winner.
- `newer` - the newer file (by `modtime`) is considered the winner, regardless of which side it came from. This may result in having a mix of some winners from Path1, and some winners from Path2. (The implementation is analogous to running `rclone copy --update` in both directions.)
- `older` - same as `newer`, except the older file is considered the winner, and the newer file is considered the loser.
- `larger` - the larger file (by `size`) is considered the winner (regardless of `modtime`, if any). This can be a useful option for remotes without `modtime` support, or with the kinds of files (such as logs) that tend to grow but not shrink, over time.
- `smaller` - the smaller file (by `size`) is considered the winner (regardless of `modtime`, if any).

For all of the above options, note the following:

- If either of the underlying remotes lacks support for the chosen method, it will be ignored and will fall back to the default of `path1`. (For example, if `--resync-mode newer` is set, but one of the paths uses a remote that doesn't support `modtime`.)
- If a winner can't be determined because the chosen method's attribute is missing or equal, it will be ignored, and bisync will instead try to determine whether the files differ by looking at the other `--compare` methods in effect. (For example, if `--resync-mode newer` is set, but the Path1 and Path2 modtimes are identical, bisync will compare the sizes.) If bisync concludes that they differ, preference is given to whichever is the "source" at that moment. (In practice, this gives a slight advantage to Path2, as the 2to1 copy comes before the 1to2 copy.) If the files _do not_ differ, nothing is copied (as both sides are already correct).
- These options apply only to files that exist on both sides (with the same name and relative path). Files that exist *only* on one side and not the other are *always* copied to the other, during `--resync` (this is one of the main differences between resync and non-resync runs).
- `--conflict-resolve`, `--conflict-loser`, and `--conflict-suffix` do not apply during `--resync`, and unlike these flags, nothing is renamed during `--resync`. When a file differs on both sides during `--resync`, one version always overwrites the other (much like in `rclone copy`). (Consider using `--backup-dir` to retain a backup of the losing version.)
- Unlike for `--conflict-resolve`, `--resync-mode none` is not a valid option (or rather, it will be interpreted as "no resync", unless `--resync` has also been specified, in which case it will be ignored).
- Winners and losers are decided at the individual file-level only (there is not currently an option to pick an entire winning directory atomically, although the `path1` and `path2` options typically produce a similar result).
- To maintain backward-compatibility, the `--resync` flag implies `--resync-mode path1` unless a different `--resync-mode` is explicitly specified. Similarly, all `--resync-mode` options (except `none`) imply `--resync`, so it is not necessary to use both the `--resync` and `--resync-mode` flags simultaneously -- either one is sufficient without the other.
Before this change, --resync was handled in three steps, and needed to do a lot of unnecessary work to implement its own --ignore-existing logic, which also caused problems with unicode normalization, in addition to being pretty slow. After this change, it is refactored to produce the same result much more efficiently, by reducing the three steps to two and letting ci.IgnoreExisting do the work instead of reinventing the wheel. The behavior and sync order remain unchanged for now -- just faster (but see the ongoing lively discussions about potential future changes in #5681!)
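As a rough illustration of the two-step approach, here is a minimal in-memory sketch in Go. It is not rclone's implementation: `tree`, `copyTree`, and `resync` are hypothetical names, and it assumes that the first step copies Path2 to Path1 while skipping files that already exist (the role played by `ci.IgnoreExisting`), followed by a plain copy from Path1 to Path2.

```go
package main

import "fmt"

// tree maps a relative path to file contents; a hypothetical stand-in for a remote.
type tree map[string]string

// copyTree copies every file from src into dst. When ignoreExisting is set,
// files already present in dst are left untouched.
func copyTree(src, dst tree, ignoreExisting bool) {
	for name, data := range src {
		if _, exists := dst[name]; exists && ignoreExisting {
			continue
		}
		dst[name] = data
	}
}

// resync performs the two steps: Path2 -> Path1 without overwriting,
// then Path1 -> Path2 overwriting anything that differs.
func resync(path1, path2 tree) {
	copyTree(path2, path1, true)
	copyTree(path1, path2, false)
}

func main() {
	path1 := tree{"a.txt": "one", "shared.txt": "from path1"}
	path2 := tree{"b.txt": "two", "shared.txt": "from path2"}
	resync(path1, path2)
	// Both sides now hold three files, and the Path1 version of shared.txt prevails.
	fmt.Println(len(path1), len(path2), path2["shared.txt"])
}
```

Under these assumptions, files unique to either side survive on both, and a file that differs on both sides ends up with the Path1 contents, matching the "path1 always prevails" behavior described for `--resync` at that point.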
* master: (86 commits)
  fs: add more detailed logging for file includes/excludes
  bisync: add --resync-mode for customizing --resync - fixes rclone#5681
  bisync: fix --colors flag
  bisync: factor resync to separate file
  bisync: skip empty test case dirs
  bisync: add options to auto-resolve conflicts - fixes rclone#7471
  bisync: check for syntax errors in path args - fixes rclone#7511
  bisync: add overlapping paths check
  bisync: allow lock file expiration/renewal with --max-lock - rclone#7470
  bisync: Graceful Shutdown, --recover from interruptions without --resync - fixes rclone#7470
  bisync: full support for comparing checksum, size, modtime - fixes rclone#5679 fixes rclone#5683 fixes rclone#5684 fixes rclone#5675
  bisync: document beta status more clearly - fixes rclone#6082
  bisync: normalize session name to non-canonical - fixes rclone#7423
  bisync: update version number in docs
  bisync: account for differences in backend features on integration tests - see rclone#5679
  operations: fix renaming a file on macOS
  bisync: fallback to cryptcheck or --download when can't check hash
  local: fix cleanRootPath on Windows after go1.21.4 stdlib update
  bisync: support two --backup-dir paths on different remotes
  bisync: support files with unknown length, including Google Docs - fixes rclone#5696
  ...
Synopsis

- Clearly explain in documentation whether `--resync` is asymmetrical. (@cjnaz could you comment?)
- Investigate whether `--resync` can be re-implemented by the equivalent of:

Prior discussions

- optimize --resync cjnaz/rclonesync-V2#66 (eric-void) -> (cjnaz)
  The first-sync optimization is deferred for now since it's not broke and should be rarely run.
- #5164 (comment) (ncw)
  So `--resync` is asymmetrical? Perhaps it would be better implemented by the equivalent of