bisync: honor --checksum and --ignore-checksum #5683

Closed
ivandeex opened this issue Oct 8, 2021 · 1 comment · Fixed by #7498
Comments

@ivandeex
Member

ivandeex commented Oct 8, 2021

Synopsis

The delta engine in the bisync beta can detect changes only by modtime.
We should also take hashsums into account and respect the --checksum flag (and probably --ignore-checksum, though this needs to be confirmed).

Prior discussions

See cjnaz/rclonesync-V2#5 and...

#5164 (comment) (cjnaz)

Data point...
The TiBU directory has 19GB of files. Path2 is a local dir. 2+ minutes for the Path2 checking for diffs step.

I tried the --ignore-checksum switch to no avail.

2021/06/10 14:36:26 INFO  : Synching Path1 "owncloud:TiBU/" with Path2 "/.../TiBU/"
2021/06/10 14:36:26 INFO  : Path1 checking for diffs
2021/06/10 14:36:26 INFO  : Path2 checking for diffs
2021/06/10 14:38:37 INFO  : No changes found

#5164 (comment) (ivandeex) -> (ncw)

Current beta contains a quick fix: in presence of --ignore-checksum the hashes will never be calculated (be it local fs, remote fs, or whatever). This is not a real solution but a kludge for your test.
For a real solution we have to develop a number of rclonic rules for all case combinations: what to do if listings contain hash but user flagged ignore, how future name tracker will interact with this flag, should we treat backends with fast and slow hashsums differently, how --checksum and --ignore-checksum interact, etc, etc... It's a separate discussion and I'll drop this quick fix after all questions get solved.
For now you can try --ignore-checksum and evaluate the speed.

#5164 (comment) (cjnaz)

--ignore-checksum brought the 19GB no-changes run down from 2 minutes to less than 1 second. Thanks for the interim hack.

@ivandeex ivandeex added this to the v1.58 milestone Oct 8, 2021
@ivandeex ivandeex self-assigned this Oct 8, 2021
@ivandeex ivandeex added this to To do in bisync Oct 8, 2021
@ivandeex ivandeex modified the milestones: v1.58, v1.59 Jan 14, 2022
@ivandeex ivandeex modified the milestones: v1.59, v1.60 Feb 13, 2022
@ncw ncw modified the milestones: v1.60, Help Wanted Dec 5, 2022
nielash added a commit to nielash/rclone that referenced this issue Apr 23, 2023
bisync: bug fixes and new features including --create-empty-src-dirs and equality check

* Fixed an issue causing dry runs to inadvertently commit filter changes
* Fixed an issue causing --resync to erroneously delete empty folders and duplicate files unique to Path2
* --check-access is now enforced during --resync, preventing data loss in certain user error scenarios
* Fixed an issue causing bisync to consider more files than necessary due to overbroad filters during delete operations
* Improved detection of false positive change conflicts (identical files are now left alone instead of renamed)
* Added support for --create-empty-src-dirs
* Added experimental --resilient mode to allow recovery from self-correctable errors
* Added new --ignore-listing-checksum flag to distinguish from --ignore-checksum
* Performance improvements for large remotes
* Documentation and testing improvements

Fixes rclone#6109

Also addresses: rclone#6841 rclone#5683 rclone#5681 rclone#5676 rclone#5675 rclone#5674

See also: https://forum.rclone.org/t/bisync-bugs-and-feature-requests/37636
@nielash
Collaborator

nielash commented Aug 23, 2023

Current beta contains a quick fix: in presence of --ignore-checksum the hashes will never be calculated (be it local fs, remote fs, or whatever). This is not a real solution but a kludge for your test.

I think the --ignore-checksum part of this ticket is mostly addressed by 66ccc7c.

The --checksum part remains an important to-do. Among other things, it will help us lift the ban on backends that don't support modtime.

@nielash nielash moved this from To do to In progress in bisync Sep 5, 2023
@nielash nielash self-assigned this Sep 5, 2023
@nielash nielash modified the milestones: Help Wanted, Soon Sep 5, 2023
nielash added a commit to nielash/rclone that referenced this issue Dec 9, 2023
…lone#5679 fixes rclone#5683 fixes rclone#5684 fixes rclone#5675

Before this change, bisync could only detect changes based on modtime, and
would refuse to run if either path lacked modtime support. This made bisync
unavailable for many of rclone's backends. Additionally, bisync did not account
for the Fs's precision when comparing modtimes, meaning that they could only be
reliably compared within the same side -- not against the opposite side. Size
and checksum (even when available) were ignored completely for deltas.

After this change, bisync now fully supports comparing based on any combination
of size, modtime, and checksum, lifting the prior restriction on backends
without modtime support. The comparison logic considers the backend's
precision, hash types, and other features as appropriate.
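
To illustrate why precision matters when comparing modtimes across sides, here is a minimal Python sketch. It is not bisync's actual code: the function name `modtimes_equal` and the rule of using the coarser of the two precisions as the tolerance are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def modtimes_equal(t1, t2, precision1, precision2):
    """Treat two modtimes as equal if they differ by no more than the
    coarser of the two backends' precisions (assumed tolerance rule)."""
    window = max(precision1, precision2)
    return abs(t1 - t2) <= window


# Path1 stores sub-second times; Path2 (hypothetically) only whole seconds.
fine = datetime(2021, 6, 10, 14, 36, 26, 400000, tzinfo=timezone.utc)
coarse = datetime(2021, 6, 10, 14, 36, 26, tzinfo=timezone.utc)

same_across_sides = modtimes_equal(
    fine, coarse, timedelta(microseconds=1), timedelta(seconds=1)
)
same_if_both_fine = modtimes_equal(
    fine, coarse, timedelta(microseconds=1), timedelta(microseconds=1)
)
```

With a one-second tolerance the two timestamps count as the same file version; a naive exact comparison would report a change that never happened.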

The comparison features optionally use a new --compare flag (which takes any
combination of size,modtime,checksum) and even supports some combinations not
otherwise supported in `sync` (like comparing all three at the same time.) By
default (without the --compare flag), bisync inherits the same comparison
options as `sync` (that is: size and modtime by default, unless modified with
flags such as --checksum or --size-only.) If the --compare flag is set, it will
override these defaults.
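
The precedence described above can be sketched roughly as follows. This is an illustrative Python model, not rclone's implementation: `resolve_compare` and its parameters are hypothetical names, and the mapping of --checksum to "size and checksum" is an assumption based on how `sync` documents that flag.

```python
VALID_OPTIONS = {"size", "modtime", "checksum"}


def resolve_compare(compare=None, checksum=False, size_only=False):
    """Return the set of attributes to compare.

    compare   -- value of a --compare style option, e.g. "size,checksum"
    checksum  -- True if a --checksum style flag was given
    size_only -- True if a --size-only style flag was given
    """
    if compare is not None:
        # An explicit --compare overrides the inherited sync defaults.
        chosen = {p.strip().lower() for p in compare.split(",") if p.strip()}
        unknown = chosen - VALID_OPTIONS
        if unknown or not chosen:
            raise ValueError(f"invalid compare option: {compare!r}")
        return chosen
    if size_only:
        return {"size"}
    if checksum:
        return {"size", "checksum"}  # assumption: mirrors sync's --checksum
    return {"size", "modtime"}  # sync's default
```

For example, `resolve_compare("size,modtime,checksum")` selects all three attributes, a combination the commit message notes is not otherwise available in `sync`.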

If --compare includes checksum and both remotes support checksums but have no
hash types in common with each other, checksums will be considered only for
comparisons within the same side (to determine what has changed since the prior
sync), but not for comparisons against the opposite side. If one side supports
checksums and the other does not, checksums will only be considered on the side
that supports them. When comparing with checksum and/or size without modtime,
bisync cannot determine whether a file is newer or older -- only whether it is
changed or unchanged. (If it is changed on both sides, bisync still does the
standard equality-check to avoid declaring a sync conflict unless it absolutely
has to.)
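
The rules in the paragraph above can be modelled with a short Python sketch. All names here (`checksum_plan`, `classify`) are hypothetical, and picking the alphabetically first common hash type is an arbitrary choice for the example, not something the commit specifies.

```python
def checksum_plan(hashes1, hashes2):
    """Decide where checksums can be used, given each side's hash types."""
    common = sorted(set(hashes1) & set(hashes2))
    return {
        # Same-side comparison (against the prior sync) needs only that
        # the side itself supports some hash.
        "path1_same_side": bool(hashes1),
        "path2_same_side": bool(hashes2),
        # Cross-side comparison needs a hash type both sides share.
        "cross_side_hash": common[0] if common else None,
    }


def classify(prev, cur, compare):
    """Classify one file against its state at the prior sync.

    prev, cur -- dicts with any of the keys size, modtime, checksum
    compare   -- the attributes being compared
    """
    differs = [k for k in compare if prev.get(k) != cur.get(k)]
    if not differs:
        return "unchanged"
    p, c = prev.get("modtime"), cur.get("modtime")
    if "modtime" in differs and p is not None and c is not None:
        return "newer" if c > p else "older"
    # Without modtime we know the file changed, but not in which direction.
    return "changed"
```

With `{"size", "checksum"}` as the comparison set, a file whose hash differs from the prior listing is reported only as "changed", which matches the limitation noted above.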

Also included are some new flags to customize the checksum comparison behavior
on backends where hashes are slow or unavailable. --no-slow-hash and
--slow-hash-sync-only allow selectively ignoring checksums on backends such as
local where they are slow. --download-hash allows computing them by downloading
when (and only when) they're otherwise not available. Of course, this option
probably won't be practical with large files, but may be a good option for
syncing small-but-important files with maximum accuracy (for example, a source
code repo on a crypt remote.) An additional advantage over methods like
cryptcheck is that the original file is not required for comparison (for
example, --download-hash can be used to bisync two different crypt remotes with
different passwords.)
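
The idea behind computing a hash by downloading can be sketched in Python as below. This only illustrates the general technique of hashing a stream in chunks; `hash_by_download`, the `open_stream` callable, and the choice of SHA-256 are assumptions for the example and say nothing about which hash rclone actually uses.

```python
import hashlib
import io


def hash_by_download(open_stream, algo="sha256", chunk_size=1024 * 1024):
    """Hash a file's (decrypted) content by reading it in chunks.

    open_stream -- hypothetical callable returning a readable binary stream,
                   standing in for a download from a remote
    """
    digest = hashlib.new(algo)
    with open_stream() as stream:
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Two stand-in "remotes" that expose the same plaintext content.
content = b"package main\n"
hash1 = hash_by_download(lambda: io.BytesIO(content))
hash2 = hash_by_download(lambda: io.BytesIO(content))
files_match = hash1 == hash2
```

Because each side is hashed from its own plaintext stream, neither side needs the other's original file, which is the property the commit message highlights for two crypt remotes with different passwords. The cost is one full read of every file, hence the caveat about large files.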

Additionally, all of the above are now considered during the final --check-sync
for much-improved accuracy (before this change, it only compared filenames!)
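
As a rough illustration of the difference between a filename-only check and one that also looks at file attributes, consider the Python sketch below. The listing format (a dict of name to attribute dict) and the function `check_sync` are hypothetical and are not bisync's listing format or code.

```python
def check_sync(listing1, listing2, compare=("size", "modtime")):
    """Report files that are missing on one side or whose compared
    attributes differ. An empty compare tuple checks filenames only."""
    problems = []
    for name in sorted(set(listing1) | set(listing2)):
        if name not in listing1:
            problems.append((name, "missing on Path1"))
        elif name not in listing2:
            problems.append((name, "missing on Path2"))
        else:
            for attr in compare:
                a = listing1[name].get(attr)
                b = listing2[name].get(attr)
                # Skip attributes that one side cannot provide.
                if a is not None and b is not None and a != b:
                    problems.append((name, f"{attr} differs"))
    return problems


path1 = {"a.txt": {"size": 10, "checksum": "aa"}, "b.txt": {"size": 5}}
path2 = {"a.txt": {"size": 10, "checksum": "bb"}, "b.txt": {"size": 5}}

names_only = check_sync(path1, path2, compare=())
with_checksum = check_sync(path1, path2, compare=("size", "checksum"))
```

Here the filename-only check finds nothing wrong, while the attribute-aware check flags `a.txt` because its checksum differs between the two sides.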

Many other details are explained in the included docs.
nielash added a commit to nielash/rclone that referenced this issue Dec 15, 2023
…lone#5679 fixes rclone#5683 fixes rclone#5684 fixes rclone#5675

nielash added a commit to nielash/rclone that referenced this issue Jan 20, 2024
…lone#5679 fixes rclone#5683 fixes rclone#5684 fixes rclone#5675

@nielash nielash moved this from In progress to Done in bisync Jan 20, 2024
miku added a commit to internetarchive/rclone that referenced this issue Jan 23, 2024
* master: (86 commits)
  fs: add more detailed logging for file includes/excludes
  bisync: add --resync-mode for customizing --resync - fixes rclone#5681
  bisync: fix --colors flag
  bisync: factor resync to separate file
  bisync: skip empty test case dirs
  bisync: add options to auto-resolve conflicts - fixes rclone#7471
  bisync: check for syntax errors in path args - fixes rclone#7511
  bisync: add overlapping paths check
  bisync: allow lock file expiration/renewal with --max-lock - rclone#7470
  bisync: Graceful Shutdown, --recover from interruptions without --resync - fixes rclone#7470
  bisync: full support for comparing checksum, size, modtime - fixes rclone#5679 fixes rclone#5683 fixes rclone#5684 fixes rclone#5675
  bisync: document beta status more clearly - fixes rclone#6082
  bisync: normalize session name to non-canonical - fixes rclone#7423
  bisync: update version number in docs
  bisync: account for differences in backend features on integration tests - see rclone#5679
  operations: fix renaming a file on macOS
  bisync: fallback to cryptcheck or --download when can't check hash
  local: fix cleanRootPath on Windows after go1.21.4 stdlib update
  bisync: support two --backup-dir paths on different remotes
  bisync: support files with unknown length, including Google Docs - fixes rclone#5696
  ...
WuTofu pushed a commit to WuTofu/rclone that referenced this issue Feb 24, 2024
…lone#5679 fixes rclone#5683 fixes rclone#5684 fixes rclone#5675
