bisync: add modtime support #5679

Closed
ivandeex opened this issue Oct 8, 2021 · 1 comment · Fixed by #7498

ivandeex commented Oct 8, 2021

What problem are you trying to solve?

The bisync delta engine and unit tests do not take backend time tolerance into account. This should be fixed.

modtime in unit tests

#5164 (comment) (ivandeex)

The unit test runner is based on mangleListing, which is a distant relative of fstest.CompareItems. It can already ignore hashsums on demand. It still needs the ability to compare times with per-backend tolerance, or even to ignore times where they are unsupported. Having that would let us run tests against backends without time support and probably lift the corresponding ban.

Tentatively this can be implemented as:

  1. parse out the time field and convert it to time.Time
  2. round the time down to a multiple of double the tolerance
  3. put the time field back into the string
  4. feed the result to diff.

Why not just parse and compare with tolerance?
Because mangleResult serves two purposes. Besides comparing, it can prepare a test state snapshot for storing in the golden directory, which can later provide a bigger picture of what's going on.
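
The four steps above can be sketched roughly as follows. This is only an illustration, not the bisync test code: the function name `normalizeLine`, the listing line shape, and the `timeLayout` constant are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// timeLayout is an ASSUMED layout for the timestamp field of a listing line.
const timeLayout = "2006-01-02T15:04:05.000000000-0700"

// normalizeLine (hypothetical helper) finds the first space-separated field
// that parses as a timestamp, rounds it down to a multiple of 2*tolerance,
// and writes it back. A non-positive tolerance leaves the line unchanged.
func normalizeLine(line string, tolerance time.Duration) string {
	if tolerance <= 0 {
		return line
	}
	fields := strings.Split(line, " ")
	for i, field := range fields {
		t, err := time.Parse(timeLayout, field)
		if err != nil {
			continue // not the time field
		}
		fields[i] = t.UTC().Truncate(2 * tolerance).Format(timeLayout)
		break
	}
	return strings.Join(fields, " ")
}

func main() {
	a := `- 109 - 2000-01-01T00:00:02.100000000+0000 "file1.txt"`
	b := `- 109 - 2000-01-01T00:00:03.900000000+0000 "file1.txt"`
	// With a 1s tolerance both timestamps fall into the same 2s bucket,
	// so the two normalized lines are identical and diff stays quiet.
	fmt.Println(normalizeLine(a, time.Second))
	fmt.Println(normalizeLine(b, time.Second))
}
```

One caveat with any bucketing scheme like this: two times that are within tolerance of each other can still straddle a bucket boundary and round to different values, so the approach reduces false diffs rather than eliminating them.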

#5164 (comment) (ivandeex)

With a feature must come a method of testing it.

Rclone has a comprehensive fstest framework dedicated to the peculiarities of the many supported cloud providers. In planning this patch I chose to keep the nature of the project's test engine (while translating it from Python to Go and refactoring it to work natively and unattended on Windows and with a wider set of backends) and to propose it as a new addition to rclone's set of testing approaches (see my inline comment above), instead of rewriting it from the ground up for fstest.

As of this writing, the bisync test suite cannot compare imprecise or absent timestamps (and I would argue that absent timestamps should be "comparable" too). It works correctly mostly because it compares logs from a particular filesystem with itself. Unless this wart is fixed, we can't run tests against backends without modtime and consequently can't declare them supported. I have an idea on how to fix that (see another inline comment), but the effort will need time.

modtime in delta engine

Try to extend operations.Equal into operations.Compare with more returned conditions such as Newer, Older, ... (related to #4810), for direct use in the bisync delta engine.
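
As a rough illustration of what a multi-valued comparison could return, here is a minimal sketch. The `Relation` type, its constants, `compareModTime`, and the `at` helper are hypothetical names invented for this example; no `operations.Compare` with this shape is claimed to exist in rclone.

```go
package main

import (
	"fmt"
	"time"
)

// Relation is a hypothetical result type for a tolerance-aware comparison.
type Relation int

const (
	Same    Relation = iota // modtimes differ by no more than the tolerance
	Older                   // a is older than b by more than the tolerance
	Newer                   // a is newer than b by more than the tolerance
	Unknown                 // at least one side has no usable modtime
)

func (r Relation) String() string {
	return [...]string{"Same", "Older", "Newer", "Unknown"}[r]
}

// at is a small helper for building example times from Unix seconds/nanos.
func at(sec, nsec int64) time.Time { return time.Unix(sec, nsec) }

// compareModTime reports how a relates to b. Differences within tolerance
// count as Same; a zero time.Time is treated as "no modtime available".
func compareModTime(a, b time.Time, tolerance time.Duration) Relation {
	if a.IsZero() || b.IsZero() {
		return Unknown
	}
	d := a.Sub(b)
	switch {
	case d > tolerance:
		return Newer
	case d < -tolerance:
		return Older
	default:
		return Same
	}
}

func main() {
	tol := time.Second
	fmt.Println(compareModTime(at(100, 0), at(100, 500_000_000), tol))
	fmt.Println(compareModTime(at(103, 0), at(100, 0), tol))
	fmt.Println(compareModTime(at(100, 0), at(103, 0), tol))
	fmt.Println(compareModTime(time.Time{}, at(100, 0), tol))
}
```

Returning Unknown as a distinct value, rather than folding it into Same, is a design choice of this sketch: it lets a caller decide separately how to treat backends without modtime.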

#5164 (comment) (JWink3101)

The fact that this only supports a small subset of remotes is disappointing, esp. some popular ones not supporting mod time. Others, like S3, are way more performant if you disable it.

This is an issue with the overall algorithm of some sync tools but there is no reason you need modtime to decide what to sync. You may need it for conflicts but if you have past state, then you can tell if a changed file-size is a modification on one side only.

#5164 (comment) (ivandeex)

This is an issue with the overall algorithm of some sync tools but there is no reason you need modtime to decide what to sync.

Theoretically, no. The bisync change detection in its current state is all about bit masks (as opposed to "rclonic" Equal). But it needs more testing.
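
To make the "past state plus bit masks" idea concrete, here is a minimal sketch of detecting changes on one side from size and hash alone. The `delta` flags, the `entry` type, and `detect` are hypothetical and are not the identifiers used in bisync.

```go
package main

import "fmt"

// delta is a hypothetical bit mask describing how a file changed on one side
// since the previous run.
type delta uint8

const (
	deltaNew     delta = 1 << iota // file did not exist in the prior listing
	deltaDeleted                   // file existed before and is now gone
	deltaSize                      // size differs from the prior listing
	deltaHash                      // hash differs from the prior listing
)

// entry holds what one listing knows about a file without modtime.
type entry struct {
	size int64
	hash string // empty when the backend provides no hash
}

// detect compares the prior and current listings of ONE side and returns a
// mask per changed file. Unchanged files are omitted from the result.
func detect(prev, cur map[string]entry) map[string]delta {
	out := make(map[string]delta)
	for name, c := range cur {
		p, existed := prev[name]
		if !existed {
			out[name] = deltaNew
			continue
		}
		var d delta
		if p.size != c.size {
			d |= deltaSize
		}
		// Only compare hashes when both listings actually have one.
		if p.hash != "" && c.hash != "" && p.hash != c.hash {
			d |= deltaHash
		}
		if d != 0 {
			out[name] = d
		}
	}
	for name := range prev {
		if _, still := cur[name]; !still {
			out[name] = deltaDeleted
		}
	}
	return out
}

func main() {
	prev := map[string]entry{"a.txt": {10, "aa"}, "b.txt": {20, "bb"}, "c.txt": {30, ""}}
	cur := map[string]entry{"a.txt": {10, "aa"}, "b.txt": {25, "b2"}, "d.txt": {5, "dd"}}
	for name, d := range detect(prev, cur) {
		fmt.Printf("%s: %04b\n", name, d)
	}
}
```

Running `detect` once per side gives two sets of masks. A file flagged on only one side is a one-sided modification, while a file flagged on both sides is a conflict candidate, which is where modtime would still be needed to say which copy is newer.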

How to use GitHub

  • Please use the 👍 reaction to show that you are affected by the same issue.
  • Please don't comment if you have no relevant information to add. It's just extra noise for everyone subscribed to this issue.
  • Subscribe to receive notifications on status change and new comments.
@ivandeex ivandeex added the bisync label Oct 8, 2021
@ivandeex ivandeex added this to the v1.58 milestone Oct 8, 2021
@ivandeex ivandeex self-assigned this Oct 8, 2021
@ivandeex ivandeex modified the milestones: v1.58, v1.59 Jan 14, 2022
@ncw ncw modified the milestones: v1.59, v1.60 Jul 9, 2022
@ncw ncw modified the milestones: v1.60, Known Problem Dec 5, 2022

nielash commented Aug 23, 2023

Related to this ticket (and also #5676), --check-sync should also support comparing modtime (and eventually, when checksum and size are supported, it should use the same sync key used for the bisync run.)

Currently, --check-sync only compares filenames, not modtime, size, or hash. It can detect whether a file from one Path is missing on the other, but not whether both Paths have different versions of a file. This can cause issues such as this one.

Alternatively, perhaps this would be better handled by outsourcing it to the more robust rclone check, rather than attempting to duplicate the same functionality in --check-sync.

Here's where the trouble is:

for _, file := range files1.list {
	if !files2.has(file) {
		b.indent("ERROR", file, "Path1 file not found in Path2")
		ok = false
	}
}

// fileList represents a listing
type fileList struct {
	list []string
	info map[string]*fileInfo
	hash hash.Type
}
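
For illustration only, a check that goes beyond filenames might look something like the sketch below. It uses its own simplified `fileInfo` type and a hypothetical `checkSync` function rather than the bisync types quoted above, and it assumes a single modtime tolerance for both paths.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// fileInfo is a simplified stand-in for a listing entry (hypothetical).
type fileInfo struct {
	size    int64
	modTime time.Time
}

// at builds an example time from Unix seconds.
func at(sec int64) time.Time { return time.Unix(sec, 0) }

// checkSync reports files missing on either side, plus files present on both
// sides whose size differs or whose modtimes differ by more than tolerance.
// The returned messages are sorted so the output is deterministic.
func checkSync(path1, path2 map[string]fileInfo, tolerance time.Duration) []string {
	var problems []string
	for name, f1 := range path1 {
		f2, found := path2[name]
		if !found {
			problems = append(problems, name+": Path1 file not found in Path2")
			continue
		}
		if f1.size != f2.size {
			problems = append(problems, name+": size differs")
		}
		diff := f1.modTime.Sub(f2.modTime)
		if diff < 0 {
			diff = -diff
		}
		if diff > tolerance {
			problems = append(problems, name+": modtime differs")
		}
	}
	for name := range path2 {
		if _, found := path1[name]; !found {
			problems = append(problems, name+": Path2 file not found in Path1")
		}
	}
	sort.Strings(problems)
	return problems
}

func main() {
	p1 := map[string]fileInfo{"a": {1, at(100)}, "b": {2, at(100)}, "c": {3, at(100)}}
	p2 := map[string]fileInfo{"a": {1, at(100)}, "b": {5, at(110)}}
	for _, msg := range checkSync(p1, p2, time.Second) {
		fmt.Println("ERROR", msg)
	}
}
```

Here file "b" exists on both sides, so a filename-only check would pass it, while the size and modtime checks flag it as two different versions.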

@nielash nielash self-assigned this Sep 5, 2023
@nielash nielash modified the milestones: Known Problem, Soon Sep 5, 2023
nielash added a commit to nielash/rclone that referenced this issue Nov 22, 2023
…sts - see rclone#5679

Before this change, integration tests often could not be run on backends with
differing features from the local system that goldenized them. In particular,
differences in modtime precision, checksum support, and encoding would cause
false positives. After this change, the tests more accurately account for the
features of the backend being tested, which allows us to see true positives
more clearly, and more meaningfully assess whether a backend is supported.
nielash added a commit to nielash/rclone that referenced this issue Dec 9, 2023
…lone#5679 fixes rclone#5683 fixes rclone#5684 fixes rclone#5675

Before this change, bisync could only detect changes based on modtime, and
would refuse to run if either path lacked modtime support. This made bisync
unavailable for many of rclone's backends. Additionally, bisync did not account
for the Fs's precision when comparing modtimes, meaning that they could only be
reliably compared within the same side -- not against the opposite side. Size
and checksum (even when available) were ignored completely for deltas.

After this change, bisync now fully supports comparing based on any combination
of size, modtime, and checksum, lifting the prior restriction on backends
without modtime support. The comparison logic considers the backend's
precision, hash types, and other features as appropriate.

The comparison features optionally use a new --compare flag (which takes any
combination of size,modtime,checksum) and even supports some combinations not
otherwise supported in `sync` (like comparing all three at the same time.) By
default (without the --compare flag), bisync inherits the same comparison
options as `sync` (that is: size and modtime by default, unless modified with
flags such as --checksum or --size-only.) If the --compare flag is set, it will
override these defaults.

If --compare includes checksum and both remotes support checksums but have no
hash types in common with each other, checksums will be considered only for
comparisons within the same side (to determine what has changed since the prior
sync), but not for comparisons against the opposite side. If one side supports
checksums and the other does not, checksums will only be considered on the side
that supports them. When comparing with checksum and/or size without modtime,
bisync cannot determine whether a file is newer or older -- only whether it is
changed or unchanged. (If it is changed on both sides, bisync still does the
standard equality-check to avoid declaring a sync conflict unless it absolutely
has to.)

Also included are some new flags to customize the checksum comparison behavior
on backends where hashes are slow or unavailable. --no-slow-hash and
--slow-hash-sync-only allow selectively ignoring checksums on backends such as
local where they are slow. --download-hash allows computing them by downloading
when (and only when) they're otherwise not available. Of course, this option
probably won't be practical with large files, but may be a good option for
syncing small-but-important files with maximum accuracy (for example, a source
code repo on a crypt remote.) An additional advantage over methods like
cryptcheck is that the original file is not required for comparison (for
example, --download-hash can be used to bisync two different crypt remotes with
different passwords.)

Additionally, all of the above are now considered during the final --check-sync
for much-improved accuracy (before this change, it only compared filenames!)

Many other details are explained in the included docs.
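
The checksum rule in the commit message above (no hash types in common means checksums are only used within one side) can be sketched as follows. The function name `commonHash` and the use of plain strings for hash types are assumptions for this example, not the code from the commit.

```go
package main

import "fmt"

// commonHash (hypothetical) returns the first hash type supported by side1
// that side2 also supports, and false when the two sides share none.
func commonHash(side1, side2 []string) (string, bool) {
	supported := make(map[string]bool, len(side2))
	for _, h := range side2 {
		supported[h] = true
	}
	for _, h := range side1 {
		if supported[h] {
			return h, true
		}
	}
	return "", false
}

func main() {
	// Overlap: checksums can be compared across sides using the shared type.
	h, ok := commonHash([]string{"md5", "sha1"}, []string{"sha1", "crc32"})
	fmt.Println(h, ok)

	// No overlap: each side can still compare its own current checksum with
	// its own prior listing, but cross-side checksum comparison is skipped.
	h, ok = commonHash([]string{"md5"}, []string{"crc32"})
	fmt.Println(h, ok)
}
```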
nielash added a commit that referenced this issue Jan 20, 2024
…sts - see #5679

miku added a commit to internetarchive/rclone that referenced this issue Jan 23, 2024
* master: (86 commits)
  fs: add more detailed logging for file includes/excludes
  bisync: add --resync-mode for customizing --resync - fixes rclone#5681
  bisync: fix --colors flag
  bisync: factor resync to separate file
  bisync: skip empty test case dirs
  bisync: add options to auto-resolve conflicts - fixes rclone#7471
  bisync: check for syntax errors in path args - fixes rclone#7511
  bisync: add overlapping paths check
  bisync: allow lock file expiration/renewal with --max-lock - rclone#7470
  bisync: Graceful Shutdown, --recover from interruptions without --resync - fixes rclone#7470
  bisync: full support for comparing checksum, size, modtime - fixes rclone#5679 fixes rclone#5683 fixes rclone#5684 fixes rclone#5675
  bisync: document beta status more clearly - fixes rclone#6082
  bisync: normalize session name to non-canonical - fixes rclone#7423
  bisync: update version number in docs
  bisync: account for differences in backend features on integration tests - see rclone#5679
  operations: fix renaming a file on macOS
  bisync: fallback to cryptcheck or --download when can't check hash
  local: fix cleanRootPath on Windows after go1.21.4 stdlib update
  bisync: support two --backup-dir paths on different remotes
  bisync: support files with unknown length, including Google Docs - fixes rclone#5696
  ...