Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bisync: logger, march, and several big bug fixes #7410

Merged
merged 26 commits into from Jan 20, 2024

Conversation

nielash
Copy link
Collaborator

@nielash nielash commented Nov 8, 2023

What is the purpose of this change?

A suite of several related fixes tackling bisync's biggest pain points.

Highlights:

Was the change discussed in an issue or in the forum before?

Yes. See this post and the individual issues linked above.

Checklist

  • I have read the contribution guidelines.
  • I have added tests for all changes in this PR if appropriate.
  • I have added documentation for the changes if appropriate.
  • All commit messages are in house style.
  • I'm done, this Pull Request is ready for review :-)

@nielash
Copy link
Collaborator Author

nielash commented Nov 9, 2023

(I think the windows error is unrelated, possibly a side-effect of recent Go security fix.)
(EDIT: yup. golang/go@eda42f7)

@nielash nielash force-pushed the bisync-logger branch 3 times, most recently from aa1da82 to f4efc4b Compare November 15, 2023 16:29
@nielash
Copy link
Collaborator Author

nielash commented Nov 15, 2023

@ncw Thought I'd bring f65dc5f to your attention related to your fix from yesterday regarding the filepath.VolumeName change in go1.21.4. The same issue appeared to be breaking NewFs on Windows in a way that wasn't previously showing up on tests. (I added one here that catches it.)

Copy link
Member

@ncw ncw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow that was a lot of commits! It contains some excellent work - lovely logging features, loads of bisync fixes and features - well done :-)

I'd like to release v1.65 within the next week or so, so how much of this do you want in that?

I have put some comments inline but there isn't anything serious in there. I'm happy to work on re-arranging the flags later. I don't want to merge the GUI now, but I will consider it for later.

So have a think about which commits you want in v1.65

From my point of view I think we should try to get the logging stuff in, and as many bisync fixes as we can.

PS smaller PRs in future please :-)

fs/operations/operations.go Outdated Show resolved Hide resolved
fs/operations/operations.go Outdated Show resolved Hide resolved
fs/operations/operations.go Outdated Show resolved Hide resolved
cmd/lsf/lsf_test.go Outdated Show resolved Hide resolved
cmd/sync/sync.go Outdated Show resolved Hide resolved
backend/drive/drive.go Outdated Show resolved Hide resolved
cmd/bisync/deltas.go Outdated Show resolved Hide resolved
backend/local/local.go Show resolved Hide resolved
cmd/bisync/cmd.go Outdated Show resolved Hide resolved
@nielash
Copy link
Collaborator Author

nielash commented Nov 20, 2023

Thanks! And sorry about the massive number of commits! 😅 Nearly all of them depend on earlier ones, so I wasn't sure how I could separate them into different PRs.

I can take out the GUI for now, that's fine. As for the others, I'd like to get in as many as you'll allow! I can make some quick revisions here based on your comments. (I'll add some responses inline.)

@nielash nielash force-pushed the bisync-logger branch 2 times, most recently from 71bb263 to 67cbbed Compare November 22, 2023 01:48
@nielash
Copy link
Collaborator Author

nielash commented Nov 22, 2023

@ncw Here's a revision. I think I followed all of your instructions, but let me know if I misunderstood anything! Also, don't hate me but I added a commit to fix integration tests, and two others that were pre-requisites for that.

I also rebased and resolved conflicts.

@acegene
Copy link

acegene commented Nov 22, 2023

bisync: merge copies and deletes, support --track-renames and --backup-dir -- fixes #5690 and #5685

Regarding mostly the above (and to a lesser extent any bisync changes), I am trying to beta test. So far im using the version specified in tag/v1.65.0-beta.7496.004c12643.bisync-logger-alpha, any chance you could generate a more up-to-date delivery? Regardless, thanks for your work!

@nielash
Copy link
Collaborator Author

nielash commented Nov 22, 2023

@acegene thanks! If you're comfortable installing from source, you can download the latest version from the nielash:bisync-logger branch and compile it. Otherwise I'm happy to upload some updated binaries later today or tomorrow!

@nielash
Copy link
Collaborator Author

nielash commented Nov 23, 2023

@acegene Here are some updated binaries if you'd like to test it out. Feedback is welcome :)
Bisync Feature Preview 2023-11-23

@crocinsocks
Copy link

@nielash you are a legend, thank you!! Normal sync without changes takes around 30s whereas before it could take anywhere between 10-20 minutes.. Been using for a few days now without issue.

Reading the comments of --resilient changes I wasn't sure whether this would mean that an exited sync would be able to recover? From my tests it does not however I not using the changes from 3 days ago rather bisync-logger-alpha-2. Am I wrong in thinking this and therefore should I open an issue specifically for this case? n.b. this is so a computer can be shutdown / rclone exited without breaking and requiring a --resync..

Also, just off our chats in the past should I open and start an issue for conflict renaming so it is not so ambiguous?

Thank you 🙏

@nielash
Copy link
Collaborator Author

nielash commented Nov 29, 2023

@nielash you are a legend, thank you!! Normal sync without changes takes around 30s whereas before it could take anywhere between 10-20 minutes.. Been using for a few days now without issue.

So glad to hear it is working well for you! 😃

Reading the comments of --resilient changes I wasn't sure whether this would mean that an exited sync would be able to recover?

It depends on the reason for the exit. --resilient allows retrying when bisync has chosen to abort itself due to safety features such as failing --check-access or detecting a filter change. It does not currently cover external interruptions such as a user shutting down their computer in the middle of a sync. That said, it's something I would like to add support for, so yes, please open an issue!

Also, just off our chats in the past should I open and start an issue for conflict renaming so it is not so ambiguous?

I was actually just sketching out a plan for this the other day! I'll open an issue today and tag you so we can discuss it.

@nielash
Copy link
Collaborator Author

nielash commented Dec 4, 2023

Hi @ncw, question for you -- I am working on some other bisync issues now and would like to submit a new PR for them since this one is too big already 😅 However, they build on the work I did here, so I'm not sure if there's a way to avoid duplicating all of these commits, as I don't think GH supports "dependent" PRs yet. Is there a better way to handle this?

Refactored the case / unicode normalization logic to be much more efficient,
 and fix the last outstanding issue from rclone#7270. Before this change, we were
 doing lots of for loops and re-normalizing strings we had already normalized
 earlier. Now, we leave the normalizing entirely to March and avoid
 re-transforming later, which seems to make a large difference in terms of
 performance.
Before this change, bisync had no ability to retry in the event of sync errors.
After this change, bisync will retry if --resilient is passed, but only in one
direction at a time. We can safely retry in one direction because the source is
still intact, even if the dest was left in a messy state. If the first
direction still fails after our final retry, we abort and do NOT continue in
the other direction, to prevent the messy dest from polluting the source. If
the first direction succeeds, we do then allow retries in the other direction.

The number of retries is controllable by --retries (default 3)

bisync: high-level retries if --resilient

Before this change, bisync had no ability to retry in the event of sync errors.
After this change, bisync will retry if --resilient is passed, but only in one
direction at a time. We can safely retry in one direction because the source is
still intact, even if the dest was left in a messy state. If the first
direction still fails after our final retry, we abort and do NOT continue in
the other direction, to prevent the messy dest from polluting the source. If
the first direction succeeds, we do then allow retries in the other direction.

The number of retries is controllable by --retries (default 3)
 rclone#5696

Before this change, bisync intentionally ignored Google Docs (albeit in a
buggy way that caused problems during --resync.) After this change, Google Docs
(including Google Sheets, Slides, etc.) are now supported in bisync, subject to
the same options, defaults, and limitations as in `rclone sync`. When bisyncing
drive with non-drive backends, the drive -> non-drive direction is controlled
by `--drive-export-formats` (default `"docx,xlsx,pptx,svg"`) and the non-drive
-> drive direction is controlled by `--drive-import-formats` (default none.)

For example, with the default export/import formats, a Google Sheet on the
drive side will be synced to an `.xlsx` file on the non-drive side. In the
reverse direction, `.xlsx` files with filenames that match an existing Google
Sheet will be synced to that Google Sheet, while `.xlsx` files that do NOT
match an existing Google Sheet will be copied to drive as normal `.xlsx` files
(without conversion to Sheets, although the Google Drive web browser UI may
still give you the option to open it as one.)

If `--drive-import-formats` is set (it's not, by default), then all of the
specified formats will be converted to Google Docs, if there is no existing
Google Doc with a matching name. Caution: such conversion can be quite lossy,
and in most cases it's probably not what you want!

To bisync Google Docs as URL shortcut links (in a manner similar to "Drive for
Desktop"), use: `--drive-export-formats url` (or alternatives.)

Note that these link files cannot be edited on the non-drive side -- you will
get errors if you try to sync an edited link file back to drive. They CAN be
deleted (it will result in deleting the corresponding Google Doc.) If you
create a `.url` file on the non-drive side that does not match an existing
Google Doc, bisyncing it will just result in copying the literal `.url` file
over to drive (no Google Doc will be created.) So, as a general rule of thumb,
think of them as read-only placeholders on the non-drive side, and make all
your changes on the drive side.

Likewise, even with other export-formats, it is best to only move/rename Google
Docs on the drive side. This is because otherwise, bisync will interpret this
as a file deleted and another created, and accordingly, it will delete the
Google Doc and create a new file at the new path. (Whether or not that new file
is a Google Doc depends on `--drive-import-formats`.)

Lastly, take note that all Google Docs on the drive side have a size of `-1`
and no checksum. Therefore, they cannot be reliably synced with the
`--checksum` or `--size-only` flags. (To be exact: they will still get
created/deleted, and bisync's delta engine will notice changes and queue them
for syncing, but the underlying sync function will consider them identical and
skip them.) To work around this, use the default (modtime and size) instead of
`--checksum` or `--size-only`.

To ignore Google Docs entirely, use `--drive-skip-gdocs`.

Nearly all of the Google Docs logic is outsourced to the Drive backend, so
future changes should also be supported by bisync.
Before this change, bisync supported `--backup-dir` only when `Path1` and
`Path2` were different paths on the same remote. With this change, bisync
introduces new `--backup-dir1` and `--backup-dir2` flags to support separate
backup-dirs for `Path1` and `Path2`.

`--backup-dir1` and `--backup-dir2` can use different remotes from each other,
but `--backup-dir1` must use the same remote as `Path1`, and `--backup-dir2`
must use the same remote as `Path2`. Each backup directory must not overlap its
respective bisync Path without being excluded by a filter rule.

The standard `--backup-dir` will also work, if both paths use the same remote
(but note that deleted files from both paths would be mixed together in the
same dir). If either `--backup-dir1` and `--backup-dir2` are set, they will
override `--backup-dir`.
Similar to
rclone@acf1e2d,
go1.21.4 appears to have broken sync.MoveDir on Windows because
filepath.VolumeName() returns `\\?` instead of `\\?\C:` in cleanRootPath. It
looks like the Go team is aware of the issue and planning a fix, so this may
only be needed temporarily.
Bisync checks file equality before renaming sync conflicts by comparing
checksums. Before this change, backends without checksum support (notably
Crypt) would fall back to --size-only for these checks, which is not a very
safe method (differing files can sometimes have the same size, especially if
they're small.) After this change, Crypt remotes fallback to using Cryptcheck
so that checksums can be compared. As a last resort when neither Check nor
Cryptcheck are available, files are compared using --download so that we can be
certain the files are identical regardless of checksum support.
Before this change, a file would sometimes be silently deleted instead of
renamed on macOS, due to its unique handling of unicode normalization. Rclone
already had a SameObject check in place for case insensitivity before deleting
the source (for example if "hello.txt" was renamed to "HELLO.txt"), but had no
such check for unicode normalization. After this change, the delete is skipped
on macOS if the src and dst filenames normalize to the same NFC string.

Example of the previous behavior:

 ~ % rclone touch /Users/nielash/rename_test/ö
 ~ % rclone lsl /Users/nielash/rename_test/ö
        0 2023-11-21 17:28:06.170486000 ö
 ~ % rclone moveto /Users/nielash/rename_test/ö /Users/nielash/rename_test/ö -vv
2023/11/21 17:28:51 DEBUG : rclone: Version "v1.64.0" starting with parameters ["rclone" "moveto" "/Users/nielash/rename_test/ö" "/Users/nielash/rename_test/ö" "-vv"]
2023/11/21 17:28:51 DEBUG : Creating backend with remote "/Users/nielash/rename_test/ö"
2023/11/21 17:28:51 DEBUG : Using config file from "/Users/nielash/.config/rclone/rclone.conf"
2023/11/21 17:28:51 DEBUG : fs cache: adding new entry for parent of "/Users/nielash/rename_test/ö", "/Users/nielash/rename_test"
2023/11/21 17:28:51 DEBUG : Creating backend with remote "/Users/nielash/rename_test/"
2023/11/21 17:28:51 DEBUG : fs cache: renaming cache item "/Users/nielash/rename_test/" to be canonical "/Users/nielash/rename_test"
2023/11/21 17:28:51 DEBUG : ö: Size and modification time the same (differ by 0s, within tolerance 1ns)
2023/11/21 17:28:51 DEBUG : ö: Unchanged skipping
2023/11/21 17:28:51 INFO  : ö: Deleted
2023/11/21 17:28:51 INFO  :
Transferred:   	          0 B / 0 B, -, 0 B/s, ETA -
Checks:                 1 / 1, 100%
Deleted:                1 (files), 0 (dirs)
Elapsed time:         0.0s

2023/11/21 17:28:51 DEBUG : 5 go routines active
 ~ % rclone lsl /Users/nielash/rename_test/
 ~ %
…sts - see rclone#5679

Before this change, integration tests often could not be run on backends with
differing features from the local system that goldenized them. In particular,
differences in modtime precision, checksum support, and encoding would cause
false positives. After this change, the tests more accurately account for the
features of the backend being tested, which allows us to see true positives
more clearly, and more meaningfully assess whether a backend is supported.
Copy link
Member

@ncw ncw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow this is a monster :-)

I think we should merge it now so we can unblock the dev process. We've got time to fix up breakage before the next release.

I'll approve it and you can merge it!

Thank you and many apologies for the backlog.

@nielash
Copy link
Collaborator Author

nielash commented Jan 20, 2024

Awesome, thank you!! I will do the honors shortly and slay this "monster" 😆

@nielash nielash merged commit 5762462 into rclone:master Jan 20, 2024
9 of 10 checks passed
@nielash nielash moved this from In progress to Done in bisync Jan 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment