New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bisync: logger, march, and several big bug fixes #7410
Conversation
cba36bd
to
fc8da21
Compare
(I think the windows error is unrelated, possibly a side-effect of recent Go security fix.) |
aa1da82
to
f4efc4b
Compare
@ncw Thought I'd bring f65dc5f to your attention related to your fix from yesterday regarding the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow that was a lot of commits! It contains some excellent work - lovely logging features, loads of bisync fixes and features - well done :-)
I'd like to release v1.65 within the next week or so, so how much of this do you want in that?
I have put some comments inline but there isn't anything serious in there. I'm happy to work on re-arranging the flags later. I don't want to merge the GUI now, but I will consider it for later.
So have a think about which commits you want in v1.65
From my point of view I think we should try to get the logging stuff in, and as many bisync fixes as we can.
PS smaller PRs in future please :-)
Thanks! And sorry about the massive number of commits! 😅 Nearly all of them depend on earlier ones, so I wasn't sure how I could separate them into different PRs. I can take out the GUI for now, that's fine. As for the others, I'd like to get in as many as you'll allow! I can make some quick revisions here based on your comments. (I'll add some responses inline.) |
71bb263
to
67cbbed
Compare
@ncw Here's a revision. I think I followed all of your instructions, but let me know if I misunderstood anything! Also, don't hate me but I added a commit to fix integration tests, and two others that were pre-requisites for that. I also rebased and resolved conflicts. |
Regarding mostly the above (and to a lesser extent any bisync changes), I am trying to beta test. So far im using the version specified in tag/v1.65.0-beta.7496.004c12643.bisync-logger-alpha, any chance you could generate a more up-to-date delivery? Regardless, thanks for your work! |
@acegene thanks! If you're comfortable installing from source, you can download the latest version from the nielash:bisync-logger branch and compile it. Otherwise I'm happy to upload some updated binaries later today or tomorrow! |
@acegene Here are some updated binaries if you'd like to test it out. Feedback is welcome :) |
67cbbed
to
eafb403
Compare
@nielash you are a legend, thank you!! Normal sync without changes takes around 30s whereas before it could take anywhere between 10-20 minutes.. Been using for a few days now without issue. Reading the comments of --resilient changes I wasn't sure whether this would mean that an exited sync would be able to recover? From my tests it does not however I not using the changes from 3 days ago rather bisync-logger-alpha-2. Am I wrong in thinking this and therefore should I open an issue specifically for this case? n.b. this is so a computer can be shutdown / rclone exited without breaking and requiring a --resync.. Also, just off our chats in the past should I open and start an issue for conflict renaming so it is not so ambiguous? Thank you 🙏 |
So glad to hear it is working well for you! 😃
It depends on the reason for the exit.
I was actually just sketching out a plan for this the other day! I'll open an issue today and tag you so we can discuss it. |
Hi @ncw, question for you -- I am working on some other bisync issues now and would like to submit a new PR for them since this one is too big already 😅 However, they build on the work I did here, so I'm not sure if there's a way to avoid duplicating all of these commits, as I don't think GH supports "dependent" PRs yet. Is there a better way to handle this? |
Refactored the case / unicode normalization logic to be much more efficient, and fix the last outstanding issue from rclone#7270. Before this change, we were doing lots of for loops and re-normalizing strings we had already normalized earlier. Now, we leave the normalizing entirely to March and avoid re-transforming later, which seems to make a large difference in terms of performance.
Before this change, bisync had no ability to retry in the event of sync errors. After this change, bisync will retry if --resilient is passed, but only in one direction at a time. We can safely retry in one direction because the source is still intact, even if the dest was left in a messy state. If the first direction still fails after our final retry, we abort and do NOT continue in the other direction, to prevent the messy dest from polluting the source. If the first direction succeeds, we do then allow retries in the other direction. The number of retries is controllable by --retries (default 3) bisync: high-level retries if --resilient Before this change, bisync had no ability to retry in the event of sync errors. After this change, bisync will retry if --resilient is passed, but only in one direction at a time. We can safely retry in one direction because the source is still intact, even if the dest was left in a messy state. If the first direction still fails after our final retry, we abort and do NOT continue in the other direction, to prevent the messy dest from polluting the source. If the first direction succeeds, we do then allow retries in the other direction. The number of retries is controllable by --retries (default 3)
rclone#5696 Before this change, bisync intentionally ignored Google Docs (albeit in a buggy way that caused problems during --resync.) After this change, Google Docs (including Google Sheets, Slides, etc.) are now supported in bisync, subject to the same options, defaults, and limitations as in `rclone sync`. When bisyncing drive with non-drive backends, the drive -> non-drive direction is controlled by `--drive-export-formats` (default `"docx,xlsx,pptx,svg"`) and the non-drive -> drive direction is controlled by `--drive-import-formats` (default none.) For example, with the default export/import formats, a Google Sheet on the drive side will be synced to an `.xlsx` file on the non-drive side. In the reverse direction, `.xlsx` files with filenames that match an existing Google Sheet will be synced to that Google Sheet, while `.xlsx` files that do NOT match an existing Google Sheet will be copied to drive as normal `.xlsx` files (without conversion to Sheets, although the Google Drive web browser UI may still give you the option to open it as one.) If `--drive-import-formats` is set (it's not, by default), then all of the specified formats will be converted to Google Docs, if there is no existing Google Doc with a matching name. Caution: such conversion can be quite lossy, and in most cases it's probably not what you want! To bisync Google Docs as URL shortcut links (in a manner similar to "Drive for Desktop"), use: `--drive-export-formats url` (or alternatives.) Note that these link files cannot be edited on the non-drive side -- you will get errors if you try to sync an edited link file back to drive. They CAN be deleted (it will result in deleting the corresponding Google Doc.) If you create a `.url` file on the non-drive side that does not match an existing Google Doc, bisyncing it will just result in copying the literal `.url` file over to drive (no Google Doc will be created.) So, as a general rule of thumb, think of them as read-only placeholders on the non-drive side, and make all your changes on the drive side. Likewise, even with other export-formats, it is best to only move/rename Google Docs on the drive side. This is because otherwise, bisync will interpret this as a file deleted and another created, and accordingly, it will delete the Google Doc and create a new file at the new path. (Whether or not that new file is a Google Doc depends on `--drive-import-formats`.) Lastly, take note that all Google Docs on the drive side have a size of `-1` and no checksum. Therefore, they cannot be reliably synced with the `--checksum` or `--size-only` flags. (To be exact: they will still get created/deleted, and bisync's delta engine will notice changes and queue them for syncing, but the underlying sync function will consider them identical and skip them.) To work around this, use the default (modtime and size) instead of `--checksum` or `--size-only`. To ignore Google Docs entirely, use `--drive-skip-gdocs`. Nearly all of the Google Docs logic is outsourced to the Drive backend, so future changes should also be supported by bisync.
Before this change, bisync supported `--backup-dir` only when `Path1` and `Path2` were different paths on the same remote. With this change, bisync introduces new `--backup-dir1` and `--backup-dir2` flags to support separate backup-dirs for `Path1` and `Path2`. `--backup-dir1` and `--backup-dir2` can use different remotes from each other, but `--backup-dir1` must use the same remote as `Path1`, and `--backup-dir2` must use the same remote as `Path2`. Each backup directory must not overlap its respective bisync Path without being excluded by a filter rule. The standard `--backup-dir` will also work, if both paths use the same remote (but note that deleted files from both paths would be mixed together in the same dir). If either `--backup-dir1` and `--backup-dir2` are set, they will override `--backup-dir`.
Similar to rclone@acf1e2d, go1.21.4 appears to have broken sync.MoveDir on Windows because filepath.VolumeName() returns `\\?` instead of `\\?\C:` in cleanRootPath. It looks like the Go team is aware of the issue and planning a fix, so this may only be needed temporarily.
Bisync checks file equality before renaming sync conflicts by comparing checksums. Before this change, backends without checksum support (notably Crypt) would fall back to --size-only for these checks, which is not a very safe method (differing files can sometimes have the same size, especially if they're small.) After this change, Crypt remotes fallback to using Cryptcheck so that checksums can be compared. As a last resort when neither Check nor Cryptcheck are available, files are compared using --download so that we can be certain the files are identical regardless of checksum support.
Before this change, a file would sometimes be silently deleted instead of renamed on macOS, due to its unique handling of unicode normalization. Rclone already had a SameObject check in place for case insensitivity before deleting the source (for example if "hello.txt" was renamed to "HELLO.txt"), but had no such check for unicode normalization. After this change, the delete is skipped on macOS if the src and dst filenames normalize to the same NFC string. Example of the previous behavior: ~ % rclone touch /Users/nielash/rename_test/ö ~ % rclone lsl /Users/nielash/rename_test/ö 0 2023-11-21 17:28:06.170486000 ö ~ % rclone moveto /Users/nielash/rename_test/ö /Users/nielash/rename_test/ö -vv 2023/11/21 17:28:51 DEBUG : rclone: Version "v1.64.0" starting with parameters ["rclone" "moveto" "/Users/nielash/rename_test/ö" "/Users/nielash/rename_test/ö" "-vv"] 2023/11/21 17:28:51 DEBUG : Creating backend with remote "/Users/nielash/rename_test/ö" 2023/11/21 17:28:51 DEBUG : Using config file from "/Users/nielash/.config/rclone/rclone.conf" 2023/11/21 17:28:51 DEBUG : fs cache: adding new entry for parent of "/Users/nielash/rename_test/ö", "/Users/nielash/rename_test" 2023/11/21 17:28:51 DEBUG : Creating backend with remote "/Users/nielash/rename_test/" 2023/11/21 17:28:51 DEBUG : fs cache: renaming cache item "/Users/nielash/rename_test/" to be canonical "/Users/nielash/rename_test" 2023/11/21 17:28:51 DEBUG : ö: Size and modification time the same (differ by 0s, within tolerance 1ns) 2023/11/21 17:28:51 DEBUG : ö: Unchanged skipping 2023/11/21 17:28:51 INFO : ö: Deleted 2023/11/21 17:28:51 INFO : Transferred: 0 B / 0 B, -, 0 B/s, ETA - Checks: 1 / 1, 100% Deleted: 1 (files), 0 (dirs) Elapsed time: 0.0s 2023/11/21 17:28:51 DEBUG : 5 go routines active ~ % rclone lsl /Users/nielash/rename_test/ ~ %
…sts - see rclone#5679 Before this change, integration tests often could not be run on backends with differing features from the local system that goldenized them. In particular, differences in modtime precision, checksum support, and encoding would cause false positives. After this change, the tests more accurately account for the features of the backend being tested, which allows us to see true positives more clearly, and more meaningfully assess whether a backend is supported.
fe42471
to
786302f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow this is a monster :-)
I think we should merge it now so we can unblock the dev process. We've got time to fix up breakage before the next release.
I'll approve it and you can merge it!
Thank you and many apologies for the backlog.
Awesome, thank you!! I will do the honors shortly and slay this "monster" 😆 |
What is the purpose of this change?
A suite of several related fixes tackling bisync's biggest pain points.
Highlights:
--track-renames
and--backup-dir
-- fixes bisync: implement--track-renames
#5690 and bisync: respect --backup-dir #5685--fix-case
flag to rename case insensitive dest -- fixes New--fix-case
flag to workaround case insensitive filesystem (macOS/Windows) #4854--resync
performance -- partially addresses bisync: optimize --resync #5681Was the change discussed in an issue or in the forum before?
Yes. See this post and the individual issues linked above.
Checklist