[Proposal] rclone wait [fs:/] / Monitor Subsystem / Notify Events #4381

Open
tcf909 opened this issue Jun 22, 2020 · 3 comments

@tcf909
Contributor

tcf909 commented Jun 22, 2020

Proposal: rclone wait [fs:/] (+monitor subsystem)

Problem Statement:

Currently there is no clear API or approach for watching for changes and acting on them when using rclone. Each filesystem has its own event notification system, but because rclone works transparently with remote backends as well as local ones, one cannot simply use, for example, inotify (on Linux).

rclone faces a unique challenge in that it abstracts the complexities of working with many types of backends (including many different types of local filesystems).

To make matters worse, the Linux kernel / libfuse (and, I'm assuming, other kernels) does not offer any way (without patching the kernel) to inject events into the kernel for a FUSE filesystem. This prevents rclone from working with the existing ecosystem of tools and approaches as a simple producer of events, with existing tools that typically rely on the kernel acting as the consumers.

Proposed Solution:

  1. Consolidate change events internally from backends to allow rclone functions to monitor for events while rclone is running (mount, copy, sync, move, etc.). In general, present the internal code with a standard API to monitor for events at a directory level (recursive) or file level across any supported backend (remotes that support delta change updates, local filesystems that support notification events).
    • Provide --monitor=BOOL
      • For ongoing operations (mount, etc.) this would default to true.
      • For one-time operations (cp, ls, mv, etc.) this would default to false, but could be really useful for long running operations given the inefficiency of having to run a command twice.
    • Provide --monitor-after=TIME_INT
      • This would be primarily for operations in which we are not entirely sure how long they will take.
    • A monitor would be shared between running rclone instances so as not to duplicate the effort / resources of monitoring something.
      • ASIDE: It looks like quite a bit of work was done for the RC subsystem to allow a single instance of rclone to run with parallel execution of operations below the single instance. I'm guessing some of that work will be relevant here.
  2. Present a simple rclone wait [/file, fs:, fs:/dir] command for use by scripts and other external tools (also allow access via RC subsystem). This would be the simplest implementation of the consolidated monitoring interface and would allow the rclone ecosystem to start taking advantage of the internal work.
    • Model the interface / output after inotifywait
      • Provide --monitor-format="%template %template %template"
      • Provide --monitor-recursive=BOOL
  3. Start looking for instances where active monitoring could really enhance existing rclone functionality:
    • rclone [copy,move,sync] --continuous ...
      • perform operation and block while waiting for changes at source which will be handled by the same operation.
    • rclone [copy,move,sync] --monitor
      • While the operation is running, build a list of files that changed after they had already been operated on, or that were added while running, and handle that list at the end.
    • rclone mount --vfs-cache-pre-fetch
      • Proactively download files to the local vfs cache
    • I'm sure there are many others...
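
To make point 1 a little more concrete, here is a minimal sketch of a shared monitor that fans events out to several consumers, with a recursive and a non-recursive subscription on the same directory. Everything in it (`Event`, `Hub`, `Subscribe`, `Publish`, the operation names) is hypothetical and invented for illustration; none of it exists in rclone today.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Event is a hypothetical backend-independent change event.
type Event struct {
	Path string // path relative to the root of the remote
	Op   string // illustrative names such as "CREATE", "MODIFY", "DELETE"
}

// subscription records what one consumer asked to watch.
type subscription struct {
	root      string
	recursive bool
	ch        chan Event
}

// Hub is a hypothetical shared monitor: a backend publishes each event
// once and every matching subscriber receives a copy.
type Hub struct {
	mu   sync.Mutex
	subs []subscription
}

// Subscribe registers interest in root. With recursive=false only
// direct children of root (and root itself) are delivered.
func (h *Hub) Subscribe(root string, recursive bool) <-chan Event {
	ch := make(chan Event, 16)
	h.mu.Lock()
	defer h.mu.Unlock()
	h.subs = append(h.subs, subscription{strings.Trim(root, "/"), recursive, ch})
	return ch
}

// matches reports whether path falls under the subscription.
func (s subscription) matches(path string) bool {
	path = strings.Trim(path, "/")
	rest := path
	if s.root != "" {
		if path == s.root {
			return true
		}
		if !strings.HasPrefix(path, s.root+"/") {
			return false
		}
		rest = path[len(s.root)+1:]
	}
	return s.recursive || !strings.Contains(rest, "/")
}

// Publish delivers ev to every matching subscriber. If a subscriber's
// buffer is full the event is dropped rather than blocking the backend.
func (h *Hub) Publish(ev Event) {
	h.mu.Lock()
	defer h.mu.Unlock()
	for _, s := range h.subs {
		if s.matches(ev.Path) {
			select {
			case s.ch <- ev:
			default:
			}
		}
	}
}

func main() {
	hub := &Hub{}
	deep := hub.Subscribe("photos", true)
	flat := hub.Subscribe("photos", false)
	hub.Publish(Event{Path: "photos/2020/a.jpg", Op: "CREATE"})
	hub.Publish(Event{Path: "photos/b.jpg", Op: "MODIFY"})
	hub.Publish(Event{Path: "docs/c.txt", Op: "DELETE"})
	fmt.Println(len(deep), len(flat)) // prints "2 1"
}
```

Dropping events when a consumer is slow is only one possible policy; a real implementation would have to decide between dropping, coalescing, and back-pressure.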

Current State:

  • rclone currently polls remotes for changes to invalidate caches
    • This could possibly move to a push connection vs a delta poll connection.
  • When using rclone mount, local operations (chmod, chown, mv, cp, touch, etc.) do seem to present those events to existing kernel watchers.
  • rclone seems to have some functionality built under the RC efforts that allows a single rclone process to handle multiple concurrent sub-operations.
  • Many more current state details to be added as code discovery is done.

Feedback:

Looking for community feedback to stage for a future implementor.

@tcf909
Contributor Author

tcf909 commented Jun 22, 2020

Initial investigation #2882

@tcf909
Contributor Author

tcf909 commented Jun 22, 2020

Ideally, if we could push events to the kernel that would simplify everything, but:

  • Can we be sure this will work across all kernels / fuse implementations?
  • Does the functionality even exist today?
  • Do we want to be reliant on upstream and if so is it worth it to put the work in upstream first?

Ultimately, the kernel events are consumed by tools like inotifywait, which just output a delimited stream of events. rclone wait could easily be a drop-in replacement (and a standard) for outputting a delimited stream across a diverse set of backends.
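
As an illustration of what such a delimited stream could look like, here is a small sketch of a formatter that expands a subset of the placeholders inotifywait accepts in `--format` (`%w`, `%f`, `%e`). The function name `FormatEvent` and the event names are made up for this example, and treating the parent directory as the watched path is a simplifying assumption.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// FormatEvent expands a small subset of inotifywait-style placeholders:
//
//	%w  directory containing the changed file (with trailing slash)
//	%f  name of the file that triggered the event
//	%e  comma-separated event names
//
// Unknown placeholders and a trailing '%' are copied through unchanged.
func FormatEvent(format, fullPath string, events []string) string {
	dir, file := path.Split(fullPath)
	var b strings.Builder
	for i := 0; i < len(format); i++ {
		if format[i] != '%' || i+1 == len(format) {
			b.WriteByte(format[i])
			continue
		}
		i++ // look at the character after '%'
		switch format[i] {
		case 'w':
			b.WriteString(dir)
		case 'f':
			b.WriteString(file)
		case 'e':
			b.WriteString(strings.Join(events, ","))
		default:
			b.WriteByte('%')
			b.WriteByte(format[i])
		}
	}
	return b.String()
}

func main() {
	line := FormatEvent("%w %e %f", "remote:dir/sub/file.txt", []string{"CLOSE_WRITE", "CLOSE"})
	fmt.Println(line) // prints "remote:dir/sub/ CLOSE_WRITE,CLOSE file.txt"
}
```

A script that already parses inotifywait output with a fixed format string could, in principle, consume lines like this without changes, which is the point of modelling the interface on it.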

tcf909 changed the title from "[Proposal] rclone wait [fs:/] (+monitor subsystem)" to "[Proposal] rclone wait [fs:/] / Monitor Subsystem / Notify Events" on Jun 22, 2020
@ncw
Member

ncw commented Jun 26, 2020

The only backends supporting ChangeNotify today are

  • amazon cloud drive
  • google drive
  • the wrapping backends crypt/union/etc

For this proposal to be useful we need more backends to support it. Some backends (eg dropbox) have the capability but it isn't supported. Some backends (eg most of the others) don't have the capability.

So for this to be worth putting the effort in, we need a ChangeNotify fallback for backends which don't support it.
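
One way such a fallback could work is to list the directory periodically and compare each listing with the previous one. The sketch below shows only the comparison step; `Entry` and `Diff` are hypothetical names, and comparing size plus modification time is an assumption about what a poller would have available, not a description of rclone's behaviour.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Entry holds the metadata a hypothetical poller compares between listings.
type Entry struct {
	Size    int64
	ModTime time.Time
}

// Diff compares two listings (path -> metadata) and returns the sorted
// paths that were added, removed, or modified between them.
func Diff(prev, curr map[string]Entry) (added, removed, modified []string) {
	for p, c := range curr {
		old, ok := prev[p]
		switch {
		case !ok:
			added = append(added, p)
		case old.Size != c.Size || !old.ModTime.Equal(c.ModTime):
			modified = append(modified, p)
		}
	}
	for p := range prev {
		if _, ok := curr[p]; !ok {
			removed = append(removed, p)
		}
	}
	sort.Strings(added)
	sort.Strings(removed)
	sort.Strings(modified)
	return added, removed, modified
}

func main() {
	t0 := time.Date(2020, 6, 26, 0, 0, 0, 0, time.UTC)
	prev := map[string]Entry{
		"a.txt": {Size: 1, ModTime: t0},
		"b.txt": {Size: 2, ModTime: t0},
	}
	curr := map[string]Entry{
		"a.txt": {Size: 1, ModTime: t0},
		"b.txt": {Size: 3, ModTime: t0.Add(time.Minute)},
		"c.txt": {Size: 4, ModTime: t0},
	}
	added, removed, modified := Diff(prev, curr)
	fmt.Println(added, removed, modified) // prints "[c.txt] [] [b.txt]"
}
```

The cost of this approach is a full listing per poll, so it trades backend requests for coverage of backends with no native change feed.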

I think the most important backend to support would be the local backend - #4152

I think your point 2, an rclone wait command, is a good idea. Point 3 has come up before.

Point 1 would be quite tricky.

I'm not particularly happy with the ChangeNotify internal interface:

rclone/fs/fs.go

Lines 559 to 562 in 61e4b4d

// ChangeNotify calls the passed function with a path
// that has had changes. If the implementation
// uses polling, it should adhere to the given interval.
ChangeNotify func(context.Context, func(string, EntryType), <-chan time.Duration)

It is hard to use and the events it generates aren't well defined, so I think this could do with a rethink, probably!
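
For readers unfamiliar with the shape of that signature, here is a self-contained sketch of how a caller might drive a function of this type and turn its callbacks into a channel. `EntryType` and its constants are local stand-ins for the types in rclone's `fs` package, while `Change`, `Watch`, `fakeNotify` and `firstChange` are hypothetical names invented here. It also assumes the implementation returns immediately and polls in its own goroutine. This is not a proposed replacement for the interface.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// EntryType is a local stand-in for rclone's fs.EntryType.
type EntryType int

const (
	EntryDirectory EntryType = iota
	EntryObject
)

// ChangeNotifyFunc has the same shape as the field quoted above.
type ChangeNotifyFunc func(context.Context, func(string, EntryType), <-chan time.Duration)

// Change is a hypothetical value carrying what the callback receives.
type Change struct {
	Path string
	Type EntryType
}

// Watch adapts the callback style to a channel. It sends the poll
// interval once and forwards callbacks until ctx is cancelled.
// Assumption: notify returns immediately and does its work in a goroutine.
func Watch(ctx context.Context, notify ChangeNotifyFunc, interval time.Duration) <-chan Change {
	out := make(chan Change, 16)
	pollInterval := make(chan time.Duration, 1)
	pollInterval <- interval
	notify(ctx, func(p string, t EntryType) {
		select {
		case out <- Change{Path: p, Type: t}:
		case <-ctx.Done():
		}
	}, pollInterval)
	return out
}

// fakeNotify is a toy implementation that waits for the first interval
// and then reports one fixed path on every tick.
func fakeNotify(ctx context.Context, fn func(string, EntryType), ch <-chan time.Duration) {
	go func() {
		interval := <-ch
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
				fn("dir/file.txt", EntryObject)
			}
		}
	}()
}

// firstChange watches with the given implementation and returns the
// first change it reports, cancelling the watch afterwards.
func firstChange(notify ChangeNotifyFunc) Change {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	return <-Watch(ctx, notify, 10*time.Millisecond)
}

func main() {
	c := firstChange(fakeNotify)
	fmt.Println(c.Path, c.Type == EntryObject)
}
```

Note how much the caller has to know that the signature does not say: when to send the interval, whether the call blocks, and what a reported path actually means (created, modified, deleted). That gap is the kind of thing a rethink would need to address.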
