Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

INotifyWatcher recursively adds watches but does not recursively remove #60

Closed
jwilm opened this issue Apr 30, 2016 · 5 comments
Closed
Labels
Milestone

Comments

@jwilm
Copy link
Contributor

jwilm commented Apr 30, 2016

Here's where the INotifyHandler gets requests to add/remove watches.

            EventLoopMsg::AddWatch(path, tx) => {
                let _ = tx.send(self.add_watch_recursively(path));
            }
            EventLoopMsg::RemoveWatch(path, tx) => {
                let _ = tx.send(self.remove_watch(path));
            }

Watches are definitely done recursively (that function does a WalkDir and add_watch for each entry). Remove watch only attempts to remove the given path.

    fn remove_watch(&mut self, path: PathBuf) -> Result<(), Error> {
        match self.watches.remove(&path) {
            None => Err(Error::WatchNotFound),
            Some(p) => {
                let w = p.0;
                match self.inotify.rm_watch(w) {
                    Err(e) => Err(Error::Io(e)),
                    Ok(_) => {
                        // Nothing depends on the value being gone
                        // from here now that inotify isn't watching.
                        (*self.paths).write().unwrap().remove(&w);
                        Ok(())
                    }
                }
            }
        }
    }

Just for completeness, here's the add_watch_recursively implementation.

    fn add_watch_recursively(&mut self, path: PathBuf) -> Result<(), Error> {
        match metadata(&path) {
            Err(e) => return Err(Error::Io(e)),
            Ok(m) => {
                if !m.is_dir() {
                    return self.add_watch(path);
                }
            }
        }

        for entry in WalkDir::new(path)
                         .follow_links(true)
                         .into_iter()
                         .filter_map(|e| e.ok()) {
            try!(self.add_watch(entry.path().to_path_buf()));
        }

        Ok(())
    }
@blaenk
Copy link
Contributor

blaenk commented Apr 30, 2016

Nice catch. I haven't had a chance to work on rsnotify in a while, but one thing that may be important to keep in mind is that, to remove the watches, we probably shouldn't just recursively traverse the "root" and delete every watch for every entry we encounter.

For example, we perhaps could have recursively traversed the root and for each directory found, check if it's in watches, if so, remove_watch(that_path). However, one problem is that as far as I know, we make no distinction as to whether a watch is recursive or not. So given some path that we want to remove, we won't know whether we should recursively remove it or not.

Also I'm not sure what would happen if one of the paths watched as a result of a recursive watch has since been removed from the file system. In that case it would not be found during the traversal and it would remain/leak/linger in the watches map. Although, I guess if that's an issue, it already is. Perhaps we can watch for the IN_DELETE_SELF event for each watched path, and if triggered, we remove the entries? Not sure if that'd work.

           IN_DELETE_SELF
                  Watched file/directory was itself deleted.  (This event
                  also occurs if an object is moved to another filesystem,
                  since mv(1) in effect copies the file to the other
                  filesystem and then deletes it from the original
                  filesystem.)  In addition, an IN_IGNORED event will
                  subsequently be generated for the watch descriptor.

Also from TLPI:

An IN_IGNORED event is generated when a watch is removed. This can occur for two reasons: the application used an inotify_rm_watch() call to explicitly remove the watch, or the watch was implicitly removed by the kernel because the monitored object was deleted or the file system where it resides was unmounted.

So I think monitoring for IN_IGNORED should be enough. When we receive it, we remove the entries from our own data structures. Perhaps this should also be a separate issue. I'll file.

Currently the watches map is watches: HashMap<PathBuf, (Watch, flags::Mask)>. One approach is to have watches's value be or contain an enum specifying whether it's a single watch or a recursive watch. If it's a recursive watch, then it should store the collection of Watches for that PathBuf representing a recursive watch. Then when a recursive watch is created, it'll store all of the paths that it's recursively watching. When removing a watch, we check if it's a WatchKind::Recursive and if so go through deleting all of the associated watches.

enum WatchKind {
  Single(Watch),
  Recursive(Vec<Watch>),
}

struct WatchEntry {
  watch_kind: WatchKind,
  mask: flags::Mask,
}

struct INotifyHandler {
  ...
  watches: HashMap<PathBuf, WatchEntry>,
  paths: Arc<RwLock<HashMap<Watch, PathBuf>>>
}

As I understand it, INotifyhandler::paths already correctly stores the reverse, that is, it maps a given Watch to its corresponding PathBuf.

Alternatively, we can perhaps just store the root of the recursive path, i.e. WatchKind::Recursive(PathBuf), and then recursively traverse that and remove any path we find for which there is an associated watch. One problem with this approach, although messy and probably unlikely in practice, is that when you recursively watch a path, you're watching the entries that existed at that path at that point in time. What if someone later adds another entry (file for example) and watches that separately. Then there will be a path-watch key-value pair for that path, in which case the recursive removal of the recursive watch will also inadvertently remove this. So perhaps the Recursive(Vec<PathBuf>) approach is better because it accurately represents the fact that the recursive watch is based on the snapshot/point-in-time in which the recursive watch was made.

So to recap: differentiate between recursive and normal watches, and for recursive watches, store all of the watches that are associated with the given root, so for a given path you can access all of the watches added as a result of the recursive watch. The paths for these watches can easily be retrieved by cross-referencing with paths: Arc<RwLock<HashMap<Watch, PathBuf>>>.

Or am I overthinking this?

@jwilm
Copy link
Contributor Author

jwilm commented Apr 30, 2016

Another idea I've been toying with is returning a Token of some sort from Watcher::watch which can be passed to Watcher::unwatch. The token could be mapped to whatever was initially watched in the call to watch. It doesn't even need to care about paths necessarily.. In the case of inotify, mapping the token to a list of fds should be sufficient to unwatch everything. In the process of unwatching things, it seems fine to just ignore any errors removing items that aren't being watched (perhaps because they no longer exist or were otherwise removed).

Continuing with your thoughts, you mention that

What if someone later adds another entry (file for example) and watches that separately.

We could automatically watch that file (would inotify give a CREATE notification from the parent directory?) so that another call to watch is not necessary for it. After all, when you ask to watch a directory, you probably want to know about all activity in that directory even if it pertains to files that didn't exist on the initial watch. Just assuming watch on a directory is always recursive should be sufficient.

The real problem I see with this approach is that if I watch /tmp/thing.txt and /tmp, unwatching /tmp would result in /tmp/thing.txt no longer generating events even if it was registered separately.

@blaenk
Copy link
Contributor

blaenk commented Apr 30, 2016

Another idea I've been toying with is returning a Token of some sort from Watcher::watch which can be passed to Watcher::unwatch. The token could be mapped to whatever was initially watched in the call to watch. It doesn't even need to care about paths necessarily.

Yeah I was considering something like that as well. IIRC one of the reasons we keep the paths is so that when we get an event, we can send the Path to the user so they know what path the event corresponds to. I guess we can offload this book-keeping to them (they'd get the watch descriptor and they'd need to maintain a map of wds to paths), but I think the objective of this package is to be a bit higher level than that.

The real problem I see with this approach is that if I watch /tmp/thing.txt and /tmp, unwatching /tmp would result in /tmp/thing.txt no longer generating events even if it was registered separately.

Yep that's what I was getting at. I think it goes back to what you said:

After all, when you ask to watch a directory, you probably want to know about all activity in that directory even if it pertains to files that didn't exist on the initial watch.

I think right here we simply have to decide what behavior we prefer. Intuitively I agree that if the user specifically wants a recursive watch, it should continuously watch any new additions/changes, even though that's not what we're currently doing. I'm not sure if anyone would ever want the current behavior, where we recursively set up watches at the point-in-time at which the recursive watch was placed. If so I guess they could do it themselves manually by traversing the tree. Or perhaps they can control this with some sort of Recursive::Live marker which keeps absorbing any new entries, or Recursive::Static which does what we currently do.

If we go that route, perhaps we can have a simple check where, if after a recursive watch on /tmp a user wants to specifically watch /tmp/something we either ignore it or let them know that it's already covered by the recursive watch. Alternatively if the user was already watching /tmp/something and wants to recursively watch /tmp, it either gets "absorbed" into the new recursive watch or the user is notified that there's a conflicting watch.

@passcod
Copy link
Member

passcod commented Apr 30, 2016

I'm not sure if anyone would ever want the current behavior, where we recursively set up watches at the point-in-time at which the recursive watch was placed.

The (bad) reason for this, iirc, is that when I was creating this I was reading through other notify implementations in other languages, and some did just that, for simplicity perhaps. I think the logic went like: 1) glob $path/**, 2) Filter for dirs, 3) Add watches. Which, uh, is pretty close to what we do. But I do agree it should be changed.

Does this also concern other backends? I'd think it does.

Generally, I think there backends should be a lot more low-level, and rsnotify should do a lot more itself than just selecting a backend and delegating everything to that. I outlined this in another thread, but revisiting for this:

  • Backends should implement a small, well-defined interface:
    • ::new should return a Result, failing if, say, INotify isn't available on this kernel. Should also take either a channel if we want to still do that, or a closure; to send synthesised events back to Notify.
    • ::watchPath required
    • ::unwatchPath required
    • ::watchPathRecursively optional, only if backend can do it more effectively
    • ::unwatchPathRecursively optional, reverse for the above
    • Should all take a Path and return a Result, I think
  • Notify itself should manage everything else about watches, as discussed above, to the general goal of:
    • Recursive watches implemented generically (either using tree traversal and ::watchPath or delegating to the backend's ::watchPathRecursively if present) and unwatching properly
    • Optimising the watches, as in @blaenk's last paragraph
    • Marking paths/watches wanted by the user, so those aren't ::unwatched when we unwatch parent dirs (and vice versa)
    • Debouncing some events, perhaps, like for Change Linux to report modify when close_write is detected. #48
    • Behaviour switches for Recursive::Live or ::Static, if we want to
    • Dynamic fallbacks (from FANotify to INotify, and from [any backend] to polling, etc) instead of at-compile-time

Thoughts?

@blaenk
Copy link
Contributor

blaenk commented Apr 30, 2016

The (bad) reason for this, iirc, is that when I was creating this I was reading through other notify implementations in other languages, and some did just that, for simplicity perhaps. I think the logic went like: 1) glob $path/**, 2) Filter for dirs, 3) Add watches. Which, uh, is pretty close to what we do. But I do agree it should be changed.

I agree completely. I've also seen this be done in other implementations and I probably would've done the same thing. It's just that this issue made me question whether perhaps there's a better way. Then again, perhaps they do it this way for a good reason? Perhaps there are some problems down the line. That said, I think having behavior switches like ::Live and ::Static would be a good way to go about it. That way users can choose whichever way they prefer.

Does this also concern other backends? I'd think it does.

Yup. I labeled this one os-linux since this particular issue (unable to unwatch recursive watches) seems to affect the inotify back-end. I couldn't find any mention of recursive watches in the source files for the other back-ends.

I agree with your ideas, perhaps they should go into its own separate tracking issue so everyone can join in on the discussion, since it's not specific to inotify as you say. That way we could also track progress towards that goal.

I agree that perhaps we should slim down the amount of work the back-ends do. It seems like it's very possible for the implementations to diverge easily in quality and features, which is already very possible given the nature of this project in trying to normalize these very os-dependent features, and is perhaps further compounded when each back-end seems to need to recreate its own book-keeping etc. I'm not sure to what extent we can avoid that though. Perhaps we can learn something from std, for example how it handles fs::Metadata even though the underlying types are system-dependent.

Also I agree that it would be nice to have some optional debouncing functionality. I've seen other notification libraries in other languages include this. Currently it seems like everyone's forced to do it on their end.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants