Accommodating FSEvent's unique API #147

Closed
cmyr opened this issue Jan 9, 2018 · 6 comments

Contributor

@cmyr cmyr commented Jan 9, 2018

Thought this might be better as a separate issue. I've continued to dig into the FSEvents API in more detail, and it has some peculiar characteristics.

Event Buffering

FSEvent does not emit events immediately. Instead, there are apparently two types of buffering.

In the first case, no events are delivered until the associated file descriptor has been closed. This means that if you create a file and hold on to the file descriptor, no events will be sent.

However, even if you do close the file, events are still buffered, presumably using a timer. This means that if you open a file, write to it, close the file, and repeat, you will receive a single event unless there is some delay between the two opens; in very simple testing this delay seems to be about 500ms.
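To make the second kind of buffering concrete, here is a toy model (not FSEvents code) of how closely spaced writes could collapse into one delivered event. It assumes the window is measured from the previous write and uses the roughly 500ms figure observed above; `coalesce` and its parameters are hypothetical names, and the real daemon may well behave differently.

```rust
use std::time::Duration;

/// Toy model: group sorted write timestamps into delivered events.
/// A write joins the current batch when it lands within `window`
/// of the previous write (an assumption, not documented behaviour).
fn coalesce(writes: &[Duration], window: Duration) -> Vec<Vec<Duration>> {
    let mut batches: Vec<Vec<Duration>> = Vec::new();
    for &t in writes {
        // Does this write fall inside the window opened by the last one?
        let joins = batches
            .last()
            .and_then(|batch| batch.last())
            .map_or(false, |&prev| t.saturating_sub(prev) < window);
        if joins {
            batches.last_mut().unwrap().push(t);
        } else {
            batches.push(vec![t]);
        }
    }
    batches
}

fn main() {
    let writes = [
        Duration::from_millis(0),
        Duration::from_millis(100),
        Duration::from_millis(900),
    ];
    // With a 500ms window, the first two writes share one event.
    let batches = coalesce(&writes, Duration::from_millis(500));
    println!("{} delivered events for {} writes", batches.len(), writes.len());
}
```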

Sticky Flags

FSEvent has another interesting feature: when a flag is set on a path, subsequent events for that path will include the previously set flag for some interval of time.

This means that if you create a file, wait, and then modify that file, you may get an event with flags [create, is_file] and then subsequently an event with flags [create, modify, is_file].

More interestingly, this interval is not constant; instead, it appears that fseventsd (a system daemon that brokers events for streams) starts a repeating timer on launch (for 30s, as far as I've seen) and on each tick it clears the active flags for each path, or something along these lines.

You can verify this by running a test program available here; you will notice that the time to the first flag-clear varies between runs, but all subsequent clears are offset from that initial timeout by a constant factor.
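The sticky-flag behaviour can be sketched as a small model: flags accumulate per path and are only forgotten when a periodic tick fires. This is a guess at the observable behaviour, not a description of fseventsd internals; `StickyFlags`, `record`, `tick`, and the bit values are all made up for illustration and are not the real FSEvents constants.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

// Illustrative bit values only; NOT the real FSEvents constants.
const CREATE: u32 = 1 << 0;
const MODIFY: u32 = 1 << 1;
const IS_FILE: u32 = 1 << 2;

/// Hypothetical model of per-path flags that stick until the next tick.
#[derive(Default)]
struct StickyFlags {
    active: HashMap<PathBuf, u32>,
}

impl StickyFlags {
    /// Record a change and return the flags the emitted event would carry.
    fn record(&mut self, path: &Path, flags: u32) -> u32 {
        let entry = self.active.entry(path.to_path_buf()).or_insert(0);
        *entry |= flags;
        *entry
    }

    /// The repeating timer fired: forget every accumulated flag.
    fn tick(&mut self) {
        self.active.clear();
    }
}

fn main() {
    let mut model = StickyFlags::default();
    let path = Path::new("/tmp/example.txt");

    let first = model.record(path, CREATE | IS_FILE);
    let second = model.record(path, MODIFY | IS_FILE);
    println!("first: {:#05b}, second: {:#05b}", first, second);

    model.tick();
    let third = model.record(path, MODIFY | IS_FILE);
    println!("after tick: {:#05b}", third);
}
```

In this model the second event still carries the create bit, and only the event after `tick` reports a plain modification.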

Problem

If multiple modifications of the same type (to data, metadata, xattrs, ownership, or finder attributes) happen within the same window, there is no way to determine what exactly has gone on without looking at the actual state of the file, and comparing it to some previous state.

There are heuristics that can apply in some situations, but not consistently. For instance, if we receive two events with the flags [create, modify] [create, modify] we can intuit that the second represents a modification, because a file cannot be created twice. (Overwriting an existing file with a new file would be an inode modification, which has its own flag.) However, if we get the flags [create, modify, mod_metadata], [create, modify, mod_metadata], we cannot know if the second event was a change to data, metadata, or both.
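A minimal sketch of that heuristic, assuming we keep the previous flag set for each path: newly set bits definitely happened, a single carried-over repeatable bit must be the cause of the event, and several carried-over bits stay ambiguous. `infer`, `Inference`, and the bit values are hypothetical and chosen only for illustration.

```rust
// Illustrative bit values only; NOT the real FSEvents constants.
const CREATE: u32 = 1 << 0;
const MODIFY: u32 = 1 << 1;
const MOD_METADATA: u32 = 1 << 2;

/// Kinds of change that can legitimately repeat on the same path.
/// CREATE is excluded: a file cannot be created twice.
const REPEATABLE: u32 = MODIFY | MOD_METADATA;

#[derive(Debug, PartialEq)]
struct Inference {
    /// Changes that must have happened.
    definite: u32,
    /// Changes that may have happened; the flags cannot tell us.
    possible: u32,
}

/// Compare the previous and current flag sets seen for one path.
fn infer(prev: u32, cur: u32) -> Inference {
    let newly_set = cur & !prev;
    let carried = cur & prev & REPEATABLE;
    if newly_set == 0 && carried.count_ones() == 1 {
        // Something happened, and only one kind of change could repeat.
        Inference { definite: carried, possible: 0 }
    } else {
        Inference { definite: newly_set, possible: carried }
    }
}

fn main() {
    let simple = infer(CREATE | MODIFY, CREATE | MODIFY);
    let murky = infer(
        CREATE | MODIFY | MOD_METADATA,
        CREATE | MODIFY | MOD_METADATA,
    );
    println!("{:?}", simple);
    println!("{:?}", murky);
}
```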

Some solutions

As far as I can tell (and this is backed up by Apple documentation, and posts on the filesystem-dev mailing list) the only way to get an approximately fine-grained set of events from the FSEvent API is to cache the current state of the watched paths, and to use FSEvent events as essentially a mechanism for efficiently directing polling. This still won't capture everything: in particular, it won't capture chunked writes where the writing process holds open a file descriptor, and it won't capture multiple events that happen within ~500ms of one another. This is the approach suggested by Apple:

"What's your high-level goal here? If you're just interested in a single file, James's suggestion would work. If you're trying to monitor an entire hierarchy of files, you're going to have to rethink you're assumptions. FSEvents is designed to notify you about events in the file system so that you can then go look in the file system to find the current state of things and sync based on that. It's not designed to feed you a stream of events with sufficient fidelity to reconstruct how the file system go into its current state."

https://lists.apple.com/archives/filesystem-dev/2016/Mar/msg00004.html
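The "cache state and let events direct the polling" idea could look roughly like the sketch below. It uses only the standard library and compares file length and modification time, which is an assumption about what is worth caching; `StateCache`, `Snapshot`, `Change`, and `refresh` are hypothetical names, not part of notify or FSEvents.

```rust
use std::collections::HashMap;
use std::fs;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

#[derive(Clone, PartialEq)]
struct Snapshot {
    len: u64,
    modified: Option<SystemTime>,
}

#[derive(Debug, PartialEq)]
enum Change {
    Created,
    Removed,
    Modified,
    Unchanged,
}

/// Hypothetical cache of the last state seen for each watched path.
#[derive(Default)]
struct StateCache {
    seen: HashMap<PathBuf, Snapshot>,
}

impl StateCache {
    /// Called when an event names `path`: stat it and diff against the cache.
    fn refresh(&mut self, path: &Path) -> Change {
        let current = fs::metadata(path).ok().map(|m| Snapshot {
            len: m.len(),
            modified: m.modified().ok(),
        });
        // Update the cache and keep the old entry for comparison.
        let previous = match &current {
            Some(snap) => self.seen.insert(path.to_path_buf(), snap.clone()),
            None => self.seen.remove(path),
        };
        match (previous, current) {
            (None, Some(_)) => Change::Created,
            (Some(_), None) => Change::Removed,
            (Some(old), Some(new)) if old != new => Change::Modified,
            _ => Change::Unchanged,
        }
    }
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join(format!("cache_demo_{}.txt", std::process::id()));
    let mut cache = StateCache::default();

    fs::write(&path, b"hello")?;
    println!("{:?}", cache.refresh(&path));
    fs::write(&path, b"hello, world")?;
    println!("{:?}", cache.refresh(&path));
    fs::remove_file(&path)?;
    println!("{:?}", cache.refresh(&path));
    Ok(())
}
```

As noted above, this still misses anything that happens between two lookups; it only recovers the net difference.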

If we were to avoid this, we could instead offer more coarse-grained events. Where applicable, we would send concrete events; in cases of ambiguity we would send generic "modify_any" events. I'm not sure what the architecture for the rest of the notify crate is, or how far it intends to go to fill in missing functionality; but if it were going to be capable of some filesystem caching and lookup, then this might be enough to direct it appropriately.

Member

@passcod passcod commented Jan 9, 2018

Thanks for the write up, Colin.

I'm still processing and thinking about it, but leaning towards your second approach: get coarse-grained events and let (one of the higher layers of) notify figure out if it wants to enrich the events with extra lookups from the filesystem.

(One of the reasons why I haven't written much of that architecture is that I was waiting on exactly this kind of detail from the backends, to figure out just what is needed to be filled in.)

Collaborator

@dfaust dfaust commented Jan 9, 2018

I would also prefer the coarse-grained approach. There are many differences between fsevents/inotify/etc. and they aren't completely reliable, so it will be necessary for some applications to poll the fs anyway. notify may very well provide such functionality, but it should be optional if it is expensive, like caching the entire file tree.

Member

@passcod passcod commented Jan 9, 2018

An additional thing:

FSEvent as described in that mailing list post looks to be very much an "oh hey, something happened in this subtree" thing. That's actually how most notify consumers use the library. As much as I wish to make richer events available, I am quite aware that most consumers, in the end, won't really care. That was one of the driving concerns behind the hierarchical EventKind system, so that 90% of people can subscribe to and handle the 3-4 high level event kinds, and be done with it.
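As a rough illustration of why a hierarchical kind helps here, the sketch below defines a stand-in enum (not the actual notify types; the variant names are assumptions) where a backend that cannot tell data from metadata changes emits `Modify(ModifyKind::Any)`, and a typical consumer matches only on the top level.

```rust
/// Hypothetical second-level detail for modifications.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ModifyKind {
    /// The backend knows something changed but not what ("modify_any").
    Any,
    Data,
    Metadata,
}

/// Hypothetical top-level kinds; a stand-in, not notify's real EventKind.
#[derive(Debug, Clone, Copy, PartialEq)]
enum EventKind {
    Create,
    Modify(ModifyKind),
    Remove,
    Other,
}

/// A consumer that only cares about the high-level kind.
fn describe(kind: EventKind) -> &'static str {
    match kind {
        EventKind::Create => "created",
        EventKind::Modify(_) => "modified",
        EventKind::Remove => "removed",
        EventKind::Other => "other",
    }
}

fn main() {
    // Precise and ambiguous backends look identical to this consumer.
    let events = [
        EventKind::Create,
        EventKind::Modify(ModifyKind::Data),
        EventKind::Modify(ModifyKind::Metadata),
        EventKind::Modify(ModifyKind::Any),
        EventKind::Remove,
        EventKind::Other,
    ];
    for event in events.iter() {
        println!("{:?} -> {}", event, describe(*event));
    }
}
```

A consumer that does want the detail can still match on the inner `ModifyKind` and fall back to its own filesystem lookup when it sees `Any`.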

In other words, it's a bit annoying that FSEvent doesn't behave as we'd first expected it to, but it's not terrible, either.

Contributor Author

@cmyr cmyr commented Jan 11, 2018

@passcod let me know if you think it makes sense for me to go ahead with the coarse-grained approach, or if I should hold on this for a bit?

Member

@passcod passcod commented Jan 11, 2018

Yeah, go ahead with it :)

Member

@passcod passcod commented Feb 9, 2019

I think this was mostly decided as using kqueue instead for Next, with perhaps an optional fsevent backend later on. Closing for now.
