Thought this might be better as a separate issue. I've continued to dig into the FSEvents API in more detail, and it has some peculiar characteristics.
FSEvent does not emit events immediately. Instead, there are apparently two types of buffering.
In the first case, no events are delivered until the associated file descriptor has been closed. This means that if you create a file and hold on to the file descriptor, no events will be sent.
However, even after you close the file, events are still buffered, presumably using a timer. This means that if you open a file, write to it, close it, and repeat, you will receive a single event unless there is some delay between the two opens; in very simple testing, this delay seems to be about 500ms.
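A toy model of the timer-based coalescing described above might help. It assumes (based only on the rough ~500ms measurement, not on any Apple documentation) that the first close starts a fixed window, every further close inside that window is folded into the same delivered event, and the next close after the window opens a new one. The `coalesce` function and the fixed-window rule are hypothetical.

```rust
/// Hypothetical model of FSEvents-style coalescing.
/// `close_times_ms` must be sorted ascending; each inner Vec is one delivered event.
fn coalesce(close_times_ms: &[u64], window_ms: u64) -> Vec<Vec<u64>> {
    let mut batches: Vec<Vec<u64>> = Vec::new();
    let mut batch_start: Option<u64> = None;
    for &t in close_times_ms {
        match batch_start {
            // Still inside the window opened by the first close: merge.
            Some(start) if t < start + window_ms => {
                batches.last_mut().expect("an open batch exists").push(t);
            }
            // No window open, or it has expired: this close starts a new event.
            _ => {
                batch_start = Some(t);
                batches.push(vec![t]);
            }
        }
    }
    batches
}

fn main() {
    // open/write/close loop with uneven gaps between closes
    let closes = [0, 100, 250, 900, 1000, 1600];
    for (i, batch) in coalesce(&closes, 500).iter().enumerate() {
        println!("delivered event {} covers closes at {:?} ms", i, batch);
    }
}
```

With a 500ms window, the six closes above collapse into three delivered events, which is the "single event unless there is some delay" effect.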
FSEvent has another interesting feature: when a flag is set on a path, subsequent events for that path will include the previously set flag for some interval of time.
This means that if you create a file, wait, and then modify that file, you may get an event with flags [create, is_file] and then subsequently an event with flags [create, modify, is_file].
More interestingly, this interval is not constant; instead, it appears that fseventsd (a system daemon that brokers events for streams) starts a repeating timer on launch (with a 30s period, as far as I've seen) and on each tick clears the active flags for each path, or something along these lines.
You can verify this by running a test program available here; you will notice that the time to the first flag-clear varies between runs, but all subsequent clears are offset from that initial timeout by a constant factor.
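The sticky-flag behaviour can be modelled the same way. This sketch assumes a daemon timer that ticks every `period_ms` from the moment fseventsd launched, and that each tick wipes the accumulated flags for every path; the flag bits are hypothetical stand-ins, not the real `kFSEventStreamEventFlag*` constants. Because the watcher starts at an arbitrary point in the daemon's cycle, the first clear comes after a varying delay while later clears are a constant period apart, which matches the observations above.

```rust
use std::collections::HashMap;

// Hypothetical stand-in bits, NOT the real kFSEventStreamEventFlag* values.
const CREATE: u32 = 1 << 0;
const MODIFY: u32 = 1 << 1;
const IS_FILE: u32 = 1 << 2;

/// Assumed model: flags accumulate per path until the daemon's next timer tick.
struct StickyFlags {
    daemon_start_ms: u64,
    period_ms: u64,
    // path -> (tick in which the flags were accumulated, accumulated flags)
    active: HashMap<String, (u64, u32)>,
}

impl StickyFlags {
    fn new(daemon_start_ms: u64, period_ms: u64) -> Self {
        StickyFlags { daemon_start_ms, period_ms, active: HashMap::new() }
    }

    /// Index of the daemon timer tick that `t_ms` falls into.
    fn tick_index(&self, t_ms: u64) -> u64 {
        t_ms.saturating_sub(self.daemon_start_ms) / self.period_ms
    }

    /// Time of the next flag clear after `t_ms`.
    fn next_clear_ms(&self, t_ms: u64) -> u64 {
        self.daemon_start_ms + (self.tick_index(t_ms) + 1) * self.period_ms
    }

    /// Record a change and return the flags an event at `t_ms` would carry.
    fn record(&mut self, path: &str, t_ms: u64, flags: u32) -> u32 {
        let tick = self.tick_index(t_ms);
        let entry = self.active.entry(path.to_string()).or_insert((tick, 0));
        if entry.0 != tick {
            // A tick has passed since the last change, so the old flags were cleared.
            *entry = (tick, 0);
        }
        entry.1 |= flags;
        entry.1
    }
}

fn main() {
    // Assume fseventsd launched at t = 0 and we start watching at t = 100 s.
    let mut flags = StickyFlags::new(0, 30_000);
    let watch_start = 100_000;
    // prints "first clear 20000 ms after watching starts"
    println!("first clear {} ms after watching starts", flags.next_clear_ms(watch_start) - watch_start);
    println!("{:#b}", flags.record("/tmp/f", 101_000, CREATE | IS_FILE)); // 0b101
    println!("{:#b}", flags.record("/tmp/f", 110_000, MODIFY | IS_FILE)); // 0b111, create still set
    println!("{:#b}", flags.record("/tmp/f", 125_000, MODIFY | IS_FILE)); // 0b110, cleared at 120 s
}
```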
If multiple modifications of the same type (to data, metadata, xattrs, ownership, or finder attributes) happen within the same window, there is no way to determine what exactly has gone on without looking at the actual state of the file, and comparing it to some previous state.
Some heuristics work in particular situations, but not consistently. For instance, if we receive two events with the flags [create, modify] and [create, modify], we can infer that the second represents a modification, because a file cannot be created twice. (Overwriting an existing file with a new file would be an inode modification, which has its own flag.) However, if we get the flags [create, modify, mod_metadata] and [create, modify, mod_metadata], we cannot know whether the second event was a change to data, metadata, or both.
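The create/modify heuristic, and the point where it breaks down, can be written as a small function over two consecutive flag sets for the same path within one window. Bits that are new in the second event are certain; if nothing is new, a repeat can only be attributed when exactly one repeatable kind is set. The constants and the `Inference` type are hypothetical, and this is a sketch of the reasoning rather than a complete classifier (for example, new bits can also hide a repeat of an old kind).

```rust
// Hypothetical stand-in bits for the flags discussed above.
const CREATE: u32 = 1 << 0;
const MODIFY: u32 = 1 << 1; // data changed
const META: u32 = 1 << 2; // inode metadata changed
const IS_FILE: u32 = 1 << 3; // item type, not a change

#[derive(Debug, PartialEq)]
enum Inference {
    /// These bits were not set before, so these kinds definitely happened.
    New(u32),
    /// Nothing new and exactly one repeatable kind is set: it happened again.
    Repeat(u32),
    /// Nothing new and several (or zero) repeatable kinds are set: can't tell.
    Ambiguous(u32),
}

/// `prev` is the previous flag set for the same path in the same window.
fn infer(prev: Option<u32>, cur: u32) -> Inference {
    let prev = match prev {
        None => return Inference::New(cur),
        Some(p) => p,
    };
    let new_bits = cur & !prev;
    if new_bits != 0 {
        return Inference::New(new_bits);
    }
    // A file cannot be created twice, and IS_FILE is not a change,
    // so only the remaining bits can explain a repeated event.
    let repeatable = cur & !(CREATE | IS_FILE);
    if repeatable.count_ones() == 1 {
        Inference::Repeat(repeatable)
    } else {
        Inference::Ambiguous(repeatable)
    }
}

fn main() {
    // [create, modify] twice: the second must be a data modification.
    println!("{:?}", infer(Some(CREATE | MODIFY | IS_FILE), CREATE | MODIFY | IS_FILE));
    // [create, modify, mod_metadata] twice: data, metadata, or both?
    println!("{:?}", infer(Some(CREATE | MODIFY | META | IS_FILE), CREATE | MODIFY | META | IS_FILE));
}
```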
As far as I can tell (and this is backed up by Apple documentation and posts on the filesystem-dev mailing list), the only way to get an approximately fine-grained set of events from the FSEvent API is to cache the current state of the watched paths, and to use FSEvent events essentially as a mechanism for efficiently directing polling. This still won't capture everything: in particular, it won't capture chunked writes where the writing process holds a file descriptor open, and it won't capture multiple events that happen within ~500ms of one another. This is the approach suggested by Apple.
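A rough sketch of the cache-and-poll idea (an illustration, not Apple's code): keep a snapshot of each watched path and, whenever FSEvents reports that path, `stat` it and diff against the snapshot. The snapshot fields (length, mtime, read-only bit) are an assumption chosen for brevity; a real cache would track more. The sketch inherits the limitations listed above: intermediate states between two polls are lost, and a write that keeps the length and lands in the same mtime tick is reported as `Unchanged`.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// The (assumed) subset of file state remembered between events.
#[derive(Clone, Debug, PartialEq)]
struct Snapshot {
    len: u64,
    modified: SystemTime,
    readonly: bool,
}

#[derive(Debug, PartialEq)]
enum Change {
    Created,
    Removed,
    Data,        // length or mtime differs from the cached snapshot
    Permissions, // read-only bit flipped
    Unchanged,   // an event arrived, but nothing we track differs
}

struct Cache {
    entries: HashMap<PathBuf, Snapshot>,
}

impl Cache {
    fn new() -> Self {
        Cache { entries: HashMap::new() }
    }

    /// Called whenever FSEvents mentions `path`: poll it and diff with the cache.
    fn on_event(&mut self, path: &Path) -> io::Result<Vec<Change>> {
        let current = match fs::metadata(path) {
            Ok(m) => Some(Snapshot {
                len: m.len(),
                modified: m.modified()?,
                readonly: m.permissions().readonly(),
            }),
            Err(e) if e.kind() == io::ErrorKind::NotFound => None,
            Err(e) => return Err(e),
        };
        // Update the cache and fetch the previous snapshot in one step.
        let previous = match &current {
            Some(s) => self.entries.insert(path.to_path_buf(), s.clone()),
            None => self.entries.remove(path),
        };
        let changes = match (previous, current) {
            (None, Some(_)) => vec![Change::Created],
            (Some(_), None) => vec![Change::Removed],
            // Created and deleted between two polls: nothing observable.
            (None, None) => vec![Change::Unchanged],
            (Some(old), Some(new)) => {
                let mut v = Vec::new();
                if old.len != new.len || old.modified != new.modified {
                    v.push(Change::Data);
                }
                if old.readonly != new.readonly {
                    v.push(Change::Permissions);
                }
                if v.is_empty() {
                    v.push(Change::Unchanged);
                }
                v
            }
        };
        Ok(changes)
    }
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("fsevents_poll_sketch.txt");
    let _ = fs::remove_file(&path);
    let mut cache = Cache::new();

    fs::write(&path, b"hello")?;
    println!("{:?}", cache.on_event(&path)?); // [Created]
    fs::write(&path, b"hello, world")?;
    println!("{:?}", cache.on_event(&path)?); // [Data]
    fs::remove_file(&path)?;
    println!("{:?}", cache.on_event(&path)?); // [Removed]
    Ok(())
}
```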
If we avoid this approach, we can instead offer coarser-grained events. Where applicable, we will be able to send concrete events; in cases of ambiguity we would send generic "modify_any" events. I'm not sure what the architecture for the rest of the notify crate is, or how far it intends to go to fill in missing functionality; but if it is going to be capable of some filesystem caching and lookup, then this might be enough to direct it appropriately.
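For the coarse-grained option, a backend could map the accumulated modification bits to a concrete kind when exactly one is set, and fall back to a generic "modify any" when several are set, since the sticky flags make it impossible to tell which of them this particular event is about. The five kinds follow the list earlier in this issue (data, metadata, xattrs, ownership, Finder attributes); the bit values and names are hypothetical.

```rust
// Hypothetical bits for the five modification kinds; not real FSEvents values.
const MOD_DATA: u32 = 1 << 0;
const MOD_METADATA: u32 = 1 << 1;
const MOD_XATTR: u32 = 1 << 2;
const MOD_OWNER: u32 = 1 << 3;
const MOD_FINDER_INFO: u32 = 1 << 4;
const MOD_MASK: u32 = MOD_DATA | MOD_METADATA | MOD_XATTR | MOD_OWNER | MOD_FINDER_INFO;

#[derive(Debug, PartialEq)]
enum ModifyKind {
    Data,
    Metadata,
    Xattr,
    Owner,
    FinderInfo,
    /// Several kinds are set; we can't tell which one this event is about.
    Any,
}

/// Concrete kind when exactly one modification bit is set, `Any` otherwise.
fn classify_modify(flags: u32) -> Option<ModifyKind> {
    match flags & MOD_MASK {
        0 => None, // no modification bits at all
        MOD_DATA => Some(ModifyKind::Data),
        MOD_METADATA => Some(ModifyKind::Metadata),
        MOD_XATTR => Some(ModifyKind::Xattr),
        MOD_OWNER => Some(ModifyKind::Owner),
        MOD_FINDER_INFO => Some(ModifyKind::FinderInfo),
        _ => Some(ModifyKind::Any),
    }
}

fn main() {
    println!("{:?}", classify_modify(MOD_DATA)); // Some(Data)
    println!("{:?}", classify_modify(MOD_DATA | MOD_METADATA)); // Some(Any)
}
```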
Thanks for the write up, Colin.
I'm still processing and thinking about it, but leaning towards your second approach: get coarse-grained events and let (one of the higher layers of) notify figure out if it wants to enrich the events with extra lookups from the filesystem.
(One of the reasons why I haven't written much of that architecture is that I was waiting on exactly this kind of detail from the backends, to figure out just what is needed to be filled in.)
I would also prefer the coarse-grained approach. There are many differences between fsevents/inotify/etc. and they aren't completely reliable, so it will be necessary for some applications to poll the fs anyway.
An additional thing:
FSEvent as described in that mailing-list thread looks to be very much an "oh hey, something happened in this subtree" thing. That's actually how most notify consumers use the library. As much as I wish to make richer events available, I am quite aware that most consumers, in the end, won't really care. That was one of the driving concerns behind the hierarchical EventKind system, so that 90% of people can subscribe to and handle the 3-4 high-level event kinds and be done with it.
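To illustrate why coarse events are usually enough: with a hierarchical kind, most consumers match only on the top level, and those that care about detail can treat a generic modify conservatively. The enum below is a hypothetical, simplified stand-in loosely shaped like notify's `EventKind`, not the crate's actual definition.

```rust
// Hypothetical, simplified hierarchy; not notify's actual definitions.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ModifyKind {
    Any, // the backend couldn't tell what kind of modification happened
    Data,
    Metadata,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum EventKind {
    Any, // "something happened here"
    Create,
    Modify(ModifyKind),
    Remove,
}

/// A typical consumer: only the top level matters.
fn summary(kind: EventKind) -> &'static str {
    match kind {
        EventKind::Create => "created",
        EventKind::Modify(_) => "modified",
        EventKind::Remove => "removed",
        EventKind::Any => "something changed",
    }
}

/// A consumer that caches file contents: it re-reads on data changes and
/// treats generic events conservatively as possible data changes.
fn needs_reread(kind: EventKind) -> bool {
    matches!(
        kind,
        EventKind::Create
            | EventKind::Modify(ModifyKind::Data)
            | EventKind::Modify(ModifyKind::Any)
            | EventKind::Any
    )
}

fn main() {
    for kind in [
        EventKind::Create,
        EventKind::Modify(ModifyKind::Metadata),
        EventKind::Modify(ModifyKind::Any),
        EventKind::Remove,
    ] {
        println!("{:?}: {} (re-read: {})", kind, summary(kind), needs_reread(kind));
    }
}
```

The point of the design is that a backend emitting `Modify(Any)` loses nothing for the `summary`-style consumer, and only costs the detail-oriented consumer an extra re-read.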
In other words, it's a bit annoying that FSEvent doesn't behave as we'd first expected it to, but it's not terrible, either.