Zygmunt Krynicki edited this page Oct 8, 2023 · 6 revisions

Version

Saturday 2 April 2016: first release.

Introduction

For some time now I've been asking myself why the notify systems of the Linux kernel do not work with filesystems in userspace and network filesystems like cifs and nfs. The reason is described later in this document, but it took me some time to understand. I've tried several methods to get this working, like a daemon in userspace which does the fs change notification on behalf of clients, and a connection via a socket with FUSE based filesystems and network filesystems like cifs and nfs. These were all too complicated. I have now found a solution which is simple and intuitive, which is reason for me to try to implement it in FUSE and possibly take the first steps for cifs and nfs.

Why doesn't it work now?

Fsnotify does not work right now with FUSE based filesystems and network filesystems. Why? In short: the individual filesystems do not "know" a watch has been set, and thus cannot react to it.
In more detail, consider the following example: suppose you borrow books at a public library, and you are the only user of that library. After finding out once what the library has in stock, you know exactly what is in the library at any time, just by looking at what you've borrowed: what the library has in stock is the initial state minus what you've borrowed. This is no longer possible when you are not the only user. Suppose somebody else is also borrowing books at this library (which is of course realistic). You can no longer work out what the library has in stock as simply as when you were the only user. You can solve this by going to the library very often, which is an intensive task. It is easier to ask someone who works in the library (only if there is somebody!) to inform you when books are returned and borrowed.
The public library of books is comparable with a directory with files on a server. When you are the only user, you don't need extra information about what the directory contains; you know it just by tracking every action you perform: creating, modifying and/or removing a file. The situation becomes complicated when some other system is also using this directory. One method to stay up to date is polling the directory frequently, but this requires extra work. Another method is asking the server to inform you when something in the directory changes. In the current situation in Linux there is no way you as a user can ask the remote server to inform you. This is not because the protocol and the server do not support fs change notification. See for example fs/cifs/cifssmb.c, line 6465, where cifs uses the NT_TRANSACT_NOTIFY_CHANGE method, which is part of the SMB protocol. Windows and Samba fileservers also support this. The problem is that individual filesystems do not know a watch has been set, and thus cannot act upon it, for example by sending a message to the server to watch a directory for certain events, and to send a proper message back when something happens.
The changes I suggest are:

  1. fsnotify informs filesystems like FUSE a watch with a certain mask has been set on an inode.
  2. FUSE kernel module sends a message to the userspace daemon about this watch.
  3. when the FUSE userspace daemon receives this message it can send a protocol specific message (like the SMB one above) to the server/backend (or it can ignore it).
  4. when the FUSE userspace daemon receives a protocol specific fs change message, it translates it into something Linux understands and notifies the FUSE kernel module.
  5. the FUSE kernel module notifies the fsnotify subsystem about the (remote) event.

As you see there are many steps necessary to make this work. Before going further I think it's good to define what fsnotify for FUSE based filesystems is.

What is FUSE fsnotify?

In the example above about the public library, where you ask someone working there to inform you, you only want to be informed about changes not initiated by you. This may look obvious to you, but a computer system may inform you about any event, regardless of who initiated it, unless you program it to behave differently. So fsnotify for filesystems with a backend shared with others is an fs notify system that informs you about events initiated by others.

Important definitions and notes

In this document I describe the integration of FUSE and fsnotify, while most developers are interested in fanotify, inotify and dnotify. For some time now these fs change notify methods have been based on one and the same subsystem: fsnotify. And it's simple: when you make FUSE work with fsnotify under the hood, you automatically have support for inotify, fanotify and dnotify.
Only watches on directories. With inotify you can set a watch on a directory or on a file. It's a good choice to only support watches on directories. To watch a file, set a watch on the parent directory, or poll the file frequently.
Only support a subset of the possible events: creation and removal of entries in the watched directories and modification of files look like a good start, with support for more events in mind, of course.
Support for fsnotify in the kernel module and the userspace daemon should be completely voluntary: setting a watch, for example, should never block because the userspace daemon "forgets" to give a reply, or because the connection with the backend is down.
Removal of the watch: besides the explicit removal of a watch, the userspace daemon should also remove the watch when the inode is removed; it should not rely on receiving the explicit command to remove the watch from the kernel.

Support in well known filesystems

As described above, SMB has support for fs change notify.
I'm not sure, but I have heard NFS 4 also supports it.
WebDAV does not support it yet, due to limitations in the HTTP protocol (HTTP version 2 could support it, since the server can "push" content).

Support in FUSE

Initialization

At initialization (FUSE_INIT) the kernel module and the userspace daemon should each learn whether the other side supports it. The best way to achieve that here is adding a capability bit. With the current fuse module (3.0) that will be something like:

#define FUSE_CAP_FSNOTIFY_SUPPORT (1 << 18)

When the userspace daemon receives this bit set in the capable field of the fuse_conn_info struct, it knows the kernel supports it (and only then). The userspace daemon sends the same fuse_conn_info struct back, with the desired capability bit set in the want field. Only when the kernel module supports fsnotify and the userspace daemon replies that it wants this feature should the kernel send the fsnotify related messages to the userspace daemon.
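The negotiation described above can be sketched on the daemon side as follows. This is only an illustration under assumptions: conn_info_sketch is a stand-in that mirrors just the capable and want fields of libfuse's fuse_conn_info so the example compiles without libfuse, negotiate_fsnotify is a hypothetical helper name, and the capability bit is the one proposed on this page, not an upstream one.

```c
#include <stdint.h>

/* Capability bit proposed on this page; not part of upstream libfuse. */
#define FUSE_CAP_FSNOTIFY_SUPPORT (1u << 18)

/* Stand-in for the two relevant fields of struct fuse_conn_info. */
struct conn_info_sketch {
    uint32_t capable; /* what the kernel module offers */
    uint32_t want;    /* what the userspace daemon asks for */
};

/*
 * Hypothetical helper, meant to be called from the daemon's init handler.
 * Returns 1 when fsnotify forwarding was requested, 0 when the kernel
 * module does not offer it.
 */
static int negotiate_fsnotify(struct conn_info_sketch *conn)
{
    if (conn->capable & FUSE_CAP_FSNOTIFY_SUPPORT) {
        conn->want |= FUSE_CAP_FSNOTIFY_SUPPORT;
        return 1;
    }
    /* Never ask for something the kernel did not advertise. */
    conn->want &= ~FUSE_CAP_FSNOTIFY_SUPPORT;
    return 0;
}
```

A daemon that does not care about remote changes simply never sets the bit in want, and the kernel module then never sends it watch messages.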

Sending a watch

To send watch information to userspace, three things are required:
. a new opcode: FUSE_FSNOTIFY
. a struct to hold the information about the watch (e.g. the mask).
. bits defining the mask, e.g. FUSE equivalents for IN_CREATE, IN_ATTRIB and IN_DELETE et cetera (or FS_CREATE, FS_ATTRIB and FS_DELETE).
You may notice that information about the inode (the inode number) is missing. It is a standard part of the request header, so there is no need to specify it separately.

With the current fuse module the new opcode looks like:

FUSE_FSNOTIFY = 46
(part of enum fuse_opcode)

struct fuse_fsnotify_in {
	uint64_t mask;
	uint32_t action;
	uint32_t padding;
};
where: mask describes which events should be notified
       action describes what happens to the watch (0=remove, 1=change, 2=new)

A set of defines describing the individual events is required since the bits used by fsnotify are not available in userspace. Second, it's a good thing to define the subset of fsnotify events which FUSE supports. As I've mentioned earlier, only basic events are supported by FUSE: create, delete and modify, with extension to attrib and others in mind.
Note, by the way, that inotify uses bits like IN_CREATE, IN_DELETE and IN_MODIFY, which are named differently from the corresponding fsnotify bits FS_CREATE, FS_DELETE and FS_MODIFY but have exactly the same values. The same holds for fanotify.

To start with the events:

#define FUSE_FSNOTIFY_MODIFY 0x00000002
#define FUSE_FSNOTIFY_ATTRIB 0x00000004
#define FUSE_FSNOTIFY_MOVED_FROM 0x00000040
#define FUSE_FSNOTIFY_MOVED_TO 0x00000080
#define FUSE_FSNOTIFY_CREATE 0x00000100
#define FUSE_FSNOTIFY_DELETE 0x00000200
#define FUSE_FSNOTIFY_DELETE_SELF 0x00000400
#define FUSE_FSNOTIFY_MOVE_SELF 0x00000800

#define FUSE_FSNOTIFY_ISDIR 0x40000000

#define FUSE_FSNOTIFY_SUPPORTED ( FUSE_FSNOTIFY_MODIFY | FUSE_FSNOTIFY_ATTRIB | FUSE_FSNOTIFY_MOVED_FROM | FUSE_FSNOTIFY_MOVED_TO | FUSE_FSNOTIFY_CREATE | FUSE_FSNOTIFY_DELETE | FUSE_FSNOTIFY_DELETE_SELF | FUSE_FSNOTIFY_MOVE_SELF )

Note that there are no bits at all for open, access, write and close events. I think these are not required for FUSE with a remote backend.
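The claim that the FUSE bits share their values with the inotify bits can be checked at compile time. The following is a small sketch, assuming a Linux system where <sys/inotify.h> is available; fuse_mask_from_inotify is a hypothetical helper name used only to show that unsupported bits such as IN_OPEN are dropped.

```c
#include <assert.h>      /* static_assert (C11) */
#include <stdint.h>
#include <sys/inotify.h> /* IN_* constants, Linux only */

#define FUSE_FSNOTIFY_MODIFY      0x00000002
#define FUSE_FSNOTIFY_ATTRIB      0x00000004
#define FUSE_FSNOTIFY_MOVED_FROM  0x00000040
#define FUSE_FSNOTIFY_MOVED_TO    0x00000080
#define FUSE_FSNOTIFY_CREATE      0x00000100
#define FUSE_FSNOTIFY_DELETE      0x00000200
#define FUSE_FSNOTIFY_DELETE_SELF 0x00000400
#define FUSE_FSNOTIFY_MOVE_SELF   0x00000800
#define FUSE_FSNOTIFY_ISDIR       0x40000000

#define FUSE_FSNOTIFY_SUPPORTED \
    (FUSE_FSNOTIFY_MODIFY | FUSE_FSNOTIFY_ATTRIB | \
     FUSE_FSNOTIFY_MOVED_FROM | FUSE_FSNOTIFY_MOVED_TO | \
     FUSE_FSNOTIFY_CREATE | FUSE_FSNOTIFY_DELETE | \
     FUSE_FSNOTIFY_DELETE_SELF | FUSE_FSNOTIFY_MOVE_SELF)

/* Compilation fails if a proposed bit drifts from its inotify twin. */
static_assert(FUSE_FSNOTIFY_MODIFY == IN_MODIFY, "MODIFY differs");
static_assert(FUSE_FSNOTIFY_ATTRIB == IN_ATTRIB, "ATTRIB differs");
static_assert(FUSE_FSNOTIFY_MOVED_FROM == IN_MOVED_FROM, "MOVED_FROM differs");
static_assert(FUSE_FSNOTIFY_MOVED_TO == IN_MOVED_TO, "MOVED_TO differs");
static_assert(FUSE_FSNOTIFY_CREATE == IN_CREATE, "CREATE differs");
static_assert(FUSE_FSNOTIFY_DELETE == IN_DELETE, "DELETE differs");
static_assert(FUSE_FSNOTIFY_DELETE_SELF == IN_DELETE_SELF, "DELETE_SELF differs");
static_assert(FUSE_FSNOTIFY_MOVE_SELF == IN_MOVE_SELF, "MOVE_SELF differs");
static_assert(FUSE_FSNOTIFY_ISDIR == IN_ISDIR, "ISDIR differs");

/* Hypothetical helper: keep only the events FUSE would forward. */
static uint32_t fuse_mask_from_inotify(uint32_t in_mask)
{
    return in_mask & FUSE_FSNOTIFY_SUPPORTED;
}
```

Because the values are identical, no lookup table is needed to go from an inotify mask to a FUSE mask; a single bitwise AND is enough.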

The kernel module should test whether the mask of the desired watch is supported:

fuse_mask = fsnotify_mask & FUSE_FSNOTIFY_SUPPORTED

Only if this value has changed compared to the previous value should a watch message be sent to userspace.
(previous=0, new>0: new watch; previous>0, new>0: changed watch; previous>0, new=0: removed watch)
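The decision table above can be written out as a small function. This is a sketch, not the kernel code: fuse_prepare_watch and the enum names are hypothetical, and FUSE_FSNOTIFY_SUPPORTED is written as the numeric OR of the eight event bits defined earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* OR of the eight supported FUSE_FSNOTIFY_ event bits (0x2 ... 0x800). */
#define FUSE_FSNOTIFY_SUPPORTED 0x00000fc6ULL

/* Hypothetical names for the action values listed in the text. */
enum fuse_fsnotify_action {
    FUSE_FSNOTIFY_ACTION_REMOVE = 0,
    FUSE_FSNOTIFY_ACTION_CHANGE = 1,
    FUSE_FSNOTIFY_ACTION_NEW    = 2
};

struct fuse_fsnotify_in {
    uint64_t mask;
    uint32_t action;
    uint32_t padding;
};

/*
 * prev_fuse_mask is the supported mask last sent for this inode (0 if none).
 * Returns 1 and fills *in when a message must be sent to userspace,
 * 0 when the supported part of the mask did not change.
 */
static int fuse_prepare_watch(uint64_t prev_fuse_mask, uint64_t fsnotify_mask,
                              struct fuse_fsnotify_in *in)
{
    uint64_t fuse_mask = fsnotify_mask & FUSE_FSNOTIFY_SUPPORTED;

    assert(in != NULL);
    if (fuse_mask == prev_fuse_mask)
        return 0; /* nothing the daemon needs to hear about */

    in->mask = fuse_mask;
    in->padding = 0;
    if (prev_fuse_mask == 0)
        in->action = FUSE_FSNOTIFY_ACTION_NEW;
    else if (fuse_mask == 0)
        in->action = FUSE_FSNOTIFY_ACTION_REMOVE;
    else
        in->action = FUSE_FSNOTIFY_ACTION_CHANGE;
    return 1;
}
```

Comparing only the supported part of the mask means that a client adding, say, an open event to an existing watch causes no traffic to the daemon at all.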

The userspace daemon has to translate these FUSE_FSNOTIFY_ bits into a mask the backend understands when sending the requested watch to the backend using protocol specific messages.

Receiving events

When the userspace daemon receives fs change/notify event messages from the backend, it should translate them into an event made of FUSE_FSNOTIFY_ bits. Again, action should be taken only when the event is supported:

fuse_mask = translate_mask_from_backend(backend_mask) & FUSE_FSNOTIFY_SUPPORTED

Only if this value is not zero should a message be sent to the kernel module.
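A possible shape for translate_mask_from_backend is sketched below. The BACKEND_EV_ bits are invented for this example and do not belong to any real protocol; a real daemon would map its own protocol's event codes instead. event_for_kernel is likewise a hypothetical helper name.

```c
#include <stdint.h>

#define FUSE_FSNOTIFY_MODIFY    0x00000002
#define FUSE_FSNOTIFY_CREATE    0x00000100
#define FUSE_FSNOTIFY_DELETE    0x00000200
/* OR of the eight supported FUSE_FSNOTIFY_ event bits. */
#define FUSE_FSNOTIFY_SUPPORTED 0x00000fc6

/* Hypothetical backend event bits, invented for this sketch. */
#define BACKEND_EV_ADDED   0x1
#define BACKEND_EV_REMOVED 0x2
#define BACKEND_EV_CHANGED 0x4
#define BACKEND_EV_LOCKED  0x8 /* has no FUSE equivalent */

/* Map backend event bits onto FUSE_FSNOTIFY_ bits. */
static uint64_t translate_mask_from_backend(uint32_t backend_mask)
{
    uint64_t fuse_mask = 0;

    if (backend_mask & BACKEND_EV_ADDED)
        fuse_mask |= FUSE_FSNOTIFY_CREATE;
    if (backend_mask & BACKEND_EV_REMOVED)
        fuse_mask |= FUSE_FSNOTIFY_DELETE;
    if (backend_mask & BACKEND_EV_CHANGED)
        fuse_mask |= FUSE_FSNOTIFY_MODIFY;
    return fuse_mask;
}

/* Returns 1 when the kernel module should be notified, 0 otherwise. */
static int event_for_kernel(uint32_t backend_mask, uint64_t *fuse_mask)
{
    *fuse_mask = translate_mask_from_backend(backend_mask)
                 & FUSE_FSNOTIFY_SUPPORTED;
    return *fuse_mask != 0;
}
```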

Sending fsnotify events to the fuse kernel module is not yet supported. There is already a framework to notify the kernel module about various events, like readiness for poll, invalidating an inode and deleting an entry. Sending fs events to the kernel module is simply a matter of extending that framework.

I've implemented it like:

FUSE_NOTIFY_FSNOTIFY = 7
(part of enum fuse_notify_code)

The corresponding struct:

struct fuse_notify_fsnotify_out {
	uint64_t parent;
	uint64_t mask;
	uint32_t namelen;
	uint32_t padding;
};

If the event is on a child entry, namelen>0 and this struct is followed by the name of the entry.
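The layout of such a notification (fixed struct, optionally followed by the entry name) can be sketched as below. pack_fsnotify_event is a hypothetical helper, and the sketch assumes the name is sent without a terminating NUL byte and that namelen counts only the name bytes; the text above does not say either way.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct fuse_notify_fsnotify_out {
    uint64_t parent;
    uint64_t mask;
    uint32_t namelen;
    uint32_t padding;
};

/*
 * Hypothetical helper: serialise one event into buf.
 * name may be NULL for an event on the watched inode itself.
 * Returns the number of bytes written, or 0 if buf is too small.
 */
static size_t pack_fsnotify_event(void *buf, size_t buflen, uint64_t parent,
                                  uint64_t mask, const char *name)
{
    struct fuse_notify_fsnotify_out out;
    size_t namelen = name ? strlen(name) : 0;
    size_t total = sizeof(out) + namelen;

    assert(buf != NULL);
    if (total > buflen || namelen > UINT32_MAX)
        return 0;

    memset(&out, 0, sizeof(out)); /* also clears padding */
    out.parent = parent;
    out.mask = mask;
    out.namelen = (uint32_t)namelen;

    memcpy(buf, &out, sizeof(out));
    if (namelen > 0)
        memcpy((char *)buf + sizeof(out), name, namelen);
    return total;
}
```

The resulting buffer is what the daemon would hand to the existing notify path together with the FUSE_NOTIFY_FSNOTIFY code.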

Extra bit required?

I've got fsnotify for fuse working on my own system, and for it I introduced a new bit in the mask of the inotify_add_watch call, IN_REMOTE. If this bit is set, and only then, the underlying filesystem is notified.
I think it's a good thing to give users the choice to also monitor the backend for changes. But on the fuse-dev mailing list someone commented that this is undesirable, because the applications in use are closed source, and it's very unlikely the vendor will modify them. This is of course a problem.

It's also possible to add a kernel option under fsnotify that controls whether watches are forwarded to the filesystem. Every filesystem for which this is a feature (cifs, nfs, fuse, ...) would then also have this option (only if fsnotify supports it), similar to the support for fs-cache.

Further, when an event happens on the backend and is reported to the kernel module through the userspace daemon, this extra bit should be added to the event mask, reporting to the listening clients via inotify and fsnotify (and possibly dnotify) that the event is remote.

Filter events initiated by this host

Fs change notify methods will report anything that happens. Since listeners are only interested in events initiated by others, the backend and userspace daemon should filter out events caused by this host. For example:

Suppose you have an inotify watch set on the directory "a map", and you create a file in it:

cd "a map"
touch "a file"

When this operation is successful, it causes an inotify event describing the creation of the file. But the backend will also generate an event. This should not be sent to this host, so the backend should filter it out first. I do not know whether backends have this feature, but fanotify has the ability to check the pid of the process causing the event, and it's not too difficult for any backend using fanotify to compare this value with the pid of the process handling the fs operations for this host (if the backend uses a process per client...).
The userspace daemon should also filter events; it cannot rely on the backend. This filtering is only possible when it has its own cache of inodes and entries.
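One way the daemon-side filtering could work is sketched below. This is an assumption about a possible design, not a description of the existing implementation: the daemon remembers each operation it performs on behalf of this host, and drops the first matching event that comes back from the backend. All names (local_op, remember_local_op, is_own_event) and the fixed table size are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define MAX_PENDING 16

/* One locally initiated operation waiting for its echo from the backend. */
struct local_op {
    int      used;
    uint64_t parent;    /* inode number of the directory */
    uint64_t mask;      /* FUSE_FSNOTIFY_ bits of the operation */
    char     name[256]; /* entry name, NUL terminated */
};

static struct local_op pending[MAX_PENDING];

/* Record an operation done by this host. Returns 1 on success, 0 if full. */
static int remember_local_op(uint64_t parent, uint64_t mask, const char *name)
{
    size_t i;

    if (strlen(name) >= sizeof(pending[0].name))
        return 0;
    for (i = 0; i < MAX_PENDING; i++) {
        if (!pending[i].used) {
            pending[i].used = 1;
            pending[i].parent = parent;
            pending[i].mask = mask;
            strcpy(pending[i].name, name);
            return 1;
        }
    }
    return 0;
}

/*
 * Check an event reported by the backend. Returns 1 (and forgets the
 * record) when it matches an operation this host performed, 0 when it
 * was caused by someone else and should be passed on to the kernel.
 */
static int is_own_event(uint64_t parent, uint64_t mask, const char *name)
{
    size_t i;

    for (i = 0; i < MAX_PENDING; i++) {
        if (pending[i].used && pending[i].parent == parent &&
            (pending[i].mask & mask) != 0 &&
            strcmp(pending[i].name, name) == 0) {
            pending[i].used = 0;
            return 1;
        }
    }
    return 0;
}
```

In the touch "a file" example, the daemon would call remember_local_op when it handles the create request, and the create event echoed by the backend would then be recognised and dropped by is_own_event.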

Add more info like who

Someone suggested that in a network, with a backend shared with others, it would be very nice to have information about who made the change. This looks like:

sbon@remotehost
jan@mydomain

This may work with FUSE, but fsnotify cannot add this info to the events as far as I can see.
