Use interrupts instead of closing /dev/fuse socket when a request takes too long #261
Comments
I think sending interrupts instead of killing the file system could introduce data inconsistencies. OS X's caching layer (or at least the process that started the timed-out request) thinks the request has been aborted even though it might have been completed. We might end up with a situation where the data "on-disk" and "in-cache" are not in sync. It would be interesting to see how NFS or SMB deal with timeouts. What about the following modified approach? After |
I'm not sure what inconsistency would be exposed that isn't already there from just hitting control-C. The userspace FUSE server may ignore the interrupt request and just respond to the original request, or it may respect the interrupt; whether that is triggered by a global timeout or a signal shouldn't change the data consistency rules. |
Oh and having some ultimate "you must respond in this time" |
Let's assume a process is overwriting a file on a FUSE volume. The data moves through the kernel's caching layer and then to the FUSE user space daemon. Let's assume further that the daemon does not respond within This means the file cache still contains the original file content from before the interrupted write request. The user space daemon, on the other hand, ignores the interrupt and completes the write operation. At this point the actual "on-disk" data is different from the "in-cache" data. Subsequent read/write operations will operate on the "in-cache" data, not the actual "on-disk" data. This could lead to file corruption. |
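The scenario described above can be sketched as a toy model. This is only an illustration of the argument, not a description of how OS X's cache or any FUSE daemon actually behaves; `ToyVolume` and its fields are hypothetical names made up for this sketch.

```python
class ToyVolume:
    """Toy model: a kernel-side cache in front of a daemon-side store."""

    def __init__(self, content: bytes) -> None:
        self.cache = content  # what the kernel believes the file holds
        self.disk = content   # what the daemon has actually stored

    def write(self, data: bytes, daemon_replies_in_time: bool) -> bool:
        # Assumption of this scenario: the daemon ignores the interrupt and
        # always completes the write, even if the kernel already gave up.
        self.disk = data
        if daemon_replies_in_time:
            self.cache = data  # kernel saw success and updates its cache
            return True
        return False           # kernel reports failure, cache keeps old data

    def read(self) -> bytes:
        return self.cache      # reads are served from the cache


vol = ToyVolume(b"old")
ok = vol.write(b"new", daemon_replies_in_time=False)
print(ok, vol.read(), vol.disk)
```

In this model the timed-out write leaves `vol.read()` returning the old content while `vol.disk` already holds the new content, which is the divergence the comment is worried about.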
I don't know much about OS X. On Linux, I'd expect one of two possibilities:
If there's an error flushing a dirty page in the page cache, that's exactly the same scenario as a disk reporting a block write error. That's what the FUSE flush request is for, and that's why userspace needs to check Relevant bits of Linux kernel code:
|
Send an interrupt request to give the file system daemon a chance to handle the timeout. If the daemon does not respond in time (daemon_timeout) the file system will be marked dead. Addresses osxfuse/osxfuse#261
In the newly released osxfuse 3.2.0 the kernel extension sends an interrupt request to give the file system daemon a chance to handle the timeout. If the daemon does not respond to the interrupt request within |
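A minimal sketch of the two-stage behaviour described here: first an interrupt, and only later a dead file system. The function name, the outcome labels, and the `interrupt_grace` parameter are assumptions for illustration; the thread does not state the length of the second window, and the 60-second default is taken from the issue text.

```python
def request_outcome(reply_after, daemon_timeout=60.0, interrupt_grace=60.0):
    """Classify a request by when the daemon replies, in seconds.

    reply_after is None if the daemon never replies. interrupt_grace is a
    hypothetical name for the time the daemon gets after the interrupt.
    """
    if reply_after is not None and reply_after <= daemon_timeout:
        return "completed"
    # daemon_timeout has elapsed: the kernel sends an interrupt request
    # and waits a little longer instead of killing the mount right away.
    deadline = daemon_timeout + interrupt_grace
    if reply_after is not None and reply_after <= deadline:
        return "interrupted"
    # No reply even after the interrupt: the file system is marked dead.
    return "dead"


for t in (1.0, 90.0, None):
    print(t, request_outcome(t))
```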
This is great news! |
I just tested this and it makes all well-behaved filesystems using http://bazil.org/fuse do the right thing on slow requests. This means even aborting outgoing network requests and all. This is great, thank you very much! (Well-behaved = respects Context cancellation: http://blog.golang.org/context) |
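What "well-behaved" means here can be illustrated with a small cooperative-cancellation sketch. It is written in Python with a `threading.Event` standing in for a Go `Context`; it is an analogy only and does not use or reproduce the bazil.org/fuse API. `copy_chunks` and `source` are hypothetical names.

```python
import threading


def copy_chunks(chunks, cancelled: threading.Event):
    """Process chunks until done, or stop once cancellation is requested."""
    done = []
    for chunk in chunks:
        if cancelled.is_set():
            # A well-behaved handler notices the cancellation and gives up
            # instead of running the slow operation to completion.
            raise InterruptedError("request cancelled")
        done.append(chunk)
    return done


def source(cancelled: threading.Event):
    yield b"a"
    yield b"b"
    cancelled.set()  # stands in for the interrupt request arriving
    yield b"c"


cancel = threading.Event()
try:
    copy_chunks(source(cancel), cancel)
    aborted = False
except InterruptedError:
    aborted = True
print(aborted)
```

A handler that never checks the event would simply run to completion, which matches the "may ignore the interrupt request" case mentioned earlier in the thread.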
Also mention that OSXFUSE 3.2.0 no longer kills the whole mount on `daemon_timeout` (osxfuse/osxfuse#261)
To the best of my understanding, currently if any FUSE operation takes too long (longer than `daemon_timeout`, default 60s), the whole filesystem goes into "broken" mode, and needs to be unmounted. I understand this is because of OS X limitations that require the kext to respond to the request, or the kext itself will be declared broken.

How about, instead of wrecking the whole mount, behaving just as if the request was interrupted at the `daemon_timeout` time? As far as OS X is concerned, it fails with `EINTR`; as far as the userspace FUSE server is concerned, it sees a `fuse_interrupt_in` request cancelling the earlier request. If the userspace FUSE server doesn't implement interrupting, it'll just process the earlier request to completion, at which point its result will get discarded -- just like if the response raced with the interrupt message.

This would avoid a lot of trouble in writing networked filesystems with OSXFUSE. Currently everyone must implement their own timeout logic underneath, so that the kext `daemon_timeout` never triggers.
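The proposed request handling could look roughly like the following sketch. It models only the bookkeeping described in the proposal (fail the caller with `EINTR`, queue an interrupt, drop a late reply); `RequestTable` and its methods are hypothetical and are not taken from the osxfuse kext.

```python
import errno


class RequestTable:
    """Toy kernel-side table of requests that are waiting for a reply."""

    def __init__(self) -> None:
        self.pending = set()  # ids of requests still awaiting a reply
        self.outbox = []      # messages queued for the userspace daemon

    def submit(self, unique: int) -> None:
        self.pending.add(unique)
        self.outbox.append(("request", unique))

    def on_timeout(self, unique: int) -> int:
        # Give up on the request, tell the daemon, fail the caller with EINTR.
        self.pending.discard(unique)
        self.outbox.append(("interrupt", unique))
        return -errno.EINTR

    def on_reply(self, unique: int) -> bool:
        # A reply for a request that is no longer pending is discarded,
        # the same as a reply that raced with an interrupt.
        if unique not in self.pending:
            return False
        self.pending.remove(unique)
        return True


table = RequestTable()
table.submit(7)
status = table.on_timeout(7)
late_reply_accepted = table.on_reply(7)
print(status, late_reply_accepted)
```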