
rclone FUSE mount does not reliably unmount when operations are in flight, leaving broken mount behind #7766

Open
nh2 opened this issue Apr 15, 2024 · 2 comments

Comments

@nh2
Contributor

nh2 commented Apr 15, 2024

Often, when I rclone mount the SFTP backend, write some file into the mount, and then Ctrl+C rclone mount, it fails with the below error:

rclone mount 'my-sftp-remote:/testdir' mymountdir
^C
2024/04/15 01:14:35 ERROR : mymountdir: Failed to unmount: exit status 1: fusermount3: failed to unmount /root/mymountdir: Device or resource busy

Then the rclone process exits without unmounting.

Device or resource busy (EBUSY) is to be expected during normal usage, because Linux has caches that it flushes asynchronously, so I think rclone should handle that case correctly.

Any operation on the mount dir will then fail:

# ls -l mymountdir
ls: cannot access 'mymountdir': Transport endpoint is not connected

Transport endpoint is not connected (ENOTCONN) is the expected error message once the process that created a FUSE mount has quit without unmounting it.

This behaviour is explained in the docs, and was originally documented here in the PR that added the fuse.Unmount(mountpoint) call.

Even re-mounting with rclone mount will fail:

# rclone mount 'unre-benaco-server-sftp-test:/root/testdir' unre-mnt
2024/04/15 01:14:54 Fatal error: failed to mount FUSE fs: directory already mounted, use --allow-non-empty to mount anyway: /root/unre-mnt

The workaround is to run fusermount -u testdir manually, but that is not great.
In particular, it makes scripting/automating rclone mount more difficult.

What is your rclone version (output from rclone version)

rclone 1.64.0
- os/version: nixos 23.05 (Stoat) (64 bit)
- os/kernel: 6.1.51 (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.20.8
- go/linking: dynamic
- go/tags: cmount

What sshfs does

The issue does not appear with sshfs -f: when it gets Ctrl+C'd or SIGTERMed, it calls `umount2(..., MNT_DETACH)`.

This is done by libfuse's fuse_kern_unmount():

	res = umount2(mountpoint, 2);  // 2 is `MNT_DETACH`, the equivalent to `fusermount -uz` ("lazy" unmount)

Note that sshfs does not retry; it uses a lazy unmount instead. From the umount(2) man page:

       MNT_DETACH (since Linux 2.4.11)
              Perform a lazy unmount: make the mount unavailable for new
              accesses, immediately disconnect the filesystem and all
              filesystems mounted below it from each other and from the
              mount table, and actually perform the unmount when the
              mount ceases to be busy.

However, lazy unmounts are not safe in all cases:

https://unix.stackexchange.com/questions/390056/why-is-lazy-mnt-detach-or-umount-l-unsafe-dangerous

Proposed solution:

Either of:

  1. rclone should retry the fuse.Unmount(mountpoint) call until it succeeds.

    I believe this should be easy: the fuse.Unmount() implementation tries 5 times here

    https://github.com/hanwen/go-fuse/blob/90eabd702c26eaaf0d07ca57efd4df8e4b82ed45/fuse/server.go#L127-L154

    and if all attempts fail, it returns early, skipping the line ms.mountPoint = "", so the caller can call fuse.Unmount() again. I think rclone should do this when the error is Device or resource busy (ideally detecting it via the errno code EBUSY rather than the error message, which may be affected by the user's locale).

  2. Do what sshfs does and use a lazy unmount.

    That would require making the line err := syscall.Unmount(mountPoint, 0) configurable in go-fuse, so that MNT_DETACH can be passed instead of 0.

@Animosity022
Collaborator

The best way around that is to make sure all IO has stopped first. With a network mount, it's going to be gritty to make anything stop 'nicely'.

I use systemd and make sure all processes using the mount are stopped before trying to stop the mount.

Even with 5 or 50 retries, I don't think most people would figure out that they need to stop the processes using the mount.

Most folks already do a lazy unmount as well.

@ncw
Member

ncw commented Apr 15, 2024

@nh2 which mount command are you using?

rclone mount uses bazil.org/fuse, whereas rclone mount2 uses github.com/hanwen/go-fuse/v2.

If you are using rclone mount but investigating github.com/hanwen/go-fuse/v2 then you are investigating the wrong library :-(

There is also rclone cmount, which you get if you build with -tags cmount on Linux, but it is the default mount on macOS and Windows. It uses github.com/winfsp/cgofuse, which understands libfuse options; maybe there is a mount option for lazy unmount?

Can you try all 3 mounts and see if they have the same behaviour?

If we want to make lazy unmount an option, then ideally we need to add it to all 3 mounts.

If you want to try the 5-retry behaviour, use rclone mount2; it should call the hanwen/go-fuse code you linked above.
