FreeBSD 13+ : NFS access of snapshot returns stale file handle; server zfs commands hang #13974

Closed
eborisch opened this issue Sep 30, 2022 · 0 comments
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments


eborisch commented Sep 30, 2022

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | FreeBSD |
| Distribution Version | 13.1-RELEASE-p2 |
| Kernel Version | 13.1-RELEASE-p2 |
| Architecture | amd64 |
| OpenZFS Version | 2.1.4 |

Attempting to access a snapshot over NFS fails with a stale file handle error. Deleting the snapshot after that also fails and blocks further use of the zfs tools. The filesystem itself is still alive and well, fulfilling requests from NFS and locally, but any attempt to issue a zfs command hangs.

On systems with snapshots being created/deleted, like many with automated frequent/hourly/... snapshots, and remote NFS users, this means a remote user can wedge the server's ZFS management interfaces (for any purpose, not just on the particular dataset) just by listing the contents of a snapshot that is later scheduled for deletion.

I initially (June) ran into this with automated snapshot expiration and (attempted) deletion, where I directly observed the issue due to zfs sends no longer working; I didn't connect the dots between the failed NFS access, later snapshot deletion, and subsequent wedging of the server's zfs commands until Michel's [bug report](https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=266236).

Reproducing

  1. Mount NFS-exported zfs filesystem on client.
  2. Try to enter a snapshot (.zfs/snapshot/foo) directory on client -> Stale file handle error.
  3. Additional steps I tried at this point to see if they illuminated anything; not required to reproduce:
    a) Unmount on client; stop nfsd on server
    b) mount -v on server shows the requested snapshot as mounted
    c) Try explicit unmount of the snapshot path -> umount hangs (but the snapshot path no longer shows up in mount -v)
  4. Try deleting snapshot -> zfs hangs
procstat -k $hung_unmount_pid: 

  PID    TID COMM                TDNAME              KSTACK                       
 5260 101043 umount              -                   mi_switch _sleep rms_wlock zfsvfs_teardown zfs_umount dounmount kern_unmount amd64_syscall fast_syscall_common 


procstat -k $hung_zfs_destroy:

  PID    TID COMM                TDNAME              KSTACK                       
 5826 101058 zfs                 -                   mi_switch _sleep vfs_busy zfs_vfs_ref getzfsvfs_impl getzfsvfs zfsctl_snapshot_unmount zfs_ioc_destroy_snaps zfsdev_ioctl_common zfsdev_ioctl devfs_ioctl vn_ioctl devfs_ioctl_f kern_ioctl sys_ioctl amd64_syscall fast_syscall_common

At this point no zfs or zpool commands succeed. (Or at least, none that I tried; all hang.)

Restart required to unwedge.
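
To compare several hung processes at a glance, the `procstat -k` output above can be reduced to the first kernel frame below the generic sleep frames. This is a minimal sketch, not part of the original report: `parse_procstat_k`, `blocking_frame`, and the `SLEEP_FRAMES` set are hypothetical names, and the parser assumes thread names contain no spaces (true for the two stacks shown here).

```python
# Assumption: these two frames are generic scheduler/sleep entries and say
# nothing about which lock or resource the thread is waiting for.
SLEEP_FRAMES = {"mi_switch", "_sleep"}

# The two stack lines quoted in this report, including their header rows.
SAMPLE = """\
  PID    TID COMM                TDNAME              KSTACK
 5260 101043 umount              -                   mi_switch _sleep rms_wlock zfsvfs_teardown zfs_umount dounmount kern_unmount amd64_syscall fast_syscall_common
  PID    TID COMM                TDNAME              KSTACK
 5826 101058 zfs                 -                   mi_switch _sleep vfs_busy zfs_vfs_ref getzfsvfs_impl getzfsvfs zfsctl_snapshot_unmount zfs_ioc_destroy_snaps zfsdev_ioctl_common zfsdev_ioctl devfs_ioctl vn_ioctl devfs_ioctl_f kern_ioctl sys_ioctl amd64_syscall fast_syscall_common
"""


def parse_procstat_k(text):
    """Split `procstat -k` lines into pid, tid, command, thread name, frames."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Skip header rows, blank lines, and anything without a stack.
        if len(parts) < 5 or not parts[0].isdigit():
            continue
        pid, tid, comm, tdname, *frames = parts
        rows.append({"pid": int(pid), "tid": int(tid), "comm": comm,
                     "tdname": tdname, "frames": frames})
    return rows


def blocking_frame(frames):
    """Return the first frame that is not a generic sleep frame, or None."""
    for frame in frames:
        if frame not in SLEEP_FRAMES:
            return frame
    return None


rows = parse_procstat_k(SAMPLE)
for row in rows:
    print(row["comm"], blocking_frame(row["frames"]))
```

For the two stacks in this report, this prints `umount rms_wlock` and `zfs vfs_busy`: the unmount is waiting for a write lock in `zfsvfs_teardown`, and the destroy is waiting in `vfs_busy` behind it.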

Edit: This system had been running (prior to the 13.1 upgrade) on 12.1 (and earlier) with these actions (user NFS snapshot access, which is very useful for letting users recover files; snapshot rotations; etc.) all working beautifully for years.

Additional context

I initially experienced this on a custom kernel, but have reproduced it with GENERIC; users on IRC have reproduced it on CURRENT. Multiple other users have reported it as well on the FreeBSD bug report.

A suggestion was made on the FreeBSD bugzilla to have an OpenZFS bug report, so here I am.

@eborisch eborisch added the Type: Defect Incorrect behavior (e.g. crash, hang) label Sep 30, 2022
andrewc12 pushed a commit to andrewc12/openzfs that referenced this issue Oct 12, 2022
- Add a zfs_exit() call in an error path, otherwise a lock is leaked.
- Remove the fid_gen > 1 check.  That appears to be Linux-specific:
  zfsctl_snapdir_fid() sets fid_gen to 0 or 1 depending on whether the
  snapshot directory is mounted.  On FreeBSD it fails, making snapshot
  dirs inaccessible via NFS.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Andriy Gapon <avg@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Fixes: 43dbf88 ("FreeBSD: vfsops: use setgen for error case")
Closes openzfs#14001
Closes openzfs#13974
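
The first bullet of the commit message says an error path returned without calling `zfs_exit()`, so a lock was leaked. The toy Python model below illustrates that pattern only; it is not OpenZFS code. `TeardownLock`, `lookup_leaky`, and `lookup_fixed` are hypothetical names, the lock merely counts shared holders instead of blocking, and the idea that this is the same lock `umount` waits on in `rms_wlock` is an assumption rather than something the commit message states.

```python
class TeardownLock:
    """Toy shared/exclusive lock: counts shared holders, never blocks."""

    def __init__(self):
        self.readers = 0

    def enter(self):
        # Stands in for zfs_enter(): take a shared hold.
        self.readers += 1

    def exit(self):
        # Stands in for zfs_exit(): drop the shared hold.
        self.readers -= 1

    def can_teardown(self):
        # An exclusive (write) acquire can only succeed with zero shared holders.
        return self.readers == 0


def lookup_leaky(lock, handle_ok):
    lock.enter()
    if not handle_ok:
        return "ERR"      # bug: returns while still holding the shared lock
    lock.exit()
    return "OK"


def lookup_fixed(lock, handle_ok):
    lock.enter()
    if not handle_ok:
        lock.exit()       # release on the error path as well
        return "ERR"
    lock.exit()
    return "OK"


leaky, fixed = TeardownLock(), TeardownLock()
lookup_leaky(leaky, handle_ok=False)   # one failed lookup, hold never dropped
lookup_fixed(fixed, handle_ok=False)   # same failure, hold released
print(leaky.can_teardown(), fixed.can_teardown())  # prints: False True
```

In the leaky variant a single failed lookup leaves the holder count at one forever, so anything that needs the lock exclusively, such as an unmount, would wait indefinitely. That would match the symptom reported here, where one failed NFS access is enough to wedge later `zfs` commands.
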
ghost pushed a commit to truenas/zfs that referenced this issue Oct 21, 2022
(cherry picked from commit ed566bf)
behlendorf pushed a commit that referenced this issue Oct 26, 2022
(cherry picked from commit ed566bf)