Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Panic in NFS filehandle decoding for snapshots [code issue identified and reproducible] #7764

Closed
BonganiMalalane opened this issue Aug 3, 2018 · 7 comments · Fixed by #7864
Projects

Comments

@BonganiMalalane
Copy link

BonganiMalalane commented Aug 3, 2018

Case Title : Kernel Panic BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
Case Number : 02151530
Case Open Date : 2018-07-30 18:05:51
Severity : 3 (Normal)
Problem Type : Account / Customer Service Request

Most recent comment: On 2018-08-03 09:48:15, Red Hat Engineer commented:
"Hello,

I have analyzed the vmcore and my findings to outline the details of the cause of the crash and the discrepancies.
The next steps mentioned towards the end.

Summary of the crash context:

The crash occurred due to an invalid dentry deference in the context of panic. The control of execution was attempting the deference the inode off the dentry structure and obtain its mode for a condition check.
The dentry corruption has happened a few instructions prior to the crash. The function pointer responsible for returning the dentry was pointing to a zfs file system. The ZFS module is involved in the crash as its returning a dentry which is invalid and thereafter a deference of that dentry caused a crash.

------ vmcore analysis --

crash> sys
KERNEL: /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/3.10.0-693.21.1.el7.x86_64/vmlinux
DUMPFILE: /cores/retrace/tasks/290988954/crash/vmcore [PARTIAL DUMP]
CPUS: 20
DATE: Mon Jul 30 06:48:55 2018
UPTIME: 4 days, 23:11:30
LOAD AVERAGE: 1.21, 1.41, 1.70
TASKS: 1292
NODENAME: xxxxxxxxxx
RELEASE: 3.10.0-693.21.1.el7.x86_64
VERSION: #1 SMP Fri Feb 23 18:54:16 UTC 2018
MACHINE: x86_64 (2599 Mhz)
MEMORY: 32 GB
PANIC: "BUG: unable to handle kernel NULL pointer dereference at 0000000000000058"

xxxxxxxxxx:root:/hubvol> modinfo zfs | grep -iw version
version: 0.7.7-1

xxxxxxxxxx:root:/hubvol> modinfo spl | grep -iw version
version: 0.7.7-1

Backtrace of the crashed task:

crash> bt
PID: 9151 TASK: ffff8807c8bedee0 CPU: 16 COMMAND: "nfsd"
#0 [ffff8808085f7898] machine_kexec at ffffffff8105d77b
#1 [ffff8808085f78f8] __crash_kexec at ffffffff81108732
#2 [ffff8808085f79c8] crash_kexec at ffffffff81108820
#3 [ffff8808085f79e0] oops_end at ffffffff816b8778
#4 [ffff8808085f7a08] no_context at ffffffff816a7c7a
#5 [ffff8808085f7a58] __bad_area_nosemaphore at ffffffff816a7d10
#6 [ffff8808085f7aa0] bad_area_nosemaphore at ffffffff816a7e7a
#7 [ffff8808085f7ab0] __do_page_fault at ffffffff816bb68e
#8 [ffff8808085f7b10] do_page_fault at ffffffff816bb835
#9 [ffff8808085f7b40] page_fault at ffffffff816b7768
[exception RIP: exportfs_decode_fh+0x8d]
RIP: ffffffff812947fd RSP: ffff8808085f7bf0 RFLAGS: 00010207
RAX: 0000000000000028 RBX: ffff8808172b3aa0 RCX: 0000000006f457e4
RDX: 0000000000000028 RSI: 0000000000001000 RDI: ffff88017fc03300
RBP: ffff8808085f7d38 R8: 000000000001ba40 R9: ffffffffc0444f65
R10: ffff88081ea1ba40 R11: ffffea001f179800 R12: ffffffffc03c6540
R13: 0000000000000028 R14: ffffffffc08365e0 R15: 0000000000000001
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#10 [ffff8808085f7d40] fh_verify at ffffffffc03c72b8 [nfsd]
#11 [ffff8808085f7db0] nfsd3_proc_fsinfo at ffffffffc03d13e5 [nfsd]
#12 [ffff8808085f7de0] nfsd_dispatch at ffffffffc03c35a0 [nfsd]
#13 [ffff8808085f7e18] svc_process_common at ffffffffc0366646 [sunrpc]
#14 [ffff8808085f7e78] svc_process at ffffffffc03669f3 [sunrpc]
#15 [ffff8808085f7ea0] nfsd at ffffffffc03c2eff [nfsd]
#16 [ffff8808085f7ec8] kthread at ffffffff810b4031
#17 [ffff8808085f7f50] ret_from_fork at ffffffff816c0577
crash>

== Disassembly of crash IP
crash> dis -lr exportfs_decode_fh+0x8d | tail -15
/usr/src/debug/kernel-3.10.0-693.21.1.el7/linux-3.10.0-693.21.1.el7.x86_64/fs/exportfs/expfs.c: 392
0xffffffff812947d3 <exportfs_decode_fh+0x63>: mov -0x144(%rbp),%edx
0xffffffff812947d9 <exportfs_decode_fh+0x69>: mov -0x140(%rbp),%rsi
0xffffffff812947e0 <exportfs_decode_fh+0x70>: callq 0xffffffff81337db0 <__x86_indirect_thunk_rax>
/usr/src/debug/kernel-3.10.0-693.21.1.el7/linux-3.10.0-693.21.1.el7.x86_64/fs/exportfs/expfs.c: 393
0xffffffff812947e5 <exportfs_decode_fh+0x75>: test %rax,%rax
/usr/src/debug/kernel-3.10.0-693.21.1.el7/linux-3.10.0-693.21.1.el7.x86_64/fs/exportfs/expfs.c: 392
0xffffffff812947e8 <exportfs_decode_fh+0x78>: mov %rax,%r13
/usr/src/debug/kernel-3.10.0-693.21.1.el7/linux-3.10.0-693.21.1.el7.x86_64/fs/exportfs/expfs.c: 393
0xffffffff812947eb <exportfs_decode_fh+0x7b>: je 0xffffffff812948b0 <exportfs_decode_fh+0x140>
/usr/src/debug/kernel-3.10.0-693.21.1.el7/linux-3.10.0-693.21.1.el7.x86_64/fs/exportfs/expfs.c: 395
0xffffffff812947f1 <exportfs_decode_fh+0x81>: cmp $0xfffffffffffff000,%rax
0xffffffff812947f7 <exportfs_decode_fh+0x87>: ja 0xffffffff812948a2 <exportfs_decode_fh+0x132>
/usr/src/debug/kernel-3.10.0-693.21.1.el7/linux-3.10.0-693.21.1.el7.x86_64/fs/exportfs/expfs.c: 398
0xffffffff812947fd <exportfs_decode_fh+0x8d>: mov 0x30(%rax),%rax << Crashed here

=== source code of crash context
struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid,
int fh_len, int fileid_type,
int (*acceptable)(void *, struct dentry *), void *context)
{
const struct export_operations *nop = mnt->mnt_sb->s_export_op; << I:[1] export operations of zfs
struct dentry *result, *alias;
char nbuf[NAME_MAX+1];
int err;

/*

  • Try to get any dentry for the given file handle from the filesystem.
    */
    if (!nop || !nop->fh_to_dentry)
    return ERR_PTR(-ESTALE);
    result = nop->fh_to_dentry(mnt->mnt_sb, fid, fh_len, fileid_type); << I:[2] to return the dentry of zfs
    if (!result)
    result = ERR_PTR(-ESTALE);
    if (IS_ERR(result))
    return result;

if (S_ISDIR(result->d_inode->i_mode)) { << Crashed here I:[3] Crashed here
/*

  • This request is for a directory.
  • On the positive side there is only one dentry for each
  • directory inode. On the negative side this implies that we
  • to ensure our dentry is connected all the way up to the
  • filesystem root.
    */

<..>


Source code with assembly snippet for the checks

387 /*
388 * Try to get any dentry for the given file handle from the filesystem.
389 */
390 if (!nop || !nop->fh_to_dentry)
391 return ERR_PTR(-ESTALE);
392 result = nop->fh_to_dentry(mnt->mnt_sb, fid, fh_len, fileid_type); = > mov %rax,%r13

=> The dentry returned for this zfs filesystem was corrupted and held value shown in RAX which is result
R13: 0000000000000028 RAX: 0000000000000028

393 if (!result) << condition check passed as it was not 0x0
394 result = ERR_PTR(-ESTALE);
395 if (IS_ERR(result))
396 return result;
397
398 if (S_ISDIR(result->d_inode->i_mode)) { << Crashed

crashed here as the (dentry*) in result is invalid and hence an invalid dereference.

<exportfs_decode_fh+0x8d>: mov 0x30(%rax),%rax

RAX: 0000000000000028
RAX + 0x30 = 0x58 which was the address tried to deference and hence the crash as this in invalid addr

PANIC: "BUG: unable to handle kernel NULL pointer dereference at 0000000000000058


== Obtain the export_operations in this context and the
crash> dis -lr exportfs_decode_fh+0x8d | tail -30
0xffffffff81294797 <exportfs_decode_fh+0x27>: mov %gs:0x28,%rax
0xffffffff812947a0 <exportfs_decode_fh+0x30>: mov %rax,-0x30(%rbp)
0xffffffff812947a4 <exportfs_decode_fh+0x34>: xor %eax,%eax
0xffffffff812947a6 <exportfs_decode_fh+0x36>: mov %edx,-0x144(%rbp)
0xffffffff812947ac <exportfs_decode_fh+0x3c>: mov %r9,-0x138(%rbp)
/usr/src/debug/kernel-3.10.0-693.21.1.el7/linux-3.10.0-693.21.1.el7.x86_64/fs/exportfs/expfs.c: 382
0xffffffff812947b3 <exportfs_decode_fh+0x43>: mov 0x48(%rdi),%r14

crash> bt | grep R14
R13: 0000000000000028 R14: ffffffffc08365e0 R15: 0000000000000001
crash>

The export operations passed to exportfs_decode_fh() was ffffffffc08365e0
which points to zpl_export_operations [zfs] from zfs module.
The zfs module used in this context is tainting the kernel.

crash> sym ffffffffc08365e0
ffffffffc08365e0 (r) zpl_export_operations [zfs] << I:[1] as mentioned above

crash> export_operations ffffffffc08365e0
struct export_operations {
encode_fh = 0xffffffffc07b94b0,
fh_to_dentry = 0xffffffffc07b9560, << I:[2] Function ptr invoked for dentry
fh_to_parent = 0x0,
get_name = 0x0,
get_parent = 0xffffffffc07b93f0,
commit_metadata = 0xffffffffc07b9370,
get_uuid = 0x0,
map_blocks = 0x0,
commit_blocks = 0x0
}

crash> sym 0xffffffffc07b9560 << I:[2]
ffffffffc07b9560 (t) zpl_fh_to_dentry [zfs]

This zpl_fh_to_dentry() returned an invalid dentry while called in exportfs_decode_fh() from the export_operations of this zfs filesystem, which nfsd tried to invoke. This zpl_fh_to_dentry() is called because the operation was related to the dentry of the zfs file from the given file handle (nop->fh_to_dentry)

crash> sym zpl_export_operations
ffffffffc08365e0 (r) zpl_export_operations [zfs]

crash> rd -S ffffffffc08365e0 32
ffffffffc08365e0: zpl_encode_fh zpl_fh_to_dentry
ffffffffc08365f0: 0000000000000000 0000000000000000
ffffffffc0836600: zpl_get_parent zpl_commit_metadata
ffffffffc0836610: 0000000000000000 0000000000000000
ffffffffc0836620: 0000000000000000 0000000000000000
ffffffffc0836630: 0000000000000000 0000000000000000
ffffffffc0836640: 746972775f6c707a 0000736567617065
ffffffffc0836650: 0000000000000000 0000000000000000
ffffffffc0836660: 0000000000000000 generic_file_llseek
ffffffffc0836670: generic_read_dir 0000000000000000
ffffffffc0836680: 0000000000000000 0000000000000000
ffffffffc0836690: zpl_readdir 0000000000000000
ffffffffc08366a0: zpl_ioctl zpl_compat_ioctl
ffffffffc08366b0: 0000000000000000 0000000000000000
ffffffffc08366c0: 0000000000000000 0000000000000000
ffffffffc08366d0: zpl_fsync 0000000000000000

As you note the zpl_export_operations ffffffffc08365e0 in the stack of the crashed task 9151

crash> search -t ffffffffc08365e0
PID: 9151 TASK: ffff8807c8bedee0 CPU: 16 COMMAND: "nfsd"
ffff8808085f77a8: ffffffffc08365e0
ffff8808085f7908: ffffffffc08365e0
ffff8808085f7a88: ffffffffc08365e0
ffff8808085f7af8: ffffffffc08365e0
ffff8808085f7b50: ffffffffc08365e0

crash> bt -f | grep fh_verify -A10
#10 [ffff8808085f7d40] fh_verify at ffffffffc03c72b8 [nfsd]
ffff8808085f7d48: ffff880807d82018 ffff880800000006
ffff8808085f7d58: 00000100bada3ac0 ffff8808085f7dd8
ffff8808085f7d68: ffffffffc03ced94 ffff8808085f7e00
ffff8808085f7d78: 00000000af705346 ffff880807d81000
ffff8808085f7d88: 0000000000100000 ffff880807d82000
ffff8808085f7d98: ffff8800bada2000 000000000000001c
ffff8808085f7da8: ffff8808085f7dd8 ffffffffc03d13e5
#11 [ffff8808085f7db0] nfsd3_proc_fsinfo at ffffffffc03d13e5 [nfsd]

r14: svc_rqst ffff8800bada2000
r13: svc_fh ffff880807d82000

=== The ZFS file in context of the crash was

crash> files -d ffff8807f48a5800
DENTRY INODE SUPERBLK TYPE PATH
ffff8807f48a5800 ffff880433360a00 ffff880819935800 DIR /hubvol/Staging/Hub/Dynamic/Apps/ZA/.zfs/snapshot

crash> p_dentry ffff8807f48a5800
(struct dentry *)0xffff8807f48a5800
/.zfs/snapshot
.d_flags = 0x01000084 - negative dentry - OP_REVALIDATE | RCUACCESS (undecoded bits: 0x0000000001000000)
.d_parent - (struct dentry *)0xffff8807f48a46c0
/.zfs
.d_flags = 0x01000084 - negative dentry - OP_REVALIDATE | RCUACCESS (undecoded bits: 0x0000000001000000)
.d_parent - (struct dentry *)0xffff88081e189800
.d_inode - (struct inode *)0xffff8807f258a6a8
.d_sb - (struct super_block *)0xffff880819935800 zfs
.d_inode - (struct inode *)0xffff880433360a00
.i_mode - 40777, .i_size - 0, .i_uid - 0, .i_uid - 0
.i_state = (nil) -
.i_flags = 0x00000000 -
.i_sb - (struct super_block *)0xffff880819935800 zfs

=== ZFS Mount details:

crash> mount | grep "/hubvol/Staging/Hub/Dynamic/Apps/ZA"
ffff8808172b3a80 ffff880819935800 zfs hubvol/Staging/ZA /hubvol/Staging/Hub/Dynamic/Apps/ZA

== ZFS module version details:

crash> module.version,name,srcversion ffffffffc0873220
version = 0xffff880819e01330 "0.7.7-1"
name = "zfs

=== Tainted modules on this system:
crash> sys -t
TAINTED_MASK: 3001 POE

crash> mod -t
NAME TAINTS
spl OE
zcommon POE
zavl POE
znvpair POE
icp POE
zunicode POE
zfs POE

@siebenmann
Copy link
Contributor

I believe that this is a bug in zfs_snapdir_vget(). The erroneous dentry returned is 0x28, or decimal 40, which is ELOOP, which is very suggestive of a confusion between the ZFS code (where error returns are positive values, eg ELOOP) and the Linux kernel code (where error returns are negative values, eg -ELOOP). In zpl_fh_to_dentry(), ZFS converts any non-zero return from zfs_vget(), assumed to be a ZFS error (positive) to a kernel error (negative) by negating it, in the 0.7 code here. I believe there is a path through zfs_vget() that will return a kernel (negative) error, which will then be negated into a small positive number by this code.

This path is zfs_vget() -> zfs_snapdir_vget() -> a failed call to kern_path() with the comment 'trigger automount' (in the 0.7 code here). As the comment in zfsctl_snapshot_mount() covers, in highly contended situations the VFS lookup can return ELOOP, which is really -ELOOP because this is a kernel (negative) error. This negative kernel error will be returned as-is up through zfs_vget() and then we explode.

(Any other kernel errors returned from kern_path() would cause similar explosions.)

I think the direct fix would be to change zfsctl_snapdir_vget() to do:

        /* Trigger automount */
        error = -kern_path(mnt, LOOKUP_FOLLOW|LOOKUP_DIRECTORY, &path);
        if (error)
                goto out;

This should turn a negative kernel error into a positive ZFS error if there is one.

(As an additional safety check, one could make zpl_fh_to_dentry() explicitly verify that the error is not negative before it negates it.)

A quick grep of the source code suggests that this is the only use of kern_path() in the ZoL code, but similar issues with directly returning kernel errors as ZFS errors may be lurking in other spots.

@brandonhaberfeld
Copy link

brandonhaberfeld commented Aug 28, 2018 via email

@brandonhaberfeld
Copy link

This of course doesn't explain why the ELOOP error occurs in the first place..... There is definitely a behaviour under high load and concurrent access to snaps via samba/nfs that causes the ELOOP - we are seeing this current all the time. Interestingly - an fuser -c shows the same prcoesses accessing all the snaps in the .zfs/snapshot directory and it itself causing the ELOOP -

The act of testing the directory causes the loop (as in the Heisenberg principle).

@siebenmann
Copy link
Contributor

By the way, it turns out that there is a simple reproduction for this, because we can artificially induce an ELOOP error by mounting the snapshot by hand:

  1. NFS export a filesystem, say /tank/test, with the crossmnt option. Mount it on a client machine somewhere, say at /mnt.
  2. make a snapshot: zfs snapshot tank/test@boom
  3. manually mount the snapshot on the NFS server at a convenient but non-default location: mount -t zfs -o ro,nosuid,nodev,relatime,xattr,noacl tank/test@boom /mnt
  4. On the client machine, attempt to access the snapshot: cd /mnt/.zfs/snapshot/boom

The NFS server will immediately panic with this error. This happens completely reliably because ZFS refuses to mount an already-mounted snapshot a second time, so the kernel's upcall to try to mount the snapshot on its normal location of /tank/test/.zfs/snapshot/boom will always fail.

@brandonhaberfeld
Copy link

brandonhaberfeld commented Aug 28, 2018 via email

@BonganiMalalane BonganiMalalane changed the title ZFS Kernel Panic BUG: unable to handle kernel NULL pointer dereference at 0000000000000058 Panic in NFS filehandle decoding for snapshots [code issue identified and reproducible] Aug 29, 2018
@brandonhaberfeld
Copy link

This is perhaps better:

  /* Trigger automount */
        error = abs(kern_path(mnt, LOOKUP_FOLLOW|LOOKUP_DIRECTORY, &path));
        if (error)
                goto out;

@behlendorf
Copy link
Contributor

@siebenmann I agree, this looks like it's entirely due to failing to properly negate the return value of kern_path() in the error case. The fix you proposed above is the right fix. The -ELOOP error will get returned up the stack and converted to a generic -ESTALE in exportfs_decode_fh(). Would you mind opening a PR with the fix so we can get this included.

siebenmann added a commit to siebenmann/zfs that referenced this issue Sep 4, 2018
As a regular kernel function, kern_path() returns errors as negative
errnos, such as -ELOOP. zfsctl_snapdir_vget() must convert these into
the positive errnos used throughout the ZFS code when it returns them
to other ZFS functions so that the ZFS code properly sees them as
errors.

Signed-off-by: Chris Siebenmann <cks.git01@cs.toronto.edu>
Closes openzfs#7764
behlendorf pushed a commit that referenced this issue Sep 5, 2018
As a regular kernel function, kern_path() returns errors as negative
errnos, such as -ELOOP. zfsctl_snapdir_vget() must convert these into
the positive errnos used throughout the ZFS code when it returns them
to other ZFS functions so that the ZFS code properly sees them as
errors.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Siebenmann <cks.git01@cs.toronto.edu>
Closes #7764
Closes #7864
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Sep 5, 2018
As a regular kernel function, kern_path() returns errors as negative
errnos, such as -ELOOP. zfsctl_snapdir_vget() must convert these into
the positive errnos used throughout the ZFS code when it returns them
to other ZFS functions so that the ZFS code properly sees them as
errors.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Siebenmann <cks.git01@cs.toronto.edu>
Closes openzfs#7764
Closes openzfs#7864
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Sep 5, 2018
As a regular kernel function, kern_path() returns errors as negative
errnos, such as -ELOOP. zfsctl_snapdir_vget() must convert these into
the positive errnos used throughout the ZFS code when it returns them
to other ZFS functions so that the ZFS code properly sees them as
errors.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Siebenmann <cks.git01@cs.toronto.edu>
Closes openzfs#7764
Closes openzfs#7864
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
0.7.10
  
Awaiting triage
Development

Successfully merging a pull request may close this issue.

4 participants