
ZFS diff causes hard lockups when file has invalid parent object #2597

Closed
gbooker opened this issue Aug 14, 2014 · 31 comments

@gbooker

gbooker commented Aug 14, 2014

This may be the same as #2139, but my dmesg looks different so I'm filing a separate issue just in case. Also, my FS required the fix from #1927 before it could run any diffs at all.

If I run a zfs diff between two snapshots on my FS, it starts performing the diff and then hard locks in the same place every time. When it locks, the entire storage system becomes inaccessible. I described much of the process on the mailing list: https://groups.google.com/a/zfsonlinux.org/forum/#!topic/zfs-discuss/qKH9-uw8eTk I'll repeat the most important points here along with additional findings.

I run "zfs diff zroot/media@20121010 zroot/media@20121201" and it starts the diff. Part of the way through, the process hangs. Once, and only once, I was able to capture the dmesg output while it was hung; most of the time I can't do anything at all during the hang.
https://gist.github.com/gbooker/9efd05157114b5e8365d

The disk mentioned in the dmesg output, ata8.00/sdd, is the boot drive, a 60GB Intel SSD. As a test, I booted from a Xubuntu USB installer with this drive disconnected. When I ran the diff, it locked again, only this time it complained about ata5, which is the Blu-ray drive (which had no media in it and hadn't been used).

Since my motherboard has 6 SATA ports, I moved all drives to those ports and removed my 4-port SATA card. I repeated the test with the USB drive and it locked again (I saw no information about which drive it complained about this time).

Finally, I prepared a USB drive with FreeBSD 10 on it (with all the hardware connected again). I ran the same diff and it finished normally, ending with the line:
Unable to determine path or stats for object 14941 in zroot/media@20121010: Invalid argument

I ran zdb on this object ID in Linux, and it spun on the CPU before printing the line with the path. The machine was not locked, and top reported zdb at 100% CPU usage; Ctrl-C did kill the zdb process. Running zdb under FreeBSD instead, it reported:
https://gist.github.com/gbooker/901249aad60bfba08c0c

The parent object, 14940, is another file. A search turned up another user with the same problem, which appears not to be an actual problem:
http://lists.freebsd.org/pipermail/freebsd-fs/2012-November/015654.html
which was then followed up by:
https://www.mail-archive.com/zfs-discuss@opensolaris.org/msg50086.html
The latter thread describes a process by which this can occur. While I don't use hard links often, I have used them from time to time on this FS.

If I understand this correctly, using hard links can leave a file with a possibly invalid parent object ID. When zfs diff encounters such an ID (in my case the parent ID referred to another file object), it hard locks the machine. So zfs diff should be more tolerant of this situation and either print the path as unknown or try to ascertain the real path or paths.

@Ringdingcoder

Same situation here. My system locks up completely, so I cannot get any meaningful information. After a while a stack trace almost identical to the one from https://gist.github.com/gbooker/9efd05157114b5e8365d appears on the console, but nothing works: neither the keyboard nor the network.

The one additional bit of information I can contribute: when I boot the system into Illumos and run zfs diff from there, the result is: Unable to determine path or stats for object 21 in <snapshot>: Invalid argument. It doesn't hang, but it fails to produce any diff output.

Here 21 seems to be the file system at the root of the pool, which contains only child datasets, nothing else:

[sr@dumpl ~]$ sudo zdb -d dpool
Dataset mos [META], ID 0, cr_txg 4, 179M, 1242 objects
...
Dataset dpool/dump@zfs-auto-snap_daily-2014-08-05-1856 [ZPL], ID 5740, cr_txg 76525, 1.42T, 1083556 objects
...
Dataset dpool [ZPL], ID 21, cr_txg 1, 256K, 11 objects

@behlendorf
Contributor

This issue may be the same as #2602. Both end up blocked in zap_get_leaf_byblk() and it's unclear what process is holding the conflicting lock.

@gbooker
Author

gbooker commented Aug 20, 2014

@Ringdingcoder If you run zdb -dddd dpool 21, it will print out information on object 21. Note: when I did this on my object in Ubuntu, zdb froze, but I could Ctrl-C it. I ran it under FreeBSD to yield the information I needed.

@Ringdingcoder

Ok, it was very well behaved:

Dataset mos [META], ID 0, cr_txg 4, 116M, 1320 objects, rootbp DVA[0]=<0:3403a74a000:2000> DVA[1]=<0:4d005a72000:2000> DVA[2]=<0:64002cb4000:2000> [L0 DMU objset] fletcher4 lzjb LE contiguous unique triple size=800L/200P birth=80105L/80105P fill=1320 cksum=12f86884e8:6e6ab149b07:14cb89bd7e2c3:2b389c2d302290

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
        21    1    16K    512      0    512    0.00  DSL dataset
                                        320   bonus  DSL dataset
    dnode flags:
    dnode maxblkid: 0
        dir_obj = 2
        prev_snap_obj = 18
        prev_snap_txg = 1
        next_snap_obj = 0
        snapnames_zapobj = 22
        num_children = 0
        userrefs_obj = 0
        creation_time = Tue Jul 15 21:25:59 2014
        creation_txg = 1
        deadlist_obj = 50
        used_bytes = 234K
        compressed_bytes = 20.5K
        uncompressed_bytes = 20.5K
        unique = 234K
        fsid_guid = 3240303165571712
        guid = 4995769017485270647
        flags = 4
        next_clones_obj = 0
        props_obj = 0
        bp = DVA[0]=<0:3402add6000:2000> DVA[1]=<0:4d004f80000:2000> [L0 DMU objset] fletcher4 lzjb LE contiguous unique double size=800L/200P birth=79826L/79826P fill=9 cksum=16b985440a:7f16133632f:1778cdd6ecbb5:3082514c25d9f9

@gbooker
Author

gbooker commented Aug 22, 2014

@behlendorf Since we both seem to be able to reproduce this bug, is there anything we can do to provide more information?

@Ringdingcoder

I have tried to get something out of the patch from #2602, but I could not see printk's output. I wrongly assumed this was because I needed to build a debug kernel, which was an annoyingly laborious endeavor. When I had finally managed it, and insmod of the zfs module failed because of missing symbols (spl_mutex_spin_max, task_curr), I noticed that the printk invocation was simply missing a log level :(. However, even after adding KERN_WARNING, for the life of me I cannot get it to show the printk output.

@Ringdingcoder

Trying to correlate the address shown in the stack trace with the assembly and the source code, it seems the rw_exit right after the call to dmu_buf_set_user in zap_open_leaf is the one hanging. I'm not very familiar with kernel development; can the exit call hang? At least the generated assembly shows calls to _raw_spin_lock_irqsave and _raw_spin_unlock_irqrestore for the rw_exit line.

@behlendorf behlendorf added this to the 0.6.4 milestone Aug 25, 2014
@behlendorf behlendorf added the Bug label Aug 25, 2014
@behlendorf
Contributor

@Ringdingcoder Thanks for looking into this. Yes, an unlock can potentially hang if the spinlock in question somehow gets damaged. I've been chasing a similar issue over in #2523. Are you able to reproduce the issue fairly easily?

@Ringdingcoder

@behlendorf Yes, I can trigger it any time I want to. I just need to zfs diff two arbitrary snapshots, and within half a second everything hangs.

@gbooker
Author

gbooker commented Aug 26, 2014

@behlendorf Same. Mine takes about a minute instead of half a second, but hangs 100% of the time.

@behlendorf
Contributor

@Ringdingcoder @gbooker Can you try building the SPL and ZFS with the --enable-debug option? This enables the ASSERTs sprinkled throughout the code, which are normally dormant, and might give us a little more detail: if one of them triggers, it will halt the thread and log a message to the console.

@Ringdingcoder

@behlendorf Nothing :( – That is, same behavior as without --enable-debug.

@gbooker
Author

gbooker commented Aug 28, 2014

What is the best way to build with --enable-debug? Do I use dkms or is it a configure/make kind of process?

@gbooker
Author

gbooker commented Aug 29, 2014

I asked how to build with --enable-debug and was asked to file a documentation ticket: #2642

I got additional debug information this time:

Aug 29 13:03:57 Mira kernel: [  181.324507] SPLError: 3053:0:(zap_micro.c:464:zap_lockdir()) VERIFY3((((doi.doi_type) & 0x80) ? ((doi.doi_type) & 0x3f) : dmu_ot[(int)(doi.doi_type)].ot_byteswap) == DMU_BSWAP_ZAP) failed (0 == 4)
Aug 29 13:03:57 Mira kernel: [  181.325649] SPLError: 3053:0:(zap_micro.c:464:zap_lockdir()) SPL PANIC
Aug 29 13:03:57 Mira kernel: [  181.326233] SPL: Showing stack for process 3053
Aug 29 13:03:57 Mira kernel: [  181.326235] CPU: 0 PID: 3053 Comm: zfs Tainted: PF          O 3.13.0-34-generic #60-Ubuntu
Aug 29 13:03:57 Mira kernel: [  181.326236] Hardware name:                  /DQ67OW, BIOS SWQ6710H.86A.0067.2014.0313.1347 03/13/2014
Aug 29 13:03:57 Mira kernel: [  181.326237]  ffff8803149de800 ffff8802e6199a30 ffffffff8171bd94 0000000000000000
Aug 29 13:03:57 Mira kernel: [  181.326241]  ffff8802e6199a40 ffffffffa0156487 ffff8802e6199a68 ffffffffa01576c8
Aug 29 13:03:57 Mira kernel: [  181.326242]  ffffffffa0170ed1 0000000000000001 0000000000000002 ffff8802e6199b40
Aug 29 13:03:57 Mira kernel: [  181.326244] Call Trace:
Aug 29 13:03:57 Mira kernel: [  181.326250]  [<ffffffff8171bd94>] dump_stack+0x45/0x56
Aug 29 13:03:57 Mira kernel: [  181.326265]  [<ffffffffa0156487>] spl_debug_dumpstack+0x27/0x40 [spl]
Aug 29 13:03:57 Mira kernel: [  181.326270]  [<ffffffffa01576c8>] spl_debug_bug+0x78/0xe0 [spl]
Aug 29 13:03:57 Mira kernel: [  181.326301]  [<ffffffffa02bb00a>] zap_lockdir+0x10a/0xd60 [zfs]
Aug 29 13:03:57 Mira kernel: [  181.326321]  [<ffffffffa0277610>] ? sa_spill_free+0x20/0x20 [zfs]
Aug 29 13:03:57 Mira kernel: [  181.326344]  [<ffffffffa02bbf74>] zap_cursor_retrieve+0x314/0x450 [zfs]
Aug 29 13:03:57 Mira kernel: [  181.326362]  [<ffffffffa027c5b7>] ? sa_bulk_lookup+0x57/0x110 [zfs]
Aug 29 13:03:57 Mira kernel: [  181.326380]  [<ffffffffa0277d9d>] ? sa_find_idx_tab+0x1fd/0x330 [zfs]
Aug 29 13:03:57 Mira kernel: [  181.326401]  [<ffffffffa02b5b21>] zap_value_search+0x91/0xd0 [zfs]
Aug 29 13:03:57 Mira kernel: [  181.326425]  [<ffffffffa02ebde1>] zfs_obj_to_path_impl+0x1d1/0x380 [zfs]
Aug 29 13:03:57 Mira kernel: [  181.326443]  [<ffffffffa027c4bd>] ? sa_bulk_lookup_locked+0x3d/0xe0 [zfs]
Aug 29 13:03:57 Mira kernel: [  181.326461]  [<ffffffffa027c5b7>] ? sa_bulk_lookup+0x57/0x110 [zfs]
Aug 29 13:03:57 Mira kernel: [  181.326478]  [<ffffffffa027c264>] ? sa_handle_get+0x44/0x60 [zfs]
Aug 29 13:03:57 Mira kernel: [  181.326501]  [<ffffffffa02eb601>] ? zfs_obj_to_stats_impl+0xb1/0xc0 [zfs]
Aug 29 13:03:57 Mira kernel: [  181.326523]  [<ffffffffa02f1b6e>] zfs_obj_to_stats+0x9e/0xf0 [zfs]
Aug 29 13:03:57 Mira kernel: [  181.326545]  [<ffffffffa02ce074>] zfs_ioc_obj_to_stats+0x84/0xa0 [zfs]
Aug 29 13:03:57 Mira kernel: [  181.326566]  [<ffffffffa02d4826>] zfsdev_ioctl+0x4d6/0x550 [zfs]
Aug 29 13:03:57 Mira kernel: [  181.326570]  [<ffffffff811cfd10>] do_vfs_ioctl+0x2e0/0x4c0
Aug 29 13:03:57 Mira kernel: [  181.326572]  [<ffffffff8109ddf4>] ? vtime_account_user+0x54/0x60
Aug 29 13:03:57 Mira kernel: [  181.326575]  [<ffffffff811cff71>] SyS_ioctl+0x81/0xa0
Aug 29 13:03:57 Mira kernel: [  181.326577]  [<ffffffff8172c97f>] tracesys+0xe1/0xe6

behlendorf added a commit to behlendorf/zfs that referenced this issue Aug 29, 2014
If a non-ZAP object is passed to zap_lockdir() it will be treated
as a valid ZAP object.  This can result in zap_lockdir() attempting
to read what it believes are leaf blocks from invalid disk locations.
The SCSI layer will eventually generate errors for these bogus IOs,
but the caller will hang in zap_get_leaf_byblk().

The good news is that this is a situation which cannot occur unless
the pool has been damaged.  The bad news is that there are reports
from both FreeBSD and Solaris of damaged pools.  Specifically, there
are normal files in the filesystem which reference another normal
file as their parent.

Since pools like this are known to exist, the zap_lockdir() function
has been updated to verify the type of the object.  If a non-ZAP
object is passed, EINVAL will be returned immediately.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#2597
Issue openzfs#2602
@behlendorf
Contributor

@gbooker Could you apply the patch in #2644 to your dkms sources? It won't fix the on-disk damage, but it should prevent the hang, so it will behave like FreeBSD.

@gbooker
Author

gbooker commented Aug 30, 2014

@behlendorf After applying the patch, zfs diff no longer deadlocks the machine. However, the diff process hangs and cannot be killed. Furthermore, the machine hung on reboot.

Is this actually file system damage? What I read seemed to indicate that the parent ID is not guaranteed to be correct for files. One way it can become incorrect is by using hard links and then deleting the link but not the original file. I have done this in the past on this file system.

@Ringdingcoder

I noticed that the network continues to function, which gave me the idea of configuring kdump and taking a crash dump while it is hung. It boots into the kdump image, but unfortunately, instead of a crash dump, I get this every 11 seconds:

systemd-udevd[227]: timeout: killing '/sbin/zpool list' [230]

@Ringdingcoder

What I wrote in my previous reply is nonsense. It seems the various debug patches changed the behavior so that it no longer hangs the entire machine, only the zfs process, and I did not notice the difference at first.

The point about kdump hanging still stands, but that is another issue entirely. In fact, kdump started working once I removed the zfs modules and executables.

FransUrbo pushed a commit to FransUrbo/zfs that referenced this issue Sep 5, 2014
FransUrbo pushed a commit to FransUrbo/zfs that referenced this issue Sep 7, 2014
@behlendorf
Contributor

@Ringdingcoder Were you able to get a backtrace from the hung zfs process? cat /proc/pid/stack.

@behlendorf
Contributor

Is this actually file system damage?

@gbooker It's my understanding that the parent object ID of a ZPL file or directory object must always reference a ZAP object. This should be true even for hard links. If you've read something different, can you point me to the post so I can have a look?

@gbooker
Author

gbooker commented Sep 8, 2014

@behlendorf I ran across a reference to this in an OpenSolaris thread: https://www.mail-archive.com/zfs-discuss@opensolaris.org/msg50101.html I do not remember for sure, but the sequence of events described is at least very similar to actions I've taken in the past.

@behlendorf
Contributor

@gbooker Yes, that's an entirely plausible scenario for causing this. In the example given, it would result in the object referencing a now-removed directory. Eventually that object ID will be reused, and it could easily be reused for a normal file, which would cause this. This definitely looks like a very long-standing issue. We'll have to give some thought to how to handle it.

behlendorf added a commit to behlendorf/zfs that referenced this issue Sep 8, 2014
@Ringdingcoder

Here you go:

[<ffffffff811eea30>] pipe_wait+0x70/0xc0
[<ffffffff811eecf5>] pipe_write+0x215/0x590
[<ffffffff811e6027>] do_sync_write+0x67/0xa0
[<ffffffff811e67ea>] vfs_write+0xba/0x1e0
[<ffffffffa000e5f1>] vn_rdwr+0xe1/0x3f0 [spl]
[<ffffffffa00d7c5d>] write_record.part.0+0x6d/0xa0 [zfs]
[<ffffffffa00d7ea1>] diff_cb+0x161/0x250 [zfs]
[<ffffffffa00e2175>] traverse_visitbp+0x2f5/0x890 [zfs]
[<ffffffffa00e22ef>] traverse_visitbp+0x46f/0x890 [zfs]
[<ffffffffa00e22ef>] traverse_visitbp+0x46f/0x890 [zfs]
[<ffffffffa00e22ef>] traverse_visitbp+0x46f/0x890 [zfs]
[<ffffffffa00e22ef>] traverse_visitbp+0x46f/0x890 [zfs]
[<ffffffffa00e22ef>] traverse_visitbp+0x46f/0x890 [zfs]
[<ffffffffa00e22ef>] traverse_visitbp+0x46f/0x890 [zfs]
[<ffffffffa00e2f85>] traverse_dnode+0x85/0x150 [zfs]
[<ffffffffa00e254b>] traverse_visitbp+0x6cb/0x890 [zfs]
[<ffffffffa00e28c1>] traverse_impl+0x1b1/0x4a0 [zfs]
[<ffffffffa00e2bf4>] traverse_dataset+0x44/0x50 [zfs]
[<ffffffffa00d8150>] dmu_diff+0x1c0/0x210 [zfs]
[<ffffffffa0164d4c>] zfs_ioc_diff+0x5c/0xb0 [zfs]
[<ffffffffa016b7d6>] zfsdev_ioctl+0x4c6/0x520 [zfs]
[<ffffffff811f9ad0>] do_vfs_ioctl+0x2e0/0x4a0
[<ffffffff811f9d11>] SyS_ioctl+0x81/0xa0
[<ffffffff816ff9e9>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

@behlendorf
Contributor

@Ringdingcoder Any chance you can get a stack for whatever is on the other side of the pipe? This looks like we're correctly blocking while waiting for the write to complete.

@Ringdingcoder

I don't know. What is this pipe? Who creates it? Why would it want to write to a pipe? Is the zfs diff command multi-threaded? Does the kernel side communicate with user space over this pipe?

@lundman
Contributor

lundman commented Sep 10, 2014

Userland opens stdout as an fd and passes it to the kernel; the kernel then calls vn_rdwr to write to that fd. So normally it really is just stdout. But since pipe_write appears in the stack, it looks like you piped zfs diff | into some other command, which is blocked. So what is on the right side of the |?

@Ringdingcoder

No, I didn't use a pipe at the shell. The only thing I can think of is that this is the pipe connecting stdout to the sshd process running the shell, but why would that block? I can rerun the same thing from a tty later today and compare the stack traces.

@Ringdingcoder

As it turns out, the zfs process does indeed have two threads, and the other thread's stack is this:

[<ffffffffa00eaa82>] dnode_special_close+0x32/0x80 [zfs]
[<ffffffffa00dbeb4>] dmu_objset_evict+0xf4/0x780 [zfs]
[<ffffffffa00f2ee5>] dsl_dataset_evict+0x35/0x190 [zfs]
[<ffffffffa00c4104>] dbuf_evict_user+0x54/0xb0 [zfs]
[<ffffffffa00c66df>] dbuf_rele_and_unlock+0x10f/0x300 [zfs]
[<ffffffffa00c6bb8>] dmu_buf_rele+0x38/0x70 [zfs]
[<ffffffffa00f2ea2>] dsl_dataset_rele+0x12/0x20 [zfs]
[<ffffffffa00d9e68>] dmu_objset_rele+0x28/0x40 [zfs]
[<ffffffffa0164cde>] zfs_ioc_obj_to_stats+0xae/0xc0 [zfs]
[<ffffffffa016b7d6>] zfsdev_ioctl+0x4c6/0x520 [zfs]
[<ffffffff811f9ad0>] do_vfs_ioctl+0x2e0/0x4a0
[<ffffffff811f9d11>] SyS_ioctl+0x81/0xa0
[<ffffffff816ff9e9>] system_call_fastpath+0x16/0x1b

I don't know where the pipe comes from, but it seems to be something internal to zfs. The /proc/pid/fd directory looks like this:

lrwx------ 1 root root 64 Sep 10 19:56 0 -> /dev/tty3
lrwx------ 1 root root 64 Sep 10 19:56 1 -> /dev/tty3
lrwx------ 1 root root 64 Sep 10 19:56 2 -> /dev/tty3
lrwx------ 1 root root 64 Sep 10 19:56 3 -> /dev/zfs
lr-x------ 1 root root 64 Sep 10 19:56 4 -> /proc/2122/mounts
lr-x------ 1 root root 64 Sep 10 19:59 5 -> /etc/dfs/sharetab
lrwx------ 1 root root 64 Sep 10 19:59 6 -> /dev/zfs
lrwx------ 1 root root 64 Sep 10 19:56 7 -> /dev/zfs
lr-x------ 1 root root 64 Sep 10 19:59 8 -> pipe:[19329]
l-wx------ 1 root root 64 Sep 10 19:59 9 -> pipe:[19329]

ryao pushed a commit to ryao/zfs that referenced this issue Nov 29, 2014
@behlendorf behlendorf modified the milestones: 0.6.5, 0.6.4 Feb 6, 2015
@behlendorf behlendorf added Bug - Minor and removed Bug labels Feb 6, 2015
@behlendorf behlendorf modified the milestones: 0.7.0, 0.6.5 Jul 16, 2015
@Ringdingcoder

It turns out that the previous zdb output was misleading because I ran it on the wrong file system: the root of the pool instead of the child dataset on which I ran the diff :)

Having read the OpenSolaris thread from 2012 linked above, it seems a wrong parent pointer is indeed causing this. Object 21 is a file object created in 2009, shortly after I originally created the pool. Its parent points to an object from 2013. It is not supposed to be that way, I guess.

I’ll upgrade my machine to kernel 4.2.0 and zol 0.6.5 and see what happens.

@Ringdingcoder

Very good: it no longer hangs after the upgrade. It now behaves the same as (a somewhat outdated) Illumos. That still does not make zfs diff usable for me, but at least it does not lock up everything.

@behlendorf
Contributor

Since we've resolved the original reason this issue was opened (the hard lockup), I'm going to close it. @Ringdingcoder, could you open a new issue and clearly describe what's still preventing zfs diff from being usable for you?

@behlendorf behlendorf modified the milestones: 0.6.5, 0.7.0 Sep 22, 2015