ZFS diff causes hard lockups when file has invalid parent object #2597
Same situation here. My system locks up completely, so I cannot get any meaningful information. After a while a stack trace almost identical to the one from https://gist.github.com/gbooker/9efd05157114b5e8365d appears on the console, but nothing works: neither the keyboard nor the network. The one additional bit of information I can contribute: when I boot the system into Illumos and run zfs diff from there, I get the result below, where 21 seems to be the file system at the root of the pool, which contains only child datasets, nothing else:
This issue may be the same as #2602. Both end up blocked in zap_get_leaf_byblk().
@Ringdingcoder If you run zdb -dddd dpool 21, it will print out information on object 21. Note: when I did this on my object in Ubuntu, zdb froze, but I could Ctrl-C it. I ran it under FreeBSD to get the information I needed.
Ok, it was very well behaved:
@behlendorf Since we both seem to be able to reproduce this bug, is there anything we can do to provide more information?
I have tried to get anything out of the patch from #2602, but I could not see printk's output. I then wrongly assumed that this was because I needed to build a debug kernel, which was an annoyingly laborious endeavor. When I had finally managed to do this, and insmod of the zfs module failed because of missing symbols (spl_mutex_spin_max, task_curr), I noticed that the printk invocation was simply missing a log level :(. However, even after adding KERN_WARNING, for the life of me I cannot get it to show the printk output.
Trying to correlate the address shown in the stack trace with the assembly and the source code, it seems that the rw_exit right after the call to dmu_buf_set_user in zap_open_leaf is the one hanging. I'm not much into kernel development: can the exit call hang? At least the generated assembly shows calls to _raw_spin_lock_irqsave and _raw_spin_unlock_irqrestore for the rw_exit line.
@Ringdingcoder Thanks for looking into this. Yes, an unlock can potentially hang if the spinlock in question somehow gets damaged. I've been chasing a similar issue over in #2523. Are you able to reproduce the issue fairly easily?
@behlendorf Yes, I can trigger it any time I want to. I just need to zfs diff two arbitrary snapshots, and within half a second everything hangs.
@behlendorf Same. Mine takes about a minute instead of half a second, but hangs 100% of the time.
@Ringdingcoder @gbooker can you try building the spl and zfs with the --enable-debug option?
@behlendorf Nothing :( – That is, same behavior as without --enable-debug.
What is the best way to build with --enable-debug?
I asked how to build with --enable-debug and was asked to file a documentation ticket: #2642. I got additional debug information this time:
If a non-ZAP object is passed to zap_lockdir() it will be treated as a valid ZAP object. This can result in zap_lockdir() attempting to read what it believes are leaf blocks from invalid disk locations. The SCSI layer will eventually generate errors for these bogus IOs, but the caller will hang in zap_get_leaf_byblk(). The good news is that this is a situation which cannot occur unless the pool has been damaged. The bad news is that there are reports from both FreeBSD and Solaris of damaged pools. Specifically, there are normal files in the filesystem which reference another normal file as their parent. Since pools like this are known to exist, the zap_lockdir() function has been updated to verify the type of the object. If a non-ZAP object has been passed, EINVAL will be returned immediately. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue openzfs#2597 Issue openzfs#2602
@gbooker could you apply the patch in #2644 to your dkms sources. It won't fix the on-disk damage, but it should prevent the hang, so it will behave like FreeBSD.
@behlendorf After applying the patch, zfs diff no longer deadlocks the machine. However, the diff process hangs and cannot be killed. Furthermore, the machine hung on reboot. Is this actually file system damage? What I read seemed to indicate that the parent ID is not guaranteed to be correct for files. One way it can become incorrect is by using hard links and then deleting the link but not the original file. I have done this in the past on this file system.
I noticed that the network continues to function, which gave me the idea of configuring kdump and taking a crash dump while it is hung. It boots into the kdump image, but unfortunately, instead of a crash dump, I get this every 11 seconds:
What I've written in my previous reply is utter nonsense. It seems that the various debug patches have modified the behavior so that it no longer hangs the entire machine, just the zfs process, and I did not notice the difference at first. The thing about kdump hanging is still valid, but it is another issue entirely. In fact kdump started working once I removed the zfs modules and executables.
@Ringdingcoder Were you able to get a backtrace from the hung zfs process?
@gbooker it's my understanding that the parent object id of a ZPL file or directory object must always reference a ZAP object. This should be true even for hard links. If you've read something different, can you point me to the post so I can have a look.
@behlendorf I ran across a reference to this in an OpenSolaris thread: https://www.mail-archive.com/zfs-discuss@opensolaris.org/msg50101.html I do not remember for sure, but the sequence of events described is at least very similar to actions I've taken in the past.
@gbooker Yes, that's an entirely plausible scenario for causing this. In the example given it would result in the object referencing a now removed directory. Eventually that object ID will be reused, and it could easily be reused for a normal file, which would cause this. This definitely looks like a very long standing issue. We'll have to give some thought to how to handle it.
Here you go:
@Ringdingcoder Any chance you can get a stack for whatever is on the other side of the pipe? This looks like we're correctly blocking, waiting for the write to complete.
I don't know. What is this pipe? Who creates it? Why would it want to write to a pipe? Is the zfs diff command multi-threaded? Does the kernel side communicate with user space over this pipe? |
userland opens a pipe and hands the write end to the kernel via an ioctl; the kernel writes the diff records into it, and the zfs process reads and prints them from the other end.
No, I didn't use a pipe at the shell. The only thing I can think of is that this is the pipe connecting stdout to the sshd process running the shell, but why would that block? I can rerun the same thing from a tty later today and compare the stack trace then.
As it turns out, the zfs process does indeed have two threads, and the other thread's stack is this:
I don't know where the pipe comes from, but it seems to be something internal to zfs.
It turns out that the previous zdb output was misleading because I ran it on the wrong file system – the root of the pool instead of the child on which I ran the diff :) Having read the opensolaris thread from 2012 linked above, it seems that a wrong parent pointer is indeed causing this. Object 21 is a file object created in 2009, shortly after I originally created the pool. Its parent points to an object from 2013. It is not supposed to be that way, I guess. I’ll upgrade my machine to kernel 4.2.0 and ZoL 0.6.5 and see what happens.
Very good, it does not hang anymore after the upgrade. It behaves the same as (a somewhat outdated) illumos now. That still does not make zfs diff usable for me, but at least it does not lock up everything. |
Since we've resolved the original reason this issue was opened, the hard lockup, I'm going to close this. @Ringdingcoder could you open a new issue and clearly describe what's still preventing zfs diff from being usable for you.
This may be the same as #2139 but my dmesg looks different so I'm posting in another bug just in case. Also, my FS did require #1927 before it was able to start doing any diffs.
If I run a zfs diff between two snapshots on my FS, it will start performing the diff and then hard lock in the same place. It seems that when it locks, the entire storage system is inaccessible. I described much of the process in the mailing list: https://groups.google.com/a/zfsonlinux.org/forum/#!topic/zfs-discuss/qKH9-uw8eTk I'll repeat the most important aspects here as well as additional findings.
I run "zfs diff zroot/media@20121010 zroot/media@20121201" and it starts the diff. Part of the way through, the process hangs. Once, and only once, I was able to get the output of dmesg while it was hung; most of the time I can't do anything at all during the hang.
https://gist.github.com/gbooker/9efd05157114b5e8365d
The disk mentioned in the dmesg output, ata8.00/sdd, is the boot drive, a 60GB Intel SSD. As a test, I booted from a Xubuntu USB install with this drive disconnected. When I ran the diff, it locked again, only this time it complained about ata5, which is the BD drive (which had no media in it and hadn't been used).
Since my motherboard has 6 SATA ports, I moved all drives to those ports and removed my 4-port SATA card. I repeated the test with the USB drive and it locked again (I saw no information about which drive it complained about this time).
Finally, I fashioned a USB drive with FreeBSD 10 on it (with all the hardware connected again). I ran the same diff and it finished normally, ending with the line:
Unable to determine path or stats for object 14941 in zroot/media@20121010: Invalid argument
I ran zdb on this object ID in Linux, and it spun at 100% CPU before printing the path; top reported zdb at 100% CPU usage and the machine was not locked. Ctrl-C did kill the zdb process. Running zdb in FreeBSD reported:
https://gist.github.com/gbooker/901249aad60bfba08c0c
The parent object, 14940, is another file. Google turned up someone else with the same problem, which appears to not actually be a problem:
http://lists.freebsd.org/pipermail/freebsd-fs/2012-November/015654.html
which was then followed up by:
https://www.mail-archive.com/zfs-discuss@opensolaris.org/msg50086.html
The latter thread describes a process by which this can occur. While I don't use hard links often, I have used them from time to time on this FS.
If I understand this correctly, using hard links can leave behind an invalid parent object id. When zfs diff encounters such an id (in my case the parent id pointed at another file object), it hard locks the machine. zfs diff should be more tolerant of this situation and print the path as unknown, or try to ascertain the real path or paths.