
ZFS invalid mode 0x7 and SPL PANIC in zfs_inode_set_ops() #2228

Closed
owahlberg opened this issue Apr 1, 2014 · 13 comments

@owahlberg

I have 2 out of 10+ file systems in an otherwise healthy pool that cause an SPL PANIC on mount.
I can mount all the other file systems by hand, and the 8TB mirrored pool scrubs fine.
The server is running Ubuntu Precise with ZoL from the zfs-native/stable PPA.
All packages are up-to-date.

Trying to mount either of the two file systems causes the same oops:

[ 2683.054010] ZFS: Invalid mode: 0x7
[ 2683.054018] VERIFY(0) failed
[ 2683.054089] SPLError: 21787:0:(zfs_znode.c:340:zfs_inode_set_ops()) SPL PANIC
[ 2683.054203] SPL: Showing stack for process 21787
[ 2683.054212] Pid: 21787, comm: mount.zfs Tainted: P O 3.2.0-23-generic #36-Ubuntu
[ 2683.054218] Call Trace:
[ 2683.054259] [] spl_debug_dumpstack+0x27/0x40 [spl]
[ 2683.054281] [] spl_debug_bug+0x82/0xe0 [spl]
[ 2683.054385] [] zfs_znode_alloc+0x336/0x540 [zfs]
[ 2683.054465] [] zfs_zget+0x168/0x200 [zfs]
[ 2683.054543] [] zfs_root+0x83/0xc0 [zfs]
[ 2683.054620] [] zfs_domount+0x1f2/0x270 [zfs]
[ 2683.054632] [] ? sget+0x1c5/0x240
[ 2683.054640] [] ? get_anon_bdev+0x110/0x110
[ 2683.054714] [] ? zpl_mount+0x30/0x30 [zfs]
[ 2683.054789] [] zpl_fill_super+0xe/0x20 [zfs]
[ 2683.054798] [] mount_nodev+0x5d/0xc0
[ 2683.054874] [] zpl_mount+0x25/0x30 [zfs]
[ 2683.054883] [] mount_fs+0x43/0x1b0
[ 2683.054895] [] vfs_kern_mount+0x6a/0xc0
[ 2683.054905] [] do_kern_mount+0x54/0x110
[ 2683.054914] [] do_mount+0x1a4/0x260
[ 2683.054923] [] sys_mount+0x90/0xe0
[ 2683.054932] [] system_call_fastpath+0x16/0x1b
[ 2881.612107] INFO: task mount.zfs:21787 blocked for more than 120 seconds.
[ 2881.612238] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2881.612362] mount.zfs D ffffffff81806240 0 21787 21786 0x00000000
[ 2881.612377] ffff88031562d9e8 0000000000000086 000000000000000a ffff88031563c000
[ 2881.612389] ffff88031562dfd8 ffff88031562dfd8 ffff88031562dfd8 0000000000013780
[ 2881.612400] ffff8805e85696f0 ffff88032c4fade0 ffffffff81c0f6c7 0000000000000000
[ 2881.612410] Call Trace:
[ 2881.612425] [] schedule+0x3f/0x60
[ 2881.612466] [] spl_debug_bug+0xb5/0xe0 [spl]
[ 2881.612564] [] zfs_znode_alloc+0x336/0x540 [zfs]
[ 2881.612644] [] zfs_zget+0x168/0x200 [zfs]
[ 2881.612722] [] zfs_root+0x83/0xc0 [zfs]
[ 2881.612799] [] zfs_domount+0x1f2/0x270 [zfs]
[ 2881.612810] [] ? sget+0x1c5/0x240
[ 2881.612817] [] ? get_anon_bdev+0x110/0x110
[ 2881.612892] [] ? zpl_mount+0x30/0x30 [zfs]
[ 2881.612967] [] zpl_fill_super+0xe/0x20 [zfs]
[ 2881.612976] [] mount_nodev+0x5d/0xc0
[ 2881.613053] [] zpl_mount+0x25/0x30 [zfs]
[ 2881.613062] [] mount_fs+0x43/0x1b0
[ 2881.613074] [] vfs_kern_mount+0x6a/0xc0
[ 2881.613084] [] do_kern_mount+0x54/0x110
[ 2881.613093] [] do_mount+0x1a4/0x260
[ 2881.613101] [] sys_mount+0x90/0xe0
[ 2881.613110] [] system_call_fastpath+0x16/0x1b

This is the second time I have run into this issue in the last 3 months.
Last time I recovered by rolling back to a previous snapshot.
This time I am unable to do so on one of the two affected file systems as it no longer has any snapshots.

These file systems have had ACLs on them, but I think what triggered this on both occasions was extensive use of snapshots through zfs-auto-snapshot.

I have tried defaulting the offending inode to a directory type and a file type as in #709, but neither gets me any further.

I am going to have to roll back the big dataset (~3TB) to a working snapshot tonight, but I have saved a copy of the smaller 70GB dataset that I cannot get access to, and I would really appreciate any suggestions or tips on how I could recover that data as well.

@dweeezil
Contributor

dweeezil commented Apr 1, 2014

@owahlberg Please apply this patch:

diff --git a/module/zfs/zfs_znode.c b/module/zfs/zfs_znode.c
index 5e99410..96043fa 100644
--- a/module/zfs/zfs_znode.c
+++ b/module/zfs/zfs_znode.c
@@ -336,7 +336,7 @@ zfs_inode_set_ops(zfs_sb_t *zsb, struct inode *ip)
                break;

        default:
-               printk("ZFS: Invalid mode: 0x%x\n", ip->i_mode);
+               printk("ZFS: Invalid mode: 0x%x ino=%lu\n", ip->i_mode, ip->i_ino);
                VERIFY(0);
        }
 }

and then run zdb -ddddd <pool>/<fs> <ino> on the offending file. Hopefully this will give you a valid path as well as some idea of what has happened to that particular file/directory over time.

As much as I hate to say it, I've got a filesystem lying around with a handful of these problems but haven't yet had enough time to analyze it. I'm concerned there may be a concurrent-access issue with the mode that causes some of the bits to be cleared. In one of my own examples, I've got a file/dir with a mode of 070, which also looks rather legitimate in the lower bits.
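
For context (this is not stated in the thread, but follows from the function being patched above): zfs_inode_set_ops() switches on the file-type portion of i_mode, and modes like 0x7 or 070 have permission bits set while the S_IFMT type field is zero, so no case matches and VERIFY(0) fires. A quick shell illustration of the masking, purely for clarity:

	# S_IFMT is the file-type mask 0170000 (octal). A healthy directory mode such as
	# 040755 keeps its S_IFDIR bit after masking; the corrupt mode 0x7 keeps nothing,
	# which is what sends zfs_inode_set_ops() into its default: VERIFY(0) branch.
	printf 'healthy dir: %06o\n' $(( 040755 & 0170000 ))   # -> 040000 (S_IFDIR)
	printf 'corrupt    : %06o\n' $((    0x7 & 0170000 ))   # -> 000000 (no type bits)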

@owahlberg
Author

@dweeezil thanks for the suggestion.
I currently have a different patch applied (forcing any unknown modes to be directories), but it will give us the inode number:

diff -aNur a/module/zfs/zfs_znode.c b/module/zfs/zfs_znode.c
--- a/module/zfs/zfs_znode.c    2013-08-22 13:33:26.000000000 -0700
+++ b/module/zfs/zfs_znode.c    2014-03-31 20:58:29.743979202 -0700
@@ -336,8 +336,13 @@
        break;

    default:
-       printk("ZFS: Invalid mode: 0x%x\n", ip->i_mode);
-       VERIFY(0);
+               printk("ZFS: Inode %ld has Invalid mode: 0x%x assuming directory\n",
+                   ip->i_ino, ip->i_mode);
+               ip->i_mode |= S_IFDIR;
+               ip->i_op = &zpl_dir_inode_operations;
+               ip->i_fop = &zpl_dir_file_operations;
+               ITOZ(ip)->z_zn_prefetch = B_TRUE;
+               break;
    }
 }

# zfs mount -v data/music

Due to the patch, the above does not cause an SPL PANIC and yields the inode number of the offender in the kernel log:

  [204635.929853] ZFS: Inode 4 has Invalid mode: 0x7 assuming directory

Inode 4 seems to be the root directory of this dataset:

# zdb -ddddd data/music 4 
Dataset data/music [ZPL], ID 73, cr_txg 5185, 68.8G, 14532 objects, rootbp DVA[0]=<2:1511af3000:1000> DVA[1]=<0:214cb277600:200> [L0 DMU objset] fletcher4 lzjb LE contiguous unique double size=800L/200P birth=5511367L/5511367P fill=14532 cksum=150d55b7c0:72a1e19d98a:14a489e7c6ecb:29ae78b7840ed7

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
         4    1    16K    512     1K    512  100.00  ZFS directory
                                        312   bonus  System attributes
    dnode flags: USED_BYTES USERUSED_ACCOUNTED 
    dnode maxblkid: 0
    path    ???<object#4>
    uid     500
    gid     4
    atime   Fri Feb 12 12:15:10 1988
    mtime   Wed Apr 25 09:40:18 1990
    ctime   Wed Apr 25 09:40:18 1990
    crtime  Wed Apr  3 23:03:01 1991
    gen 500
    mode    7
    size    5185
    parent  4432406249794
    links   8
    pflags  5169d944
    microzap: 512 bytes, 5 entries

        music = 10 (type: Directory)
        iphone backup = 16692 (type: Directory)
        mp3_128kbit = 9 (type: Directory)
        ._.DS_Store = 64 (type: Regular File)
        UNTAGGED = 8 (type: Directory)
Indirect blocks:
               0 L0 1:afd3205600:200 200L/200P F=1 B=1664611/1664611

        segment [0000000000000000, 0000000000000200) size   512

I can confirm that those are the top-level directories in this dataset, so at least some of the references remain correct...

@dweeezil
Contributor

dweeezil commented Apr 1, 2014

Correct, object 4 will always be the root directory in the current ZFS implementation.

The SA is totally screwed up; it's not just the mode that's bad. It would appear that every single field is bogus: the links count should be 6 rather than 8, the pflags are bogus, the parent should be 4, the size should likely not be 5185, and I suspect the timestamps are all bogus. Defaulting it to be a directory is not likely to help matters much.

The question is how did this happen? There are a couple of other issues open right now that involve messed-up SAs, but without a reliable way to reproduce the problem it's going to be very difficult to fix. Do you know how much was supposed to be in the root directory? Are the 5 items listed in the microzap the whole contents, or were there supposed to be a lot more entries there?

I just realized you mentioned ACLs. Are there POSIX ACLs on them? Is there supposed to be one on the root directory? Do you have xattr=sa set?

This could be another manifestation of #2214.
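
Not from the thread, but for anyone following along: a minimal sketch of how to answer the property questions above, assuming the data/music dataset named earlier and that the acl package (getfacl) is installed:

	zfs get acltype,xattr data/music    # shows whether acltype=posixacl and/or xattr=sa are in effect
	getfacl /data/music                 # lists any POSIX ACL on the root directory (requires the fs to be mounted)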

@owahlberg
Author

I noticed that the timestamps were all messed up, but did not have enough knowledge to comment on anything else.
The five items in the microzap are what I would expect to see in the root directory, no more, no less.

In all honesty, if there's nothing obvious that stands out from this data, it will be next to impossible to trace the problem from my pool. My pool has a long and sordid history.
It was originally created on FreeBSD 8 (FreeNAS), later migrated to OmniOS (Illumos) and used with ACLs for probably at least a year prior to being moved to Ubuntu 12.04 a couple of months ago, when I felt ZoL had become stable enough.
Not exactly an ideal candidate for a root cause analysis.

At this point I will keep the dataset around in case anybody feels like digging into it, but I will likely write it off as lost.
I am also considering breaking my mirror and rebuilding this pool from scratch to start with a clean slate in case I run into future problems, as the pool's history would make any serious troubleshooting very difficult.

@owahlberg
Author

I am in the process of trying to get Illumos to work in a KVM guest. If I get that to work, I'll pass through the drives.
I did that with FreeBSD 9.2 and I can import the pool, but when I mount the file system it fails with "Bad file descriptor".

@dweeezil
Contributor

dweeezil commented Apr 2, 2014

@owahlberg Can you run zdb -ddddd on either or both of FreeBSD and Illumos? It's still not clear to me whether you used acltype=posixacl and/or xattr=sa under ZoL. Both FreeBSD and Illumos will use the native ACL mechanism (which unfortunately still isn't handled by zdb).

@behlendorf behlendorf added this to the 0.7.0 milestone Apr 4, 2014
@behlendorf behlendorf added the Bug label Apr 4, 2014
@owahlberg
Author

I have been unable to get Illumos running well in a VM, but I have 50MB worth of zdb -dddd output from FreeBSD... Any preference on how to provide access, if you have time to take a look?
The pool was used with native ACLs on Illumos, but when it was moved to ZoL I worked around the need for ACLs for my particular config; I did this after experimenting with xattr=sa.

behlendorf pushed a commit that referenced this issue May 19, 2014
In the case where a variable-sized SA overlaps the spill block pointer and
a new variable-sized SA is being added, the header size was improperly
calculated to include the to-be-moved SA.  This problem could be
reproduced when xattr=sa enabled as follows:

	ln -s $(perl -e 'print "x" x 120') blah
	setfattr -n security.selinux -v blahblah -h blah

The symlink is large enough to interfere with the spill block pointer and
has a typical SA registration as follows (shown in modified "zdb -dddd"
<SA attr layout obj> format):

	[ ... ZPL_DACL_COUNT ZPL_DACL_ACES ZPL_SYMLINK ]

Adding the SA xattr will attempt to extend the registration to:

	[ ... ZPL_DACL_COUNT ZPL_DACL_ACES ZPL_SYMLINK ZPL_DXATTR ]

but since the ZPL_SYMLINK SA interferes with the spill block pointer, it
must also be moved to the spill block which will have a registration of:

	[ ZPL_SYMLINK ZPL_DXATTR ]

This commit updates extra_hdrsize when this condition occurs, allowing
hdrsize to be subsequently decreased appropriately.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Issue #2214
Issue #2228
Issue #2316
Issue #2343
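
For convenience, the reproducer from the commit message can be exercised against a throwaway pool. This is a sketch only; the pool name and backing file are hypothetical and not part of the original report:

	# Throwaway pool backed by a sparse file; all names here are placeholders.
	truncate -s 1G /tmp/zfs-repro.img
	zpool create repro /tmp/zfs-repro.img
	zfs create -o xattr=sa repro/fs
	cd /repro/fs
	# A symlink target long enough that the ZPL_SYMLINK SA overlaps the spill block pointer.
	ln -s "$(perl -e 'print "x" x 120')" blah
	# Adding a variable-sized SA xattr now forces ZPL_SYMLINK to move to the spill block.
	setfattr -n security.selinux -v blahblah -h blah
	# Inspect the dnodes / SA attr layouts afterwards.
	zdb -dddd repro/fs
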
@behlendorf
Contributor

A potential fix for this has been merged. Please let us know if you're able to recreate this issue using a pool created from the latest master source which includes commit 83021b4.

@behlendorf behlendorf modified the milestones: 0.6.4, 0.7.0 May 19, 2014
FransUrbo pushed a commit to FransUrbo/zfs that referenced this issue May 20, 2014 (same commit message as above).
@owahlberg
Author

Upgraded to 0.6.3 and retried importing the original fs.
The machine no longer crashes, but I'm still unable to mount the file system.

# zfs mount -o ro data/music
filesystem 'data/music' can not be mounted due to error 20
cannot mount 'data/music': Invalid argument 

Unfortunately I don't have an easy way to reproduce the problem on a newly created pool.

The console log also displays the following:

[21353.411990] ZFS: Inode 4 has Invalid mode: 0x7 assuming file

@dweeezil
Contributor

@owahlberg If you're seeing "Inode 4 has Invalid mode", your kernel module is not stock 0.6.3 but must have some sort of a patch on it. That said, keep in mind that if you've got any corrupted dnodes, a 0.6.3 upgrade will not fix them. The only recourse (without extreme measures) to deal with corrupted dnodes is to re-create the filesystem on which the corruption exists.
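
As a rough outline of that recourse (hypothetical names, not commands from this thread): roll back if a healthy snapshot exists, otherwise destroy and re-create the dataset and restore its contents from a backup or another copy:

	zfs rollback -r data/music@known-good   # only if a good snapshot exists
	# ...or, with no usable snapshot:
	zfs destroy -r data/music
	zfs create data/music
	# then restore the data from backup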

@owahlberg
Author

Good catch; I should have caught that myself since it's from the patch I was using on 0.6.2.
My on-disk modules were updated, but grub-probe caused a failure to build a new initrd image.
I understand any fix won't repair previous corruption; mostly this was a check to see what would happen, since I no longer have an easy way to reproduce the original issue.

With 0.6.3 it prints an SPLError/SPL PANIC and a call trace when trying to import the pool with corrupt data.
[40001.062599] ZFS: Invalid mode: 0x7
[40001.062610] VERIFY(0) failed
[40001.062680] SPLError: 8169:0:(zfs_znode.c:340:zfs_inode_set_ops()) SPL PANIC
[40001.062797] SPL: Showing stack for process 8169
....

At this point I'm going to toss my old damaged data and hope the original bug got fixed in 83021b4.

@behlendorf
Contributor

Closing; the root cause here was addressed.
