Lockup on inode lookup with xattr=sa and acltype=posixacl #2214
Comments
Oh, sorry, this is with the packages from @dajhorn's repository; after
|
@akorn Were the files also created with this version of ZFS? There have been a number of SA (system attribute) fixes added since 0.6.2, however, the version you're currently running does appear to have them. I'm wondering if an earlier version of ZFS might have created some bogus SAs. |
This was with a new pool I created with this zfs version, so yes, the files were created by the same zfs version as well. Meanwhile, a different box where I stored backups of this pool also locked up while retrieving the backups:
Setting xattr=off and acltype=noacl allowed me to read the files. |
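For anyone wanting to try the same workaround, the property changes described above might look roughly like the sketch below. The helper name `apply_workaround` and the dataset name `tank/data` are hypothetical; the property values are the ones named in the comment, and other ZFS releases may spell the ACL value differently.

```shell
#!/bin/sh
# Sketch only: switch a dataset away from SA-based xattrs and POSIX ACLs.
# Assumption: $1 is the name of the affected dataset (e.g. tank/data).
apply_workaround() {
    dataset=$1
    # Stop using extended attributes on this dataset.
    zfs set xattr=off "$dataset"
    # Disable POSIX ACL handling on this dataset.
    zfs set acltype=noacl "$dataset"
}

# Example invocation (left commented out on purpose):
# apply_workaround tank/data
```

This only changes how the dataset is accessed; it does not repair the damaged SA itself.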
@akorn If you can identify any specific files or directories that cause the problem, could you please find their inode number with |
I suppose so, but I'll need to put a test rig together for that. I'm afraid it will take a long time. |
Finding a file with a corrupted SA and running |
Here is one:
strace ls -lad foo-coverage-html stops at:
And the kernel logs:
|
@akorn That pretty much confirms a corrupted SA. Could you please build ZFS from dweeezil/zfs@9888f3c which is a current master with some extra zdb SA debugging. Then run the zdb from that build (you can run it directly from the build area as |
This is with dweeezil@9888f3c:
The kernel says the following about the segfault:
A gdb backtrace looks like this:
|
@akorn It looks like I'm going to have to improve my SA debugging version of zdb a bit more. We're trying to get a dump of the corrupted SA, which would allow me to decode it manually and, hopefully, figure out how it is corrupted. I'll try to get a more robust debugging zdb prepared today and will send a note to this issue when it's ready. |
I tried a build with --enable-debug and yes, I do hit an ASSERT() before the crash:
(Some of the lines are sent by syslog, some by netconsole, but they end up on the same logserver, hence the duplication.) Instead of crashing, my ls(1) process now hangs in spl_debug_bug. |
It's starting to look like your SA got sufficiently large that a spill block was needed. Since you've got the name of the corrupted file, have you got any idea what operations may have been performed on it over its lifetime? I'm looking to be able to reproduce this locally if possible. EDIT: Do you know if these files have also got or are supposed to have selinux xattrs? |
This specific corrupted entry is a directory with the following ACL:
Other than the ACL, it is not supposed to have any other extended attributes, selinux or otherwise. |
@akorn Could you please try reproducing this by hand in a fresh filesystem? Make one with something like Here was my test:
I then proceeded to remove the first xattr with I've not yet gotten a chance to re-work my debugging version of zdb but at the very least, under a 3.13 kernel with current master code (and the kthread_create fix), it's not a problem in and of itself to create a directory and apply the ACL you mentioned. I will try this same test under a 3.10 kernel but I'll have to build it first. |
My directory wasn't immediately corrupted either, just some time (hours, but less than a day) later, for an unknown reason; however, it kept happening again when I restored from backup and re-applied the ACL. Also, I don't know if it's relevant at all, but all inodes on the fs had a similar ACL, created by:
There are/were about 220k inodes in use in total. I will try to re-create the problem under controlled circumstances but I'm not very hopeful. |
In the case where a variable-sized SA overlaps the spill block pointer and a new variable-sized SA is being added, the header size was improperly calculated to include the to-be-moved SA. This problem could be reproduced when xattr=sa enabled as follows:

    ln -s $(perl -e 'print "x" x 120') blah
    setfattr -n security.selinux -v blahblah -h blah

The symlink is large enough to interfere with the spill block pointer and has a typical SA registration as follows (shown in modified "zdb -dddd" <SA attr layout obj> format):

    [ ... ZPL_DACL_COUNT ZPL_DACL_ACES ZPL_SYMLINK ]

Adding the SA xattr will attempt to extend the registration to:

    [ ... ZPL_DACL_COUNT ZPL_DACL_ACES ZPL_SYMLINK ZPL_DXATTR ]

but since the ZPL_SYMLINK SA interferes with the spill block pointer, it must also be moved to the spill block which will have a registration of:

    [ ZPL_SYMLINK ZPL_DXATTR ]

This commit updates extra_hdrsize when this condition occurs, allowing hdrsize to be subsequently decreased appropriately.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Issue #2214
Issue #2228
Issue #2316
Issue #2343
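The two reproduction commands from the commit message can be wrapped into a small script for repeated testing. This is only a sketch: the function names are hypothetical, the 120-character target is built with printf and tr instead of perl, and the directory passed in is assumed to live on a dataset with xattr=sa. On any other filesystem the steps run but exercise nothing ZFS-specific.

```shell
#!/bin/sh
# Sketch of the reproduction steps from the commit message.
# Assumption: $1 is a directory on a ZFS dataset with xattr=sa.

make_long_symlink() {
    dir=$1
    # Build a 120-character target, equivalent to: perl -e 'print "x" x 120'
    target=$(printf '%120s' '' | tr ' ' 'x')
    # Create the symlink named "blah", as in the commit message.
    ln -s "$target" "$dir/blah"
}

add_selinux_xattr() {
    dir=$1
    # -h applies the xattr to the symlink itself instead of following it.
    setfattr -n security.selinux -v blahblah -h "$dir/blah"
}

# Example invocation (left commented out on purpose; setting a
# security.* xattr usually needs root):
# make_long_symlink /tank/test && add_selinux_xattr /tank/test
```

Keeping the two steps in separate functions makes it easy to inspect the object with zdb between creating the symlink and adding the xattr.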
A potential fix for this has been merged. Please let us know if you're able to recreate this issue using a pool created from the latest master source which includes commit 83021b4. |
Closing issue. This is believed to have been resolved in master. |
Hi,
I have a zfs instance with a few thousand inodes in use, all of which have POSIX ACLs. The pool uses xattr=sa and acltype=posixacl.
After a while, corruption occurs such that looking up specific inodes (e.g. by way of "ls -l" in a directory) results in a crash.
I have collected some stack traces via netconsole (which perhaps explains why the lines are a bit disjointed):
This is what happened when I tried to zfs destroy the affected dataset:
I have to resort to destroying the pool to get rid of the error.
Rolling back to earlier snapshots didn't help (but maybe the snapshots weren't early enough).