sa_find_idx_tab() ASSERTION failed, SPL panic, leading to hung ls #2801

Closed
inevity opened this issue Oct 15, 2014 · 10 comments

@inevity

inevity commented Oct 15, 2014

Using ZoL master: zfs at git commit e82cdc3, spl at de2a22f.

  KERNEL: /usr/lib/debug/lib/modules/2.6.32-358.el6.x86_64/vmlinux
DUMPFILE: /var/crash/127.0.0.1-2014-10-14-23:49:02/vmcore  [PARTIAL DUMP]
    CPUS: 24
    DATE: Tue Oct 14 23:48:23 2014
  UPTIME: 10:00:01

LOAD AVERAGE: 1.11, 1.47, 1.37
TASKS: 623
NODENAME: CNC-LQ-o-9ED
RELEASE: 2.6.32-358.el6.x86_64
VERSION: #1 SMP Fri Feb 22 00:31:26 UTC 2013
MACHINE: x86_64 (2000 Mhz)
MEMORY: 64 GB
PANIC: "Kernel panic - not syncing: hung_task: blocked tasks"
PID: 257
COMMAND: "khungtaskd"
TASK: ffff880873ac8aa0 [THREAD_INFO: ffff8808713de000]
CPU: 19
STATE: TASK_RUNNING (PANIC)

ZFS: Unloaded module v0.6.3-1
SPL: Unloaded module v0.6.3-1
SPL: Loaded module v0.6.3-12_gde2a22f (DEBUG mode) //
ZFS: Loaded module v0.6.3-113_ge82cdc3 (DEBUG mode), ZFS pool version 5000, ZFS filesystem version 5
SPL: using hostid 0x00000000
SPLError: 190622:0:(sa.c:1538:sa_find_idx_tab()) ASSERTION((IS_SA_BONUSTYPE(bonustype) && SA_HDR_SIZE_MATCH_LAYOUT(hdr, tb)) || !IS_SA_BONUSTYPE(bonustype) || (IS_SA_BONUSTYPE(bonustype) && hdr->sa_layout_info == 0)) failed
SPLError: 190622:0:(sa.c:1538:sa_find_idx_tab()) SPL PANIC
SPL: Showing stack for process 190622
Pid: 190622, comm: ls Tainted: P --------------- 2.6.32-358.el6.x86_64 #1
Call Trace:
[] ? spl_debug_dumpstack+0x46/0x60 [spl]
[] ? spl_debug_bug+0x81/0xd0 [spl]
[] ? spl_PANIC+0xba/0xf0 [spl]
[] ? submit_bio+0x8d/0x120
[] ? avl_find+0x65/0x100 [zavl]
[] ? sa_find_idx_tab+0x227/0x2e0 [zfs]
[] ? __cv_init+0x89/0x1f0 [spl]
[] ? zio_cons+0x47/0x120 [zfs]
[] ? sa_build_index+0x93/0x1b0 [zfs]
[] ? sa_handle_get_from_db+0x11c/0x160 [zfs]
[] ? zfs_znode_sa_init+0x144/0x200 [zfs]
[] ? zfs_znode_alloc+0x177/0x6c0 [zfs]
[] ? zio_wait+0x22b/0x3d0 [zfs]
[] ? dbuf_read+0x640/0xcd0 [zfs]
[] ? mutex_lock+0x1e/0x50
[] ? refcount_remove+0x16/0x20 [zfs]
[] ? mutex_lock+0x1e/0x50
[] ? dmu_object_info_from_dnode+0x129/0x200 [zfs]
[] ? zfs_zget+0x260/0x300 [zfs]
[] ? zfs_dirent_lock+0x560/0x670 [zfs]
[] ? zfs_dirlook+0x93/0x2c0 [zfs]
[] ? zfs_zaccess+0xa0/0x4b0 [zfs]
[] ? zfs_lookup+0x2ee/0x340 [zfs]
[] ? zpl_lookup+0x78/0x130 [zfs]
[] ? do_lookup+0x1a5/0x230
[] ? __link_path_walk+0x734/0x1030
[] ? path_walk+0x6a/0xe0
[] ? do_path_lookup+0x5b/0xa0
[] ? user_path_at+0x57/0xa0
[] ? putname+0x35/0x50
[] ? user_path_at+0x62/0xa0
[] ? vfs_fstatat+0x3c/0x80
[] ? _atomic_dec_and_lock+0x55/0x80
[] ? vfs_lstat+0x1e/0x20
[] ? sys_newlstat+0x24/0x50
[] ? path_put+0x31/0x40
[] ? sys_lgetxattr+0x61/0x80
[] ? system_call_fastpath+0x16/0x1b

INFO: task ls:190622 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ls D 0000000000000007 0 190622 190452 0x00000000
ffff880fa71473f8 0000000000000082 ffff880fa71473c0 ffff880fa71473bc
ffff880fa7147388 ffff88087fe82c00 ffff88089c4f6700 0000000000000400
ffff880faf66faf8 ffff880fa7147fd8 000000000000fb88 ffff880faf66faf8
Call Trace:
[] spl_debug_bug+0xa5/0xd0 [spl]
[] spl_PANIC+0xba/0xf0 [spl]
[] ? submit_bio+0x8d/0x120
[] ? avl_find+0x65/0x100 [zavl]
[] sa_find_idx_tab+0x227/0x2e0 [zfs]
[] ? __cv_init+0x89/0x1f0 [spl]
[] ? zio_cons+0x47/0x120 [zfs]
[] sa_build_index+0x93/0x1b0 [zfs]
[] sa_handle_get_from_db+0x11c/0x160 [zfs]
[] zfs_znode_sa_init+0x144/0x200 [zfs]
[] zfs_znode_alloc+0x177/0x6c0 [zfs]
[] ? zio_wait+0x22b/0x3d0 [zfs]
[] ? dbuf_read+0x640/0xcd0 [zfs]
[] ? mutex_lock+0x1e/0x50
[] ? refcount_remove+0x16/0x20 [zfs]
[] ? mutex_lock+0x1e/0x50
[] ? dmu_object_info_from_dnode+0x129/0x200 [zfs]
[] zfs_zget+0x260/0x300 [zfs]
[] zfs_dirent_lock+0x560/0x670 [zfs]
[] zfs_dirlook+0x93/0x2c0 [zfs]
[] ? zfs_zaccess+0xa0/0x4b0 [zfs]
[] zfs_lookup+0x2ee/0x340 [zfs]
[] zpl_lookup+0x78/0x130 [zfs]
[] do_lookup+0x1a5/0x230
[] __link_path_walk+0x734/0x1030
[] path_walk+0x6a/0xe0
[] do_path_lookup+0x5b/0xa0
[] user_path_at+0x57/0xa0
[] ? putname+0x35/0x50
[] ? user_path_at+0x62/0xa0
[] vfs_fstatat+0x3c/0x80
[] ? _atomic_dec_and_lock+0x55/0x80
[] vfs_lstat+0x1e/0x20
[] sys_newlstat+0x24/0x50
[] ? path_put+0x31/0x40
[] ? sys_lgetxattr+0x61/0x80
[] system_call_fastpath+0x16/0x1b
Kernel panic - not syncing: hung_task: blocked tasks
Pid: 257, comm: khungtaskd Tainted: P --------------- 2.6.32-358.el6.x86_64 #1
Call Trace:
[] ? panic+0xa7/0x16f
[] ? watchdog+0x217/0x220
[] ? watchdog+0x0/0x220
[] ? kthread+0x96/0xa0
[] ? child_rip+0xa/0x20
[] ? kthread+0x0/0xa0
[] ? child_rip+0x0/0x20

crash> ps |grep UN
190622 190452 7 ffff880faf66f540 UN 0.0 141460 24804 ls
crash> bt 190622
PID: 190622 TASK: ffff880faf66f540 CPU: 7 COMMAND: "ls"
#0 [ffff880fa7147338] schedule at ffffffff8150d692
#1 [ffff880fa7147400] spl_debug_bug at ffffffffa0565df5 [spl]
#2 [ffff880fa7147430] spl_PANIC at ffffffffa057553a [spl]
#3 [ffff880fa71475d0] sa_find_idx_tab at ffffffffa0669b67 [zfs]
#4 [ffff880fa71476a0] sa_build_index at ffffffffa066a8c3 [zfs]
#5 [ffff880fa71476e0] sa_handle_get_from_db at ffffffffa066cedc [zfs]
#6 [ffff880fa7147760] zfs_znode_sa_init at ffffffffa06d7284 [zfs]
#7 [ffff880fa71477b0] zfs_znode_alloc at ffffffffa06d8dd7 [zfs]
#8 [ffff880fa7147990] zfs_zget at ffffffffa06d9580 [zfs]
#9 [ffff880fa7147a50] zfs_dirent_lock at ffffffffa06b4d00 [zfs]
#10 [ffff880fa7147b00] zfs_dirlook at ffffffffa06b5023 [zfs]
#11 [ffff880fa7147b80] zfs_lookup at ffffffffa06d263e [zfs]
#12 [ffff880fa7147bf0] zpl_lookup at ffffffffa06f40a8 [zfs]
#13 [ffff880fa7147c40] do_lookup at ffffffff81190405
#14 [ffff880fa7147ca0] __link_path_walk at ffffffff81190bc4
#15 [ffff880fa7147d60] path_walk at ffffffff8119174a
#16 [ffff880fa7147da0] do_path_lookup at ffffffff8119191b
#17 [ffff880fa7147dd0] user_path_at at ffffffff811925a7
#18 [ffff880fa7147ea0] vfs_fstatat at ffffffff811869bc
#19 [ffff880fa7147ee0] vfs_lstat at ffffffff81186a6e
#20 [ffff880fa7147ef0] sys_newlstat at ffffffff81186a94
#21 [ffff880fa7147f80] system_call_fastpath at ffffffff8100b072

RIP: 00000032936dae05  RSP: 00007fff7d4a57b0  RFLAGS: 00000202
RAX: 0000000000000006  RBX: ffffffff8100b072  RCX: 0000000000000000
RDX: 00007f1a6f2e8320  RSI: 00007f1a6f2e8320  RDI: 00007fff7d4a57d0
RBP: 00007fff7d4a5bd0   R8: 0000000000d91740   R9: 32613630322d6162
R10: 0031376631356232  R11: 0000000000000246  R12: 0000000000d990cb
R13: 00007f1a6f2e8310  R14: 0000000000d99093  R15: 00000032a3203350
ORIG_RAX: 0000000000000006  CS: 0033  SS: 002b

crash> files 190622
PID: 190622 TASK: ffff880faf66f540 CPU: 7 COMMAND: "ls"
ROOT: / CWD: /root/src/zfs
FD FILE DENTRY INODE TYPE PATH
0 ffff88106f80e5c0 ffff8808467c8d80 ffff88086ff0a838 CHR /dev/pts/5
1 ffff88106f80e5c0 ffff8808467c8d80 ffff88086ff0a838 CHR /dev/pts/5
2 ffff8807934232c0 ffff880844e22780 ffff880844c82a78 REG /root/src/zfs/err1.txt
3 ffff881070bfa0c0 ffff880f8df69740 ffff880663c83c98 DIR /mnt/zpool/zfs/.glusterfs/40/92

[root@CNC-LQ-o-9ED ~]# stat /mnt/zpool/zfs/.glusterfs/40/92
File: `/mnt/zpool/zfs/.glusterfs/40/92'
Size: 16 Blocks: 29 IO Block: 1024 directory
Device: 13h/19d Inode: 63662 Links: 2
Access: (0700/drwx------) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2014-06-20 02:13:36.032128000 +0800
Modify: 2014-10-01 22:38:31.143065000 +0800
Change: 2014-10-01 22:38:31.143065000 +0800
[root@CNC-LQ-o-9ED ~]# zdb -vvvv zpool/zfs 63662
Dataset zpool/zfs [ZPL], ID 41, cr_txg 6, 34.0T, 1714220 objects, rootbp DVA[0]=<0:20c049484000:2000> DVA[1]=<0:27c124424000:2000> [L0 DMU objset] fletcher4 lzjb LE contiguous unique double size=800L/200P birth=1959830L/1959830P fill=1714220 cksum=19eaa47e24:89d712eb261:18ef85bb41fe9:33776644b0976d

Object  lvl   iblk   dblk  dsize  lsize   %full  type
 63662    1    16K     1K  14.0K     1K  100.00  ZFS directory
                                    168   bonus  System attributes
dnode flags: USED_BYTES USERUSED_ACCOUNTED
dnode maxblkid: 0
path    /.glusterfs/40/92
uid     0
gid     0
atime   Fri Jun 20 02:13:36 2014
mtime   Wed Oct  1 22:38:31 2014
ctime   Wed Oct  1 22:38:31 2014
crtime  Fri Jun 20 02:13:36 2014
gen 132229
mode    40700
size    16
parent  596
links   2
pflags  40800000044
microzap: 1024 bytes, 14 entries

    4092ee9f-b12e-410f-aafe-68307d3292ef = 4284603 (type: Regular File)
    40923e03-8797-4365-92ba-206a22b51f71 = 576515 (type: Regular File)
    4092b499-a013-44fd-a992-2d5fd979f17e = 3793886 (type: Symbolic Link)
    4092a609-60fd-478c-ac41-5fdf7b3d0bb0 = 3676089 (type: Regular File)
    40926ad9-44df-4b75-bd14-d25b6d012877 = 4836189 (type: Regular File)
    4092d09a-aa12-4cb5-bcbd-65b2ea69a05c = 2171404 (type: Regular File)
    409209e2-ce3a-4a31-805c-02b244d72f09 = 2646476 (type: Regular File)
    4092b3e7-aac1-4e54-a3ea-ff3f30f2798c = 63663 (type: Symbolic Link)
    4092c6f1-d52a-41f3-9bf3-be1ad045f0b1 = 4246629 (type: Symbolic Link)
    40920466-cab6-42ec-89d1-1efa57de4cc0 = 5360121 (type: Regular File)
    409213f3-1486-41d5-bb3a-2a6b2759ada6 = 5360470 (type: Regular File)
    409282ac-973e-4022-8e8d-bdb8d07612f4 = 5384130 (type: Regular File)
    4092c109-82e8-4943-b164-107f57890b11 = 5381386 (type: Regular File)
    40921e40-e04d-47fa-8b94-1c1dafd05eeb = 1103483 (type: Regular File)

Indirect blocks:
0 L0 0:27c140742000:2000 400L/400P F=1 B=1691223/1691223

    segment [0000000000000000, 0000000000000400) size    1K

Having identified the problem directory, I then found the following:

ls -lahR /mnt/zpool/zfs/.glusterfs/40/92
/mnt/zpool/zfs/.glusterfs/40/92:

Message from syslogd@ at Oct 15 14:08:06 ...
kernel:SPLError: 99053:0:(sa.c:1538:sa_find_idx_tab()) ASSERTION((IS_SA_BONUSTYPE(bonustype) && SA_HDR_SIZE_MATCH_LAYOUT(hdr, tb)) || !IS_SA_BONUSTYPE(bonustype) || (IS_SA_BONUSTYPE(bonustype) && hdr->sa_layout_info == 0)) failed

Message from syslogd@Oct 15 14:08:06 ...
kernel:SPLError: 99053:0:(sa.c:1538:sa_find_idx_tab()) SPL PANIC
ls hung!

@dweeezil
Contributor

@inevity Have you got either or both of xattr=sa and acltype=posixacl set? The problem is likely occurring when doing a stat(2) on one of the files within the .../40/92 directory. You should be able to run either stat or ls -ld on each one individually to see which one is corrupted. Once you find the specific file with the corruption, could you run zdb -ddddddd zpool/zfs <inode_of_corrupted_file> on it using the zdb from dweeezil/zfs@ce58fc1. This version of zdb is in a branch named zdb and has a bunch of additional SA-related debugging. Note that you can simply run it from the build directory as cmd/zdb/zdb ... without installing it.
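For example, a per-entry check could look like the sketch below (the directory path is taken from the report above; on an unpatched module the bad entry may hang the loop instead of returning an error, in which case it is the last name printed):

# Stat each entry in the suspect directory one at a time.
for f in /mnt/zpool/zfs/.glusterfs/40/92/*; do
    echo "checking: $f"
    stat "$f" > /dev/null || echo "FAILED: $f"
done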

@inevity
Author

inevity commented Oct 15, 2014

I set xattr=sa, and the corrupted file was created on zfs 0.6.0-RC14; I have since upgraded to the latest zfs 0.6.3 master.
Running stat or ls -i on /mnt/zpool/zfs/.glusterfs/40/92/4092b499-a013-44fd-a992-2d5fd979f17e triggers this issue, but I cannot find the inode of the file, so zdb cannot be used.

So what should I do next?

Fortunately I have a replica of the file on another machine; the two copies were written by GlusterFS AFR. On that machine, which runs 0.6.0-RC14:
[root@CNC-LQ-o-9EE ~]# stat /mnt/zpool/zfs/.glusterfs/40/92/4092b499-a013-44fd-a992-2d5fd979f17e
stat: cannot stat `/mnt/zpool/zfs/.glusterfs/40/92/4092b499-a013-44fd-a992-2d5fd979f17e': No such file or directory
[root@CNC-LQ-o-9EE ~]# ls /mnt/zpool/zfs/.glusterfs/40/92/4092b499-a013-44fd-a992-2d5fd979f17e
ls: cannot access /mnt/zpool/zfs/.glusterfs/40/92/4092b499-a013-44fd-a992-2d5fd979f17e: No such file or directory
[root@CNC-LQ-o-9EE ~]# ls /mnt/zpool/zfs/.glusterfs/40/92/ -la
ls: cannot access /mnt/zpool/zfs/.glusterfs/40/92/4092b499-a013-44fd-a992-2d5fd979f17e: No such file or directory
total 142536
drwx------ 2 root root 16 Oct 1 21:20 .
drwx------ 258 root root 258 Jul 16 10:02 ..
-rw-r--r-- 2 root root 0 Aug 28 04:06 40920466-cab6-42ec-89d1-1efa57de4cc0
-rw-rw-rw- 2 root root 1189 Aug 3 15:31 409209e2-ce3a-4a31-805c-02b244d72f09
-rw-r--r-- 2 root root 32 Sep 11 10:45 409213f3-1486-41d5-bb3a-2a6b2759ada6
-rw-r--r-- 2 root root 95318260 May 26 22:27 40921e40-e04d-47fa-8b94-1c1dafd05eeb
-rw-r--r-- 2 root root 2545552 Apr 3 2014 40923e03-8797-4365-92ba-206a22b51f71
-rw-r--r-- 2 root root 14482037 Jul 28 03:16 40926ad9-44df-4b75-bd14-d25b6d012877
-rw-r--r-- 2 root root 0 Aug 27 15:53 409282ac-973e-4022-8e8d-bdb8d07612f4
-rw-r--r-- 2 2000 2000 860664 Apr 1 2014 4092a609-60fd-478c-ac41-5fdf7b3d0bb0
lrwxrwxrwx 1 root root 81 Jun 20 02:13 4092b3e7-aac1-4e54-a3ea-ff3f30f2798c -> ../../83/64/8364dc57-105f-4f68-b8fb-48fb3c4156ab/MaSy7WyBwYfLlBdlP07DsYVdmndL91uq
?????????? ? ? ? ? ? 4092b499-a013-44fd-a992-2d5fd979f17e

@dweeezil
Contributor

The inode of interest is 3793886, as shown in your dump of the directory above. While you're at it, could you also run zdb -dddd zpool/zfs 5 6? We're looking for the "SA attr registration" and "SA attr layouts" objects; they're usually objects 5 and 6. I'm a little concerned that your corrupted file is a symlink. Does your system have ECC memory? My patched version of zdb should be able to tell us whether there's a single-bit error or whether the corruption is more severe.
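For reference, one way to check for ECC memory from the shell is sketched below (it assumes dmidecode is installed; the second command only reports anything if the EDAC drivers are loaded):

# "Error Correction Type: None" means no ECC.
dmidecode -t memory | grep -i 'error correction'
# Corrected-error counters, present only when EDAC is active.
grep . /sys/devices/system/edac/mc/mc*/ce_count 2>/dev/null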

@behlendorf
Contributor

I set xattr=sa, and the corrupted file was created on zfs 0.6.0-RC14

@dweeezil This sounds a lot like the variable-length SA issue which impacted symlinks and was fixed in 0.6.3. It sounds like this file was created with 0.6.0-rc14, so that may explain how this happened.

@inevity
Author

inevity commented Oct 15, 2014

[root@CNC-LQ-o-9ED zfs-ce58fc178bd5c6e8d462c21f1b8952685d2f852d]# ./cmd/zdb/zdb -ddddddd zpool/zfs 3793886
Dataset zpool/zfs [ZPL], ID 41, cr_txg 6, 34.0T, 1714220 objects, rootbp DVA[0]=<0:20c00018e000:2000> DVA[1]=<0:27c000180000:2000> [L0 DMU objset] fletcher4 lzjb LE contiguous unique double size=800L/200P birth=1960079L/1960079P fill=1714220 cksum=189ddaf308:80f8261722e:16e714a6102cf:2e4c6c53246430

Object  lvl   iblk   dblk  dsize  lsize   %full  type

3793886 1 16K 512 14.0K 512 0.00 ZFS plain file (K=inherit) (Z=inherit)
192 bonus System attributes
dnode flags: USED_BYTES USERUSED_ACCOUNTED SPILL_BLKPTR
dnode maxblkid: 0

Spill blkptr:
    0:23c4a715a000:2000 0:d98f143a000:2000 200L/200P F=1 B=1380494/1380494
Spill blkptr dump: \020\000\000\000\000\000\000\000\320\212\123\342\021\000\000\000\020\000\000\000\000\000\000\000\320\241\170\314\006\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\002\007\054\200\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\216\020\025\000\000\000\000\000\001\000\000\000\000\000\000\000\035\045\302\004\014\000\000\000\054\252\106\044\270\004\000\000\203\170\036\225\105\363\000\000\256\351\055\002\341\157\041\000
SA hdrsize 8
SA layout 4
path    ???<object#3793886>

lt-zdb: ../../cmd/zdb/zdb.c:1539: Assertion `zap_lookup(os, MASTER_NODE_OBJ, ZFS_FUID_TABLES, 8, 1, &fuid_obj) == 0' failed.
Aborted

[root@CNC-LQ-o-9ED zfs-ce58fc178bd5c6e8d462c21f1b8952685d2f852d]# zdb -dddd zpool/zfs 5 6
Dataset zpool/zfs [ZPL], ID 41, cr_txg 6, 34.0T, 1714220 objects, rootbp DVA[0]=<0:20c00018e000:2000> DVA[1]=<0:27c000180000:2000> [L0 DMU objset] fletcher4 lzjb LE contiguous unique double size=800L/200P birth=1960079L/1960079P fill=1714220 cksum=189ddaf308:80f8261722e:16e714a6102cf:2e4c6c53246430

Object  lvl   iblk   dblk  dsize  lsize   %full  type
     5    1    16K  1.50K  14.0K  1.50K  100.00  SA attr registration
dnode flags: USED_BYTES USERUSED_ACCOUNTED
dnode maxblkid: 0
microzap: 1536 bytes, 21 entries

    ZPL_FLAGS =  800000b : [8:0:11]
    ZPL_DACL_ACES =  40013 : [0:4:19]
    ZPL_SYMLINK =  30011 : [0:3:17]
    ZPL_MTIME =  10000001 : [16:0:1]
    ZPL_GID =  800000d : [8:0:13]
    ZPL_CTIME =  10000002 : [16:0:2]
    ZPL_PAD =  2000000e : [32:0:14]
    ZPL_SIZE =  8000006 : [8:0:6]
    ZPL_DACL_COUNT =  8000010 : [8:0:16]
    ZPL_XATTR =  8000009 : [8:0:9]
    ZPL_MODE =  8000005 : [8:0:5]
    ZPL_LINKS =  8000008 : [8:0:8]
    ZPL_ATIME =  10000000 : [16:0:0]
    ZPL_UID =  800000c : [8:0:12]
    ZPL_DXATTR =  30014 : [0:3:20]
    ZPL_SCANSTAMP =  20030012 : [32:3:18]
    ZPL_CRTIME =  10000003 : [16:0:3]
    ZPL_RDEV =  800000a : [8:0:10]
    ZPL_ZNODE_ACL =  5803000f : [88:3:15]
    ZPL_GEN =  8000004 : [8:0:4]
    ZPL_PARENT =  8000007 : [8:0:7]

Object  lvl   iblk   dblk  dsize  lsize   %full  type
     6    1    16K    16K  28.5K    32K  100.00  SA attr layouts
dnode flags: USED_BYTES USERUSED_ACCOUNTED
dnode maxblkid: 1
Fat ZAP stats:
    Pointer table:
        1024 elements
        zt_blk: 0
        zt_numblks: 0
        zt_shift: 10
        zt_blks_copied: 0
        zt_nextblk: 0
    ZAP entries: 7
    Leaf blocks: 1
    Total blocks: 2
    zap_block_type: 0x8000000000000001
    zap_magic: 0x2f52ab2ab
    zap_salt: 0x1846f29e0d
    Leafs with 2^n pointers:
          9:      1 *
    Blocks with n*5 entries:
          1:      1 *
    Blocks n/10 full:
          1:      1 *
    Entries with n chunks:
          3:      3 ***
          4:      4 ****
    Buckets with n entries:
          0:    505 ****************************************
          1:      7 *

    6 = [ 5  6  4  12  13  7  11  0  1  2  3  8  16  19  17  20 ]
    2 = [ 5  6  4  12  13  7  11  0  1  2  3  8  16  19 ]
    8 = [ 17 ]
    5 = [ 20 ]
    7 = [ 17  20 ]
    4 = [ 5  6  4  12  13  7  11  0  1  2  3  8  16  19  17 ]
    3 = [ 5  6  4  12  13  7  11  0  1  2  3  8  16  19  20 ]

ECC memory: how can I check whether we have it?

Some other info for your reference, from the stat that led to the crash:
crash> bt -f
#15 [ffff8808712cdc40] do_lookup at ffffffff81190405
ffff8808712cdc48: 0000020000000000 ffff8810519d6b60
ffff8808712cdc58: ffff88086bdf98c0 ffff880871d48b00
ffff8808712cdc68: ffff8808712cdc78 ffff880870eb7044
ffff8808712cdc78: ffff8808712cddd8 ffff8808712cdd08
ffff8808712cdc88: ffff8808712cdd18 ffff8808712cdd18
ffff8808712cdc98: ffff8808712cdd58 ffffffff81190bc4
#16 [ffff8808712cdca0] __link_path_walk at ffffffff81190bc4
ffff8808712cdca8: ffff8808712cdd88 fffffffffffffff3
ffff8808712cdcb8: 0000000000000000 ffff88086f2eb500
ffff8808712cdcc8: ffff88086f2eb500 ffff88086f2eb500
ffff8808712cdcd8: ffff88086f2eb500 ffff88086f470958
ffff8808712cdce8: ffff88086f2eb500 000000008116087a
ffff8808712cdcf8: 8000000869599067 ffff8808712cdd18
ffff8808712cdd08: 00000024ad794eaa ffff880870eb7020
ffff8808712cdd18: ffff880874fae480 ffff880871d48b00
ffff8808712cdd28: ffff880874fae080 ffff8808712cddd8
ffff8808712cdd38: ffff880870eb7000 00000000ffffff9c
ffff8808712cdd48: 0000000000000000 0000000000000000
ffff8808712cdd58: ffff8808712cdd98 ffffffff8119174a

crash> struct nameidata ffff8808712cdd88
struct nameidata {
path = {
mnt = 0xffff880870eb7000,
dentry = 0xffffff9c
},
last = {
hash = 1898765768,
len = 4294936584,
name = 0xffffffff8119191b "\205\300u\300eH\213\024%\300", <incomplete sequence \313>
},
root = {
mnt = 0xffff8808712cdeb8,
dentry = 0xffff8808712cdeb8
},
flags = 1894477824,
last_type = -30712,
depth = 1898766008,
saved_names = {0xffff8808712cde98 "\330\336,q\b\210\377\377\274i\030\201\377\377\377\377", 0xffffffff811925a7 "H\211\337A\211\305\350\036\305\377\377E\205\355u\027H\213\205@\377\377\377I\211\004$H\213\205H\377\377\377I\211D$\bD\211\350H\213]\340L\213e\350L\213m\360L\213u\370\311\303A\211\305\353\346\017\v\353\376\017\037@", 0xffff880874fae480 "\202\372t\b\210\377\377\202\372t\b\210\377\377\200\342\372t\b\210\377\377", 0xffff880871d48b00 "\a", 0xffff8808708134a8 "\300\067\242n\b\210\377\377", 0x14 <Address 0x14 out of bounds>, 0xffff880874fae080 "\240\216\372t\b\210\377\377\240\216\372t\b\210\377\377\200\356\372t\b\210\377\377@\021\200t\020\210\377\377\300\346\327o\b\210\377\377", 0xffff88086fd7e6c0 "o", 0x100000000 <Address 0x100000000 out of bounds>},
intent = {
open = {
flags = 0,
create_mode = -30712,
file = 0xffff8808712cdf58
}
}
}

crash> struct dentry ffffff9c // is this address or this command correct?
struct: invalid kernel virtual address: 0xffffff9c

@inevity
Author

inevity commented Oct 15, 2014

What I need to know is not only how this happened, but also how to avoid the crash on this filesystem without removing the file or recreating the filesystem.
Thanks for your help; I hope I can be of help too.

@dweeezil
Contributor

@inevity Oops, @behlendorf's comment made me realize I overlooked the fact that your files may have been created pre-0.6.3. I was concerned that you may have been seeing the current elusive problem in which the SA layout is incorrect. In your case, however, the layout is correct. The cause of your corruption should have been fixed by the trio of 83021b4, 5d862cb and 472e7c6. You're likely going to have to recreate the filesystem(s) with this corruption.
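For example, a file-level copy into a fresh dataset sidesteps the damaged SAs; the sketch below uses placeholder names and assumes the new dataset inherits a mountpoint under /mnt/zpool (zfs send/receive is avoided here since it would likely carry the corrupted SAs over verbatim):

# Create a sibling dataset and copy everything except the known-bad entry.
zfs create zpool/zfs-new
rsync -aHAX --exclude='.glusterfs/40/92/4092b499-a013-44fd-a992-2d5fd979f17e' \
    /mnt/zpool/zfs/ /mnt/zpool/zfs-new/
# After verifying the copy, the old dataset could be destroyed and the new
# one renamed into place:
#   zfs destroy -r zpool/zfs
#   zfs rename zpool/zfs-new zpool/zfs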

@inevity
Author

inevity commented Oct 15, 2014

I know; those three SA-related issues forced me to upgrade zfs from 0.6.0-RC14 to 0.6.3, but I then came across issue #2597 on 0.6.3. I think we should apply the patch 'Add object type checking to zap_lockdir()', so I finally switched to zfs master.

How can I recreate the filesystem while keeping the original files? Or is there a patch we can apply to work around the corrupted file, similar to the approach used in #2597? After applying that patch, ls no longer hangs; it just returns an invalid-argument error when listing a corrupted file.

@kernelOfTruth
Contributor

Matching commits and issue tracker entries in Illumos:

https://illumos.org/issues/6434 sa_find_sizes() may compute wrong SA header size
http://lists.open-zfs.org/pipermail/developer/2013-November/000306.html
illumos/illumos-gate@3502ed6

@behlendorf
Contributor

Closing; all of these issues have been addressed in ZoL and we've pushed the fix upstream to illumos.

openzfs/openzfs#24
