
ZFS Invalid mode: 0x50 - SPL PANIC #2343

Closed
greg-hydrogen opened this issue May 18, 2014 · 11 comments

@greg-hydrogen

Hello Everyone,

Not sure if this should go in the SPL issue tracker rather than ZFS, so feel free to move it.

Running the latest HEAD, I am hitting this error on a Fedora 20 (x86_64) box running kernel 3.14.4-200.

I was able to capture stacks from two different circumstances.

The first stack is from the Fedora box:

May 18 11:32:03 localhost kernel: [ 3457.431066] ZFS: Invalid mode: 0x50
May 18 11:32:03 localhost kernel: [ 3457.431070] VERIFY(0) failed
May 18 11:32:03 localhost kernel: [ 3457.431074] SPLError: 9457:0:(zfs_znode.c:340:zfs_inode_set_ops()) SPL PANIC
May 18 11:32:03 localhost kernel: [ 3457.431075] SPL: Showing stack for process 9457
May 18 11:32:03 localhost kernel: [ 3457.431080] CPU: 6 PID: 9457 Comm: mandb Tainted: PF O 3.14.4-200.fc20.x86_64 #1
May 18 11:32:03 localhost kernel: [ 3457.431082] Hardware name: To be filled by O.E.M. To be filled by O.E.M./M5A97, BIOS 1208 04/18/2012
May 18 11:32:03 localhost kernel: [ 3457.431083] 0000000000000000 000000008a4d5e5e ffff88051792f7d8 ffffffff816ef1d2
May 18 11:32:03 localhost kernel: [ 3457.431090] 0000000000000000 ffff88051792f7e8 ffffffffa00004b7 ffff88051792f810
May 18 11:32:03 localhost kernel: [ 3457.431096] ffffffffa00016d8 ffffffffa00157c1 000000000000000a ffff880513690040
May 18 11:32:03 localhost kernel: [ 3457.431099] Call Trace:
May 18 11:32:03 localhost kernel: [ 3457.431108] [] dump_stack+0x45/0x56
May 18 11:32:03 localhost kernel: [ 3457.431132] [] spl_debug_dumpstack+0x27/0x40 [spl]
May 18 11:32:03 localhost kernel: [ 3457.431141] [] spl_debug_bug+0x78/0xe0 [spl]
May 18 11:32:03 localhost kernel: [ 3457.431184] [] zfs_znode_alloc+0x48c/0x5a0 [zfs]
May 18 11:32:03 localhost kernel: [ 3457.431215] [] zfs_zget+0x181/0x1f0 [zfs]
May 18 11:32:03 localhost kernel: [ 3457.431243] [] zfs_dirent_lock+0x446/0x580 [zfs]
May 18 11:32:03 localhost kernel: [ 3457.431274] [] zfs_dirlook+0x90/0x380 [zfs]
May 18 11:32:03 localhost kernel: [ 3457.431292] [] ? dbuf_dirty+0x3d5/0x850 [zfs]
May 18 11:32:03 localhost kernel: [ 3457.431323] [] zfs_lookup+0x306/0x360 [zfs]
May 18 11:32:03 localhost kernel: [ 3457.431353] [] zpl_lookup+0x93/0x100 [zfs]
May 18 11:32:03 localhost kernel: [ 3457.431357] [] lookup_real+0x1d/0x50
May 18 11:32:03 localhost kernel: [ 3457.431359] [] __lookup_hash+0x42/0x60
May 18 11:32:03 localhost kernel: [ 3457.431362] [] lookup_slow+0x42/0xa7
May 18 11:32:03 localhost kernel: [ 3457.431365] [] path_lookupat+0x73d/0x7a0
May 18 11:32:03 localhost kernel: [ 3457.431368] [] ? kmem_cache_alloc+0x35/0x1f0
May 18 11:32:03 localhost kernel: [ 3457.431370] [] ? getname_flags+0x4f/0x1a0
May 18 11:32:03 localhost kernel: [ 3457.431373] [] filename_lookup+0x2b/0xc0
May 18 11:32:03 localhost kernel: [ 3457.431376] [] user_path_at_empty+0x67/0xc0
May 18 11:32:03 localhost kernel: [ 3457.431380] [] ? from_kgid_munged+0x12/0x20
May 18 11:32:03 localhost kernel: [ 3457.431383] [] ? cp_new_stat+0x14f/0x180
May 18 11:32:03 localhost kernel: [ 3457.431385] [] user_path_at+0x11/0x20
May 18 11:32:03 localhost kernel: [ 3457.431388] [] vfs_fstatat+0x63/0xc0
May 18 11:32:03 localhost kernel: [ 3457.431391] [] SYSC_newlstat+0x31/0x60
May 18 11:32:03 localhost kernel: [ 3457.431394] [] ? __audit_syscall_exit+0x1f6/0x2a0
May 18 11:32:03 localhost kernel: [ 3457.431397] [] SyS_newlstat+0xe/0x10
May 18 11:32:03 localhost kernel: [ 3457.431400] [] system_call_fastpath+0x16/0x1b

The second is from a chroot environment (SL 6.5):
May 17 12:21:48 greg-lnxworkstn kernel: ZFS: Invalid mode: 0x2c
May 17 12:21:48 greg-lnxworkstn kernel: VERIFY(0) failed
May 17 12:21:48 greg-lnxworkstn kernel: SPLError: 7318:0:(zfs_znode.c:340:zfs_inode_set_ops()) SPL PANIC
May 17 12:21:48 greg-lnxworkstn kernel: SPL: Showing stack for process 7318
May 17 12:21:48 greg-lnxworkstn kernel: CPU: 3 PID: 7318 Comm: systemctl Tainted: P O 3.14.1 #1
May 17 12:21:48 greg-lnxworkstn kernel: Hardware name: To be filled by O.E.M. To be filled by O.E.M./M5A97, BIOS 1208 04/18/2012
May 17 12:21:48 greg-lnxworkstn kernel: ffff88065b666580 ffff88065ba6b7a8 ffffffff815ae9f8 ffffffff81db0840
May 17 12:21:48 greg-lnxworkstn kernel: 0000000000000000 ffff88065ba6b7b8 ffffffffa00895a7 ffff88065ba6b7f8
May 17 12:21:48 greg-lnxworkstn kernel: ffffffffa008ac9a ffffffffa009ee8c ffff88083eccd1e8 ffff88083eccecf0
May 17 12:21:48 greg-lnxworkstn kernel: Call Trace:
May 17 12:21:48 greg-lnxworkstn kernel: [] dump_stack+0x49/0x61
May 17 12:21:48 greg-lnxworkstn kernel: [] spl_debug_dumpstack+0x27/0x40 [spl]
May 17 12:21:48 greg-lnxworkstn kernel: [] spl_debug_bug+0x7a/0xe0 [spl]
May 17 12:21:48 greg-lnxworkstn kernel: [] zfs_inode_set_ops+0x83/0x180 [zfs]
May 17 12:21:48 greg-lnxworkstn kernel: [] zfs_znode_alloc+0x2cf/0x3d0 [zfs]
May 17 12:21:48 greg-lnxworkstn kernel: [] ? dmu_object_info_from_dnode+0x91/0xf0 [zfs]
May 17 12:21:48 greg-lnxworkstn kernel: [] zfs_zget+0x178/0x1e0 [zfs]
May 17 12:21:48 greg-lnxworkstn kernel: [] ? zap_lookup_norm+0xbe/0x190 [zfs]
May 17 12:21:48 greg-lnxworkstn kernel: [] zfs_dirent_lock+0x482/0x550 [zfs]
May 17 12:21:48 greg-lnxworkstn kernel: [] zfs_dirlook+0x8b/0x270 [zfs]
May 17 12:21:48 greg-lnxworkstn kernel: [] ? zfs_zaccess+0xa0/0x2d0 [zfs]
May 17 12:21:48 greg-lnxworkstn kernel: [] ? mutex_lock+0x16/0x40
May 17 12:21:48 greg-lnxworkstn kernel: [] zfs_lookup+0x2ee/0x340 [zfs]
May 17 12:21:48 greg-lnxworkstn kernel: [] zpl_lookup+0x78/0xe0 [zfs]
May 17 12:21:48 greg-lnxworkstn kernel: [] ? d_alloc+0x4d/0x60
May 17 12:21:48 greg-lnxworkstn kernel: [] lookup_real+0x1d/0x60
May 17 12:21:48 greg-lnxworkstn kernel: [] __lookup_hash+0x38/0x50
May 17 12:21:48 greg-lnxworkstn kernel: [] lookup_slow+0x4c/0xc0
May 17 12:21:48 greg-lnxworkstn kernel: [] path_lookupat+0x631/0x6a0
May 17 12:21:48 greg-lnxworkstn kernel: [] filename_lookup+0x34/0xc0
May 17 12:21:48 greg-lnxworkstn kernel: [] user_path_at_empty+0x59/0xa0
May 17 12:21:48 greg-lnxworkstn kernel: [] ? touch_atime+0x5b/0x160
May 17 12:21:48 greg-lnxworkstn kernel: [] SyS_readlinkat+0x5e/0x130
May 17 12:21:48 greg-lnxworkstn kernel: [] SyS_readlink+0x1b/0x20
May 17 12:21:48 greg-lnxworkstn kernel: [] system_call_fastpath+0x16/0x1b
May 17 12:23:26 greg-lnxworkstn automount[4873]: create_client: hostname lookup failed: Name or service not known
May 17 12:23:26 greg-lnxworkstn automount[4873]: create_client: hostname lookup failed: Name or service not known
May 17 12:23:26 greg-lnxworkstn automount[4873]: get_exports: lookup(hosts): exports lookup failed for Mike's Music
May 17 12:23:26 greg-lnxworkstn automount[4873]: key "Mike's Music" not found in map source(s).
May 17 12:23:27 greg-lnxworkstn automount[4873]: create_client: hostname lookup failed: Name or service not known
May 17 12:23:27 greg-lnxworkstn automount[4873]: create_client: hostname lookup failed: Name or service not known
May 17 12:23:27 greg-lnxworkstn automount[4873]: get_exports: lookup(hosts): exports lookup failed for Robyn Models
May 17 12:23:27 greg-lnxworkstn automount[4873]: key "Robyn Models" not found in map source(s).
May 17 12:24:18 greg-lnxworkstn ntpd[5085]: 0.0.0.0 c612 02 freq_set kernel -2.104 PPM
May 17 12:24:18 greg-lnxworkstn ntpd[5085]: 0.0.0.0 c615 05 clock_sync
May 17 12:24:30 greg-lnxworkstn kernel: INFO: task systemctl:7318 blocked for more than 120 seconds.
May 17 12:24:30 greg-lnxworkstn kernel: Tainted: P O 3.14.1 #1
May 17 12:24:30 greg-lnxworkstn kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 17 12:24:30 greg-lnxworkstn kernel: systemctl D ffffffff8161b6c0 0 7318 1 0x00000004
May 17 12:24:30 greg-lnxworkstn kernel: ffff88065ba6b7a8 0000000000000046 ffff88065b666580 ffff88065ba6a010
May 17 12:24:30 greg-lnxworkstn kernel: 0000000000012cc0 0000000000012cc0 ffff880671ba3050 ffff880816aa5150
May 17 12:24:30 greg-lnxworkstn kernel: 0000000000000000 0000000000000000 ffff88065b666580 ffff880709b04000
May 17 12:24:30 greg-lnxworkstn kernel: Call Trace:
May 17 12:24:30 greg-lnxworkstn kernel: [] schedule+0x29/0x70
May 17 12:24:30 greg-lnxworkstn kernel: [] spl_debug_bug+0xad/0xe0 [spl]
May 17 12:24:30 greg-lnxworkstn kernel: [] zfs_inode_set_ops+0x83/0x180 [zfs]
May 17 12:24:30 greg-lnxworkstn kernel: [] zfs_znode_alloc+0x2cf/0x3d0 [zfs]
May 17 12:24:30 greg-lnxworkstn kernel: [] ? dmu_object_info_from_dnode+0x91/0xf0 [zfs]
May 17 12:24:30 greg-lnxworkstn kernel: [] zfs_zget+0x178/0x1e0 [zfs]
May 17 12:24:30 greg-lnxworkstn kernel: [] ? zap_lookup_norm+0xbe/0x190 [zfs]
May 17 12:24:30 greg-lnxworkstn kernel: [] zfs_dirent_lock+0x482/0x550 [zfs]
May 17 12:24:30 greg-lnxworkstn kernel: [] zfs_dirlook+0x8b/0x270 [zfs]
May 17 12:24:30 greg-lnxworkstn kernel: [] ? zfs_zaccess+0xa0/0x2d0 [zfs]
May 17 12:24:30 greg-lnxworkstn kernel: [] ? mutex_lock+0x16/0x40
May 17 12:24:30 greg-lnxworkstn kernel: [] zfs_lookup+0x2ee/0x340 [zfs]
May 17 12:24:30 greg-lnxworkstn kernel: [] zpl_lookup+0x78/0xe0 [zfs]
May 17 12:24:30 greg-lnxworkstn kernel: [] ? d_alloc+0x4d/0x60
May 17 12:24:30 greg-lnxworkstn kernel: [] lookup_real+0x1d/0x60
May 17 12:24:30 greg-lnxworkstn kernel: [] __lookup_hash+0x38/0x50
May 17 12:24:30 greg-lnxworkstn kernel: [] lookup_slow+0x4c/0xc0
May 17 12:24:30 greg-lnxworkstn kernel: [] path_lookupat+0x631/0x6a0
May 17 12:24:30 greg-lnxworkstn kernel: [] filename_lookup+0x34/0xc0
May 17 12:24:30 greg-lnxworkstn kernel: [] user_path_at_empty+0x59/0xa0
May 17 12:24:30 greg-lnxworkstn kernel: [] ? touch_atime+0x5b/0x160
May 17 12:24:30 greg-lnxworkstn kernel: [] SyS_readlinkat+0x5e/0x130
May 17 12:24:30 greg-lnxworkstn kernel: [] SyS_readlink+0x1b/0x20
May 17 12:24:30 greg-lnxworkstn kernel: [] system_call_fastpath+0x16/0x1b
May 17 12:26:30 greg-lnxworkstn kernel: INFO: task systemctl:7318 blocked for more than 120 seconds.
May 17 12:26:30 greg-lnxworkstn kernel: Tainted: P O 3.14.1 #1
May 17 12:26:30 greg-lnxworkstn kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 17 12:26:30 greg-lnxworkstn kernel: systemctl D ffffffff8161b6c0 0 7318 1 0x00000004
May 17 12:26:30 greg-lnxworkstn kernel: ffff88065ba6b7a8 0000000000000046 ffff88065b666580 ffff88065ba6a010
May 17 12:26:30 greg-lnxworkstn kernel: 0000000000012cc0 0000000000012cc0 ffff880671ba3050 ffff880816aa5150
May 17 12:26:30 greg-lnxworkstn kernel: 0000000000000000 0000000000000000 ffff88065b666580 ffff880709b04000
May 17 12:26:30 greg-lnxworkstn kernel: Call Trace:
May 17 12:26:30 greg-lnxworkstn kernel: [] schedule+0x29/0x70
May 17 12:26:30 greg-lnxworkstn kernel: [] spl_debug_bug+0xad/0xe0 [spl]
May 17 12:26:30 greg-lnxworkstn kernel: [] zfs_inode_set_ops+0x83/0x180 [zfs]
May 17 12:26:30 greg-lnxworkstn kernel: [] zfs_znode_alloc+0x2cf/0x3d0 [zfs]
May 17 12:26:30 greg-lnxworkstn kernel: [] ? dmu_object_info_from_dnode+0x91/0xf0 [zfs]
May 17 12:26:30 greg-lnxworkstn kernel: [] zfs_zget+0x178/0x1e0 [zfs]
May 17 12:26:30 greg-lnxworkstn kernel: [] ? zap_lookup_norm+0xbe/0x190 [zfs]
May 17 12:26:30 greg-lnxworkstn kernel: [] zfs_dirent_lock+0x482/0x550 [zfs]
May 17 12:26:30 greg-lnxworkstn kernel: [] zfs_dirlook+0x8b/0x270 [zfs]
May 17 12:26:30 greg-lnxworkstn kernel: [] ? zfs_zaccess+0xa0/0x2d0 [zfs]
May 17 12:26:30 greg-lnxworkstn kernel: [] ? mutex_lock+0x16/0x40
May 17 12:26:30 greg-lnxworkstn kernel: [] zfs_lookup+0x2ee/0x340 [zfs]
May 17 12:26:30 greg-lnxworkstn kernel: [] zpl_lookup+0x78/0xe0 [zfs]
May 17 12:26:30 greg-lnxworkstn kernel: [] ? d_alloc+0x4d/0x60
May 17 12:26:30 greg-lnxworkstn kernel: [] lookup_real+0x1d/0x60
May 17 12:26:30 greg-lnxworkstn kernel: [] __lookup_hash+0x38/0x50
May 17 12:26:30 greg-lnxworkstn kernel: [] lookup_slow+0x4c/0xc0
May 17 12:26:30 greg-lnxworkstn kernel: [] path_lookupat+0x631/0x6a0
May 17 12:26:30 greg-lnxworkstn kernel: [] filename_lookup+0x34/0xc0
May 17 12:26:30 greg-lnxworkstn kernel: [] user_path_at_empty+0x59/0xa0
May 17 12:26:30 greg-lnxworkstn kernel: [] ? touch_atime+0x5b/0x160
May 17 12:26:30 greg-lnxworkstn kernel: [] SyS_readlinkat+0x5e/0x130
May 17 12:26:30 greg-lnxworkstn kernel: [] SyS_readlink+0x1b/0x20
May 17 12:26:30 greg-lnxworkstn kernel: [] system_call_fastpath+0x16/0x1b
May 17 12:26:30 greg-lnxworkstn kernel: INFO: task yum:7434 blocked for more than 120 seconds.
May 17 12:26:30 greg-lnxworkstn kernel: Tainted: P O 3.14.1 #1
May 17 12:26:30 greg-lnxworkstn kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 17 12:26:30 greg-lnxworkstn kernel: yum D ffffffff8161b6c0 0 7434 7367 0x00000004
May 17 12:26:30 greg-lnxworkstn kernel: ffff88072d4c98b8 0000000000000046 0000000000000000 ffff88072d4c8010
May 17 12:26:30 greg-lnxworkstn kernel: 0000000000012cc0 0000000000012cc0 ffff88065d2ff050 ffff880816aa5150
May 17 12:26:30 greg-lnxworkstn kernel: 0000000000000000 ffff880709b05390 ffff88065d2ff050 ffff880709b05394
May 17 12:26:30 greg-lnxworkstn kernel: Call Trace:
May 17 12:26:30 greg-lnxworkstn kernel: [] schedule+0x29/0x70
May 17 12:26:30 greg-lnxworkstn kernel: [] schedule_preempt_disabled+0xe/0x10
May 17 12:26:30 greg-lnxworkstn kernel: [] __mutex_lock_slowpath+0x183/0x220
May 17 12:26:30 greg-lnxworkstn kernel: [] ? dbuf_rele_and_unlock+0x169/0x220 [zfs]
May 17 12:26:30 greg-lnxworkstn kernel: [] mutex_lock+0x23/0x40
May 17 12:26:30 greg-lnxworkstn kernel: [] zfs_zget+0xfb/0x1e0 [zfs]
May 17 12:26:30 greg-lnxworkstn kernel: [] ? zap_lookup_norm+0xbe/0x190 [zfs]
May 17 12:26:30 greg-lnxworkstn kernel: [] zfs_dirent_lock+0x482/0x550 [zfs]
May 17 12:26:30 greg-lnxworkstn kernel: [] zfs_dirlook+0x8b/0x270 [zfs]
May 17 12:26:30 greg-lnxworkstn kernel: [] ? zfs_zaccess+0xa0/0x2d0 [zfs]
May 17 12:26:30 greg-lnxworkstn kernel: [] ? mutex_lock+0x16/0x40
May 17 12:26:30 greg-lnxworkstn kernel: [] zfs_lookup+0x2ee/0x340 [zfs]
May 17 12:26:30 greg-lnxworkstn kernel: [] zpl_lookup+0x78/0xe0 [zfs]
May 17 12:26:30 greg-lnxworkstn kernel: [] ? d_alloc+0x4d/0x60
May 17 12:26:30 greg-lnxworkstn kernel: [] lookup_real+0x1d/0x60
May 17 12:26:30 greg-lnxworkstn kernel: [] __lookup_hash+0x38/0x50
May 17 12:26:30 greg-lnxworkstn kernel: [] lookup_slow+0x4c/0xc0
May 17 12:26:30 greg-lnxworkstn kernel: [] path_lookupat+0x631/0x6a0
May 17 12:26:30 greg-lnxworkstn kernel: [] ? release_pages+0x7b/0x1f0
May 17 12:26:30 greg-lnxworkstn kernel: [] filename_lookup+0x34/0xc0
May 17 12:26:30 greg-lnxworkstn kernel: [] user_path_at_empty+0x59/0xa0
May 17 12:26:30 greg-lnxworkstn kernel: [] ? list_lru_add+0xb8/0x100
May 17 12:26:30 greg-lnxworkstn kernel: [] ? cp_new_stat+0x149/0x170
May 17 12:26:30 greg-lnxworkstn kernel: [] user_path_at+0x11/0x20
May 17 12:26:30 greg-lnxworkstn kernel: [] vfs_fstatat+0x59/0xb0
May 17 12:26:30 greg-lnxworkstn kernel: [] vfs_stat+0x1b/0x20
May 17 12:26:30 greg-lnxworkstn kernel: [] SYSC_newstat+0x1f/0x50
May 17 12:26:30 greg-lnxworkstn kernel: [] SyS_newstat+0xe/0x10
May 17 12:26:30 greg-lnxworkstn kernel: [] system_call_fastpath+0x16/0x1b

I am going to downgrade to an earlier version (April 6), but I can keep this instance around for testing.

Please let me know if you need additional info.

Package list:
kmod-spl-3.11.10-301.fc20.x86_64.x86_64 0.6.2-38_gad3412e.fc20 @/kmod-spl-3.11.10-301.fc20.x86_64-0.6.2-38_gad3412e.fc20.x86_64
kmod-spl-3.14.4-200.fc20.x86_64.x86_64 0.6.2-38_gad3412e.fc20 @/kmod-spl-3.14.4-200.fc20.x86_64-0.6.2-38_gad3412e.fc20.x86_64
kmod-spl-devel.x86_64 0.6.2-38_gad3412e.fc20 @/kmod-spl-devel-0.6.2-38_gad3412e.fc20.x86_64
kmod-spl-devel-3.11.10-301.fc20.x86_64.x86_64 0.6.2-38_gad3412e.fc20 @/kmod-spl-devel-3.11.10-301.fc20.x86_64-0.6.2-38_gad3412e.fc20.x86_64
kmod-spl-devel-3.14.4-200.fc20.x86_64.x86_64 0.6.2-38_gad3412e.fc20 @/kmod-spl-devel-3.14.4-200.fc20.x86_64-0.6.2-38_gad3412e.fc20.x86_64
kmod-zfs-3.11.10-301.fc20.x86_64.x86_64 0.6.2-275_gbc25c93.fc20 @/kmod-zfs-3.11.10-301.fc20.x86_64-0.6.2-275_gbc25c93.fc20.x86_64
kmod-zfs-3.14.4-200.fc20.x86_64.x86_64 0.6.2-275_gbc25c93.fc20 @/kmod-zfs-3.14.4-200.fc20.x86_64-0.6.2-275_gbc25c93.fc20.x86_64
kmod-zfs-devel.x86_64 0.6.2-275_gbc25c93.fc20 @/kmod-zfs-devel-0.6.2-275_gbc25c93.fc20.x86_64
kmod-zfs-devel-3.11.10-301.fc20.x86_64.x86_64 0.6.2-275_gbc25c93.fc20 @/kmod-zfs-devel-3.11.10-301.fc20.x86_64-0.6.2-275_gbc25c93.fc20.x86_64
kmod-zfs-devel-3.14.4-200.fc20.x86_64.x86_64 0.6.2-275_gbc25c93.fc20 @/kmod-zfs-devel-3.14.4-200.fc20.x86_64-0.6.2-275_gbc25c93.fc20.x86_64
spl.x86_64 0.6.2-38_gad3412e.fc20 @/spl-0.6.2-38_gad3412e.fc20.x86_64
spl-debuginfo.x86_64 0.6.2-38_gad3412e.fc20 @/spl-debuginfo-0.6.2-38_gad3412e.fc20.x86_64
spl-kmod-debuginfo.x86_64 0.6.2-38_gad3412e.fc20 @/spl-kmod-debuginfo-0.6.2-38_gad3412e.fc20.x86_64
zfs.x86_64 0.6.2-275_gbc25c93.fc20 @/zfs-0.6.2-275_gbc25c93.fc20.x86_64
zfs-debuginfo.x86_64 0.6.2-275_gbc25c93.fc20 @/zfs-debuginfo-0.6.2-275_gbc25c93.fc20.x86_64
zfs-devel.x86_64 0.6.2-275_gbc25c93.fc20 @/zfs-devel-0.6.2-275_gbc25c93.fc20.x86_64
zfs-dracut.x86_64 0.6.2-275_gbc25c93.fc20 @/zfs-dracut-0.6.2-275_gbc25c93.fc20.x86_64
zfs-kmod-debuginfo.x86_64 0.6.2-275_gbc25c93.fc20 @/zfs-kmod-debuginfo-0.6.2-275_gbc25c93.fc20.x86_64
zfs-test.x86_64 0.6.2-275_gbc25c93.fc20 @/zfs-test-0.6.2-275_gbc25c93.fc20.x86_64

@dweeezil
Contributor

@greg-hydrogen Is xattr=sa set? Is selinux enabled? I presume this occurred on a previously-created pool?
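
Both settings are quick to verify (the pool name "tank" here is just a placeholder):

zfs get -r xattr tank   # per-filesystem xattr setting; look for "sa"
getenforce              # current selinux mode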

@greg-hydrogen
Author

@dweeezil xattr is set to sa, selinux is set to permissive, and you are correct, this was a previously created pool. I didn't hit this issue on the 3.11 kernel with the previous master (from March); unfortunately, I can't pin down the exact version I was running.

@dweeezil
Contributor

@greg-hydrogen My guess at the moment is that this is the same problem as #2316, but there could still be something different afoot here. Unfortunately, the filesystem on which the error occurred is likely corrupted to the point it will need to be destroyed or rolled back to a snapshot prior to the corruption.

The mandb program stats all the files under /usr/share/man (and likely other places), so I have a feeling your corrupted file is in one of those directories.

Could you try something like:

cd /usr/share/man
for i in *; do
    echo "$i"
    ls -ld "$i"/* > /dev/null   # stat every entry one level down
done

and maybe do the same for one deeper level of directories, to confirm that's where your corruption lies? (A single-pass variant for the deeper level is sketched below.)
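
The single-pass variant, assuming GNU find is available:

# Stat everything exactly two levels below /usr/share/man; the entry that
# trips the "Invalid mode" assertion pinpoints the corrupted object.
find /usr/share/man -mindepth 2 -maxdepth 2 -exec ls -ld {} \; > /dev/null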

There have been various changes to the handling of symlinks since 0.6.2, and I suspect some operation may have updated one of the symlinks to use an SA and then also performed an xattr operation which corrupted it. If this is all true, it would likely have been fixed by the patch in #2321, but, as I mentioned earlier, it's likely too late for this filesystem.

@behlendorf behlendorf added this to the 0.7.0 milestone May 19, 2014
@behlendorf behlendorf added the Bug label May 19, 2014
behlendorf pushed a commit that referenced this issue May 19, 2014
In the case where a variable-sized SA overlaps the spill block pointer and
a new variable-sized SA is being added, the header size was improperly
calculated to include the to-be-moved SA.  This problem could be
reproduced with xattr=sa enabled as follows:

	ln -s $(perl -e 'print "x" x 120') blah
	setfattr -n security.selinux -v blahblah -h blah

The symlink is large enough to interfere with the spill block pointer and
has a typical SA registration as follows (shown in modified "zdb -dddd"
<SA attr layout obj> format):

	[ ... ZPL_DACL_COUNT ZPL_DACL_ACES ZPL_SYMLINK ]

Adding the SA xattr will attempt to extend the registration to:

	[ ... ZPL_DACL_COUNT ZPL_DACL_ACES ZPL_SYMLINK ZPL_DXATTR ]

but since the ZPL_SYMLINK SA interferes with the spill block pointer, it
must also be moved to the spill block which will have a registration of:

	[ ZPL_SYMLINK ZPL_DXATTR ]

This commit updates extra_hdrsize when this condition occurs, allowing
hdrsize to be subsequently decreased appropriately.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Issue #2214
Issue #2228
Issue #2316
Issue #2343
@behlendorf
Contributor

A potential fix for this has been merged. Please let us know if you're able to recreate this issue using a pool created from the latest master source which includes commit 83021b4.

@behlendorf behlendorf modified the milestones: 0.6.4, 0.7.0 May 19, 2014
@greg-hydrogen
Author

OK, I managed to move the offending machine into a VM, so I can do whatever testing is necessary.

I have tracked down one of the offending directories/files: as soon as I do an ls on /usr/share/man/man1, SPL panics. Funnily enough, performing a find . in the same directory works fine.

There also seems to be an issue with some files in /bin as well: running ls under strace exits cleanly, but a simple ls /bin will cause the SPL to panic.

Is there any other information that would help, or has this system exhausted its usefulness?

Also, considering that this pool has xattr=sa set, does this mean I could not import it on FreeBSD or an illumos build? Just curious whether there is a way to recover any data.

I will also build the current master against a previous snapshot of the same pool and see if I can recreate the issue.

@dweeezil
Contributor

@greg-hydrogen Under Fedora, /usr/share/man/man1 normally contains a bunch of symlinks, which tend to be fodder for this problem because the symlink target itself is stored in an SA.

Here's a capsule summary of the symlink/SA issue:

  • There was a known issue which caused problems with symlinks having an SA-based xattr, so 6a7c0cc was committed to fall back to directory-style xattrs on symlinks. This was included in 0.6.2, which means that filesystems operated on by 0.6.2 would never have any SA-based xattrs on symlinks.
  • That problem was fixed in 472e7c6, and then, in b6e335b, SA-based xattrs on symlinks were re-enabled. Neither of these patches was part of any official tag, but they were likely included in various "daily" releases and other people's hand-built installations.
  • Unfortunately, that patch either introduced or exacerbated another SA-related bug, for which the fix was committed today in 83021b4.

Any system running code >= b6e335b and < 83021b4 has a high probability of corrupting symlink dnodes when xattr=sa is in force and, in particular, when selinux is enabled (which forces an xattr on every file, including symlinks).

Once you have a filesystem which triggers the "invalid mode" error, it's likely permanently corrupted and no subsequent patch will fix it. You must try to recover the filesystem by copying as much as possible to another one, or by rolling it back to a snapshot taken before the corruption occurred. If the filesystem doesn't cause a crash under a different ZFS implementation, you're lucky, but the corruption is likely still there.
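
If a suitable snapshot exists, the rollback route is straightforward (the filesystem and snapshot names here are hypothetical):

# List candidate snapshots, then roll back; -r also destroys any
# snapshots taken after the chosen one.
zfs list -t snapshot -r tank/home
zfs rollback -r tank/home@before-corruption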

@greg-hydrogen
Author

@dweeezil - Thank you very much for the detailed summary.

Last question: I want to check my remaining pools to see if they are affected. You mentioned the range of versions that have this issue... how would I go about checking whether a version is greater or less than a specific commit?

Thank you very much for the help! (and of course the patch!)

FransUrbo pushed a commit to FransUrbo/zfs that referenced this issue May 20, 2014
@dweeezil
Contributor

@greg-hydrogen The way to determine whether your module contains a particular commit depends on where you get your packages and/or the source you used to compile them. I'm mostly familiar with the Ubuntu packaging. Those, for example, are built from the "pkg-zfs" repository, which itself uses the "zfs" repository as an upstream onto which it layers a handful of patches. (For source builds, see the git sketch below.)
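
If you built your modules from a git checkout of the zfs repository, one minimal check (assuming the commit hashes named above are reachable in your clone):

# Exits 0 when the named commit is an ancestor of the revision you built.
git merge-base --is-ancestor b6e335b HEAD && echo "has b6e335b (SA xattrs on symlinks re-enabled)"
git merge-base --is-ancestor 83021b4 HEAD && echo "has 83021b4 (header-size fix)"

A build containing the first commit but not the second falls in the dangerous range.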

It's probably more useful to test each of your filesystems from each of your pools. If you can do an "ls -lZ" on each file/directory without hitting an assertion, you're likely OK. Another good test might be to run zdb -b <pool> on each of your pools (this could take a very long time).
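
A rough sketch of that scan, assuming each filesystem is mounted at its configured mountpoint:

# Walk every mounted filesystem in every pool and stat each entry along
# with its selinux context; corruption shows up as the SPL assertion.
for fs in $(zfs list -H -o name -t filesystem); do
    mnt=$(zfs get -H -o value mountpoint "$fs")
    [ -d "$mnt" ] && find "$mnt" -xdev -exec ls -ldZ {} \; > /dev/null
done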

@greg-hydrogen
Author

Just wanted to give an update.

I managed to revert the pool to a previous working snapshot. I have been able to ls all the previously offending directories and run mandb without any issues.

I compiled the latest HEAD (25 May) and performed the same tasks as above; all worked correctly. I then decided to try to simulate the failure by performing an selinux autorelabel.

The autorelabel process throws up a lot of errors, and once it finally reboots I get the following stack:

[screenshot-3]

Followed by:

[screenshot-5]

This is very reproducible with the current pool. I will attempt to perform this on a clean install and see if I can reproduce the same issue.

Let me know if there is anything else you need that would be helpful; I can keep this VM around for as long as needed.

@behlendorf
Contributor

Hopefully the last SA fix is in #3074 and will be merged for the 0.6.4 tag. If you're game, it would be interesting to run your test case in the VM once 0.6.4 is released, or after #3074 is merged.

@behlendorf behlendorf added Bug - Major and removed Bug labels Feb 6, 2015
@behlendorf
Contributor

@greg-hydrogen #3074 has been merged as a62d1b0; it addresses a symlink problem which could cause an issue like this. Since we've gotten to a reasonable root cause on this issue, I'm going to close it out. However, if you're able to recreate it with a new pool on the latest code, we can definitely reopen it and have another look.
