-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Null pointer dereference in dva_get_dsize_sync #1707
Comments
This appears to be related to dedup. I tried changing the PREEMPT settings as was suggested in another hang, but it had no effect. Turning off dedup, however, seems to have made the problem go away. |
The would be consistent with the stack above. It appears there's something wrong with the zap used for deduplication. |
Well, it certainly seems to have gotten a lot better without the dedup, but now it happened again. Slightly different stack trace: [1457404.290472] CPU 4 |
Actually, after that error I'm now getting this same error whenever it tries to mount that volume. Does this mean my data are gone or is there something I can try? |
I tried to scrub the pool and it got to about 16% before it hung again with the same error, now with zpool status also saying that there were errors detected in the metadata. I ended up blowing the volume away. |
@lutorm The latest stacks suggest a problem with the existing dedup tables on disk. Unfortunately destroying and recreating the pool without deduplication is quickest way to entirely resolve the issue. |
The dva_get_dsize_sync() function incorrectly assumes that the call to vdev_lookup_top() cannot fail. However, the NULL dereference at clearly shows that under certain circumstances it is possible. Note that offset 0x570 (1376) maps as expected to vd->vdev_deflate_ratio. BUG: unable to handle kernel NULL pointer dereference at 00000570 crash> struct -o vdev struct vdev { [0] uint64_t vdev_id; ... ... [1376] uint64_t vdev_deflate_ratio; Given that this can happen this patch add the required error handling. In the case where vdev_lookup_top() fails assume that no deflation will occur for the DVA and use the asize. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue openzfs#1707 Issue openzfs#1987 Issue openzfs#1891
The dva_get_dsize_sync() function incorrectly assumes that the call to vdev_lookup_top() cannot fail. However, the NULL dereference at clearly shows that under certain circumstances it is possible. Note that offset 0x570 (1376) maps as expected to vd->vdev_deflate_ratio. BUG: unable to handle kernel NULL pointer dereference at 00000570 crash> struct -o vdev struct vdev { [0] uint64_t vdev_id; ... ... [1376] uint64_t vdev_deflate_ratio; Given that this can happen this patch add the required error handling. In the case where vdev_lookup_top() fails assume that no deflation will occur for the DVA and use the asize. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue openzfs#1707 Issue openzfs#1987 Issue openzfs#1891
For some reason, my system have begun to see a zfs lockup. It will happen within a minute with a particular workload. I've tried kernel 3.4.0 and zfs 0.6.1 and 0.6.2 and kernel 3.8.0 / zfs 0.6.2 and it seems to be the same. The dmesg is below.
Any ideas? I don't understand why it just started, I've been using it pretty stably for months.
cheers,
/Patrik
[ 1280.984093] BUG: unable to handle kernel NULL pointer dereference at 0000000000000560
[ 1280.984100] IP: [] dva_get_dsize_sync+0x4d/0x60 [zfs]
[ 1280.984136] PGD 5c01c8067 PUD 5c0237067 PMD 0
[ 1280.984139] Oops: 0000 [#1] SMP
[ 1280.984141] Modules linked in: zfs(POF) zcommon(POF) znvpair(POF) zavl(POF) zunicode(POF) spl(OF) zlib_deflate(F) openvswitch(F) autofs4(F) snd_hda_codec_hdmi(F) nvidia(POF) coretemp(F) kvm_intel(F) kvm(F) ghash_clmulni_intel(F) aesni_intel(F) ablk_helper(F) cryptd(F) lrw(F) bnep(F) aes_x86_64(F) xts(F) gf128mul(F) rfcomm(F) gpio_ich(F) snd_hda_codec_realtek(F) bluetooth(F) hp_wmi(F) sparse_keymap(F) snd_hda_intel(F) snd_hda_codec(F) snd_hwdep(F) snd_pcm(F) psmouse(F) microcode(F) serio_raw(F) snd_seq_midi(F) parport_pc(F) snd_rawmidi(F) ppdev(F) snd_seq_midi_event(F) snd_seq(F) snd_timer(F) snd_seq_device(F) lpc_ich(F) i7core_edac(F) wmi(F) snd(F) edac_core(F) soundcore(F) snd_page_alloc(F) mac_hid(F) nfsd(F) nfs_acl(F) auth_rpcgss(F) nfs(F) fscache(F) lockd(F) lp(F) parport(F) sunrpc(F) hid_generic(F) hid_logitech_dj(F) usbhid(F) hid(F) raid10(F) raid456(F) async_pq(F) async_xor(F) xor(F) async_memcpy(F) async_raid6_recov(F) firewire_ohci(F) firewire_core(F) crc_itu_t(F) mptsas(F) mptscsih(F) mptbase(F) scsi_transport_sas(F) ahci(F) libahci(F) floppy(F) tg3(F) ptp(F) pps_core(F) raid6_pq(F) async_tx(F) raid1(F) raid0(F) multipath(F) linear(F)
[ 1280.984180] CPU 6
[ 1280.984183] Pid: 14252, comm: txg_sync Tainted: PF IO 3.8.0-30-generic #44~precise1-Ubuntu Hewlett-Packard HP Z800 Workstation/0AECh
[ 1280.984185] RIP: 0010:[] [] dva_get_dsize_sync+0x4d/0x60 [zfs]
[ 1280.984202] RSP: 0000:ffff8805fd2a5788 EFLAGS: 00010293
[ 1280.984203] RAX: 0000000000000000 RBX: 0000000000000004 RCX: 0000000000000002
[ 1280.984204] RDX: ffff880306f1e000 RSI: 0000000000000004 RDI: ffff8805e8e9b000
[ 1280.984205] RBP: ffff8805fd2a5798 R08: 02d354c6c0916998 R09: ffff880571df5a00
[ 1280.984206] R10: ffffffffa04b58cb R11: 0000000000000000 R12: ffffc9001adbee00
[ 1280.984207] R13: 0000000000000bf8 R14: 0000000000000003 R15: ffff8805c8cc8000
[ 1280.984208] FS: 0000000000000000(0000) GS:ffff88061fc00000(0000) knlGS:0000000000000000
[ 1280.984209] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1280.984210] CR2: 0000000000000560 CR3: 00000005bff07000 CR4: 00000000000007e0
[ 1280.984211] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1280.984212] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1280.984214] Process txg_sync (pid: 14252, threadinfo ffff8805fd2a4000, task ffff8805de885d00)
[ 1280.984214] Stack:
[ 1280.984215] 0000000000000000 ffff8805e8e9b000 ffff8805fd2a57c8 ffffffffa0634255
[ 1280.984218] ffff880579bf4cc0 ffffc9001adbee00 ffff8802b38cde80 ffff880571df5a00
[ 1280.984220] ffff8805fd2a5848 ffffffffa05e5d45 00000000b434f1c0 ffff880200000000
[ 1280.984222] Call Trace:
[ 1280.984240] [] bp_get_dsize+0x55/0x90 [zfs]
[ 1280.984250] [] dbuf_dirty+0x705/0x810 [zfs]
[ 1280.984259] [] dmu_buf_will_dirty+0xaf/0x100 [zfs]
[ 1280.984276] [] zap_get_leaf_byblk+0x111/0x2d0 [zfs]
[ 1280.984293] [] ? zap_idx_to_blk+0xdf/0x140 [zfs]
[ 1280.984301] [] ? kmem_alloc_debug+0x138/0x3b0 [spl]
[ 1280.984304] [] ? kmem_alloc_debug+0x138/0x3b0 [spl]
[ 1280.984321] [] zap_deref_leaf+0x6d/0x80 [zfs]
[ 1280.984338] [] fzap_update+0xad/0x170 [zfs]
[ 1280.984354] [] zap_update_uint64+0x9b/0xe0 [zfs]
[ 1280.984364] [] ddt_zap_update+0x81/0xb0 [zfs]
[ 1280.984369] [] ? kfree+0x3b/0x150
[ 1280.984385] [] ? dva_get_dsize_sync+0x4d/0x60 [zfs]
[ 1280.984395] [] ? ddt_stat_update+0x129/0x1b0 [zfs]
[ 1280.984405] [] ddt_sync+0x237/0x6b0 [zfs]
[ 1280.984411] [] ? __wake_up+0x53/0x70
[ 1280.984427] [] spa_sync+0x46e/0xae0 [zfs]
[ 1280.984431] [] ? ktime_get_ts+0x4c/0xe0
[ 1280.984447] [] txg_sync_thread+0x2df/0x540 [zfs]
[ 1280.984463] [] ? txg_init+0x250/0x250 [zfs]
[ 1280.984467] [] thread_generic_wrapper+0x78/0x90 [spl]
[ 1280.984471] [] ? __thread_create+0x310/0x310 [spl]
[ 1280.984474] [] kthread+0xc0/0xd0
[ 1280.984476] [] ? flush_kthread_worker+0xb0/0xb0
[ 1280.984480] [] ret_from_fork+0x7c/0xb0
[ 1280.984482] [] ? flush_kthread_worker+0xb0/0xb0
[ 1280.984483] Code: 74 0d 48 83 bf 28 09 00 00 00 48 89 c3 75 0f 48 83 c4 08 48 89 d8 5b 5d c3 0f 1f 44 00 00 48 c1 ee 20 48 c1 eb 09 e8 b3 5d 00 00 <48> 0f af 98 60 05 00 00 48 83 c4 08 48 89 d8 5b 5d c3 90 66 66
[ 1280.984501] RIP [] dva_get_dsize_sync+0x4d/0x60 [zfs]
[ 1280.984517] RSP
[ 1280.984518] CR2: 0000000000000560
[ 1280.984520] ---[ end trace d34fc7027f3e38a3 ]---
The text was updated successfully, but these errors were encountered: