
Null pointer dereference in dva_get_dsize_sync #1707

Closed
lutorm opened this issue Sep 9, 2013 · 6 comments
@lutorm

lutorm commented Sep 9, 2013

For some reason, my system has begun to see a zfs lockup. It happens within a minute under a particular workload. I've tried kernel 3.4.0 with zfs 0.6.1 and 0.6.2, and kernel 3.8.0 with zfs 0.6.2, and the result seems to be the same. The dmesg output is below.

Any ideas? I don't understand why it just started; I've been using it pretty stably for months.

cheers,

/Patrik

[ 1280.984093] BUG: unable to handle kernel NULL pointer dereference at 0000000000000560
[ 1280.984100] IP: [] dva_get_dsize_sync+0x4d/0x60 [zfs]
[ 1280.984136] PGD 5c01c8067 PUD 5c0237067 PMD 0
[ 1280.984139] Oops: 0000 [#1] SMP
[ 1280.984141] Modules linked in: zfs(POF) zcommon(POF) znvpair(POF) zavl(POF) zunicode(POF) spl(OF) zlib_deflate(F) openvswitch(F) autofs4(F) snd_hda_codec_hdmi(F) nvidia(POF) coretemp(F) kvm_intel(F) kvm(F) ghash_clmulni_intel(F) aesni_intel(F) ablk_helper(F) cryptd(F) lrw(F) bnep(F) aes_x86_64(F) xts(F) gf128mul(F) rfcomm(F) gpio_ich(F) snd_hda_codec_realtek(F) bluetooth(F) hp_wmi(F) sparse_keymap(F) snd_hda_intel(F) snd_hda_codec(F) snd_hwdep(F) snd_pcm(F) psmouse(F) microcode(F) serio_raw(F) snd_seq_midi(F) parport_pc(F) snd_rawmidi(F) ppdev(F) snd_seq_midi_event(F) snd_seq(F) snd_timer(F) snd_seq_device(F) lpc_ich(F) i7core_edac(F) wmi(F) snd(F) edac_core(F) soundcore(F) snd_page_alloc(F) mac_hid(F) nfsd(F) nfs_acl(F) auth_rpcgss(F) nfs(F) fscache(F) lockd(F) lp(F) parport(F) sunrpc(F) hid_generic(F) hid_logitech_dj(F) usbhid(F) hid(F) raid10(F) raid456(F) async_pq(F) async_xor(F) xor(F) async_memcpy(F) async_raid6_recov(F) firewire_ohci(F) firewire_core(F) crc_itu_t(F) mptsas(F) mptscsih(F) mptbase(F) scsi_transport_sas(F) ahci(F) libahci(F) floppy(F) tg3(F) ptp(F) pps_core(F) raid6_pq(F) async_tx(F) raid1(F) raid0(F) multipath(F) linear(F)
[ 1280.984180] CPU 6
[ 1280.984183] Pid: 14252, comm: txg_sync Tainted: PF IO 3.8.0-30-generic #44~precise1-Ubuntu Hewlett-Packard HP Z800 Workstation/0AECh
[ 1280.984185] RIP: 0010:[] [] dva_get_dsize_sync+0x4d/0x60 [zfs]
[ 1280.984202] RSP: 0000:ffff8805fd2a5788 EFLAGS: 00010293
[ 1280.984203] RAX: 0000000000000000 RBX: 0000000000000004 RCX: 0000000000000002
[ 1280.984204] RDX: ffff880306f1e000 RSI: 0000000000000004 RDI: ffff8805e8e9b000
[ 1280.984205] RBP: ffff8805fd2a5798 R08: 02d354c6c0916998 R09: ffff880571df5a00
[ 1280.984206] R10: ffffffffa04b58cb R11: 0000000000000000 R12: ffffc9001adbee00
[ 1280.984207] R13: 0000000000000bf8 R14: 0000000000000003 R15: ffff8805c8cc8000
[ 1280.984208] FS: 0000000000000000(0000) GS:ffff88061fc00000(0000) knlGS:0000000000000000
[ 1280.984209] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1280.984210] CR2: 0000000000000560 CR3: 00000005bff07000 CR4: 00000000000007e0
[ 1280.984211] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1280.984212] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1280.984214] Process txg_sync (pid: 14252, threadinfo ffff8805fd2a4000, task ffff8805de885d00)
[ 1280.984214] Stack:
[ 1280.984215] 0000000000000000 ffff8805e8e9b000 ffff8805fd2a57c8 ffffffffa0634255
[ 1280.984218] ffff880579bf4cc0 ffffc9001adbee00 ffff8802b38cde80 ffff880571df5a00
[ 1280.984220] ffff8805fd2a5848 ffffffffa05e5d45 00000000b434f1c0 ffff880200000000
[ 1280.984222] Call Trace:
[ 1280.984240] [] bp_get_dsize+0x55/0x90 [zfs]
[ 1280.984250] [] dbuf_dirty+0x705/0x810 [zfs]
[ 1280.984259] [] dmu_buf_will_dirty+0xaf/0x100 [zfs]
[ 1280.984276] [] zap_get_leaf_byblk+0x111/0x2d0 [zfs]
[ 1280.984293] [] ? zap_idx_to_blk+0xdf/0x140 [zfs]
[ 1280.984301] [] ? kmem_alloc_debug+0x138/0x3b0 [spl]
[ 1280.984304] [] ? kmem_alloc_debug+0x138/0x3b0 [spl]
[ 1280.984321] [] zap_deref_leaf+0x6d/0x80 [zfs]
[ 1280.984338] [] fzap_update+0xad/0x170 [zfs]
[ 1280.984354] [] zap_update_uint64+0x9b/0xe0 [zfs]
[ 1280.984364] [] ddt_zap_update+0x81/0xb0 [zfs]
[ 1280.984369] [] ? kfree+0x3b/0x150
[ 1280.984385] [] ? dva_get_dsize_sync+0x4d/0x60 [zfs]
[ 1280.984395] [] ? ddt_stat_update+0x129/0x1b0 [zfs]
[ 1280.984405] [] ddt_sync+0x237/0x6b0 [zfs]
[ 1280.984411] [] ? __wake_up+0x53/0x70
[ 1280.984427] [] spa_sync+0x46e/0xae0 [zfs]
[ 1280.984431] [] ? ktime_get_ts+0x4c/0xe0
[ 1280.984447] [] txg_sync_thread+0x2df/0x540 [zfs]
[ 1280.984463] [] ? txg_init+0x250/0x250 [zfs]
[ 1280.984467] [] thread_generic_wrapper+0x78/0x90 [spl]
[ 1280.984471] [] ? __thread_create+0x310/0x310 [spl]
[ 1280.984474] [] kthread+0xc0/0xd0
[ 1280.984476] [] ? flush_kthread_worker+0xb0/0xb0
[ 1280.984480] [] ret_from_fork+0x7c/0xb0
[ 1280.984482] [] ? flush_kthread_worker+0xb0/0xb0
[ 1280.984483] Code: 74 0d 48 83 bf 28 09 00 00 00 48 89 c3 75 0f 48 83 c4 08 48 89 d8 5b 5d c3 0f 1f 44 00 00 48 c1 ee 20 48 c1 eb 09 e8 b3 5d 00 00 <48> 0f af 98 60 05 00 00 48 83 c4 08 48 89 d8 5b 5d c3 90 66 66
[ 1280.984501] RIP [] dva_get_dsize_sync+0x4d/0x60 [zfs]
[ 1280.984517] RSP
[ 1280.984518] CR2: 0000000000000560
[ 1280.984520] ---[ end trace d34fc7027f3e38a3 ]---

@lutorm
Author

lutorm commented Sep 10, 2013

This appears to be related to dedup. I tried changing the PREEMPT settings as suggested in another hang report, but it had no effect. Turning off dedup, however, seems to have made the problem go away.

@behlendorf
Contributor

That would be consistent with the stack above. It appears something is wrong with the ZAP used for deduplication.

@lutorm
Author

lutorm commented Sep 27, 2013

Well, it certainly seems to have gotten a lot better without the dedup, but now it has happened again, with a slightly different stack trace:

[1457404.290472] CPU 4
[1457404.290475] Pid: 13235, comm: as Tainted: PF O 3.9.0-0-generic #1~precise2 Hewlett-Packard HP Z800 Workstation/0AECh
[1457404.290477] RIP: 0010:[] [] vdev_readable+0x7/0x40 [zfs]
[1457404.290494] RSP: 0018:ffff88055997d600 EFLAGS: 00010246
[1457404.290495] RAX: 0000000000000000 RBX: 0000000000000002 RCX: ffff8801d90a8de0
[1457404.290496] RDX: 0000000000000040 RSI: ffff8802fc6e6f80 RDI: 0000000000000000
[1457404.290497] RBP: ffff88055997d638 R08: 000000000ed5b291 R09: 000000006ad94887
[1457404.290498] R10: 00000000ec0780d5 R11: 00000000adfd4793 R12: ffff8802fc6e6fd0
[1457404.290499] R13: 0000000000000000 R14: ffff8802fc6e6f80 R15: 000000000070939f
[1457404.290500] FS: 00002aaaaaaf3280(0000) GS:ffff880313c80000(0000) knlGS:0000000000000000
[1457404.290501] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[1457404.290502] CR2: 0000000000000040 CR3: 00000001f4b74000 CR4: 00000000000027e0
[1457404.290503] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[1457404.290504] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[1457404.290505] Process as (pid: 13235, threadinfo ffff88055997c000, task ffff8804749a9750)
[1457404.290506] Stack:
[1457404.290507] ffffffffa02d0142 ffff88055997d638 ffff8801d90a8d60 ffff8802fc6e6f80
[1457404.290509] ffff8801d90a8dc0 ffff880607c82000 0000000000000003 ffff88055997d6c8
[1457404.290511] ffffffffa02d07dc ffff88055997d668 ffffffff8170382d ffff88055997d678
[1457404.290514] Call Trace:
[1457404.290532] [] ? vdev_mirror_child_select.isra.1+0x62/0x110 [zfs]
[1457404.290549] [] vdev_mirror_io_start+0x38c/0x4e0 [zfs]
[1457404.290552] [] ? mutex_lock+0x1d/0x50
[1457404.290554] [] ? mutex_lock+0x1d/0x50
[1457404.290573] [] zio_vdev_io_start+0x237/0x2f0 [zfs]
[1457404.290590] [] zio_nowait+0xba/0x190 [zfs]
[1457404.290599] [] arc_read+0x573/0x860 [zfs]
[1457404.290611] [] ? ddt_zap_update+0xb0/0xb0 [zfs]
[1457404.290621] [] dbuf_prefetch+0x1b0/0x260 [zfs]
[1457404.290632] [] ? ddt_zap_update+0xb0/0xb0 [zfs]
[1457404.290643] [] dmu_prefetch+0x289/0x2d0 [zfs]
[1457404.290654] [] ? ddt_zap_update+0xb0/0xb0 [zfs]
[1457404.290671] [] fzap_prefetch+0x5d/0x70 [zfs]
[1457404.290688] [] zap_prefetch_uint64+0x76/0xa0 [zfs]
[1457404.290699] [] ddt_zap_prefetch+0x13/0x20 [zfs]
[1457404.290710] [] ddt_prefetch+0xbc/0xc0 [zfs]
[1457404.290713] [] ? trace_save_cmdline+0x7a/0xf0
[1457404.290717] [] ? read_tsc+0x9/0x20
[1457404.290719] [] ? default_spin_lock_flags+0x9/0x10
[1457404.290722] [] ? __kmalloc+0x14c/0x190
[1457404.290724] [] ? mutex_lock+0x1d/0x50
[1457404.290733] [] ? add_reference.isra.7+0xe6/0x120 [zfs]
[1457404.290736] [] ? mutex_lock+0x1d/0x50
[1457404.290746] [] ? dbuf_read+0x5cb/0x860 [zfs]
[1457404.290748] [] ? mutex_lock+0x1d/0x50
[1457404.290750] [] ? mutex_lock+0x1d/0x50
[1457404.290766] [] ? spa_config_exit+0x7d/0xc0 [zfs]
[1457404.290780] [] dsl_dataset_block_freeable+0x40/0x60 [zfs]
[1457404.290793] [] dmu_tx_hold_free+0x44c/0x8e0 [zfs]
[1457404.290804] [] dmu_free_long_range_impl+0x175/0x260 [zfs]
[1457404.290815] [] dmu_free_long_range+0x4b/0x70 [zfs]
[1457404.290832] [] zfs_rmnode+0x60/0x330 [zfs]
[1457404.290849] [] zfs_zinactive+0x84/0xd0 [zfs]
[1457404.290866] [] zfs_inactive+0x7c/0x230 [zfs]
[1457404.290869] [] ? truncate_pagecache+0x5e/0x70
[1457404.290886] [] zpl_evict_inode+0x28/0x30 [zfs]
[1457404.290889] [] evict+0xc0/0x1d0
[1457404.290892] [] iput_final+0xe2/0x170
[1457404.290894] [] iput+0x3e/0x50
[1457404.290896] [] do_unlinkat+0x201/0x270
[1457404.290899] [] ? sys_newlstat+0x2a/0x40
[1457404.290901] [] ? do_page_fault+0x39/0x50
[1457404.290903] [] sys_unlink+0x16/0x20
[1457404.290906] [] system_call_fastpath+0x1a/0x1f
[1457404.290907] Code: 01 00 00 00 48 89 e5 76 17 48 83 bf 70 05 00 00 00 75 0d 31 c0 48 81 7f 50 c0 dd 31 a0 0f 94 c0 5d c3 66 90 66 66 66 66 90 31 c0 <48> 83 7f 40 05 55 48 89 e5 76 20 48 83 bf 70 05 00 00 00 75 16
[1457404.290924] RIP [] vdev_readable+0x7/0x40 [zfs]
[1457404.290940] RSP
[1457404.290941] CR2: 0000000000000040
[1457404.290949] ---[ end trace f236b950474b59de ]---

@lutorm
Author

lutorm commented Sep 27, 2013

Actually, after that error I'm now getting this same error whenever it tries to mount that volume. Does this mean my data are gone or is there something I can try?

@lutorm
Author

lutorm commented Sep 27, 2013

I tried to scrub the pool and it got to about 16% before it hung again with the same error; zpool status now also reported errors detected in the metadata. I ended up blowing the volume away.

@behlendorf
Contributor

@lutorm The latest stacks suggest a problem with the existing dedup tables on disk. Unfortunately, destroying the pool and recreating it without deduplication is the quickest way to entirely resolve the issue.

behlendorf added a commit to behlendorf/zfs that referenced this issue May 5, 2014
The dva_get_dsize_sync() function incorrectly assumes that the call
to vdev_lookup_top() cannot fail.  However, the NULL dereference below
clearly shows that under certain circumstances it is possible.  Note
that offset 0x570 (1392) maps as expected to vd->vdev_deflate_ratio.

  BUG: unable to handle kernel NULL pointer dereference at 00000570

  crash> struct -o vdev
  struct vdev {
       [0] uint64_t vdev_id;
       ... ...
    [1392] uint64_t vdev_deflate_ratio;

Given that this can happen, this patch adds the required error handling.
In the case where vdev_lookup_top() fails, assume that no deflation
will occur for the DVA and use the asize.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#1707
Issue openzfs#1987
Issue openzfs#1891
@behlendorf behlendorf modified the milestones: 0.6.3, 0.6.4 May 6, 2014