Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NULL pointer dereference in dva_get_dsize_sync #1891

Closed
nwf opened this issue Nov 21, 2013 · 7 comments
Closed

NULL pointer dereference in dva_get_dsize_sync #1891

nwf opened this issue Nov 21, 2013 · 7 comments
Milestone

Comments

@nwf
Copy link
Contributor

nwf commented Nov 21, 2013

Running ubuntu's nightly tree for saucy (i.e. spl 0.6.2-2saucy2.gbpc60af6
and zfs 0.6.2-2saucy2.gbp46f6df), I tripped over

[34969.434129] BUG: unable to handle kernel NULL pointer dereference at 0000000000000570
[34969.434175] IP: [<ffffffffa026a92d>] dva_get_dsize_sync+0x3d/0x50 [zfs]
[34969.434300] PGD 0 
[34969.434316] Oops: 0000 [#1] SMP 
[34969.434337] Modules linked in: 8021q(F) garp(F) stp(F) mrp(F) llc(F) psmouse(F) serio_raw(F) joydev(F) amd64_edac_mod i2c_amd756 amd_rng shpchp i2c_amd8111 k8temp edac_core edac_mce_amd mac_hid lp(F) parport(F) zfs(POF) zunicode(POF) zavl(POF) zcommon(POF) znvpair(POF) spl(OF) zlib_deflate(F) hid_generic usbhid hid raid10(F) raid456(F) async_raid6_recov(F) async_memcpy(F) async_pq(F) async_xor(F) async_tx(F) xor(F) pata_acpi raid6_pq(F) e1000(F) raid1(F) pata_amd ohci_pci raid0(F) sata_mv multipath(F) linear(F)
[34969.434662] CPU: 1 PID: 603 Comm: z_wr_iss/1 Tainted: PF          O 3.11.0-13-generic #20-Ubuntu
[34969.434700] Hardware name: Sun Microsystems Sun Fire X4500/Sun Fire X4500, BIOS 0ABIG02 02/13/2009
[34969.434739] task: ffff8801f33a2ee0 ti: ffff8801f0e82000 task.ti: ffff8801f0e82000
[34969.434771] RIP: 0010:[<ffffffffa026a92d>]  [<ffffffffa026a92d>] dva_get_dsize_sync+0x3d/0x50 [zfs]
[34969.434838] RSP: 0018:ffff8801f0e83ce0  EFLAGS: 00010246
[34969.434865] RAX: 0000000000000000 RBX: 0000000000657375 RCX: ffff8803ec6f43e8
[34969.434940] RDX: 0000000000000000 RSI: 000000007065632e RDI: ffff8801f255c000
[34969.435011] RBP: ffff8801f0e83ce8 R08: 0000000000000000 R09: 000001c000000000
[34969.435080] R10: 0000012317f60000 R11: 0000000000000000 R12: 0000000000000000
[34969.435149] R13: ffff88017d6564b0 R14: ffff8801f255c000 R15: 0000000000000728
[34969.435221] FS:  00007f50e3b10700(0000) GS:ffff8801f7f00000(0000) knlGS:0000000000000000
[34969.435337] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[34969.435401] CR2: 0000000000000570 CR3: 00000001d1e73000 CR4: 00000000000007e0
[34969.435475] Stack:
[34969.435526]  0000000000000020 ffff8801f0e83d18 ffffffffa026a96c ffff88017d656340
[34969.435651]  ffffc9004a90cf80 ffff8801f255c000 ffff88031637d608 ffff8801f0e83d58
[34969.435773]  ffffffffa021dfd9 ffff88017971cbe0 ffff880314e2e970 ffff880232292c70
[34969.435892] Call Trace:
[34969.435967]  [<ffffffffa026a96c>] bp_get_dsize_sync+0x2c/0x50 [zfs]
[34969.436016]  [<ffffffffa021dfd9>] dbuf_write_ready+0x59/0x1a0 [zfs]
[34969.436016]  [<ffffffffa0215246>] arc_write_ready+0x26/0x90 [zfs]
[34969.436016]  [<ffffffffa02b709c>] zio_ready+0x6c/0x400 [zfs]
[34969.436016]  [<ffffffffa02b3a96>] zio_execute+0xa6/0x140 [zfs]
[34969.436016]  [<ffffffffa00ae6c7>] taskq_thread+0x237/0x4b0 [spl]
[34969.436016]  [<ffffffff81091420>] ? finish_task_switch+0x50/0xf0
[34969.436016]  [<ffffffff81094940>] ? wake_up_state+0x20/0x20
[34969.436016]  [<ffffffffa00ae490>] ? taskq_cancel_id+0x1f0/0x1f0 [spl]
[34969.436016]  [<ffffffff810847b0>] kthread+0xc0/0xd0
[34969.436016]  [<ffffffff810846f0>] ? kthread_create_on_node+0x120/0x120
[34969.436016]  [<ffffffff816f51ec>] ret_from_fork+0x7c/0xb0
[34969.436016]  [<ffffffff810846f0>] ? kthread_create_on_node+0x120/0x120
[34969.436016] Code: ff ff ff 00 48 c1 e3 09 48 85 db 74 0a 48 83 bf 78 09 00 00 00 75 07 48 89 d8 5b 5d c3 90 48 c1 ee 20 48 c1 eb 09 e8 d3 60 00 00 <48> 0f af 98 70 05 00 00 48 89 d8 5b 5d c3 0f 1f 44 00 00 66 66 
[34969.436016] RIP  [<ffffffffa026a92d>] dva_get_dsize_sync+0x3d/0x50 [zfs]
[34969.436016]  RSP <ffff8801f0e83ce0>
[34969.436016] CR2: 0000000000000570
[34969.438291] ---[ end trace 57a5902fd6f421a7 ]---

The machine is a Sun X4500 with 16G of RAM and the pool in question is a
RAIDZ1 on 11 spinning-rust 1TB disks with 2 hot spares.

Possibly relevant is that the pool underlies a ceph osd and that the dataset
had been filled with xattr=on and had just recently been flipped to
xattr=sa. The other three pools in the machine have had xattr similarly
changed and have yet to hit this, it seems, so it's not a certain thing.

If this is the wrong place for such a bug report (I did not immediately find
an Ubuntu-specific one) please flame gently. :)

@dweeezil
Copy link
Contributor

At first glance, this would certainly seem to indicate a corrupted pool. It looks like one of the DVAs refers to a bogus vdev or is just totally corrupted in some other way. Have you tried scrubbing the pool? Does zpool status show any checksum errrors?

@nwf
Copy link
Contributor Author

nwf commented Nov 24, 2013

scrubbing the pool panic'd the kernel in an (at least superficially) identical way. zpool status never complained, but it may not have gotten a chance to.

I have had to nuke the pool to get our machines back into production, but I can try to reproduce it on a different set of hardware if that'd be desirable.

@dweeezil
Copy link
Contributor

Does this machine have ECC memory? This is likely a case in which vdev_lookup_top() is returning NULL because the DVA has a bogus vdev id. I presume you ran into this problem across reboots which means the corruption is on-disk as opposed to in-memory. Unfortunately, tracking down this type of problem would likely require adding some debugging code and a bit of time.

@nwf
Copy link
Contributor Author

nwf commented Nov 28, 2013

The machine does have ECC, yes. And yes, this survived reboot, so it's an on-disk thing.

I don't have a minimal test case yet but the trigger seems to involve switching from xattr=on to xattr=sa with an already-riddled-with-xattrs filesystem and doing a lot more xattr work (we were using these nodes to back Ceph, which makes (indefensible, IMHO, but) extensive use of xattrs).

@eric0e
Copy link

eric0e commented Dec 17, 2013

I am seeing the same issues on one of my systems. I built ZFS from the git tree to support the 3.12.1 kernel running on Ubuntu 13.10. It reports during boot: ZFS: Loaded module v0.6.2-111_g119a394,

My system is a hp dl360G5 with ECC memory. Like nfw, the problem survives a reboot, and I am also using it for Ceph with the -o xattr=sa -o atime=off options. The ZFS file system is on a single SAS drive.

I tried the zpool scrub command on this pool, which hung the terminal session and I could not get out of it with a ctrl-c or ctrl-z. I can scrub other pools on this system.

zpool status shows:

pool: tca18_wwn-0x600508b1001034313320202020200008
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
tca18_wwn-0x600508b1001034313320202020200008 ONLINE 0 0 0
wwn-0x600508b1001034313320202020200008 ONLINE 0 0 0

@Lalufu
Copy link
Contributor

Lalufu commented Mar 8, 2014

I've seen this today on a machine running 3.13.5-200.fc20.x86_64 with v0.6.2-195_g0ad85ed, one RAIDZ2 pool with 6 disks. The only significant recent change was that I enabled acltype=posixacl on a dataset and started using them (setfacl). The parent dataset uses xattr=sa. The machine does not have ECC.

The pool was imported successfully after a reset, scrub went through without any errors.

@utopiabound
Copy link
Contributor

I have just run across this issue testing current master with lustre master (http://review.whamcloud.com/#/c/8979/6). Crash occurred on MDS https://maloo.whamcloud.com/test_sets/c6c513c0-cacc-11e3-8f53-52540035b04c

11:41:03:BUG: unable to handle kernel NULL pointer dereference at 0000000000000560
11:41:03:IP: [<ffffffffa023e3b3>] dva_get_dsize_sync+0x53/0x70 [zfs]
11:41:03:PGD 0 
11:41:03:Oops: 0000 [#1] SMP 
11:41:03:last sysfs file: /sys/devices/system/cpu/online
11:41:03:CPU 0 
11:41:03:Modules linked in: osp(U) mdd(U) lod(U) mdt(U) lfsck(U) mgs(U) nodemap(U) mgc(U) osd_zfs(U) lquota(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic sha256_generic libcfs(U) nfsd exportfs nfs lockd fscache auth_rpcgss nfs_acl sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: llog_test]
11:41:03:
11:41:03:Pid: 11388, comm: z_wr_iss/0 Tainted: P           ---------------    2.6.32-431.11.2.el6_lustre.gd151dcf.x86_64 #1 Red Hat KVM
11:41:03:RIP: 0010:[<ffffffffa023e3b3>]  [<ffffffffa023e3b3>] dva_get_dsize_sync+0x53/0x70 [zfs]
11:41:03:RSP: 0018:ffff880073d85cc0  EFLAGS: 00010206
11:41:03:RAX: 0000000000757274 RBX: 00000000eae4e800 RCX: 0000000000000000
11:41:03:RDX: 0000000000000000 RSI: 000000002e646574 RDI: ffff880079888000
11:41:04:RBP: ffff880073d85cd0 R08: ffffc90002c8e780 R09: 0000000000000000
11:41:04:R10: 0000000000000012 R11: 0000000000000377 R12: ffff88007d784b88
11:41:04:R13: 0000000000000000 R14: ffffc90002c8e780 R15: ffff880079888000
11:41:04:FS:  0000000000000000(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
11:41:04:CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
11:41:04:CR2: 0000000000000560 CR3: 000000007c075000 CR4: 00000000000006f0
11:41:04:DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
11:41:04:DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
11:41:04:Process z_wr_iss/0 (pid: 11388, threadinfo ffff880073d84000, task ffff88007d6eaae0)
11:41:04:Stack:
11:41:04: 0000000007fb8000 ffff880079888000 ffff880073d85d00 ffffffffa023e415
11:41:04:<d> 0000023073d85d20 ffff88007d784a18 ffff880042386a40 0000000000000400
11:41:04:<d> ffff880073d85d50 ffffffffa01f4ba2 ffff880073d85de0 ffff8800744a9af0
11:41:04:Call Trace:
11:41:04: [<ffffffffa023e415>] bp_get_dsize_sync+0x45/0x60 [zfs]
11:41:04: [<ffffffffa01f4ba2>] dbuf_write_ready+0x62/0x1a0 [zfs]
11:41:04: [<ffffffffa01ebd91>] arc_write_ready+0x31/0x90 [zfs]
11:41:04: [<ffffffffa028be3d>] zio_ready+0x6d/0x470 [zfs]
11:41:04: [<ffffffffa0288a0c>] ? zio_taskq_member+0x7c/0xc0 [zfs]
11:41:04: [<ffffffffa0288de3>] zio_execute+0xb3/0x140 [zfs]
11:41:04: [<ffffffffa013f317>] taskq_thread+0x1e7/0x3f0 [spl]
11:41:04: [<ffffffff81065df0>] ? default_wake_function+0x0/0x20
11:41:04: [<ffffffffa013f130>] ? taskq_thread+0x0/0x3f0 [spl]
11:41:04: [<ffffffff8109aee6>] kthread+0x96/0xa0
11:41:04: [<ffffffff8100c20a>] child_rip+0xa/0x20
11:41:04: [<ffffffff8109ae50>] ? kthread+0x0/0xa0
11:41:04: [<ffffffff8100c200>] ? child_rip+0x0/0x20
11:41:04:Code: 28 09 00 00 00 48 89 d8 75 0e 48 83 c4 08 5b c9 c3 0f 1f 80 00 00 00 00 48 c1 ee 20 e8 37 60 00 00 48 89 c2 48 89 d8 48 c1 e8 09 <48> 0f af 82 60 05 00 00 48 83 c4 08 5b c9 c3 66 66 66 66 66 2e

behlendorf added a commit to behlendorf/zfs that referenced this issue May 5, 2014
The dva_get_dsize_sync() function incorrectly assumes that the call
to vdev_lookup_top() cannot fail.  However, the NULL dereference at
clearly shows that under certain circumstances it is possible.  Note
that offset 0x570 (1376) maps as expected to vd->vdev_deflate_ratio.

  BUG: unable to handle kernel NULL pointer dereference at 00000570

  crash> struct -o vdev
  struct vdev {
       [0] uint64_t vdev_id;
       ... ...
    [1376] uint64_t vdev_deflate_ratio;

Given that this can happen this patch add the required error handling.
In the case where vdev_lookup_top() fails assume that no deflation
will occur for the DVA and use the asize.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#1707
Issue openzfs#1987
Issue openzfs#1891
behlendorf added a commit to behlendorf/zfs that referenced this issue May 5, 2014
The dva_get_dsize_sync() function incorrectly assumes that the call
to vdev_lookup_top() cannot fail.  However, the NULL dereference at
clearly shows that under certain circumstances it is possible.  Note
that offset 0x570 (1376) maps as expected to vd->vdev_deflate_ratio.

  BUG: unable to handle kernel NULL pointer dereference at 00000570

  crash> struct -o vdev
  struct vdev {
       [0] uint64_t vdev_id;
       ... ...
    [1376] uint64_t vdev_deflate_ratio;

Given that this can happen this patch add the required error handling.
In the case where vdev_lookup_top() fails assume that no deflation
will occur for the DVA and use the asize.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#1707
Issue openzfs#1987
Issue openzfs#1891
@behlendorf behlendorf modified the milestones: 0.6.3, 0.6.4 May 6, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants