NULL pointer dereference in __mutex_unlock_slowpath #2939

bkus · 2014-11-28T14:45:21Z

[617214.295684] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[617214.295814] IP: [] __mutex_unlock_slowpath+0x25/0x40
[617214.295920] PGD 12dc65067 PUD 26afe7067 PMD 0
[617214.295998] Oops: 0000 [#1] SMP
[617214.296053] Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfsd auth_rpcgss oid_registry nfs_acl nfs lockd sunrpc fscache binfmt_misc ext4 crc16 mbcache jbd2 iTCO_wdt iTCO_vendor_support intel_powerclamp radeon coretemp kvm_intel ttm drm_kms_helper kvm evdev drm psmouse serio_raw pcspkr i2c_algo_bit hpilo i2c_core hpwdt lpc_ich i7core_edac mfd_core edac_core ipmi_si ipmi_msghandler acpi_power_meter button shpchp processor autofs4 zfs(PO) zunicode(PO) zcommon(PO) znvpair(PO) zavl(PO) spl(O) sha256_ssse3 sha256_generic algif_skcipher af_alg dm_crypt dm_mod raid1 raid0 md_mod sd_mod crc_t10dif ses enclosure crct10dif_generic sg sr_mod cdrom ata_generic hid_generic usbhid hid crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel ghash_clmulni_intel ata_piix aesni_intel aes_x86_64 lrw
[617214.297440] gf128mul bfa glue_helper ablk_helper mpt2sas cryptd libata uhci_hcd ehci_pci scsi_transport_fc raid_class scsi_tgt ehci_hcd scsi_transport_sas usbcore bnx2 thermal usb_common scsi_mod bna thermal_sys
[617214.297820] CPU: 5 PID: 4463 Comm: z_rd_int/0 Tainted: P IO 3.16.0-4-amd64 #1 Debian 3.16.7-2
[617214.297954] Hardware name: HP ProLiant DL380 G6, BIOS P62 01/30/2011
[617214.298042] task: ffff880c02aa2c20 ti: ffff880bbfca4000 task.ti: ffff880bbfca4000
[617214.298149] RIP: 0010:[] [] __mutex_unlock_slowpath+0x25/0x40
[617214.298281] RSP: 0018:ffff880bbfca7c88 EFLAGS: 00010217
[617214.298356] RAX: 0000000000000000 RBX: ffff88085a0d3040 RCX: 0000000000000000
[617214.298458] RDX: ffff88085a0d3048 RSI: 0000000000000246 RDI: ffff88085a0d3044
[617214.298576] RBP: 0000000000000000 R08: ffffffff8160dd48 R09: 0000000000000001
[617214.298704] R10: 0000000000014240 R11: 0000000000000010 R12: 0000000000000000
[617214.298806] R13: 0000000000200000 R14: ffff88085a0d3040 R15: 0000000000000000
[617214.298908] FS: 0000000000000000(0000) GS:ffff880c1fa40000(0000) knlGS:0000000000000000
[617214.299022] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[617214.299105] CR2: 0000000000000010 CR3: 00000004910b1000 CR4: 00000000000007e0
[617214.299207] Stack:
[617214.299237] ffff88085a0d2d30 ffffffffa0533cea ffffffffa037f607 0000000000000000
[617214.299361] ffff88085a0d3040 ffff88085a0d2d30 ffff8808d27a19d8 0000000000000000
[617214.299483] 0000000000000000 0000000000200000 ffff880c02aa2c20 ffff88085a0d2d30
[617214.299605] Call Trace:
[617214.299682] [] ? zio_done+0x61a/0xcf0 [zfs]
[617214.299773] [] ? spl_kmem_cache_free+0x137/0x3b0 [spl]
[617214.299877] [] ? zio_done+0x7a1/0xcf0 [zfs]
[617214.299974] [] ? spa_config_exit+0x69/0x90 [zfs]
[617214.300069] [] ? zio_done+0x7a1/0xcf0 [zfs]
[617214.300161] [] ? zio_done+0x7a1/0xcf0 [zfs]
[617214.300252] [] ? zio_wait_for_children+0x4e/0x60 [zfs]
[617214.300357] [] ? zio_execute+0xa7/0x140 [zfs]
[617214.300445] [] ? taskq_thread+0x224/0x490 [spl]
[617214.300534] [] ? wake_up_state+0x10/0x10
[617214.300614] [] ? taskq_cancel_id+0x1e0/0x1e0 [spl]
[617214.300708] [] ? kthread+0xbd/0xe0
[617214.300812] [] ? kthread_create_on_node+0x180/0x180
[617214.300908] [] ? ret_from_fork+0x7c/0xb0
[617214.300987] [] ? kthread_create_on_node+0x180/0x180
[617214.305465] Code: 84 00 00 00 00 00 66 66 66 66 90 53 48 89 fb c7 07 01 00 00 00 48 8d 7f 04 e8 e8 15 00 00 48 8b 43 08 48 8d 53 08 48 39 d0 74 09 <48> 8b 78 10 e8 e2 98 b8 ff 66 83 43 04 01 5b c3 66 66 2e 0f 1f
[617214.315137] RIP [] __mutex_unlock_slowpath+0x25/0x40
[617214.319811] RSP
[617214.324396] CR2: 0000000000000010
[617214.488943] ---[ end trace 63cdc141ec3acdb5 ]---

Version info:
[ 43.290429] ZFS: Loaded module v0.6.3-21~~7b2d78~~wheezy, ZFS pool version 5000, ZFS filesystem version 5

Using the Debian daily build via apt-get, refreshed about 4 days ago. Two zpools: root is on a 3-way ZFS mirror composed of dm-crypt vdevs; mass data is on a 24-way raidz3 composed of dm-crypt vdevs, which are in turn composed of single-drive md-raid0 arrays. Most IO activity was on the large zpool, so that's probably where this bug was triggered.
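For readers decoding the oops above: the faulting bytes `<48> 8b 78 10` are a load from offset 0x10 of RAX, and RAX is 0, which matches CR2 = 0x10. The preceding instructions read `8(%rbx)` into RAX and compare it with the address `rbx + 8`, which looks like an emptiness check on the mutex wait list followed by a read of the first waiter's task pointer. The sketch below mirrors that arithmetic. The struct layouts are simplified stand-ins and are assumptions (a non-debug x86-64 build, names suffixed `_sketch` are hypothetical), not copies of the kernel headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's doubly linked list node. */
struct list_head {
	struct list_head *next, *prev;
};

/* Assumed layout of a non-debug mutex: 4-byte count, 4-byte spinlock,
 * then the wait list at offset 8 (RDX in the oops is RBX + 8). */
struct mutex_sketch {
	int32_t count;
	uint32_t wait_lock;
	struct list_head wait_list;
};

/* Assumed layout of a waiter: list node first, task pointer after it. */
struct mutex_waiter_sketch {
	struct list_head list;
	void *task;
};

/* Address the unlock slow path would read to find the first waiter's
 * task, given whatever wait_list.next currently holds. */
static uintptr_t first_waiter_task_addr(const struct mutex_sketch *m)
{
	return (uintptr_t)m->wait_list.next +
	    offsetof(struct mutex_waiter_sketch, task);
}
```

With a zeroed `wait_list.next`, `first_waiter_task_addr()` yields 0x10 on an LP64 target, the same address reported in CR2. A NULL list pointer inside a mutex that should have been initialized would be consistent with the mutex memory having been freed or reused while the unlock was still in progress, though the log alone does not prove that.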

behlendorf · 2014-12-02T19:57:49Z

Thanks for filing this. This is a duplicate of #2523, which is being worked on.

It is known that mutexes in Linux are not safe when using them to synchronize the freeing of the object in which the mutex is embedded:

http://lwn.net/Articles/575477/

The known places in ZFS which are suspected to suffer from the race condition are zio->io_lock and dbuf->db_mtx.

* zio uses zio->io_lock and zio->io_cv to synchronize freeing between zio_wait() and zio_done().
* dbuf uses dbuf->db_mtx to protect reference counting.

This patch fixes this kind of race by forcing serialization on mutex_exit() with a spin lock, making the mutex safe by sacrificing a bit of performance and memory overhead.

This issue most commonly manifests itself as a deadlock in the zio pipeline caused by a process spinning on the damaged mutex. Similar deadlocks have been reported for the dbuf->db_mtx mutex. And it can also cause a NULL dereference or bad paging request under the right circumstances.

This issue and many like it are linked off the openzfs/zfs#2523 issue. Specifically this fix resolves at least the following outstanding issues:

openzfs/zfs#401
openzfs/zfs#2523
openzfs/zfs#2679
openzfs/zfs#2684
openzfs/zfs#2704
openzfs/zfs#2708
openzfs/zfs#2517
openzfs/zfs#2827
openzfs/zfs#2850
openzfs/zfs#2891
openzfs/zfs#2897
openzfs/zfs#2247
openzfs/zfs#2939

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes #421
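The approach described in the commit message, a spin lock held across the unlock so that exits are serialized, can be illustrated with a userspace analogue. This is only a sketch under stated assumptions: `smutex_t`, `smutex_enter()`, `smutex_exit()`, and `obj_rele()` are hypothetical names, a pthread mutex stands in for the kernel mutex, and a C11 `atomic_flag` stands in for the kernel spin lock. It is not the SPL code, and POSIX mutexes do not have the kernel's unlock race, so the example shows the structure of the fix rather than reproducing the bug.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical wrapper: a sleeping mutex plus a spin lock that is
 * taken only on the exit path. */
typedef struct {
	pthread_mutex_t m_mutex;
	atomic_flag m_lock;
} smutex_t;

static void smutex_init(smutex_t *mp)
{
	pthread_mutex_init(&mp->m_mutex, NULL);
	atomic_flag_clear(&mp->m_lock);
}

static void smutex_enter(smutex_t *mp)
{
	pthread_mutex_lock(&mp->m_mutex);
}

/* Exit holds the spin lock across the unlock. A later owner cannot
 * complete its own exit (and then free the object) until the earlier
 * exit has released the spin lock, which is its last access. */
static void smutex_exit(smutex_t *mp)
{
	while (atomic_flag_test_and_set_explicit(&mp->m_lock,
	    memory_order_acquire))
		;	/* spin */
	pthread_mutex_unlock(&mp->m_mutex);
	atomic_flag_clear_explicit(&mp->m_lock, memory_order_release);
}

/* Object whose embedded lock protects its own reference count, in the
 * spirit of the dbuf->db_mtx usage mentioned above. */
typedef struct {
	smutex_t lock;
	int refs;
} obj_t;

static obj_t *obj_create(int refs)
{
	obj_t *o = malloc(sizeof (*o));
	if (o == NULL)
		return NULL;
	smutex_init(&o->lock);
	o->refs = refs;
	return o;
}

/* Drop one reference; returns 1 if this call freed the object. */
static int obj_rele(obj_t *o)
{
	smutex_enter(&o->lock);
	int last = (--o->refs == 0);
	smutex_exit(&o->lock);
	if (last) {
		pthread_mutex_destroy(&o->lock.m_mutex);
		free(o);
	}
	return last;
}
```

The design choice worth noting is that releasing the spin lock is a single atomic store with no access to the object afterwards. That property is what lets the last holder free the object safely, at the cost of one extra lock word per mutex and an extra atomic operation on every exit.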

behlendorf · 2014-12-19T19:04:07Z

This issue, which is a duplicate of #2523, was resolved by the following commit. Full details can be found in the commit message and the related LWN article.

openzfs/spl@a3c1eb7 mutex: force serialization on mutex_exit() to fix races

Commit: openzfs/zfs@a3c1eb7
From: Chunwei Chen <tuxoko@gmail.com>
Date: Fri, 19 Dec 2014 11:31:59 +0800
Subject: mutex: force serialization on mutex_exit() to fix races

It is known that mutexes in Linux are not safe when using them to synchronize the freeing of the object in which the mutex is embedded:

http://lwn.net/Articles/575477/

The known places in ZFS which are suspected to suffer from the race condition are zio->io_lock and dbuf->db_mtx.

* zio uses zio->io_lock and zio->io_cv to synchronize freeing between zio_wait() and zio_done().
* dbuf uses dbuf->db_mtx to protect reference counting.

This patch fixes this kind of race by forcing serialization on mutex_exit() with a spin lock, making the mutex safe by sacrificing a bit of performance and memory overhead.

This issue most commonly manifests itself as a deadlock in the zio pipeline caused by a process spinning on the damaged mutex. Similar deadlocks have been reported for the dbuf->db_mtx mutex. And it can also cause a NULL dereference or bad paging request under the right circumstances.

This issue and many like it are linked off the openzfs/zfs#2523 issue. Specifically this fix resolves at least the following outstanding issues:

openzfs/zfs#401
openzfs/zfs#2523
openzfs/zfs#2679
openzfs/zfs#2684
openzfs/zfs#2704
openzfs/zfs#2708
openzfs/zfs#2517
openzfs/zfs#2827
openzfs/zfs#2850
openzfs/zfs#2891
openzfs/zfs#2897
openzfs/zfs#2247
openzfs/zfs#2939

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Backported-by: Darik Horn <dajhorn@vanadac.com>
Closes #421

Conflicts: include/sys/mutex.h

behlendorf closed this as completed Dec 2, 2014

behlendorf added this to the 0.6.4 milestone Dec 2, 2014

behlendorf added Bug - Blocker labels Dec 2, 2014
