createmany test failure #401

Closed
behlendorf opened this issue Sep 16, 2011 · 6 comments

@behlendorf
Contributor

Andreas Dilger: to cause this I ran lustre/tests/createmany -m /ztest/test/foo 2000000 (approximately) to consume the originally reported number of free inodes. It then still reported about 450k free inodes (I assume because metadnode compression uses less than 512 bytes/dnode), so I ran createmany -m /ztest/test/f 450000, and it is now hung at around 390000 files, with df -i still reporting 189000 inodes / 94500 kB free. The pool is only 1GB in size, so it is currently 91% full. I had it 100% full yesterday, though there were a lot more files with data; this run was purely zero-length inodes.
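
For readers without a Lustre tree handy, the workload amounts to a tight mknod(2) loop; a minimal stand-in sketch (illustrative only, not the actual lustre/tests/createmany source) would be:

/*
 * Rough equivalent of `createmany -m <base> <count>`: create <count>
 * zero-length inodes via mknod(2), the same syscall path visible in
 * the createmany stack trace further down
 * (sys_mknod -> vfs_create -> zfs_create).
 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

int main(int argc, char *argv[])
{
        char path[4096];
        long i, count;

        if (argc != 3) {
                fprintf(stderr, "usage: %s <basename> <count>\n", argv[0]);
                return 1;
        }
        count = strtol(argv[2], NULL, 10);

        for (i = 0; i < count; i++) {
                snprintf(path, sizeof(path), "%s%ld", argv[1], i);
                /* S_IFREG|0644: a plain zero-length file, no data blocks */
                if (mknod(path, S_IFREG | 0644, 0) != 0) {
                        perror(path);
                        return 1;
                }
        }
        return 0;
}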

Pid: 21433, comm: z_ioctl_int/0 Tainted: P 2.6.32.26-175.fc12.x86_64 #1 Dell DXP051                  
RIP: 0010:[] [] _spin_lock+0x1a/0x25
Call Trace:
[] ?  __mutex_unlock_slowpath+0x1a/0x3e
[] ? mutex_unlock+0x1b/0x1d
[] ? zio_done+0xa0a/0xa29 [zfs]
[] ? zio_done+0x9a8/0xa29 [zfs]
[] ? zio_execute+0xb1/0xd5 [zfs]
[] ? taskq_thread+0x271/0x2b3 [spl]
[] ?  default_wake_function+0x0/0x14
[] ? taskq_thread+0x0/0x2b3 [spl]
[] ? kthread+0x7f/0x87
[] ? child_rip+0xa/0x20
[] ? kthread+0x0/0x87
[] ? child_rip+0x0/0x20
@adilger
Contributor

adilger commented Sep 16, 2011

The above stack trace was from the watchdog timer. Running sysrq-t, the stack was slightly different:

z_ioctl_int/0 R running task 0 21433 2 0x00000088
Call Trace:
[] zio_done+0x9a8/0xa29 [zfs]
[] zio_execute+0xb1/0xd5 [zfs]
[] taskq_thread+0x271/0x2b3 [spl]
[] ? default_wake_function+0x0/0x14
[] ? taskq_thread+0x0/0x2b3 [spl]
[] kthread+0x7f/0x87
[] child_rip+0xa/0x20
[] ? kthread+0x0/0x87
[] ? child_rip+0x0/0x20

so at this instant it didn't appear to be in any locking primitives, but it was still busy-spinning at 100% CPU. Other threads with interesting stacks include:

z_rd_int/0 R running task 0 212992 0x00000080
Call Trace:
[] ? zio_destroy+0x91/0x95 [zfs]
[] taskq_thread+0x101/0x2b3 [spl]
[] ? default_wake_function+0x0/0x14
[] ? taskq_thread+0x0/0x2b3 [spl]
[] kthread+0x7f/0x87
[] child_rip+0xa/0x20
[] ? kthread+0x0/0x87
[] ? child_rip+0x0/0x20

txg_sync D ffffc90012775880 0 214372 0x00000080
Call Trace:
[] __mutex_lock_common+0x12d/0x194
[] __mutex_lock_slowpath+0x19/0x1b
[] mutex_lock+0x36/0x50
[] cv_wait_common+0x7a/0xb9 [spl]
[] ? autoremove_wake_function+0x0/0x39
[] __cv_wait+0x13/0x15 [spl]
[] zio_wait+0xf6/0x123 [zfs]
[] vdev_uberblock_sync_list+0xb4/0xcd [zfs]
[] vdev_config_sync+0x10c/0x136 [zfs]
[] spa_sync+0x68f/0x85d [zfs]
[] ? autoremove_wake_function+0x16/0x39
[] ? __wake_up+0x44/0x4d
[] txg_sync_thread+0x1c5/0x33e [zfs]
[] ? txg_sync_thread+0x0/0x33e [zfs]
[] thread_generic_wrapper+0x61/0x6b [spl]
[] ? thread_generic_wrapper+0x0/0x6b [spl]
[] kthread+0x7f/0x87
[] child_rip+0xa/0x20
[] ? kthread+0x0/0x87
[] ? child_rip+0x0/0x20

createmany D ffff88000c4681c0 0 2218425584 0x00000084
Call Trace:
[] ? try_to_wake_up+0x33b/0x34d
[] cv_wait_common+0x72/0xb9 [spl]
[] ? autoremove_wake_function+0x0/0x39
[] __cv_wait+0x13/0x15 [spl]
[] txg_wait_open+0x79/0x99 [zfs]
[] dmu_tx_wait+0xa4/0xad [zfs]
[] zfs_create+0x391/0x5f3 [zfs]
[] zpl_create+0x9d/0xc9 [zfs]
[] vfs_create+0x78/0x9a
[] sys_mknodat+0x143/0x1fd
[] ? apic_timer_interrupt+0xe/0x20
[] ? path_put+0x22/0x26
[] ? audit_syscall_entry+0x11e/0x14a
[] sys_mknod+0x1a/0x1c
[] system_call_fastpath+0x16/0x1b

@behlendorf
Contributor Author

It appears the 100% CPU time can be attributed to the global virtual address space spin lock used by vmalloc(), although which exact call path is to blame isn't clear.

@gunnarbeutner
Contributor

zio_wait() hangs because the ZIO it's waiting for simply never completes. Most likely it was flagged ZIO_REEXECUTE_SUSPEND by zio_done().

This can have several causes:

  • pool faulted
    • top-level vdevs inaccessible
    • uncorrectable read/write error
  • pool full
    • metaslab allocator returned ENOSPC (which is most likely what happened here)
  • possibly some bug(s) we haven't identified yet

Using 'zpool events' you should see a number of I/O failures like the following (this output was generated by running the 'createmany' test until the zpool was completely full; printk()s confirm that metaslab_alloc_dva() returned ENOSPC in my tests):

root@zt:~# zpool events
TIME                           CLASS
Oct 20 2011 19:10:39.037314000 resource.fs.zfs.statechange
Oct 20 2011 19:10:39.368225000 resource.fs.zfs.statechange
Oct 20 2011 19:11:31.923079000 ereport.fs.zfs.data
Oct 20 2011 19:11:31.923155000 ereport.fs.zfs.data
Oct 20 2011 19:11:31.923177000 ereport.fs.zfs.data
Oct 20 2011 19:11:31.923226000 ereport.fs.zfs.data
Oct 20 2011 19:11:31.923266000 ereport.fs.zfs.data
Oct 20 2011 19:11:31.923434000 ereport.fs.zfs.data
Oct 20 2011 19:11:31.923485000 ereport.fs.zfs.data
Oct 20 2011 19:11:31.980107000 ereport.fs.zfs.data
Oct 20 2011 19:11:32.042742000 ereport.fs.zfs.io_failure
root@zt:~#

The txg_wait_*() hang is just a secondary effect: it's waiting for a TXG to complete, which won't happen because one of the I/Os the TXG depends on has been suspended.
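
To make the failure mode concrete, here is a toy user-space model of the hang (plain pthreads standing in for zio->io_lock/io_cv; all names are illustrative, none of this is ZFS code). The completion routine takes the suspend branch and returns without signalling, so the waiter never wakes; a timed wait is used only so the demo exits, whereas the real zio_wait() waits forever:

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

static pthread_mutex_t io_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  io_cv   = PTHREAD_COND_INITIALIZER;
static bool io_done;
static bool io_suspended = true;  /* e.g. metaslab_alloc_dva() hit ENOSPC */

/* Toy stand-in for zio_done(): suspends instead of completing. */
static void *done_thread(void *arg)
{
        pthread_mutex_lock(&io_lock);
        if (io_suspended) {
                /* Suspended I/O: park it, do NOT mark it done or signal. */
                pthread_mutex_unlock(&io_lock);
                return NULL;
        }
        io_done = true;
        pthread_cond_broadcast(&io_cv);
        pthread_mutex_unlock(&io_lock);
        return NULL;
}

int main(void)
{
        pthread_t t;
        struct timespec deadline;

        pthread_create(&t, NULL, done_thread, NULL);

        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_sec += 2;

        /* Toy stand-in for zio_wait(): sleeps until io_done is set. */
        pthread_mutex_lock(&io_lock);
        while (!io_done) {
                if (pthread_cond_timedwait(&io_cv, &io_lock, &deadline)) {
                        printf("zio suspended: the waiter would sleep forever, "
                               "and every txg_wait_*() behind it too\n");
                        break;
                }
        }
        pthread_mutex_unlock(&io_lock);
        pthread_join(t, NULL);
        return 0;
}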

@behlendorf
Contributor Author

This is very plausibly the same as #2523 and is particularly interesting because it includes a simple reproducer.

behlendorf pushed a commit to openzfs/spl that referenced this issue Dec 19, 2014
It is known that mutexes in Linux are not safe when used to
synchronize the freeing of the object in which the mutex is embedded:

http://lwn.net/Articles/575477/

The known places in ZFS which are suspected to suffer from the race
condition are zio->io_lock and dbuf->db_mtx.

* zio uses zio->io_lock and zio->io_cv to synchronize freeing
  between zio_wait() and zio_done().
* dbuf uses dbuf->db_mtx to protect reference counting.

This patch fixes this kind of race by forcing serialization on
mutex_exit() with a spin lock, making the mutex safe by sacrificing
a bit of performance and memory overhead.

This issue most commonly manifests itself as a deadlock in the zio
pipeline caused by a process spinning on the damaged mutex.  Similar
deadlocks have been reported for the dbuf->db_mtx mutex.  And it can
also cause a NULL dereference or bad paging request under the right
circumstances.

This issue and many like it are linked off the openzfs/zfs#2523
issue.  Specifically, this fix resolves at least the following
outstanding issues:
outstanding issues:

openzfs/zfs#401
openzfs/zfs#2523
openzfs/zfs#2679
openzfs/zfs#2684
openzfs/zfs#2704
openzfs/zfs#2708
openzfs/zfs#2517
openzfs/zfs#2827
openzfs/zfs#2850
openzfs/zfs#2891
openzfs/zfs#2897
openzfs/zfs#2247
openzfs/zfs#2939

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes #421
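
For context, the race the commit message describes can be sketched in user space (pthreads standing in for the kernel primitives; names are illustrative, this is not the SPL code):

#include <pthread.h>
#include <stdlib.h>

struct obj {
        pthread_mutex_t lock;   /* mutex embedded in the object it guards */
        pthread_cond_t  cv;
        int             done;
};

/* Plays the role of zio_done(). */
static void *completer(void *arg)
{
        struct obj *o = arg;

        pthread_mutex_lock(&o->lock);
        o->done = 1;
        pthread_cond_signal(&o->cv);
        pthread_mutex_unlock(&o->lock);  /* may still touch o->lock here */
        return NULL;
}

/* Plays the role of zio_wait(). */
int main(void)
{
        struct obj *o = calloc(1, sizeof(*o));
        pthread_t t;

        pthread_mutex_init(&o->lock, NULL);
        pthread_cond_init(&o->cv, NULL);
        pthread_create(&t, NULL, completer, o);

        pthread_mutex_lock(&o->lock);
        while (!o->done)
                pthread_cond_wait(&o->cv, &o->lock);
        pthread_mutex_unlock(&o->lock);

        /*
         * RACE WINDOW: nothing guarantees the completer has finished
         * executing pthread_mutex_unlock() on o->lock.  Freeing o here,
         * which is the zio_wait()/zio_done() pattern, can leave the
         * completer spinning on freed memory -- matching the z_ioctl_int
         * thread stuck in _spin_lock at the top of this issue.
         */
        free(o);

        pthread_join(t, NULL);
        return 0;
}

Per the commit message above, the fix serializes mutex_exit() with an internal spin lock, trading a little performance and memory for an unlock that can no longer overlap a concurrent acquire-and-free.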
@behlendorf
Contributor Author

This issue, which is a duplicate of #2523, was resolved by the following commit. Full details can be found in the commit message and the related LWN article.

openzfs/spl@a3c1eb7 mutex: force serialization on mutex_exit() to fix races

dajhorn added a commit to zfsonlinux/pkg-spl that referenced this issue Dec 20, 2014
Commit: openzfs/spl@a3c1eb7
From: Chunwei Chen <tuxoko@gmail.com>
Date: Fri, 19 Dec 2014 11:31:59 +0800
Subject: mutex: force serialization on mutex_exit() to fix races

(Backport of the commit quoted above; message identical, plus Backported-by: Darik Horn <dajhorn@vanadac.com> and conflicts in include/sys/mutex.h.)
behlendorf pushed a commit to openzfs/spl that referenced this issue Dec 23, 2014
(Same commit as above: mutex: force serialization on mutex_exit() to fix races; message unchanged.)
@ryao
Contributor

ryao commented Feb 10, 2015

@behlendorf This is not a duplicate of #2523. Instead, it is a duplicate of #3091.

ryao pushed a commit to ryao/spl that referenced this issue Feb 19, 2015
(Same commit message as above; Closes openzfs#421, with conflicts in include/sys/mutex.h.)
ahrens added a commit to ahrens/zfs that referenced this issue Aug 17, 2021
Because we can have concurrent "read block" requests for the same block, we can also have concurrent accesses to the same block of the zettacache. In particular, two "read block" requests for the same block may each lookup() the block in the zettacache, find the entry missing, and then insert() it. This can lead to several problems, especially in subtle cases such as when the first insert() has been flushed to the index before the second insert() is called.

This PR solves the issue by adding per-block locking to the zettacache, as sketched below. A failed lookup() now returns with the entry locked, blocking concurrent lookup()s of the same block until the first thread either insert()s the value or drop()s the LockedKey.
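
The real zettacache is Rust; a rough C sketch of that locking discipline (all names hypothetical) might look like:

#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

#define NBUCKETS 1024

struct key_state {
        pthread_mutex_t mtx;
        pthread_cond_t  cv;
        bool            busy;     /* a failed lookup() holds this key */
        bool            present;  /* the block has been insert()ed */
};

/* GCC range-designated initializer; collisions conflate distinct
 * blocks in this toy table, unlike the real per-block locks. */
static struct key_state table[NBUCKETS] = {
        [0 ... NBUCKETS - 1] = {
                PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER,
                false, false
        }
};

static struct key_state *bucket(uint64_t block)
{
        return &table[block % NBUCKETS];
}

/* Returns true on a hit.  On a miss, returns false with the key held
 * busy, so the caller must later call cache_insert() or cache_drop(). */
bool cache_lookup(uint64_t block)
{
        struct key_state *ks = bucket(block);

        pthread_mutex_lock(&ks->mtx);
        while (ks->busy)                   /* wait out a concurrent miss */
                pthread_cond_wait(&ks->cv, &ks->mtx);
        if (ks->present) {
                pthread_mutex_unlock(&ks->mtx);
                return true;
        }
        ks->busy = true;                   /* caller now owns this key */
        pthread_mutex_unlock(&ks->mtx);
        return false;
}

/* Publish the block and wake any waiters (the insert() path). */
void cache_insert(uint64_t block)
{
        struct key_state *ks = bucket(block);

        pthread_mutex_lock(&ks->mtx);
        ks->present = true;
        ks->busy = false;
        pthread_cond_broadcast(&ks->cv);
        pthread_mutex_unlock(&ks->mtx);
}

/* Give up without inserting (the drop()-the-LockedKey path). */
void cache_drop(uint64_t block)
{
        struct key_state *ks = bucket(block);

        pthread_mutex_lock(&ks->mtx);
        ks->busy = false;
        pthread_cond_broadcast(&ks->cv);
        pthread_mutex_unlock(&ks->mtx);
}

Blocking on ks->busy is what prevents the double insert described above: the second reader cannot even observe the miss until the first has either published the block or dropped the lock.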
sdimitro pushed a commit to sdimitro/zfs that referenced this issue May 23, 2022